DuckDB in Action: Some neat DuckDB specific SQL extension

Editor's note: DuckDB strives to make it easy to write SQL, even when it requires introducing non-standard syntax. See the great blog posts by Alex Monahan or explore the Manning "DuckDB in Action" book by the author of this snippet.

Attach and select MotherDuck database

Data shared/available on MotherDuck

ATTACH 'md:_share/duckdb_in_action_ch3_4/d0c08584-1d33-491c-8db7-cf9c6910eceb' AS duckdb_in_action_ch3_4;
USE duckdb_in_action_ch3_4;

Copy code

DuckDB specific extensions: Project all columns matching a patternSQL

SELECT COLUMNS('valid.*') FROM prices LIMIT 3;

Copy code

DuckDB specific extensions: Apply an aggregation to several columnsSQL

SELECT max(COLUMNS('valid.*')) FROM prices;

Copy code

DuckDB specific extensions: Apply one condition to many columnsSQL

FROM prices
WHERE COLUMNS('valid.*') BETWEEN '2020-01-01' AND '2021-01-01';

Copy code

Michael Simons

Expand

Share link


DuckDB in Action, Examples from chapters 3 and 4: Having fun with power production measurements

Editor's note: the Manning book "DuckDB in Action" has tons of great SQL exercises, including a few chapters which work on power production data. Michael, one of the authors, shares some of these queries here [and on other snippets published on this site].

Attach and select MotherDuck database

Data shared/available on MotherDuck

ATTACH 'md:_share/duckdb_in_action_ch3_4/d0c08584-1d33-491c-8db7-cf9c6910eceb' AS duckdb_in_action_ch3_4;
USE duckdb_in_action_ch3_4;

Copy code

3.17 Using arg_max to find sibling values of aggregated values computed in a common table expressionSQL

WITH per_hour AS (
    SELECT system_id,
           date_trunc('hour', read_on) AS read_on,
           avg(power) / 1000 AS kWh
    FROM readings
    GROUP BY ALL
)
SELECT name,
       max(kWh),
       arg_max(read_on, kWh) AS 'Read on'
FROM per_hour
   JOIN systems s ON s.id = per_hour.system_id
WHERE system_id = 34
GROUP by s.name;

Copy code

4.18 Using named windows defined with a partition and a range to compute several aggregates at onceSQL

SELECT system_id,
       day,
       min(kWh) OVER seven_days AS "7-day min",
       quantile(kWh, [0.25, 0.5, 0.75])
         OVER seven_days AS "kWh 7-day quartile",
       max(kWh) OVER seven_days AS "7-day max",
FROM v_power_per_day
WINDOW
    seven_days AS (
        PARTITION BY system_id, month(day)
        ORDER BY day ASC
        RANGE BETWEEN INTERVAL 3 Days PRECEDING
                  AND INTERVAL 3 Days FOLLOWING
    )
ORDER BY system_id, day;

Copy code

4.23 Filtering outside the where clause to find a time period with a high production over 7 daysSQL

SELECT system_id,
       day,
       avg(kWh) OVER (
            PARTITION BY system_id
            ORDER BY day ASC
            RANGE BETWEEN INTERVAL 3 Days PRECEDING
                      AND INTERVAL 3 Days FOLLOWING
       ) AS "kWh 7-day moving average"
FROM v_power_per_day
QUALIFY "kWh 7-day moving average" > 875 
ORDER BY system_id, day;

Copy code

4.26 Easily pivoting a result set (around a year)SQL

PIVOT (FROM v_power_per_day)
ON year(day)
USING sum(kWh);

Copy code

4.28 Using the ASOF join to pick prices that have been valid up to the point in time of sellingSQL

WITH prices AS (
  SELECT range AS valid_at,
         random()*10 AS price
  FROM range(
       '2023-01-01 01:00:00'::timestamp,
       '2023-01-01 02:00:00'::timestamp, INTERVAL '15 minutes')
),
sales AS (
  SELECT range AS sold_at,
         random()*10 AS num
  FROM range(
      '2023-01-01 01:00:00'::timestamp,
      '2023-01-01 02:00:00'::timestamp, INTERVAL '5 minutes')
)
SELECT sold_at, valid_at AS 'with_price_at', round(num * price,2) as price
FROM sales
ASOF JOIN prices
    ON prices.valid_at <= sales.sold_at;

Copy code

4.29 Using the ASOF join together with a window function to compute accumulated earningsSQL

SELECT power.day,
       power.kWh,
       prices.value as 'ct/kWh',
       round(sum(prices.value * power.kWh)
           OVER (ORDER BY power.day ASC) / 100, 2)
           AS 'Accumulated earnings in EUR'
FROM v_power_per_day power
    ASOF JOIN prices
    ON prices.valid_from <= power.day
WHERE system_id = 34
ORDER BY day;

Copy code

4.33 Using a LATERAL Join for looping through a JSON array returned by a solar radiation APISQL

-- The below query generates a series of 7 days, 
-- joins those with the hours 8, 13 and 19 (7pm) to create indexes.
-- Those indexes are the day number * 24 plus the desired hour of the
-- day to find the value in the JSON array.
-- That index is the lateral driver for the sub-query:
--
-- It then calls Open Meteo for Offenbach, apparently the city with 
-- the most sunshine in germany to get shortwave radition 
-- at those times
WITH days AS (
  SELECT generate_series AS value FROM generate_series(7)
), hours AS (
  SELECT unnest([8, 13, 18]) AS value
), indexes AS (
  SELECT days.value * 24 + hours.value AS i
  FROM days, hours
)
SELECT date_trunc('day', now()) - INTERVAL '7 days' +
         INTERVAL (indexes.i || ' hours') AS ts,
       ghi.v AS 'GHI in W/m^2'
FROM indexes,
LATERAL (
  SELECT hourly.shortwave_radiation_instant[i+1] AS v
  FROM read_json_auto('https://api.open-meteo.com/v1/forecast?latitude=48.47377&longitude=7.94495&hourly=shortwave_radiation_instant&past_days=7')
) AS ghi
ORDER BY ts;

Copy code

Show less
Michael Simons

Expand

Share link