DuckDB in Action: Some neat DuckDB specific SQL extension
Editor's note: DuckDB strives to make it easy to write SQL, even when it requires introducing non-standard syntax. See the great blog posts by Alex Monahan or explore the Manning "DuckDB in Action" book by the author of this snippet.
DuckDB specific extensions: Project all columns matching a patternSQL
SELECT COLUMNS('valid.*') FROM prices LIMIT 3;
Copy code
DuckDB specific extensions: Apply an aggregation to several columnsSQL
SELECT max(COLUMNS('valid.*')) FROM prices;
Copy code
DuckDB specific extensions: Apply one condition to many columnsSQL
FROM prices WHERE COLUMNS('valid.*') BETWEEN '2020-01-01' AND '2021-01-01';
Copy code
Michael Simons
Expand
Share link
DuckDB in Action, Examples from chapters 3 and 4: Having fun with power production measurements
Editor's note: the Manning book "DuckDB in Action" has tons of great SQL exercises, including a few chapters which work on power production data. Michael, one of the authors, shares some of these queries here [and on other snippets published on this site].
3.17 Using arg_max to find sibling values of aggregated values computed in a common table expressionSQL
WITH per_hour AS ( SELECT system_id, date_trunc('hour', read_on) AS read_on, avg(power) / 1000 AS kWh FROM readings GROUP BY ALL ) SELECT name, max(kWh), arg_max(read_on, kWh) AS 'Read on' FROM per_hour JOIN systems s ON s.id = per_hour.system_id WHERE system_id = 34 GROUP by s.name;
Copy code
4.18 Using named windows defined with a partition and a range to compute several aggregates at onceSQL
SELECT system_id, day, min(kWh) OVER seven_days AS "7-day min", quantile(kWh, [0.25, 0.5, 0.75]) OVER seven_days AS "kWh 7-day quartile", max(kWh) OVER seven_days AS "7-day max", FROM v_power_per_day WINDOW seven_days AS ( PARTITION BY system_id, month(day) ORDER BY day ASC RANGE BETWEEN INTERVAL 3 Days PRECEDING AND INTERVAL 3 Days FOLLOWING ) ORDER BY system_id, day;
Copy code
4.23 Filtering outside the where clause to find a time period with a high production over 7 daysSQL
SELECT system_id, day, avg(kWh) OVER ( PARTITION BY system_id ORDER BY day ASC RANGE BETWEEN INTERVAL 3 Days PRECEDING AND INTERVAL 3 Days FOLLOWING ) AS "kWh 7-day moving average" FROM v_power_per_day QUALIFY "kWh 7-day moving average" > 875 ORDER BY system_id, day;
Copy code
4.26 Easily pivoting a result set (around a year)SQL
PIVOT (FROM v_power_per_day) ON year(day) USING sum(kWh);
Copy code
4.28 Using the ASOF join to pick prices that have been valid up to the point in time of sellingSQL
WITH prices AS ( SELECT range AS valid_at, random()*10 AS price FROM range( '2023-01-01 01:00:00'::timestamp, '2023-01-01 02:00:00'::timestamp, INTERVAL '15 minutes') ), sales AS ( SELECT range AS sold_at, random()*10 AS num FROM range( '2023-01-01 01:00:00'::timestamp, '2023-01-01 02:00:00'::timestamp, INTERVAL '5 minutes') ) SELECT sold_at, valid_at AS 'with_price_at', round(num * price,2) as price FROM sales ASOF JOIN prices ON prices.valid_at <= sales.sold_at;
Copy code
4.29 Using the ASOF join together with a window function to compute accumulated earningsSQL
SELECT power.day, power.kWh, prices.value as 'ct/kWh', round(sum(prices.value * power.kWh) OVER (ORDER BY power.day ASC) / 100, 2) AS 'Accumulated earnings in EUR' FROM v_power_per_day power ASOF JOIN prices ON prices.valid_from <= power.day WHERE system_id = 34 ORDER BY day;
Copy code
4.33 Using a LATERAL Join for looping through a JSON array returned by a solar radiation APISQL
-- The below query generates a series of 7 days, -- joins those with the hours 8, 13 and 19 (7pm) to create indexes. -- Those indexes are the day number * 24 plus the desired hour of the -- day to find the value in the JSON array. -- That index is the lateral driver for the sub-query: -- -- It then calls Open Meteo for Offenbach, apparently the city with -- the most sunshine in germany to get shortwave radition -- at those times WITH days AS ( SELECT generate_series AS value FROM generate_series(7) ), hours AS ( SELECT unnest([8, 13, 18]) AS value ), indexes AS ( SELECT days.value * 24 + hours.value AS i FROM days, hours ) SELECT date_trunc('day', now()) - INTERVAL '7 days' + INTERVAL (indexes.i || ' hours') AS ts, ghi.v AS 'GHI in W/m^2' FROM indexes, LATERAL ( SELECT hourly.shortwave_radiation_instant[i+1] AS v FROM read_json_auto('https://api.open-meteo.com/v1/forecast?latitude=48.47377&longitude=7.94495&hourly=shortwave_radiation_instant&past_days=7') ) AS ghi ORDER BY ts;
Copy code
Show less
Michael Simons
Expand
Share link