DuckDB in Action, Examples from chapters 3 and 4: Having fun with power production measurements
Editor's note: the Manning book "DuckDB in Action" has tons of great SQL exercises, including a few chapters which work on power production data. Michael, one of the authors, shares some of these queries here [and on other snippets published on this site].
3.17 Using arg_max to find sibling values of aggregated values computed in a common table expressionSQL
WITH per_hour AS ( SELECT system_id, date_trunc('hour', read_on) AS read_on, avg(power) / 1000 AS kWh FROM readings GROUP BY ALL ) SELECT name, max(kWh), arg_max(read_on, kWh) AS 'Read on' FROM per_hour JOIN systems s ON s.id = per_hour.system_id WHERE system_id = 34 GROUP by s.name;
Copy code
4.18 Using named windows defined with a partition and a range to compute several aggregates at onceSQL
SELECT system_id, day, min(kWh) OVER seven_days AS "7-day min", quantile(kWh, [0.25, 0.5, 0.75]) OVER seven_days AS "kWh 7-day quartile", max(kWh) OVER seven_days AS "7-day max", FROM v_power_per_day WINDOW seven_days AS ( PARTITION BY system_id, month(day) ORDER BY day ASC RANGE BETWEEN INTERVAL 3 Days PRECEDING AND INTERVAL 3 Days FOLLOWING ) ORDER BY system_id, day;
Copy code
4.23 Filtering outside the where clause to find a time period with a high production over 7 daysSQL
SELECT system_id, day, avg(kWh) OVER ( PARTITION BY system_id ORDER BY day ASC RANGE BETWEEN INTERVAL 3 Days PRECEDING AND INTERVAL 3 Days FOLLOWING ) AS "kWh 7-day moving average" FROM v_power_per_day QUALIFY "kWh 7-day moving average" > 875 ORDER BY system_id, day;
Copy code
4.26 Easily pivoting a result set (around a year)SQL
PIVOT (FROM v_power_per_day) ON year(day) USING sum(kWh);
Copy code
4.28 Using the ASOF join to pick prices that have been valid up to the point in time of sellingSQL
WITH prices AS ( SELECT range AS valid_at, random()*10 AS price FROM range( '2023-01-01 01:00:00'::timestamp, '2023-01-01 02:00:00'::timestamp, INTERVAL '15 minutes') ), sales AS ( SELECT range AS sold_at, random()*10 AS num FROM range( '2023-01-01 01:00:00'::timestamp, '2023-01-01 02:00:00'::timestamp, INTERVAL '5 minutes') ) SELECT sold_at, valid_at AS 'with_price_at', round(num * price,2) as price FROM sales ASOF JOIN prices ON prices.valid_at <= sales.sold_at;
Copy code
4.29 Using the ASOF join together with a window function to compute accumulated earningsSQL
SELECT power.day, power.kWh, prices.value as 'ct/kWh', round(sum(prices.value * power.kWh) OVER (ORDER BY power.day ASC) / 100, 2) AS 'Accumulated earnings in EUR' FROM v_power_per_day power ASOF JOIN prices ON prices.valid_from <= power.day WHERE system_id = 34 ORDER BY day;
Copy code
4.33 Using a LATERAL Join for looping through a JSON array returned by a solar radiation APISQL
-- The below query generates a series of 7 days, -- joins those with the hours 8, 13 and 19 (7pm) to create indexes. -- Those indexes are the day number * 24 plus the desired hour of the -- day to find the value in the JSON array. -- That index is the lateral driver for the sub-query: -- -- It then calls Open Meteo for Offenbach, apparently the city with -- the most sunshine in germany to get shortwave radition -- at those times WITH days AS ( SELECT generate_series AS value FROM generate_series(7) ), hours AS ( SELECT unnest([8, 13, 18]) AS value ), indexes AS ( SELECT days.value * 24 + hours.value AS i FROM days, hours ) SELECT date_trunc('day', now()) - INTERVAL '7 days' + INTERVAL (indexes.i || ' hours') AS ts, ghi.v AS 'GHI in W/m^2' FROM indexes, LATERAL ( SELECT hourly.shortwave_radiation_instant[i+1] AS v FROM read_json_auto('https://api.open-meteo.com/v1/forecast?latitude=48.47377&longitude=7.94495&hourly=shortwave_radiation_instant&past_days=7') ) AS ghi ORDER BY ts;
Copy code
Show less
Michael Simons
Expand
Share link
StackOverflow Analytics
Editor's note: Michael shares Stackoverflow data in MotherDuck as part of this snippet as well as typical aggregate analytics on the data. There are additional sample data sets attached by default in MotherDuck as the 'sample_data' share.
Which 5 questions have the most comments, what is the post title and comment countSQL
SELECT Title, CommentCount FROM posts WHERE PostTypeId = 1 ORDER BY CommentCount DESC LIMIT 5;
Copy code
User reputation and reputation rate per daySQL
SELECT name, reputation, round(reputation/day(today()-createdAt)) as rate, day(today()-createdAt) as days, createdAt FROM users WHERE reputation > 1000000 ORDER BY rate DESC;
Copy code
Reputation rate as bar chart with CTESQL
WITH top_users as ( SELECT name, reputation, round(reputation/day(today()-createdAt)) as rate, day(today()-createdAt) as days, createdAt FROM users WHERE reputation > 1000000 ) SELECT name, reputation, rate, bar(rate,150,300,35) AS bar FROM top_users;
Copy code
Post statistics per yearSQL
SELECT year(CreationDate) AS year, count(*), round(avg(ViewCount)), max(AnswerCount) FROM posts GROUP BY year ORDER BY year DESC LIMIT 10;
Copy code
Posting Frequency with bar chart on Weekdays for "sql" tagSQL
SELECT count(*) as freq, dayname(CreationDate) AS day, bar(freq, 0, 150000,20) AS plot FROM posts WHERE posttypeid = 1 AND tags LIKE '%<sql>%' GROUP BY all ORDER BY freq DESC;
Copy code
Posting Frequency with bar chart on Weekdays for "rust" tagSQL
SELECT count(*) as freq, dayname(CreationDate) AS day, bar(freq, 0, 10000,20) AS plot FROM posts WHERE posttypeid = 1 AND tags LIKE '%<rust>%' GROUP BY all ORDER BY freq DESC;
Copy code
Show less
Michael Hunger
Expand
Share link
Execute this SQL
with bits as ( select -- add 8 bits to the end to account for the delimiter bit_length(columns(*)) + 8 from <TABLE> ), -- aggregate all columns bits_agg as ( select sum(columns(*)) from bits ), -- unpivot a wide single row of aggs to single column bits_col as ( unpivot bits_agg on columns(*) ) -- add them all up & convert to mb select sum(value) / (8 * 1024 ** 2) as mb from bits_col
Copy code
Hamilton Ulmer
Copy code
Expand
Share link