City air quality insights based on WHO data
Editor's note: Mehdi's database share includes air quality data from the World Health Organization (WHO). Use his example queries to understand pollution in particular areas. You might even try combining with the spatial extension discussed in other snippets.
Annual city air quality rating based on WHO dataSQL
SELECT city, year, CASE WHEN AVG(pm25_concentration) <= 10 AND AVG(pm10_concentration) <= 20 AND AVG(no2_concentration) <= 40 THEN 'Good' WHEN AVG(pm25_concentration) > 10 AND AVG(pm10_concentration) > 20 AND AVG(no2_concentration) > 40 THEN 'Poor' ELSE 'Moderate' END AS airqualityrating FROM sample_data.who.ambient_air_quality GROUP BY city, year ORDER BY city, year;
Copy code
Yearly average pollutant concentrations of a city (Berlin)SQL
SELECT year, AVG(pm25_concentration) AS avg_pm25, AVG(pm10_concentration) AS avg_pm10, AVG(no2_concentration) AS avg_no2 FROM sample_data.who.ambient_air_quality WHERE city = 'Berlin' GROUP BY year ORDER BY year DESC;
Copy code
Expand
Share link
Top 25 Songs at Number 1, Billboard Hot 100SQL
Editor's note: do you like looking at data around the top music, including when an album debuted and when it appeared on the Billboard Hot 100 list? David has you covered with some simple analytical SQL, along with a MotherDuck share of the data.
Execute this SQL
-- Switch to the attached music database use music; -- Top 25 songs at number 1, Billboard Hot 100 SELECT song, performer, COUNT(*) AS weeks_at_no_1, chart_debut, MIN(chart_date) AS first_chart_date_as_no_1, MAX(chart_date) AS last_chart_date_as_no_1 FROM billboard_hot_100 WHERE chart_position = 1 GROUP BY ALL ORDER BY weeks_at_no_1 DESC, chart_debut LIMIT 25;
Copy code
Expand
Share link
DuckDB in Action: Some neat DuckDB specific SQL extension
Editor's note: DuckDB strives to make it easy to write SQL, even when it requires introducing non-standard syntax. See the great blog posts by Alex Monahan or explore the Manning "DuckDB in Action" book by the author of this snippet.
DuckDB specific extensions: Project all columns matching a patternSQL
SELECT COLUMNS('valid.*') FROM prices LIMIT 3;
Copy code
DuckDB specific extensions: Apply an aggregation to several columnsSQL
SELECT max(COLUMNS('valid.*')) FROM prices;
Copy code
DuckDB specific extensions: Apply one condition to many columnsSQL
FROM prices WHERE COLUMNS('valid.*') BETWEEN '2020-01-01' AND '2021-01-01';
Copy code
Expand
Share link
DuckDB in Action, Examples from chapters 3 and 4: Having fun with power production measurements
Editor's note: the Manning book "DuckDB in Action" has tons of great SQL exercises, including a few chapters which work on power production data. Michael, one of the authors, shares some of these queries here [and on other snippets published on this site].
3.17 Using arg_max to find sibling values of aggregated values computed in a common table expressionSQL
WITH per_hour AS ( SELECT system_id, date_trunc('hour', read_on) AS read_on, avg(power) / 1000 AS kWh FROM readings GROUP BY ALL ) SELECT name, max(kWh), arg_max(read_on, kWh) AS 'Read on' FROM per_hour JOIN systems s ON s.id = per_hour.system_id WHERE system_id = 34 GROUP by s.name;
Copy code
4.18 Using named windows defined with a partition and a range to compute several aggregates at onceSQL
SELECT system_id, day, min(kWh) OVER seven_days AS "7-day min", quantile(kWh, [0.25, 0.5, 0.75]) OVER seven_days AS "kWh 7-day quartile", max(kWh) OVER seven_days AS "7-day max", FROM v_power_per_day WINDOW seven_days AS ( PARTITION BY system_id, month(day) ORDER BY day ASC RANGE BETWEEN INTERVAL 3 Days PRECEDING AND INTERVAL 3 Days FOLLOWING ) ORDER BY system_id, day;
Copy code
4.23 Filtering outside the where clause to find a time period with a high production over 7 daysSQL
SELECT system_id, day, avg(kWh) OVER ( PARTITION BY system_id ORDER BY day ASC RANGE BETWEEN INTERVAL 3 Days PRECEDING AND INTERVAL 3 Days FOLLOWING ) AS "kWh 7-day moving average" FROM v_power_per_day QUALIFY "kWh 7-day moving average" > 875 ORDER BY system_id, day;
Copy code
4.26 Easily pivoting a result set (around a year)SQL
PIVOT (FROM v_power_per_day) ON year(day) USING sum(kWh);
Copy code
4.28 Using the ASOF join to pick prices that have been valid up to the point in time of sellingSQL
WITH prices AS ( SELECT range AS valid_at, random()*10 AS price FROM range( '2023-01-01 01:00:00'::timestamp, '2023-01-01 02:00:00'::timestamp, INTERVAL '15 minutes') ), sales AS ( SELECT range AS sold_at, random()*10 AS num FROM range( '2023-01-01 01:00:00'::timestamp, '2023-01-01 02:00:00'::timestamp, INTERVAL '5 minutes') ) SELECT sold_at, valid_at AS 'with_price_at', round(num * price,2) as price FROM sales ASOF JOIN prices ON prices.valid_at <= sales.sold_at;
Copy code
4.29 Using the ASOF join together with a window function to compute accumulated earningsSQL
SELECT power.day, power.kWh, prices.value as 'ct/kWh', round(sum(prices.value * power.kWh) OVER (ORDER BY power.day ASC) / 100, 2) AS 'Accumulated earnings in EUR' FROM v_power_per_day power ASOF JOIN prices ON prices.valid_from <= power.day WHERE system_id = 34 ORDER BY day;
Copy code
4.33 Using a LATERAL Join for looping through a JSON array returned by a solar radiation APISQL
-- The below query generates a series of 7 days, -- joins those with the hours 8, 13 and 19 (7pm) to create indexes. -- Those indexes are the day number * 24 plus the desired hour of the -- day to find the value in the JSON array. -- That index is the lateral driver for the sub-query: -- -- It then calls Open Meteo for Offenbach, apparently the city with -- the most sunshine in germany to get shortwave radition -- at those times WITH days AS ( SELECT generate_series AS value FROM generate_series(7) ), hours AS ( SELECT unnest([8, 13, 18]) AS value ), indexes AS ( SELECT days.value * 24 + hours.value AS i FROM days, hours ) SELECT date_trunc('day', now()) - INTERVAL '7 days' + INTERVAL (indexes.i || ' hours') AS ts, ghi.v AS 'GHI in W/m^2' FROM indexes, LATERAL ( SELECT hourly.shortwave_radiation_instant[i+1] AS v FROM read_json_auto('https://api.open-meteo.com/v1/forecast?latitude=48.47377&longitude=7.94495&hourly=shortwave_radiation_instant&past_days=7') ) AS ghi ORDER BY ts;
Copy code
Expand
Share link