SQL, Python & More for DuckDB - Page 8

‌

DuckDB in Action, Examples from chapters 3 and 4: Having fun with power production measurements

Editor's note: the Manning book "DuckDB in Action" has tons of great SQL exercises, including a few chapters which work on power production data. Michael, one of the authors, shares some of these queries here [and on other snippets published on this site].

Attach and select MotherDuck database
Data shared/available on MotherDuck

ATTACH 'md:_share/duckdb_in_action_ch3_4/d0c08584-1d33-491c-8db7-cf9c6910eceb' AS duckdb_in_action_ch3_4;
USE duckdb_in_action_ch3_4;
Copy code

3.17 Using arg_max to find sibling values of aggregated values computed in a common table expressionSQL

WITH per_hour AS (
    SELECT system_id,
           date_trunc('hour', read_on) AS read_on,
           avg(power) / 1000 AS kWh
    FROM readings
    GROUP BY ALL
)
SELECT name,
       max(kWh),
       arg_max(read_on, kWh) AS 'Read on'
FROM per_hour
   JOIN systems s ON s.id = per_hour.system_id
WHERE system_id = 34
GROUP by s.name;
Copy code

4.18 Using named windows defined with a partition and a range to compute several aggregates at onceSQL

SELECT system_id,
       day,
       min(kWh) OVER seven_days AS "7-day min",
       quantile(kWh, [0.25, 0.5, 0.75])
         OVER seven_days AS "kWh 7-day quartile",
       max(kWh) OVER seven_days AS "7-day max",
FROM v_power_per_day
WINDOW
    seven_days AS (
        PARTITION BY system_id, month(day)
        ORDER BY day ASC
        RANGE BETWEEN INTERVAL 3 Days PRECEDING
                  AND INTERVAL 3 Days FOLLOWING
    )
ORDER BY system_id, day;
Copy code

4.23 Filtering outside the where clause to find a time period with a high production over 7 daysSQL

SELECT system_id,
       day,
       avg(kWh) OVER (
            PARTITION BY system_id
            ORDER BY day ASC
            RANGE BETWEEN INTERVAL 3 Days PRECEDING
                      AND INTERVAL 3 Days FOLLOWING
       ) AS "kWh 7-day moving average"
FROM v_power_per_day
QUALIFY "kWh 7-day moving average" > 875 
ORDER BY system_id, day;
Copy code

4.26 Easily pivoting a result set (around a year)SQL

PIVOT (FROM v_power_per_day)
ON year(day)
USING sum(kWh);
Copy code

4.28 Using the ASOF join to pick prices that have been valid up to the point in time of sellingSQL

WITH prices AS (
  SELECT range AS valid_at,
         random()*10 AS price
  FROM range(
       '2023-01-01 01:00:00'::timestamp,
       '2023-01-01 02:00:00'::timestamp, INTERVAL '15 minutes')
),
sales AS (
  SELECT range AS sold_at,
         random()*10 AS num
  FROM range(
      '2023-01-01 01:00:00'::timestamp,
      '2023-01-01 02:00:00'::timestamp, INTERVAL '5 minutes')
)
SELECT sold_at, valid_at AS 'with_price_at', round(num * price,2) as price
FROM sales
ASOF JOIN prices
    ON prices.valid_at <= sales.sold_at;
Copy code

4.29 Using the ASOF join together with a window function to compute accumulated earningsSQL

SELECT power.day,
       power.kWh,
       prices.value as 'ct/kWh',
       round(sum(prices.value * power.kWh)
           OVER (ORDER BY power.day ASC) / 100, 2)
           AS 'Accumulated earnings in EUR'
FROM v_power_per_day power
    ASOF JOIN prices
    ON prices.valid_from <= power.day
WHERE system_id = 34
ORDER BY day;
Copy code

4.33 Using a LATERAL Join for looping through a JSON array returned by a solar radiation APISQL

-- The below query generates a series of 7 days, 
-- joins those with the hours 8, 13 and 19 (7pm) to create indexes.
-- Those indexes are the day number * 24 plus the desired hour of the
-- day to find the value in the JSON array.
-- That index is the lateral driver for the sub-query:
--
-- It then calls Open Meteo for Offenbach, apparently the city with 
-- the most sunshine in germany to get shortwave radition 
-- at those times
WITH days AS (
  SELECT generate_series AS value FROM generate_series(7)
), hours AS (
  SELECT unnest([8, 13, 18]) AS value
), indexes AS (
  SELECT days.value * 24 + hours.value AS i
  FROM days, hours
)
SELECT date_trunc('day', now()) - INTERVAL '7 days' +
         INTERVAL (indexes.i || ' hours') AS ts,
       ghi.v AS 'GHI in W/m^2'
FROM indexes,
LATERAL (
  SELECT hourly.shortwave_radiation_instant[i+1] AS v
  FROM read_json_auto('https://api.open-meteo.com/v1/forecast?latitude=48.47377&longitude=7.94495&hourly=shortwave_radiation_instant&past_days=7')
) AS ghi
ORDER BY ts;
Copy code

Show less

Michael Simons

Edited 02/23/24

‌

Top 25 Songs at Number 1, Billboard Hot 100SQL

Editor's note: do you like looking at data around the top music, including when an album debuted and when it appeared on the Billboard Hot 100 list? David has you covered with some simple analytical SQL, along with a MotherDuck share of the data.

Attach and select MotherDuck database
Data shared/available on MotherDuck

ATTACH 'md:_share/music/af2a4a73-d4ef-4c82-98ef-e49f60adccd2' AS music;
USE music;
Copy code

Execute this SQL

-- Switch to the attached music database
use music;

-- Top 25 songs at number 1, Billboard Hot 100
SELECT song, performer, 
  COUNT(*) AS weeks_at_no_1,
  chart_debut,
  MIN(chart_date) AS first_chart_date_as_no_1, 
  MAX(chart_date) AS last_chart_date_as_no_1
FROM billboard_hot_100
WHERE chart_position = 1
GROUP BY ALL
ORDER BY weeks_at_no_1 DESC, chart_debut
LIMIT 25;
Copy code

David Neal

Edited 02/23/24

‌

UK Bank HolidaysSQL

The UK government publish the UK bank holidays as a JSON object at a public endpoint. DuckDB can read the data from the endpoint, unpivot it, and expand it in a few lines!

Execute this SQL

/* division, title, date, notes, bunting */
select
    division,
    unnest(events.events, recursive:=true)
from (
    unpivot 'https://www.gov.uk/bank-holidays.json'
    on "england-and-wales", "scotland", "northern-ireland"
    into name division value events
)
Copy code

Bill Wallis

Created 04/08/24

‌

Pretty-printing floating-pont numbersSQL

By casting the result to a `DECIMAL` with a fixed number of digits after the decimal point, we can pretty-print it while keeping it as a numeric value.

Execute this SQL

COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x))
TO 'example.csv';

SELECT x::DECIMAL(15, 3) AS x
FROM 'example.csv';
Copy code

Gábor Szárnyas

Created 08/13/24

‌

Copying the schema of a tableSQL

We can use `LIMIT` to copy the schema of an existing table without repeating the specification of its column names/types.

Execute this SQL

COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x))
TO 'example.csv';

CREATE TABLE tbl AS
    FROM example
    LIMIT 0;
Copy code

Gábor Szárnyas

Created 08/13/24

‌

Deterministically shuffling dataSQL

Sometimes it's beneficial to shuffle data sets in a deterministic, repeatable way. We can achieve this by using the `hash` function on the `rowid` pseudocolumn.

Execute this SQL

COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x))
TO 'example.csv';

CREATE OR REPLACE TABLE example AS FROM 'example.csv';
FROM example ORDER BY hash(rowid + 42);
Copy code

Gábor Szárnyas

Created 08/13/24

‌

Updating CSV files in-placeSQL

DuckDB allows updating CSV files in-place.

Execute this SQL

COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x))
TO 'example.csv';

COPY (SELECT s FROM 'example.csv') TO 'example.csv';
Copy code

Gábor Szárnyas

Created 08/13/24

‌

Query from Google SheetsSQL

Sometimes you have data over in Google Sheets and you want quickly and easily load it into DuckDB. With read_csv and the Google Sheets 'export to csv', its just one line of code.

Execute this SQL

-- you will need to create a sharelink for your sheet.
-- note the tab id (gid) and sheet id (in the url)

from read_csv('https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv&gid={tab_id}')
Copy code

Jacob Matson

Created 08/28/24

DuckDB in Action, Examples from chapters 3 and 4: Having fun with power production measurements

Attach and select MotherDuck databaseData shared/available on MotherDuck

3.17 Using arg_max to find sibling values of aggregated values computed in a common table expressionSQL

4.18 Using named windows defined with a partition and a range to compute several aggregates at onceSQL

4.23 Filtering outside the where clause to find a time period with a high production over 7 daysSQL

4.26 Easily pivoting a result set (around a year)SQL

4.28 Using the ASOF join to pick prices that have been valid up to the point in time of sellingSQL

4.29 Using the ASOF join together with a window function to compute accumulated earningsSQL

4.33 Using a LATERAL Join for looping through a JSON array returned by a solar radiation APISQL

Top 25 Songs at Number 1, Billboard Hot 100SQL

Attach and select MotherDuck databaseData shared/available on MotherDuck

Execute this SQL

UK Bank HolidaysSQL

Execute this SQL

Pretty-printing floating-pont numbersSQL

Execute this SQL

Copying the schema of a tableSQL

Execute this SQL

Deterministically shuffling dataSQL

Execute this SQL

Updating CSV files in-placeSQL

Execute this SQL

Query from Google SheetsSQL

Execute this SQL

Attach and select MotherDuck database
Data shared/available on MotherDuck

Attach and select MotherDuck database
Data shared/available on MotherDuck