Michael Simons | DuckDB Snippets

Back to Code Snippets

Michael Simons

@michael-simons

‌

Split strings into version numbers and order properSQL

This snippet takes version numbers that might contain arbitrary additional information and splits them into a list of integers, that one can sort like `sort -V` does.

Execute this SQL

SELECT v FROM VALUES ('1.10.0'), ('1.3.0'), ('1.13.0.RELEASE') f(v) ORDER BY list_transform(string_split(v, '.'), x -> TRY_CAST (x AS INTEGER)) ASC;
Copy code

Michael Simons

Created 10/30/24

‌

Aggregate rows into a sorted list.SQL

Execute this SQL

-- list and array_agg take in their own ORDER BY clause, so that you
-- can sort the aggregate. The statements order by cannot be used 
-- as the columns that are used for sorting then need to be a grouping
-- key and cannot be used in the aggregate

SELECT name, 
       -- Order the aggregated list by another column from line_items  
       list(item_name ORDER BY pos ASC) items 
FROM orders 
JOIN line_items ON order_id = id 
GROUP BY ALL
-- Order by grouping keys is ofc possible
ORDER BY name;
Copy code

Michael Simons

Created 12/18/23

‌

DuckDB in Action: Some neat DuckDB specific SQL extension

Editor's note: DuckDB strives to make it easy to write SQL, even when it requires introducing non-standard syntax. See the great blog posts by Alex Monahan or explore the Manning "DuckDB in Action" book by the author of this snippet.

Attach and select MotherDuck database
Data shared/available on MotherDuck

ATTACH 'md:_share/duckdb_in_action_ch3_4/d0c08584-1d33-491c-8db7-cf9c6910eceb' AS duckdb_in_action_ch3_4;
USE duckdb_in_action_ch3_4;
Copy code

DuckDB specific extensions: Project all columns matching a patternSQL

SELECT COLUMNS('valid.*') FROM prices LIMIT 3;
Copy code

DuckDB specific extensions: Apply an aggregation to several columnsSQL

SELECT max(COLUMNS('valid.*')) FROM prices;
Copy code

DuckDB specific extensions: Apply one condition to many columnsSQL

FROM prices
WHERE COLUMNS('valid.*') BETWEEN '2020-01-01' AND '2021-01-01';
Copy code

Michael Simons

Edited 02/23/24

‌

DuckDB in Action, Examples from chapters 3 and 4: Having fun with power production measurements

Editor's note: the Manning book "DuckDB in Action" has tons of great SQL exercises, including a few chapters which work on power production data. Michael, one of the authors, shares some of these queries here [and on other snippets published on this site].

Attach and select MotherDuck database
Data shared/available on MotherDuck

ATTACH 'md:_share/duckdb_in_action_ch3_4/d0c08584-1d33-491c-8db7-cf9c6910eceb' AS duckdb_in_action_ch3_4;
USE duckdb_in_action_ch3_4;
Copy code

3.17 Using arg_max to find sibling values of aggregated values computed in a common table expressionSQL

WITH per_hour AS (
    SELECT system_id,
           date_trunc('hour', read_on) AS read_on,
           avg(power) / 1000 AS kWh
    FROM readings
    GROUP BY ALL
)
SELECT name,
       max(kWh),
       arg_max(read_on, kWh) AS 'Read on'
FROM per_hour
   JOIN systems s ON s.id = per_hour.system_id
WHERE system_id = 34
GROUP by s.name;
Copy code

4.18 Using named windows defined with a partition and a range to compute several aggregates at onceSQL

SELECT system_id,
       day,
       min(kWh) OVER seven_days AS "7-day min",
       quantile(kWh, [0.25, 0.5, 0.75])
         OVER seven_days AS "kWh 7-day quartile",
       max(kWh) OVER seven_days AS "7-day max",
FROM v_power_per_day
WINDOW
    seven_days AS (
        PARTITION BY system_id, month(day)
        ORDER BY day ASC
        RANGE BETWEEN INTERVAL 3 Days PRECEDING
                  AND INTERVAL 3 Days FOLLOWING
    )
ORDER BY system_id, day;
Copy code

4.23 Filtering outside the where clause to find a time period with a high production over 7 daysSQL

SELECT system_id,
       day,
       avg(kWh) OVER (
            PARTITION BY system_id
            ORDER BY day ASC
            RANGE BETWEEN INTERVAL 3 Days PRECEDING
                      AND INTERVAL 3 Days FOLLOWING
       ) AS "kWh 7-day moving average"
FROM v_power_per_day
QUALIFY "kWh 7-day moving average" > 875 
ORDER BY system_id, day;
Copy code

4.26 Easily pivoting a result set (around a year)SQL

PIVOT (FROM v_power_per_day)
ON year(day)
USING sum(kWh);
Copy code

4.28 Using the ASOF join to pick prices that have been valid up to the point in time of sellingSQL

WITH prices AS (
  SELECT range AS valid_at,
         random()*10 AS price
  FROM range(
       '2023-01-01 01:00:00'::timestamp,
       '2023-01-01 02:00:00'::timestamp, INTERVAL '15 minutes')
),
sales AS (
  SELECT range AS sold_at,
         random()*10 AS num
  FROM range(
      '2023-01-01 01:00:00'::timestamp,
      '2023-01-01 02:00:00'::timestamp, INTERVAL '5 minutes')
)
SELECT sold_at, valid_at AS 'with_price_at', round(num * price,2) as price
FROM sales
ASOF JOIN prices
    ON prices.valid_at <= sales.sold_at;
Copy code

4.29 Using the ASOF join together with a window function to compute accumulated earningsSQL

SELECT power.day,
       power.kWh,
       prices.value as 'ct/kWh',
       round(sum(prices.value * power.kWh)
           OVER (ORDER BY power.day ASC) / 100, 2)
           AS 'Accumulated earnings in EUR'
FROM v_power_per_day power
    ASOF JOIN prices
    ON prices.valid_from <= power.day
WHERE system_id = 34
ORDER BY day;
Copy code

4.33 Using a LATERAL Join for looping through a JSON array returned by a solar radiation APISQL

-- The below query generates a series of 7 days, 
-- joins those with the hours 8, 13 and 19 (7pm) to create indexes.
-- Those indexes are the day number * 24 plus the desired hour of the
-- day to find the value in the JSON array.
-- That index is the lateral driver for the sub-query:
--
-- It then calls Open Meteo for Offenbach, apparently the city with 
-- the most sunshine in germany to get shortwave radition 
-- at those times
WITH days AS (
  SELECT generate_series AS value FROM generate_series(7)
), hours AS (
  SELECT unnest([8, 13, 18]) AS value
), indexes AS (
  SELECT days.value * 24 + hours.value AS i
  FROM days, hours
)
SELECT date_trunc('day', now()) - INTERVAL '7 days' +
         INTERVAL (indexes.i || ' hours') AS ts,
       ghi.v AS 'GHI in W/m^2'
FROM indexes,
LATERAL (
  SELECT hourly.shortwave_radiation_instant[i+1] AS v
  FROM read_json_auto('https://api.open-meteo.com/v1/forecast?latitude=48.47377&longitude=7.94495&hourly=shortwave_radiation_instant&past_days=7')
) AS ghi
ORDER BY ts;
Copy code

Show less

Michael Simons

Edited 02/23/24

‌

Split a String and cast all elements to a desired datatypeSQL

Execute this SQL

-- Here we make use of list_transform and 
-- the fact that DuckDB knows lambda expressions
SELECT list_transform(split('01:08:22', ':'), x -> CAST (x AS INTEGER));
Copy code

Michael Simons

Created 08/22/23

‌

Replacing one a column in a select * with an expression under a different nameSQL

Execute this SQL

-- * EXCLUDE plus any additional expression does the trick
SELECT * EXCLUDE (valid_from),  -- valid_from used in computation
       year(valid_from) AS year -- A new expression
FROM prices
 -- Note that the new expression can be referred to in the WHERE clause
WHERE year BETWEEN 2019 AND 2020;
Copy code

Michael Simons

Created 08/22/23

‌

Compute the length of a GPX track in metersSQL

Execute this SQL

LOAD spatial;

-- Pick for example any Garmin Connect or Strava export as GPX file
-- https://en.wikipedia.org/wiki/GPS_Exchange_Format

SELECT round(
         -- Compute the length of the geomentry in the units of the
         -- reference system being used
         ST_Length(
           -- Transform from standard GPS WGS-84 into a reference system 
           -- that uses meters. EPSG:25832 stands for ETRS89 / UTM zone 32N,
           -- Europe between 6°E and 12°E
           ST_transform(
             -- Fix the order of coordinates 
             -- (GXP read as long/lat, we need lat/long)
             ST_FlipCoordinates(
               -- Turn the WKT into a DuckDB Geometry
               ST_GeomFromWKB(wkb_geometry)
             ),
            'EPSG:4326',
            'EPSG:25832'
           )
       ) /1000, 2) AS 'Distance (km)'
FROM st_read('activity_11501782487.gpx', layer='tracks');
Copy code

Michael Simons

Created 07/08/23

‌

Compute the altitude gain and loss of a GPX trackSQL

Execute this SQL

-- This requires the spatial extension
LOAD spatial;

-- Pick for example any Garmin Connect or Strava export as GPX file
-- https://en.wikipedia.org/wiki/GPS_Exchange_Format

WITH elevation_gains AS (
  SELECT ele - lag(ele) OVER (ORDER BY track_seg_point_id ASC) AS value
  FROM st_read('activity_11501782487.gpx', layer='track_points')
)
SELECT 
  sum(e.value) FILTER (WHERE e.value >= 0) AS ascent,
  sum(e.value) FILTER (WHERE e.value < 0) AS descent
FROM elevation_gains e;
Copy code

Michael Simons

Created 07/08/23

Split strings into version numbers and order properSQL

Execute this SQL

Aggregate rows into a sorted list.SQL

Execute this SQL

DuckDB in Action: Some neat DuckDB specific SQL extension

Attach and select MotherDuck databaseData shared/available on MotherDuck

DuckDB specific extensions: Project all columns matching a patternSQL

DuckDB specific extensions: Apply an aggregation to several columnsSQL

DuckDB specific extensions: Apply one condition to many columnsSQL

DuckDB in Action, Examples from chapters 3 and 4: Having fun with power production measurements

Attach and select MotherDuck databaseData shared/available on MotherDuck

3.17 Using arg_max to find sibling values of aggregated values computed in a common table expressionSQL

4.18 Using named windows defined with a partition and a range to compute several aggregates at onceSQL

4.23 Filtering outside the where clause to find a time period with a high production over 7 daysSQL

4.26 Easily pivoting a result set (around a year)SQL

4.28 Using the ASOF join to pick prices that have been valid up to the point in time of sellingSQL

4.29 Using the ASOF join together with a window function to compute accumulated earningsSQL

4.33 Using a LATERAL Join for looping through a JSON array returned by a solar radiation APISQL

Split a String and cast all elements to a desired datatypeSQL

Execute this SQL

Replacing one a column in a select * with an expression under a different nameSQL

Execute this SQL

Compute the length of a GPX track in metersSQL

Execute this SQL

Compute the altitude gain and loss of a GPX trackSQL

Execute this SQL

Attach and select MotherDuck database
Data shared/available on MotherDuck

Attach and select MotherDuck database
Data shared/available on MotherDuck