SQL, Python & More for DuckDB - Page 7

‌

Compute a metric for each numeric column and return the values in a long table (requires 0.7.2+)SQL

Editor's note: if your data originates as different types or in a format like CSV, you might want to do so without risking throwing an error for oddly-typed values. You can do so with TRY_CAST(), which will attempt a CAST but return NULL if not possible.

Execute this SQL

with computed as (
    select sum(try_cast(columns(*) as double))
    from read_csv_auto('aapl.csv')
)

select
    -- restore original column names
    trim(list_element(regexp_extract_all(name,'\.(.*?) AS',1),1),'"') as name,
    value
from (pivot_longer computed on *)
Copy code

Ben Ayre

Edited 02/23/24

‌

Read an MS Excel File with the spatial extensionSQL

Editor's note: It might seem a bit odd, but the DuckDB spatial extension includes a function for reading Microsoft Excel XLSX files into DuckDB. This is because a lot of geospatial files are shared this way, but you can take advantage of this capability even if you have no spatial data!

Execute this SQL

install spatial;
load spatial;
from st_read('file.xlsx',layer='sheet_name');
Copy code

Edited 02/23/24

‌

DuckDB in Action, Examples from chapters 3 and 4: Having fun with power production measurements

Editor's note: the Manning book "DuckDB in Action" has tons of great SQL exercises, including a few chapters which work on power production data. Michael, one of the authors, shares some of these queries here [and on other snippets published on this site].

Attach and select MotherDuck database
Data shared/available on MotherDuck

ATTACH 'md:_share/duckdb_in_action_ch3_4/d0c08584-1d33-491c-8db7-cf9c6910eceb' AS duckdb_in_action_ch3_4;
USE duckdb_in_action_ch3_4;
Copy code

3.17 Using arg_max to find sibling values of aggregated values computed in a common table expressionSQL

WITH per_hour AS (
    SELECT system_id,
           date_trunc('hour', read_on) AS read_on,
           avg(power) / 1000 AS kWh
    FROM readings
    GROUP BY ALL
)
SELECT name,
       max(kWh),
       arg_max(read_on, kWh) AS 'Read on'
FROM per_hour
   JOIN systems s ON s.id = per_hour.system_id
WHERE system_id = 34
GROUP by s.name;
Copy code

4.18 Using named windows defined with a partition and a range to compute several aggregates at onceSQL

SELECT system_id,
       day,
       min(kWh) OVER seven_days AS "7-day min",
       quantile(kWh, [0.25, 0.5, 0.75])
         OVER seven_days AS "kWh 7-day quartile",
       max(kWh) OVER seven_days AS "7-day max",
FROM v_power_per_day
WINDOW
    seven_days AS (
        PARTITION BY system_id, month(day)
        ORDER BY day ASC
        RANGE BETWEEN INTERVAL 3 Days PRECEDING
                  AND INTERVAL 3 Days FOLLOWING
    )
ORDER BY system_id, day;
Copy code

4.23 Filtering outside the where clause to find a time period with a high production over 7 daysSQL

SELECT system_id,
       day,
       avg(kWh) OVER (
            PARTITION BY system_id
            ORDER BY day ASC
            RANGE BETWEEN INTERVAL 3 Days PRECEDING
                      AND INTERVAL 3 Days FOLLOWING
       ) AS "kWh 7-day moving average"
FROM v_power_per_day
QUALIFY "kWh 7-day moving average" > 875 
ORDER BY system_id, day;
Copy code

4.26 Easily pivoting a result set (around a year)SQL

PIVOT (FROM v_power_per_day)
ON year(day)
USING sum(kWh);
Copy code

4.28 Using the ASOF join to pick prices that have been valid up to the point in time of sellingSQL

WITH prices AS (
  SELECT range AS valid_at,
         random()*10 AS price
  FROM range(
       '2023-01-01 01:00:00'::timestamp,
       '2023-01-01 02:00:00'::timestamp, INTERVAL '15 minutes')
),
sales AS (
  SELECT range AS sold_at,
         random()*10 AS num
  FROM range(
      '2023-01-01 01:00:00'::timestamp,
      '2023-01-01 02:00:00'::timestamp, INTERVAL '5 minutes')
)
SELECT sold_at, valid_at AS 'with_price_at', round(num * price,2) as price
FROM sales
ASOF JOIN prices
    ON prices.valid_at <= sales.sold_at;
Copy code

4.29 Using the ASOF join together with a window function to compute accumulated earningsSQL

SELECT power.day,
       power.kWh,
       prices.value as 'ct/kWh',
       round(sum(prices.value * power.kWh)
           OVER (ORDER BY power.day ASC) / 100, 2)
           AS 'Accumulated earnings in EUR'
FROM v_power_per_day power
    ASOF JOIN prices
    ON prices.valid_from <= power.day
WHERE system_id = 34
ORDER BY day;
Copy code

4.33 Using a LATERAL Join for looping through a JSON array returned by a solar radiation APISQL

-- The below query generates a series of 7 days, 
-- joins those with the hours 8, 13 and 19 (7pm) to create indexes.
-- Those indexes are the day number * 24 plus the desired hour of the
-- day to find the value in the JSON array.
-- That index is the lateral driver for the sub-query:
--
-- It then calls Open Meteo for Offenbach, apparently the city with 
-- the most sunshine in germany to get shortwave radition 
-- at those times
WITH days AS (
  SELECT generate_series AS value FROM generate_series(7)
), hours AS (
  SELECT unnest([8, 13, 18]) AS value
), indexes AS (
  SELECT days.value * 24 + hours.value AS i
  FROM days, hours
)
SELECT date_trunc('day', now()) - INTERVAL '7 days' +
         INTERVAL (indexes.i || ' hours') AS ts,
       ghi.v AS 'GHI in W/m^2'
FROM indexes,
LATERAL (
  SELECT hourly.shortwave_radiation_instant[i+1] AS v
  FROM read_json_auto('https://api.open-meteo.com/v1/forecast?latitude=48.47377&longitude=7.94495&hourly=shortwave_radiation_instant&past_days=7')
) AS ghi
ORDER BY ts;
Copy code

Show less

Michael Simons

Edited 02/23/24

Compute a metric for each numeric column and return the values in a long table (requires 0.7.2+)SQL

Execute this SQL

Read an MS Excel File with the spatial extensionSQL

Execute this SQL

DuckDB in Action, Examples from chapters 3 and 4: Having fun with power production measurements

Attach and select MotherDuck databaseData shared/available on MotherDuck

3.17 Using arg_max to find sibling values of aggregated values computed in a common table expressionSQL

4.18 Using named windows defined with a partition and a range to compute several aggregates at onceSQL

4.23 Filtering outside the where clause to find a time period with a high production over 7 daysSQL

4.26 Easily pivoting a result set (around a year)SQL

4.28 Using the ASOF join to pick prices that have been valid up to the point in time of sellingSQL

4.29 Using the ASOF join together with a window function to compute accumulated earningsSQL

4.33 Using a LATERAL Join for looping through a JSON array returned by a solar radiation APISQL

Attach and select MotherDuck database
Data shared/available on MotherDuck