Show supported DuckDB extensions

Editor's note: DuckDB has many supported extensions covering everything from data formats (JSON, Parquet, Excel, Iceberg) to specific types (IP addresses, time zones) to indexing (full-text search) and more. This table function tells you which extensions are supported in your local DuckDB install.

Execute this SQL

-- show supported duckdb extensions
FROM duckdb_extensions();
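
If you only want to see what is actually usable right now, you can filter the same table function's output. A small sketch; `extension_name`, `installed`, and `loaded` are standard columns of duckdb_extensions() in recent DuckDB versions:

-- list only the extensions that are installed and loaded
SELECT extension_name, installed, loaded
FROM duckdb_extensions()
WHERE installed AND loaded;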


Mehdi Ouazza



ATTACH 'other.db';

Editor's note: DuckDB allows you to attach multiple databases at once. For example, you can attach a local file, an in-memory database, and a database from MotherDuck, and work with all of them simultaneously. The ATTACH statement is executed once for each database to be attached.

Execute this SQL

-- attach another database, alias inferred from the name ("other")
ATTACH 'other.db';
SELECT * FROM other.some_table;
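
Since the note mentions mixing local, in-memory, and MotherDuck databases, here is a hedged sketch of attaching all three kinds at once. `my_db` is a placeholder for a database in your MotherDuck account, and the `md:` prefix requires an authenticated MotherDuck connection:

-- attach an in-memory scratch database alongside the local file above
ATTACH ':memory:' AS mem;
-- attach a (hypothetical) MotherDuck database named my_db
ATTACH 'md:my_db';
-- list everything currently attached
SHOW DATABASES;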


Carlo Piovesan



No more errors on end-of-line commas

Editor's note: this query demonstrates DuckDB's default behavior (trailing commas are allowed), but boy does it make it easier to comment out SQL lines without fail.

Execute this SQL

-- 🤓 Courtesy of Michael Simons (aka @rotnroll666)
-- 🐦 https://twitter.com/rotnroll666/status/1671066790368010241

SELECT foo,
    bar,
    -- hello,
    world,
    -- dummy,
FROM bazbar;
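
Because `bazbar` is a placeholder, here is a self-contained variant of the same trailing-comma behavior that runs anywhere, using only an inline VALUES clause:

-- both the commented-out column and the trailing comma after `baz` are fine
SELECT foo,
    -- bar,
    baz,
FROM (VALUES (1, 2, 3)) t(foo, bar, baz);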


Michael Simons



Build-your-own SUMMARIZE with columns(*), unpivot, and struct.*

Editor's note: Although SUMMARIZE is fully supported in DuckDB, Hamilton shows you how to build this capability yourself using more primitive functions.

Execute this SQL

-- C = number of columns in TBL, S = number of aggregates
-- ------------------------------------------------------
-- (1) use columns(*) to produce S aggregates
-- for each column as a single struct,
-- producing 1 struct row and C columns
with aggregate as (
  from sf311 select 
    {
      name: first(alias(columns(*))),
      type: first(typeof(columns(*))),
      max: max(columns(*))::varchar,
      min: min(columns(*))::varchar,
      approx_unique: approx_count_distinct(columns(*)),
      nulls: count(*) - count(columns(*))
    }
),
-- (2) unpivot to get a result with C rows and 1 column of structs
columns as (unpivot aggregate on columns(*))
-- (3) explode the struct to get a result with C rows and S columns
select value.* from columns;
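
For comparison, the built-in statement produces a comparable set of per-column statistics in one line (same `sf311` table as above):

-- the built-in equivalent of the query above
SUMMARIZE sf311;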


Hamilton Ulmer



City air quality insights based on WHO data

Editor's note: Mehdi's database share includes air quality data from the World Health Organization (WHO). Use his example queries to understand pollution in particular areas. You might even try combining it with the spatial extension discussed in other snippets (see the sketch after the Berlin query below).

Attach and select MotherDuck database

Data shared/available on MotherDuck

ATTACH 'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6' AS sample_data;
USE sample_data;
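
Once attached, a quick way to see what the share contains before running the queries below (SHOW ALL TABLES is standard DuckDB and lists tables across all attached databases):

-- inspect the attached share
SHOW ALL TABLES;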


Annual city air quality rating based on WHO data

-- rating thresholds match the WHO 2005 annual guideline values (concentrations in µg/m³)
SELECT
    city,
    year,
    CASE
        WHEN
            AVG(pm25_concentration) <= 10
            AND AVG(pm10_concentration) <= 20
            AND AVG(no2_concentration) <= 40
            THEN 'Good'
        WHEN
            AVG(pm25_concentration) > 10
            AND AVG(pm10_concentration) > 20
            AND AVG(no2_concentration) > 40
            THEN 'Poor'
        ELSE 'Moderate'
    END AS airqualityrating
FROM
    sample_data.who.ambient_air_quality
GROUP BY
    city,
    year
ORDER BY
    city,
    year;


Yearly average pollutant concentrations of a city (Berlin)

SELECT
    year,
    AVG(pm25_concentration) AS avg_pm25,
    AVG(pm10_concentration) AS avg_pm10,
    AVG(no2_concentration) AS avg_no2
FROM sample_data.who.ambient_air_quality 
WHERE city = 'Berlin'
GROUP BY year
ORDER BY year DESC;
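
Picking up the editor's suggestion about the spatial extension: a minimal sketch, assuming the ambient_air_quality table exposes latitude and longitude columns (verify with DESCRIBE first; the column names here are assumptions):

-- turn measurement locations into geometries for spatial analysis
INSTALL spatial;
LOAD spatial;
SELECT city, ST_Point(longitude, latitude) AS geom  -- assumed column names
FROM sample_data.who.ambient_air_quality
LIMIT 10;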


Mehdi Ouazza



Loading remote parquet files

Editor's note: DuckDB supports lambda functions for easy transformation. It also supports reading data over HTTP/HTTPS, in addition to blob stores. Simon's snippet shows you how to use both capabilities to dynamically generate filenames and query them when globbing is unsupported.

Load a single parquet file

-- Query a single remote parquet file 
-- Query the January data from NY taxi data set
SELECT tpep_pickup_datetime, total_amount
FROM read_parquet('https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet')
LIMIT 10;
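
Reading over HTTPS goes through the httpfs extension. Recent DuckDB versions autoload it on first use, but if yours does not, install and load it explicitly first:

-- one-time setup for remote reads
INSTALL httpfs;
LOAD httpfs;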


Dynamically iterate over remote parquet files - lambda function and list_transform

-- Dynamically create the expected filenames to load,
-- e.g. the file names `2023-01.parquet` to `2023-06.parquet`.
-- A lambda function can use the result of generate_series(1, 6),
-- and format() can apply the `{:02d}` format specifier
-- to left-pad the filenames with zeros.

SELECT filename,
    min(tpep_pickup_datetime),
    max(tpep_pickup_datetime),
    avg(total_amount)
FROM read_parquet(
    list_transform(
        generate_series(1, 6),
        n -> format('https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-{:02d}.parquet', n)
    ), filename=true
)
GROUP BY filename
ORDER BY filename;
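
The same multi-file read works without the lambda if you spell the paths out as an explicit list literal; read_parquet accepts a list of paths. Equivalent, just more typing:

-- explicit list of two of the six files, for illustration
SELECT count(*) AS rows_loaded
FROM read_parquet([
    'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet',
    'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-02.parquet'
]);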


Simon Aubury
