Show supported DuckDB extensionsSQL
Editor's note: DuckDB has many supported extensions for everything from data formats (JSON, parquet, excel, iceberg) to specific types (IP addresses, time zones) to indexing (full-text-search) and more. This table function will tell you which extensions are supported in your local DuckDB install.
Execute this SQL
-- show supported duckdb extensions FROM duckdb_extensions();
Copy code
Expand
Share link
ATTACH 'other.db';SQL
Editor's note: DuckDB allows you to attach multiple databases at once. For example, you can attach a local file, an in-memory database and a database from MotherDuck and work with all of them simultaneously. The ATTACH statement is executed for each database to be attached.
Execute this SQL
-- attach another database, alias inferred from the name ("other") ATTACH 'other.db'; SELECT * FROM other.some_table;
Copy code
Expand
Share link
No more error on on end of line commasSQL
Editor's note: this query demonstrates default behavior for DuckDB, but boy does it make it easier to comment your SQL lines out without fail.
Execute this SQL
# 🤓 Courtesy of Michael Simons (aka. @rotnroll666) # 🐦 https://twitter.com/rotnroll666/status/1671066790368010241 SELECT foo, bar, # hello, world, # dummy, FROM bazbar;
Copy code
Expand
Share link
build-your-own SUMMARIZE with columns(*), unpivot, and struct.*SQL
Editor's note: Although SUMMARIZE() is fully supported in DuckDB, Hamilton shows you how to build this capability yourself using more primitive functions.
Execute this SQL
-- C = number of columns in TBL, S = number of aggregates -- ------------------------------------------------------ -- (1) use columns(*) to produce S aggregates -- for each column as a single struct, -- producing 1 struct row and C columns with aggregate as ( from sf311 select { name: first(alias(columns(*))), type: first(typeof(columns(*))), max: max(columns(*))::varchar, min: min(columns(*))::varchar, approx_unique: approx_count_distinct(columns(*)), nulls: count(*) - count(columns(*)) } ), -- (2) unpivot to get a result with C rows and 1 column of structs columns as (unpivot aggregate on columns(*)) -- (3) explode the struct to get a result with C rows and S columns select value.* from columns;
Copy code
Expand
Share link
City air quality insights based on WHO data
Editor's note: Mehdi's database share includes air quality data from the World Health Organization (WHO). Use his example queries to understand pollution in particular areas. You might even try combining with the spatial extension discussed in other snippets.
Annual city air quality rating based on WHO dataSQL
SELECT city, year, CASE WHEN AVG(pm25_concentration) <= 10 AND AVG(pm10_concentration) <= 20 AND AVG(no2_concentration) <= 40 THEN 'Good' WHEN AVG(pm25_concentration) > 10 AND AVG(pm10_concentration) > 20 AND AVG(no2_concentration) > 40 THEN 'Poor' ELSE 'Moderate' END AS airqualityrating FROM sample_data.who.ambient_air_quality GROUP BY city, year ORDER BY city, year;
Copy code
Yearly average pollutant concentrations of a city (Berlin)SQL
SELECT year, AVG(pm25_concentration) AS avg_pm25, AVG(pm10_concentration) AS avg_pm10, AVG(no2_concentration) AS avg_no2 FROM sample_data.who.ambient_air_quality WHERE city = 'Berlin' GROUP BY year ORDER BY year DESC;
Copy code
Expand
Share link
Loading remote parquet files
Editor's note: DuckDB supports lambda functions for easy transformation. It also supports reading data over http/https [in addition from blob stores]. Simon's snippet shows you how to use both of those capabilities to dynamically generate filenames and query them when globbing is unsupported.
Load a single parquet fileSQL
-- Query a single remote parquet file -- Query the January data from NY taxi data set SELECT tpep_pickup_datetime, total_amount FROM read_parquet('https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet') LIMIT 10;
Copy code
Dynamically iterate over remote parquet files - lambda function and list_transformSQL
-- Dynamically create the expected filenames to load -- eg, the files names `2023-01.parquet` to `2023-06.parquet` -- a lambda function can use the result of the generate_series(1, 6) -- and the format command can the `{:02d}` format specifier -- to left-pad filenames with 0's SELECT filename, min(tpep_pickup_datetime), max(tpep_pickup_datetime), avg(total_amount) FROM read_parquet( list_transform( generate_series(1, 6), n -> format('https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-{:02d}.parquet', n) ), filename=true ) GROUP BY filename ORDER BY filename;
Copy code
Expand
Share link