Query an Authenticated API Endpoint
Hit an endpoint that needs an API key; in this case, Stripe. The demo uses a public test Stripe API key from the Stripe docs.
Execute this SQL
```sql
CREATE SECRET http (
    TYPE HTTP,
    EXTRA_HTTP_HEADERS MAP {
        'Authorization': 'Bearer sk_test_VePHdqKTYQjKNInc7u56JBrQ'
    }
);

SELECT unnest(data) AS customers
FROM read_json('https://api.stripe.com/v1/customers');
```
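The same pattern works for any bearer-token API. A sketch, not tested: the GitHub endpoint and the token placeholder below are assumptions, not part of the original snippet.

```sql
CREATE SECRET gh_http (
    TYPE HTTP,
    EXTRA_HTTP_HEADERS MAP {
        'Authorization': 'Bearer <your-github-token>'
    }
);

-- /user/repos returns a top-level JSON array, so each element becomes a row
SELECT name, private
FROM read_json('https://api.github.com/user/repos');
```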
Unnest JSON Array into Rows (pseudo-json_each)
Editor's note: as APIs and applications increasingly represent tabular data in JSON format, we sometimes want the comfort of seeing it as rows, queryable in SQL. DuckDB has a great extension for parsing JSON, including the ability to convert into a DuckDB list which can then be unnested into rows.
Execute this SQL
```sql
select
    -- Split DuckDB list into multiple rows
    unnest(
        -- Convert JSON array into DuckDB list of JSON elements
        from_json(
            json('[{"col1":1, "col2":2}, {"col1":10, "col2":20}]'),
            '["JSON"]'
        )
    );
/* Returns:
   {"col1":1,"col2":2}
   {"col1":10,"col2":20} */
```
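If you want typed columns rather than raw JSON elements, a sketch using the same inline data: give `from_json` a struct schema instead of `'["JSON"]'`, and let `unnest(..., recursive := true)` explode each struct into columns.

```sql
select unnest(
    -- A struct schema yields a DuckDB list of typed structs
    from_json(
        json('[{"col1":1, "col2":2}, {"col1":10, "col2":20}]'),
        '[{"col1":"INTEGER", "col2":"INTEGER"}]'
    ),
    -- recursive := true also explodes the struct fields into columns
    recursive := true
);
```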
K-means on One-Dimensional Data with a Recursive CTE
Compute k-means thresholds from a table with two columns: id (unique) and a numeric value. Outputs a list. Easy to extend to two-dimensional data (x, y).
Execute this SQL
```sql
CREATE FUNCTION kmeans(tname, idcol_name, numcol_name, bins := 5, maxiter := 100) AS (
    WITH RECURSIVE clusters(iter, cid, x) AS (
        WITH t1 AS (
            FROM query_table(tname)
            SELECT idcol_name AS id, numcol_name AS x
        )
        (SELECT 0, id, x FROM t1 LIMIT bins - 1)
        UNION ALL
        SELECT iter + 1, cid, avg(px)
        FROM (
            SELECT iter, cid, p.x AS px,
                   rank() OVER (PARTITION BY p.id
                                ORDER BY (p.x - c.x)^2, c.x^2) r
            FROM t1 p, clusters c
        ) x
        WHERE x.r = 1 AND iter < maxiter
        GROUP BY ALL
    )
    SELECT list(x) FROM (FROM clusters WHERE iter = maxiter ORDER BY x)
);
```
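A usage sketch, assuming the function above has been created; the demo table and its column names are hypothetical. The table name is passed as a string (for `query_table`), while the columns are passed as bare identifiers.

```sql
-- Hypothetical demo table: 500 ids with values loosely grouped around multiples of 10
CREATE TABLE measurements AS
    SELECT i AS id, (i % 5) * 10 + random() AS val
    FROM generate_series(1, 500) t(i);

-- Returns a sorted list of cluster centers
SELECT kmeans('measurements', id, val, bins := 3, maxiter := 50);
```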
Build-Your-Own SUMMARIZE with columns(*), unpivot, and struct.*
Editor's note: Although SUMMARIZE() is fully supported in DuckDB, Hamilton shows you how to build this capability yourself using more primitive functions.
Execute this SQL
```sql
-- C = number of columns in TBL, S = number of aggregates
-- ------------------------------------------------------
-- (1) use columns(*) to produce S aggregates
--     for each column as a single struct,
--     producing 1 struct row and C columns
with aggregate as (
    from sf311
    select {
        name: first(alias(columns(*))),
        type: first(typeof(columns(*))),
        max: max(columns(*))::varchar,
        min: min(columns(*))::varchar,
        approx_unique: approx_count_distinct(columns(*)),
        nulls: count(*) - count(columns(*))
    }
),
-- (2) unpivot to get a result with C rows and 1 column of structs
columns as (unpivot aggregate on columns(*))
-- (3) explode the struct to get a result with C rows and S columns
select value.* from columns;
```
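For comparison, the built-in version mentioned in the editor's note is a one-liner over the same table:

```sql
SUMMARIZE sf311;
```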
Parquet to Excel
Editor's note: although not very intuitive, support for the MS Excel XLSX format is provided by the spatial extension. While other snippets show you how to read Excel files, this one shows you how to write data from a named Parquet file (or any DuckDB result set) to Excel.
Execute this SQL
```sql
INSTALL spatial;
LOAD spatial;

COPY (SELECT * FROM 'source.parquet')
TO 'target.XLSX'
WITH (FORMAT GDAL, DRIVER 'XLSX');
```
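Reading the file back also goes through the spatial extension's GDAL integration; a sketch, assuming the file written above exists:

```sql
INSTALL spatial;
LOAD spatial;

-- st_read uses GDAL drivers, which include XLSX
SELECT * FROM st_read('target.XLSX');
```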
CSV to Parquet
Execute this SQL
```sql
COPY (
    SELECT * FROM read_csv_auto(
        'http://raw.githubusercontent.com/fivethirtyeight/data/master/bechdel/movies.csv'
    )
) TO 'movies.parquet' (FORMAT 'parquet');
```
Dynamic SQL in DuckDB CLI
Editor's note: this gets a little recursive for all those SQL nerds out there. It shows you how to use SQL to generate SQL. It also shows you some of the dot commands for interacting with the DuckDB CLI: changing the output format and writing SQL to a file which is then read in for execution.
Execute this SQL
```sql
-- Use SQL to generate SQL statements.
-- Write them to a file, then execute that file.
-- This uses DuckDB CLI-specific features, so it only works in the CLI.
.mode list
.header off
.once my_dynamic_sql.sql
select 'CREATE TABLE t' || i || ' AS SELECT ' || i || ' AS my_column;'
from generate_series(5) x(i);
.read my_dynamic_sql.sql
select table_name from duckdb_tables(); -- Check our work!
```
DuckDB in Action: Some Neat DuckDB-Specific SQL Extensions
Editor's note: DuckDB strives to make it easy to write SQL, even when it requires introducing non-standard syntax. See the great blog posts by Alex Monahan or explore the Manning "DuckDB in Action" book by the author of this snippet.
DuckDB-specific extensions: Project all columns matching a pattern
```sql
SELECT COLUMNS('valid.*') FROM prices LIMIT 3;
```
DuckDB-specific extensions: Apply an aggregation to several columns
```sql
SELECT max(COLUMNS('valid.*')) FROM prices;
```
DuckDB-specific extensions: Apply one condition to many columns
```sql
FROM prices
WHERE COLUMNS('valid.*') BETWEEN '2020-01-01' AND '2021-01-01';
```
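Newer DuckDB versions also accept a lambda in place of the regular expression, which is handy when the match is easier to state as a predicate. A sketch against the same prices table:

```sql
-- Select every column whose name starts with 'valid', by predicate
SELECT COLUMNS(c -> c LIKE 'valid%') FROM prices LIMIT 3;
```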