Use method chaining to filter tables
A simple example of how to implement method chaining using `list` and `unnest`, providing a functional-style interface without any additional extensions.
Example
SELECT list(t)
    .list_filter(lambda c: c.registration_date >= '2023-02-01')
    .list_transform(lambda c: struct_insert(c, is_north_american:=c.country in ('USA', 'Mexico', 'Canada')))
    .list_filter(lambda c: c.is_north_american)
    .unnest(recursive:=true)
FROM 'customers.csv' AS t;
customers.csv
customer_id,name,email,age,tier,country,registration_date,lifetime_value
1,Alice Johnson,alice.johnson@email.com,29,Premium,USA,2023-01-15,2847.50
2,Bob Smith,bob.smith@email.com,34,Standard,Canada,2023-02-03,1256.80
3,Charlie Brown,charlie.brown@email.com,42,Premium,UK,2023-01-22,3421.90
4,Diana Ross,diana.ross@email.com,38,Gold,USA,2023-03-10,5678.25
5,Eve Wilson,eve.wilson@email.com,26,Standard,Australia,2023-02-18,845.60
6,Frank Miller,frank.miller@email.com,31,Premium,Germany,2023-01-08,2156.75
7,Grace Lee,grace.lee@email.com,28,Standard,Canada,2023-03-05,967.40
8,Henry Davis,henry.davis@email.com,45,Gold,USA,2023-01-29,4532.80
9,Ivy Chen,ivy.chen@email.com,33,Premium,Singapore,2023-02-14,2890.45
10,Jack Wilson,jack.wilson@email.com,39,Standard,UK,2023-03-12,1134.20
11,Kate Brown,kate.brown@email.com,27,Premium,Australia,2023-01-18,2234.85
12,Leo Garcia,leo.garcia@email.com,36,Gold,Spain,2023-02-27,3789.60
13,Mia Taylor,mia.taylor@email.com,30,Standard,USA,2023-03-08,1089.95
14,Noah Kim,noah.kim@email.com,41,Premium,South Korea,2023-01-25,2567.30
15,Olivia Jones,olivia.jones@email.com,35,Gold,Canada,2023-02-11,4123.75
16,Paul Anderson,paul.anderson@email.com,32,Standard,UK,2023-03-15,876.40
17,Quinn Lee,quinn.lee@email.com,29,Premium,Singapore,2023-01-12,2678.95
18,Rachel Green,rachel.green@email.com,37,Gold,USA,2023-02-06,3956.80
19,Sam Wilson,sam.wilson@email.com,28,Standard,Australia,2023-03-01,1245.30
20,Tina Rodriguez,tina.rodriguez@email.com,44,Premium,Mexico,2023-01-31,2345.65
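For comparison, the same filtering can be written without method chaining. A minimal sketch of a roughly equivalent query over the same customers.csv:

SELECT *, country IN ('USA', 'Mexico', 'Canada') AS is_north_american
FROM 'customers.csv'
WHERE registration_date >= '2023-02-01'
  AND country IN ('USA', 'Mexico', 'Canada');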
Top values of all columns
Get the top values by count for all columns, which is useful for searching and analysis.
select approx_top_k(columns(*), 3) from tbl;
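As a quick sketch of what this returns, the query below builds a small throwaway table (the table and column names are invented for this example) and asks for the three most frequent values of every column; each output column holds a list of that column's top values.

-- Illustrative data; 'tbl', 'country' and 'tier' are placeholders
CREATE OR REPLACE TABLE tbl AS
    SELECT * FROM (VALUES
        ('USA',    'Premium'),
        ('USA',    'Standard'),
        ('Canada', 'Premium'),
        ('USA',    'Premium')
    ) v(country, tier);

select approx_top_k(columns(*), 3) from tbl;
-- country -> [USA, Canada], tier -> [Premium, Standard] (counts are approximate)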
Remove NULL columns
Clean a table by removing columns that contain only NULL values.
CREATE OR REPLACE MACRO notnullcols(tablename) AS (
    WITH t1 AS (
        UNPIVOT (
            FROM query_table(tablename) SELECT max(columns(*))
        ) ON columns(*)::varchar
    )
    FROM t1 SELECT list(name)
);

SET VARIABLE notnullcols = notnullcols(t);

FROM t SELECT columns(c -> list_contains(getvariable('notnullcols'), c));
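A minimal sketch of input data to test the macro against; the table name t matches the call above, while the column names are invented for illustration:

-- 'always_null' contains only NULLs and should be dropped
CREATE OR REPLACE TABLE t AS
    SELECT * FROM (VALUES
        (1, NULL::VARCHAR, 'a'),
        (2, NULL::VARCHAR, 'b')
    ) v(id, always_null, label);
-- Running the macro and the final SELECT above then returns only id and label.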
Delete NULL rows
Clean a table by removing rows where all column values are NULL.
WITH t AS (
    VALUES
        (NULL, NULL),
        ('john', 'smith'),
        ('mike', NULL),
        (NULL, 'jones'),
        (NULL, NULL)
)
FROM t
EXCEPT
FROM t WHERE COLUMNS(*) IS NULL;
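Note that EXCEPT uses set semantics, so it also removes duplicate non-NULL rows. A sketch of a variant that keeps duplicates, relying on DuckDB's default col0/col1 names for VALUES columns:

WITH t AS (
    VALUES (NULL, NULL), ('john', 'smith'), ('mike', NULL), (NULL, 'jones'), (NULL, NULL)
)
FROM t
WHERE NOT (col0 IS NULL AND col1 IS NULL);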
Macro to glimpse a table in the current DB
An attempt to replicate the functionality of dplyr's (and Polars') glimpse with a simple macro that takes the name of a table in the current database as a quoted string. It shows column names, data types, and the first few values. Intended for use in the CLI. Usage: FROM glimpse('table_name');
CREATE OR REPLACE MACRO glimpse(table_name) AS TABLE
    WITH TableSchema AS (
        SELECT cid, name AS column_name, type AS column_type
        FROM pragma_table_info(table_name)
    ),
    SampleData AS (
        -- Select the first 5 rows from the target table
        SELECT * FROM query_table(table_name) LIMIT 5
    ),
    SampleDataUnpivoted AS (
        -- Unpivot the sample data: columns become rows
        UNPIVOT (SELECT list(COLUMNS(*)::VARCHAR) FROM SampleData)
        ON COLUMNS(*)
        INTO NAME column_name VALUE sample_values_list -- This will be a list of strings
    )
    -- Final selection joining schema info with sample data
    SELECT
        ts.column_name,
        ts.column_type,
        -- Convert the list to string and remove brackets for cleaner display
        regexp_replace(usp.sample_values_list::VARCHAR, '^\[|\]$', '', 'g') AS sample_data
    FROM TableSchema ts
    JOIN SampleDataUnpivoted usp ON ts.column_name = usp.column_name
    ORDER BY ts.cid;
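A quick way to try the macro; the trips table below is invented for illustration:

-- Illustrative table to glimpse
CREATE OR REPLACE TABLE trips AS
    SELECT * FROM (VALUES
        (1, 'NYC',    7.5),
        (2, 'Boston', 3.2)
    ) v(id, city, miles);

FROM glimpse('trips');
-- Returns one row per column with column_name, column_type (as inferred by DuckDB)
-- and the first few values as a string, e.g. NYC, Boston for the city column.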
Easy GraphQL Querying
This uses the http_client and json extensions to create a (relatively) flexible and easy-to-use GraphQL reader. It only supports POST, but it would be easy to change to GET. Parameters control what is extracted from the response and how the results are turned into a DuckDB table.
INSTALL http_client FROM community;
LOAD json;
LOAD http_client;

-- Run a GraphQL query and get back fields with various diagnostics and intermediate values
CREATE OR REPLACE MACRO query_graphql(
    endpoint, query,        -- GraphQL endpoint and query
    params:=NULL,           -- GraphQL parameters
    headers:=MAP{},         -- HTTP headers (e.g., auth)
    result_path:='$',       -- json_path to use as results
    result_structure:=NULL, -- json_structure for results or auto detect
    strict:=true            -- Error on failure to transform results
) AS TABLE
SELECT
    response: http_post(endpoint, headers, IF(params IS NULL, {'query': query}, {'query': query, 'params': params})),
    body: (response->>'body')::JSON,
    json: body->result_path,
    structure: IF(result_structure IS NULL, json_structure(json), result_structure),
    result: IF(strict, from_json_strict(json, structure), from_json(json, structure));

-- Simple helper to extract GraphQL results to a "normal" table
CREATE OR REPLACE MACRO read_graphql(
    endpoint, query,        -- Same as above ...
    params:=NULL, headers:=MAP{}, result_path:='$',
    result_structure:=NULL, strict:=true, -- ... Same as above
    unnest_levels:=0        -- Apply unnest this many times to create table
) AS TABLE
WITH nested AS (
    SELECT [result] AS result
    FROM query_graphql(endpoint, query,
                       params:=params, headers:=headers, result_path:=result_path,
                       result_structure:=result_structure, strict:=strict)
)
SELECT unnest(result, max_depth:=unnest_levels+1) AS result
FROM nested;

------------------- Example usage ---------------------
FROM read_graphql('https://rickandmortyapi.com/graphql',
    $$
    query Query($name: String) {
        characters(page: 2, filter: {name: $name}) {
            info { count }
            results { id name gender }
        }
    }
    $$,
    params:={'name': 'Morty'},
    result_path:='$.data.characters.results',
    result_structure:=[{id:'int',name:'string',gender:'string'}],
    unnest_levels:=2
);

-- Example Results ------------------
-- ┌─────────┬────────────────────┬─────────┐
-- │   id    │        name        │ gender  │
-- │  int32  │      varchar       │ varchar │
-- ├─────────┼────────────────────┼─────────┤
-- │      21 │ Aqua Morty         │ Male    │
-- ...
-- ├─────────┴────────────────────┴─────────┤
-- │ 20 rows                      3 columns │
-- └────────────────────────────────────────┘
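For endpoints that require authentication, the headers parameter can carry a token. A sketch with a placeholder endpoint, token, and query; adjust result_path and unnest_levels to match the response shape:

FROM read_graphql('https://example.com/graphql',
    'query { viewer { login name } }',
    headers:=MAP {'Authorization': 'Bearer <token>'},
    result_path:='$.data.viewer',
    unnest_levels:=1
);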
Export/import/share DuckDB UI Notebooks
The DuckDB UI stores notebook content in an internal database called _duckdb_ui. You can query and export notebook content, as well as insert new definitions into the database. Warning: Modifying the internal database may lead to corruption and data loss. Be cautious and use it at your own risk!
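To find the exact title to use in the queries below, you can first list what the internal catalog contains; this sketch is based only on the columns referenced in the snippets that follow:

select n.name, v.title, v.version, v.created
from _duckdb_ui.notebooks n
join _duckdb_ui.notebook_versions v on v.notebook_id = n.id
where v.expires is null;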
Export a notebook definition to JSON
copy (
    select "json"
    from _duckdb_ui.notebook_versions
    where 1=1
        and title = 'MySingleNotebook'
        and expires is null
) to 'exported-notebook.json';
Import a notebook definition
set variable notebook_content = (select json from 'exported-notebook.json');
set variable notebook_id = uuid();
set variable current_timestamp = now();

begin transaction;

insert into _duckdb_ui.notebooks (id, name, created)
select
    getvariable('notebook_id'),
    'notebook_' || getvariable('notebook_id'),
    getvariable('current_timestamp');

insert into _duckdb_ui.notebook_versions (notebook_id, version, title, json, created, expires)
select
    getvariable('notebook_id'),
    1,
    'imported-notebook-' || getvariable('current_timestamp'),
    getvariable('notebook_content'),
    getvariable('current_timestamp'),
    null;

commit;
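After committing, a quick sanity check that the imported definition is visible (the title pattern matches the insert above):

select notebook_id, version, title, created
from _duckdb_ui.notebook_versions
where title like 'imported-notebook-%'
  and expires is null;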