Load content from Strapi CMS REST API to Parquet file
A handy trick for loading data from Strapi CMS. The API token can be obtained in the Settings menu of Strapi. This lets users maintain reference data in the CMS and query it directly in DuckDB. Should work for both self-hosted Strapi and Strapi Cloud.
INSTALL httpfs;
LOAD httpfs;

CREATE SECRET http (
    TYPE HTTP,
    EXTRA_HTTP_HEADERS MAP {
        'Authorization': 'Bearer [Api Token]'
    }
);

-- Replace strapi.mydomain.com with your Strapi URL and replace `pets` with your content type
COPY (
    SELECT unnest(data, recursive := true)
    FROM read_json_auto('https://strapi.mydomain.com/api/pets')
) TO 'pets.parquet';
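To see what the recursive unnest does before pointing it at a live API, here is a minimal sketch on inline data. The `data` list of structs is a hypothetical stand-in for a Strapi response body, and the `id` and `name` fields are assumptions for illustration only; your content type will have its own fields.

```sql
-- Hypothetical stand-in for a Strapi response: a `data` list with one struct per entry
SELECT unnest(data, recursive := true)
FROM (
    SELECT [
        {'id': 1, 'name': 'Rex'},
        {'id': 2, 'name': 'Tom'}
    ] AS data
);
```

Each list element becomes a row and each struct field becomes its own column, which is the shape that ends up in the Parquet file.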
Reading a fixed-width file in DuckDB
Fixed-width files can be a little difficult, but if you treat each line of data as a string that you can slice with substr() in DuckDB, it's not that difficult ;)
CREATE OR REPLACE TABLE example_table AS
SELECT
    CAST(substr(line, 1, 4) AS INTEGER) AS activity_year,
    CAST(substr(line, 5, 10) AS VARCHAR(10)) AS lei_or_respondent_id,
    CAST(substr(line, 15, 1) AS CHAR(1)) AS agency_code,
    CAST(substr(line, 16, 1) AS CHAR(1)) AS loan_type,
    CAST(substr(line, 17, 1) AS CHAR(1)) AS loan_purpose,
    CAST(substr(line, 18, 1) AS CHAR(1)) AS occupancy_type
FROM (
    SELECT column0 AS line
    FROM read_csv('data.txt', AUTO_DETECT=TRUE, skip=1)
);
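The same slicing can be tried without a file on disk. This is a minimal sketch using two made-up 18-character lines (the values are hypothetical and only there to show which characters land in which column); the offsets match the statement above.

```sql
-- Hypothetical sample rows standing in for lines of a fixed-width file
SELECT
    CAST(substr(line, 1, 4) AS INTEGER) AS activity_year,  -- characters 1-4
    substr(line, 5, 10) AS lei_or_respondent_id,           -- characters 5-14
    substr(line, 15, 1) AS agency_code,                    -- character 15
    substr(line, 16, 1) AS loan_type,                      -- character 16
    substr(line, 17, 1) AS loan_purpose,                   -- character 17
    substr(line, 18, 1) AS occupancy_type                  -- character 18
FROM (
    VALUES ('2023ABCDEFGHIJ1234'),
           ('2024KLMNOPQRST5678')
) AS sample_lines(line);
```

substr() takes a 1-based start position and a length, so each column is defined by where it starts and how wide it is.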
Query an Authenticated API Endpoint
Hit an endpoint that needs an API key, in this case Stripe. The demo uses a public test Stripe API key from the Stripe docs.
CREATE SECRET http (
    TYPE HTTP,
    EXTRA_HTTP_HEADERS MAP {
        'Authorization': 'Bearer sk_test_VePHdqKTYQjKNInc7u56JBrQ'
    }
);

select unnest(data) as customers
from read_json('https://api.stripe.com/v1/customers');
Discretize numeric column following specified thresholds
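discretize() converts a numeric column into discrete, ordered class ids based on a list of thresholds: each value gets one plus the number of distinct thresholds less than or equal to it, and NULL stays NULL.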
CREATE OR REPLACE MACRO discretize(v, l) AS (
    WITH t1 AS (
        SELECT unnest(list_distinct(l)) as j
    ), t2 AS (
        SELECT COUNT(*) + 1 c
        FROM t1
        WHERE try_cast(j AS float) <= v
    )
    FROM t2
    SELECT IF(v IS NULL, NULL, c)
);

-- Usage
FROM 'https://raw.githubusercontent.com/thewiremonkey/factbook.csv/master/data/c2127.csv'
SELECT name, value, discretize(value, [2,3,4,5]) AS class;
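For a quick check that does not depend on the remote CSV, the macro can be applied to a few inline values. This is a sketch only: the sample values are made up, and the macro definition is repeated so the block runs on its own.

```sql
-- Same macro as above, repeated so this block is self-contained
CREATE OR REPLACE MACRO discretize(v, l) AS (
    WITH t1 AS (
        SELECT unnest(list_distinct(l)) AS j
    ), t2 AS (
        SELECT COUNT(*) + 1 c
        FROM t1
        WHERE try_cast(j AS float) <= v
    )
    FROM t2
    SELECT IF(v IS NULL, NULL, c)
);

-- Hypothetical sample values: below, between, and at the thresholds, plus a NULL
SELECT v, discretize(v, [2, 3, 4, 5]) AS class
FROM (VALUES (1.0), (2.5), (5.0), (NULL)) AS t(v);
```

Because the comparison is `<=`, a value equal to a threshold is counted as having passed it, so it falls into the higher class.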
KMeans on one-dimensional data with recursive CTE
Compute k-means thresholds from a table with two columns: id (unique) and a numeric column. Outputs a list. Easy to extend to two-dimensional data (x, y).
CREATE FUNCTION kmeans(tname, idcol_name, numcol_name, bins:=5, maxiter:=100) AS (
    WITH RECURSIVE clusters(iter, cid, x) AS (
        WITH t1 AS (
            FROM query_table(tname)
            SELECT idcol_name AS id, numcol_name AS x
        )
        (SELECT 0, id, x FROM t1 LIMIT bins-1)
        UNION ALL
        SELECT iter + 1, cid, avg(px)
        FROM (
            SELECT iter, cid, p.x as px,
                   rank() OVER (PARTITION BY p.id ORDER BY (p.x-c.x)^2, c.x^2) r
            FROM t1 p, clusters c
        ) x
        WHERE x.r = 1 and iter < maxiter
        GROUP BY ALL
    )
    SELECT list(x)
    FROM (FROM clusters WHERE iter = maxiter ORDER BY x)
);
Add a unique column (key) to an existing table in DuckDB
Many source data files (such as CSV files) do not include a unique key. We can use DuckDB's `SEQUENCE` to add one to a table.
CREATE TABLE tbl (s VARCHAR);
INSERT INTO tbl VALUES ('hello'), ('world');

CREATE SEQUENCE id_sequence;
ALTER TABLE tbl ADD COLUMN id INTEGER DEFAULT nextval('id_sequence');

SELECT * FROM tbl;
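If the table does not exist yet, the key can also be assigned while the data is loaded, rather than added afterwards. A minimal sketch of that variation, with hypothetical names (`load_seq`, `words`) and inline rows standing in for a source file:

```sql
-- Hypothetical sequence used only for this load
CREATE SEQUENCE load_seq;

-- Draw one value from the sequence per incoming row while creating the table
CREATE TABLE words AS
SELECT nextval('load_seq') AS id, s
FROM (VALUES ('hello'), ('world')) AS src(s);

SELECT * FROM words ORDER BY id;
```

With a real file, the inline VALUES would be replaced by a read_csv() call; the order in which ids are handed out then follows the order in which rows are read.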
Query from Google Sheets
Sometimes you have data in Google Sheets and want to load it into DuckDB quickly and easily. With read_csv and the Google Sheets 'export to csv' URL, it's just one line of code.
-- You will need to create a share link for your sheet.
-- Note the tab id (gid) and sheet id (in the URL).
from read_csv('https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv&gid={tab_id}');
Emulate the Initcap function
In many database systems, there is a SQL function called something like INITCAP which capitalizes the first letter of all the words in a text string. Unfortunately, DuckDB doesn’t have this built-in, so let’s see if we can emulate it using function chaining and list comprehension.
SELECT (
    [upper(x[1]) || x[2:]
     for x in ('the quick brown fox jumped over the lazy dog').string_split(' ')]
).list_aggr('string_agg', ' ');
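To reuse the expression on a column instead of a literal, it can be wrapped in a macro. This is a sketch under assumptions: the macro name `initcap_words` is made up for illustration, and words are split on single spaces only, as in the query above.

```sql
-- Hypothetical helper: capitalize the first letter of each space-separated word
CREATE OR REPLACE MACRO initcap_words(s) AS (
    list_aggr(
        [upper(x[1]) || x[2:] FOR x IN string_split(s, ' ')],
        'string_agg',
        ' '
    )
);

-- Apply it to a literal; a column name works the same way
SELECT initcap_words('the quick brown fox') AS title;
```

Like the original expression, this leaves the remaining letters of each word untouched, so it does not lowercase text that is already in upper case.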