UK Bank HolidaysSQL
The UK government publish the UK bank holidays as a JSON object at a public endpoint. DuckDB can read the data from the endpoint, unpivot it, and expand it in a few lines!
Execute this SQL
/* division, title, date, notes, bunting */ select division, unnest(events.events, recursive:=true) from ( unpivot 'https://www.gov.uk/bank-holidays.json' on "england-and-wales", "scotland", "northern-ireland" into name division value events )
Copy code
Expand
Share link
Pretty-printing floating-pont numbersSQL
By casting the result to a `DECIMAL` with a fixed number of digits after the decimal point, we can pretty-print it while keeping it as a numeric value.
Execute this SQL
COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x)) TO 'example.csv'; SELECT x::DECIMAL(15, 3) AS x FROM 'example.csv';
Copy code
Expand
Share link
Copying the schema of a tableSQL
We can use `LIMIT` to copy the schema of an existing table without repeating the specification of its column names/types.
Execute this SQL
COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x)) TO 'example.csv'; CREATE TABLE tbl AS FROM example LIMIT 0;
Copy code
Expand
Share link
Deterministically shuffling dataSQL
Sometimes it's beneficial to shuffle data sets in a deterministic, repeatable way. We can achieve this by using the `hash` function on the `rowid` pseudocolumn.
Execute this SQL
COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x)) TO 'example.csv'; CREATE OR REPLACE TABLE example AS FROM 'example.csv'; FROM example ORDER BY hash(rowid + 42);
Copy code
Expand
Share link
Updating CSV files in-placeSQL
DuckDB allows updating CSV files in-place.
Execute this SQL
COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x)) TO 'example.csv'; COPY (SELECT s FROM 'example.csv') TO 'example.csv';
Copy code
Expand
Share link
Query from Google SheetsSQL
Sometimes you have data over in Google Sheets and you want quickly and easily load it into DuckDB. With read_csv and the Google Sheets 'export to csv', its just one line of code.
Execute this SQL
-- you will need to create a sharelink for your sheet. -- note the tab id (gid) and sheet id (in the url) from read_csv('https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv&gid={tab_id}')
Copy code
Expand
Share link
Reading a fixed-width file in DuckDBSQL
Fixed-width files can be little difficult but IF you consider each line of data as a string which you can attack with duckdb and substr() its not that difficult ;)
Execute this SQL
CREATE OR REPLACE TABLE example_table AS SELECT CAST(substr(line, 1, 4) AS INTEGER) AS activity_year, CAST(substr(line, 5, 10) AS VARCHAR(10)) AS lei_or_respondent_id, CAST(substr(line, 15, 1) AS CHAR(1)) AS agency_code, CAST(substr(line, 16, 1) AS CHAR(1)) AS loan_type, CAST(substr(line, 17, 1) AS CHAR(1)) AS loan_purpose, CAST(substr(line, 18, 1) AS CHAR(1)) AS occupancy_type FROM (SELECT column0 AS line FROM read_csv('data.txt', AUTO_DETECT=TRUE, skip=1));
Copy code
Expand
Share link
Kernel Density Estimation - Epanechnikov KernelSQL
KDE estimates the probability distribution of a random variable. The bandwidth parameter controls the width of the kernel, influencing how smooth or detailed the estimated density curve is. A smaller bandwidth results in a more detailed estimation, while a larger bandwidth produces a smoother curve.
Execute this SQL
CREATE OR REPLACE MACRO KDE_EPANECH(data, varnum, bandwidth, bin_count := 30) AS TABLE WITH hist AS ( FROM histogram_values(data, varnum, bin_count := bin_count) ) SELECT hist.bin, k.kernel_value FROM hist, LATERAL ( SELECT 100 * AVG( IF(abs(hist.bin - varnum) / bandwidth < 1, 0.75 * (1 - POW(abs(hist.bin - varnum) / bandwidth, 2)) / bandwidth, 0)) AS kernel_value FROM query_table(data) ) k ORDER BY hist.bin ; -- Following David Scott's rule, here is an estimate for bandwidth: CREATE OR REPLACE MACRO KDE_BANDWIDTH(data, varnum) AS ( FROM query_table(data) SELECT 1.06 * stddev(varnum) * pow(count(*), -1/5) ); -- Usage SET VARIABLE bandwidth = (SELECT KDE_BANDWIDTH(mydata, myvarnum)) ; FROM KDE_EPANECH(mydata, myvarnum, getvariable('bandwidth')) ; -- Inspiration and illustration: https://observablehq.com/@d3/kernel-density-estimation
Copy code
Expand
Share link