Top 25 Songs at Number 1, Billboard Hot 100SQL
Editor's note: do you like looking at data around the top music, including when an album debuted and when it appeared on the Billboard Hot 100 list? David has you covered with some simple analytical SQL, along with a MotherDuck share of the data.
Execute this SQL
-- Switch to the attached music database use music; -- Top 25 songs at number 1, Billboard Hot 100 SELECT song, performer, COUNT(*) AS weeks_at_no_1, chart_debut, MIN(chart_date) AS first_chart_date_as_no_1, MAX(chart_date) AS last_chart_date_as_no_1 FROM billboard_hot_100 WHERE chart_position = 1 GROUP BY ALL ORDER BY weeks_at_no_1 DESC, chart_debut LIMIT 25;
Copy code
Expand
Share link
UK Bank HolidaysSQL
The UK government publish the UK bank holidays as a JSON object at a public endpoint. DuckDB can read the data from the endpoint, unpivot it, and expand it in a few lines!
Execute this SQL
/* division, title, date, notes, bunting */ select division, unnest(events.events, recursive:=true) from ( unpivot 'https://www.gov.uk/bank-holidays.json' on "england-and-wales", "scotland", "northern-ireland" into name division value events )
Copy code
Expand
Share link
Pretty-printing floating-pont numbersSQL
By casting the result to a `DECIMAL` with a fixed number of digits after the decimal point, we can pretty-print it while keeping it as a numeric value.
Execute this SQL
COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x)) TO 'example.csv'; SELECT x::DECIMAL(15, 3) AS x FROM 'example.csv';
Copy code
Expand
Share link
Copying the schema of a tableSQL
We can use `LIMIT` to copy the schema of an existing table without repeating the specification of its column names/types.
Execute this SQL
COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x)) TO 'example.csv'; CREATE TABLE tbl AS FROM example LIMIT 0;
Copy code
Expand
Share link
Deterministically shuffling dataSQL
Sometimes it's beneficial to shuffle data sets in a deterministic, repeatable way. We can achieve this by using the `hash` function on the `rowid` pseudocolumn.
Execute this SQL
COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x)) TO 'example.csv'; CREATE OR REPLACE TABLE example AS FROM 'example.csv'; FROM example ORDER BY hash(rowid + 42);
Copy code
Expand
Share link
Updating CSV files in-placeSQL
DuckDB allows updating CSV files in-place.
Execute this SQL
COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x)) TO 'example.csv'; COPY (SELECT s FROM 'example.csv') TO 'example.csv';
Copy code
Expand
Share link
Query from Google SheetsSQL
Sometimes you have data over in Google Sheets and you want quickly and easily load it into DuckDB. With read_csv and the Google Sheets 'export to csv', its just one line of code.
Execute this SQL
-- you will need to create a sharelink for your sheet. -- note the tab id (gid) and sheet id (in the url) from read_csv('https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv&gid={tab_id}')
Copy code
Expand
Share link
Kernel Density Estimation - Epanechnikov KernelSQL
KDE estimates the probability distribution of a random variable. The bandwidth parameter controls the width of the kernel, influencing how smooth or detailed the estimated density curve is. A smaller bandwidth results in a more detailed estimation, while a larger bandwidth produces a smoother curve.
Execute this SQL
CREATE OR REPLACE MACRO KDE_EPANECH(data, varnum, bandwidth, bin_count := 30) AS TABLE WITH hist AS ( FROM histogram_values(data, varnum, bin_count := bin_count) ) SELECT hist.bin, k.kernel_value FROM hist, LATERAL ( SELECT 100 * AVG( IF(abs(hist.bin - varnum) / bandwidth < 1, 0.75 * (1 - POW(abs(hist.bin - varnum) / bandwidth, 2)) / bandwidth, 0)) AS kernel_value FROM query_table(data) ) k ORDER BY hist.bin ; -- Following David Scott's rule, here is an estimate for bandwidth: CREATE OR REPLACE MACRO KDE_BANDWIDTH(data, varnum) AS ( FROM query_table(data) SELECT 1.06 * stddev(varnum) * pow(count(*), -1/5) ); -- Usage SET VARIABLE bandwidth = (SELECT KDE_BANDWIDTH(mydata, myvarnum)) ; FROM KDE_EPANECH(mydata, myvarnum, getvariable('bandwidth')) ; -- Inspiration and illustration: https://observablehq.com/@d3/kernel-density-estimation
Copy code
Expand
Share link