Generate series of numbers in DuckDB
DuckDB has two common ways to generate a series of numbers: the range() function and the generate_series() function. They differ only in that the generate_series() function has a 'stop' value that's inclusive, while the 'stop' value of range() is exclusive.
generate_series with inclusive stop valueSQL
// generate_series(start, stop, step) // get all even numbers, starting at 0 up to and including 100 SELECT * FROM generate_series(0,100,2);
Copy code
range with exclusive stop valueSQL
// range(start, stop, step) // get all even numbers, starting at 0 up to and including 98 SELECT * FROM range(0,100,2);
Copy code
Generate range() as arraySQL
// Using range() as a column value instead of a table // in your SQL statement will return an array of the // numbers in the range SELECT range(0,100,2)
Copy code
Expand
Share link
Run SQL file in DuckDB CLI
The DuckDB CLI enables you to execute a set of SQL statements stored in a file or passed in via STDIN. There are a few variants of this capability demonstrated below.
Read and execute SQL using init CLI argument and prompt for additional SQL statementsBash
# executes SQL in create.sql, and then prompts for additional # SQL statements provided interactively. note that when specifying # an init flag, the ~/.duckdbrc file is not read duckdb -init create.sql
Copy code
Read and execute SQL using init CLI argument and immediately exit Bash
# executes SQL in create.sql and then immediately exits # note that we're specifying a database name so that we # can access the created data later. note that when specifying # an init flag, the ~/.duckdbrc file is not read duckdb -init create.sql -no-stdin mydb.ddb
Copy code
Pipe SQL file to the DuckDB CLI and exitBash
duckdb < create.sql mydb.ddb
Copy code
Expand
Share link
Split strings into version numbers and order properSQL
This snippet takes version numbers that might contain arbitrary additional information and splits them into a list of integers, that one can sort like `sort -V` does.
Execute this SQL
SELECT v FROM VALUES ('1.10.0'), ('1.3.0'), ('1.13.0.RELEASE') f(v) ORDER BY list_transform(string_split(v, '.'), x -> TRY_CAST (x AS INTEGER)) ASC;
Copy code
Expand
Share link
Replace string multiple timesSQL
`replace` target string multiple time with list of replacements.
Execute this SQL
SELECT reduce([['', content], ['foo','FOO'], ['bar', 'BAR']], (x, y, i)-> ['', replace(x[2], y[1], y[2])]) FROM posts;
Copy code
Expand
Share link
SQL with PipesSQL
Pipes in SQL via psql extension created by Yannick Welsch
Execute this SQL
install psql from community; load psql; from 'https://sampledata.sidequery.ai/earthquakes.parquet' |> limit 10000 |> where status = 'Reviewed' |> select data_type, avg(depth), avg(magnitudo) group by all;
Copy code
Expand
Share link
Kernel Density Estimation - Epanechnikov KernelSQL
KDE estimates the probability distribution of a random variable. The bandwidth parameter controls the width of the kernel, influencing how smooth or detailed the estimated density curve is. A smaller bandwidth results in a more detailed estimation, while a larger bandwidth produces a smoother curve.
Execute this SQL
CREATE OR REPLACE MACRO KDE_EPANECH(data, varnum, bandwidth, bin_count := 30) AS TABLE WITH hist AS ( FROM histogram_values(data, varnum, bin_count := bin_count) ) SELECT hist.bin, k.kernel_value FROM hist, LATERAL ( SELECT 100 * AVG( IF(abs(hist.bin - varnum) / bandwidth < 1, 0.75 * (1 - POW(abs(hist.bin - varnum) / bandwidth, 2)) / bandwidth, 0)) AS kernel_value FROM query_table(data) ) k ORDER BY hist.bin ; -- Following David Scott's rule, here is an estimate for bandwidth: CREATE OR REPLACE MACRO KDE_BANDWIDTH(data, varnum) AS ( FROM query_table(data) SELECT 1.06 * stddev(varnum) * pow(count(*), -1/5) ); -- Usage SET VARIABLE bandwidth = (SELECT KDE_BANDWIDTH(mydata, myvarnum)) ; FROM KDE_EPANECH(mydata, myvarnum, getvariable('bandwidth')) ; -- Inspiration and illustration: https://observablehq.com/@d3/kernel-density-estimation
Copy code
Expand
Share link