## Run SQL file in DuckDB CLI
The DuckDB CLI can execute a set of SQL statements stored in a file or passed in via STDIN. A few variants of this capability are demonstrated below.
### Read and execute SQL using the -init CLI argument and prompt for additional SQL statements

```bash
# Executes the SQL in create.sql, then prompts for additional
# SQL statements provided interactively. Note that when an
# -init flag is specified, the ~/.duckdbrc file is not read.
duckdb -init create.sql
```
### Read and execute SQL using the -init CLI argument and immediately exit

```bash
# Executes the SQL in create.sql and then immediately exits.
# Note that we specify a database name so that we can access
# the created data later. When an -init flag is specified,
# the ~/.duckdbrc file is not read.
duckdb -init create.sql -no-stdin mydb.ddb
```
### Pipe a SQL file to the DuckDB CLI and exit

```bash
duckdb < create.sql mydb.ddb
```
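Since the CLI reads SQL from STDIN, you can also pass statements inline with a shell heredoc. A minimal sketch (the table and values are made up; `mydb.ddb` is the same database file as above):

```bash
# Runs the inline SQL against mydb.ddb and exits at end of input
duckdb mydb.ddb <<'SQL'
CREATE TABLE IF NOT EXISTS answers (i INTEGER);
INSERT INTO answers VALUES (42);
SQL
```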
## Split strings into version numbers and order properly
This snippet takes version numbers that might contain arbitrary additional information and splits them into lists of integers that sort the way `sort -V` does. `TRY_CAST` turns non-numeric segments such as `RELEASE` into `NULL` instead of raising an error, so mixed version strings still sort.
```sql
SELECT v
FROM VALUES ('1.10.0'), ('1.3.0'), ('1.13.0.RELEASE') f(v)
ORDER BY list_transform(string_split(v, '.'), x -> TRY_CAST(x AS INTEGER)) ASC;
```
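The same ordering expression works on a real column. A hypothetical sketch, assuming a `releases` table with a `version` column:

```sql
-- releases(version) is a hypothetical table
SELECT version
FROM releases
ORDER BY list_transform(string_split(version, '.'), x -> TRY_CAST(x AS INTEGER));
```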
## Replace a string multiple times
`replace` a target string multiple times with a list of `[search, replacement]` pairs. The first list element seeds the accumulator with the original string, and `reduce` then applies each replacement in turn.
```sql
SELECT reduce(
    [['', content], ['foo', 'FOO'], ['bar', 'BAR']],
    (x, y, i) -> ['', replace(x[2], y[1], y[2])]
)
FROM posts;
```
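A self-contained sketch of the same technique (the `posts` data and replacement pairs here are made up). Since the accumulator is a list, indexing with `[2]` extracts the final string:

```sql
-- Hypothetical data; reduce returns ['', <replaced string>]
WITH posts(content) AS (VALUES ('foo meets bar'))
SELECT reduce(
    [['', content], ['foo', 'FOO'], ['bar', 'BAR']],
    (x, y, i) -> ['', replace(x[2], y[1], y[2])]
)[2] AS replaced
FROM posts;
-- replaced = 'FOO meets BAR'
```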
## SQL with Pipes
Pipes in SQL via the psql community extension created by Yannick Welsch. Each `|>` step feeds the relation produced by the previous step into the next operation, so the query reads top to bottom.
```sql
install psql from community;
load psql;

from 'https://sampledata.sidequery.ai/earthquakes.parquet' |>
limit 10000 |>
where status = 'Reviewed' |>
select data_type, avg(depth), avg(magnitudo) group by all;
```
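A minimal sketch of the same pipe syntax on generated data (assuming the extension accepts any FROM-able source, which the snippet above does not confirm):

```sql
-- Count even numbers in 0..99 with the |> pipe syntax
from range(100) t(i) |>
where i % 2 = 0 |>
select count(*) as even_count;
```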
## Kernel Density Estimation - Epanechnikov Kernel
KDE estimates the probability distribution of a random variable. The bandwidth parameter controls the width of the kernel and thus how smooth or detailed the estimated density curve is: a smaller bandwidth yields a more detailed estimate, while a larger bandwidth produces a smoother curve.
```sql
CREATE OR REPLACE MACRO KDE_EPANECH(data, varnum, bandwidth, bin_count := 30) AS TABLE
    WITH hist AS (
        FROM histogram_values(data, varnum, bin_count := bin_count)
    )
    SELECT hist.bin, k.kernel_value
    FROM hist, LATERAL (
        SELECT 100 * AVG(
            IF(abs(hist.bin - varnum) / bandwidth < 1,
               0.75 * (1 - POW(abs(hist.bin - varnum) / bandwidth, 2)) / bandwidth,
               0)
        ) AS kernel_value
        FROM query_table(data)
    ) k
    ORDER BY hist.bin;

-- Following David Scott's rule, here is an estimate for the bandwidth:
CREATE OR REPLACE MACRO KDE_BANDWIDTH(data, varnum) AS (
    FROM query_table(data)
    SELECT 1.06 * stddev(varnum) * pow(count(*), -1/5)
);

-- Usage
SET VARIABLE bandwidth = (SELECT KDE_BANDWIDTH(mydata, myvarnum));
FROM KDE_EPANECH(mydata, myvarnum, getvariable('bandwidth'));

-- Inspiration and illustration: https://observablehq.com/@d3/kernel-density-estimation
```
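A hypothetical end-to-end run on synthetic data; the table `mydata` and column `myvarnum` are made up to match the usage above:

```sql
-- Roughly bell-shaped synthetic data via a sum of uniforms
CREATE OR REPLACE TABLE mydata AS
    SELECT 5 + 2 * (random() + random() + random() - 1.5) AS myvarnum
    FROM range(1000);

SET VARIABLE bandwidth = (SELECT KDE_BANDWIDTH(mydata, myvarnum));
FROM KDE_EPANECH(mydata, myvarnum, getvariable('bandwidth'));
```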
## Load content from Strapi CMS REST API to a Parquet file
A nice trick to load data from the Strapi CMS. The API token can be obtained in Strapi's Settings menu. This is a handy way to let users maintain reference data in the CMS while making it directly queryable in DuckDB. It should work for both self-hosted and cloud Strapi.
```sql
INSTALL httpfs;
LOAD httpfs;

CREATE SECRET http (
    TYPE HTTP,
    EXTRA_HTTP_HEADERS MAP {
        'Authorization': 'Bearer [Api Token]'
    }
);

-- Replace strapi.mydomain.com with your Strapi URL
-- and replace `pets` with your content type
COPY (
    SELECT unnest(data, recursive := true)
    FROM read_json_auto('https://strapi.mydomain.com/api/pets')
) TO 'pets.parquet';
```
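Once exported, the Parquet file can be queried directly in any later DuckDB session:

```sql
SELECT * FROM 'pets.parquet' LIMIT 5;
```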