Generate series of numbers in DuckDB

DuckDB has two common ways to generate a series of numbers: the range() function and the generate_series() function. They differ only in that the generate_series() function has a 'stop' value that's inclusive, while the 'stop' value of range() is exclusive.

generate_series with inclusive stop valueSQL

// generate_series(start, stop, step)
// get all even numbers, starting at 0 up to and including 100
SELECT * FROM generate_series(0,100,2);

Copy code

range with exclusive stop valueSQL

// range(start, stop, step)
// get all even numbers, starting at 0 up to and including 98
SELECT * FROM range(0,100,2);

Copy code

Generate range() as arraySQL

// Using range() as a column value instead of a table
// in your SQL statement will return an array of the
// numbers in the range
SELECT range(0,100,2)

Copy code

Ryan Boyd

Expand

Share link


Run SQL file in DuckDB CLI

The DuckDB CLI enables you to execute a set of SQL statements stored in a file or passed in via STDIN. There are a few variants of this capability demonstrated below.

Read and execute SQL using init CLI argument and prompt for additional SQL statementsBash

# executes SQL in create.sql, and then prompts for additional 
# SQL statements provided interactively. note that when specifying
# an init flag, the ~/.duckdbrc file is not read
duckdb -init create.sql

Copy code

Read and execute SQL using init CLI argument and immediately exit Bash

# executes SQL in create.sql and then immediately exits
# note that we're specifying a database name so that we
# can access the created data later. note that when specifying
# an init flag, the ~/.duckdbrc file is not read
duckdb -init create.sql -no-stdin mydb.ddb

Copy code

Pipe SQL file to the DuckDB CLI and exitBash

duckdb < create.sql mydb.ddb

Copy code

Ryan Boyd

Expand

Share link


Split strings into version numbers and order properSQL

This snippet takes version numbers that might contain arbitrary additional information and splits them into a list of integers, that one can sort like `sort -V` does.

Execute this SQL

SELECT v FROM VALUES ('1.10.0'), ('1.3.0'), ('1.13.0.RELEASE') f(v) ORDER BY list_transform(string_split(v, '.'), x -> TRY_CAST (x AS INTEGER)) ASC;

Copy code

Michael Simons

Expand

Share link


Replace string multiple timesSQL

`replace` target string multiple time with list of replacements.

Execute this SQL

SELECT reduce([['', content], ['foo','FOO'], ['bar', 'BAR']], (x, y, i)-> ['', replace(x[2], y[1], y[2])])
FROM posts;

Copy code

Katsuma Ito

Expand

Share link


SQL with PipesSQL

Pipes in SQL via psql extension created by Yannick Welsch

Execute this SQL

install psql from community;
load psql;

from 'https://sampledata.sidequery.ai/earthquakes.parquet' |>
limit 10000 |>
where status = 'Reviewed' |>
select
    data_type, 
    avg(depth), 
    avg(magnitudo)
group by all;

Copy code

Nico Ritschel

Expand

Share link


Kernel Density Estimation - Epanechnikov KernelSQL

KDE estimates the probability distribution of a random variable. The bandwidth parameter controls the width of the kernel, influencing how smooth or detailed the estimated density curve is. A smaller bandwidth results in a more detailed estimation, while a larger bandwidth produces a smoother curve.

Execute this SQL

CREATE OR REPLACE MACRO KDE_EPANECH(data, varnum, bandwidth, bin_count := 30) AS TABLE 
WITH hist AS (
	FROM histogram_values(data, varnum, bin_count := bin_count)
)
SELECT hist.bin, k.kernel_value
FROM hist, LATERAL (
    SELECT 100 * AVG(
    IF(abs(hist.bin - varnum) / bandwidth  < 1,
    0.75 * (1 - POW(abs(hist.bin - varnum) / bandwidth, 2)) / bandwidth,
    0)) AS kernel_value
	FROM query_table(data)
) k
ORDER BY hist.bin ;

-- Following David Scott's rule, here is an estimate for bandwidth:
CREATE OR REPLACE MACRO KDE_BANDWIDTH(data, varnum) AS (
	FROM query_table(data)
	SELECT 1.06 * stddev(varnum) * pow(count(*), -1/5)
);

-- Usage
SET VARIABLE bandwidth = (SELECT KDE_BANDWIDTH(mydata, myvarnum)) ;

FROM KDE_EPANECH(mydata, myvarnum, getvariable('bandwidth')) ;

-- Inspiration and illustration: https://observablehq.com/@d3/kernel-density-estimation

Copy code

Éric Mauvière

Expand

Share link