Back to Code Snippets

Ryan Boyd

@ryan-1


Generate series of numbers in DuckDB

DuckDB has two common ways to generate a series of numbers: the range() function and the generate_series() function. They differ only in that the generate_series() function has a 'stop' value that's inclusive, while the 'stop' value of range() is exclusive.

generate_series with inclusive stop valueSQL

// generate_series(start, stop, step)
// get all even numbers, starting at 0 up to and including 100
SELECT * FROM generate_series(0,100,2);

Copy code

range with exclusive stop valueSQL

// range(start, stop, step)
// get all even numbers, starting at 0 up to and including 98
SELECT * FROM range(0,100,2);

Copy code

Generate range() as arraySQL

// Using range() as a column value instead of a table
// in your SQL statement will return an array of the
// numbers in the range
SELECT range(0,100,2)

Copy code

Ryan Boyd

Expand

Share link


Run SQL file in DuckDB CLI

The DuckDB CLI enables you to execute a set of SQL statements stored in a file or passed in via STDIN. There are a few variants of this capability demonstrated below.

Read and execute SQL using init CLI argument and prompt for additional SQL statementsBash

# executes SQL in create.sql, and then prompts for additional 
# SQL statements provided interactively. note that when specifying
# an init flag, the ~/.duckdbrc file is not read
duckdb -init create.sql

Copy code

Read and execute SQL using init CLI argument and immediately exit Bash

# executes SQL in create.sql and then immediately exits
# note that we're specifying a database name so that we
# can access the created data later. note that when specifying
# an init flag, the ~/.duckdbrc file is not read
duckdb -init create.sql -no-stdin mydb.ddb

Copy code

Pipe SQL file to the DuckDB CLI and exitBash

duckdb < create.sql mydb.ddb

Copy code

Ryan Boyd

Expand

Share link


Select a sample of rows from a tableSQL

DuckDB allows sampling of data in your tables using a several different statistical techniques, usually to increase performance. The default sampling method is used in this case- this is a bernoulli variant where each vector has a specified chance of being included in the result set.

Execute this SQL

-- uses system sampling to select 10% of the people rows
SELECT * FROM people USING SAMPLE 10%;

Copy code

Ryan Boyd

Expand

Share link


Using the EXCLUDE() function in DuckDB

Tired of copying/pasting many column names to select all columns except a handful? The EXCLUDE() function in SQL allows you to exclude specific columns from the result set. Together with COLUMNS() and REPLACE(), DuckDB provides an easy way to specify the data you want returned in the result set.

Select all columns except a fewSQL

// This will select all information about ducks
// except their height and weight
SELECT * EXCLUDE (height, weight) FROM ducks;

Copy code

Select all columns matching a RegexSQL

// Only return the columns matching a regular expression
// In this case, we're storing colors as individual columns
// named color_1, color_2, etc.  We want to return all
// species of ducks along with their specified colors.
SELECT species, COLUMNS('^color_\d+$') FROM ducks;

// alternatively, we could store the colors in a 
// column named colors which is a list type, allowing us
// to have multiple color values for each species.

Copy code

Ryan Boyd

Expand

Share link

Execute this SQL

COPY (SELECT * FROM read_csv_auto(
'http://raw.githubusercontent.com/fivethirtyeight/data/master/bechdel/movies.csv'))
TO 'movies.parquet' (FORMAT 'parquet');

Copy code

Ryan Boyd

Copy code

Expand

Share link