Back to Code Snippets
@ryan-1
Generate series of numbers in DuckDB
DuckDB has two common ways to generate a series of numbers: the range() function and the generate_series() function. They differ only in that the generate_series() function has a 'stop' value that's inclusive, while the 'stop' value of range() is exclusive.
generate_series with inclusive stop valueSQL
// generate_series(start, stop, step) // get all even numbers, starting at 0 up to and including 100 SELECT * FROM generate_series(0,100,2);
Copy code
range with exclusive stop valueSQL
// range(start, stop, step) // get all even numbers, starting at 0 up to and including 98 SELECT * FROM range(0,100,2);
Copy code
Generate range() as arraySQL
// Using range() as a column value instead of a table // in your SQL statement will return an array of the // numbers in the range SELECT range(0,100,2)
Copy code
Expand
Share link
Run SQL file in DuckDB CLI
The DuckDB CLI enables you to execute a set of SQL statements stored in a file or passed in via STDIN. There are a few variants of this capability demonstrated below.
Read and execute SQL using init CLI argument and prompt for additional SQL statementsBash
# executes SQL in create.sql, and then prompts for additional # SQL statements provided interactively. note that when specifying # an init flag, the ~/.duckdbrc file is not read duckdb -init create.sql
Copy code
Read and execute SQL using init CLI argument and immediately exit Bash
# executes SQL in create.sql and then immediately exits # note that we're specifying a database name so that we # can access the created data later. note that when specifying # an init flag, the ~/.duckdbrc file is not read duckdb -init create.sql -no-stdin mydb.ddb
Copy code
Pipe SQL file to the DuckDB CLI and exitBash
duckdb < create.sql mydb.ddb
Copy code
Expand
Share link
Select a sample of rows from a tableSQL
DuckDB allows sampling of data in your tables using a several different statistical techniques, usually to increase performance. The default sampling method is used in this case- this is a bernoulli variant where each vector has a specified chance of being included in the result set.
Execute this SQL
-- uses system sampling to select 10% of the people rows SELECT * FROM people USING SAMPLE 10%;
Copy code
Expand
Share link
Using the EXCLUDE() function in DuckDB
Tired of copying/pasting many column names to select all columns except a handful? The EXCLUDE() function in SQL allows you to exclude specific columns from the result set. Together with COLUMNS() and REPLACE(), DuckDB provides an easy way to specify the data you want returned in the result set.
Select all columns except a fewSQL
// This will select all information about ducks // except their height and weight SELECT * EXCLUDE (height, weight) FROM ducks;
Copy code
Select all columns matching a RegexSQL
// Only return the columns matching a regular expression // In this case, we're storing colors as individual columns // named color_1, color_2, etc. We want to return all // species of ducks along with their specified colors. SELECT species, COLUMNS('^color_\d+$') FROM ducks; // alternatively, we could store the colors in a // column named colors which is a list type, allowing us // to have multiple color values for each species.
Copy code
Expand
Share link
Execute this SQL
COPY (SELECT * FROM read_csv_auto( 'http://raw.githubusercontent.com/fivethirtyeight/data/master/bechdel/movies.csv')) TO 'movies.parquet' (FORMAT 'parquet');
Copy code
Copy code
Expand
Share link