read_dsv() -> Parse properly separated CSV files
I tend to prefer using the ASCII unit (\x1f) and group separator (\x1e) as resp. column and line delimiters in CSVs (which technically no longer makes them a CSV). The read_csv function doesn't seem to want to play nice with these, so here's my attempt at a workaround.
Marco definitionSQL
-- For more info on DSVs (I'm not the author): https://matthodges.com/posts/2024-08-12-csv-bad-dsv-good/ CREATE OR REPLACE MACRO read_dsv(path_spec) AS TABLE ( with _lines as ( select filename ,regexp_split_to_array(content, '\x1e') as content from read_text(path_spec ) ) , _cleanup as ( select filename ,regexp_split_to_array(content[1],'\x1f') as header ,generate_subscripts(content[2:],1) as linenum ,unnest((content[2:]).list_filter(x -> trim(x) != '').list_transform(x -> x.regexp_split_to_array('\x1f'))) as line from _lines ) select filename ,linenum ,unnest(map_entries(map(header, line)), recursive := true) as kv from _cleanup );
Copy code
UsageSQL
-- You can use the same path specification as you would with read_text or read_csv, this includes globbing. -- Trying to include the pivot statement in the macro isn't possible, as you then have to explicitly define the column values (which defeats the purpose of this implementation) pivot read_dsv("C:\Temp\csv\*.csv") on key using first(value) group by filename, linenum order by filename, linenum
Copy code
Expand
Share link
Read Apache Iceberg to Google SheetsSQL
Sometimes you just need to get an Apache Iceberg table into Google Sheets for further analysis. The 'gsheet_id' can be found in the URL of your Google Sheet, and writes to the sheet with gid=0.
Execute this SQL
-- get iceberg extension INSTALL iceberg; LOAD iceberg; -- get gsheets extension INSTALL gsheets FROM community; LOAD gsheets; -- authenticate to google sheets CREATE SECRET (TYPE gsheet); -- copy the iceberg data to your google sheet! COPY (from iceberg_scan('s3://my-bucket/iceberg_table')) TO ‘gsheet_id’ (FORMAT gsheet);
Copy code
Expand
Share link
Label columns based on source tableSQL
Commonly, tables that are joined together have overlapping column names. This snippet will rename all columns to have a prefix based on the source table. No more duplicate names! This is similar to the Pandas join feature of lsuffix and rsuffix.
Execute this SQL
SELECT COLUMNS(t1.*) AS 't1_\0', COLUMNS(t2.*) AS 't2_\0' FROM range(10) t1 JOIN range(10) t2 ON t1.range = t2.range
Copy code
Expand
Share link
Generate series of numbers in DuckDB
DuckDB has two common ways to generate a series of numbers: the range() function and the generate_series() function. They differ only in that the generate_series() function has a 'stop' value that's inclusive, while the 'stop' value of range() is exclusive.
generate_series with inclusive stop valueSQL
// generate_series(start, stop, step) // get all even numbers, starting at 0 up to and including 100 SELECT * FROM generate_series(0,100,2);
Copy code
range with exclusive stop valueSQL
// range(start, stop, step) // get all even numbers, starting at 0 up to and including 98 SELECT * FROM range(0,100,2);
Copy code
Generate range() as arraySQL
// Using range() as a column value instead of a table // in your SQL statement will return an array of the // numbers in the range SELECT range(0,100,2)
Copy code
Expand
Share link
Run SQL file in DuckDB CLI
The DuckDB CLI enables you to execute a set of SQL statements stored in a file or passed in via STDIN. There are a few variants of this capability demonstrated below.
Read and execute SQL using init CLI argument and prompt for additional SQL statementsBash
# executes SQL in create.sql, and then prompts for additional # SQL statements provided interactively. note that when specifying # an init flag, the ~/.duckdbrc file is not read duckdb -init create.sql
Copy code
Read and execute SQL using init CLI argument and immediately exit Bash
# executes SQL in create.sql and then immediately exits # note that we're specifying a database name so that we # can access the created data later. note that when specifying # an init flag, the ~/.duckdbrc file is not read duckdb -init create.sql -no-stdin mydb.ddb
Copy code
Pipe SQL file to the DuckDB CLI and exitBash
duckdb < create.sql mydb.ddb
Copy code
Expand
Share link