SQL with PipesSQL
Pipes in SQL via psql extension created by Yannick Welsch
Execute this SQL
install psql from community; load psql; from 'https://sampledata.sidequery.ai/earthquakes.parquet' |> limit 10000 |> where status = 'Reviewed' |> select data_type, avg(depth), avg(magnitudo) group by all;
Copy code
Expand
Share link
Split strings into version numbers and order properSQL
This snippet takes version numbers that might contain arbitrary additional information and splits them into a list of integers, that one can sort like `sort -V` does.
Execute this SQL
SELECT v FROM VALUES ('1.10.0'), ('1.3.0'), ('1.13.0.RELEASE') f(v) ORDER BY list_transform(string_split(v, '.'), x -> TRY_CAST (x AS INTEGER)) ASC;
Copy code
Expand
Share link
Generate series of numbers in DuckDB
DuckDB has two common ways to generate a series of numbers: the range() function and the generate_series() function. They differ only in that the generate_series() function has a 'stop' value that's inclusive, while the 'stop' value of range() is exclusive.
generate_series with inclusive stop valueSQL
// generate_series(start, stop, step) // get all even numbers, starting at 0 up to and including 100 SELECT * FROM generate_series(0,100,2);
Copy code
range with exclusive stop valueSQL
// range(start, stop, step) // get all even numbers, starting at 0 up to and including 98 SELECT * FROM range(0,100,2);
Copy code
Generate range() as arraySQL
// Using range() as a column value instead of a table // in your SQL statement will return an array of the // numbers in the range SELECT range(0,100,2)
Copy code
Expand
Share link
Label columns based on source tableSQL
Commonly, tables that are joined together have overlapping column names. This snippet will rename all columns to have a prefix based on the source table. No more duplicate names! This is similar to the Pandas join feature of lsuffix and rsuffix.
Execute this SQL
SELECT COLUMNS(t1.*) AS 't1_\0', COLUMNS(t2.*) AS 't2_\0' FROM range(10) t1 JOIN range(10) t2 ON t1.range = t2.range
Copy code
Expand
Share link
Execute this SQL
-- This selects only specific fields from the CSV -- It also renames one of the fields in the target schema COPY (SELECT col1, col2, col3 AS new_col_name FROM read_csv('~/data/source.csv',AUTO_DETECT=TRUE)) TO '~/data/target.parquet' (FORMAT 'PARQUET', CODEC 'ZSTD');
Copy code
Copy code
Expand
Share link
Execute this SQL
-- listing files FROM glob('dataset/*'); -- reading from files FROM 'dataset/*.parquet' LIMIT 100; -- reading parquet files metadata (min_value, max_value, null_count for each field in each file) FROM parquet_metadata('dataset/*.parquet'); -- convert files or export tables to parquet COPY (FROM tbl) TO 'file.parquet' (FORMAT 'parquet'); COPY 'data.csv' TO 'data.paruqet' (FORMAT 'parquet');
Copy code
Copy code
Expand
Share link