Updating CSV files in-place
DuckDB allows updating CSV files in-place.
Execute this SQL
COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x)) TO 'example.csv';
COPY (SELECT s FROM 'example.csv') TO 'example.csv';
Specifying types in the CSV loader
We can specify types for the `read_csv` function on a per-column basis.
Execute this SQL
COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x)) TO 'example.csv';
CREATE OR REPLACE TABLE example AS FROM read_csv('example.csv', types = {'x': 'DECIMAL(15, 3)'});
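If you are unsure which types DuckDB would pick on its own, the `sniff_csv` table function reports the result of CSV auto-detection. A minimal sketch, assuming the `example.csv` file written by the snippet above:

```sql
-- inspect the dialect and column types DuckDB auto-detected
FROM sniff_csv('example.csv');
```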
Deterministically shuffling data
Sometimes it's beneficial to shuffle data sets in a deterministic, repeatable way. We can achieve this by using the `hash` function on the `rowid` pseudocolumn.
Execute this SQL
COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x)) TO 'example.csv';
CREATE OR REPLACE TABLE example AS FROM 'example.csv';
FROM example ORDER BY hash(rowid + 42);
Copying the schema of a table
We can use `LIMIT 0` to copy the schema of an existing table without repeating the specification of its column names/types.
Execute this SQL
COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x)) TO 'example.csv';
CREATE OR REPLACE TABLE example AS FROM 'example.csv';
CREATE TABLE tbl AS FROM example LIMIT 0;
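To verify that only the schema, and none of the rows, was copied, we can inspect the new table. A minimal sketch, assuming the `tbl` table created above:

```sql
DESCRIBE tbl;             -- shows the copied column names and types
SELECT count(*) FROM tbl; -- returns 0: LIMIT 0 copies no rows
```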
Pretty-printing floating-point numbers
By casting the result to a `DECIMAL` with a fixed number of digits after the decimal point, we can pretty-print it while keeping it as a numeric value.
Execute this SQL
COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x)) TO 'example.csv';
SELECT x::DECIMAL(15, 3) AS x FROM 'example.csv';
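For display-only formatting, the `printf` function is an alternative, though it returns a string rather than a numeric value. A sketch using the same `example.csv`:

```sql
-- formats to 3 decimal places, but the result is VARCHAR, not numeric
SELECT printf('%.3f', x) AS x_str FROM 'example.csv';
```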
UK Bank Holidays
The UK government publishes the UK bank holidays as a JSON object at a public endpoint. DuckDB can read the data from the endpoint, unpivot it, and expand it in a few lines!
Execute this SQL
/* division, title, date, notes, bunting */
select division, unnest(events.events, recursive := true)
from (
    unpivot 'https://www.gov.uk/bank-holidays.json'
    on "england-and-wales", "scotland", "northern-ireland"
    into name division value events
)
Parquet to Excel
Editor's note: although not very intuitive, support for the MS Excel XLSX format is provided by the spatial extension. While other snippets show you how to read Excel files, this shows you how to write data from a named Parquet file (or any DuckDB result set) to Excel.
Execute this SQL
INSTALL spatial;
LOAD spatial;
COPY (SELECT * FROM 'source.parquet') TO 'target.XLSX' WITH (FORMAT GDAL, DRIVER 'XLSX');
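The spatial extension can also read the workbook back through the same GDAL machinery via `st_read`. A sketch, assuming the `target.XLSX` file written above:

```sql
INSTALL spatial;
LOAD spatial;
SELECT * FROM st_read('target.XLSX');
```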
Build-your-own SUMMARIZE with columns(*), unpivot, and struct.*
Editor's note: Although SUMMARIZE() is fully supported in DuckDB, Hamilton shows you how to build this capability yourself using more primitive functions.
Execute this SQL
-- C = number of columns in TBL, S = number of aggregates
-- ------------------------------------------------------
-- (1) use columns(*) to produce S aggregates
--     for each column as a single struct,
--     producing 1 struct row and C columns
with aggregate as (
    from sf311 select {
        name: first(alias(columns(*))),
        type: first(typeof(columns(*))),
        max: max(columns(*))::varchar,
        min: min(columns(*))::varchar,
        approx_unique: approx_count_distinct(columns(*)),
        nulls: count(*) - count(columns(*))
    }
),
-- (2) unpivot to get a result with C rows and 1 column of structs
columns as (unpivot aggregate on columns(*))
-- (3) explode the struct to get a result with C rows and S columns
select value.* from columns;