Updating CSV files in-place

DuckDB allows updating CSV files in-place.

Execute this SQL

COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x))
TO 'example.csv';

COPY (SELECT s FROM 'example.csv') TO 'example.csv';
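To confirm the rewrite worked, we can read the file back (a quick check, assuming the statements above were run); only the `s` column remains:

```sql
-- Read the rewritten file back: the x column is gone
FROM 'example.csv';
```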

Gábor Szárnyas


Specifying types in the CSV loader

We can specify types for the `read_csv` function on a per-column basis.

Execute this SQL

COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x))
TO 'example.csv';

CREATE OR REPLACE TABLE example AS
    FROM read_csv('example.csv', types = {'x': 'DECIMAL(15, 3)'});
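We can verify that the per-column type took effect by inspecting the resulting schema (a quick check, assuming the statements above were run):

```sql
-- x should be reported as DECIMAL(15,3) rather than the inferred DOUBLE
DESCRIBE example;
```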

Gábor Szárnyas


Deterministically shuffling data

Sometimes it's beneficial to shuffle data sets in a deterministic, repeatable way. We can achieve this by using the `hash` function on the `rowid` pseudocolumn.

Execute this SQL

COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x))
TO 'example.csv';

CREATE OR REPLACE TABLE example AS FROM 'example.csv';
FROM example ORDER BY hash(rowid + 42);
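Because `hash` is deterministic, re-running the query returns rows in the same order, while the added constant acts like a seed. A different constant gives a different, but equally repeatable, permutation:

```sql
-- Changing the "seed" constant changes the shuffle order,
-- but each constant always produces the same order
FROM example ORDER BY hash(rowid + 7);
```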

Gábor Szárnyas


Copying the schema of a table

We can use `LIMIT 0` to copy the schema of an existing table without repeating the specification of its column names and types.

Execute this SQL

COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x))
TO 'example.csv';

CREATE OR REPLACE TABLE example AS FROM 'example.csv';

CREATE TABLE tbl AS
    FROM example
    LIMIT 0;
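A quick sanity check (assuming the statements above were run): the new table has the same columns as the source but contains no rows:

```sql
-- Same column names and types as the source table
DESCRIBE tbl;
-- Zero rows were copied
SELECT count(*) AS row_count FROM tbl;
```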

Gábor Szárnyas


Pretty-printing floating-point numbers

By casting the result to a `DECIMAL` with a fixed number of digits after the decimal point, we can pretty-print it while keeping it as a numeric value.

Execute this SQL

COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x))
TO 'example.csv';

SELECT x::DECIMAL(15, 3) AS x
FROM 'example.csv';

Gábor Szárnyas


UK Bank Holidays

The UK government publishes the UK bank holidays as a JSON object at a public endpoint. DuckDB can read the data from the endpoint, unpivot it, and expand it in a few lines!

Execute this SQL

/* division, title, date, notes, bunting */
select
    division,
    unnest(events.events, recursive:=true)
from (
    unpivot 'https://www.gov.uk/bank-holidays.json'
    on "england-and-wales", "scotland", "northern-ireland"
    into name division value events
);

Bill Wallis


Parquet to Excel

Editor's note: although not very intuitive, support for the MS Excel XLSX format is provided by the spatial extension. While other snippets show you how to read Excel files, this one shows you how to write data from a named Parquet file (or any DuckDB result set) to Excel.

Execute this SQL

INSTALL spatial;
LOAD spatial;
COPY (SELECT * FROM 'source.parquet') TO 'target.XLSX' WITH (FORMAT GDAL, DRIVER 'XLSX');

Simon Müller


Build-your-own SUMMARIZE with columns(*), unpivot, and struct.*

Editor's note: Although SUMMARIZE() is fully supported in DuckDB, Hamilton shows you how to build this capability yourself using more primitive functions.

Execute this SQL

-- C = number of columns in TBL, S = number of aggregates
-- ------------------------------------------------------
-- (1) use columns(*) to produce S aggregates
-- for each column as a single struct,
-- producing 1 struct row and C columns
with aggregate as (
  from sf311 select 
    {
      name: first(alias(columns(*))),
      type: first(typeof(columns(*))),
      max: max(columns(*))::varchar,
      min: min(columns(*))::varchar,
      approx_unique: approx_count_distinct(columns(*)),
      nulls: count(*) - count(columns(*))
    }
),
-- (2) unpivot to get a result with C rows and 1 column of structs
columns as (unpivot aggregate on columns(*))
-- (3) explode the struct to get a result with C rows and S columns
select value.* from columns;

Hamilton Ulmer