Unnest JSON Array into Rows (pseudo-json_each)

Editor's note: as APIs and applications increasingly represent tabular data in JSON, we sometimes want the comfort of seeing it as rows, queryable in SQL. DuckDB has a great extension for parsing JSON, including the ability to convert a JSON array into a DuckDB list, which can then be unnested into rows.

Execute this SQL

select 
    -- Split DuckDB list into multiple rows
    unnest(
        -- Convert JSON array into DuckDB list of JSON elements
        from_json(
            json('[{"col1":1, "col2":2}, {"col1":10, "col2":20}]'),
            '["JSON"]'
        )
    );
/* 
Returns:
{"col1":1,"col2":2}
{"col1":10,"col2":20}
*/
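
A possible next step (a sketch, not part of the original snippet): pass a struct description to from_json instead of '["JSON"]', then unnest recursively, to get typed columns rather than raw JSON elements:

select 
    -- recursive := true also expands each struct into its fields
    unnest(
        from_json(
            json('[{"col1":1, "col2":2}, {"col1":10, "col2":20}]'),
            '[{"col1":"INTEGER","col2":"INTEGER"}]'
        ),
        recursive := true
    );
/* 
Returns two typed columns:
col1=1,  col2=2
col1=10, col2=20
*/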


Alex Monahan



DuckDB in Action: Some neat DuckDB-specific SQL extensions

Editor's note: DuckDB strives to make it easy to write SQL, even when doing so requires introducing non-standard syntax. See the great blog posts by Alex Monahan, or explore the Manning book "DuckDB in Action" by the author of this snippet.

Attach and select MotherDuck database

Data shared/available on MotherDuck

ATTACH 'md:_share/duckdb_in_action_ch3_4/d0c08584-1d33-491c-8db7-cf9c6910eceb' AS duckdb_in_action_ch3_4;
USE duckdb_in_action_ch3_4;


DuckDB-specific extensions: Project all columns matching a pattern

SELECT COLUMNS('valid.*') FROM prices LIMIT 3;


DuckDB-specific extensions: Apply an aggregation to several columns

SELECT max(COLUMNS('valid.*')) FROM prices;


DuckDB-specific extensions: Apply one condition to many columns

FROM prices
WHERE COLUMNS('valid.*') BETWEEN '2020-01-01' AND '2021-01-01';
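
These COLUMNS expressions also compose: several aggregates can be applied to the same pattern in one query. A minimal sketch against the same prices table:

-- One min and one max result column per column matching the pattern
SELECT min(COLUMNS('valid.*')), max(COLUMNS('valid.*')) FROM prices;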


Michael Simons



Query Yahoo Finance data with the Scrooge extension

Editor's note: The Scrooge McDuck extension provides common aggregation functions and data scanners used for financial data. This example grabs quotes for the S&P 500 index over a specific date range for analysis in DuckDB.

Execute this SQL

-- Install httpfs extension
INSTALL httpfs;
LOAD httpfs;
-- Install Scrooge extension https://github.com/pdet/Scrooge-McDuck
-- NOTE: You need to start DuckDB with the `-unsigned` flag to allow installing and loading third-party extensions
SET custom_extension_repository='scrooge-duckdb.s3.us-west-2.amazonaws.com/scrooge/s3_deploy';
INSTALL scrooge;
LOAD scrooge;
-- Example query
FROM yahoo_finance("^GSPC", "2023-02-01", "2023-02-04", "1d");
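
To keep the fetched quotes around for further analysis, the scanner can be materialized like any other table function (a sketch; the result's column names depend on the extension version):

-- Materialize the S&P 500 quotes into a regular DuckDB table
CREATE TABLE sp500_quotes AS
    SELECT * FROM yahoo_finance("^GSPC", "2023-02-01", "2023-02-04", "1d");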


Mehdi Ouazza



Dynamic SQL in the DuckDB CLI

Editor's note: this gets a little recursive for all those SQL nerds out there. It shows you how to use SQL to generate SQL. It also shows you some of the dot commands for interacting with the DuckDB CLI: changing the output format, and writing SQL to a file which is then read back in for execution.

Execute this SQL

-- Use SQL to generate SQL statements.
-- Write them to a file, then execute that file.
-- This uses DuckDB CLI-specific features, so it only works in the CLI.
.mode list
.header off
.once my_dynamic_sql.sql
select 'CREATE TABLE t' || i || ' AS SELECT ' || i || ' AS my_column;' from generate_series(5) x(i);
.read my_dynamic_sql.sql

select table_name from duckdb_tables(); -- Check our work!
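
For reference, here is what the generated file contains: generate_series(5) is inclusive of both bounds, so my_dynamic_sql.sql holds six statements of the form

CREATE TABLE t0 AS SELECT 0 AS my_column;
CREATE TABLE t1 AS SELECT 1 AS my_column;
-- ... and so on through ...
CREATE TABLE t5 AS SELECT 5 AS my_column;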


Alex Monahan



Create partitioned Parquet files from a remote CSV source

Editor's note: DuckDB can write partitioned Parquet files, letting you store your data in partitions (e.g., orders for specific dates, traffic from specific IPs) with predictable file paths. This makes queries from cloud storage more performant, as only the needed files are retrieved.

Execute this SQL

-- Read from a remote CSV file and write partitioned Parquet files to a local target.
-- Queries like this are commonly used in data lakes.
COPY (
    SELECT cloud_provider, cidr_block, ip_address, ip_address_mask, ip_address_cnt, region
    FROM read_csv_auto('https://raw.githubusercontent.com/tobilg/public-cloud-provider-ip-ranges/main/data/providers/all.csv')
) TO '/tmp/ip-ranges' (FORMAT PARQUET, PARTITION_BY (cloud_provider));
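
To see the partitioning pay off when reading back (a sketch, not part of the original snippet; the 'aws' value is illustrative), enable hive partitioning and filter on the partition column, so only matching directories are scanned:

SELECT count(*)
FROM read_parquet('/tmp/ip-ranges/*/*.parquet', hive_partitioning = true)
WHERE cloud_provider = 'aws';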


Tobias Müller



Combine several Parquet files into one and compress with zstd

Editor's note: another great example of using DuckDB's broad file-format support to combine multiple Parquet files. Parth also kindly shows you how to compress the resulting Parquet file with the zstd codec; DuckDB also supports the gzip and snappy compression codecs.

Execute this Bash

duckdb -c "COPY (SELECT * FROM '~/Binance_Spot_Data/*.parquet') TO 'binance.parquet' (FORMAT PARQUET, COMPRESSION 'zstd')"
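
To confirm the codec was actually applied (a sketch; parquet_metadata reports per-column-chunk details, including the compression used), inspect the merged file:

duckdb -c "SELECT path_in_schema, compression FROM parquet_metadata('binance.parquet') LIMIT 5"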


Parth Patil
