Create partitioned Parquet files from a remote CSV sourceSQL

Editor's note: DuckDB can create partitioned Parquet files - allowing you to store your data in partitions (eg orders for specific dates, traffic from specific IPs, etc) based on predictable filenames. This allows for more performant queries from cloud storage as only the needed files are retrieved.

Execute this SQL

-- Read from a remote CSV file, and write partitioned Parquet files to local target
-- Queries like this are commonly used in Data Lakes
COPY (SELECT cloud_provider, cidr_block, ip_address, ip_address_mask, ip_address_cnt, region from read_csv_auto('https://raw.githubusercontent.com/tobilg/public-cloud-provider-ip-ranges/main/data/providers/all.csv')) TO '/tmp/ip-ranges' (FORMAT PARQUET, PARTITION_BY cloud_provider);
Copy code

Tobias Müller

Edited 02/23/24