Tobias Müller
@tobilg-1
Create partitioned Parquet files from a remote CSV source (SQL)
Editor's note: DuckDB can write partitioned Parquet files, storing your data in separate partitions (e.g. orders for specific dates, or traffic from specific IPs) under predictable, Hive-style paths with one directory per partition value, named like cloud_provider=<value>. Queries against cloud storage can then run faster, because only the files for the partitions a query needs are read.
-- Read from a remote CSV file and write partitioned Parquet files to a local target.
-- Queries like this are commonly used in data lakes.
COPY (
    SELECT cloud_provider, cidr_block, ip_address, ip_address_mask, ip_address_cnt, region
    FROM read_csv_auto('https://raw.githubusercontent.com/tobilg/public-cloud-provider-ip-ranges/main/data/providers/all.csv')
) TO '/tmp/ip-ranges' (FORMAT PARQUET, PARTITION_BY cloud_provider);
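To check the result, the partitioned output can be read back with DuckDB's read_parquet function and its hive_partitioning option, which turns each cloud_provider=<value> directory name back into a column. The sketch below is self-contained and runs without network access. The sample rows, provider names (provider_a, provider_b), regions, and the /tmp/ip-ranges-demo path are hypothetical stand-ins for the remote CSV, and the CIDR blocks come from the RFC 5737 documentation ranges. OVERWRITE_OR_IGNORE is added only so the sketch can be rerun into the same directory.

```sql
-- Hypothetical sample rows (RFC 5737 documentation ranges) standing in
-- for the remote CSV, so this sketch runs without network access
COPY (
    SELECT * FROM (VALUES
        ('provider_a', '192.0.2.0/24',    'region-1'),
        ('provider_a', '198.51.100.0/24', 'region-2'),
        ('provider_b', '203.0.113.0/24',  'region-1')
    ) AS t(cloud_provider, cidr_block, region)
) TO '/tmp/ip-ranges-demo' (FORMAT PARQUET, PARTITION_BY cloud_provider, OVERWRITE_OR_IGNORE);

-- List the files written: one cloud_provider=<value> subdirectory per provider
SELECT file FROM glob('/tmp/ip-ranges-demo/*/*.parquet') ORDER BY file;

-- hive_partitioning = true turns the directory names back into a
-- cloud_provider column; filtering on it lets DuckDB skip the files
-- under every other cloud_provider=... directory
SELECT cloud_provider, cidr_block, region
FROM read_parquet('/tmp/ip-ranges-demo/*/*.parquet', hive_partitioning = true)
WHERE cloud_provider = 'provider_a'
ORDER BY cidr_block;
```

For the real output, the same query shape should apply with '/tmp/ip-ranges/*/*.parquet' and a provider value that actually occurs in the CSV. By default, the partition column is not stored inside the Parquet files themselves, which is why its values come from the directory names rather than the file contents.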