Back to Code Snippets
Loading remote parquet files
Editor's note: DuckDB supports lambda functions for easy transformation. It also supports reading data over http/https [in addition from blob stores]. Simon's snippet shows you how to use both of those capabilities to dynamically generate filenames and query them when globbing is unsupported.
Load a single parquet fileSQL
-- Query a single remote parquet file -- Query the January data from NY taxi data set SELECT tpep_pickup_datetime, total_amount FROM read_parquet('https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet') LIMIT 10;
Copy code
Dynamically iterate over remote parquet files - lambda function and list_transformSQL
-- Dynamically create the expected filenames to load -- eg, the files names `2023-01.parquet` to `2023-06.parquet` -- a lambda function can use the result of the generate_series(1, 6) -- and the format command can the `{:02d}` format specifier -- to left-pad filenames with 0's SELECT filename, min(tpep_pickup_datetime), max(tpep_pickup_datetime), avg(total_amount) FROM read_parquet( list_transform( generate_series(1, 6), n -> format('https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-{:02d}.parquet', n) ), filename=true ) GROUP BY filename ORDER BY filename;
Copy code
Simon Aubury
Expand
Share link