
Parth Patil

@parthpatil-1


Convert CSV to Parquet and provide schema to use (Bash)

Editor's note: while there are other snippets showing file conversion, Parth's shows how to convert a CSV file to Parquet with DuckDB while specifying the full schema (column names and types) and the compression codec.

Execute this Bash

duckdb -c "COPY (SELECT * FROM read_csv('pageviews-sanitized-20230101-000000.csv', delim=' ', header=False, columns={'domain_code': 'VARCHAR', 'page_title': 'VARCHAR', 'count_views': 'UINTEGER', 'total_response_size': 'UINTEGER'})) TO 'pageviews-sanitized-20230101-000000.parquet' (FORMAT 'PARQUET', CODEC 'zstd')"


Combine several Parquet files into one and compress with zstd (Bash)

Editor's note: another great example of DuckDB's broad file format support, this time merging multiple Parquet files into one. Parth also shows how to compress the resulting Parquet file with the zstd codec; DuckDB also supports the gzip and snappy compression codecs.

Execute this Bash

duckdb -c "COPY (SELECT * FROM '~/Binance_Spot_Data/*.parquet') TO 'binance.parquet' (FORMAT 'PARQUET', CODEC 'zstd')"
