Quickly convert a CSV to Parquet, bash function (Bash)

Editor's note: DuckDB makes it easy to convert between a variety of popular data formats (CSV, JSON, Parquet, and more) using simple SQL statements. It's also easy to execute these statements from a bash shell so you have them ready to go.

Mehdi Ouazza
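
As a rough illustration of the kind of helper this entry describes (a sketch, not the snippet author's own code), the function below wraps a single `COPY` statement. The function name `csv_to_parquet` and the default output naming are assumptions made for this example; it also assumes the `duckdb` CLI is on the `PATH` and that file paths contain no single quotes.

```shell
# Hypothetical helper: convert one CSV file to Parquet with the DuckDB CLI.
# Usage: csv_to_parquet input.csv [output.parquet]
csv_to_parquet() {
    local input="$1"
    # Default output name: the input path with its .csv suffix swapped for .parquet.
    local output="${2:-${input%.csv}.parquet}"
    # read_csv_auto infers column names and types; COPY ... TO writes the result as Parquet.
    duckdb -c "COPY (SELECT * FROM read_csv_auto('${input}')) TO '${output}' (FORMAT PARQUET);"
}
```

For example, `csv_to_parquet trips.csv` would write `trips.parquet` next to the input file.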

Query the Output of Another Process (Bash)

Editor's note: if you're executing command-line interfaces that output JSON, CSV or other common formats, DuckDB enables you to do ad hoc queries on the results using SQL.

Mark Roddy
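
One way to sketch this pattern (not the snippet author's own code) is to pipe a command's CSV output into the DuckDB CLI and read it from `/dev/stdin`. The helper name `sql_on_csv` and the table alias `piped` are assumptions made for this example, and the query passed in is assumed to be a single `SELECT`.

```shell
# Hypothetical helper: run an ad hoc SELECT over CSV arriving on standard input.
# Usage: some_command_emitting_csv | sql_on_csv "SELECT ... FROM piped"
sql_on_csv() {
    # The piped data is exposed to the query through a CTE named "piped".
    duckdb -c "WITH piped AS (SELECT * FROM read_csv_auto('/dev/stdin')) $1"
}
```

A call could look like `printf 'name,cpu\na,0.5\nb,3.2\n' | sql_on_csv "SELECT name FROM piped WHERE cpu > 1"`.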

Filter column names using a pattern (SQL)

Editor's note: DuckDB aims to make SQL even easier, while supporting standards whenever possible. When you have extremely wide tables, it's often helpful to return only columns matching a regex, and COLUMNS() does exactly that. With the EXCLUDE and REPLACE clauses of the star expression you get even more simplicity.

Octavian Zarzu
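
The regex selection described above can be tried against a throwaway in-memory table. This is only a sketch: the function name `columns_regex_demo`, the `trips` table, and its values are made up for illustration, and the `duckdb` CLI is assumed to be installed.

```shell
# Hypothetical demo: feed a small SQL script to the DuckDB CLI on standard input.
columns_regex_demo() {
    duckdb <<'SQL'
-- Throwaway in-memory table with made-up columns and values.
CREATE TABLE trips (
    pickup_lat DOUBLE, pickup_lon DOUBLE,
    dropoff_lat DOUBLE, dropoff_lon DOUBLE,
    fare DOUBLE
);
INSERT INTO trips VALUES (40.71, -74.00, 40.78, -73.96, 12.5);

-- Return only the columns whose names match the regular expression.
SELECT COLUMNS('.*_lat$') FROM trips;

-- The same selector can feed an aggregate, applied once per matching column.
SELECT max(COLUMNS('.*_lat$')) FROM trips;
SQL
}
```

Running `columns_regex_demo` should return only `pickup_lat` and `dropoff_lat` from the first query.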

Select a sample of rows from a table (SQL)

DuckDB allows sampling of data in your tables using several different statistical techniques, usually to increase performance. The default sampling method is used in this case: a Bernoulli variant in which each vector has a specified chance of being included in the result set.

Ryan Boyd
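
A few forms of the `USING SAMPLE` clause are sketched below against a generated table. The function name `sampling_demo` and the `numbers` table are assumptions made for this example; the exact statement behind this entry is not shown here, so treat these as illustrations of the syntax rather than the original snippet.

```shell
# Hypothetical demo: try several sampling forms on a generated 100,000-row table.
sampling_demo() {
    duckdb <<'SQL'
-- Generate a table of integers to sample from.
CREATE TABLE numbers AS SELECT range AS n FROM range(100000);

-- Percentage sample using the default method for percentages.
SELECT count(*) FROM numbers USING SAMPLE 10%;

-- Percentage sample with the method named explicitly.
SELECT count(*) FROM numbers USING SAMPLE 10 PERCENT (bernoulli);

-- Fixed number of rows, made repeatable with a seed.
SELECT * FROM numbers USING SAMPLE reservoir(5 ROWS) REPEATABLE (42);
SQL
}
```

Because the percentage forms are probabilistic, the two `count(*)` results should land near 10,000 rather than exactly on it.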

Parse a File in an Unsupported Format (SQL)

Editor's note: as data engineers, we're often burdened with data that's not in a standard format. Using DuckDB's basic string functions, advanced regex functions, list functions and the CSV parser, you can parse data of arbitrary formats.

Mark Roddy
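
To give a feel for the approach, the sketch below applies regex and list functions to raw text lines. It assumes the lines have already been loaded into a single VARCHAR column named `line`; the function name `parse_lines_demo`, the `raw_lines` table, and the log-like sample rows are all made up for illustration and are not taken from the original snippet.

```shell
# Hypothetical demo: split made-up log lines into typed columns with string and regex functions.
parse_lines_demo() {
    duckdb <<'SQL'
-- Made-up raw lines standing in for a file in a non-standard format.
CREATE TABLE raw_lines AS
SELECT * FROM (VALUES
    ('2024-01-05 INFO user=alice;action=login'),
    ('2024-01-05 WARN user=bob;action=retry')
) AS v(line);

SELECT
    -- Capture group 1: the leading date, cast to a DATE.
    CAST(regexp_extract(line, '^(\d{4}-\d{2}-\d{2})', 1) AS DATE) AS day,
    -- Capture group 1: the word after the first space.
    regexp_extract(line, '^\S+ (\w+)', 1) AS level,
    -- Split the trailing key=value section into a list of strings.
    string_split(regexp_extract(line, '(user=.*)$', 1), ';') AS fields
FROM raw_lines;
SQL
}
```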

Select all columns except a few (SQL)

Editor's note: tired of copying/pasting many column names to select all columns except a handful? The EXCLUDE clause allows you to exclude specific columns from the result set. Together with COLUMNS() and REPLACE, it provides an easy way to specify the data you want returned.

Ryan Boyd
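
A small sketch of the clauses mentioned above, run against a throwaway table. The function name `exclude_demo`, the `users` table, and its contents are made up for this example.

```shell
# Hypothetical demo: select everything except a few columns, then rewrite one in place.
exclude_demo() {
    duckdb <<'SQL'
-- Throwaway in-memory table with made-up values.
CREATE TABLE users (id INTEGER, name VARCHAR, email VARCHAR, password_hash VARCHAR);
INSERT INTO users VALUES (1, 'alice', 'alice@example.com', 'x1'), (2, 'bob', 'bob@example.com', 'x2');

-- All columns except the ones listed.
SELECT * EXCLUDE (password_hash, email) FROM users;

-- All columns, with one replaced by an expression under the same name.
SELECT * REPLACE (upper(name) AS name) FROM users;
SQL
}
```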

SUMMARIZE (SQL)

Editor's note: the SUMMARIZE command allows you to quickly understand your data. If you want to understand a little more about how it works under the hood, see Hamilton's other snippet on building your own SUMMARIZE capabilities using built-in analytics functions.

Carlo Piovesan
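
As a minimal sketch, `SUMMARIZE` can be pointed at a table or at a query. The function name `summarize_demo`, the `orders` table, and its values are made up for illustration.

```shell
# Hypothetical demo: profile a small table and a filtered query with SUMMARIZE.
summarize_demo() {
    duckdb <<'SQL'
-- Throwaway in-memory table with made-up values, including a NULL.
CREATE TABLE orders (order_id INTEGER, amount DOUBLE, country VARCHAR);
INSERT INTO orders VALUES (1, 19.99, 'DE'), (2, 5.00, 'FR'), (3, NULL, 'DE');

-- One output row per column, with statistics such as min, max, and null percentage.
SUMMARIZE orders;

-- The same profiling applied to the result of a query.
SUMMARIZE SELECT amount FROM orders WHERE country = 'DE';
SQL
}
```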

Download CSV and convert to Parquet (SQL)

Ryan Boyd
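
The code for this entry is not shown here, so the following is only a guess at the general shape: read a CSV over HTTP(S) through the `httpfs` extension and write it out as Parquet. The function name `csv_url_to_parquet` is an assumption, URLs and paths are assumed to contain no single quotes, and the URL in the usage line is a placeholder.

```shell
# Hypothetical helper: fetch a remote CSV and store it locally as Parquet.
# Usage: csv_url_to_parquet URL output.parquet
csv_url_to_parquet() {
    local url="$1"
    local output="$2"
    # httpfs provides HTTP(S) reads; installing and loading it explicitly keeps the sketch self-contained.
    duckdb -c "INSTALL httpfs; LOAD httpfs; COPY (SELECT * FROM read_csv_auto('${url}')) TO '${output}' (FORMAT PARQUET);"
}
```

A call would look like `csv_url_to_parquet "https://example.com/data.csv" data.parquet`.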