Aggregate rows into a sorted list (SQL)
-- list and array_agg accept their own ORDER BY clause, so you can
-- sort inside the aggregate. The statement-level ORDER BY cannot be
-- used for this: any column it sorts by would have to be a grouping
-- key and so could not be used in the aggregate.
SELECT name,
       -- Order the aggregated list by another column from line_items
       list(item_name ORDER BY pos ASC) items
FROM orders
JOIN line_items ON order_id = id
GROUP BY ALL
-- Ordering by grouping keys is of course possible
ORDER BY name;
Michael Simons
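To try the query above without an existing schema, here is a minimal sketch with invented sample data. The table contents are hypothetical; only the table and column names follow the snippet.

```sql
-- Hypothetical sample data, invented for this sketch
CREATE TABLE orders (id INTEGER, name VARCHAR);
CREATE TABLE line_items (order_id INTEGER, pos INTEGER, item_name VARCHAR);

INSERT INTO orders VALUES (1, 'alice'), (2, 'bob');
INSERT INTO line_items VALUES
    (1, 2, 'pen'),  -- deliberately inserted out of order
    (1, 1, 'ink'),
    (2, 1, 'cup');

SELECT name,
       list(item_name ORDER BY pos ASC) AS items
FROM orders
JOIN line_items ON order_id = id
GROUP BY ALL
ORDER BY name;
-- alice | [ink, pen]
-- bob   | [cup]
```

The list for alice comes back sorted by pos even though the rows were inserted in the opposite order.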
Estimate the size of a table if exported as CSV, minus the header (SQL)
with bits as (
    select
        -- add 8 bits to each value to account for the delimiter
        bit_length(columns(*)) + 8
    from <TABLE>
),
-- aggregate all columns
bits_agg as (
    select sum(columns(*)) from bits
),
-- unpivot the wide single row of aggregates into a single column
bits_col as (
    unpivot bits_agg on columns(*)
)
-- add them all up and convert to MB
select sum(value) / (8 * 1024 ** 2) as mb
from bits_col
Hamilton Ulmer
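As a sanity check on the 8-bit allowance per value, consider a hypothetical two-column row holding 'ab' and 'cde'. Assuming no quoting and a one-byte newline, the CSV line is `ab,cde` plus the newline, which is 7 bytes or 56 bits. The same arithmetic in SQL:

```sql
-- Each value contributes its own bits plus 8 bits for the comma or newline
-- that follows it. The values 'ab' and 'cde' are invented for illustration.
SELECT (bit_length('ab') + 8) + (bit_length('cde') + 8) AS bits;
-- 56
```

So the extra byte per value covers the separators between columns and the line terminator after the last column.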
Convert CSV to Parquet and amend the schema (SQL)
-- This selects only specific fields from the CSV
-- It also renames one of the fields in the target schema
COPY (
    SELECT col1, col2, col3 AS new_col_name
    FROM read_csv('~/data/source.csv', AUTO_DETECT=TRUE)
) TO '~/data/target.parquet' (FORMAT 'PARQUET', CODEC 'ZSTD');
Robin Moffatt
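One way to confirm the result, sketched under the assumption that the COPY above has already written `~/data/target.parquet`, is to inspect the file's schema and per-column compression:

```sql
-- Column names and types as DuckDB reads them back from the Parquet file
DESCRIBE SELECT * FROM read_parquet('~/data/target.parquet');

-- Compression recorded for each column chunk in the file metadata
SELECT path_in_schema, compression
FROM parquet_metadata('~/data/target.parquet');
```

The first query should list col1, col2 and new_col_name; the second should report the codec chosen in the COPY options.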
Remove duplicates (SQL)
/* removes duplicate rows at the order_id level,
   keeping the row with the earliest created_at */
SELECT *
FROM orders
QUALIFY row_number() OVER (PARTITION BY order_id ORDER BY created_at) = 1
Octavian Zarzu
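A minimal sketch of the deduplication above on invented data. The status column and all rows are hypothetical; only order_id and created_at come from the snippet.

```sql
-- Hypothetical sample data: order 1 appears twice
CREATE TABLE orders (order_id INTEGER, created_at TIMESTAMP, status VARCHAR);
INSERT INTO orders VALUES
    (1, '2023-01-01 10:00:00', 'new'),
    (1, '2023-01-02 09:00:00', 'paid'),
    (2, '2023-01-03 12:00:00', 'new');

SELECT *
FROM orders
QUALIFY row_number() OVER (PARTITION BY order_id ORDER BY created_at) = 1
ORDER BY order_id;
-- 1 | 2023-01-01 10:00:00 | new
-- 2 | 2023-01-03 12:00:00 | new
```

To keep the most recent row per order_id instead, sort the window with `ORDER BY created_at DESC`.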
Creating parameterized views with table-valued function (TVF) macros (SQL)
-- input DDL and data
CREATE TABLE products (
    name varchar,
    category varchar,
    created_at timestamp
);
INSERT INTO products VALUES
    ('Cream Sofa', 'Home and Garden', '2019-03-14'),
    ('Copper Light', 'Home and Garden', '2020-04-05'),
    ('Pink Armchair', 'Home and Garden', '2020-06-23');

-- create the macro
CREATE OR REPLACE MACRO udf_products_in_year (v_year, v_category) AS TABLE
    SELECT name, category, created_at
    FROM products
    WHERE category = v_category
      AND year(created_at) = v_year;

-- select using the macro as you would from a table
SELECT * FROM udf_products_in_year (2020, 'Home and Garden');
-- | Copper Light  | Home and Garden | 2020-04-05 00:00:00.000 |
-- | Pink Armchair | Home and Garden | 2020-06-23 00:00:00.000 |
Octavian Zarzu
Split a string and cast all elements to a desired data type (SQL)
-- Here we make use of list_transform and
-- the fact that DuckDB supports lambda expressions
SELECT list_transform(split('01:08:22', ':'), x -> CAST(x AS INTEGER));
Michael Simons
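When the per-element operation is only a cast, a shorter variant is to cast the whole list at once. This is an alternative sketch, not the author's approach; it relies on DuckDB casting lists element by element.

```sql
-- Cast the VARCHAR[] produced by split directly to INTEGER[]
SELECT split('01:08:22', ':')::INTEGER[] AS parts;
-- [1, 8, 22]
```

The lambda form remains the more general tool, since it can apply any expression to each element rather than just a type conversion.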
Convert an epoch integer field to a timestamp (SQL)
Editor's note: date/time conversion can be painful in SQL and in many programming languages. Simon shows how to handle it in DuckDB, producing a timestamp with time zone. The DuckDB docs cover further functions for timestamps without time zone.
select ('EPOCH'::TIMESTAMP + INTERVAL (epoc_field::INT) seconds)::TIMESTAMPTZ as epoc_tz
from (
    -- epoch column: number of seconds since 1 Jan 1970
    select 1682481422 as epoc_field
);
Simon Aubury
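For comparison, DuckDB also ships built-in conversion functions. The sketch below uses the same sample value, which corresponds to 2023-04-26 03:57:02 UTC; how the time-zone-aware result is displayed depends on the session's TimeZone setting.

```sql
-- Seconds since the epoch to TIMESTAMP WITH TIME ZONE
SELECT to_timestamp(1682481422) AS epoch_tz;

-- Milliseconds since the epoch to TIMESTAMP (no time zone)
SELECT epoch_ms(1682481422000) AS epoch_ts;
-- 2023-04-26 03:57:02
```

Note that the millisecond value is written out as a literal rather than computed as `1682481422 * 1000`, to sidestep 32-bit integer overflow in the multiplication.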
Convert CSV to Parquet and provide the schema to use (Bash)
Editor's note: while other snippets show file conversion, Parth's shows how to convert CSV to Parquet with DuckDB while specifying the entire schema (columns) and the compression codec.
duckdb -c "
COPY (
    SELECT * FROM read_csv('pageviews-sanitized-20230101-000000.csv',
        delim=' ',
        header=False,
        columns={'domain_code': 'VARCHAR',
                 'page_title': 'VARCHAR',
                 'count_views': 'UINTEGER',
                 'total_response_size': 'UINTEGER'})
) TO 'pageviews-sanitized-20230101-000000.parquet' (FORMAT 'PARQUET', CODEC 'zstd')"
Parth Patil