Aggregate rows into a sorted list.SQL

Execute this SQL

-- list and array_agg take in their own ORDER BY clause, so that you
-- can sort the aggregate. The statements order by cannot be used 
-- as the columns that are used for sorting then need to be a grouping
-- key and cannot be used in the aggregate

SELECT name, 
       -- Order the aggregated list by another column from line_items  
       list(item_name ORDER BY pos ASC) items 
FROM orders 
JOIN line_items ON order_id = id 
GROUP BY ALL
-- Order by grouping keys is ofc possible
ORDER BY name;

Copy code

Michael Simons

Copy code

Expand

Share link


if you were to export a table as a csv, this query should estimate the size (minus the header)SQL

Execute this SQL

with bits as (
    select
    -- add 8 bits to the end to account for the delimiter
    bit_length(columns(*))  + 8
    from <TABLE>
),
-- aggregate all columns
bits_agg as (
    select sum(columns(*)) from bits
),
-- unpivot a wide single row of aggs to single column
bits_col as (
    unpivot bits_agg on columns(*)
)
-- add them all up & convert to mb
select sum(value) / (8 * 1024 ** 2) as mb
from bits_col

Copy code

Hamilton Ulmer

Copy code

Expand

Share link


Convert CSV to Parquet and amend the schemaSQL

Execute this SQL

-- This selects only specific fields from the CSV
-- It also renames one of the fields in the target schema
COPY (SELECT col1, col2, col3 AS new_col_name
	    FROM read_csv('~/data/source.csv',AUTO_DETECT=TRUE))
  TO '~/data/target.parquet' (FORMAT 'PARQUET', CODEC 'ZSTD');

Copy code

Robin Moffatt

Copy code

Expand

Share link


Remove duplicatesSQL

Execute this SQL

/* removes duplicate rows at the order_id level */
SELECT * FROM orders
QUALIFY row_number() over (partition by order_id order by created_at) = 1

Copy code

Octavian Zarzu

Copy code

Expand

Share link


Creating parameterized views with TVFs macros (Table-Valued functions)SQL

Execute this SQL

-- create macro
CREATE OR REPLACE MACRO udf_products_in_year (v_year, v_category)
AS TABLE
SELECT 
	name, 
	category, 
	created_at
FROM products 
WHERE category = v_category
AND year(created_at) = v_year;

-- select using the macro as you would do from a table
SELECT *
FROM udf_products_in_year (2020, 'Home and Garden');

| Copper Light	| Home and Garden	| 2020-04-05 00:00:00.000 |
| Pink Armchair	| Home and Garden	| 2020-06-23 00:00:00.000 |

-- input ddl and data
CREATE TABLE products 
(
	name varchar,
	category varchar,
	created_at timestamp
);

INSERT INTO products
VALUES
('Cream Sofa', 'Home and Garden', '2019-03-14'),
('Copper Light', 'Home and Garden', '2020-04-05'),
('Pink Armchair', 'Home and Garden', '2020-06-23');

Copy code

Octavian Zarzu

Copy code

Expand

Share link


Split a String and cast all elements to a desired datatypeSQL

Execute this SQL

-- Here we make use of list_transform and 
-- the fact that DuckDB knows lambda expressions
SELECT list_transform(split('01:08:22', ':'), x -> CAST (x AS INTEGER));

Copy code

Michael Simons

Copy code

Expand

Share link


Convert EPOC integer field to TimestampSQL

Editor's note: date/time conversion can be painful in SQL and in many programming languages. Simon shows you how to handle it in DuckDB here, with timezone values. There are other timestamp without timezone functions in the DuckDB docs.

Execute this SQL

select ('EPOCH'::TIMESTAMP + INTERVAL (epoc_field::INT) seconds)::TIMESTAMPTZ  as epoc_tz
from
(
    -- epoc column - number of seconds since 1 Jan 1970
    select 1682481422 as epoc_field
)
;

Copy code

Simon Aubury

Expand

Share link


Convert CSV to Parquet and provide schema to useBash

Editor's note: while there are other snippets showing file conversion, Parth's shows you how to convert from CSV to Parquet files using DuckDB with specification of the entire schema (columns) and compression codec.

Execute this Bash

duckdb -c "COPY (SELECT * FROM read_csv('pageviews-sanitized-20230101-000000.csv', delim=' ', header=False, columns={'domain_code': 'VARCHAR', 'page_title': 'VARCHAR', 'count_views': 'UINTEGER', 'total_response_size': 'UINTEGER'})) TO 'pageviews-sanitized-20230101-000000.parquet' (FORMAT 'PARQUET', CODEC 'zstd')"

Copy code

Parth Patil

Expand

Share link