Back to Code Snippets

Hamilton Ulmer

@hamilton-1


build-your-own SUMMARIZE with columns(*), unpivot, and struct.*SQL

Editor's note: Although SUMMARIZE() is fully supported in DuckDB, Hamilton shows you how to build this capability yourself using more primitive functions.

Execute this SQL

-- C = number of columns in TBL, S = number of aggregates
-- ------------------------------------------------------
-- (1) use columns(*) to produce S aggregates
-- for each column as a single struct,
-- producing 1 struct row and C columns
with aggregate as (
  from sf311 select 
    {
      name: first(alias(columns(*))),
      type: first(typeof(columns(*))),
      max: max(columns(*))::varchar,
      min: min(columns(*))::varchar,
      approx_unique: approx_count_distinct(columns(*)),
      nulls: count(*) - count(columns(*))
    }
),
-- (2) unpivot to get a result with C rows and 1 column of structs
columns as (unpivot aggregate on columns(*))
-- (3) explode the struct to get a result with C rows and S columns
select value.* from columns;

Copy code

Hamilton Ulmer

Expand

Share link


result set → single row array of structsSQL

Execute this SQL

-- First, transform TBL (or any relation) 
-- into a single column of structs.
with structs (from TBL select TBL)
-- Then pack those structs into a list.
from structs select list(TBL) as ready_to_plot;

Copy code

Hamilton Ulmer

Copy code

Expand

Share link


if you were to export a table as a csv, this query should estimate the size (minus the header)SQL

Execute this SQL

with bits as (
    select
    -- add 8 bits to the end to account for the delimiter
    bit_length(columns(*))  + 8
    from <TABLE>
),
-- aggregate all columns
bits_agg as (
    select sum(columns(*)) from bits
),
-- unpivot a wide single row of aggs to single column
bits_col as (
    unpivot bits_agg on columns(*)
)
-- add them all up & convert to mb
select sum(value) / (8 * 1024 ** 2) as mb
from bits_col

Copy code

Hamilton Ulmer

Copy code

Expand

Share link