Open the HTML version of an EXPLAIN statementBash

DuckDB can output the plan in different ways, e.g. HTML or JSON. It's hard to easily view this when in the DuckDB CLI, but in non-interactive mode we can pass the result to a HTML file directly and open it.

Execute this Bash

duckdb my_complex_db.duckdb -c "                                                                                                                                                         
EXPLAIN  (ANALYZE, FORMAT html)
INSERT INTO big_data.complex_db
    (key_to_update, values, event_time)
SELECT
    key_to_update, values, event_time
FROM big_data.new_data
WHERE event_time >= '2026-01-01'::date
" > output.html && open output.html

Copy code

Dumky de Wilde

Expand

Share link


Fetch HackerNews Stories with read_json()

You can read JSON from any endpoint into DuckDB. This uses read_json() to fetch all HackerNews 'Who is hiring' stories and create a table with all stories and the comment IDs. It works by: - creating a list of URLs - reading the list and creating the table - querying the table

Create JSON URLsSQL

-- https://hn.algolia.com/api
with urls as (
  select
    'https://hn.algolia.com/api/v1/search_by_date?tags=author_whoishiring&page=' || page_num || '&query=hiring?&attributesToRetrieve=title,tags,comment_text,created_at,object_id,story_id,story_title,children,story_text' as url
  from unnest(range(0,10)) as t(page_num)
)

select array_to_json(list(url)) from urls

Copy code

Read JSON from API endpointSQL

create or replace table hn_stories as (
select 
  hit.title as story_title,
  hit.story_text,
  hit.created_at,
  hit.objectId,
  hit.children,

from read_json(
 -- your_url_list_from_above --
  ), unnest(hits) as t(hit)
)

Copy code

Dumky de Wilde

Expand

Share link


Multiple String ReplacementSQL

A macro to easily run multiple string replacements iteratively without function nesting.

Execute this SQL

CREATE MACRO replace_many (string, replacements) AS 
  list_reduce(
    replacements, 
    lambda acc, replacement: [  -- We wrap this in a list for type safety
        replace(acc[1], replacement[1], replacement[2])
    ], 
    [string]
  )[1];

-- Usage example
SELECT replace_many('Foo Bar', [['Foo', 'foofoo'], ['Bar', 'BarBar']]);

Copy code

Teague Sterling

Expand

Share link


List Hive PartitionsSQL

Given Hive partitioned data, list all the keys, values and files. Handy to get a summary of the structure of Hive data.

Execute this SQL

WITH hive_cols AS (
  PIVOT (
  SELECT
    file.split('/').list_filter(lambda x: x.contains('=')).list_transform(lambda x: x.split('=')).unnest() AS kv,
    file
  FROM glob('/path/to/hive/**/*.parquet')
  )
  ON kv[1]
  USING first(kv[2])
)
FROM hive_cols;

Copy code

Alejandro Wainzinger

Expand

Share link


Top 50 Most Queried Domains from DNS A Records in PCAP AnalysisSQL

This query parses a PCAP file to identify the top 50 most requested domains from DNS A record queries. It filters DNS packets, extracts domains from the _ws.col.info field, normalizes them to lowercase, counts occurrences, and sorts the results by frequency in descending order.

Execute this SQL

SELECT LOWER(REGEXP_EXTRACT("_ws.col.info", ' A ([^ ]+)', 1)) AS domain, COUNT(*) AS total
 FROM read_pcap('dump.pcap')
 WHERE "frame.protocols" LIKE '%dns%' 
          AND "_ws.col.info" LIKE '% A %' 
GROUP BY domain 
ORDER BY total DESC 
LIMIT 50;

Copy code

A
arrowsix

Expand

Share link


Use method chaining to filter tables

This is just a simple example of how to implement method chaining using `list` and `unnest` without any other extensions to provide a functional interface.

ExampleSQL

SELECT 
    list(t)
    .list_filter(lambda c: c.registration_date >= '2023-02-01')
    .list_transform(lambda c: struct_insert(c, 
        is_north_american:=c.country in ('USA', 'Mexico', 'Canada')))
    .list_filter(lambda c: c.is_north_american)
    .unnest(recursive:=true)
FROM 'customers.csv' AS t;

Copy code

customers.csvBash

customer_id,name,email,age,tier,country,registration_date,lifetime_value
1,Alice Johnson,alice.johnson@email.com,29,Premium,USA,2023-01-15,2847.50
2,Bob Smith,bob.smith@email.com,34,Standard,Canada,2023-02-03,1256.80
3,Charlie Brown,charlie.brown@email.com,42,Premium,UK,2023-01-22,3421.90
4,Diana Ross,diana.ross@email.com,38,Gold,USA,2023-03-10,5678.25
5,Eve Wilson,eve.wilson@email.com,26,Standard,Australia,2023-02-18,845.60
6,Frank Miller,frank.miller@email.com,31,Premium,Germany,2023-01-08,2156.75
7,Grace Lee,grace.lee@email.com,28,Standard,Canada,2023-03-05,967.40
8,Henry Davis,henry.davis@email.com,45,Gold,USA,2023-01-29,4532.80
9,Ivy Chen,ivy.chen@email.com,33,Premium,Singapore,2023-02-14,2890.45
10,Jack Wilson,jack.wilson@email.com,39,Standard,UK,2023-03-12,1134.20
11,Kate Brown,kate.brown@email.com,27,Premium,Australia,2023-01-18,2234.85
12,Leo Garcia,leo.garcia@email.com,36,Gold,Spain,2023-02-27,3789.60
13,Mia Taylor,mia.taylor@email.com,30,Standard,USA,2023-03-08,1089.95
14,Noah Kim,noah.kim@email.com,41,Premium,South Korea,2023-01-25,2567.30
15,Olivia Jones,olivia.jones@email.com,35,Gold,Canada,2023-02-11,4123.75
16,Paul Anderson,paul.anderson@email.com,32,Standard,UK,2023-03-15,876.40
17,Quinn Lee,quinn.lee@email.com,29,Premium,Singapore,2023-01-12,2678.95
18,Rachel Green,rachel.green@email.com,37,Gold,USA,2023-02-06,3956.80
19,Sam Wilson,sam.wilson@email.com,28,Standard,Australia,2023-03-01,1245.30
20,Tina Rodriguez,tina.rodriguez@email.com,44,Premium,Mexico,2023-01-31,2345.65

Copy code

Teague Sterling

Expand

Share link