SQL, Python & More for DuckDB - Page 10

‌

Compute the length of a GPX track in metersSQL

Execute this SQL

LOAD spatial;

-- Pick for example any Garmin Connect or Strava export as GPX file
-- https://en.wikipedia.org/wiki/GPS_Exchange_Format

SELECT round(
         -- Compute the length of the geomentry in the units of the
         -- reference system being used
         ST_Length(
           -- Transform from standard GPS WGS-84 into a reference system 
           -- that uses meters. EPSG:25832 stands for ETRS89 / UTM zone 32N,
           -- Europe between 6°E and 12°E
           ST_transform(
             -- Fix the order of coordinates 
             -- (GXP read as long/lat, we need lat/long)
             ST_FlipCoordinates(
               -- Turn the WKT into a DuckDB Geometry
               ST_GeomFromWKB(wkb_geometry)
             ),
            'EPSG:4326',
            'EPSG:25832'
           )
       ) /1000, 2) AS 'Distance (km)'
FROM st_read('activity_11501782487.gpx', layer='tracks');
Copy code

Michael Simons

Created 07/08/23

‌

Query parquet files storage on S3 compatible storage providerSQL

Execute this SQL

-- Using  Scaleway as an example, where first 75GB is free
-- https://www.scaleway.com/en/object-storage/
-- This also works for Google Storage, Digital Ocean Spaces, 
-- Wasabi and so on

-- set these like you do for AWS
SET s3_access_key_id='MY_KEY'
SET s3_secret_access_key='MY_SECRET'

-- set the endpoint and region to override the default of 
-- using AWS endpoints
SET s3_endpoint='s3.nl-ams.scw.cloud'
SET s3_region="nl-ams"

SELECT
  -- format a large number with helpful commas 
  -- for readability
  format('{:,}', count(*)) as "total rows in file" 
FROM "s3://<BUCKET_NAME>/<FILE_NAME>.zst.parquet"
Copy code

Chris Adams

Created 07/14/23

‌

Split a String and cast all elements to a desired datatypeSQL

Execute this SQL

-- Here we make use of list_transform and 
-- the fact that DuckDB knows lambda expressions
SELECT list_transform(split('01:08:22', ':'), x -> CAST (x AS INTEGER));
Copy code

Michael Simons

Created 08/22/23

‌

if you were to export a table as a csv, this query should estimate the size (minus the header)SQL

Execute this SQL

with bits as (
    select
    -- add 8 bits to the end to account for the delimiter
    bit_length(columns(*))  + 8
    from <TABLE>
),
-- aggregate all columns
bits_agg as (
    select sum(columns(*)) from bits
),
-- unpivot a wide single row of aggs to single column
bits_col as (
    unpivot bits_agg on columns(*)
)
-- add them all up & convert to mb
select sum(value) / (8 * 1024 ** 2) as mb
from bits_col
Copy code

Hamilton Ulmer

Created 11/15/23

‌

Convert EPOC integer field to TimestampSQL

Editor's note: date/time conversion can be painful in SQL and in many programming languages. Simon shows you how to handle it in DuckDB here, with timezone values. There are other timestamp without timezone functions in the DuckDB docs.

Execute this SQL

select ('EPOCH'::TIMESTAMP + INTERVAL (epoc_field::INT) seconds)::TIMESTAMPTZ  as epoc_tz
from
(
    -- epoc column - number of seconds since 1 Jan 1970
    select 1682481422 as epoc_field
)
;
Copy code

Simon Aubury

Edited 02/23/24

‌

Calculate MD5 of a text SQL

Execute this SQL

--Below SQL to generate MD5 of a string

select md5('test')
Copy code

Created 12/26/23

‌

StackOverflow Analytics

Editor's note: Michael shares Stackoverflow data in MotherDuck as part of this snippet as well as typical aggregate analytics on the data. There are additional sample data sets attached by default in MotherDuck as the 'sample_data' share.

Attach and select MotherDuck database
Data shared/available on MotherDuck

ATTACH 'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5' AS stackoverflow;
USE stackoverflow;
Copy code

Which 5 questions have the most comments, what is the post title and comment countSQL

SELECT Title, CommentCount
FROM posts
WHERE PostTypeId = 1
ORDER BY CommentCount DESC
LIMIT 5;
Copy code

User reputation and reputation rate per daySQL

SELECT name, reputation, 
       round(reputation/day(today()-createdAt)) as rate, 
       day(today()-createdAt) as days, 
       createdAt
FROM users WHERE reputation > 1000000 ORDER BY rate DESC;
Copy code

Reputation rate as bar chart with CTESQL

WITH top_users as (
  SELECT name, reputation, 
       round(reputation/day(today()-createdAt)) as rate, day(today()-createdAt) as days, 
       createdAt
       FROM users WHERE reputation > 1000000
)
SELECT name, reputation, rate, bar(rate,150,300,35) AS bar FROM top_users;
Copy code

Post statistics per yearSQL

SELECT  year(CreationDate) AS year, count(*), 
        round(avg(ViewCount)), max(AnswerCount)
FROM posts 
GROUP BY year 
ORDER BY year DESC LIMIT 10;
Copy code

Posting Frequency with bar chart on Weekdays for "sql" tagSQL

SELECT count(*) as freq, dayname(CreationDate) AS day, 
       bar(freq, 0, 150000,20) AS plot
FROM posts WHERE posttypeid = 1 
AND tags LIKE '%<sql>%'
GROUP BY all 
ORDER BY freq DESC;
Copy code

Posting Frequency with bar chart on Weekdays for "rust" tagSQL

SELECT count(*) as freq, dayname(CreationDate) AS day, 
       bar(freq, 0, 10000,20) AS plot
FROM posts WHERE posttypeid = 1 
AND tags LIKE '%<rust>%'
GROUP BY all 
ORDER BY freq DESC;
Copy code

Show less

Michael Hunger

Edited 02/23/24

Compute the length of a GPX track in metersSQL

Execute this SQL

Query parquet files storage on S3 compatible storage providerSQL

Execute this SQL

Split a String and cast all elements to a desired datatypeSQL

Execute this SQL

if you were to export a table as a csv, this query should estimate the size (minus the header)SQL

Execute this SQL

Convert EPOC integer field to TimestampSQL

Execute this SQL

Calculate MD5 of a text SQL

Execute this SQL

StackOverflow Analytics

Attach and select MotherDuck databaseData shared/available on MotherDuck

Which 5 questions have the most comments, what is the post title and comment countSQL

User reputation and reputation rate per daySQL

Reputation rate as bar chart with CTESQL

Post statistics per yearSQL

Posting Frequency with bar chart on Weekdays for "sql" tagSQL

Posting Frequency with bar chart on Weekdays for "rust" tagSQL

Attach and select MotherDuck database
Data shared/available on MotherDuck