SQL, Python & More for DuckDB - Page 10

‌

StackOverflow Analytics

Editor's note: Michael shares Stackoverflow data in MotherDuck as part of this snippet as well as typical aggregate analytics on the data. There are additional sample data sets attached by default in MotherDuck as the 'sample_data' share.

Attach and select MotherDuck database
Data shared/available on MotherDuck

ATTACH 'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5' AS stackoverflow;
USE stackoverflow;
Copy code

Which 5 questions have the most comments, what is the post title and comment countSQL

SELECT Title, CommentCount
FROM posts
WHERE PostTypeId = 1
ORDER BY CommentCount DESC
LIMIT 5;
Copy code

User reputation and reputation rate per daySQL

SELECT name, reputation, 
       round(reputation/day(today()-createdAt)) as rate, 
       day(today()-createdAt) as days, 
       createdAt
FROM users WHERE reputation > 1000000 ORDER BY rate DESC;
Copy code

Reputation rate as bar chart with CTESQL

WITH top_users as (
  SELECT name, reputation, 
       round(reputation/day(today()-createdAt)) as rate, day(today()-createdAt) as days, 
       createdAt
       FROM users WHERE reputation > 1000000
)
SELECT name, reputation, rate, bar(rate,150,300,35) AS bar FROM top_users;
Copy code

Post statistics per yearSQL

SELECT  year(CreationDate) AS year, count(*), 
        round(avg(ViewCount)), max(AnswerCount)
FROM posts 
GROUP BY year 
ORDER BY year DESC LIMIT 10;
Copy code

Posting Frequency with bar chart on Weekdays for "sql" tagSQL

SELECT count(*) as freq, dayname(CreationDate) AS day, 
       bar(freq, 0, 150000,20) AS plot
FROM posts WHERE posttypeid = 1 
AND tags LIKE '%<sql>%'
GROUP BY all 
ORDER BY freq DESC;
Copy code

Posting Frequency with bar chart on Weekdays for "rust" tagSQL

SELECT count(*) as freq, dayname(CreationDate) AS day, 
       bar(freq, 0, 10000,20) AS plot
FROM posts WHERE posttypeid = 1 
AND tags LIKE '%<rust>%'
GROUP BY all 
ORDER BY freq DESC;
Copy code

Show less

Michael Hunger

Edited 02/23/24

‌

if you were to export a table as a csv, this query should estimate the size (minus the header)SQL

Execute this SQL

with bits as (
    select
    -- add 8 bits to the end to account for the delimiter
    bit_length(columns(*))  + 8
    from <TABLE>
),
-- aggregate all columns
bits_agg as (
    select sum(columns(*)) from bits
),
-- unpivot a wide single row of aggs to single column
bits_col as (
    unpivot bits_agg on columns(*)
)
-- add them all up & convert to mb
select sum(value) / (8 * 1024 ** 2) as mb
from bits_col
Copy code

Hamilton Ulmer

Created 11/15/23

‌

Load AWS credentialsSQL

Execute this SQL

CALL load_aws_credentials();
--- duckdb >= v0.9.0, AWS and httpfs extension loaded on the fly
Copy code

Carlo Piovesan

Created 09/27/23

‌

Unnest a LIST from JSONSQL

Execute this SQL

with 

nested_json_list as (
    
    select '["a", "b", "c"]'::json as foo

)
    
select json(foo)::varchar[] from nested_json_list
Copy code

winnie

Created 08/31/23

‌

Split a String and cast all elements to a desired datatypeSQL

Execute this SQL

-- Here we make use of list_transform and 
-- the fact that DuckDB knows lambda expressions
SELECT list_transform(split('01:08:22', ':'), x -> CAST (x AS INTEGER));
Copy code

Michael Simons

Created 08/22/23

‌

Replacing one a column in a select * with an expression under a different nameSQL

Execute this SQL

-- * EXCLUDE plus any additional expression does the trick
SELECT * EXCLUDE (valid_from),  -- valid_from used in computation
       year(valid_from) AS year -- A new expression
FROM prices
 -- Note that the new expression can be referred to in the WHERE clause
WHERE year BETWEEN 2019 AND 2020;
Copy code

Michael Simons

Created 08/22/23

‌

Quick inline bar charts with bar() functionSQL

Execute this SQL

-- bar(x, min, max[, width])
--- Draw a band whose width is proportional to (x - min) and equal to width characters when x = max. width defaults to 80.

with data as (select unnest(range(1,10)) as v order by random()) 

select bar(v,minv,maxv,30) from data, 
       (select min(v) as minv, max(v) as maxv from data);

-- ┌────────────────────────────────┐
-- │     bar(v, minv, maxv, 30)     │
-- │            varchar             │
-- ├────────────────────────────────┤
-- │ ██████████████████████▌        │
-- │ ███▊                           │
-- │ ██████████████████████████████ │
-- │                                │
-- │ ███████████████                │
-- │ ███████████▎                   │
-- │ ██████████████████▊            │
-- │ ██████████████████████████▎    │
-- │ ███████▌                       │
-- └────────────────────────────────┘
Copy code

Michael Hunger

Created 07/15/23

StackOverflow Analytics

Attach and select MotherDuck databaseData shared/available on MotherDuck

Which 5 questions have the most comments, what is the post title and comment countSQL

User reputation and reputation rate per daySQL

Reputation rate as bar chart with CTESQL

Post statistics per yearSQL

Posting Frequency with bar chart on Weekdays for "sql" tagSQL

Posting Frequency with bar chart on Weekdays for "rust" tagSQL

if you were to export a table as a csv, this query should estimate the size (minus the header)SQL

Execute this SQL

Load AWS credentialsSQL

Execute this SQL

Unnest a LIST from JSONSQL

Execute this SQL

Split a String and cast all elements to a desired datatypeSQL

Execute this SQL

Replacing one a column in a select * with an expression under a different nameSQL

Execute this SQL

Quick inline bar charts with bar() functionSQL

Execute this SQL

Attach and select MotherDuck database
Data shared/available on MotherDuck