StackOverflow Analytics

Editor's note: Michael shares Stackoverflow data in MotherDuck as part of this snippet as well as typical aggregate analytics on the data. There are additional sample data sets attached by default in MotherDuck as the 'sample_data' share.

Attach and select MotherDuck database

Data shared/available on MotherDuck

ATTACH 'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5' AS stackoverflow;
USE stackoverflow;

Copy code

Which 5 questions have the most comments, what is the post title and comment countSQL

SELECT Title, CommentCount
FROM posts
WHERE PostTypeId = 1
ORDER BY CommentCount DESC
LIMIT 5;

Copy code

User reputation and reputation rate per daySQL

SELECT name, reputation, 
       round(reputation/day(today()-createdAt)) as rate, 
       day(today()-createdAt) as days, 
       createdAt
FROM users WHERE reputation > 1000000 ORDER BY rate DESC;

Copy code

Reputation rate as bar chart with CTESQL

WITH top_users as (
  SELECT name, reputation, 
       round(reputation/day(today()-createdAt)) as rate, day(today()-createdAt) as days, 
       createdAt
       FROM users WHERE reputation > 1000000
)
SELECT name, reputation, rate, bar(rate,150,300,35) AS bar FROM top_users;

Copy code

Post statistics per yearSQL

SELECT  year(CreationDate) AS year, count(*), 
        round(avg(ViewCount)), max(AnswerCount)
FROM posts 
GROUP BY year 
ORDER BY year DESC LIMIT 10;

Copy code

Posting Frequency with bar chart on Weekdays for "sql" tagSQL

SELECT count(*) as freq, dayname(CreationDate) AS day, 
       bar(freq, 0, 150000,20) AS plot
FROM posts WHERE posttypeid = 1 
AND tags LIKE '%<sql>%'
GROUP BY all 
ORDER BY freq DESC;

Copy code

Posting Frequency with bar chart on Weekdays for "rust" tagSQL

SELECT count(*) as freq, dayname(CreationDate) AS day, 
       bar(freq, 0, 10000,20) AS plot
FROM posts WHERE posttypeid = 1 
AND tags LIKE '%<rust>%'
GROUP BY all 
ORDER BY freq DESC;

Copy code

Show less
Michael Hunger

Expand

Share link


Specifying types in the CSV loaderSQL

We can specify types for the `read_csv` function on a per-column basis.

Execute this SQL

COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x))
TO 'example.csv';

CREATE OR REPLACE TABLE example AS
    FROM read_csv('example.csv', types = {'x': 'DECIMAL(15, 3)'});

Copy code

Gábor Szárnyas

Expand

Share link


Emulate the Initcap functionSQL

In many database systems, there is a SQL function called something like INITCAP which capitalizes the first letter of all the words in a text string. Unfortunately, DuckDB doesn’t have this built-in, so let’s see if we can emulate it using function chaining and list comprehension.

Execute this SQL

SELECT ([upper (x[1])||x[2:] 
for x in 
(
'the quick brown fox jumped over the lazy dog'
).string_split(' ')]).list_aggr('string_agg',' ') 

Copy code

Thomas Reid

Expand

Share link


Load content from Strapi CMS REST API to Parquet fileSQL

A nice trick to load data from Strapi CMS. The Api Token can be obtained in the Settings menu of Strapi. A nice way to let users maintain reference data using the CMS and be able to use it directly in DuckDB. Should work for both Strapi self-hosted and cloud.

Execute this SQL

INSTALL httpfs;
LOAD httpfs;

CREATE SECRET http (
    TYPE HTTP,
    EXTRA_HTTP_HEADERS MAP {
        'Authorization': 'Bearer [Api Token]'
    }
); 

-- Replace strapi.mydomain.com with your Strapi URL and replace `pets` with your content type
COPY (SELECT unnest(data, recursive:= true) FROM read_json_auto('https://strapi.mydomain.com/api/pets')) TO 'pets.parquet';

Copy code

PK

Expand

Share link

Execute this SQL

COPY (SELECT * FROM read_parquet('/path/to/parquet/files/*.parquet') TO '/path/to/combined/file.parquet' (FORMAT PARQUET)

Copy code

Mike Metzger

Copy code

Expand

Share link