StackOverflow Analytics
Editor's note: In this snippet, Michael shares Stack Overflow data in MotherDuck along with typical aggregate analytics queries on that data. Additional sample datasets are attached by default in MotherDuck as the 'sample_data' share.
Which 5 questions have the most comments, and what are their titles and comment counts?
SELECT Title, CommentCount
FROM posts
WHERE PostTypeId = 1
ORDER BY CommentCount DESC
LIMIT 5;
User reputation and reputation rate per day
SELECT name, reputation,
       round(reputation/day(today()-createdAt)) AS rate,
       day(today()-createdAt) AS days,
       createdAt
FROM users
WHERE reputation > 1000000
ORDER BY rate DESC;
Reputation rate as bar chart with CTE
WITH top_users AS (
    SELECT name, reputation,
           round(reputation/day(today()-createdAt)) AS rate,
           day(today()-createdAt) AS days,
           createdAt
    FROM users
    WHERE reputation > 1000000
)
SELECT name, reputation, rate, bar(rate, 150, 300, 35) AS bar
FROM top_users;
Post statistics per year
SELECT year(CreationDate) AS year,
       count(*),
       round(avg(ViewCount)),
       max(AnswerCount)
FROM posts
GROUP BY year
ORDER BY year DESC
LIMIT 10;
Posting Frequency with bar chart on Weekdays for "sql" tag
SELECT count(*) AS freq,
       dayname(CreationDate) AS day,
       bar(freq, 0, 150000, 20) AS plot
FROM posts
WHERE posttypeid = 1
  AND tags LIKE '%<sql>%'
GROUP BY ALL
ORDER BY freq DESC;
Posting Frequency with bar chart on Weekdays for "rust" tag
SELECT count(*) AS freq,
       dayname(CreationDate) AS day,
       bar(freq, 0, 10000, 20) AS plot
FROM posts
WHERE posttypeid = 1
  AND tags LIKE '%<rust>%'
GROUP BY ALL
ORDER BY freq DESC;
Specifying types in the CSV loader
We can specify types for the `read_csv` function on a per-column basis.
COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x)) TO 'example.csv';

CREATE OR REPLACE TABLE example AS
    FROM read_csv('example.csv', types = {'x': 'DECIMAL(15, 3)'});
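To check that the per-column type override took effect, one can inspect the resulting table. This is a minimal sketch, assuming the `example` table from the statements above exists in the current database; the alias `x_type` is only illustrative.

```sql
-- List column names and types; x should be reported as a DECIMAL(15,3) column
DESCRIBE example;

-- Alternatively, look at the type of the stored values row by row
SELECT s, x, typeof(x) AS x_type
FROM example;
```

Columns not named in `types` (here `s`) are still left to the CSV sniffer's type detection.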
Emulate the Initcap function
In many database systems, there is a SQL function called something like INITCAP which capitalizes the first letter of all the words in a text string. Unfortunately, DuckDB doesn’t have this built-in, so let’s see if we can emulate it using function chaining and list comprehension.
SELECT ([upper(x[1]) || x[2:]
         for x in ('the quick brown fox jumped over the lazy dog').string_split(' ')
        ]).list_aggr('string_agg', ' ')
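If the expression is needed in more than one query, it could be wrapped in a scalar macro. The following is a sketch rather than a definitive implementation: the macro name `initcap_words` and the alias `title_cased` are hypothetical, and it uses plain function calls instead of method chaining.

```sql
-- Hypothetical helper macro; initcap_words is not a DuckDB built-in
CREATE OR REPLACE MACRO initcap_words(s) AS
    list_aggr(
        -- uppercase the first character of each word, keep the rest as-is
        [upper(x[1]) || x[2:] for x in string_split(s, ' ')],
        'string_agg',
        ' '
    );

SELECT initcap_words('the quick brown fox') AS title_cased;
```

Like the original query, this only uppercases the first character of each space-separated word and leaves the remaining characters unchanged, whereas INITCAP in some systems (PostgreSQL, for example) also lowercases the rest of each word.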
Load content from Strapi CMS REST API to Parquet file
A nice trick to load data from Strapi CMS. The API token can be obtained in the Settings menu of Strapi. This lets users maintain reference data in the CMS and use it directly in DuckDB. It should work for both self-hosted Strapi and Strapi Cloud.
INSTALL httpfs;
LOAD httpfs;

CREATE SECRET http (
    TYPE HTTP,
    EXTRA_HTTP_HEADERS MAP {
        'Authorization': 'Bearer [Api Token]'
    }
);

-- Replace with your Strapi URL and replace `pets` with your content type
COPY (
    SELECT unnest(data, recursive := true)
    FROM read_json_auto('')
) TO 'pets.parquet';
Replace string multiple times
Apply `replace` to a target string multiple times, using a list of replacement pairs.
SELECT reduce(
    [['', content], ['foo', 'FOO'], ['bar', 'BAR']],
    (x, y, i) -> ['', replace(x[2], y[1], y[2])]
)
FROM posts;
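To see how the fold works without the `posts` table, the same pattern can be run on a literal string. This is a sketch under stated assumptions: the input text `'foo and bar'` and the alias `replaced` are made up for illustration, and the trailing `[2]` is an addition that extracts the final string from the two-element accumulator.

```sql
-- The first list element seeds the accumulator as ['', <target string>];
-- each following [search, replacement] pair is applied to the accumulator's
-- second slot in turn.
SELECT reduce(
    [['', 'foo and bar'], ['foo', 'FOO'], ['bar', 'BAR']],
    (x, y) -> ['', replace(x[2], y[1], y[2])]
)[2] AS replaced;
```

Stepping through the fold, the accumulator goes from `['', 'foo and bar']` to `['', 'FOO and bar']` and then `['', 'FOO and BAR']`, so the query should return `FOO and BAR`.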
Run SQL file in DuckDB CLI
The DuckDB CLI enables you to execute a set of SQL statements stored in a file or passed in via STDIN. There are a few variants of this capability demonstrated below.
Read and execute SQL using init CLI argument and prompt for additional SQL statements
# executes SQL in create.sql, and then prompts for additional
# SQL statements provided interactively. note that when specifying
# an init flag, the ~/.duckdbrc file is not read
duckdb -init create.sql
Read and execute SQL using init CLI argument and immediately exit
# executes SQL in create.sql and then immediately exits
# note that we're specifying a database name so that we
# can access the created data later. note that when specifying
# an init flag, the ~/.duckdbrc file is not read
duckdb -init create.sql -no-stdin mydb.ddb
Pipe SQL file to the DuckDB CLI and exit
duckdb < create.sql mydb.ddb
