StackOverflow Analytics
Editor's note: Michael shares Stackoverflow data in MotherDuck as part of this snippet as well as typical aggregate analytics on the data. There are additional sample data sets attached by default in MotherDuck as the 'sample_data' share.
Which 5 questions have the most comments, what is the post title and comment countSQL
SELECT Title, CommentCount FROM posts WHERE PostTypeId = 1 ORDER BY CommentCount DESC LIMIT 5;
Copy code
User reputation and reputation rate per daySQL
SELECT name, reputation, round(reputation/day(today()-createdAt)) as rate, day(today()-createdAt) as days, createdAt FROM users WHERE reputation > 1000000 ORDER BY rate DESC;
Copy code
Reputation rate as bar chart with CTESQL
WITH top_users as ( SELECT name, reputation, round(reputation/day(today()-createdAt)) as rate, day(today()-createdAt) as days, createdAt FROM users WHERE reputation > 1000000 ) SELECT name, reputation, rate, bar(rate,150,300,35) AS bar FROM top_users;
Copy code
Post statistics per yearSQL
SELECT year(CreationDate) AS year, count(*), round(avg(ViewCount)), max(AnswerCount) FROM posts GROUP BY year ORDER BY year DESC LIMIT 10;
Copy code
Posting Frequency with bar chart on Weekdays for "sql" tagSQL
SELECT count(*) as freq, dayname(CreationDate) AS day, bar(freq, 0, 150000,20) AS plot FROM posts WHERE posttypeid = 1 AND tags LIKE '%<sql>%' GROUP BY all ORDER BY freq DESC;
Copy code
Posting Frequency with bar chart on Weekdays for "rust" tagSQL
SELECT count(*) as freq, dayname(CreationDate) AS day, bar(freq, 0, 10000,20) AS plot FROM posts WHERE posttypeid = 1 AND tags LIKE '%<rust>%' GROUP BY all ORDER BY freq DESC;
Copy code
Expand
Share link
Specifying types in the CSV loaderSQL
We can specify types for the `read_csv` function on a per-column basis.
Execute this SQL
COPY (FROM VALUES ('foo', 10/9), ('bar', 50/7), ('qux', 9/4) t(s, x)) TO 'example.csv'; CREATE OR REPLACE TABLE example AS FROM read_csv('example.csv', types = {'x': 'DECIMAL(15, 3)'});
Copy code
Expand
Share link
Emulate the Initcap functionSQL
In many database systems, there is a SQL function called something like INITCAP which capitalizes the first letter of all the words in a text string. Unfortunately, DuckDB doesn’t have this built-in, so let’s see if we can emulate it using function chaining and list comprehension.
Execute this SQL
SELECT ([upper (x[1])||x[2:] for x in ( 'the quick brown fox jumped over the lazy dog' ).string_split(' ')]).list_aggr('string_agg',' ')
Copy code
Expand
Share link
Query from Google SheetsSQL
Sometimes you have data over in Google Sheets and you want quickly and easily load it into DuckDB. With read_csv and the Google Sheets 'export to csv', its just one line of code.
Execute this SQL
-- you will need to create a sharelink for your sheet. -- note the tab id (gid) and sheet id (in the url) from read_csv('https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv&gid={tab_id}')
Copy code
Expand
Share link
Add a unique column (key) to an existing table in DuckDBSQL
We can use DuckDB's `SEQUENCE` to add a unique key to a table.Many source data files (such as CSV files) do not include a unique key.
Execute this SQL
CREATE TABLE tbl (s VARCHAR); INSERT INTO tbl VALUES ( 'hello'),('world'); CREATE SEQUENCE id_sequence; ALTER TABLE tbl ADD COLUMN id INTEGER DEFAULT nextval ('id_sequence'); SELECT * FROM tbl;
Copy code
Expand
Share link