Top 25 Songs at Number 1, Billboard Hot 100SQL

Editor's note: do you like looking at data around the top music, including when an album debuted and when it appeared on the Billboard Hot 100 list? David has you covered with some simple analytical SQL, along with a MotherDuck share of the data.

Attach and select MotherDuck database

Data shared/available on MotherDuck

ATTACH 'md:_share/music/af2a4a73-d4ef-4c82-98ef-e49f60adccd2' AS music;
USE music;

Copy code

Execute this SQL

-- Switch to the attached music database
use music;

-- Top 25 songs at number 1, Billboard Hot 100
SELECT song, performer, 
  COUNT(*) AS weeks_at_no_1,
  chart_debut,
  MIN(chart_date) AS first_chart_date_as_no_1, 
  MAX(chart_date) AS last_chart_date_as_no_1
FROM billboard_hot_100
WHERE chart_position = 1
GROUP BY ALL
ORDER BY weeks_at_no_1 DESC, chart_debut
LIMIT 25;

Copy code

David Neal

Expand

Share link


UK Bank HolidaysSQL

The UK government publish the UK bank holidays as a JSON object at a public endpoint. DuckDB can read the data from the endpoint, unpivot it, and expand it in a few lines!

Execute this SQL

/* division, title, date, notes, bunting */
select
    division,
    unnest(events.events, recursive:=true)
from (
    unpivot 'https://www.gov.uk/bank-holidays.json'
    on "england-and-wales", "scotland", "northern-ireland"
    into name division value events
)

Copy code

Bill Wallis

Expand

Share link


Remove duplicatesSQL

Execute this SQL

/* removes duplicate rows at the order_id level */
SELECT * FROM orders
QUALIFY row_number() over (partition by order_id order by created_at) = 1

Copy code

Octavian Zarzu

Copy code

Expand

Share link


Creating parameterized views with TVFs macros (Table-Valued functions)SQL

Execute this SQL

-- create macro
CREATE OR REPLACE MACRO udf_products_in_year (v_year, v_category)
AS TABLE
SELECT 
	name, 
	category, 
	created_at
FROM products 
WHERE category = v_category
AND year(created_at) = v_year;

-- select using the macro as you would do from a table
SELECT *
FROM udf_products_in_year (2020, 'Home and Garden');

| Copper Light	| Home and Garden	| 2020-04-05 00:00:00.000 |
| Pink Armchair	| Home and Garden	| 2020-06-23 00:00:00.000 |

-- input ddl and data
CREATE TABLE products 
(
	name varchar,
	category varchar,
	created_at timestamp
);

INSERT INTO products
VALUES
('Cream Sofa', 'Home and Garden', '2019-03-14'),
('Copper Light', 'Home and Garden', '2020-04-05'),
('Pink Armchair', 'Home and Garden', '2020-06-23');

Copy code

Octavian Zarzu

Copy code

Expand

Share link


Query Parquet files and their metadataSQL

Execute this SQL

-- listing files
FROM glob('dataset/*');

-- reading from files
FROM 'dataset/*.parquet' LIMIT 100; 

-- reading parquet files metadata (min_value, max_value, null_count for each field in each file)
FROM parquet_metadata('dataset/*.parquet');

-- convert files or export tables to parquet
COPY (FROM tbl) TO 'file.parquet' (FORMAT 'parquet');
COPY 'data.csv' TO 'data.paruqet' (FORMAT 'parquet');

Copy code

Octavian Zarzu

Copy code

Expand

Share link


if you were to export a table as a csv, this query should estimate the size (minus the header)SQL

Execute this SQL

with bits as (
    select
    -- add 8 bits to the end to account for the delimiter
    bit_length(columns(*))  + 8
    from <TABLE>
),
-- aggregate all columns
bits_agg as (
    select sum(columns(*)) from bits
),
-- unpivot a wide single row of aggs to single column
bits_col as (
    unpivot bits_agg on columns(*)
)
-- add them all up & convert to mb
select sum(value) / (8 * 1024 ** 2) as mb
from bits_col

Copy code

Hamilton Ulmer

Copy code

Expand

Share link


Split a String and cast all elements to a desired datatypeSQL

Execute this SQL

-- Here we make use of list_transform and 
-- the fact that DuckDB knows lambda expressions
SELECT list_transform(split('01:08:22', ':'), x -> CAST (x AS INTEGER));

Copy code

Michael Simons

Copy code

Expand

Share link


Convert CSV to Parquet and amend the schemaSQL

Execute this SQL

-- This selects only specific fields from the CSV
-- It also renames one of the fields in the target schema
COPY (SELECT col1, col2, col3 AS new_col_name
	    FROM read_csv('~/data/source.csv',AUTO_DETECT=TRUE))
  TO '~/data/target.parquet' (FORMAT 'PARQUET', CODEC 'ZSTD');

Copy code

Robin Moffatt

Copy code

Expand

Share link