Working with spatial data

Editor's note: Geospatial data is increasingly important for analytics - whether you're looking at data like store inventory, customer location or the weather. The spatial extension for DuckDB provides support for common data formats, calculations and searching within geometries.

Create a point from latitude and longitude pairsSQL

-- Install spatial extension
INSTALL spatial; 
LOAD spatial; 

-- Represent a latitude and longitude as a point
-- The Eiffel Tower in Paris, France has a 
-- latitude of 48.858935 and longitude of 2.293412
-- We can represent this location as a point
SELECT st_point(48.858935, 2.293412) AS Eiffel_Tower;

Copy code

Find the distance between two locations (in meters)SQL

-- Distance between the Eiffel Tower and the Arc de Triomphe in Paris
-- Using the EPSG spatial reference systems: 
-- EPSG:4326 geographic coordinates as latitude and longitude pairs
-- EPSG:27563 projection that covers northern France and uses meters 

SELECT
    st_point(48.858935, 2.293412) AS Eiffel_Tower, 
    st_point(48.873407, 2.295471) AS Arc_de_Triomphe,
    st_distance(
        st_transform(Eiffel_Tower, 'EPSG:4326', 'EPSG:27563'), 
        st_transform(Arc_de_Triomphe, 'EPSG:4326', 'EPSG:27563')
    ) AS Aerial_Distance_M;

Copy code

Find the country for given latitude and longitude locationSQL

-- Load the geometry outline for each country
-- Save the country name and "geom" border in table world_boundaries
CREATE OR REPLACE TABLE world_boundaries
AS
SELECT *
FROM st_read('https://public.opendatasoft.com/api/explore/v2.1/catalog/datasets/world-administrative-boundaries/exports/geojson');

-- Find the enclosing country for a given point
-- We can which country the Eiffel Tower is in 
SELECT name, region
FROM world_boundaries
WHERE ST_Within(st_point(2.293412, 48.858935) , geom);

Copy code

Simon Aubury

Expand

Share link


Working with public REST API's

Editor's note: by combining the read_json capability of DuckDB with the ability to retrieve arbitrary https URLs, you can easily use DuckDB to read data from public and unprotected APIs. Note that you can't currently pass arbitrary HTTP headers to access protected APIs.

Query JSON from a public REST APISQL

-- DuckDB can directly query a public REST API returning a JSON 
-- Query the "tvmaze.com" API for the TV show "South Park"

SELECT name, type, summary
FROM read_json('https://api.tvmaze.com/singlesearch/shows?q=South%20Park', 
auto_detect=true, format='newline_delimited');

Copy code

Format the response from a public REST API SQL

-- Format the response from a public REST API 
-- Retrieve a list of temperature readings for Sydney, Australia 
-- The returned payload has hourly temps for 24 hours
-- We can use JSONPath to extract data from a JSON documemnt 
-- to find the most recent temperature

SELECT json_extract(hourly, '$.temperature_2m[1]')
from read_json('https://api.open-meteo.com/v1/forecast?latitude=-33.8678&longitude=151.2073&hourly=temperature_2m&forecast_days=1', auto_detect=true, format='newline_delimited');

Copy code

Simon Aubury

Expand

Share link


City air quality insights based on WHO data

Editor's note: Mehdi's database share includes air quality data from the World Health Organization (WHO). Use his example queries to understand pollution in particular areas. You might even try combining with the spatial extension discussed in other snippets.

Attach and select MotherDuck database

Data shared/available on MotherDuck

ATTACH 'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6' AS sample_data;
USE sample_data;

Copy code

Annual city air quality rating based on WHO dataSQL

SELECT
    city,
    year,
    CASE
        WHEN
            AVG(pm25_concentration) <= 10
            AND AVG(pm10_concentration) <= 20
            AND AVG(no2_concentration) <= 40
            THEN 'Good'
        WHEN
            AVG(pm25_concentration) > 10
            AND AVG(pm10_concentration) > 20
            AND AVG(no2_concentration) > 40
            THEN 'Poor'
        ELSE 'Moderate'
    END AS airqualityrating
FROM
    sample_data.who.ambient_air_quality
GROUP BY
    city,
    year
ORDER BY
    city,
    year;

Copy code

Yearly average pollutant concentrations of a city (Berlin)SQL

SELECT
    year,
    AVG(pm25_concentration) AS avg_pm25,
    AVG(pm10_concentration) AS avg_pm10,
    AVG(no2_concentration) AS avg_no2
FROM sample_data.who.ambient_air_quality 
WHERE city = 'Berlin'
GROUP BY year
ORDER BY year DESC;

Copy code

Mehdi Ouazza

Expand

Share link


Top 25 Songs at Number 1, Billboard Hot 100SQL

Editor's note: do you like looking at data around the top music, including when an album debuted and when it appeared on the Billboard Hot 100 list? David has you covered with some simple analytical SQL, along with a MotherDuck share of the data.

Attach and select MotherDuck database

Data shared/available on MotherDuck

ATTACH 'md:_share/music/af2a4a73-d4ef-4c82-98ef-e49f60adccd2' AS music;
USE music;

Copy code

Execute this SQL

-- Switch to the attached music database
use music;

-- Top 25 songs at number 1, Billboard Hot 100
SELECT song, performer, 
  COUNT(*) AS weeks_at_no_1,
  chart_debut,
  MIN(chart_date) AS first_chart_date_as_no_1, 
  MAX(chart_date) AS last_chart_date_as_no_1
FROM billboard_hot_100
WHERE chart_position = 1
GROUP BY ALL
ORDER BY weeks_at_no_1 DESC, chart_debut
LIMIT 25;

Copy code

David Neal

Expand

Share link


DuckDB in Action: Some neat DuckDB specific SQL extension

Editor's note: DuckDB strives to make it easy to write SQL, even when it requires introducing non-standard syntax. See the great blog posts by Alex Monahan or explore the Manning "DuckDB in Action" book by the author of this snippet.

Attach and select MotherDuck database

Data shared/available on MotherDuck

ATTACH 'md:_share/duckdb_in_action_ch3_4/d0c08584-1d33-491c-8db7-cf9c6910eceb' AS duckdb_in_action_ch3_4;
USE duckdb_in_action_ch3_4;

Copy code

DuckDB specific extensions: Project all columns matching a patternSQL

SELECT COLUMNS('valid.*') FROM prices LIMIT 3;

Copy code

DuckDB specific extensions: Apply an aggregation to several columnsSQL

SELECT max(COLUMNS('valid.*')) FROM prices;

Copy code

DuckDB specific extensions: Apply one condition to many columnsSQL

FROM prices
WHERE COLUMNS('valid.*') BETWEEN '2020-01-01' AND '2021-01-01';

Copy code

Michael Simons

Expand

Share link