Working with spatial data

Editor's note: Geospatial data is increasingly important for analytics - whether you're looking at data like store inventory, customer location or the weather. The spatial extension for DuckDB provides support for common data formats, calculations and searching within geometries.

Create a point from latitude and longitude pairsSQL

-- Install spatial extension
INSTALL spatial; 
LOAD spatial; 

-- Represent a latitude and longitude as a point
-- The Eiffel Tower in Paris, France has a 
-- latitude of 48.858935 and longitude of 2.293412
-- We can represent this location as a point
SELECT st_point(48.858935, 2.293412) AS Eiffel_Tower;

Copy code

Find the distance between two locations (in meters)SQL

-- Distance between the Eiffel Tower and the Arc de Triomphe in Paris
-- Using the EPSG spatial reference systems: 
-- EPSG:4326 geographic coordinates as latitude and longitude pairs
-- EPSG:27563 projection that covers northern France and uses meters 

SELECT
    st_point(48.858935, 2.293412) AS Eiffel_Tower, 
    st_point(48.873407, 2.295471) AS Arc_de_Triomphe,
    st_distance(
        st_transform(Eiffel_Tower, 'EPSG:4326', 'EPSG:27563'), 
        st_transform(Arc_de_Triomphe, 'EPSG:4326', 'EPSG:27563')
    ) AS Aerial_Distance_M;

Copy code

Find the country for given latitude and longitude locationSQL

-- Load the geometry outline for each country
-- Save the country name and "geom" border in table world_boundaries
CREATE OR REPLACE TABLE world_boundaries
AS
SELECT *
FROM st_read('https://public.opendatasoft.com/api/explore/v2.1/catalog/datasets/world-administrative-boundaries/exports/geojson');

-- Find the enclosing country for a given point
-- We can which country the Eiffel Tower is in 
SELECT name, region
FROM world_boundaries
WHERE ST_Within(st_point(2.293412, 48.858935) , geom);

Copy code

Simon Aubury

Expand

Share link


result set → single row array of structsSQL

Execute this SQL

-- First, transform TBL (or any relation) 
-- into a single column of structs.
with structs (from TBL select TBL)
-- Then pack those structs into a list.
from structs select list(TBL) as ready_to_plot;

Copy code

Hamilton Ulmer

Copy code

Expand

Share link


Show supported DuckDB extensionsSQL

Editor's note: DuckDB has many supported extensions for everything from data formats (JSON, parquet, excel, iceberg) to specific types (IP addresses, time zones) to indexing (full-text-search) and more. This table function will tell you which extensions are supported in your local DuckDB install.

Execute this SQL

-- show supported duckdb extensions
FROM duckdb_extensions();

Copy code

Mehdi Ouazza

Expand

Share link


Create partitioned Parquet files from a remote CSV sourceSQL

Editor's note: DuckDB can create partitioned Parquet files - allowing you to store your data in partitions (eg orders for specific dates, traffic from specific IPs, etc) based on predictable filenames. This allows for more performant queries from cloud storage as only the needed files are retrieved.

Execute this SQL

-- Read from a remote CSV file, and write partitioned Parquet files to local target
-- Queries like this are commonly used in Data Lakes
COPY (SELECT cloud_provider, cidr_block, ip_address, ip_address_mask, ip_address_cnt, region from read_csv_auto('https://raw.githubusercontent.com/tobilg/public-cloud-provider-ip-ranges/main/data/providers/all.csv')) TO '/tmp/ip-ranges' (FORMAT PARQUET, PARTITION_BY cloud_provider);

Copy code

Tobias Müller

Expand

Share link


ATTACH 'other.db';SQL

Editor's note: DuckDB allows you to attach multiple databases at once. For example, you can attach a local file, an in-memory database and a database from MotherDuck and work with all of them simultaneously. The ATTACH statement is executed for each database to be attached.

Execute this SQL

-- attach another database, alias inferred from the name ("other")
ATTACH 'other.db';
SELECT * FROM other.some_table;

Copy code

Carlo Piovesan

Expand

Share link


Combine several parquet files into one and compress with zstdBash

Editor's note: another great example of using DuckDB's wide data format support to merge/combine multiple Parquet files. Parth also kindly shows you how to compress the resulting Parquet file with the zstd codec. DuckDB also supports gzip and snappy compression codecs.

Execute this Bash

duckdb -c "COPY (SELECT * FROM '~/Binance_Spot_Data/*.parquet') TO 'binance.parquet' (FORMAT 'PARQUET', CODEC 'zstd')"

Copy code

Parth Patil

Expand

Share link

Sponsored by
Inspired by