Query Yahoo Finance data with Scrooge extensionSQL
Editor's note: The Scrooge McDuck extension provides common aggregation functions and data scanners used for financial data. This example grabs stock quotes (or the S&P 500 in this case) on specific dates for analysis in DuckDB.
Execute this SQL
-- Install httpfs extension INSTALL httpfs; LOAD httpfs; -- Install Scrooge extension https://github.com/pdet/Scrooge-McDuck -- NOTE: You need to start DuckDB with `-unsigned` flag to authorize to install & load of 3rd party extension SET custom_extension_repository='scrooge-duckdb.s3.us-west-2.amazonaws.com/scrooge/s3_deploy'; INSTALL scrooge; LOAD scrooge; -- Example of query FROM yahoo_finance("^GSPC", "2023-02-01", "2023-02-04", "1d");
Copy code
Expand
Share link
Unnest JSON Array into Rows (pseudo-json_each)SQL
Editor's note: as APIs and applications increasingly represent tabular data in JSON format, we sometimes want the comfort of seeing it as rows, queryable in SQL. DuckDB has a great extension for parsing JSON, including the ability to convert into a DuckDB list which can then be unnested into rows.
Execute this SQL
select -- Split DuckDB list into multiple rows unnest( -- Convert JSON array into DuckDB list of JSON elements from_json( json('[{"col1":1, "col2":2}, {"col1":10, "col2":20}]'), '["JSON"]' ) ); /* Returns: {"col1":1,"col2":2} {"col1":10,"col2":20} */
Copy code
Expand
Share link
Dynamic SQL in DuckDB CLISQL
Editor's note: this gets a little recursive for all those SQL nerds out there. It shows you how to use SQL to generate SQL. It also shows you some of the dot commands for interacting with the DuckDB CLI: changing the output format and writing SQL to a file which is then read in for execution.
Execute this SQL
-- Use SQL to generate SQL statements. -- Write them to a file, then execute that file. -- This uses DuckDB CLI-specific features, so it only works in the CLI. .mode list .header off .once my_dynamic_sql.sql select 'CREATE TABLE t' || i || ' AS SELECT ' || i || ' AS my_column;' from generate_series(5) x(i); .read my_dynamic_sql.sql select table_name from duckdb_tables(); -- Check our work!
Copy code
Expand
Share link
Put null values last when sorting (like Excel or Postgres)SQL
Editor's note: DuckDB enables you to configure whether NULL values are returned first or last in result sets by default. You can also specify it per query using NULLS LAST in the query ORDER BY clause. Note that NULLS LAST is now the default with 0.8.0+.
Execute this SQL
PRAGMA default_null_order='NULLS LAST';
Copy code
Expand
Share link
Query S3 Access LogsSQL
Editor's note: Want to read log files with DuckDB? You can use the read_csv function and custom date/time + regex parsing to do it. To make the data more useful, you can specifically CAST some of the values as numerical types. This snippet also shows CASE WHEN ELSE statements in action.
Execute this SQL
/* Background: If you have S3 Access Logging enabled on one of your S3 buckets, you'll have some useful information about requests to your bucket. Unfortunately, it's in a semistructured format that can be difficult to parse. This SQL query will can help in this manner, both pulling out individual fields and coersing them to native data types. Usage: you'll want to search for the strings <bucket> and <prefix>, and insert the S3 bucket where your access logs are being delivered. Use (or delete) <prefix> to filter to a subset of your logs. Also, these commented out configuration settings you can either run yourself in the REPL and source this file using `.read parse_s3_access_logs.sql`, or you can uncomment them and supply values for yourself. */ -- install https; -- load https; -- SET s3_region='us-west-2'; -- SET s3_access_key_id=''; -- SET s3_secret_access_key=''; WITH parsed_logs AS ( SELECT regexp_extract(col1, '^([0-9a-zA-Z]+)\s+([a-z0-9.\-]+)\s+\[([0-9/A-Za-z: +]+)\] ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ("[^"]*"|-) ([^ ]+) ([^ ]+) (\d+|-) (\d+|-) (\d+|-) (\d+|-) ("[^"]*"|-) ("[^"]*"|-) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+)(.*)$', ['bucket_owner', 'bucket', 'timestamp', 'remote_ip', 'request', 'request_id', 'operation', 's3_key', 'request_uri', 'http_status', 's3_errorcode', 'bytes_sent','object_size', 'total_time', 'turn_around_time', 'referer', 'user_agent', 'version_id', 'host_id', 'sigver', 'cyphersuite', 'auth_type', 'host_header', 'tls_version', 'access_point_arn', 'acl_required', 'extra'] ) AS log_struct FROM -- Trick the CSV reader into reading as a single column read_csv( 's3://<bucket>/<prefix>/*', columns={'col1': 'VARCHAR'}, -- Use a *hopefully* nonsensical deliminator, so no ',' chars screw us up delim='\0' ) ) SELECT -- Grab everything from the struct that we want as strings, exclude stuff we'll coersce to diff types log_struct.* exclude (timestamp, bytes_sent, object_size, total_time, turn_around_time), strptime(log_struct.timestamp, '%d/%b/%Y:%H:%M:%S %z') AS timestamp, CASE WHEN log_struct.bytes_sent = '-' THEN NULL ELSE CAST(log_struct.bytes_sent AS INTEGER) END AS bytes_sent, CASE WHEN log_struct.object_size = '-' THEN NULL ELSE CAST(log_struct.object_size AS INTEGER) END AS object_size, CASE WHEN log_struct.total_time = '-' THEN NULL ELSE CAST(log_struct.total_time AS INTEGER) END AS total_time, CASE WHEN log_struct.turn_around_time = '-' THEN NULL ELSE CAST(log_struct.turn_around_time AS INTEGER) END AS turn_around_time FROM parsed_logs;
Copy code
Expand
Share link
DuckDB in Action: Some neat DuckDB specific SQL extension
Editor's note: DuckDB strives to make it easy to write SQL, even when it requires introducing non-standard syntax. See the great blog posts by Alex Monahan or explore the Manning "DuckDB in Action" book by the author of this snippet.
DuckDB specific extensions: Project all columns matching a patternSQL
SELECT COLUMNS('valid.*') FROM prices LIMIT 3;
Copy code
DuckDB specific extensions: Apply an aggregation to several columnsSQL
SELECT max(COLUMNS('valid.*')) FROM prices;
Copy code
DuckDB specific extensions: Apply one condition to many columnsSQL
FROM prices WHERE COLUMNS('valid.*') BETWEEN '2020-01-01' AND '2021-01-01';
Copy code
Expand
Share link