Back to Code Snippets

Dumky de Wilde

@dumky


Fetch HackerNews Stories with read_json()

You can read JSON from any endpoint into DuckDB. This uses read_json() to fetch all HackerNews 'Who is hiring' stories and create a table with all stories and the comment IDs. It works by: - creating a list of URLs - reading the list and creating the table - querying the table

Create JSON URLsSQL

-- https://hn.algolia.com/api
with urls as (
  select
    'https://hn.algolia.com/api/v1/search_by_date?tags=author_whoishiring&page=' || page_num || '&query=hiring?&attributesToRetrieve=title,tags,comment_text,created_at,object_id,story_id,story_title,children,story_text' as url
  from unnest(range(0,10)) as t(page_num)
)

select array_to_json(list(url)) from urls

Copy code

Read JSON from API endpointSQL

create or replace table hn_stories as (
select 
  hit.title as story_title,
  hit.story_text,
  hit.created_at,
  hit.objectId,
  hit.children,

from read_json(
 -- your_url_list_from_above --
  ), unnest(hits) as t(hit)
)

Copy code

Dumky de Wilde

Expand

Share link