Detect Schema Changes Across Datasets (Python)Python

Compare the schema of two datasets and identify any differences.

Execute this Python

import duckdb

def compare_schemas(file1, file2):
    """
    Compare schemas of two datasets and find differences.
    Args:
        file1 (str): Path to the first dataset (CSV/Parquet).
        file2 (str): Path to the second dataset (CSV/Parquet).
    Returns:
        list: Schema differences.
    """
    con = duckdb.connect()
    schema1 = con.execute(f"DESCRIBE SELECT * FROM read_csv_auto('{file1}')").fetchall()
    schema2 = con.execute(f"DESCRIBE SELECT * FROM read_csv_auto('{file2}')").fetchall()
    return {"file1_schema": schema1, "file2_schema": schema2}

# Example Usage
differences = compare_schemas("data1.csv", "data2.csv")
print(differences)

Copy code

Expand

Share link


Export/import/share DuckDB UI Notebooks

The DuckDB UI stores notebook content in an internal database called _duckdb_ui. You can query and export notebook content, as well as insert new definitions into the database. Warning: Modifying the internal database may lead to corruption and data loss. Be cautious and use it at your own risk!

Export a notebook definition to JSONSQL

copy (
  select
    "json"
  from _duckdb_ui.notebook_versions
  where 1=1
    and title = 'MySingleNotebook'
    and expires is null
) to 'exported-notebook.json'

Copy code

Import notebook definitionSQL

set variable notebook_content = (select json from 'exported-notebook.json');
set variable notebook_id = uuid();
set variable current_timestamp = now();

begin transaction;
  insert into _duckdb_ui.notebooks (id, name, created)
  select
    getvariable('notebook_id'),
    'notebook_' || getvariable('notebook_id'),
    getvariable('current_timestamp')
  ;

  insert into _duckdb_ui.notebook_versions (notebook_id, version, title, json, created, expires)
  select
    getvariable('notebook_id'),
    1,
    'imported-notebook-' || getvariable('current_timestamp'),
    getvariable('notebook_content'),
    getvariable('current_timestamp'),
    null
  ;
commit;

Copy code

Expand

Share link


DuckDB to geojsonSQL

Snippet helps put correctly formatted geojson file using spatial and json extensions from any table/query.

Execute this SQL

COPY (
    SELECT
        *, -- all these field will become properties in resulting geojson (doesnt work well with map-like columns)
 	ST_GeomFromText(wkt) as geometry -- this is the main geometry entry    
    FROM temp_table      
) TO 'test.geojson' WITH (FORMAT GDAL, DRIVER 'GeoJSON', LAYER_NAME 'test');

Copy code

Shabbir Marzban

Expand

Share link