DuckØ
@🦆 rule!
read_dsv() -> Parse properly separated CSV files
I tend to prefer using the ASCII unit separator (\x1f) and record separator (\x1e) as the column and line delimiters in CSVs, respectively (which technically means they're no longer CSVs). The read_csv function doesn't seem to want to play nice with these, so here's my attempt at a workaround.
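If you haven't seen these control characters before, here is a minimal, file-free sketch (with made-up sample values) that builds a DSV-style string using chr(31)/chr(30) and splits it back apart with the same regex patterns the macro below uses:

-- Build a two-column, two-row sample in memory with the DSV delimiters,
-- then split it back into records and a header list.
with sample as (
    select 'id' || chr(31) || 'name'  || chr(30)
        || '1'  || chr(31) || 'Alice' || chr(30)
        || '2'  || chr(31) || 'Bob'   || chr(30) as content
)
select
    regexp_split_to_array(content, '\x1e') as records
    -- the trailing record separator leaves an empty element at the end,
    -- which the macro drops with list_filter
    ,regexp_split_to_array(regexp_split_to_array(content, '\x1e')[1], '\x1f') as header
from sample;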
Macro definition (SQL)
-- For more info on DSVs (I'm not the author): https://matthodges.com/posts/2024-08-12-csv-bad-dsv-good/
CREATE OR REPLACE MACRO read_dsv(path_spec) AS TABLE (
    with _lines as (
        select
            filename
            ,regexp_split_to_array(content, '\x1e') as content
        from read_text(path_spec)
    )
    , _cleanup as (
        select
            filename
            ,regexp_split_to_array(content[1], '\x1f') as header
            ,generate_subscripts(content[2:], 1) as linenum
            ,unnest((content[2:]).list_filter(x -> trim(x) != '').list_transform(x -> x.regexp_split_to_array('\x1f'))) as line
        from _lines
    )
    select
        filename
        ,linenum
        ,unnest(map_entries(map(header, line)), recursive := true) as kv
    from _cleanup
);
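The final select is where the header and the row values get paired up: map() zips the header list with each line's value list, map_entries() turns the map into a list of key/value structs, and unnest(..., recursive := true) explodes those structs into one row per column. A standalone illustration of just that step, with made-up values:

-- Zip a header list with one row's values, then explode into key/value rows.
select unnest(
    map_entries(map(['id', 'name'], ['1', 'Alice'])),
    recursive := true
);
-- Two rows come back: (key = 'id', value = '1') and (key = 'name', value = 'Alice')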
Usage (SQL)
-- You can use the same path specification as you would with read_text or read_csv; this includes globbing.
-- Including the pivot in the macro itself isn't possible, as you would then have to explicitly define the column values (which defeats the purpose of this implementation).
pivot read_dsv('C:\Temp\csv\*.csv')
on key using first(value)
group by filename, linenum
order by filename, linenum
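Because the macro itself returns the long key/value form, you can also query it directly, for example to check that the header line was parsed as expected before pivoting (the path is just a placeholder):

-- Inspect the long format: one row per filename/linenum/key/value
select *
from read_dsv('C:\Temp\csv\*.csv')
limit 10;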