Parse a File in an Unsupported Format
Editor's note: As data engineers, we're often burdened with data that isn't in a standard format. Using DuckDB's basic string functions, advanced regex functions, list functions, and the CSV parser, you can parse data in arbitrary formats.
Execute this SQL
/*
This query pulls the contents of a file into a single row/column and then
parses it in the SELECT statement. As an example, we're parsing
/proc/{pid}/environ files, which are record-separated by the null character
(\0) and column-separated by the '=' character.
*/
SELECT
    str_split(filename, '/')[3] AS PID,
    env_pair[1] AS Name,
    env_pair[2] AS Value
FROM (
    SELECT
        filename,
        str_split(unnest(str_split(column0, chr(0))), '=') AS env_pair
    FROM read_csv(
        '/proc/[0-9]*/environ',
        header=False,
        filename=true,
        -- Make sure your 'delim' is a string not in the file
        delim='\0',
        columns={column0: 'VARCHAR'})
)
WHERE Name != ''
ORDER BY PID, Name
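To try the split-and-unnest step without reading /proc, here is a minimal sketch that runs on an inline string. The sample data is made up, and ';' stands in for the null record separator purely for readability; neither comes from the original query. It also shows one possible way to keep values that themselves contain '=' intact, by taking everything after the first '=' instead of the second list element.

```sql
-- Hypothetical sample standing in for one file's contents.
-- ';' plays the role of the NUL record separator (an assumption for readability).
WITH raw AS (
    SELECT 'HOME=/root;LANG=C;OPTS=a=b;' AS column0
),
records AS (
    -- One row per record, mirroring the inner query of the snippet
    SELECT unnest(str_split(column0, ';')) AS record
    FROM raw
)
SELECT
    str_split(record, '=')[1] AS Name,
    -- Everything after the first '=', so a value like 'a=b' is kept whole.
    -- A record with no '=' at all would come back unchanged as its own Value.
    substr(record, strpos(record, '=') + 1) AS Value
FROM records
-- The trailing separator produces one empty record; drop it
WHERE record != ''
ORDER BY Name;
```

With this approach the OPTS row keeps the full value a=b, whereas indexing the split list with [2] would return only a. Whether that matters depends on the variables in your environment files.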
Mark Roddy