Back to Code Snippets


Discretize numeric column following specified thresholds

discretize() converts a numeric column into discrete ordered ids, taking into account a list of thresholds.

Execute this SQL

CREATE OR REPLACE MACRO discretize(v, l) AS (
	WITH t1 AS (
		SELECT unnest(list_distinct(l)) as j
	), t2 AS (
		SELECT COUNT(*) + 1 c FROM t1 
	  WHERE try_cast(j AS float) <= v
	) FROM t2 
	SELECT IF(v IS NULL, NULL, c) 
) ;

--Usage
FROM 'https://raw.githubusercontent.com/thewiremonkey/factbook.csv/master/data/c2127.csv'
SELECT name, value, discretize(value, [2,3,4,5]) AS class ;

Copy code

Éric Mauvière

Expand

Share link