Back to Code Snippets
KMeans on one dimensional data with recursive CTESQL
Compute kmeans thresholds from a table with 2 columns : id (unique) and numeric. Outputs a list. Easy to extend to 2 dimensions data (x,y).
Execute this SQL
CREATE FUNCTION kmeans(tname, idcol_name, numcol_name, bins:=5, maxiter:=100) AS ( WITH RECURSIVE clusters(iter, cid, x) AS ( WITH t1 AS (FROM query_table(tname) SELECT idcol_name AS id, numcol_name AS x) (SELECT 0, id, x FROM t1 LIMIT bins-1) UNION ALL SELECT iter + 1, cid, avg(px) FROM ( SELECT iter, cid, p.x as px, rank() OVER (PARTITION BY p.id ORDER BY (p.x-c.x)^2, c.x^2) r FROM t1 p, clusters c ) x WHERE x.r = 1 and iter < maxiter GROUP BY ALL ) SELECT list(x) FROM (FROM clusters WHERE iter = maxiter ORDER BY x) ) ;
Copy code
Éric Mauvière
Expand
Share link