Dienstag, 24. Februar 2015

KNN with PostGIS

This article is basically a note to my future self, in case I need to look up how to do KNN searches with PostGIS again. The technique is taken from the talk "Writing better PostGIS queries" by Regina Obe (given at FOSS4G 2014 in Portland).

The problem statement

Given a set of geometries G1, find the k nearest neighbours for each of these geometries from a second set of geometries G2. For simplicity, I assume G1 and G2 to be sets of points which are stored as simple POINT geometries in PostGIS. Note that G1 and G2 could be the same set, I will use this in the example code.

CREATE TABLE knntst
(
gid serial NOT NULL,
geom geometry,
CONSTRAINT knntst_pk PRIMARY KEY (gid)
);
CREATE INDEX knntst_geom_gist
ON knntst
USING gist
(geom);
INSERT INTO knntst (geom) VALUES ('POINT (0 0)'::geometry), ('POINT (1 0)'::geometry), ('POINT (0 1)'::geometry), ('POINT (1 1)'::geometry), ('POINT (3 0)'::geometry), ('POINT (4 0)'::geometry), ('POINT (3 3)'::geometry), ('POINT (4 5)'::geometry);

Using the KNN operator

There are many ways for solving this problem in PostGIS, but the trouble is to find a way that actually uses the spatial index and that does not require to set a maximum distance beforehand (and that provides the complete output, i.e. all k nearest neighbours for each of the points). The solution is to use the KNN operator "<->", which may be included in the ORDER BY clause. However, one argument to that operator needs to be constant in order to actually take advantage of the spatial index. So, for the given problem, a LATERAL join is required in the following way:
SELECT g1.gid, g2.gid, ST_AsText(g1.geom), ST_AsText(g2.geom), ST_Distance(g1.geom, g2.geom) FROM knntst AS g1 CROSS JOIN LATERAL (SELECT g3.gid, g3.geom FROM knntst AS g3 WHERE g1.gid!=g3.gid ORDER BY g3.geom <-> g1.geom LIMIT 3) AS g2
The output table contains the 3 nearest neighbours to each of the points in the source table, excluding the point itself.