> Uh, the GIF just looks bad, the DB server is actually fast. Trust me. 🤫
A few weeks ago I attended an NYC OpenData Week event where about a dozen volunteers worked together to build a small mapping application for a community group. After the event, I wanted to dedicate some effort to generating dynamic map tiles from a DB. Typically one does not generate and serve tiles on the fly; in fact, it’s quite discouraged. But with the right planning we can build a serviceable tileserver using nothing but PostgreSQL.
Typically, when one creates a tileserver, tiles are served either from a directory of individual files or from a large *.mbtiles file that references pre-calculated tiles. In either case, any tile can be accessed by looking up a unique x, y, z value, where x and y represent a location and z represents a zoom level. When a user makes a request, the server fetches the tile stored at the corresponding x/y/z address instead of generating the tile on the fly. Ignoring the obvious advantages, if we absolutely insisted on creating tiles from a database and serving them on the fly, how could we accomplish that?
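For context, that x/y/z address maps directly onto a square of Web Mercator space. A minimal sketch of the lookup math in Python (function and constant names are mine, not from any tiling library):

```python
# Half the width of the Web Mercator (EPSG:3857) world, in meters.
HALF_WORLD = 20037508.342789244

def tile_bounds(z: int, x: int, y: int):
    """Return the EPSG:3857 bounding box (xmin, ymin, xmax, ymax)
    of the tile addressed by z/x/y (XYZ scheme, origin at top-left)."""
    tiles = 2 ** z                       # tiles per axis at this zoom
    size = (2 * HALF_WORLD) / tiles      # tile edge length in meters
    xmin = -HALF_WORLD + x * size
    ymax = HALF_WORLD - y * size         # y counts down from the top
    return (xmin, ymax - size, xmin + size, ymax)

# e.g. tile_bounds(0, 0, 0) spans the whole projected world;
# each zoom level quarters the area of a tile.
```

Each tile is a fixed, precomputable square, which is exactly why serving from static files or *.mbtiles is so cheap.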
“Therefore, if your data does not change very quickly I would almost always suggest using preprocessing over dynamic rendering of tiles. You might spend more effort maintaining something than you expect if you start using PostGIS to create tiles on demand instead of existing tiling tools.” — A poster on HN (lmao)
I downloaded a sample file (TIGER Blocks), wrote its data into PostgreSQL running on an AWS t3.micro EC2 instance (2 vCPU, 1 GB RAM), and then stood up a minimal front-end using OpenLayers.
# Download 190MB -> Unzip 600MB
wget https://www2.census.gov/geo/tiger/TIGER2020/TABBLOCK20/tl_2020_36_tabblock20.zip &&\
unzip tl_2020_36_tabblock20.zip -d /data/layers/nyblocks &&\
rm tl_2020_36_tabblock20.zip
# Use OGR2OGR to write into the DB
ogr2ogr -overwrite -f "PostgreSQL" PG:"dbname=${PG_DB} user=${PG_USER} host=localhost" \
/data/layers/nyblocks/ -nln blocks -lco OVERWRITE=yes -t_srs "EPSG:4326" -nlt PROMOTE_TO_MULTI

We can use the PostGIS functions ST_AsMVT and ST_AsMVTGeom to prepare a vector tile. Overall, we don’t need to do much to get PostGIS features in the DB represented as tiles.
WITH mvtgeom as (
SELECT
ST_AsMVTGeom(
st_segmentize(st_simplify(c2.wkb_geometry, $6), $5),
st_transform(ST_MakeEnvelope($1, $2, $3, $4, 3857), 4326),
extent => 4096, buffer => 64
) AS geom
FROM blocks c2
WHERE c2.wkb_geometry &&
st_transform(ST_MakeEnvelope($1, $2, $3, $4, 3857), 4326)
)
SELECT ST_AsMVT(mvtgeom.*) as pbf FROM mvtgeom;

Apologies for the hyper-specific PostGIS talk in this post. At the time of writing I was living and breathing PostGIS. In effect, I implemented the PostGIS Topology module’s recommendation for simplifying adjacent shapes. I’m scanning through this in Jan. 2024 and thinking about all the interesting other ways this could have been made to go F-A-S-T! I really wish I had thought to benchmark this.
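The query above takes six bind parameters: $1–$4 are the tile’s EPSG:3857 envelope, $5 a segmentize length, and $6 a simplify tolerance. A sketch of how an application might compute them per tile; the tolerance heuristics here are my guesses, not values from the original setup:

```python
HALF_WORLD = 20037508.342789244  # Web Mercator half-width, meters

def mvt_params(z: int, x: int, y: int):
    """Compute ($1, $2, $3, $4, $5, $6) for the ST_AsMVT query:
    the EPSG:3857 envelope, then a segmentize length and a simplify
    tolerance. Since the geometry column is EPSG:4326, the last two
    are in degrees; scaling them off degrees-per-pixel is my own
    heuristic, not something the post specifies."""
    tiles = 2 ** z
    size = (2 * HALF_WORLD) / tiles
    xmin = -HALF_WORLD + x * size
    ymax = HALF_WORLD - y * size
    deg_per_px = 360.0 / (tiles * 256)   # assuming 256px display tiles
    segmentize_len = deg_per_px * 64     # cap segment length
    simplify_tol = deg_per_px            # drop sub-pixel detail
    return (xmin, ymax - size, xmin + size, ymax,
            segmentize_len, simplify_tol)
```

Scaling the simplify tolerance with zoom is what keeps low-zoom tiles from carrying detail no pixel could ever show.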
Querying blocks was feasible with ST_AsMVT, but the performance was a bit clunky: I was routinely seeing query times of >200ms to generate large tiles. This could be improved with an instance with a better CPU, or by performance-tuning PostgreSQL, but I opted to try writing a better query first.
Each block in the dataset is a polygon, and because adjacent polygons intersect on their shared edge, the DB performs operations on every common edge twice. This is no good! If I decompose all shapes to edges, the same query could produce much smaller tiles much more quickly. To get this effect, I ran the following on each table in the DB.
-- Return the intersection of shapes if and only if they touch (i.e. they share
-- a boundary but no interior points)
create table bglines as (
SELECT
a.geoid20,
b.geoid20 as neighbor_geoid,
st_intersection(a.wkb_geometry, b.wkb_geometry) as wkb_geometry
FROM blocks a
LEFT JOIN blocks b
on ST_Touches(a.wkb_geometry, b.wkb_geometry));

This reduces the size of the data by almost 40%, makes very little visual difference, and allows our tile-generation queries to operate on lines instead of polygons. I was quite happy with the performance given the hardware, the lack of caching, and the minimal effort put into optimization. Following this update I was reliably seeing ~30-50ms responses from the DB. Admittedly, this is a little visually jarring, and I would never do this “for real” without a better caching layer.

There are a few outstanding issues:

- Tile size is unbounded! Zooming too far out will return tiles >500kb (the recommended tile-size maximum is ~50kb) and the request will take >1s. Oof!
- The standard Web Mercator projection doesn’t use the same constants as the NY state file. I’m certain I missed a transform somewhere that could align the two projections. It’s not pressing; it seems to make a difference of just 3-5m.
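Since tiles are keyed by z/x/y, even a crude in-process memoization would absorb most repeat requests. A minimal sketch, with `render_tile` as a hypothetical stand-in for the ST_AsMVT query against PostGIS:

```python
from functools import lru_cache

def render_tile(z: int, x: int, y: int) -> bytes:
    # Placeholder: a real implementation would bind the tile's
    # envelope into the ST_AsMVT query and return the pbf bytes.
    return f"tile/{z}/{x}/{y}".encode()

@lru_cache(maxsize=4096)
def cached_tile(z: int, x: int, y: int) -> bytes:
    """Memoize rendered tiles by their z/x/y address so repeated
    requests skip the database entirely."""
    return render_tile(z, x, y)
```

A real deployment would more likely put a CDN or dedicated tile cache in front of the server, but the idea is the same: the tile address is a perfect cache key as long as the underlying data is static.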