Squeezing performance out of maps with lots of polygons (Leaflet/PostGIS)
"We built a complicated map, and it's super slow"
That's what our client said. I'm looking at a map with lots of polygons drawn on top of regional borders, filled with different colors to represent different data points, e.g.:
The original map took forever to build using ArcGIS and was not very performant in the low-bandwidth (~500 kb/s) environments that are an everyday reality for many of our users. The polygon colors were also only editable by GIS programmers, not in the browser. We said, "Let us take a stab at this," and they let us. First, they needed to hand over the shapefiles for all of the various country, district, and sub-district borders, exported from their in-house GIS editors. They gave us four files:
Each of the four files represents a different level of granularity for classifying areas on the map: Admin0 holds all country borders, Admin1 states, Admin2 counties, and so on.
By golly, these files are big!
We knew right off the bat that load times would be an issue. With this in mind, we decided to take a page out of the video game designer's playbook and use level of detail (LoD): at zoomed-out levels you "smooth" or "generalize" the polygons so that they have fewer points and, in turn, yield smaller JSON payloads that load faster. More about this later.
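In practice, the LoD idea comes down to picking a coarser geometry when the user is zoomed out. Here is a minimal Ruby sketch of that selection step, assuming the level_1 to level_4 columns of the geometries_simplified view described later in this post. The zoom thresholds and the `lod_column_for_zoom` name are hypothetical, and the post doesn't describe zoom-based switching in the prototype, so treat this as an illustration of the idea rather than code from the project.

```ruby
# Hypothetical zoom thresholds (not from the prototype): coarser geometry
# when zoomed out, finer geometry as the user zooms in.
LOD_THRESHOLDS = [
  [5, "level_1"],  # zoom 0-5: ST_Simplify tolerance 0.1
  [8, "level_2"],  # zoom 6-8: tolerance 0.01
  [11, "level_3"]  # zoom 9-11: tolerance 0.001
].freeze

FINEST_LEVEL = "level_4" # tolerance 0.0001, used for anything closer

# Hypothetical helper: map a Leaflet zoom level to a simplified column name.
def lod_column_for_zoom(zoom)
  raise ArgumentError, "zoom must be non-negative" if zoom.negative?

  LOD_THRESHOLDS.each do |max_zoom, column|
    return column if zoom <= max_zoom
  end
  FINEST_LEVEL
end

puts lod_column_for_zoom(4)  # level_1
puts lod_column_for_zoom(13) # level_4
```

Returning only names from a fixed list also keeps the chosen column safe to splice into a query later.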
First, we needed a way to parse the geometries out of these massive JSON files and store them in a PostGIS database. You have to get a little bit tricky here, as huge files don't play well with the standard use case of many JSON parsers, which load the whole document into memory at once. We used a combination of EventMachine, the yajl-ffi gem, and the activerecord-postgis-adapter gem.
I wish I had time to go into a bit more detail with this. I will if there is demand for it. Drop us a line if you're looking for more on how to parse!
Whew, my polygons are in the database. How do I display them?
These basic examples work just fine for small payloads of mapped data.
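The usual starting point is to serve the polygons as a GeoJSON FeatureCollection, which Leaflet's `L.geoJSON` layer can render directly. Below is a minimal Ruby sketch of that serialization step using only the standard `json` library. The `Row` struct, its fields, and the `feature_collection` helper are hypothetical, and the sketch assumes each geometry has already been converted to GeoJSON text (for example with PostGIS's `ST_AsGeoJSON`).

```ruby
require "json"

# Hypothetical row shape; in practice it might come from a query such as
#   SELECT id, ST_AsGeoJSON(geometry) AS geojson, color FROM geometries;
Row = Struct.new(:id, :geojson, :color)

# Wrap database rows in a GeoJSON FeatureCollection, the structure that
# Leaflet's L.geoJSON layer accepts.
def feature_collection(rows)
  {
    "type" => "FeatureCollection",
    "features" => rows.map { |row|
      {
        "type" => "Feature",
        "id" => row.id,
        "geometry" => JSON.parse(row.geojson), # already GeoJSON text
        "properties" => { "color" => row.color } # drives the fill color
      }
    }
  }
end

# Made-up sample data: one small closed ring (GeoJSON order is [lng, lat]).
rows = [
  Row.new(1, '{"type":"Polygon","coordinates":[[[3.0,6.0],[4.0,6.0],[4.0,7.0],[3.0,6.0]]]}', "#e34a33")
]
payload = JSON.generate(feature_collection(rows))
puts JSON.parse(payload)["features"].length # 1
```

This is exactly the kind of payload that balloons once every polygon carries its full-resolution coordinates.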
In our case, on a bad connection (~500 kb/s), displaying all of the full-resolution polygons for Nigeria, for example, took around 3 minutes, which is obviously awful. This is where the LoD methodology comes in handy.
The reason the payloads are so big is simple. Each polygon is represented by a series of [lat, lng] coordinate pairs. Each time the border of a polygon changes direction (for example, along a river), there is a new coordinate pair. This can result in tens of thousands of coordinates for a single polygon. Dial it up to 811 polygons (the number of regions we are trying to display in our prototype), and you've got a beast of a payload on your hands: a 16 MB beast, to be exact. Now, 16 MB is not a huge deal on a good connection, but it's a different story on a bad one.
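To see where the bytes go, you can count the coordinate pairs in a geometry and compare that with its serialized size. The Ruby sketch below does this for GeoJSON `Polygon` and `MultiPolygon` geometries (GeoJSON itself orders each pair as [lng, lat]). The helper names are hypothetical, and the sample square is made-up data, not one of our regions.

```ruby
require "json"

# Hypothetical helper: count coordinate pairs in a GeoJSON Polygon or
# MultiPolygon (every ring counts, including holes).
def coordinate_count(geometry)
  case geometry["type"]
  when "Polygon"
    geometry["coordinates"].sum(&:length) # sum over rings
  when "MultiPolygon"
    geometry["coordinates"].sum { |polygon| polygon.sum(&:length) }
  else
    raise ArgumentError, "unsupported geometry type: #{geometry['type']}"
  end
end

# Hypothetical helper: average serialized bytes spent per coordinate pair.
def bytes_per_coordinate(geometry)
  JSON.generate(geometry).bytesize.to_f / coordinate_count(geometry)
end

square = {
  "type" => "Polygon",
  "coordinates" => [[[3.1, 6.2], [3.5, 6.2], [3.5, 6.6], [3.1, 6.6], [3.1, 6.2]]]
}
puts coordinate_count(square) # 5
puts bytes_per_coordinate(square).round(1)
```

Multiply the per-pair cost by tens of thousands of pairs per polygon and hundreds of polygons, and a multi-megabyte payload follows quickly.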
Here's a screenshot of the full-resolution page load. Pay attention to the payload size (~16 MB):
Level of Detail, PostGIS, and Materialized Views
We found a combination of these three to be the sweet spot for getting the payload down to a reasonable size. The strategy is as follows:
For each polygon in our geometries table, we created a corresponding row in a materialized view called geometries_simplified. Each row contains several simplified versions of the polygon, one per tolerance level (resolution).
Here is the code used to create the materialized view:
CREATE MATERIALIZED VIEW geometries_simplified AS
SELECT
  id AS geometry_id,
  ST_Simplify(geometry, 0.1) AS level_1,
  ST_Simplify(geometry, 0.01) AS level_2,
  ST_Simplify(geometry, 0.001) AS level_3,
  ST_Simplify(geometry, 0.0001) AS level_4
FROM geometries;
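Once the view exists, the application only has to choose which level column to read. A column name can't be passed as a bind parameter, so one simple option is to look it up in a fixed whitelist instead of interpolating user input. Here is a minimal Ruby sketch; `SIMPLIFIED_LEVELS` and `simplified_geojson_sql` are hypothetical names, while `ST_AsGeoJSON` is the standard PostGIS function for emitting GeoJSON.

```ruby
# Columns defined by the geometries_simplified materialized view.
SIMPLIFIED_LEVELS = {
  1 => "level_1", # ST_Simplify tolerance 0.1
  2 => "level_2", # 0.01
  3 => "level_3", # 0.001
  4 => "level_4"  # 0.0001
}.freeze

# Hypothetical helper: build a query returning one simplified geometry per
# polygon as GeoJSON. The column comes from the whitelist above, never from
# raw user input.
def simplified_geojson_sql(level)
  column = SIMPLIFIED_LEVELS.fetch(level) do
    raise ArgumentError, "unknown level: #{level.inspect}"
  end
  "SELECT geometry_id, ST_AsGeoJSON(#{column}) AS geojson " \
    "FROM geometries_simplified"
end

puts simplified_geojson_sql(2)
# SELECT geometry_id, ST_AsGeoJSON(level_2) AS geojson FROM geometries_simplified
```

Keep in mind that a materialized view stores its results, so after the underlying geometries change it needs a `REFRESH MATERIALIZED VIEW geometries_simplified;` before the simplified levels reflect the new data.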
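ST_Simplify is a built-in PostGIS function that uses the Douglas-Peucker algorithm to "smooth" or "generalize" a polygon by removing vertices that lie within the tolerance distance of the simplified outline (that is, minor changes in the direction of the shape). The tolerance is expressed in the units of the geometry's coordinate system, which means degrees if the data is stored as plain latitude/longitude. So, essentially, the higher the tolerance, the more aggressively the algorithm removes points.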
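To build some intuition for how the tolerance drives point removal, here is a minimal Ruby sketch of the classic recursive Douglas-Peucker algorithm on an open polyline. It is an illustration, not PostGIS's implementation: ST_Simplify also deals with closed polygon rings (whose first and last points coincide) and other edge cases that this sketch ignores.

```ruby
# Perpendicular distance from point p to the infinite line through a and b.
def perpendicular_distance(p, a, b)
  dx = b[0] - a[0]
  dy = b[1] - a[1]
  len = Math.hypot(dx, dy)
  return Math.hypot(p[0] - a[0], p[1] - a[1]) if len.zero?

  (dy * p[0] - dx * p[1] + b[0] * a[1] - b[1] * a[0]).abs / len
end

# Douglas-Peucker: keep the endpoints, find the interior point farthest from
# the segment joining them, and recurse on both halves only if that distance
# exceeds the tolerance; otherwise drop every interior point.
def douglas_peucker(points, tolerance)
  return points.dup if points.length < 3

  first = points.first
  last = points.last
  index = nil
  max_dist = -1.0
  (1...(points.length - 1)).each do |i|
    d = perpendicular_distance(points[i], first, last)
    if d > max_dist
      max_dist = d
      index = i
    end
  end

  if max_dist > tolerance
    left = douglas_peucker(points[0..index], tolerance)
    right = douglas_peucker(points[index..-1], tolerance)
    left[0...-1] + right # drop the duplicated split point
  else
    [first, last]
  end
end

zigzag = [[0, 0], [1, 1], [2, 0]]
p douglas_peucker(zigzag, 0.5) # [[0, 0], [1, 1], [2, 0]]
p douglas_peucker(zigzag, 2.0) # [[0, 0], [2, 0]]
```

The peak sits 1 unit off the baseline, so it survives a 0.5 tolerance but disappears at 2.0: the same point-dropping behavior that makes level_1 so small and so coarse.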
One caveat with this algorithm is that some polygons just cannot be generalized unless the tolerance is very low. The last level (level_4), with a tiny tolerance (0.0001), seemed to be the catch-all for my set of polygons, but at that tolerance hardly any points get removed (~1%).
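One way to use level_4 as a catch-all is to fall back to progressively finer levels for any polygon whose coarser version is missing. The Ruby sketch below rests on an assumption the post doesn't confirm: that a polygon which cannot be simplified at a given tolerance comes back as NULL (nil in Ruby). `LEVEL_COLUMNS` and `geometry_with_fallback` are hypothetical names, and the row values are placeholders.

```ruby
LEVEL_COLUMNS = %w[level_1 level_2 level_3 level_4].freeze

# Hypothetical helper: return [column, geometry] for the requested level,
# falling back to finer levels when a coarser one is nil.
def geometry_with_fallback(row, requested_level)
  start = LEVEL_COLUMNS.index(requested_level)
  raise ArgumentError, "unknown level: #{requested_level}" if start.nil?

  LEVEL_COLUMNS[start..-1].each do |column|
    geometry = row[column]
    return [column, geometry] unless geometry.nil?
  end
  [nil, nil] # no level available at all
end

row = { "level_1" => nil, "level_2" => nil, "level_3" => "geom_3", "level_4" => "geom_4" }
p geometry_with_fallback(row, "level_1") # ["level_3", "geom_3"]
```

In SQL, the same fallback can be written as `COALESCE(level_2, level_3, level_4)`. If the failed levels come back as empty geometries rather than NULLs, an extra check such as `ST_IsEmpty` would be needed instead.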
Here's what the level_1 geometry looked like. Pay attention to the payload size (~500 kB):
Even though this yielded the smallest payload (~500 kB), we decided that the polygons were too ugly to use. The sweet spot for us was level_2, which uses a tolerance of 0.01: it seemed to give the best balance of good-looking polygons and a still very small payload (~1 MB). With this tolerance level, we were able to cut the load time on a ~500 kb/s connection from ~3 minutes (~16 MB payload) to ~20 seconds (~1 MB payload).
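A back-of-the-envelope check on those numbers: transfer time is roughly the payload size in bits divided by the link speed. The Ruby sketch below assumes "kb/s" means kilobits per second and that 1 MB is 1,000,000 bytes; `transfer_seconds` is a hypothetical helper, and it ignores HTTP compression, latency, and browser parsing and rendering, so it won't reproduce the measured load times exactly.

```ruby
# Naive transfer-time estimate: payload bits divided by link bit rate.
# Ignores HTTP compression, latency, and client-side parsing/rendering.
def transfer_seconds(payload_bytes, bits_per_second)
  (payload_bytes * 8.0) / bits_per_second
end

LINK = 500_000 # assumed: 500 kilobits per second

puts transfer_seconds(1_000_000, LINK)  # 16.0  (~1 MB simplified payload)
puts transfer_seconds(16_000_000, LINK) # 256.0 (~16 MB full-resolution payload)
```

The useful takeaway is the scaling: on a slow link, load time tracks payload size, which is why shrinking the payload was the lever that mattered.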
I definitely plan on adding another post outlining what we do moving forward, as this is just the prototype phase. The future may or may not hold caching, displaying different resolutions based on network speed, loading spinners overlaid on the map, and spatial queries that only load polygons within the map frame.
Stay tuned for more on this!
Jan. 18 2017