# Topological simplification of simple features

## The case for topology

Simple feature representations of polygon geometries are ubiquitous due to their ease of use. Thinking of spatial features as having a single, independent geometry is easy and fits most use cases. But that ease of use disappears when we need to represent the topological relationship between features.

In this article, I'll focus on one particular task with simple features data that would benefit from topology - namely simplifying a polygon dataset by removing vertices. Here's the original dataset, a 30+MB shapefile with very dense line work.

Geometries *can* be simplified under the Simple Features model but, since
each geometry is processed independently, the **topological relationships between
features can be disrupted**. For instance, using the `Simplify Geometries`

tool in
QGIS, I can simplify the polygons dramatically but we see gaps between polygons
and other side effects.

## The plan

Because we'll need to *build* topology before acting on it, the process for simplifying
simple features datasets involves converting the data to topological structure,
simplifying it, then converting it back to a simple features representation.

Many of the big GIS systems (ESRI's .e00, ArcInfo "coverages", and GRASS vectors) have their own topological data structures. More recently, we've seen the rise of Open Street Map (OSM) format and TopoJSON, both of which model topological relationships.

Of these options, I selected TopoJSON because of it's robust command-line tool which handles building topology and simplification in one step. Additionally, it works with GeoJSON and Shapefile inputs, two of the most common data formats for simple features.

The workflow goes something like this:

- Convert data into a shapefile with the EPSG:4326 spatial reference (lonlat, wgs84)
- Convert to topojson and simplify
- Convert to geojson
- Optionally, convert geojson to other formats supported by OGR

To follow along, you'll need to have the following software installed:

- GDAL command line utilities (we'll use
`ogr2ogr`

at the command line)`apt-get install gdal-bin`

- The
`topojson`

command line utility`npm install -g topojson`

- Python with the
`shapely`

package installed.`pip install shapely`

## Step 1: Convert to WGS84 shapefile

If you're already working with an ESRI Shapefile or GeoJSON format and your data is already in unprojected WGS84 coordinates (i.e. EPSG:4326), you can skip to step 2.

Otherwise, `ogr2ogr`

makes that conversion simple:

```
ogr2ogr -t_srs epsg:4326 -f "ESRI Shapefile" \
ecoregions_original.shp EcoregionSummaries3.gdb.zip EcoRegions
```

## Step 2: Convert to TopoJSON and simplify

The simplification, quantization (more on that later) and the conversion to
a topological data model are handled by `topojson`

You have two options for specifying how aggressively you want to simplify your data.

- Use a tolerance, specified in steridians with the
`-s`

flag - Use a proportion of points, 0 to 1, to retain with the
`--simplify-proportion`

flag

One quirk of the topojson implementation is that it uses a relatively low quantization factor by default.
Effectively, this snaps coordinates to a grid in order to save space and simplify geometries even further.
This yields nice small coordinates but can result in a "stair step" effect at higher
zoom levels. The default is `-q 1E4`

but I've found good results with `-q 1E6`

as
recommended in the topojson docs.

As an example, let's take our `ecoregions_original.shp`

and convert it to topojson
with a tolerance of `1E-8`

steridians. We want to make sure we explicitly mention
that the data is in spherical (unprojected) coordinates and to retain the properties
of the original attribute table:

```
topojson --spherical \
--properties \
-s 1E-8 \
-q 1E6 \
-o temp.topojson \
ecoregions_original.shp
```

## Step 3: Convert to GeoJSON

This part was a bit trickier than I anticipated. Luckily Sean Gillies has written some preliminary python functions for converting topojson geometries to standard GeoJSON-like python dictionaries.

In order to make a higher-level conversion utility, I started working on topo2geojson.py which provides a command line interface to perform TopoJSON to GeoJSON conversions.

```
python topo2geojson.py temp.topojson ecoregions_simple.geojson
```

There is some additional logic to ensure validity of polygons though it is very basic and I'm sure there are ways to make the geometry conversions more robust. Please note that I've only tested this script on this one dataset and it likely needs additional work to be considered a full-fledged conversion tool; consider it more of a starting point than an out-of-the box solution.

## Optional Step 4: Convert to any OGR format

Once data is in GeoJSON format, we're free to do what we want with it, including converting it back to a shapefile or any other OGR supported data format.

```
ogr2ogr -f "ESRI Shapefile" ecoregions_simple.shp ecoregions_simple.geojson OGRGeoJson
```

# Case study: evaluating simplification tolerances

In the remainder of this article, I'll walk through a demonstration of these steps
in order to find an optimal simplification tolerance for my test data. The optimal
tolerance depends on your needs, what scales you will be using your data and how
aggressively you need to reduce file size. Ultimately, it's a ** tradeoff between
low geometry size and accurate line work**.

We can easily script this solution in order to test multiple simplification tolerances. As a bonus, we can fire off multiple iterations at once to leverage multiple cores. Since I've got 4 cores on my laptop, I can run 4 processes in nearly the same time it takes to run 1 using some simple shell tricks (Linux/OSX only; sorry Windows users but I don't know .bat files well enough to demonstrate)

```
for tolerance in 1E-7 1E-8 1E-9 1E-10
do
topojson --spherical \
--properties \
-s $tolerance \
-q 1E6 \
-o temp_$tolerance.topojson \
ecoregions_original.shp &&
# Convert it to GeoJSON
python topo2geojson.py temp_$tolerance.topojson temp_$tolerance.geojson &&
# Optionally, convert GeoJSON to any OGR data source
ogr2ogr -f "ESRI Shapefile" ecoregions_$tolerance.shp temp_$tolerance.geojson OGRGeoJson &
done
wait
```

Then we can take a look at the resulting .topojson file sizes

```
$ ls -lh *.topojson
-rw-rw-r-- 1 mperry mperry 4.5M Jan 11 12:25 temp_1E-10.topojson
-rw-rw-r-- 1 mperry mperry 2.1M Jan 11 12:25 temp_1E-9.topojson
-rw-rw-r-- 1 mperry mperry 869K Jan 11 12:25 temp_1E-8.topojson
-rw-rw-r-- 1 mperry mperry 362K Jan 11 12:25 temp_1E-7.topojson
```

OK, so with a simplification tolerance of 1E-10 steridians, we can get a 4.5M file. If we reduce it to 1E-7, we can get 362k file - a 12.5x reduction. Is the reduction in file size worth the reduction in geometric accuracy? The only way to find out is to render maps of the resulting datasets and visually assess them.

Original | 1E-7 | 1E-8 | 1E-9 | |
---|---|---|---|---|

First thing that we notice - all of the results have retained topology with no gaps or slivers introduced. (the key benefit to this workflow).

Next, we notice that at this scale (roughly 1:500k on my monitor) we can barely see a difference between the 1E-9 version and the original. And the 1E-7 version looks a bit too simplified and chunky. So, in this case, we can say that a simplification tolerance of around 1E-8 steridians is an optimal balance of file size and detail.

Of course other datasets, scales and uses may have completely different results so please try it out and let me know how it goes. Just don't settle for simple features simplification next time you need to reduce file sizes!