Making Nice Maps in Python

Plotting data on top of nicely rendered maps is a common need in business analytics. Let's look at how to do that in Python.

For our specific use case here, we want to provide a nice visual for customer density. Let’s assume we have a handy dataset of our customers, each of which has a latitude and longitude for their house. We want to start to understand some basic properties about the geospatial arrangement of our customer base. How dense are they?

We’re going to use the Python library tilemapbase to build our map layers. Its API is fairly quirky and low-level, and using it successfully requires some familiarity with its implementation details.

To start, we’re going to import a bunch of modules that will be useful as we build up our end result. Note that the code below will be presented mostly in notebook fashion.

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import tilemapbase
import logging

Then there’s some boilerplate needed to get things rolling. I’m using Seaborn on top of matplotlib, so we set some theme stuff and then do the one-time dance needed to prepare tilemapbase for further use. We’re using OpenStreetMap as our source for map data.

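%matplotlib inline
sns.set_theme(style="whitegrid", palette="pastel")

# One-time setup: create tilemapbase's local tile cache if it doesn't exist yet
tilemapbase.start_logging()
tilemapbase.init(create=True)
# Use OpenStreetMap as the tile source
osm = tilemapbase.tiles.build_OSM()
# Only show CRITICAL log messages so the notebook output stays clean
logging.getLogger().setLevel(logging.CRITICAL)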

Let’s load our customer file into a pandas DataFrame. For our simple example, let’s assume we have a very basic set of information about each customer. I’ve just mocked up some synthetic data corresponding to North Central Arkansas, where I live. In a real example, you would probably have many attributes to carry through the analysis, but here we focus on just where the customer lives and what branch they’re assigned to.

coltypes = {
    'customer_id': str,
    'branch_code': str,
    'branch_name': str,
    'latitude': np.float64,
    'longitude': np.float64
}
df = pd.read_csv('customers.csv', dtype=coltypes)
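
If you don’t have a customer file handy, here’s a rough sketch of one way to mock up a similar one using only the standard library. Everything specific in it is an assumption for illustration: the branch codes and names, the branch center coordinates (picked to land roughly in North Central Arkansas), and the spread of customers around each branch. It is not how the data in this post was generated.

```python
import csv
import io
import random


def make_mock_customers(n, seed=0):
    """Return n synthetic customer rows as a list of dicts."""
    rng = random.Random(seed)
    # Hypothetical branches: (code, name, center latitude, center longitude)
    branches = [
        ("B01", "North Branch", 36.35, -92.40),
        ("B02", "South Branch", 35.90, -92.10),
    ]
    rows = []
    for i in range(n):
        code, name, lat0, lon0 = rng.choice(branches)
        rows.append({
            "customer_id": f"C{i:05d}",
            "branch_code": code,
            "branch_name": name,
            # Scatter each customer around its branch (std dev in degrees)
            "latitude": round(rng.gauss(lat0, 0.15), 6),
            "longitude": round(rng.gauss(lon0, 0.15), 6),
        })
    return rows


def write_customers_csv(rows, handle):
    """Write rows to an open text handle with the columns read above."""
    fields = ["customer_id", "branch_code", "branch_name",
              "latitude", "longitude"]
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)


# Preview a few rows in memory instead of touching the filesystem
buffer = io.StringIO()
write_customers_csv(make_mock_customers(5), buffer)
print(buffer.getvalue())
```

To produce an actual file, open `customers.csv` for writing with `newline=''` and pass the handle to `write_customers_csv` instead of the in-memory buffer.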

First, let’s use this dataset to draw a region of the map that contains all our data points. To do this, we compute the bounding box from the min and max of both latitude and longitude, then add a fixed padding (half a degree here) on every side to leave a bit of margin around the data.

pad = 0.5
bbox = tilemapbase.Extent.from_lonlat(min(df['longitude']) - pad, 
                                      max(df['longitude']) + pad,
                                      min(df['latitude']) - pad,
                                      max(df['latitude']) + pad)
plotter = tilemapbase.Plotter(bbox, osm, width=800)
fig, ax = plt.subplots(figsize=(12, 8))
plotter.plot(ax, osm)

[Figure: Map of North Central Arkansas. Caption: Plotting a basic extent on a map]
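
The bounding-box arithmetic is easy to get subtly wrong (swapped arguments are the usual culprit), so it can help to pull it into a small helper. This is just a sketch: `padded_bounds` is a name I made up, and the clamping limits are my own assumption, chosen because Web Mercator can’t represent latitudes near the poles.

```python
def padded_bounds(lons, lats, pad=0.5):
    """Return (lon_min, lon_max, lat_min, lat_max) widened by pad degrees."""
    lons, lats = list(lons), list(lats)
    if not lons or not lats:
        raise ValueError("need at least one longitude and one latitude")
    # Assumed safe limit; Web Mercator breaks down close to the poles
    max_lat = 85.0
    lon_min = max(min(lons) - pad, -180.0)
    lon_max = min(max(lons) + pad, 180.0)
    lat_min = max(min(lats) - pad, -max_lat)
    lat_max = min(max(lats) + pad, max_lat)
    return lon_min, lon_max, lat_min, lat_max


print(padded_bounds([-93.0, -92.0], [35.5, 36.5]))
```

The tuple comes back in the same order the `Extent.from_lonlat` call above takes its arguments (longitudes first, then latitudes), so it can be unpacked straight into that call with `*padded_bounds(df['longitude'], df['latitude'])`.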

So we have our map. Let’s now add the individual data points corresponding to each customer.

xy = [(tilemapbase.project(lon, lat), br) for lat, lon, br in zip(df['latitude'], df['longitude'], df['branch_code'])]
locs = pd.DataFrame(data={'x': [x[0][0] for x in xy], 'y': [x[0][1] for x in xy], 'Branch': [x[1] for x in xy]})
sns.scatterplot(data=locs, x='x', y='y', alpha=0.7, hue='Branch', ax=ax)

ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
ax.set_title('Customer Density')

[Figure: Map of customer locations color-coded by branch. Caption: Adding a scatter plot to the map]

There’s a bit more going on in this code. We have to project the coordinates from standard latitude/longitude pairs into the coordinate space used by tilemapbase. To do this, I use a list comprehension to build a list of tuples. The first element of each tuple is the projected (x, y) pair returned by tilemapbase.project, which takes the longitude first and the latitude second. The second element is a string with the branch code. Then we create a data frame with the transformed coordinates and branch labels and use that to draw our scatter plot. The axis ticks aren’t especially useful here, so we hide them and put a title up.
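
To make the projection step less mysterious, here’s a standalone sketch of the standard Web Mercator mapping from degrees into the unit square. As far as I can tell this is the same math tilemapbase.project does, but treat that as my assumption rather than a guarantee; the function name `web_mercator_unit` is mine.

```python
import math


def web_mercator_unit(longitude, latitude):
    """Map degrees to (x, y) in the unit square used by web map tiles."""
    # x runs from 0 at 180 degrees west to 1 at 180 degrees east
    x = (longitude + 180.0) / 360.0
    lat_rad = math.radians(latitude)
    # y runs from 0 near the north pole to 1 near the south pole
    y = (1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0
    return x, y


print(web_mercator_unit(0.0, 0.0))  # (0.5, 0.5)
```

Note that y gets larger as you move south, which is the opposite of latitude. That’s worth keeping in mind if a plot ever comes out upside down.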

We could continue further by, e.g., adding a density plot on top of the scatter plot.

sns.kdeplot(data=locs, x='x', y='y', ax=ax, alpha=0.3, fill=True, gridsize=100)

[Figure: Map of customer locations color-coded by branch with a kernel density overlay. Caption: Adding a density plot to the map]
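
If the density overlay feels like a black box, here’s a stripped-down sketch of a 2D Gaussian kernel density estimate. It uses a single hand-picked bandwidth, which is a simplifying assumption on my part; seaborn chooses its bandwidth automatically, so its contours won’t match this exactly. The idea is the same, though: every customer contributes a little bump, and the bumps add up where customers cluster.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return a function that estimates density at (x, y) from the points."""
    points = list(points)
    if not points:
        raise ValueError("need at least one point")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # Normalizing constant so the estimate integrates to 1 over the plane
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            dist_sq = (x - px) ** 2 + (y - py) ** 2
            # Each point adds a Gaussian bump centered on itself
            total += math.exp(-dist_sq / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Three customers bunched together and one outlier (made-up coordinates)
estimate = gaussian_kde_2d([(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0)], 0.2)
print(estimate(0.0, 0.0) > estimate(1.0, 1.0))
```

A smaller bandwidth gives spikier, more local hot spots; a larger one smooths everything into broad blobs.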

The basic idea with tilemapbase is just to project your coordinates, plot the map using the Plotter object, and then continue to build on the axes using whatever normal matplotlib/seaborn code you like.

Licensed under CC BY-NC-SA 4.0