Making interactive maps in Python using GeoJSON and GitHub
Recently, I’ve been involved with the AncientMetagenomeDir project. Briefly, with this collaborative community effort, we aimed to gather, in a single repository, the metadata of every published ancient DNA metagenomics article, and to turn them into FAIR (findable, accessible, interoperable, reusable) scientific data.
We ended up with large TSV (table) files gathering a standardized set of metadata about each ancient DNA metagenomics sample. Because these are originally archeological data, one piece of information that is systematically collected is the geographical location of each sample.
Usually, hosting interactive elements online requires some sort of backend framework (like Streamlit or Shiny) to perform the rendering. However, I wanted the map to be as serverless as possible, and this is where the GeoJSON rendering function of GitHub came to the rescue.
From TSV to GeoJSON
The question I was left with: how to go from a TSV table to a GeoJSON file?
Luckily for me, this is really easy to do thanks to GeoPandas.
I only needed to make sure that there were a latitude and a longitude column in the table.
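    import pandas as pd
    import geopandas

    # Read the tab-separated metadata table
    df = pd.read_csv("table.tsv", sep="\t")

    # One point per sample; points_from_xy takes x (longitude) first, then y (latitude)
    gdf = geopandas.GeoDataFrame(
        df, geometry=geopandas.points_from_xy(df.longitude, df.latitude)
    )

    # Write the result as a GeoJSON file
    gdf.to_file("output.geo.json", driver='GeoJSON')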
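To get a feel for what ends up in the output file, here is a minimal sketch, using only the standard library, of the kind of GeoJSON structure this conversion produces. It is not what GeoPandas runs internally; the helper name `rows_to_geojson` and the `site_name` key are hypothetical, chosen for illustration.

```python
import json


def rows_to_geojson(rows):
    """Build a GeoJSON FeatureCollection from dicts holding latitude/longitude keys."""
    features = []
    for row in rows:
        features.append({
            "type": "Feature",
            # Every column of the row is kept as a property of the feature
            "properties": dict(row),
            "geometry": {
                "type": "Point",
                # GeoJSON positions are [longitude, latitude], in that order
                "coordinates": [float(row["longitude"]), float(row["latitude"])],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# One illustrative row; the key names are assumptions, not the real column names
rows = [{"site_name": "Hässeldala Port", "latitude": 56.16, "longitude": 15.01}]
collection = rows_to_geojson(rows)
print(json.dumps(collection, indent=2, ensure_ascii=False))
```

The detail worth remembering is the coordinate order: longitude comes first, which is also why `points_from_xy` is given the longitude column before the latitude column.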
Instead of pushing to GitHub after every change to check the rendering, you can preview a GeoJSON map on the geojson.io website.
Displaying more metadata on the map
So far, I have only used the map to display the latitude and longitude of each sample, but we can display more information by changing the color, size, or shape of each marker (point), for example.
Referring again to the GitHub documentation, this corresponds to the
For example, to change the color, we add a marker-color column with the desired color value.
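| marker-color | DOI | site | latitude | longitude | sample | sample age | material | archive | accession |
|---|---|---|---|---|---|---|---|---|---|
| #009C54 | 10.1016/j.quascirev.2017.11.037 | Hässeldala Port | 56.16 | 15.01 | HA1.1 | 13900 | lake sediment | ENA | SRS2040659 |
| #C22026 | 10.3390/geosciences10070270 | Unknown | 53.322 | 1.118 | ELF001A_95_S81_ELFM1D1 | 6000 | shallow marine sediment | ENA | ERS3605424 |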
Table 1: Sample data from the AncientMetagenomeDir repository
Here, markers in pink are host-associated single genomes, while markers in light-blue are host-associated metagenomes.
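Once the table is converted, the marker-color column becomes a property on each GeoJSON feature. As a minimal sketch, assuming the color is chosen from a category such as the material, the property could be filled like this. The helper `add_marker_colors`, the `material` key, the palette mapping, and the grey fallback are all hypothetical; only the two color codes come from Table 1.

```python
def add_marker_colors(collection, key, palette, default="#7E7E7E"):
    """Set a marker-color property on every feature, based on properties[key]."""
    for feature in collection["features"]:
        category = feature["properties"].get(key)
        # Categories missing from the palette fall back to the default color
        feature["properties"]["marker-color"] = palette.get(category, default)
    return collection


# Illustrative palette; which category maps to which color is an assumption
palette = {
    "lake sediment": "#009C54",
    "shallow marine sediment": "#C22026",
}

collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"material": "lake sediment"},
            "geometry": {"type": "Point", "coordinates": [15.01, 56.16]},
        },
        {
            "type": "Feature",
            "properties": {"material": "dental calculus"},
            "geometry": {"type": "Point", "coordinates": [1.118, 53.322]},
        },
    ],
}

add_marker_colors(collection, key="material", palette=palette)
colors = [f["properties"]["marker-color"] for f in collection["features"]]
print(colors)
```

Keeping the palette in a single dictionary makes it easy to change the color scheme without touching the rest of the conversion.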
Preventing overlapping points
In this dataset, different samples sometimes come from the same archeological site. In practice, this means that points will overlap on the map because they share the exact same geographic coordinates. In Figure 2, for example, you can notice a very dark shadow below each marker: that’s because many overlapping markers are present on the spot.
The problem is that only one marker is displayed, while the others are hidden beneath it.
To overcome this issue, the trick is to slightly alter the coordinates of each sample so that they are plotted as distinct points on the map. I did that with NumPy, by drawing each new coordinate from a normal distribution centered on the original value, with a very small standard deviation.
    import pandas as pd
    import geopandas
    import numpy as np

    df = pd.read_csv("table.tsv", sep="\t")

    # Standard deviation of the jitter, in degrees
    sigma = 0.0015

    # Draw each new coordinate from a normal distribution centered on the original value
    df['new_latitude'] = df['latitude'].apply(lambda x: np.random.normal(x, sigma))
    df['new_longitude'] = df['longitude'].apply(lambda x: np.random.normal(x, sigma))

    # Build the points from the jittered coordinates
    gdf = geopandas.GeoDataFrame(
        df, geometry=geopandas.points_from_xy(df.new_longitude, df.new_latitude)
    )
    gdf.to_file("output.geo.json", driver='GeoJSON')
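Because the jitter is random, the map changes slightly every time the file is regenerated. A possible variant, sketched below under the assumption that reproducible output is wanted, uses a seeded NumPy generator and adds the noise to whole arrays at once. The function name `jitter_coordinates` and the seed value are hypothetical.

```python
import numpy as np


def jitter_coordinates(latitude, longitude, sigma=0.0015, seed=None):
    """Return jittered copies of the coordinates (Gaussian noise, std = sigma degrees)."""
    # A single seeded generator makes the output reproducible from run to run
    rng = np.random.default_rng(seed)
    lat = np.asarray(latitude, dtype=float)
    lon = np.asarray(longitude, dtype=float)
    # Independent draws for latitude and longitude, one per sample
    new_lat = lat + rng.normal(loc=0.0, scale=sigma, size=lat.shape)
    new_lon = lon + rng.normal(loc=0.0, scale=sigma, size=lon.shape)
    return new_lat, new_lon


# Three samples from the same site share identical coordinates
latitude = [56.16, 56.16, 56.16]
longitude = [15.01, 15.01, 15.01]

new_latitude, new_longitude = jitter_coordinates(latitude, longitude, seed=42)
print(new_latitude, new_longitude)
```

With a pandas table, the two returned arrays can be assigned directly to new columns. For scale, a sigma of 0.0015 degrees of latitude is roughly 170 m on the ground, so markers stay close to their site while becoming individually clickable.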
Problem solved!
Finally, thanks to the magic of GitHub GeoJSON rendering, the map can be easily embedded on any web page!
This is an interactive map: you can click on a marker to look at its details.