Map-Matching for Trajectory Prediction | by João Paulo Figueira | Jul, 2023

Where are you going? Should you be going that way?

Photo by Mateusz Wacławek on Unsplash

This article presents a method to predict vehicle trajectories on a digital road network using a database of past trips sampled from noisy GPS sensors. Besides predicting future directions, this method also assigns a probability to an arbitrary sequence of locations.

Central to this idea is a digital map onto which we project all sampled locations by aggregating them into individual trajectories and matching them to the map. This matching process discretizes the continuous GPS space into predetermined locations and sequences. After encoding these locations into unique geospatial tokens, we can more easily predict sequences, evaluate the probability of current observations, and estimate future directions. This is the gist of this article.

What problems am I trying to solve here? If you need to analyze vehicle path data, you might need to answer questions like those in the article’s sub-heading.

Where are you going? Should you be going that way?

How do you evaluate the probability that the path under observation follows frequently traveled directions? This is an important question because, by answering it, you could program an automated system to classify trips according to their observed frequency. A new trajectory with a low score would cause concern and prompt immediate flagging.

How do you predict which maneuvers the vehicle will do next? Will it keep going straight ahead, or will it turn right at the next intersection? Where do you expect to see the vehicle in the next ten minutes or ten miles? Quick answers to these questions will help an online tracking software solution provide answers and insights to delivery planners, online route optimizers, or even opportunity charging systems.

The solution I’m presenting here uses a database of historical trajectories, each consisting of a timed sequence of positions generated by the motion of a specific vehicle. Each positional record must contain the time, the position information, a reference to the vehicle identifier, and the trajectory identifier. A vehicle has many trajectories, and each trajectory has many positional records. A sample of our input data is depicted in Figure 1 below.

Figure 1: The table above shows a small sample of a trajectory from the Extended Vehicle Energy Dataset. Although each row contains more properties than the ones displayed, we only need the implicit order and the locations. Note that there are many duplicated locations due to the sampling strategy. We will handle this issue later on. (Image source: Author)

I drew the data above from the Extended Vehicle Energy Dataset (EVED) [1] article. You can build the corresponding database by following the code in one of my previous articles.

Our first job is to match these trajectories to a supporting digital map. The purpose of this step is not only to eliminate the GPS data sampling errors but, most importantly, to coerce the acquired trip data to an existing road network where each node and edge is known and fixed. Each recorded trajectory is thus converted from a sequence of geospatial locations into another sequence of numeric tokens coinciding with the existing digital map nodes. Here, we will use open-source data and software, with map data sourced from OpenStreetMap (compiled by Geofabrik), the Valhalla map-matching package, and H3 as the geospatial tokenizer.

Edge Versus Node Matching

Map-matching is more nuanced than it might look at first sight. To illustrate what this concept entails, let us look at Figure 2 below.

Figure 2: The map above shows a noisy trajectory taken from the EVED in blue. As you can see, it does not match the nearest road and needs matching to the map. Once we project the blue line vertices to the map, we obtain a sequence of projections of the original points to the inferred map edges, and you can see the resulting trajectory in red. Still, this path misses the underlying map in some places, most notably in the image’s center, where the red line jumps between roads. We aim to reconstruct the trip’s path on the map, as represented by the green line that follows the underlying map nodes. (Image source: Author using Folium and OpenStreetMap imagery)

Figure 2 above shows that we can derive two trajectories from an original GPS sequence. We obtain the first trajectory by projecting the original GPS locations onto the nearest (and most likely) road network segments. As you can see, the resulting polyline will only sometimes follow the road because the map uses graph nodes to define its basic shapes. By projecting the original locations to the map edges, we get new points that belong to the map but may stray from the map’s geometry when connected to the next ones by a straight line.

By projecting the GPS trajectory to the map nodes, we get a path that perfectly overlays the map, as shown by the green line in Figure 2. Although this path better represents the originally driven trajectory, it does not necessarily have a one-to-one location correspondence with the original. Fortunately, this will be fine for us because we will always map-match any trajectory to the map nodes, so we will always get coherent data, with one exception. The Valhalla map-matching code always edge-projects the initial and final trajectory points, so we will systematically discard them as they do not correspond to map nodes.

H3 Tokenization

Unfortunately, Valhalla does not report the unique road network node identifiers, so we must convert the node coordinates to unique integer tokens for later sequence frequency calculation. This is where H3 enters the picture, allowing us to encode each node’s coordinates uniquely into a sixty-four-bit integer. We pick the Valhalla-generated polyline, strip the initial and final points (these points are not nodes but edge projections), and map all remaining coordinates to level 15 H3 indices.

The Dual Graph

Using the process above, we convert each historical trajectory into a sequence of H3 tokens. The next step is to convert each trajectory to a sequence of token triplets. Three values in a sequence represent two consecutive edges of the prediction graph, and we want to know the frequencies of these, as they will be the core data for both the prediction and the probability assessment. Figure 3 below depicts this process visually.

Figure 3: The list of geospatial tokens on the left is expanded to another list of triplets, representing a dual view of the implicit graph. Each token is a node on the geospatial graph, and its sequence represents the edges. The transformed list considers each edge a node in the dual graph, and the middle token is the new edge, as shown in the right column. (Image source: Author)

The transformation above computes the dual of the road graph, reversing the roles of the original nodes and edges.

We can now start to answer the proposed questions.

Should you be going that way?

We need to know the vehicle trajectory up to a given moment to answer this question. We map-match and tokenize the trajectory using the same process as above and then compute the frequency of each trajectory triplet using the known historical frequencies. The final result is the product of all individual frequencies. If the input trajectory has an unknown triplet, its frequency will be zero, and so will the final path probability.

A triplet probability is the ratio of the count of a specific sequence (A, B, C) to the count of all (A, B, *) triplets, as depicted in Figure 4 below.

Figure 4: The triplet probability is the ratio of its frequency to the frequency of all triplets with the same two initial tokens. (Image source: Author)
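The ratio described above can be written out explicitly. In this sketch, N(A, B, C) is notation assumed here for the number of times the triplet (A, B, C) occurs in the historical database, and X ranges over every token observed after the pair (A, B):

```latex
P(C \mid A, B) = \frac{N(A, B, C)}{\sum_{X} N(A, B, X)}
```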

The trip probability is just the product of the individual trip triplet probabilities, as depicted in Figure 5 below.

Figure 5: The trip probability is the simple product of all matched triplets. (Image source: Author)
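As a worked form of that product, assume (as notation introduced here) that a matched trip is the token sequence t_1, ..., t_n with n of at least three, so that it contains n - 2 consecutive triplets:

```latex
P(\text{trip}) = \prod_{i=1}^{n-2} P\left(t_{i+2} \mid t_i, t_{i+1}\right)
```

A single unseen triplet contributes a factor of zero, which is why one unknown transition drives the whole trip probability to zero.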

Where are you going?

We use the same principles to answer this question but start with the last known triplet only. Using this triplet as input, we can predict the k most likely successors by enumerating all triplets whose first two tokens are the last two of the input. Figure 6 below illustrates the process for triplet sequence generation and evaluation.

Figure 6: In this fictitious case, the next most likely triplet is the one with the highest observed frequency: (B, C, D). (Image source: Author)

We can extract the top k successor triplets and repeat the process to predict the most likely trip.
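A minimal sketch of this successor lookup, assuming the triplet counts are already held in memory, could look as follows. The counts, the letter tokens, and the function name `top_k_successors` are made up for illustration; the article's own code reads the frequencies from a database.

```python
from collections import Counter

# Hypothetical triplet counts; letters stand in for H3 tokens.
triplet_counts = Counter({
    ("A", "B", "C"): 10,
    ("B", "C", "D"): 6,
    ("B", "C", "E"): 3,
    ("B", "C", "F"): 1,
})


def top_k_successors(counts, last_triplet, k):
    """Return the k most frequent triplets that continue `last_triplet`."""
    _, b, c = last_triplet
    # Candidates are the triplets whose first two tokens equal the
    # last two tokens of the input triplet.
    candidates = Counter(
        {triplet: n for triplet, n in counts.items() if triplet[:2] == (b, c)}
    )
    return candidates.most_common(k)


best = top_k_successors(triplet_counts, ("A", "B", "C"), k=2)
```

Here `best` holds the two most frequent continuations of (A, B, C), each paired with its observed count.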

We are ready to discuss the implementation details, starting with map-matching and some associated concepts. Next, we will see how to use the Valhalla toolset from Python, extract the matched paths, and generate the token sequences. The data preprocessing step will be over once we store the result in the database.

Finally, I illustrate a simple user interface using Streamlit that calculates the probability of any hand-drawn trajectory and then projects it into the future.


Map-matching converts GPS coordinates sampled from a moving object’s path into an existing road graph. A road graph is a discrete model of the underlying physical road network consisting of nodes and connecting edges. Each node corresponds to a known geospatial location along the road, encoded as a latitude, longitude, and altitude tuple. Each directed edge connects adjacent nodes following the underlying road and contains many properties such as the heading, maximum speed, road type, and more. Figure 7 below illustrates the concept with a straightforward example.

Figure 7: The picture above shows a tiny digital road network highlighting an intersection. Each red dot represents a known geospatial location along the existing road. The blue lines represent the connecting edges between the nodes. Note that these edges are usually directed and that more than one edge may connect the same nodes. (Image source: Author)

When successful, the map-matching process produces relevant and valuable information on the sampled trajectory. On the one hand, the process projects the sampled GPS points to locations along the most likely road graph edges. The map-matching process “corrects” the observed spots by squarely placing them over the inferred road graph edges. On the other hand, the method also reconstructs the sequence of graph nodes by providing the most likely path through the road graph corresponding to the sampled GPS locations. Note that, as previously explained, these outputs are different. The first output contains coordinates along the edges of the most likely path, while the second output consists of the reconstructed sequence of graph nodes. Figure 8 below illustrates the process.

Figure 8: The diagram above illustrates the map-matching process, where the green dots represent the observed GPS coordinates, and the orange diamonds are the projected locations along the known edges. Note that, for the simplified example above, we can only safely reconstruct the path between nodes 2 and 3. This predicament is not as dire as it looks because, in actual maps, trajectories match many more edges than just one, so the information loss is minimal. (Image source: Author)

A byproduct of the map-matching process is the standardization of the input locations using a shared road network representation, especially when considering the second output type: the most likely sequence of nodes. When converting sampled GPS trajectories to a series of nodes, we make them comparable by reducing the inferred path to a series of node identifiers. We can think of these node sequences as phrases of a known language, where each inferred node identifier is a word, and their arrangement conveys behavioral information.

This is the fifth article where I explore the Extended Vehicle Energy Dataset (EVED) [1]. This dataset is an enhancement and review of prior work and provides the map-matched versions of the original GPS-sampled locations (the orange diamonds in Figure 8 above).

Unfortunately, the EVED only contains the projected GPS locations and lacks the reconstructed road network node sequences. In my previous two articles, I addressed the issue of rebuilding the road segment sequences from the transformed GPS locations without map-matching. I found the result somewhat disappointing, as I expected fewer than the observed 16% of faulty reconstructions. You can follow this discussion from the articles below.

Now I am looking at the source map-matching tool to see how far it can go in correcting the faulty reconstructions. So let’s put Valhalla through its paces. Below are the steps, references, and code I used to run Valhalla on a Docker container.

Valhalla Setup

Here I closely follow the instructions provided by Sandeep Pandey [2] on his blog.

First, make sure that you have Docker installed on your machine. To install the Docker engine, please follow the online instructions. If you work on a Mac, a great alternative is Colima.

Once installed, you must pull a Valhalla image from GitHub by issuing the following commands at your command line, as the shell code in Figure 9 below depicts.

Figure 9: Pulling Valhalla’s Docker image from the command line. (Image source: Author)

While executing the above commands, you may have to enter your GitHub credentials. Also, ensure you have cloned this article’s GitHub repository, as some files and folder structures refer to it.

Once done, you should open a new terminal window and issue the following command to start the Valhalla API server (macOS, Linux, WSL):

Figure 10: The above command runs the pulled Valhalla image in a Docker container. During first-time execution, the command also downloads and prepares the latest Geofabrik Michigan data file before starting. (Image source: Author)

The command line above explicitly states which OSM file to download from the Geofabrik service: the latest Michigan file. This specification means that, when executed for the first time, the server will download and process the file and generate an optimized database. In subsequent calls, the server omits these steps. When needed, delete everything under the target directory to refresh the downloaded data and spin up Docker again.

We can now call the Valhalla API with a specialized client.

Enter PyValhalla

This spin-off project simply offers packaged Python bindings to the fantastic Valhalla project.

Using the PyValhalla Python package is quite simple. We start with a neat install procedure using the following command line.

Figure 11: You can install PyValhalla using pip. (Image source: Author)

In your Python code, you must import the required references, instantiate a configuration from the processed Geofabrik files, and finally create an Actor object, your gateway to the Valhalla API.

Figure 12: The code above shows how simple it is to set up PyValhalla in a Python application or notebook. (Image source: Author)

Before we call the Meili map-matching service, we must get the trajectory GPS locations using the function listed below in Figure 13.

Figure 13: The function above loads the unique positions of a vehicle’s trajectory, returning a Pandas DataFrame with latitude, longitude, and timestamp. (Image source: Author)

We can now set up the parameter dictionary to pass into the PyValhalla call to trace the route. Please refer to the Valhalla documentation for more details on these parameters. The function below calls the map-matching feature in Valhalla (Meili) and is included in the data preparation script. It illustrates how to determine the inferred route from a Pandas data frame containing the observed GPS locations encoded as latitude, longitude, and time tuples.

Figure 14: The function above accepts a PyValhalla Actor object and a Pandas DataFrame containing the source path and returns a map-matched string-encoded polyline. This string is later decoded into a list of geospatial locations corresponding to the digital map network nodes, except for the extremities, which are edge-projected. (Image source: Author)
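A request of this kind might be assembled as in the sketch below. The helper name `build_trace_request`, the sample coordinates, and the option values are assumptions for illustration, not the author's settings; `shape`, `costing`, `shape_match`, and `trace_options` are fields of Valhalla's map-matching request.

```python
import json


def build_trace_request(points, search_radius=5.0):
    """Build a map-matching request from (lat, lon, epoch_seconds) tuples.

    The option values below are illustrative assumptions only.
    """
    return {
        # One entry per observed GPS sample, in time order.
        "shape": [{"lat": lat, "lon": lon, "time": t} for lat, lon, t in points],
        "costing": "auto",          # match against the car road network
        "shape_match": "map_snap",  # always run map-matching on the input
        "trace_options": {"search_radius": search_radius},
    }


# Two made-up samples near Ann Arbor, one second apart.
request = build_trace_request([(42.2808, -83.7430, 0), (42.2811, -83.7425, 1)])
payload = json.dumps(request)  # JSON text, as the HTTP API would receive it
```

With PyValhalla, a dictionary like this would typically be passed to `Actor.trace_route`, and the matched geometry read from the `shape` field of the first leg of the returned trip.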

The above function returns the matched path as a string-encoded polyline. As illustrated in the data preparation code below, we can easily decode the returned string using a PyValhalla library call. Note that this function returns a polyline whose first and last locations are projected to edges, not graph nodes. You will see these extremities removed by code later in the article.
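To make the decoding step concrete, here is a pure-Python stand-in for the library call, written as a sketch and not taken from the article's repository. It implements the standard encoded-polyline algorithm; Valhalla encodes shapes with six decimal digits of precision, hence the default. The names `decode_polyline` and `strip_extremities` are hypothetical.

```python
def decode_polyline(encoded, precision=6):
    """Decode an encoded polyline string into a list of (lat, lon) tuples."""
    factor = 10 ** precision
    coords, index, lat, lon = [], 0, 0, 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # latitude delta first, then longitude delta
            shift, result = 0, 0
            while True:
                byte = ord(encoded[index]) - 63
                index += 1
                result |= (byte & 0x1F) << shift
                shift += 5
                if byte < 0x20:  # no continuation bit: last chunk of the value
                    break
            # The lowest bit carries the sign of the delta.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lon += deltas[1]
        coords.append((lat / factor, lon / factor))
    return coords


def strip_extremities(coords):
    """Drop the first and last points, which are edge projections, not nodes."""
    return coords[1:-1]
```

Applying `strip_extremities` to the decoded list leaves only the points that correspond to map nodes, which is the input the tokenization step expects.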

Let us now look at the data preparation phase, where we convert all the trajectories in the EVED database into a set of map edge sequences from which we can derive pattern frequencies.

Data preparation aims to convert the noisy GPS-acquired trajectories into sequences of geospatial tokens corresponding to known map locations. The main code iterates through the existing trips, processing one at a time.

In this article, I use an SQLite database to store all the data processing results. We start by filling the matched trajectory path. You can follow the description using the code in Figure 15 below.

Figure 15: The code above contains the preprocessing data loop. This loop iterates through the known trajectories, computes their map-matched paths (if any), tokenizes the nodes, and expands them into triplets. The code stores all intermediate and final results in the database. (Image source: Author)

For each trajectory, we instantiate an object of the Actor type (line 9). This is an undocumented requirement, as each call to the map-matching service requires a new instance. Next, we load the trajectory points (line 13) acquired by the vehicles’ GPS receivers with the added noise, as stated in the original VED article. In line 14, we make the map-matching call to Valhalla, retrieve the string-encoded matched path, and save it to the database. Next, we decode the string into a list of geospatial coordinates, remove the extremities (line 17), and then convert them to a list of H3 indices computed at level 15 (line 19). On line 23, we save the converted H3 indices and the original coordinates to the database for later reverse mapping. Finally, on lines 25 to 27, we generate a sequence of 3-tuples based on the H3 indices list and save them for later inference calculations.

Let’s go through each of these steps and explain them in detail.

Trajectory Loading

We have seen how to load each trajectory from the database (see Figure 13). A trajectory is a time-ordered sequence of sampled GPS locations, each encoded as a latitude and longitude pair. Note that we are not using the matched versions of these locations as provided by the EVED data. Here, we use the noisy and original coordinates as they existed in the initial VED database.

Map Matching

The code that calls the map-matching service is already presented in Figure 14 above. Its central issue is the configuration settings; other than that, it is a pretty straightforward call. Saving the resulting encoded string to the database is also simple.

Figure 16: The code above saves the encoded polyline string to the new database. (Image source: Author)

On line 17 of the main loop (Figure 15), we decode the geometry string into a list of latitude and longitude tuples. Note that this is where we strip out the initial and final locations, as they are not projected to nodes. Next, we convert this list to its corresponding H3 token list on line 19. We use the maximum detail level to try to avoid overlaps and ensure a one-to-one relationship between H3 tokens and map graph nodes. We insert the tokens in the database in the following two lines. First, we save the whole token list, associating it with the trajectory.

Figure 17: The function above inserts the trajectory H3 token list in the database. (Image source: Author)

Next, we insert the mapping of node coordinates to H3 tokens to enable drawing polylines from a given list of tokens. This feature will be helpful later on when inferring future trip directions.

Figure 18: We insert a mapping between H3 tokens and node coordinates to enable the reconstruction of a trajectory from given inferred tokens. (Image source: Author)

We can now generate and save the corresponding token triples. The function below uses the newly generated list of H3 tokens and expands it to another list of triples, as detailed in Figure 3 above. The expansion code is depicted in Figure 19 below.

Figure 19: The code above converts a list of H3 tokens into a list of the corresponding triples. (Image source: Author)
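One compact way to express this expansion, offered as a sketch and not necessarily the repository's implementation, is a sliding window of width three over the token list. The function name `expand_triples` and the sample tokens are assumptions.

```python
def expand_triples(tokens):
    """Expand [a, b, c, d] into [(a, b, c), (b, c, d)]."""
    # zip stops at the shortest input, so lists with fewer than
    # three tokens yield no triples at all.
    return list(zip(tokens, tokens[1:], tokens[2:]))


# Small integers stand in for 64-bit H3 indices.
triples = expand_triples([101, 102, 103, 104, 105])
```

A trajectory with n tokens produces n - 2 triples, so very short matched paths contribute nothing to the frequency table.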

After triplet expansion, we can finally save the final product to the database, as shown by the code in Figure 20 below. Through clever querying of this table, we will infer current trip probabilities and future most-likely trajectories.

Figure 20: The function above saves the H3 triples to the database. This is the final step of the data preparation phase. We can now move on to exploring the information we collected. (Image source: Author)

We are now done with one cycle of the data preparation loop. Once the outer loop is completed, we have a new database with all the trajectories converted to token sequences that we can explore at will.

You can find the whole data preparation code in the GitHub repository.

We now turn to the problem of estimating current trip probabilities and predicting future directions. Let’s start by defining what I mean by “current trip probabilities.”

Trip Probabilities

We start with an arbitrary path projected onto the road network nodes through map-matching. Thus, we have a sequence of nodes from the map and want to assess how probable that sequence is, using the known trip database as a frequency reference. We use the formula in Figure 5 above. In a nutshell, we compute the product of the probabilities of all individual token triplets.

To illustrate this feature, I implemented a simple Streamlit application that allows the user to draw an arbitrary trip over the covered Ann Arbor area and immediately compute its probability.

Once the user draws points on the map representing the trip or the hypothetical GPS samples, the code map-matches them to retrieve the underlying H3 tokens. From then on, it is a simple matter of computing the individual triplet frequencies and multiplying them to compute the total probability. The function in Figure 21 below computes the probability of an arbitrary trip.

Figure 21: The function above computes an arbitrary path probability from the triplet frequency database. (Image source: Author)
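The computation can be sketched with an in-memory table of triplet counts standing in for the database. The function name `trip_probability`, the letter tokens, and the counts are assumptions made for this example.

```python
from collections import Counter


def trip_probability(tokens, triplet_counts):
    """Multiply the conditional probabilities of all consecutive triplets."""
    # Total count of every (A, B, *) group, keyed by the leading pair.
    pair_totals = Counter()
    for (a, b, _), n in triplet_counts.items():
        pair_totals[(a, b)] += n

    probability = 1.0
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        count = triplet_counts.get((a, b, c), 0)
        if count == 0:
            return 0.0  # an unseen triplet zeroes the whole trip
        probability *= count / pair_totals[(a, b)]
    return probability


# Hypothetical counts: after (A, B), C was seen 3 times and D once.
counts = {
    ("A", "B", "C"): 3,
    ("A", "B", "D"): 1,
    ("B", "C", "E"): 2,
    ("B", "C", "F"): 2,
}
p = trip_probability(["A", "B", "C", "E"], counts)  # (3/4) * (2/4)
```

In this sketch a trip with fewer than three tokens has no triplets and returns 1.0, the empty product; whether that case deserves separate handling is a design choice left open here.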

The code gets help from another function that retrieves the successors of any existing pair of H3 tokens. The function listed below in Figure 22 queries the frequency database and returns a Python Counter object with the counts of all successors of the input token pair. When the query finds no successors, the function returns the None constant. Note how the function uses a cache to improve database access performance (code not listed here).

Figure 22: The function above queries the frequency database for the known successors of any pair of H3 tokens and returns a Counter object with the counts of all successors. (Image source: Author)
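A self-contained sketch of such a lookup is shown below, using an in-memory SQLite database. The table name `triple`, its column names, and the use of `functools.lru_cache` for caching are assumptions for illustration; the article's schema and cache may differ.

```python
import sqlite3
from collections import Counter
from functools import lru_cache

# In-memory stand-in for the triplet table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (t0 INTEGER, t1 INTEGER, t2 INTEGER)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [(1, 2, 3), (1, 2, 3), (1, 2, 4), (2, 3, 5)],
)


@lru_cache(maxsize=None)
def get_successors(t0, t1):
    """Return a Counter of successor tokens for the pair (t0, t1), or None."""
    rows = conn.execute(
        "SELECT t2, COUNT(*) FROM triple WHERE t0 = ? AND t1 = ? GROUP BY t2",
        (t0, t1),
    ).fetchall()
    # Each row is (successor_token, count); an empty result means the
    # pair was never followed by anything in the historical data.
    return Counter(dict(rows)) if rows else None


successors = get_successors(1, 2)
```

Because the cache hands back the same Counter object on repeated calls, callers should treat the result as read-only.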

I designed both functions such that the computed probability is zero when no known successors exist for any given node.

Let us look at how we can predict a trajectory’s most probable future path.

Predicting Directions

We only need the last two tokens from a given running trip to predict its most likely future directions. The idea involves expanding all the successors of that token pair and selecting the most frequent ones. The code below shows the function that serves as the entry point to the directions prediction service.

Figure 23: The function above populates a FeatureGroup object from Folium with the predicted paths of the existing user-provided trip. (Image source: Author)

The above function starts by retrieving the user-drawn trajectory as a list of map-matched H3 tokens and extracting the last pair. We call this token pair the seed and will expand it further in the code. At line 9, we call the seed-expansion function, which returns a list of polylines corresponding to the input expansion criteria: the maximum branching per iteration and the total number of iterations.

Let us see how the seed expansion function works by following the code listed below in Figure 24.

Figure 24: The seed expansion function uses the PredictedPath class to manage each iteration. Please see below for more details on this class. (Image source: Author)

By calling a path expansion function that generates the best successor paths, the seed expansion function iteratively expands paths, starting with the initial one. Path expansion operates by picking a path and generating the most probable expansions, as shown below in Figure 25.

Figure 25: The path expansion function above iterates over the most frequent successors to the current path. It creates a new path for each of the most frequent successors using a specialized function (see below). (Image source: Author)

The code generates new paths by appending the successor nodes to the source path, as shown in Figure 26 below.

Figure 26: To generate a “child” path, we only need to append the successor node to an existing path. Note that the code creates a copy of the original path before appending the new node. (Image source: Author)

The code implements predicted paths using a specialized class, as shown in Figure 27.

Figure 27: The class above implements a predicted path with probability sorting support, creation from a seed token pair, and map polyline generation. (Image source: Author)
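The pieces described above (the seed, path expansion, and child-path creation) can be sketched together in a few lines. This version uses plain (tokens, probability) tuples in place of the PredictedPath class and an in-memory dictionary in place of the frequency database; all names and counts are hypothetical.

```python
from collections import Counter

# Hypothetical successor counts keyed by token pair.
SUCCESSORS = {
    ("A", "B"): Counter({"C": 3, "D": 1}),
    ("B", "C"): Counter({"E": 2, "F": 2}),
    ("B", "D"): Counter({"G": 1}),
}


def expand_path(path, probability, max_branch):
    """Return up to max_branch child paths as (tokens, probability) pairs."""
    successors = SUCCESSORS.get(tuple(path[-2:]))
    if not successors:
        return []
    total = sum(successors.values())
    # `path + [token]` builds a new list, so the parent path is never mutated.
    return [
        (path + [token], probability * count / total)
        for token, count in successors.most_common(max_branch)
    ]


def expand_seed(seed, max_branch=2, max_iterations=2):
    """Iteratively expand the seed pair, most probable paths first."""
    paths = [(list(seed), 1.0)]
    for _ in range(max_iterations):
        children = []
        for path, probability in paths:
            expanded = expand_path(path, probability, max_branch)
            # Keep a path unchanged when it has no known successors.
            children.extend(expanded or [(path, probability)])
        paths = sorted(children, key=lambda item: item[1], reverse=True)
    return paths


predicted = expand_seed(("A", "B"))
```

This sketch never prunes, so the number of candidate paths can grow up to `max_branch` raised to the number of iterations; a production version would likely cap the list after each round.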

We can now see the resulting Streamlit application in Figure 28 below.
