Real-time, near-accurate prediction of public transit vehicle arrivals is valuable because it lets passengers plan their trips better and improves the overall passenger experience. Although a lot of research has gone into this area, predictions still show a variance of 20-25%, owing to the quality of the data being processed, rising congestion and dynamic real-time traffic situations that leave emerging prediction models guessing.

Let’s look at what goes into the market-leading routing engines that serve as the base for any prediction model. Most industry-leading engines are proprietary, and their algorithms are highly optimized and refined by relying heavily on historic and crowdsourced data. They treat the whole road network as a graph, where nodes represent intersections or points and edges represent road segments. All the related restrictions are modeled as well. However, the known physical parameters considered are mostly region-specific.

These parameters include the following:
• Speed limits (recommended vs. official)
• Historical average speed data at a particular time period (averages & time of day)
• Number of stops along the route
• Distance between adjacent stops
• Actual travel times from the location data submitted by previous users
• Real-time crowdsourced traffic data, including traffic signals along a particular route
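As a rough illustration of how parameters like these could be combined into a single edge weight, here is a minimal sketch. The `Segment` fields, the fallback order (live speed, then historical speed for the hour, then the speed limit) and the `MIN_SPEED_KMH` floor are all assumptions made for this example, not how any particular routing engine works.

```python
from dataclasses import dataclass, field

# Assumed floor so a reported speed of zero does not produce an infinite travel time.
MIN_SPEED_KMH = 5.0


@dataclass
class Segment:
    """One road segment (graph edge). All fields are hypothetical."""
    length_km: float
    speed_limit_kmh: float
    # Historical average speed keyed by hour of day (0-23).
    historical_speed_kmh: dict = field(default_factory=dict)


def segment_travel_minutes(seg, hour, live_speed_kmh=None, dwell_minutes=0.0):
    """Estimate the time to traverse one segment, in minutes."""
    if live_speed_kmh is not None:
        # Real-time data wins when it is available.
        speed = live_speed_kmh
    else:
        # Otherwise fall back to the historical average, then the speed limit.
        speed = seg.historical_speed_kmh.get(hour, seg.speed_limit_kmh)
    # Never assume travel faster than the limit or slower than the floor.
    speed = max(MIN_SPEED_KMH, min(speed, seg.speed_limit_kmh))
    # dwell_minutes stands in for time spent at stops on this segment.
    return seg.length_km / speed * 60.0 + dwell_minutes


main_street = Segment(length_km=2.0, speed_limit_kmh=50.0,
                      historical_speed_kmh={8: 20.0})
peak = segment_travel_minutes(main_street, hour=8)
off_peak = segment_travel_minutes(main_street, hour=14)
print(f"peak: {peak:.1f} min, off-peak: {off_peak:.1f} min")
```

The value returned for each segment is what a routing algorithm would then use as the weight of that edge.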

Historical data carries significant weight among these parameters, and the other parameters are often correlated with it. In areas with similar traffic patterns, algorithms based on extensive historic data analysis will work and provide an acceptable bus ETA, often without a complex prediction model. However, it is assumed that the commonly used prediction models (by Google, HERE maps) follow a certain pattern, and that the absence of data or a dynamic change (e.g. a road crash, a traffic signal malfunction, a changed speed limit) from a particular source will result in less accurate ETA prediction and route planning. The cold start problem is another scenario: in some places with area or region restrictions imposed by government policies, there will be virtually no data to integrate with a known ETA offset.

Evolution of Modern Routing Algorithms

The rise in the number of vehicles on our roads during the past decade has not been matched by the growth of road infrastructure. As a result, it has become imperative to develop modern routing algorithms integrated into a highly optimized system capable of handling thousands of ETA requests per second with minimal response time.

Let’s have a look at some modern routing algorithms and how they served as a platform and underwent refinement: 

Dijkstra’s Search Algorithm is considered to be the foundation on which modern routing algorithms were built. However, in its basic form the algorithm slows down when working with unprocessed node/segment combinations in a production environment.
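For reference, the basic form of Dijkstra's algorithm can be sketched in a few lines. The toy network `roads` and its travel times in minutes are made up for illustration. Production engines layer heavy pre-processing on top of this idea.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (total_cost, path) for the cheapest route, or (inf, []) if none exists.

    graph maps each node to a dict of {neighbor: edge_weight}.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry, a cheaper one was already processed
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


# Hypothetical road network: nodes are intersections, weights are minutes.
roads = {
    "A": {"B": 4.0, "C": 2.0},
    "B": {"D": 5.0},
    "C": {"B": 1.0, "D": 8.0},
    "D": {},
}
cost, path = dijkstra(roads, "A", "D")
print(cost, path)  # 8.0 ['A', 'C', 'B', 'D']
```

Every query explores the graph outward from the source, which is why the unprocessed form becomes slow on a network with millions of segments.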
The Open Source Routing Machine (OSRM) uses a pre-processing technique called Contraction Hierarchies (CH). Though the technique is very effective, updating real-time traffic data (by assigning weights to a segment) was often time-consuming. Any change in the segments would require the pre-processing step to be run again for the whole data set, making it difficult to make real-time traffic updates work.
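To see why a weight change forces re-processing, it helps to look at the basic operation behind Contraction Hierarchies: contracting a node and adding shortcut edges. The sketch below is a simplified teaching version with hypothetical function names and a toy graph. It is not OSRM's implementation and it leaves out node ordering and the bidirectional query.

```python
import heapq


def shortest_cost(graph, source, target, skip):
    """Dijkstra cost from source to target while ignoring the node `skip`."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, weight in graph.get(node, {}).items():
            if nxt == skip:
                continue
            candidate = d + weight
            if candidate < dist.get(nxt, float("inf")):
                dist[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return float("inf")


def contract(graph, v):
    """Remove node v, adding shortcuts so remaining shortest-path costs are kept.

    Returns the list of (from, to, weight) shortcuts that were added.
    """
    preds = [(u, edges[v]) for u, edges in graph.items() if u != v and v in edges]
    succs = [(w, c) for w, c in graph.get(v, {}).items() if w != v]
    shortcuts = []
    for u, cost_uv in preds:
        for w, cost_vw in succs:
            if u == w:
                continue
            via_v = cost_uv + cost_vw
            # Witness search: a shortcut is needed only if no other path
            # avoiding v is at least as cheap as going through v.
            if shortest_cost(graph, u, w, skip=v) > via_v:
                graph[u][w] = via_v
                shortcuts.append((u, w, via_v))
    graph.pop(v, None)
    for edges in graph.values():
        edges.pop(v, None)
    return shortcuts


# Hypothetical network with weights in minutes.
roads = {"A": {"B": 2.0, "C": 9.0}, "B": {"C": 3.0}, "C": {}}
added = contract(roads, "B")
print(added)  # [('A', 'C', 5.0)]
```

The shortcut from A to C has the sum 2.0 + 3.0 baked into it. If live traffic later changes the weight of the A to B segment, that stored value is stale, which is the reason the pre-processing has to be repeated.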

Uber, which used the basic OSRM model for ETA at pickup locations, further developed and optimized this model. The optimized model was called ‘Dynamic Contraction Hierarchies’ (also called Gurafu), and it updated just the applicable segment (the particular road) when a real-time traffic update came in. This significantly reduced the pre-processing time (the re-building time required for the dataset to reflect the changes in ETA), bringing the ETA close to accurate. However, Gurafu is proprietary to Uber.

Bringing in ETA Predictability Using Artificial Neural Networks (ANN)

In most public transit applications, there will often be multiple sources of data, coming mainly from systems handling vehicle tracking, scheduling and operations. The data from these systems is integrated on a centralized server where the data management and processing of business functions happen. The processing is done by algorithms that need to be fast and responsive enough to provide quick updates to passengers in case of a change in schedule or a delay.

Artificial Neural Networks (ANN), inspired by biological neural networks, are a machine learning technique used to perform specific tasks such as classification and pattern recognition. As mentioned earlier, since modern algorithms use the graph-edge-node-weight concept, ANN is often the preferred technique for ETA prediction, mainly because of its ability to handle sudden changes in the road network. ANN would also be able to predict travel time without explicitly addressing the physical traffic parameters.

To put it simply, in a road network with multiple transit vehicles, the ANN goes through two stages, the Learning Phase and the Recalling Phase, both of which are continuous, related activities. In the Learning Phase, the model is trained with extensive current as well as historical data such as weather, speed limits, number of stops and road conditions (this is an automated process), with an accepted fault tolerance. This equips the model for the first run and, as shown in the diagram, the learning phase is a continuous, ongoing activity.

The Recalling Phase applies the weights assigned to a segment during the Learning Phase. In other words, if there is a dynamic change in the real-time traffic data, it is the Recalling Phase that re-calculates and transmits the arrival times at a location.

Let us take the example of a public transit vehicle plying between Point A and Point B. While moving, the vehicle collects and sends location data to the servers (Learning Phase). This is the most important variable for analyzing the speed and congestion ratio, together with data on weather, stops and signals. The relationships between these parameters are highly non-linear, and the ANN model learns the pattern on a particular route over time. When a user then queries the ETA of a particular bus on a mobile application, the application can display a near-accurate arrival time prediction with minimum variance, since the ANN is trained on the route and on handling specific scenarios. The best part is that, in case of any unforeseen incident on a segment (road), the ANN model automatically discovers the relationship between the parameters, then analyses and processes it before the query results are correlated and displayed.
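The two phases can be illustrated with a very small network written in plain Python. This is only a sketch under stated assumptions: the `TinyANN` class, its layer sizes, the learning rate and the synthetic trip data (three features scaled to the range 0 to 1, with a made-up non-linear travel-time formula) are all hypothetical and stand in for real vehicle tracking data.

```python
import math
import random


class TinyANN:
    """One tanh hidden layer and a linear output, trained by plain SGD."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def recall(self, x):
        """Recalling Phase: apply the learned weights to fresh inputs."""
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def learn(self, x, target, lr=0.05):
        """Learning Phase: one gradient step on the squared error."""
        h = self._hidden(x)
        err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - target
        for j, hj in enumerate(h):
            # Backpropagate through tanh before updating the output weight.
            grad_h = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * grad_h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
        self.b2 -= lr * err


def synthetic_trip(rng):
    """Made-up observation: (distance, stops, congestion) -> scaled travel time."""
    distance, stops, congestion = rng.random(), rng.random(), rng.random()
    # Assumed non-linear ground truth: congestion amplifies the distance term.
    travel_time = 0.5 * distance * (1.0 + congestion) + 0.2 * stops
    return [distance, stops, congestion], travel_time


def mean_squared_error(model, data):
    return sum((model.recall(x) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(42)
history = [synthetic_trip(rng) for _ in range(200)]
model = TinyANN(n_inputs=3, n_hidden=4)

error_before = mean_squared_error(model, history)
for _ in range(100):              # Learning Phase over the historical trips
    for x, y in history:
        model.learn(x, y)
error_after = mean_squared_error(model, history)

eta = model.recall([0.6, 0.3, 0.9])   # Recalling Phase for a live query
print(f"MSE before: {error_before:.4f}, after: {error_after:.4f}, ETA: {eta:.3f}")
```

In a live system `learn` would keep being called as new location data arrives, while `recall` answers passenger queries using whatever weights are current at that moment.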

ANN for transportation is the future. Its ability to perform well in non-linear, dynamically changing traffic scenarios makes it well suited to arrival prediction, and this has drawn quite a lot of studies, tests and implementations. ANN together with other models such as the Kalman Filter Technique has been implemented in the mixed traffic conditions in India and has produced excellent results. Various sub-models such as Recurrent Neural Network models have also been derived to address certain specialized areas in road traffic segmentation. The constant evolution of ANN as a reliable technology has two advantages. One, it will drive agencies to focus more on data quality, since ANN depends heavily on quality data for effective output. Two, adopting ANN for improved ETA predictions will be a good motivator for people to use public transport systems for their daily commute, thus saving our roads from the increasing menace of congestion and pollution. The same information also helps transport agencies monitor and improve operational performance. Either way, it’s a win-win situation for all involved.
