The speed and extent of today's mobility transformation is vastly different from anything we have seen in the past. With an increasing number of megacities growing around the world, mobility for everyone within them becomes a challenge. Well-planned urban mobility strategies, backed by sufficient investment, are needed to solve the transportation challenges the masses will face, and no doubt already do.

With high internet penetration and widespread access to smartphones, on-demand services have opened up a new arena by offering consumers what they want, when they want it, without owning a car. Commuters find it easy and convenient: they simply open their smartphones to find and connect with the various transportation modes available for a ride.

According to McKinsey, 23% of Americans have no interest in owning a car, reflecting a changing mentality and a growing openness towards on-demand ride-hailing services. The global on-demand transportation market is expected to reach USD 304.97 billion by 2025, driven by innovative mobility solutions and the rising adoption of connected vehicles.

Building a ride-hailing application – the challenges and the way around them

Experion was approached by a leading transportation provider in the Middle East to develop an on-demand ride-hailing application after the provider realized the vast market potential and opportunity it held. Designing and developing such a responsive, scalable application with a real-time communication engine requires significant architectural foresight, and in this article we will delve into how the team at Experion designed and built the real-time communication module of this product.

The heart of any ride-hailing application is the real-time communication system between the driver application, the passenger mobile application, and the backend APIs. This framework ensures the flow of events through the complete ride lifecycle, from ride booking until ride completion. A product of this nature has to be high-performance, with expected response times in milliseconds, and communication (ride status, driver status, locations, etc.) has to flow seamlessly between the mobile apps, the web, and the APIs. The real-time communication module has to be scalable, since peak times cannot be predicted. The backend APIs use the real-time data from the mobile apps as input to sophisticated routing and matching algorithms that balance supply and demand. Based on this real-time communication, notifications are sent to the various client applications (drivers and passengers) through notification services.
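The event flow described above follows a publish-subscribe pattern: clients subscribe to channels, and every update published to a channel is pushed to all subscribers. The toy sketch below illustrates that loop in plain Python; the channel names and payload fields are invented for illustration and are not the product's actual schema.

```python
from collections import defaultdict

class EventBus:
    """Toy in-memory stand-in for a real-time communication engine."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, payload):
        # Push the payload to every subscriber of this channel.
        for callback in self.subscribers[channel]:
            callback(payload)

bus = EventBus()
updates = []

# The passenger app listens for ride-status changes pushed by the backend.
bus.subscribe("ride/123/status", updates.append)

# The backend publishes status transitions through the ride lifecycle.
for status in ("REQUESTED", "DRIVER_ASSIGNED", "IN_PROGRESS", "COMPLETED"):
    bus.publish("ride/123/status", {"rideId": "123", "status": status})

print([u["status"] for u in updates])
# → ['REQUESTED', 'DRIVER_ASSIGNED', 'IN_PROGRESS', 'COMPLETED']
```

In the real product, a managed service plays the role of this bus, adding persistence, scaling, and delivery over unreliable mobile networks.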

This solution suite needed to be developed in a short period of a few weeks, due to the critical launch date. Development timelines, scalability, and the performance expectations of the real-time communication engine were all of equal importance as we started the architectural considerations.

Real-time communication solution options and decisions:

The main decision factors that we had to consider during this process were:

  • Highly responsive framework with the entire publish-subscribe cycle working within milliseconds.
  • Minimum learning curve and development time
  • Availability of APIs (SDKs) which can be accessed from native Android applications, native iOS applications, REST APIs, and AWS Lambda programs. 
  • Scalability during peak time – Preferably cloud-based solutions with automatic scaling options. 
  • Offline support for mobile applications, so they work regardless of network latency or internet connectivity.
  • Preference for an AWS-based service, since the rest of the solution was in the AWS stack.
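The offline-support requirement above is the behavior frameworks like AppSync and Firebase provide out of the box: writes made while the device is offline are queued locally and flushed to the server once connectivity returns. A minimal sketch of that idea, with all names hypothetical:

```python
from collections import deque

class OfflineWriteQueue:
    """Buffers updates while the device is offline; flushes on reconnect."""
    def __init__(self, send):
        self.send = send          # function that pushes one update to the server
        self.pending = deque()
        self.online = True

    def write(self, update):
        if self.online:
            self.send(update)
        else:
            self.pending.append(update)   # hold locally until connectivity returns

    def set_online(self, online):
        self.online = online
        # On reconnect, drain the buffered writes in their original order.
        while self.online and self.pending:
            self.send(self.pending.popleft())

server = []
q = OfflineWriteQueue(server.append)
q.write({"lat": 25.20, "lng": 55.27})      # online: sent immediately
q.set_online(False)                         # network drops
q.write({"lat": 25.21, "lng": 55.28})      # offline: buffered locally
q.set_online(True)                          # reconnect: queue drains
print(len(server))  # → 2
```

Building and hardening this kind of logic by hand for every client platform is part of what made a managed framework attractive.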

An illustration of what was expected from the real-time communication framework is given here:

  • The locations of all logged-in drivers and passengers (pax) are to be tracked in real time. If mobile devices lose internet connectivity or are on slow networks, the real-time communication framework should manage the synchronization and communication with the server.
  • Every driver who is ready to accept a ride will be available in the real-time communication data store and shall be available for the ride-matching algorithm to pick. 
  • When a ride request from a pax is initiated from the mobile app, the real-time engine should notify the AWS Lambda Ride Relay algorithm with details of the ride request. 
  • Once the Ride Relay Algorithm selects a driver, the real-time communication framework should immediately show the ride request in the driver mobile app. 
  • If the driver accepts the ride, the real-time communication framework should send the driver’s data to the pax app along with the driver’s location and contact details. 
  • The driver’s movement will be updated into the real-time communication framework from the driver’s mobile application and it will be displayed in real-time in the passenger’s mobile application. 
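Stitched together, the steps above form a simple per-ride state machine: request, relay to a driver, acceptance, then live location updates. The sketch below walks one ride through that lifecycle; the store layout, field names, and matching rule (first available driver) are invented for illustration, not the product's actual algorithm.

```python
drivers = {  # drivers ready to accept rides, keyed by driver id (illustrative store)
    "d1": {"available": True, "location": (25.20, 55.27)},
    "d2": {"available": False, "location": (25.25, 55.30)},
}

def relay_ride_request(ride, drivers):
    """Stand-in for the ride-relay step: offer the ride to an available driver."""
    for driver_id, info in drivers.items():
        if info["available"]:
            ride["driver_id"] = driver_id
            ride["status"] = "OFFERED"
            return ride
    ride["status"] = "NO_DRIVER"
    return ride

def driver_accepts(ride, drivers):
    """Driver accepts: mark them busy and share their location with the passenger."""
    driver = drivers[ride["driver_id"]]
    driver["available"] = False
    ride["status"] = "ACCEPTED"
    ride["driver_location"] = driver["location"]
    return ride

ride = {"ride_id": "r1", "passenger_id": "p1", "status": "REQUESTED"}
ride = relay_ride_request(ride, drivers)
ride = driver_accepts(ride, drivers)
print(ride["status"], ride["driver_id"])  # → ACCEPTED d1
```

In the real solution, each of these transitions is an event flowing through the real-time framework, so both apps see the same state within milliseconds.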

This illustration is just an example of how a real-time communication network should function. The actual product has many more sophisticated real-time communication scenarios that had to be managed by the selected framework.

Based on the requirements made evident from the illustration above, the following technology options were considered:

Option 1: Socket Based Communication

Socket-based communication was not favorable: developing a socket-based communication framework from scratch would involve a steep learning curve and a longer development schedule. Building a complete real-time event-management framework with a publish-subscribe model, interfacing with the mobile, web, and API layers, in such a short time could result in product quality issues. It was too much of a risk, and hence we rejected this option.

Option 2: AWS AppSync

The client wanted the product to be deployed on AWS infrastructure, and the technology choice for the scalable, high-performance demand-to-supply matching and relay algorithm was AWS Lambda. Though we considered creating a framework with SQS, SNS, and our own custom API code, the need to automatically handle mobile app connectivity issues made us look at other options. AWS AppSync and Firebase were the options that could satisfy most of our requirements.

We decided to explore AWS AppSync as the real-time communication framework. AppSync could be used in conjunction with DynamoDB (AWS's high-performance NoSQL data store), and both integrate seamlessly with Lambda functions. In addition, we believed we could take advantage of having everything within the same AWS services framework.

We created a feasibility checklist to verify the important capabilities expected from the real-time framework and quickly executed a proof-of-concept exercise for AWS AppSync. AppSync ticked most of the boxes and integrated well with the rest of our architecture, so we finalized AWS AppSync for real-time communication and started development. As per our design, we used multiple AppSync data stores/nodes, for 'Ride Requests', 'Driver-Vehicle Pairs', etc., to dynamically handle real-time communication. AWS AppSync worked well during our development and unit testing phases.
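In AppSync, each such data store maps to a GraphQL type, and real-time delivery is wired up by attaching subscriptions to mutations with the `@aws_subscribe` directive: every time the mutation runs, AppSync pushes the result to all subscribed clients. A hypothetical schema fragment for a ride-request node might look like this (types and field names are illustrative, not the product's actual schema):

```graphql
type RideRequest {
  rideId: ID!
  passengerId: ID!
  driverId: ID
  status: String!
  lat: Float
  lng: Float
}

type Mutation {
  updateRideRequest(rideId: ID!, status: String!, driverId: ID): RideRequest
}

type Subscription {
  # Driver and passenger apps subscribe here; AppSync pushes every
  # updateRideRequest result to them over WebSockets.
  onRideRequestUpdated(rideId: ID!): RideRequest
    @aws_subscribe(mutations: ["updateRideRequest"])
}
```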

However, towards the close of the development stage, when we began integration tests simulating real-life peak scenarios, we noticed inconsistencies in the communication. When many concurrent requests were being served, ride status updates (drivers accepting a ride, canceling a ride, etc.) were not updated properly through AppSync (in DynamoDB) in around 20% of cases. These tests simulated real-life scenarios with multiple ride requests flowing through their ride lifecycles. In production, we could not afford even a 1% real-time communication failure rate.

With just a month to go before the planned UAT, we decided to report this to AWS and troubleshoot. We tried various troubleshooting options based on the documentation, online resources, and the suggestions of the AWS support team, but the issue was not resolved. Though the AWS support team was helpful, they could not commit to a resolution time, so we had no option but to start looking for alternatives.

Option 3 – Firebase:

During the design phase, we had successfully completed a proof of concept for Firebase as well. Initially, we did not choose Firebase since we favored using the AWS stack for the whole solution. However, the failure scenarios in AppSync left us with no option but to change our real-time communication framework to Firebase.

Hence we started redeveloping the real-time communication module using Firebase, with less than one month to go until UAT. The same data stores that had been designed for real-time communication in AppSync were converted into nodes in Firebase, and we designed methods to integrate the real-time data from Firebase into DynamoDB. As soon as we completed the development of a few important real-time communication APIs, we started integration tests with real-life simulations. We saw no failures in real-time communication with Firebase, even at three times the peak load. All the real-time operations, such as Book Ride, Ride Accept, Passenger Cancel, and Driver Cancel, executed successfully, and ride status and location updates were synchronized between the mobile applications and the server-side APIs without failures. Load testing passed for all test cases across different concurrency levels, executed using JMeter and automated test scripts. Firebase could handle the real-time communication scenarios expected of the solution. The UAT phase was successful, and the application was moved to production without any issues.
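The Firebase Realtime Database stores all data as one JSON tree, so the AppSync data stores translated naturally into top-level nodes. A hypothetical layout (node and field names invented for illustration):

```json
{
  "rideRequests": {
    "r1": { "passengerId": "p1", "driverId": "d1", "status": "ACCEPTED" }
  },
  "driverVehiclePairs": {
    "d1": {
      "vehicleId": "v9",
      "available": false,
      "location": { "lat": 25.20, "lng": 55.27 }
    }
  }
}
```

Client apps attach listeners to paths such as `rideRequests/r1`, and every write to that path is pushed to all listeners; a server-side process can mirror the same writes into DynamoDB for the backend APIs.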

In Conclusion

The road to building a successful application is not a walk in the park, but with the right team of experts who are passionate about getting it right, any challenge can be successfully overcome. 

Experion's adept team overcame the hurdles they faced while building this solution suite with ease: they made pragmatic use of technical and domain expertise and ensured the product went live in record time, unfazed by the time crunch they were met with.

So the next time you open a ride-hailing app on your phone, spare a smile for all the engineers, developers, and designers who made this convenience possible for you.