The rundown
De Lijn, the Flemish public transport company, operates thousands of buses every day, getting people to their destinations all across Flanders.
To improve customer satisfaction and share real-time information about the location of its buses, a new platform was launched. The platform gathers GPS data from all vehicles and calculates whether a bus is running according to its predefined schedule. Multiple client applications can then access this data and let travellers know whether their bus will be on time. The platform also allows De Lijn to pinpoint weak points along routes and predict service problems. It was built by De Lijn on Microsoft Azure.
The setup
The setup is pretty straightforward: every 15 seconds, GPS data from all vehicles is sent to an Azure IoT Hub over a private network. The received data is then compared to the ride schedule, which is stored in an Azure Cosmos DB database. Afterwards it is processed (the offset from the schedule is calculated, amongst other things) and forwarded to an Azure Event Hub. From this last hub, the data is consumed by several workers and stored in another Cosmos DB database, which serves as an endpoint for clients to fetch data.
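The core of that processing step can be sketched in a few lines. This is a minimal illustration, not De Lijn's actual code: the function name and timestamps are assumptions, and the real system works on a continuous stream rather than two hard-coded values.

```python
from datetime import datetime

def schedule_offset(ping_time: datetime, scheduled_time: datetime) -> int:
    """Return the offset in seconds; positive means the bus is running late."""
    return int((ping_time - scheduled_time).total_seconds())

offset = schedule_offset(
    datetime(2019, 5, 1, 8, 4, 30),  # actual GPS timestamp of the ping
    datetime(2019, 5, 1, 8, 3, 0),   # scheduled passing time from the ride schedule
)
# offset == 90 -> the bus is 90 seconds behind schedule
```

A result like this, attached to every ping, is what the workers downstream consume and expose to client applications.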
This system was operational in a test environment, but lacked any form of reporting, which made it difficult for the development team to convince decision makers of its accuracy and potential.
Since De Lijn was running short on resources and facing a delivery deadline, they turned to DeltaBlue for help.
Development
Our goal was to deliver insights into the enormous amount of data without being intrusive. To achieve this, we added an Azure Stream Analytics job that consumed data from the Event Hub (in parallel, so the existing flow was not affected) and stored it as raw data in a separate SQL database. We then scheduled functions at regular intervals to process the raw data into relevant data and push the results back into the database.
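The scheduled processing step boils down to periodic aggregation of the raw rows. A minimal sketch of the idea, assuming a simplified schema of (line_id, offset_seconds) tuples; the real tables and fields were of course richer:

```python
from collections import defaultdict

def aggregate_delays(raw_rows):
    """Aggregate raw (line_id, offset_seconds) rows into an average delay per line."""
    totals = defaultdict(lambda: [0, 0])  # line_id -> [sum of offsets, row count]
    for line_id, offset in raw_rows:
        totals[line_id][0] += offset
        totals[line_id][1] += 1
    return {line: total / count for line, (total, count) in totals.items()}

avg = aggregate_delays([("L1", 60), ("L1", 120), ("L2", 0)])
# avg == {"L1": 90.0, "L2": 0.0}
```

Results like these are what gets pushed back into the database as "relevant data" for reporting.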
Challenges
The biggest difficulty we encountered was the continuous stream of data (millions of rows every hour) that needed to be processed. The data itself was not the real issue; Azure's timeouts turned out to be quite a hurdle. After putting our heads together, we came up with a data fragmentation strategy so that processing would happen on smaller datasets, completely sidestepping the timeout issues.
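The fragmentation idea itself is simple. A hedged sketch, assuming an in-memory list of rows and an illustrative fragment size (the actual strategy and sizes were tuned to the Azure timeout limits):

```python
def fragments(rows, size):
    """Yield successive fragments of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def process_all(rows, size=1000):
    """Process the dataset fragment by fragment instead of in one long call."""
    processed = 0
    for batch in fragments(rows, size):
        # each batch is small enough to finish well within the service timeout
        processed += len(batch)
    return processed

total = process_all(list(range(2500)), size=1000)
# three fragments are processed: 1000 + 1000 + 500 rows
```

Because every fragment completes quickly on its own, no single operation ever runs long enough to hit a timeout.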
A good approach to processing large datasets is almost always to divide and conquer.
As more and more data was gathered, we quickly realised that the processing capacity of a single Azure Event Hub would not be sufficient, so we advised adding a second one to prevent data from piling up in the queue.
Finally, we added PowerBI to the equation to create reports of the processed results and display those reports in a dashboard.
Conclusion
The setup we landed on allows for easy vertical or horizontal scaling, which is something we always strive for.
With these tools in place, De Lijn was able to improve the accuracy of its platform, address issues with faulty GPS devices faster, and convince the board to go ahead with further development.
And all within the deadline: happy customer, happy developers.