Have a pleasant journey with FlightStats and ArangoDB
Benjamin Corliss, Platform Engineer @ FlightStats Inc.
When travelling from A to B you want an overall pleasant journey. Compared to driving, flying is much more complicated due to multiple factors which can delay your flight (weather, air traffic) or increase waiting times (security check, luggage check-in, delays of flights, etc.). FlightStats lays the data-centric groundwork for a perfect journey.
All major airlines, airports, hotels and leading information providers like Google or Yahoo use FlightStats' products to improve their services. Our data reaches more than 35% of global air travelers each day, and data requests to our systems exceed 300 million each month.
Aviation data is fragmented and does not follow a single standard. Flight status, weather, reference data, FAA Airport Delays and many more data sources affect the whole customer journey from your starting point A (e.g. home) to your final destination B (e.g. hotel). Data from all these sources has to be harmonized or transformed to get accurate analytical or predictive results.
We maintain and use reference data for airlines, airports, equipment and the like. This data had been maintained across many tables in a relational database. We were running into some data-related challenges, which can be summed up as follows:
- Store reference data with temporal information
- Make it easier to change schema as future needs arise
- Improve our authoring web UI
- Reduce load on our relational db
- Provide API access to the reference data
We needed a document db that allowed us to easily query the data by effective date. Storing and processing data from more than 30k airports and airlines is a challenge in itself, but for our needs it is also necessary to store each modification of those entities, with its own effective date, as a separate document (i.e. key = id + effective_date). All changes and their effective dates have a significant impact on historical reports and thus on our predictive analytics.
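To make the "key = id + effective_date" idea concrete, here is a minimal in-memory sketch in Python. It is not our production code and does not use ArangoDB at all. The class `VersionedStore`, the helper `make_key`, the key separator and the sample entity `XYZ` are all hypothetical and chosen only for illustration. It shows one way to store every modification as its own document and to look up the version that was in effect on a given date.

```python
from bisect import bisect_right, insort
from datetime import date


def make_key(entity_id, effective_date):
    # Hypothetical key layout: entity id plus ISO date, e.g. "XYZ_2015-01-01".
    return f"{entity_id}_{effective_date.isoformat()}"


class VersionedStore:
    """Toy stand-in for a document collection holding one document per change."""

    def __init__(self):
        self._docs = {}   # key -> document
        self._dates = {}  # entity id -> sorted list of effective dates

    def put(self, entity_id, effective_date, fields):
        key = make_key(entity_id, effective_date)
        if key not in self._docs:
            # Keep the per-entity date list sorted so lookups can use bisect.
            insort(self._dates.setdefault(entity_id, []), effective_date)
        self._docs[key] = {
            "_key": key,
            "id": entity_id,
            "effective_date": effective_date.isoformat(),
            **fields,
        }
        return key

    def as_of(self, entity_id, when):
        """Return the latest version whose effective date is <= when, or None."""
        dates = self._dates.get(entity_id, [])
        index = bisect_right(dates, when)
        if index == 0:
            return None  # nothing was in effect yet on that date
        return self._docs[make_key(entity_id, dates[index - 1])]


# Usage with made-up data: one entity, two versions.
store = VersionedStore()
store.put("XYZ", date(2015, 1, 1), {"name": "Old Name"})
store.put("XYZ", date(2016, 6, 1), {"name": "New Name"})
historical = store.as_of("XYZ", date(2015, 12, 31))
current = store.as_of("XYZ", date(2016, 6, 1))
```

Because a change can be stored with an effective date in the future, the same lookup serves both historical reports and changes that have been entered ahead of time.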
For our partners and data sources we needed a solution which enables easy access to multiple environments and a versatile, performant API layer. Due to the growth of our company in terms of API requests, external data sources, and quantity of data, we faced the need for a scalable solution.
Furthermore, our team was and is growing fast. Thus we need good documentation of the technology we use at FlightStats to get new team members up to speed – especially on advanced topics.
Our reference data is best suited to a document tree format, so using a dedicated document store was our first idea. We experimented with a variety of databases – some of them market leaders in NoSQL – but quickly ran into limitations, such as expressing non-trivial queries in a readable way or pushing performance to the required level.
By chance we found ArangoDB from Germany, and it turned out the database was a perfect fit for us. ArangoDB is a multi-model database that lets you handle data as key/value pairs, as graphs and, of course, as documents. We could solve all the issues concerning our requirements:
- We tested ArangoDB intensively and performance was really good
- It enabled us to easily access stored data from multiple environments through its HTTP APIs. Currently we are hitting ArangoDB from a Node web app as well as a Clojure server app.
- Thanks to Foxx, ArangoDB's framework for data-centric microservices, we were able to provide new, sophisticated APIs super fast and let any logic run directly in the database
- The docs were good enough to quickly get us up and going with ArangoDB; they have a good amount of depth for advanced topics; and we feel they could easily be consumed by other team members at FlightStats. It’s also the best API documentation I’ve seen (Swagger API implementation).
With a “Go” on all of these issues, we could start our project and store the reference data for airports, weather, locations and the like with their effective dates.
Reference data is now used for all of our data products that refer to Airports (Terminals, Gates, referred to from trips, flights), Airlines (referred to from trips, flight alerts, etc), and equipment (which type of aircraft). It is vital that the data be accurate and up-to-date. ArangoDB is making it easy for us to add temporal effectiveness to the data – which is useful for both historical reporting as well as making changes that we know will be effective in the near future.
A big bonus for us is the freedom to scale ArangoDB along with FlightStats' needs. Other parts of our organisation need graph models, so we can easily bring the graph teams up to speed with our ArangoDB experience and afterwards learn from our buddies in terms of graph usage. Great symbiosis.
AQL is a powerful query language, and its intuitive nature made it easy to adopt. We work in a fast-paced environment where quick iterations and prototyping are very important. With the microservice framework Foxx we save a lot of time and effort and get basic environments up and running in hours. Our team grows constantly, and thanks to good documentation we are able to bring new members up to speed in no time.
Overall ArangoDB is a sophisticated technology meeting high quality needs and we are keen to drive the implementation further for other use cases.
Importance of key characteristics
|Factor||not important||important||very important|
|AQL / JOINs||x|
Thank you so much Candice Parfitt and Benjamin Corliss for investing your scarce time and writing this great case study!