…or: The Great Server Shootout
ArangoDB is a database server that talks HTTP with its clients: clients send HTTP requests to ArangoDB over TCP/IP, ArangoDB will process them and send back the results to the client, packaged as HTTP over TCP/IP.
ArangoDB’s communication layer is thus the foundation for almost all database operations, and it needs to be fast to not become a bottleneck, blocking further database operations.
To assess the actual performance of the communication layer, we did some benchmarks that compare ArangoDB’s network and HTTP processing capabilities with those of other popular servers.
ArangoDB is a database server and thus offers a lot more than just HTTP processing, but in this post we’ll be concentrating on just the networking/HTTP parts of the product. It is likely that we’ll publish results of further benchmarks that involve other parts of the system in the near future.
Though we’ll be trying to disclose the methodologies and results of this benchmark, the usual disclaimers apply here as well:
- The benchmark results may vary depending on the system you compile and measure on. Results will also differ depending on the product configurations/settings
- Just a specific part of the products (networking/HTTP handling) was measured, not the complete functionality of each product
- We measured just specific aspects of performance (total time, requests per second) although there are obviously more aspects one could measure (CPU utilisation, memory usage etc.)
- The test cases reflect just a few things of what one can do with a server. They might or might not be realistic depending on your workload/usage patterns
- We have compared general-purpose products to their stripped-down counterparts. The use cases for the products tested are not fully identical.
- Don’t trust any results without questioning. If in doubt, rerun the benchmarks and measure yourself!
We were interested in how fast ArangoDB could handle HTTP requests. And what kind of tools could be better at handling HTTP requests than…, well, web servers?
So for this benchmark, we have conducted load tests with the following products:
We have picked well-established general-purpose web servers such as Apache httpd as well as their more stripped-down and more specific counterparts such as Nginx and Gatling. Apache httpd 2 has several different multi-processing modules (mpms). As we weren’t sure which one we should use as a baseline, we have used the following Apache mpms in the comparison: event, worker, and prefork.
We only picked open-source tools for our tests so we could compile everything ourselves with the same settings. No pre-optimised binaries have been used. And as we were interested in measuring the HTTP layer, we only picked tools that speak HTTP out of the box.
For all of the above server products, we have measured the total time it took a client to send 100.000 (100K) identical HTTP GET requests to the server and get the servers’ responses back. The total time it took the server to answer all requests was also converted into a “requests per second” figure.
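The conversion from total time to requests per second is a simple division. The following is a minimal sketch of that arithmetic; the function name and the 4-second figure are assumptions made for illustration and are not taken from the benchmark results.

```python
def requests_per_second(total_requests: int, total_seconds: float) -> float:
    """Convert the total time for a run into a requests-per-second figure."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return total_requests / total_seconds


# Hypothetical example: 100,000 requests answered in 4 seconds.
# The 4-second value is made up for illustration, not a measured result.
print(requests_per_second(100_000, 4.0))
```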
For each product tested, the number of concurrent client connections was increased from 1 to 512 to also assess the servers’ scalability. For each concurrency level, 3 test runs were conducted and the average results of the 3 runs were used as the overall result for that concurrency level.
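The procedure described above can be sketched as follows. This is only an illustration: the post does not list the exact concurrency steps, so doubling from 1 to 512 is an assumption, and both function names are hypothetical.

```python
from statistics import mean


def concurrency_levels(start: int = 1, stop: int = 512) -> list[int]:
    """Return concurrency levels from start to stop.

    Assumption: the levels double at each step (1, 2, 4, ... 512).
    The post only states that concurrency went from 1 to 512.
    """
    levels = []
    current = start
    while current <= stop:
        levels.append(current)
        current *= 2
    return levels


def average_runs(run_times: dict[int, list[float]]) -> dict[int, float]:
    """Average the total times of the runs recorded for each concurrency level.

    run_times maps a concurrency level to the total seconds of each run
    (three runs per level in the setup described above).
    """
    return {level: mean(times) for level, times in run_times.items()}
```

Averaging the three runs per level smooths out one-off disturbances, at the cost of hiding the spread between runs.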
Two different test setups have been used:
- In one scenario (“local” scenario), the client was located on the same physical host as the server.
- In the other scenario (“network” scenario), the client was located on a different physical host and the requests went over the network. Client and server were located in the same network and using the same switch.
The communication between client and server was HTTP over TCP/IP in all cases. HTTPS/SSL has not been tested. HTTP Keep-Alive has been used for all client requests.
In the “local” scenario the client and the server parts were running on the same physical host so they might have competed for the same resources. We do not consider this a problem because it should affect all products in a similar way: it changes the absolute values (which we’re not particularly interested in) rather than the relative results.
All server products were installed on the same physical server. To get comparable results, no prefab server binaries have been used, but all server products were downloaded and compiled from source on the target environment. Compilation was done with gcc/g++ 4.5.1 and -O2 optimisation level for all servers tested.
As all products were installed on the same physical host, the server configuration was identical for all products:
- Linux Kernel 188.8.131.52-0.11, cfq scheduler
- 8x Intel(R) Core(TM) i7 CPU, 2.67 GHz
- 12 GB total RAM
- 1000Mb/s, full-duplex network connection
- SATA II hard drive (7.200 RPM, 32 MB cache)
As we were not interested in disk performance, all logging facilities offered by the server products (e.g. access logs, error logs) were turned off. All products were run with the same normal-privileged user account so the operating system imposed the same limits on them.
CPU stepping was turned off during the tests so all server CPUs ran at their maximum frequency. Recurring jobs (cron etc.) were disabled on the server for the duration of the tests.
The test client used in all cases was ApacheBench 2.3. It is known that ApacheBench is single-threaded and can itself become a bottleneck for high-throughput tests. Furthermore, the aggregation of test results within ApacheBench at the end of each run might also skew the results slightly. However, we found these two issues not to be a problem in our case because with the very small HTTP GET requests sent, ApacheBench produced sufficient load and did not become a major bottleneck. Slightly skewing the results in the aggregation phase was also unproblematic because we knew it would be the same in all test runs and would not affect the relative results at all. So we decided to use ApacheBench because of its ease of use and wide-spread availability (so others can reproduce the tests if they want to).
The command used for the tests was:
./ab -k -c $CONCURRENCY -n 100000 $URL
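For readers who want to script the runs, here is a minimal sketch in Python that builds the same command line and extracts the two figures used in this post from ApacheBench's summary output. The helper names, the default `./ab` path and the URL in the usage note are assumptions; the parser relies on the "Time taken for tests" and "Requests per second" summary lines that ApacheBench prints at the end of a run.

```python
import re
import subprocess

# Summary lines printed by ApacheBench at the end of a run.
_TIME_RE = re.compile(r"^Time taken for tests:\s+([\d.]+) seconds", re.MULTILINE)
_RPS_RE = re.compile(r"^Requests per second:\s+([\d.]+)", re.MULTILINE)


def build_ab_command(concurrency, url, requests=100_000, ab_path="./ab"):
    """Build the argument list for: ./ab -k -c $CONCURRENCY -n 100000 $URL"""
    return [ab_path, "-k", "-c", str(concurrency), "-n", str(requests), url]


def parse_ab_output(text):
    """Return (total_seconds, requests_per_second) from ab's summary output."""
    time_match = _TIME_RE.search(text)
    rps_match = _RPS_RE.search(text)
    if time_match is None or rps_match is None:
        raise ValueError("expected summary lines not found in ab output")
    return float(time_match.group(1)), float(rps_match.group(1))


def run_ab(concurrency, url):
    """Run ab once and return (total_seconds, requests_per_second)."""
    completed = subprocess.run(
        build_ab_command(concurrency, url),
        capture_output=True,
        text=True,
        check=True,  # raise if ab exits with an error
    )
    return parse_ab_output(completed.stdout)
```

A call such as `run_ab(8, "http://server.example/index.html")` (placeholder URL) would then perform one run with 8 concurrent connections.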
The results consist of the following data series:
- mongrel2: Mongrel2-1.7.5
- nginx: Nginx 1.2.1
- apache_event: Apache httpd 2.4.2 event mpm
- apache_worker: Apache httpd 2.4.2 worker mpm
- apache_prefork: Apache httpd 2.4.2 prefork mpm
- gatling: Gatling 0.12
- arangod-file: ArangoDB 1.0-alpha2
The actions performed for each series were identical and comparable: the same static file was requested by the client, read by the server and its contents returned to client.
Please note that this benchmark’s goal was not to set any world records, so the absolute result values aren’t very interesting. They will likely vary with the hardware used anyway, and using better test servers will likely produce better results.
What was more interesting to us was to see the relative performance of ArangoDB compared to the performance of the other products, and its ability to scale under increasing load, again compared to the scalability of the other products.
The tests were not conducted to disrespect any of the other products at all. We know and believe that the other products are well-established and that there are plenty of use cases for them. They were just used as a reference.
Full results can be found in the document accompanying this post: Performance test results
Results of the “local” scenario
ArangoDB in file server mode (series arangod-file) was able to handle more requests than any of the tested Apache2 variants for all tested concurrency levels. It could handle about the same number of requests as Nginx with 1, 2, and 4 concurrent connections. Nginx was better at 8 concurrent connections, but after that, ArangoDB was able to handle significantly more throughput. ArangoDB’s throughput continued to increase (though only slightly) up to 128 concurrent connections.
Results of the “network” scenario
In the “network” scenario, Nginx outperformed all competitors up to and including 8 concurrent connections. Its performance in that low concurrency segment was undisputed. From that point on, the competitors were catching up. ArangoDB started outperforming the others from 32 concurrent connections. Throughput in ArangoDB increased up to 512 concurrent connections.
Overall, it seems that ArangoDB’s network and HTTP layer can generally keep up with those of other HTTP-based products in terms of throughput. For low concurrency situations, some highly optimised products such as Nginx performed better.
With increased concurrency, ArangoDB was catching up and outperformed the other products. The test showed that throughput in ArangoDB could be increased up to 128 concurrent connections (local scenario) and 512 connections (network scenario). At these concurrency levels, the other products showed stagnation or a decline in throughput.
Overall, it seems that the networking and HTTP stack in ArangoDB is not likely to be a major bottleneck for regular operations. However, as some of the other products showed better throughput at low concurrency, it seems that there is still room for optimisation in ArangoDB’s request handling. In particular, it would be interesting to know what Nginx does to achieve that superior performance in low concurrency situations. If anyone knows, please leave a comment.
As mentioned before, we haven’t looked at memory consumption in these tests. Neither did we check CPU utilisation and other factors. These would be other interesting things to look at when performing additional tests.