In “Next gen NoSQL: The demise of eventual consistency”, a recent post on Gigaom, FoundationDB founder Dave Rosenthal proclaims the demise of eventual consistency. He argues that Google Spanner “demonstrates the falsity of a trade-off between strong consistency and high availability”. In this article I show that Google Spanner does not disprove the CAP theorem, but rather chooses one of many possible compromises between total consistency and total availability. For organizations with a less potent infrastructure than Google's, other compromises might be more suitable, and therefore eventual consistency is still a very good idea, even for future generations of NoSQL databases.

Dave’s article raises a good and interesting discussion. However, I believe that in the end he misses the point.

At the end he talks about machines failing and clients no longer being able to talk to those machines. Well, that is obvious. But a partition is a failure that is very different from a single machine failing. Nobody has problems with consistency when a machine fails.

A partition is a network failure such that on each side of the broken connection there are both clients and database servers.

I will try to show that there still is a real and important consideration in this situation (which is at the core of the CAP theorem).

And I will try to demonstrate this with an example. So, assume Nozama is a large web shop. It has a database with all the products, containing – among other information – for each product the number of items in stock. When a customer orders a product the system checks whether there are still items left in stock, and if so reserves one item for this customer and reduces the number of items in stock accordingly. Reliability is very important for Nozama, and so they have their servers in two computing centres, one in Europe and one in the USA. Customers in Europe are connected to the computing centre in Europe and customers in the USA are connected to the computing centre in the USA.
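The ordering logic described above can be sketched in a few lines of Python. This is an illustrative toy, not Nozama's actual system; the class and method names are my own invention.

```python
class Inventory:
    """Toy inventory: tracks the number of items in stock per product."""

    def __init__(self):
        self.stock = {}  # product id -> items in stock

    def add_stock(self, product, count):
        self.stock[product] = self.stock.get(product, 0) + count

    def order(self, product):
        """Reserve one item if any are left; return True on success."""
        if self.stock.get(product, 0) > 0:
            self.stock[product] -= 1
            return True
        return False

shop = Inventory()
shop.add_stock("bach-wtc-richter", 1)
print(shop.order("bach-wtc-richter"))  # True: the last item is reserved
print(shop.order("bach-wtc-richter"))  # False: out of stock
```

The check-then-decrement in `order` is exactly the step that becomes problematic once two copies of this state exist on different continents.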

In this context, availability means that all customers (in Europe and in the USA) can order any product at any time (as long as it is in stock). That is, the only reason the website would tell a customer that she cannot order a particular product is that no items of this product are in stock.

And consistency means that all customers (all over the world) have, at all times, the same information about the number of items in stock. That is, it will never happen that a customer orders a product (with positive feedback from the website) and is informed later that the product is no longer deliverable.

During normal operation everything is well. The web shop is available and consistent. There is no trade-off between availability and consistency.

But if the connection between the two computing centres in Europe and the USA is broken, it becomes impossible to maintain both properties of the system (the web shop).

Assume that we still allow all customers to order all products. If a customer in the USA orders the last item of a particular product (say the CD “Bach, The Well Tempered Clavier, Sviatoslav Richter”), the information that the number of items in stock is now 0 cannot be replicated to Europe. So if somebody in Europe wants to order this product, the system will let her (because the European database still contains the information that there is one item left in stock). But later she will be informed that the product is alas no longer deliverable. That means that consistency is violated.

If, on the other hand, Nozama no longer allows all customers to order all products while the connection is broken, then obviously the system is no longer fully available. Note that this does not mean that no customer can order anything anymore; it only means that some customers temporarily cannot order some things. A possible restriction could be that all customers in the USA can still order everything, while no customer in Europe can order anything (perhaps depending on the time of day). Or one could define for each product individually on which continent it is orderable during the breakdown. But in any case there is some restriction on availability.

This is basically what the CAP theorem says: you cannot keep both properties, availability and consistency, during the breakdown of the connection (the so-called partition). So there is a decision to be made. (Actually, as Brewer himself points out, this is not a binary decision; there is a whole spectrum of possible approaches. But you cannot have everything; something has to give.)
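The divergence during a partition can be simulated with two toy replicas of the stock database (again, `Replica` is an illustrative class, not a real database):

```python
class Replica:
    """One computing centre's copy of the stock database."""

    def __init__(self, stock):
        self.stock = dict(stock)  # independent copy

    def order(self, product):
        if self.stock.get(product, 0) > 0:
            self.stock[product] -= 1
            return True
        return False

initial = {"bach-wtc-richter": 1}
europe = Replica(initial)
usa = Replica(initial)  # both replicas in sync before the partition

# Partition: no replication between the two sides.
# Both sides happily sell the single remaining item.
assert usa.order("bach-wtc-richter")     # USA customer gets the last item
assert europe.order("bach-wtc-richter")  # Europe accepts the order too!

# When the partition heals: 2 items sold, but only 1 was ever in stock.
oversold = 2 - initial["bach-wtc-richter"]
print(oversold)  # 1 order cannot be delivered
```

Both replicas behaved correctly from their local point of view; the inconsistency only exists globally, which is why it cannot be detected until the connection is restored.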

What is important is that either possibility could be the “right” approach, depending on the business. In our example it basically depends on the cost of failing to deliver due to overselling during a connection breakdown. If you are selling books, and in case you cannot deliver a book you send out an apology and a small gift coupon, then allowing the system to become inconsistent is probably the right approach. If you are selling airplane tickets and customers will sue you if you do not deliver, then restricting availability is probably the right approach.

If you read the Google Spanner article carefully, you will notice that it does not invalidate the CAP theorem. Google has decided to opt for consistency, which means that there can be situations in which the Spanner database will not be fully available.

What Google can do is make the probability of a network connection breakdown very (very) small. Basically, they can reduce this probability more than any other company. Google owns so much networking infrastructure (and peering agreements, and so on) that it is possible for them to make the probability of totally losing the connection between two computing centres very small, and for them it is not too expensive to do so. In particular, it is cheaper for them to make this probability really small than to carry the development cost of building every application in such a way that it can deal with inconsistency. However, this is probably unique to Google. While many other companies have the reliability requirement to host their web site in two (or more) computing centres, very few can, at acceptable cost, make the connection between those computing centres highly available. Note that the most likely cause of failure of such a connection is no longer the proverbial caterpillar digging up a cable, but the misconfiguration of a router, switch, firewall, etc.

So those companies still have to face the question of what, for them, is the best approach: reduce availability in case of a partition (which, as mentioned above, does not mean that no client can use the database at all, only that some clients cannot use some parts of the database), or allow the system to become somewhat inconsistent and fix everything up afterwards, when the network breakdown is over.

And here (and only here) does eventual consistency enter the picture. Eventual consistency does not mean that every write to the database (even when the system is not partitioned) becomes visible to other clients only after some time (eventually). Instead it means that if the database becomes inconsistent during a partition, it will get rid of the inconsistencies when the partition is over. In a sense, the database will heal itself (you only have to write down the rules for merging divergent states).
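For the web shop, one possible merge rule is straightforward: each side's sales during the partition are the difference between its state and the common state before the partition, and the merged stock applies both sides' sales. This is a minimal sketch of that idea (the function name and the compensation policy are my own assumptions, not a standard API); real systems often use purpose-built data structures such as CRDT counters for this.

```python
def merge(base, side_a, side_b):
    """Merge two divergent stock states against their common ancestor.

    Each side's sales = base count minus that side's remaining count.
    The merged stock subtracts both sides' sales from the base; a
    negative result signals oversold items that must be compensated
    (apology, gift coupon, ...).
    """
    merged = {}
    for product in base:
        sold_a = base[product] - side_a.get(product, 0)
        sold_b = base[product] - side_b.get(product, 0)
        merged[product] = base[product] - sold_a - sold_b
    return merged

base = {"cd": 1, "book": 5}
europe = {"cd": 0, "book": 4}  # Europe sold one of each during the partition
usa = {"cd": 0, "book": 5}     # the USA sold the cd as well

print(merge(base, europe, usa))  # {'cd': -1, 'book': 4} -> cd oversold by 1
```

The crucial point is that the merge is mechanical: no administrator has to inspect the two databases and decide by hand which value wins.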

This is actually a great property of a database! Without the intervention of administrators or specialists, without any human interaction, the system resolves all the inconsistencies and returns to a consistent state. If you have ever been responsible for a distributed database without this property, you will be painfully aware of how contagious inconsistencies are: instead of going away, they tend to spread.

So, if a company decides to opt for a distributed system that is fully available all the time, eventual consistency means that the inconsistencies that enter the system during network breakdowns do not spread out to catastrophic proportions but will disappear when the network breakdown is over.

A final remark

The real contribution of Spanner to the theory and practice of databases is not that it shows one need not decide between availability and consistency. You still have to make this decision, and Google has to make it too; they can just make the probability of a partition much smaller than everybody else. The real contribution is that they show how to achieve consistency (in a distributed but not partitioned database) efficiently by maintaining an accurate and coherent clock on all machines. For a long time, maintaining an accurate and coherent clock across distributed machines was so difficult that we stopped thinking about the possibility at all. But now, with NTP, GPS and “your own atomic clocks”, it becomes possible and even “cost effective” (though not “cheap”).

Do you have an opinion on this? Let us know in the comments!