This post is about improvements for the fulltext index in ArangoDB 2.6. The improvements address the problem that non-string attributes were ignored when fulltext-indexing.
Effectively this prevented string values inside arrays or objects from being indexed. Though this behavior was documented, it was limited the usefulness of the fulltext index much. Several users requested the fulltext index to be able to index arrays and object attributes, too.
Finally this has been accomplished, so the fulltext index in 2.6 supports indexing arrays and objects!
Read on in Jan’s blog post about Fulltext Index Enhancements.
The fans of modern and agile software development usually propose to use schemaless database engines to allow for greater flexibility, in particular during the early rapid prototyping phase of IT projects. The more traditionally minded insist that having a strict schema that is enforced by the persistence layer throughout the lifetime of a project is necessary to ensure quality and security.
In this post I would like to explain briefly, why I believe that both groups are completely right and why this is not so paradoxical as it sounds at first glance. (more…)
This is the second blog post on building hypermedia APIs with the focus on API design. In part 1 Lucas describes the concept of links in JSON.
Imagine we have an API where people can like books and other people can then see, who likes a certain book. We want this API to be highly connected: We don’t want to look up URLs in a documentation, we want to follow links as we know it from the world wide web. All we want to do as the author of the API is give our users a single URL from which they can then follow links to all other resources. This is similar to the way we would do this with a website. Leonard Richardson and Mike Amundsen refer to this as the billboard URL for this reason: If you put this URL on some billboard, people know everything to get started with your API. (more…)
MongoDB is a document DB whereas ArangoDB is a multi-model DB supporting documents, graphs and key/values within a single database. When it comes to data modeling and data querying, they pursue somewhat different approaches.
In a Nutshell: In MongoDB, data modeling is “aggregate-oriented”, avoiding relations and joins. On the other side, everybody has probably used relational databases which organize the data in tables with relations and try to avoid as much redundancy as possible. Both approaches have their pros and cons. ArangoDB is somewhat in-between: You can both model and query your data in a “relational way” but also in an “aggregate-oriented way”, depending on your use case. ArangoDB offers joins, nesting of sub-documents and multi-collection graphs. (more…)
I recently had the chance to visit FullStack London, a well organized conference. Thanks a lot to Skills Matter. FullStack was opened by Douglas Crockford about “The Better Parts” of ES6. I cannot wait to start using them. Douglas was followed by Isaac Schlueter talking about open source in companies. Although this talk was not technical I learned a lot and it was very inspiring.
A while ago we wrote some blog article that explained how ArangoDB uses disk space. That article compared the disk usage of ArangoDB, CouchDB, and MongoDB for loading some particular datasets. In this post, we’ll show in more detail the disk usage of ArangoDB for insert, update, and delete operations. We’ll also compare it to CouchDB for reference.
In this post we’ll explain how ArangoDB stores collection data on disk and look at its storage space requirements, compared to other popular NoSQL databases such as CouchDB and MongoDB.
How ArangoDB allocates disk space
ArangoDB stores documents in collections. The collection data is persisted on disk so it does not get lost in case of a server restart.
UNQL started with quite some hype last year. However, after some burst of activity the project came to a hold. So it seems, that – at least as a project – UNQL has been a failure. IMHO one of the major issues with the current UNQL is, that it tries to cover everything in NoSQL, from key-value stores to document-stores to graph-database. Basically you end up with greatest common divisor – namely key-value access. But with graph structures and also document-structures you really want to supports joins, paths or some sort of sub-structures.
Apart from all the technical and theoretical benefits of SQL and what advantages the underlying theory has to offer, the major plus from an users point of view is that it is readable. You simple can see an SQL statement – be it in C, Java, Ruby – and understand what is going on. It is declarative, not imperative. With other imperative solution, like a fluent interface or a map-reduce, you need to understand the underlying syntax or language. With SQL you only need to understand English – at least most of the time.
And here I think is where UNQL is totally right. We need something similar for the NoSQL world. But it should not try to be a “fits all situation”. It should be a fit for 80% of the problems. For simple key-values stores a fluent-interface is indeed enough. For very complex graph traversals a traversal program must be written. For very complex map-reduces you might need to write a program – but check out Google’s talk (www.nosql-matters.org/program) about NoNoSQL. There they describe why they are developing a SQL-like interface for Map/Reduce.
In my experience most of the time you have a set of collections holding different “types” of documents with some relations between them. One of the biggest advantages of document stores or graph databases is that you can have lists and sub-objects. The problem with SQL is, that it has no good way to deal with these structures. So I believe UNQL would be quite successful if it would concentrate on these strong advantages of NoSQL, instead of trying to unify everything – especially after hear Jan’s talk about a document query language at the NoSQL Cologne UG (an English version is also available).
Last week AvocadoDB got mentioned in “nosql weekly” and the project achieved a huge amount of public interest especially from Japan. Awesome!
In this context Mr. Fiber asked on twitter what the use of skip list indices in AvocadoDB is. Here’s a short video reply by chief architect martin Schoenert. Got an opinion on this? – we’d love to hear your thoughts on this in the comments.
Under a microscope: how ArangoDB stores data in RAM and data is secured consistently nonetheless in case of a server crash
AvocadoDB uses AppendOnly memory-mapped files with frequent fsync. Derived data (indices, etc.) is stored in the main memory only. This article explains why that particular combination leads to high performance and consistent data at the same time―even in case of a system failure.
Classical database systems – a bulk of data and insufficient main memory
Put simply, there are three possible settings regarding databases:
- Setting 1: All data fits into the main memory.
- Setting 2: The complete data pool does not fit into the main memory all at once, but the main memory is large enough to store all the data accessed in an average time span.
- Setting 3: Even the sub-set of data accessed in an average time span is too large for the main memory.
Classical database systems had to cope with setting 3 because main memory was too expensive to store the majority of data.
Basically, classical database systems had to manage the main memory themselves. To manage all data sets that exceeded the capacity of the main memory they needed sufficiently intelligent algorithms which the system software couldn’t provide (i.e., to stream the data through main memory for full table scans).