Take Alpha 2 of the upcoming ArangoDB 3.7 for a spin!

We are 11 weeks into the development of ArangoDB 3.7 and want to give you yet another opportunity to try out the upcoming features before the release. On our technical preview page, you’ll find the Alpha 2 packages for the Community and Enterprise Edition.

This Alpha 2 comes with pretty neat features and improvements and we hope to get your early feedback!

Early feedback is particularly helpful for us: it lets us steer development toward your real-world problems and improve the ease of use of new capabilities like:

  • ArangoSearch upgrades (all Editions)
    • FuzzySearch (n-gram based)
    • Wildcard Search with new LIKE operator
    • Enhanced phrase and proximity search
    • Late document materialization with stored values
    • Improved SIMD-based index format for much faster queries
  • New SatelliteGraphs (Enterprise Edition only)
    • Replicate graphs within a cluster for local query execution
  • Schema Validation (all Editions)
  • Graph Traversal Performance Improvements (all Editions)
  • Encryption-at-rest key rotation (Enterprise Edition only)
  • Improved Cluster Metrics (all Editions)
  • and other neat features

This is already a lot but we are still not done. Expect more features and improvements to come in 3.7 🙂

Please keep in mind that this is an alpha release. It is suited for testing purposes only and is NOT for production usage. It is solely meant for trying out new features way before they become available in a stable release. Although we have done our best to produce useful features and APIs, things may still contain bugs or may change until the final 3.7 release.

Check out the interactive tutorials to test the new features live, without installation or registration.

Your feedback is highly appreciated, so please share your thoughts with us in the dedicated #feedback-3-7 channel on Slack.

New features in the 3.7 Alpha 2

The following 3.7 features are already available in this Alpha 2 for testing. In addition, the Alpha 2 release contains all features of Alpha 1 like HTTP/2 support, JWT secret rotation, TLS key & certificate rotation and the new Insert-Update functionality.

ArangoSearch Upgrades

The text search and ranking engine within ArangoDB has already seen many improvements, but with 3.7 we take another huge step forward. This Alpha 2 already includes a good portion of that step.

ArangoSearch wildcard and fuzzy search

  • Wildcard search with `LIKE`
  • N-Gram based fuzzy search
  • Enhanced phrase and proximity search
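
To give a feel for what "n-gram based" means, here is a minimal JavaScript sketch of the general idea: split both strings into overlapping character n-grams and compare the two sets. This is plain Node.js code, not ArangoSearch's implementation. The function names `ngrams` and `ngramSimilarity` and the use of Jaccard overlap are assumptions made for illustration, and the scoring ArangoSearch applies may differ.

```javascript
// Illustrative only: generic n-gram overlap, not ArangoSearch's actual scoring.

// Collect the distinct character n-grams of a string.
function ngrams(text, n = 3) {
  const grams = new Set();
  for (let i = 0; i + n <= text.length; i++) {
    grams.add(text.slice(i, i + n));
  }
  return grams;
}

// Jaccard similarity of the two n-gram sets: shared / (all distinct n-grams).
function ngramSimilarity(a, b, n = 3) {
  const gramsA = ngrams(a.toLowerCase(), n);
  const gramsB = ngrams(b.toLowerCase(), n);
  if (gramsA.size === 0 || gramsB.size === 0) return 0;
  let shared = 0;
  for (const gram of gramsA) {
    if (gramsB.has(gram)) shared += 1;
  }
  return shared / (gramsA.size + gramsB.size - shared);
}

// A transposed pair of letters still shares most trigrams with the original.
console.log(ngramSimilarity("arangodb", "arangobd")); // prints 0.5
```

Because a typo only disturbs the n-grams that overlap it, misspelled terms keep a high overlap with the intended term, which is what makes this family of techniques useful for fuzzy matching.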

ArangoSearch View optimizations

  • Improved late document materialization for Views with stored values
  • Covering Views (fulfilling requests using View indexes without touching the storage engine)
  • Improved SIMD-based index format for much faster queries

You can test the new ArangoSearch capabilities without installation and without any registration directly on ArangoGraph or as an interactive tutorial. Let us know if this is a good way to check out these new features!

SatelliteGraphs

When doing joins involving graph traversals, shortest path or k-shortest paths computation in an ArangoDB cluster, data has to be exchanged between different servers. In particular, graph traversals are usually executed on a Coordinator because they need global information. This results in a lot of network traffic and potentially slow query execution.

Take a graph-based permissions use case where you have a large, sharded collection of documents within your ArangoDB cluster. You probably want to quickly determine whether a user, group or device has permission to, for example, access certain information. With SatelliteGraphs you can now replicate the graph that handles the permissions to each DB-Server and execute queries locally.

SatelliteGraphs are the natural extension of the concept of SatelliteCollections to graphs. All of the usual benefits and caveats apply. SatelliteGraphs are synchronously replicated to all DB-Servers that are part of a cluster, which enables DB-Servers to execute graph traversals locally. This includes (k-)shortest path(s) computation and possibly joins with traversals and greatly improves performance for such queries.

SatelliteGraphs are only available in the Enterprise Edition and the ArangoDB Cloud Service ArangoGraph (once GA).

No installation and no registration: Take SatelliteGraphs for a spin in this interactive tutorial

AQL subquery performance improvements

We refactored the execution process of AQL internally. This especially pays off in subqueries. It will allow for more optimizations and better batching of requests.
The first stage of this refactoring was already part of 3.6, where some subqueries gained a significant performance boost. With 3.7 we take the next step in this direction. We can now combine skipping and producing of outputs in a single call, so all queries with an offset or the `fullCount` option enabled will benefit from this change straight away.
This also holds true for subqueries, hence the existing AQL optimizer rule `splice-subqueries` is now able to optimize all subqueries.

Alpha 2 remark:
Not all previously available optimizations have been transferred into the new execution process yet, so you might see a performance regression for some queries for the time being. We are working hard to have them all back in place for the final release. The most significant drawback right now will be subquery batching, so short subqueries with many inputs but only few data rows may be slower in this Alpha than in 3.6, but we are committed to fixing this.

The existing AQL optimizer rule `move-calculations-down` is now able to also move unrelated subqueries beyond SORT and LIMIT instructions, which can help avoid the execution of subqueries for which the results are later discarded.

For example, in the query:

FOR doc IN collection1
  LET sub1 = (FOR sub IN collection2 ... RETURN sub)
  LET sub2 = (FOR sub IN collection3 ... RETURN sub)
  
  SORT sub1
  LIMIT 10
  RETURN { doc, sub1, sub2 }

…the execution of the `sub2` subquery can be delayed until after the SORT and LIMIT. The query optimizer will automatically transform this query into the following equivalent:

FOR doc IN collection1
  LET sub1 = (FOR sub IN collection2 ... RETURN sub)
  SORT sub1
  LIMIT 10

  LET sub2 = (FOR sub IN collection3 ... RETURN sub)
  RETURN { doc, sub1, sub2 }

This optimization existed before for other types of queries, but didn’t handle subqueries so far.
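
To see why the reordering pays off, here is a small JavaScript simulation of the two query shapes. It runs in plain Node.js over in-memory arrays and only counts how often the stand-in for `sub2` is evaluated. It does not reflect how ArangoDB executes AQL, and all names (`runEager`, `runDeferred`, `makeCounter`, the sample data) are hypothetical.

```javascript
// Toy model of the two query shapes above; not ArangoDB's execution engine.

// Wrap a function so that we can count how often it is called.
function makeCounter(fn) {
  const counted = (doc) => {
    counted.calls += 1;
    return fn(doc);
  };
  counted.calls = 0;
  return counted;
}

// Original shape: both subqueries run for every document, then SORT and LIMIT.
function runEager(docs, sub1, sub2, limit) {
  return docs
    .map((doc) => ({ doc, sub1: sub1(doc), sub2: sub2(doc) }))
    .sort((a, b) => a.sub1 - b.sub1)
    .slice(0, limit);
}

// Optimized shape: sub2 runs only for the rows that survive SORT and LIMIT.
function runDeferred(docs, sub1, sub2, limit) {
  return docs
    .map((doc) => ({ doc, sub1: sub1(doc) }))
    .sort((a, b) => a.sub1 - b.sub1)
    .slice(0, limit)
    .map((row) => ({ ...row, sub2: sub2(row.doc) }));
}

// 1000 sample documents with distinct sort keys (37 and 1000 are coprime).
const docs = Array.from({ length: 1000 }, (_, i) => ({ id: i, weight: (i * 37) % 1000 }));
const sub1 = (doc) => doc.weight; // stand-in for the subquery used by SORT
const eagerSub2 = makeCounter((doc) => doc.id * 2);
const deferredSub2 = makeCounter((doc) => doc.id * 2);

const eagerResult = runEager(docs, sub1, eagerSub2, 10);
const deferredResult = runDeferred(docs, sub1, deferredSub2, 10);

console.log(eagerSub2.calls, deferredSub2.calls); // prints 1000 10
```

Both variants return the same ten rows, but the deferred variant evaluates `sub2` only for the rows that are actually returned.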

Schema validation for documents

The structure of documents can now be enforced by providing per-collection schema definitions. If used, the schema definition has to be specified in JSON Schema (draft-4) format.

For example, the ArangoShell code below creates a collection whose schema enforces these rules:

  • the “name” attribute must be an object, with required attributes “first” and “last”, and an optional “middle” attribute. Name lengths are restricted.
  • the “status” attribute must be one string out of a predefined set of values
  • no other attributes are allowed (excluding system attributes)

db._create("schemaTest", {
  validation: {
    rule: {
      properties: {
        _key: { type: "string" },
        _rev: { type: "string" },
        _id: { type: "string" },
        name: { 
          type: "object", 
          properties: { 
            first: { 
              type: "string", 
              minLength: 1, 
              maxLength: 50 
            }, 
            middle: { 
              type: "string", 
              maxLength: 50 
            }, 
            last: { 
              type: "string", 
              minLength: 1, 
              maxLength: 50 
            } 
          }, 
          required: ["first", "last"] 
        }, 
        status: { enum: ["active", "inactive", "deleted"] } 
      }, 
      additionalProperties: false, 
      required: ["name", "status"] 
    },
    level: "strict",
    message: "document has an invalid schema!"
  }
});

It is not necessary to include system attributes in the schema description, even if you plan to set `additionalProperties` to `false`: they are invisible to schema validation.

With the schema in place and the validation level set to `"strict"`, all non-conforming documents will be rejected on insert, update or replace operations. The following operation will thus succeed:

db.schemaTest.insert({
  name: {
    first: "Foo",
    last: "Bar"
  },
  status: "active"
});

Trying to insert a non-conforming document, on the other hand, will fail:

db.schemaTest.insert({
  name: "Foo Bar",
  status: "active"
});

ArangoError: validation failed: document has an invalid schema!

Schemas can be introduced gracefully for existing collections if required, so that only newly inserted documents will be validated against the schema (`level: "new"`). It is also possible to enforce the schema for new documents and require that existing valid documents remain valid, while existing invalid documents can still be modified without complying with the schema (`level: "moderate"`). Schema validation may also be disabled temporarily without removing the rule (`level: "none"`).
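
The four levels can be summarized as a small decision function. The JavaScript sketch below is a model of the behavior described in this post, written for plain Node.js. It is not ArangoDB's implementation, and the function name `mustValidate` and its parameters are hypothetical.

```javascript
// Model of the validation levels as described in the text; illustration only.
// operation: "insert", "update" or "replace".
// oldDocWasValid: whether the stored document conformed before the modification
// (only relevant for "update" and "replace").
function mustValidate(level, operation, oldDocWasValid = true) {
  switch (level) {
    case "none":
      return false; // validation switched off, rule kept
    case "new":
      return operation === "insert"; // only newly inserted documents
    case "moderate":
      // new documents, plus modifications of documents that were valid before
      return operation === "insert" || oldDocWasValid;
    case "strict":
      return true; // every insert, update and replace
    default:
      throw new Error(`unknown validation level: ${level}`);
  }
}

for (const level of ["none", "new", "moderate", "strict"]) {
  console.log(level, {
    insert: mustValidate(level, "insert"),
    updateValid: mustValidate(level, "update", true),
    updateInvalid: mustValidate(level, "update", false),
  });
}
```

The only case where the previous state of a document matters is `"moderate"`: a document that was already invalid may be modified without being checked, while a valid one may not be turned into an invalid one.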

Check out the tutorial to learn more.

Graph traversal performance improvements

The performance of graph traversals is improved via some internal code refactoring:

  • Traversal cursors are reused instead of recreated from scratch, if possible. This can save lots of calls to the memory management subsystem.
  • Unnecessary checks have been removed from the cursors, by ensuring some invariants.
  • Each vertex lookup needs to perform slightly less work.

The traversal speedups observed by these changes alone are around 8 to 10% for single-server traversals and traversals in OneShard setups in our tests. Cluster traversals will also benefit from these changes, but to a lesser extent. This is because the network roundtrips have a higher share of the total query execution times in the cluster case.

Traversal performance is further improved by not fetching the visited vertices from the storage engine in case the traversal query does not refer to them.

Cluster Metrics

The number of exported metrics has been extended, and the metrics are now available in a Prometheus-compatible format. You can now easily scrape them from `/_admin/metrics`. Please see the details on the available metrics in the documentation.

To highlight some of the available metrics:

  • Heartbeat metrics monitor if the servers are still connected to each other
  • Shard Distribution metrics monitor if your data is replicated and well distributed
  • Scheduler metrics monitor if your system gets overloaded and requests start to queue
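
If you want to inspect the endpoint without a full Prometheus setup, a few lines of JavaScript are enough. The sketch below assumes Node.js 18 or later (for the built-in `fetch`) and handles only the simple `name{labels} value` lines of the Prometheus text format. The function names are hypothetical, and the metric names in the sample text are invented for the example; see the documentation for the real ones.

```javascript
// Minimal reader for the Prometheus text format; a sketch, not a full parser.

// Prometheus writes infinities as +Inf / -Inf, which Number() does not accept.
function toNumber(raw) {
  if (raw === "+Inf") return Infinity;
  if (raw === "-Inf") return -Infinity;
  return Number(raw);
}

// Turn the exposition text into a list of { name, labels, value } samples.
function parsePrometheusText(text) {
  const samples = [];
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip HELP/TYPE lines
    const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)/.exec(line);
    if (!match) continue;
    const [, name, labels = "", rawValue] = match;
    samples.push({ name, labels, value: toNumber(rawValue) });
  }
  return samples;
}

// Fetch and parse the metrics of one server. Authentication headers, if your
// deployment needs them, are passed in by the caller (assumption).
async function scrapeMetrics(baseUrl, headers = {}) {
  const response = await fetch(`${baseUrl}/_admin/metrics`, { headers });
  if (!response.ok) throw new Error(`scrape failed: HTTP ${response.status}`);
  return parsePrometheusText(await response.text());
}

// Invented sample input, only to show the shape of the parsed result.
const sample = [
  "# HELP example_heartbeat_failures_total Hypothetical counter",
  "# TYPE example_heartbeat_failures_total counter",
  "example_heartbeat_failures_total 3",
  'example_scheduler_queue_length{role="coordinator"} 12',
].join("\n");

const parsed = parsePrometheusText(sample);
console.log(parsed);
```

Against a running server you would call something like `scrapeMetrics("http://localhost:8529")`, where the address is an assumption about a local default setup.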

Other neat Improvements

Server Name Indication (Enterprise Edition)
Sometimes it is desirable to have the same server use different server keys and certificates when it is contacted under different names. This is possible with the Server Name Indication (SNI) TLS extension. It is now supported by ArangoDB using a new startup option `--ssl.server-name-indication`.

See the docs for this feature.

Encryption at rest key rotation (Enterprise Edition)
It is possible to rotate the user-supplied encryption key by sending a POST request without payload to the new endpoint `/_admin/server/encryption`. The file supplied via `--rocksdb.encryption-keyfile` will be reloaded and the internal encryption key will be re-encrypted with the new user key.
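
As a rough sketch, the rotation request could be sent from Node.js 18 or later like this. The endpoint and the empty POST body come from the description above. The function names, the use of a JWT bearer token for authentication and the server address in the usage note are assumptions for illustration and may need adjusting for your deployment.

```javascript
// Sketch of the key rotation call; names and authentication are assumptions.

// Describe the request: a POST without payload to the rotation endpoint.
function buildKeyRotationRequest(baseUrl, jwt) {
  return {
    url: `${baseUrl}/_admin/server/encryption`,
    options: {
      method: "POST",
      headers: { Authorization: `bearer ${jwt}` }, // assumed JWT authentication
    },
  };
}

// Send the request and fail loudly on any non-2xx response.
async function rotateEncryptionKey(baseUrl, jwt) {
  const { url, options } = buildKeyRotationRequest(baseUrl, jwt);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`key rotation failed: HTTP ${response.status}`);
  }
  return response.status;
}
```

A call such as `rotateEncryptionKey("http://localhost:8529", token)` would be issued after the new key has been placed in the key file, so that the server picks it up when it reloads the file.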

See the docs for this feature.

Override detected total memory
`arangod` detects the total amount of RAM present on the system and calculates various default sizes based on this value. If you run it alongside other services or in a container with a RAM limitation for its cgroup, then you probably don’t want the server to detect and use all available memory.
An environment variable `ARANGODB_OVERRIDE_DETECTED_TOTAL_MEMORY` can now be set to restrict the amount of memory it will detect (also available in v3.6.3).

Please see the docs of this feature for more details.

More features

There are a few more small features and optimizations we want to implement for ArangoDB 3.7.0. You will get a chance to test them before the release in a couple of weeks, so stay tuned!

We hope you find some useful stuff in this Alpha 2 and take either the Community or Enterprise Edition for a spin. Let us know your thoughts about the features and their ease of use via the dedicated #feedback-3-7 channel in our Community Slack.

Joerg Schad

Jörg Schad is our CTO. In a previous life, he worked on Machine Learning Infrastructure in health care, distributed systems at Mesosphere, implemented distributed and in-memory databases, and conducted research in the Hadoop and Cloud area. He’s a frequent speaker at meetups, international conferences, and lecture halls.

2 Comments

  1. Tom Fothergill on May 27, 2020 at 12:59 pm

    Sounds great. Any idea on a release date?

    • Jan Stücke on July 13, 2020 at 6:01 pm

      We are in the final stages of the release. So we will release 3.7 soon
