Schema Validation

How to enforce attributes and their data types for documents of individual collections using JSON Schema

While ArangoDB is schema-less, it allows to enforce certain document structures on the collection level. The desired structure can be described in the popular JSON Schema  format (draft-4, without support for remote schemas for security reasons). The level of validation and a custom error message can be configured. The system attributes _key, _id, _rev, _from and _to are ignored by the schema validation.

Enable schema validation for a collection

Schema validation can be managed via the JavaScript API, typically using arangosh, as well as via the HTTP interface.

To enable schema validation either pass the schema property on collection creation or when updating the properties of an existing collection. It expects an object with the following attributes: rule, level and message.

  • The rule attribute must contain the JSON Schema description.
  • level controls when the validation is applied.
  • message sets the message that is used when validation fails.
var schema = {
  rule: { 
    properties: { nums: { type: "array", items: { type: "number", maximum: 6 } } }, 
    additionalProperties: { type: "string" },
    required: ["nums"]
  },
  level: "moderate",
  message: "The document does not contain an array of numbers in attribute 'nums', or one of the numbers is greater than 6."
};

/* Create a new collection with schema */
db._create("schemaCollection", { "schema": schema });

/* Update the schema of an existing collection */
db.schemaCollection.properties({ "schema": schema });

To remove an existing schema from a collection, a schema value of either null or {} (empty object) can be stored:

/* Remove the schema of an existing collection */
db.schemaCollection.properties({ "schema": null });

JSON Schema Rule

The rule must be a valid JSON Schema object as outlined in the specification . See Understanding JSON Schema  for a user guide on how to write JSON Schema descriptions.

System attributes are invisible to the schema validation, i.e. _key, _rev and _id (in edge collections additionally _from and _to) do not need to be specified in the schema. You may set additionalProperties: false to only allow attributes described by the schema. System attributes do not fall under this restriction.

Attributes with numeric values always have the type "number", even if they are whole numbers (and internally use an integer type). If you want to restrict an attribute to integer values, use "type": "number" together with "multipleOf": 1.

Remote schemas are not supported for security reasons.

Levels

The level controls when the validation is triggered:

  • none: The rule is inactive and validation thus turned off.
  • new: Only newly inserted documents are validated.
  • moderate: New and modified documents must pass validation, except for modified documents where the OLD value did not pass validation already. This level is useful if you have documents which do not match your target structure, but you want to stop the insertion of more invalid documents and prohibit that valid documents are changed to invalid documents.
  • strict: All new and modified document must strictly pass validation. No exceptions are made (default).

Error message

If the schema validation for a document fails, then a generic error is raised. You may customize the error message via the message attribute to provide a summary of what the expected document structure is or point out common mistakes.

The schema validation cannot pin-point which part of a rule made it fail because it is difficult to determine and report for complex schemas. For example, when using not and anyOf, this would result in trees of possible errors. You can use tools like jsonschemavalidator.net  to examine schema validation issues.

Performance

The schema validation is executed for data-modification operations according to the levels described above. That means that it can slow down document write operations, with more complex schemas typically taking more time for the validation than very simple schemas.

The following AQL functions are available to work with schemas:

Backup and restore

Logical backups created with arangodump include the schema configuration, which is a collection property.

When using arangorestore to restore to a collection with a defined schema, no schema validation is executed.