ArangoSearch data preprocessing

State of the Art Preprocessing and Filtering with ArangoSearch

Categories: ArangoSearch

In case you haven’t heard of ArangoSearch yet, it is a high-performance full-text search engine integrated into ArangoDB, meaning it is connected with the other data models and AQL. For more details, see “ArangoSearch – Full-text search engine including similarity ranking capabilities”.

In ArangoDB 3.7, the ArangoSearch team added fuzzy search support (see the comprehensive article “Fuzzy search” by Andrey Abramov). With fuzzy search, data preprocessing and filtering become even more important. In the upcoming ArangoDB 3.8 release, the ArangoSearch work focuses on improving this part. In this post I’m going to uncover some of the new features we are proud to present.


ArangoML Series: Intro to NetworkX Adapter

Categories: ArangoML, General, Graphs, how to, Machine Learning

This post is the fifth in a series of posts introducing the ArangoML features and tools. This post introduces the NetworkX adapter, which makes it easy to analyze your graphs stored in ArangoDB with NetworkX.

In this post we:

  • Briefly introduce NetworkX
  • Explore the IMDB user rating dataset
  • Showcase the ArangoDB integration of NetworkX
  • Explore the centrality measures of the data using NetworkX
  • Store the experiment with arangopipe

This notebook is just a slice of the full-sized notebook available in the ArangoDB NetworkX adapter repository. It is summarized here to better fit the blog post format and provide a quick introduction to using the NetworkX adapter. 


ArangoML Part 4: Detecting Covariate Shift in Datasets

Categories: ArangoML, General, Graphs, Machine Learning

This post is the fourth in a series of posts introducing ArangoML and showcasing its benefits to your machine learning pipelines. Until now, we have focused on ArangoML’s ability to capture metadata for your machine learning projects, but it does much more. 

In this post we:

  • Introduce the concept of covariate shift in datasets
  • Showcase the built-in dataset shift detection API

A story of a memory leak in Go: How to properly use time.After()

Categories: General

Recently, we decided to investigate why ArangoSync, our application for synchronizing two ArangoDB clusters across data centers, used a lot of memory: around 2GB in certain cases. The environment contained ~1500 shards with 5000 goroutines. Thanks to tools like pprof (to profile CPU and memory usage), it was easy to identify the issue. The Go profiler showed that memory was allocated in the function time.After() and accumulated up to nearly 1GB. The memory was not released, so it was clear that we had a memory leak. We will explain how memory leaks can occur with time.After() through three examples.
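To make the problem concrete, here is a minimal sketch of the general pattern. It is not the ArangoSync code and not one of the post's three examples; the function names drainLeaky and drainWithTimer are hypothetical. Calling time.After() inside a loop creates a new timer on every iteration. In Go versions before 1.23, each of those timers stays allocated until its timeout expires, even if the other select case fired immediately. With long timeouts and many goroutines, the pending timers add up. Reusing a single time.Timer avoids this.

```go
package main

import (
	"fmt"
	"time"
)

// drainLeaky receives up to n messages and arms a fresh timer through
// time.After on every iteration. Before Go 1.23, each of these timers
// stays allocated until its timeout expires, even when a message
// arrived long before that.
func drainLeaky(ch <-chan int, n int, timeout time.Duration) int {
	received := 0
	for i := 0; i < n; i++ {
		select {
		case <-ch:
			received++
		case <-time.After(timeout):
			return received // nothing arrived within the timeout
		}
	}
	return received
}

// drainWithTimer does the same work with one timer that is re-armed on
// every iteration and stopped when the function returns, so at most one
// timer is pending at any time.
func drainWithTimer(ch <-chan int, n int, timeout time.Duration) int {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	received := 0
	for i := 0; i < n; i++ {
		// Stop the timer and discard a stale tick, if any, before re-arming.
		if !timer.Stop() {
			select {
			case <-timer.C:
			default:
			}
		}
		timer.Reset(timeout)

		select {
		case <-ch:
			received++
		case <-timer.C:
			return received // nothing arrived within the timeout
		}
	}
	return received
}

// filled returns a buffered channel that already holds n messages.
func filled(n int) chan int {
	ch := make(chan int, n)
	for i := 0; i < n; i++ {
		ch <- i
	}
	return ch
}

func main() {
	const n = 1000
	// Both variants receive all n messages. Before Go 1.23 the first one
	// additionally leaves n one-minute timers pending on the heap.
	fmt.Println(drainLeaky(filled(n), n, time.Minute))
	fmt.Println(drainWithTimer(filled(n), n, time.Minute))
}
```

Both functions return the same result; the difference only shows up in a heap profile, where the first variant shows allocations accumulating under time.After(). Since Go 1.23, unreferenced timers can be garbage collected before they fire, so the effect is mainly relevant for older toolchains.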

