We are proud to announce the GA 1.0 release of the ArangoDB-DGL Adapter!
The ArangoDB-DGL Adapter exports graphs from ArangoDB, a multi-model graph database, into Deep Graph Library (DGL), a Python package for graph neural networks, and vice versa.
On December 30th, 2021, we introduced the first release of the DGL Adapter for ArangoDB to the ArangoML community. We modeled it closely on our existing ArangoDB-NetworkX Adapter implementation to keep the UX consistent across our (growing) Adapter Family. You can expect the same developer-friendly options, along with a helpful getting-started guide via Google Colab. And as always, it is open source!
This blog post will serve as a walkthrough of the ArangoDB-DGL Adapter, via its official Jupyter Notebook.
We will cover the following use cases:
- ArangoDB to DGL
  - Via an ArangoDB graph
  - Via a set of ArangoDB collections
  - Via a user-defined metagraph
  - Unique cases in attribute transfer
- DGL to ArangoDB
  - Homogeneous graphs
  - Heterogeneous graphs
  - Unique cases in attribute transfer
ArangoDB DGL Adapter Getting Started Guide¶
Version: 2.0.0
Objective: Export graphs from ArangoDB, a multi-model graph database, to Deep Graph Library (DGL), a Python package for graph neural networks, and vice versa.
Setup¶
%%capture
!pip install adbdgl-adapter==2.0.0
!pip install adb-cloud-connector
!git clone -b 2.0.0 --single-branch https://github.com/arangoml/dgl-adapter.git
## For drawing purposes
!pip install matplotlib
!pip install networkx
# All imports
import dgl
from dgl import remove_self_loop
from dgl.data import MiniGCDataset
from dgl.data import KarateClubDataset
import torch
from torch.functional import Tensor
from adbdgl_adapter import ADBDGL_Adapter, ADBDGL_Controller
from adbdgl_adapter.typings import Json, ArangoMetagraph, DGLCanonicalEType, DGLDataDict
from arango import ArangoClient
from adb_cloud_connector import get_temp_credentials
import json
import logging
import matplotlib.pyplot as plt
import networkx as nx
Understanding DGL¶
(referenced from docs.dgl.ai)
Deep Graph Library (DGL) is a Python package built for easy implementation of the graph neural network model family, on top of existing DL frameworks (currently supporting PyTorch, MXNet and TensorFlow).
DGL represents a directed graph as a DGLGraph object. You can construct a graph by specifying the number of nodes in the graph as well as the list of source and destination nodes. Nodes in the graph have consecutive IDs starting from 0.
The following code constructs a directed "star" homogeneous graph with 6 nodes and 5 edges.
# A homogeneous graph with 6 nodes, and 5 edges
g = dgl.graph(([0, 0, 0, 0, 0], [1, 2, 3, 4, 5]))
print(g)
# Print the graph's canonical edge types
print("\nCanonical Edge Types: ", g.canonical_etypes)
# >>> [('_N', '_E', '_N')]
# '_N' being the only Node type
# '_E' being the only Edge type
In DGL, a heterogeneous graph (heterograph for short) is specified with a series of graphs as below, one per relation. Each relation is a string triplet (source node type, edge type, destination node type). Since relations disambiguate the edge types, DGL calls them canonical edge types:
# A heterogeneous graph with 8 nodes, and 7 edges
g = dgl.heterograph({
('user', 'follows', 'user'): (torch.tensor([0, 1]), torch.tensor([1, 2])),
('user', 'follows', 'game'): (torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])),
('user', 'plays', 'game'): (torch.tensor([1, 3]), torch.tensor([2, 3]))
})
print(g)
print("\nCanonical Edge Types: ", g.canonical_etypes)
print("\nNode Types: ", g.ntypes)
print("\nEdge Types: ", g.etypes)
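The node and edge counts stated in the comment above can be checked by hand. The following is a minimal pure-Python sketch (no DGL required), assuming that the number of nodes of each type is inferred as the largest ID seen for that type plus one, which is what dgl.heterograph does when no explicit node counts are given. The helper name count_nodes_and_edges is hypothetical and not part of DGL or the adapter.

```python
from collections import defaultdict

# Same relations as the heterograph above, written with plain lists
relations = {
    ("user", "follows", "user"): ([0, 1], [1, 2]),
    ("user", "follows", "game"): ([0, 1, 2], [1, 2, 3]),
    ("user", "plays", "game"): ([1, 3], [2, 3]),
}


def count_nodes_and_edges(relations):
    """Hypothetical helper: infer per-type node counts and the total edge count."""
    max_id = defaultdict(lambda: -1)
    num_edges = 0
    for (src_type, _etype, dst_type), (src_ids, dst_ids) in relations.items():
        # Node IDs are separate per node type, so track the largest ID per type
        max_id[src_type] = max(max_id[src_type], *src_ids)
        max_id[dst_type] = max(max_id[dst_type], *dst_ids)
        num_edges += len(src_ids)
    # Assumption: IDs are consecutive from 0, so count = largest ID + 1
    num_nodes = {ntype: top + 1 for ntype, top in max_id.items()}
    return num_nodes, num_edges


num_nodes, num_edges = count_nodes_and_edges(relations)
print(num_nodes, num_edges)
```

Under that assumption, the four user nodes and four game nodes add up to the 8 nodes mentioned above, and the three relations contribute 2 + 3 + 2 = 7 edges.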
Graph data often carries attributes on nodes and edges. Although node and edge attributes can be of arbitrary types in the real world, DGLGraph only accepts attributes stored in tensors (with numerical contents). Consequently, a given attribute must have the same shape for every node or edge. In the context of deep learning, those attributes are often called features.
You can assign and retrieve node and edge features via the ndata and edata interfaces.
# A homogeneous graph with 6 nodes, and 5 edges
g = dgl.graph(([0, 0, 0, 0, 0], [1, 2, 3, 4, 5]))
# Assign an integer value for each node.
g.ndata['x'] = torch.tensor([151, 124, 41, 89, 76, 55])
# Assign a 4-dimensional edge feature vector for each edge.
g.edata['a'] = torch.randn(5, 4)
print(g)
print("\nNode Data X attribute: ", g.ndata['x'])
print("\nEdge Data A attribute: ", g.edata['a'])
# NOTE: The following ndata assignment would fail, because the tensor holds 5 values but the graph has 6 nodes
# g.ndata['bad_attribute'] = torch.tensor([0,10,20,30,40])
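To make the shape rule concrete, here is a small pure-Python sketch that mimics the check on nested lists instead of tensors. It is not DGL's actual implementation; feature_shape and check_node_feature are hypothetical names used only for illustration.

```python
def feature_shape(values):
    """Return the shape of a nested list, raising ValueError if it is ragged."""
    if not isinstance(values, list):
        return ()  # a scalar has an empty shape
    if not values:
        return (0,)
    inner = [feature_shape(v) for v in values]
    if any(shape != inner[0] for shape in inner):
        raise ValueError("ragged feature: rows have different shapes")
    return (len(values),) + inner[0]


def check_node_feature(num_nodes, values):
    """A feature is assignable only if its first dimension equals the node count."""
    shape = feature_shape(values)
    return len(shape) > 0 and shape[0] == num_nodes


# One value per node in a 6-node graph: accepted
ok = check_node_feature(6, [151, 124, 41, 89, 76, 55])
# Only 5 values for 6 nodes: rejected
bad = check_node_feature(6, [0, 10, 20, 30, 40])
print(ok, bad)
```

The same idea applies to edge features: the first dimension must match the number of edges, and every row must share one shape.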
When multiple node/edge types are introduced, users need to specify the particular node/edge type when invoking a DGLGraph API for type-specific information. In addition, nodes/edges of different types have separate IDs.
g = dgl.heterograph({
('user', 'follows', 'user'): (torch.tensor([0, 1]), torch.tensor([1, 2])),
('user', 'follows', 'game'): (torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])),
('user', 'plays', 'game'): (torch.tensor([1, 3]), torch.tensor([2, 3]))
})
# Get the number of all nodes in the graph
print("All nodes: ", g.num_nodes())
# Get the number of user nodes
print("User nodes: ", g.num_nodes('user'))
# Nodes of different types have separate IDs,
# hence not well-defined without a type specified
# print(g.nodes())
#DGLError: Node type name must be specified if there are more than one node types.
print(g.nodes('user'))
To set/get features for a specific node/edge type, DGL provides two new types of syntax: g.nodes['node_type'].data['feat_name'] and g.edges['edge_type'].data['feat_name'].
Note: If the graph only has one node/edge type, there is no need to specify the node/edge type.
g = dgl.heterograph({
('user', 'follows', 'user'): (torch.tensor([0, 1]), torch.tensor([1, 2])),
('user', 'follows', 'game'): (torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])),
('user', 'plays', 'game'): (torch.tensor([1, 3]), torch.tensor([2, 3]))
})
g.nodes['user'].data['age'] = torch.tensor([21, 16, 38, 64])
# An alternative (yet equivalent) syntax:
# g.ndata['age'] = {'user': torch.tensor([21, 16, 38, 64])}
print(g.ndata)
For more info, visit https://docs.dgl.ai/en/0.6.x/.
Create a Temporary ArangoDB Cloud Instance¶
# Request temporary instance from the managed ArangoDB Cloud Service.
con = get_temp_credentials()
print(json.dumps(con, indent=2))
# Connect to the db via the python-arango driver
db = ArangoClient(hosts=con["url"]).db(con["dbName"], con["username"], con["password"], verify=True)
Feel free to use the above URL to check out the UI!
Data Import¶
For demo purposes, we will be using the ArangoDB Fraud Detection example graph.
!chmod -R 755 dgl-adapter/
!./dgl-adapter/tests/assets/arangorestore -c none --server.endpoint http+ssl://{con["hostname"]}:{con["port"]} --server.username {con["username"]} --server.database {con["dbName"]} --server.password {con["password"]} --replication-factor 3 --input-directory "dgl-adapter/examples/data/fraud_dump" --include-system-collections true
Instantiate the Adapter¶
Connect the ArangoDB-DGL Adapter to our temporary ArangoDB cluster:
adbdgl_adapter = ADBDGL_Adapter(db)
ArangoDB to DGL¶
Via ArangoDB Graph¶
Data source
- ArangoDB Fraud-Detection Graph
Package methods used
- adbdgl_adapter.arangodb_graph_to_dgl()
Important notes
- The name parameter in this case must point to an existing ArangoDB graph in your ArangoDB instance.
# Define graph name
graph_name = "fraud-detection"
# Create DGL graph from ArangoDB graph
dgl_g = adbdgl_adapter.arangodb_graph_to_dgl(graph_name)
# You can also provide valid Python-Arango AQL query options to the command above, like so:
# dgl_g = adbdgl_adapter.arangodb_graph_to_dgl(graph_name, ttl=1000, stream=True)
# See more here: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute
# Show graph data
print('\n--------------------')
print(dgl_g)
print(dgl_g.ntypes)
print(dgl_g.etypes)
Via ArangoDB Collections¶
Data source
- ArangoDB Fraud-Detection Collections
Package methods used
- adbdgl_adapter.arangodb_collections_to_dgl()
Important notes
- The name parameter in this case is simply for naming your DGL graph.
- The vertex_collections & edge_collections parameters must point to existing ArangoDB collections within your ArangoDB instance.
# Define collection names
vertex_collections = {"account", "Class", "customer"}
edge_collections = {"accountHolder", "Relationship", "transaction"}
# Create DGL from ArangoDB collections
dgl_g = adbdgl_adapter.arangodb_collections_to_dgl("fraud-detection", vertex_collections, edge_collections)
# You can also provide valid Python-Arango AQL query options to the command above, like so:
# dgl_g = adbdgl_adapter.arangodb_collections_to_dgl("fraud-detection", vertex_collections, edge_collections, ttl=1000, stream=True)
# See more here: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute
# Show graph data
print('\n--------------------')
print(dgl_g)
print(dgl_g.ntypes)
print(dgl_g.etypes)
Via ArangoDB Metagraph¶
Data source
- ArangoDB Fraud-Detection Collections
Package methods used
- adbdgl_adapter.arangodb_to_dgl()
Important notes
- The name parameter in this case is simply for naming your DGL graph.
- The metagraph parameter should contain collections & associated document attribute names that exist within your ArangoDB instance.
# Define Metagraph
fraud_detection_metagraph = {
"vertexCollections": {
"account": {"rank", "Balance", "customer_id"},
"Class": {"concrete"},
"customer": {"rank"},
},
"edgeCollections": {
"accountHolder": {},
"Relationship": {},
"transaction": {"receiver_bank_id", "sender_bank_id", "transaction_amt"},
},
}
# Create DGL Graph from attributes
dgl_g = adbdgl_adapter.arangodb_to_dgl('FraudDetection', fraud_detection_metagraph)
# You can also provide valid Python-Arango AQL query options to the command above, like so:
# dgl_g = adbdgl_adapter.arangodb_to_dgl('FraudDetection', fraud_detection_metagraph, ttl=1000, stream=True)
# See more here: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute
# Show graph data
print('\n--------------')
print(dgl_g)
print('\n--------------')
print(dgl_g.ndata)
print('--------------\n')
print(dgl_g.edata)
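The metagraph is a plain Python dictionary, so it can be inspected before it is handed to the adapter. The sketch below lists every (section, collection, attribute) combination that the metagraph above requests. It only reads the dictionary; requested_attributes is a hypothetical helper and not part of the adbdgl_adapter package.

```python
# Same metagraph as defined above
fraud_detection_metagraph = {
    "vertexCollections": {
        "account": {"rank", "Balance", "customer_id"},
        "Class": {"concrete"},
        "customer": {"rank"},
    },
    "edgeCollections": {
        "accountHolder": {},
        "Relationship": {},
        "transaction": {"receiver_bank_id", "sender_bank_id", "transaction_amt"},
    },
}


def requested_attributes(metagraph):
    """Hypothetical helper: flatten a metagraph into (section, collection, attribute) triples."""
    triples = []
    for section in ("vertexCollections", "edgeCollections"):
        for collection, attributes in metagraph.get(section, {}).items():
            # Collections mapped to an empty container request no attributes
            for attribute in sorted(attributes):
                triples.append((section, collection, attribute))
    return triples


triples = requested_attributes(fraud_detection_metagraph)
for section, collection, attribute in triples:
    print(f"{section}: {collection}.{attribute}")
```

Collections such as accountHolder and Relationship appear in the metagraph with no attributes, so they contribute structure to the graph but no entries to this list.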
Via ArangoDB Metagraph with a custom controller and verbose logging¶
Data source
- ArangoDB Fraud-Detection Collections
Package methods used
- adbdgl_adapter.arangodb_to_dgl()
Important notes
- The name parameter in this case is simply for naming your DGL graph.
- The metagraph parameter should contain collections & associated document attribute names that exist within your ArangoDB instance.
- We are creating a custom ADBDGL_Controller to specify how to convert our ArangoDB vertex/edge attributes into DGL node/edge features. The default ADBDGL_Controller can be viewed in the dgl-adapter repository (https://github.com/arangoml/dgl-adapter).
# Define Metagraph
fraud_detection_metagraph = {
"vertexCollections": {
"account": {"rank"},
"Class": {"concrete", "name"},
"customer": {"Sex", "Ssn", "rank"},
},
"edgeCollections": {
"accountHolder": {},
"Relationship": {},
"transaction": {"receiver_bank_id", "sender_bank_id", "transaction_amt", "transaction_date", "trans_time"},
},
}
# A user-defined Controller class is REQUIRED when converting non-numerical
# ArangoDB attributes to DGL features.
class FraudDetection_ADBDGL_Controller(ADBDGL_Controller):
    """ArangoDB-DGL controller.
    Responsible for controlling how ArangoDB attributes
    are converted into DGL features, and vice-versa.
    You can derive your own custom ADBDGL_Controller if you want to maintain
    consistency between your ArangoDB attributes & your DGL features.
    """
    def _adb_attribute_to_dgl_feature(self, key: str, col: str, val):
        """
        Given an ArangoDB attribute key, its assigned value (for an arbitrary document),
        and the collection it belongs to, convert it to a valid
        DGL feature: https://docs.dgl.ai/en/0.6.x/guide/graph-feature.html.
        NOTE: You must override this function if you want to transfer non-numerical
        ArangoDB attributes to DGL (DGL only accepts 'attributes' (a.k.a features)
        of numerical types). Read more about DGL features here:
        https://docs.dgl.ai/en/0.6.x/new-tutorial/2_dglgraph.html#assigning-node-and-edge-features-to-graph.
        :param key: The ArangoDB attribute key name
        :type key: str
        :param col: The ArangoDB collection of the ArangoDB document.
        :type col: str
        :param val: The assigned attribute value of the ArangoDB document.
        :type val: Any
        :return: The attribute's representation as a DGL Feature
        :rtype: Any
        """
        try:
            if col == "transaction":
                if key == "transaction_date":
                    return int(str(val).replace("-", ""))
                if key == "trans_time":
                    return int(str(val).replace(":", ""))
            if col == "customer":
                if key == "Sex":
                    return {
                        "M": 0,
                        "F": 1
                    }.get(val, -1)
                if key == "Ssn":
                    return int(str(val).replace("-", ""))
            if col == "Class":
                if key == "name":
                    return {
                        "Bank": 0,
                        "Branch": 1,
                        "Account": 2,
                        "Customer": 3
                    }.get(val, -1)
        except (ValueError, TypeError, SyntaxError):
            return 0
        # Rely on the parent Controller as a final measure
        return super()._adb_attribute_to_dgl_feature(key, col, val)
# Instantiate the new adapter
fraud_adbdgl_adapter = ADBDGL_Adapter(db, FraudDetection_ADBDGL_Controller())
# You can also change the adapter's logging level for access to
# silent, regular, or verbose logging (logging.WARNING, logging.INFO, logging.DEBUG)
fraud_adbdgl_adapter.set_logging(logging.DEBUG) # verbose logging
# Create DGL Graph from attributes
dgl_g = fraud_adbdgl_adapter.arangodb_to_dgl('FraudDetection', fraud_detection_metagraph)
# You can also provide valid Python-Arango AQL query options to the command above, like so:
# dgl_g = fraud_adbdgl_adapter.arangodb_to_dgl('FraudDetection', fraud_detection_metagraph, ttl=1000, stream=True)
# See more here: https://docs.python-arango.com/en/main/specs.html#arango.aql.AQL.execute
# Show graph data
print('\n--------------')
print(dgl_g)
print('\n--------------')
print(dgl_g.ndata)
print('--------------\n')
print(dgl_g.edata)
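The string-to-number rules used by the custom controller can be tried out without a database or DGL. The sketch below applies the same rules to a few sample values. The function to_feature is a hypothetical stand-in for the controller method, and the sample values are made up for illustration rather than taken from the fraud detection dataset.

```python
def to_feature(col, key, val):
    """Hypothetical stand-in for the controller's attribute-to-feature rules."""
    try:
        if col == "transaction" and key == "transaction_date":
            return int(str(val).replace("-", ""))  # "2019-01-01" becomes 20190101
        if col == "transaction" and key == "trans_time":
            return int(str(val).replace(":", ""))  # "13:45:30" becomes 134530
        if col == "customer" and key == "Sex":
            return {"M": 0, "F": 1}.get(val, -1)  # unknown categories map to -1
    except (ValueError, TypeError):
        return 0  # unparseable strings fall back to 0
    return val  # numerical values pass through unchanged


# Made-up sample values, for illustration only
samples = [
    ("transaction", "transaction_date", "2019-01-01"),
    ("transaction", "trans_time", "13:45:30"),
    ("transaction", "trans_time", "00:05:09"),
    ("transaction", "transaction_date", "n/a"),
    ("customer", "Sex", "F"),
    ("customer", "Sex", "X"),
    ("account", "rank", 0.5),
]
results = [to_feature(col, key, val) for col, key, val in samples]
print(results)
```

One consequence of converting through int is that leading zeros are dropped, so a time such as "00:05:09" becomes 509. That is fine for a feature tensor but means the original string cannot always be recovered from the number alone.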
DGL to ArangoDB¶
Karate Graph¶
Data source
- DGL KarateClubDataset
Package methods used
- adbdgl_adapter.dgl_to_arangodb()
Important notes
- The name parameter in this case is simply for naming your ArangoDB graph.
# Create the DGL graph & draw it
dgl_karate_graph = KarateClubDataset()[0]
nx.draw(dgl_karate_graph.to_networkx(), with_labels=True)
name = "Karate"
# Delete the graph if it already exists
db.delete_graph(name, drop_collections=True, ignore_missing=True)
# Create the ArangoDB graph
adb_karate_graph = adbdgl_adapter.dgl_to_arangodb(name, dgl_karate_graph)
print('\n--------------------')
print("URL: " + con["url"])
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print(f"View the created graph here: {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{name}\n")
print(f"View the original graph below:\n")
MiniGCDataset Graphs¶
Data source
- DGL MiniGCDataset
Package methods used
- adbdgl_adapter.dgl_to_arangodb()
Important notes
- The name parameters in this case are simply for naming your ArangoDB graphs.
# Load the dgl graphs & draw:
## 1) Lollipop Graph
dgl_lollipop_graph = remove_self_loop(MiniGCDataset(8, 7, 8)[3][0])
plt.figure(1)
nx.draw(dgl_lollipop_graph.to_networkx(), with_labels=True)
## 2) Hypercube Graph
dgl_hypercube_graph = remove_self_loop(MiniGCDataset(8, 8, 9)[4][0])
plt.figure(2)
nx.draw(dgl_hypercube_graph.to_networkx(), with_labels=True)
## 3) Clique Graph
dgl_clique_graph = remove_self_loop(MiniGCDataset(8, 6, 7)[6][0])
plt.figure(3)
nx.draw(dgl_clique_graph.to_networkx(), with_labels=True)
lollipop = "Lollipop"
hypercube = "Hypercube"
clique = "Clique"
# Delete the graphs from ArangoDB if they already exist
db.delete_graph(lollipop, drop_collections=True, ignore_missing=True)
db.delete_graph(hypercube, drop_collections=True, ignore_missing=True)
db.delete_graph(clique, drop_collections=True, ignore_missing=True)
# Create the ArangoDB graphs
adb_lollipop_graph = adbdgl_adapter.dgl_to_arangodb(lollipop, dgl_lollipop_graph)
adb_hypercube_graph = adbdgl_adapter.dgl_to_arangodb(hypercube, dgl_hypercube_graph)
adb_clique_graph = adbdgl_adapter.dgl_to_arangodb(clique, dgl_clique_graph)
print('\n--------------------')
print("URL: " + con["url"])
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print("View the created graphs here:\n")
print(f"1) {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{lollipop}")
print(f"2) {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{hypercube}")
print(f"3) {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{clique}\n")
print(f"View the original graphs below:\n")
MiniGCDataset Graphs with attributes¶
Data source
- DGL MiniGCDataset
Package methods used
- adbdgl_adapter.dgl_to_arangodb()
Important notes
- The name parameters in this case are simply for naming your ArangoDB graphs.
- We are creating a custom ADBDGL_Controller to specify how to convert our DGL node/edge features into ArangoDB vertex/edge attributes. The default ADBDGL_Controller can be viewed in the dgl-adapter repository (https://github.com/arangoml/dgl-adapter).
# Load the dgl graphs
dgl_lollipop_graph = remove_self_loop(MiniGCDataset(8, 7, 8)[3][0])
dgl_hypercube_graph = remove_self_loop(MiniGCDataset(8, 8, 9)[4][0])
dgl_clique_graph = remove_self_loop(MiniGCDataset(8, 6, 7)[6][0])
# Add DGL Node & Edge Features to each graph
dgl_lollipop_graph.ndata["random_ndata"] = torch.tensor(
[[i, i, i] for i in range(0, dgl_lollipop_graph.num_nodes())]
)
dgl_lollipop_graph.edata["random_edata"] = torch.rand(dgl_lollipop_graph.num_edges())
dgl_hypercube_graph.ndata["random_ndata"] = torch.rand(dgl_hypercube_graph.num_nodes())
dgl_hypercube_graph.edata["random_edata"] = torch.tensor(
[[[i], [i], [i]] for i in range(0, dgl_hypercube_graph.num_edges())]
)
dgl_clique_graph.ndata['clique_ndata'] = torch.tensor([1,2,3,4,5,6])
dgl_clique_graph.edata['clique_edata'] = torch.tensor(
[1 if i % 2 == 0 else 0 for i in range(0, dgl_clique_graph.num_edges())]
)
# A user-defined Controller class is OPTIONAL when converting DGL features
# to ArangoDB attributes. NOTE: A custom Controller is NOT needed if you want to
# keep the numerical-based values of your DGL features.
class Clique_ADBDGL_Controller(ADBDGL_Controller):
    """ArangoDB-DGL controller.
    Responsible for controlling how ArangoDB attributes
    are converted into DGL features, and vice-versa.
    You can derive your own custom ADBDGL_Controller if you want to maintain
    consistency between your ArangoDB attributes & your DGL features.
    """
    def _dgl_feature_to_adb_attribute(self, key: str, col: str, val: Tensor):
        """
        Given a DGL feature key, its assigned value (for an arbitrary node or edge),
        and the collection it belongs to, convert it to a valid ArangoDB attribute
        (e.g string, list, number, ...).
        NOTE: No action is needed here if you want to keep the numerical-based values
        of your DGL features.
        :param key: The DGL attribute key name
        :type key: str
        :param col: The ArangoDB collection of the (soon-to-be) ArangoDB document.
        :type col: str
        :param val: The assigned attribute value of the DGL node.
        :type val: Tensor
        :return: The feature's representation as an ArangoDB Attribute
        :rtype: Any
        """
        if key == "clique_ndata":
            try:
                # Index by the feature value (1 to 6), not by the key name
                return ["Eins", "Zwei", "Drei", "Vier", "Fünf", "Sechs"][int(val) - 1]
            except (IndexError, ValueError, TypeError, RuntimeError):
                return -1
        if key == "clique_edata":
            return bool(val)
        return super()._dgl_feature_to_adb_attribute(key, col, val)
# Re-instantiate a new adapter specifically for the Clique Graph Conversion
clique_adbgl_adapter = ADBDGL_Adapter(db, Clique_ADBDGL_Controller())
# Create the ArangoDB graphs
lollipop = "Lollipop_With_Attributes"
hypercube = "Hypercube_With_Attributes"
clique = "Clique_With_Attributes"
db.delete_graph(lollipop, drop_collections=True, ignore_missing=True)
db.delete_graph(hypercube, drop_collections=True, ignore_missing=True)
db.delete_graph(clique, drop_collections=True, ignore_missing=True)
adb_lollipop_graph = adbdgl_adapter.dgl_to_arangodb(lollipop, dgl_lollipop_graph)
adb_hypercube_graph = adbdgl_adapter.dgl_to_arangodb(hypercube, dgl_hypercube_graph)
adb_clique_graph = clique_adbgl_adapter.dgl_to_arangodb(clique, dgl_clique_graph) # Notice the new adapter here!
print('\n--------------------')
print("URL: " + con["url"])
print("Username: " + con["username"])
print("Password: " + con["password"])
print("Database: " + con["dbName"])
print('--------------------\n')
print("View the created graphs here:\n")
print(f"1) {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{lollipop}")
print(f"2) {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{hypercube}")
print(f"3) {con['url']}/_db/{con['dbName']}/_admin/aardvark/index.html#graph/{clique}\n")
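The conversion that the Clique controller is meant to perform can also be sketched on plain Python values, without tensors or a database. This assumes that the feature value (1 to 6), rather than the feature key, selects the German number word. The function to_attribute is a hypothetical stand-in for the controller method.

```python
NAMES = ["Eins", "Zwei", "Drei", "Vier", "Fünf", "Sechs"]


def to_attribute(key, val):
    """Hypothetical stand-in for the controller's feature-to-attribute rules."""
    if key == "clique_ndata":
        index = int(val) - 1
        # Guard both ends so that 0 does not wrap around to the last name
        return NAMES[index] if 0 <= index < len(NAMES) else -1
    if key == "clique_edata":
        return bool(val)  # 1 becomes True, 0 becomes False
    return val  # any other feature keeps its numerical value


node_attributes = [to_attribute("clique_ndata", v) for v in [1, 2, 3, 4, 5, 6]]
edge_attributes = [to_attribute("clique_edata", v) for v in [1, 0, 1, 0]]
print(node_attributes)
print(edge_attributes)
```

With this mapping, each vertex document of the Clique graph would carry a string attribute, and each edge document a boolean, while features handled by the default path stay numerical.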