Set up Prometheus & Grafana for Datacenter-to-Datacenter Replication

ArangoSync exposes metrics in a format that Prometheus can scrape. Using them is optional. ArangoDB also provides a standard set of dashboards for viewing those metrics in Grafana.
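Before wiring a sync master or sync worker into Prometheus, it can help to request its metrics by hand and confirm that the endpoint and token work. The Python sketch below is a minimal example under stated assumptions, not an ArangoSync tool: it assumes the metrics are served at /metrics (the path Prometheus requests when, as in the configuration on this page, no metrics_path is set), on port 8629 for sync masters or 8729 for sync workers, and that the token is the value you use for ${MONITORINGTOKEN}. The function names build_metrics_request and fetch_metrics are hypothetical.

```python
import ssl
import urllib.request


def build_metrics_request(host: str, port: int, token: str) -> urllib.request.Request:
    """Build a request similar to the one Prometheus sends to a sync master or worker.

    Assumption: metrics are served at /metrics, Prometheus' default metrics_path.
    The bearer_token setting becomes an "Authorization: Bearer ..." header.
    """
    url = f"https://{host}:{port}/metrics"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_metrics(host: str, port: int, token: str, timeout: float = 10.0) -> str:
    """Fetch the raw metrics text that Prometheus would scrape."""
    # Mirrors insecure_skip_verify: true, so the server certificate is NOT verified.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    req = build_metrics_request(host, port, token)
    with urllib.request.urlopen(req, context=ctx, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, fetch_metrics("10.0.0.1", 8629, token), with a hypothetical sync master address, returns the plain-text metrics if the host, port and token are correct, and raises an error otherwise. Like the Prometheus configuration, it skips certificate checks, so only use it against hosts you trust.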

If you want to use these tools, refer to their websites for deployment instructions.

After deployment, configure Prometheus with a configuration file that tells it which targets to scrape. For ArangoSync, configure scrape targets for all sync masters and all sync workers, using a configuration such as the following:

```yaml
# Replace all ${...} placeholders with values from your environment.
global:
  scrape_interval:     10s # scrape targets every 10 seconds.

scrape_configs:
  # Scrape sync masters
  - job_name: 'sync_master'
    scheme: 'https'
    bearer_token: "${MONITORINGTOKEN}"
    tls_config:
      insecure_skip_verify: true
    static_configs:
      - targets:
        - "${IPMASTERA1}:8629"
        - "${IPMASTERA2}:8629"
        - "${IPMASTERB1}:8629"
        - "${IPMASTERB2}:8629"
        labels:
          type: "master"
    # Derive a dc label (A or B) and an instance label (1 or 2) from the target address.
    relabel_configs:
      - source_labels: [__address__]
        regex:         ${IPMASTERA1}\:8629|${IPMASTERA2}\:8629
        target_label:  dc
        replacement:   A
      - source_labels: [__address__]
        regex:         ${IPMASTERB1}\:8629|${IPMASTERB2}\:8629
        target_label:  dc
        replacement:   B
      - source_labels: [__address__]
        regex:         ${IPMASTERA1}\:8629|${IPMASTERB1}\:8629
        target_label:  instance
        replacement:   1
      - source_labels: [__address__]
        regex:         ${IPMASTERA2}\:8629|${IPMASTERB2}\:8629
        target_label:  instance
        replacement:   2

  # Scrape sync workers
  - job_name: 'sync_worker'
    scheme: 'https'
    bearer_token: "${MONITORINGTOKEN}"
    tls_config:
      insecure_skip_verify: true
    static_configs:
      - targets:
        - "${IPWORKERA1}:8729"
        - "${IPWORKERA2}:8729"
        - "${IPWORKERB1}:8729"
        - "${IPWORKERB2}:8729"
        labels:
          type: "worker"
    # Derive a dc label (A or B) and an instance label (1 or 2) from the target address.
    relabel_configs:
      - source_labels: [__address__]
        regex:         ${IPWORKERA1}\:8729|${IPWORKERA2}\:8729
        target_label:  dc
        replacement:   A
      - source_labels: [__address__]
        regex:         ${IPWORKERB1}\:8729|${IPWORKERB2}\:8729
        target_label:  dc
        replacement:   B
      - source_labels: [__address__]
        regex:         ${IPWORKERA1}\:8729|${IPWORKERB1}\:8729
        target_label:  instance
        replacement:   1
      - source_labels: [__address__]
        regex:         ${IPWORKERA2}\:8729|${IPWORKERB2}\:8729
        target_label:  instance
        replacement:   2
```

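The example above assumes two datacenters (A and B), each with two sync masters and two sync workers. The relabel_configs sections use each target's address to set a dc label (A or B) and an instance label (1 or 2). Replace all ${...} variables in the configuration with the appropriate values from your environment.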
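One way to fill in the ${...} variables is to keep the configuration as a template file and render it before starting Prometheus. The Python sketch below assumes exactly that workflow, with the values taken from environment variables of the same names (IPMASTERA1, MONITORINGTOKEN, and so on). It relies on string.Template from the Python standard library, which uses the same ${NAME} placeholder syntax; render_config, render_file and the file names are hypothetical.

```python
import os
import string
from collections.abc import Mapping


def render_config(template_text: str, values: Mapping[str, str]) -> str:
    """Replace every ${NAME} placeholder in the template with values[NAME].

    substitute() raises KeyError if a placeholder has no value, so a
    forgotten variable fails loudly instead of ending up in prometheus.yml.
    Text such as "\\:8629" in the regex lines is copied through unchanged.
    """
    return string.Template(template_text).substitute(values)


def render_file(template_path: str, output_path: str,
                values: Mapping[str, str] = os.environ) -> None:
    """Render a template file (e.g. a hypothetical prometheus.yml.tmpl) to output_path.

    Assumption: the values come from environment variables by default.
    """
    with open(template_path, encoding="utf-8") as src:
        rendered = render_config(src.read(), values)
    with open(output_path, "w", encoding="utf-8") as dst:
        dst.write(rendered)
```

For example, after exporting all the variables, render_file("prometheus.yml.tmpl", "prometheus.yml") writes a configuration that Prometheus can load. Using substitute() rather than safe_substitute() is deliberate: a missing variable stops the rendering instead of leaving a literal ${...} in the file.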

Prometheus can be memory- and CPU-intensive. It is recommended to run it on machines other than those that run the ArangoDB cluster or the ArangoSync components.

You can treat these machines as easily replaceable, unless you configure alerting in Prometheus. In that case, monitor them closely so that you do not lose any alerts when Prometheus fails.