Introduction
=====================================

The Operations Central Monitoring setup collects monitoring information from across the instrument and provides monitoring dashboards and an alarm-management system on top of it. It offers the following user services:

* A *Grafana* monitoring & alerting system, exposed on http://localhost:3001 (credentials: admin/admin),
* An *Alerta* alarm-management system implementing the `ISA 18.2 <http://www.tc.faa.gov/its/worldpac/Standards/isa/ISA_18.2[1].pdf>`_ alarm model, exposed on http://localhost:8081 (credentials: admin/alerta).

It also runs the following backing services to support the setup:

* A *Prometheus* database that collects monitoring information from across the instrument, exposed on http://localhost:9091,
* A *Node Exporter* service that provides monitoring information about the host running this software stack, exposed on http://localhost:9100.

.. hint:: The URLs assume you are running this software on localhost. If you access it on a server, replace ``localhost`` with the hostname of the hosting system.
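
As a minimal sketch of the hostname substitution described above, the following Python snippet builds the service URLs for a given host and probes whether anything answers on them. The ports are the ones listed above; the names ``SERVICE_PORTS``, ``service_urls`` and ``is_reachable`` are hypothetical helpers for illustration and are not part of this software package.

.. code-block:: python

    from urllib.error import HTTPError, URLError
    from urllib.request import urlopen

    # Ports as listed in this section; the dictionary itself is illustrative.
    SERVICE_PORTS = {
        "grafana": 3001,
        "alerta": 8081,
        "prometheus": 9091,
        "node_exporter": 9100,
    }


    def service_urls(host="localhost"):
        """Return the base URL of each service on the given host."""
        return {name: f"http://{host}:{port}" for name, port in SERVICE_PORTS.items()}


    def is_reachable(url, timeout=2.0):
        """Return True if an HTTP server answers on the URL (hypothetical helper)."""
        try:
            with urlopen(url, timeout=timeout):
                return True
        except HTTPError:
            # The server answered, even though it returned an error status.
            return True
        except (URLError, OSError):
            # Connection refused, name resolution failure, timeout, and so on.
            return False

For example, ``service_urls("monitor.example.org")`` (a placeholder hostname) yields the same four URLs with ``localhost`` replaced, and ``is_reachable`` can then be called on each of them.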

The services are connected as follows. The green components are part of this software package, the gray components are external:

.. graphviz::

    digraph monitoring_setup {
        layout=dot;
        nodesep=1.2;

        fontname="Helvetica,Arial,sans-serif"
        node [fontname="Helvetica,Arial,sans-serif" fontsize="20pt" style=filled fixedsize=true]
        edge [fontname="Helvetica,Arial,sans-serif" fontsize="20pt"]
        rankdir=TB;

        node [shape=ellipse height=1 width=2 color=gray];
        slack;

        node [shape=rectangle width=1 color=gray];
        user;

        subgraph cluster_operational_central_management {
            color=black;
            label="Operational Central Management";

            node [shape=ellipse height=1 width=2 color=aquamarine];
            prometheus; grafana; alerta; node_exporter;

            prometheus -> grafana [label="query results"];
            grafana -> alerta [label="alerts"];
            node_exporter -> prometheus [label="metrics"];
            grafana -> prometheus [label="metrics"];
            prometheus -> prometheus [label="metrics"];
        }

        subgraph cluster_station {
            label="LOFAR2.0 Station";
            node [shape=ellipse height=1 width=2 color=gray];
            station_prometheus [label="prometheus"];
            station_grafana [label="grafana"];
            station_node_exporter [label="node_exporter"];
            hardware;
            tango_devices;
            exporter;
            jupyter;

            station_node_exporter -> station_prometheus [label="metrics"];
            station_prometheus -> station_grafana [label="query results"];
            station_grafana -> station_prometheus [label="metrics"];
            hardware -> tango_devices [label="M&C"];
            tango_devices -> exporter [label="metrics"];
            exporter -> station_prometheus [label="metrics"];
            station_prometheus -> jupyter [label="metrics"];
            tango_devices -> jupyter [label="M&C"];
        }

        station_prometheus -> prometheus [label="metrics" minlen=1];
        station_grafana -> user [label="dashboards"];
        jupyter -> user [label="M&C"];
        alerta -> slack [label="notifications"];

        grafana -> user [label="dashboards"];
        alerta -> user [label="notifications"];
        slack -> user [label="notifications"];
    }
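
To see which of the metric sources in the graph the central Prometheus currently scrapes successfully, one option is to ask it for the built-in ``up`` metric through the Prometheus HTTP API. The sketch below assumes the central Prometheus is reachable on http://localhost:9091 as described above; the function names are hypothetical, and the job and instance labels that come back depend on the scrape configuration of your deployment.

.. code-block:: python

    import json
    from urllib.parse import urlencode
    from urllib.request import urlopen


    def instant_query_url(base_url, promql):
        """Build the URL of a Prometheus instant query (``/api/v1/query``)."""
        return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


    def targets_up(payload):
        """Map each (job, instance) pair in an ``up`` query result to True or False."""
        if payload.get("status") != "success":
            raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
        return {
            # Sample values arrive as [timestamp, "value-as-string"].
            (s["metric"].get("job", ""), s["metric"].get("instance", "")): s["value"][1] == "1"
            for s in payload["data"]["result"]
        }


    def fetch_targets_up(base_url="http://localhost:9091", timeout=5.0):
        """Query the assumed central Prometheus and report which targets are up."""
        with urlopen(instant_query_url(base_url, "up"), timeout=timeout) as response:
            return targets_up(json.load(response))

Calling ``fetch_targets_up()`` returns a dictionary keyed by ``(job, instance)``; an entry that maps to ``False`` marks a target whose last scrape failed.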