Commit 6746d9e7 authored by Jan David Mol

Describe the Prometheus content, and how to use Grafana

parent bba08be1
The Operations Central Management module is set up to monitor the LOFAR telescope
   :caption: Contents:

   installation
   intro
   monitoring
   stack
   prometheus
   grafana
Introduction
===================================
The Operations Central Monitoring setup provides you with the following user services:
* A *Grafana* monitoring & alerting system, exposed on http://localhost:3001 (credentials: admin/admin),
* An *Alerta* alarm-management system, exposed on http://localhost:8081 (credentials: admin/alerta).
The setup also includes the following backing services:
Monitoring
===================================
The Grafana instance exposed on http://localhost:3001 visualises the monitoring information collected by Prometheus (and other sources). Its contents are organised as follows (with links to the relevant Grafana documentation):

* A series of `dashboards <https://grafana.com/docs/grafana/latest/dashboards/>`_, organised into *folders*. Each dashboard is an independent page of visualisations. After logging in, you will see the configured "Home" dashboard.
* Each dashboard has a series of `panels <https://grafana.com/docs/grafana/latest/panels/>`_, often organised into collapsible *rows*. Each panel contains a specific visualisation and can have alerts configured on it. Panels are tiled across the dashboard.
* Each panel has a set of *queries*, which describe the data to be visualised, and a single *visualization*, which determines how that data is displayed.
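To make the dashboard, panel and query hierarchy concrete, the sketch below walks a minimal, hand-written subset of Grafana's dashboard JSON model (the document shown under *Dashboard settings* > *JSON Model*). The dashboard title, panel titles and queries are hypothetical; only the ``panels``, ``type``, ``title``, ``collapsed``, ``targets``, ``refId`` and ``expr`` keys follow Grafana's JSON model, and a real dashboard carries many more fields.

```python
from typing import Iterator

# A hypothetical, heavily trimmed dashboard in the shape of Grafana's JSON model.
dashboard = {
    "title": "Station overview",
    "panels": [
        # A collapsed row keeps its child panels in its own "panels" list.
        {
            "type": "row",
            "title": "SDP",
            "collapsed": True,
            "panels": [
                {
                    "type": "timeseries",
                    "title": "FPGA temperatures",
                    "targets": [
                        {"refId": "A", "expr": 'device_attribute{device="stat/sdp/1", name="FPGA_temp_R"}'},
                    ],
                },
            ],
        },
        # An expanded row has an empty "panels" list; the panels below it follow at top level.
        {"type": "row", "title": "Scraping", "collapsed": False, "panels": []},
        {
            "type": "stat",
            "title": "Scrape duration",
            "targets": [{"refId": "A", "expr": 'device_scraping{job="stations"}'}],
        },
    ],
}


def iter_queries(panels: list) -> Iterator[tuple[str, str, str]]:
    """Yield (panel title, refId, PromQL expression) for every query, including inside collapsed rows."""
    for panel in panels:
        if panel.get("type") == "row":
            # Rows only group panels; recurse into the children of collapsed rows.
            yield from iter_queries(panel.get("panels", []))
            continue
        for target in panel.get("targets", []):
            yield panel["title"], target["refId"], target.get("expr", "")


for title, ref_id, expr in iter_queries(dashboard["panels"]):
    print(f"{title} [{ref_id}]: {expr}")
```

Each panel may hold several queries (``refId`` A, B, ...), but exactly one visualisation type (``type``).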
For general use of Grafana, refer to the Grafana documentation. Also be sure to check out the `webinars and videos <https://grafana.com/videos/>`_ provided by Grafana.
Writing Queries
------------------------------------
Most of the data will be queried from the *Prometheus* backend:

* Grafana provides a `Prometheus query editor <https://grafana.com/docs/grafana/latest/datasources/prometheus/#prometheus-query-editor>`_ to set up queries interactively.
* The queries themselves use the `PromQL <https://prometheus.io/docs/prometheus/latest/querying/basics/>`_ syntax.
* Apart from configuring panels, you can also experiment with queries in the Explore tab (http://localhost:3001/explore), and directly in the Prometheus backend (http://localhost:9091).
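Besides the web interfaces, the Prometheus backend can also be queried from a script through Prometheus' HTTP API. A minimal sketch using only the Python standard library, assuming the backend is reachable on http://localhost:9091 and serves the standard ``/api/v1/query`` endpoint for instant queries; the helper names ``instant_query_url``, ``parse_vector`` and ``instant_query`` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9091"  # assumed address of the Prometheus backend


def instant_query_url(expr: str, base: str = PROMETHEUS_URL) -> str:
    """Build the URL for an instant query against Prometheus' /api/v1/query endpoint."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_vector(response: dict) -> list[tuple[dict, float]]:
    """Turn an instant-query JSON response into (labels, value) pairs."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    # Each result carries its labels in "metric" and a [timestamp, "value"] pair in "value".
    return [(r["metric"], float(r["value"][1])) for r in response["data"]["result"]]


def instant_query(expr: str) -> list[tuple[dict, float]]:
    """Run an instant query against a live backend (network access required)."""
    with urllib.request.urlopen(instant_query_url(expr), timeout=10) as resp:
        return parse_vector(json.load(resp))
```

For example, ``instant_query('device_attribute{name="FPGA_temp_R"}')`` would return one ``(labels, value)`` pair per array element and station. Note that Prometheus returns sample values as strings, hence the ``float()`` conversion.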
The Prometheus database is flat: it stores time series, each identified by a metric name and a set of labels, and each carrying a float value::

   metric_name{label="value", ...} value

For example::

   device_attribute{host="dop496", station="DTS Outside", device="stat/sdp/1", name="FPGA_temp_R", x="03", y="00"} 42.3
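This notation closely follows Prometheus' text exposition format. To illustrate how the name, labels and value fit together, here is a small parser for single lines in that notation. It is a simplified sketch (no timestamps, comments, or escaped quotes inside label values), and ``parse_sample`` is a hypothetical helper, not part of any Prometheus library.

```python
import re

# Metric name, an optional {label="value", ...} block, then the sample value.
SAMPLE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$')
# One label="value" pair; values may contain spaces and commas, but no double quotes in this sketch.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line: str) -> tuple[str, dict[str, str], float]:
    """Split a single 'name{labels} value' line into its three parts."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


name, labels, value = parse_sample(
    'device_attribute{host="dop496", station="DTS Outside", device="stat/sdp/1", '
    'name="FPGA_temp_R", x="03", y="00"} 42.3'
)
print(name, labels["station"], labels["x"], value)  # device_attribute DTS Outside 03 42.3
```

Note that ``name`` inside the braces is an ordinary label (the Tango attribute name), distinct from the metric name ``device_attribute``.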
A query selects the entries with a given metric name, filtered by the given label values. For example, the following query returns all FPGA temperatures reported by the ``stat/sdp/1`` device across all stations, including the entry above::

   device_attribute{device="stat/sdp/1", name="FPGA_temp_R"}
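Conceptually, a selector with only equality matchers (``label="value"``) keeps every series whose metric name matches and whose labels carry exactly those values; labels not mentioned in the selector, such as ``host`` or ``x``, are left free. The sketch below mimics that behaviour over an in-memory list. The sample series are made up, and the ``select`` helper is hypothetical; real PromQL also supports ``!=``, ``=~`` and ``!~`` matchers, which are not modelled here.

```python
Series = tuple[str, dict[str, str], float]

# Hypothetical samples from two stations; only the first two are FPGA temperatures.
series: list[Series] = [
    ("device_attribute", {"host": "dop496", "device": "stat/sdp/1", "name": "FPGA_temp_R", "x": "03", "y": "00"}, 42.3),
    ("device_attribute", {"host": "dts-lcu", "device": "stat/sdp/1", "name": "FPGA_temp_R", "x": "00", "y": "00"}, 39.8),
    ("device_attribute", {"host": "dts-lcu", "device": "stat/recv/1", "name": "ANT_mask_RW", "x": "00", "y": "00"}, 1.0),
]


def select(samples: list[Series], metric: str, **matchers: str) -> list[Series]:
    """Keep samples with the given metric name whose labels equal all given matchers."""
    return [
        (name, labels, value)
        for name, labels, value in samples
        if name == metric and all(labels.get(key) == wanted for key, wanted in matchers.items())
    ]


# Equivalent of: device_attribute{device="stat/sdp/1", name="FPGA_temp_R"}
for _, labels, value in select(series, "device_attribute", device="stat/sdp/1", name="FPGA_temp_R"):
    print(labels["host"], labels["x"], value)
```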
Values of different metrics can also be combined (added, merged, etc.). See the PromQL documentation for details.
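For example, the PromQL aggregation ``max by (host) (device_attribute{name="FPGA_temp_R"})`` reduces all FPGA temperature series to the hottest FPGA per station host. The sketch below computes the same grouping in plain Python over made-up ``(labels, value)`` pairs, to show what the ``by (host)`` clause does; the ``max_by`` helper is hypothetical and only models ``max``, not PromQL's other aggregation operators.

```python
from collections import defaultdict

# Hypothetical FPGA_temp_R samples as (labels, value) pairs.
samples = [
    ({"host": "dop496", "x": "00"}, 41.0),
    ({"host": "dop496", "x": "03"}, 42.3),
    ({"host": "dts-lcu", "x": "00"}, 39.8),
]


def max_by(pairs: list[tuple[dict, float]], label: str) -> dict[str, float]:
    """Mimic PromQL 'max by (label) (...)': one output value per distinct label value."""
    groups: dict[str, list[float]] = defaultdict(list)
    for labels, value in pairs:
        # All other labels (x, y, ...) are dropped, as "by" keeps only the listed ones.
        groups[labels.get(label, "")].append(value)
    return {key: max(values) for key, values in groups.items()}


print(max_by(samples, "host"))  # {'dop496': 42.3, 'dts-lcu': 39.8}
```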
Querying LOFAR Station Control
````````````````````````````````````
The `LOFAR Station Control <https://lofar20-station-control.readthedocs.io/en/latest/>`_ software exposes the following metrics from each station:

:device_attribute: All Tango monitoring points that are configured to be exposed to Prometheus. For arrays, each element is exported as a separate time series. It carries the following labels:

   :job: ``stations``
   :host: Station hostname from which the value was obtained (e.g. ``dts-lcu``),
   :station: Name of the station, as reported by the station itself (e.g. ``DTS``) (NB: for now, the ``host`` label is more reliable),
   :device: Tango device of this attribute (e.g. ``stat/recv/1``),
   :name: Tango attribute name (e.g. ``ANT_mask_RW``),
   :type: Data type (e.g. ``string``, ``float``, ``bool``),
   :x: Offset in the first dimension if the attribute is a 1D or 2D array, otherwise ``00``,
   :y: Offset in the second dimension if the attribute is a 2D array, otherwise ``00``,
   :idx: Global offset in the array, combining ``x`` and ``y``,
   :str_value: The value of the attribute, if the attribute type is a string.

:device_scraping: Time required to scrape each Tango device, in seconds. It carries the following labels:

   :job: ``stations``
   :host: Station hostname from which the value was obtained (e.g. ``dts-lcu``),
   :station: Name of the station, as reported by the station itself (e.g. ``DTS``) (NB: for now, the ``host`` label is more reliable),
   :device: Tango device scraped.
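Because each array element arrives as its own ``device_attribute`` series, a client that wants the whole array back has to reassemble it from the ``x`` and ``y`` labels. A minimal sketch, assuming those labels are zero-padded decimal indices as in the ``x="03"`` example above; it also picks the slowest-scraped device from ``device_scraping`` samples. The helpers ``to_array`` and ``slowest_device`` and all sample data are hypothetical.

```python
from typing import Optional

# Hypothetical samples: a 2x2 slice of FPGA_temp_R as (labels, value) pairs.
elements = [
    ({"x": "00", "y": "00"}, 40.1),
    ({"x": "00", "y": "01"}, 40.7),
    ({"x": "01", "y": "00"}, 41.5),
    ({"x": "01", "y": "01"}, 42.0),
]


def to_array(pairs: list[tuple[dict, float]]) -> list[list[Optional[float]]]:
    """Rebuild a 2D array (rows indexed by x, columns by y) from per-element series."""
    indexed = {(int(labels["x"]), int(labels["y"])): value for labels, value in pairs}
    rows = 1 + max(x for x, _ in indexed)
    cols = 1 + max(y for _, y in indexed)
    # Elements missing from the scrape are left as None.
    return [[indexed.get((x, y)) for y in range(cols)] for x in range(rows)]


def slowest_device(scrapes: list[tuple[dict, float]]) -> tuple[str, float]:
    """Return the (device, seconds) pair with the longest device_scraping time."""
    labels, seconds = max(scrapes, key=lambda pair: pair[1])
    return labels["device"], seconds


print(to_array(elements))  # [[40.1, 40.7], [41.5, 42.0]]
print(slowest_device([({"device": "stat/sdp/1"}, 0.8), ({"device": "stat/recv/1"}, 1.9)]))
```

For 1D attributes, ``y`` is always ``00``, so ``to_array`` yields a single column.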
Metrics from the non-Tango services are exposed as well. See the linked documentation, or use the interactive interfaces, to explore them further:

:scrape\_\*: Metrics describing scraping (i.e. Prometheus periodically requesting the metrics), see https://prometheus.io/docs/concepts/jobs_instances/.

   :job: ``stations``
   :host: Station hostname from which the value was obtained (e.g. ``dts-lcu``),
   :exported_job: Original job on the station (``host``, ``prometheus``, ``grafana``).

:node\_\*: Metrics describing the server, see https://github.com/prometheus/node_exporter.

   :job: ``stations``
   :host: Station hostname from which the value was obtained (e.g. ``dts-lcu``),
   :exported_job: ``host``

:go\_\*, grafana\_\*: Metrics from Grafana, see https://grafana.com/docs/grafana/latest/administration/view-server/internal-metrics/ and https://grafana.com/docs/grafana/latest/alerting/unified-alerting/fundamentals/evaluate-grafana-alerts/.

   :job: ``stations``
   :host: Station hostname from which the value was obtained (e.g. ``dts-lcu``),
   :exported_job: ``grafana``
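Since all station metrics share ``job="stations"``, the ``host`` and ``exported_job`` labels are what tell them apart. When such queries are built from code (for example in a script looping over stations), label values should be quoted and escaped properly. A hypothetical helper, ``promql_selector``, supporting only equality matchers; ``node_load1`` is one of the node_exporter metrics.

```python
def promql_selector(metric: str, **labels: str) -> str:
    """Build a PromQL selector with equality matchers, e.g. name{a="1", b="2"}."""

    def quote(value: str) -> str:
        # Escape backslashes first, then double quotes and newlines, as in PromQL string literals.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'

    matchers = ", ".join(f"{key}={quote(value)}" for key, value in labels.items())
    return f"{metric}{{{matchers}}}"


# Node metrics of one station, using the labels listed above:
print(promql_selector("node_load1", job="stations", host="dts-lcu", exported_job="host"))
# node_load1{job="stations", host="dts-lcu", exported_job="host"}
```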
Querying Operations Central Management
````````````````````````````````````````
This software stack also exposes metrics from its own services:

:scrape\_\*: Metrics describing scraping (i.e. Prometheus periodically requesting the metrics), see https://prometheus.io/docs/concepts/jobs_instances/.

   :job: ``prometheus``

:node\_\*: Metrics describing the server, see https://github.com/prometheus/node_exporter.

   :job: ``host``

:go\_\*, grafana\_\*: Metrics from Grafana, see https://grafana.com/docs/grafana/latest/administration/view-server/internal-metrics/ and https://grafana.com/docs/grafana/latest/alerting/unified-alerting/fundamentals/evaluate-grafana-alerts/.

   :job: ``grafana``
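The same metric names (e.g. ``node_*``) can thus appear twice: once per station, with ``job="stations"`` and the station's original job in ``exported_job``, and once for this stack itself, with ``job`` set to ``prometheus``, ``host`` or ``grafana``. A small sketch of telling the two apart from a series' labels; the ``origin`` helper and its output format are hypothetical.

```python
LOCAL_JOBS = {"prometheus", "host", "grafana"}  # jobs of this stack, per the list above


def origin(labels: dict[str, str]) -> str:
    """Describe where a scraped series comes from, based on its job labels."""
    job = labels.get("job", "")
    if job == "stations":
        # Station series: the job they had on the station is kept in exported_job.
        return f"station {labels.get('host', '?')} ({labels.get('exported_job', '?')})"
    if job in LOCAL_JOBS:
        return f"central ({job})"
    return f"unknown ({job or 'no job'})"


print(origin({"job": "stations", "host": "dts-lcu", "exported_job": "host"}))  # station dts-lcu (host)
print(origin({"job": "grafana"}))  # central (grafana)
```

In PromQL itself the same distinction is just a label matcher, e.g. ``node_load1{job="host"}`` for this stack's own server.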