Commit be876a54 authored by Corné Lukken

Merge branch 'L2SS-936_xst_logging' into 'master'

Reduce XST logging by periodically updating Statistics parameters

Closes L2SS-936

See merge request !438
parents d99a1f79 bd0551b5
@@ -8,11 +8,16 @@ Station Control software related to Tango devices.
# Index
* [Installation](#installation)
* [Prerequisites](#prerequisites)
* [Bootstrap](#bootstrap)
* [Docker compose documentation](docker-compose/README.md)
* [Timescaledb](docker-compose/timescaledb/README.md)
* [Jupyter startup files](docker-compose/jupyter/ipython-profiles/stationcontrol-jupyter/startup/README.md)
* [Tango Prometheus exporter](https://git.astron.nl/lofar2.0/ska-tango-grafana-exporter)
* [ReadTheDocs (Sphinx / ReStructuredText) documentation](tangostationcontrol/docs/README.md)
* [Developer Documentation](#development)
* [Versioning](#versioning)
* Source code documentation
* [Attribute wrapper documentation](tangostationcontrol/tangostationcontrol/clients/README.md)
* [Archiver documentation](tangostationcontrol/tangostationcontrol/toolkit/README.md)
@@ -20,6 +25,7 @@ Station Control software related to Tango devices.
* [HDF5 statistics](tangostationcontrol/tangostationcontrol/statistics/README.md)
* [Unit tests](tangostationcontrol/tangostationcontrol/test/README.md)
* [Integration tests](tangostationcontrol/tangostationcontrol/integration_test/README.md)
* [Release Notes](#release-notes)
# Installation
@@ -41,20 +47,16 @@ You will also need:
## Bootstrap
The bootstrap procedure is needed only once. First we build all docker containers, and load the initial configuration. This may take a while:
```
cd docker-compose
make bootstrap
```
If you lack access to LOFAR station hardware, configure the devices to use their simulators instead:
```
for sim in ../CDB/*-sim-config.json; do
  ../sbin/update_ConfigDb.sh ${sim}
done
```
By default, bootstrap configures the station to use simulators. You can
look up alternative configurations in the CDB directory.
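For example, an alternative configuration could be loaded with the same `../sbin/update_ConfigDb.sh` helper shown above; the file name below is only illustrative, so pick one that actually exists in the CDB directory:

```
# from within the docker-compose directory; the file name is an example
../sbin/update_ConfigDb.sh ../CDB/LOFAR_ConfigDb.json
```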
Now we can start all containers, and make sure everything is up:
@@ -63,6 +65,43 @@ make start
make status
```
If not, you can inspect why with `docker logs <container>`. The containers will automatically be restarted on reboot or failure. Stop them explicitly to bring them down (`make stop <container>`).
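For example, to inspect and explicitly stop a single container (the container name below is purely illustrative):

```
# from within the docker-compose directory; "device-sdp" is an example name
docker logs device-sdp    # inspect why the container is not running
make stop device-sdp      # bring it down explicitly
```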
Most notably, you will have web interfaces available at:
- http://localhost:8888 (Jupyter Notebook)
- http://localhost:3000 (Grafana)
# Development
For development, you will need several dependencies, including:
```
git g++ gcc make docker docker-compose shellcheck graphviz python3-dev \
python3-pip python3-tox libboost-python-dev libtango-cpp pkg-config
```
Of these, docker-compose must be at least version 2.0, and Python must be 3.7 or higher.
Alternatively, tox can be installed through pip using `pip install tox`.
Finally, running unit tests relies on the availability of casacore data; see the
[lofar-device-base Dockerfile](docker-compose/lofar-device-base/Dockerfile)
for details.
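To quickly check whether your environment meets these minimums, assuming the tools are already installed and on your PATH:

```
docker-compose --version   # should report 2.0 or newer
python3 --version          # should report 3.7 or newer
tox --version              # only if tox was installed, e.g. via pip
```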
## Versioning
When changing behavior a new version for Lofar Station Control should be
reserved. To do this please follow [semantic versioning](https://semver.org/).
Next change the version in the following places:
1. The [VERSION](VERSION) file.
2. In [test_writer_sst.py](tangostationcontrol/tangostationcontrol/integration_test/default/statistics/test_writer_sst.py)
for the `test_header_info` test.
3. Add a [Release note](#release-notes) for the given version.
4. Once the merge request is merged to master, add a tag with the version (just x.x.x, not Vx.x.x), as shown below.
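A minimal sketch of the tagging step, assuming the usual remote name `origin` and using an example version number:

```
git checkout master && git pull   # make sure the merged state is checked out
git tag 0.1.2                     # plain x.x.x, no "V" prefix
git push origin 0.1.2             # publish the tag
```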
# Release Notes
* 0.1.2 Fix `StatisticsClient` accessing `last_invalid_packet_exception` parameter
-0.1.1
\ No newline at end of file
+0.1.2
@@ -3,7 +3,7 @@
# integration process, which may cause wedges in the gate later.
importlib-metadata<2.0.0,>=0.12;python_version<"3.8"
-lofar-station-client@git+https://git.astron.nl/lofar2.0/lofar-station-client@0.6.0
+lofar-station-client@git+https://git.astron.nl/lofar2.0/lofar-station-client@0.9.1
numpy
mock
asyncua >= 0.9.90 # LGPLv3
@@ -26,7 +26,7 @@ package_dir=
packages=find:
python_requires = >=3.7
install_requires =
-importlib-metadata>=0.12;python_version<"3.8"
+importlib-metadata>=0.12, <5.0;python_version<"3.8"
pip>=1.5
[options.packages.find]
@@ -10,6 +10,7 @@
import logging
from threading import Thread
from queue import Queue
import time
from lofar_station_client.statistics.collector import StatisticsCollector
@@ -24,6 +25,9 @@ class StatisticsConsumer(Thread, StatisticsClientThread):
# Maximum time to wait for the Thread to get unstuck, if we want to stop
DISCONNECT_TIMEOUT = 10.0
# Minimum time, in seconds, between packet exception log messages
LOGGING_TIME = 30
# No default options required, for now?
_DEFAULT_OPTIONS = {}
@@ -35,10 +39,30 @@ class StatisticsConsumer(Thread, StatisticsClientThread):
super().__init__()
self.start()
self.last_exception_time = time.time()
self.exception_counter = 0
@property
def _options(self) -> dict:
return StatisticsConsumer._DEFAULT_OPTIONS
def _exception_logging(self, err):
# get the time since we last logged a parsing exception
time_since_log = time.time() - self.last_exception_time
self.exception_counter += 1
# suppress further messages until at least LOGGING_TIME seconds have passed since the last one
if time_since_log < self.LOGGING_TIME:
return
if self.exception_counter == 1:
logger.exception("Could not parse statistics packet")
else:
logger.exception(f"Could not parse {self.exception_counter} statistics packets in the last {int(time_since_log)} seconds")
self.last_exception_time = time.time()
self.exception_counter = 0
def run(self):
logger.info("Starting statistics thread")
@@ -53,8 +77,7 @@ class StatisticsConsumer(Thread, StatisticsClientThread):
try:
self.collector.process_packet(self.last_packet)
except ValueError as e:
-logger.exception("Could not parse statistics packet")
+self._exception_logging(e)
# continue processing
logger.info("Stopped statistics thread")
@@ -67,6 +67,7 @@ class Statistics(opcua_device):
# when last packet was received
last_packet_timestamp_R = attribute_wrapper(comms_id=StatisticsClient, comms_annotation={"type": "udp", "parameter": "last_packet_timestamp"}, datatype=numpy.uint64)
# queue fill percentage, as reported by the consumer
queue_collector_fill_percentage_R = attribute_wrapper(comms_id=StatisticsClient, comms_annotation={"type": "queue", "parameter": "collector_fill_percentage"}, datatype=numpy.uint64)
queue_replicator_fill_percentage_R = attribute_wrapper(comms_id=StatisticsClient, comms_annotation={"type": "queue", "parameter": "replicator_fill_percentage"}, datatype=numpy.uint64)
@@ -83,6 +84,8 @@ class Statistics(opcua_device):
nof_invalid_packets_R = attribute_wrapper(comms_id=StatisticsClient, comms_annotation={"type": "statistics", "parameter": "nof_invalid_packets"}, datatype=numpy.uint64)
# last packet that could not be parsed
last_invalid_packet_R = attribute_wrapper(comms_id=StatisticsClient, comms_annotation={"type": "statistics", "parameter": "last_invalid_packet"}, dims=(9000,), datatype=numpy.uint8)
# what the last exception was
last_invalid_packet_exception_R = attribute_wrapper(comms_id=StatisticsClient, comms_annotation={"type": "statistics", "parameter": "last_invalid_packet_exception"}, datatype=str)
# --------
# Overloaded functions
@@ -82,14 +82,14 @@ class TestStatisticsWriterSST(BaseIntegrationTestCase):
'2021-09-20T12:17:40.000+00:00'
)
self.assertIsNotNone(stat)
self.assertEqual("0.1.1", stat.station_version_id)
self.assertEqual("0.1.2", stat.station_version_id)
self.assertEqual("0.1", stat.writer_version_id)
def test_insert_tango_SST_statistics(self):
self.assertEqual(DevState.ON, self.recv_proxy.state())
collector = StationSSTCollector(device=self.recv_proxy)
# Test attribute values retrieval
collector.parse_device_attributes()
numpy.testing.assert_equal(
collector.parameters["rcu_attenuator_dB"].flatten(),
@@ -186,7 +186,7 @@ class TestStatisticsWriterSST(BaseIntegrationTestCase):
self.assertEqual(stat.rcu_band_select, None)
self.assertEqual(stat.rcu_dth_on, None)
def test_SST_statistics_with_device_in_off(self):
self.setup_recv_proxy()
self.recv_proxy.Off()
self.assertEqual(DevState.OFF, self.recv_proxy.state())
@@ -221,7 +221,7 @@ class TestStatisticsWriterSST(BaseIntegrationTestCase):
stat = stat_parser.get_statistic('2021-09-20T12:17:40.000+00:00') # same as stat_parser.statistics[0]
self.assertIsNotNone(stat)
self.assertEqual(121, stat.data_id_signal_input_index)
# Test RECV attributes
self.assertEqual(stat.rcu_attenuator_dB, None)
self.assertEqual(stat.rcu_band_select, None)
self.assertEqual(stat.rcu_dth_on, None)