diff --git a/devices/devices/unb2.py b/devices/devices/unb2.py index e2f781a24e5e59c52591f0826e36000a38687aa1..4b071950bb68c52a41758a5eedba32605c0214cf 100644 --- a/devices/devices/unb2.py +++ b/devices/devices/unb2.py @@ -38,6 +38,12 @@ class UNB2(opcua_device): # Device Properties # ----------------- + UNB2_mask_RW_default = device_property( + dtype='DevVarBooleanArray', + mandatory=False, + default_value=[True] * 2 + ) + # ---------- # Attributes # ---------- diff --git a/docs/source/configure_station.rst b/docs/source/configure_station.rst new file mode 100644 index 0000000000000000000000000000000000000000..412795ff05d649ab57d255f566178a50614091bb --- /dev/null +++ b/docs/source/configure_station.rst @@ -0,0 +1,70 @@ +Enter your LOFAR2.0 Hardware Configuration +=========================================== + +The software will need to be told various aspects of your station configuration, for example, the hostnames of the station hardware to control. The following settings are installation specific, and are stored as *properties* in the :ref:`tangodb`. The format used here is ``device.property``: + +Mandatory settings +------------------- + +Without these settings, you will not obtain the associated functionality: + +:RECV.OPC_Server_Name: Hostname of RECVTR. + + :type: ``string`` + +:UNB2.OPC_Server_Name: Hostname of UNB2TR. + + :type: ``string`` + +:SDP.OPC_Server_Name: Hostname of SDPTR. + + :type: ``string`` + +:SST.OPC_Server_Name: Hostname of SDPTR. + + :type: ``string`` + +:SST.FPGA_sst_offload_hdr_eth_destination_mac_RW_default: MAC address of the network interface on the host running this software stack, on which the SSTs are to be received. This network interface must be capable of receiving Jumbo (MTU=9000) frames. + + :type: ``string[N_fpgas]`` + +:SST.FPGA_sst_offload_hdr_ip_destination_address_RW_default: IP address of the network interface on the host running this software stack, on which the SSTs are to be received. 
+ + :type: ``string[N_fpgas]`` + +:XST.OPC_Server_Name: Hostname of SDPTR. + + :type: ``string`` + +:XST.FPGA_xst_offload_hdr_eth_destination_mac_RW_default: MAC address of the network interface on the host running this software stack, on which the XSTs are to be received. This network interface must be capable of receiving Jumbo (MTU=9000) frames. + + :type: ``string[N_fpgas]`` + +:XST.FPGA_xst_offload_hdr_ip_destination_address_RW_default: IP address of the network interface on the host running this software stack, on which the XSTs are to be received. + + :type: ``string[N_fpgas]`` + +Optional settings +------------------- + +These settings make life nicer, but are not strictly necessary to get your software up and running: + +:RECV.Ant_mask_RW_default: Which antennas are installed. + + :type: ``bool[N_RCUs][N_antennas_per_RCU]`` + +:RECV.RCU_mask_RW_default: Which RCUs are installed. + + :type: ``bool[N_RCUs]`` + +:UNB2.UNB2_mask_RW_default: Which Uniboard2s are installed in SDP. + + :type: ``bool[N_unb]`` + +:SDP.TR_fpga_mask_RW_default: Which FPGAs are installed in SDP. + + :type: ``bool[N_fpgas]`` + +:SDP.FPGA_sdp_info_station_id_RW_default: Numeric identifier for this station. + + :type: ``uint32[N_fpgas]`` diff --git a/docs/source/developer.rst b/docs/source/developer.rst new file mode 100644 index 0000000000000000000000000000000000000000..517dfa324298e9451bfa5f9b25eef9726476686e --- /dev/null +++ b/docs/source/developer.rst @@ -0,0 +1,61 @@ +Developer information +========================= + +This chapter describes key areas useful for developers. + +Docker compose +------------------------- + +The docker setup is managed using ``make`` in the ``docker-compose`` directory.
Key commands are: + +- ``make status`` to check which containers are running, +- ``make build <container>`` to rebuild the image for the container, +- ``make build-nocache <container>`` to rebuild the image for the container from scratch, +- ``make restart <container>`` to restart a specific container, for example to effectuate a code change, +- ``make clean`` to remove all images and containers, and the ``tangodb`` volume. To do a deeper clean, we need to remove all volumes and rebuild all containers from scratch:: + + make clean + docker volume prune + make build-nocache + +Since the *Python code is taken from the host when the container starts*, restarting is enough to use the code you have in your local git repo. Rebuilding is unnecessary. + +Docker networking +------------------------- + +The Docker containers use a *virtual network* to communicate with each other. This means that: + +- Containers address each other by a host name equal to the container name (e.g. ``elk`` for the elk stack, and ``databaseds`` for the TANGO_HOST), +- ``localhost`` cannot be used within the containers to access ports of other containers, +- ``host.docker.internal`` resolves to the actual host running the containers, +- All ports used by external parties need to be exposed explicitly in the docker-compose files. The container must open the same port as is thus exposed, or the port will not be reachable. + +The networks are defined in ``docker-compose/networks.yml``: + +.. literalinclude:: ../../docker-compose/networks.yml + +The ``$NETWORK_MODE`` defaults to ``tangonet`` in the ``docker-compose/Makefile``. + +.. _corba: + +CORBA +```````````````````` + +Tango devices use CORBA, which requires all servers to be able to reach each other directly. Each CORBA device opens a port and advertises its address to the CORBA broker. The broker then forwards this address to any interested clients.
A device within a docker container cannot know under which name it can be reached, however, and any port opened needs to be exposed explicitly in the docker-compose file for the device. To solve all this, we *assign a unique port to each device*, and explicitly tell CORBA to use that port, and what the hostname is under which others can reach it. Each device thus has these lines in its compose file:: + + ports: + - "5701:5701" # unique port for this DS + entrypoint: + # configure CORBA to _listen_ on 0:port, but tell others we're _reachable_ through ${HOSTNAME}:port, since CORBA + # can't know about our Docker port forwarding + - python3 -u /opt/lofar/tango/devices/devices/sdp/sdp.py LTS -v -ORBendPoint giop:tcp:0:5701 -ORBendPointPublish giop:tcp:${HOSTNAME}:5701 + +Specifying the wrong ``$HOSTNAME`` or port can make your device unreachable, even if it is running. Note that ``$HOSTNAME`` is advertised as-is, that is, it is resolved to an IP address by any client that wants to connect. This means the ``$HOSTNAME`` needs to be correct for both the other containers, and external clients. + +The ``docker-compose/Makefile`` tries to set a good default for ``$HOSTNAME``, but you can override it by exporting the environment variable yourself (and run ``make restart <container>`` to effectuate the change).
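The endpoint pair in the entrypoint above follows a fixed pattern, so it can also be generated rather than hand-written. A minimal Python sketch (the helper name is purely illustrative)::

    def orb_endpoint_args(hostname: str, port: int) -> list:
        # listen on all interfaces at the given port, but advertise
        # ourselves as reachable through hostname:port
        return [
            "-ORBendPoint", f"giop:tcp:0:{port}",
            "-ORBendPointPublish", f"giop:tcp:{hostname}:{port}",
        ]

For example, ``orb_endpoint_args("myhost", 5701)`` produces the four ``-ORB*`` arguments shown in the compose file above.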
+ +For more information, see: + +- https://huihoo.org/ace_tao/ACE-5.2+TAO-1.2/TAO/docs/ORBEndpoint.html +- http://omniorb.sourceforge.net/omni42/omniNames.html +- https://sourceforge.net/p/omniorb/svn/HEAD/tree/trunk/omniORB/src/lib/omniORB/orbcore/tcp/tcpEndpoint.cc diff --git a/docs/source/devices/configure.rst b/docs/source/devices/configure.rst new file mode 100644 index 0000000000000000000000000000000000000000..aa96966d2ee9d383c60e6a1651d0064bb8b914d2 --- /dev/null +++ b/docs/source/devices/configure.rst @@ -0,0 +1,63 @@ +Device Configuration +========================= + +The devices receive their configuration from two sources: + +- The TangoDB database, for static *properties*, +- Externally, from the user or a control system, which set *control attributes* (see :doc:`devices` for what to set, and :ref:`attributes` for how to set them). + +.. _tangodb: + +TangoDB +------------------------- + +The TangoDB database is a persistent store for the properties of each device. The properties encode static settings, such as the hardware addresses, and default values for control attributes. + +Each device queries the TangoDB for the value of its properties during the ``initialise()`` call. Default values for control attributes can then be applied by explicitly calling ``set_defaults()``. The ``boot`` device also calls ``set_defaults()`` when initialising the station. The rationale is that the defaults can be applied at boot, but shouldn't be applied automatically during operations, so as not to disturb running hardware. + +Device interaction +```````````````````````````` + +The properties of a device can be queried from the device directly:: + + # get a list of all the properties + property_names = device.get_property_list("*") + + # fetch the values of the given properties. returns a {property: value} dict.
+ property_dict = device.get_property(property_names) + +Properties can also be changed:: + + changeset = { "property": "new value" } + + device.put_property(changeset) + +Note that new values for properties will only be picked up by the device during ``initialise()``, so you will have to turn the device off and on. + +Command-line interaction +`````````````````````````` + +The content of the TangoDB can be dumped from the command line using:: + + bin/dump_ConfigDb.sh > tangodb-dump.json + +and changes can be applied using:: + + bin/update_ConfigDb.sh changeset.json + +.. note:: The ``dsconfig`` docker container needs to be running for these commands to work. + +Jive +`````````````````````````` + +The TangoDB can also be interactively queried and modified using Jive. Jive is an X11 application provided by the ``jive`` image as part of the software stack of the station. It must however be started on-demand, with a correctly configured ``$DISPLAY``:: + + cd docker-compose + make start jive + +If Jive does not appear, check ``docker logs jive`` to see what went wrong. + +For information on how to use Jive, see https://tango-controls.readthedocs.io/en/latest/tools-and-extensions/built-in/jive/. + +.. note:: If you need an X11 server on Windows, see :ref:`x11_on_windows`. + diff --git a/docs/source/devices/devices.rst b/docs/source/devices/devices.rst new file mode 100644 index 0000000000000000000000000000000000000000..1c6090bef3066def70a032b191688d8d0444cb03 --- /dev/null +++ b/docs/source/devices/devices.rst @@ -0,0 +1,179 @@ +Devices +============ + +.. _boot: + +Boot +--------- + +The ``boot == DeviceProxy("LTS/Boot/1")`` device is responsible for (re)starting and initialising the other devices. Devices which are not reachable, for example because their docker container is explicitly stopped, are skipped during initialisation. 
This device provides the following commands: + +:initialise_station(): Stop and start the other devices in the correct order, set their default values, and command them to initialise their hardware. This procedure runs asynchronously, causing this command to return immediately. Initialisation is aborted if an error is encountered. + + :returns: ``None`` + +The initialisation process can subsequently be followed by monitoring the following attributes: + +:initialising_R: Whether the initialisation procedure is still ongoing. + + :type: ``bool`` + +:initialisation_progress_R: Percentage completeness of the initialisation procedure. Each successfully configured device increments progress. + + :type: ``int`` + +:initialisation_status_R: A description of what the device is currently trying to do. If an error occurs, this will hint towards the cause. + + :type: ``str`` + +A useful pattern is thus to call ``initialise_station()``, wait for ``initialising_R == False``, and then check whether the initialisation was successful, i.e. whether ``initialisation_progress_R == 100``. If a device fails to initialise, most likely the :doc:`../interfaces/logs` will need to be consulted. + +.. _docker: + +Docker +--------- + +The ``docker == DeviceProxy("LTS/Docker/1")`` device controls the docker containers. It allows starting and stopping them, and querying whether they are running. Each container is represented by two attributes: + +:<container>_R: Returns whether the container is running. + + :type: ``bool`` + +:<container>_RW: Set to ``True`` to start the container, and to ``False`` to stop it. + + :type: ``bool`` + +.. warning:: Do *not* stop the ``tango`` container, as doing so cripples the Tango infrastructure, leaving the station inoperable. It is also not wise to stop the ``device_docker`` container, as doing so would render this device unreachable. + + +RECV +---------- + +The ``recv == DeviceProxy("LTS/RECV/1")`` device controls the RCUs, the LBA antennas, and HBA tiles. Central to its operation are the masks (see also :ref:`attribute-masks`):
Central to its operation are the masks (see also :ref:`attribute-masks`): + +:RCU_mask_RW: Controls which RCUs will actually be configured when attributes referring to RCUs are written. + + :type: ``bool[N_RCUs]`` + +:Ant_mask_RW: Controls which antennas will actually be configured when attributes referring to antennas are written. + + :type: ``bool[N_RCUs][N_antennas_per_RCU]`` + +Typically, ``N_RCUs == 32``, and ``N_antennas_per_RCU == 3``. + +SDP +----------- + +The ``sdp == DeviceProxy("LTS/SDP/1")``` device controls the digital signal processing in SDP, performed by the firmware on the FPGAs on the Uniboards. Central to its operation is the mask (see also :ref:`attribute-masks`): + +:TR_fpga_mask_RW: Controls which FPGAs will actually be configured when attributes referring to FPGAs are written. + + :type: ``bool[N_fpgas]`` + +Typically, ``N_fpgas == 16``. + +SST and XST +----------- + +The ``sst == DeviceProxy("LTS/SST/1")`` and ``xst == DeviceProxy("LTS/XST/1")`` devices manages the SSTs (subband statistics) and XSTs (crosslet statistics), respectively. The statistics are emitted piece-wise through UDP packets by the FPGAs on the Uniboards in SDP. By default, each device configures the statistics to be streamed to itself (the device), from where the user can obtain them. + +The statistics are exposed in two ways, as: + +- *Attributes*, representing the most recently received values, +- *TCP stream*, to allow the capture and recording of the statistics over any period of time. + +SST Statistics attributes +````````````````````````` + +The SSTs represent the amplitude of the signal in each subband, for each antenna, as an integer value. They are exposed through the following attributes: + +:sst_R: Amplitude of each subband, from each antenna. + + :type: ``uint64[N_ant][N_subbands]`` + +:sst_timestamp_R: Timestamp of the data, per antenna. + + :type: ``uint64[N_ant]`` + +:integration_interval_R: Timespan over which the SSTs were integrated, per antenna. 
+ + :type: ``float32[N_ant]`` + +:subbands_calibrated_R: Whether the subband data was calibrated using the subband weights. + + :type: ``bool[N_ant]`` + +Typically, ``N_ant == 192``, and ``N_subbands == 512``. + +XST Statistics attributes +````````````````````````` + +The XSTs represent the cross-correlations between each pair of antennas, as complex values. The phases and amplitudes of the XSTs represent the phase and amplitude difference between the antennas, respectively. They are exposed as a matrix ``xst[a][b]``, of which only the triangle ``a<=b`` is filled, as the cross-correlation between antenna pairs ``(b,a)`` is equal to the complex conjugate of the cross-correlation of ``(a,b)``. The other triangle contains incidental values, but will be mostly 0. + +Complex values cannot be represented in Tango attributes. Instead, the XST matrix is exposed through both its Cartesian and polar parts: + +:xst_power_R, xst_phase_R: Amplitude and phase of the crosslet statistics. + + :type: ``float32[N_ant][N_ant]`` + +:xst_real_R, xst_imag_R: Real and imaginary parts of the crosslet statistics. + + :type: ``float32[N_ant][N_ant]`` + +:xst_timestamp_R: Timestamp of each block. + + :type: ``int64[N_blocks]`` + +:integration_interval_R: Timespan over which the XSTs were integrated, for each block. + + :type: ``float32[N_blocks]`` + +Typically, ``N_ant == 192``, and ``N_blocks == 136``. + +The metadata refers to the *blocks*, which are emitted by the FPGAs to represent the XSTs between 12 x 12 consecutive antennas.
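Since only the ``a<=b`` triangle is filled, a client that needs the full correlation matrix has to mirror it explicitly. A NumPy sketch (the helper name is illustrative), which first combines the Cartesian parts into complex values::

    import numpy as np

    def full_xst_matrix(xst_real, xst_imag):
        # combine the Cartesian parts into complex values
        xst = np.asarray(xst_real) + 1j * np.asarray(xst_imag)

        # keep the filled a<=b triangle, and overwrite the other
        # triangle with the conjugates of its mirror
        return np.triu(xst) + np.conj(np.triu(xst, 1)).T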
The following code converts block numbers to the indices of the first antenna pair in a block:: + + from common.baselines import baseline_from_index + + def first_antenna_pair(block_nr: int) -> tuple: + coarse_a, coarse_b = baseline_from_index(block_nr) + return (coarse_a * 12, coarse_b * 12) + +Conversely, to calculate the block index for an antenna pair ``(a,b)``, use:: + + from common.baselines import baseline_index + + def block_nr(a: int, b: int) -> int: + return baseline_index(a // 12, b // 12) + +TCP stream +`````````` + +The TCP stream interface allows a user to subscribe to the statistics packet streams, combined into a single TCP stream. The statistics will be streamed until the user disconnects, or the device is turned off. Any number of subscribers is supported, as bandwidth allows. Simply connect to the following port: + ++----------+----------------+ +| Device | TCP end point | ++==========+================+ +| SST | localhost:5101 | ++----------+----------------+ +| XST | localhost:5102 | ++----------+----------------+ + +The easiest way to capture this stream is to use our ``statistics_writer``, which will capture the statistics and store them in HDF5 file(s). The writer: + +- computes packet boundaries, +- processes the data of each packet, and stores its values into the matrix relevant for the mode, +- stores a matrix per timestamp, +- stores packet header information per timestamp, as HDF5 attributes, +- writes to a new file at a configurable interval. + +To run the writer:: + + cd devices/statistics_writer + python3 statistics_writer.py --mode SST --host localhost + +The correct port will automatically be chosen, depending on the given mode. See also ``statistics_writer.py -h`` for more information. + +The writer can also parse a statistics stream stored in a file. This allows the stream to be captured and processed independently.
Capturing the stream can for example be done using ``netcat``:: + + nc localhost 5101 > SST-packets.bin + diff --git a/docs/source/devices/using.rst b/docs/source/devices/using.rst new file mode 100644 index 0000000000000000000000000000000000000000..8c2a58ca814fdea541e8e5dbcbe5b9ae189b5e84 --- /dev/null +++ b/docs/source/devices/using.rst @@ -0,0 +1,143 @@ +Using Devices +============= + +The station exposes *devices*, each of which is a remote software object that manages part of the station. Each device has the following properties: + +- It has a *state*, +- Many devices manage and represent hardware in the station, +- It exposes *read-only attributes*, which expose values from within the device or from the hardware it represents, +- It exposes *read-write attributes*, which allow controlling the functionality of the device, or the hardware it represents, +- It exposes *properties*, which are fixed configuration parameters (such as port numbers and timeouts), +- It exposes *commands*, which request the execution of a procedure in the device or in the hardware it manages. + +The devices are accessed remotely using ``DeviceProxy`` objects. See :doc:`../interfaces/control` on how to do this. + +States +------------ + +The state of a device can be queried with ``device.state()``. Each device can be in one of the following states: + +- ``DevState.OFF``: The device is not operating, +- ``DevState.INIT``: The device is being initialised, +- ``DevState.STANDBY``: The device is initialised and ready to be configured further, +- ``DevState.ON``: The device is operational, +- ``DevState.FAULT``: The device is malfunctioning. Functionality cannot be counted on. + +Note that the ``device.state()`` function can throw an error if the device cannot be reached at all, for example because its docker container is not running. See the :ref:`docker` device on how to start it. + +Each device provides the following commands to change the state: + +:off(): Turn the device ``OFF`` from any state.
+ +:initialise(): Initialise the device from the ``OFF`` state, to bring it to the ``STANDBY`` state. + +:on(): Mark the device as operational, from the ``STANDBY`` state, bringing it to ``ON``. + +The following procedure is a good way to bring a device to ``ON`` from any state:: + + def force_start(device): + if device.state() == DevState.FAULT: + device.off() + if device.state() == DevState.OFF: + device.initialise() + if device.state() == DevState.STANDBY: + device.on() + + return device.state() + +.. hint:: If a command gives you a timeout, the command will still keep running until it finishes; you just won't know when it does, or what its result is. In order to increase the timeout, use ``device.set_timeout_millis(timeout * 1000)``. + +FAULT +`````````` + +If a device enters the ``FAULT`` state, it means an error occurred that is fundamental to the operation of the software device. For example, the connection +to the hardware was lost. + +Interaction with the device in the ``FAULT`` state is undefined, and attributes cannot be read or written. The device needs to be reinitialised, which +typically involves the following sequence of commands:: + + # turn the device off completely first. + device.off() + + # setup any connections and threads + device.initialise() + + # turn on the device + device.on() + +Of course, the device could go into ``FAULT`` again, even during the ``initialise()`` command, for example because the hardware it manages is unreachable. To debug the fault condition, check the :doc:`../interfaces/logs` of the device in question. + +Initialise hardware +```````````````````` + +Most devices provide the following commands, in order to configure the hardware with base settings: + +:set_defaults(): Upload default attribute settings from the TangoDB to the hardware. + +:initialise_hardware(): For devices that control hardware, this command runs the hardware initialisation procedure.
Typically, ``set_defaults()`` and ``initialise_hardware()`` are called in that order in the ``STANDBY`` state. The :ref:`boot` device runs these commands as part of its station initialisation sequence. + +.. _attributes: + +Attributes +------------ + +The device can be operated in the ``ON`` state, where it exposes *attributes* and *commands*. The attributes can be accessed as python properties, for example:: + + recv = DeviceProxy("LTS/RECV/1") + + # turn on all LED0s + recv.RCU_LED0_RW = [True] * 32 + + # retrieve the status of all LED0s + print(recv.RCU_LED0_R) + +The attributes with an: + +- ``_R`` suffix are monitoring points, reflecting the state of the hardware, and are thus read-only. +- ``_RW`` suffix are control points, reflecting the desired state of the hardware. They are read-write, where writing requests the hardware to set the specified value. Reading them returns the last requested value. + +Metadata +````````````` + +A description of the attribute can be retrieved using:: + + print(recv.get_attribute_config("RCU_LED0_R").description) + +.. _attribute-masks: + +Attribute masks +--------------------- + +Several devices employ *attribute masks* in order to toggle which elements in their hardware array are actually to be controlled. This construct is necessary as most control points consist of arrays of values that cover all hardware elements. These array control points are always fully sent: it is not possible to update only a single element without uploading the rest. Without a mask, it is impossible to control a subset of the hardware. + +The masks only affect *writing* to attributes. Reading attributes (monitoring points) always results in data for all elements in the array. + +For example, the ``RCU_mask_RW`` array is the RCU mask in the ``recv`` device.
It behaves as follows, when we interact with the ``RCU_LED0_R(W)`` attributes:: + + recv = DeviceProxy("LTS/RECV/1") + + # set mask to control all RCUs + recv.RCU_mask_RW = [True] * 32 + + # request to turn off LED0 for all RCUs + recv.RCU_LED0_RW = [False] * 32 + + # <--- all LED0s are now off + # recv.RCU_LED0_R should show this, + # if you have the RCU hardware installed. + + # set mask to only control RCU 3 + mask = [False] * 32 + mask[3] = True + recv.RCU_mask_RW = mask + + # request to turn on LED0, for all RCUs + # due to the mask, only LED0 on RCU 3 + # will be set. + recv.RCU_LED0_RW = [True] * 32 + + # <--- only LED0 on RCU3 is now on + # recv.RCU_LED0_R should show this, + # if you have the RCU hardware installed. + diff --git a/docs/source/faq.rst b/docs/source/faq.rst new file mode 100644 index 0000000000000000000000000000000000000000..367492e002e5d0d4bf20442c6e5e596ef78b852f --- /dev/null +++ b/docs/source/faq.rst @@ -0,0 +1,145 @@ +FAQ +=================================== + +Connecting to devices +-------------------------------------------------------------------------------------------------------------- + +My device is unreachable, but the device logs say it's running fine? +`````````````````````````````````````````````````````````````````````````````````````````````````````````````` + +The ``$HOSTNAME`` may have been incorrectly guessed by ``docker-compose/Makefile``, or you accidentally set it to an incorrect value. See :ref:`corba`. + +I get "API_CorbaException: TRANSIENT CORBA system exception: TRANSIENT_NoUsableProfile" when trying to connect to a device? +```````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````` + +The ``$HOSTNAME`` may have been incorrectly guessed by ``docker-compose/Makefile``, or you accidentally set it to an incorrect value. See :ref:`corba`.
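In both cases, a first sanity check is whether the advertised name resolves at all, for the containers as well as for external clients. A stdlib-only Python sketch (the helper is illustrative)::

    import os
    import socket

    def advertised_address(name=None):
        # resolve the advertised hostname like a connecting client would
        name = name or os.environ.get("HOSTNAME", socket.gethostname())
        try:
            return socket.gethostbyname(name)
        except socket.gaierror:
            return None

If this returns ``None``, or an address that the other containers and external clients cannot route to, revisit your ``$HOSTNAME``.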
+ +Docker +-------------------------------------------------------------------------------------------------------------- + +How do I prevent my containers from starting when I boot my computer? +```````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````` + +You have to explicitly stop a container to prevent it from restarting. Use:: + + cd docker-compose + make stop <container> + +or plain ``make stop`` to stop all of them. + +Windows +-------------------------------------------------------------------------------------------------------------- + +How do I develop from Windows? +`````````````````````````````````````````````````````````````````````````````````````````````````````````````` + +Our setup is Linux-based, so the easiest way to develop is by using WSL2, which lets you run a Linux distro under Windows. You'll need to: + +- Install WSL2. See e.g. https://www.omgubuntu.co.uk/how-to-install-wsl2-on-windows-10 +- Install `Docker Desktop <https://hub.docker.com/editions/community/docker-ce-desktop-windows/>`_ +- Enable the WSL2 backend in Docker Desktop +- We also recommend installing `Windows Terminal <https://www.microsoft.com/en-us/p/windows-terminal/9n0dx20hk701>`_ + +.. _x11_on_windows: + +How do I run X11 applications on Windows? +`````````````````````````````````````````````````````````````````````````````````````````````````````````````` +If you need an X11 server on Windows: + +- Install `VcXsrv <https://sourceforge.net/projects/vcxsrv/>`_ +- Disable access control during its startup, +- Use ``export DISPLAY=host.docker.internal:0`` in WSL. + +You should now be able to run X11 applications from WSL and Docker. Try running ``xterm`` or ``xeyes`` to test. + + +SSTs/XSTs +-------------------------------------------------------------------------------------------------------------- + +Some SST/XST packets do arrive, but not all, and/or the matrices remain zero?
+`````````````````````````````````````````````````````````````````````````````````````````````````````````````` + +So ``sst.nof_packets_received`` / ``xst.nof_packets_received`` is increasing, telling you packets are arriving. But they're apparently dropped or contain zeroes. First, check the following settings: + +- ``sdp.TR_fpga_mask_RW[x] == True``, to make sure we're actually configuring the FPGAs, +- ``sdp.FPGA_wg_enable_RW[x] == False``, or the Waveform Generator might be replacing our the antenna data with zeroes, +- ``sdp.FPGA_processing_enabled_R[x] == True``, to verify that the FPGAs are processing, or the values and timestamps will be zero, +- For XSTs, ``xst.FPGA_xst_processing_enabled_R[x] == True``, to verify that the FPGAs are computing XSTs, or the values will be zero. + +Furthermore, the ``sst`` and ``xst`` devices expose several packet counters to indicate where incoming packets were dropped before or during processing: + +- ``nof_invalid_packets_R`` increases if packets arrive with an invalid header, or of the wrong statistic for this device, +- ``nof_packets_dropped_R`` increases if packets could not be processed because the processing queue is full, so the CPU cannot keep up with the flow, +- ``nof_payload_errors_R`` increases if the packet was marked by the FPGA to have an invalid payload, which causes the device to discard the packet, + +I am not receiving any XSTs and/or SSTs packets from SDP! +`````````````````````````````````````````````````````````````````````````````````````````````````````````````` + +Are you sure? If ``sst.nof_packets_received`` / ``xst.nof_packets_received`` is actually increasing, the packets are arriving, but are not parsable by the SST/XST device. If so, see the previous question. + +Many settings need to be correct for the statistics emitted by the SDP FPGAs to reach our devices correctly. 
Here is a brief overview: + +- ``sdp.TR_fpga_mask_RW[x] == True``, to make sure we're actually configuring the FPGAs, +- ``sdp.FPGA_communication_error_R[x] == False``, to verify the FPGAs can be reached by SDP, +- SSTs: + + - ``sst.FPGA_sst_offload_enable_RW[x] == True``, to verify that the FPGAs are actually emitting the SSTs, + - ``sst.FPGA_sst_offload_hdr_eth_destination_mac_R[x] == <MAC of your machine's mtu=9000 interface>``, or the FPGAs will not send it to your machine. Use e.g. ``ip addr`` on the host to find the MAC address of your interface, and verify that its MTU is 9000, + - ``sst.FPGA_sst_offload_hdr_ip_destination_address_R[x] == <IP of your machine's mtu=9000 interface>``, or the packets will be dropped by the network or the kernel of your machine, + - ``sst.FPGA_sst_offload_hdr_udp_destination_port_R[x] == 5001``, or the packets will not be sent to a port that the SST device listens on. + +- XSTs: + + - ``xst.FPGA_xst_offload_enable_RW[x] == True``, to verify that the FPGAs are actually emitting the XSTs, + - ``xst.FPGA_xst_offload_hdr_eth_destination_mac_R[x] == <MAC of your machine's mtu=9000 interface>``, or the FPGAs will not send it to your machine. Use e.g. ``ip addr`` on the host to find the MAC address of your interface, and verify that its MTU is 9000, + - ``xst.FPGA_xst_offload_hdr_ip_destination_address_R[x] == <IP of your machine's mtu=9000 interface>``, or the packets will be dropped by the network or the kernel of your machine, + - ``xst.FPGA_xst_offload_hdr_udp_destination_port_R[x] == 5002``, or the packets will not be sent to a port that the XST device listens on. + +If this fails, see the next question. + +I am still not receiving XSTs and/or SSTs, even though the settings appear correct! +`````````````````````````````````````````````````````````````````````````````````````````````````````````````` + +Let's see where the packets get stuck.
Let us assume your MTU=9000 network interface is called ``em2`` (see ``ip addr`` to check): + +- Check whether the data arrives on ``em2``. Run ``tcpdump -i em2 udp -nn -vvv -c 10`` to capture the first 10 packets. Verify: + + - The destination MAC must match that of ``em2``, + - The destination IP must match that of ``em2``, + - The destination port is correct (5001 for SST, 5002 for XST), + - The source IP falls within the netmask of ``em2`` (unless ``net.ipv4.conf.em2.rp_filter=0`` is configured), + - TTL >= 2, + +- If you see no data at all, the network will have swallowed it. Try to use a direct network connection, or a hub (which broadcasts all packets, unlike a switch), to see what is being emitted by the FPGAs. +- Check whether the data reaches user space on the host: + + - Turn off the ``sst`` or ``xst`` device. This will not stop the FPGAs from sending. + - Run ``nc -u -l -p 5001 -vv`` (or port 5002 for XSTs). You should see raw packets being printed. + - If not, the Linux kernel is swallowing the packets, even before it can be sent to our docker container. + +- Check whether the data reaches kernel space in the container: + + - Enter the docker device by running ``docker exec -it device-sst bash``. + - Run ``sudo bash`` to become root, + - Run ``apt-get install -y tcpdump`` to install tcpdump, + - Check whether packets arrive using ``tcpdump -i eth0 udp -c 10 -nn``, + - If not, Linux is not routing the packets to the docker container. + +- Check whether the data reaches user space in the container: + + - Turn off the ``sst`` or ``xst`` device. This will not stop the FPGAs from sending. + - Enter the docker device by running ``docker exec -it device-sst bash``. + - Run ``sudo bash`` to become root, + - Run ``apt-get install -y netcat`` to install netcat, + - Check whether packets arrive using ``nc -u -l -p 5001 -vv`` (or port 5002 for XSTs), + - If not, Linux is not routing the packets to the docker container correctly. 
+ +- If still no error was found, you've likely hit a bug in our software. + +Other containers +-------------------------------------------------------------------------------------------------------------- + +The ELK container won't start, saying "max virtual memory areas vm.max_map_count [65530] is too low"? +`````````````````````````````````````````````````````````````````````````````````````````````````````````````` + +The ELK stack needs the ``vm.max_map_count`` sysctl kernel parameter to be at least 262144 to run. See :ref:`elk-kernel-settings`. diff --git a/docs/source/index.rst b/docs/source/index.rst index 5e6c6564940391ea5171403a833a2f83ed015adc..524d21369c9e0ded662f12a365d479ce3dc39abc 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -6,10 +6,24 @@ Welcome to LOFAR2.0 Station Control's documentation! ==================================================== +LOFAR2.0 Station Control is a software stack aimed at monitoring, controlling, and managing a LOFAR2.0 station. In order to do so, it whips up a series of Docker containers, and combines the power of `Tango Controls <https://www.tango-controls.org/>`_, `PyTango <https://pytango.readthedocs.io/en/stable/>`_, `Docker <https://www.docker.com/>`_, `Grafana <https://grafana.com/>`_, `ELK <https://www.elastic.co/what-is/elk-stack>`_, `Jupyter Notebook <https://jupyter.org/>`_, and many others to provide a rich and powerful experience in using the station. + +Full monitoring and control access to the LOFAR2.0 station hardware is provided by marshalling their rich `OPC-UA <https://opcfoundation.org/about/opc-technologies/opc-ua/>`_ interfaces. Higher-level logic makes it possible to easily configure and obtain the LOFAR station data products (beamlets, XSTs, SSTs, BSTs) from your local machine using Python, or through one of our provided web interfaces. + +Even without having access to any LOFAR2.0 hardware, you can install the full stack on your laptop, and experiment with the software interfaces. + +.. 
toctree:: :maxdepth: 2 :caption: Contents: + + installation + interfaces/overview + devices/using + devices/devices + devices/configure + configure_station + developer + faq Indices and tables diff --git a/docs/source/installation.rst b/docs/source/installation.rst new file mode 100644 index 0000000000000000000000000000000000000000..cb0122ae95cc01de7f55e333345a6ec4d41bc369 --- /dev/null +++ b/docs/source/installation.rst @@ -0,0 +1,89 @@ +Installation +================== + +You will need the following dependencies installed: + +- docker +- docker-compose +- git +- make + +Start by checking out the source code, e.g. the master branch, as well as the git submodules we use:: + + git clone https://git.astron.nl/lofar2.0/tango.git + cd tango + git submodule init + git submodule update + +Next, we bootstrap the system. This will build our docker images, start key ones, and load the base configuration. This may take a while:: + + cd docker-compose + make bootstrap + +If you lack access to LOFAR station hardware, load additional configurations to use the simulators instead:: + + for sim in ../CDB/*-sim-config.json; do + ../sbin/update_ConfigDb.sh ${sim} + done + +If you do have access to LOFAR station hardware, you will have to configure your station; see :doc:`configure_station`. + +Now we are ready to start the other containers:: + + make start + +and make sure they are all up and running:: + + make status + +You should see the following state: + +- Containers ``astor``, ``hdbpp-viewer``, ``jive``, ``log-viewer`` and ``pogo`` will have State ``Exit 1``. These are containers that are interactive X11 tools, and not needed for now, +- Other containers have either State ``Up`` or ``Exit 0``. + +If not, you can inspect why with ``docker logs <container>``. Note that the containers will automatically be restarted on failure, and also if you reboot. Stop them explicitly to bring them down (``make stop <container>``). 
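The expected-state rule above can be expressed in a few lines of Python, e.g. to script a health check. This is only a sketch: parsing the actual ``docker-compose ps`` output is left out, and the input is a plain name-to-state mapping:

```python
# The interactive X11 tools are expected to exit; everything else should be
# "Up" or a clean "Exit 0".
X11_TOOLS = {"astor", "hdbpp-viewer", "jive", "log-viewer", "pogo"}


def unhealthy(containers: dict) -> list:
    """Return the containers whose state indicates a startup problem.

    `containers` maps container name -> state, e.g. {"elk": "Up"}.
    """
    return [
        name
        for name, state in containers.items()
        if name not in X11_TOOLS and state not in ("Up", "Exit 0")
    ]


# Example: jive exiting is expected, but a crashing device container is not.
print(unhealthy({"jive": "Exit 1", "device-sdp": "Exit 127", "elk": "Up"}))  # → ['device-sdp']
```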
+ +Post-boot Initialisation +--------------------------- + +After bootstrapping, and after a reboot, the software and hardware of the station need to be explicitly initialised. Note that the docker containers do restart automatically at system boot. + +The following commands start all the software devices to control the station hardware, and initialise the hardware with the configured default settings. Go to http://localhost:8888, start a new *Station Control* notebook, and initiate the software boot sequence:: + + # reset our boot device + boot.off() + assert boot.state() == DevState.OFF + boot.initialise() + assert boot.state() == DevState.STANDBY + boot.on() + assert boot.state() == DevState.ON + + # start and initialise the other devices + boot.initialise_station() + + # wait for the devices to be initialised + import time + + while boot.initialising_station_R: + print(f"Still initialising station. {boot.initialisation_progress_R}% complete. State: {boot.initialisation_status_R}") + time.sleep(1) + + # print conclusion + if boot.initialisation_progress_R == 100: + print("Done initialising station.") + else: + print(f"Failed to initialise station: {boot.initialisation_status_R}") + +See :ref:`boot` for more information on the ``boot`` device. + +.. _elk-kernel-settings: + +ELK +```` + +The ELK stack requires some kernel settings to be tuned, before it will start. Although ``make bootstrap`` configures the kernel, these settings will not stick after a reboot. You will need to run either:: + + make start elk-configure-host + make restart elk + +after reboot, or configure your system to set ``sysctl -w vm.max_map_count=262144`` (or higher) as root during boot. 
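To verify that the setting took effect (for example after a reboot), the live value can be read back from procfs. A Linux-only sketch:

```python
ELK_MIN_MAP_COUNT = 262144  # minimum vm.max_map_count required by ElasticSearch


def max_map_count() -> int:
    """Read the live vm.max_map_count value from procfs (Linux only)."""
    with open("/proc/sys/vm/max_map_count") as f:
        return int(f.read())


# Only check when procfs exposes the setting (i.e. on Linux).
import os

if os.path.exists("/proc/sys/vm/max_map_count"):
    if max_map_count() < ELK_MIN_MAP_COUNT:
        print("vm.max_map_count is too low; the ELK container will not start")
```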
diff --git a/docs/source/interfaces/control.rst b/docs/source/interfaces/control.rst new file mode 100644 index 0000000000000000000000000000000000000000..3c514f11d7a3e5a4bbc1c7339bac3bed0820d70f --- /dev/null +++ b/docs/source/interfaces/control.rst @@ -0,0 +1,84 @@ +Monitoring & Control +======================== + +The main API to control the station is through the `Tango Controls <https://tango-controls.readthedocs.io/en/latest/>`_ API we expose on port 10000, which is most easily accessed using a `PyTango <https://pytango.readthedocs.io/en/stable/client_api/index.html>`_ client. The Jupyter Notebook installation we provide is such a client. + +.. _jupyter: + +Jupyter Notebooks +------------------------ + +The station offers Jupyter notebooks on http://localhost:8888, which allow one to interact with the station, for example to set control points, access monitoring points, or to graph their values. + +The notebooks provide some predefined variables, so you don't have to look them up: + +.. literalinclude:: ../../../docker-compose/jupyter/ipython-profiles/stationcontrol-jupyter/startup/01-devices.py + +Note: the Jupyter notebooks use enhancements from the ``itango`` suite, which provide tab completion, as well as a ``Device`` alias for the ``DeviceProxy`` class used in the Python examples in the next section. + +For example, you can start a new *Station Control* notebook (File->New Notebook->StationControl), and access these devices: + +.. image:: jupyter_basic_example.png + +.. _pytango-section: + +PyTango +------------------------ + +To access a station from scratch using Python, we need to install some dependencies:: + + pip3 install pytango + +Then, if we know what devices are available on the station, we can access them directly:: + + import tango + import os + + # Tango needs to know where our Tango API is running. + os.environ["TANGO_HOST"] = "localhost:10000" + + # Construct a remote reference to a specific device. 
+ # One can also use "tango://localhost:10000/LTS/Boot/1" if TANGO_HOST is not set + boot_device = tango.DeviceProxy("LTS/Boot/1") + + # Print the device's state. + print(boot_device.state()) + +To obtain a list of all devices, we need to access the database:: + + import tango + + # Tango needs to know where our Tango API is running. + import os + os.environ["TANGO_HOST"] = "localhost:10000" + + # Connect to the database. + db = tango.Database() + + # Retrieve the available devices, excluding any Tango-internal ones. + # This returns for example: ['LTS/Boot/1', 'LTS/Docker/1', ...] + devices = list(db.get_device_exported("LTS/*")) + + # Connect to any of them. + any_device = tango.DeviceProxy(devices[0]) + + # Print the device's state. + print(any_device.state()) + +.. _rest-api: + +ReST API +------------------------ + +We also provide a ReST API to allow the station to be controlled without needing to use the Tango API. The root access point is http://localhost:8080/tango/rest/v10/hosts/databaseds;port=10000/ (credentials: tango-cs/tango). This API allows for: + +- getting and setting attribute values, +- calling commands, +- retrieving the device state, +- and more. + +For example, retrieving http://localhost:8080/tango/rest/v10/hosts/databaseds;port=10000/devices/LTS/SDP/1/state returns the following JSON document:: + + {"state":"ON","status":"The device is in ON state."} + +For a full description of this API, see https://tango-rest-api.readthedocs.io/en/latest/. 
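As a sketch of using the ReST API from Python (standard library only; the endpoint layout follows the URLs above, and ``FPGA_temp_R`` is just an example attribute), reading an attribute value could look like:

```python
import base64
import json
import urllib.request

# Root of the station's ReST gateway, as given above.
REST_BASE = "http://localhost:8080/tango/rest/v10/hosts/databaseds;port=10000"


def attribute_value_url(device: str, attribute: str) -> str:
    """Build the URL for reading a single attribute value."""
    return f"{REST_BASE}/devices/{device}/attributes/{attribute}/value"


def read_attribute(device: str, attribute: str,
                   user: str = "tango-cs", password: str = "tango") -> dict:
    """Fetch an attribute value as JSON; requires a running station."""
    request = urllib.request.Request(attribute_value_url(device, attribute))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


print(attribute_value_url("LTS/SDP/1", "FPGA_temp_R"))
# read_attribute("LTS/SDP/1", "FPGA_temp_R")  # uncomment on a running station
```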
diff --git a/docs/source/interfaces/elk_last_hour.png b/docs/source/interfaces/elk_last_hour.png new file mode 100644 index 0000000000000000000000000000000000000000..d6f2a73c9ba754a5a6d5aeece1382906040acb15 Binary files /dev/null and b/docs/source/interfaces/elk_last_hour.png differ diff --git a/docs/source/interfaces/elk_log_fields.png b/docs/source/interfaces/elk_log_fields.png new file mode 100644 index 0000000000000000000000000000000000000000..c5774931f23933be6033e396220b2459409b1def Binary files /dev/null and b/docs/source/interfaces/elk_log_fields.png differ diff --git a/docs/source/interfaces/grafana_dashboard_1.png b/docs/source/interfaces/grafana_dashboard_1.png new file mode 100644 index 0000000000000000000000000000000000000000..448a9bd993b264cf35e98229f12829256f775029 Binary files /dev/null and b/docs/source/interfaces/grafana_dashboard_1.png differ diff --git a/docs/source/interfaces/grafana_dashboard_2.png b/docs/source/interfaces/grafana_dashboard_2.png new file mode 100644 index 0000000000000000000000000000000000000000..d7c34991d97cd22a209d1f02502afa1f439acf4e Binary files /dev/null and b/docs/source/interfaces/grafana_dashboard_2.png differ diff --git a/docs/source/interfaces/jupyter_basic_example.png b/docs/source/interfaces/jupyter_basic_example.png new file mode 100644 index 0000000000000000000000000000000000000000..c7e35204cc72b63e8ea2d81c2bdad337d3ce72a1 Binary files /dev/null and b/docs/source/interfaces/jupyter_basic_example.png differ diff --git a/docs/source/interfaces/logs.rst b/docs/source/interfaces/logs.rst new file mode 100644 index 0000000000000000000000000000000000000000..2b5c605ec5e47cf8b98b09dba47f6e6954f468ba --- /dev/null +++ b/docs/source/interfaces/logs.rst @@ -0,0 +1,44 @@ +Logs +================== + +The devices, and the docker containers in general, produce logging output. The easiest way to access the logs of a specific container is to ask docker directly. 
For example, to access and follow the most recent logs of the ``device-sdp`` container, execute on the host:: + + docker logs -n 100 -f device-sdp + +This is mostly useful for interactive use. + +.. _elk: + +ELK +------------------ + +To monitor the logs remotely, or to browse older logs, use the *ELK stack* that is included on the station, and served on http://localhost:5601. ELK, or ElasticSearch + Logstash + Kibana, is a popular log collection and querying system. Currently, the following logs are collected in our ELK installation: + +- Logs of all devices, +- Logs of the Jupyter notebook server. + +If you browse to the ELK stack (actually, it is Kibana providing the GUI), your go-to is the *Discover* view at http://localhost:5601/app/discover. There, you can construct (and save, load) a dashboard that provides a custom view of the logs, based on the *index pattern* ``logstash-*``. There is a lot to take in, and there are excellent Kibana tutorials on the web. + +To get going, use for example `this dashboard <http://localhost:5601/app/discover#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-60m,to:now))&_a=(columns:!(extra.tango_device,level,message),filters:!(),index:'1e8ca200-1be0-11ec-a85f-b97e4206c18b',interval:auto,query:(language:kuery,query:''),sort:!())>`_, which shows the logs of the last hour, with some useful columns added to the default timestamp and message columns. Expand the time range if no logs appear, to look further back. You should see something like: + +.. image:: elk_last_hour.png + +ELK allows you to filter, edit the columns, and a lot more. We enrich the log entries with several extra fields, for example the device that generated it, and stack traces if available. Click on the ``>`` before a log entry and the information expands, showing for example: + +.. 
image:: elk_log_fields.png + +Furthermore, statistics from the ELK stack, such as the number of ERROR log messages, are made available as a data source in :doc:`monitoring`. + +LogViewer +------------------ + +For each device, Tango collects the logs as well. These can be viewed with the LogViewer X11 application. Make sure ``$DISPLAY`` is set, and run:: + + cd docker-compose + make start logviewer + +If LogViewer does not appear, check ``docker logs logviewer`` to see what went wrong. + +For information on how to use the LogViewer, see https://tango-controls.readthedocs.io/en/latest/tools-and-extensions/built-in/logviewer/logviewer.html. + +.. note:: If you need an X11 server on Windows, see :ref:`x11_on_windows`. diff --git a/docs/source/interfaces/monitoring.rst b/docs/source/interfaces/monitoring.rst new file mode 100644 index 0000000000000000000000000000000000000000..7d8a85fdf5bd7c103119a89a8dbae127040a5240 --- /dev/null +++ b/docs/source/interfaces/monitoring.rst @@ -0,0 +1,51 @@ +Monitoring GUIs +======================== + +Each device exposes a list of monitoring points as attributes with the ``_R`` suffix. These can be accessed interactively from a control console (such as Jupyter), but that will not scale. + +Grafana +------------------------ + +We offer `Grafana <https://grafana.com/>`_ dashboards on http://localhost:3000 that provide a quick overview of the station's status, including temperatures and settings. Several dashboards are included. An example: + +.. image:: grafana_dashboard_1.png +.. image:: grafana_dashboard_2.png + +NOTE: These dashboards are highly subject to change. The above examples provide an impression of a possible overview of the station state. + +You are encouraged to inspect each panel (graph) to see the underlying database query and settings. Use the small arrow in the panel's title to get a drop-down menu of options, and select *inspect*. See the Grafana documentation for further information. 
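The data behind such panels can also be fetched outside Grafana, straight from the Prometheus data source described below. A standard-library sketch of an instant query, with a label set matching the attribute format shown in the Prometheus section:

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS = "http://localhost:9090"


def instant_query_url(promql: str) -> str:
    """Build an instant-query URL for Prometheus's HTTP API."""
    return f"{PROMETHEUS}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def query(promql: str) -> dict:
    """Run the query; requires the station's Prometheus to be up."""
    with urllib.request.urlopen(instant_query_url(promql)) as response:
        return json.load(response)


# Latest RCU temperatures, one series per array element:
promql = 'device_attribute{device="lts/recv/1",name="RCU_temperature_R"}'
print(instant_query_url(promql))
# query(promql)  # uncomment on a running station
```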
+ +The Grafana dashboards are configured with the following data sources: + +- :ref:`prometheus-section`, the time-series database that caches the latest values of all monitoring points (see next section), +- *Archiver DB*, the database that provides a long-term cache of attributes, +- :ref:`tangodb`, providing access to device properties (fixed settings), +- :ref:`elk`, the log output of the devices. + +.. _prometheus-section: + +Prometheus +------------------------- + +`Prometheus <https://prometheus.io/docs/introduction/overview/>`_ is a low-level monitoring system that allows us to periodically retrieve the values of all the attributes of all our devices, and cache them to be used in Grafana: + +- Every few seconds, Prometheus scrapes our `TANGO-Grafana Exporter <https://git.astron.nl/lofar2.0/ska-tango-grafana-exporter>`_ (our fork of https://gitlab.com/ska-telescope/TANGO-grafana.git), collecting all values of all the device attributes (except the large ones, for performance reasons). +- Prometheus can be queried directly on http://localhost:9090, +- The TANGO-Grafana Exporter can be queried directly on http://localhost:8000, +- The query language is `PromQL <https://prometheus.io/docs/prometheus/latest/querying/basics/>`_, which is also used in Grafana to query Prometheus. + +Prometheus stores attributes in the following format:: + + device_attribute{device="lts/recv/1", + dim_x="32", dim_y="0", + instance="tango-prometheus-exporter:8000", + job="tango", + label="RCU_temperature_R", + name="RCU_temperature_R", + type="float", + x="00", y="0"} + +The above describes a single data point and its labels. The primary identifying labels are ``device`` and ``name``. Each point furthermore has a value (a 64-bit float) and a timestamp. The following transformations take place: + +- For 1D and 2D attributes, each array element is its own monitoring point, with ``x`` and ``y`` labels describing the indices. 
The labels ``dim_x`` and ``dim_y`` describe the array dimensionality, +- Attributes with string values get a ``str_value`` label describing their value. diff --git a/docs/source/interfaces/overview.rst b/docs/source/interfaces/overview.rst new file mode 100644 index 0000000000000000000000000000000000000000..a00ab5710ad863b4f10d1bb0ee93ab3f547826d5 --- /dev/null +++ b/docs/source/interfaces/overview.rst @@ -0,0 +1,41 @@ +Interfaces +====================== + +The station provides the following interfaces accessible through your browser (assuming you run on ``localhost``): + ++---------------------+---------+----------------------+-------------------+ +|Interface |Subsystem|URL |Default credentials| ++=====================+=========+======================+===================+ +| :ref:`jupyter` |Jupyter |http://localhost:8888 | | ++---------------------+---------+----------------------+-------------------+ +| :doc:`monitoring` |Grafana |http://localhost:3000 |admin/admin | ++---------------------+---------+----------------------+-------------------+ +| :doc:`logs` |Kibana |http://localhost:5601 | | ++---------------------+---------+----------------------+-------------------+ + +Furthermore, there are some low-level interfaces: + ++---------------------------+------------------+-----------------------+-------------------+ +|Interface |Subsystem |URL |Default credentials| ++===========================+==================+=======================+===================+ +| :ref:`pytango-section` |Tango |tango://localhost:10000| | ++---------------------------+------------------+-----------------------+-------------------+ +| :ref:`prometheus-section` |Prometheus |http://localhost:9090 | | ++---------------------------+------------------+-----------------------+-------------------+ +| TANGO-Grafana Exporter |Python HTTPServer |http://localhost:8000 | | ++---------------------------+------------------+-----------------------+-------------------+ +| :ref:`rest-api` |tango-rest 
|http://localhost:8080 |tango-cs/tango | ++---------------------------+------------------+-----------------------+-------------------+ +| :ref:`tangodb` |MariaDB |http://localhost:3306 |tango/tango | ++---------------------------+------------------+-----------------------+-------------------+ +|Archive Database |MariaDB |http://localhost:3307 |tango/tango | ++---------------------------+------------------+-----------------------+-------------------+ +|Log Database |ElasticSearch |http://localhost:9200 | | ++---------------------------+------------------+-----------------------+-------------------+ + +.. toctree:: + :hidden: + + control + monitoring + logs