Tango Station Control

Station Control software related to Tango devices.
Please consult the online documentation!

Index

Installation

Prerequisites

After checking out this repo, be sure to install our git hooks and activate the virtual environment:

source setup.sh

You will also need:

  • docker
  • docker-compose
  • make
  • bash
  • dig (dnsutils package on ubuntu/debian)
  • jq
  • wget
  • hexdump (bsdextrautils package on ubuntu/debian)
  • gcc
  • g++
  • shellcheck
  • graphviz
  • pkg-config
  • python3.10
  • pip (python3-pip)
  • tox (python3-tox)
  • shyaml

On Ubuntu / Debian based systems these can typically be installed by executing: ./sbin/install-deps-ubuntu-debian.sh

Of these, docker-compose must be at least version 2.0. Alternatively, tox can be installed through pip using pip install tox; at least version 4.0 must be used.

Finally, running the unit tests relies on the availability of casacore data; see the lofar-device-base Dockerfile for details.
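The prerequisite list can be checked before starting. The following is a minimal sketch, not part of the repository: the executable names are assumptions derived from the package names above (for example, graphviz is looked up as dot, and pip may be installed as pip3 on some systems). It does not check version numbers.

```python
import shutil

# Executable names assumed from the prerequisite list; adjust for your system
# (e.g. docker-compose 2.x may only be available as the "docker compose" plugin).
REQUIRED_TOOLS = [
    "docker", "docker-compose", "make", "bash", "dig", "jq", "wget",
    "hexdump", "gcc", "g++", "shellcheck", "dot", "pkg-config",
    "python3.10", "pip", "tox", "shyaml",
]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found on PATH.")
```

Anything reported as missing can usually be installed with ./sbin/install-deps-ubuntu-debian.sh on Ubuntu / Debian based systems.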

Development and Testing

Setup environment variables

This should be done prior to any form of development or use, and needs to be repeated every time the environment variables are reset.

source setup.sh

Start dev environment

For local development a dev environment is needed. To set up this environment, run

source ./sbin/prepare_dev_env.sh

This will install jumppad if it is not yet present, and create the docker volume needed to simulate the station nomad cluster.

Afterwards run

jumppad up infra/dev/all.hcl

to start the dev environment including tango.

Nomad is now available at http://localhost:4646/ and Consul is available at http://192.168.73.16:8500/
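To confirm that both services came up, the two URLs can be probed. This is a minimal sketch using only the Python standard library; the function name is_reachable and the 2 second timeout are illustrative assumptions, and a successful HTTP response only shows that the web interface answers, not that the cluster is healthy.

```python
import urllib.error
import urllib.request

# URLs as listed in this README.
ENDPOINTS = {
    "Nomad": "http://localhost:4646/",
    "Consul": "http://192.168.73.16:8500/",
}


def is_reachable(url, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if an HTTP GET on `url` succeeds with a status below 400."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts and HTTP error statuses.
        return False


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        state = "up" if is_reachable(url) else "not reachable"
        print(f"{name}: {state} ({url})")
```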

The development environment and its network can be stopped using

jumppad down

The entire dev environment needs to be torn down this way prior to running integration tests! You can check for the existence of this 'station' network using docker network ls.

If you think the network is lingering in error, you can remove it using docker network rm station 9000-station.

Inspect services

Most notably, you will have web interfaces available at:

Running integration tests

Notably, before running the integration tests you should NOT run jumppad up or source sbin/prepare_dev_env.sh, but you still need to have run source setup.sh.

Also ensure the station networks do not exist or remove them using docker network remove station 9000-station.
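The network check can be automated before starting the integration tests. The sketch below is an illustration, not a script shipped with this repository: it calls docker network ls --format '{{.Name}}' and reports whether station or 9000-station still exist. The function names are hypothetical.

```python
import subprocess

# Network names taken from the docker network remove command above.
STATION_NETWORKS = ("station", "9000-station")


def lingering_networks(ls_output, names=STATION_NETWORKS):
    """Return the networks from `names` that appear in `docker network ls` output."""
    present = {line.strip() for line in ls_output.splitlines()}
    return [name for name in names if name in present]


def list_docker_networks():
    """Return the names of all docker networks, one per line."""
    result = subprocess.run(
        ["docker", "network", "ls", "--format", "{{.Name}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    try:
        leftovers = lingering_networks(list_docker_networks())
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"Could not query docker: {exc}")
    else:
        if leftovers:
            print("Remove before testing: docker network remove " + " ".join(leftovers))
        else:
            print("No station networks found; safe to run the integration tests.")
```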

You can run the integration tests using

# source setup.sh
unset TANGO_HOST
./sbin/run_integration_test.sh

Versioning

When changing behavior, a new version of Lofar Station Control should be reserved. To do this, please follow semantic versioning.

Next, update the version by following these steps:

  1. Change the version in the VERSION file.
  2. Add a release note for the given version.
  3. Once the merge request is merged to master, add a tag with the version, such as v0.3.2 or v0.3.3.
  4. The tag can then be deployed to the environments, manually, through https://git.astron.nl/lofar2.0/tango/-/tags
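Choosing the next version number under semantic versioning can be sketched as follows. This is an illustrative helper, not a tool from this repository; it assumes a plain MAJOR.MINOR.PATCH string and deliberately rejects suffixed versions such as 0.37.0-1, which appear in the release notes but need manual handling.

```python
import re

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def bump(version, part):
    """Return `version` with the given part ('major', 'minor' or 'patch') incremented."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def tag_name(version):
    """Return the git tag for a version, following the v0.3.2 convention."""
    return f"v{version}"


if __name__ == "__main__":
    current = "0.48.2"
    print(bump(current, "patch"), bump(current, "minor"), tag_name(bump(current, "minor")))
```

A bug fix typically increments the patch number, while a new backwards-compatible feature increments the minor number and resets the patch number to zero.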

Release Notes

  • 0.49.0 Add Station service to control station state to gRPC server
  • 0.48.2 rename antennafield_id
  • 0.48.1 Fix exposing correct triangle of XSTs in gRPC service
  • 0.48.0 Add Antennafield to gRPC server
  • 0.47.2 Fix ZMQ hostname to subscribe to in gRPC server
  • 0.47.1 Move GrafanaAPIV3 RPC interface to Opah repo
  • 0.47.0 Migrate from lofar-station-client to lofar-lotus package. Update various package dependencies.
  • 0.46.1 Include clock_RW in metadata JSON
  • 0.46.0 Expose latest BST/SST/XST from ZMQ over gRPC
  • 0.45.12 Improve syslog parsing for PyPCC logs
  • 0.45.11 Add Antenna Status & Use gRPC interface
  • 0.45.10 Fix build of snmp-exporter
  • 0.45.9 Support static IPs for Stingray jobs to receive UDP packets on
  • 0.45.8 Fix timestamp timezone in metadata JSON
  • 0.45.7 Fix casacore building by version pinning
  • 0.45.6 Add Reflection to be used with grpcui (use grpcui -plaintext localhost:50051)
  • 0.45.5 Add jupyter-collaboration plugin for real-time syncing of notebooks between users
  • 0.45.4 Fix device server freeze during initialise command
  • 0.45.3 Reduce Loki log output
  • 0.45.2 Remove dead code
  • 0.45.1 Have get_defaults return a dict instead of list, and fix returning numpy arrays
  • 0.45.0 Reset Tango polling and avoid it clashing with our own
  • 0.44.11 Fix vector syslog service (syslogs can be sent to syslog.service.consul:514 again)
  • 0.44.10 Deploy on stations through monitor.control.lofar
  • 0.44.9 Fix reading attributes after reconnecting to an OPC-UA server
  • 0.44.8 Add get_defaults command to all devices
  • 0.44.7 Add "ts" timestamp field back into metadata JSON
  • 0.44.6 Wait briefly after subscribing to make sure the subscription reaches the device
  • 0.44.5 Improved defaults for RECV devices
  • 0.44.4 Subscribe to Stingray and expose the latest SST/BST/XST in their respective devices
  • 0.44.3 Do not try to use disconnected OPC-UA connections
  • 0.44.2 Fix access control of FPGA_processing_error_R of BST/SST/XST devices
  • 0.44.1 Update jupyter docker images
  • 0.44.0 Route SST/BST/XST packets to Stingray directly
  • 0.43.5 Fix reconnect storm to OPC/UA servers
  • 0.43.4 Drain nomad jobs at startup to prevent misinitialisation after ungraceful shutdown
  • 0.43.3 Fix lingering connections to OPC/UA servers.
  • 0.43.2 Fix jumppad integration test using custom nomad image that includes consul
  • 0.43.1 Fix rounding for the coarse delay and loss compensations.
  • 0.43.0 Use PyTango 10.0.0
  • 0.42.12 Calibration device now fetches the station's name from the StationManager
  • 0.42.11 Fix HBA inner antenna mask for remote stations
  • 0.42.10 Fix in event subscriptions, effective for ProtectionControl
  • 0.42.9 Use latest tables from S3 to calibrate antenna fields
  • 0.42.8 Emit metrics even if attribute is polled by AttributePoller as well as Tango
  • 0.42.7 Prevent Prometheus name collision in ProtectionControl state attribute
  • 0.42.6 Fix crash caused by emitting change events for attributes polled by Tango
  • 0.42.5 Add additional features to protection control
  • 0.42.4 Add integration test fixture that routinely tests against cross test dependencies
  • 0.42.3 Use PyTango 10.0.0rc3 to reduce memory leaks
  • 0.42.2 Add protection control device shutting down station during over temperature. Use the station manager protection_lock_RW to see if the station is locked against further damage. Add integration tests for remote stations.
  • 0.42.1 Added lock around commands in AsyncDevices to prevent concurrent execution
  • 0.42.0 Change CS032 port mappings to prevent beamlet overlap
  • 0.41.1 Reduce log size for value changes and metadata publications drastically
  • 0.41.0 Export FPGA_jesd204b_rx_err0_R and FPGA_jesd204b_rx_err1_R SDP monitoring data to prometheus
  • 0.40.2 Add ds_debug_pycharm command to debug device servers during integration tests
  • 0.40.1 Deploy SDPTR for HBA only on RS stations
  • 0.40.0 Added CS032/RS307, and scripts to generate their CDB file and caltables
  • 0.39.12 SDPTR v1.4.0
  • 0.39.11 Install recent versions of tangostationcontrol and lofar_station_client in Jupyter
  • 0.39.10 Use ITRF2014 and extrapolate to current half year by default
  • 0.39.9 Minor speedup in observation start (do not wait for metadata to propagate)
  • 0.39.8 Allow station manager to use configurable state transition timeouts. Use the hibernate_transition_timeout_RW, standby_transition_timeout_RW and on_transition_timeout_RW attributes. The timeout is in seconds.
  • 0.39.7 Fixed archiving of Antenna_Status and Antenna_Use
  • 0.39.6 Fixed some multi-thread/multi-process race conditions
  • 0.39.5 Remove stestr from integration test
  • 0.39.4 Periodically publish all metadata regardless of change
  • 0.39.3 Use a periodic task instead of event subscriptions to manage Observation states
  • 0.39.2 DeviceProxies default to read access (including calling commands)
  • 0.39.1 Fixed metrics for spectral/image (2D/3D) enum arrays
  • 0.39.0 Station Grafana alarms through slack integration
  • 0.38.7 Support 2 APSPUs per translator
  • 0.38.6 Add attribute Antenna_Status_R to metadata
  • 0.38.5 Unify exception handling for station manager
  • 0.38.4 Fixed ordering in subband_frequency_R, which broke frequency calculations for HBA
  • 0.38.3 Upgraded to JupyterLab v4
  • 0.38.2 Fixed polling of some attributes required by Metadata device. Wait 2s after enabling FPGA_processing_enable_RW to allow it to propagate.
  • 0.38.1 Custom compiled casacore for improved beam-tracking performance
  • 0.38.0 Add metadata device publishing zmq events
  • 0.37.2 Improved event-subscription interface, avoid overlap between polling loops.
  • 0.37.1 Improved asyncio resource teardown when devices go Off.
  • 0.37.0-1 Fix for deploying on DTS Lab
  • 0.37.0 Run casacore in separate processes, increasing beam-tracking performance
  • 0.36.2 Fix polling 2D attributes. Harden periodic tasks against exceptions.
  • 0.36.1 Fix tile beamforming
  • 0.36.0 Upgraded base image to tango-itango:9.5.0, greatly improving numpy performance. Improved multi-threading performance for delay calculations for beam pointing and tracking. Added logging for observation initialisation sequence. Update tile-beam weights (HBAT_BF_delay_steps) at most once every 10s.
  • 0.35.0 Rename Antenna_Quality -> Antenna_Status to line up with LOFAR1 nomenclature. Force arrays reporting about antennas in AFH/AFL devices to always match the number of antennas. Added NOT_AVAILABLE antenna status. Changed Antenna_Status_R and Antenna_Use_R from int arrays to enum arrays. Added Antenna_Status_int_R reflecting the int value of the enum. Removed Antenna_Quality_str_R and Antenna_Use_str_R.
  • 0.34.3 Fix compute_weights for >1D arrays
  • 0.34.2 Generate CHANGE_EVENTs from AttributePoller
  • 0.34.1 Bugfix for gRPC
  • 0.34.0 Add gRPC based 'external' station control service
  • 0.33.3 Add support for new PCON devices (ACX)
  • 0.33.2 Fix for XSTs in Observations: Also write FPGA_xst_nof_crosslets_RW
  • 0.33.1 SDPFirmware: replace FPGA_ucp monitoring points with new TR_ucp ones
  • 0.33.0 Run containers with dedicated ethernet devices (ovs/macvlan)
  • 0.32.7 Fix for antennafield.Frequency_Band_RW. Fix for Calibration device polling State attributes.
  • 0.32.6 Fixes after reinstall on fresh L2TS LCU
  • 0.32.5 Fixed race condition that exposed cleared metrics. Corrected computation of xst.hardware_powered_fraction_R. Fixed logging of device names.
  • 0.32.4 Fixed polling period (from 2500s to 2.5s).
  • 0.32.3 Fixed disappeared metrics from LOFARDevice, OPCUADevice, StationManager.
  • 0.32.2 Change hardware_powered_R to hardware_powered_fraction_R to report partial power. Implemented hardware_powered_fraction_R for more devices.
  • 0.32.1 Do not serve stale metrics
  • 0.32.0 Add available_in_power_state_R attribute to determine from which station state a device will be available
  • 0.31.4 Bugfixes for DTS configuration. Fixes spurious errors when a station state transition fails. Added variables for APS, Calibration, ObservationControl devices to Jupyter notebooks.
  • 0.31.3 Log all writes to OPC-UA, except beam and tile weights
  • 0.31.2 Bugfixes for v0.31.0 rollout, fixes race conditions when polling attributes
  • 0.31.1 Add pre-push git hooks for basic CI checks
  • 0.31.0 Poll attributes independently from Tango
  • 0.30.5 Log and count event subscription errors
  • 0.30.4 Fix Tango attribute parameter types
  • 0.30.3 Configure FPGA_beamlet_output_nof_destinations in SDP before enabling FPGA_processing. Avoid propagating exceptions to Tango's poll and event threads.
  • 0.30.2-2 Fix typo in parsing XST observation settings
  • 0.30.2 Add XST/SST settings to Observation specifications. Fixed dimensionality of xst.FPGA_subband_select_R(W).
  • 0.30.1 Remove deprecated FPGA_beamlet_output_nof_beamlets. Fix prometheus metric for counting calibrations. Removed ALARM and DISABLED states.
  • 0.30.0 Refactor station state transitions using the State pattern
  • 0.29.2 Bump MinIO versions
  • 0.29.1-3 Fix central logs service consul name
  • 0.29.1-2 Fix vector tenant_id, must be string
  • 0.29.1 Cache SNMP MiB files to circumvent ASTRON domain blacklisting
  • 0.29.0 Metric & Logging replication to central monitoring
  • 0.28.2 Bugfixes / rollout fixes
  • 0.28.1 Bugfixes / rollout fixes
  • 0.28.0 Make StationManager device asynchronous
  • 0.27.2 Add new attributes in OPCUA devices
  • 0.27.1 Bugfixes / rollout fixes
  • 0.27.0 Replace device_attribute with a per-attribute metric
  • 0.26.7 Add monitoring of White Rabbit switch and power converters (PCON) using SNMP
  • 0.26.6 Replace pysnmp with pysnmplib
  • 0.26.3 Fix minor deployment issues
  • 0.26.2 Fix minor deployment issues
  • 0.26.1 Code quality: comply with flake8 7.0.0
  • 0.26.0 Expose all attributes as metrics by device server. Renamed sst.integration_interval_R -> sst.sst_integration_interval_R.
  • 0.25.3 Add environment decorators to devices in integration tests
  • 0.25.2 Increase prometheus retention time
  • 0.25.1 Add RCU_DTH_setup command in RECV devices
  • 0.25.0 Pull IERS tables from object store
  • 0.24.8 Push docker images to own registry and pull from there for deployments
  • 0.24.6 Bugfix: deploy tagged version instead of latest
  • 0.24.5 Bugfix: install 'tango' in Jupyter Lab again
  • 0.24.4 Apply calibration values in reverse order if subband frequencies are decreasing
  • 0.24.3 Tune python logging format and parsing
  • 0.24.2 Ensure code base is PyTango 9.5 compatible
  • 0.24.1 Let all devices emit basic prometheus metrics
  • 0.24.0 Allow multiple antenna fields to be used in a single observation. This renames the Observation device to ObservationField.
  • 0.23.0 Migrate execution environment to nomad
  • 0.22.0 Split Antennafield in AFL and AFH devices in order to separate Low-Band and High-Band functionalities. Removed Antenna_Type_R attribute from antennafield devices.
  • 0.21.4 Replace ACC-MIB.mib with SP2-MIB.mib source file in PCON device
  • 0.21.3 Added DigitalBeam.Antenna_Usage_Mask_R to expose antennas used in beamforming
  • 0.21.2 Removed deprecated "Boot" device (use StationManager now)
  • 0.21.1 Implement multi project integration downstream pipeline
  • 0.21.0 Use radians instead of degrees when interpreting pointings
  • 0.20.5 Manage both polarisations in RCU_band_select_R(W), Antenna_Loss_R, and Frequency_Band_RW
  • 0.20.4 Collapse AbstractHierarchyDevice and AbstractHierarchy into one class
  • 0.20.3 Fix application of Field_Attenuation_R
  • 0.20.2 Support only one parent in hierarchies
  • 0.20.1 Create an abstract AntennaMapper class which implements behavior of both AntennaToSdpMapper and AntennaToRecvMapper
  • 0.20.0 Complete implementation of station-state transitions in StationManager device. Unified power management under power_hardware_on/off(), dropping prepare_hardware(), disable_hardware(). Replaced device.warm_boot() by device.boot().
  • 0.19.0 Ensure requirements.txt are installed when using pip install
  • 0.18.3 Many configuration fixes in tango device configs, Fixed APS & EC device port mapping, fixed variable initialization in several devices, Fixed XST device going into fault state, prevent UDP packet loss and verify UDP buffer size for XSTs, Fixed several tests due to use of numpy.array in properties, Implement control hierarchy, Version pin PyASN, Fix code coverage for PyTango devices, Fix beam tracker not starting again after being stopped.
  • 0.18.2 Fix documentation links in README
  • 0.18.1 Various improvements including: better error handling for commands and resolving a configuration issue related to beamlets
  • 0.18.0 Expose attribute related to SDP rings such as FPGA_bf_ring_nof_transport_hops_RW_default and FPGA_ring_use_cable_to_next_rn_RW_default
  • 0.17.1 Ensure OPCUA devices reconnect automatically if the connection is lost
  • 0.17.0 Add Power Hierarchy state transition
  • 0.16.2 Add Power_Parent and Parent_Children properties in LOFAR devices
  • 0.16.1 AntennaField: Do not put device in FAULT if an attribute cannot be read/written. AntennaField: Avoid archiving HBA-specific attributes for LBA fields.
  • 0.16.0 Observation: Removed antenna mask from specification. DigitalBeam: Removed beamlet and antenna selection.
  • 0.15.0 Split recv device into rcu2h and rcu2l and split recv-sim translator into rcu2h-sim and rcu2l-sim
  • 0.14.0 Create async device base and make tilebeam and digitalbeam async device servers, allowing for cooperative multitasking and preventing issues with beamtracking.
  • 0.13.1 Upgrade PyTango to 9.4.x and ensure it is installed through requirements.txt
  • 0.13.0 Remove all archiver-timescale, hdbppts-cm, hdbppts-es functionalities
  • 0.12.1 Add AbstractHierarchy and AbstractHierarchyDevice classes and functionality
  • 0.12.0 Add Calibration_SDP_Subband_Weights_<XXX>MHz_R attributes to implement HDF5 calibration tables
  • 0.11.2 Fix sleep duration in archiver test
  • 0.11.1 Fix event unsubscription in TemperatureManager
  • 0.11.0 Added StationManager device
  • 0.10.0 Add AntennaToSdpMapper and fpga_sdp_info_* mapped attributes in Antennafield device
  • 0.9.0 Statistics writer: moved the whole functionality to lofar-station-client repository
  • 0.8.0 Statistics writer: HDF5 format overhaul (removed values, added and moved attributes), Statistics writer: Added --field parameter to record statistics of a specific AntennaField, AntennaField: Added RCU_DTH_on_R, RCU_DTH_freq_R(W), RCU_band_select_R, RCU_attenuator_dB_R.
  • 0.7.2 Added sdp.subband_frequency_R, antennafield.Frequency_Band_RW, and support for spectral inversion
  • 0.7.1 Add restore backup configuration for Configuration device
  • 0.7.0 Raised required Python version to 3.10
  • 0.6.0 Changed recv.ANT_mask_RW and recv.ANT_mask_R into 32x3 matrices
  • 0.5.1 Add loading and updating methods for Configuration device
  • 0.5.0 Add Configuration device
  • 0.4.1 Fix for missing SDP attributes for spectral inversion
  • 0.4.0 Have most containers report health status and add make await command
  • 0.3.1 Fix for applying boot device dsconfig
  • 0.3.0 Initial version of deployment scripts and functionality
  • 0.2.0 Extend Beamlet device with FPGA source address attributes
  • 0.1.2 Fix StatisticsClient accessing last_invalid_packet_exception parameter