Commit 786bae09 authored by Stefano Di Frischia's avatar Stefano Di Frischia
Merge branch 'master' into L2SS-406-grafana-archiver

parents cee2be28 4ae2044a
Merge request !190: Resolve L2SS-406 "Grafana archiver"
# Test purpose
Tango Controls Device Servers (DS) can automatically poll device attribute values. By default, the polling is performed by a single thread that runs in the DS process; its purpose is to call the `read` function of every polled attribute in all devices that run in the DS.
The highest polling rate among all the attributes in all devices determines how often the polling thread runs. This can lead to a situation where the single polling thread is unable to finish executing all attribute `read` functions in time. When that happens, the attributes whose `read` function was not executed will not have updated values. Since the polling thread always visits the attributes in the same order, some attribute values may never be updated.
We investigate whether using more polling threads alleviates the situation for one or more devices that run in the same DS.
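The failure mode can be illustrated with a small scheduling simulation, a sketch under assumed, purely illustrative timings (it is not a model of the measurements in this report):

```python
# Minimal scheduling sketch (no Tango needed): one polling thread serves
# several attributes whose combined read time exceeds the polling period.
# All numbers are illustrative.
def simulate(num_attributes, read_duration, period, iterations):
    """Return how far each polling iteration starts behind its schedule."""
    delays = []
    now = 0.0
    for i in range(iterations):
        scheduled = i * period
        start = max(now, scheduled)          # the thread may still be busy
        now = start + num_attributes * read_duration
        delays.append(start - scheduled)
    return delays

# 4 attributes at 0.7 s per read with a 1 s period: each cycle needs 2.8 s,
# so the thread falls another 1.8 s behind on every iteration.
print(simulate(num_attributes=4, read_duration=0.7, period=1.0, iterations=5))
```

The delay grows without bound because the cycle time (2.8 s) exceeds the period (1 s); no amount of waiting lets the thread catch up.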
# References
Please refer to the following documents for in-depth information about automatic polling and how to configure a DS to use a dedicated polling thread per device:
- [Attribute polling in Tango Controls](https://tango-controls.readthedocs.io/en/latest/development/device-api/ds-guideline/device-server-guidelines.html#tango-polling-mechanism)
- [Device polling in Tango Controls](https://tango-controls.readthedocs.io/en/latest/development/device-api/device-polling.html)
- [Configuring a DS to use per-device threads in polling](https://tango-controls.readthedocs.io/en/latest/development/advanced/reference.html#dserver-class-device-properties)
# Test set-up
- Two devices run in the same DS.
- The Tango DB is modified to contain a DS named `monitoring_performance_test/1` that runs two devices named `test/monitoring_performance/1` and `test/monitoring_performance/2`. Both devices instantiate the same class `Monitoring_Performance_Device`.
- Execute the DS like this: `bin/start-DS.sh devices/test/devices/automatic_polling_performance_test/monitoring_performance_test.py 1`
- Get a DeviceProxy object to both devices like this:
```python
d1 = DeviceProxy('test/monitoring_performance/1')
d2 = DeviceProxy('test/monitoring_performance/2')
```
This will execute the device code and perform the automatic polling.
- Devices in the appended data (section Data) are labelled d1 and d2.
- Each device has 4 read-only attributes (MPs) that are arrays with 2e6 doubles.
- In Tango DB automatic polling every 1s is enabled for each array.
- Two scenarios:
1. On read a random number gets generated and the entire array is populated with it. Populating the array with a new random number on every read access prevents caching.
2. A 0-filled array is created in init_device and copied to the attribute when read is called.
- The number of polling threads, which is a DS setting, is adjusted in `init_device`. Due to inconsistencies in how Tango Controls handles input parameters, it is not possible to pass parameters to devices.
- Number of polling threads: 1, 10, 100
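The two read scenarios above can be sketched as follows; this is a minimal sketch in which plain Python lists stand in for the 2e6-element numpy arrays, and the class names are illustrative, not from the repository:

```python
import random

ARRAY_SIZE = 5  # the real test uses numpy arrays of 2e6 doubles


class ScenarioOne:
    # Scenario 1: populate the whole array with a freshly generated random
    # number on every read, which defeats any caching.
    def read_array(self):
        value = random.random()
        return [value] * ARRAY_SIZE


class ScenarioTwo:
    # Scenario 2: create a 0-filled array once (init_device in the real
    # device) and hand out the cached array on every read.
    def __init__(self):
        self._array = [0.0] * ARRAY_SIZE

    def read_array(self):
        return self._array
```

Scenario 1 models an expensive `read` that does work on every call; scenario 2 is nearly free because the work was done once up front.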
# Test execution
- The DS source code is modified for the number of polling threads according to the test set-up outlined above.
- The DS is started manually.
- The attribute polling resumes automatically as soon as the device state is set to ON in `init_device`.
- The test script creates two Python processes, each assigned to one of the two devices.
- Each process creates a `DeviceProxy` object for its device, executes `attribute_polling_stats`, prints the results, and exits.
- The DS is allowed to run for approximately 10 polling iterations.
- The DS processes will print statistics about the polling.
- The test is manually stopped and the output copied.
Test results are in the attached file.
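The test driver described above might look roughly like this; `poll_device` is a hypothetical name, and the actual Tango calls are shown only as comments so the sketch stays self-contained:

```python
# Hypothetical sketch of the test driver: one Python process per device,
# each collecting polling statistics for its own device.
from multiprocessing import Process

DEVICE_NAMES = ("test/monitoring_performance/1", "test/monitoring_performance/2")


def poll_device(device_name):
    # In the real test this would be roughly:
    #   dp = DeviceProxy(device_name)
    #   attribute_polling_stats(dp, iterations=10)
    print("collecting polling statistics for", device_name)


if __name__ == "__main__":
    processes = [Process(target=poll_device, args=(name,)) for name in DEVICE_NAMES]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```

Using one process per device keeps the two measurement loops from interfering with each other on the client side.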
# Findings
The tests have shown that polling gets significantly delayed under certain circumstances:
- When the `read` function takes a long time to return an attribute's value.
Examples that have been tried out in this test (other causes are not excluded):
- Creating a numpy array of size 2e6 on the fly with the same random number in every value.
- Reading an array of 2e5 values from an OPC-UA server and converting it to a numpy array.
From this finding, other causes that will have a negative impact on polling become immediately obvious and need to be avoided no matter what:
- Fetching data on the fly over a slow communication link in an attribute's `read` function.
- Fetching a large amount of data on the fly over a fast communication link in an attribute's `read` function.
- Computing data on the fly in an attribute's `read` function.
Adding more polling threads to a DS does not alleviate this situation. The reason lies in how Tango Controls polling works: by default, polling is performed by one thread per DS or, if the polling thread pool size is increased, by at most one thread per device. As the data suggests, adding twice as many polling threads as there are devices running in a DS does not change the situation.
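A back-of-envelope model of this limit, with illustrative numbers and a hypothetical `worst_cycle_time` helper:

```python
import math

# Tango caps the effective polling pool at one thread per device, so a
# device's attributes are always read sequentially by a single thread.
def worst_cycle_time(devices, attrs_per_device, read_duration, pool_size):
    threads = min(pool_size, devices)            # at most one thread per device
    devices_per_thread = math.ceil(devices / threads)
    return devices_per_thread * attrs_per_device * read_duration

# 2 devices with 4 slow attributes each: growing the pool beyond the
# number of devices changes nothing.
for pool_size in (1, 10, 100):
    print(pool_size, worst_cycle_time(2, 4, 0.7, pool_size))
```

Once every device has its own thread, each device's cycle time is fixed by its own attributes' `read` durations; extra threads sit idle.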
# Recommendation
For devices that contain attributes whose values are big in byte size, i.e. arrays of significant size, it is strongly recommended to assess the situation first, i.e. measure how long reading the data over a communication link takes. If it is essential to poll a high-volume attribute at a polling rate that exceeds the performance capabilities of the DS's polling thread, several options are viable:
- Distribute high-volume attributes among separate devices which run in their own DS.
If necessary, create more devices with fewer attributes each until the desired polling rate can be accomplished, even if this means that each high-volume attribute ends up in its own device. To Tango Controls and to device clients it does not matter whether a device contains one attribute or many.
- Distribute high-volume attributes among separate devices but continue running them in the same DS. It is then necessary to increase the number of polling threads to at least the number of devices in the DS.
The two solutions above are mutually exclusive. Other measures that reduce the load on individual polling threads can be combined with either of them:
- Lower the polling rate so that the polling thread has more time to call the attribute `read` functions. An attribute's `read` function is then allowed to take longer to perform its tasks. Note that this does not solve the original problem that the `read` function is simply too slow when it is called.
- Move the updating of attribute values to their own process. The `read` function can then return the current value immediately because the value gets updated independently.
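A minimal sketch of this pattern, with a thread standing in for the separate process and a hypothetical `CachedAttribute` class:

```python
import threading
import time


class CachedAttribute:
    # A worker refreshes a cached value in the background so that read()
    # returns instantly when the polling thread calls it.
    def __init__(self):
        self._value = 0.0
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._update_loop, daemon=True)
        self._worker.start()

    def _expensive_fetch(self):
        # Placeholder for a slow link or an expensive computation.
        return time.time()

    def _update_loop(self):
        while not self._stop.is_set():
            new_value = self._expensive_fetch()
            with self._lock:
                self._value = new_value
            self._stop.wait(0.01)

    def read(self):
        # Called by the polling thread: no expensive work happens here.
        with self._lock:
            return self._value

    def stop(self):
        self._stop.set()
        self._worker.join()


attr = CachedAttribute()
time.sleep(0.05)
print(attr.read())
attr.stop()
```

The lock keeps the hand-over between updater and reader consistent; the trade-off is that `read` may return a slightly stale value.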
There is an entirely different way to lower the pressure on the automatic polling: manual polling, i.e. performing everything that the polling thread does for a selected set of attributes yourself. Tango Controls allows events to be sent manually. This opens the possibility of reading values over communication links, checking for value changes, and sending out archive or change events from separate threads or processes whenever it is convenient.
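A generic sketch of such a manual polling loop (stdlib only; in PyTango the emit step would map onto `Device.push_change_event` / `push_archive_event`):

```python
# Read a value, detect a relative change, and emit the event yourself
# instead of relying on the automatic polling thread.
def manual_poll(read_value, emit_event, rel_change=0.1, iterations=5):
    last = None
    for _ in range(iterations):
        value = read_value()
        # Emit only when the value changed by more than rel_change (10%),
        # mirroring the rel_change attribute property.
        if last is None or abs(value - last) > rel_change * abs(last):
            emit_event(value)
            last = value


events = []
samples = iter([1.0, 1.05, 1.3, 1.31, 2.0])
manual_poll(lambda: next(samples), events.append)
print(events)  # → [1.0, 1.3, 2.0]
```

Because this loop runs in a thread or process of your choosing, slow reads no longer compete with other attributes for the DS polling thread.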
# Executive summary
Even Tango Controls cannot perform magic: high-volume attributes cannot be polled at infinite velocity. The polling load has to be distributed over more DSs, more threads within a DS, or more processes within a DS.
# Data
Filling the array with the same newly created random number on every read.
threads = 1
d1
iterations = 10
Polling duration
min = 0.7053400000000001[s]
max = 0.7174940000000001[s]
median = 0.7123280000000001[s]
mean = 0.7121181[s]
stddev = 0.004004320403014733[s]
Polling delay
min = 0.792[s]
max = 2.207[s]
median = 0.8115[s]
mean = 1.2221[s]
stddev = 0.6406804897919087[s]
d2
iterations = 10
Polling duration
min = 0.689903[s]
max = 0.715033[s]
median = 0.7069909999999999[s]
mean = 0.7061663000000001[s]
stddev = 0.00792590103458277[s]
Polling delay
min = 0.744[s]
max = 2.245[s]
median = 0.758[s]
mean = 1.2010999999999998[s]
stddev = 0.681659805181441[s]
threads = 10
d1
iterations = 10
Polling duration
min = 0.700119[s]
max = 0.7102459999999999[s]
median = 0.710067[s]
mean = 0.7068808[s]
stddev = 0.004127314376201529[s]
Polling delay
min = 0.802[s]
max = 2.196[s]
median = 0.806[s]
mean = 1.2213[s]
stddev = 0.6370044034384692[s]
d2
iterations = 10
Polling duration
min = 0.6984130000000001[s]
max = 0.706296[s]
median = 0.7044239999999999[s]
mean = 0.7036658000000001[s]
stddev = 0.0025871636902213896[s]
Polling delay
min = 0.758[s]
max = 2.24[s]
median = 0.759[s]
mean = 1.3504[s]
stddev = 0.7247257688256988[s]
threads = 100
d1
iterations = 10
Polling duration
min = 0.690158[s]
max = 0.720522[s]
median = 0.7119365[s]
mean = 0.7107762[s]
stddev = 0.008783150821886167[s]
Polling delay
min = 0.79[s]
max = 2.209[s]
median = 0.8[s]
mean = 1.2176000000000002[s]
stddev = 0.6462041782594724[s]
d2
iterations = 10
Polling duration
min = 0.702939[s]
max = 0.724869[s]
median = 0.7119840000000001[s]
mean = 0.7122735[s]
stddev = 0.006137572716473502[s]
Polling delay
min = 0.749[s]
max = 2.25[s]
median = 0.755[s]
mean = 1.2005[s]
stddev = 0.6824934065615579[s]
Returning a 0-filled array that was created in `init_device`
threads = 100
d1
iterations = 10
Polling duration
min = 0.005712[s]
max = 0.008997999999999999[s]
median = 0.0065065[s]
mean = 0.006732[s]
stddev = 0.0009050982267135427[s]
Polling delay
min = 0.998[s]
max = 1.001[s]
median = 1.0[s]
mean = 0.9997[s]
stddev = 0.0007810249675906477[s]
d2
iterations = 10
Polling duration
min = 0.0062759999999999995[s]
max = 0.008672000000000001[s]
median = 0.0069180000000000005[s]
mean = 0.0070902[s]
stddev = 0.0007260824746542229[s]
Polling delay
min = 0.996[s]
max = 1.003[s]
median = 0.999[s]
mean = 0.9997[s]
stddev = 0.002491987158875375[s]
{
"servers":
{
"monitoring_performance_test":
{
"1":
{
"Monitoring_Performance_Device":
{
"test/monitoring_performance/1":
{
},
"test/monitoring_performance/2":
{
}
}
}
}
}
}
# -*- coding: utf-8 -*-
#
# This file is part of the LOFAR2.0 project
#
#
#
# Distributed under the terms of the APACHE license.
# See LICENSE.txt for more info.
# TODO(Corne): Remove sys.path.append hack once packaging is in place!
import os, sys
currentdir = os.path.dirname(os.path.realpath(__file__))
parentdir = os.path.dirname(currentdir)
parentdir = os.path.dirname(parentdir)
sys.path.append(parentdir)
import time
import numpy
from tango import DevState, Util
from tango.server import run, Device, attribute
from numpy import random
__all__ = ["Monitoring_Performance_Device", "main"]
POLLING_THREADS = 100
ARRAY_SIZE = 2000000
class Monitoring_Performance_Device(Device):

    def read_array(self):
        # Log every read so that polling activity can be traced in the DS output.
        print("{} {}".format(time.time(), self.get_name()))
        return self._array

    array1_r = attribute(
        dtype = (numpy.double,),
        max_dim_x = ARRAY_SIZE,
        period = 1000,
        rel_change = 0.1,
        archive_period = 1000,
        archive_rel_change = 0.1,
        max_value = 1.0,
        min_value = 0.0,
        fget = read_array,
    )
    array2_r = attribute(
        dtype = (numpy.double,),
        max_dim_x = ARRAY_SIZE,
        period = 1000,
        rel_change = 0.1,
        archive_period = 1000,
        archive_rel_change = 0.1,
        max_value = 1.0,
        min_value = 0.0,
        fget = read_array,
    )
    array3_r = attribute(
        dtype = (numpy.double,),
        max_dim_x = ARRAY_SIZE,
        period = 1000,
        rel_change = 0.1,
        archive_period = 1000,
        archive_rel_change = 0.1,
        max_value = 1.0,
        min_value = 0.0,
        fget = read_array,
    )
    array4_r = attribute(
        dtype = (numpy.double,),
        max_dim_x = ARRAY_SIZE,
        period = 1000,
        rel_change = 0.1,
        archive_period = 1000,
        archive_rel_change = 0.1,
        max_value = 1.0,
        min_value = 0.0,
        fget = read_array,
    )

    def init_device(self):
        Device.init_device(self)
        util = Util.instance()
        print("Current polling thread pool size = {}".format(util.get_polling_threads_pool_size()))
        util.set_polling_threads_pool_size(POLLING_THREADS)
        print("New polling thread pool size = {}".format(util.get_polling_threads_pool_size()))
        print("Array size = {}".format(ARRAY_SIZE))
        self.set_state(DevState.OFF)
        self._array = numpy.zeros(ARRAY_SIZE)
        self.array1_r.set_data_ready_event(True)
        self.set_change_event("array1_r", True, True)
        self.set_archive_event("array1_r", True, True)
        self.array2_r.set_data_ready_event(True)
        self.set_change_event("array2_r", True, True)
        self.set_archive_event("array2_r", True, True)
        self.array3_r.set_data_ready_event(True)
        self.set_change_event("array3_r", True, True)
        self.set_archive_event("array3_r", True, True)
        self.array4_r.set_data_ready_event(True)
        self.set_change_event("array4_r", True, True)
        self.set_archive_event("array4_r", True, True)
        self.set_state(DevState.ON)

    def delete_device(self):
        self.set_state(DevState.OFF)


def main(args = None, **kwargs):
    return run((Monitoring_Performance_Device, ), args = args, **kwargs)


if __name__ == '__main__':
    main()
import numpy
import tango
from time import sleep


def attribute_polling_stats(dp: tango.DeviceProxy = None, iterations: int = 10, polling_time: float = 1.0, quiet = False):
    if dp is None:
        print('A DeviceProxy object is needed!')
        return
    print('Will sample the device server\'s polling time {} times with a pause of {}s between each sampling.'.format(iterations, polling_time))
    polling_durations = []
    polling_delays = []
    iterations_left = iterations
    while iterations_left > 0:
        iterations_left -= 1
        # polling_status() returns human-readable text; extract the duration
        # and delay fields and convert them from ms to s.
        status = dp.polling_status()[0].split('\n')
        polling_duration = numpy.double(status[3].split('=')[-1].strip()) / 1e3
        polling_delay = numpy.double(status[5].split('=')[-1].split(',')[0].strip()) / 1e3
        polling_durations.append(polling_duration)
        polling_delays.append(polling_delay)
        if not quiet:
            print('Iteration #{}, {} iterations left, polling duration = {}s, polling delay = {}s.'.format(iterations - iterations_left, iterations_left, polling_duration, polling_delay))
        sleep(polling_time)
    durations = numpy.array(polling_durations)
    delays = numpy.array(polling_delays)

    def compute_and_print(result):
        print("\tmin = {}[s]\n\tmax = {}[s]\n\tmedian = {}[s]\n\tmean = {}[s]\n\tstddev = {}[s]".format(
            numpy.min(result), numpy.max(result), numpy.median(result), numpy.mean(result), numpy.std(result)))

    print("\n\titerations = {}\n\n\tPolling duration".format(iterations))
    compute_and_print(durations)
    print("\n\tPolling delay")
    compute_and_print(delays)
    return (durations, delays)