Commit 89936d7b authored by Samuel Twum

Merge branch 'sar-277-update-docs-with-examples-for-lrc' into 'main'

SAR-277 Document LRC Usage and Implementation

See merge request ska-telescope/ska-tango-base!68
parents 8fd224e4 b2971639
......@@ -6,3 +6,4 @@ Developer Guide
Getting started<getting_started>
Components and component managers<component_managers>
Long Running Commands<long_running_command>
=====================
Long Running Commands
=====================
Some SKA commands interact with hardware systems that have inherent delays
in their responses. Such commands block concurrent access to TANGO devices and
affect the overall performance (responsiveness) of the device to other requests.
To address this, the base device has a worker thread/queue implementation for
long running commands (LRCs) to allow concurrent access to TANGO devices.

.. note:: Long Running Command: A TANGO command whose execution time
   is on the order of seconds (the CS Guidelines recommend less than 10 ms).
   In this context it also means a command that is implemented to execute
   asynchronously. "Long running" and "asynchronous" are used interchangeably in
   this text and the code base. Where the meanings differ, the difference will
   be explained, but both mean non-blocking.

This means that devices return immediately with a response while busy with the
actual task in the background or parked on a queue pending the next available worker.
The number of commands that can be enqueued depends on a configurable maximum queue
size of the device. Commands enqueued when the queue is full will be rejected.

New attributes and commands have been added to the base device to support the
mechanism to execute long running TANGO commands asynchronously.
Reference Design for the Implementation of Long Running Commands
----------------------------------------------------------------
A message queue solution is the backbone of the LRC design. The goal is a hybrid
solution in which queue usage is opt-in. Note that with the default option, a long
running command blocks short running commands, replies to attribute reads and writes,
and the processing of subscription requests until it has completed. That said, the
SKABaseDevice meets the following requirements for executing long running commands:

* With no queue (default):

  * start executing the LRC if another LRC is not currently executing
  * reject the LRC if another LRC is currently executing

* With queue enabled:

  * enqueue the LRC if the queue is not full
  * reject the LRC if the queue is full
  * execute the LRCs in the order in which they have been enqueued (FIFO)

* Interrupt LRCs:

  * abort the execution of currently executing LRCs
  * flush enqueued LRCs
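
The queue-enabled and interrupt requirements above can be sketched in plain Python. This is only an illustration built on the standard library: ``SketchQueueManager``, its methods and the string results are hypothetical names chosen for this example and are not the base device's QueueManager API.

```python
import queue
import threading


class SketchQueueManager:
    """Hypothetical illustration of the requirements; not the real QueueManager."""

    def __init__(self, max_queue_size, num_workers):
        # queue.Queue is FIFO, so tasks start in the order they were enqueued.
        self._queue = queue.Queue(maxsize=max_queue_size)
        self.aborting = threading.Event()
        self.results = []
        for _ in range(num_workers):
            threading.Thread(target=self._work, daemon=True).start()

    def enqueue(self, name, task):
        """Return "QUEUED", or "REJECTED" when the queue is full."""
        try:
            self._queue.put_nowait((name, task))
        except queue.Full:
            return "REJECTED"
        return "QUEUED"

    def abort(self):
        """Signal running tasks to stop and flush everything still queued."""
        # A fuller implementation would clear this flag once the abort is done.
        self.aborting.set()
        flushed = []
        while True:
            try:
                name, _ = self._queue.get_nowait()
            except queue.Empty:
                break
            flushed.append(name)
            self._queue.task_done()
        return flushed

    def join(self):
        """Block until every enqueued task has been executed or flushed."""
        self._queue.join()

    def _work(self):
        while True:
            name, task = self._queue.get()
            # Each task receives the abort flag and is expected to poll it.
            self.results.append((name, task(self.aborting)))
            self._queue.task_done()
```

In this sketch a task is any callable that accepts the abort flag and returns its final state, which keeps the abort cooperative: a running task is never killed, it is asked to stop.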
Monitoring Progress of Long Running Commands
--------------------------------------------
In addition to the requirements listed above, the device should provide monitoring points
that allow clients to determine when an LRC is received, executing or completed (successfully
or not). LRCs can assume any of the following defined task states: QUEUED, IN_PROGRESS, ABORTED,
COMPLETED, FAILED, NOT_ALLOWED. NOT_FOUND is returned for command IDs that do not exist.

.. uml:: lrc_command_state.uml
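
As a rough aid to reading the state diagram, the transitions it shows can be written down as a lookup table. The names ``TaskState``, ``ALLOWED_TRANSITIONS`` and ``can_transition`` are hypothetical and exist only for this sketch; the code base may represent states differently.

```python
import enum


class TaskState(enum.Enum):
    """Task states listed in the text (hypothetical enum for illustration)."""

    QUEUED = "QUEUED"
    IN_PROGRESS = "IN_PROGRESS"
    ABORTED = "ABORTED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    NOT_ALLOWED = "NOT_ALLOWED"


# Transitions as drawn in the state diagram; every other state is terminal.
ALLOWED_TRANSITIONS = {
    TaskState.QUEUED: {TaskState.IN_PROGRESS, TaskState.ABORTED},
    TaskState.IN_PROGRESS: {
        TaskState.COMPLETED,
        TaskState.FAILED,
        TaskState.ABORTED,
        TaskState.NOT_ALLOWED,
    },
}


def can_transition(current, new):
    """Return True if the diagram allows moving from current to new."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```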
A new set of attributes and commands has been added to the base device to enable
monitoring and reporting of the result, status and progress of LRCs.
**LRC Attributes**

+-----------------------------+-------------------------------------------------+----------------------+
| Attribute                   | Example Value                                   | Description          |
+=============================+=================================================+======================+
| longRunningCommandsInQueue  | ('StandbyCommand', 'OnCommand', 'OffCommand')   | Keeps track of which |
|                             |                                                 | commands are on the  |
|                             |                                                 | queue                |
+-----------------------------+-------------------------------------------------+----------------------+
| longRunningCommandIDsInQueue| ('1636437568.0723004_235210334802782_OnCommand',| Keeps track of IDs in|
|                             |  '1636437789.493874_116219429722764_OffCommand')| the queue            |
+-----------------------------+-------------------------------------------------+----------------------+
| longRunningCommandStatus    | ('1636437568.0723004_235210334802782_OnCommand',| ID, status pair of   |
|                             |  'IN_PROGRESS',                                 | the currently        |
|                             |  '1636437789.493874_116219429722764_OffCommand',| executing commands   |
|                             |  'IN_PROGRESS')                                 |                      |
+-----------------------------+-------------------------------------------------+----------------------+
| longRunningCommandProgress  | ('1636437568.0723004_235210334802782_OnCommand',| ID, progress pair of |
|                             |  '12',                                          | the currently        |
|                             |  '1636437789.493874_116219429722764_OffCommand',| executing commands   |
|                             |  '1')                                           |                      |
+-----------------------------+-------------------------------------------------+----------------------+
| longRunningCommandResult    | ('1636438076.6105473_101143779281769_OnCommand',| ID, ResultCode,      |
|                             |  '0', 'OK')                                     | result of the        |
|                             |                                                 | completed command    |
+-----------------------------+-------------------------------------------------+----------------------+
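
The status and progress attributes in the table hold alternating ID and value entries in one flat sequence. A client might regroup them as in the sketch below, which assumes the layout is exactly as in the example values; ``pairs_to_dict`` is a hypothetical helper, not part of the base classes.

```python
def pairs_to_dict(flat_values):
    """Turn (id1, value1, id2, value2, ...) into {id1: value1, id2: value2}."""
    if len(flat_values) % 2 != 0:
        raise ValueError("expected alternating ID and value entries")
    # Even positions hold the command IDs, odd positions the paired values.
    return dict(zip(flat_values[0::2], flat_values[1::2]))
```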
**LRC Commands**

+-------------------------------+------------------------------+
| Command                       | Description                  |
+===============================+==============================+
| CheckLongRunningCommandStatus | Check the status of a long   |
|                               | running command by ID        |
+-------------------------------+------------------------------+
| AbortCommands                 | Abort the currently executing|
|                               | LRCs and remove all enqueued |
|                               | LRCs                         |
+-------------------------------+------------------------------+
In addition to the set of commands in the table above, a number of candidate SKA
commands in the base device previously implemented as blocking commands have been
converted to execute as long running commands (asynchronously), viz: Standby, On, Off,
Reset and GetVersionInfo.
The device has change events configured for all the LRC attributes which clients can use to track
their requests. **The client has the responsibility of subscribing to events to receive changes on
command status and results**. To make monitoring easier, there's an interface (LongRunningDeviceInterface)
which can be used to track attribute subscriptions and command IDs for a list of specified devices.
More about this interface can be found in `utils <https://gitlab.com/ska-telescope/ska-tango-base/-/blob/main/src/ska_tango_base/utils.py#L566>`_.
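
Because every subscriber receives every result event, a client has to keep the IDs of its own requests and ignore the rest. The sketch below shows that bookkeeping only. ``ResultTracker`` is a hypothetical class, not the LongRunningDeviceInterface, and it assumes the result value is an (ID, result code, message) triple as in the attribute table; wiring it to a TANGO change event subscription is left out.

```python
class ResultTracker:
    """Hypothetical client-side bookkeeping for LRC results."""

    def __init__(self):
        self._pending = set()
        self.results = {}

    def track(self, command_id):
        """Remember an ID returned to this client when it invoked a command."""
        self._pending.add(command_id)

    def on_result(self, value):
        """Handle one result value; return True if it belonged to this client."""
        if not value or len(value) != 3:
            return False  # empty or unexpected payload
        command_id, result_code, message = value
        if command_id not in self._pending:
            return False  # another client's command, ignore it
        self._pending.discard(command_id)
        self.results[command_id] = (result_code, message)
        return True
```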
UML Illustration
----------------
Multiple Clients Invoke Multiple Long Running Commands
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. uml:: lrc_scenario.uml
Implementing a TANGO Command as Long Running
--------------------------------------------
The LRC update is a drop-in replacement of the current base device implementation.
The base device provisions a QueueManager which has no threads and no queue. Existing device
implementations will execute commands in the same manner unless your component manager
specifies otherwise. Summarised in a few points, you would do the following to implement
TANGO commands as long running:
1. Create a component manager with the maximum queue size and number of workers specified.
2. Create the command class for your TANGO command.
3. Use the component manager to enqueue your command in the command class.
Example Device Implementing Long Running Command
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: py

    class DeviceWithLongRunningCommands(SKABaseDevice):
        ...

        def create_component_manager(self):
            return SampleComponentManager(
                op_state_model=self.op_state_model,
                logger=self.logger,
                max_queue_size=20,
                num_workers=3,
                push_change_event=self.push_change_event,
            )

.. note:: SampleComponentManager does not have access to the TANGO layer.
   In order to send LRC attribute updates, provide a copy of the device's
   ``push_change_event`` method to its constructor.

Then, to enqueue your command:

.. code-block:: py

    from ska_tango_base.commands import ResponseCommand
    from tango import DebugIt
    from tango.server import command


    class PerformLongTaskCommand(ResponseCommand):
        """The command class for PerformLongTask command."""

        def do(self):
            """Download telescope data from the internet"""
            download_tel_data()


    # Method of the device class, shown here without the enclosing class.
    @command(
        dtype_in=None,
        dtype_out="DevVarLongStringArray",
    )
    @DebugIt()
    def PerformLongTask(self):
        """Command that queues a task that downloads data

        :return: A tuple containing a return code and a string.
            Here the string carries the unique ID of the enqueued
            command.
        :rtype: (ResultCode, str)
        """
        handler = self.get_command_object("PerformLongTask")
        # Enqueue here
        unique_id, result_code = self.component_manager.enqueue(handler)
        return [[result_code], [unique_id]]
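
On the client side, the ``[[result_code], [unique_id]]`` value returned above has to be taken apart before the ID can be matched against attribute updates. A minimal sketch follows. Both helper names are hypothetical, and reading the command name from the text after the last underscore is an assumption based only on the example IDs shown in the attribute table.

```python
def unpack_lrc_response(response):
    """Split [[result_code], [unique_id]] into (result_code, unique_id)."""
    (result_code,), (unique_id,) = response
    return int(result_code), unique_id


def command_name_from_id(unique_id):
    """Return the trailing part of the ID, assumed to be the command name."""
    return unique_id.rsplit("_", 1)[-1]
```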
[*] -> QUEUED : queued
QUEUED -> IN_PROGRESS : starts executing
IN_PROGRESS --> COMPLETED : completed normally
IN_PROGRESS --> FAILED : completed abnormally
IN_PROGRESS -> ABORTED : aborted
IN_PROGRESS -> NOT_ALLOWED : not allowed
state join <<join>>
QUEUED --> ABORTED : aborted
FAILED -> join
ABORTED -> join
NOT_ALLOWED -> join
COMPLETED -> join
join -> [*]
@startuml
participant Client2 as c2
participant Client1 as c1
participant SKADevice as d
entity Queue as q
participant Worker as w
== First Client Request ==
c1 -> d: Subscribe to attr to get result notification of LongRunningCommand
c1 -> d : LongRunningCommand
d -> d : Check queue capacity
d -> q : enqueue task LongRunningCommandTask
rnote over q
Queue:
LongRunningCommandTask
endrnote
d -> c1 : Response QUEUED LongRunningCommand, Task ID 101
== Second Client Request ==
c2 -> d: Subscribe to attr to get result notification of OtherLongRunningCommand
c2 -> d : OtherLongRunningCommand
d -> d : Check queue capacity
d -> q : enqueue task OtherLongRunningCommandTask
rnote over q
Queue:
LongRunningCommandTask
OtherLongRunningCommandTask
endrnote
d -> c2 : Response QUEUED OtherLongRunningCommand, Task ID 102
== Processing tasks ==
q -> w : dequeue LongRunningCommandTask
rnote over q
Queue:
OtherLongRunningCommandTask
endrnote
activate w
w -> d : LongRunningCommandTask result
deactivate w
d -> d : push_change_event (ID 101) on attr
d <--> c1 : on_change event with result (ID 101, some_result)
d <--> c2 : on_change event with result (ID 101, some_result)
c2 -> c2 : Not interested in 101, ignoring
q -> w : dequeue OtherLongRunningCommandTask
rnote over q
Queue:
<empty>
endrnote
activate w
w -> d : OtherLongRunningCommandTask result
deactivate w
d -> d : push_change_event (ID 102) on attr
d <--> c2 : on_change event with result (ID 102, some_result)
d <--> c1 : on_change event with result (ID 102, some_result)
c1 -> c1 : Not interested in 102, ignoring
@enduml
......@@ -85,18 +85,6 @@ class _Log4TangoLoggingLevel(enum.IntEnum):
    DEBUG = 600


class LongRunningCommandState(enum.IntEnum):
    """The state of the long running command."""

    QUEUED = 0
    IN_PROGRESS = 1
    ABORTED = 2
    NOT_FOUND = 3
    OK = 4
    FAILED = 5
    NOT_ALLOWED = 6


_PYTHON_TO_TANGO_LOGGING_LEVEL = {
    logging.CRITICAL: _Log4TangoLoggingLevel.FATAL,
    logging.ERROR: _Log4TangoLoggingLevel.ERROR,
......