diff --git a/docs/source/guide/index.rst b/docs/source/guide/index.rst
index 2d0b701bba26080cce22b54ff11659da1fbacccc..6ca8c338d4fb6cc250d79752861bbdcd11b5f29d 100644
--- a/docs/source/guide/index.rst
+++ b/docs/source/guide/index.rst
@@ -6,3 +6,4 @@ Developer Guide
    Getting started<getting_started>
    Components and component managers<component_managers>
+   Long Running Commands<long_running_command>
diff --git a/docs/source/guide/long_running_command.rst b/docs/source/guide/long_running_command.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e1ce182aca5dba0c23fd9e60f342a7da1bd1875e
--- /dev/null
+++ b/docs/source/guide/long_running_command.rst
@@ -0,0 +1,183 @@
+=====================
+Long Running Commands
+=====================
+
+Some SKA commands interact with hardware systems that have inherent delays
+in their responses. Such commands block concurrent access to TANGO devices and
+degrade the overall responsiveness of the device to other requests.
+To address this, the base device provides a worker thread/queue implementation for
+long running commands (LRCs) that allows concurrent access to TANGO devices.
+
+.. note:: Long Running Command: a TANGO command whose execution time
+   is on the order of seconds (the CS Guidelines recommend less than 10 ms).
+   In this context it also means a command that is implemented to execute
+   asynchronously. "Long running" and "asynchronous" are used interchangeably in
+   this text and in the code base. Where their meanings differ, this is
+   explained; in both cases the command is non-blocking.
+
+This means that the device returns a response immediately, while the actual task
+either executes in the background or waits on a queue for the next available worker.
+The number of commands that can be enqueued is limited by the device's configurable
+maximum queue size. Commands received while the queue is full are rejected.
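+
+The queueing behaviour described above can be sketched in plain Python. This is a
+simplified, hypothetical model: ``ToyQueueManager``, its ``"QUEUED"``/``"REJECTED"``
+strings and its ID format are illustrative assumptions, not the ska-tango-base API.
+It assumes a bounded FIFO queue that is drained by worker threads. ``enqueue`` returns
+an ID straight away and rejects the request instead of blocking when the queue is full.
```python
import queue
import threading
import time
import uuid


class ToyQueueManager:
    """Hypothetical model of a bounded LRC queue (not the ska-tango-base class)."""

    def __init__(self, max_queue_size, num_workers):
        # queue.Queue treats maxsize <= 0 as unbounded, so this sketch needs >= 1
        if max_queue_size < 1:
            raise ValueError("max_queue_size must be at least 1 in this sketch")
        self._queue = queue.Queue(maxsize=max_queue_size)
        self.results = {}
        for _ in range(num_workers):
            threading.Thread(target=self._work, daemon=True).start()

    def enqueue(self, task):
        """Return (unique_id, status) at once; the task runs later on a worker."""
        name = getattr(task, "__name__", "Task")
        unique_id = f"{time.time()}_{uuid.uuid4().int}_{name}"  # assumed ID shape
        try:
            self._queue.put_nowait((unique_id, task))
        except queue.Full:
            return unique_id, "REJECTED"  # queue full: reject rather than block
        return unique_id, "QUEUED"

    def _work(self):
        # Each worker takes tasks off the queue in FIFO order
        while True:
            unique_id, task = self._queue.get()
            try:
                self.results[unique_id] = task()
            except Exception as exc:  # keep the worker alive if a task fails
                self.results[unique_id] = exc
            finally:
                self._queue.task_done()

    def wait_until_idle(self):
        """Block until every enqueued task has been processed."""
        self._queue.join()


# With no workers nothing drains the queue, so the third request is rejected
idle = ToyQueueManager(max_queue_size=2, num_workers=0)
print([idle.enqueue(lambda: None)[1] for _ in range(3)])
# → ['QUEUED', 'QUEUED', 'REJECTED']
```
+With ``num_workers=1`` the tasks also run strictly in submission order. With several
+workers, tasks are still dequeued in FIFO order, but they may finish out of order.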
+
+
+New attributes and commands have been added to the base device to support
+executing long running TANGO commands asynchronously.
+
+Reference Design for the Implementation of Long Running Commands
+----------------------------------------------------------------
+A message queue is the backbone of the LRC design. The goal is a hybrid
+solution in which use of the queue is opt-in. Note that with the default option
+(no queue), a long running command blocks short running commands, attribute reads
+and writes, and subscription requests until it completes. The SKABaseDevice meets
+the following requirements for executing long running commands:
+
+* With no queue (default):
+
+  * start executing the LRC if no other LRC is currently executing
+  * reject the LRC if another LRC is currently executing
+
+* With queue enabled:
+
+  * enqueue the LRC if the queue is not full
+  * reject the LRC if the queue is full
+  * execute LRCs in the order in which they were enqueued (FIFO)
+
+* Interrupt LRCs:
+
+  * abort the execution of currently executing LRCs
+  * flush enqueued LRCs
+
+Monitoring Progress of Long Running Commands
+--------------------------------------------
+In addition to the requirements listed above, the device should provide monitoring points
+that allow clients to determine when an LRC is received, is executing or has completed
+(successfully or not). LRCs can be in any of the following task states: QUEUED, IN_PROGRESS,
+ABORTED, COMPLETED, FAILED, NOT_ALLOWED. NOT_FOUND is returned for command IDs that do not exist.
+
+.. uml:: lrc_command_state.uml
+
+A new set of attributes and commands has been added to the base device to enable
+monitoring and reporting of the result, status and progress of LRCs.
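+
+The task-state diagram above can be encoded as a transition table. A client could use
+such a table to check that the sequence of status updates it observes is legal. This is
+a hypothetical sketch: ``TaskState``, ``ALLOWED_TRANSITIONS`` and the helper functions are
+illustrative names, and the edges are copied from ``lrc_command_state.uml``. NOT_FOUND is
+left out because it is a query answer, not a state that a task moves through.
```python
from enum import Enum


class TaskState(Enum):
    """Hypothetical mirror of the LRC task states listed above."""

    QUEUED = "QUEUED"
    IN_PROGRESS = "IN_PROGRESS"
    ABORTED = "ABORTED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    NOT_ALLOWED = "NOT_ALLOWED"


# Edges taken from lrc_command_state.uml; terminal states have no outgoing edges
ALLOWED_TRANSITIONS = {
    TaskState.QUEUED: {TaskState.IN_PROGRESS, TaskState.ABORTED},
    TaskState.IN_PROGRESS: {
        TaskState.COMPLETED,
        TaskState.FAILED,
        TaskState.ABORTED,
        TaskState.NOT_ALLOWED,
    },
    TaskState.ABORTED: set(),
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
    TaskState.NOT_ALLOWED: set(),
}


def is_valid_history(states):
    """Return True if the states start at QUEUED and follow the diagram's edges."""
    if not states or states[0] is not TaskState.QUEUED:
        return False
    return all(nxt in ALLOWED_TRANSITIONS[cur] for cur, nxt in zip(states, states[1:]))


def is_finished(state):
    """A task is finished once it reaches a state with no outgoing edges."""
    return not ALLOWED_TRANSITIONS[state]


print(is_valid_history([TaskState.QUEUED, TaskState.IN_PROGRESS, TaskState.COMPLETED]))
# → True
print(is_valid_history([TaskState.QUEUED, TaskState.COMPLETED]))
# → False
```
+Keeping the edges in data rather than in ``if`` chains means the check stays in step
+with the diagram if a state is ever added.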
+
+**LRC Attributes**
+
++------------------------------+--------------------------------------------------+----------------------+
+| Attribute                    | Example Value                                    | Description          |
++==============================+==================================================+======================+
+| longRunningCommandsInQueue   | ('StandbyCommand', 'OnCommand', 'OffCommand')    | Keeps track of which |
+|                              |                                                  | commands are on the  |
+|                              |                                                  | queue                |
++------------------------------+--------------------------------------------------+----------------------+
+| longRunningCommandIDsInQueue | ('1636437568.0723004_235210334802782_OnCommand', | Keeps track of IDs   |
+|                              |  '1636437789.493874_116219429722764_OffCommand') | in the queue         |
++------------------------------+--------------------------------------------------+----------------------+
+| longRunningCommandStatus     | ('1636437568.0723004_235210334802782_OnCommand', | ID, status pairs of  |
+|                              |  'IN_PROGRESS',                                  | the currently        |
+|                              |  '1636437789.493874_116219429722764_OffCommand', | executing commands   |
+|                              |  'IN_PROGRESS')                                  |                      |
++------------------------------+--------------------------------------------------+----------------------+
+| longRunningCommandProgress   | ('1636437568.0723004_235210334802782_OnCommand', | ID, progress pairs   |
+|                              |  '12',                                           | of the currently     |
+|                              |  '1636437789.493874_116219429722764_OffCommand', | executing commands   |
+|                              |  '1')                                            |                      |
++------------------------------+--------------------------------------------------+----------------------+
+| longRunningCommandResult     | ('1636438076.6105473_101143779281769_OnCommand', | ID, ResultCode and   |
+|                              |  '0', 'OK')                                      | result of the        |
+|                              |                                                  | completed command    |
++------------------------------+--------------------------------------------------+----------------------+
+
+
+**LRC Commands**
+
++-------------------------------+--------------------------------+
+| Command                       | Description                    |
++===============================+================================+
+| CheckLongRunningCommandStatus | Check the status of a long     |
+|                               | running command by ID          |
++-------------------------------+--------------------------------+
+| AbortCommands                 | Abort the currently executing  |
+|                               | LRCs and remove all enqueued   |
+|                               | LRCs                           |
++-------------------------------+--------------------------------+
+
+In addition to the commands in the table above, a number of candidate SKA
+commands in the base device that were previously implemented as blocking commands
+have been converted to execute as long running (asynchronous) commands, namely:
+Standby, On, Off, Reset and GetVersionInfo.
+
+The device has change events configured for all the LRC attributes, which clients can use to
+track their requests. **The client is responsible for subscribing to events to receive changes
+in command status and results**. To make monitoring easier, an interface (``LongRunningDeviceInterface``)
+is provided to track attribute subscriptions and command IDs for a list of specified devices.
+More about this interface can be found in `utils <https://gitlab.com/ska-telescope/ska-tango-base/-/blob/main/src/ska_tango_base/utils.py#L566>`_.
+
+UML Illustration
+----------------
+
+Multiple Clients Invoke Multiple Long Running Commands
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+.. uml:: lrc_scenario.uml
+
+Implementing a TANGO Command as Long Running
+--------------------------------------------
+The LRC update is a drop-in replacement for the current base device implementation.
+The base device provisions a QueueManager that has no threads and no queue, so existing
+device implementations execute commands in the same (blocking) manner unless your
+component manager specifies otherwise. In summary, to implement TANGO commands as
+long running you would:
+
+1. Create a component manager with the queue size and number of worker threads specified.
+
+2. Create the command class for your TANGO command.
+
+3. Use the component manager to enqueue an instance of your command class from the
+   TANGO command method.
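+
+Whichever way the device side is implemented, the scenario diagram shows what clients
+have to do with the results. Every subscribed client receives every
+``longRunningCommandResult`` change event and must ignore results for IDs it did not
+request. The following is a minimal, hypothetical sketch of that client-side bookkeeping.
+``ResultTracker`` is an illustrative name, and the event value is assumed to be the
+``(ID, ResultCode, result)`` string triple shown in the attribute table. Wiring
+``on_result`` to an actual TANGO event subscription is deliberately left out.
```python
class ResultTracker:
    """Hypothetical client-side helper that keeps only results for its own IDs."""

    def __init__(self):
        self._pending = set()
        self.results = {}

    def track(self, unique_id):
        """Remember an ID returned by a long running command invocation."""
        self._pending.add(unique_id)

    def on_result(self, value):
        """Handle a longRunningCommandResult value: (ID, ResultCode, result)."""
        if not value or len(value) != 3:
            return False  # e.g. an empty initial value
        unique_id, result_code, result = value
        if unique_id not in self._pending:
            return False  # another client's command: ignore it
        self._pending.discard(unique_id)
        self.results[unique_id] = (int(result_code), result)
        return True

    @property
    def outstanding(self):
        """IDs still waiting for a result."""
        return set(self._pending)


client1 = ResultTracker()
client1.track("1636438076.6105473_101143779281769_OnCommand")
client1.on_result(("some_other_id", "0", "OK"))  # not ours: ignored
client1.on_result(("1636438076.6105473_101143779281769_OnCommand", "0", "OK"))
print(client1.results)
# → {'1636438076.6105473_101143779281769_OnCommand': (0, 'OK')}
```
+Filtering on the client keeps the device simple: it pushes one event per result and
+does not need to know which client issued which command.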
+
+Example Device Implementing Long Running Command
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+.. code-block:: py
+
+    from ska_tango_base import SKABaseDevice
+
+
+    class DeviceWithLongRunningCommands(SKABaseDevice):
+        ...
+        def create_component_manager(self):
+            # SampleComponentManager is your own component manager class
+            return SampleComponentManager(
+                op_state_model=self.op_state_model,
+                logger=self.logger,
+                max_queue_size=20,
+                num_workers=3,
+                push_change_event=self.push_change_event,
+            )
+
+.. note:: SampleComponentManager does not have access to the TANGO layer.
+   To send LRC attribute updates, pass the device's ``push_change_event``
+   method to its constructor.
+
+Then, to enqueue your command:
+
+.. code-block:: py
+
+    from ska_tango_base import SKABaseDevice
+    from ska_tango_base.commands import ResponseCommand, ResultCode
+    from tango import DebugIt
+    from tango.server import command
+
+
+    class PerformLongTaskCommand(ResponseCommand):
+        """The command class for PerformLongTask command."""
+
+        def do(self):
+            """Download telescope data from the internet."""
+            download_tel_data()  # placeholder for your long running work
+            # A ResponseCommand's do() returns a (ResultCode, message) pair
+            return ResultCode.OK, "Telescope data downloaded"
+
+
+    class DeviceWithLongRunningCommands(SKABaseDevice):
+        ...
+
+        @command(
+            dtype_in=None,
+            dtype_out="DevVarLongStringArray",
+        )
+        @DebugIt()
+        def PerformLongTask(self):
+            """Command that queues a task that downloads data.
+
+            :return: A tuple containing a return code and the unique ID
+                of the enqueued command.
+            :rtype: (ResultCode, str)
+            """
+            # Assumes PerformLongTaskCommand was registered under this name
+            handler = self.get_command_object("PerformLongTask")
+
+            # Enqueue the command; with a queue-enabled component manager
+            # this returns without waiting for the task to finish
+            unique_id, result_code = self.component_manager.enqueue(handler)
+
+            return [[result_code], [unique_id]]
diff --git a/docs/source/guide/lrc_command_state.uml b/docs/source/guide/lrc_command_state.uml
new file mode 100644
index 0000000000000000000000000000000000000000..9840b12fd8a68a247a2ac1911fa00c8a7772d4e9
--- /dev/null
+++ b/docs/source/guide/lrc_command_state.uml
@@ -0,0 +1,13 @@
+[*] -> QUEUED : queued
+QUEUED -> IN_PROGRESS : starts executing
+IN_PROGRESS --> COMPLETED : completed normally
+IN_PROGRESS --> FAILED : completed abnormally
+IN_PROGRESS -> ABORTED : aborted
+IN_PROGRESS -> NOT_ALLOWED : not allowed
+state join <<join>>
+QUEUED --> ABORTED : aborted
+FAILED -> join
+ABORTED -> join
+NOT_ALLOWED -> join
+COMPLETED -> join
+join -> [*]
diff --git a/docs/source/guide/lrc_scenario.uml b/docs/source/guide/lrc_scenario.uml
new file mode 100644
index 0000000000000000000000000000000000000000..5268037ca1fdd0148813c5b092ac3c8278d597a9
--- /dev/null
+++ b/docs/source/guide/lrc_scenario.uml
@@ -0,0 +1,63 @@
+@startuml
+
+participant Client2 as c2
+participant Client1 as c1
+participant SKADevice as d
+entity Queue as q
+participant Worker as w
+
+== First Client Request ==
+
+c1 -> d: Subscribe to attr to get result notification of LongRunningCommand
+c1 -> d : LongRunningCommand
+d -> d : Check queue capacity
+d -> q : enqueue task LongRunningCommandTask
+rnote over q
+  Queue:
+  LongRunningCommandTask
+endrnote
+d -> c1 : Response QUEUED LongRunningCommand, Task ID 101
+== Second Client Request ==
+
+c2 -> d: Subscribe to attr to get result notification of OtherLongRunningCommand
+c2 -> d : OtherLongRunningCommand
+d -> d : Check queue capacity
+d -> q : enqueue task OtherLongRunningCommandTask
+rnote over q
+  Queue:
+  LongRunningCommandTask
+  OtherLongRunningCommandTask
+endrnote
+d -> c2 : Response QUEUED OtherLongRunningCommand, Task ID 102
+
+== Processing tasks ==
+
+q -> w : dequeue LongRunningCommandTask
+rnote over q
+  Queue:
+  OtherLongRunningCommandTask
+endrnote
+activate w
+
+w -> d : LongRunningCommandTask result
+deactivate w
+d -> d : push_change_event (ID 101) on attr
+d <--> c1 : on_change event with result (ID 101, some_result)
+d <--> c2 : on_change event with result (ID 101, some_result)
+c2 -> c2 : Not interested in 101, ignoring
+
+q -> w : dequeue OtherLongRunningCommandTask
+rnote over q
+  Queue:
+  <empty>
+endrnote
+activate w
+
+w -> d : OtherLongRunningCommandTask result
+deactivate w
+d -> d : push_change_event (ID 102) on attr
+d <--> c2 : on_change event with result (ID 102, some_result)
+d <--> c1 : on_change event with result (ID 102, some_result)
+c1 -> c1 : Not interested in 102, ignoring
+
+@enduml
diff --git a/src/ska_tango_base/base/base_device.py b/src/ska_tango_base/base/base_device.py
index b9c904d174bb8e225bb9a07edfa3f3a8fd5766cc..f1d0e8a3bcc46d24a6b71229e929c24ba7ae62ca 100644
--- a/src/ska_tango_base/base/base_device.py
+++ b/src/ska_tango_base/base/base_device.py
@@ -85,18 +85,6 @@ class _Log4TangoLoggingLevel(enum.IntEnum):
     DEBUG = 600
 
 
-class LongRunningCommandState(enum.IntEnum):
-    """The state of the long running command."""
-
-    QUEUED = 0
-    IN_PROGRESS = 1
-    ABORTED = 2
-    NOT_FOUND = 3
-    OK = 4
-    FAILED = 5
-    NOT_ALLOWED = 6
-
-
 _PYTHON_TO_TANGO_LOGGING_LEVEL = {
     logging.CRITICAL: _Log4TangoLoggingLevel.FATAL,
     logging.ERROR: _Log4TangoLoggingLevel.ERROR,