lts_cold_start.py

    L2SS-357:Renamed PCC to RECV
    Jasper Annyas authored
    c4d52d71
    lts_cold_start.py 9.67 KiB
    #! /usr/bin/env python3
    import logging
    from time import sleep
    
    # numpy and tango are used further down but were never imported.
    import numpy
    import tango
    
    # TODO(Corne): Remove sys.path.append hack once packaging is in place!
    import os, sys
    currentdir = os.path.dirname(os.path.realpath(__file__))
    parentdir = os.path.dirname(currentdir)
    sys.path.append(parentdir)
    
    from toolkit.startup import startup
    from toolkit.lofar2_config import configure_logging
    
    
    def start_device(device: str):
        '''
        Start a Tango device with the help of the startup function.
        The device will not be forced to go through
        OFF/INIT/STANDBY/ON but it is assumed that the device is in OFF
        state.  If the device is not in OFF state, then an exception
        will be raised.
        '''
        dev = startup(device = device, force_restart = False)
        # Query the state on the device proxy, not on the device name string.
        state = dev.state()
        if state != tango.DevState.ON:
            raise Exception("Device \"{}\" is unexpectedly in \"{}\" state but it is expected to be in \"{}\" state.  Please check the reason for the unexpected device state.  Aborting the start-up procedure.".format(device, state, tango.DevState.ON))
        return dev
    
    
    def lts_cold_start():
        '''
        What is this?
        This is the LTS (LOFAR Test - and I forgot what S stands for) cold start
        procedure cast into source code.  The procedure can be found here:
        https://support.astron.nl/confluence/display/L2M/LTS+startup+procedure
    
        Paulus already wrote a script that - illegally ;) - makes direct use of the
        OPC-UA servers to accomplish the same thing that we are doing here.
        Paulus' script can be found here:
        https://git.astron.nl/lofar2.0/pypcc/-/blob/master/scripts/Startup.py
        Thanks, Paulus!  You made it very easy for me to cobble together this
        script.
    
        For obvious reasons our script is much better though.  :)
        First, it is bigger.  And bigger is always better.
        Then it is better documented but that does not count in the HW world.
        But it also raises exceptions with error messages that make an attempt to
        help the user reading them and shuts down the respective Tango device(s) if
        something goes south.
        And that is where we try to do it really right:  there is no reason to be
        excessively verbose when things work like they are expected to work.  But
        tell the user when something goes wrong, give an indication of what could
        have gone wrong and where to look for the problem.
    
        Again, Paulus' script already contains very good indications where problems
        might lie and made my job very easy.
    
        No parameters, parameters are for wimps.  :)
        '''
        # Define the LOFAR2.0 specific log format
        configure_logging()
    
        # Get a reference to the RECV device, do not
        # force a restart of the already running Tango
        # device.
        recv = startup("LTS/RECV/1")
    
        # Getting CLK, RCU & RCU ADCs into proper shape for use by real people.
        #
        # The start-up needs to happen in this sequence due to HW dependencies
        # that can introduce issues which are then becoming very complicated to
        # handle in SW.  Therefore to keep it as simple as possible, let's stick
        # to the rule recommended by Paulus:
        # 1 CLK
        # 2 RCU
        # 3 RCU ADCs
        #
        #
        # First take the CLK board through the motions.
        # 1.1 Switch off CLK
        # 1.2 Wait for CLK_translator_busy_R == False, throw an exception on timeout
        # 1.3 Switch on CLK
        # 1.4 Wait for CLK_translator_busy_R == False, throw an exception on timeout
        # 1.5 Check if CLK_PLL_locked_R == True
        # 1.6 Done
        #
        #
        # Steps 1.1 & 1.2
        recv.CLK_off()
        # 2021-04-30, Thomas
        # This should be refactored into a function.
        timeout = 10.0
        while recv.CLK_translator_busy_R:
            logging.debug("Waiting on \"CLK_translator_busy_R\" to become \"False\"...")
            timeout = timeout - 1.0
            if timeout < 1.0:
                # Switching the RECV clock off should never take longer than
                # 10 seconds.  Here we ran into a timeout.
                # Clean up and raise an exception.
                recv.off()
                raise Exception("After calling \"CLK_off\" a timeout occurred while waiting for \"CLK_translator_busy_R\" to become \"False\".  Please investigate the reason why the RECV translator never set \"CLK_translator_busy_R\" to \"False\".  Aborting start-up procedure.")
            sleep(1.0)
    
        # Steps 1.3 & 1.4
        recv.CLK_on()
        # Per Paulus this should never take longer than 2 seconds.
        # 2021-04-30, Thomas
        # This should be refactored into a function.
        timeout = 2.0
        while recv.CLK_translator_busy_R:
            logging.debug("After calling \"CLK_on()\":  Waiting on \"CLK_translator_busy_R\" to become \"False\"...")
            timeout = timeout - 1.0
            if timeout < 1.0:
                # Switching the RECV clock on should never take longer than
                # a couple of seconds.  Here we ran into a timeout.
                # Clean up and raise an exception.
                recv.off()
                raise Exception("After calling \"CLK_on\" a timeout occurred while waiting for \"CLK_translator_busy_R\" to become \"False\".  Please investigate the reason why the RECV translator never set \"CLK_translator_busy_R\" to \"False\".  Aborting start-up procedure.")
            sleep(1.0)
    
        # 1.5 Check if CLK_PLL_locked_R == True
        # 2021-04-30, Thomas
        # This should be refactored into a function.
        clk_locked = recv.CLK_PLL_locked_R
        if clk_locked:
            logging.info("CLK signal is locked.")
        else:
            # CLK signal is not locked
            clk_i2c_status = recv.CLK_I2C_STATUS_R
            exception_text = "CLK I2C is not working.  Please investigate!  Maybe power cycle subrack to restart CLK board and translator.  Aborting start-up procedure."
            if clk_i2c_status <= 0:
                # I2C reports no error, so the missing lock has another cause.
                exception_text = "CLK signal is not locked.  Please investigate!  The subrack probably does not receive clock input or the CLK PCB is broken.  Aborting start-up procedure."
            recv.off()
            raise Exception(exception_text)
        # Step 1.6
        # Done.
    
        # 2 RCUs
        # If we reach this point in the start-up procedure, then the CLK board setup
        # is done.  We can proceed with the RCUs.
        #
        # Now take the RCUs through the motions.
        # 2.1 Set RCU mask to all available RCUs
        # 2.2 Switch off all RCUs
        # 2.3 Wait for RCU_translator_busy_R == False, throw an exception on timeout
        # 2.4 Switch on RCUs
        # 2.5 Wait for RCU_translator_busy_R == False, throw an exception on timeout
        # 2.6 Done
        #
        #
        # Step 2.1
        # We have only 8 RCUs in LTS.
        recv.RCU_mask_RW = [True, ] * 8
        # Steps 2.2 & 2.3
        recv.RCU_off()
        # 2021-04-30, Thomas
        # This should be refactored into a function.
        timeout = 10.0
        while recv.RCU_translator_busy_R:
            logging.debug("Waiting on \"RCU_translator_busy_R\" to become \"False\"...")
            timeout = timeout - 1.0
            if timeout < 1.0:
                # Switching the RCUs off should never take longer than
                # 10 seconds.  Here we ran into a timeout.
                # Clean up and raise an exception.
                recv.off()
                raise Exception("After calling \"RCU_off\" a timeout occurred while waiting for \"RCU_translator_busy_R\" to become \"False\".  Please investigate the reason why the RECV translator never set \"RCU_translator_busy_R\" to \"False\".  Aborting start-up procedure.")
            sleep(1.0)
    
        # Steps 2.4 & 2.5
        # We leave the RCU mask as it is because it got already set for the
        # RCU_off() call.
        recv.RCU_on()
        # Per Paulus this should never take longer than 5 seconds.
        # 2021-04-30, Thomas
        # This should be refactored into a function.
        timeout = 5.0
        while recv.RCU_translator_busy_R:
            logging.debug("After calling \"RCU_on()\":  Waiting on \"RCU_translator_busy_R\" to become \"False\"...")
            timeout = timeout - 1.0
            if timeout < 1.0:
                # Switching the RCUs on should never take longer than
                # a couple of seconds.  Here we ran into a timeout.
                # Clean up and raise an exception.
                recv.off()
                raise Exception("After calling \"RCU_on\" a timeout occurred while waiting for \"RCU_translator_busy_R\" to become \"False\".  Please investigate the reason why the RECV translator never set \"RCU_translator_busy_R\" to \"False\".  Aborting start-up procedure.")
            sleep(1.0)
        # Step 2.6
        # Done.
    
        # 3 ADCs
        # If we get here, we only have to check if the ADCs are locked, too.
        # 3.1 Check RCUs' I2C status
        # 3.2 Check RCU_ADC_lock_R == [True, ] for RCUs that have a good I2C status
        # 3.3 Done
        #
        #
        # Steps 3.1 & 3.2
        rcu_mask = recv.RCU_mask_RW
        adc_locked = numpy.array(recv.RCU_ADC_lock_R)
        for rcu, i2c_status in enumerate(recv.RCU_I2C_STATUS_R):
            if i2c_status == 0:
                rcu_mask[rcu] = True
                logging.info("RCU #{} is available.".format(rcu))
                for adc, adc_is_locked in enumerate(adc_locked[rcu]):
                    if adc_is_locked < 1:
                        logging.warning("RCU#{}, ADC#{} is unlocked.  Please investigate!  Will continue with normal operation.".format(rcu, adc))
            else:
                # The RCU's I2C bus is not working.
                rcu_mask[rcu] = False
                logging.error("RCU #{}'s I2C is not working.  Please investigate!  Disabling RCU #{} to avoid damage.".format(rcu, rcu))
        recv.RCU_mask_RW = rcu_mask
        # Step 3.3
        # Done
    
        # Start-up APSCTL, i.e. Uniboard2s.
        aps = startup("APSCTL/SDP/1")
        logging.warning("Cannot start-up APSCTL because it requires manual actions.")
    
        # Start up SDP, i.e. configure the firmware in the Uniboards
        sdp = startup("LTS/SDP/1")
        logging.warning("Cannot start-up SDP because it requires manual actions.")
    
        logging.info("LTS has been successfully started and configured.")
    
    
    if __name__ == '__main__':
        lts_cold_start()
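
The script notes four times that its busy-wait loop "should be refactored into a function". A minimal sketch of such a helper follows. The name `wait_while_busy` and its parameters are hypothetical and not part of the original script or of the Tango API. The `sleep` and `clock` arguments exist only so the helper can be tested without real delays. It also measures the timeout with a monotonic clock rather than by counting down in one-second steps.

```python
import time
from typing import Callable


def wait_while_busy(is_busy: Callable[[], bool],
                    timeout: float,
                    poll_interval: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll is_busy() until it returns False or `timeout` seconds pass.

    Hypothetical helper: returns True if the busy flag cleared in time
    and False if the deadline was reached while it was still set.
    """
    deadline = clock() + timeout
    while is_busy():
        if clock() >= deadline:
            # Still busy after the allowed time; let the caller clean up.
            return False
        sleep(poll_interval)
    return True
```

In the script, a call could look like `wait_while_busy(lambda: bool(recv.CLK_translator_busy_R), timeout=10.0)`. When it returns `False`, the caller would run `recv.off()` and raise the exception with the message specific to that step, so the clean-up and the error text stay next to the command that failed.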