Merge branch 'master' of git.astron.nl:desp/hdl

Commit e1aa9f4b authored 5 years ago by Pieter Donker
--- a/applications/lofar2/doc/prestudy/station2_sdp_hdl_components.txt
+++ b/applications/lofar2/doc/prestudy/station2_sdp_hdl_components.txt
+*******************************************************************************
+* Rx input status:
+*******************************************************************************
+
+* Existing components:
+  - RSP rad_frame_status of the previous PPS sync interval: 
+    . rx_cnt:   18 bits, number of Rx frames
+    . brc   :    1 bit,  0 if no Rx frames with CRC error, 1 if >= 1 Rx frames had a CRC error
+    . sync  :    1 bit,  1 if the frame with Rx sync was detected, else 0
+    . align :    1 bit,  1 if all frames aligned OK, else 0
+  
+  - RSP rad_latency:
+    . rx_latency : 16 bit, stores an internal count value when the Rx sync is detected. The internal count
+                           restarts at the PPS sync. This measures the latency in clock cycles.
+                           
+  - APERTIF dp_bsn_monitor
+    . mon_sync_timeout        = '1' when the Rx sync did not occur within 200M cycles since last Rx sync    ~= sync
+    . mon_ready_stable        = '1' when ready was always '1' during last Rx sync interval
+    . mon_xon_stable          = '1' when xon   was always '1' during last Rx sync interval
+    . mon_bsn_at_sync         = BSN at Rx sync
+    . mon_nof_sop             = number of sop during last Rx sync interval             = rx_cnt
+    . mon_nof_err             = number of err at eop during last Rx sync interval     ~= brc
+    . mon_nof_valid           = number of valid during last Rx sync interval
+    . mon_bsn_first           = BSN at first Rx sync     --> not useful
+    . mon_bsn_first_cycle_cnt = latency at first Rx sync --> should use every Rx sync like on RSP
+  
+    ==> Reuse dp_bsn_monitor with improvements:
+    . Monitor the packets per sync interval using Rx sync. This is more precise than using the PPS sync. 
+      The Rx sync based values are only valid if mon_sync_timeout = 0.
+    . Remove mon_bsn_first and mon_bsn_first_cycle_cnt.
+    . Add mon_latency, use PPS sync like in RSP to measure the latency between PPS sync and Rx sync in
+      number of clock cycles.
+  
+
+
 *******************************************************************************
 * DP encoder / decoder
 *******************************************************************************
@@ -81,7 +116,7 @@ The dp_validate_bsn_at_sync function verifies the entire 64 bit sync and BSN in
 remote inputs the BSN can only differ by a limited number dependent on the latency differences between the
 different inputs. Therefore if the input Rx BSN at sync matches the local Station BSN, then for the
 BSN aligner that aligns the inputs based on the BSN it is sufficient to only use a fraction of the BSN.
-Uding the fraction of the BSN as index is suffivient to distinguish between blocks within the maximum BSN
+Using the fraction of the BSN as index is sufficient to distinguish between blocks within the maximum BSN
 latency. If the fraction N is a power of 2, then only the log2(N) LSbits of the BSN need to be compared
 to ensure that all inputs have the same 64 bit sync and BSN.
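
A minimal Python sketch of the LSbit comparison described above, for illustration only. The function names
bsn_fraction and inputs_aligned are hypothetical and do not exist in the HDL library. The sketch assumes that
N is a power of 2 and that the BSN at sync has already been validated against the local Station BSN.

```python
def bsn_fraction(bsn: int, n: int) -> int:
    """Return the log2(n) LSbits of the BSN (assumption: n is a power of 2)."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    return bsn & (n - 1)


def inputs_aligned(bsns, n: int) -> bool:
    """True if all pending input BSNs share the same BSN fraction.

    This is only unambiguous when the BSN latency between inputs is < n.
    """
    return len({bsn_fraction(b, n) for b in bsns}) == 1
```

With N = 8 only the 3 LSbits are compared, so BSNs that differ by a multiple of 8 look identical. That is why
the fraction has to cover the maximum BSN latency between the inputs.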

@@ -100,11 +135,13 @@ Assumptions:
  . N = 2 inputs aligner with 1 local data and 1   remote data
  . N > 2 inputs aligner with 1 local data and N-1 remote data
  . N >=2 inputs aligner with 0 local data and N   remote data (not used on ring, but was used in APERTIF)
+  . Treat all inputs equally, so no special role for a local input, to suit more general usage
 - The local sync and BSN sources on all FPGAs are synchronous, to avoid additional BSN latency between inputs.
 - Static input enable or disable via M&C
  - it is possible to enable or disable any combination of inputs
  - if all inputs are disabled then the output stops.
  - if the input enable or disable setting is changed, then the BSN aligner restarts trying to achieve alignment.
+  - disabled inputs are output with zero or flagged data
  - for the ring with 1 local and 1 remote input the static input enable/disable supports the align modes:
    . disabled,
    . local only,
@@ -126,11 +163,15 @@ Assumptions:
    - support dynamic input enable/disable control
 - Only output correct blocks, either with the received input block or with flagged filler block
 - The output passes on the sync and therefore it does not have to pass on the BSN
- The output should support flow control to provide output throttling
+- The output should support flow control to:
+  . smoothen bursts (only an issue with remote drive output)
+  . provide output throttling (requires output FIFOs or data blocks that have sufficient gaps)
 - Stopped input:
-  . If all inputs of the BSN aligner stop, then the output stops.
  . If after some block periods (e.g. g_bsn_latency) there is no more block pending at any input, then the
-    BSN aligner should restart trying to achieve alignment.
+    output stops and the BSN aligner should restart trying to achieve alignment.    
+
+
+

 Notes:
 - In LOFAR and APERTIF the BSN aligner does lose more blocks due to input flush and realign
@@ -147,12 +188,12 @@ Notes:
 Design options:
 - Lost packet detection
  . Rely on next received packet:
-    - check per input that the align BSN increments +1 within the align_sync interval
+    - check per input that the BSN increments +1
    - requires a timeout or overflow detection on other inputs to detect a burst of lost packets
    - after a burst of lost packets, typically the output cannot catch up anymore, so then the BSN aligner
      needs to flush its input buffer and restart.
-  . Per packet using a local output block pacer.
-    The local output block pacer is offset by at least g_bsn_latency relative to the local BSN source, to
+  . Per packet using a local block reference.
+    The local block reference is offset by at least g_bsn_latency relative to the local BSN source, to
    ensure that all inputs should have a new block pending for output. This is possible, because the input
    latencies are static and within a fixed range:
    - in circular buffer the Wr flag for the lost block remains unset
@@ -212,8 +253,8 @@ Design options:
    - undefined
    - forced to zero
    - random with similar noise level,
-    - most negative integer in real data
-    - most negative integer in complex real part and imag part (or use imag part as cause identifier).
+    - flagged data using most negative integer in real data
+    - flagged data using most negative integer in complex real part and imag part (or use imag part as cause identifier).

  ==> Design decision:
      - Replace lost blocks by filler blocks, to preserve the nominal output rate
@@ -258,80 +299,145 @@ Design options:
        aligner can also work when there are only remote inputs.

      
-. Define align_sync  
-  -  ...
+      
+. sync aligner instead of BSN aligner
+  - Using the sosi.sync, one lost packet causes the whole sync interval to be lost, which is too much impact.

  ==> Design decision:
-      - Define align_sync to start initial alignment and to avoid need for twice as large input buffer given a 
-        certain BSN latency
-      
-      
-      
-      
-. Initial alignment declaration can be based on:
-  - All active inputs have data pending with the same BSN index (in the same circular buffer slot or at the FIFO output)
-  - If BSN latency number of slots on all inputs got filled, then set the Rd pointer. This requires that all inputs
-    start filling at the same BSN index, because then the input with the lowest latency will get filled first. The
-    Rd pointer is set at the BSN index.
-  - The same slot is filled on all active inputs, this slot index sets the Rd pointer:
-  
-                   t=0        t=1        t=2        t=3        t=4        t=5        t=6        t=7        t=8    
-                                                                            9          10         11         12
-      t=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3
-                                                                                                                  
-        0 1 2 3    W . . .    0 W . .    0 1 W .    . 1 2 W    W . 2 3    0 W . 3    0 1 W .    . 1 2 W    W . 2 3
-        3 0 1 2    . . . W    W . . 3    0 W . 3    . 1 W 3    . . 2 W    W . . 3    0 W . .    . 1 W .    . . 2 W
-        2 3 0 1    . . W .    . . 2 W    W . 2 3    . W 2 3    . . W 3    . . . W    W . . .    . W . .    . . W .
-                                         R            R            R            R    R            R            R      
-                                         
-    If a packet got lost, then the alignment will fail and needs to be restarted.
-  - Align_sync found in same slot on all active inputs, this slot index sets the Rd pointer
-    . Align_sync period:
-      The maximum latency between two inputs is g_bsn_latency. The minimum time between the last align_sync of the
-      previous align_sync interval and the first align_sync in this align_sync interval is align_sync period -
-      g_bsn_latency. Hence if the align_sync period - g_bsn_latency > circular buffer size, then the align_sync in
-      the circular buffer all apply to the same BSN.
-      The period of the align_sync is preferrably a power of two, such that the align_sync can easily be derived
-      from the BSN and such that the align_sync will always occur at first slot of the circular buffer.
-      The 1 s sync interval could be used as align_sync, but in LOFAR2.0 the sync period is not a power of two and
-      differs by 1 per sync interval, so the sync appears at different slots. Furthermore a 1 s period is
-      relatively slow, using a dedicated and much shorter alig_sync period allows fast initial alignment.
-    . Do we need align_sync?
-      - The advantange of using an align_sync is that if the alignment fails in one period, e.g. due to a lost
-        packet, then it will automatically try again in the next interval. Schemes without an
-        align_sync require a restart, because they wait until the buffer has filled sufficiently and need to refill
-        to try again. 
-
-                           t=0        t=1        t=2        t=3        t=3        t=4        t=5    
-      t=0 1 2 3 4 5 6 7  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3
-                                                                                                                                                                                     
-        4 5 6 7 0 1 2 3    W . . .    4 W . .    4 5 W .    4 5 6 W    W 5 6 7    S W 6 7    S 5 W 7
-        3 4 5 6 7 0 1 2    . . . W    W . . 3    4 W . 3    4 5 W 3    4 5 6 W    W 5 6 7    S W 6 7
-        2 3 4 5 6 7 0 1    . . W .    . . 2 W    W . 2 3    4 W 2 3    4 5 W 3    4 5 6 W    W 5 6 7
-                                                                                             R
-      - Without align_sync the buffer would need to be twice as large to ensure unambigous detection of aligment
-      - The align_sync is only used within the BSN aligner.
-      
-  - The input streams have an align_sync that has a period > 2 * g_bsn_latency and that is a power of 2
-    . using a power of 2 implies that the data block at the align_sync will be stored at slot 0.
-    . using an align_sync period > 2 * g_bsn_latency ensures that the sync applies to the same BSN for all inputs
-    . using align_sync << 1 s sync allows faster initial alignment
-    . using align_sync instead of 1 s sync allows using an interval that is a power of 2 so that the align_sync
-      always occurs in slot 0 of a circulare buffer. The 1 s sync can occur in any slot, so then the BSN aligner
-      needs to detect which slot the sync occurred to set the initial Rd pointer.
+      Do not make or use a dp_sync_aligner, because losing an entire sync interval is not acceptable.
+      
+      
+      
+. Initial alignment:
+  - Assume the received packets on the inputs contain one block per packet.
+    The maximum input latency between blocks from two inputs is g_bsn_latency number of block periods. If the
+    maximum input latency is less than one block period, then use g_bsn_latency = 1. 
+    Assume that all inputs are active and that all inputs start with empty input buffers. At each input packets
+    arrive and fill the input buffers.
+    The minimum size of the total input buffer memory is g_nof_inputs * g_bsn_latency blocks, because then
+    initial alignment can be declared as soon as there is a block pending in the input buffers with the same
+    BSN at all inputs.
+    After initial alignment the alignment can be maintained by using a local block reference to time the
+    subsequent output of aligned input blocks.    
+    As long as at least one input buffer still has blocks then output can continue using filler blocks. The
+    local block reference ensures that the buffers will read empty if they do not get new input, this ensures
+    that any input in the buffers can still be output at the correct instant. If all buffers are empty then
+    input realignment is needed.
+
+  - The initial alignment becomes easier if:
+  
+    . it is not done on the entire 64 bit BSN, but only on a periodic fraction r of the BSN,
+    . the periodic BSN has a period that is a power of 2, so r = BSN[R-1:0].
+    . it is not done on any BSN, but only on a certain periodic BSN marked by an align_sync pulse at r = 0.
+
+    The advantage of using a BSN fraction is that it is smaller to handle and that it can be used
+    as index to a block in the input buffer. The fraction r of the BSN must be unique over the maximum input
+    latency, so r >= g_bsn_latency. The calculation of the BSN fraction r becomes easier, by choosing a fraction
+    that is a power of 2, so r = 2**ceil_log2(g_bsn_latency), to avoid integer division. The BSN fraction then
+    follows directly from the R = log2(r) LSbits of the BSN, so r = BSN[R-1:0]. The advantage of detecting the
+    alignment only at a certain periodic BSN is that the initial block index is then fixed at a certain r,
+    choose r = 0. It is convenient to mark the periodic BSN fraction at r = 0 by a sync pulse that is called the
+    align_sync.
+    The align_sync is only used within the BSN aligner. If alignment fails on an align_sync, due to a lost
+    packet, then the initial alignment retries on the next align_sync. The minimal period of the align_sync
+    must be large enough to ensure that the input buffers will only contain corresponding align_sync and no
+    align_sync from different intervals. Hence the align_sync period must be > BSN latency + buffer size. 
+    Without align_sync the buffer would need to be twice as large to ensure unambiguous detection of alignment.
+    The align_sync period must be short enough to have a fast initial alignment. The 1 s sync interval could
+    be used as align_sync, but in LOFAR2.0 the 1 s sync BSN period is not a power of two and differs by 1 per
+    sync interval, so the sync appears at different block indices. Furthermore a 1 s period is relatively long,
+    using a dedicated and much shorter align_sync period allows fast initial alignment.
+    
  - For the ring the latency depends on the number of hops. Therefore require that initial BSN alignment is achieved
    with all active inputs as defined by M&C, to ensure that the total input latency at each node on the ring is
    determined by the nominal operation.
    
-  
-
-      
- Input FIFO
-  . Blocks are stored in arrival order, therefore the FIFO must pass on the BSN index to be able to align the inputs
-    and to detect lost packets.
-  . The BSN index does not have to be incrementing, but is must be unique per BSN latency interval
-  . The FIFO must pass on the 1 s sync, to allow timestamp recovery from Station BSN.
-  . Flushing:
+  ==> Design decision:
+    - Use input buffer size > g_bsn_latency to compensate for the maximum BSN latency difference between inputs
+    - Use an align_sync period > g_bsn_latency + buffer size to start initial alignment and to ensure, 
+      together with the validation of the BSN at sync, the unambiguous detection of input alignment on the same BSN
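
The sizing rules in this design decision can be sketched in Python as follows. This is an illustration under
assumptions, not the implementation: the names next_pow2_above, align_parameters and derive_align_sync are
hypothetical, and the sketch assumes that both the buffer size and the align_sync period are rounded up to a
power of 2 so that the block index and the align_sync follow from the LSbits of the BSN.

```python
def next_pow2_above(x: int) -> int:
    """Smallest power of 2 that is strictly greater than x (x >= 0)."""
    return 1 << x.bit_length()


def align_parameters(g_bsn_latency: int):
    """Return (buffer_size, align_sync_period) in blocks.

    buffer_size       > g_bsn_latency
    align_sync_period > g_bsn_latency + buffer_size
    Both are rounded up to a power of 2 (assumption of this sketch).
    """
    buffer_size = next_pow2_above(g_bsn_latency)
    align_sync_period = next_pow2_above(g_bsn_latency + buffer_size)
    return buffer_size, align_sync_period


def derive_align_sync(bsn: int, align_sync_period: int) -> bool:
    """The align_sync marks the block where the BSN fraction is 0."""
    return bsn & (align_sync_period - 1) == 0
```

For g_bsn_latency = 3 this yields a buffer of 4 slots and an align_sync period of 8 blocks, which matches the
sizes used in the buffer examples in this document.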
+   
+   
+. Input buffer type
+  The input buffer can be structured as:
+  
+  - a circular buffer that can be accessed at any address, or
+  - a FIFO buffer that is used first in first out. 
+  
+  In a circular buffer each input block will occupy a slot that is identified by the block index r. For each slot 
+  there is a write (Wr) flag that is set when the block is written and cleared when the block is read for output
+  or discarded. The slots in the circular buffer have the fixed block size, so therefore the sop and eop of the
+  input block do not have to be passed along. Each slot does pass on the 1 s sync. The slot at index r = 0 also
+  passes on the align_sync information. Initial alignment is achieved when all inputs have the align_sync at block
+  index r = 0. The read (Rd) pointer starts at r = 0 and increments after every output slot. The Wr flag can be
+  set when the data block write begins, because then the read could already start as well since Wr and Rd run at
+  the same rate. A lost packet shows as a slot with the Wr flag unset. Example using align_sync (A) at block 0 and
+  align_sync period of 8 and incrementing Rd pointer:
+
+                         Circular buffer
+        Input blocks     t=0        t=1        t=2        t=3        t=4        t=5        t=6        t=7
+      t=0 1 2 3 4 5 6 7  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3
+                                                                                                                                                                                                
+        4 5 6 7 A 1 2 3    W . . .    4 W . .    4 5 W .    4 5 6 W    W 5 6 7    A W 6 7    A 1 W 7    0 1 2 W
+        3 4 5 6 7 A 1 2    . . . W    W . . 3    4 W . 3    4 5 W 3    4 5 6 W    W 5 6 7    A W 6 7    0 1 W 7
+        2 3 4 5 6 7 A 1    . . W .    . . 2 W    W . 2 3    4 W 2 3    4 5 W 3    4 5 6 W    W 5 6 7    0 W 6 7
+                                                                                             R            R      
+  
+  In a FIFO buffer each input block is written at the first free location, so in order of arrival. A lost packet
+  does not show in the FIFO, so therefore the block index r needs to be passed along with the block through the
+  FIFO and checked at the output. It is convenient to pass on the sop and eop information with the block through
+  the FIFO, to avoid having to count data within a block. The 1 s sync and the align_sync also need to be passed
+  on with the block. During initial alignment the FIFO is read until the FIFO output has an align_sync pending.
+  Initial alignment is achieved when all inputs have the align_sync pending at FIFO output. A lost packet shows as
+  a pending output block with the wrong block index r compared to the local block reference, or as an empty
+  FIFO. Example using align_sync (A) at block 0 and align_sync period of 8 and reading from FIFO output:
+  
+                         FIFO buffer
+        Input blocks     t=0        t=1        t=2        t=3        t=4        t=5        t=6        t=7
+      t=0 1 2 3 4 5 6 7  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3
+                                                                                                                                                                                                
+        4 5 6 7 A 1 2 3    . . . .    . . . .    . . . .    . . . .    W . . .    A W . .    A 1 W .    1 2 W .
+        3 4 5 6 7 A 1 2    . . . .    . . . .    . . . .    . . . .    . . . .    W . . .    A W . .    1 W . .
+        2 3 4 5 6 7 A 1    . . . .    . . . .    . . . .    . . . .    . . . .    . . . .    W . . .    W . . .
+                                                                                             R          R
+        
+  Both the buffers need to pass on the sync information per block, to allow timestamp recovery from Station BSN 
+  for the BSN aligner output.
+  
+  
+  The aspects of a circular buffer are:
+  - can handle out-of-order data, because it uses the BSN fraction as slot index. However on the ring in SDP all
+    data will be in order.
+  - if initial alignment fails it automatically retries on the next align_sync.
+  - the BSN must be continuous and incrementing, because then the remainder of the BSN / buffer size can
+    be used as Wr pointer. 
+  - to avoid integer division of the Station BSN the buffer size needs to be a power of 2
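
The circular buffer behaviour described above (Wr pointer from the BSN LSbits, one Wr flag per slot, lost block
visible as an unset Wr flag) can be sketched in Python for a single input. The class CircularInputBuffer and its
method names are hypothetical and only serve as illustration. The sketch assumes a power of 2 buffer size and
ignores the sync and align_sync flags that each slot also passes on.

```python
class CircularInputBuffer:
    """Single-input circular buffer with one Wr flag per slot."""

    def __init__(self, size: int):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of 2")
        self.size = size
        self.blocks = [None] * size
        self.wr_flags = [False] * size
        self.rd_ptr = 0  # the align_sync is defined at slot 0

    def write(self, bsn: int, data):
        """Store a block in the slot indexed by the BSN LSbits and set its Wr flag."""
        slot = bsn & (self.size - 1)
        self.blocks[slot] = data
        self.wr_flags[slot] = True

    def read(self, filler=None):
        """Output one slot; return (data, lost) with filler data for a lost block."""
        slot = self.rd_ptr
        lost = not self.wr_flags[slot]
        data = filler if lost else self.blocks[slot]
        self.wr_flags[slot] = False  # clear the Wr flag after output
        self.rd_ptr = (slot + 1) & (self.size - 1)
        return data, lost

    def is_empty(self) -> bool:
        return not any(self.wr_flags)

    def flush(self):
        """Flush by clearing all Wr flags and resetting the Rd pointer."""
        self.wr_flags = [False] * self.size
        self.rd_ptr = 0
```

Writing blocks with BSN 8, 9 and 11 and then reading four slots returns the third slot as lost, because its Wr
flag was never set.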
+  
+  The aspects of a FIFO buffer are:
+  - during initial alignment lost packets on one input will cause other inputs to overflow. 
+  - FIFO overflow can occur due to lost packets on another input or when the initial alignment started while
+    align_sync from the different inputs are arriving. The align_sync period needs to be large enough to
+    ensure that at a next attempt all align_sync will be for the corresponding BSN. The overflow requires a
+    restart of the initial alignment by flushing the FIFOs.
+  - easy to use a buffer size that is not a power of 2, because the block index is not used as Wr pointer index.
+    Using a buffer size that is not a power of 2 can be significant to save RAM.
+  - Passing on the BSN fraction r via the FIFO may increase the number of RAMs, dependent on whether the combination
+    of data, sop, eop, sync, align_sync and r just fits in a multiple of the maximum data width of a block RAM
+  - The BSN index does not have to be incrementing, but it must be unique per BSN latency interval
+  
+
+- Input / output control
+  . The BSN aligner can operate independently per input / output. The only interaction between inputs is needed
+    to detect that all inputs have a pending align_sync. The local block reference for block output can be
+    shared for all outputs, or replicated per output.
+  . The 1 s sync does not have to be checked at the outputs, because if the sync is present on one output, then
+    it will be present on all outputs. 
+
+      
+- Flushing:
+  . Circular buffer:
+    - Clearing a Wr flag or all Wr flags is much faster than flush reading a FIFO.
+  . FIFO buffer:
    - flush per packet or flush until empty?
    - flush per input or flush all inputs?
    - flush by reading, or by reset or by moving a Rd pointer
@@ -342,43 +448,24 @@ Design options:
      multiple packets pending in the FIFO, that will then be output in a burst. The pending packets that
      corresponded to the lost packet will need to be discarded anyway, because there is no time to output them still.
    - also useful to know BSNs at FIFO inputs? --> No, because FIFO packet count can be used to detect pending FIFO overflow.
-  . Keep FIFOs outside or inside BSN aligner component.
-    - the input of the FIFO is needed to be able to maintain a count of the number of packets in the FIFO, which is
-      relevant for the align timeout. The input eop increments the count and the output eop decrements the count.
-    - inputs with a large latency could use a smaller FIFO, this is easier to control with external FIFOs
+  
+. Keep input buffers outside or inside BSN aligner component.
+  - for inputs with more latency the buffer can be smaller, this is easier to control with external buffers,
+    each input may have different g_bsn_latency, so then each input also has different align timeout, align_sync
+    interval and input FIFO size.
  - if the BSN aligner relies on FIFO input information, then it is better to have the FIFOs inside.
   

- Input circular buffer
-  . can handle data arriving out of order, but this is not needed within SDP
-  . The buffer memory size is g_bsn_latency * g_nof_inputs slots that can store a packet.
-    - the maximum latency between any two inputs must be < g_bsn_latency number of data blocks
-    - For each slot there is a Wr flag that needs to be maintained. The Wr flag can be set when the data block write
-      begins, because then the read could already start as well since Wr and Rd run at same rate.
-    - For each slot there is also a sync flag to pass on the 1 s sync
-  . Can handle out-of-order data, because it uses the BSN as an index. However on the ring in SDP all data will be in order.
-  . The circular buffer could be used as a FIFO with internal access and an incrementing Wr pointer. However it
-    seems better to use it with a Wr pointer that is derived from the BSN.
-  . the BSN must be continuous BSN and incrementing, because then the remainder of the BSN divided by the buffer size can
-    be used as Wr pointer. 
-  . The buffer size is preferrably a power of two, but can be any size (to save memory):
-    - Using a buffer size that is a power of 2 avoids an integer divsion of the BSN, because it can then use the
-      corrsponding LSbits of the BSN as Wr pointer.
-    - Modulo 2**n - 1 can be calculated efficiently for binary numbers, by adding the n-bit digit parts. Similar as
+- Fast integer division
+  . Modulo 2**n - 1 can be calculated efficiently for binary numbers, by adding the n-bit digit parts. Similar to how
    modulo 3 (= (10-1)/3) can be calculated by adding the decimal digits.
-    - Modulo n for constnat n can be calculated efficiently suing multiplication by 1/n. The 1/n fraction must be 
-      represented with sufficient accuracy to determine the remainder.
-  . The slots in the circular buffer have a Wr flag that is set when the slot is written with an Rx packet and cleared
-    when the slot is read for output.
-  . Flushing:
-    - Clearing a Wr flag or all Wr flags is much faster than flush reading a FIFO.
-  . The Rd pointer increments at every output block period.
-  . The Rd pointer increments after every output slot.
-  . The write pointer always needs to be ahead of the Rd pointer. The minimum distance between the Wr and Rd pointer
-    is g_bsn_latency. The size of the circular buffer is the same for all inputs and must be > g_bsn_latency (for wr)
-    + 1 (for rd). The circular buffer read can occur when the write pointer exceeds rd pointer + g_bsn_latency. 
-  . the circular buffer is part of the BSN aligner component
-  . On CEP the beamlet data is written into a circular buffer based on the time stamp. A flag indicates whether data in the
+  . Modulo n for constant n can be calculated efficiently using multiplication by 1/n. The 1/n fraction must be 
+    represented with sufficient accuracy to determine the remainder. This implies using a 50 bit multiplier,
+    because the Station BSN is 50 bit.
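
The digit-sum method for modulo 2**n - 1 can be sketched in Python as follows. The function name mod_mersenne is
hypothetical. The sketch relies on 2**n being congruent to 1 modulo 2**n - 1, so the sum of the n-bit digits of
x has the same remainder as x itself.

```python
def mod_mersenne(x: int, n: int) -> int:
    """Return x mod (2**n - 1) for x >= 0, n >= 1, by adding the n-bit digits of x."""
    m = (1 << n) - 1
    while x > m:
        digit_sum = 0
        while x:
            digit_sum += x & m  # take the lowest n-bit digit
            x >>= n             # move on to the next digit
        x = digit_sum           # repeat until the sum fits in one digit
    return 0 if x == m else x   # the all-ones digit is congruent to 0
```

For a 50 bit Station BSN and n = 5 this needs ten 5 bit digits to be added, followed by a small final reduction.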
+
+
+. Circular buffers on CEP
+  On CEP the beamlet data is written into a circular buffer based on the time stamp. A flag indicates whether data in the
  circular buffer is valid. The size of the circular buffer is in the order of hundreds of ms to cover the distance latency 
  of the international stations. An array of tuples lists the length of continuous blocks in the circular buffer, and 
  therefore also the gaps. A local timer determines when the circular buffer is read. The local timer has ms accuracy
@@ -386,34 +473,65 @@ Design options:
  also flags the initial channel data that is disturbed after a gap.


-
-
-
-. Circular buffer state machine
+. State machine for circular buffer
    all:
      Receive and monitor input
-      Derive align_sync and Wr pointer from input BSN
+      Derive align_sync from input BSN
+      Derive Wr pointer from input BSN
      Write the input at the slot indexed by the Wr pointer and set the Wr flag for that slot.
    s_xoff:
      Accept static input enable/disable control
-      Clear all Wr flags of the slots to initially align or to realign the inputs.
+      Flush buffer (by clearing all Wr flags) to prepare for realigning the inputs.
      Reset the Rd pointer at the first slot, because the align_sync is defined at slot 0
      --> s_align
    s_align:
      If input control event --> s_xoff
      If for all active inputs the Wr flag is set in slot 0 and slot 0 contains the align_sync then
-        restart a periodic slot pulse to set the pace for outputting the slots. An offset of the slot period
+        restart the local block reference to set the pace for outputting the slots. An offset of the block period
        is used to ensure that in subsequent block periods all inputs will have a pending block --> s_sop
    s_sop:
      If input control event --> s_xoff
-      If slot pulse --> s_output
+      If local block reference pulse --> s_output
    s_output:
-      If all Wr flags are unset (empty buffer) --> s_xoff
-      else output one block, clear Wr flag of slot and increment Rd pointer --> s_sop
+      If empty buffer (all Wr flags in entire buffer are unset) --> s_xoff
+      else
+        output one block, use filler data for lost blocks, clear Wr flag of slot and increment Rd pointer --> s_sop
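
The state transitions of the circular buffer state machine above can be sketched in Python. This only models
the transitions, not the actions such as flushing or block output. The names State and next_state and the
boolean arguments are hypothetical stand-ins for the conditions listed in the state descriptions.

```python
from enum import Enum, auto


class State(Enum):
    XOFF = auto()
    ALIGN = auto()
    SOP = auto()
    OUTPUT = auto()


def next_state(state, *, control_event=False, all_align_sync_in_slot0=False,
               block_ref_pulse=False, buffer_empty=False):
    """Return the next state for one evaluation of the state machine."""
    if state is State.XOFF:
        # Buffer is flushed and Rd pointer reset, then start aligning
        return State.ALIGN
    if state is State.ALIGN:
        if control_event:
            return State.XOFF
        # All active inputs have the align_sync with Wr flag set in slot 0
        return State.SOP if all_align_sync_in_slot0 else State.ALIGN
    if state is State.SOP:
        if control_event:
            return State.XOFF
        # Local block reference sets the pace for the output
        return State.OUTPUT if block_ref_pulse else State.SOP
    if state is State.OUTPUT:
        # Empty buffer means all inputs stopped, so realign
        return State.XOFF if buffer_empty else State.SOP
    raise ValueError(f"unknown state: {state}")
```

In this sketch a lost block does not change the state, because in s_output the lost block is replaced by filler
data. Only an empty buffer or an input control event forces a return to s_xoff.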


+. State machine for FIFO buffer
+    all:
+      Receive and monitor input
+      Derive align_sync from input BSN
+      Write the input into the FIFO
+    s_xoff:
+      Accept static input enable/disable control
+      Flush buffer (by resetting the FIFOs) to prepare for realigning the inputs.
+      --> s_align
+    s_align:
+      If input control event --> s_xoff
+      If a FIFO is full --> s_xoff
+      If for all active inputs the align_sync is pending at FIFO output then
+        restart the local block reference to set the pace for outputting the slots. An offset of the block period
+        is used to ensure that in subsequent block periods all inputs will have a pending block --> s_sop
+    s_sop:
+      If input control event --> s_xoff
+      If local block reference pulse --> s_output
+    s_output:
+      If empty buffer (all FIFOs are empty) --> s_xoff
+      else
+        output one block, use filler data for lost blocks --> s_sop
+
+  ==> Design decision:
+      The circular buffer and FIFO are similar. The slight preference is to use a circular buffer, because it 
+      handles overflow automatically and if the maximum input BSN latency is close to a power of 2, then the
+      RAM usage of the circular buffer is near optimal, because it does not need to pass on the sop, eop and
+      BSN fraction r.
+
+      
+
+Obsolete investigations:
    
-. BSN max/min scheme of dp_bsn_align.vhd core:
+. APERTIF BSN max/min scheme of dp_bsn_align.vhd core:
  - State machine
      WHEN s_xoff =>
        accept input control
@@ -436,128 +554,7 @@ Design options:
  * one packet gets lost, next input arrives within g_sop_timeout --> bsn in range, flush one block from all other inputs

      
-. sync aligner instead of BSN aligner
-  - Using the sosi.sync one packet lost causes whole interval lost, this is too much impact.
-  - Use as much BSN range as necessary. At the end of the range the limited range BSN will wrap. This will cause
-    g_bsn_latency out of 2**c_bsn_align_w possible limited BSN values to fail alignment initially, but for these
-    instants the alignment will be possible some BSN later. Using c_bsn_align_w = ceil_log2(g_bsn_latency) + 2
-    provides sufficient opportunity for BSN alignment at the first sop attempt and certainly at the next.
-  - Instead an internal align_sync can be defined, e.g. with period 2**c_bsn_align_w and starting at sosi.sync
-    . Per input derive align_sync = (sosi.bsn(c_bsn_align_w-1:0)=0 or sosi.sync) and sosi.sop
-    . If a packet is lost during a align_sync interval, then the remaining packets in that sync interval are also
-      lost, but the output can recover at the next sync interval.
-    . At the end of each align_sync interval go via s_align, to avoid having to check for BSN wrap and to reconfirm
-      the that all enabled inputs still have the same BSN, also at the sosi.sync
-    . The align_sync interval does not fit in the sosi.sync interval due to 195312.5. This can be coped with by
-      going via s_align first in case in s_sop the sop is there.
-      
-      align_bsn    0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0           correct input
-      align_sync   1               1               1
-      
-      align_bsn    0 1 2 - 4 5 6 7 0 1 2 3 4 5 6 7 0           lost packet at bsn = 3, detected by increment 2 /= 1 in s_sop
-      align_sync   1               1               1
-                   o o o x x x x x o o o o o o o o o           recover output in next align_sync interval
-      
-      align_bsn    0 1 2 3 4 5 6 - 0 1 2 3 4 5 6 7 0           lost packet at bsn = 7, detected by any align_sync in s_sop
-      align_sync   1               1               1
-                   o o o o o o o x o o o o o o o o o           recover output in next align_sync interval by all align_sync in s_align
-                   
-      align_bsn    0 1 2 3 4 5 6 7 - 1 2 3 4 5 6 7 0           lost packet at bsn = 0 and sync, detected by any align_sync in s_sop
-      align_sync   1               -               1
-                   o o o o o o o o x x x x x x x x o           recover output in next align_sync interval by all align_sync in s_align
-                   
-      align_bsn    0 1 2 3 - - - - - 1 2 3 4 5 6 7 0           lost packet at bsn = 4,5,6,7,0 detected by align timeout in s_sop
-      align_sync   1               1               1
-                   o o o o x x x x x x x x x x x x o           recover output in next align_sync interval
-
-    - Minimum number of blocks per sync interval.
-      . the align_sync interval has n = 2**c_bsn_align_w and must be > 2*g_bsn_latency
-      . n=1 block per sync interval:
-        If one packet gets lost then packets from different sync intervals get aligned, as indicated by 1'. This
-        is not detected because the bsn always remains 0 (so no increment) and if the align timeout is based on
-        maximum number of packets in any input FIFO, then the maximum = 1. the number of packets in the FIFO then
-        only becomes > 1 if two or more packets (so two sync intervals) get lost. A solution would be to
-        instead use a align timeout in number of clock cycles, so as a true timeout.
-        
-        align_bsn    0               0               0           correct input
-        align_sync   1               1'              1
-                     o               o               0
-                     
-        align_bsn    0               -               0
-        align_sync   1               -               1'          packet count <= 1: misaligned output at 1' align_sync
-                     1               1               1           packet timeout: recover output at next align_sync
-                     o               x               o           lost packet at bsn 0
-                                     <---> true timeout
-                   
-      . n=2 blocks per sync interval:
-        If one packet gets lost then the next packet will have the same BSN, so no increment. The aligner will
-        then recover at the next align_sync. If two or more packets get lost, then the increment can be 0 or 1,
-        but in any case the other input FIFO will fill with more than 2 packets, so then the align timeout in
-        s_sop will occur. The align timeout is based on > n/2 = 1 packets in the FIFO.
-      
-        align_bsn    0       1       0       1       0           correct input
-        align_sync   1               1               1
-                     o       o       o       o       o
-      
-        align_bsn    0       -       0       1       0           lost packet at bsn 1, detected by increment 0 /= 1 in s_sop
-        align_sync   1               1               1           recover output at next align_sync interval
-                     o       x       o       o       o
-
-        align_bsn    0       1       -       1       0           lost packet at bsn 0 and sync, detected by any align_sync or align timeout in s_sop
-        align_sync   1               -               1           recover output at next align_sync interval
-                     o       o       x       x       o
-
-        align_bsn    0       -       -       1       0           lost packet at bsn 1,0 and sync, detected by any align_sync or align timeout in s_sop
-        align_sync   1               -               1           recover output at next align_sync interval
-                     o       x       x       o       o
-      
-      . n=3 blocks per sync interval:
-        If one packet gets lost then the other input will get 2 packets, which is more than n/2 = 1. The aligner will
-        then recover at the next align_sync.
-      
-        align_bsn    0    1    2     0    1    2     0           correct input
-        align_sync   1               1               1
-                     o    o    o     o    o    o     o
-                                                 
-        align_bsn    0    -    2     0    1    2     0           lost packet at bsn 1, detected by increment 0 /= 1 in s_sop
-        align_sync   1               1               1           recover output at next align_sync interval
-                     o    x    x     o    o    o     o
-                                                 
-        align_bsn    0    1    -     0    1    2     0           lost packet at bsn 2, detected by increment 0 /= 1 or by any align_sync in s_sop
-        align_sync   1               1               1           recover output at next align_sync interval
-                     o    o    x     o    o    o     o
-                     
-        align_bsn    0    1    2     -    1    2     0           lost packet at bsn 0 and sync, detected by any align_sync or align timeout in s_sop
-        align_sync   1               -               1           recover output at next align_sync interval
-                     o    o    o     x    x    x     o
-                   
-        align_bsn    0    1    2     0    1          0
-        align_sync   1               1               1           recover output at next align_sync interval
-                     o    o    o     x    o    o     o           lost packet at bsn 2
-      
-    . Lost packets are detected by:
-      - idle input                : check align timeout in s_align and in s_sop 
-      - active inputs at bsn 1-max: check bsn increment /= 1 per input in s_sop
-      - active inputs at bsn 0    : check any align_sync in s_sop and then all align_sync in s_align
-      
-      The idle input is detected by the timeout. The active inputs are checked by the bsn increment, but at the
-      align_sync the bsn wraps to 0, so then the active inputs are checked by the all align_sync. The initial
-      alignment was achieved starting with empty input FIFOs, so the align_sync and all subsequent align_sync
-      ensure that all enabled inputs have the same BSN. 
-      
-      define:
-      . align timeout > g_bsn_latency, to ensure that the maximum latency difference between inputs in number
-        of packets can still be aligned
-      . align_sync interval must be > g_bsn_latency, to ensure that align_sync correspond to same BSN on all inputs
-      . choose align timout > g_bsn_latency and choose align_sync interval n = 2**c_bsn_align_w bsn slots, where
-        c_bsn_align_w = ceil_log2(align timeout).
-        However n > g_bsn_latency is sufficient, so n does not have to be a power of 2, but it is convenient to
-        use a power of 2.
-      . input FIFO size > align timeout, to fit align timeout number of packets
-      . each input may have different g_bsn_latency, so then each input also has different align timeout, align_sync
-        interval and input FIFO size.
-      
-    . State machine using input FIFOs
+. Improved state machine using input FIFOs and timeouts
  To initially align or to realign the input FIFOs are read empty in s_xoff. In s_xoff it is also possible
  to change the static input enable/disable control.
  Then in s_align the sync aligner waits for the align_sync on all enabled inputs or an align timeout. The
@@ -630,74 +627,6 @@ Design options:
      align_sync interval.


-Design decisions:
-
-. Probably either circular buffer memory or FIFOs is suitable. For circular buffer the BSN fraction is used as slot
-  index and for the FIFO the BSN index needs to be passed on through the FIFO to compare pending inputs:
-. Support number of inputs >= 2
-. Treat all inputs equal, so no special role for a local input
-  - suits more general usage
-. Use local reference to drive the output block rate:
-  - adds somewhat more latency then using remote input to drive the output, but is necessary avoid extra loss in case
-    of lost packets and to support filler output
-. Support flow control
-  - to smoothen bursts (only an issue with remote drive output)
-  - to provide output throttling (requires output FIFOs or data blocks that have sufficient gaps)
-. Use sosi.sync and sosi.bsn(c_bsn_align_w-1:0) to align BSN
-  - using c_bsn_align_w much smaller than 32 b saves logic and thus eases timing closure
-  - if all enabled input BSN are equal then output
-. Use the align_sync scheme
-  - enable inputs will only be output if they all contain valid data, so packet loss on 
-    a single input will also briefly stop output for all inputs
-  - disabled inputs are output with zero or flagged data
-. Optionally support local input reference that can be used to drive the output BSN, instead of having
-  to use the BSN from the enabled inputs.
-. Optionally support dynamic input enable/disable based on expected number of packets per sync interval to avoid that one 
-  failing remote input causes all outputs to stop. This would be needed for APERTIF correlator input.
-. No need for artifical local block size (like in dp_bsn_align.vhd), because thanks to the CRC checking only 
-  correct packets (content and size) can enter the BSN aligner. Therefore any active input can drive the output.
-. Support static input enable/disable via M&C
-. Support dynamic input enable/disable based on whether the input had lost packets in the previous one or more sync interval.
-  - Maintain packet count per input per sync interval
-. Flagging:
-  - Static disabled inputs carry zero data
-  - Dynamically disabled inputs carry flagged data, using most negative real as flag and imag = 0.
-
-
-*******************************************************************************
-* Rx input status:
-*******************************************************************************
-
-* Existing components:
-  - RSP rad_frame_status of the previous PPS sync interval: 
-    . rx_cnt:   18 bits, number Rx frames
-    . brc   :    1 bit,  0 if no Rx frames with CRC error, 1 if >= 1 Rx frames had a CRC error
-    . sync  :    1 bit,  1 if the frame with Rx sync was detected, else 0
-    . align :    1 bit,  1 if all frames aligned OK, else 0
-  
-  - RSP rad_latency:
-    . rx_latency : 16 bit, stores an internal count value when the Rx sync is detected. The internal count
-                           restarts at the PPS sync. This measures the latency in clock cycles.
-                           
-  - APERTIF dp_bsn_monitor
-    . mon_sync_timeout        = '1' when the Rx sync did not occur within 200M cycles since last Rx sync    ~= sync
-    . mon_ready_stable        = '1' when ready was always '1' during last Rx sync interval
-    . mon_xon_stable          = '1' when xon   was always '1' during last Rx sync interval
-    . mon_bsn_at_sync         = BSN at Rx sync
-    . mon_nof_sop             = number of sop during last Rx sync interval             = rx_cnt
-    . mon_nof_err             = number of err at eop during last Rx sync interval     ~= brc
-    . mon_nof_valid           = number of valid during last Rx sync interval
-    . mon_bsn_first           = BSN at first Rx sync     --> not useful
-    . mon_bsn_first_cycle_cnt = latency at first Rx sync --> should use every Rx sync like on RSP
-  
-    ==> Reuse dp_bsn_monitor with improvements:
-    . Monitor the packets per sync interval using Rx sync. This is more precise then using the PPS sync. 
-      The Rx sync based values are only valid if mon_sync_timeout = 0.
-    . Remove mon_bsn_first and mon_bsn_first_cycle_cnt.
-    . Add mon_latency, use PPS sync like in RSP to measure the latency between PPS sync and Rx sync in
-      number of clock cycles.
-      
-

 *******************************************************************************
 * Reorder 

--- a/applications/lofar2/doc/prestudy/station2_to_do_erko.txt
+++ b/applications/lofar2/doc/prestudy/station2_to_do_erko.txt
@@ -79,8 +79,10 @@ touch .gitignore       # create .gitignore if it does not already exist
 .gitignore # file with working tree dirs and files to ignore, must also be committed

 # To start a repo
+cd ~/git
 git init  # start new repo at this dir, creates .git/
 git clone # get and start with existing repo
+git clone git@git.astron.nl:desp/args.git

 git status # what is in stage area and what is modified

@@ -530,5 +532,18 @@ PDR:
    - Uniboard2 firmware is an L4 product?
  . Detailed design of the firmware product

-
- 
\ No newline at end of file
+                       .
+                    RO .   SDC
+   --------------------.----------
+   | Operations        .         |
+   | ------------------.-------  |
+   | |     Telescope   .      |  |
+   | |     Manager     .      |  |
+   | |             ----.------|  |
+   | |             |   LEI    |  |
+   | |-----------------.------|  |
+   | |         |      |.      |  |
+   | |Station  |  CEP |. SDOS |  |
+   | |         |      |.      |  |
+   --------------------.----------
+                       .
\ No newline at end of file