Merge branch 'master' of git.astron.nl:desp/hdl

Commit 03350b9b authored Aug 5, 2020 by Pieter Donker
--- a/.gitignore
+++ b/.gitignore
@@ -3,4 +3,5 @@ build/*
 *generated/
 *.pyc
 *.qsys_edit
+*.kate-swp
 transcript
--- a/applications/lofar2/doc/prestudy/desp_howtools_erko.txt
+++ b/applications/lofar2/doc/prestudy/desp_howtools_erko.txt
@@ -7,9 +7,12 @@
 * Latex
 * Markdown
 * Vi
+* Remote access
 * Screen to run a terminal session without ssh connection
 * Quartus Qsys IP files in GIT
 * Quartus version
+* Linux
+


 *******************************************************************************
@@ -374,12 +377,60 @@ Table:
 | row text | row text | row text|
 | row text | row text | row text|

-vi
+*******************************************************************************
+* vi
+*******************************************************************************
+
 "Replaced Windows CRLF by Linux LF to avoid ^M at end of line in vi. Removed trailing spaces."
 - in gvim replace \r --> nothing
 - in uex save new file as Linux, save as, menu edit/preferences/line end


+*******************************************************************************
+* Remote access
+*******************************************************************************
+
+* RDP:
+  Instead of Remote Desktop (RDP) use Remmina. It may be necessary to use:
+  
+  > ssh -t -L 5900:localhost:5900 -C dop466 'x11vnc -localhost -display :0'
+
+  and then:
+  
+  > remmina &
+  
+  Note: Windows NTSERVER65 has IP: 10.87.3.165
+
+* For ssh access from home without a manual hop via kooistra@portal.astron.nl, put this in $HOME/.ssh/config:
+
+Host *
+    #User kooistra
+    ServerAliveInterval 60
+    ServerAliveCountMax 30
+    TCPKeepAlive yes
+    ForwardAgent yes
+    ForwardX11 yes
+    ForwardX11Trusted yes
+    Port 22
+    Protocol 2
+    Compression yes
+Host astron
+    User kooistra
+    HostName portal.astron.nl
+Host dop428
+    User hiemstra
+    ProxyCommand ssh -q -A astron netcat 10.87.0.228 22
+Host dop421
+    User hiemstra
+    ProxyCommand ssh -q -A astron netcat 10.87.0.221 22
+Host dop36
+    User hiemstra
+    ProxyCommand ssh -q -A astron netcat 10.87.2.36 22
+Host dop421
+    User hiemstra
+    ProxyCommand ssh -q -A astron netcat 10.87.0.221 22
+
+
 *******************************************************************************
 * Screen to run a terminal session without ssh connection
 *******************************************************************************
@@ -558,3 +609,24 @@ Quartus version meeting minutes 13 may 2020 (RW, LH JH, EK):
 3) UniBoard2c IP was created using Q19.4 by Jonathan, but we need to reconsider going to the latest Quartus version and recreate the IP, when we continue with the pinning and test designs for UniBoard2c


+*******************************************************************************
+* Linux
+*******************************************************************************
+
+dop466 = SSD
+dop466_0 = HDD
+
+> grep -rl 'search text in files' .  # -r for recursive, -l to list only the file names
+
+> sudo -s   # to become root
+> sudo pip install numpy      # to run Python2 library installer as root
+> sudo pip3 install numpy     # to run Python3 library installer as root
+
+> sudo apt-get install pip    # to install Python2 library installer
+
+> ifconfig
+
+apt-get upgrade
+apt-get dist-upgrade
+apt remove
+
--- a/applications/lofar2/doc/prestudy/station2_opc_ua.txt
+++ b/applications/lofar2/doc/prestudy/station2_opc_ua.txt
@@ -10,6 +10,7 @@ OPC-UA = OPC Unified Architecture
 https://opcfoundation.org/
 http://wiki.opcfoundation.org/index.php/UA_Overview
 https://en.wikipedia.org/wiki/OPC_Unified_Architecture
+https://opcfoundation.org/about/opc-technologies/opc-ua/ -- functions in OPC-UA

 - Service oriented architecture (SOA) using asynchronous request/response pattern
 - transport: via TCP in binary or web based

--- a/applications/lofar2/doc/prestudy/station2_sdp_dsp.txt
+++ b/applications/lofar2/doc/prestudy/station2_sdp_dsp.txt
@@ -18,6 +18,15 @@ The prototype FIR filter can be regarded as a window function. For a static FFT

 For the oversampling PFB the FFT is calculated every M input samples, where the oversampling factor R_os = N_fft / M. Note that the oversampling PFB increases the subband sample rate f_sub_os = f_sub * R_os, but not the subband frequency grid. The subband frequency grid is n * f_sub, for any R_os, because the downsampling factor N_fft is the same for any R_os.

+Fsub
+- spectral inversion
+- wpfb_unit_dev : g_wpfb, fft_r2_pipe
+- The tb_tb_wpfb_unit_wide verifies multiple variations of wpfb_unit_dev.
+- Try one instance so that the FIR coefs are used for all streams.
+- mms_dp_gain_serial_arr : calibrate subband weights
+- select subbands before or after calibrating weights, dp_switch needs 1 cycle gap between blocks.
+- SST outside wpfb
+- use MM master mux to select between MM access and UDP offload, when UDP offload is enabled then do not do MM access.
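The oversampling PFB relations described above (R_os = N_fft / M, f_sub_os = f_sub * R_os) can be checked numerically. This is a minimal sketch, assuming f_adc = 200 MHz and N_fft = 1024 (consistent with T_sub = 5120 ns at 200 MHz). The value M = 864 is only an illustrative choice and not taken from these notes, and the function name is hypothetical.

```python
from fractions import Fraction


def pfb_rates(f_adc_hz, n_fft, m):
    """Return (R_os, f_sub, f_sub_os) for an FFT calculated every m input samples.

    Hypothetical helper for illustration only.
    """
    r_os = Fraction(n_fft, m)          # oversampling factor R_os = N_fft / M
    f_sub = Fraction(f_adc_hz, n_fft)  # subband frequency grid spacing, independent of M
    f_sub_os = f_sub * r_os            # oversampled subband sample rate
    return r_os, f_sub, f_sub_os


# Critically sampled PFB: M = N_fft, so R_os = 1
crit = pfb_rates(200_000_000, 1024, 1024)

# Oversampled PFB with an assumed M = 864, which gives R_os = 32/27
over = pfb_rates(200_000_000, 1024, 864)

print("R_os      :", crit[0], over[0])
print("f_sub     :", float(crit[1]), float(over[1]), "Hz")
print("f_sub_os  :", float(crit[2]), float(over[2]), "Hz")
```

The subband grid spacing f_sub is the same in both cases, only the subband sample rate changes with R_os.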


 *******************************************************************************

--- a/applications/lofar2/doc/prestudy/station2_sdp_firmware_design.txt
+++ b/applications/lofar2/doc/prestudy/station2_sdp_firmware_design.txt
@@ -8,6 +8,8 @@ The System Engineering breaks up the product into sub products until the sub pro
 <-- Design decisions for product ADD
  + = Decision document for product ADD
 --> Sub products in ADD
+  * = Sub Product ADD
+  etc.

 * L2 Station ADD
  <-- L2 STAT Design decisions
@@ -151,14 +153,94 @@ Designs:

 References:
 - Preliminary design txt files:
-  . station2_sdp_m_and_c.txt        : Monitoring and control, Gemini protocol
  . station2_sdp_timing.txt         : Station BSN, timestamp definition, BSN aligner
-  . station2_sdp_ring.txt           : ring access, packets for beamlets, crosslets, subbands, TB readout
  . station2_sdp_dsp.txt            : beamformer, subband correlator, transient buffer, transient detection, subband offload
  . station2_sdp_icd.txt            : ICD
  . station2_sdp_hdl_components.txt : rework existing HDL components for LOFAR2.0
  . station2_sdp_hdl_article.txt    : reference article on RTL design using RL = 0, state and pipelining, AXI4 streaming

- Other:
+
+References:
+
+station2_sdp_srs : --> L3 SDP product
+- List of L2 Station requirements (from Station ADD at PDR) that map on SDP
+
+station2_sdp_icd :
+- SE explanation of ICD
+- Lists of items per ICD:
+  --> L2 ICD STAT-NW
+  --> L2 STAT-CEP
+  --> L3 SC-SDP
+  
+args_next_steps
+
+station2_opc_ua  : --> L4 SDP Translator product (almost DONE)
+- OPC-UA standard
+- Architecture of SDP Translator
+- Control points, monitor points, functions (?)
+
+station2_sdp_timing
+--> L2 STAT decision Timing in Station (DONE)
+--> L3 SDP decision Timing in SDP (DONE)
+
+station2_sdp_ring : --> L5 SDPFW product Ring
+- 10GbE
+- data types: beamlets, subbands offload, crosslets, transient readout
+- use store and forward
+- use raw ETH
+- ring access and transport schemes
+- ring direction per type of data
+- remote and local data alignment
+- packet sizes, data rates, R_os
+- Beamformer (BF)                       --> L5 SDPFW product BF
+- Subband correlator (XC), X_sq cell    --> L5 SDPFW product XC
+- Subband offload (SO) for AARTFAAC2.0
+- Transient buffer (TBUF) readout       --> L5 SDPFW product TBUF
+- UDP offload resource usage
+
+dupllo_oversampled_subband_filterbank
+--> L2 STAT decision Oversampled Filterbank (L2SDP-64 process review)
+
+station2_sdp_dsp : 
+- Fsub  --> L5 SDPFW product Fsub  (DONE)
+- BF    --> L5 SDPFW product BF
+- TBUF  --> L5 SDPFW product TBUF
+
+station2_sdp_hdl_components : 
+- Rx input status: dp_bsn_monitor with latency monitoring  --> L5 SDPFW product Ring
+- DP encoder / decoder: dp_packet_enc with CRC              --> L5 SDPFW product Ring
+- dp_validate_crc (uses dp_store_and_forward)              --> L5 SDPFW product Ring
+- dp_validate_bsn_at_sync                                  --> L5 SDPFW product Ring
+- BSN aligner dp_bsn_align_v2 --> L5 SDPFW product Ring  --> L6 SDPFW product Ring
+- Fill FIFO --> L5 SDPFW product Ring
+- Reorder with dual page  --> L5 SDPFW product TBUF
+- Synchronous global reset --> L2SDP-61,62
+
+station2_sdp_hdl_article.txt
+- Reference article on RTL design using RL = 0, state and pipelining, AXI4 streaming
+- RL 0 development article and automatic pipelining tools
+
+station2_semi_float32
+- 32b statistics in LOFAR1, use 64b in LOFAR2.0
+
+station2_sdp_m_and_c :
+- M&C explanation  --> ICD SC-SDP
+- Update beamlet weights --> L2 STAT decision beamlet weights
+- Monitoring interval:
+  . asynchronous
+  . every sync
+  . at single BSN scheduler event
+  . at periodic BSN scheduler event
+- Requirements: self-test, health-test, operational aspects
+- List of APERTIF registers (asynchronous, synchronous, single page, dual page)
+- UDP, TCP, sockets  --> L3 SDP decision FPGA M&C protocol
+
+WP 5 SDP plan:  --> https://support.astron.nl/confluence/display/STAT/WP-5+SDP
+- station2_sdp_firmware_planning : about planning, SDP planning and tasks, LTS, DTS, PTS, UniBoard2c
+- station2_sdp_deliverables list of WP 5 SDP deliverables in ASCII from WP 5 SDP plan
+- UniBoard2c planning : L2SDP-42
+
+Other:
  . tools/oneclick/doc/desp_firmware_dag_erko.txt
  . tools/oneclick/doc/desp_firmware_overview.txt
+  . desp_howtools_erko.txt
\ No newline at end of file
--- a/applications/lofar2/doc/prestudy/station2_sdp_firmware_planning.txt
+++ b/applications/lofar2/doc/prestudy/station2_sdp_firmware_planning.txt
@@ -2,8 +2,8 @@
 * Rules
 *******************************************************************************

-1) Continuously plan increment of 4 sprint ahead
-   After initial planning for thge whole project (at PDR) it remains necessary
+1) Continuously plan an increment of 4 sprints (= 1 increment) ahead
+   After initial planning for the whole project (at PDR) it remains necessary
   to keep on adapting / fine tuning the planning per quarter, so about 4
   sprints ahead. This concerns not only time but also expectations, interfaces
   and work
@@ -51,7 +51,7 @@ This then means that with the SDP work starting 1 jan 2020 it can complete mid
 1) Lab Test Station (LTS) - First-light May 2020
 Objectives: Verification of (parts of) individual elements and their 
            interfaces
- 1 UniBoard2 Rev 2 (use different FPGA on same UniBoard for FW, SW tests)
+- 1 UniBoard2b Rev 2 (use different FPGA on same UniBoard for FW, SW tests)
 Setups for:
 - SW
 - FW
@@ -63,7 +63,7 @@ Setups for:
 Objectives: Verify that a complete signal chain using the first iteration of
            L3 hardware design shows no serious issues and that it can be
            reliably installed in a LOFAR station.
- 1 UniBoard2 Rev 2
+- 1 UniBoard2c Rev 3a
 - First iteration of electronic boards --> 2 UniBoard2 Rev 3a

 3) Prototype Test Station (PTS) - First-light May 2021
@@ -71,7 +71,7 @@ Objectives: Verify Station L2 requirements through testing and analysis, and
            provide evidence to the CDR review panel that the designs ensure
            compliance with all L2 requirements.
 - Second iteration of electronic boards --> 4 UniBoard2 Rev 3b
- 4 UniBoard2 Rev 3b in two subracks (one for LBA with 32 RCU2, one for HBA
+- 4 UniBoard2c Rev 3b in two subracks (one for LBA with 32 RCU2, one for HBA
  with 32 RCU2)
 - Output to CEP for correlation with other stations

@@ -586,6 +586,10 @@ all    12-2021  CDR       M Complete SDP document package for Station CDR
  So the difference is -10 weeks, which means that the 2019 PDR is about -10
 / 230 = 5 % more time than the 2018 AAD estimate.

+- 2020-jul
+  Planning differences occur due to:
+  - pre PDR work was not budgeted (L2 Station work)
+  - SC-SDP Translator work was not budgeted
  
 *******************************************************************************
 * SDP effort estimates in LOFAR2.0 Station WP5 (since jan 2020)

--- a/applications/lofar2/doc/prestudy/station2_sdp_hdl_components.txt
+++ b/applications/lofar2/doc/prestudy/station2_sdp_hdl_components.txt
@@ -660,10 +660,33 @@ Obsolete investigations:


 *******************************************************************************
-* Reorder 
+* Reorder (RP1399)
 *******************************************************************************
-    . Page swap (needed for TB)
-    . Variable output size
+
+ APERTIF needs reorder_matrix to implement R_sub, because it has 8 inputs and 16 outputs:
+ - The 8 inputs are subbands from signal inputs A,B on P_sub = 4 streams and from signal inputs C,D on P_sub = 4 streams.
+ - The 16 outputs are subbands from A,B,C,D on 16 streams, so each stream has 1/16-th of the total band.
+ 
+    reorder_matrix (= ss_parallel):
+    g_nof_inputs        g_nof_internals      g_nof_outputs
+	                    = g_wb_factor
+    --> reorder_row --> reorder_col_wide --> reorder_row -->
+	
+ APERTIF uses reorder_col_wide for R_beam, to select and replicate the 6 subbands per 40 beams	
+ APERTIF uses reorder_matrix for R_beamout, to redistribute the output beamlets
+
+ For LOFAR2.0 SDP R_beam (and R_beamout) are not needed, because SDP only makes one set of beamlets, equivalent to one 
+ compound beam in APERTIF.
+ For LOFAR2.0 SDP only reorder_col_wide is needed to implement R_sub, because it has the same number of S_PN/Q_fft = 6 
+ inputs and 6 outputs, and because the outputs remain on a single node so no need to distribute per band.
+ 
+ reorder_col_wide contains g_wb_factor instances of reorder_col
+ reorder_col:
+   --> reorder_store --> u_store_buf = common_paged_ram_r_w --> reorder_retrieve -->
+                  MM --> u_select_buf = common_ram_crw_crw  -->
+ New features for SDP:
+   . Page swap for u_select_buf (needed for TB)
+   . Variable output size ?

    
 *******************************************************************************

--- a/applications/lofar2/doc/prestudy/station2_sdp_icd.txt
+++ b/applications/lofar2/doc/prestudy/station2_sdp_icd.txt
@@ -51,19 +51,9 @@ UDP link control
 > ping <IP address>  # to find MAC address for IP address ?


-###################################################################################################
-# L1 ICD 11109 STAT-CEP

-Included:
- A) Beamlet data
- B) Transient buffer read out
- 
-Not included:
- . SST, BST, XST, because these are for monitoring and calibration, not for science data
- . Subband offload for AARTFAAC2.0 will have own EICD
-
-
-LFAA-CSP_Low : OSI (Open Systems Interconnection) layers
+###################################################################################################
+# LFAA-CSP_Low : OSI (Open Systems Interconnection) layers

 7 Application  : Not applicable, this is the level where the STAT and CEP products each perform their
                 allocated functions.
@@ -107,67 +97,250 @@ LFAA-CSP_Low : OSI (Open Systems Interconnection) layers
 1 Physical  :
  - Ethernet standard [IEEE Std 802.3-2015], 40 GbE

+###################################################################################################
+# APERTIF / ARTS
+
+# APERTIF BF
+
+  47b reserved
+   1b sync
+  64b BSN
+
+# APERTIF X
+
+  1 Byte, Marker (= 120)
+  1 Byte, Version (= 1)
+  2 Byte, Beamlet index (0-16383)
+  2 Byte, Channel index (0-63)
+  2 Byte, Reserved
+  8 Byte, Timestamp
+  24 Byte, Flags (0-23, 1 bit per pol/dish)
+  
+  --> Total 40 Bytes
+
+# Arts SC2
+
+  1 Byte, Marker (= 120)
+  1 Byte, Version (= 1)
+  1 Byte, Source id (0-15 from 16 UniBoard2)
+  1 Byte, TAB index (0-11)
+  2 Byte, number of channels per block
+  2 Byte, Number of blocks per packet
+  8 Byte, Timestamp
+  24 Byte, Flags (0-23, 1 bit per pol/dish)
+  
+  --> Total 40 Bytes
+
+# Arts SC3,4
+
+  1 Byte, Marker (= D0,1,2,3, E0,1,2,3)
+  1 Byte, Version (= 1)
+  1 Byte, CB index (0-39)
+  1 Byte, TAB index (0-11)
+  2 Byte, Channel index (0-1535)
+  2 Byte, Application payload size (6250, 8000 bytes)
+  8 Byte, Timestamp
+  1 Byte, Sequence number
+  7 Byte, Reserved
+  24 Byte, Flags (0-23, 1 bit per pol/dish)
+  
+  --> Total 48 Bytes
+
+
+###################################################################################################
+# LOFAR_ASTRON_ICD_009_RSP_CEP
+
+1 Byte, VERSION_ID
+2 Byte, SOURCE_INFO : BM 2b, 160/200M 1b, payload OK 1b, RSP_ID 5b
+1 Byte, CONFIGURATION_ID
+2 Byte, STATION_ID
+1 Byte, NOF_BEAMLETS_PER_BANK
+1 Byte, NOF_BLOCKS
+4 Byte, TIMESTAMP
+4 Byte, BSN
+
+--> Total 16 Bytes
+
+###################################################################################################
+# L1 ICD 11109 STAT-CEP
+
+Included:
+ A) Beamlet data
+ B) Transient buffer read out
+ 
+Not included:
+ . SST, BST, XST, because these are for monitoring and calibration, not for science data
+ . Subband offload for AARTFAAC2.0 will have own EICD

 A) STAT-CEP Beamlet data interface:

- VERSION_ID 8b
+Do use:
+- marker  like in APERTIF, ARTS and SPEAD (magic)
+- version like in APERTIF, ARTS and SPEAD
+- source info that relates to SDP setting
+- observation_id to relate to parset with telescope setting
+- station_id
+- nof beamlets per block to allow 
+- nof blocks per packet to optimize use of jumbo frames
+
+LOFAR1 supported beamlet bit modes 16b, 8b and 4b by packing these beamlets into 16b, so 1 16b, 2 8b or 4 4b beamlet
+values per 16b word. This creates 1, 2, or 4 beamsets called banks. Packing the beamlets from different
+beamsets per 16b word made sense because these banks were already distinguished at the AP and on the ring.
+By default LOFAR2.0 only has 8b mode and 1 beam set of S_sub_bf = 488 beamlets. Future LOFAR2.0 could support more
+beamsets. At the PN and on the ring these beamsets would be treated independently and will always use W_beamlet_sum
+= 18b independent of the beamlet bit mode. Therefore in LOFAR2.0 packing different beamsets per 8b word in a payload makes
+less sense. Instead it is better to use more blocks per packet to have a sufficiently large payload for the 4b and 2b
+beamlet bit modes. Hence in LOFAR2.0 the extra beamlets that can be transported for the lower beamlet bit modes are
+treated as independent beamsets, similar to an extra 8b beam set. The beamlet index follows from:
+
+  bf[set]_[t][blk][blet][pol] --> global beamlet index = set * 488 + blet
+ 
+   where t is the BSN of the first block in the packet and blk is the block index in the packet, and
+   there are NOF_BLOCKS_PER_PACKET blocks in the packet.
+
+Hence in LOFAR2.0 it is then possible to output e.g. only one 4b beamlet bit mode beam set to CEP, instead of two,
+because the 4b are packed into payloads per beam set, not from different beamsets. 
+
+LOFAR1 used 4 lanes to output the beamlets. Each lane carried 1/4 of the beamlets, so beamlets i:4:244 on lane i.
+These lanes are useful to distribute the beamlets to different processing destinations at CEP. For Cobalt it would
+be optimum to have 22 destinations, because it has 22 processing input nodes. The beamlet index follows from:
+
+  bf[set][lane]_[t][blk][bl][pol] --> global beamlet index = set * 488 + bl * 4 + lane
+  
+    where lane = 0:3, bl = 0:121  (488/4 = 122)
+	
+In LOFAR2.0 the number of lanes does not depend on the number of physical lanes on the ring. Therefore the number
+of lanes can be different than 4. With S_sub_bf = 488 = 2*2*2*61 there can be 1, 2, 4 or 8 lanes with equal number
+of beamlets per lane, respectively 488, 244, 122 or 61. With fewer beamlets per lane the NOF_BLOCKS_PER_PACKET needs
+to be increased to have sufficiently large packets (< 9000 octets). Other numbers of lanes are feasible too,
+but will result in different numbers of beamlets per packet. For example using 22 lanes yields 18 * 22 + 4 * 23 = 
+488, so 18 streams with 22 beamlets per packet and 4 streams with 23 beamlets per packet.
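The lane distribution and the global beamlet index formula above can be sketched as follows. This is only an illustration under the assumption that beamlet b is placed on lane b % NOF_LANES (which is what index = set * 488 + bl * NOF_LANES + lane implies). The function names are hypothetical.

```python
S_SUB_BF = 488  # number of beamlets per beamset, from the text


def global_beamlet_index(set_index, lane, bl, nof_lanes):
    """Global beamlet index = set * 488 + bl * nof_lanes + lane."""
    return set_index * S_SUB_BF + bl * nof_lanes + lane


def beamlets_per_lane(nof_lanes):
    """Number of beamlets on each lane when beamlets are strided over the lanes."""
    base, extra = divmod(S_SUB_BF, nof_lanes)
    # The first 'extra' lanes carry one beamlet more than the others
    return [base + 1 if lane < extra else base for lane in range(nof_lanes)]


def all_indices(set_index, nof_lanes):
    """All global beamlet indices of one beamset, collected lane by lane."""
    counts = beamlets_per_lane(nof_lanes)
    return [global_beamlet_index(set_index, lane, bl, nof_lanes)
            for lane in range(nof_lanes)
            for bl in range(counts[lane])]


for nof_lanes in (1, 2, 4, 8, 22):
    counts = beamlets_per_lane(nof_lanes)
    print(nof_lanes, "lanes:", sorted(set(counts)), "beamlets per lane")
```

For 22 lanes this reproduces the 4 lanes with 23 beamlets and 18 lanes with 22 beamlets mentioned above, and every beamlet index of the beamset occurs exactly once.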
+
+In LOFAR1 the 4 lanes are physical and used in a staggered way such that each lane has its own beamformer that 
+outputs on a different RSP board. In LOFAR2.0 only one UniBoard2 PN does the output, so the lanes cannot be
+staggered. Therefore the distribution over the lanes is done in the final PN and requires internal RAM to be
+able to assemble the beamlet output packet for each lane, before they can be sent. This requires a double buffer,
+so about 2 * NOF_LANES * packet size number of octets of RAM. For example 2 * 16 * 8kB / 2 kB = 128 M20K BRAMs.
+The FPGA has 2713 M20K, so this is 128/2713 ~= 5% of the internal BRAM resources.
+
+The total number of streams to CEP then becomes NOF_BEAMSETS * NOF_LANES.
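The double buffer estimate above is simple arithmetic and can be repeated for other numbers of lanes. A rough sketch, assuming 8 kB per packet buffer, 2 kB usable per M20K and 2713 M20K in the FPGA as stated in the text. The function name is hypothetical.

```python
NOF_M20K_IN_FPGA = 2713  # from the text


def double_buffer_brams(nof_lanes, packet_bytes=8 * 1024, bram_bytes=2 * 1024):
    """Number of BRAMs for a double buffered output packet per lane."""
    return 2 * nof_lanes * packet_bytes // bram_bytes


for nof_lanes in (4, 8, 16, 22):
    n = double_buffer_brams(nof_lanes)
    print(f"{nof_lanes:2d} lanes: {n:3d} M20K = {100 * n / NOF_M20K_IN_FPGA:.1f} %")
```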
+ 
+- 1 Byte, MARKER
+  . Like in APERTIF and ARTS, may be useful to quickly recognize the data packet.
+  . Beamlets : 20
+  . Transient: 21
+  . SST      : 22
+  . BST      : 23
+  . XST      : 24
+  
+- 1 Byte, VERSION_ID
  . 2,3,4 for LOFAR1
  . 5 first for LOFAR2.0
  
- SOURCE_INFO 16b
-  . 2b Array ID (core station 1 LBA, 2 HBA, ...)
-  . 1b f_adc = 200 MHz, 160 MHz
-  . 1b critically PFB, oversampled PFB (or p, q for R_os = p/q)
-  . 4b beamlet width in number of bits (default 8 for W_beamlet = 8 bit, instead of BM = beamlet mode)
-  . 5b UniBoard2 FPGA id (16 FPGAs for LBA, 16 for HBA in International Station, instead of RSP ID)
-  . ==> Also beamlet scale setting
-  . ==> Number of antenna in beam (core, LBA, HBA inner to make HBA international look like HBA remote)
-  
- CONFIGURATION_ID 8b (used in LOFAR1? intended to refer to the parset that defines this observation)
-  ==> observation ID 32b
+- 4 Byte, OBSERVATION_ID
+    Instead of CONFIGURATION_ID 8b (used in LOFAR1? intended to refer to the parset that defines this observation)
+    The observation ID provides the hook to information on e.g. RCU mode, f_adc = 200 MHz, 160 MHz, Nyquist zone
+    (0, 1, 2), critically PFB, oversampled PFB, nof antenna in array (core, LBA, HBA inner to make HBA international look like HBA remote), maximum S_ant = 192.
+  . etc

- STATION_ID 16b (idem as LOFAR1)
+- 2 Byte, STATION_ID (idem as LOFAR1)
  ==> or 8b because there are only ~50 stations
-
- One packet per range of Station beamlets out of 488 beamlets
-  . Full band : S_sub_bf * W_beamlet * N_complex / W_byte = 488 * 8b * 2 / 8b = 976 octets
-  . NOF_BEAMLETS_PER_BANK not needed anymore
-  . nof_streams = Number of beamlet streams
+  ==> use 16b to fit number from station name (e.g. CS001, LV614, see list of stations at
+      https://proxy.lofar.eu/array_status/STATIONS/HTML/cs011/index.html)
+
+- 4 Byte, SOURCE_INFO
+    Only include info that can be inserted by SDP, without explicit write by SC. Therefore e.g. RCU mode,
+    Nyquist zone, nof antenna in array are not included.
+  . 1b f_adc = 200 MHz, 160 MHz, sample rate
+  . 1b t_pfb = PFB type, 0 critically PFB, 1 oversampled PFB (rather than p, q for R_os = p/q)
+  . 1b payload_ok, 0 payload ok, 1 one or more blocks in payload have data errors
+       - no need for indicator bit per block, assuming errors are rare and will result in loss of
+         multiple blocks anyway
+  . 5b beamlet_width in number of bits
+       - Instead of BM = beamlet mode
+       - Default 8 for W_beamlet = 8 bit
+       - Use 5 bits to even fit the 16b mode like in LOFAR1
+  . 6b pn_id = UniBoard2 FPGA ID
+       - Instead of RSP_ID in LOFAR1
+       - 16 FPGAs for LBA, 16 for HBA in International Station, so maximum 32, but use one bit extra
+       - The pn_id implicitly also reveals the antenna array ID (core station 1 LBA, 2 HBAS, 3 HBA0, 4 HBA1, ...)
+         Therefore it is not necessary to define an explicit antenna ARRAY_ID field that would need to be
+         filled in by SC.
+  . 12b beamlet_scale
+    - 18b --> 8b, scale = 1 yields lowest bits, scale = 1024 (= 11b) yields highest bits
+    - 18b --> 4b, scale = 1 yields lowest bits, scale = 4096 (= 13b) yields highest bits
+    - scale = 1 --> suitable if only one antenna input was used for the beamlet
+    - scale = 12, 24, 48, 96 --> to account for number of antennas in beam 
+    - scale > 96 --> to have more dynamic range, but less sensitivity. More dynamic range only makes
+                     sense in 8b mode (or 16b mode, but not in 4b or 2b mode), therefore given the
+                     18b beamlet sum the maximum scale = 1024.
+    - In SDP the beamlet scale function extracts the lowest 8b from the 18b beamlet sum, after having
+      multiplied the beamlet sum by 1/scale. Internally the beamlet scale function uses an 18b 
+      unsigned representation of the 1/scale fraction, so 2**18 / scale. This yields:
+      scale = 1    --> 262144
+      scale = 96   -->   2731
+      scale = 1024 -->    256
+
+- 2 Byte, BEAMLET_INDEX = SET_INDEX * NOF_BEAMLETS_PER_SET + bl * NOF_LANES + LANE_INDEX
+  . NOF_BEAMLETS_PER_SET = 488
+  . SET_INDEX in range(number of beamsets, currently 1 beamset per antenna array)
+  . NOF_LANES 8b
+  . LANE_INDEX 8b in range(NOF_LANES)
+  . global beamlet index of first beamlet in block
+    -   0: 487 for beamset 0
+    - 488: 975 for beamset 1, etc
+    - can fit maximum 2**16 / 488 = 134 beamsets
+
+  . stream index = SET_INDEX * NOF_LANES + LANE_INDEX
    - Separate destination address per stream
-    - LOFAR1 supports 4 streams
+    - no need to have a stream index field, because the CEP only needs to know the beamlet index.
+    - the beamlet index for each sample follows from BEAMLET_INDEX, BEAMLET_STEP, NOF_BEAMLETS_PER_BLOCK
+      and NOF_BLOCKS_PER_PACKET
+    - LOFAR1 supports 4 streams (4 lanes from RSP ring, staggered so rsp_id identifies lane)
    - LOFAR2.0 preferably supports >> 4 streams
-      - beamlet_id to identify start beamlet in stream (provides more info than a stream ID)
-      - NOF_BEAMLETS_PER_BLOCK to identify range of beamlets from beamlet_id
-      - LOFAR1: beamlet_id = 0 and NOF_BEAMLETS_PER_BLOCK = 61 (dual pol beamlets, 4 streams):
+    - LOFAR2.0 preferably outputs only 1 stream
+    - CEP with N processing nodes would like N streams, Cobalt has N = 22
+    - S_sub_bf = 488 = 2*2*2* 61, so only NOF_LANES = 1, 2, 4, and 8 yield a fixed integer number
+      of NOF_BEAMLETS_PER_BLOCK.
+
+    
+- 1 Byte, BEAMLET_STEP
+  . Index increment of subsequent beamlets in block
+  . BEAMLET_STEP = NOF_LANES
+  ? Is it useful to support BEAMLET_STEP = NOF_LANES > 1 at SDP but < 22 which is optimum for CEP?
  
- NOF_BLOCKS 16b in payload
+  
+- 2 Byte, NOF_BEAMLETS_PER_BLOCK
+   . Equals floor or ceil of NOF_BEAMLETS_PER_SET / NOF_LANES dependent on LANE_INDEX,
+     so redundant if all beamlets are sent, but could be used to send fewer beamlets.
+   . Instead of NOF_BEAMLETS_PER_BANK in LOFAR1
+   . LOFAR1 NOF_BEAMLETS_PER_BLOCK = 61 (dual pol beamlets, 4 streams):
+   . Maximum NOF_BEAMLETS_PER_BLOCK when NOF_LANES = 1:
+     W_beamlet = 8b : N_pol * S_sub_bf = 2 * 488 =  976 beamlets, * N_complex = 1952 octets
+     W_beamlet = 4b :                              1952 beamlets
+     W_beamlet = 2b :                              3904 beamlets
+  
+- 1 Byte, NOF_BLOCKS_PER_PACKET
  . Multiple beamlet time slots in one packet to increase payload efficiency.
-  . For W_beamlet = 8 bit there can be maximum 9 blocks per payload (9 * 976 = 8784 octets < 9000)
-  . With nof_streams >> 4 the NOF_BLOCKS can become larger, therefore use 16b. For example:
-    - NOF_BEAMLETS_PER_BLOCK = S_sub_bf / nof_streams = 488 / 32 = 16
-    - NOF_BEAMLETS_PER_BLOCK * W_beamlet * N_complex / W_byte = 16 * 8b * 2 / 8b = 32 octets
-    - 9000 / 32 = 281 > 256 --> use 16b for NOF_BLOCKS
-    - nof_streams = 22 destination nodes, each with 8k Byte payload, possibly a double buffer:
-      22 * 8 kByte * 2 = 352 kByte = 176 BRAM (1 BRAM = 2 kByte, FPGA has 2713 BRAM)
-    - 488 / 22 = 22.18, so 488 = 4 * 23 + 18 * 22
-  . Only send correct data to CEP (so no need for SOURCE_INFO/payload error bit).
-  . How to handle blocks that got lost within the Station?
-
- TIMESTAMP 64b (instead of 32b seconds TIMESTAMP and 32b BLOCK_SEQUENCE_NUMBER within second)
-  . A 64 bit timestamp in 0.2 ns resolution since t_base = 1970 for first block in payload:
-    - to fit both T_adc = 5 ns and 6.4 ns
-    - for 116 year span since t_base = 1970 --> 2086
-
- BLOCK_PERIOD 16b
-  . bit block period in 0.2 ns resolution
-  . 2**16 * 0.2 ns = 13.1 us block period (block rate > 76 kHz) fits T_sub
-  
- BSN 64b
-  . Block sequence number since t_base = 1970 of first block in payload, increments by 1 for every block
-  . Used to detect lost blocks and to align blocks from different stations
-  
-
- TX_PACKET_COUNT 32b
+  . Maximum NOF_BLOCKS_PER_PACKET is about 4 * NOF_LANES, because:
+    NOF_LANES = 1: 4 --> 4 * 1952 = 7808 octets < 9000 Jumbo
+  . LOFAR1 has 4 streams (lanes) and 16 blocks per packet
+  . LOFAR1 has payload ok bit in SOURCE_INFO to indicate that at least one block in the packet
+    has incorrect data  
+
+- 8 Byte, BSN
+  . 50b Block Sequence Number
+    - Instead of 32b seconds TIMESTAMP and 32b BLOCK_SEQUENCE_NUMBER within second of LOFAR1
+    - Block Sequence Number (BSN) used to detect lost blocks and to align blocks from different stations
+    - BSN unit T_sub, 50b yields > 100 year span (1970 - 2070)
+    
+- 2 Byte, BLOCK_PERIOD
+  . 13b Subband period T_sub in ns resolution, 5120 ns @ 200 MHz, R_os = 1
+  
+--> Total 1 + 1 + 4 + 2 + 4 + 2 + 1 + 2 + 1 + 8 + 2 = 28 Bytes
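The proposed 28 byte header can be tried out with the Python struct module. This is a sketch only: the field order follows the list above, but the big-endian (network) byte order is an assumption, SOURCE_INFO is kept as one opaque 32b word because the bit positions of its sub fields are not defined here, and the example values are made up.

```python
import struct

# Assumed layout: big-endian, no padding, fields in the order of the list above
HEADER_FORMAT = ">BBIHIHBHBQH"
HEADER_FIELDS = (
    "marker",                  # 1 Byte
    "version_id",              # 1 Byte
    "observation_id",          # 4 Byte
    "station_id",              # 2 Byte
    "source_info",             # 4 Byte, opaque here
    "beamlet_index",           # 2 Byte
    "beamlet_step",            # 1 Byte
    "nof_beamlets_per_block",  # 2 Byte
    "nof_blocks_per_packet",   # 1 Byte
    "bsn",                     # 8 Byte
    "block_period",            # 2 Byte
)


def pack_header(**fields):
    """Pack the header fields into 28 bytes (hypothetical helper)."""
    return struct.pack(HEADER_FORMAT, *(fields[name] for name in HEADER_FIELDS))


def unpack_header(data):
    """Unpack 28 header bytes into a dict (hypothetical helper)."""
    return dict(zip(HEADER_FIELDS, struct.unpack(HEADER_FORMAT, data)))


# Made-up example values: beamlet marker 20, version 5, T_sub = 5120 ns
example = pack_header(marker=20, version_id=5, observation_id=123456, station_id=1,
                      source_info=0, beamlet_index=0, beamlet_step=1,
                      nof_beamlets_per_block=488, nof_blocks_per_packet=4,
                      bsn=0, block_period=5120)
print(struct.calcsize(HEADER_FORMAT), "bytes:", example.hex())
```

The calcsize result gives a quick check on the byte total of the field list.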
+  
+Remark:
+  - TX_PACKET_COUNT
  ==> Not useful, because then CEP needs to count Rx packets. Better send filler packets to keep the
      packet rate at the nominal rate, so that any packet loss is due to the Network and already 
      clear at OSI 2 layer using lower level tools like Wireshark.
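The beamlet_scale values listed under SOURCE_INFO above (2**18 / scale) can be reproduced as follows. A minimal sketch, assuming the fraction is rounded to the nearest integer (this matches 2731 for scale = 96) and that the scaled result is obtained by a shift over 18 bits. Wrapping or clipping to the 8b output is not specified in these notes and is left out. The function names are hypothetical.

```python
W_FRACTION = 18  # width of the 1/scale fraction, from the text


def inv_scale(scale):
    """Unsigned representation of 1/scale as round(2**18 / scale)."""
    return round(2**W_FRACTION / scale)


def scale_beamlet_sum(beamlet_sum, scale):
    """Multiply the beamlet sum by 1/scale using integer arithmetic.

    The right shift floors the result, also for negative sums.
    """
    return (beamlet_sum * inv_scale(scale)) >> W_FRACTION


for scale in (1, 96, 1024):
    print(f"scale = {scale:4d} --> {inv_scale(scale):6d}")

print(scale_beamlet_sum(9600, 96))
```

Note that scale = 1 yields 2**18 = 262144, which needs 19 bits as a plain unsigned number, so an 18b representation would have to treat that value as a special case.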

--- a/applications/lofar2/doc/prestudy/station2_sdp_ring.txt
+++ b/applications/lofar2/doc/prestudy/station2_sdp_ring.txt
@@ -551,7 +551,7 @@ The beamformer function has the following sub functions:
       if transit node:
         - Encode beamlet sums packet to ring
       else:
-         - "Beamlet data output" : On output node scale and output final beamlet sums
+         - "Beamlet data output" : On output node scale and output final 8 bit beamlet sums
 - "Beamlet statistics (BST)": Calculate BST for beamlet sums, output node has final BST



--- a/applications/lofar2/doc/prestudy/station2_to_do_erko.txt
+++ b/applications/lofar2/doc/prestudy/station2_to_do_erko.txt
@@ -73,6 +73,7 @@ Vijf principes:
  

 - Use GIT
+- Rename master/slave, mosi/miso
 - Understand AXI4 streaming (versus avalon, RL =0)
  . wrap between AXI4 - Avalon for MM and DP
 - Global reset only on sosi info not on sosi data
@@ -98,6 +99,7 @@ Vijf principes:
    - polarization correction via subband weights is not needed, so X and Y can be on different PN
    - EMI between X and Y, but X and Y have only about 40 dB isolation
    - EMI between single pol inputs get suppressed dependent on the station digital beam pointing
+	- No need for pseudo random PFB input decorrelator function like in LOFAR1?
  . LBA inner signal inputs and LBA outer signal inputs on different subracks or arbitrary. Inner is not used
    Instead they use sparse odd and sparse even to have two more or less random antenna allocations.
  . HBA core station sub-array inputs on different Uniboard2 to reduce EMI
@@ -215,6 +217,13 @@ Station:
  . get
  . read
  
+* Alternatives to master/slave (mosi, miso)
+  . client / server --> cosi, ciso
+  . primary, main / secondary, replica, subordinate
+  . initiator, requester / target, responder
+  . controller, host / device, worker, proxy
+  . leader / follower
+  . director / performer


 *******************************************************************************

--- a/libraries/technology/jesd204b/tb_tech_jesd204b.vhd
+++ b/libraries/technology/jesd204b/tb_tech_jesd204b.vhd