diff --git a/applications/lofar2/doc/prestudy/station2_sdp_firmware_planning.txt b/applications/lofar2/doc/prestudy/station2_sdp_firmware_planning.txt
old mode 100644
new mode 100755
index 19cb2fb62f00fd13365e5f2459fdb11d95e86e88..a7533d965fc7b4dd24a07d483ed8b84cfdb10567
--- a/applications/lofar2/doc/prestudy/station2_sdp_firmware_planning.txt
+++ b/applications/lofar2/doc/prestudy/station2_sdp_firmware_planning.txt
@@ -1,219 +1,140 @@
-*******************************************************************************
-* SDP Firmware planning
-*******************************************************************************
-Includes design, implementation, verification on HW, technical commissioning.
-
-v1  v2
-       Infrastructure
-10  20   - Development environment using GIT, RadioHDL, updating existing components
-20   .   - BSP using Gemini Protocol, ARGS
-10   .   - Ethernet access (OSI 1-4)
-10  20   - Ring access
-
-       Application:
-15   .   - ADC ingress and time stamp
-20  10   - Subband filterbank (critically sampled)
- 0  30   - Subband filterbank (oversampled)
-10   .   - Beamformer
-20   .   - Subband correlator
-25   .   - Transient buffer (DDR4 interface, subband select and DM >= 0, packet format, M&C, RW access via M&C)
-20   .   - Transient detection
-20   .   - Subband offload
- 0   .   - 160 MHz
-
-35   . Integration
-     5   - FPGA pinning
-    10   - Interface test designs unb2c
-     5   - Design revisions and lab tests
-    15   - Technical commissioning
-
-
-1 week = 100% project allocation, bruto 40 hours, netto 40 * 0.8 = 32 hours = 4 days
-sprint = 100% project allocation, bruto  3 weeks, netto 12 days
-
-v1 : 10 + 20 + 10 + 10 + 15 + 20 + 10 + 20 + 25 + 20 + 20 + 35 = 215 bruto weeks --> 215 / 40 = 5.4 FTE ~ 3 people each 2 years
-v2 : 10 less for critically sampled PFB
-     10 more for updating existing components
-     10 more for ring access
-     30 for oversampled PFB
-      . consider unb2c test part of SDP FW integration and of SDP HW
-     15 technical commisioning relies on proper Systems Engineering, otherwise may become 50 weeks
-
-==> EK, JH: v1 estimate of April 2019 is still valid as v2 on 10 Oct 2019.
-
-v3 :
-
-   Infrastructure
-20   - Development environment using GIT, RadioHDL, updating existing components
- 5   - unb2c FPGA pinning
-10   - unb2c FPGA interface test designs
-20   - Board Support Package using Gemini Protocol and ARGS
-20   - Ring access
-10   - 10GbE access (OSI 1-4)
-
-   Application:
-15   - ADC input and time stamp
-10   - Subband filterbank (critically sampled)
-20   - Subband correlator
-10   - Beamformer
-25   - Transient buffer
-20   - Subband offload for AARTFAAC
-20   - Transient detection
-30   - Oversampled subband filterbank
- 0   - Support 160 MHz
-
-   Integration:
-10   - Lab tests
- 5   - Technical commissioning Dwingeloo
- 5   - Technical commissioning Prototype Station
-
-All:
-20 + 5 + 10 + 20 + 20 + 10 + 15 + 10 + 20 + 10 + 25 + 20 + 20 + 30 +  0 + 10 + 5 + 5 = 255
-
-No oversampled filterbank:
-20 + 5 + 10 + 20 + 20 + 10 + 15 + 10 + 20 + 10 + 25 + 20 + 20 +       0 + 10 + 5 + 5 = 225
-
-
-
-
-*******************************************************************************
-* SDP Workpackage (UniBoard2 HW + FW)
-*******************************************************************************
-
-Firmware FPGA images:
-- the SDP has one main firmware design unb2c_sdp,
-- the integrated design of SDP is revision unb2c_sdp_station,
-- per task there are revisions of unb2c_sdp that contain subsets of the SDP functionality,
- 
-
-Deliverables (D): items that are needed for a milestone
-Milestones (M) : 'cake moments' when you demonstrate deliverables
-- integration passed
-- review passed
-
-
-Tasks:
-
-INFRASTRUCTURE UniBoard2:
-  weeks  nr task
-     20  1) Maintain firmware development environment
-            - using GIT
-            - using RadioHDL
-            - updating existing VHDL library components
-            D=> Operational firmware development environment
-            D=> VHDL libraries verified in simulation
-
-         2) UniBoard2 board and test firmware
-            - unb2c board HW
-            D=> unb2c board detailed design document
-            D=> unb2c board schematic
-            D=> unb2c board layout
-            
-            M=> unb2c board detailed design document review (unb2b modifications)
-            M=> unb2c board schematic review
-            M=> unb2c board layout review (production ready)
-            M=> unb2c board lab validation using JTAG, unb2c_test designs OK
-            M=> unb2c board production validation using JTAG, unb2c_minimal_gmi OK
-            
-      5     - unb2c FPGA pinning design
-     10     - unb2c FPGA interface test designs
-            D=> unb2c_test design revisions (1GbE, 10GbE, DDR4, flash, ADC)
-            D=> unb2c_test_adc (read ADC samples from multiple inputs)
-            
-
-     20  3) UniBoard2 board support package (BSP)
-            - M&C by SCU via Gemini protocol
-            - M&C interface definition and generation using ARGS (doc, C, HDL)
-            D=> Gemini board for SCU M&C tests
-            D=> unb2c_minimal_gmi (1GbE, flash)
-            M=> unb2c_minimal_gmi validated using M&C by SCU (read design name)
-
-INFRASTRUCTURE SDP:
-     10  4) Network access via 10GbE
-            - Ethernet MAC, UDP/IPv4, ARP, ping
-            D=> 10GbE HDL component including support for UDP/IPv4, ARP, ping
-            D=> unb2c_10GbE
-            M=> unb2c_10GbE validated using data capture on PC and ping
-
-     20  5) Ring access using test data and BSN monitor
-            D=> unb2c_ring_combiner for BF
-            D=> unb2c_ring_multicast for XC
-            D=> unb2c_ring_endcast for SO, TB
-            M=> unb2c_ring revisions verified in simulation
-            M=> unb2c_ring revisions validated on hardware using M&C on SCU
-
-APPLICATION SDP documents:
-         6) Required documents
-            D=> Detailed design document of SDP firmware
-            D=> L1 ICD-11109 SDP-CEP: beamlet data protocol
-            D=> L1 ICD-11109 SDP-CEP: transient data protocol
-            D=> L2 ICD-11211 SC-SDP: FW register map and register definitions
-            D=> L2 ICD-11211 SC-SDP: UniBoard2 hardware M&C
-            D=> L2 ICD-11207 RCU2S-SDP: ADC interface
-            D=> L2 ICD-11209 STF-SDP: Time and frequency interface
-            D=> L2 ICD-11218 SDP-STCA: Subrack interface
-            
-            M=> SDP detailed design and interface documents ready for DDR
-            M=> SDP detailed design and interface documents updated for CDR
-            
-            D=> SDP firmware verification and maintenance document
-            M=> SDP all documents finished
-
-APPLICATION single node:
-  weeks  nr task
-     15  7) ADC input and timestamp (RCU2 interface)
-            ==> unb2c_sdp_adc_capture, read ADC or WG samples from databuffer via M&C
-            ==> unb2c_sdp_station (ADC)
-            
-
-M=> SDP ready for CDR
-    All major technical UniBoard2 hardware and SDP firmware risks are mitigated (by design and
-    based on validation with at least two UniBoard2 using JTAG, unb2c_minimal_gmi, unb2c_ring,
-    and unb2c_sdp_adc_capture).
-
-
-     10  8) Subband filterbank (Fsub)
-            ==> unb2c_sdp_filterbank to read SST via M&C
-            ==> unb2c_sdp_station (ADC + SST)
-
-APPLICATION multi node:
-  weeks  nr task
-     20  9) Subband correlator (XC)
-            ==> unb2c_sdp_correlator_one_node, read XST via M&C and create ACM for one node
-            ==> unb2c_sdp_correlator_multi_node, read XST via M&C and use ring to create complete ACM
-            ==> unb2c_sdp_station (ADC + SST + XST)
-
-APPLICATION multi node / network output:
-  weeks  nr task
-     10 10) Beamformer (BF)
-            ==> unb2c_sdp_beamformer_bst_one_node, read BST via M&C
-            ==> unb2c_sdp_beamformer_output_one_input, output to CEP for one input from one node
-            ==> unb2c_sdp_beamformer_output_one_node, output to CEP and sum one node
-            ==> unb2c_sdp_beamformer_output_multi_node, output to CEP and use ring to sum nodes
-            ==> unb2c_sdp_station (ADC + SST + XST + BST + BF output)
-            ==> detailed design doc
-            
-     25 11) Transient buffer (TB)
-            ==> unb2c_sdp_transient_buffer revisions (ADC + SST + TB readout, M&C access DDR4)
-            ==> unb2c_sdp_station (ADC + SST + XST + BST + BF output + TB readout)
-            ==> detailed design doc
-
-     20 12) Transient detection (TD)
-            ==> unb2c_sdp_transient_buffer revisions (ADC + TD event)
-            ==> unb2c_sdp_station (ADC + SST + XST + BST + BF output + TB readout + TD event)
-            ==> detailed design doc
-
-     20 13) Subband offload (SO) for AARTFAAC2.0
-            ==> unb2c_sdp_subband_offload revisions (ADC + SST + SO, one node, all nodes via ring)
-            ==> unb2c_sdp_station (ADC + SST + XST + BST + BF output + TB readout + TD event + SO)
-            ==> detailed design doc
-
-INTEGRATION:
-  weeks  nr task
-     20 14) Station integration tests (using unb2c_sdp_station)
-            - Laboratory tests
-            - Technical commissioning Dwingeloo Test Station ("Huisje West")
-            - Technical commissioning Prototype Test Station
-            - Technical commissioning Pre-production Test Station
-
-
+*******************************************************************************
+* SDP Firmware planning
+*******************************************************************************
+Includes design, implementation, verification on HW, technical commissioning.
+
+v1  v2
+       Infrastructure
+10  20   - Development environment using GIT, RadioHDL, updating existing components
+20   .   - BSP using Gemini Protocol, ARGS
+10   .   - Ethernet access (OSI 1-4)
+10  20   - Ring access
+
+       Application:
+15   .   - ADC ingress and time stamp
+20  10   - Subband filterbank (critically sampled)
+ 0  30   - Subband filterbank (oversampled)
+10   .   - Beamformer
+20   .   - Subband correlator
+25   .   - Transient buffer (DDR4 interface, subband select and DM >= 0, packet format, M&C, RW access via M&C)
+20   .   - Transient detection
+20   .   - Subband offload
+ 0   .   - 160 MHz
+
+35   . Integration
+     5   - FPGA pinning
+    10   - Interface test designs unb2c
+     5   - Design revisions and lab tests
+    15   - Technical commissioning
+
+
+1 week = 100% project allocation, bruto 40 hours, netto 40 * 0.8 = 32 hours = 4 days
+sprint = 100% project allocation, bruto  3 weeks, netto 12 days
+
+v1 : 10 + 20 + 10 + 10 + 15 + 20 + 10 + 20 + 25 + 20 + 20 + 35 = 215 bruto weeks --> 215 / 40 = 5.4 FTE ~ 3 people each 2 years
+v2 : 10 less for critically sampled PFB
+     10 more for updating existing components
+     10 more for ring access
+     30 for oversampled PFB
+      . consider unb2c test part of SDP FW integration and of SDP HW
+     15 technical commissioning relies on proper Systems Engineering, otherwise may become 50 weeks
+
+==> EK, JH: v1 estimate of April 2019 is still valid as v2 on 10 Oct 2019.
+
+v3 :
+
+   Infrastructure
+20   - Development environment using GIT, RadioHDL, updating existing components
+ 5   - unb2c FPGA pinning
+10   - unb2c FPGA interface test designs
+20   - Board Support Package using Gemini Protocol and ARGS
+20   - Ring access
+10   - 10GbE access (OSI 1-4)
+
+   Application:
+15   - ADC input and time stamp
+10   - Subband filterbank (critically sampled)
+20   - Subband correlator
+10   - Beamformer
+25   - Transient buffer
+20   - Subband offload for AARTFAAC
+20   - Transient detection
+30   - Oversampled subband filterbank
+ 0   - Support 160 MHz
+
+   Integration:
+10   - Lab tests
+ 5   - Technical commissioning Dwingeloo
+ 5   - Technical commissioning Prototype Station
+
+All:
+20 + 5 + 10 + 20 + 20 + 10 + 15 + 10 + 20 + 10 + 25 + 20 + 20 + 30 +  0 + 10 + 5 + 5 = 255
+
+No oversampled filterbank:
+20 + 5 + 10 + 20 + 20 + 10 + 15 + 10 + 20 + 10 + 25 + 20 + 20 +       0 + 10 + 5 + 5 = 225
+
+
+
+
+*******************************************************************************
+* SDP Workpackage (UniBoard2 HW + FW)
+*******************************************************************************
+
+Changed tasks:
+- T4.6 : 20 weeks booked explicitly for Required documents
+- T4.1 : 10 weeks, because GIT, RadioHDL finished
+- T4.2 : 10 weeks, because some FW done
+- T4.2 : ? weeks, hardware effort
+
+
+Firmware FPGA images:
+- the SDP has one main firmware design unb2c_sdp,
+- the integrated design of SDP is revision unb2c_sdp_station,
+- per task there are revisions of unb2c_sdp that contain subsets of the SDP functionality,
+ 
+
+Deliverables (D) = an item, product : items that are needed for a milestone
+Milestones (M) = a moment in time, achievement : 'cake moments' when you demonstrate or review
+                 deliverables as part of a larger system
+- integration passed
+- review passed
+
+Planning for LOFAR2.0 Station Workpackage 4 : Station Digital Processing
+
+Below is the planning in weeks per task, the work includes:
+- UniBoard2 hardware 
+- Firmware that runs on UniBoard2
+
+weeks task   description
+10    T4.1   Maintain firmware development environment (GIT, RadioHDL, HDL libraries)
+10    T4.2   UniBoard2 test firmware (enable mass production of UniBoard2)
+ ?    T4.2   UniBoard2 board hardware
+20    T4.3   UniBoard2 board support package (BSP, M&C via Gemini Protocol, use ARGS for doc, C, VHDL)
+10    T4.4   Network access via 10GbE (support ARP and ping)
+20    T4.5   Ring access using test data and BSN monitor (support ring)
+20    T4.6   Required documents (SDP RS, detailed design, ICDs, FW manual)
+15    T4.7   ADC input and timestamp (RCU2 interface, capture timestamped data for offline analysis)
+10    T4.8   Subband filterbank (Fsub, critically sampled, SST)
+20    T4.9   Subband correlator (XC, one subband per 1 s integration)
+10    T4.10  Beamformer (BF, BST, beamlet output to CEP)
+
+10 + 10 + ? + 20 + 10 + 20 + 20 + 15 + 10 + 20 + 10 = 145 + ? weeks
+
+Milestone : SDP ready for CDR:
+All major technical UniBoard2 hardware and SDP firmware risks are mitigated:
+
+- by design
+- SDP hardware and interfaces validated with at least two UniBoard2 using JTAG, firmware for BSP,
+  ring and ADC
+- Station TD validated using BF beamlet output to CEP
+
+The remaining tasks concern completing the applications that the firmware needs to perform.
+
+weeks task   description
+25    T4.11  Transient buffer (TB, ADC data, subband data)
+20    T4.12  Transient detection (TDET)
+20    T4.13  Subband offload (SO) for AARTFAAC2.0
+20    T4.14  Station integration tests (using unb2c_sdp_station)
+
+25 + 20 + 20 + 20 = 85 weeks
+
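The week totals quoted in the Workpackage 4 planning above can be cross-checked with a short
script. This is only a sketch: the dictionary names, the shortened task labels and the use of None
for the open T4.2 hardware estimate are assumptions made for illustration; the week values are
copied from the planning tables.

```python
# Week estimates copied from the T4.x planning tables. None marks the
# T4.2 hardware effort that the plan lists as "?" (an illustrative choice).
until_cdr = {
    "T4.1 development environment": 10,
    "T4.2 test firmware": 10,
    "T4.2 board hardware": None,
    "T4.3 board support package": 20,
    "T4.4 10GbE access": 10,
    "T4.5 ring access": 20,
    "T4.6 required documents": 20,
    "T4.7 ADC input and timestamp": 15,
    "T4.8 subband filterbank": 10,
    "T4.9 subband correlator": 20,
    "T4.10 beamformer": 10,
}
after_cdr = {
    "T4.11 transient buffer": 25,
    "T4.12 transient detection": 20,
    "T4.13 subband offload": 20,
    "T4.14 station integration tests": 20,
}


def total_weeks(tasks):
    """Return (sum of the known week estimates, names of tasks without an estimate)."""
    known = sum(weeks for weeks in tasks.values() if weeks is not None)
    unknown = [name for name, weeks in tasks.items() if weeks is None]
    return known, unknown


cdr_weeks, cdr_open = total_weeks(until_cdr)
rest_weeks, rest_open = total_weeks(after_cdr)
print("until CDR:", cdr_weeks, "weeks, open estimates:", cdr_open)
print("after CDR:", rest_weeks, "weeks, open estimates:", rest_open)
```

Keeping the unknown estimate as None rather than 0 makes the open item visible in the result, in
the same way that the plan writes the total as "145 + ?".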
diff --git a/applications/lofar2/doc/prestudy/station2_sdp_ring.txt b/applications/lofar2/doc/prestudy/station2_sdp_ring.txt
index b7996af32d391333c47a3f1396acecc3748bcb14..c4bceb8e853d0227d3a5564df4f03dff55e48950 100644
--- a/applications/lofar2/doc/prestudy/station2_sdp_ring.txt
+++ b/applications/lofar2/doc/prestudy/station2_sdp_ring.txt
@@ -27,7 +27,7 @@ critically sampled data is restricted to L_lane < 7.8125 Gbps.
 
 Note:
 The alternative to use full ring capacity for critically sampled data and then support less
-(S_sub_bf / R_os = 488 / 1.28 = 381, so almost 30 % less) beamlets for oversampled data is not
+(S_sub_bf / R_os = 488 / 1.28 = 381, so almost 30% less) beamlets for oversampled data is not
 compliant with the requirement of S_sub_bf = 488.
 
 Design descision: Support S_sub_bf = 488 also for maximum R_os = 1.28.
@@ -45,7 +45,7 @@ sampled beamlet data and the oversampled beamlet data, to avoid differences in t
 beamlet sum that is transported across the ring needs to fit on a 10GbE lane. With S_sub_bf = 488
 the data rate for one full band station beam is N_pol * S_sub_bf * f_sub * N_complex *
 W_beamlet_sum = 2 * 488 * 195312.5 * 2 * 18 = 6.8625 Gbps. Using L_lane = 7.8125 Gbps this leaves
-about 1 - 6.8625 / 7.8125 = 12 % margin for packet overhead, which is sufficient.
+about 1 - 6.8625 / 7.8125 = 12% margin for packet overhead, which is sufficient.
 
 
 Design descision:
@@ -92,7 +92,7 @@ can use one link per data type stream and thereby avoid having to multiplex diff
 onto the same 40GbE link. However some multiplexing of local packets and remote transit packets can
 also be needed. UniBoard2 has been tested with 10GbE but not yet with 40GbE.
 The Arria10 on UniBoard2 has 1708800 FF so 1708800 / 182400 = 9.3 times more than the Stratix IV
-on UniBoard1. On UniBoard2 one 10GbE interface uses maximum about 5500 / 1708800 = 0.32 % of the
+on UniBoard1. On UniBoard2 one 10GbE interface uses maximum about 5500 / 1708800 = 0.32% of the
 FF and maximum about 7 / 2713 = 0.25% of the block RAM. In total there will be 4 x 10GbE for the
 intra board ring, 4 x 10GbE for the inter board ring and 1 x 10GbE for external IO, so these will
 take about 3% of the FF and block RAM resources.
@@ -324,6 +324,15 @@ Scheme 1 and 2b are useful if the transit nodes also use or modify the packet da
 hops are then used to multi cast the data. Scheme 2a is suitable for packet transport from start
 to end node, whereby transit nodes only pass on the packet.
 
+With scheme 1 each node has a two input BSN aligner that needs to buffer a large packet. With
+scheme 2 the end node has an N input BSN aligner that needs to align N small packets. Even
+though scheme 2 only uses the BSN aligner at the end node, it is there at all nodes, because all
+nodes run the same firmware image. Therefore the resource usage of the BSN aligner will
+typically not differ much for scheme 1 or 2.
+
+If one hop fails in scheme 1 then there is no offload. If one hop fails in scheme 2a then there
+is still offload from subsequent hops.
+
 For the beamformer beamlets scheme 1 is most suitable. The start node prepares the packet with
 the initial beamlet sums. The subsequent nodes add there local beamlet sum to the packet
 beamlet sums and then pass on the packet.
@@ -569,7 +578,7 @@ remote crosslet packets. The local crosslets have to be correlated with the loca
 with each of the S_lba - S_pn remote crosslet packets. The correlation with the local crosslets
 is a square matrix that yields X_sq = S_pn * S_pn = 144 visibilities. For the local-local square
 correlator cell the efficiency is (S_pn * (S_pn+1)) / 2 / X_sq = 54%, but for the N/2 other
-local-remote square correlator cells the efficiency is 100 %. With N = 16 PN for LBA there are
+local-remote square correlator cells the efficiency is 100%. With N = 16 PN for LBA there are
 N/2 = 8 remote crosslet packets. Hence together with the local crosslet visibilities this yields
 X_pn = (floor(N/2) + 1) * X_sq = (8 + 1) * 144 = 1296 visibilities per PN. In total the subband
 correlator calculates N * X_pn = 16 * 1296 = 20736 visibilities. There are 
@@ -606,7 +615,7 @@ With S_pn = 12 signal inputs per PN and one crosslet per signal input there are
 packet. A crosslet is a W_crosslet = 16 bit complex value, so P_payload = 12 * 4 = 48 octets
 payload, so the effective packet size is P_packet = P_overhead + P_payload = 60 + 48 = 108 octets.
 The relative packet overhead for single crosslet payloads is P_overhead / P_packet = 60 / 108 = 
-55 %. Note that P_overhead_dp + P_payload = 20 + 48 = 68 octets still meets the minimum Ethernet 
+55%. Note that P_overhead_dp + P_payload = 20 + 48 = 68 octets still meets the minimum Ethernet 
 payload size requirement of 46 octets.
 
 Maximum number of crosslets per lane:
@@ -641,7 +650,7 @@ crosslets. These 9 X_pn in parallel can correlate up to 7 different crosslets. T
 transport maximum 11 crosslets. Hence the processing capacity of 9 X_pn is less than the IO 
 capacity of one 10GbE lane, therefore 9 X_pn in parallel can correlate 7 different crosslets.
 The crosslet data rate on a lane is then ((60 + 7 * 48) * 8b) * 195312.5 * 16/2 = 4.95 Gbps, so a
-utilization of 4.95 / 7.8125 = 63 %. Another set of 9 X_pn could be used to correlate the remaining
+utilization of 4.95 / 7.8125 = 63%. Another set of 9 X_pn could be used to correlate the remaining
 11 - 7 = 4 crosslets that can be transported via that lane. However, if more than N_crosslet = 7
 crosslets need to be correlated in parallel per integration interval, then it is easier to allocate
 an extra lane and to instantiate an extra set of 9 X_pn to correlate 14 crosslets in parallel in
@@ -807,7 +816,7 @@ Assumptions for AARTFAAC2.0:
 - group subbands from all S = 96 inputs in a packet
 - similar subband output format as in ASTRON_RP_1403_UDP_SDO ICD
 
-LOFAR1 uses the outer LBA for about 80 % of the time and the inner LBA for 20 % of the time. This
+LOFAR1 uses the outer LBA for about 80% of the time and the inner LBA for 20% of the time. This
 is because at lower frequencies the mutual coupling of LBA in the inner region becomes more
 significant, which then reduces the sensitivity of the inner LBA. The mutual coupling increases
 and the sensitivity decreases because for nearby LBA the wavelength >~ the distance between LBA.
@@ -817,150 +826,204 @@ Assumptions for Station.SDP:
 - The number subbands per lane is independent of set the same for R_os = 1 and R_os = 1.28. This
   implies that the utilization of the lanes for R_os = 1 is about a factor 1.28 less.
 
-Select S = 96 from S_lba = 192 signal inputs
-AARTFAAC uses the dual pol antennas, so the signal inputs (SI) have to be selected per pair of X
-and Y polarization. The N = 48 antennas can be selected from the N_lba = 96 antennas in different
-either at the offload node or at each PN:
-- Transport all SI to the offload node and select there
-- Select SI per PN and only transport the selected SI to the offload node
-The first schemeThe selection can be programmable or fixed. 
-
-
-First collect all S_lba = 192 signal inputs at the offload node, and then make an arbitrary 
-  selection or a fixed selection. The disadvantage is that this doubles the load on the ring.
-- Select at each PN and transport only First collect all S_lba = 192 signal inputs at the offload node, and then make an arbitrary 
-  selection or a fixed selection.
-
-Use ring transport scheme 1 or scheme2a:
-- With scheme 1 the selection of S out of S_lba can be made per PN, as the payload is passed along
-  and each node can insert none, all or a subset of its S_pn at the allocated subband index in the
-  payload. With scheme 2a the selection of S will be done at the offload node, so all PN then send
-  all their S_pn inputs via the ring. This doubles the load on the ring.
-- With scheme 2a each node only has to pass on the remote packets, but at the offload node it 
-  needs an N input BSN aligner, an N input to one output subband selection to get the offload
-  payload. With scheme 1 the first node initiates the offload payload and then each node has to
-  insert the local subbands at the correct index. This requires only a two input BSN aligner.
-- If one hop fails in scheme 1 then there is no offload. If one hop fails in scheme 2a then there
-  is still offload from subsequent hops.
-
 
+Required subband output load for AARTFAAC2.0:
 
 Current AARTFAAC1 can offload S_sub_so = 32 subbands for S = 96 signal inputs (SI) in W_subband_so
 = 16 bit mode. On the RSP - Uniboard interface there are 9 subbands per lane, so S_sub_so = 36 in
 total, but on the UniBoard - UDP interface to the GPU correlator only 8 subbands, so 32 in total
 are output. The AARTFAAC1 output load is S_sub_so * S * f_sub * N_complex * W_subband_so =
-32 * 96 * 195312.5 * 2 * 16 = 19.2 Gbps. Due to a bug in probably the RSP firmware, W_subband_so
+32 * 96 * 195312.5 * 2 * 16 = 19.2 Gbps. Due to a bug (probably in the RSP firmware), W_subband_so
 = 8 bit mode cannot be supported, but for LOFAR2.0 it can. Hence for the same output load as
 AARTFAAC1, AARTFAAC2.0 can offload S_sub_so = 64 subbands, which corresponds to a bandwidth of
-64 * 195312.5 Hz = 12.5 MHz.
+64 * 195312.5 Hz = 12.5 MHz. Assume the AARTFAAC offload will use 4 x 10GbE links, so S_sub_lane =
+64 / 4 = 16 subbands per lane. The payload size is P_payload = S_sub_lane * S * N_complex *
+W_subband_so / W_byte = 16 * 96 * 2 * 8/8 = 3072 octets.
 
-For LOFAR 2.0 the number of LBA doubles to S_lba = 192, but AARTFAAC2.0 assumes that still S = 96
-will offload subbands. Assume that the S = 96 signal inputs can be selected from the S_lba = 192
-available signal inputs at the Station output. Therefore internally in the Station SDP the
-subbands from all S_lba are passed on via ring to an the output node in SDP. For the LBA the ring
-in SDP connects N = S_lbs / S_pn = 192 / 12 = 16 nodes, so N-1 hops. Assume all subbands are send
-in one direction along the ring. The subband data load on the last hop is then
-(N-1)/N * 2 * 19.2G = 15/16 * 2 * 19.2G = 36.0 Gbps, excluding packet overhead. Given a lane 
-load capacity of L_lane = 7.8125 Gbps, this implies that the subband offload requires at least 
-ceil(36.0 / 7.8125) = ceil(4.6) = 5 lanes.
+
+Maximum subband output load per 40Gbps QSFP:
 
 The load on the from one W_subband_so = 8 bit subband is L_sub_so = S_lba * f_sub * N_complex *
-W_subband_so = 192 * 195312.5 * 2 * 8 = 0.6 Gbps. Per 10GbE lane this then yields maximum of
-L_lane / L_sub_so = 7.8125G / 0.6G = 33.3 subbands for
-R_os = 1 and 10G / 0.75G = 13.3 subbands for R_os = 1.25. The 10GbE requires some spare
-capacity, so therefore assume S_sub_so = 12 subbands / 10GbE link will just fit for R_os <= 1.25, provided that
-the packet overhead is < (13.3-12)/12 ~= 10 %. Hence with one 4 * 10GbE QSFP port at the final PN it is possible
-to offload 4 * 12 = 48 subbands or 9.375 MHz bandwidth with S_lba = 192 signal paths and W_subband_so = 8 bit. 
-The ring can be used to transport the subbands to some single destination PN that then performs the output via
-the 4 x 10GbE ports or 40GbE port on the QSFP. The destination PN could also do subband reordering to group
-subbands per S_lba = 192 inputs.
-
-
-The subbands are gathered at the output node via the ring. Using the ring avoids the need to use a 10GbE switch.
-Such a switch would need > 16 + 16 ports to support LBA + international HBA and some output ports. If the data
-is gathered, then it can as well be reordered to combine all S signal inputs in a single payload. The subbands
-can be send to the output node via the ring using either scheme 1 or scheme 2a:
-
-If the subband data is transported in one packet using scheme 1, then the payload can contain all 192 signal
-inputs. The payload size for S_sub_so = 12 subbands then becomes S * S_sub_so * N_complex * W_subband_so / W_byte
-= 192 * 12 * 2 * 8/8 = 4608 octets. The packet overhead is then (40 + 4608) / 4608 = 1.009, so 0.9 % overhead.
-Each node then inserts its local subbands at the appropriate offset in the payload. The packet size is 40 + 4608
-= 4648 octets and the data rate is f_sub, so the load is on all links in the ring is:
-packet size * W_byte * f_sub * R_os = 4648 * 8 * 195312.5 * 1.25 ~= 9.08 Gbps.
-
-If the subband data is send in separate packet for each PN using scheme 2a, then the payload size for
-S_sub_so = 12 subband / 10GbE link, S_pn = 12 signal inputs per PN and W_subband_so = 8 bit becomes
-S_pn * S_sub_so * N_complex * W_subband_so / W_byte = 12 * 12 * 2 * 8/8 = 288 octets. The packet overhead is
-then (40 + 288) / 288 = 1.14, so 14 % overhead. The packet size is 40 + 288 = 328 octets and at the end node
-there are N-1 packets on the ring. This yields a aggregate 'packet size' of (16-1)*328 = 4920 octets. The
-load on the last link in the ring is: (N-1) * packet size * W_byte * f_sub * R_os =
-(16-1) * 328 * 8 * 195312.5 * 1.25 ~= 9.61 Gbps. Note that the packet overhead of 14 % is larger than the
-maximum estimated allowable packet overhead of 10 % to transport 12 subbands. The reason that 12 subbands
-still fit is that the last node in the ring does not have to transport its local subbands via the ring, 
-so the use capacity on the last link is a factor (N-1)/N less. At the output node the data from all N node
-is combined into one payload of size N * 288 = 4608 octets, so the output load is ~= 9.18 Gbps (identical
-to scheme 1, because the output rate does not depend on which ring scheme was used).
-
-Both scheme 1 and scheme 2a can send offload 12 subbands per 10GbE link. The difference is that scheme 1 has
-a load of ~9.08 Gbps on all hops, whereas for scheme 2a the load increases wit every hop and has a maximum
-of ~9.61 Gbps on the last hop. With scheme 1 each node has to put its
-local subbands at the right location in the packet. In this way the end node only needs to output the 
-payload, because the data is already in the subband offload payload format. With scheme 2a all nodes just
-send their local data and pass on the transit data. At the end node a dispatcher and BSN aligner are needed
-to align the packets from all N = 16 nodes. After that the end node needs to reorder the data from these
-N = 16 input payloads into the subband offload payload format. This functionality in the end node is similar
-to the rsp_terminal function on UniBoard1 for AARTFAAC. Scheme is specific to the ring, scheme 2a would also
-work if the subband data is send to the end node via a switch (or via URI like with RSP).
-
-With scheme 2a the ring could be used in both directions, but this does not improve the capacity of the
-ring. With scheme 2a in one direction  the packets travel 1+2+3+...+(16-1) = 120 hops. With scheme 2a in
-both directions the packets travel
-1+2+3+4+5+6+7+8 = 36 hops left and 1+2+3+4+5+6+7 = 28 hops right, so total 64 hops. For the transport load
-on the ring as a whole scheme 2 is a factor 102/64 = 1.875 more efficient. However at the end node both
-schemes still have transfer the same load of 15 packets. Therefore at the end node the load for both
-schemes is the same. Hence with 15+15 or (8+7)+(7+8) packets arriving at the end node, this node has no
-spare capacity left to receive more subband packets via these two links.
-Using the ring in both directions does reduce the latency and therefor the input buffering at the end node
-by a factor 1.875. Furthermore less hops also proportionally reduces the packet error rate. It is easier
-to use the ring in only one direction, because all nodes then send in the same direction, independent of
-their location in the ring.
-
-At the output node the packet payload is put in an UDP/IP packet and with an SDO application header. The UDP/IP
-header has 8+20 = 28 octets. The SDO header in LOFAR 1.0 has 22 octets. The output packet size is 40 + 28 + 22
-+ 4608 = 4698 octets and the output data rate is packet size * W_byte * f_sub * R_os = 4698 * 8 * 195312.5 *
-1.25 ~= 9.18 Gbps. The output load is independent of the ring scheme. The ring has 12 full duplex 10GbE links.
-Suppose 8 of these can be allocated to subband offload, then the ring can suppport subband offload for maximum
-2*8 * 12 = 192 subbands (= 37.5 MHz). This then requires 2*8 / 4 = 4 QSFP ports, on different nodes.
+W_subband_so = 96 * 195312.5 * 2 * 8 = 0.3 Gbps. Per 10GbE lane this then yields maximum of
+L_lane / L_sub_so = 7.8125G / 0.3G = 26.0 subbands. The 10GbE requires some spare capacity, so
+therefore assume S_sub_so = 24 subbands / 10GbE link will just fit for R_os <= 1.28, provided
+that the packet overhead is < (26-24)/24 ~= 8%. Hence with one 4 * 10GbE QSFP port at the
+final PN it is possible to offload 4 * 24 = 96 subbands or 18.75 MHz bandwidth with S = 96 signal
+paths and W_subband_so = 8 bit. The ring can be used to transport the subbands to some single
+destination PN that then performs the output via the 4 x 10GbE ports or 40GbE port on the QSFP.
+The destination PN could also do subband reordering to group subbands per S = 96 inputs.
+
+
+Transport via ring:
+
+The subbands are gathered at the output node via the ring. Using the ring avoids the need to use
+a 10GbE switch. Such a switch would need > 16 + 16 ports to support LBA + international HBA and
+some output ports. If the data is gathered, then it can as well be reordered to combine all S 
+signal inputs in a single payload. The subbands can be sent to the output node via the ring using
+either scheme 1 or scheme 2a:
+
+
+Select N = 48 from N_lba = 96 antennas:
+
+AARTFAAC uses the dual pol antennas, so the signal inputs (SI) have to be selected per pair of X
+and Y polarization. AARTFAAC2.0 selects N = 48 antennas per Station. It is unlikely that
+AARTFAAC2.0 will select more than N = 48, because AARTFAAC2.0 would rather correlate more
+Stations than more inputs per Station. The selection can be:
+
+- one fixed selection
+- subset selection
+- completely arbitrary selection
+
+The disadvantage of the fixed selection is that it rules out half of the LBA. The disadvantage
+of the arbitrary selection is the book keeping within Station and TM. The N = 48 antennas can
+be selected from the N_lba = 96 antennas at different stages within SDP:
+
+- Transport all SI to the end node and select there. The advantage is that an arbitrary selection
+  can be done at the end node.
+  . With transport scheme 2a the selection of N from N_lba will be done at the end node, so all
+    PN then send all their S_pn inputs via the ring. The disadvantage is that it doubles the
+    load on the ring and requires a selection at the offload node.
+  
+- Transport only the selected SI per PN to the end node. For arbitrary selection this complicates
+  the control per node, because a different number of SI may be selected per node. A compromise
+  can be to only support selecting all S_pn inputs for a node or none, or to only support
+  selecting the same S_pn/2 inputs for each node.
+  . With transport scheme 1 the payload is passed along and each node can insert none, all or a
+    subset of its S_pn at the allocated subband index in the payload. The payload size is fixed,
+    because it contains S signal inputs.
+  . With transport scheme 2a each node only sends the selected inputs. For arbitrary selection
+    this yields payload sizes that depend on the selection, which is awkward. 
+    
+The advantage of scheme 1 is that the output payload is already formed by the selection at each
+node. With scheme 2a a multiplexer is needed to combine the payloads from all nodes into the
+output packet. If in scheme 1 a packet gets lost, then all subbands from the remote nodes that
+were already passed are lost. If in scheme 2a a packet gets lost, then only the subbands from the
+node that sent that packet are lost.
+
+Design decision:
+- Assume the SO only transports the selected subbands and uses scheme 2a. The selection is made
+  by letting each node either send all S_pn = 12 inputs or none. Hence only N/2 = 8 nodes send
+  subbands, the other N/2 nodes remain quiet. The selected nodes are identified via the
+  channel field, e.g. if nodes 0, 3, 4, 5, 6, 7, 8, 11 are selected for output, then they get
+  channel index 0:7 via M&C and the other nodes do not send subbands. The channel index
+  determines the order of the subbands in the output packet. In this way:
+  . the ring only transports selected subbands,
+  . subbands can be selected from different antennas, but only in groups of S_pn = 12 per PN,
+  . the antenna allocation per PN must suit the required SI selections.
+
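The channel allocation in this design decision can be sketched as follows. This is an
illustration under stated assumptions, not the M&C implementation: the text only says that the
selected nodes get channel index 0:7, so assigning the indices in ascending node order and
placing channel c at signal inputs c * S_pn onwards in the output packet are assumptions of this
sketch, and the function names are hypothetical.

```python
S_PN = 12      # signal inputs per PN, from the text
N_NODES = 16   # nodes on the LBA ring, from the text


def assign_channels(selected_nodes, n_nodes=N_NODES):
    """Map each selected node to a channel index 0..len-1.

    Ascending node order is an assumption of this sketch; the text only states
    that the selected nodes receive channel index 0:7 via M&C.
    """
    nodes = sorted(set(selected_nodes))
    if len(nodes) != len(selected_nodes):
        raise ValueError("duplicate node in selection")
    if any(node < 0 or node >= n_nodes for node in nodes):
        raise ValueError("node index out of range")
    return {node: channel for channel, node in enumerate(nodes)}


def signal_input_offset(channel, s_pn=S_PN):
    """First signal input position of a channel in the output packet (assumed layout)."""
    return channel * s_pn


channels = assign_channels([0, 3, 4, 5, 6, 7, 8, 11])
for node, channel in channels.items():
    first = signal_input_offset(channel)
    print(f"node {node:2d} -> channel {channel}, SI {first}..{first + S_PN - 1}")
```

Nodes that do not appear in the returned mapping are the ones that remain quiet on the ring.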
+
+
+For LOFAR 2.0 the number of LBA doubles to S_lba = 192, but AARTFAAC2.0 assumes that still S = 96
+will offload subbands. Assume that only the selected S = 96 signal inputs are transported via the
+ring using scheme 2a. For the LBA the ring in SDP connects N = S_lba / S_pn = 192 / 12 = 16 nodes,
+so N-1 hops. Assume the subbands are sent in one direction along the ring. The subband data load
+on the last hop is then (N-1)/N * 19.2G = 15/16 * 19.2G = 18.0 Gbps, excluding packet overhead.
+Given a lane load capacity of L_lane = 7.8125 Gbps, this implies that the subband offload requires
+at least ceil(18.0 / 7.8125) = ceil(2.3) = 3 lanes. Assume that 3 lanes will be used to transport
+the S_sub_so = 64 subbands for AARTFAAC2.0. Choose S_sub_lane >= 22 subbands per lane. At the end
+node the selection from 3*22 = 66 to 64 subbands will be made.
+
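The number of ring lanes follows from the last hop load. A minimal sketch of that estimate,
using the values from the text (the function names are illustrative assumptions):

```python
import math

L_SUB_SO = 0.3e9    # load per subband for S = 96 signal inputs in bit/s
L_LANE = 7.8125e9   # lane load capacity in bit/s
S_SUB_SO = 64       # subbands to offload for AARTFAAC2.0
N = 16              # nodes on the LBA ring


def last_hop_load_bps(n_nodes=N, s_sub=S_SUB_SO, l_sub=L_SUB_SO):
    """Payload load on the last hop; the end node's own share never enters the ring."""
    return (n_nodes - 1) / n_nodes * s_sub * l_sub


def lanes_needed(load_bps, l_lane=L_LANE):
    """Smallest number of lanes that can carry load_bps, excluding packet overhead."""
    return math.ceil(load_bps / l_lane)


def subbands_per_lane(s_sub, n_lanes):
    """Subbands each lane has to carry when s_sub subbands are spread over n_lanes."""
    return math.ceil(s_sub / n_lanes)


load = last_hop_load_bps()
n_lanes = lanes_needed(load)
print(f"last hop load: {load / 1e9:.1f} Gbps")
print(f"lanes needed : {n_lanes}")
print(f"subbands/lane: {subbands_per_lane(S_SUB_SO, n_lanes)}")
```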
+
+If the subband data is transported in one packet using scheme 1, then the payload can contain all
+96 signal inputs. The payload size for S_sub_lane = 22 subbands then becomes P_payload =
+S * S_sub_lane * N_complex * W_subband_so / W_byte = 96 * 22 * 2 * 8/8 = 4224 octets. The packet
+overhead is P_overhead = 60 octets, so (60 + 4224) / 4224 = 1.014, so 1.4% overhead. Each node
+then inserts its local subbands at the appropriate offset in the payload. The packet size is
+P_packet = 60 + 4224 = 4284 octets and the packet rate is f_sub * R_os, so the load on all links
+in the ring is P_packet * W_byte * f_sub * R_os = 4284 * 8 * 195312.5 * 1.28 = 8.568 Gbps. Hence
+transporting S_sub_lane = 22 subbands for S = 96 SI fits on a 10GbE lane.
+
+If the subband data is sent in separate packets for each PN using scheme 2a, then the payload size
+for S_sub_lane = 24 subbands per lane, S_pn = 12 signal inputs per PN and W_subband_so = 8 bit
+becomes S_pn * S_sub_lane * N_complex * W_subband_so / W_byte = 12 * 24 * 2 * 8/8 = 576 octets.
+The packet size is P_packet = 60 + 576 = 636 octets. The packet overhead is P_packet / P_payload
+= 636 / 576 = 1.10, so 10% overhead. At the end node there are N/2-1 packets on the ring if the
+end node is selected for offload and N/2 packets if the end node does not contribute SI for
+offload. Assume worst case N/2 packets on the last link. The load on the last link in the ring is
+then N/2 * P_packet * W_byte * f_sub * R_os = 16/2 * 636 * 8 * 195312.5 * 1.28 ~= 10.176 Gbps,
+which does not fit on a 10GbE lane. Choosing instead S_sub_lane = 22 subbands yields P_payload =
+12 * 22 * 2 * 8/8 = 528, P_packet = 60 + 528 = 588 and an aggregate load of
+16/2 * 588 * 8 * 195312.5 * 1.28 = 9.408 Gbps, which fits on a 10GbE lane.
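
The link loads for scheme 1 and scheme 2a use the same formula with a different number of signal
inputs per packet and a different number of packets per block. A minimal sketch, assuming the
packet overhead of 60 octets from the text (the helper names are illustrative):

```python
F_SUB = 195312.5    # subband sample rate in Hz
R_OS = 1.28         # oversampling factor
W_BYTE = 8          # bits per octet
N_COMPLEX = 2       # real and imaginary part
W_SUBBAND_SO = 8    # bits per part for subband offload
P_OVERHEAD = 60     # packet overhead on the ring in octets


def payload_octets(n_si, s_sub_lane):
    """Payload size for n_si signal inputs and s_sub_lane subbands."""
    return n_si * s_sub_lane * N_COMPLEX * W_SUBBAND_SO // W_BYTE


def link_load_bps(n_si, s_sub_lane, n_packets=1):
    """Load of n_packets packets per subband period, including packet overhead."""
    p_packet = P_OVERHEAD + payload_octets(n_si, s_sub_lane)
    return n_packets * p_packet * W_BYTE * F_SUB * R_OS


# Scheme 1: one packet with all S = 96 signal inputs on every hop.
scheme1 = link_load_bps(96, 22)
# Scheme 2a: N/2 = 8 packets of S_pn = 12 signal inputs on the last hop.
scheme2a_24 = link_load_bps(12, 24, n_packets=8)
scheme2a_22 = link_load_bps(12, 22, n_packets=8)

for name, load in [("scheme 1, 22 subbands", scheme1),
                   ("scheme 2a, 24 subbands", scheme2a_24),
                   ("scheme 2a, 22 subbands", scheme2a_22)]:
    fits = "fits" if load < 10e9 else "does not fit"
    print(f"{name}: {load / 1e9:.3f} Gbps, {fits} on a 10GbE lane")
```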
+
+Both scheme 1 and scheme 2a can transport S_sub_lane = 22 subbands per 10GbE lane. The difference
+is that scheme 1 has a load of about 8.6 Gbps on all hops, whereas for scheme 2a the load
+increases with every hop and has a maximum of 9.408 Gbps on the last hop. With scheme 1 each node
+has to put its local subbands at the right location in the packet. In this way the end node only
+needs to output the payload, because the data is already in the subband offload payload format.
+With scheme 2a all nodes just send their local data and pass on the transit data. At the end node
+a demultiplexer and BSN aligner are needed to align the packets from all N/2 = 8 nodes. After that
+the end node needs to reorder the data from these N/2 = 8 input payloads into the subband offload
+payload format. This functionality in the end node is similar to the rsp_terminal function on
+UniBoard1 for AARTFAAC. Scheme 1 is specific to the ring, scheme 2a would also work if the
+subband data is sent to the end node via a switch (or via URI like with RSP).
+
+With scheme 2a the ring could be used in both directions, but this does not improve the capacity
+of the ring. With scheme 2a in one direction the packets travel 1+2+3+...+(16-1) = 120 hops.
+With scheme 2a in both directions the packets travel 1+2+3+4+5+6+7+8 = 36 hops left and
+1+2+3+4+5+6+7 = 28 hops right, so total 64 hops. For the transport load on the ring as a whole
+using both directions is a factor 120/64 = 1.875 more efficient. However at the end node the
+ring still has to transfer the same load of N/2 = 8 packets. Therefore at the end node the
+total load from both directions is the same. Hence with 8 packets arriving from one direction or
+2 * 4 packets arriving from two directions, the end node has no spare capacity left to receive
+more subband packets via these two links. Using the ring in both directions does reduce the
+latency and therefore the input buffering at the end node by a factor 1.875. Furthermore fewer
+hops also proportionally reduce the packet error rate. It is easier to use the ring in only
+one direction, because all nodes then send in the same direction, independent of their location
+in the ring. Design decision: use the ring only in one direction.
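
The hop counts can be checked with a small calculation. This sketch follows the accounting in the
text, where every node at distance 1..N-1 from the end node sends one packet; the function names
are illustrative.

```python
def hops_one_direction(n_nodes):
    """Total hops when the nodes at distance 1..n_nodes-1 all send the same way."""
    return sum(range(1, n_nodes))


def hops_both_directions(n_nodes):
    """Total hops when each node takes the shorter way round to the end node."""
    far_left = n_nodes // 2          # 8 for n_nodes = 16
    far_right = (n_nodes - 1) // 2   # 7 for n_nodes = 16
    return sum(range(1, far_left + 1)) + sum(range(1, far_right + 1))


one_way = hops_one_direction(16)
two_way = hops_both_directions(16)
print(f"one direction  : {one_way} hops")
print(f"both directions: {two_way} hops")
print(f"ratio          : {one_way / two_way:.3f}")
```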
+
+At the output node the packet payload is put in a UDP/IP packet with an SDO application
+header. The Ethernet overhead is 40 octets, the UDP/IP header has 8+20 = 28 octets and the SDO
+header in LOFAR 1.0 has 22 octets. Hence the packet overhead is P_overhead = 40 + 28 + 22 = 90
+octets. With 4 offload lanes and 16 subbands per lane, the payload size is P_payload = 96 * 16 *
+2 * 8/8 = 3072 octets and the packet size is P_packet = 90 + 3072 = 3162 octets. The offload
+payload data rate per lane is P_payload * W_byte * f_sub * R_os = 3072 * 8 * 195312.5 * 1.28 =
+6.144 Gbps, or 3162 * 8 * 195312.5 * 1.28 = 6.324 Gbps including the packet overhead. The output
+load is independent of the ring scheme. The ring has 12 full duplex 10GbE links.
 
 Design decision:
 - Gather subbands at output node (instead of having a dedicated offload port at each node)
 - Gather the subbands via the ring (to avoid the need for a 10GbE switch with about 40 ports)
-- Reorder the subbands to have all subbands from signal inputs in one payload (to ease input stage of user application)
-- Use scheme 2a and in both directions (to reduce the number of hops and latency)
+- Reorder the subbands to have all subbands from signal inputs in one payload (to ease input
+  stage of user application)
+- Select subbands in groups of S_pn SI per node from N/2 = 8 nodes to have S = 96 signal
+  inputs. The SI from the other nodes are then not used and not transported.
+- Use ring transport scheme 2a and in one direction (simpler control than using both
+  directions)
+- On the ring 3 lanes are sufficient to transport 22 subbands per lane. At the output 3 lanes
+  are sufficient, but use 4 lanes and output 16 subbands per lane, to reduce the load per
+  lane.
+
+==> S_sub_so = 64 subbands, with W_subband_so = 8 bit, for N = 48 antennas can be offloaded
+    using 4 lanes on one QSFP port, with ~ 6.144 Gbps per lane in case of R_os = 1.28.
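
The offload packet size and the load per output lane can be reproduced as follows. This is a
sketch with illustrative names; the header sizes are the ones quoted in the text, and the code
reports the rate both for the payload only and including the packet overhead.

```python
F_SUB = 195312.5    # subband sample rate in Hz
R_OS = 1.28         # oversampling factor
W_BYTE = 8          # bits per octet
S = 96              # signal inputs
N_COMPLEX = 2       # real and imaginary part
W_SUBBAND_SO = 8    # bits per part for subband offload

ETH_OVERHEAD = 40   # Ethernet overhead in octets, from the text
UDP_IP_HEADER = 8 + 20
SDO_HEADER = 22     # LOFAR 1.0 SDO header size, from the text


def offload_packet(s_sub_lane):
    """Return (payload, packet) size in octets for s_sub_lane subbands per lane."""
    p_payload = S * s_sub_lane * N_COMPLEX * W_SUBBAND_SO // W_BYTE
    p_packet = ETH_OVERHEAD + UDP_IP_HEADER + SDO_HEADER + p_payload
    return p_payload, p_packet


def rate_bps(octets):
    """Data rate for one block of 'octets' per oversampled subband period."""
    return octets * W_BYTE * F_SUB * R_OS


p_payload, p_packet = offload_packet(16)
print(f"payload {p_payload} octets: {rate_bps(p_payload) / 1e9:.3f} Gbps per lane")
print(f"packet  {p_packet} octets: {rate_bps(p_packet) / 1e9:.3f} Gbps per lane")
```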
 
 
 *******************************************************************************
 * Transient buffer readout
 *******************************************************************************
 
-The transient buffer stores the data in frames of 2 kByte. A frame contains data from one signal input. The
-memory is divided into pages and each page can contain one frame. The transient buffer readout is controlled
-per signal input and defined by a start time and a number of pages. The start time translates into a start
-page. The SCU issues the read commands per signal input. The SDP firmware then reads and outputs the 
-requested frames to CEP. When the transfer has finished, then the SDP firmware sends an event message to the
-SCU, and then the SCU issues a read command for the next signal input, until all signal inputs have been
-handled. For the ring the read out per signal input implies that at any time only one node will send data.
+The transient buffer stores the data in frames of 2 kByte. A frame contains data from one signal
+input. The memory is divided into pages and each page can contain one frame. The transient buffer
+readout is controlled per signal input and defined by a start time and a number of pages. The
+start time translates into a start page. The SCU issues the read commands per signal input. The
+SDP firmware then reads and outputs the requested frames to CEP. The SDP firmware keeps a count
+of the number of frames that have been output and that still need to be output. The SCU can poll
+these counts or wait on an event message from the SDP that signals that all frames have been
+sent. When the transfer has finished, the SCU issues a read command for the next signal
+input, until all signal inputs have been handled. For the ring the read out per signal input
+implies that at any time only one node will send data.
 
-The read frames are encoded into an DP/ETH frame. The first frame that is read is encoded with a sync and the
-subsequent frames that are read can be counted via the BSN field. In this way a BSN monitor at the end node
-can monitor whether all frames for a signal input read out have arrived at the end node. The end node decodes
-the frame and then encodes them into and UDP/IP/ETH frame to CEP. The transit nodes pass on the frames, and
-also decode the frames to be able to monitor them with a BSN monitor. After each read command has finished the
-SCU can check the BSN monitor at the end node to know whether all frames arrived correctly at the end node.
+The read frames are encoded into a DP/ETH frame. The first frame that is read is encoded with a
+sync and the subsequent frames that are read can be counted via the BSN field. In this way a BSN
+monitor at the end node can monitor whether all frames for a signal input read out have arrived
+at the end node. The end node decodes the frames and then encodes them into UDP/IP/ETH frames
+to CEP. The transit nodes pass on the frames, and also decode the frames to be able to monitor
+them with a BSN monitor. After each read command has finished the SCU can check the BSN monitor
+at the end node to know whether all frames arrived correctly at the end node.
 
-For 1 Gbps data rate to CEP and packets of about 2 kByte the packet rate is 1e9/ 2000 / 8 = 62500 packets/s
-or about one packet every 16 us, so about every 3 T_sub. It is allowed to let multiple nodes output TB data,
-but the total number of packets/s has to still fit the output link.
+For 1 Gbps data rate to CEP and packets of about 2 kByte the packet rate is 1e9/ 2000 / 8 = 62500
+packets/s or about one packet every 16 us, so about every 3 T_sub. It is allowed to let multiple
+nodes output TB data, but the total number of packets/s has to still fit the output link. The
+readout node provides a programmable inter packet delay to throttle the output rate. The end node
+immediately outputs the packets when they arrive.
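
The packet rate and a matching inter packet delay can be estimated as follows. This is a sketch
only: the text states that the readout node has a programmable inter packet delay, but sharing
the output link equally between several readout nodes is an assumption of this example, and the
function names are hypothetical.

```python
F_SUB = 195312.5   # subband sample rate in Hz, so T_sub = 1 / F_SUB


def packet_rate(link_bps, packet_octets):
    """Packets per second that fill a link of link_bps with packets of packet_octets."""
    return link_bps / (packet_octets * 8)


def inter_packet_delay_s(link_bps, packet_octets, n_nodes=1):
    """Packet interval per node so that n_nodes together do not exceed link_bps.

    Equal sharing between the nodes is an assumption of this sketch.
    """
    return n_nodes / packet_rate(link_bps, packet_octets)


rate = packet_rate(1e9, 2000)
delay = inter_packet_delay_s(1e9, 2000)
print(f"packet rate    : {rate:.0f} packets/s")
print(f"packet interval: {delay * 1e6:.1f} us = {delay * F_SUB:.3f} T_sub")
print(f"with 4 nodes   : {inter_packet_delay_s(1e9, 2000, n_nodes=4) * 1e6:.1f} us per node")
```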
 
 
 
@@ -983,7 +1046,7 @@ The BF 10GbE has no statistics, the XC 10GbE does have statistics. One 10GbE tak
     tr_10GbE:u_tr_10GbE                                                          4748   5957    8      0
       dp_fifo_dc:\gen_dp_fifo_dc_rx:0:u_dp_fifo_dc_rx                              82    138    0      0
       dp_fifo_fill_dc:\gen_dp_fifo_fill_dc:0:u_dp_fifo_fill_dc                    186    244    4      0
-      tech_eth_10g:u_tech_eth_10g                                                4337   5307    4      0 <-- 3 % of 182400 FF
+      tech_eth_10g:u_tech_eth_10g                                                4337   5307    4      0 <-- 3% of 182400 FF
       tr_xaui_mdio:\gen_mdio:u_tr_xaui_mdio                                       187    268    0      0
 
 * apertif_unb1_correlator_full (3x 10gbE Rx)
@@ -996,7 +1059,7 @@ The BF 10GbE has no statistics, the XC 10GbE does have statistics. One 10GbE tak
     dp_fifo_fill_dc:\gen_dp_fifo_fill_dc:0:u_dp_fifo_fill_dc                       86    111    2      0
     dp_fifo_fill_dc:\gen_dp_fifo_fill_dc:1:u_dp_fifo_fill_dc                       93    109    2      0
     dp_fifo_fill_dc:\gen_dp_fifo_fill_dc:2:u_dp_fifo_fill_dc                       90    108    2      0
-    tech_eth_10g:u_tech_eth_10g                                                 10552  16487   21      0 <-- ~=5500 ~= 3 % of 182400
+    tech_eth_10g:u_tech_eth_10g                                                 10552  16487   21      0 <-- ~=5500 ~= 3% of 182400
     tech_eth_10g_stratixiv:\gen_ip_stratixiv:u0                                 10552  16487   21      0
       ip_stratixiv_eth_10g:u_ip_stratixiv_eth_10g                               10552  16487   21      0
   tr_xaui_mdio:\gen_mdio:u_tr_xaui_mdio                                           486    780    0      0
diff --git a/applications/lofar2/doc/prestudy/station2_to_do_erko.txt b/applications/lofar2/doc/prestudy/station2_to_do_erko.txt
index b7862111dffe12b5d73d5ce0a8953f36b3789a82..bc761bcd553fadb341f5d9234dcafd3dc4a08719 100755
--- a/applications/lofar2/doc/prestudy/station2_to_do_erko.txt
+++ b/applications/lofar2/doc/prestudy/station2_to_do_erko.txt
@@ -227,6 +227,9 @@ Open issues:
 - Write RadioHDL article
 - Write HDL RL=0 article - desp_hdl_design_article.txt
 - XST : SNR = 1 per visibility for 10000 samples, brigthtest sourcre log 19.5 --> 4.5 dB --> T_int = 1 s is ok.
+- BSP registers:
+  . duration of operations : counts time since last power cycle (passive heartbeat)
+  . cause of reboot (power cycle, overtemperature, ...)