diff --git a/applications/lofar2/doc/lofar_station_firmware_model.vsd b/applications/lofar2/doc/lofar_station_firmware_model.vsd
new file mode 100755
index 0000000000000000000000000000000000000000..641cca64b715e3c59398be802142a24b99a02a99
Binary files /dev/null and b/applications/lofar2/doc/lofar_station_firmware_model.vsd differ
diff --git a/applications/lofar2/doc/prestudy/desp_hdl_design_article.txt b/applications/lofar2/doc/prestudy/desp_hdl_design_article.txt
new file mode 100644
index 0000000000000000000000000000000000000000..96f97f9f40737ee07d827c4c43b9b8c3cf46f0ee
--- /dev/null
+++ b/applications/lofar2/doc/prestudy/desp_hdl_design_article.txt
@@ -0,0 +1,33 @@
+Idea / rule: Distinguish between state registers and pipeline registers.
+
+. The state registers keep the state of the function and the function itself is programmed in combinatorial logic.
+  In this way the pipelining that is needed to achieve timing closure can be added independently of the function.
+  This approach could be described in a paper, because it is quite significant and differs from the well known
+  Gaisler approach (that uses RL=1 and does not separate state from pipeline). AXI uses RL=0, but it still needs
+  to be checked how it then handles pipelining.
+. Components need pipelining to achieve timing closure. This pipelining causes a latency in the data
+  stream. This latency is typically no problem, because it only delays the output. If components need
+  flow control then the stream has a siso backpressure signal that must have a certain timing relation
+  to the sosi data signal. This timing relation is the ready latency (RL) and the RL can be >= 0. For 
+  RL = 0 the ready signal acts as a data acknowledge and for RL > 0 the ready signal acts as a data
+  request signal. Adding pipelining to the sosi data increases the RL.
+. The RL is explained in the Avalon specification. An example of RL = 0 are so called look ahead (Altera)
+  or first word fall through (Xilinx) FIFOs. In our UniBoard applications we use RL = 1. For most parts
+  of the design we try not to use flow control. I think that AXI stream uses RL = 0.
+. The function operates with ready latency (RL) = 0, if it is combinatorial. If the stream has no flow
+  control then the pipeline is achieved as an output register stage. If the stream does need flow control,
+  then this output register stage increases the RL by 1. To restore the RL to 0 a dp_latency_adapter.vhd
+  is needed. This latency adapter also registers the ready, so it provides pipelining for both the output
+  stream sosi data  as well as the output stream siso ready flow control.
+. For new components the development approach is to implement the function for RL=0, so only with the state
+  registers. If the component does not use flow control, then it may still just wire the flow control
+  from output to input. If the component does use flow control then it can combinatorially impose this
+  on the incoming flow control and pass the combined flow control on to its input. For timing closure
+  the pipelining is added as a separate stage. Either pipeline sosi if no flow control is needed
+  or pipeline siso if flow control is needed. For example: dp_block_resize.vhd, dp_counter.vhd.
+
+
+
+Ref:
+ $RADIOHDL/tools/oneclick/doc/desp_firmware_dag_erko.txt
+ $RADIOHDL/tools/oneclick/doc/desp_firmware_overview.txt
\ No newline at end of file
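Review note on desp_hdl_design_article.txt: the ready latency (RL) rule described in that file can be illustrated with a small cycle model. This is a minimal sketch in Python and not part of the firmware. The function name `transfer_cycles` is hypothetical, and the model assumes a source that always has data available, so that valid simply follows ready after RL cycles.

```python
def transfer_cycles(ready, ready_latency):
    """Return the clock cycles in which a data word is transferred.

    Assumption: the source always has data, so it asserts valid in every
    cycle in which the ready latency rule allows it to.
    """
    cycles = []
    for t in range(len(ready)):
        t_ready = t - ready_latency  # the cycle whose ready governs cycle t
        if t_ready >= 0 and ready[t_ready]:
            cycles.append(t)
    return cycles


ready = [1, 0, 1, 1, 0]
rl0 = transfer_cycles(ready, 0)  # RL = 0: ready acknowledges data in the same cycle
rl1 = transfer_cycles(ready, 1)  # RL = 1: ready requests data for the next cycle
print(rl0, rl1)
```

With RL = 0 the transfers coincide with the ready pulses, with RL = 1 they occur one cycle later, which matches the acknowledge versus request reading in the text.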
diff --git a/applications/lofar2/doc/prestudy/dupllo_aad_erko.txt b/applications/lofar2/doc/prestudy/dupllo_aad_erko.txt
new file mode 100644
index 0000000000000000000000000000000000000000..665166746c259698443d00a7451509de219f12b9
--- /dev/null
+++ b/applications/lofar2/doc/prestudy/dupllo_aad_erko.txt
@@ -0,0 +1,99 @@
+1) Introduction
+
+a) Focus on checking UniBoard2 based solution, because:
+
+- we need an FPGA to interface the ADC
+- using an existing board saves development time.
+
+b) Assumptions:
+- the subband filterbank will be implemented on the FPGA, because we need the FPGA anyway so then it can
+  also do some DSP
+- the station beamformer is implemented on the FPGA because for a LOFAR station the number of beams is
+  small, so this yields a larger data rate reduction to the subsequent processing (on CPU / GPU)
+- use a ring beamformer (like in Lofar 1.0), because it avoids having to use a large mesh (like in Apertif
+  BF) and because the ring can easily be extended with more nodes if necessary (like for the international
+  Lofar 1.0 stations).
+- Starting with a critically sampled filterbank saves time; the design should allow for an oversampled filterbank
+
+c) New compared to Lofar 1.0
+- Same analogue band width and sample frequency ranges, but in total 4x more RCU input:
+  . 3 times more input due to simultaneous 2x LBA + 1x HBA
+  . ready for 4 times more input to support another 1x HBA input for Lofar Space Weather
+- Ready for output to Aartfaac 2.0
+
+d) Other relevant aspects
+- System requirements must be clear and complete at PDR, otherwise the project will be delayed due to lack of clarity
+- We used to have delay and therefore 'conflict' at end of project due to reactive, passive planning, now
+  we need 'conflict' at start of project to be proactive and meet the end date.
+- X and Y and LBA and HBA can be implemented on independent hardware because they are processed
+  independently. The firmware can be the same (apart from different parameter settings). Therefore
+  best use separate RCU for LBA and HBA, instead of combining LBA + HBA on one RCU, because otherwise they
+  will also need to be processed together in a subrack (assuming that the serial ADC link goes via a
+  backplane and not via a fiber.)
+- the current HBA uses DC power via one coax (x-pol) and control via the other coax (y-pol). The control
+  uses a proprietary control protocol based on Manchester encoding and implemented using a PIC
+  microcontroller. The PIC microcontroller is an I2C slave.
+- all input should also be available at output of the FPGA to be future proof, this is a lesson learned
+  from Lofar 1.0 where only with high effort still only a small band could be made available for Aartfaac
+- during a life time of 10 years FPGAs remain available, GPUs will require an upgrade to a new version
+
+  
+e) Development time:
+- Starting with a critically sampled filterbank saves time; the design should allow for an oversampled filterbank
+- With a critically sampled filterbank like in Lofar 1.0 the new Lofar 2.0 station can operate together with
+  a Lofar 1.0 station
+- Much reuse from Apertif and RSP firmware
+- Some new aspects: 
+   . oversampled filterbank (oversample factor increases the output load)
+   . JESD serial ADC data interface
+   . how to connect RCU I2C control interface (via microprocessor with 1GbE on PAC)
+   . TBB function on UniBoard2
+   . separate MM clock domain and sample clock domains (160, 200 MHz)
+   . reuse M&C protocol from Gemini instead of UniBoard Control Protocol
+   . station correlator via TBB function or via crosslet statistics (similar as in RSP, Apertif PAF
+     correlator)
+
+- Detailed design must include M&C and test
+
+     
+2) Oversampled filterbank:
+- See dupllo_oversampled_subband_filterbank.txt.
+
+
+3) TBB memory
+a) Lofar 1.0 (96 RCU for core and remote, 192 for international)
+There is 1 TBB / 2 RSP, so 1 TBB / 16 RCU --> so 32 GByte/ 16 RCU = 2 GByte / RCU.
+With 200 MHz and assume 2 byte per sample this corresponds to 2 GByte / 0.2 GHz / 2 byte = 5 sec.
+
+b) 6 UniBoard1 (288 RCU)
+The largest DDR3 SODIMM that can fit on UniBoard is 16 GByte and each PN on UniBoard can have two DDR3
+SODIMMs. With 6 UniBoard1 there are 6 * 8 * 2 * 16 GByte = 1536 GByte for 288 RCU = 5.3 GByte / RCU.
+With 200 MHz and assume 2 byte per sample this corresponds to 5.3 GByte / 0.2 GHz / 2 byte = 13.3 sec.
+Uses 6 * 8 * 2 = 96 DDR3 SODIMMs. DDR3 can achieve 1.6 GTps @ 200 MHz.
+
+c) 3 or 4 UniBoard2 (288 RCU for 2x LBA + HBA, 384 including also 1x HBA for Space Weather)
+The largest DDR4 SODIMM that can fit on UniBoard2 is 36 GByte and each PN2 can have two DDR4 SODIMMs.
+With 3 UniBoard2 there are 3 * 4 = 12 PN, so in total 12 * 2 * 36 Gbyte = 864 Gbyte for 288 RCU = 3
+GByte / RCU. With 200 MHz and assume 2 byte per sample this corresponds to 3 GByte / 0.2 GHz / 2 byte
+= 7.5 sec. Uses 12 * 2 = 24 DDR4 SODIMMs. The required write rate per SODIMM is 288 RCU * 16b * 200 MHz
+/ 24 SODIMMs = 38.4 Gbps. The data width of the SODIMM is 64b (or 72b) so this is 38.4 Gbps / 64b =
+0.6 GTps, which is easily feasible, because DDR4 can achieve 3.2 GTps @ 400 MHz (transfers per second). 
+
+==> 1 UniBoard2 per 96 RCU can buffer 1.5 times more transient data than Lofar 1.0. Possibly use a factor 2
+    fewer DDR4 modules or use smaller DDR4 modules.
+
+d) UniBoard2 + external TBB storage cluster:
+Perhaps the TBB function can be implemented on an external storage cluster, because UniBoard2 can output
+all input. The total data rate is 288 * 200M * 16b = 288 * 3.2 Gbps = 921.6 Gbps, so with 2 ADC / 10GbE
+link this requires 144 10GbE links. For 10 s the cluster needs about 1152 GByte = 72 * 16 GByte DDR
+modules.
+
+
+4) FPGA resource usage
+
+See Station ADD section 4.5.2.9
+
+
+[1] "HBA Control Design Description", LOFAR-ASTRON-MEM-175, apr 2010, E. Kooistra
+[2] "RSP Firmware Design Description", LOFAR-ASTRON-SDD-018, sep 2013, E. Kooistra
+  
\ No newline at end of file
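Review note on dupllo_aad_erko.txt section 3: the transient buffer arithmetic can be rechecked with a few lines of Python. This is only a sketch under the assumptions stated in the notes (200 MHz sample rate, 2 bytes per sample, decimal GByte = 1e9 byte, module sizes and counts as quoted); the helper name `buffer_seconds` is hypothetical.

```python
F_S_HZ = 200e6          # sample rate from the notes
BYTES_PER_SAMPLE = 2    # assumed 2 bytes per sample, as in the notes


def buffer_seconds(total_gbyte, n_rcu):
    """Buffer depth in seconds per RCU, using decimal GByte (1e9 byte)."""
    gbyte_per_rcu = total_gbyte / n_rcu
    return gbyte_per_rcu * 1e9 / (F_S_HZ * BYTES_PER_SAMPLE)


lofar1_s = buffer_seconds(32, 16)                 # a) 1 TBB per 16 RCU
unb1_s = buffer_seconds(6 * 8 * 2 * 16, 288)      # b) 6 UniBoard1
unb2_s = buffer_seconds(3 * 4 * 2 * 36, 288)      # c) 3 UniBoard2

n_sodimm = 3 * 4 * 2
write_gbps = 288 * 16 * F_S_HZ / n_sodimm / 1e9   # write rate per SODIMM
write_gtps = write_gbps / 64                      # 64 bit wide SODIMM

total_gbps = 288 * 16 * F_S_HZ / 1e9              # d) total ADC data rate
n_links = 288 // 2                                # 2 ADC per 10GbE link
cluster_gbyte = total_gbps * 10 / 8               # 10 s of data in GByte

print(lofar1_s, round(unb1_s, 1), unb2_s)
print(round(write_gbps, 1), round(write_gtps, 2), round(total_gbps, 1), n_links, round(cluster_gbyte))
```

Under these assumptions case b) gives about 13.3 s and case d) gives about 1152 GByte for 10 s.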
diff --git a/applications/lofar2/doc/prestudy/dupllo_oversampled_subband_filterbank.txt b/applications/lofar2/doc/prestudy/dupllo_oversampled_subband_filterbank.txt
new file mode 100644
index 0000000000000000000000000000000000000000..f61acda9b0cae483ad8e6ea8e39862ad03393bb1
--- /dev/null
+++ b/applications/lofar2/doc/prestudy/dupllo_oversampled_subband_filterbank.txt
@@ -0,0 +1,86 @@
+Oversampled filterbank:
+
+1) Purpose
+
+- to measure line spectra in the channels at the edges of a subband, could the AAF for Apertif be an alternative?
+- to use a synthesis filterbank on the beamformed data, why reconstruct the time series ?
+
+
+2) Working of analysis oversampled filterbank
+
+  PFB = PFIR -> FFT
+  
+The polyphase filterbank (PFB) consists of a FIR prefilter (PFIR) and an FFT. The downsample factor is set by the FFT block size N_fft. For computational efficiency N_fft needs to be a power of 2, but a factor 3 or 5 may be included too. The PFIR section has N_fft phases and N_tap taps per phase. The coefficients follow from a low pass prototype FIR filter, as a snake pattern for all taps, for all points. In a critically sampled PFB the input data is shifted in in blocks of size N_fft. In an oversampled PFB the data is shifted in in blocks of size M and M < N_fft, so r = N_fft/M is the oversample factor. The shift of less than N_fft causes a phase step between blocks in the PFB output. This phase step can be compensated by counter rotating the data that inputs into the FFT [harris, tuthil].
+
+The oversampling N_fft / M also implies that multiple PFBs in parallel need to keep aligned not only the N_fft blocks, but also the oversampling sub blocks M. In ASKAP r = 32/27 with 1 MHz subbands means that an integer number of fine channel periods takes 27 seconds, which causes a periodicity at large time scales when aligning to the human (and VDIF) 1 sec grid.
+
+
+    0                    f_s/2
+  |-.-|---|..............|---|
+    .
+  |-.-| f_sub/2
+    .
+  <-.-> N_chan
+    .  
+ |--.--| f'_sub/2
+    . 
+ <--.--> N'_chan
+
+For the critically sampled PFB the downsampled frequency per subband is f_sub = f_s / N_fft. In case of a real input there are N_sub = N_fft / 2 subbands, where the factor 2 is because for a real input the positive and negative frequency spectra are complex conjugate, so only half of the subbands are unique.
+In the PFB this results in each downsampled subband being centred around 0 Hz with subband sample frequency f_sub and complex subband samples. Hence for a complex signal the Nyquist sample rate is equal to the bandwidth, so the Nyquist factor 2 then appears in the fact that the signal is complex, so with 2 values (real and imaginary) per sample. 
+
+The subband bandwidth B_sub is determined by the PFIR and independent of the subband rate f_sub, so B_sub <= f_sub. The f_sub = f_s / N_fft defines the frequency grid. The f'_sub > f_sub makes it possible to oversample B_sub and to have B_sub = f_sub without aliasing. For the oversampled filterbank the f'_sub = r * f_sub. The subband bandwidth B_sub can be selected such that it is still almost flat up to f_sub and then drops down to the stop band level at f'_sub. The width of the transition region is set by r. ASKAP and SKA LFAA use r = 32/27 ~= 1.185. For two neighbour subbands the transition region to attenuate the aliasing is 2*(r-1)*f_sub. A larger oversampling factor r eases the PFIR filter for a required aliasing attenuation, but increases the data rate. 
+
+Oversampling does not change the frequency grid of the PFB, because the frequency grid is set by the FFT size. The oversampling only increases the sample rate per frequency bin (subband or channel) and this can be used to achieve more attenuation between neighbouring bins (subband or channel) to eliminate aliasing.
+
+
+   ----    ---- ^
+       \  /     .
+        \/      .
+        /\      .
+       /  \     .
+      /    \    v 
+      <->       aliasing attenuation
+        f'_sub 
+      f_sub
+         
+         
+The subbands (coarse channels) are again separated into smaller bandwidth channels (fine channels). The number of channels in f'_sub is N'_chan, so f'_chan = f'_sub / N'_chan. If f_sub = K * f'_chan then K * N_sub channels from the oversampled subbands provide a continuous flat spectrum, without aliasing between subbands. The N'_chan - K channels in transition regions are dropped. The FFT size of the channel PFB is equal to the number of channels N'_chan, because the channel PFB has complex subband input.
+
+Define r = p/q = N_fft/M where p and q are the smallest integers to represent r. 
+
+ f_sub = f'_sub/r = N'_chan * f'_chan / r = K * f'_chan
+ --> K = N'_chan / r = N'_chan * q / p
+ 
+Hence to fit the integer constraint for K both N_fft and N'_chan must be integer divisible by p. The q is free to choose, but must be integer and <= p.
+
+Beamforming is done per subband sample from S_ant inputs. The result is a beamlet, which can be regarded as a subband with direction. A subband may be used for multiple beam directions, so it results in a beamlet for each direction. For the subband and beamlet samples the data rate is a factor r higher, it is only after a channel PFB that the channels in the transition band can be dropped.
+
+
+3) Compatibility with LOFAR 1.0
+
+In LOFAR 1.0 the subband PFB F_sub has N_fft = 1024, so N_sub = 512. The channel PFB F_chan has N_chan = 16, 64 or 256 channels. The 16 channels are used for pulsar timing (PST). In LOFAR 1.0 both F_sub and F_chan are critically sampled. Using r = p / q = 32 / 27 for LOFAR 1.0 with 64 channels fits and yields a spectrum with 54 channels per f_sub, so the channel width then increases by the oversample factor.
+
+To achieve the same width as for LOFAR 1.0 requires using r = 2 and N'_chan = 128, because r = p/q = 2/1 then yields N_chan = 64 channels per f_sub. Compared to a LOFAR 1.0 channel the phase slope over the channels from an oversampled F_sub will be a factor r less, due to that f'_sub = r * f_sub.
+
+I do not think it is possible to support LOFAR 1.0 channel width with an oversampled F_sub for r < 2. Also not with an oversampled channel PFB, because oversampling does not change the channel frequency grid. Using r = 2 does fit the existing LOFAR 1.0 frequency grid, but will cause a factor r = 2 higher output rate to CEP, because the data rate can only be reduced again after the channel filter. Therefore a solution can be to move the fine channel filter from CEP to the stations. 
+
+
+4) Required oversampling factor
+
+The required oversampling factor depends on the stop band attenuation and stop band bandwidth, and is a trade-off between data rate and processing load. The N_fft = 1024 is a power of 2, so p in r = p/q also has to be a power of two, e.g.:
+
+32/28 =  8/7  ~= 1.143
+32/27         ~= 1.185  <-- used by ASKAP, LFAA
+32/26 = 16/13 ~= 1.231
+32/25         ~= 1.280
+32/24 =  4/3  ~= 1.333
+
+
+5) Working of synthesis oversampled filterbank
+
+Reconstruction from f'_sub (beamlets) or from f'_chan
+
+Why reconstruct to time series, to separate to new channels?
+Reconstruct the whole band or only a part of the band e.g. 16 MHz for VLBI?
+
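Review note on dupllo_oversampled_subband_filterbank.txt: the relation K = N'_chan * q / p and the table of candidate oversampling factors can be verified with exact rational arithmetic. This is a minimal sketch in Python; the helper name `channels_per_subband` is hypothetical and the values are the ones quoted in the notes.

```python
from fractions import Fraction


def channels_per_subband(n_chan_os, r):
    """Return K = N'_chan / r, the number of fine channels kept per f_sub.

    Raises ValueError when K is not an integer, that is when N'_chan is
    not divisible by p in r = p/q.
    """
    k = n_chan_os / r  # int / Fraction yields an exact Fraction
    if k.denominator != 1:
        raise ValueError("N'_chan must be divisible by p")
    return int(k)


r_askap = Fraction(32, 27)
k_askap = channels_per_subband(64, r_askap)         # r = 32/27, N'_chan = 64
k_lofar1 = channels_per_subband(128, Fraction(2))   # r = 2, N'_chan = 128
m_askap = 1024 / r_askap                            # block shift M = N_fft / r

# Candidate factors with p = 32, listed in the same order as the notes.
candidates = [(Fraction(32, q), round(32 / q, 3)) for q in range(28, 23, -1)]
for r, value in candidates:
    print(r, value)
print(k_askap, k_lofar1, m_askap)
```

`Fraction` reduces 32/28 to 8/7 and 32/24 to 4/3 automatically, so the smallest p and q follow without extra code.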
diff --git a/applications/lofar2/doc/prestudy/station2_opc_ua.txt b/applications/lofar2/doc/prestudy/station2_opc_ua.txt
new file mode 100644
index 0000000000000000000000000000000000000000..8994049a8de91450503e7785ce2483ad340f6412
--- /dev/null
+++ b/applications/lofar2/doc/prestudy/station2_opc_ua.txt
@@ -0,0 +1,42 @@
+OPC-UA is the IEC 62541 standard
+
+Large open platform independent standard, but if only a subset of the features is supported, then it becomes less
+standard or platform independent.
+
+OPC classic = Object Linking and Embedding (OLE) for Process Control
+OPC = Open Platform Communications.  
+OPC-UA = OPC Unified Architecture
+
+https://opcfoundation.org/
+http://wiki.opcfoundation.org/index.php/UA_Overview
+https://en.wikipedia.org/wiki/OPC_Unified_Architecture
+
+- Service oriented architecture (SOA) using asynchronous request/response pattern
+- transport: via TCP in binary or web based
+- data model: more than hierarchy of files/folder/registers, object oriented nodes that can send meta information and data
+- expandability via profiles:
+  . DI = device integration
+  . DA = data access
+  . A&C = alarms and conditions
+  . HDA = historical data access
+- security
+- authentication
+
+Needed:
+- OPC-UA SDK (software development kit)
+  . Considerations regarding Software Development Kits for OPC-UA:
+    http://www.ascolab.com/images/stories/ascolab/doc/ua_whitepaper_implementation_e.pdf 
+    - UA server requires at least ~200 kByte RAM
+  . https://documentation.unified-automation.com/uasdkhp/1.0.0/html/index.html
+  
+- TCP/IP stack
+  . NicheStack (free via Intel)
+    https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/tt/tt_nios2_tcpip.pdf
+  . https://www.micrium.com/rtos/tcpip/
+  . Lightweight IP for UDP about 5.1 Mbps transmit and 3.4 Mbps receive using NiosII/f at 50 MHz, so less for TCP
+    https://www.ee.ryerson.ca/~courses/coe718/Data-Sheets/RTOS/tt_nios2_lwip_tutorial.pdf
+- RTOS (realtime operating system)
+  . MicroC/OS-II: https://www.micrium.com/ (needed with NicheStack, requires license from Micrium)
+
+
+
diff --git a/applications/lofar2/doc/prestudy/station2_sdp_dsp.txt b/applications/lofar2/doc/prestudy/station2_sdp_dsp.txt
new file mode 100644
index 0000000000000000000000000000000000000000..993d1734ffb94c9cb6f85c8e022f46ef4d71b355
--- /dev/null
+++ b/applications/lofar2/doc/prestudy/station2_sdp_dsp.txt
@@ -0,0 +1,70 @@
+*******************************************************************************
+* Beamformer
+*******************************************************************************
+
+M&C:
+* BF weights per PN: 
+  - N_pol * S_pn * S_sub_bf * W_bf_weight * N_complex / W_byte = 2 * 12 * 488 * 16 * 2 / 8 = 46848 byte ~= 48 kByte
+  - N_pol * S_pn = 2 * 12 = 24 times S_sub_bf = 488 complex weights
+  - These weights can be sent in 24 packets with 1952 octets/packet
+  - Arria10 has 2713 BRAM of M20k = 2 kByte, so the BF weights use 24 BRAM
+  - BF weight memory options
+    . single buffer -->
+       - BF weights are applied immediately when written,
+       - SCU can send BF weights at arbitrary intervals,
+       - SCU must send BF weights at the time for which they were calculated,
+       - BF weights update rate must be high enough such that they change smoothly
+    . double buffer switch at PPS 
+       - BF weights are applied at next PPS,
+       - SCU must send BF weights in the preceding second
+    . double buffer switch at BSN timestamp
+       - BF weights are applied at scheduled timestamp or immediately if the timestamp is in the past,
+       - SCU can send BF weights at arbitrary intervals,
+       - SCU can send BF weights in advance within the current update interval.
+
+- Subband weights and BF weights design decision:
+  . General Jones matrix operation:
+  
+    |wx cx|   |x|   |wx*x + cx * y|
+    |cy wy| * |y| = |wy*y + cy * x|
+    
+  . Requirement [LOFAR2-3098] states that Station beams have to be independent per polarization. Therefore
+    wx /= wy allows making independent X and Y beams. Otherwise wx, wy could have had the same value, because X
+    and Y are at same location and subband calibration is done separately.
+    
+  . cx, cy can be 0 because no polarization correction per element is needed:
+  
+    |wx  0|   |x|   |wx*x|
+    | 0 wy| * |y| = |wy*y|
+    
+  . Using cx = wx and cy = wy and wx /= wy allows making two independent unpolarized beams using all antenna elements:
+  
+    |wx  0|   |1 1|   |x|   |wx wx|   |x|   |wx * (x + y)|
+    | 0 wy| * |1 1| * |y| = |wy wy| * |y| = |wy * (x + y)|
+    
+    The (x+y) could be implemented as first (x+y) and then *w, or as first weight and then add. 
+
+*******************************************************************************
+* Subband correlator
+*******************************************************************************
+
+
+
+
+*******************************************************************************
+* Transient buffer
+*******************************************************************************
+
+
+
+*******************************************************************************
+* Transient detection
+*******************************************************************************
+
+
+
+*******************************************************************************
+* Subband offload
+*******************************************************************************
+
+ 
\ No newline at end of file
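Review note on station2_sdp_dsp.txt: the BF weight memory size and the three Jones matrix cases can be checked numerically. This is a minimal sketch in Python with made-up weight and sample values; the function name `apply_jones` is hypothetical. It assumes the matrix rows are [wx cx] and [cy wy], which is the ordering implied by the results wx*x + cx*y and wy*y + cy*x.

```python
# Parameter values as quoted in the notes.
n_pol, s_pn, s_sub_bf = 2, 12, 488
w_bf_weight, n_complex, w_byte = 16, 2, 8

weights_bytes = n_pol * s_pn * s_sub_bf * w_bf_weight * n_complex // w_byte
n_packets = n_pol * s_pn
octets_per_packet = s_sub_bf * w_bf_weight * n_complex // w_byte


def apply_jones(wx, cx, cy, wy, x, y):
    """Apply the 2x2 matrix with rows [wx cx] and [cy wy] to the vector (x, y)."""
    return (wx * x + cx * y, cy * x + wy * y)


# Illustrative values only, not taken from the design.
wx, wy = 2, 3
x, y = 1 + 2j, 4 - 1j

independent = apply_jones(wx, 0, 0, wy, x, y)     # cx = cy = 0
unpolarized = apply_jones(wx, wx, wy, wy, x, y)   # cx = wx, cy = wy

print(weights_bytes, n_packets, octets_per_packet)
print(independent, unpolarized)
```

The last case shows that weighting first and then adding gives the same result as adding first and then weighting, as stated in the notes.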
diff --git a/applications/lofar2/doc/prestudy/station2_sdp_firmware_design.txt b/applications/lofar2/doc/prestudy/station2_sdp_firmware_design.txt
new file mode 100644
index 0000000000000000000000000000000000000000..f3c52363fd654395f910129a534e159f9b9f588a
--- /dev/null
+++ b/applications/lofar2/doc/prestudy/station2_sdp_firmware_design.txt
@@ -0,0 +1,110 @@
+*******************************************************************************
+* Detailed Design Document of the LOFAR 2.0 Station SDP firmware
+*******************************************************************************
+
+
+? Link with functions in ADD
+? Link with L4 requirements on SDP
+? Link with ICDs (what is described in ICD and what in this DD):
+  * L2-ICD 11207 : RCU2S-SDP (JESD204B)
+  * L2-ICD 11209 : STF-SDP (SYSCLK / SYSREF and 200MHz / PPS)
+  * L2-ICD 11211 : SC-SDP (1GbE, Gemini M&C, create MM registers ICD from YAML files with ARGS)
+  * L2-ICD 11218 : SDP-STCA (no firmware interface)
+  * L1-ICD 11109 : STAT.SDP-CEP (beamlets, transient data read out)
+  * L1-ICD 11108 : STAT.SDP-NW (PHY, ARP, ping, XON/XOFF pause frames, no DHCP)
+? Oversampled subband filterbank first needs modelling design
+
+
+Title: Detailed Design of the LOFAR 2.0 Station Digital Processing (SDP) Firmware
+Table of contents
+References
+Terminology
+Definitions
+Introduction
+- Context
+  . ADD fig 3.1-1 (E)ICD and L3 PBS overview
+- Scope
+- Document overview
+
+Station overview
+  . ADD fig 4.1.1-1 M&C SCU -- PCC -- Unb2
+  . ADD fig 4.5.1.2-1 UniBoard2 with 4 PN
+  . ADD fig 4.5.2-1 Firmware toplevel with ICDs
+  . ADD fig 4.5.2-2 External FPGA interfaces for M&C and data offload
+               
+Hardware architecture (SDP, STCA)
+  . Two UniBoard2 per subrack, one PCC, 32 RCU each with 3 signal inputs (ADCs)
+  . 12 ADC per FPGA, 48 ADC per UniBoard, 96 ADC per subrack
+  . LBA ring : two subracks
+  . HBA ring : one subrack for core (two sub-arrays, but one ring to have subband correlations for all)
+               one subrack for remote
+               two subracks for international
+               
+Firmware infrastructure
+  . BSP (unb2_minimal_gmi)
+    - Clock, reset, PPS, flash, fpga regmap info from YAML
+    - MM bus and ARGS
+    - Gemini M&C protocol (impact of AXI MM and ST)
+  . FPGA interface test designs
+    - M&C using 1GbE (unb2_minimal_gmi)
+    - ADC using JESD204B (unb2_test_adc)
+    - QSFP using 10GbE (unb2_test_qsfp)
+    - Ring using 10GbE (unb2_test_ring)
+    - DDR4 (unb2_test_ddr4)
+  . Board test design
+    - All interfaces (unb2_pinning, unb2_test)
+  . Clock domains
+    - ~50 -100 MHz M&C
+    - 200M ADC, 160M ADC
+    - > 200 MHz for processing to fit S_sub_bf = 488 or even 512?, and to prepare for R_os ~=1.25, f_max Arria10?
+    - transceivers, DDR4
+  . Firmware development
+    - RadioHDL
+    - Revisions
+    - Technology wrappers, component libraries and application libraries
+    - M&C software
+    - Coding style (constants package derived from parameters in doc)
+    
+
+Firmware architecture
+  . Application overview  (array notation of interfaces and packets, ...)
+    - ADC ingress and time stamp
+    - Subband filterbank (critically sampled)
+    - Subband filterbank (oversampled)
+    - Beamformer
+    - Subband correlator
+    - Transient buffer (DDR4 interface, subband select and DM >= 0, packet format, M&C, RW access via M&C) 
+    - Transient detection
+    - Subband offload
+  . Timing (how it is used, sync interval, PPS event, BSN scheduler)
+  . Quantization (where and how)
+  . Resource usage
+  . Debug, test and monitoring points (test functionality)
+    - BSN monitor
+    - Latency monitor
+    - FIFO fill monitor
+    - 1GbE, 10GbE statistics
+    - DDR4 CRC error counts
+    - Data buffer at signal input, beamlet output
+    
+Prototyping:
+- FPGA - ADC JESD204B links (test board with Unb2b, one to S_pn = 12 inputs coax splitter)
+- FPGA - PC 10GbE link stress tests (pause frames, ARP, data rate)
+
+Designs:
+- unb2c_minimal_gmi
+  
+
+References:
+- Preliminary design txt files:
+  . station2_sdp_m_and_c.txt        : Monitoring and control, Gemini protocol
+  . station2_sdp_timing.txt         : Station BSN, timestamp definition, BSN aligner
+  . station2_sdp_ring.txt           : ring access, packets for beamlets, crosslets, subbands, TB readout
+  . station2_sdp_dsp.txt            : beamformer, subband correlator, transient buffer, transient detection, subband offload
+  . station2_sdp_icd.txt            : ICD
+  . station2_sdp_hdl_components.txt : rework existing HDL components for LOFAR2.0
+  . station2_sdp_hdl_article.txt    : reference article on RTL design using RL = 0, state and pipelining, AXI4 streaming
+
+- Other:
+  . tools/oneclick/doc/desp_firmware_dag_erko.txt
+  . tools/oneclick/doc/desp_firmware_overview.txt
\ No newline at end of file
diff --git a/applications/lofar2/doc/prestudy/station2_sdp_firmware_planning.txt b/applications/lofar2/doc/prestudy/station2_sdp_firmware_planning.txt
new file mode 100644
index 0000000000000000000000000000000000000000..873b3c2a163a7685615e22fe60209fba0ee98ef6
--- /dev/null
+++ b/applications/lofar2/doc/prestudy/station2_sdp_firmware_planning.txt
@@ -0,0 +1,75 @@
+*******************************************************************************
+* SDP Firmware planning
+*******************************************************************************
+Includes design, implementation, verification on HW, technical commissioning.
+
+v1  v2 
+       Infrastructure
+10  20   - Development environment using GIT, RadioHDL, updating existing components
+20   .   - BSP using Gemini Protocol, ARGS
+10   .   - Ethernet access (OSI 1-4)
+10  20   - Ring access
+     
+       Applications:
+15   .   - ADC ingress and time stamp
+20  10   - Subband filterbank (critically sampled)
+ 0  30   - Subband filterbank (oversampled)
+10   .   - Beamformer
+20   .   - Subband correlator
+25   .   - Transient buffer (DDR4 interface, subband select and DM >= 0, packet format, M&C, RW access via M&C) 
+20   .   - Transient detection
+20   .   - Subband offload
+ 0   .   - 160 MHz
+     
+35   . Integration
+     5   - FPGA pinning
+    10   - Interface test designs unb2c
+     5   - Design revisions and lab tests
+    15   - Technical commissioning
+
+
+1 week = 100% project allocation, gross 40 hours, net 40 * 0.8 = 32 hours = 4 days
+sprint = 100% project allocation, gross  3 weeks, net 12 days
+
+v1 : 10 + 20 + 10 + 10 + 15 + 20 + 10 + 20 + 25 + 20 + 20 + 35 = 215 gross weeks --> 215 / 40 = 5.4 FTE ~ 3 people each 2 years
+v2 : 10 less for critically sampled PFB
+     10 more for updating existing components
+     10 more for ring access
+     30 for oversampled PFB
+      . consider unb2c test part of SDP FW integration and of SDP HW
+     15 technical commissioning relies on proper Systems Engineering, otherwise may become 50 weeks
+
+==> EK, JH: v1 estimate of April 2019 is still valid as v2 on 10 Oct 2019.
+
+v3 : 
+
+   Infrastructure
+20   - Development environment using GIT, RadioHDL, updating existing components
+ 5   - unb2c FPGA pinning
+10   - unb2c FPGA interface test designs
+20   - Board Support Package using Gemini Protocol and ARGS
+20   - Ring access
+10   - 10GbE access (OSI 1-4)
+
+   Applications:
+15   - ADC input and time stamp
+10   - Subband filterbank (critically sampled)
+20   - Subband correlator
+10   - Beamformer
+25   - Transient buffer
+20   - Subband offload for AARTFAAC
+20   - Transient detection
+30   - Oversampled subband filterbank
+ 0   - Support 160 MHz
+ 
+   Integration:
+10   - Lab tests
+ 5   - Technical commissioning Dwingeloo
+ 5   - Technical commissioning Prototype Station
+
+All:
+20 + 5 + 10 + 20 + 20 + 10 + 15 + 10 + 20 + 10 + 25 + 20 + 20 + 30 +  0 + 10 + 5 + 5 = 255
+
+No oversampled filterbank:
+20 + 5 + 10 + 20 + 20 + 10 + 15 + 10 + 20 + 10 + 25 + 20 + 20 +       0 + 10 + 5 + 5 = 225
+
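Review note on station2_sdp_firmware_planning.txt: the v1 and v3 totals can be reproduced with a short script, which also makes it easy to redo the sums when an estimate changes. This is a minimal sketch in Python; the dictionary keys are shortened labels for the items in the plan and the figure of 40 working weeks per year is the one used in the v1 line.

```python
# v1 estimates in gross weeks, in the order listed in the v1 sum.
v1_weeks = [10, 20, 10, 10, 15, 20, 10, 20, 25, 20, 20, 35]

# v3 estimates in gross weeks per item.
v3_weeks = {
    "development environment": 20,
    "unb2c FPGA pinning": 5,
    "unb2c interface test designs": 10,
    "BSP with Gemini protocol and ARGS": 20,
    "ring access": 20,
    "10GbE access": 10,
    "ADC input and time stamp": 15,
    "subband filterbank (critically sampled)": 10,
    "subband correlator": 20,
    "beamformer": 10,
    "transient buffer": 25,
    "subband offload": 20,
    "transient detection": 20,
    "oversampled subband filterbank": 30,
    "support 160 MHz": 0,
    "lab tests": 10,
    "commissioning Dwingeloo": 5,
    "commissioning prototype station": 5,
}

v1_total = sum(v1_weeks)
v1_fte_years = v1_total / 40
v3_total = sum(v3_weeks.values())
v3_no_os = v3_total - v3_weeks["oversampled subband filterbank"]

print(v1_total, round(v1_fte_years, 1), v3_total, v3_no_os)
```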
diff --git a/applications/lofar2/doc/prestudy/station2_sdp_hdl_components.txt b/applications/lofar2/doc/prestudy/station2_sdp_hdl_components.txt
new file mode 100644
index 0000000000000000000000000000000000000000..d0569123d721f0678fb6a4170d0df710c3edf2e1
--- /dev/null
+++ b/applications/lofar2/doc/prestudy/station2_sdp_hdl_components.txt
@@ -0,0 +1,740 @@
+*******************************************************************************
+* DP encoder / decoder
+*******************************************************************************
+- dp_packet_enc / dp_packet_dec
+  . Current dp_packet_enc encodes sosi fields into: CHAN (32b), sync & BSN (64b), DATA (>= 1 b), ERR (32b).
+  . Use new dp_packet_enc with CRC to mitigate false positive ETH CRC --> dp_packet_enc_crc:
+      CHAN (32b), Sync & BSN (64b), DATA (>= 1 b), ERR (32b), CRC (32b)
+
+- RSP RAD frame:
+  . uses: FSI, FSN, DATA, CRC.
+  . The FSN is 16 bit but the MSbit is used for the sync. The other 15 bits count blocks.
+  . After Rx frame the FSI is stripped and the CRC is replaced by a BRC.
+
+- CRC Error checking:
+  The CRC is a 32 bit number, so the chance that the CRC results in a false positive is 1/2**32 ~= 2.3e-10 or 1
+  in 4.3e9. The packet rate is f_sub. Per T_sub interval the ring carries about 10 - 20 packets between N = 16
+  nodes. Hence in total the packet rate of the ring for one LBA station is about 195312.5 * 20 * 16 ~= 60 M 
+  packets / s. With 50 stations and LBA and HBA this becomes about a factor 100 more, so about 6 G packets / s.
+  If 0.01 % of the packets have errors, then the packet error rate is 0.6 M /s, so then once every 1 / (0.6e6 * 2.3e-10)
+  ~= 2 hours somewhere in LOFAR there will occur a false positive CRC. If such an error occurs, then it must not
+  cause the entire processing to stall. Therefore some additional check is necessary using a CRC. It is not
+  sufficient to check e.g. the ETH type and the expected packet length, because these do not cover the other
+  data in the packet.
+  Each station has about 100 10GbE links and there are 50 stations. Suppose the BER per link is 1e-10, so one bit
+  error per second per link, and that each bit error causes a CRC error. The total ring CRC error rate in LOFAR is then
+  5000/s, so a false positive CRC will occur about once per 2**32/5000 s ~= 10 days. This is not often, but if it
+  causes a station to fail then it is too often.
+  Having false positive CRCs even on a daily or weekly basis is too often. Therefore the application payload
+  should also have a CRC to ensure that no false positive CRC will occur during the life time of LOFAR 2.0.
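+
+  The two estimates above can be reproduced with a short calculation. The Python sketch below is only an
+  illustration: it uses the figures assumed in the text, but with unrounded rates, so the first result comes
+  out slightly below the ~2 hours quoted above. The function and variable names are hypothetical.
+
```python
# Reproduce the false positive CRC estimates from the text. All inputs are the
# assumed figures quoted in the notes, not measurements.
P_FALSE_POSITIVE = 1 / 2**32   # chance that a corrupted packet still passes a 32 bit CRC


def seconds_between_false_positives(error_rate_per_s):
    """Mean time in seconds between corrupted packets that still pass the CRC."""
    return 1 / (error_rate_per_s * P_FALSE_POSITIVE)


# Estimate 1: based on the packet rate
f_sub = 195312.5                          # packet rate per stream in packets / s
ring_rate = f_sub * 20 * 16               # about 20 packets per T_sub between 16 nodes
lofar_rate = ring_rate * 100              # 50 stations with LBA and HBA: about a factor 100
packet_errors_per_s = lofar_rate * 1e-4   # assume 0.01 % of the packets have errors
hours = seconds_between_false_positives(packet_errors_per_s) / 3600

# Estimate 2: based on the bit error rate
link_errors_per_s = 100 * 50 * 1.0        # 100 links x 50 stations x 1 bit error / s / link
days = seconds_between_false_positives(link_errors_per_s) / 86400

print(f"ring packet rate ~ {ring_rate / 1e6:.1f} M packets / s")
print(f"estimate 1: one false positive per {hours:.1f} hours")
print(f"estimate 2: one false positive per {days:.1f} days")
```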
+
+
+Design decisions:
+- Use CHAN (32b), Sync & BSN (64b), DATA (>= 1 b), ERR (32b), CRC (32b) to transport data between FPGAs
+  without false positive CRCs during the lifetime of LOFAR 2.0 to guarantee that only correct packets
+  enter the FPGA internal processing. The internal FPGA processing must be robust to lost packets, but
+  it does not have to be robust against corrupted packets (wrong contents, wrong length).
+
+
+*******************************************************************************
+* dp_validate_crc
+* - Validate (declare valid) CRC and store-and-forward or store-and-discard this packet
+*******************************************************************************
+
+The Ethernet/DP packet has two CRC checksums in the packet tail:
+
+- the Ethernet CRC is calculated by the 1GbE MAC
+- the DP packet CRC is calculated by the dp_packet_dec.
+
+The packet needs to be stored before it can be forwarded or discarded, because the entire packet is needed 
+to calculate and verify the CRC. The CRC results are reported via the sosi.err field at the end of packet
+(eop). The dp_validate_crc forwards the packet when the CRC is OK and discards the packet when the CRC is
+wrong.
+
+
+
+*******************************************************************************
+* dp_validate_bsn_at_sync
+* - Validate (declare valid) BSN at Rx sync and pass on or discard packets until next Rx sync
+*******************************************************************************
+
+The DP packet has a sync and BSN field in the packet header. This field is at the start of the packet (sop),
+so it can be verified while the packet arrives. The Rx BSN at the Rx sync in the received packet should be
+equal to the local Station BSN at the local sync. If the Rx sync BSN and the local sync BSN are:
+
+- not equal, then discard all subsequent blocks until the Rx sync BSN is equal again,
+- equal, then pass on all subsequent blocks until the next Rx sync
+
+The assumption is that if the BSN at sync is wrong, then the block processing at this node or at the remote
+node has not been started properly, so then subsequent blocks will have a wrong BSN also. If the BSN at
+sync is OK, all nodes have been started properly and then the BSN for all subsequent blocks in the sync
+interval will be correct too. The sync and BSN value are not corrupted, because they are determined inside
+the local and remote FPGA (so error free, because the logic is error free) and the remote BSN is
+transported using a CRC (so error free, because the CRC detects all errors).
+
+The initial state to discard or pass on blocks is a don't care, because the assumption is that the block
+processing was (re)started properly on all nodes. At power up, choose to initially pass on packets.
+If the packet with the Rx sync and BSN is lost, then the last decision to discard or pass on packets
+remains, because it is still valid.
+
+The dp_validate_bsn_at_sync function verifies the entire 64 bit sync and BSN in an Rx packet. For local and
+remote inputs the BSN can only differ by a limited number dependent on the latency differences between the
+different inputs. Therefore if the input Rx BSN at sync matches the local Station BSN, then for the
+BSN aligner that aligns the inputs based on the BSN it is sufficient to only use a fraction of the BSN.
+Using the fraction of the BSN as index is sufficient to distinguish between blocks within the maximum BSN
+latency. If the fraction N is a power of 2, then only the log2(N) LSbits of the BSN need to be compared
+to ensure that all inputs have the same 64 bit sync and BSN.
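+
+The pass on / discard decision described above can be sketched as a small behavioural model in Python. This is
+a sketch and not the HDL implementation. The class and argument names are hypothetical, and it assumes that the
+local Station BSN at the corresponding local sync is available when the Rx sync block arrives.
+
```python
class BsnAtSyncValidator:
    """Behavioural model: pass on or discard Rx blocks based on the BSN at sync."""

    def __init__(self):
        # At power up, choose to initially pass on packets.
        self.passing = True

    def block(self, rx_sync, rx_bsn, local_sync_bsn):
        """Return True if this Rx block is passed on, False if it is discarded.

        rx_sync        : True if the Rx block carries the sync
        rx_bsn         : full BSN of the Rx block
        local_sync_bsn : local Station BSN at the corresponding local sync
        """
        if rx_sync:
            # The decision is only updated at an Rx sync and then holds for all
            # subsequent blocks until the next Rx sync.
            self.passing = (rx_bsn == local_sync_bsn)
        return self.passing


validator = BsnAtSyncValidator()
decisions = [
    validator.block(False, 5, 0),      # before the first Rx sync: pass on
    validator.block(True, 100, 101),   # Rx sync BSN differs: discard
    validator.block(False, 101, 101),  # still discarding until the next Rx sync
    validator.block(True, 200, 200),   # Rx sync BSN equal again: pass on
]
print(decisions)
```
+
+If the block with the Rx sync is lost, then block() is not called with rx_sync set, so the previous decision
+remains, as described above.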
+
+
+
+*******************************************************************************
+* BSN aligner 
+*******************************************************************************
+
+Assumptions:
+- Per input the Rx packets arrive in order
+  . a packet contains one or more blocks, on the ring every packet contains one block
+- Only allow correct blocks to enter the FPGA processing
+  . the block validation is based on Rx packet CRC and BSN at sync
+- Usage schemes:
+  . N = 2 inputs aligner with 1 local data and 1   remote data
+  . N > 2 inputs aligner with 1 local data and N-1 remote data
+  . N >=2 inputs aligner with 0 local data and N   remote data (not used on ring, but was used in APERTIF)
+- The local sync and BSN sources on all FPGAs are synchronous, to avoid additional BSN latency between inputs.
+- Static input enable or disable via M&C
+  - it is possible to enable or disable any combination of inputs
+  - if all inputs are disabled then the output stops.
+  - if the input enable or disable setting is changed, then the BSN aligner restarts trying to achieve alignment.
+  - for the ring with 1 local and 1 remote input the static input enable/disable supports the align modes:
+    . disabled,
+    . local only,
+    . remote only,
+    . local and remote
+- Input latency:
+  . the input latencies are fixed by design, so inputs have a maximum BSN latency g_bsn_latency that is fixed
+    and that does not have to be programmable via M&C.
+  . If all hops on the ring are active then the total latency will be (N-1)*(d + 1) where d is the transport
+    latency of each hop and 1 is due to store-and-forward at each node. Typically the total transport latency
+    on the ring is (N-1)*d < 1, so less than one block period. The total ring latency is covered by
+    g_bsn_latency > (N-1)*(d + 1). 
+- Lost input blocks:
+  . accept that the corresponding output is lost too, or output filler block to replace lost block
+  . should not cause subsequent blocks to get lost too
+  . must not induce a burst of output blocks due to output catch up after late lost block detection
+  . If blocks on one input often get lost, then it is not acceptable that the output is lost.
+    - insert filler block to replace the lost input blocks, or
+    - support dynamic input enable/disable control
+- Only output correct blocks, either with the received input block or with flagged filler block
+- The output passes on the sync and therefore it does not have to pass on the BSN
+- The output should support flow control to provide output throttling
+- Stopped input:
+  . If all inputs of the BSN aligner stop, then the output stops.
+  . If after some block periods (e.g. g_bsn_latency) there is no more block pending at any input, then the
+    BSN aligner should restart trying to achieve alignment.
+
+Notes:
+- In LOFAR and APERTIF the BSN aligner loses more blocks due to input flush and realign.
+- A BSN aligner can align at any BSN. Using a sync aligner, that can only align at the sync, would cause
+  losing an entire sync interval to realign, which is not acceptable.
+- In APERTIF the sync_checker loses entire sync intervals to ensure filled sync intervals.
+- In LOFAR and APERTIF the output is driven by the remote input to add minimal latency, however this
+  results in losing more packets and having to realign if input packets get lost.
+- In dp_bsn_align the artificial local data stream was used to ensure that the output block size was correct.
+  By using extra CRC checking (ETH CRC and DP CRC) and store and forward in Rx it is already certain that only
+  correct input packets arrive at the BSN aligner input. Therefore an artificial local data stream is not needed.
+
+
+Design options:
+- Lost packet detection
+  . Rely on next received packet:
+    - check per input that the align BSN increments +1 within the align_sync interval
+    - requires a timeout or overflow detection on other inputs to detect a burst of lost packets
+    - after a burst of lost packets, typically the output cannot catch up anymore, so then the BSN aligner
+      needs to flush its input buffer and restart.
+  . Per packet using a local output block pacer.
+    The local output block pacer is offset by at least g_bsn_latency relative to the local BSN source, to
+    ensure that all inputs should have a new block pending for output. This is possible, because the input
+    latencies are static and within a fixed range:
+    - in circular buffer the Wr flag for the lost block remains unset
+    - in FIFO by no pending input or pending input with higher BSN than the current output BSN
+  
+  ==> Design decision:
+      - Use local block reference to define when to detect lost packets, because one lost block should not
+        cause subsequent blocks to get lost too.
+
+
+- Output driven by remote input block arrival or by local block reference
+  . in case of 1 remote input, the remote input does not need a FIFO if it drives the output
+  . in case of > 1 remote input, then the remote inputs also requires FIFOs
+  . using local input increases the latency from remote input to output, because fixed to the T_sub grid
+  . using local input at T_sub grid avoids bursts, this can also be handled using flow control
+  . with local input driving the output the assumption is that if the local input has M packets, then all remote
+    inputs will have delivered at least one frame, so there should be a sop pending from all.
+  . if there is no local input, then an artificial local input can be derived when BSN is equal on all enabled remote inputs.
+  . if remote input is lost, then entire output is lost if remote drives output, because there is not enough spare time
+    to still output the other input packets
+  . For remote driven output a slot can be output when for all active inputs there is a block. However if one or
+    a series of packets got lost, then the other inputs will overflow. Hence remote driven output needs a timeout
+    to keep the output running, so a form of local driven output. Hence to avoid additional packet loss on other 
+    inputs or of subsequent packets in time it is necessary to have a local driven output. Therefore using a remote
+    driven output is not feasible. 
+
+  ==> Design decision:
+      - Use local block reference to define when aligned blocks should be output, because one lost block should
+        not cause subsequent blocks to get lost too, which is more important than adding minimal latency and
+        potentially saving BSN aligner input buffer memory.
+
+
+- Generation of local block reference to define the output pace:
+  . During initial input alignment it is important that all active inputs are indeed active, because together they
+    determine the latency difference between inputs. After initial alignment the data output can continue at a
+    fixed rate, driven by a local block reference:
+    - The local input or the remote input with the least latency could be used as local output block reference,
+      because (N-1)*d << 1. This requires having a local input or detecting the closest remote input.
+    - Alternatively a dedicated local block reference can be started with a certain time offset
+      after achieving input alignment. The time offset sets a margin that ensures that at subsequent block
+      reference pulses all inputs will have a new block pending if the block is not lost.
+      
+  ==> Design decision:
+      - Generate local block reference when initial BSN alignment has been achieved and start it with a certain
+        fixed offset.
+        
+
+- Filler data insertion      
+  . Whether to drop a block or to replace it by a filler block depends on the application
+    - for BF drop all inputs, because beam is affected
+    - for XC insert filler data, because visibilities of active inputs are still OK.
+    - for the output via the Network insert filler data to keep the output at the nominal rate, such that
+      the destination can distinguish between data blocks that got lost inside Station and packet loss on
+      the Network.
+  . Filler blocks can be flagged using a sosi.channel bit as flag
+  . Filler data can be:
+    - undefined
+    - forced to zero
+    - random with similar noise level,
+    - most negative integer in real data
+    - most negative integer in complex real part and imag part (or use imag part as cause identifier).
+
+  ==> Design decision:
+      - Replace lost blocks by filler blocks, to preserve the nominal output rate
+      - Flag the filler block via a sosi.channel bit, to distinguish the block
+      - Force the filler data to some constant dependent on a generic, to support transparent operation
+        in e.g. an adder where x + 0 = x or a multiplier where x * 1 = x, or to support flagging per data
+        value using most negative integer value.
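+
+  The filler block decision can be illustrated with a small Python model. This is only a sketch: the helper
+  names are hypothetical, a lost block is represented by None, and the flag stands in for the sosi.channel bit.
+  It shows that with filler value 0 an adder still outputs the data of the remaining input unchanged.
+
```python
def most_negative(width):
    """Most negative two's complement integer for the given bit width."""
    return -(1 << (width - 1))


def fill_lost(blocks, block_size, filler):
    """Replace lost blocks (None) by filler blocks, return (data, flag) per block."""
    out = []
    for blk in blocks:
        if blk is None:
            out.append(([filler] * block_size, True))   # flagged filler block
        else:
            out.append((list(blk), False))              # received block, not flagged
    return out


# Example: adder of a local and a remote input, the second remote block is lost.
local = fill_lost([[1, 2], [3, 4]], 2, filler=0)
remote = fill_lost([[10, 20], None], 2, filler=0)
summed = [[a + b for a, b in zip(loc[0], rem[0])] for loc, rem in zip(local, remote)]
flags = [loc[1] or rem[1] for loc, rem in zip(local, remote)]
print(summed)   # [[11, 22], [3, 4]]
print(flags)    # [False, True]
```
+
+  For flagging per data value, the filler could instead be set to most_negative(width) of the data width.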
+
+    
+  
+. dynamic input enable/disable in case of lost packets
+  - Scheme:
+    . Fine per packet scheme:
+      - Input packets arrive every block period, remote packets can arrive anywhere within a block period,
+      - If one input stops, then g_sop_timeout occurs in s_align and then that input could be dynamically disabled in s_xoff.
+        Inputs can dynamically be enabled if they arrive within two block periods, in s_align.
+      - too nervous, too difficult to debug and monitor
+    . Coarse per sync interval scheme:
+      - if an input has no lost packets during one (or more) sync interval then it can be dynamically enabled for the next sync interval
+      - if an input has lost packets during one (or more) sync interval then it can be dynamically disabled for the next sync interval
+      - This is a suitable scheme because it does not react too fast and it can be monitored via M&C.
+      - Define number of sync intervals for dynamic input control as a generic
+      - preferred because it is less active and easier to monitor
+  - Is dynamic input enable/disable necessary if a lost packet does not affect next packets?
+    . If lost data is replaced by filler data, then only static input enable/disable is necessary, because if an input
+      becomes inactive it will be flagged and the output can still continue.
+    . If lost data causes all inputs to be discarded, then dynamic input enable/disable may be useful to avoid that a
+      single input causes all output to stop.
+      
+  ==> Design decision:
+      - It is not necessary to support dynamic input enable/disable, because lost blocks are replaced by filler blocks.
+
+
+. Treat all inputs equal or use local input stream as reference to achieve input alignment:
+  - using the local data stream as reference stream can benefit from the fact that the local data stream has no
+    packet loss, because internally in the FPGA logic is error free.
+  - treating all streams equal is more general and also works when static input enable/disable disables the local
+    input. 
+    
+  ==> Design decision:
+      - Treat all inputs equal. Do not make use of the fact that the ring has a local input. In this way the BSN 
+        aligner can also work when there are only remote inputs.
+
+
+. Define align_sync  
+  -  ...
+  
+  ==> Design decision:
+      - Define align_sync to start initial alignment and to avoid need for twice as large input buffer given a 
+        certain BSN latency
+      
+      
+      
+      
+. Initial alignment declaration can be based on:
+  - All active inputs have data pending with the same BSN index (in the same circular buffer slot or at the FIFO output)
+  - If BSN latency number of slots on all inputs got filled, then set the Rd pointer. This requires that all inputs
+    start filling at the same BSN index, because then the input with the lowest latency will get filled first. The
+    Rd pointer is set at the BSN index.
+  - The same slot is filled on all active inputs, this slot index sets the Rd pointer:
+  
+                   t=0        t=1        t=2        t=3        t=4        t=5        t=6        t=7        t=8    
+                                                                            9          10         11         12
+      t=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3
+                                                                                                                  
+        0 1 2 3    W . . .    0 W . .    0 1 W .    . 1 2 W    W . 2 3    0 W . 3    0 1 W .    . 1 2 W    W . 2 3
+        3 0 1 2    . . . W    W . . 3    0 W . 3    . 1 W 3    . . 2 W    W . . 3    0 W . .    . 1 W .    . . 2 W
+        2 3 0 1    . . W .    . . 2 W    W . 2 3    . W 2 3    . . W 3    . . . W    W . . .    . W . .    . . W .
+                                         R            R            R            R    R            R            R      
+                                         
+    If a packet got lost, then the alignment will fail and needs to be restarted.
+  - Align_sync found in same slot on all active inputs, this slot index sets the Rd pointer
+    . Align_sync period:
+      The maximum latency between two inputs is g_bsn_latency. The minimum time between the last align_sync of the
+      previous align_sync interval and the first align_sync in this align_sync interval is align_sync period -
+      g_bsn_latency. Hence if the align_sync period - g_bsn_latency > circular buffer size, then the align_sync in
+      the circular buffer all apply to the same BSN.
+      The period of the align_sync is preferably a power of two, such that the align_sync can easily be derived
+      from the BSN and such that the align_sync will always occur at the first slot of the circular buffer.
+      The 1 s sync interval could be used as align_sync, but in LOFAR2.0 the sync period is not a power of two and
+      differs by 1 per sync interval, so the sync appears at different slots. Furthermore a 1 s period is
+      relatively slow, using a dedicated and much shorter align_sync period allows fast initial alignment.
+    . Do we need align_sync?
+      - The advantage of using an align_sync is that if the alignment fails in one period, e.g. due to a lost
+        packet, then it will automatically try again in the next interval. Schemes without an
+        align_sync require a restart, because they wait until the buffer has filled sufficiently and need to refill
+        to try again. 
+
+                           t=0        t=1        t=2        t=3        t=4        t=5        t=6    
+      t=0 1 2 3 4 5 6 7  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3  i=0 1 2 3
+                                                                                                                                                                                     
+        4 5 6 7 0 1 2 3    W . . .    4 W . .    4 5 W .    4 5 6 W    W 5 6 7    S W 6 7    S 5 W 7
+        3 4 5 6 7 0 1 2    . . . W    W . . 3    4 W . 3    4 5 W 3    4 5 6 W    W 5 6 7    S W 6 7
+        2 3 4 5 6 7 0 1    . . W .    . . 2 W    W . 2 3    4 W 2 3    4 5 W 3    4 5 6 W    W 5 6 7
+                                                                                             R
+      - Without align_sync the buffer would need to be twice as large to ensure unambiguous detection of alignment
+      - The align_sync is only used within the BSN aligner.
+      
+  - The input streams have an align_sync that has a period > 2 * g_bsn_latency and that is a power of 2
+    . using a power of 2 implies that the data block at the align_sync will be stored at slot 0.
+    . using an align_sync period > 2 * g_bsn_latency ensures that the sync applies to the same BSN for all inputs
+    . using align_sync << 1 s sync allows faster initial alignment
+    . using align_sync instead of 1 s sync allows using an interval that is a power of 2 so that the align_sync
+      always occurs in slot 0 of a circular buffer. The 1 s sync can occur in any slot, so then the BSN aligner
+      needs to detect which slot the sync occurred to set the initial Rd pointer.
+  - For the ring the latency depends on the number of hops. Therefore require that initial BSN alignment is achieved
+    with all active input as defined by M&C, to ensure that the total input latency at each node on the ring is
+    determined by the nominal operation.
+    
+  
+
+      
+- Input FIFO
+  . Blocks are stored in arrival order, therefore the FIFO must pass on the BSN index to be able to align the inputs
+    and to detect lost packets.
+  . The BSN index does not have to be incrementing, but it must be unique per BSN latency interval
+  . The FIFO must pass on the 1 s sync, to allow timestamp recovery from Station BSN.
+  . Flushing:
+    - flush per packet or flush until empty?
+    - flush per input or flush all inputs?
+    - flush by reading, or by reset or by moving a Rd pointer
+    - Use packet count instead of FIFO full indicator
+    - can we do without flushing the FIFO? Not if we need to realign.
+    - If multiple packets on a remote input get lost, then the other inputs fill up if there is no timeout. Flush
+      all inputs empty if one of them got filled up. Flush empty to avoid that at some moment all inputs may have
+      multiple packets pending in the FIFO, that will then be output in a burst. The pending packets that
+      corresponded to the lost packet will need to be discarded anyway, because there is no time to output them still.
+    - also useful to know BSNs at FIFO inputs? --> No, because FIFO packet count can be used to detect pending FIFO overflow.
+  . Keep FIFOs outside or inside BSN aligner component.
+    - the input of the FIFO is needed to be able to maintain a count of the number of packets in the FIFO, which is
+      relevant for the align timeout. The input eop increments the count and the output eop decrements the count.
+    - inputs with a large latency could use a smaller FIFO, this is easier to control with external FIFOs
+    - if the BSN aligner relies on FIFO input information, then it is better to have the FIFOs inside.
+   
+
+- Input circular buffer
+  . can handle data arriving out of order, but this is not needed within SDP
+  . The buffer memory size is g_bsn_latency * g_nof_inputs slots that can store a packet.
+    - the maximum latency between any two inputs must be < g_bsn_latency number of data blocks
+    - For each slot there is a Wr flag that needs to be maintained. The Wr flag can be set when the data block write
+      begins, because then the read could already start as well since Wr and Rd run at same rate.
+    - For each slot there is also a sync flag to pass on the 1 s sync
+  . Can handle out-of-order data, because it uses the BSN as an index. However on the ring in SDP all data will be in order.
+  . The circular buffer could be used as a FIFO with internal access and an incrementing Wr pointer. However it
+    seems better to use it with a Wr pointer that is derived from the BSN.
+  . The BSN must be continuous and incrementing, because then the remainder of the BSN divided by the buffer size can
+    be used as Wr pointer. 
+  . The buffer size is preferably a power of two, but can be any size (to save memory):
+    - Using a buffer size that is a power of 2 avoids an integer division of the BSN, because it can then use the
+      corresponding LSbits of the BSN as Wr pointer.
+    - Modulo 2**n - 1 can be calculated efficiently for binary numbers, by adding the n-bit digit parts. This is similar
+      to how modulo 9 (= 10 - 1), and hence modulo 3, can be calculated by adding the decimal digits.
+    - Modulo n for constant n can be calculated efficiently using multiplication by 1/n. The 1/n fraction must be
+      represented with sufficient accuracy to determine the remainder.
+  . The slots in the circular buffer have a Wr flag that is set when the slot is written with an Rx packet and cleared
+    when the slot is read for output.
+  . Flushing:
+    - Clearing a Wr flag or all Wr flags is much faster than flush reading a FIFO.
+  . The Rd pointer increments at every output block period.
+  . The Rd pointer increments after every output slot.
+  . The write pointer always needs to be ahead of the Rd pointer. The minimum distance between the Wr and Rd pointer
+    is g_bsn_latency. The size of the circular buffer is the same for all inputs and must be > g_bsn_latency (for wr)
+    + 1 (for rd). The circular buffer read can occur when the write pointer exceeds rd pointer + g_bsn_latency. 
+  . the circular buffer is part of the BSN aligner component
+  . On CEP the beamlet data is written into a circular buffer based on the time stamp. A flag indicates whether data in the
+    circular buffer is valid. The size of the circular buffer is in the order of hundreds of ms to cover the distance latency 
+    of the international stations. An array of tuples lists the length of continuous blocks in the circular buffer, and 
+    therefore also the gaps. A local timer determines when the circular buffer is read. The local timer has ms accuracy
+    compared to UTC, so the size of the circular buffer is dominated by the largest latencies. The channel filterbank in CEP
+    also flags the initial channel data that is disturbed after a gap.
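+
+  The two cheap ways to derive the Wr pointer from the BSN that are mentioned above can be sketched in Python.
+  This is an illustration with hypothetical function names, not the HDL. The multiplication by 1/n variant is
+  not shown, because its required accuracy depends on the BSN range.
+
```python
def wr_pointer_pow2(bsn, n):
    """Wr pointer for a buffer of 2**n slots: the n LSbits of the BSN."""
    return bsn & ((1 << n) - 1)


def mod_2n_minus_1(bsn, n):
    """bsn modulo 2**n - 1, by repeatedly adding the n-bit digit parts."""
    m = (1 << n) - 1
    while bsn > m:
        total = 0
        while bsn:
            total += bsn & m   # add the next n-bit digit
            bsn >>= n
        bsn = total            # the digit sum is congruent to bsn modulo m
    return 0 if bsn == m else bsn   # m itself is congruent to 0


# Self check against the integer remainder for buffers of 8 and 7 slots.
assert all(wr_pointer_pow2(b, 3) == b % 8 for b in range(1000))
assert all(mod_2n_minus_1(b, 3) == b % 7 for b in range(1000))
```
+
+  In logic the digit addition would be a small adder tree on the BSN bits instead of a loop.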
+
+
+
+
+
+. Circular buffer state machine
+    all:
+      Receive and monitor input
+      Derive align_sync and Wr pointer from input BSN
+      Write the input at the slot indexed by the Wr pointer and set the Wr flag for that slot.
+    s_xoff:
+      Accept static input enable/disable control
+      Clear all Wr flags of the slots to initially align or to realign the inputs.
+      Reset the Rd pointer at the first slot, because the align_sync is defined at slot 0
+      --> s_align
+    s_align:
+      If input control event --> s_xoff
+      If for all active inputs the Wr flag is set in slot 0 and slot 0 contains the align_sync then
+        restart a periodic slot pulse to set the pace for outputting the slots. An offset of the slot period
+        is used to ensure that in subsequent block periods all inputs will have a pending block --> s_sop
+    s_sop:
+      If input control event --> s_xoff
+      If slot pulse --> s_output
+    s_output:
+      If all Wr flags are unset (empty buffer) --> s_xoff
+      else output one block, clear Wr flag of slot and increment Rd pointer --> s_sop
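+
+    The state machine above can be explored with a behavioural Python model at block level. This is a sketch
+    under assumptions and not the HDL: the class and method names are hypothetical, the align_sync period is
+    assumed to be equal to the number of slots, slot_pulse() is assumed to be called once per block period
+    after the offset, and lost blocks are replaced by flagged filler. The model also stores the BSN per slot
+    so that it never outputs a stale block, whereas the description above only uses a Wr flag per slot.
+
```python
class CircularBufferAligner:
    """Block level model of the circular buffer BSN aligner."""

    def __init__(self, nof_inputs, nof_slots, filler=0):
        self.nof_inputs = nof_inputs
        self.nof_slots = nof_slots   # assumed power of 2 and equal to the align_sync period
        self.filler = filler
        self.xoff()

    def xoff(self):
        """s_xoff: clear all Wr flags and reset the Rd pointer to slot 0."""
        self.slots = [[None] * self.nof_slots for _ in range(self.nof_inputs)]
        self.rd = 0
        self.rd_bsn = None
        self.aligned = False

    def write(self, inp, bsn, block):
        """All states: write the block at the slot indexed by the BSN."""
        self.slots[inp][bsn % self.nof_slots] = (bsn, block)
        if not self.aligned:
            # s_align: all inputs have the same align_sync block pending in slot 0
            heads = [s[0] for s in self.slots]
            if all(h is not None for h in heads) and len({h[0] for h in heads}) == 1:
                self.aligned = True
                self.rd_bsn = heads[0][0]

    def slot_pulse(self):
        """s_sop --> s_output: output one slot as a list of (data, filler flag)."""
        if not self.aligned:
            return None
        if all(e is None for s in self.slots for e in s):
            self.xoff()              # empty buffer, so realign
            return None
        out = []
        for s in self.slots:
            entry = s[self.rd]
            if entry is not None and entry[0] == self.rd_bsn:
                out.append((entry[1], False))
            else:
                out.append((self.filler, True))   # lost block
            s[self.rd] = None        # clear the Wr flag of this slot
        self.rd = (self.rd + 1) % self.nof_slots
        self.rd_bsn += 1
        return out
```
+
+    With two inputs where the second input arrives one block later, a lost block on the second input only
+    yields one flagged filler block in this model and the next slot is output normally.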
+      
+             
+    
+. BSN max/min scheme of dp_bsn_align.vhd core:
+  - State machine
+      WHEN s_xoff =>
+        accept input control
+        flush all inputs for g_xoff_timeout --> s_align
+      WHEN s_align =>
+        wait for sop on all inputs 
+        if all enabled inputs have sop:
+           if bsn max = bsn min --> s_data, Copy the BSN of one of the valid streams to all other streams
+           elseif bsn max <= bsn min + bsn_latency --> within range, flush one block from inputs that have bsn < bsn max
+           else bsn differ too much --> s_xoff
+        elsif input control event --> s_xoff
+        elsif g_sop_timeout --> s_xoff       # g_sop_timeout = bsn_latency * bsn period, so this is an align timeout
+      WHEN OTHERS =>  -- s_data
+        output one block
+        if at end of block:
+           if input control event --> s_xoff
+           else --> s_align
+  * the input with the most latency has only one packet in the (fill) FIFO
+  * more packets get lost, one input stops --> g_sop_timeout in s_align --> flush all inputs in s_xoff
+  * one packet gets lost, next input arrives within g_sop_timeout --> bsn in range, flush one block from all other inputs
+
+
+. sync aligner instead of BSN aligner
+  - Using the sosi.sync one packet lost causes whole interval lost, this is too much impact.
+  - Use as much BSN range as necessary. At the end of the range the limited range BSN will wrap. This will cause
+    g_bsn_latency out of 2**c_bsn_align_w possible limited BSN values to fail alignment initially, but for these
+    instants the alignment will be possible some BSN later. Using c_bsn_align_w = ceil_log2(g_bsn_latency) + 2
+    provides sufficient opportunity for BSN alignment at the first sop attempt and certainly at the next.
+  - Instead an internal align_sync can be defined, e.g. with period 2**c_bsn_align_w and starting at sosi.sync
+    . Per input derive align_sync = (sosi.bsn(c_bsn_align_w-1:0)=0 or sosi.sync) and sosi.sop
+    . If a packet is lost during an align_sync interval, then the remaining packets in that sync interval are also
+      lost, but the output can recover at the next sync interval.
+    . At the end of each align_sync interval go via s_align, to avoid having to check for BSN wrap and to reconfirm
+      that all enabled inputs still have the same BSN, also at the sosi.sync
+    . The align_sync interval does not fit in the sosi.sync interval due to 195312.5. This can be coped with by
+      going via s_align first in case in s_sop the sop is there.
+      
+      align_bsn    0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0           correct input
+      align_sync   1               1               1
+      
+      align_bsn    0 1 2 - 4 5 6 7 0 1 2 3 4 5 6 7 0           lost packet at bsn = 3, detected by increment 2 /= 1 in s_sop
+      align_sync   1               1               1
+                   o o o x x x x x o o o o o o o o o           recover output in next align_sync interval
+      
+      align_bsn    0 1 2 3 4 5 6 - 0 1 2 3 4 5 6 7 0           lost packet at bsn = 7, detected by any align_sync in s_sop
+      align_sync   1               1               1
+                   o o o o o o o x o o o o o o o o o           recover output in next align_sync interval by all align_sync in s_align
+                   
+      align_bsn    0 1 2 3 4 5 6 7 - 1 2 3 4 5 6 7 0           lost packet at bsn = 0 and sync, detected by any align_sync in s_sop
+      align_sync   1               -               1
+                   o o o o o o o o x x x x x x x x o           recover output in next align_sync interval by all align_sync in s_align
+                   
+      align_bsn    0 1 2 3 - - - - - 1 2 3 4 5 6 7 0           lost packet at bsn = 4,5,6,7,0 detected by align timeout in s_sop
+      align_sync   1               1               1
+                   o o o o x x x x x x x x x x x x o           recover output in next align_sync interval
+
+    - Minimum number of blocks per sync interval.
+      . the align_sync interval has n = 2**c_bsn_align_w and must be > 2*g_bsn_latency
+      . n=1 block per sync interval:
+        If one packet gets lost then packets from different sync intervals get aligned, as indicated by 1'. This
+        is not detected because the bsn always remains 0 (so no increment) and if the align timeout is based on
+        maximum number of packets in any input FIFO, then the maximum = 1. The number of packets in the FIFO then
+        only becomes > 1 if two or more packets (so two sync intervals) get lost. A solution would be to
+        instead use an align timeout in number of clock cycles, so as a true timeout.
+        
+        align_bsn    0               0               0           correct input
+        align_sync   1               1'              1
+                     o               o               o
+                     
+        align_bsn    0               -               0
+        align_sync   1               -               1'          packet count <= 1: misaligned output at 1' align_sync
+                     1               1               1           packet timeout: recover output at next align_sync
+                     o               x               o           lost packet at bsn 0
+                                     <---> true timeout
+                   
+      . n=2 blocks per sync interval:
+        If one packet gets lost then the next packet will have the same BSN, so no increment. The aligner will
+        then recover at the next align_sync. If two or more packets get lost, then the increment can be 0 or 1,
+        but in any case the other input FIFO will fill with more than 2 packets, so then the align timeout in
+        s_sop will occur. The align timeout is based on > n/2 = 1 packets in the FIFO.
+      
+        align_bsn    0       1       0       1       0           correct input
+        align_sync   1               1               1
+                     o       o       o       o       o
+      
+        align_bsn    0       -       0       1       0           lost packet at bsn 1, detected by increment 0 /= 1 in s_sop
+        align_sync   1               1               1           recover output at next align_sync interval
+                     o       x       o       o       o
+
+        align_bsn    0       1       -       1       0           lost packet at bsn 0 and sync, detected by any align_sync or align timeout in s_sop
+        align_sync   1               -               1           recover output at next align_sync interval
+                     o       o       x       x       o
+
+        align_bsn    0       -       -       1       0           lost packet at bsn 1,0 and sync, detected by any align_sync or align timeout in s_sop
+        align_sync   1               -               1           recover output at next align_sync interval
+                     o       x       x       o       o
+      
+      . n=3 blocks per sync interval:
+        If one packet gets lost then the other input will get 2 packets, which is more than n/2 = 1. The aligner will
+        then recover at the next align_sync.
+      
+        align_bsn    0    1    2     0    1    2     0           correct input
+        align_sync   1               1               1
+                     o    o    o     o    o    o     o
+                                                 
+        align_bsn    0    -    2     0    1    2     0           lost packet at bsn 1, detected by increment 0 /= 1 in s_sop
+        align_sync   1               1               1           recover output at next align_sync interval
+                     o    x    x     o    o    o     o
+                                                 
+        align_bsn    0    1    -     0    1    2     0           lost packet at bsn 2, detected by increment 0 /= 1 or by any align_sync in s_sop
+        align_sync   1               1               1           recover output at next align_sync interval
+                     o    o    x     o    o    o     o
+                     
+        align_bsn    0    1    2     -    1    2     0           lost packet at bsn 0 and sync, detected by any align_sync or align timeout in s_sop
+        align_sync   1               -               1           recover output at next align_sync interval
+                     o    o    o     x    x    x     o
+                   
+        align_bsn    0    1    2     0    1          0
+        align_sync   1               1               1           recover output at next align_sync interval
+                     o    o    o     x    o    o     o           lost packet at bsn 2
+      
+    . Lost packets are detected by:
+      - idle input                : check align timeout in s_align and in s_sop 
+      - active inputs at bsn 1-max: check bsn increment /= 1 per input in s_sop
+      - active inputs at bsn 0    : check any align_sync in s_sop and then all align_sync in s_align
+      
+      The idle input is detected by the timeout. The active inputs are checked by the bsn increment, but at the
+      align_sync the bsn wraps to 0, so there the active inputs are checked by requiring align_sync on all inputs.
+      The initial alignment was achieved starting with empty input FIFOs, so the first align_sync and all
+      subsequent align_sync ensure that all enabled inputs have the same BSN.
+      
+      define:
+      . align timeout > g_bsn_latency, to ensure that the maximum latency difference between inputs in number
+        of packets can still be aligned
+      . align_sync interval must be > g_bsn_latency, to ensure that align_sync corresponds to the same BSN on all inputs
+      . choose align timeout > g_bsn_latency and choose align_sync interval n = 2**c_bsn_align_w bsn slots, where
+        c_bsn_align_w = ceil_log2(align timeout).
+        However n > g_bsn_latency is sufficient, so n does not have to be a power of 2, but it is convenient to
+        use a power of 2.
+      . input FIFO size > align timeout, to fit align timeout number of packets
+      . each input may have different g_bsn_latency, so then each input also has different align timeout, align_sync
+        interval and input FIFO size.
+      
+    . State machine using input FIFOs
+      To initially align or to realign, the input FIFOs are read empty in s_xoff. In s_xoff it is also possible
+      to change the static input enable/disable control.
+      Then in s_align the sync aligner waits for the align_sync on all enabled inputs or an align timeout. The
+      align_sync period can be much less than the sync period to ensure quick realignment. The align_sync period
+      and align timeout must be larger than the maximum possible BSN latency difference between any two inputs.
+      In this way the all align_sync condition in s_align ensures that all inputs then have the same BSN,
+      without explicitly having to check that they all have the same BSN. There is simply not enough memory in
+      the system that could cause two inputs to have align_sync within the BSN latency that do not correspond
+      to the same instant. If the align timeout occurs then restart acquiring alignment on the next align_sync
+      in s_xoff.
+      The alignment of the input packets at the align_sync is always checked in s_align. If all inputs still have
+      align_sync active then the packets can be output. Otherwise, if not all packets have align_sync, then
+      after some time the align timeout will occur.
+      The alignment of the input packets after the align_sync is checked in s_sop. If s_sop detects an align_sync 
+      on any input then it diverts the further control to s_align. By checking any align_sync both the normal case,
+      where all inputs have an align_sync, and the lost packet case where only some inputs have an align_sync are
+      covered. Otherwise if all enabled input BSN incremented by one, then this means that all inputs still have
+      the same BSN, because they had so at the previous align_sync. If an input has a wrong BSN increment, then 
+      try to recover at the next align_sync in s_align. If an input has no data then the align timeout occurs 
+      and try to recover via s_xoff.
+      The actual output of the aligned packets is done per packet in s_output. The BSN of any of the enabled inputs 
+      is used as output BSN for all outputs. At the end of the packet the next packet is waited for in s_sop, or
+      if an M&C event for the static input enable/disable control occurred then this is handled in s_xoff.
+    
+      s_xoff
+          accept static and dynamic input enable/disable control
+          flush all inputs until they are empty --> s_align
+              # flush by reading the FIFOs empty and by dropping new input
+      s_align
+          wait for align_sync on all enabled inputs 
+          if all enabled inputs have align_sync --> s_output
+              # the align_sync guarantees that all inputs have the same BSN, so
+              # use the BSN of one of the enabled input streams for all output streams
+          elsif align timeout --> s_xoff
+              # the align times out when one or more enabled input FIFOs contain more than g_bsn_latency packets,
+              # the input FIFOs are too full to recover via s_align, therefore recover via s_xoff
+      s_sop
+          wait for sop on all enabled inputs 
+          if any enabled input has an align_sync --> s_align
+              # Always at align_sync do output via s_align, this also covers the sosi.sync that can occur out of order
+              # check for any input having an align_sync, to detect a lost packet at the end of the align_sync interval.
+              # if all enabled inputs have align_sync then s_align will continue output, else s_align will try
+              # to recover at next align_sync.
+          elsif all enabled inputs have sop:
+              # Within align_sync interval do output via s_sop
+              if no input lost a packet, so all enabled input BSN incremented by 1 --> s_output
+                  # all BSN are still the same because for all enabled inputs the BSN incremented by 1, so
+                  # use the BSN of one of the enabled input streams for all output streams
+              else --> s_align
+                  # out of order align_bsn so there was a lost packet, try to recover at next align_sync
+          elsif align timeout --> s_xoff
+              # the align times out when one or more enabled input FIFOs contain more than g_bsn_latency packets,
+              # the input FIFOs are too full to recover via s_align, therefore recover via s_xoff
+      s_output
+          output one block
+          if at end of block:
+              if input control event --> s_xoff
+              else --> s_sop
+      
+
+. Comparison of align_sync scheme and BSN max/min scheme:
+  - Both schemes are limited by the fact that the align_bsn wraps if c_bsn_align_w < 32 b
+  - Deriving BSN max and min for g_nof_input inputs each with c_bsn_align_w bits per BSN requires much logic in case
+    of many inputs, and thus pipelining to achieve timing closure. Comparison of BSN increment per signal input and
+    combining these results between inputs takes less logic.
+  - If one remote packet gets lost, then output for all inputs is lost, because there is no time to output them still.
+    . with BSN max/min scheme the output may recover at the next packet (if the loss did not occur near the wrap of
+      the align BSN), because the other inputs will flush one packet.
+    . with align_sync scheme the output will recover at the next align_sync, and thus miss the remaining packets in that
+      align_sync interval.
+  
+
+Design decisions:
+
+. Probably either circular buffer memory or FIFOs are suitable. For circular buffer the BSN fraction is used as slot
+  index and for the FIFO the BSN index needs to be passed on through the FIFO to compare pending inputs.
+. Support number of inputs >= 2
+. Treat all inputs equal, so no special role for a local input
+  - suits more general usage
+. Use local reference to drive the output block rate:
+  - adds somewhat more latency than using remote input to drive the output, but is necessary to avoid extra loss
+    in case of lost packets and to support filler output
+. Support flow control
+  - to smoothen bursts (only an issue with remote drive output)
+  - to provide output throttling (requires output FIFOs or data blocks that have sufficient gaps)
+. Use sosi.sync and sosi.bsn(c_bsn_align_w-1:0) to align BSN
+  - using c_bsn_align_w much smaller than 32 b saves logic and thus eases timing closure
+  - if all enabled input BSN are equal then output
+. Use the align_sync scheme
+  - enabled inputs will only be output if they all contain valid data, so packet loss on
+    a single input will also briefly stop output for all inputs
+  - disabled inputs are output with zero or flagged data
+. Optionally support local input reference that can be used to drive the output BSN, instead of having
+  to use the BSN from the enabled inputs.
+. Optionally support dynamic input enable/disable based on expected number of packets per sync interval to avoid that one 
+  failing remote input causes all outputs to stop. This would be needed for APERTIF correlator input.
+. No need for artificial local block size (like in dp_bsn_align.vhd), because thanks to the CRC checking only
+  correct packets (content and size) can enter the BSN aligner. Therefore any active input can drive the output.
+. Support static input enable/disable via M&C
+. Support dynamic input enable/disable based on whether the input had lost packets in the previous one or more sync intervals.
+  - Maintain packet count per input per sync interval
+. Flagging:
+  - Static disabled inputs carry zero data
+  - Dynamically disabled inputs carry flagged data, using most negative real as flag and imag = 0.
+
+
+*******************************************************************************
+* Rx input status:
+*******************************************************************************
+
+* Existing components:
+  - RSP rad_frame_status of the previous PPS sync interval: 
+    . rx_cnt:   18 bits, number Rx frames
+    . brc   :    1 bit,  0 if no Rx frames with CRC error, 1 if >= 1 Rx frames had a CRC error
+    . sync  :    1 bit,  1 if the frame with Rx sync was detected, else 0
+    . align :    1 bit,  1 if all frames aligned OK, else 0
+  
+  - RSP rad_latency:
+    . rx_latency : 16 bit, stores an internal count value when the Rx sync is detected. The internal count
+                           restarts at the PPS sync. This measures the latency in clock cycles.
+                           
+  - APERTIF dp_bsn_monitor
+    . mon_sync_timeout        = '1' when the Rx sync did not occur within 200M cycles since last Rx sync    ~= sync
+    . mon_ready_stable        = '1' when ready was always '1' during last Rx sync interval
+    . mon_xon_stable          = '1' when xon   was always '1' during last Rx sync interval
+    . mon_bsn_at_sync         = BSN at Rx sync
+    . mon_nof_sop             = number of sop during last Rx sync interval             = rx_cnt
+    . mon_nof_err             = number of err at eop during last Rx sync interval     ~= brc
+    . mon_nof_valid           = number of valid during last Rx sync interval
+    . mon_bsn_first           = BSN at first Rx sync     --> not useful
+    . mon_bsn_first_cycle_cnt = latency at first Rx sync --> should use every Rx sync like on RSP
+  
+    ==> Reuse dp_bsn_monitor with improvements:
+    . Monitor the packets per sync interval using Rx sync. This is more precise than using the PPS sync.
+      The Rx sync based values are only valid if mon_sync_timeout = 0.
+    . Remove mon_bsn_first and mon_bsn_first_cycle_cnt.
+    . Add mon_latency, use PPS sync like in RSP to measure the latency between PPS sync and Rx sync in
+      number of clock cycles.
+      
+
+  
+*******************************************************************************
+* Reorder 
+*******************************************************************************
+    . Page swap (needed for TB)
+    . Variable output size
+
+    
+*******************************************************************************
+* Fill FIFO
+*******************************************************************************
+  . based on packet in FIFO instead of number of data in FIFO
+  . no data left in FIFO when flushed via siso.xon
+
+
+*******************************************************************************
+* Xon_off
+*******************************************************************************
+  . sosi.xon/off provides flow control at block level
+
+Flow control is between end nodes
+Congestion control is within the network
+
+TCP uses a sliding window and is preferred over Ethernet flow control
+
+Ethernet flow control
+  . pause frames are part of IEEE 802.3x and halt all input
+  . pause time is a 16 bit integer in units of 512 bit times
+  . IEEE 802.1Qbb provides priority-based flow control using pause frames per class of service
+  . pause frames are only exchanged between two directly connected ports, so not through a switch
+  . not possible for a switch to pause fast 1G inputs that all send to a slow 100M output, because
+    then one output can pause all inputs
+  . more intended for a slow NIC
+
+
+Other:
+- Compare streaming axi4, avalon and DP
+- Synchronous global reset
+- Flush FIFO by resetting it
+- RL 0 development article and automatic pipelining tools
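The sizing rules listed under "define" in the sync aligner notes above (align timeout > g_bsn_latency, c_bsn_align_w = ceil_log2(align timeout), n = 2**c_bsn_align_w, input FIFO size > align timeout) can be sketched as a small calculation. This is a minimal Python sketch, not the firmware: the function names, the choice of the smallest integers that satisfy each inequality, and the ceil_log2 definition (smallest w with 2**w >= x) are assumptions made for illustration. The notes also mention a stricter bound of 2*g_bsn_latency for the align_sync interval in one place; the sketch follows the "define" list.

```python
def ceil_log2(x: int) -> int:
    """Smallest w such that 2**w >= x (assumed definition)."""
    if x < 1:
        raise ValueError("x must be >= 1")
    return (x - 1).bit_length()


def derive_align_params(g_bsn_latency: int) -> dict:
    """Derive per-input sync aligner sizes, all in number of packets.

    Hypothetical helper: picks the smallest integer that satisfies each
    'greater than' rule from the notes.
    """
    if g_bsn_latency < 1:
        raise ValueError("g_bsn_latency must be >= 1")
    align_timeout = g_bsn_latency + 1         # smallest value > g_bsn_latency
    c_bsn_align_w = ceil_log2(align_timeout)  # number of align BSN bits
    align_sync_interval = 2 ** c_bsn_align_w  # n, in BSN slots
    fifo_size = align_timeout + 1             # smallest value > align timeout
    return {
        "align_timeout": align_timeout,
        "c_bsn_align_w": c_bsn_align_w,
        "align_sync_interval": align_sync_interval,
        "fifo_size": fifo_size,
    }


if __name__ == "__main__":
    # Each input may have its own g_bsn_latency and thus its own sizes.
    for latency in (1, 3, 4):
        print(latency, derive_align_params(latency))
```

Because n is rounded up to a power of 2, it is always at least the align timeout and therefore also larger than g_bsn_latency, which is the condition the notes require for the align_sync interval.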
diff --git a/applications/lofar2/doc/prestudy/station2_sdp_icd.txt b/applications/lofar2/doc/prestudy/station2_sdp_icd.txt
new file mode 100644
index 0000000000000000000000000000000000000000..541cd7b5907c7246f51099c3059f595f86c146db
--- /dev/null
+++ b/applications/lofar2/doc/prestudy/station2_sdp_icd.txt
@@ -0,0 +1,157 @@
+ICD interface types:
+    m - Mechanical (structural, loading, tooling, etc)
+    f - Fluid (pneumatic, cooling, heating, condensate, fuels, lubricants, waste, exhaust, feedstocks etc)
+    t - Thermal (cooling, heating, heatsinking, etc)
+    em - Electromagnetic (DC field, RF, etc)
+    o - Optical (numerical aperture, focal position, etc)
+    p - Electrical (i.e. conducted power)
+    e - Electronic (i.e. conducted signals or data)
+    eo - Electro-optical (generally signals or data)
+    d - Data exchange specifications (protocol stack)
+    h - Human-Machine Interface (special combination of some of the above) 
+
+
+UDP link control
+- flow control = end-to-end
+- congestion control = peer-to-peer within the network
+  . reliable transmission, at fair rate, with high resource utilization
+  . implemented in network layer
+  . also called transport protocol --> TCP ++, UDP -- (selfish protocol, low delay)
+- ARP
+  . Tx ARP request
+- UDP/IPv4
+  . UDP checksum (not used in LOFAR1)
+  
+> nslookup <hostname> # e.g. <astron.nl> to find IP address
+> sudo arp
+> ping <IP address>  # to find MAC address for IP address ?
+
+
+LFAA-CSP_Low : OSI (Open Systems Interconnection) layers
+
+7 Application  : Not applicable, this is the level where the STAT and CEP products each perform their
+                 allocated functions.
+6 Presentation :
+  - SPEAD header
+    header first word:
+      magic = 0x53 ='S' 8b, version = 0x4 8b, itemPointerWidth = 0x2 8b, HeapAddrWidth = 0x0 8b, rsvd=0 16b,
+      number of items = 0x8 16b
+
+    header items:
+      heap_counter     = coarse channel number (1-511) 16b, packet counter 32b # restart at 0 for new
+                         observation, 2k samples per packet --> packet counter wraps after few days
+      pkt_len          = packet payload length 48b
+      sync_time        = unix_epoch_time [s] 48b # last time system was synchronised by PPS in seconds since 1 Jan 1970
+      timestamp        = timestamp [ns] 48b      # time of center of first sample in packet since sync_time in
+                                                   ADC sample periods of 1.25 ns
+      center_freq      = frequency [Hz] 48b      # center frequency of coarse channel (1-511) * 781250 in Hz
+      csp_channel_info = rsvd 16b, beam_id 16b, freq_id 16b
+      csp_antenna_info = substation_id (1-512) 8b, subarray_id (1-16) 8b, station_id 16b, nof_contributing_antenna
+                         (typ. 256) 16b
+      sample_offset    = payload_offset = 0x0
+
+    data
+      - 1 beam, 1 coarse channel
+      - sampling period is 1.25 ns * 1024 * 27/32 = 1080 ns
+      - 8 bit complex coarse channel samples
+      - Xre, Xim, Yre, Yim = 32b
+      - samples are in strict time order
+      - 2's complement
+      - most negative value -128 indicates error
+          
+5 Session   : Controls connections (start, manage, terminate)
+  - SPEAD header 
+4 Transport : Flow control, error recovery, retransmission
+  - UDP [RFC 768]
+  - The peak data rate on a link shall be no more than 20% (TBC) above the average data rate
+3 Network   : addressing, routing
+  -  IPv4 Internet Protocol
+2 Data link : link between two nodes
+  - Ethernet standard [IEEE Std 802.3-2015], 40 GbE
+1 Physical  :
+  - Ethernet standard [IEEE Std 802.3-2015], 40 GbE
+      
+
+L1 ICD 11109 : STAT - CEP
+ . Beamlet data
+ . Transient buffer read out
+ 
+Not included:
+ . SST, BST, XST, because these are for monitoring and calibration, not for science data
+ . Subband offload for AARTFAAC2.0 will have own EICD
+
+STAT-CEP Beamlet data interface:
+
+- VERSION_ID 8b
+  . 2,3,4 for LOFAR1
+  . 5 first for LOFAR2.0
+  
+- SOURCE_INFO 16b
+  . 2b Array ID (core station 1 LBA, 2 HBA, ...)
+  . 1b f_adc = 200 MHz, 160 MHz
+  . 1b critically PFB, oversampled PFB (or p, q for R_os = p/q)
+  . 4b beamlet width in number of bits (default 8 for W_beamlet = 8 bit, instead of BM = beamlet mode)
+  . 5b UniBoard2 FPGA id (16 FPGAs for LBA, 16 for HBA in International Station, instead of RSP ID)
+  . ==> Also beamlet scale setting
+  . ==> Number of antenna in beam (core, LBA, HBA inner to make HBA international look like HBA remote)
+  
+- CONFIGURATION_ID 8b (used in LOFAR1? intended to refer to the parset that defines this observation)
+  ==> observation ID 32b
+
+- STATION_ID 16b (idem as LOFAR1)
+  ==> or 8b because there are only ~50 stations
+
+- One packet per range of Station beamlets out of 488 beamlets
+  . Full band : S_sub_bf * W_beamlet * N_complex / W_byte = 488 * 8b * 2 / 8b = 976 octets
+  . NOF_BEAMLETS_PER_BANK not needed anymore
+  . nof_streams = Number of beamlet streams
+    - Separate destination address per stream
+    - LOFAR1 supports 4 streams
+    - LOFAR2.0 preferably supports >> 4 streams
+      - beamlet_id to identify start beamlet in stream (provides more info than a stream ID)
+      - NOF_BEAMLETS_PER_BLOCK to identify range of beamlets from beamlet_id
+      - LOFAR1: beamlet_id = 0 and NOF_BEAMLETS_PER_BLOCK = 61 (dual pol beamlets, 4 streams):
+  
+- NOF_BLOCKS 16b in payload
+  . Multiple beamlet time slots in one packet to increase payload efficiency.
+  . For W_beamlet = 8 bit there can be maximum 9 blocks per payload (9 * 976 = 8784 octets < 9000)
+  . With nof_streams >> 4 the NOF_BLOCKS can become larger, therefore use 16b. For example:
+    - NOF_BEAMLETS_PER_BLOCK = S_sub_bf / nof_streams = 488 / 32 = 16
+    - NOF_BEAMLETS_PER_BLOCK * W_beamlet * N_complex / W_byte = 16 * 8b * 2 / 8b = 32 octets
+    - 9000 / 32 = 281 > 256 --> use 16b for NOF_BLOCKS
+    - nof_streams = 22 destination nodes, each with 8k Byte payload, possibly a double buffer:
+      22 * 8 kByte * 2 = 352 kByte = 176 BRAM (1 BRAM = 2 kByte, FPGA has 2713 BRAM)
+    - 488 / 22 = 22.18, so 488 = 4 * 23 + 18 * 22
+  . Only send correct data to CEP (so no need for SOURCE_INFO/payload error bit).
+  . How to handle blocks that got lost within the Station?
+
+- TIMESTAMP 64b (instead of 32b seconds TIMESTAMP and 32b BLOCK_SEQUENCE_NUMBER within second)
+  . A 64 bit timestamp in 0.2 ns resolution since t_base = 1970 for first block in payload:
+    - to fit both T_adc = 5 ns and 6.4 ns
+    - for 116 year span since t_base = 1970 --> 2086
+
+- BLOCK_PERIOD 16b
+  . 16 bit block period in 0.2 ns resolution
+  . 2**16 * 0.2 ns = 13.1 us block period (block rate > 76 kHz) fits T_sub
+  
+- BSN 64b
+  . Block sequence number since t_base = 1970 of first block in payload, increments by 1 for every block
+  . Used to detect lost blocks and to align blocks from different stations
+  
+
+- TX_PACKET_COUNT 32b
+  ==> Not useful, because then CEP needs to count Rx packets. Better send filler packets to keep the
+      packet rate at the nominal rate, so that any packet loss is due to the Network and already
+      clear at OSI layer 2 using lower level tools like Wireshark.
+  . OSI transport layer 4
+  . Per stream
+  . Started at Station power up, increments by 1 for every transmitted packet.
+  . To allow CEP to distinguish packets that got lost on the Network from data blocks that got lost
+    in the Station ring or packets that were not sent because the output was disabled.
+  . Only transmit packets that have continuous blocks / allow varying number of blocks per packet
+    in case a block is lost on the ring.
+
+- Data 
+  . X, Y paired dual polarization beamlets
+  
+ 
\ No newline at end of file
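The payload and timestamp arithmetic in the beamlet data interface notes above (NOF_BLOCKS, TIMESTAMP, BLOCK_PERIOD) can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, 9000 octets is treated as the maximum payload size, and a year is taken as 365.25 days.

```python
W_BYTE = 8                  # bits per octet
TIMESTAMP_UNITS_PER_NS = 5  # 0.2 ns resolution gives 5 units per ns


def block_octets(nof_beamlets: int, w_beamlet: int = 8, n_complex: int = 2) -> int:
    """Octets for one time slot of nof_beamlets complex beamlets."""
    return nof_beamlets * w_beamlet * n_complex // W_BYTE


def max_blocks_per_payload(octets_per_block: int, max_payload_octets: int = 9000) -> int:
    """Number of whole blocks that fit in one payload (assumed limit: 9000 octets)."""
    return max_payload_octets // octets_per_block


def timestamp_span_years(nof_bits: int = 64) -> float:
    """Time span covered by an nof_bits counter in 0.2 ns units (365.25 day years)."""
    seconds = 2 ** nof_bits / (TIMESTAMP_UNITS_PER_NS * 1e9)
    return seconds / (365.25 * 24 * 3600)


def max_block_period_us(nof_bits: int = 16) -> float:
    """Largest block period that fits in nof_bits at 0.2 ns resolution, in us."""
    return 2 ** nof_bits / TIMESTAMP_UNITS_PER_NS / 1000.0


if __name__ == "__main__":
    full_band = block_octets(488)                    # all 488 beamlets in one stream
    print(full_band, max_blocks_per_payload(full_band))
    per_stream = block_octets(16)                    # 16 beamlets per stream, 32 streams
    print(per_stream, max_blocks_per_payload(per_stream))
    print(timestamp_span_years(), max_block_period_us())
```

With 16 beamlets per block the block count exceeds 256, which is the reason the notes give for making NOF_BLOCKS a 16 bit field.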
diff --git a/applications/lofar2/doc/prestudy/station2_sdp_m_and_c.txt b/applications/lofar2/doc/prestudy/station2_sdp_m_and_c.txt
new file mode 100644
index 0000000000000000000000000000000000000000..eaf504b262098fe26e103f193416d3251b4bb7f3
--- /dev/null
+++ b/applications/lofar2/doc/prestudy/station2_sdp_m_and_c.txt
@@ -0,0 +1,179 @@
+*******************************************************************************
+* Station Control software:
+*******************************************************************************
+
+The Station contains hardware and software devices that deliver the functionality of the application
+[4.1.2.1]. The Station Control software consists of Control and M&C. The Control determines the behaviour
+of the devices in time. Via the M&C the Control can control the devices and monitor them. The M&C uses
+a standard software interface for the Control to access the devices. For the Station M&C the M&C will
+use OPC-UA as standard M&C access interface for all devices [4.1.2.2]. Only in certain cases there can
+be an exception to not use OPC-UA [4.1.2.3.2].
+
+The M&C system is an abstraction layer between the high level software of the Control and the low level
+software or firmware in the devices [4.1.2.3]. The M&C will use the master-slave pattern to monitor a
+device, so the device will only provide monitoring information on request and never by itself. In this
+way the control and monitoring traffic are independent. If the device performs a certain task, then it
+may provide a monitoring point that allows the master to monitor the progress. Only for low latency
+events that originate in the device it may be necessary to use the publish-subscribe pattern, whereby
+the slave self-generates an event message.
+
+
+*******************************************************************************
+* M&C of SDP firmware
+*******************************************************************************
+
+For the M&C of the SDP firmware that runs on the array of FPGAs on the UniBoard2s there will be an
+SDP converter/bridge that translates between the FPGA memory map and OPC-UA [4.1.2.3.1]. Using ARGS
+it may be possible to generate the device specific parts of the bridge software, because the number
+of FPGAs and all register fields in the FPGA memory map are known [4.1.2.5.1].
+
+  
+  
+*******************************************************************************
+* Monitoring interval
+*******************************************************************************
+In LOFAR1 the M&C that is supported by the FPGA firmware has two flavors:
+
+- Asynchronous (immediate)
+  . C: the data point values are applied upon arrival of the request message.
+  . M: the data point values are reported upon arrival of the request message.
+- Synchronous (fixed at the PPS grid):
+  . C: the data point values are applied in the next PPS period
+  . M: the data point values of the previous period are reported in this PPS period.
+
+The asynchronous M&C is suitable if the data point value is static or if its precise timing does not 
+have to be more accurate than what the M&C can achieve (order of 10 ms). The synchronous M&C is
+suitable for data point values that need sample period accurate timing within one FPGA or between
+FPGAs in parallel. The synchronous M&C can be for a single PPS instant or for every PPS instant.
+    
+
+- Use fixed internal sync aligned to PPS
+  . In LOFAR1 and APERTIF the sync period is used as fixed update interval for periodic monitoring,
+    periodic control (the beamformer weights) and periodic integration intervals (AST, SST, BST and 
+    XST).
+  . The advantage of a fixed update interval is that it is well defined and does not need control.
+    This can also be a disadvantage because a fixed interval is inflexible and cannot be controlled
+    by the SCU. Probably only for the XST this flexibility is nice to have.
+  . With a fixed interval the monitored information may only reflect what happened during the previous
+    period. Therefore if the monitoring has to be without gaps in time then the SCU needs to monitor
+    and aggregate the information at every period. Using a configurable period this aggregation in the 
+    SCU can be avoided.
+  . SCU must read the statistics in the second between two PPS (with some 10 ms margin). This is feasible
+    but a strict grid.
+  . If the SCU reads at arbitrary time, then part of the read values may apply to this second and some
+    to the previous second. For most monitoring this is no problem. If necessary the SCU can wait for
+    PPS and then read the monitoring to ensure that it relates to the same interval on all FPGAs.
+    
+- Use single event BSN timestamp scheduler
+  . Gemini M&C protocol does not have timestamp activated control yet, therefore use separate BSN scheduler
+    control point.
+  . SCU can read the statistics after the scheduled BSN
+  . The next integration lasts until the next scheduled BSN
+  . The programmable interval allows arbitrary integration intervals, which avoids the need for the
+    SCU to integrate 1 s intervals in case longer intervals are needed.
+  . The SCU can then scale the statistics result based on the actual integration period of each
+    measured interval, while the intervals are still all without gaps.
+  . Dependent on the speed of the SCU it can use shorter integration intervals, by scheduling the next
+    BSN as soon as it has finished reading the statistics from the previous interval
+  . The BSN scheduler should also provide a monitoring value for the integration interval, i.e. the
+    number of block periods since the previous scheduled BSN.
+  . If the schedule interval is too long then the statistics and monitoring counts may overflow.
+    The values should then clip and not wrap, to show that they overflowed.
+    
+- Use periodic event timestamp scheduler.
+  . Control: The period interval is defined by a start time and a period time. If the period time is -1
+    then the period scheduler acts as a single event scheduler.
+  . Monitor: The periodic scheduler can report the current time when read, the time at the last event, the time
+    at the next event (or -1 for no scheduled event) and the deltas cur - prev and next - cur.
+  . A periodic event only needs to be setup once by the SCU. The setup can be changed at any time.
+  . The BSN cannot be used directly, because the PPS grid does not always fit the BSN grid. Therefore use
+    the 64 bit timestamp with 0.2 ns resolution to schedule the start time and the period. The event will
+    occur at the BSN slot that is at or directly after the event time.
+  . Default after power up the start time of the timestamp scheduler starts at the PPS using the initial
+    BSN. The default period is 1 s, so 5000000000 [0.2 ns]. In this way the periodic scheduler behaves 
+    similar to the PPS driven sync interval in LOFAR1.
+  . Using the 64 bit timestamp with 0.2 ns is more clear than using a BSN scheduler with fractional BSN
+    period control
+  . For short integration intervals the SCU may not be able to keep up. It is more robust to allow a
+    short but not necessarily constant integration interval, which is known via the monitoring point.
+    Instead of the periodic scheduler the SCU then schedules a new event after it has finished reading
+    the monitoring data from the previous event.
+
+Behaviour of the data points:
+- Asynchronous:
+  . Only clear data points on control write access, so not as side effect of a monitor read access
+- Synchronous:
+  . Dual page data points swap or shift page at a synchronous event, to provide a precisely timed
+    and stable data value that can be written for control before the event or read for monitor after
+    the event.
+
+- Apertif MM registers
+  . Async :
+    - ETH control and status
+    - WDI
+    - UNB_SENS
+    - COMMON_PULSE_DELAY
+    - ADC_QUAD
+    - FIL_COEFS  
+    - SS_REORDER
+    - DIAGNOSTICS_BACK       counts clear after dedicated write access
+    - TR_NONBONDED_BACK
+    - DP_RAM_FROM_MM
+    - BF_WEIGHTS
+    - DP_PKT_MERGE
+    - DP_SPLIT
+    - DP_SWITCH
+    - DP_SYNC_CHECKER        side effect counts clear when read
+    - DP_BSN_ALIGN_INPUT           
+    - DP_FIFO_FILL                 
+    - DP_XONOFF_OUTPUT             
+    - DP_OFFLOAD_RX_HDR_DAT        
+    - DP_OFFLOAD_TX_HDR_DAT        
+    - DPMM_CTRL                    
+    - DPMM_DATA                    
+    - MMDP_CTRL                    
+    - MMDP_DATA                    
+    - IO_DDR          
+    - DP_XONOFF_OUTPUT 
+    - DP_OFFLOAD_TX        
+    - TR_XAUI         
+    - MDIO_0                         
+    - TR_10GBE        
+    - EPCS                         
+    - REMU                         
+    
+  . Async, restart immediately after last write
+    Sync, restart by external sync from PPS, BSN scheduler, PPS after write, or
+    - I2C master
+    - DIAG_WG
+    - DIAG_BG
+    - DP_SHIFTRAM
+    - BSN_SOURCE
+    
+  . Sync, generate single event at BSN
+    - BSN_SCHEDULER_WG
+    
+  . Sync, single page, periodic event latch value at every sosi.sync
+    - ADUH_MON (mean, sum)
+    - BSN_MONITOR
+    
+  . Sync, single page, periodic event store values at every sosi.sync, or
+    Async store data after last read
+    - ADUH_MON (buffer)
+    - DIAG_DATA_BUFFER
+              
+  . Sync, dual page monitor, periodic event latch sum values and restart integration at every sosi.sync
+    - ST_SST
+    
+  . Sync, dual page control, periodic event page swap at sync when last value was written (so only then swap)
+    - DP_FRINGE_STOP_OFFSET
+                          
+                               
+
+                                 
+Conclusion:
+- Identify cause of error preferably via a single monitoring point
+- With proper monitoring no test time is needed
+- Support writing status fields in a test mode for SW - FW interface testing
+- Use 1 s sync interval of PPS to time periodic M&C events for all. Optionally support a local BSN scheduler
+  for the XST.
diff --git a/applications/lofar2/doc/prestudy/station2_sdp_ring.txt b/applications/lofar2/doc/prestudy/station2_sdp_ring.txt
new file mode 100644
index 0000000000000000000000000000000000000000..b484c32f0ece90b0c77debf5c079ce623ce21ddb
--- /dev/null
+++ b/applications/lofar2/doc/prestudy/station2_sdp_ring.txt
@@ -0,0 +1,755 @@
+Detailed design: RING
+
+
+*******************************************************************************
+* Data format
+*******************************************************************************
+
+Support for oversampled subband filterbank
+The oversampling increases the processing rate and data rate by a factor R_os. Typical R_os values are
+32/28 = 1.143, 32/27 = 1.185, 32/26 = 1.231, 32/25 = 1.28, 32/24 = 1.333. Assume R_os <= 1.28.
+
+Assume the processing for critically sampled filterbank runs at 200 MHz and for oversampled subbands it will run at
+R_os * 200 MHz. For R_os = 1.28 this requires processing at >= 256 MHz. In this way if the processing fits for the
+critically sampled subbands, then it will also fit for the oversampled subbands.
+
+The IO data rate on the ring increases with the oversampling factor R_os.  For oversampled data the ring 10GbE has
+the full 10 Gbps capacity and for critically sampled data the effective ring capacity becomes 10G / R_os = 
+10G / 1.28 = 7.8125 Gbps. The aim is to be able to replace the critically sampled filterbank by an oversampled
+filterbank without having to change other parts in the design. Therefore assume that the ring capacity for the
+critically sampled data is restricted to 7.8125 Gbps. The alternative, to use the full ring capacity for critically
+sampled data and then support fewer (S_sub_bf / R_os = 488 / 1.28 ~= 381) beamlets for oversampled data, is not
+compliant with the requirement of S_sub_bf = 488.
+
+Design decision: Support S_sub_bf = 488 also for maximum R_os = 1.28.
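+
+Illustration (a sketch, not part of the design): the oversampling factors and the ring capacity that remains for
+critically sampled data can be checked with a few lines of Python. The constant names are chosen only for this
+example.

```python
# Sketch: oversampling factors R_os = 32/k and the ring capacity that remains
# for critically sampled data when the link is dimensioned for R_OS_MAX.
LINK_GBPS = 10.0      # 10GbE ring link capacity
R_OS_MAX = 32 / 25    # assumed maximum oversampling factor, 1.28
S_SUB_BF = 488        # required number of beamlets

r_os = {k: 32 / k for k in (28, 27, 26, 25, 24)}
effective_gbps = LINK_GBPS / R_OS_MAX
beamlets_alternative = S_SUB_BF / R_OS_MAX   # the rejected alternative

for k, r in r_os.items():
    print(f"32/{k} = {r:.3f}")
print(f"effective capacity = {effective_gbps:.4f} Gbps")
print(f"beamlets in rejected alternative = {beamlets_alternative:.2f}")
```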
+
+
+W_beamlet_sum
+LOFAR 1.0 had 24 bit for 16 bit beamlet mode and 12 bit for 8 bit beamlet mode. LOFAR 2.0 will only support 8 bit.
+Using W_beamlet_sum = 18 bit provides 5 bits more dynamic range for 8 bit beamlet mode, which is sufficient to
+detect overflow. Using W_beamlet_sum = 18 bit also fits the input data width of the FPGA hard core multipliers in
+the BST. Given that the signal input level is 4 bit the beamformer could round 2 LSbit to effectively achieve
+20 bit dynamic range, even for S = 1 signal input. However the same effect can also be achieved by reducing the
+beamlet weights by a factor 2**2 = 4. Choose the same W_beamlet_sum = 18 bit for both the critically sampled 
+beamlet data and the oversampled beamlet data, to avoid differences in the design. 
+The beamlet sum that is transported across the ring needs to fit on a 10GbE link. With S_sub_bf = 488 and R_os <=
+1.28 the data rate for one full band station beam is N_pol * S_sub_bf * f_sub * R_os * N_complex * W_beamlet_sum
+= 2 * 488 * 195312.5 * 1.28 * 2 * 18 = 8.784 Gbps. This leaves about 13.8 % margin for packet overhead, which is
+sufficient. Using W_beamlet_sum = 18 bit fits the input data width of the FPGA hard core multipliers and also
+provides sufficient dynamic range to scale the final beamlet sum to W_beamlet = 8 bit for output.
+
+Design decision: W_beamlet_sum = 18 bit for both critically sampled beamlets and oversampled beamlets
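+
+Illustration (a sketch using only the numbers given above): the data rate of the beamlet sums for one full band
+station beam and the margin that is left on a 10GbE link. The margin is taken relative to the data rate, which
+reproduces the 13.8 % quoted above.

```python
# Sketch: data rate of the intermediate beamlet sums on one 10GbE ring link.
N_POL = 2
S_SUB_BF = 488
F_SUB = 195312.5       # subband rate in Hz
R_OS = 1.28
N_COMPLEX = 2
W_BEAMLET_SUM = 18     # bits

rate_bps = N_POL * S_SUB_BF * F_SUB * R_OS * N_COMPLEX * W_BEAMLET_SUM
margin = (10e9 - rate_bps) / rate_bps   # margin relative to the data rate

print(f"{rate_bps / 1e9:.3f} Gbps, margin {100 * margin:.1f} %")
```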
+
+
+*******************************************************************************
+* Ring function
+*******************************************************************************
+
+Ring transceiver medium access (MAC):
+Use Ethernet per transceiver link. The Ethernet MAC provides link establishment, so it uses a full duplex transceiver. The
+Ethernet packet header contains destination MAC address, source MAC address and Ethernet type. The Ethernet packet tail
+contains a CRC. The CRC provides data error detection. No need to use UDP/IP and ARP, because the links in the ring are
+point to point and will not be used in a network. The Ethernet fields can be used as:
+ - Destination MAC = destination PN index
+ - Source MAC = source PN index
+ - Ethernet type = packet type
+Design decision: Use Ethernet for the ring transceiver links
+
+Ring application packet types:
+The ring is used for the following application packet types:
+
+- 0x10FB for beamlets,
+- 0x10FC for crosslets,
+- 0x10FD for subband offload,
+- 0x10FE for transient buffer read out
+
+The packet type information can be transported via the Ethernet type field or via a UDP port number. If each link
+is only used for one kind of packet type, then the packet type is only used for information, because the PN
+already knows the packet type. The packet type value is based on packet types that were defined in RSP, where
+0x10FA was used to identify M&C data (0x10FA ~= LOFAR) and the other type values just increment the 0x10FA value.
+Design decision: Transport application packet type via Ethernet type field for information
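+
+Illustration (a sketch under assumptions): how the three Ethernet header fields could carry the PN indices and
+the packet type. Placing the PN index in the least significant octet of the MAC address is a hypothetical
+mapping for this example only, the notes do not define the exact MAC layout.

```python
import struct

# Packet types from the list above
ETH_TYPE = {
    "beamlets": 0x10FB,
    "crosslets": 0x10FC,
    "subband_offload": 0x10FD,
    "transient_buffer": 0x10FE,
}

def pn_mac(pn_index: int) -> bytes:
    """Hypothetical mapping: PN index in the least significant MAC octet."""
    return pn_index.to_bytes(6, "big")

def eth_header(dst_pn: int, src_pn: int, packet_type: str) -> bytes:
    """Build the 14 octet ETH header: dst MAC (6), src MAC (6), type (2)."""
    return pn_mac(dst_pn) + pn_mac(src_pn) + struct.pack("!H", ETH_TYPE[packet_type])

def parse_eth_header(header: bytes):
    """Return (destination PN, source PN, Ethernet type)."""
    dst = int.from_bytes(header[0:6], "big")
    src = int.from_bytes(header[6:12], "big")
    (eth_type,) = struct.unpack("!H", header[12:14])
    return dst, src, eth_type

hdr = eth_header(dst_pn=15, src_pn=0, packet_type="beamlets")
print(hdr.hex(), parse_eth_header(hdr))
```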
+
+
+Use UDP/IP/ETH or only ETH on the ring:
+We already have a UDP offload component that supports UDP/IP/ETH, but a similar component that only supports ETH is
+easily derived from it. With UDP the LOFAR packet type information can be transported via the UDP port field.
+Using UDP/IP makes it easier to send the data to a PC for monitoring purposes, however it is also possible to sniff
+raw Ethernet packets on a PC. Using a PC to verify the ring allows capturing large amounts of data. On an FPGA we
+can use a data buffer to sniff the packets, but only a few.
+The extra overhead is 8 octets for UDP and 20 octets for IP, so 28 octets in total. The disadvantage of using
+UDP/IP is that it adds some extra traffic overhead and uses some extra logic resources, but that could be
+acceptable. The disadvantages of verifying the ring using a PC are:
+- between FPGAs on the same UniBoard the ring can only be observed on the FPGA
+- the ring will only connect FPGAs in the application, so using a PC is a side track that as such may cause extra
+  work.
+Using UDP/IP does not make it possible to replace the ring by a switch without modifications, so changing from a
+ring based design to a switch based design will still imply a redesign of the data transport scheme.
+Design decision: Use raw Ethernet and verification on FPGA, because that fits the ring (especially between FPGAs
+                 on UniBoard2) and avoids the extra overhead of UDP/IP.
+
+Ring application header:
+The packet payload needs to have an application header to carry the timestamp and a stream identifier. This
+information can be transported via the DP packet header which has a BSN field and a channel field. The BSN is the
+timestamp. The channel field can carry the source PN index and destination PN index. These PN indices are also
+available in the ETH source and destination MAC addresses of ETH encoded packets, but they also need to be
+available in ETH decoded packets. In ETH encoded packets the destination MAC address allows direct pass on of
+transit packets on the ring, without having to ETH decode them. In ETH decoded packets the BSN and channel fields
+can be passed along inside the encoded DP packet or in parallel with the decoded DP packet application data. The
+channel information can be used to process the remote packets in parallel e.g. per source PN index.
+
+
+
+What is the ETH packet overhead?
+The ETH packet overhead consists of:
+. Add  8 octets (c_network_eth_preamble_len) for Ethernet preamble
+. Add 14 octets for the ETH header that contains destination MAC (6), source MAC (6) and Ethernet type (2)
+. Add  2 octets to pad the ETH header to align to 8 byte word boundary
+. Add  4 octets for CRC
+. Add 12 octets (c_network_eth_gap_len) for Ethernet gap size between packets
+  = 8 + 14 + 2 + 4 + 12 = 40 octets
+
+
+How many transceivers are needed for the ring?
+There are four data types: beamlets, crosslets, subband offload and transient buffer read out. The data loads are:
+- 488 beamlets (R_os = 1 --> W_beamlet_sum = 24 bit, R_os = 1.25 --> W_beamlet_sum = 19.2 ~= 20 bit)
+- ~10 crosslets (R_os = 1 --> 15 crosslets, R_os = 1.25 --> 12 crosslets)
+- ~    subbands (R_os = 1
+- 
+
+Choose to transport one data type packet per 10GbE link direction. 
+
+The ring can be used in both directions. The forward direction is e.g. from PN0 to 15, the backward direction is e.g.
+from PN 15 to 0. The ring uses 4 of the 12 available transceivers, to match the QSFP cable link that is needed to connect
+the ring between UniBoard2.
+
+The ring function has the following sub functions:
+- Receive packets from ring (and remove CRC field)
+- Discard incorrect packets (based on CRC)
+- Pass on transit packets (Destination MAC > PN index for forward ring, MAC < PN index for backward ring)
+- Decode packets (get packet from ring for internal use)
+- Encode packets (put internal packet onto ring)
+- Multiplex local and transit packets
+- Transmit packets onto ring
+
+
+Use 10GbE or 40GbE:
+From the low-latency Ethernet core user guides it follows that the Ethernet core with statistics registers uses:
+ 10GbE core :  4300 FF,  4 M9K
+ 40GbE core : 21200 FF, 13 M20K
+The synthesis fitter results from Apertif BF and XC show that the tech_eth_10g takes about 5500 FF and 4 (BF) or 7 (XC) M9K.
+The BF MAC has no statistics, the XC MAC does have statistics.
+Hence the 40GbE core is about a factor 4 to 5 larger than the 10GbE core, so from a resource usage point of view it does not matter
+whether we use 4 x 10GbE or 1 x 40GbE. The advantage of 40GbE is that it can fit data rates > 10Gbps per data type stream. The
+advantage of using 10GbE is that we can use one link per data type stream and thereby avoid having to multiplex different data 
+streams onto the same 40GbE link. However some multiplexing of local packets and remote transit packets can also be needed. 
+UniBoard2 has been tested with 10GbE but not yet with 40GbE.
+The Arria10 on UniBoard2 has 1708800 FF so 1708800 / 182400 = 9.4 times more than the Stratix IV on UniBoard1. On UniBoard2 one
+10GbE interface uses maximum about 5500 / 1708800 = 0.32 % of the FF and maximum about 7 / 2713 = 0.26 % of the block RAM.
+In total there will be 4 x 10GbE for the intra board ring, 4 x 10GbE for the inter board ring and 1 x 10GbE for external IO, so
+these will take about 3% of the FF and block RAM resources.
+The packet rate is f_sub = 195312.5 Hz. At 10GbE this means that the maximum packet size is 10e9 / 195312.5 / 8 = 6400 octets. For
+oversampled subbands the maximum packet size drops to about 6400 / 1.25 = 5120 octets. If the minimum packet size is e.g. 4000
+octets, then at 10GbE this means that the link cannot be fully used, whereas at 40GbE multiple packets will still fit. The 
+maximum packet size for 10GbE also depends on the number of packets on the ring:
+. With one packet on the ring the maximum packet size for 10GbE is 5120 (R_os = 1.25) octets,
+. With N = 16 nodes and all nodes sending to the same end node the maximum packet size is 5120/16 = 320 (R_os = 1.25) octets,
+. If the packet only needs to travel N/2 nodes then the maximum packet size is 5120/8 = 640 (R_os = 1.25) octets.
+Design decision: Assume the ring will use 4 x 10GbE, because it is known technology and suitable.
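+
+Illustration (a sketch of the arithmetic above): the maximum packet size on a 10GbE link as a function of the
+oversampling factor and of the number of packets that have to share one subband period on the link. The function
+name and parameters are chosen for this example only.

```python
F_SUB = 195312.5    # packet rate in Hz
LINK_BPS = 10e9     # 10GbE link rate in bit/s

def max_packet_octets(r_os: float = 1.0, packets_per_period: int = 1) -> float:
    """Largest packet when packets_per_period packets share one subband period."""
    bits_per_period = LINK_BPS / (F_SUB * r_os)
    return bits_per_period / 8 / packets_per_period

print(max_packet_octets())            # one packet, critically sampled
print(max_packet_octets(1.25))        # one packet, R_os = 1.25
print(max_packet_octets(1.25, 16))    # all 16 nodes send to the same end node
print(max_packet_octets(1.25, 8))     # packets travel only N/2 nodes
```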
+
+
+Use one packet type per ring link.
+This avoids having to multiplex different packet types onto a single link. Still the Ethernet type can be used to fill
+in the packet type to more easily identify data on different links of the ring.
+
+Use application packets to monitor the link quality:
+This allows monitoring the link during normal operation and avoids the need to define and control a test packet (e.g.
+like ping).
+
+Wormhole routing or store-and-forward routing:
+With wormhole routing a received packet or a received and modified packet is already transmitted, while the tail of
+the packet is still being received. The advantage of wormhole routing is that it minimizes the latency along the ring
+and therefore also local buffering to align between local and remote data. The disadvantage of wormhole routing is
+that a CRC error on the received packet needs to be propagated by forcing the CRC of the transmitted packet to be 
+wrong. This implies that all subsequent hops will show this CRC error. For link diagnosis this is confusing, because
+the subsequent links did not cause the CRC error. With store-and-forward routing a packet is first received entirely
+before it is passed on for transmit. This allows discarding a received packet with a CRC error, but does increase the
+latency on the ring. For LOFAR 2.0 choose to use store-and-forward, because it allows discarding packets with CRC
+errors when they occur and because there is sufficient internal block RAM to buffer the local data for the worst case
+ring latency.
+
+Ring latency:
+The latency of 1 hop is about 0.2 us. The time to transmit one Ethernet frame of 1500 octets at 10Gbps is about 1.2 us
+and a jumbo frame of 6400 octets takes about 5.12 us (= T_sub). Hence for packets >~ 300 octets the ring latency is
+dominated by the store and forward routing at each node. The 10GbE Ethernet MAC uses 64 bit data. At 200 MHz this can
+achieve 64 * 0.2 = 12.8 Gbps. Hence if the processing operates without data valid gaps, then the Ethernet transmit
+will not run empty during a payload. Therefore it is not necessary to use a fill FIFO, which would add to the ring 
+latency. For a packet that travels the entire ring the latency is then about (N-1) * T_sub and the corresponding 
+FIFO depth to align the local data with this remote data is (N-1) * packet size.
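+
+Illustration (a sketch of the latency estimate above): the transmit time of a packet at 10 Gbps and the resulting
+store-and-forward latency over N-1 hops. Adding the 0.2 us hop latency on top of the transmit time per hop is an
+assumption of this example, the text approximates the total by (N-1) * T_sub.

```python
LINK_BPS = 10e9
T_HOP_US = 0.2    # approximate latency of one hop, from the text

def tx_time_us(octets: int) -> float:
    """Time to transmit a packet of the given size at 10 Gbps."""
    return octets * 8 / LINK_BPS * 1e6

def ring_latency_us(n_nodes: int, packet_octets: int) -> float:
    """Store-and-forward latency for a packet that travels n_nodes - 1 hops."""
    return (n_nodes - 1) * (tx_time_us(packet_octets) + T_HOP_US)

# Packet size for which the transmit time equals the hop latency
crossover_octets = T_HOP_US * 1e-6 * LINK_BPS / 8

print(tx_time_us(1500), tx_time_us(6400))
print(crossover_octets)
print(ring_latency_us(16, 6400))
```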
+
+
+Only accept correct packets:
+Discard all packets that have a CRC error. This also prevents packets of wrong length from entering the internal
+processing. The Ethernet CRC is 32 bit, so it is very unlikely that a packet with errors still has a
+correct CRC. With wormhole routing it was necessary to limit or extend a packet to a known fixed length, because
+also packets with CRC error are passed on. With store-and-forward routing the CRC provides sufficient protection
+to ensure that only correct packets enter the application.
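+
+Illustration (a software sketch, not the firmware implementation): store-and-forward reception that discards a
+packet when the CRC does not match. Python's zlib.crc32 uses the same CRC-32 polynomial as Ethernet; the helper
+names add_fcs and receive are hypothetical.

```python
import zlib

def add_fcs(frame: bytes) -> bytes:
    """Append the 32 bit CRC to a frame, least significant octet first."""
    return frame + zlib.crc32(frame).to_bytes(4, "little")

def receive(frame_with_fcs: bytes):
    """Return the frame without CRC, or None if the CRC check fails."""
    frame, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    if zlib.crc32(frame).to_bytes(4, "little") != fcs:
        return None          # discard, the packet never enters the application
    return frame

good = add_fcs(b"beamlet payload")
bad = bytearray(good)
bad[3] ^= 0x01               # single bit error on the link

print(receive(good))
print(receive(bytes(bad)))
```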
+
+Ring data transport schemes:
+  - beamlets on ring: l --> r+l --> r+l --> ... --> r+l
+    . on each node align two inputs: l,r
+    . output filler data if remote got lost, to preserve nominal output rate to CEP
+    
+  - crosslets on ring:  rrrrrrrr,l --> rrrrrrrr,l --> ... --> rrrrrrrr,l
+    . on each node separately align N/2 pairs of inputs l,r, have one pair per XC cell
+    or
+    . on each node first align all inputs l,N/2*r, and then split into N/2 pairs of l,r to have one pair per XC cell
+    . discard output data if remote got lost, to count number of active blocks per integration sync interval
+      or
+      output filler data if remote got lost, and use zero to not disturb the integration and count unflagged blocks
+      to know the number of active blocks per integration sync interval
+    
+  - subbands on ring: l, rl, rrl, rrrl, ..., rrrrrrrrrrrrrrrl
+    . on final node align all l,(N-1)*r inputs
+    . output filler data if remote got lost, to preserve nominal output rate to AARTFAAC
+    
+  - transient buffer readout: l, r, r, ..., r
+    . no align, readout from one node at a time
+
+
+Ring access schemes:
+
+- 1) start node sends packet to end node, intermediate nodes modify the packet.
+- 2a) each node starts sending its packets to an end node, intermediate nodes pass on the packet
+- 2b) each node starts sending its packets to an end node, intermediate nodes pass on the packet and use the packet (= multi cast)
+
+If both scheme 1 and 2 are suitable then scheme 1 typically yields a larger payload, because it reserves slots for all
+nodes, whereas the payload for scheme 2 only contains data from one node. Scheme 1 and 2b are useful if the transit nodes
+also use or modify the packet data. Scheme 2a is suitable for packet transport from start to end node, whereby transit
+nodes only pass on the packet.
+
+For the beam former beamlets scheme 1 is most suitable. The start node prepares the packet with the initial beamlet sums.
+The subsequent nodes add their local beamlet sum to the packet beamlet sums and then pass on the packet.
+
+For the subband correlator both scheme 1 and scheme 2b are suitable. For scheme 1 the start node creates a packet with
+slots for all nodes and fills in its own slot with its crosslets. Scheme 1 was used in LOFAR 1.0. The subsequent nodes fill in
+their slots with their crosslets and also use the packets to correlate the remote crosslets with their local crosslets.
+With scheme 2b each node creates a packet with its own crosslets and sends it N/2 nodes further. The intermediate nodes
+pass on the packets and use the packets to correlate the remote crosslets with their local crosslets.
+
+For the subband offload both scheme 1 and scheme 2a are suitable. For scheme 1 the start node creates a packet with slots for all
+nodes and fills in its own slot with its subbands. The subsequent nodes fill in their slots with their subbands. With scheme 2a
+each node creates a packet with its own subbands and sends it to the output end node. The other nodes only pass on the remote packets.
+
+For transient buffer read out scheme 2a is most suitable to gather the read out data from each node at the output end node.
+
+
+Ring access directions:
+All schemes can be used in two directions for the same type of data transport. In one direction the maximum number
+of hops between start and end node is N-1, while by using both directions the maximum number of hops between start
+and end node is N/2. If the data is used on all intermediate nodes, then there is no advantage to use the ring in
+both directions. If the data is only passed along by intermediate nodes, then the link capacity is used
+about a factor two more efficiently by sending data in both directions. Disadvantages of using the ring in both
+directions for the same type of data are that each node needs to decide which direction to use, that the data
+arrives from both directions at the end node, and that it is somewhat more difficult to understand and diagnose. 
+Design decision: Use the ring in only one direction per link.
+
+Use one link per packet type:
+For scheme 2 use only one link for all source nodes, so do not let different source nodes use different links. For
+N/2 = 8 or N = 16 the number of links would become too large. By using one link, increasing the processing becomes
+a matter of using and instantiating more links.
+
+
+Remote and local data alignment:
+In APERTIF the data arrived from >= 2 remote streams. With the LOFAR ring there is always local data that arrives
+first and needs to be aligned with only one remote data stream. The local data needs to be buffered until the remote
+data from the farthest PN has arrived. The latency on the ring is about 1 packet per transit hop, due to the store
+and forward. The first hop has negligible latency. Hence with H hops the local data buffer size needs to be (H-1) *
+local data size. When the remote data arrives the local data is popped from the buffer. If the remote data has not
+arrived in time, then the local data is popped from the buffer when the next local data is pushed into the buffer.
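+
+Illustration (a behavioural sketch, not the dp_bsn_align implementation): a bounded local buffer that is aligned
+with one remote stream on BSN. The class and method names are hypothetical. A local block is popped without
+remote data when the buffer is full and the next local block is pushed, or when a later remote BSN shows that the
+remote packet was lost.

```python
from collections import deque

class LocalAlignBuffer:
    """Sketch of local/remote alignment on BSN with a bounded local buffer."""

    def __init__(self, depth: int):
        self.depth = depth
        self.fifo = deque()    # (bsn, data) of pending local blocks
        self.dropped = []      # BSNs of local blocks popped without remote data

    def push_local(self, bsn, data):
        # Remote data did not arrive in time: pop the oldest local block
        # when the next local block is pushed.
        if len(self.fifo) == self.depth:
            self.dropped.append(self.fifo.popleft()[0])
        self.fifo.append((bsn, data))

    def pop_for_remote(self, remote_bsn):
        """Return the local data with the same BSN as the remote packet, or None."""
        while self.fifo and self.fifo[0][0] < remote_bsn:
            self.dropped.append(self.fifo.popleft()[0])   # remote packet was lost
        if self.fifo and self.fifo[0][0] == remote_bsn:
            return self.fifo.popleft()[1]
        return None

buf = LocalAlignBuffer(depth=3)
for bsn in range(4):
    buf.push_local(bsn, f"L{bsn}")    # BSN 0 is dropped, buffer holds 1, 2, 3
matched = buf.pop_for_remote(2)       # the remote packet for BSN 1 was lost
print(matched, buf.dropped, list(buf.fifo))
```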
+
+
+*******************************************************************************
+* Beamformer
+*******************************************************************************
+
+What is the beamlet packet size?
+The beamlet sum is passed on along the ring from start PN to end PN using ring access scheme 1. At the end PN the
+final beamlet sum is scaled to W_beamlet = 8 bit and output to CEP. The intermediate beamlet sum has W_beamlet_sum =
+18 bit and is complex. There are N_pol * S_sub_bf = 2 * 488 = 976 beamlets per packet. The payload size is
+N_pol * S_sub_bf * N_complex * W_beamlet_sum / W_byte = 2 * 488 * 2 * 18 / 8 = 4392 octets. The effective packet
+size is 40 + 4392 = 4432 octets. With f_sub = 195312.5 Hz and R_os = 1.28 the data rate is 4432 * 195312.5 * 1.28
+* 8 = 8.864 Gbps, which fits on a 10GbE link.
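+
+Illustration (a sketch of the calculation above): beamlet payload size, effective packet size including the 40
+octets ETH overhead, and the resulting data rate on the ring link.

```python
# ETH overhead: preamble, header, header padding, CRC and inter packet gap
ETH_OVERHEAD = 8 + 14 + 2 + 4 + 12    # = 40 octets
N_POL, S_SUB_BF, N_COMPLEX, W_BEAMLET_SUM = 2, 488, 2, 18
F_SUB, R_OS = 195312.5, 1.28

payload_octets = N_POL * S_SUB_BF * N_COMPLEX * W_BEAMLET_SUM // 8
packet_octets = ETH_OVERHEAD + payload_octets
rate_gbps = packet_octets * 8 * F_SUB * R_OS / 1e9

print(payload_octets, packet_octets, f"{rate_gbps:.3f} Gbps")
```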
+
+Packet decoding and encoding:
+The start node encodes the packet and the end node decodes the packet. The intermediate nodes could operate on
+the encoded packet, however the payload beamlets are packed into bytes and are not word aligned. Therefore the
+intermediate nodes also need to decode the packet to be able to update the payload data, and then encode the 
+packet. The decode and encode function is available in any node, because all nodes run the same firmware image.
+Therefore the decoding and encoding at intermediate nodes can reuse the encoding function of the start node and
+the decode function of the end node, so no extra logic is needed.
+
+Ring adder payload processing:
+The station beam is a dual polarization beam and each beam has S_sub_bf = 488 beamlets, so in total there are 
+976 complex beamlets per subband period of N_fft = 1024 cycles @ 200 MHz. For an oversampled filterbank with
+R_os = 4/3 there are N_fft / R_os = 768 cycles @ 200 * R_os MHz. Hence to be compatible with an oversampled
+filter bank the beamformer cannot process all 976 beamlets in series, instead it has to apply ceil(R_os) = 2
+streams in parallel that each process 488 beamlets. Therefore to support the oversampled beamlets the payload
+needs to be encoded from and decoded to two streams of beamlets:
+
+  0 : 0 2 4 ............. 974
+  1 : 1 3 5 ............. 975
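+
+Illustration (a sketch of the index mapping only): splitting the 976 payload beamlets into the two parallel
+streams shown above and merging them back. The function names are chosen for this example; bit packing of the
+18 bit values is not modelled.

```python
def split_streams(beamlets):
    """Stream 0 gets the even beamlet indices, stream 1 the odd indices."""
    return beamlets[0::2], beamlets[1::2]

def merge_streams(stream0, stream1):
    """Interleave the two streams back into payload order."""
    merged = [None] * (len(stream0) + len(stream1))
    merged[0::2] = stream0
    merged[1::2] = stream1
    return merged

payload = list(range(976))
s0, s1 = split_streams(payload)
print(s0[:3], s0[-1], s1[:3], s1[-1])
```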
+  
+The 10Gbps data on the ring interface is available as 32 bit data at 312.5 MHz (32 * 312.5M = 10G). 
+
+Local beamlet sums FIFO size:
+The local subband data needs to be buffered until the beamlet sum arrives. The last node experiences the largest
+latency, because then the beamlet sum has travelled N-1 hops, each adding about 5888 * 8 / 10G = 4.71 us. The
+total latency for the LBA ring is (16 - 1) * 4.71 us = 70.6 us or about 14 T_sub. With some extra margin assume
+that the last N-1 or N local beamlets need to be buffered. Per PN this yields a FIFO size of N_pol * S_sub_bf *
+N * N_complex * W_subband = 2 * 488 * 16 * 2 * 18 = 562176 bit, which takes about 32 M20k block RAMs.
+
+Ring modes:
+- off
+- local
+- remote
+- combine
+With dp_bsn_align all these modes are supported by enabling/disabling the corresponding inputs.
+
+FIFO flush:
+A FIFO can be flushed by resetting it, but this requires careful control to ensure that the reset is noticed
+in both clock domains, and that the reset is applied in between input packets to avoid that only a tail
+of a packet gets into a FIFO. Therefore in LOFAR 1.0 and APERTIF a FIFO is flushed by reading the packets 
+from it until it is empty. This scheme also allows flushing per packet. The disadvantage of reading the
+packets and then discarding them is that it takes as long as reading at full speed.
+
+Lost remote packet detection:
+Local FIFO full:
+The local FIFO needs to buffer the local data to be able to align with the remote data. The latency between
+nodes depends on the number of hops. With N = 16 nodes and store and forward packet transport the maximum
+latency will be < N * T_sub. To compensate for this latency the local FIFO needs to be able to store at most
+about N local packets. If the FIFO runs full, then this is an indicator that remote packets got lost and
+then the local FIFO needs to be flushed until it is empty.
+Rx timeout:
+The average packet rate on the ring is f_sub, so within T_sub there should arrive a new packet. If no packet
+arrives within T_sub, then the local FIFO can flush one packet. In this way the local FIFO does not need to
+be flushed until empty and fewer packets will get lost once the remote packets arrive again. Using Rx timeout
+does rely on packets fitting within a T_sub interval and on every T_sub interval containing at least part
+of a packet, so the actual packet rate must be close to the average packet rate.
+
+
+Remote packets:
+The remote packets drive the ring adder and are processed on arrival. The local packet with the same time stamp
+is already pending in the local beamlets FIFO. If a burst of remote packets gets lost, then the node will
+notice this because its local beamlets keep arriving and will overflow the local beamlets FIFO. The node will
+read and discard packets from the local beamlets FIFO to make sure that the FIFO does not overflow. If only
+one or a few remote packets got lost, then the node will notice this during the time stamp alignment, but
+only as soon as the next packet has arrived. This next packet will be ahead of the local packet, so the local
+packets need to be flushed. The node will then read and discard packets from the local beamlets FIFO until it 
+can align the remote and local data. During this realignment process the next remote packet may already arrive
+as well. Therefore the remote packet needs to be buffered, or discarded. Assume the FIFO is flushed by reading
+and then discarding packets from it. The local packets and the remote packets arrive at the same rate. The
+flushing of the packets goes faster than normal reading, because flushing can use all clock cycles, but the flushing
+can only catch up if the gaps between packets are large enough. Therefore in LOFAR 1.0 the remote packets were
+discarded during the flushing. This does mean that when one packet gets lost, the flushing will also discard 
+the next packet and some more for as long as it takes to empty the local beamlets FIFO. An alternative would
+be to keep on flushing and discarding remote packets, until the local beamlet FIFO is again ahead of the
+remote packets. Typically packets will get lost rarely or in bursts. In both cases it is fine to just flush
+the local beamlet FIFO until it is empty.
+
+   PN0     PN1     PN2     PN3     PN4   
+t                                        
+0: L0      L1      L2      L3      L4         <-- S_sub_bf = 488 beamlets (dual pol complex) per packet
+     R4      R0      R1      R2      R3  
+       R3      R4      R0      R1      R2
+
+The beamformer function has the following sub functions:
+- "Beamlet subband select" : Select S_sub_bf = 488 subbands per signal input
+- "Local beamformer" : Form N_pol * S_sub_bf = 2 * 488 = 976 local beamlet sums for S_pn = 12 signal inputs
+- "Beamlet ring adder" : 
+  if start node:
+    - Encode beamlet sums packet to ring
+  else:
+    - Buffer the local beamlet sums for >= N subband intervals
+    - Decode remote beamlet sums packet from ring
+    - Align remote beamlet sums packet and local beamlet sums packet
+    - Add local beamlet sums to remote beamlet sums packet
+    if transit node:
+      - Encode beamlet sums packet to ring
+    else:
+      - "Beamlet data output" : Scale and output beamlet sums
+- "Beamlet statistics (BST)": Calculate BST
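+
+Illustration (a behavioural sketch of the "Beamlet ring adder" steps above): the start node puts its local sums
+on the ring, transit nodes add their local sums and pass the packet on, and the end node adds its local sums and
+outputs the result. Buffering, BSN alignment, packet encoding and the final scaling to 8 bit are left out, and the
+function names are hypothetical.

```python
def ring_adder(role, local_sums, remote_sums=None):
    """Return (packet_to_ring, beamlet_output) for one subband interval.

    role is 'start', 'transit' or 'end'.
    """
    if role == "start":
        return list(local_sums), None
    total = [loc + rem for loc, rem in zip(local_sums, remote_sums)]
    if role == "transit":
        return total, None
    return None, total

def run_ring(local_per_node):
    """Pass one beamlet sums packet along all nodes of the ring."""
    n = len(local_per_node)
    packet, output = None, None
    for pn, local in enumerate(local_per_node):
        role = "start" if pn == 0 else "end" if pn == n - 1 else "transit"
        packet, output = ring_adder(role, local, packet)
    return output

print(run_ring([[1, 2], [10, 20], [100, 200]]))
```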
+
+
+*******************************************************************************
+* Subband Correlator
+*******************************************************************************
+
+Crosslet transport scheme:
+Use transport scheme 2b with N/2 hops where every node sends its local crosslets N/2 hops. The remote crosslets
+are correlated with the local crosslets. The remote crosslets arrive in packets from the N/2 preceding nodes.
+First the local crosslets are correlated with themselves and then the local crosslets are kept in a barrel shifter,
+such that they can also be correlated with the remote crosslets that arrive in the packets.
+- count N_int for monitoring
+
+
+Square correlator cell:
+There are S_pn = 12 local crosslets. A packet contains S_pn = 12 remote crosslets. There are N/2 remote crosslet
+packets. The local crosslets have to be correlated with the local crosslets and with each of the remote crosslet
+packets. The correlation with the local crosslets is a square matrix that yields X_sq = S_pn * S_pn = 144 visibilities.
+
+Number of square correlator cells per PN:
+With N = 16 PN for LBA there are N/2 = 8 remote crosslet packets. Hence together with the local crosslet visibilities
+this yields X_pn = (N/2 + 1) * X_sq = (8 + 1) * 144 = 1296 visibilities per PN.
+
+Crosslet period:
+The subband correlator needs to finish within one subband period, so T_xc < T_sub. For the critically sampled
+filterbank the subband period is N_fft = 1024 sample periods. The X_pn = 1296 visibilities per PN can be
+calculated using one complex multiplier if the multiplier runs at 1296 / 1024 * 200 M > 253 MHz. For an oversampled
+filterbank with R_os <= 1.25 this requires 1.25 * 253 = 316 MHz, which may be too much.
+
+Time in diagrams:
+- equal time for all PN in same row and in same relative column
+- left to right time in time slot
+- top to bottom time slots
+
+   PN0     PN1     PN2     PN3     PN4   
+t                                        
+0: L0      L1      L2      L3      L4         <-- S_pn = 12 crosslets (single pol complex subband) per packet
+     R4      R0      R1      R2      R3  
+       R3      R4      R0      R1      R2
+                                              <-- T_sub > latency on ring
+1: L0      L1      L2      L3      L4    
+     R4      R0      R1      R2      R3  
+       R3      R4      R0      R1      R2
+
+2: ... 
+                                                  For every slot integrate
+   00      11      22      33      44         <-- XST first LL at each PN upon L arrival
+     04      10      21      32      43       <-- XST then  LR at each PN upon R arrival with L in barrel
+       03      14      20      31      42     <-- XST then  LR at each PN upon R arrival with L in barrel
+       
+N_int-1:                                      <-- Dump and restart XST:
+
+                                                  0 00 10 20  *  *
+                                                  1  - 11 21 31  *
+                                                  2  -  - 22 32 42
+                                                  3 03  -  - 33 43
+                                                  4 04 14  -  - 44
+                                                     0  1  2  3  4
+                                                     
+                                                  * is obtained via conj()
+                                                  - not calculated because conj()
+                                                                                                    
+
+What is the crosslet packet size?
+With S_pn = 12 signal inputs per PN and one crosslet per signal input there are 12 crosslets per packet. A crosslet is
+a W_crosslet = 16 bit complex value, so 12 * 4 = 48 octets payload, so the effective packet size is 40 + 48 = 88 octets.
+The relative packet overhead for single crosslet payloads is 40 / 88 = 45 %.
+
+There are f_sub = 195312.5 subbands per s, and the packets have to travel N/2 hops. This yields a packet load of
+packet size * f_sub * N/2 = (88 * 8b) * 195312.5 * 16 / 2 = 1.1 Gbps. The data load of only the payload data is
+payload size * f_sub * N/2 = (48 * 8b) * 195312.5 * 16 / 2 = 0.6 Gbps. Hence the small packet size causes a large
+packet overhead, but is still acceptable, since it fits on a single 10G link of the ring.
+
+Calculate one or multiple crosslets:
+With small payloads the 10G link could fit about 10 / 1.1 ~= 9 different crosslets. With larger payloads the 10G link
+could fit about 10 / 0.6 ~= 16 crosslets. The advantage of using small payloads is that adding more crosslets can be done
+by instantiating the same single crosslet XC multiple times. However the small packets do have to travel sequentially 
+via the same 10G link, so there needs to be a multiplexer after the local ETH frames have been made. The advantage of
+using larger payloads is that they can be made by putting the extra crosslets in the same payload. With 16 crosslets
+the payload size is 16 * 48 = 768 and the effective packet size is 40 + 768 = 808 octets. The relative packet overhead for 
+multi crosslet payloads is 40 / 808 ~= 5 %. The packet load for multi crosslet payloads is (808 * 8b) * 195312.5 * 16 / 2 =
+10.1 Gbps, so this will just not fit on a 10GbE link, but 15 crosslets would.
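+
+The packet load figures for single crosslet and multi crosslet payloads can be reproduced with the sketch below.
+It is only an illustration, the names ring_load_gbps and relative_overhead are hypothetical, and it assumes
+the 40 octets effective overhead and the 48 octets per crosslet that are used in the text.

```python
F_SUB = 195312.5   # subband periods per second
N_PN = 16          # PN in the LBA ring, packets travel N/2 hops
OVERHEAD = 40      # effective packet overhead in octets (value from the text)
PAYLOAD = 48       # octets for one crosslet from S_pn = 12 signal inputs


def ring_load_gbps(n_crosslets, overhead=OVERHEAD):
    """Load in Gbps for one packet per subband period with n_crosslets."""
    packet_octets = overhead + n_crosslets * PAYLOAD
    return packet_octets * 8 * F_SUB * (N_PN // 2) / 1e9


def relative_overhead(n_crosslets):
    """Fraction of the packet that is overhead."""
    return OVERHEAD / (OVERHEAD + n_crosslets * PAYLOAD)


if __name__ == "__main__":
    for n in (1, 15, 16):
        print(n, ring_load_gbps(n), relative_overhead(n))
```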
+                                                                                   
+At 200 MHz for the critically sampled subbands, an X_pn correlator cell can correlate N_fft / X_sq = 1024 / 144 ~= 7
+different crosslet frequencies. With N = 16 for LBA there need to be N/2 + 1 = 9 of these X_pn correlator cells in
+parallel. One X_pn correlates the local-local crosslets and the other N/2 X_pn correlate the local-remote crosslets.
+These 9 X_pn in parallel can correlate up to 7 crosslets. The link can transport 15 crosslets, so 18 X_pn in parallel
+could correlate 14 different crosslets to make better use of the link capacity.
+
+One X_pn takes one complex multiplier. Using N/2 + 1 = 9 X_pn for one crosslet is a waste of resources, but it is
+still acceptable and it provides a clear design.
+
+
+Send more than one time slot per packet?
+To reduce the relative packet overhead for single crosslet XC it is an option to put multiple time slots per payload.
+This is considered too complicated.
+
+   PN0     PN1     PN2     PN3     PN4   
+t                                        
+0: L00     L11     L22     L33     L44        <-- For example two time slots per packet
+      R44     R00     R11     R22     R33 
+         R33     R44     R00     R11     R22
+
+2:
+
+What if a node fails?
+The next N/2 nodes will then miss packets. The order of the packets is not affected, because on each node it will
+be the last one or more packets that are missed. There will be no correlations for the missed packets, but the
+correlation should continue as soon as the node starts again in a later time slot. A packet count per packet source
+at each node will reveal missed packets and thus also the number of integrations that happened in the final
+visibilities. If no packets are missed then the packet count is 195312.5 per integration interval of 1 s on every
+PN for every packet source PN.
+
+   PN0     PN1     PN2     PN3     PN4   
+t                                        
+0: L0      .       L2      L3      L4         <-- PN1 fails, so next N/2 nodes will miss packets
+     R4    .         .       R2      R3  
+       R3  .           .       .       R2
+   00      .       22      33      44
+     04      .       .       32      43
+       03      .       .       .       42
+
+       
+What if a packet gets lost?
+If a packet gets lost then it can cause a gap in the packet order, so the next packet must not be mistaken for
+the lost packet. Therefore the packets must have a time slot number and a source number, such that the XST in
+each node will use it for the correct visibilities.
+
+   PN0     PN1     PN2     PN3     PN4   
+t                                        
+0: L0      L1      L2      L3      L4
+     R4      .       R1      R2      R3       <-- L0 from PN0 gets lost at PN1
+       R3      R4      .       R1      R2
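+
+A minimal sketch of how the time slot number and source number in the packet header could be used. The header
+is modelled here as a (time_slot, source_pn) tuple, which is an assumption for illustration and not a defined
+packet format. The accumulator is selected from the header instead of from the arrival order, and the packet
+count per source reveals the missed packets.

```python
from collections import Counter


def visibility_key(local_pn, header):
    """Select the accumulator from the header, not from the arrival order."""
    _slot, source = header
    return (local_pn, source)


def missed_per_source(headers, sources, n_slots):
    """Return the number of missing packets per source PN.

    headers: (time_slot, source_pn) tuples taken from the received packets.
    sources: the source PN that are expected at this node.
    n_slots: the number of time slots in the interval.
    """
    counts = Counter(source for _slot, source in headers)
    return {src: n_slots - counts[src] for src in sources}


if __name__ == "__main__":
    # PN1 in a 5 node ring expects remote packets from PN0 and PN4.
    # The packet from PN0 gets lost in time slot 1.
    received = [(0, 0), (0, 4), (1, 4), (2, 0), (2, 4)]
    print(missed_per_source(received, sources=[0, 4], n_slots=3))
    print(visibility_key(1, received[2]))
```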
+
+Packet order is guaranteed?
+At the start of every time slot the local L# packet is sent first. After that each node passes on the packets that 
+it receives. Therefore the packets arrive in order, with the packet from the closest node first and the packet
+from the furthest node last. If a packet gets lost then there will be a gap, but the order is still preserved.
+
+What if T_sq > T_hop latency on ring?
+What if T_sub > N/2 * T_hop latency on ring?
+If the correlation of one packet takes less time than one hop, then the correlator remote and local input do
+not need a FIFO.
+
+If T_sub > N/2 * T_hop latency on ring then all packets have been correlated within one slot, so then the next 
+local packet starts the new slot and can thus also serve as timeout for lost remote packets from the current slot.
+The correlator cell is then always ready for the first local correlation L00, L11, etc, so the local data does
+not need a FIFO at the XST input. If T_sq for correlating one packet with the local packet takes less than T_hop,
+then the remote data does not need a FIFO at the XST input. If T_sq > T_hop then the remote data input to the XST
+does need a FIFO.
+
+If T_sub < N/2 * T_hop then the local data input to the XST also needs a FIFO. If the slot period is smaller than
+the latency of N/2 hops on the ring, then the next time slot packet is already sent before the farthest away
+packets from the current time slot have been received. For example A00 and A07 have been
+correlated, but before A06 can be correlated the next time slot already starts with B0. This then means that the 
+correlator input should get a FIFO to store B0, such that A06 can be correlated first when A6 arrives, followed by
+A05 and A04. After that the correlator can continue with B00 and then B07 when it arrives. This also means that the
+remote packet input needs a buffer, to store B7 in case the correlation of B00 is still busy.
+
+
+   PN0        PN1        PN2        PN3        PN4        PN5        PN6        PN7
+t            
+0: L0        
+     R7      
+       R6    
+         R5  
+           R4
+             
+0: A0        
+     A7      
+1: B0        
+       A6    
+     B7      
+2: C0        
+         A5  
+           A4
+       B6    
+     C7      
+3: D0        
+         B5  
+           B4
+       C6    
+     D7      
+4: E0        
+         C5  
+           C4
+       D6    
+     E7      
+5: F0        
+         D5  
+           D4
+       E6    
+     F7      
+         E5  
+           E4
+           
+  A00
+     07
+       06        <-- queue local  B0 in FIFO, because first finish time slot A for A6,5,4
+         05      <-- queue remote B7 in FIFO, because first finish time slot A for A5,4
+           04
+  B00            
+     07      
+       06        <-- queue local  C0 in FIFO, because first finish time slot B for B6,5,4
+         05      <-- queue remote C7 in FIFO, because first finish time slot B for B5,4
+           04
+
+If remote packets get lost then the local FIFO will run full. This can then be used to flush
+the FIFOs and restart the alignment. The flush time should be long enough to ensure
+that all PN in the ring will restart. However it is important that all PN restart at
+the same time or using the same time slot. This can be achieved by restarting at the sync
+(so once per second) or by restarting at every time slot in case the previous time slot did
+not receive any remote packet, which indicates that the source node was still flushing its
+FIFOs.
+
+Support for other (shorter) integration period T_int_x?
+- Longer T_int as multiple of 1 s can be supported outside SDP
+- Shorter T_int < 1 s (PPS):
+  . Using BSN scheduler
+  . increases M&C data rate
+  . should still fit within PPS grid
+- Publish T_int_x period ended event message to Station Control
+
+How can it be scaled to more than one crosslet per XST?
+  - multiple per packet
+  - multiple instances of one
+
+
+*******************************************************************************
+* Subband offload for AARTFAAC
+*******************************************************************************
+Current AARTFAAC can offload S_sub_so = 36 subbands for S = 96 signal inputs (SI) in W_subband_so = 16 bit mode,
+so a bandwidth of 36 * 195312.5 Hz = 7.03 MHz. This corresponds to a load of S_sub_so * S * f_sub * N_complex *
+W_subband_so = 36 * 96 * 195312.5 * 2 * 16 = 21.6 Gbps. The 8 bit subband mode does not work in RSP, but would
+be sufficient for AARTFAAC. Therefore assume W_subband_so = 8 bit for LOFAR 2.0. For LOFAR 2.0 the number of LBA
+doubles to S_lba = 192, so assume S = 192. The load from one 8 bit subband from all 192 signal inputs is 
+S * f_sub * N_complex * W_subband_so = 192 * 195312.5 * 2 * 8 = 0.6 Gbps for R_os = 1 and 0.75 Gbps for maximum
+expected R_os = 1.25 of an oversampled filterbank. Per 10GbE output link this then yields maximum of 10G / 0.6G
+= 16.6 subbands for R_os = 1 and 10G / 0.75G = 13.3 subbands for R_os = 1.25. The 10GbE requires some spare
+capacity, so therefore assume S_sub_so = 12 subbands / 10GbE link will just fit for R_os <= 1.25, provided that
+the packet overhead is < (13.3-12)/12 ~= 10 %. Hence with one 4 * 10GbE QSFP port at the final PN it is possible
+to offload 4 * 12 = 48 subbands or 9.375 MHz bandwidth with S_lba = 192 signal paths and W_subband_so = 8 bit. 
+The ring can be used to transport the subbands to some single destination PN that then performs the output via
+the 4 x 10GbE ports or 40GbE port on the QSFP. The destination PN could also do subband reordering to group
+subbands per S_lba = 192 inputs.
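+
+The subband offload budget above can be recalculated with the following sketch. The names are hypothetical and
+the loads are payload data only, so without packet overhead, as in the text.

```python
F_SUB = 195312.5    # subband rate in Hz
LINK_GBPS = 10.0    # nominal 10GbE link rate


def subband_load_gbps(n_si, w_bits, r_os=1.0, n_complex=2):
    """Load of one subband from n_si signal inputs, without packet overhead."""
    return n_si * F_SUB * n_complex * w_bits * r_os / 1e9


def max_subbands_per_link(n_si, w_bits, r_os=1.0):
    """Number of subbands that fit on one 10GbE link, not rounded."""
    return LINK_GBPS / subband_load_gbps(n_si, w_bits, r_os)


if __name__ == "__main__":
    print(36 * subband_load_gbps(96, 16))        # current AARTFAAC load
    print(max_subbands_per_link(192, 8))         # R_os = 1
    print(max_subbands_per_link(192, 8, 1.25))   # R_os = 1.25
    print(4 * 12 * F_SUB / 1e6)                  # MHz offloaded by 4 links
```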
+
+Remark: On the RSP - Uniboard interface there are 9 subbands per lane, so S_sub_so = 36 in total, but on the
+UniBoard - UDP interface to the GPU correlator only 8 subbands, so 32 in total are output.
+
+The subbands are gathered at the output node via the ring. Using the ring avoids the need to use a 10GbE switch.
+Such a switch would need > 16 + 16 ports to support LBA + international HBA and some output ports. If the data
+is gathered, then it can as well be reordered to combine all S signal inputs in a single payload. The subbands
+can be sent to the output node via the ring using either scheme 1 or scheme 2a:
+
+If the subband data is transported in one packet using scheme 1, then the payload can contain all 192 signal
+inputs. The payload size for S_sub_so = 12 subbands then becomes S * S_sub_so * N_complex * W_subband_so / W_byte
+= 192 * 12 * 2 * 8/8 = 4608 octets. The packet overhead is then (40 + 4608) / 4608 = 1.009, so 0.9 % overhead.
+Each node then inserts its local subbands at the appropriate offset in the payload. The packet size is 40 + 4608
+= 4648 octets and the packet rate is f_sub * R_os, so the load on all links in the ring is:
+packet size * W_byte * f_sub * R_os = 4648 * 8 * 195312.5 * 1.25 ~= 9.08 Gbps.
+
+If the subband data is sent in a separate packet for each PN using scheme 2a, then the payload size for
+S_sub_so = 12 subbands / 10GbE link, S_pn = 12 signal inputs per PN and W_subband_so = 8 bit becomes
+S_pn * S_sub_so * N_complex * W_subband_so / W_byte = 12 * 12 * 2 * 8/8 = 288 octets. The packet overhead is
+then (40 + 288) / 288 = 1.14, so 14 % overhead. The packet size is 40 + 288 = 328 octets and at the end node
+there are N-1 packets on the ring. This yields an aggregate 'packet size' of (16-1)*328 = 4920 octets. The
+load on the last link in the ring is: (N-1) * packet size * W_byte * f_sub * R_os =
+(16-1) * 328 * 8 * 195312.5 * 1.25 ~= 9.61 Gbps. Note that the packet overhead of 14 % is larger than the
+maximum estimated allowable packet overhead of 10 % to transport 12 subbands. The reason that 12 subbands
+still fit is that the last node in the ring does not have to transport its local subbands via the ring, 
+so the used capacity on the last link is a factor (N-1)/N less. At the output node the data from all N nodes
+is combined into one payload of size N * 288 = 4608 octets, so the output load is ~= 9.18 Gbps (identical
+to scheme 1, because the output rate does not depend on which ring scheme was used).
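+
+The ring loads for scheme 1 and scheme 2a can be compared with the sketch below. The function names are
+illustrative, and the 40 octets overhead, S_sub_so = 12, S_pn = 12, N = 16 and R_os = 1.25 are the values
+assumed in the text.

```python
F_SUB = 195312.5   # subband rate in Hz
R_OS = 1.25        # maximum expected oversampling factor
OVERHEAD = 40      # effective packet overhead in octets
N_PN = 16          # nodes in the ring
S_PN = 12          # signal inputs per PN
S_SUB_SO = 12      # subbands per 10GbE link


def payload_octets(n_si, n_sub=S_SUB_SO, n_complex=2, w_bits=8):
    """Payload size for n_sub subbands from n_si signal inputs."""
    return n_si * n_sub * n_complex * w_bits // 8


def load_gbps(octets_per_subband_period):
    return octets_per_subband_period * 8 * F_SUB * R_OS / 1e9


def scheme1_load_gbps():
    """One packet with all N * S_pn inputs, same load on every link."""
    return load_gbps(OVERHEAD + payload_octets(N_PN * S_PN))


def scheme2a_last_link_load_gbps():
    """N-1 separate packets pass the last link before the end node."""
    return load_gbps((N_PN - 1) * (OVERHEAD + payload_octets(S_PN)))


if __name__ == "__main__":
    print(scheme1_load_gbps(), scheme2a_last_link_load_gbps())
```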
+
+Both scheme 1 and scheme 2a can offload 12 subbands per 10GbE link. The difference is that scheme 1 has
+a load of ~9.08 Gbps on all hops, whereas for scheme 2a the load increases with every hop and has a maximum
+of ~9.61 Gbps on the last hop. With scheme 1 each node has to put its
+local subbands at the right location in the packet. In this way the end node only needs to output the 
+payload, because the data is already in the subband offload payload format. With scheme 2a all nodes just
+send their local data and pass on the transit data. At the end node a dispatcher and BSN aligner are needed
+to align the packets from all N = 16 nodes. After that the end node needs to reorder the data from these
+N = 16 input payloads into the subband offload payload format. This functionality in the end node is similar
+to the rsp_terminal function on UniBoard1 for AARTFAAC. Scheme 1 is specific to the ring, whereas scheme 2a would
+also work if the subband data is sent to the end node via a switch (or via URI like with RSP).
+
+With scheme 2a the ring could be used in both directions, but this does not improve the capacity of the
+ring. Using the ring in one direction the packets travel 1+2+3+...+(16-1) = 120 hops. Using the ring in both
+directions the packets travel 1+2+3+4+5+6+7+8 = 36 hops left and 1+2+3+4+5+6+7 = 28 hops right, so 64 hops in
+total. For the transport load on the ring as a whole using both directions is a factor 120/64 = 1.875 more
+efficient. However at the end node both options still have to transfer the same load of 15 packets. Therefore at
+the end node the load for both options is the same. Hence with 15+15 or (8+7)+(7+8) packets arriving at the end
+node, this node has no spare capacity left to receive more subband packets via these two links.
+Using the ring in both directions does reduce the latency and therefore the input buffering at the end node
+by a factor 1.875. Furthermore fewer hops also proportionally reduce the packet error rate. It is easier
+to use the ring in only one direction, because all nodes then send in the same direction, independent of
+their location in the ring.
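+
+The hop counts for using the ring in one direction or in both directions follow from summing the distance of
+each source node to the end node. A minimal sketch, with hypothetical function names and assuming the end
+node itself does not send via the ring:

```python
def hops_one_direction(n_pn):
    """Total hops when all n_pn - 1 source nodes send in the same direction."""
    return sum(range(1, n_pn))


def hops_both_directions(n_pn):
    """Total hops when each source node uses the shortest direction."""
    left = n_pn // 2          # nodes that send via one side of the ring
    right = n_pn - 1 - left   # remaining nodes send via the other side
    return sum(range(1, left + 1)) + sum(range(1, right + 1))


if __name__ == "__main__":
    one = hops_one_direction(16)
    both = hops_both_directions(16)
    print(one, both, one / both)
```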
+
+At the output node the packet payload is put in a UDP/IP packet with an SDO application header. The UDP/IP
+header has 8+20 = 28 octets. The SDO header in LOFAR 1.0 has 22 octets. The output packet size is 40 + 28 + 22
++ 4608 = 4698 octets and the output data rate is packet size * W_byte * f_sub * R_os = 4698 * 8 * 195312.5 *
+1.25 ~= 9.18 Gbps. The output load is independent of the ring scheme. The ring has 12 full duplex 10GbE links.
+Suppose 8 of these can be allocated to subband offload, then the ring can support subband offload for maximum
+2*8 * 12 = 192 subbands (= 37.5 MHz). This then requires 2*8 / 4 = 4 QSFP ports, on different nodes.
+
+Design decision:
+- Gather subbands at output node (instead of having a dedicated offload port at each node)
+- Gather the subbands via the ring (to avoid the need for a 10GbE switch with about 40 ports)
+- Reorder the subbands to have all subbands from signal inputs in one payload (to ease input stage of user application)
+- Use scheme 2a and in both directions (to reduce the number of hops and latency)
+
+
+*******************************************************************************
+* Transient buffer readout
+*******************************************************************************
+
+The transient buffer stores the data in frames of 2 kByte. A frame contains data from one signal input. The
+memory is divided into pages and each page can contain one frame. The transient buffer readout is controlled
+per signal input and defined by a start time and a number of pages. The start time translates into a start
+page. The SCU issues the read commands per signal input. The SDP firmware then reads and outputs the 
+requested frames to CEP. When the transfer has finished, then the SDP firmware sends an event message to the
+SCU, and then the SCU issues a read command for the next signal input, until all signal inputs have been
+handled. For the ring the read out per signal input implies that at any time only one node will send data.
+
+The read frames are encoded into a DP/ETH frame. The first frame that is read is encoded with a sync and the
+subsequent frames that are read can be counted via the BSN field. In this way a BSN monitor at the end node
+can monitor whether all frames for a signal input read out have arrived at the end node. The end node decodes
+the frames and then encodes them into a UDP/IP/ETH frame to CEP. The transit nodes pass on the frames, and
+also decode the frames to be able to monitor them with a BSN monitor. After each read command has finished the
+SCU can check the BSN monitor at the end node to know whether all frames arrived correctly at the end node.
+
+For 1 Gbps data rate to CEP and packets of about 2 kByte the packet rate is 1e9/ 2000 / 8 = 62500 packets/s
+or about one packet every 16 us, so about every 3 T_sub. It is allowed to let multiple nodes output TB data,
+but the total number of packets/s has to still fit the output link.
+
+
+
+
+
+*******************************************************************************
+* Appendix: Resource usage in Stratix IV of UDP offload via 10GbE:
+*******************************************************************************
+
+
+The BF 10GbE has no statistics, the XC 10GbE does have statistics. One 10GbE takes maximum about 5500 FF and about 7 M9K
+* apertif_unb1_fn_beamformer_trans (1x 10gbE Tx)
+                                                                                  ALM     FF  M9K  M144K
+  node_apertif_unb1_fn_beamformer_output:\gen_node_output:u_node_output          7569  10518   23      2
+    apertif_unb1_fn_beamformer_udp_offload:u_apertif_unb1_beamformer_udp_offload 2547   4200   15      0
+    common_areset:u_common_areset                                                   2      4    0      0
+    dp_fifo_fill:u_dp_fifo_fill                                                   120    130    0      2
+    dp_fifo_monitor:u_dp_fifo_monitor                                             154    220    0      0
+    mms_dp_xonoff:u_mms_dp_xonoff                                                   8      7    0      0
+    tr_10GbE:u_tr_10GbE                                                          4748   5957    8      0
+      dp_fifo_dc:\gen_dp_fifo_dc_rx:0:u_dp_fifo_dc_rx                              82    138    0      0
+      dp_fifo_fill_dc:\gen_dp_fifo_fill_dc:0:u_dp_fifo_fill_dc                    186    244    4      0
+      tech_eth_10g:u_tech_eth_10g                                                4337   5307    4      0 <-- 3 % of 182400 FF
+      tr_xaui_mdio:\gen_mdio:u_tr_xaui_mdio                                       187    268    0      0
+
+* apertif_unb1_correlator_full (3x 10gbE Rx)
+                                                                                  ALM     FF  M9K  M144K
+  dp_offload_rx:u_dp_offload_rx                                                   950   1463    0      0
+  tr_10GbE:u_tr_10GbE                                                           11368  17976   33      0
+    dp_fifo_dc:\gen_dp_fifo_dc_rx:0:u_dp_fifo_dc_rx                                88    127    2      0
+    dp_fifo_dc:\gen_dp_fifo_dc_rx:1:u_dp_fifo_dc_rx                                85    127    2      0
+    dp_fifo_dc:\gen_dp_fifo_dc_rx:2:u_dp_fifo_dc_rx                                74    127    2      0
+    dp_fifo_fill_dc:\gen_dp_fifo_fill_dc:0:u_dp_fifo_fill_dc                       86    111    2      0
+    dp_fifo_fill_dc:\gen_dp_fifo_fill_dc:1:u_dp_fifo_fill_dc                       93    109    2      0
+    dp_fifo_fill_dc:\gen_dp_fifo_fill_dc:2:u_dp_fifo_fill_dc                       90    108    2      0
+    tech_eth_10g:u_tech_eth_10g                                                 10552  16487   21      0 <-- ~=5500 ~= 3 % of 182400
+    tech_eth_10g_stratixiv:\gen_ip_stratixiv:u0                                 10552  16487   21      0
+      ip_stratixiv_eth_10g:u_ip_stratixiv_eth_10g                               10552  16487   21      0
+  tr_xaui_mdio:\gen_mdio:u_tr_xaui_mdio                                           486    780    0      0
+
+
+
diff --git a/applications/lofar2/doc/prestudy/station2_sdp_timing.txt b/applications/lofar2/doc/prestudy/station2_sdp_timing.txt
new file mode 100644
index 0000000000000000000000000000000000000000..ac488b6e9b194cde7a6d910d3e0060e1ba0ac694
--- /dev/null
+++ b/applications/lofar2/doc/prestudy/station2_sdp_timing.txt
@@ -0,0 +1,394 @@
+*******************************************************************************
+* Fixed Station BSN grid and the PPS grid
+*******************************************************************************
+
+The Station needs an external trigger to align all ADCs in the RCU2S and all FPGA processing nodes in the SDP.
+For this trigger a pulse from the pulse per second (PPS) is used. The PPS is aligned to the top of second of the
+UTC time of day (ToD). The PPS is a hardware trigger that is available within the entire SDP at sample clock
+cycle accuracy. Thanks to the Timing Distributor (TD) the PPS trigger is also available as hardware trigger in
+all Stations. Thanks to the TD the PPS is aligned to UTC ToD, and the ToD is available to the Telescope Manager
+(TM) in LOFAR2.0 and to Station Control in each Station. The TM controls, via Station Control, which PPS pulse
+is used to start SDP. The PPS is identified by a Seconds Sequence Number (SSN) that counts PPS since a certain
+date in the past, e.g. t_epoch = 1 Jan 1970, but some other fixed date is possible too.
+
+The SDP processes the data in blocks of ADC samples that are identified by a Station block sequence number
+(BSN). The Station BSN time grid should be fixed, so independent of when the data processing starts. Therefore
+the Station BSN counts blocks since the same t_epoch as the SSN, so the t_epoch defines the common reference
+moment in history for the Station BSN grid and for the PPS grid. The PPS grid does not necessarily always
+coincide with the Station BSN grid. The BSN period determines whether the Station BSN can start exactly at a
+PPS or not.
+
+The processing of the ADC inputs in SDP is done by multiple FPGAs in parallel. Each FPGA has a BSN source that
+creates the Station BSN grid. The BSN source is the wall clock of the FPGA. To be able to start the data
+processing at any PPS it is necessary that the BSN source can start at a programmable fraction of a BSN period
+after the PPS. In this way processing of one Station ADC signal input can be restarted at any PPS (with zero
+phase offset to the other signal input) and an entire Station can be restarted at any PPS (with zero phase 
+offset to the other Stations). The BSN source ensures that the BSN timing is always on the fixed Station
+BSN time grid. The initial BSN and the offset fraction of a BSN period need to be provided to SDP via the
+M&C interface by Station Control. Both Station Control and SDP know the PPS grid. Station Control also knows
+UTC and with that information Station Control can program and initialize the BSN source in the FPGAs to start
+counting data blocks at the next PPS. 
+
+The sample frequency f_adc = 200 MHz is an integer number of Hz and locked to the PPS, therefore the PPS grid
+always coincides with the ADC sample period T_adc = 1/f_adc = 5 ns grid. The Station BSN block period is an
+integer number of N_blk sample periods T_adc. The Station BSN period is set by the subband rate of the subband
+polyphase filterbank (PFB), so the Station BSN period is equal to the subband period T_sub. 
+The input to the subband filterbank is the real signal from the ADC. For both the critically sampled PFB and 
+the oversampled PFB the data block size of the input data is N_blk = N_FFT = 1024 ADC samples, however for the
+oversampled PFB the blocks overlap by a factor R_os. Hence for the critically sampled PFB the BSN period is
+T_sub = N_blk = 1024 [T_adc] and for the oversampled PFB the BSN period is T_sub = N_blk / R_os = 864
+[T_adc], in case R_os = 32/27. At the output of the subband filterbank a data block contains 
+N_sub = N_FFT / N_complex = 512 complex subband samples that all correspond to the same time instant as
+defined by the Station BSN. Each subband sample represents another frequency.
+
+The offset fraction between the PPS grid and the BSN grid is between 0 and N_blk-1 ADC sample periods. Hence
+to start the processing at any PPS the BSN source has to be able to start the Station BSN at an offset
+of 0:N_blk-1 sample periods. The BSN source starts at a PPS with an initial Station BSN of:
+
+  initial Station BSN = ceil((SSN * 1 s) / T_sub) = ceil((SSN * f_adc) / N_blk)
+
+and a BSN offset fraction of:
+
+  BSN offset fraction = mod(SSN * 1 s, T_sub) / T_adc = mod(SSN * f_adc, N_blk)
+  
+to make sure that the BSN grid is always relative to t_epoch, independent of at which PPS the BSN source was
+started. The Station BSN increments after every block. The time ToD_BSN at the BSN grid is:
+
+  ToD_BSN = t_epoch + Station BSN * T_sub
+  
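+
+The relation between the PPS grid and the BSN grid can be explored with the sketch below. It evaluates the
+offset fraction mod(SSN * f_adc, BSN period in samples) as given above and, as an added illustration, derives
+after how many seconds a PPS coincides with the BSN grid again. The function names are hypothetical and the
+BSN period is passed in units of T_adc, so 1024 for the critically sampled PFB and 864 for R_os = 32/27.

```python
from math import gcd

F_ADC = 200_000_000   # ADC sample frequency in Hz, locked to the PPS


def bsn_offset_fraction(ssn, t_sub_samples, f_adc=F_ADC):
    """mod(SSN * f_adc, BSN period), in ADC sample periods."""
    return (ssn * f_adc) % t_sub_samples


def coincidence_period_s(t_sub_samples, f_adc=F_ADC):
    """Seconds between PPS pulses that coincide with the BSN grid."""
    return t_sub_samples // gcd(f_adc, t_sub_samples)


if __name__ == "__main__":
    for t_sub_samples in (1024, 864):
        print(t_sub_samples,
              [bsn_offset_fraction(ssn, t_sub_samples) for ssn in range(4)],
              coincidence_period_s(t_sub_samples))
```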
+
+Note:
+- The BSN offset fraction could also be compensated for by delaying the sample data in the ADC signal input
+  buffers at the input of SDP. Delaying the data does compensate for phase differences in the subband
+  data, but does not compensate for the offset in the BSN grid. The BSN alignment buffers between signal 
+  inputs from different FPGAs then still need to compensate for this BSN offset fraction. Hence delaying the
+  data is an indirect and incomplete solution, and therefore it is not used.
+- In LOFAR1 the Station BSN is divided in a 32 bit seconds sequence number (SSN) that counts PPS intervals and
+  a 32 bit local BSN that counts blocks within each SSN interval. In APERTIF the BSN is a continuous BSN that
+  can start at any PPS, but without support for a BSN offset fraction. In APERTIF f_sub = 1/T_sub = 781250 Hz is an
+  integer, so every PPS coincides with the BSN grid and the timing of the APERTIF BSN grid does not depend on at
+  which PPS it is started. For LOFAR2.0 the
+  continuously incrementing BSN is preferred, but with the BSN offset fraction to ensure that the BSN grid can
+  be started at any PPS.
+
+
+*******************************************************************************
+* Relation between Station BSN and timestamp in fractional seconds
+*******************************************************************************
+
+The T_sub and the fixed BSN grid given by ToD_BSN provide sufficient timing resolution to timestamp the data
+and any M&C upon the data, because:
+
+- It is not necessary to facilitate using an offset 0 < T_sub_o < T_sub to start the BSN grid at an integer
+  number of T_adc after t_epoch, because the BSN grid is sufficiently fine.
+- It is not necessary to represent fine group delays of digital filters or analogue electronics and 
+  cables in the BSN, because these delays are all accounted for after calibration.
+  . Coarse group delays and cable delay differences can be compensated for in steps of T_adc via the signal
+    input buffer of every ADC input in SDP.
+  . Fine group delay differences within a Station can be calibrated via the subband calibration weights.
+- Group delay differences between Stations need to be calibrated at CEP, and can be compensated at CEP or
+  at Station via the input buffers and the subband calibration weights.
+- For the PFB it does not matter at which Station BSN period it was started. What matters is that the PFB
+  for all signal inputs (within a Station and between Stations) is started at the same fixed Station BSN
+  grid, because this ensures that the output from all signal inputs will not get relative phase offsets.
+
+The actual resolution T_sub of the Station BSN in LOFAR2 depends on the ADC sample frequency and on the
+subband filterbank:
+
+  N_blk  T_adc    T_sub       T_sub_i
+  1024 * 5   ns = 5120   ns = 25600 [0.2 ns] for critical sampled filterbank at 200 MHz
+  1024 * 6.4 ns = 6553.6 ns = 32768 [0.2 ns] for critical sampled filterbank at 160 MHz
+   864 * 5   ns = 4320   ns = 21600 [0.2 ns] for oversampled filterbank R_os = 32/27 = 1.185 at 200 MHz
+   864 * 6.4 ns = 5529.6 ns = 27648 [0.2 ns] for oversampled filterbank R_os = 32/27 = 1.185 at 160 MHz
+   800 * 5   ns = 4000   ns = 20000 [0.2 ns] for oversampled filterbank R_os = 32/25 = 1.28 at 200 MHz
+   800 * 6.4 ns = 5120   ns = 25600 [0.2 ns] for oversampled filterbank R_os = 32/25 = 1.28 at 160 MHz
+
+In LOFAR2 the timestamp should be independent of:
+
+- using 200 MHz sample rate or 160 MHz sample rate,
+- using critically sampled subband filterbank or oversampled subband filterbank
+  
+If T_sub was fixed then T_sub could be used as timestamp resolution (like in APERTIF). However T_sub depends
+on the type of subband filterbank with a resolution of T_adc. If T_adc was fixed then T_adc could be used
+as timestamp resolution. However T_adc depends on the sample clock rate. Therefore the timestamp resolution
+needs to be as fine as the greatest common time resolution of T_adc = 5 ns and T_adc = 6.4 ns, which is 0.2 ns.
+A 64 bit timestamp with 0.2 ns resolution can count 2**64 / (365.25 * 24 * 3600 / 0.2e-9) = 116 years. Hence
+for t_epoch = 1970 this is until 2086, which is sufficient for the lifetime of LOFAR2.0. Internally in SDP
+firmware use the BSN to count T_sub. Externally at the SDP interface use timestamp values with a resolution
+of 0.2 ns such that they are:
+
+ * integer values, and
+ * independent of the sample period.
+ 
+The actual timestamp in fractional seconds of 0.2 ns follows from:
+
+  timestamp = Station BSN * T_sub_i * 0.2 [ns].
+  
+The BSN and T_sub_i can be specified as:
+
+- single 64 bit integer timestamp value of BSN * T_sub_i [0.2 ns]
+- two separate fields with an incrementing BSN and resolution given by T_sub_i [0.2 ns]
+
+To cover 116 years for a BSN with smallest T_sub = 4000 ns for R_os = 32/25 = 1.28 requires:
+
+  log2( 116 * (365.25 * 24 * 3600 / 4000e-9) ) = 49.7, so 50 bits
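+
+Both range estimates can be verified with a few lines of Python. This is only a sketch with illustrative
+function names, using a year of 365.25 days as in the text.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def timestamp_range_years(n_bits=64, resolution_s=0.2e-9):
    """Years that an n_bits counter can cover at the given resolution."""
    return 2**n_bits * resolution_s / SECONDS_PER_YEAR


def bsn_bits(years=116, t_sub_s=4000e-9):
    """log2 of the number of BSN periods in the given number of years."""
    return math.log2(years * SECONDS_PER_YEAR / t_sub_s)


if __name__ == "__main__":
    print(timestamp_range_years())   # range of the 64 bit timestamp in years
    print(bsn_bits())                # fractional number of bits
    print(math.ceil(bsn_bits()))     # bits needed for the BSN
```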
+  
+Therefore allocate 64b in a packet header to send the BSN information. The BSN and timestamp are directly
+related via T_sub_i, but the advantage of providing the BSN separately is that it increments by 1 for
+each block period T_sub, so it can be used as block index.
+
+The range of T_sub is 4000 ns - 5120 ns, so the range of T_sub_i is 20000 - 25600. These T_sub_i values
+can be covered in a 16 bit number. Alternatively T_sub_i can be derived from the four possible
+combinations of f_adc = 200M or 160M and R_os = 1 or 32/25, that can be represented with 2 bits.
+
+
+
+*******************************************************************************
+* Data timing using BSN and sync
+*******************************************************************************
+
+Together the initial BSN, counting blocks and the order of the data within a block uniquely define the timing
+of the data in a Station. However counting blocks is not sufficient to maintain the data timing, because:
+
+- The data flow at the SDP input may be stopped and restarted,
+- at the external interfaces of the FPGAs in SDP it is possible that blocks of data get lost.
+
+The assumptions are:
+
+- data is transported and processed in blocks,
+- partial blocks cannot occur,
+- the data flow can only stop or continue at block boundaries.
+
+To recover from gaps in the data flow the BSN can be transported along with every data block.
+For the external FPGA interfaces one or more data blocks get packed into the payload and the BSN is then
+transported via the header. The BSN in the header corresponds to the first data block in the payload, the 
+position of a data block in the payload defines the offset to this BSN.
+
+For data transport within the FPGA it is costly from a resource point of view to transport the 64 bit BSN
+along with every data block throughout each data move and data processing function.
+Instead of the full 64 bit BSN a single bit sync signal can be passed on along with the data inside the FPGA.
+At the BSN source the sync is linked to the BSN. Each
+FPGA in the SDP firmware has a BSN source, that all run synchronously within a Station and also between
+Stations, because they have been started by the external PPS from the Timing Distributor (TD). The sync is a
+periodic signal with period larger than the maximum latency of the data within the SDP, to ensure
+that a sync at any FPGA refers to the same time instant. If the sync arrives somewhere in an FPGA, then
+the BSN can be recreated by taking the BSN that the BSN source held at the sync and then continuing to
+count blocks from there.
+
+
+*******************************************************************************
+* Data loss and filler data
+*******************************************************************************
+
+The assumptions are:
+
+- Within an FPGA no logic errors occur (provided that the code synthesis went ok)
+- Within an FPGA no blocks get lost (so e.g. FIFO overflow must not occur and would be as bad as a logic error)
+- At external FPGA interfaces data can get lost
+- Packets that are received at external FPGA interfaces with a CRC error are discarded
+
+
+For the blocks between sync pulses the Station BSN is incremented with every block. This implies that if the
+BSN needs to be preserved during the sync interval, then lost or discarded blocks must be replaced by filler
+blocks. Whether only the BSN at the data sync is relevant, or whether also the BSN of subsequent data blocks is
+needed depends on the function. For the statistics (AST, SST, BST, XST) the BSN at the data sync is sufficient to
+mark the timing of integration results. For these integration results the number of data blocks within the 
+integration interval is relevant to know how many blocks contributed (and thus also how many blocks were lost).
+However for the integration result it is not relevant which blocks got lost, because the statistics do not have
+to keep accurate time centroid information. It is sufficient to use the BSN at data sync to timestamp the
+integration results, as if all blocks contributed. As another example, for the beamformer it is important to
+be able to recreate the BSN at the data sync and all subsequent data blocks, because the beamformer must weight
+and sum the input beamlets that coincide in time. Similarly, for the beamformer output to CEP and for the subband
+offload to AARTFAAC it is important to accurately timestamp every data output block. For the output via the
+Network to CEP and to AARTFAAC it is also beneficial to replace lost data blocks by filler packets such
+that the Station output remains at the nominal rate. In this way the destination can distinguish between data
+blocks that got lost inside the Station and packet loss on the Network.
+
+The assumptions are:
+
+- Data blocks from packets that are lost or discarded may or may not be replaced by filler blocks, depending
+  on the downstream function.
+- Within an FPGA only approved packets are processed (so either correct data blocks or filler blocks).
+- The filler data is flagged and typically the filler data is set to zero.
+
+Note:
+- Internally in the FPGA no data is lost, so then a valid and eop are sufficient to mark all blocks. The first
+  valid and the valids after each eop then identify the start of block (sop). Counting eop (or sop) and adding
+  the initial BSN yield the Station BSN. However it is convenient to transport a sop and a sync along with
+  the data, because:
+  . using a sop avoids having to derive the sop information from the first valid after an eop
+  . using a sync avoids having to derive the sync information from counting valids or sops
+
+
+*******************************************************************************
+* Creation of data blocks and filler data
+*******************************************************************************
+
+There are two places in the FPGA where data blocks can be created:
+
+- BSN source
+- Input streams BSN aligner
+
+The BSN source creates the original data blocks using the wall clock time. The BSN aligner can only create
+filler blocks in an already active stream of data blocks. Within SDP each FPGA has a BSN source that times
+the local data blocks. In the beamformer and the subband correlator data blocks from all
+signal inputs are combined. Therefore local data from this FPGA and remote data from other FPGAs needs to
+be aligned at the BSN by an input streams BSN aligner. The task of the BSN aligner is to time align
+external input streams, or external and internal input streams, based on their BSN. If at least one input is
+active, then output data can be created by inserting filler data for any inactive inputs. If all inputs of
+a BSN aligner are inactive, then no output can be created. Within SDP the local input is always active,
+unless the BSN source is stopped via M&C. The remote input(s) can be inactive due to M&C or due to packet
+loss. If the output has stopped because the local input went inactive, then the output can resume as
+soon as the local input becomes active again. It is not necessary to wait for a sync to resume the output.
+
+
+
+*******************************************************************************
+* Internal sync interval
+*******************************************************************************
+
+In LOFAR1 the sync interval was chosen to be aligned to the external PPS, so a period of 1 s. This resulted
+in having 195313 T_sub for even PPS sync intervals and 195312 T_sub for odd PPS sync intervals. In LOFAR1
+the notion of odd and even PPS arises because the BSN grid can start at any PPS. This notion is awkward,
+because it means that Stations should start at an even sync interval when the PPS grid and BSN grid coincide
+to ensure that all stations remain aligned. Starting only at even PPS ensures that LOFAR1 uses a BSN grid
+that is fixed to t_epoch = 1970. For an oversampled subband filterbank the PPS grid and BSN grid
+coincide every q-th PPS, where R_os = p/q, so then a Station should only start every q-th PPS.
+In APERTIF the sync interval was chosen to be an integer number of fine channel periods T_chan = 
+N_Chan * T_sub, which resulted in 12500 T_chan and 800000 T_sub or a period of 1.024 s. This 1.024 s is used
+as unit integration period of the correlator in APERTIF. A sync interval of 1 s would have resulted in
+781250 T_sub and 12207.03125 T_chan. The APERTIF sync interval of 1.024 s is awkward too, because it differs
+from 1 s and because it also means that different dishes should only restart when the PPS grid and sync
+grid coincide, which is once every 125 sync intervals (128 s), because 1.024 = 128/125.
+LOFAR1 and APERTIF show that in general application periods do not fit an integer number of times in the
+1 s PPS grid. For integration periods the only two options are to either use another integration interval
+(like 1.024 s in APERTIF) or to accept that the number of samples per integration interval can differ by one
+(like 195313 or 195312 in LOFAR1).
+Both LOFAR1 and APERTIF cannot start at any PPS without affecting the BSN grid. This needs to be solved for
+LOFAR2.0. Like in LOFAR1, for LOFAR2.0 the PPS grid and BSN grid are fixed to t_epoch = 1970. However,
+instead of waiting until the BSN grid and PPS grid coincide, the BSN source in
+LOFAR2.0 has the capability to adjust the timing by a BSN offset fraction, such that the Station processing
+can start or restart at any PPS, while still preserving the fixed BSN grid that starts at t_epoch = 1970.
+In this way starting a Station or restarting a Station in an already running LOFAR telescope becomes possible
+at any PPS. Still like in LOFAR1 even sync intervals will have 195313 blocks and odd sync intervals will have
+195312 blocks, but this is inevitable because at f_adc = 200 MHz and with N_FFT = 1024 the BSN grid and
+PPS grid only coincide at even PPS.
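+
+The alternating number of blocks per 1 s sync interval can be reproduced with exact rational arithmetic.
+The sketch below is a minimal illustration that assumes a block belongs to the sync interval in which it
+starts. The function names are hypothetical, and the value returned by first_block_offset() is only
+presumed to correspond to the BSN offset fraction of the BSN source:
+
+```python
+from fractions import Fraction
+import math
+
+
+def blocks_per_second(f_adc_hz=200_000_000, n_fft=1024):
+    """Exact block rate on the BSN grid, 195312.5 blocks/s for the defaults."""
+    return Fraction(f_adc_hz, n_fft)
+
+
+def nof_blocks_in_interval(pps_index, rate):
+    """Number of blocks that start in the interval [pps_index, pps_index + 1) seconds."""
+    first_bsn = math.ceil(pps_index * rate)
+    next_first_bsn = math.ceil((pps_index + 1) * rate)
+    return next_first_bsn - first_bsn
+
+
+def first_block_offset(pps_index, rate):
+    """Fraction of a block period between the PPS and the start of the first block after it."""
+    return math.ceil(pps_index * rate) - pps_index * rate
+```
+
+For the default rate nof_blocks_in_interval() yields 195313 for pps_index = 0 and 195312 for pps_index = 1,
+and first_block_offset() yields 0 at even PPS and 1/2 at odd PPS.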
+
+For the Station the maximum data latency within SDP mainly depends on:
+
+- the number of taps in the subband filterbank,
+- the number of hops and latency per node on the ring.
+
+Therefore the maximum data latency will be less than 100 data blocks, so for T_sub = 5.12 us this is less
+than 1 ms. For LOFAR2.0 choose the same internal sync interval as in LOFAR1, so with a period of 1 s and
+aligned to the PPS grid. The sync interval of 1 s is a suitable choice because it [ADD 4.5.2.1]:
+
+- can serve to pass on the BSN information within the FPGA and within the entire Station
+- is short enough to restart the SDP processing after it had stopped (due to an M&C stop command)
+- fits the way in which humans count time
+- can start at any PPS and thus aligns with the PPS grid
+- is short enough to provide a fixed update interval with sufficient time resolution for periodic M&C
+  . suits as integration period for the periodic statistics like ADC power, SST, BST and XST
+  . suits as update period for monitoring e.g. transport latencies, BSN alignment
+  . suits as update period for controlling the BF weights
+- is long enough to perform time critical periodic M&C in time within the correct sync interval
+
+
+*******************************************************************************
+* Transport of BSN and sync
+*******************************************************************************
+
+Between FPGAs the BSN is transported in the packet header. It is beneficial to also transport the sync
+information in the packet header, to avoid having to derive the sync interval from the received BSN.
+Deriving the sync interval from the BSN is awkward if the sync interval is not a power of 2 number of
+blocks, because then the derivation requires a division. For packets that contain multiple blocks the sync
+will only be transported if it applies to the first block in the payload. A separate header field could
+identify the index of the block within the payload to which the sync applies. However, within SDP the
+packets contain only one data block per payload, so the index field is always 0 and can be omitted.
+
+The Station BSN needs 50 bits to count 116 years. In the packet header use 64 bit to transport the Station
+BSN and use the MSbit to transport the sync. The sync in the received packets is used for monitoring
+purposes to:
+
+- Measure the transport latency of the received packet relative to the PPS
+- Verify that for remote input streams the received BSN at the sync agrees with the BSN at the sync of
+  the local BSN source. If the received sync does not coincide with the local reference sync then
+  something unexpected has happened. The action for the input sync monitor is then to make this remote
+  input inactive and report this via its M&C monitoring point.
+
+If for a remote input the packet with a sync got lost, then this should be reported via a received sync
+timeout M&C monitoring point. The M&C monitoring points that rely on receiving the packet with the sync
+should then flag that the reported value is invalid, e.g. by reporting -1 = 0xFFFFFFFF.
+
+To summarize, transporting the sync pulse instead of the BSN saves logic and FIFO resources in the FPGA
+and avoids having to derive the sync interval from the BSN. Therefore:
+
+- Internally in the FPGA transport a sync along with the data for each data block.
+- Externally at the FPGA interface transport the sync and BSN in the packet header of each data packet.
+
+The BSN in LOFAR2.0 is a continuous counter with reference to t_epoch = 1970, so the BSN in LOFAR2.0
+does not restart at 0 at every sync like in LOFAR1 and in APERTIF.
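+
+The 64 bit header field with the sync in the MSbit can be sketched as follows. The function names and the
+choice to treat the remaining 63 bits as BSN field are assumptions for illustration, the actual header
+layout is defined by the SDP packet format:
+
+```python
+SYNC_BIT = 63                        # MSbit of the 64 bit header field
+BSN_MASK = (1 << SYNC_BIT) - 1       # remaining 63 bits, of which 50 are needed
+
+
+def pack_bsn_field(bsn, sync):
+    """Combine the Station BSN and the sync into one 64 bit header field."""
+    if not 0 <= bsn <= BSN_MASK:
+        raise ValueError("BSN does not fit in the header field")
+    return (int(bool(sync)) << SYNC_BIT) | bsn
+
+
+def unpack_bsn_field(field):
+    """Return (bsn, sync) from a 64 bit header field."""
+    return field & BSN_MASK, bool((field >> SYNC_BIT) & 1)
+```
+
+Because the sync has its own bit, the receiver does not have to divide the BSN by the number of blocks per
+sync interval to find the block that carries the sync.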
+
+
+
+*******************************************************************************
+* Input streams BSN aligner
+*******************************************************************************
+
+In Station SDP the BSN serves two purposes:
+
+- the entire BSN provides wall clock time on the BSN grid and is thus linked to UTC,
+- the difference in BSN is used to time align input streams
+
+In Station SDP each FPGA has a local BSN source and a local stream that carries the data from its local ADC 
+signal inputs. The BSN aligner needs to align the local stream with the remote streams that are received
+from the other FPGAs via the ring. The maximum BSN latency on the ring depends on the number of FPGAs in the
+ring. Suppose each FPGA introduces a latency of at least one packet, because it applies store and forward on
+packets, and less than two packets. Furthermore assume that on the ring each packet contains one data block.
+The maximum number of hops between the first FPGA on the ring and the final FPGA is N_FPGA-1. For the LBA
+ring N_FPGA = 16. Hence the maximum BSN latency that can occur within
+SDP is < (N_FPGA-1) * 2 < 32 block periods. Hence the maximum BSN difference between the local input and a 
+remote input of the BSN aligner is < 32. Therefore to align the input streams the BSN aligner only has to
+compare the log2(32) = 5 LSbits of the BSN of all input streams. This implies that for the BSN aligner it
+would be sufficient to only transport these bits in the packet header, however it is convenient and not too
+costly to transport the entire 50 bit Station BSN and the sync bit in the packet header.
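+
+Comparing only the 5 LSbits amounts to a modulo 32 difference of the BSNs. A minimal sketch, in which the
+function name bsn_lag is hypothetical and in which it is assumed that the remote stream never runs ahead
+of the local stream:
+
+```python
+N_FPGA = 16                          # number of FPGAs on the LBA ring
+MAX_LATENCY = (N_FPGA - 1) * 2       # upper bound on the BSN latency in block periods
+NOF_LSBITS = 5                       # log2(32)
+LSBITS_MASK = (1 << NOF_LSBITS) - 1
+
+
+def bsn_lag(local_bsn, remote_bsn):
+    """Number of blocks that the remote stream lags the local stream, using only the LSbits."""
+    return ((local_bsn & LSBITS_MASK) - (remote_bsn & LSBITS_MASK)) & LSBITS_MASK
+```
+
+The result is unambiguous as long as the real lag is less than 32 blocks, which holds because
+MAX_LATENCY = 30. The wrap of the LSbits is handled by the final mask, e.g. bsn_lag(1026, 1019) yields 7.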
+
+Note:
+- In LOFAR1 the CDO sends a 32 bit SSN (called timestamp) to CEP and a local BSN. The local BSN is based
+  on a 16 bit frame sequence number (FSN) for bits [15:0] and every time the FSN = -1 the local BSN is
+  incremented by 2**16. This scheme allows using an FSN of only 16 bit, but relies on the block with
+  FSN = -1 = 2**16-1 not getting lost. Alternatively the check on FSN = -1 could be replaced by FSN <
+  previous FSN. The FSN and local BSN both restart at the sync, so they also both rely on the block
+  with the sync not getting lost.
+- The local BSN per sync interval in LOFAR1 wraps every sync interval and moreover it wraps at two different
+  values 195313 or 195312. Therefore the local BSN is less suitable to be used in a BSN aligner.
+- In LOFAR1 the SDO uses rad_bsn to calculate a continuous BSN from the seconds timestamp, odd and even second
+  and local BSN. The PPS timestamp is captured by the stream sync (so the sync must not have gone lost). The
+  local BSN is derived by counting eop and restarting at the sync (so not from an input FSN, and therefore
+  lost packets will cause a wrong local BSN). The BSN on UniBoard is thus a continuous BSN that counts since
+  some PPS in the past, defined by the seconds timestamp. The RSP clock frequency is represented by a bit
+  (0 = 160 MHz, 1 = 200 MHz).
+
+
+
+
+
+
+
+Design decisions:
+
+- Use a pulse from the PPS to align ADCs via JESD204D
+- Use a pulse from the PPS to start the internal sync interval
+- Use a pulse from the PPS to start the subband processing, so start a 1 s UTC grid
+- Allow starting the subband processing at any PPS, by supporting a BSN offset fraction in the BSN source.
+- Use 1 s period of 200M (or 160M) cycles for the internal sync interval, because this fits the integration
+  time for the statistics and the update rate of the beamlet weights
+- The number of samples per sync interval of 200M or 160M must be programmable via M&C. The default is 200M,
+  but in simulation it can be much less.
+- Use central UTC timestamp at PPS initialized by M&C and incremented by SDP firmware for the SSN
+  per FPGA.
+- Use 32 bit SSN to fit UTC in seconds for 136 years since 1970
+- Use local BSN that counts data blocks within a sync interval, so it restarts at 0 at the internal sync
+- Within SDP transport the sync and the local BSN. The sync is transported via the MSbit of the local BSN.
+  At the sync transport the 31 bit SSN instead of local BSN 0, but only for monitoring purposes.
+- Derive 64 bit UTC timestamp in units of T_sub in SDP firmware and use this for data output to CEP
+
diff --git a/applications/lofar2/doc/prestudy/station2_semi_float32.txt b/applications/lofar2/doc/prestudy/station2_semi_float32.txt
new file mode 100644
index 0000000000000000000000000000000000000000..07469d4c8060e4fcc0d88ec963fb3c2fe17a84dd
--- /dev/null
+++ b/applications/lofar2/doc/prestudy/station2_semi_float32.txt
@@ -0,0 +1,69 @@
+Semi-floating point values for SST, XST and BST
+
+int2float.vhd uses a 32bit semi-float with 1 bit exponent and 31 bit mantissa
+- float_w = 32
+- int_w = 54
+- exp_w = 1
+- mantissa_w = float_w - exp_w = 31
+- mantissa = 2**31
+- base_w = int_w - mantissa_w = 23
+- base = 2**base_w = 2**23
+- if int in range -mantissa/2 to +mantissa/2-1 then exp = 0 and float = int
+                                               else exp = 1 and float = round(int / base)
+- if exp = 0 --> int = mantissa
+- if exp = 1 --> int = mantissa * base
+
+53 52 51 50 49 48 47 46 45 ... 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 ... 4  3  2  1  0
+<s--s--s--s--s--s--s--s--s-...--s--s><s---------mantissa-----------------------...-------------->
+
+53 52 51 50 49 48 47 46 45 ... 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 ... 4  3  2  1  0
+<s------mantissa-----------...------------------------------><------2**23------...--------------> 
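+
+The conversion rules listed at the top of this note can be sketched in Python. This is only a model of the
+rules as written here, not a translation of int2float.vhd. The function names are chosen for illustration,
+rounding is done to + infinity, and saturation of values close to the maximum of the 54 bit input range is
+not handled:
+
+```python
+FLOAT_W = 32
+INT_W = 54
+EXP_W = 1
+MANTISSA_W = FLOAT_W - EXP_W         # 31
+BASE_W = INT_W - MANTISSA_W          # 23
+
+
+def int2float(value):
+    """Convert a signed integer into (exp, mantissa) of the semi-float."""
+    lo = -(1 << (MANTISSA_W - 1))
+    hi = (1 << (MANTISSA_W - 1)) - 1
+    if lo <= value <= hi:
+        return 0, value                                  # exp = 0: float = int
+    # exp = 1: float = round(int / base), halves round towards + infinity
+    return 1, (value + (1 << (BASE_W - 1))) >> BASE_W
+
+
+def float2int(exp, mantissa):
+    """Convert (exp, mantissa) back into an integer."""
+    return mantissa << BASE_W if exp else mantissa
+```
+
+Values that fit in the 31 bit mantissa are preserved exactly. Larger values are restored with an error of
+at most base / 2 = 2**22.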
+
+
+The SNR decrease due to quantization is:
+ 
+  (s/s0)^2 = 1 + 1/(12*g^2)
+  
+where s = sigma, s0 = sigma of sky noise, d = quantization step, g = s0 / d.
+After integration over N powers the SNR improves by a factor sqrt(N), so the integrated power S of s needs
+to be represented by log2(sqrt(N)) extra bits to contain the processing gain of the incoherent integration
+plus twice as many bits as s to contain the power s^2. The sqrt(N) processing gain for integrated powers is
+also derived in MEM-131 of ARTS by SJW:
+
+  (s/s0)^2 = 1/N + 1/(12*g^2)
+
+For N = 1 there is no integration, so then the SNR decrease is defined by the input quantization. For N > 1
+the integration improves the SNR by a factor N, so the quantization needs to become finer as well.
+
+The power value of s^2 needs twice as many bits, to contain the whole range, but only half of these are
+significant, because squaring a number does not add information. Hence for sigma s with 4 bits including
+sign bit and N = 195312.5 so log2(sqrt(195312.5)) = 8.8 bits this results in at least 4 + 8.8 = 12.8 bit.
+The power values use twice as many bits, so 2 * 12.8 - 1 sign = 24.6 bits or about 25 bits to represent
+the SST and XST power values. For the BST the BF also provides a processing gain for incoherent noise
+of log2(sqrt(96)) = 3.3 bits, so the BST needs about 12.8 + 3.3 = 16.1 bits. The power values use twice as
+many bits, so 2 * 16.1 - 1 sign = 31.2 bits or about 31 bits to represent the BST power values.
+
+The semi-floating point value of 32 bit with 1 bit exponent can represent 31 bit values without extra
+rounding, which is suitable to represent the SST, XST and BST values for measurements without RFI or
+weak RFI. The exponent = 1 representation is suitable and needed to represent the strong RFI signals.
+The range of the semi-floating point values for the powers can be increased by first rounding the powers
+by up to 4 + 8.8 = 12.8 LSbits, because these power bits are not significant. A safe value would be to
+round e.g. 8 LSbits of the power int values before converting them to semi-floating point. The calculated
+power values then become 10 * log10(2**8) = 24 dB lower.
+
+Note:
+- The ADC sign bit is also a bit that counts in the SNR as 6 dB. The sigma value is positive by definition.
+  For (s/s0)^2 = 1.01, so 1 % worse SNR due to quantization, g = 3.53 and s0 = 3.53 d. Hence s0 is log2(3.53)
+  = 1.8 bit including sign bit (1.8b = 10.8 dB). For Gaussian noise the -3 to +3 sigma range contains 99 %
+  of the values. The 3 sigma corresponds to log2(3) = 1.6 bit. In total the ADC 3 sigma input then fits in
+  3.4 bit, so use 4 bit as a practical lower limit for ADC input quantization with negligible quantization
+  loss.
+  
+- The rounding of a large value A can cause +A and -A to become +B and -B+1, when rounding is done
+  to + infinity (which is what round() does). In the LOFAR 1.0 subband correlator this caused confusion,
+  because both the X*Y and Y*X were calculated, which should agree to X*Y = conj(Y*X), but due to the
+  rounding effect could differ by 1. For LOFAR 2.0 this can again occur for the cross correlation of the
+  local signal inputs per PN. The cross correlations with the remote signal inputs are calculated only once,
+  so there the rounding effect is not noticed. To avoid this difference common_int2float.vhd in LOFAR 1.0
+  can use g_symmetric=TRUE, which applies truncating to zero.
+  
diff --git a/applications/lofar2/doc/prestudy/station2_to_do_erko.txt b/applications/lofar2/doc/prestudy/station2_to_do_erko.txt
new file mode 100755
index 0000000000000000000000000000000000000000..96c255608455ed015caa5f8384f4b859802a36f8
--- /dev/null
+++ b/applications/lofar2/doc/prestudy/station2_to_do_erko.txt
@@ -0,0 +1,534 @@
+*******************************************************************************
+* SKA experience:
+*******************************************************************************
+- Experiences -> beliefs -> choices -> actions -> results
+- Model noise as chorus of 100 sine waves
+- Phase closure with 3 stations: angle(ab*) + angle(bc*) + angle(ca*) = 0 degrees
+- Amplitude closure with 4 stations: |ab*| * |cd*| / (|ac*| * |bd*|) = 1
+- Correlator efficiency
+- RFI flag using most negative value (in LFAA), but this value must be replaced in subsequent processing
+- signal capture, power statistics, power max hold
+- signal generator sinus, sinus in noise, noise, test sequence, ...
+- Jones matrix
+  . orthogonalisation due to orientation with rotation of earth and frequency dependence due to changes in ionosphere?
+  . beamforming due to geometrical delay with rotation of earth
+  . analogue complex gain calibration (temperature dependent)
+  . beam shape tapering? (fixed)
+
+*******************************************************************************
+* LaTeX
+*******************************************************************************
+- \sigma \sqrt{}
+- 4.15 \cdot 10^{15}
+- M =
+    \left[ {\begin{array}{cccc}
+    1 & 2 & 3 & 4\\
+    5 & 6 & 7 & 8\\
+    \end{array} } \right]
+    
+    
+*******************************************************************************
+* Run RadioHDL with SVN
+*******************************************************************************
+echo "Uniboard trunk is selected"
+export SVN=${HOME}/svnroot/UniBoard_FP7
+# Set up RadioHDL environment for UniBoard2 and new UniBoard1 applications
+. ${SVN}/RadioHDL/trunk/tools/setup_radiohdl.sh
+# Support old UniBoard environment (including Aartfaac and Paasar)
+. ${SVN}/RadioHDL/trunk/tools/setup_unb.sh
+
+
+*******************************************************************************
+* Run RadioHDL with GIT
+*******************************************************************************
+
+> cd ~/git/hdl
+> . ./init_hdl.sh               # setup development environment for hdl/
+                                # hdl/libraries, hdl/boards and hdl/applications are developed simultaneously and therefore in one git hdl/ repository
+                                # automatically also sources ../radiohdl/init_radiohdl.sh if necessary
+> compile_altera_simlibs unb1   # creates build/unb1/hdl_libraries_ip_stratixiv.txt
+                                # creates build/quartus/<tool version> simulation models that need to be moved to /home/software/modelsim_altera_libs
+> generate_ip_libs unb1         # creates build/unb1/qmegawiz/
+                                # creates build/unb1/quartus_sh --> empty dir, why is it there?
+> quartus_config unb1           # creates build/unb1/quartus/<hdllib libraries> for synthesis
+                                # creates build/unb1/quartus/technology_select_pkg.vhd
+> modelsim_config unb1          # creates build/unb1/modelsim/<hdllib libraries> for simulation
+                                # creates build/unb1/modelsim/modelsim_project_files.txt for Modelsim commands.do
+                                # creates build/unb1/modelsim/technology_select_pkg.vhd
+> run_qsys unb1 unb1_minimal_qsys                                
+                                
+                                
+*******************************************************************************
+* GIT workflow
+*******************************************************************************
+
+difftool ?
+mergetool ?
+
+* Pro Git book by Scott Chacon: https://git-scm.com/book/en/v2 
+* YouTube : David Mahler part 1,2,3 
+
+Part 1: 
+
+# After GIT install
+git version 
+git config --global user.name "EricKooistra"
+git config --global user.email "erkooi@gmail.com"
+git config --list 
+touch .gitignore       # create .gitignore if it does not already exist
+.gitignore # file with working tree dirs and files to ignore, must also be committed
+
+# To start a repo
+git init  # start new repo at this dir, creates .git/ 
+git clone # get and start with existing repo 
+
+git status # what is in stage area and what is modified 
+
+Three areas: 
+* working tree # local directory tree 
+| git add 
+v 
+* staging area (index) 
+| 
+v git commit 
+* history # .git repository with entire commit graph 
+
+# To use a repo
+git add <dir>/<file>    # add to stage area, set for commit. Cannot add empty dir, need empty file in it
+git add .               # add all new and modified to staging area
+git diff                # diff between file in working tree and staging area 
+git diff --staged # diff between file in staging area and history 
+git rm <filename> # remove file from working tree and stage the delete 
+git checkout -- <filename> # revert a working tree change 
+git reset               # clear stage area
+git reset HEAD <filename> # revert staged change 
+git log -- <filename> # show history of file 
+git checkout <version hash> -- s2 # retrieve file from history into staged area and working tree 
+
+
+Part 2: 
+
+git commit -m "" # commit what is in stage area 
+git commit -a -m "" # add to stage area and commit what is in stage area 
+alias graph="git log --all --decorate --oneline --graph" 
+git branch <branch name> # create branch
+git branch # show branches 
+git checkout <branch name> # change working tree and stage area to branch 
+git checkout master 
+git merge <branch name> # Fast forward merge of branch name to master if there is a direct path 
+                        # by moving master to branch name, this is when there have been no updates 
+                        # on the master branch since the branch was created. 
+                        # Three way merge combine the differences of the branch and the master 
+                        # compared to their common version, this can lead to merge conflicts if 
+                        # changes on both branches occur at same parts of a file. 
+git branch --merged # show branches that have been merged to master
+git branch -d <branch name> # remove branch 
+git checkout <commit hash> # detached HEAD because it points to a version not a branch 
+git branch <branch name> # start a branch from the commit hash, HEAD is attached again 
+
+# Stash area to store working tree
+git stash save "comment" # store working tree and stage area to get a clean working tree
+git stash list # show all stashes 
+git stash apply <label> # restore stash 
+git stash apply # restore last stash 
+
+
+Part 3: Remote repositories (Github, Gitlab, Bitbucket, ...) 
+
+create repo on Github 
+README.md # md = Markdown
+git clone <url:.../<repo name>.git> # get copy from url 
+cd <repo name> 
+git config --local user.name "EricKooistra"
+git config --local user.email "erkooi@gmail.com"
+git remote # origin 
+git remote -v # full url 
+
+# To align with remote repo
+# update from remote 
+git status # shows also origin/master, but not live 
+git fetch origin 
+git status # shows also origin/master, now with latest remote 
+git merge origin/master 
+git pull # get latest from remote repo, combines fetch and merge 
+
+# upload to remote 
+git push
+git push origin master  # put local repo to remote repo
+
+# On Github a fork is a copy of a repo in Github to get a repo on your account
+git clone <url of fork> # get copy of fork repo, will be origin 
+git remote add upstream <url of original repo on Github> # will be upstream 
+git fetch upstream 
+git status 
+# commit local change on branch 
+# git push origin <branch name> # push to my fork repo on Github 
+# pull request on Github 
+# delete branch and fetch upstream if the pull request was accepted
+git remote remove <remote name> # remove a remote repo 
+
+
+
+*******************************************************************************
+* Confluence:
+*******************************************************************************
+- Use the space tools menu at the bottom left to order sections.
+ 
+ 
+*******************************************************************************
+* RadioHDL 
+*******************************************************************************
+Open issues:
+- Central HDL_IO_FILE_SIM_DIR = build/sim --> Project local sim dir
+- avs_eth_coe.vhd per tool version? Because copying avs_eth_coe_<buildset>_hw.tcl to $HDL_BUILD_DIR copies the 
+  last <buildset>, using more than one buildset at a time gives conflicts.
+
+
+
+*******************************************************************************
+* To do:
+*******************************************************************************
+- Check that the Expert users (MB, SJW, MN), Maintainers (HM) and Local users are happy with the design decisions
+- H6 M&C loads section
+- H3 Functions mapping
+- H3/4 Timing (1s default, PPS, event message)
+  Timing and M&C in ADD --> read through the detailed design section
+  * Use PPS event message from SDP --> SC for:
+    1 Hz --> Set ToD (this requires that PPS is available in SDP independent of data path processing)
+    1 Hz --> Update BF weights
+    1 Hz --> Monitor AST, SST, BST
+  * Use dedicated event message from SDP --> SC for XST readout
+    1 Hz or more --> XST
+  All other M&C is not time critical, which means that required rates are average rates.
+- H4/8
+  . Subband correlator read out via MM or offload --> let SC do read out via M&C and provide offload to TM
+  . One or more subbands per interval
+  . Shorter < 1 s intervals and longer > 1 s intervals
+- HBA tile elements individually controllable, trimmable (e.g. in case of new production batch)
+- Discuss GIT, RadioHDL with PD (read radiohdl readme and Ruud's docu to prepare)
+- System engineering (map, SEMP doc HvdM)
+- System testcases in SEMP doc --> check with AJB, BH --> test plan after PDR (see SEMP doc)
+- S_hba_international = 192
+- nof tx, rx pkt/s, CRC error count, nof pause frames
+- monitor functions publish their data points, other functions can use these for alarms, visualisation, calibration
+- data buffer on output beamlets --> histogram
+   . buffer all beamlets per T_sub at sync
+   . buffer one beamlet for some T_sub time series after sync
+
+- Use GIT
+- Understand AXI4 streaming (versus Avalon, RL = 0)
+- Use ARGS to define peripherals
+  . check docs and article
+- Learn how gmi_minimal HDL code works to prepare for porting to unb2b_minimal_gmi
+- Update RadioHDL docs
+- Write RadioHDL article
+- Write HDL RL=0 article - desp_hdl_design_article.txt
+
+
+
+
+    
+- RCU2S-SDP signal input allocation:
+  Decided:
+  . LBA and HBA on separate RCUs and also on separate subracks and on independent rings, so SDP is independent
+    for LBA and HBA.
+  . X and Y on the same ring to support dual polarization ACM
+  Still open:
+  . X and Y together per PN or X and Y on separate subracks
+    - polarization correction via subband weights is not needed, so X and Y can be on different PN
+    - EMI between X and Y, but X and Y have only about 40 dB isolation
+    - EMI between single pol inputs get suppressed dependent on the station digital beam pointing
+  . LBA inner signal inputs and LBA outer signal inputs on different subracks or arbitrary. Inner is not used;
+    instead they use sparse odd and sparse even to have two more or less random antenna allocations.
+  . HBA core station sub-array inputs on different UniBoard2 to reduce EMI
+  . HBA-X power, Y control --> power distribution
+  . HBA RCU 3X and 3Y
+  
+- EMI:
+  . RCU2S-SDP signal input allocation
+  . Shielding
+  . more cross talk between horizontal RCUs than between vertical RCUs
+  . more cross talk between neighbour RCUs
+  . AC-DC instead of using AC-DC48-DC
+  
+  
+*******************************************************************************
+* System engineering:
+*  
+*******************************************************************************
+  
+- Requirements
+- Functions have their own hierarchy, independent of requirement levels, and independent of PBS.
+  Requirements map to products according to the L0, L1, L2 hierarchy
+  Requirements map to functions at any level
+- Products
+- ICDs
+
+Polarion:
+- Maintain hierarchy of functions and link them to products
+- Start with parent function and allocate that to one or more L3 products/ functions in ADD tables,
+  after that number the child level functions
+- Start with L3 product and look for potential requirements in L2 SRS contents
+  allocate requirement number + keywords
+- Check for unallocated requirements
+  Check for unallocated parent functions
+  Check for unallocated child functions
+- Number child functions in ADD tables
+- After PDR:
+  . Update/add parent functions and child functions in Polarion and in ADD tables
+  . Number the child functions in Polarion
+
+* Non compliant requirements
+  - 3099 LOFAR 1.0 frequency grid only with critically sampled filterbank
+  - 2278 Alias free broadband reconstruction only with oversampled filterbank
+  
+* Missing requirements for functions
+  - operational: all sky image (maybe 2400)
+  - transient buffer: trade inputs for duration
+  - transient detection: what algorithm should be used and with what parameters
+  - Expert user interface requirements
+  - Station Control support for EAA (LORA, NenuFAR)
+  - Rack space for NenuFAR
+
+* Missing functions for requirements
+  - 2404 Control of RFI mitigation processes
+  - 2405 RFI process handling
+  - 2400 
+  
+* Unmapped requirements
+  - 
+
+
+* Mark Ruiter: Missing calibration strategy.
+
+  level 1: Analogue input (hba, lba)
+  level 2: Beam (analog)
+  level 3: (station)
+  level 4: (local ionosphere)
+  level 5: (distributed ionosphere lofar 2)
+  
+  This only mentions digital station calibration, assuming a lot.
+  --> Gijs Schoonderbeek: Added this list in chapter 10, where the expected performance is described.                                       
+
+
+Station:
+1) SC-TM: Local mode - Station and TM - Station ICD needed to define SC functionality
+   - ICD for SC-TM
+2) SC: SCU for application layers with OPC-UA framework like EPICS, WinCC 
+3) SC-SDP: Separate 'PC / microcontroller platform' to translate OPC-UA - Gemini Protocol of UniBoard2
+   - ICD for SC-SDP
+   - work starts with supporting the 'hello world' of controlling a UniBoard2
+4) SC-RCU2S : 'Microprocessor / microcontroller platform' to translate OPC-UA - I2C, on/off line, SPI.
+   - ICD for SC-PCC
+   - PCC connects to RCU2 and UniBoards
+5) SC-STCA
+6) SC-STF
+7) SC-STIN
+   
+
+
+*******************************************************************************
+* Terminology
+*******************************************************************************
+
+* Software blocks:
+  . server
+  . driver
+  . handler
+
+  . device
+  
+* Synonyms for Control:
+  . setup
+  . configure
+  . handle
+  . set
+  . supply
+  . send
+  . update
+  . boot, run, initialize
+  . write
+  . store
+  
+* Synonyms for Monitor
+  . obtain
+  . get
+  . read
+
+
+
+*******************************************************************************
+* Station Control
+*******************************************************************************
+Remarks:
+- Providing direct TB access to the buffer data is an L4 or L5 function
+- Station beam or station beams? --> Station beams, otherwise call it full band station beam
+- There is no automatic feedback:
+  . AST and SST are only for diagnosis and alarms.
+  . XST is used in a separate function to determine a new set of the subband calibration weights.
+  . The scale beamlets setting is fixed per beamlet mode and nominal signal input levels
+
++ RCU2-LBA
+    * Monitor RCU2-LBA
+         . Monitor RCU2 hardware                    . Sense RCU2 temperatures, voltages and currents
+         . Monitor LBA current                      . Sense LBA current
+    * Setup RCU2-LBA
+         . Control LBA power                        . Supply LBA with power
+         . Select frequency band                    . Filter band of interest
+         . Select signal level                      . Set signal level
+         . Set ADC mode                             . Digitize signal
+       
++ RCU2-HBAT    
+    * Monitor RCU2-HBAT
+         . Monitor RCU2 hardware                    . Sense RCU2 temperatures, voltages and currents
+         . Monitor HBA tile current                 . Sense HBA tile current
+    * Setup RCU2-HBAT
+         . Control HBA tile power                   . Send power or control
+         . Communicate with HBA tile
+         . Select frequency band                    . Filter band of interest
+         . Select signal level                      . Set signal level
+         . Set ADC mode                             . Digitize signal
+        
+    * Create HBA tile beam
+         . Calculate HBA tile delays
+         . Update HBA tile delays
+         . Communicate with HBA tile                . Convert to HBA tile control protocol
+
++ SDP    
+    * Monitor SDP
+         . Monitor SDP status                       * Monitor HW, FW and interface
+                                                    * Receive and timestamp ADC input
+         . Monitor timing                             . L4 PPS monitor, BSN monitor (ToD timestamp, alignment)
+         . Monitor ADC signal input
+                                                      . L4 ADC interface monitor (L4, because only for expert/developer)
+                                                      . Take snapshot of input samples
+                                                      . Calculate ADC statistics (AST) (mean, power, histogram?)
+                                                      
+                                                    * Create calibrated subbands
+         . Monitor subband statistics                 . Calculate subband statistics (SST)
+                                                    * Form station beams
+         . Monitor beamlet statistics                 . Calculate beamlet statistics (BST)
+    
+    * Setup SDP
+         . Control firmware application             * Support and run FW application (upload, store, select, boot, version)
+                                                    * Receive and timestamp ADC input  
+         . Control ADC signal input                   . Receive ADC samples (L4 including WG and DB)
+                                                      . Insert test signal
+         . Compensate coarse cable length             . Align input samples
+         . Set time of day                            . Time stamp input samples 
+                                                    * Create calibrated subbands
+         . Set subband weights                        . Equalize subbands
+                                                    * Form station beams
+         . Select beamlet output bit range            . Scale beamlets
+         . Set output header                          . Output beamlets
+    
+    * Handle subband correlator                     * Create calibrated subbands
+         . Select subband                             . Correlate subband for all signal inputs (XST)
+         . Monitor subband correlations
+         
+    * Derive subband weights
+         . Calculate subband weights from subband correlations
+         . Store subband weights
+                             
+    * Create station beams                          * Form and output station beams
+      . Set subband selection                         . Select subbands
+      . Calculate beamlet weights
+      . Update beamlet weights                        . Form beamlets
+      
+    * Handle transient detection                    * Detect radio transients
+      . Set detection parameters                      
+      . Collect and forward triggers
+    
+    * Handle transient buffer                       * Buffer ADC or subband data
+      . Control TB data input                         . Select and record input data
+      . Control TB data capture
+      . Control TB data read out                      . Select and read out buffered data
+          
+    * Handle subband offload                        * Offload subbands
+    
+
+STF
+    * Monitor Station TF
+      . Monitor status of TD
+      . Monitor timing distribution                 . Distribute timing (10 MHz, PPS)
+      
+    * Setup Station TF
+      . Select sample clock frequency               . Generate sample clock (200 MHz or 160 MHz)
+      . Receive time of day from L2-TD
+      . Set Time of Day timestamp in SDP            . Receive and timestamp ADC input
+      
+STCA
+    * Setup Station Cabinet
+      . Setup power
+      . Setup cooling
+      . Setup network
+    * Monitor Station Cabinet
+      . Monitor power
+      . Monitor cooling
+      . Monitor network
+
+STIN
+    * Monitor Station Infrastructure
+      . Monitor station housing
+
+*******************************************************************************
+* Station ADD
+*******************************************************************************
+
+- H4
+  . Simplify SDP dynamic range figure + formula + simulate processing gain of FFT for real input
+  . Check timing dither with PK, because this would require detailed timing control for SDP and RCU2
+  . In SDP JESD details maybe too detailed design, purpose here is to explain JESD in own words to understand it without
+    having to read references.
+  . Check that +48 V is mentioned (not -48 V)
+  . io_ddr in FN beamformer uses 26 M9K, of which 10 are in the IP. The IP also uses 1 M144K (~= 16 M9K). The FN beamformer uses 1 DDR3 module.
+    --> for two DDR modules you would expect about (26 + 10) * 2 = 72 block RAM, while TBB-base in LOFAR1 uses only 6 according to MP
+        synthesis report.
+  . Timing fixed on internal sync or on single or periodic timestamp set by SC.
+    - let SDP send event for each internal sync, to help SC align its time critical control and monitoring
+    - possibly use linear interpolation for BF weights to relieve SC from tight 1 Hz control and to allow fast tracking of satellites
+    - define timestamp as 32b seconds and 32b blocks within second
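+    Sketch (Python, illustration only) of the two ideas above: a timestamp of 32b seconds plus 32b
+    block index, and linear interpolation of a BF weight between two 1 Hz updates. Assumptions that
+    are not decided in these notes: seconds in the upper 32 bits of a 64b word, the function names,
+    and weights w0 and w1 being valid at the start of two consecutive seconds.

```python
from typing import Tuple


def pack_timestamp(seconds: int, block: int) -> int:
    """Pack 32b seconds and 32b block-within-second into one 64b word.

    Assumption: seconds occupy the upper 32 bits, the block index the lower 32 bits.
    """
    if not 0 <= seconds < 2**32 or not 0 <= block < 2**32:
        raise ValueError("seconds and block must each fit in 32 bits")
    return (seconds << 32) | block


def unpack_timestamp(word: int) -> Tuple[int, int]:
    """Split a 64b timestamp word back into (seconds, block)."""
    return word >> 32, word & 0xFFFFFFFF


def interpolate_weight(w0: complex, w1: complex, block: int, blocks_per_second: int) -> complex:
    """Linearly interpolate between w0 (start of this second) and w1 (start of next second)."""
    frac = block / blocks_per_second
    return w0 + (w1 - w0) * frac
```

+    With such a scheme SC would supply one weight per second and the weight per block could be
+    derived in between, which is the relief from tight 1 Hz control that the note above aims at.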
+    
+
+   
+- H6
+  PK:
+  . Write design decision on dithering (meant to make ADC effectively more linear):
+    1) none
+    2) level using additive CW --> suppress RFI harmonics already at ADC by making the ADC sampling more linear by crossing more levels
+    3) time using delays --> does this work anyway? The digital BF already suppresses RFI from all directions in which it does not point. 
+       The dithering makes it look like the RFI comes from all directions, this does not help, does it?
+  
+PDR:  
+- Polarization correction at station via subband weights, via BF weights?
+- Station Control SCU (Local mode capabilities, Linux, ..)
+- SC and L2-TD at Station
+- CEP ICD (separate beamlet streams)
+- HBA2 tile control
+- Oversampled filterbank (design, R_os rate)
+- TB for LBAS and HBAS simultaneously
+- TB size
+- S_sub_bf = 488
+- Subband weights update rate > 10 min is sufficient?
+- There are >= 488 independent Station beams per polarization. This also implies that the polarizations can have independent
+  Station beams, so different frequencies and pointings. In LOFAR 1.0 a subband and a beamlet are defined as a X,Y tuple.
+  For LOFAR 2.0 the beamlets are still output as tuples but the X and Y are treated independently, so beamlet index i for X 
+  may use a different subband frequency and pointing than beamlet index i for Y. Therefore in LOFAR 2.0 define a subband and
+  a beamlet per single polarization.
+- How to continue SE after PDR:
+  . Update ADD with OAR answers and internal remarks from team
+  . Mapping of function tree, L2 requirements and L3 products in ADD
+  . Put function tree in Polarion and update function names
+  . Add station functions from ADD to function tree in Polarion?
+    - what is the purpose of the function tree, does it provide a check list?
+    - how deep will the function tree go, till the lowest level products?
+  . Do we need more SE views than we already have with states/modes, requirements, functions and products?
+  . ICDs
+  . L4 requirements?
+  . L4 products?
+    - UniBoard2 hardware is an L4 product?
+    - UniBoard2 firmware is an L4 product?
+  . Detailed design of the firmware product
+
+
+ 
\ No newline at end of file