Merge branch 'master' into HPR-127

2a782937 · Reinier van der Walle · a1f9c9b0 · 6c28468d
Commit 2a782937 authored Jun 27, 2023 by Reinier van der Walle
--- a/applications/lofar2/doc/prestudy/lift_sdp_transient_buffer.txt
+++ b/applications/lofar2/doc/prestudy/lift_sdp_transient_buffer.txt
@@ -31,15 +31,3 @@ a) Transient buffer:
  . 3.33 * 16b / 14b = 3.81 s
  . header overhead 5 % --> 3.81 * 0.95 = 3.61 s
-b) Transient detection:
- notch filter ?
- transient detection based on power after 40 - 80 MHz BPF
- measure power during 10 us to detect pulse, if > threshold then send pulse detection events
-  per signal input via 1GbE
- e.g. use dead time after a detection to avoid storm of event messages
- Why can LIFT not run commensal with BF?
-  --> during thunderstorm BF measurements get disturbed anyway
-  --> For maximum transport capacity to CEP
--- a/applications/lofar2/doc/prestudy/station2_sdp_transient_buffer.txt
+++ b/applications/lofar2/doc/prestudy/station2_sdp_transient_buffer.txt
@@ -12,7 +12,7 @@ Detailed design: Transient Buffer (TBuf) function for LIFT project
 8) Crossbar
 10) Planning
 11) Transient detection (TDet) Design
-12) Cosmic ray
+12) Development planning
 References:
@@ -444,7 +444,7 @@ The CP FPGA_beamlet_output_nof_beamlets_RW is not supported in SDPTR and SDPFW y
              dump_nof_pages = 0
          # Set packets that will be dumped by SDPTR
-          reg_dump_start_page_RW = dump_start_page % (page_max + 1)
+          reg_dump_start_page_RW = dump_start_page % nof_pages_in_buffer
          reg_dump_nof_pages_RW = dump_nof_pages
    . reg_memory_read_nof_packets_R
@@ -656,7 +656,7 @@ It is simply a small json file that you place next to a data file with only complex
  mux right, then apply drv_copi with wr_not_rd and burstbegin.
-10) Planning
+9) Design planning
 - ICD STAT-CEP --> tbuf packet format
 - ICD SC-SDP --> OPC-UA CP and MP
@@ -687,33 +687,263 @@ The selection between recording all or half of the antennas per FPGA has the fol
 - requires decision making at higher software, configuration or user control levels, to decide which selection to use
-11) Transient detection (TDet) Design
- no self triggering yet for MVP
+10) Development planning
- Pulse detection messages contain event info and timestamp, which is still
-  useful for lightning science even without dumping buffer.
+[1] FW design decisions, https://support.astron.nl/confluence/display/L2M/L4+SDPFW+Decision%3A+Transient+buffer+raw+data
+[2] FW detailed design, https://support.astron.nl/confluence/display/L2M/L5+SDPFW+Design+Document%3A+Transient+buffer+raw+data
- will use Hilbert transform of real input and > 30MHz BPF
+[3] ICD SC-SDP, https://support.astron.nl/confluence/display/L2M/L2+STAT+Decision%3A+SC+-+SDP+OPC-UA+interface
-  https://nl.mathworks.com/help/signal/ug/single-sideband-modulation-via-the-hilbert-transform.html
+[4] ICD SDPTR-SDPFW, https://support.astron.nl/confluence/display/L2M/L3+SDP+Decision%3A+SDPTR+-+SDPFW+register+map+interface
-  For the FIR Hilbert transformer we will use an odd length filter which is
+[5] ICD STAT-CEP, https://plm.astron.nl/polarion/#/project/LOFAR2System/wiki/L1%20Interface%20Control%20Documents/STAT%20to%20CEP%20ICD
-  computationally more efficient than an even length filter. Albeit even
+[6] https://support.astron.nl/confluence/display/L2M/L3+SDP+Testing+Notebook%3A+Transient+buffer
-  length filters enjoy smaller passband errors. The savings in odd length
-  filters is a result that these filters have several of the coefficients that
+a) SDPTR
-  are zero. Also, using an odd length filter will require a shift by an
-  integer time delay, as opposed to a fractional time delay that is required
+Read TBuf documentation
-  by an even length filter. For an odd length filter, the magnitude response
+  done: when FW design and ICDs are clear
-  of a Hilbert Transformer is zero for w=0 and w=π. For even length filters the
-  magnitude response doesn't have to be 0 at π, therefore they have increased
+#######################################
-  bandwidths. So for odd length filters the useful bandwidth is limited to
+# CP and MP
-  0 < w < π.
+#######################################
- https://en.wikipedia.org/wiki/Analytic_signal --> Smith, J.O. "Analytic Signals and Hilbert Transform Filters", in Mathematics of the Discrete Fourier Transform (DFT) with Audio Applications, Second Edition
+Add direct access CP and MP for TBuf
- https://nl.mathworks.com/help/dsp/ug/envelope-detection.html
+ddr4 memory interface
+signal_input
- forced trigger message every 30 s, for logging of default radio environment --> send to LCU2
+recording
-  radio triggered message once per hour
+output
+ring
-12) Cosmic ray
+Add composite CP and MP for dump bit rate
-* Katie Mulrey (RU, Cosmic Ray group)
+  - convert bps to inter packet gap and unit test, see section in [4]
-* Stijn Buitink (VUB, Cosmic Ray group)
+Add composite CP and MP for dump page range
+  - convert requested dump interval to actual dump interval and unit test, see section in [4]
+Access CP and MP of TBuf for one node on HW
+  done: when all CP and MP can be accessed via OPC-UA and SDPTR using sdp_rw.py
+#######################################
+# tbuf_dump script
+#######################################
+Create tbuf_dump script for recording on one node
+  - similar to the *_stream.py scripts for statistics and beamlets
+  - follow dynamic behavior template for recording and dumping in ICD [4]
+  - extend simple stub for recording a fixed interval when recording is enabled
+  done: when tbuf_dump script can set up, record and freeze in simulation
+Extend tbuf_dump script with dumping on one node
+  - follow dynamic behavior template for recording and dumping in ICD [4]
+  - support different dump intervals
+  - extend simple stub for actual dumping interval and MP counters
+  done: when tbuf_dump script can dump intervals in simulation
+Verify tbuf_dump script with one node in simulation
+  - verify MP counters
+  done: when tbuf_dump script and stub are part of SDPTR SW CICD
+Verify tbuf_dump M&C with one node on HW
+  - verify MP counters
+  done: when tbuf_dump script can do a record, freeze and dump.
+Support tbuf packet decoding in stream_reader.py
+  - no need for stream_reader stub in simulation (?), because stream_reader has been verified already on HW for beamlets
+  done: when stream_reader.py can unpack tbuf packets
+Verify tbuf_dump stream header with one node on HW
+  done: when tbuf_dump script can verify the dumped packet headers
+Verify tbuf_dump stream data with one node on HW
+  done: when tbuf_dump script can verify the dumped packet data (based on expected amplitude level)
+Extend tbuf_dump script for multiple nodes in simulation
+  - setup CP for tbuf ring lane
+  - support dumping from a list of nodes
+  done: when tbuf_dump script with multiple SDPFW node stubs works in SDPTR SW CICD
+Verify tbuf_dump script for multiple nodes with ring on HW
+  done: when tbuf_dump script can do a record, freeze and dump and read all dumped packet headers
+#######################################
+# tbuf in CICD
+#######################################
+Maintain tbuf_dump SDPTR SW CICD test in simulation
+  done: when tbuf in SDPTR SW CICD test in simulation still runs ok after an update
+Setup unb2c HW platform for SDP CICD test on HW
+  done: when SDP CICD test can run with at least one unb2c every weekend
+Add tbuf_dump script to SDP CICD test on HW
+  done: when tbuf_dump test is part of SDP CICD test on unb2c
+Maintain tbuf_dump SDP CICD test on HW
+  done: when tbuf in SDP CICD test on HW still runs ok after an update
+b) SDPFW
+Review TBuf documentation
+  done: when FW design and ICDs are clear
+#######################################
+# tbuf coding and initial synthesis
+#######################################
+Code sdp_tbuf_registers.vhd for REG_TBUF
+  - sdp_tbuf_registers.vhd --> sdp.peripheral.yaml
+  done: when HDL code compiles
+Prepare tbuf firmware design revision for one node
+  - node_sdp_transient_buffer.vhd with MM connected for:
+    . sdp_tbuf_registers.vhd
+  - lofar2_unb2c_sdp_station_tbuf_one.vhd -->
+    . lofar2_unb2c_sdp_station.yaml
+    . qsys_lofar2_unb2c_sdp_station.qsys
+    . mmm_lofar2_unb2c_sdp_station.vhd
+  done: when VHDL code compiles and generated MMAP is ok
+Add tbuf remaining MM ports for one node
+  - node_sdp_transient_buffer.vhd with MM connected for:
+    . dp_rsn_source.vhd
+    . dp_bsn_monitor_v2.vhd
+    . dp_strobe_total_count.vhd
+    . io_ddr.vhd
+    . sdp_tbuf_output.vhd skeleton only with:
+      - dp_offload_tx_v3.vhd : app header
+      - dp_offload_tx_v3.vhd : network header
+      - mms_dp_xonoff
+  - lofar2_unb2c_sdp_station_tbuf_one.vhd -->
+    . lofar2_unb2c_sdp_station.yaml
+    . qsys_lofar2_unb2c_sdp_station.qsys
+    . mmm_lofar2_unb2c_sdp_station.vhd
+  done: when VHDL code compiles and generated MMAP is ok
+Code tbuf firmware for record all
+  - sdp_tbuf_pkg.vhd
+  - sdp_tbuf_arbiter.vhd
+  - sdp_tbuf_writer.vhd
+  - sdp_tbuf_reader.vhd
+  - sdp_tbuf_output.vhd
+  - node_sdp_transient_buffer.vhd
+  done: when VHDL code is complete and compiles (no verification yet)
+Try to synthesize tbuf design revision for one node
+  done:
+  . when synthesis of lofar2_unb2c_sdp_station_tbuf_one.vhd yields expected resource usage and meets timing
+  . report resource usage in [6]
+Access CP and MP for DDR4 in tbuf design on HW
+  . access the CP and MP via OPC-UA and SDPTR using sdp_rw.py
+  . read MP for DDR4 to check that it is available using sdp_rw.py
+  done: when all CP and MP can be accessed via OPC-UA and SDPTR and the DDR4 interface is calibrated
+#######################################
+# tbuf verification in simulation
+#######################################
+Prepare tbuf test bench on one node
+  - tb_sdp_tbuf_pkg.vhd
+  - tb_lofar2_unb2c_sdp_station_tbuf_one.vhd
+  done: when VHDL code compiles
+Verify tbuf recording in simulation
+  done: when sdp_tbuf_writer in design can record to DDR4
+Verify tbuf reading in simulation
+  done: when sdp_tbuf_reader in design can read from DDR4
+Verify tbuf recording and dumping (headers) in simulation
+  - p_verify_dump_header
+  done: when tbuf loop recording and dumping works in simulation and the headers are ok
+Verify tbuf recording and dumping (payloads) in simulation
+  - p_verify_dump_data
+  done: when tbuf loop recording and dumping works in simulation and the payloads are ok
+Synthesize tbuf design revision for one node
+  done: when synthesis of lofar2_unb2c_sdp_station_tbuf_one.vhd yields expected resource usage and meets timing
+  . report resource usage in [6]
+Reconsider dp_repack_data to 504b instead of 512b
+  . see TODO in dp_repack_data section in [2]
+  . necessary if there are synthesis issues
+  . requires update of FW design document
+Verify tbuf MP total strobe counters in simulation
+  done: when MP total strobe counters are correct
+Verify tbuf with small inter packet gap in simulation
+  done: when no FIFO overflow occurs and MP report dropped packets
+Verify tbuf output in combination with beamlet output in simulation
+  - see TBuf output in combination with beamlet output section in [2]
+  - see TODO in packet transport flow control section in [2]
+  done: when no FIFO overflow occurs and MP report dropped packets
+#######################################
+# tbuf verification on HW
+#######################################
+Access CP and MP of tbuf design revision for one node on HW
+  done: when all CP and MP can be accessed via OPC-UA and SDPTR
+Verify tbuf output on HW
+  - use tbuf_dump script
+  done: when tbuf dumps the expected range of packets, report results in [6]
+Verify tbuf output in combination with beamlet output on HW
+  - use beamlet_stream.py script to enable beamlet output
+  - use tbuf_dump script
+  done: when tbuf dumps the expected range of packets, report results in [6]
+#######################################
+# tbuf with ring
+#######################################
+Prepare tbuf firmware design revision with ring
+  - connect ring_lane
+  - lofar2_unb2c_sdp_station_tbuf_ring.vhd -->
+    . lofar2_unb2c_sdp_station.yaml
+    . qsys_lofar2_unb2c_sdp_station.qsys
+    . mmm_lofar2_unb2c_sdp_station.vhd
+  done: when VHDL code compiles and generated MMAP is ok
+Synthesize tbuf design revision with ring
+  done:
+  . when synthesis of lofar2_unb2c_sdp_station_tbuf_ring.vhd yields expected resource usage and meets timing
+  . report resource usage in [6]
+Verify tbuf recording and dumping via ring in simulation
+  - see TBuf with ring transport section in [2]
+  done: when tbuf loop recording and dumping via ring works in simulation
+Change c_err_bi = 6 for all ring interfaces
+  - see TODO in ring interface section in [2]
+  done: when VHDL regression tests still run ok.
+Change RX_select in sdp_station.vhd
+  - see TODO in ring interface section in [2]
+  done: when VHDL regression tests still run ok.
+Verify tbuf recording and dumping via ring on HW
+  done: when tbuf loop recording and dumping works on HW, report results in [6]
+c) Support record all or half of the antenna inputs
+#######################################
+# tbuf support record all or half
+#######################################
+Extend tbuf firmware with record half support
+  done: when VHDL code is complete and compiles
+Verify tbuf record all or half in simulation
+  done: when tbuf loop recording and dumping works ok for alternately all or half
+Extend tbuf_dump script with support for record all or half
+Verify tbuf record all or half on hardware
--- a/applications/lofar2/doc/prestudy/station2_sdp_transient_detection.txt
+++ b/applications/lofar2/doc/prestudy/station2_sdp_transient_detection.txt
+1) Transient detection (TDet) Design
+- no self triggering yet for MVP
+- Pulse detection messages contain event info and timestamp, which is still
+  useful for lightning science even without dumping buffer.
+- will use Hilbert transform of real input and > 30MHz BPF
+  https://nl.mathworks.com/help/signal/ug/single-sideband-modulation-via-the-hilbert-transform.html
+  For the FIR Hilbert transformer we will use an odd length filter, which is
+  computationally more efficient than an even length filter, although even
+  length filters have smaller passband errors. The savings for odd length
+  filters come from the fact that several of their coefficients are zero.
+  Also, an odd length filter requires a shift by an integer time delay, as
+  opposed to the fractional time delay that is required by an even length
+  filter. For an odd length filter, the magnitude response of a Hilbert
+  transformer is zero at w=0 and w=π. For even length filters the magnitude
+  response does not have to be 0 at π, therefore they have a larger
+  bandwidth. So for odd length filters the useful bandwidth is limited to
+  0 < w < π.
+- https://en.wikipedia.org/wiki/Analytic_signal --> Smith, J.O. "Analytic Signals and Hilbert Transform Filters", in Mathematics of the Discrete Fourier Transform (DFT) with Audio Applications, Second Edition
+- https://nl.mathworks.com/help/dsp/ug/envelope-detection.html
+- forced trigger message every 30 s, for logging of default radio environment --> send to LCU2
+  radio triggered message once per hour
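Illustration (not part of the committed file): a minimal Python sketch of the odd length FIR Hilbert transformer described above. It assumes the ideal impulse response 2 / (pi k) for odd offsets k from the center tap and a Hamming window; the function names, the window choice and the 31 tap length are assumptions for this sketch, not the TDet design.

```python
import cmath
import math


def hilbert_fir_odd(n_taps):
    """Windowed ideal FIR Hilbert transformer with an odd number of taps.

    The Hamming window is an assumption made for this sketch.
    """
    if n_taps % 2 == 0:
        raise ValueError("n_taps must be odd")
    mid = (n_taps - 1) // 2  # integer group delay in samples
    coefs = []
    for n in range(n_taps):
        k = n - mid  # offset from the center tap
        if k % 2 == 0:
            coefs.append(0.0)  # even offsets are zero, so no multiplier needed
        else:
            window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_taps - 1))
            coefs.append(window * 2.0 / (math.pi * k))
    return coefs


def magnitude(coefs, w):
    """Magnitude response |H(w)| for w in radians per sample."""
    return abs(sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(coefs)))


coefs = hilbert_fir_odd(31)
nof_zero = sum(1 for c in coefs if c == 0.0)
print("zero coefficients:", nof_zero, "of", len(coefs))
for w in (0.0, math.pi / 2, math.pi):
    print("w = %.3f  |H| = %.4f" % (w, magnitude(coefs, w)))
```

The response is zero at w=0 and w=π and close to one in between, and about half of the taps need no multiplier.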
+2) Transient detection:
+- notch filter ?
+- transient detection based on power after 40 - 80 MHz BPF
+- measure power during 10 us to detect pulse, if > threshold then send pulse detection events
+  per signal input via 1GbE
+- e.g. use dead time after a detection to avoid storm of event messages
+- Why can LIFT not run commensal with BF?
+  --> during thunderstorm BF measurements get disturbed anyway
+  --> For maximum transport capacity to CEP
+3) Cosmic ray
+* Katie Mulrey (RU, Cosmic Ray group)
+* Stijn Buitink (VUB, Cosmic Ray group)
+4) Meeting Brian, Katie 21 June 2023
+https://support.astron.nl/confluence/display/L2M/2023-06-21+LIFT+meeting+notes+on+Transient+Detection
+CR = cosmic ray
+LI = Lightning
+Complex envelope detector:
+. real --> BPF           --> **2
+       --> BPF + Hilbert --> **2 + --> threshold -- event
+. 32 tap FIR filter, coefficients fixed (CP needed, but in MMAP)
+. 32 * 2 * 12 = 768 multipliers
+. FPGA has 3036 multipliers = 1518 DSP blocks
+. SDP Fsub FIR 192 DSP blocks (16 tap * 12 si = 192)
+           FFT 228 DSP blocks
+  Could gain factor 2 with more efficient real multiplier usage in FIR
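Illustration (not part of the committed file): the multiplier budget above worked out in Python. The numbers come from the notes; treating the Fsub FIR and FFT as the only other DSP block users, and mapping TDet multipliers to DSP blocks at the same ratio as the FPGA totals, are assumptions of this sketch.

```python
# Numbers taken from the meeting notes
N_TAPS = 32            # FIR taps per filter
N_BRANCHES = 2         # BPF branch and BPF + Hilbert branch
N_SI = 12              # signal inputs per FPGA
FPGA_MULTIPLIERS = 3036
FPGA_DSP_BLOCKS = 1518
SDP_FIR_DSP_BLOCKS = 192
SDP_FFT_DSP_BLOCKS = 228

# Assumption: TDet multipliers pack into DSP blocks like the FPGA totals suggest
mult_per_dsp_block = FPGA_MULTIPLIERS // FPGA_DSP_BLOCKS

tdet_multipliers = N_TAPS * N_BRANCHES * N_SI
tdet_dsp_blocks = tdet_multipliers // mult_per_dsp_block
total_dsp_blocks = tdet_dsp_blocks + SDP_FIR_DSP_BLOCKS + SDP_FFT_DSP_BLOCKS
fraction = total_dsp_blocks / FPGA_DSP_BLOCKS

print("TDet multipliers :", tdet_multipliers)
print("TDet DSP blocks  :", tdet_dsp_blocks)
print("FIR + FFT + TDet : %d of %d DSP blocks (%.0f %%)"
      % (total_dsp_blocks, FPGA_DSP_BLOCKS, 100 * fraction))
```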
+Threshold level:
+. fixed
+. via CP most likely
+. tracking dependent on current input power (not needed)
+CR, LI pulse duration in complex envelope is ~50 - 100 ns
+. LI burst of 10000s of pulses during 1 s flash
+. CR ~ 1 pulse per min
+LI binning in time, to limit maximum message rate
+. fixed bin time of 10 - 100 us
+. only report strongest event during bin
+. typically each bin will show an event during a flash
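Illustration (not part of the committed file): a minimal sketch of the time binning described above, keeping only the strongest event per bin. The event tuple layout, the function name and the example values are hypothetical; timestamps are integers in ns to avoid floating point bin edges.

```python
def strongest_event_per_bin(events, bin_time_ns):
    """Return the strongest (timestamp_ns, level) event per time bin.

    events is an iterable of (timestamp_ns, level) tuples. When two events
    in a bin have the same level, the first one is kept.
    """
    strongest = {}
    for timestamp_ns, level in events:
        bin_index = timestamp_ns // bin_time_ns
        if bin_index not in strongest or level > strongest[bin_index][1]:
            strongest[bin_index] = (timestamp_ns, level)
    return [strongest[b] for b in sorted(strongest)]


# Hypothetical threshold crossings, reported with a 10 us bin time
events = [(1000, 5), (4000, 9), (12000, 3), (19000, 7), (35000, 2)]
reported = strongest_event_per_bin(events, bin_time_ns=10000)
print(reported)
```

With a fixed bin time the message rate is bounded by one event per bin, independent of how many pulses occur during a flash.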
+Message
+. CR for all inputs that crossed threshold
+. CR send event message when at least e.g. half of the antennas per FPGA have crossed threshold
+. LI one input per FPGA is enough (or per Station ?)
+. group bins in message to reduce message rate
+. group antennas in message to reduce message rate
+. antenna index
+. timestamp of crossing threshold (or of maximum after crossing threshold until uncrossing threshold?)
+. max level of pulse is not needed
--- a/applications/lofar2/model/lofar2_station_sdp_firmware_model.html
+++ b/applications/lofar2/model/lofar2_station_sdp_firmware_model.html
--- a/applications/lofar2/model/signal_statistics.html
+++ b/applications/lofar2/model/signal_statistics.html
--- a/doc/erko_teaser_talks.txt
+++ b/doc/erko_teaser_talks.txt
@@ -27,26 +27,153 @@ at university.
 1) Teaser talk: Quantization in LOFAR2.0 Station Firmware
 * floating point - fixed point - integer
-  . 2**+127 -------------------------- 1. ------------------ 2**-127
-                              <n bit int>
-                                        .   <n bit fxp>           fraction only
-                                    <n bit fxp>                   with fraction
-                     <n bit fxp>        .                         scaled
-  . two complement, so range e.g. -8 to +7 for 4 bit value
+             <p bit exponent>
+  . max -----<w bit mantissa>----- 1. ------------------ min  floating point
+                          <w bit int>                         integer
+                                    .   <w bit fxp>           fixed point fraction only
+                                <w bit fxp>                   fixed point with fraction
+                 <w bit fxp>        .                         fixed point scaled
  . format:
-    - unsigned : u(w, p)
+    - w = number of bits including sign bit
-    - signed : s(w, p)
+    - p = number of bits after point (p > 0) or before point (p < 0)
+    - p = 0 for integer
+    - u = unsigned : u(w, p)
+    - s = signed : s(w, p)
+* two's complement representation of signed numbers
+    - E.g. range -8 to +7 for 4 bit value:
+      -1 1111
+      -2 1110
+      -3 1101
+      -4 1100
+      -5 1011
+      -6 1010
+      -7 1001
+      -8 1000
+      +7 0111
+      +6 0110
+      +5 0101
+      +4 0100
+      +3 0011
+      +2 0010
+      +1 0001
+       0 0000
+    - E.g. range -4 to +3 for 3 bit value:
+      -1 111
+      -2 110
+      -3 101
+      -4 100
+      +3 011
+      +2 010
+      +1 001
+       0 000
+    - MSbit is the sign bit
+    - Sign extension of MSbit
+             -1          +1
+3 bit       111         001
+4 bit      1111        0001
+8 bit  11111111    00000001
+   . overflow wrapping, e.g. with 4 bit:
+       7 + 1 = -8
+      -8 - 1 = +7
+       -(-8) = -8,  negating most negative
+   . product has 2w bits, if most negative * most negative is excluded then 2w - 1 bits:
+       -8 * -8 =  64 = 01000000 = largest positive requires 2w bits
+       +7 * +7 =  49 = 00110001 = largest positive  fits in 2w-1 bits
+       -8 * +7 = -56 = 11001000 = most negative     fits in 2w-1 bits
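Illustration (not part of the committed file): a small Python sketch that reproduces the two's complement examples above. The helper names wrap_signed and to_bits are made up for this sketch.

```python
def wrap_signed(x, w):
    """Interpret the w LSbits of integer x as a two's complement number."""
    x &= (1 << w) - 1
    return x - (1 << w) if x >= (1 << (w - 1)) else x


def to_bits(x, w):
    """Two's complement bit string of x using w bits."""
    return format(x & ((1 << w) - 1), "0{}b".format(w))


W = 4
# Sign extension repeats the MSbit
print(to_bits(-1, 3), to_bits(-1, 4), to_bits(-1, 8))
print(to_bits(+1, 3), to_bits(+1, 4), to_bits(+1, 8))

# Overflow wrapping with 4 bits
print(wrap_signed(7 + 1, W))    # wraps to most negative
print(wrap_signed(-8 - 1, W))   # wraps to most positive
print(wrap_signed(-(-8), W))    # negating most negative yields itself

# Product width: only -8 * -8 needs the full 2w bits
for a, b in ((-8, -8), (7, 7), (-8, 7)):
    p = a * b
    fits = wrap_signed(p, 2 * W - 1) == p
    print(a, "*", b, "=", p, to_bits(p, 2 * W), "fits in 2w-1 bits:", fits)
```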
+* Operations (*, /, +, -) in the digital processing cause bit growth
+  - Keep as many LSbits as needed to preserve sensitivity
+  - Keep as many MSbits as needed to preserve dynamic range
+  Note:
+  . typically we can avoid using a / b in VHDL:
+    . do a * 1/b if b is a constant
+    . use bit shift if b is power of 2
+  . similar for a % b to get remainder after division by b
+    side note: beware of a % b == 1, result is language dependent when a or b is negative
+* Removing LSbits (unused resolution, insignificant bits)
+  - truncation by discarding LSbits is free in logic:
+    . E.g. 4 bit value, discard 2 LSbits
-* Operations (*, +) cause bit growth
+      -1 1111  -1 11
-  - rounding (to remove LSbits)
+      -2 1110  -1 11
-    . truncation: int(x), //  (-7 // 6 = -2, 7 // 6 = 1)
+      -3 1101  -1 11
-    . half away: python2, matlab
+      -4 1100  -1 11
+      -5 1011  -2 10
+      -6 1010  -2 10
+      -7 1001  -2 10
+      -8 1000  -2 10
+      +7 0111  +1 01
+      +6 0110  +1 01
+      +5 0101  +1 01
+      +4 0100  +1 01
+      +3 0011   0 00
+      +2 0010   0 00
+      +1 0001   0 00
+       0 0000   0 00
+    Corresponds to:
+    . shift right by n bits >> : -5 >> 2 = -2, 5 >> 2 = 1
+    . divide // 2**n : -5 // 4 = -2, 5 // 4 = 1
+    . floor: floor(-5 / 4) = -2, floor(5 / 4) = 1
+    ==> discarding b LSbits in logic = a >> b = a // 2**b in python
+    ==> a // b = floor(a / b) for integer a, b
+  - rounding LSbits costs logic:
+    + rounding at whole
+      . ceil()
+      . int(x): int(-5 / 4) = -1, int(5 / 4) = 1
+      ==> beware: int(x) != floor(x)
+    + rounding at half
+      . rounding scheme for half, becomes more significant when fraction has fewer bits
+      . round() : language dependent
        . half up
-    . half to even: python3, SDPFW
+        . half away: python2, matlab --> no DC bias
-  - clipping or wrapping (to remove MSbits)
+        . half to even: python3, SDP Firmware --> no DC bias and no power bias
-    . intermediate beamlet sum in BF uses wrapping
+          -1.500000 --> -2
-    . final subband output and beamlet output use clipping
+          -0.500000 --> 0
+           0.500000 --> 0
+           1.500000 --> 2
+           2.500000 --> 2
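Illustration (not part of the committed file): the truncation and rounding cases above checked in Python 3. The helpers round_half_up and round_half_away are written out for this sketch, because the built-in round() only provides half to even.

```python
import math


def round_half_up(x):
    """Round to nearest, ties towards +infinity."""
    return math.floor(x + 0.5)


def round_half_away(x):
    """Round to nearest, ties away from zero (python2, matlab behaviour)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


# Truncation: discarding 2 LSbits equals >> 2, // 4 and floor()
trunc = [(a >> 2, a // 4, math.floor(a / 4)) for a in (-5, 5)]
print(trunc)

# int() rounds towards zero, so it differs from floor() for negative values
print(int(-5 / 4), math.floor(-5 / 4))

# Remainder is language dependent for negative operands
print(-5 % 4, math.fmod(-5, 4))

halves = [-1.5, -0.5, 0.5, 1.5, 2.5]
half_even = [round(x) for x in halves]          # python3 built-in
half_away = [round_half_away(x) for x in halves]
half_up = [round_half_up(x) for x in halves]
print(half_even, half_away, half_up)
```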
+* Removing MSbits (unused dynamic range)
+  - wrapping
+    . truncation by discarding MSbits is free in logic
+    ==> beware wrap(x, 4 bits) != x % 2**4
+    . wrap operator is distributive similar to the modulo operator:
+        (a + b) % n = [(a % n) + (b % n)] % n
+    ==> intermediate beamlet sum in SDP digital beamformer uses wrapping
+  - clipping
+    . clipping to min and max costs logic
+    ==> subband output uses clipping
+    ==> final beamlet output in SDP output to CEP uses clipping
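Illustration (not part of the committed file): a sketch of why wrapping is safe for an intermediate sum and clipping is not. The helper names wrap and clip and the example terms are assumptions for this sketch, not the SDP firmware functions.

```python
def wrap(x, w):
    """Keep the w LSbits of x and interpret them as two's complement."""
    x &= (1 << w) - 1
    return x - (1 << w) if x >= (1 << (w - 1)) else x


def clip(x, w):
    """Saturate x to the signed w bit range."""
    lo, hi = -(1 << (w - 1)), (1 << (w - 1)) - 1
    return max(lo, min(hi, x))


W = 4
# Signed wrap differs from the unsigned modulo result
print(wrap(8, W), 8 % 2**W)

# Distributive like modulo: wrapping operands first does not change the result
a, b = 23, -9
print(wrap(a + b, W), wrap(wrap(a, W) + wrap(b, W), W))

# Intermediate overflow: the final sum 7 + 5 - 6 = 6 fits in 4 bits
terms = [7, 5, -6]
acc_wrap = 0
acc_clip = 0
for t in terms:
    acc_wrap = wrap(acc_wrap + t, W)   # overflows temporarily, recovers
    acc_clip = clip(acc_clip + t, W)   # loses the overflow permanently
print(acc_wrap, acc_clip)
```

As long as the final sum fits in w bits, wrapping intermediate sums gives the exact result, which is why clipping is only applied at the final output.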
+* Reuse standard requantization component
+  - supports removing MSbits and LSbits
+  - to ensure same Q scheme is used consistently throughout the SDP firmware
 * SDP signal path
  - Task: Preserve sensitivity of the ADC input and maintain sufficient dynamic range
@@ -56,6 +183,7 @@ at university.
    . beamlets (BF processing gain)
      - BST
      - beamlet output (8 bit samples to CEP)
+  - Show where Q is applied, always same component is used for Q
  - Figure of internal signal levels
    . dBFS
    . SNR, P_quant
@@ -63,6 +191,52 @@ at university.
    . coherent input (sine), incoherent input (sky noise, weak astronomical signal buried in
      noise)
+    Received signal at ADC consists of (q = one quantization step):
+    . RFI, typically narrow band
+    . sky noise >> q
+    . receiver noise < sky noise
+    . quantization noise = 0.29 q
+    . astronomical signal << q, buried in the sky noise
+    The sky noise sets the nominal input level for the ADC --> sigma_sky = 1 bit is enough for 2% SNR degradation
+    The RFI sets the maximum input level for the ADC --> -50 dBFS margin to avoid overflow (clipping)
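Illustration (not part of the committed file): a worked check of the 0.29 q and 2% numbers above, using the standard rms of uniform quantization noise q / sqrt(12). Reading "sigma_sky = 1 bit" as sigma_sky = 2 q and expressing the degradation as relative increase in noise power are assumptions of this sketch.

```python
import math

# rms of uniform quantization noise in units of q
Q_NOISE_RMS = 1 / math.sqrt(12)


def relative_noise_power_increase(sigma_in_q):
    """Quantization noise power relative to input noise power."""
    return Q_NOISE_RMS**2 / sigma_in_q**2


# Assumption: sigma_sky = 1 bit means sigma_sky = 2 quantization steps
sigma_sky = 2.0
increase = relative_noise_power_increase(sigma_sky)
print("quantization noise rms = %.2f q" % Q_NOISE_RMS)
print("noise power increase   = %.1f %%" % (100 * increase))
```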
+    Uses the FPGA built-in multipliers of 18b * 18b or 27b * 18b
+    FIR filter
+    - Consists of multiply (real coefficients) and accumulate (taps)
+    - DC gain of the filter is 1 so sum does not grow beyond 30b
+    - Treat the ADC data as integers and the coefficients as fixed point numbers between +-1
+    - W_adc = 14b and W_coeff = 16b yields product of 30b
+    - The 30b filter output could be rounded back to 14b
+    - Next step (FFT) can fit 18b input, so round to 18b instead to keep some more accuracy
+    FFT
+    - Consists of multiply (complex twiddle factors) and accumulates (butterfly)
+    - Narrow band input (sine wave) has gain of 0.5 = -1b
+    - FFT has processing gain of sqrt(N_point) = 5b for N_point = 1024
+    ==> Need 14b -1 + 5 = 18b to fit the subband samples
+    ==> internally the FFT calculates with 26 bits data to avoid accumulation of rounding errors
+    Subband calibration using complex subband weights
+    - Consists of complex multiply to calibrate fine gain and fine delay differences between antenna inputs
+    - Initially we used 18b subbands, but that yielded artifacts in the SST
+    - Now we keep 26b subbands until after applying the subband weights and then round to 18b
+    Beamforming using complex weights and summation of N antenna inputs
+    - std level of coherent input increases with N_ant, e.g. RFI in the beam, the (weak) astronomical signal in the beam
+    - std level of incoherent noise increases with sqrt(N_ant), therefore we can round log2(sqrt(N_ant)) bits
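Illustration (not part of the committed file): the bit width bookkeeping of the FIR, FFT and beamformer steps above as plain arithmetic. The widths come from the notes; N_ANT = 16 is a hypothetical number of antenna inputs chosen only to get whole bits.

```python
import math

W_ADC = 14      # ADC sample width
W_COEF = 16     # FIR coefficient width
N_POINT = 1024  # FFT size

# FIR: product width, the sum does not grow because the DC gain is 1
w_fir_product = W_ADC + W_COEF

# FFT: sine wave gain of 0.5 costs 1 bit, processing gain adds bits
fft_gain_bits = math.log2(math.sqrt(N_POINT))
w_subband = W_ADC - 1 + int(fft_gain_bits)

# Beamformer: growth of the std level for N_ANT summed inputs
N_ANT = 16  # hypothetical, not the actual station size
coherent_growth_bits = math.log2(N_ANT)
incoherent_growth_bits = math.log2(math.sqrt(N_ANT))

print("FIR product width  :", w_fir_product, "b")
print("FFT processing gain:", fft_gain_bits, "b")
print("subband width      :", w_subband, "b")
print("BF coherent growth :", coherent_growth_bits, "b")
print("BF incoherent growth (LSbits that can be rounded):",
      incoherent_growth_bits, "b")
```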
+  - Subband statistics
+    . relate SST level to subband level
+  - Subband weights
+    . calibrate for gain and phase (= fine delay) differences between signal inputs
+    . can also calibrate cross-polarization between X and Y per antenna (used in Disturb)
+    . no equalization across the subband frequencies is done
+  - Beamlet weights
 * Implementation details
  - Use separate function to do DFT for two real ADC inputs with complex FFT
  - Spectral inversion to have incrementing subband indexes and frequencies in all Nyquist zones
@@ -72,6 +246,8 @@ at university.
  - Internally extra LSbit inside PFB and before applying the weights, see try_round_weight.py
 * Conclusion:
+  - int(), floor(), ceil(), round(), //, % differ in details and can depend on sign, try
+    to be sure per language
  - Fixed point arithmetic uses less FPGA resources (multipliers, RAM, logic) than floating
    point, but requires careful bookkeeping of the fixed point position in the FW
    implementation.

--- a/libraries/dsp/doc/pfb.txt
+++ b/libraries/dsp/doc/pfb.txt
+1) Quantization in LOFAR2 station: https://support.astron.nl/confluence/display/L2M/L4+SDPFW+Decision%3A+LOFAR2.0+SDP+Firmware+Quantization+Model
+2) Fixed point numbers: https://support.astron.nl/confluence/display/L2M/L3+SDP+Decision%3A+Definition+of+fixed+point+numbers
+3) Rounding: https://support.astron.nl/confluence/display/L2M/L4+SDPFW+Decision%3A+Number+representation%2C+resizing+and+rounding
+4) LOFAR station Polyphase Filterbank (PFB) model in Matlab: https://git.astron.nl/desp/apertif_matlab/-/blob/master/matlab/ The apertif_matlab_readme.txt gives a brief description of all files in this repository. The one_pfb.m runs the model.
+5) PFB implementation in Apertif and in LOFAR2 station firmware: https://git.astron.nl/desp/hdl/-/blob/master/libraries/dsp/fft/doc/ASTRON_SP_054_filterbank_spec_part2.pdf
+The FIR filter as a whole is symmetric, but per phase it is not symmetric.
+The FIR filter is also called the prototype filter and defines the filtering per output bin (= subband). All bins undergo the same filtering. In a direct implementation you would do the FFT after every input sample and then apply the FIR filter to every FFT output bin. However, that is not efficient, because you can downsample by a factor D, which means that you throw away D-1 samples. Therefore the FIR filters at the output bins are, as it were, pushed through the FFT to the input. It then becomes a polyphase FIR filter and you apply the FFT not after every sample, but after every block of D samples. This makes the implementation efficient, because only the downsampled outputs are calculated. In our case D = N_points = 1024 of the FFT, which is why this is called a critically sampled filterbank. If you choose D < N_points you get a so-called oversampled filterbank.
+The FFT can be seen as a bank of mixers that mix the input signal to 0 Hz for each bin. The bin value is a complex number, because you need amplitude and phase to define the bin wave uniquely. Multiple sequential bin values form the downsampled bin signal in time. The downsample factor D = N_points if the FFT is applied per block of N_points.
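Illustration (not part of the committed file): a minimal sketch of the FFT seen as a bank of mixers. Each bin mixes the block to 0 Hz with its own complex exponential and sums over the block, which yields one complex value (amplitude and phase) per bin per block. The 8 point size and the complex test tone are assumptions to keep the sketch small; the prototype FIR filter is left out.

```python
import cmath
import math


def dft_bin(block, k):
    """Mix the block to 0 Hz for bin k and sum, i.e. one DFT output bin."""
    n_points = len(block)
    return sum(x * cmath.exp(-2j * math.pi * k * n / n_points)
               for n, x in enumerate(block))


N_POINTS = 8
TONE_BIN = 2
TONE_PHASE = math.pi / 4

# Complex tone exactly at the center of bin 2
block = [cmath.exp(1j * (2 * math.pi * TONE_BIN * n / N_POINTS + TONE_PHASE))
         for n in range(N_POINTS)]

bins = [dft_bin(block, k) for k in range(N_POINTS)]
for k, value in enumerate(bins):
    print("bin %d: amplitude %.3f phase %.3f"
          % (k, abs(value), cmath.phase(value) if abs(value) > 1e-9 else 0.0))
```

Only bin 2 is non-zero, with amplitude N_POINTS and the phase of the tone. Repeating this for consecutive blocks of N_POINTS samples gives the downsampled bin signal in time.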