diff --git a/applications/lofar2/doc/prestudy/station2_sdp_160MHz.txt b/applications/lofar2/doc/prestudy/station2_sdp_160MHz.txt
index a0df2f55ed3aaf79e194015d6f075d92c86d5da0..9b7fc186d16f46cc7372de833c4acb854bb4c8d2 100644
--- a/applications/lofar2/doc/prestudy/station2_sdp_160MHz.txt
+++ b/applications/lofar2/doc/prestudy/station2_sdp_160MHz.txt
@@ -7,6 +7,7 @@ For that we need to add a component that uses the 25MHz crysta
 After a clock switch 200M --> 160M or vice versa, the following is necessary and sufficient for SC towards SDP:
 * do FPGA_boot_image_RW so that the images are reloaded
+* poll until e.g. FPGA_firmware_version_R shows the correct name (then the image is up)
 * write FPGA_pps_expected_cnt_RW with 160M or 200M
 
 JDM: If I write the current value to FPGA_boot_image_RW, how can I tell whether the FPGAs have rebooted?
 Wait for TR_FPGA_communication_error_R == False or similar?
diff --git a/applications/lofar2/doc/prestudy/station2_sdp_transient_buffer.txt b/applications/lofar2/doc/prestudy/station2_sdp_transient_buffer.txt
index 9b003c2d1e9d586d55d85425ddf795e97b6f499b..b9caf3c067360518dd2d819a8abe2784a0e670d4 100644
--- a/applications/lofar2/doc/prestudy/station2_sdp_transient_buffer.txt
+++ b/applications/lofar2/doc/prestudy/station2_sdp_transient_buffer.txt
@@ -1,14 +1,35 @@
-Detailed design: Transient Buffer function (LIFT)
+Detailed design: Transient Buffer (TBuf) function for LIFT project
+
+
+0) MVP = minimal viable product:
+1) DDR4 memory per receiver input
+2) Meeting EK-BH 24 Oct 2022 [2]
+3) TBB (Transient Buffer Board) LOFAR1
+4) TBuf (Transient Buffer) Design
+5) TBuf ICD SC-SDP, SDPTR-SDPFW
+6) TBuf ICD STAT/SDP-CEP
+7) Transient detection (TDet) Design
+
 References:
+ [1] LIFT requirements: https://plm.astron.nl/polarion/#/project/LOFAR2System/wiki/Overview%20pages/LIFT%20Reference
+     https://git.astron.nl/desp/hdl/-/blob/L2SDP-857/applications/lofar2/doc/prestudy/lift_sdp_transient_buffer.txt
+ [2]
https://support.astron.nl/confluence/display/L2M/2022-10-24+LIFT+meeting+notes
+     https://support.astron.nl/confluence/display/L2M/2023-02-08+LIFT+meeting+notes
+ [3] L1 LOFAR2 Decision: Transport of buffer data from Station to CEP, https://support.astron.nl/confluence/pages/viewpage.action?pageId=94766339
+ [4] LOFAR1 TBB: https://support.astron.nl/confluence/display/L2M/Temporary+storage+of+documents+and+papers
+     https://support.astron.nl/confluence/pages/viewpage.action?spaceKey=L2M&title=Temporary+storage+of+documents+and+papers&preview=/17335979/23069390/TBB_Design_Description_ASTRON_SDD_047.pdf
+ [5] LOFAR2 PDR: https://support.astron.nl/confluence/pages/viewpage.action?spaceKey=L2M&title=2019-07-01+Meeting+notes%3A+Transient+Buffer+functionality
+
+
 0) MVP = minimal viable product:
-  - includes TB, but not yet Transient Detection (TDet)
+  - includes TBuf, but not yet Transient Detection (TDet)
   - needs ring, because 6 antennas per event are typically not connected to one FPGA
@@ -35,8 +56,8 @@ In LOFAR12060 [1] time series data and pulse data, what is pulse data :
 How strict is the 3.33 s, is 3.2 s also allowed ?
 Use one 16 GByte module per FPGA, so that extension to 6.66 s is possible by using both slots
 Read out per receiver input, so that reading out a subset of the receiver inputs is possible (e.g. 12 of the 192 in [1])
-Define the TB function per receiver input:
-. so that the TB function is easily extendable to more inputs and to more DDR4 modules.
+Define the TBuf function per receiver input:
+. so that the TBuf function is easily extendable to more inputs and to more DDR4 modules.
 .
data capture and readout of one receiver input can be done independently of the other receiver inputs
@@ -60,24 +81,75 @@ Design decision 16GByte DDR4 after L2SDP-854, 850
 - Buffer length versus nof antennas
 - Self trigger
 
+<<<<<<< HEAD
 3) Design
 - buffer raw data, no need to buffer subbands
 - no self triggering yet for MVP
+=======
+
+3) TBB (Transient Buffer Board) LOFAR1
+- From 2.3 in [4]
+  . uses 2048 Byte pages
+  . address based -> typically for write/store
+    time based -> typically for read/retrieve
+  . 16 channels free size --> not fragmented, no overlap, nof pages/channel, circular
+
+
+4) Transient Buffer (TBuf) Design
+
+- buffer raw data, no need to buffer subbands
+
+- choose fixed 14b data, so not e.g. 8 Msbits for lightning and 8 Lsbits for
+  cosmic ray. Always using full W_adc = 14b makes design and usage clearer.
+>>>>>>> master
 - Station --> CEP --> Data Writer
   . SDP output UDP directly to CEP or to LCU so that LCU can pass it on via TCP, to recover from data loss
   . SDP output via 10GbE
+<<<<<<< HEAD
   . SDP CP for speed dial output, to avoid data loss
 - treat all signal inputs independently (even though X and Y are always needed together)
+=======
+  . SDP CP for speed dial (= throttle) output, to avoid data loss
+
+- treat all signal inputs independently (even though X and Y are always needed together)
+
+- timing
+  . Use sample sequence number (SSN) or mem_bsn:
+    - SSN increments by nof_samples_per_page = 4672 (8176 B of packed 14b samples)
+    - mem_bsn increments by 1 per page, so per block of nof_samples_per_page = 4672.
+  . SSN counts sample periods (5 ns) since t_epoch = 1970, can fit
+    2**64 / (365.25 * 24 * 3600 / 5e-9) > 2922 years
+  . TBuf uses sop and eop to mark nof_samples_per_page = 4672
+  . TBuf does not need sync ?
+  . Start SSN or mem BSN at same time as SDP BSN by FPGA_processing_enable_RW.
+
+>>>>>>> master
 - CP per signal input buffer
   . flexible start and end address (so flexible buffer time per signal input)
   . freeze, unfreeze
   .
no need to wipe (zero) buffer contents after unfreeze ?
+<<<<<<< HEAD
+=======
+- State
+  . rst --> stop <--> record
+
+- Block diagram:
+
+    per si: pack 14b to 64b --> add mem hdr (= ssn or bsn) --> add crc --> pack 64b to 256b
+
+    mux 12 si to 1 --> mux with
+                       1 MM --> write to DDR4
+
+    read from DDR4 --> demux to
+      . 1 MM
+      . 1 retrieve --> unpack 256b to 64b --> check CRC --> add output hdr --> dump
+
+>>>>>>> master
 - support MP on buffer state
   . signal input index
   . frozen, buffering, reading
@@ -85,8 +157,13 @@ Design decision 16GByte DDR4 after L2SDP-854, 850
 - Provide direct MM access interface to DDR4
   . New access multiplexer component to interface with io_ddr with:
+<<<<<<< HEAD
     . write 12 signal input streams + 1 MM write stream
     . read 1 stream for TB readout + 1 MM read stream
+=======
+    . write 12 signal input streams for TBuf recording + 1 MM write stream
+    . read 1 stream for TBuf readout + 1 MM read stream
+>>>>>>> master
   . Write multiplexer for 12 + 1 = 13 inputs will take ~100 M20K, because it
     needs to multiplex and FIFO streams of 256 bit each and 256 bit requires
     256 / 40 = 6.4, so 7 M20K in parallel, so 13 * 7 = 91 M20K.
@@ -94,17 +171,29 @@ Design decision 16GByte DDR4 after L2SDP-854, 850
     16 kByte, so FIFO can fit (almost) two 8 kB payloads, which seems sufficient.
 
+<<<<<<< HEAD
 Use 1 DDR4 module / FPGA
   . Because 16GB is enough for T_tbuf = 3.3 s
   . 1 DDR4 @ 200MHz yields 200MHz * 256b/8b = 6.4 GB/s maximum write
     access. Sample data from 12 ADCs is 12 * 200MHz * 16b/8b = 4.8 GB/s.
     Hence the TB function then uses 4.8 / 6.4 = 0.75 of the capacity,
+=======
+- Use 1 DDR4 module / FPGA
+  . Because 16GB is enough for T_tbuf = 3.3 s
+  . 1 DDR4 @ 200MHz yields 200MHz * 256b/8b = 6.4 GB/s maximum write
+    access. Sample data from 12 ADCs is 12 * 200MHz * 16b/8b = 4.8 GB/s.
+    Hence the TBuf function then uses 4.8 / 6.4 = 0.75 of the capacity,
+>>>>>>> master
     which is fine and leaves sufficient spare capacity for some buffer read
     out, because 10Gbps / 8b = maximum 1.25 GB/s.
   . If we use 2 DDR4 modules / FPGA, then treat them as one big buffer
     with extended address space by DDR4 II, so use them sequentially, rather
     than in parallel, and to still have full
+<<<<<<< HEAD
     freedom of allocationg memory space to signal inputs.
+=======
+    freedom of allocating memory space to signal inputs.
+>>>>>>> master
 
 - support partial dump
   . lightning >~ 1 s, cosmic ray >~ 1 ms
@@ -115,9 +204,28 @@ Use 1 DDR4 module / FPGA
   . packetize at 64b or 256b ?
   . 16b -> 64b packetize --> 64b --> 256b store
   . data in buffer must have CRC --> 64b CRC ?
+<<<<<<< HEAD
 
 - dp_offload_tx header is the same for all 12 signal inputs, only si differs, so create one header for all and modify si field to save logic and RAM
+=======
+  . ddr page packet format: SSN + packed data + CRC
+    - 8 KByte page to have integer number of pages (= slots) in 16G memory, so
+      that DDR4-I can wrap without a gap or extend to DDR4-II without a gap.
+    - 14b packed data
+    - SSN = 64b = 8B
+    - CRC = 64b = 8B
+    - 8K - 8 - 8 = 8192 - 16 = 8176 B payload, so 8176 * 8b / 14b = 4672 samples per page
+    - 4672 * 5 ns = 23.36 us per page, so ~42.8 pages / ms
+
+
+- dp_offload_tx header is the same for all 12 signal inputs, only si differs,
+  so create one header for all and modify si field to save logic and RAM
+  . readout 1 page per tx packet
+  . add additional eth/ip/udp header and application header
+  . send packed 14b data
+
+>>>>>>> master
 
 - 12 input multiplexer with 12 x 256b in and 256b out to write 256b words @ 200 MHz
 - use SSN as timestamp, SSN = BSN * N_fft, so can be derived from bsn_source BSN, or do we need a dp_ssn_source.vhd?
@@ -147,9 +255,52 @@ Use 1 DDR4 module / FPGA
 - Maximum number of packets per dump
   . max memory size 16GB
   .
max payload size 8kB
+<<<<<<< HEAD
     --> 16G / 8k = 2M packets --> log2(2e6) = 20.93b
   . use packet serial number, instead of sop, eop bit fields, to show progress of the packet dump to CEP
+=======
+    --> 16G / 8k = 2M packets --> log2(2M) = 21b
+  . use packet serial number, instead of sop, eop bit fields, to show progress of
+    the packet dump to CEP
+  . allocate start_page/nof_pages per si to memory, wrap at max memory size
+    circular buffer per si, wrap after nof_pages
+    keep track of nof_recorded pages, when > nof pages then circular buffer is
+    full and carries only fresh data
+    keep track of ssn + page index of last recorded page
+
+
+
+5) TBuf ICD SC-SDP, SDPTR-SDPFW
+
+- Control Points (CP):
+  . FPGA_tbuf_alloc_RW [pn][si] --> start page, nof_pages (or as separate CP?)
+    - nof_pages = 0 means si has no buffer, > 0 means si has buffer
+  . FPGA_tbuf_record_RW [pn][si] --> start/continue (True) or stop (= freeze) (False) recording
+  . FPGA_tbuf_retrieve_RW --> pn, si, ssn, nof_pre_pages, nof_post_pages (or as separate CP?)
+    - allow only retrieve from one (pn, si) at a time
+    - total nof pages = nof_pre_pages + 1 (pointed to by ssn) + nof_post_pages
+  . FPGA_tbuf_output_hdr_eth_destination_mac_RW
+  . FPGA_tbuf_output_hdr_ip_destination_address_RW
+  . FPGA_tbuf_output_hdr_udp_destination_port_RW
+  . FPGA_tbuf_output_enable_RW
+
+- Monitor Points (MP):
+  . FPGA_tbuf_total_nof_pages_R --> 16G / 8k = 2M
+  . FPGA_tbuf_page_size_R --> 8 kByte
+  . FPGA_tbuf_nof_samples_per_page_R --> 4672 (= 8176 B * 8b / 14b)
+  . FPGA_tbuf_page_period_R --> 23.36 us
+  . FPGA_tbuf_recording_R [pn][si]
+  . FPGA_tbuf_retrieving_R [pn][si]
+  Maybe:
+  . FPGA_tbuf_last_page [pn][si] --> index of last recorded page
+  . FPGA_tbuf_last_ssn [pn][si] --> ssn of last recorded page
+  . FPGA_tbuf_nof_recorded_pages[pn][si] --> number of fresh recorded pages <= alloc nof_pages
+
+
+
+6) TBuf ICD STAT/SDP-CEP
+>>>>>>> master
 
 - application header fields:
   .
8b marker
@@ -159,7 +310,11 @@ Use 1 DDR4 module / FPGA
     - 1b antenna_band_index
     - 1b nyquist_zone_index
     - 1b f_adc --> sample period is 5 ns or 6.25 ns
+<<<<<<< HEAD
     - 1b payload_error --> based on DDR4 read CRC
+=======
+    - 1b memory_error --> based on DDR4 read CRC
+>>>>>>> master
     - 5b sample_width --> 14b
   . 8b signal_input_index
   . 16b nof_samples_per_packet
@@ -171,4 +326,24 @@ Use 1 DDR4 module / FPGA
     - 5b gn_index
     --> signal_input_index provides already all this information
 
+<<<<<<< HEAD
+
+=======
+7) Transient detection (TDet) Design
+
+- no self triggering yet for MVP
+- will use Hilbert transform of real input and > 30MHz BPF
+  https://nl.mathworks.com/help/signal/ug/single-sideband-modulation-via-the-hilbert-transform.html
+  For the FIR Hilbert transformer we will use an odd length filter, which is
+  computationally more efficient than an even length filter, although even
+  length filters have smaller passband errors. The savings of odd length
+  filters come from several of their coefficients being zero. Also, an odd
+  length filter requires a shift by an integer time delay, as opposed to the
+  fractional time delay that is required by an even length filter. For an odd
+  length filter, the magnitude response of a Hilbert transformer is zero for
+  w=0 and w=π. For even length filters the magnitude response does not have
+  to be 0 at π, therefore they have increased bandwidths. So for odd length
+  filters the useful bandwidth is limited to
+  0 < w < π.
+>>>>>>> master
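
The odd-length properties listed in the TDet notes above (zero-valued coefficients, an integer delay, zero response at w=0 and w=π) can be checked with a small sketch. It is only an illustration under stated assumptions: the window design method, the Hamming window, the 31 taps and the names hilbert_fir and magnitude are hypothetical choices made here and are not taken from the design notes.

```python
import cmath
import math


def hilbert_fir(num_taps):
    """Hamming-windowed ideal Hilbert transformer (assumed design method).

    Only odd lengths are handled, matching the odd-length choice in the notes.
    """
    if num_taps % 2 == 0:
        raise ValueError("this sketch only covers odd-length filters")
    center = (num_taps - 1) // 2  # integer group delay in samples
    taps = []
    for k in range(num_taps):
        n = k - center  # offset from the centre tap
        if n % 2 == 0:
            # Ideal response 2 / (pi * n) vanishes for even n (and n = 0),
            # so roughly half of the multipliers can be skipped.
            taps.append(0.0)
        else:
            window = 0.54 + 0.46 * math.cos(math.pi * n / center)  # Hamming
            taps.append(window * 2.0 / (math.pi * n))
    return taps


def magnitude(taps, w):
    """Magnitude of the frequency response at normalized frequency w (rad/sample)."""
    return abs(sum(h * cmath.exp(-1j * w * k) for k, h in enumerate(taps)))


if __name__ == "__main__":
    taps = hilbert_fir(31)
    print("zero taps:", sum(1 for h in taps if h == 0.0), "of", len(taps))
    for w in (0.0, math.pi / 2, math.pi):
        print("w = %.3f  |H| = %.4f" % (w, magnitude(taps, w)))
```

For 31 taps the centre tap and every second tap around it are zero, the delay is 15 samples, and the response vanishes at both band edges while staying close to unity mid-band.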