From 17ca52ba6ef9d42aabace66e8c59467fddced356 Mon Sep 17 00:00:00 2001
From: Eric Kooistra <kooistra@astron.nl>
Date: Thu, 16 Feb 2023 14:33:34 +0100
Subject: [PATCH] Updates.

---
 .../doc/prestudy/station2_sdp_160MHz.txt      |   1 +
 .../station2_sdp_transient_buffer.txt         | 241 +++++++++++++++++-
 2 files changed, 238 insertions(+), 4 deletions(-)

diff --git a/applications/lofar2/doc/prestudy/station2_sdp_160MHz.txt b/applications/lofar2/doc/prestudy/station2_sdp_160MHz.txt
index a0df2f55ed..9b7fc186d1 100644
--- a/applications/lofar2/doc/prestudy/station2_sdp_160MHz.txt
+++ b/applications/lofar2/doc/prestudy/station2_sdp_160MHz.txt
@@ -7,6 +7,7 @@ For this we need to add a component that uses the 25MHz crysta
 After a clock switch 200M --> 160M or vice versa, the following is necessary and sufficient for SC towards SDP:
 * do FPGA_boot_image_RW so that the images are loaded again
+* poll until e.g. FPGA_firmware_version_R shows the correct name (then the image is up)
 * write FPGA_pps_expected_cnt_RW with 160M or 200M
 JDM: If I write FPGA_boot_image_RW to its current value, how can I see whether the FPGAs have rebooted? Wait for TR_FPGA_communication_error_R == False or similar?
diff --git a/applications/lofar2/doc/prestudy/station2_sdp_transient_buffer.txt b/applications/lofar2/doc/prestudy/station2_sdp_transient_buffer.txt
index 19f503f7f4..54de8bba5e 100644
--- a/applications/lofar2/doc/prestudy/station2_sdp_transient_buffer.txt
+++ b/applications/lofar2/doc/prestudy/station2_sdp_transient_buffer.txt
@@ -1,14 +1,35 @@
-Detailed design: Transient Buffer function (LIFT)
+Detailed design: Transient Buffer (TBuf) function for LIFT project
+
+
+0) MVP = minimal viable product:
+1) DDR4 memory per receiver input
+2) Meeting EK-BH 24 Oct 2022 [2]
+3) TBB (Transient Buffer Board) LOFAR1
+4) TBuf (Transient Buffer) Design
+5) TBuf ICD SC-SDP, SDPTR-SDPFW
+6) TBuf ICD STAT/SDP-CEP
+7) Transient detection (TDet) Design
+
 References:
+  [1] LIFT requirements: https://plm.astron.nl/polarion/#/project/LOFAR2System/wiki/Overview%20pages/LIFT%20Reference
+      https://git.astron.nl/desp/hdl/-/blob/L2SDP-857/applications/lofar2/doc/prestudy/lift_sdp_transient_buffer.txt
+  [2] https://support.astron.nl/confluence/display/L2M/2022-10-24+LIFT+meeting+notes
+      https://support.astron.nl/confluence/display/L2M/2023-02-08+LIFT+meeting+notes
+  [3] L1 LOFAR2 Decision: Transport of buffer data from Station to CEP, https://support.astron.nl/confluence/pages/viewpage.action?pageId=94766339
+  [4] LOFAR1 TBB: https://support.astron.nl/confluence/display/L2M/Temporary+storage+of+documents+and+papers
+      https://support.astron.nl/confluence/pages/viewpage.action?spaceKey=L2M&title=Temporary+storage+of+documents+and+papers&preview=/17335979/23069390/TBB_Design_Description_ASTRON_SDD_047.pdf
+  [5] LOFAR2 PDR: https://support.astron.nl/confluence/pages/viewpage.action?spaceKey=L2M&title=2019-07-01+Meeting+notes%3A+Transient+Buffer+functionality
+
+
 0) MVP = minimal viable product:
-  - includes TB, but not yet Transient Detection (TDet)
+  - includes TBuf, but not yet Transient Detection (TDet)
   - needs ring, because 6 antennas per event are typically not connected to one FPGA
@@ -35,8 +56,8 @@ In LOFAR12060 [1] time series data and pulse data, what is pulse data :
 How strict is the 3.33 s, is 3.2 s also acceptable ?
 Use one 16 GByte module per FPGA, so that extension to 6.66 s is possible by using both slots
 Read out per receiver input, so that reading out a subset of the receiver inputs is possible (e.g. 12 of the 192 in [1])
-Define TB function per receiver input:
-. so that the TB function is easily extendable to more inputs and to more DDR4 modules.
+Define TBuf function per receiver input:
+. so that the TBuf function is easily extendable to more inputs and to more DDR4 modules.
 . data capture and readout of one receiver input can be done independently of the other receiver inputs
@@ -60,3 +81,215 @@ Design decision 16GByte DDR4 after L2SDP-854, 850
 - Buffer length versus nof antennas
 - Self trigger
+
+3) TBB (Transient Buffer Board) LOFAR1
+- From 2.3 in [4]
+  . uses 2048 Byte pages
+  . address based -> typically for write/store
+    time based -> typically for read/retrieve
+  . 16 channels free size --> not fragmented, no overlap, nof pages/channel, circular
+
+
+4) Transient Buffer (TBuf) Design
+
+- buffer raw data, no need to buffer subbands
+
+- choose fixed 14b data, so not e.g. 8 MSbits for lightning and 8 LSbits for
+  cosmic ray. Always using full W_adc = 14b makes design and usage clearer.
+
+- Station --> CEP --> Data Writer
+  . SDP output UDP directly to CEP or to LCU so that LCU can pass it on via TCP, to
+    recover from data loss
+  . SDP output via 10GbE
+  . SDP CP for speed dial (= throttle) output, to avoid data loss
+
+- treat all signal inputs independently (even though X and Y are always needed together)
+
+- timing
+  . Use sample sequence number (SSN) or mem_bsn:
+    - SSN increments by nof_samples_per_page = 4672
+    - mem_bsn increments by 1 per page, so per block of nof_samples_per_page = 4672.
+  . SSN counts sample periods (5 ns) since t_epoch = 1970, can fit
+    2**64 / (365.25 * 24 * 3600 / 5e-9) > 2922 years
+  . TBuf uses sop and eop to mark nof_samples_per_page = 4672
+  . TBuf does not need sync ?
+  . Start SSN or mem BSN at same time as SDP BSN by FPGA_processing_enable_RW.
+
+- CP per signal input buffer
+  . flexible start and end address (so flexible buffer time per signal input)
+  . freeze, unfreeze
+  . no need to wipe (zero) buffer contents after unfreeze ?
+
+- State
+  . rst --> stop <--> record
+
+- Block diagram:
+
+  per si: pack 14b to 64b --> add mem hdr (= ssn or bsn) --> add crc --> pack 64b to 256b
+
+  mux 12 si to 1 --> mux with + 1 MM --> write to DDR4
+
+  read from DDR4 --> demux to
+    . 1 MM
+    . 1 retrieve --> unpack 256b to 64b --> check CRC --> add output hdr --> dump
+
+- support MP on buffer state
+  . signal input index
+  . frozen, buffering, reading
+  . start address (time), end address (time)
+
+- Provide direct MM access interface to DDR4
+  . New access multiplexer component to interface with io_ddr with:
+    . write 12 signal input streams for TBuf recording + 1 MM write stream
+    . read 1 stream for TBuf readout + 1 MM read stream
+  . Write multiplexer for 12 + 1 = 13 inputs will take ~100 M20K,
+    because it needs to multiplex and FIFO streams of 256 bit each and
+    256 bit requires ceil(256 / 40) = 7 M20K in parallel, so 13 * 7 = 91 M20K.
+  . One M20K = 20b * 1024 words = 40b * 512 words, 512 words of 256b =
+    16 kByte, so FIFO can fit (almost) two 8 kB payloads, which seems
+    sufficient.
+
+- Use 1 DDR4 module / FPGA
+  . Because 16GB is enough for T_tbuf = 3.3 s
+  . 1 DDR4 @ 200MHz yields 200MHz * 256b/8b = 6.4 GB/s maximum write
+    access. Sample data from 12 ADCs is 12 * 200MHz * 16b/8b = 4.8 GB/s.
+    Hence the TBuf function then uses 4.8 / 6.4 = 0.75 of the capacity,
+    which is fine and leaves sufficient spare capacity for some buffer
+    read out, because 10Gbps / 8b = maximum 1.25 GB/s.
+  . If we would use 2 DDR4 modules / FPGA, then treat them as one big
+    buffer with extended address space by DDR4-II, so use them
+    sequentially, rather than in parallel, and to still have full
+    freedom of allocating memory space to signal inputs.
+
+- support partial dump
+  . lightning >~ 1 s, cosmic ray >~ 1 ms
+  . dump t0 - t1
+  . dump last dt
+
+- packetize before buffer write or after buffer read? --> before
+  . packetize at 64b or 256b ?
+  . 16b -> 64b packetize --> 64b --> 256b store
+  . data in buffer must have CRC --> 64b CRC ?
+  . ddr page packet format: SSN + packed data + CRC
+    - 8 KByte page to have integer number of pages (= slots) in 16G memory, so
+      that DDR4-I can wrap without a gap or extend to DDR4-II without a gap.
+    - 14b packed data
+    - SSN = 64b = 8B
+    - CRC = 64b = 8B
+    - 8K - 8 - 8 = 8192 - 16 = 8176 B / 14b = 4672 samples per page
+    - 4672 * 5 ns = 23.36 us per page, so ~42.8 pages / ms
+
+
+- dp_offload_tx header is the same for all 12 signal inputs, only si differs,
+  so create one header for all and modify si field to save logic and RAM
+  . readout 1 page per tx packet
+  . add additional eth/ip/udp header and application header
+  . send packed 14b data
+
+- 12 input multiplexer with 12 x 256b in and 256b out to write 256b words @ 200 MHz
+- use SSN as timestamp, SSN = BSN * N_fft, so can be derived from bsn_source BSN,
+  or do we need a dp_ssn_source.vhd?
+
+- unb2c_test_ddr_16G resource usage
+  . git/hdl/boards/uniboard2c/designs/unb2c_test/revisions/unb2c_test_ddr_16G/unb2c_test_ddr_16G_resource_usage.jpg
+  . per module:
+      wr_fifo  13 M20K
+      tech_ddr  9 M20K
+      rd_fifo   4 M20K
+      diag db   0 M20K
+      diag bg   0 M20K
+      --> Total 26 M20K/DDR4 module
+  . board common:
+      MMM : 69 M20K for Nios memory
+      ctrl: 42 M20K for MMAP ROM and 1GbE
+
+- store and send 14b packed data
+  . so do not use 16b (with 2b sign extension), to optimize for memory usage and
+    transport capacity (at the expense of requiring tools to observe the payload
+    contents).
+  . store application packet with CRC in DDR4
+  . store packed 14b data for 16/14 = 1.14 times more buffer space (3.3s --> 3.8s)
+  . send unpacked 16b data to CEP with new CRC
+  . CRC = 64b, header multiple of 64b, nof samples per payload multiple of 64b
+
+- Maximum number of packets per dump
+  . max memory size 16GB
+  . max payload size 8kB
+    --> 16G / 8k = 2M packets --> log2(2M) = 21b
+  . use packet serial number, instead of sop, eop bit fields, to show progress of
+    the packet dump to CEP
+  . allocate start_page/nof_pages per si to memory, wrap at max memory size
+    circular buffer per si, wrap after nof_pages
+    keep track of nof_recorded_pages, when > nof_pages then circular buffer is
+    full and carries only fresh data
+    keep track of ssn + page index of last recorded page
+
+
+
+5) TBuf ICD SC-SDP, SDPTR-SDPFW
+
+- Control Points (CP):
+  . FPGA_tbuf_alloc_RW [pn][si] --> start page, nof_pages (or as separate CP?)
+    - nof_pages = 0 means si has no buffer, > 0 means si has buffer
+  . FPGA_tbuf_record_RW [pn][si] --> start/continue (True) or stop (= freeze) (False) recording
+  . FPGA_tbuf_retrieve_RW --> pn, si, ssn, nof_pre_pages, nof_post_pages (or as separate CP?)
+    - allow only retrieve from one (pn, si) at a time
+    - total nof pages = nof_pre_pages + 1 (pointed by ssn) + nof_post_pages
+  . FPGA_tbuf_output_hdr_eth_destination_mac_RW
+  . FPGA_tbuf_output_hdr_ip_destination_address_RW
+  . FPGA_tbuf_output_hdr_udp_destination_port_RW
+  . FPGA_tbuf_output_enable_RW
+
+- Monitor Points (MP):
+  . FPGA_tbuf_total_nof_pages_R --> 16G / 8k = 2M
+  . FPGA_tbuf_page_size_R --> 8 kByte
+  . FPGA_tbuf_nof_samples_per_page_R --> 4672
+  . FPGA_tbuf_page_period_R --> 23.36 us
+  . FPGA_tbuf_recording_R [pn][si]
+  . FPGA_tbuf_retrieving_R [pn][si]
+  Maybe:
+  . FPGA_tbuf_last_page [pn][si] --> index of last recorded page
+  . FPGA_tbuf_last_ssn [pn][si] --> ssn of last recorded page
+  . FPGA_tbuf_nof_recorded_pages[pn][si] --> number of fresh recorded pages <= alloc nof_pages
+
+
+
+6) TBuf ICD STAT/SDP-CEP
+
+- application header fields:
+  . 8b marker
+  . 8b version_id
+  . 16b station_id
+  . 32b source_info
+    - 1b antenna_band_index
+    - 1b nyquist_zone_index
+    - 1b f_adc --> sample period is 5 ns or 6.25 ns
+    - 1b memory_error --> based on DDR4 read CRC
+    - 5b sample_width --> 14b
+  . 8b signal_input_index
+  . 16b nof_samples_per_packet
+  . 24b packet serial number in current dump
+  . 24b total nof packets in current dump
+  . 64b SSN = Sample Sequence Number
+  No need for:
+    - 32b observation_id --> also not in LOFAR1
+    - 5b gn_index --> signal_input_index provides already all this information
+
+
+7) Transient detection (TDet) Design
+
+- no self triggering yet for MVP
+
+- will use Hilbert transform of real input and > 30MHz BPF
+  https://nl.mathworks.com/help/signal/ug/single-sideband-modulation-via-the-hilbert-transform.html
+  For the FIR Hilbert transformer we will use an odd length filter, which is
+  computationally more efficient than an even length filter, although even
+  length filters have smaller passband errors. The savings of odd length
+  filters come from the fact that several of their coefficients are zero.
+  Also, an odd length filter requires a shift by an integer time delay,
+  as opposed to the fractional time delay that is required by an even
+  length filter. For an odd length filter, the magnitude response
+  of a Hilbert transformer is zero for w=0 and w=π. For even length filters the
+  magnitude response does not have to be 0 at π, therefore they have increased
+  bandwidth. So for odd length filters the useful bandwidth is limited to
+  0 < w < π.
--
GitLab
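The page format and bandwidth figures in section 4 can be cross-checked with a few lines of Python. This is only a sketch: the constant and variable names are illustrative and are not firmware identifiers, and the sizes are assumed to be binary (16 GByte = 16 * 1024**3, 8 KByte = 8192).

```python
import math

# Assumed constants (illustrative names, not firmware identifiers)
PAGE_SIZE_BYTES = 8 * 1024         # 8 KByte DDR4 page
SSN_BYTES = 8                      # 64b sample sequence number per page
CRC_BYTES = 8                      # 64b CRC per page
W_ADC = 14                         # packed sample width in bits
T_SAMPLE_NS = 5                    # sample period at 200 MHz
MEM_SIZE_BYTES = 16 * 1024**3      # one 16 GByte DDR4 module

# Page format: SSN + packed data + CRC
payload_bytes = PAGE_SIZE_BYTES - SSN_BYTES - CRC_BYTES
nof_samples_per_page = payload_bytes * 8 // W_ADC
page_period_us = nof_samples_per_page * T_SAMPLE_NS / 1000
total_nof_pages = MEM_SIZE_BYTES // PAGE_SIZE_BYTES
page_index_bits = math.ceil(math.log2(total_nof_pages))

# SSN range in years for a 64b counter of 5 ns sample periods
ssn_range_years = 2**64 * T_SAMPLE_NS * 1e-9 / (365.25 * 24 * 3600)

# DDR4 write budget with 12 inputs stored as 16b, in GByte/s
ddr_write_rate = 200e6 * 256 / 8 / 1e9
adc_rate = 12 * 200e6 * 16 / 8 / 1e9
ddr_load = adc_rate / ddr_write_rate
readout_rate = 10e9 / 8 / 1e9      # 10GbE upper bound

# M20K estimate for the 13 input write multiplexer (40b wide M20K)
nof_m20k = 13 * math.ceil(256 / 40)

print(payload_bytes, nof_samples_per_page, page_period_us)
print(total_nof_pages, page_index_bits, int(ssn_range_years))
print(ddr_write_rate, adc_rate, ddr_load, readout_rate, nof_m20k)
```

The 4672 samples per page and the 23.36 us page period follow directly from the 8176 byte payload, so these two values have to change together if the page header or CRC size changes.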
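The "pack 14b to 64b" step of the block diagram in section 4 can be modelled as below. This is a sketch under assumptions: the notes do not define the bit order, so the first sample is placed in the least significant bits here, and the function names pack_samples and unpack_samples are hypothetical.

```python
W_ADC = 14                  # sample width in bits
W_WORD = 64                 # word width before the 64b to 256b packing
MASK = (1 << W_ADC) - 1


def pack_samples(samples):
    """Pack signed 14b samples into 64b words, first sample in the LSbits (assumed order)."""
    bits = 0
    for i, s in enumerate(samples):
        bits |= (s & MASK) << (i * W_ADC)       # two's complement, truncated to 14b
    nof_words = (len(samples) * W_ADC + W_WORD - 1) // W_WORD
    word_mask = (1 << W_WORD) - 1
    return [(bits >> (i * W_WORD)) & word_mask for i in range(nof_words)]


def unpack_samples(words, nof_samples):
    """Inverse of pack_samples, with sign extension of each 14b value."""
    bits = 0
    for i, w in enumerate(words):
        bits |= w << (i * W_WORD)
    samples = []
    for i in range(nof_samples):
        raw = (bits >> (i * W_ADC)) & MASK
        samples.append(raw - (1 << W_ADC) if raw >> (W_ADC - 1) else raw)
    return samples


# One page payload: 4672 samples of 14b fill exactly 8176 bytes = 1022 words of 64b
page = [((i * 37) % 16384) - 8192 for i in range(4672)]
words = pack_samples(page)
print(len(words), unpack_samples(words, len(page)) == page)
```

Because 4672 * 14 bits is an exact multiple of 64 bits, a page payload needs no padding bits, which matches the requirement that the number of samples per payload is a multiple of 64b.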
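The per signal input circular buffer bookkeeping (start_page, nof_pages, nof_recorded_pages, last page and last ssn) and the page selection for FPGA_tbuf_retrieve_RW could behave roughly as follows. This is a behavioural sketch, not the firmware design: the class SiBuffer and the function retrieve_pages are hypothetical, last_ssn is assumed to hold the SSN of the first sample in the last recorded page, and nof_recorded_pages is assumed to saturate at nof_pages.

```python
from dataclasses import dataclass

NOF_SAMPLES_PER_PAGE = 4672


@dataclass
class SiBuffer:
    """Hypothetical model of the buffer state of one signal input (si)."""
    start_page: int                 # first page of the allocation in DDR4
    nof_pages: int                  # allocated pages, 0 means no buffer
    nof_recorded_pages: int = 0     # fresh pages, saturates at nof_pages
    last_page: int = -1             # page index relative to start_page
    last_ssn: int = -1              # SSN of first sample in last recorded page

    def record_page(self, ssn):
        """Record one page and return the absolute page address that was written."""
        self.last_page = (self.last_page + 1) % self.nof_pages   # wrap after nof_pages
        self.last_ssn = ssn
        self.nof_recorded_pages = min(self.nof_recorded_pages + 1, self.nof_pages)
        return self.start_page + self.last_page


def retrieve_pages(buf, ssn, nof_pre_pages, nof_post_pages):
    """Return page addresses, oldest first, for pre pages + page of ssn + post pages."""
    # Number of pages between the page that contains ssn and the last recorded page
    pages_back = -((ssn - buf.last_ssn) // NOF_SAMPLES_PER_PAGE)
    oldest = pages_back + nof_pre_pages
    newest = pages_back - nof_post_pages
    if newest < 0 or oldest >= buf.nof_recorded_pages:
        raise ValueError("requested pages are not (or no longer) in the buffer")
    return [buf.start_page + (buf.last_page - k) % buf.nof_pages
            for k in range(oldest, newest - 1, -1)]


# Record 6 pages in a 4 page allocation, so the buffer has wrapped once
buf = SiBuffer(start_page=100, nof_pages=4)
written = [buf.record_page(n * NOF_SAMPLES_PER_PAGE) for n in range(6)]
print(written)
print(retrieve_pages(buf, 4 * NOF_SAMPLES_PER_PAGE + 10, 1, 1))
```

The returned list has nof_pre_pages + 1 + nof_post_pages entries, in line with the total number of pages stated for the retrieve control point.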