diff --git a/applications/lofar2/doc/prestudy/lift_sdp_transient_buffer.txt b/applications/lofar2/doc/prestudy/lift_sdp_transient_buffer.txt index b2ea82ad8188c614b5b4f3cce909b6836098b007..90b0583c0fdb40c2632f63fe3959026df5ad2221 100644 --- a/applications/lofar2/doc/prestudy/lift_sdp_transient_buffer.txt +++ b/applications/lofar2/doc/prestudy/lift_sdp_transient_buffer.txt @@ -40,3 +40,6 @@ b) Transient detection: - Waarom kan LIFT niet commensal met BF? + + --> during thunderstorm BF measurements get disturbed anyway + --> For maximum transport capacity to CEP diff --git a/applications/lofar2/doc/prestudy/station2_sdp_160MHz.txt b/applications/lofar2/doc/prestudy/station2_sdp_160MHz.txt index a0df2f55ed3aaf79e194015d6f075d92c86d5da0..9b7fc186d16f46cc7372de833c4acb854bb4c8d2 100644 --- a/applications/lofar2/doc/prestudy/station2_sdp_160MHz.txt +++ b/applications/lofar2/doc/prestudy/station2_sdp_160MHz.txt @@ -7,6 +7,7 @@ Daarvoor moeten we een component toevoegen die gebruik maakt van de 25MHz crysta Na clock wissel 200M --> 160M of andersom is het volgende nodig en genoeg voor SC richting SDP: * doe FPGA_boot_image_RW zodat de images opnieuw geladen worden +* poll tot bijv FPGA_firmware_version_R de juiste naam weergeeft (dan is image op) * write FPGA_pps_expected_cnt_RW met 160M of 200M JDM: Als ik FPGA_boot_image_RW schrijf naar de huidige waarde, hoe kan ik dan zien of de FPGAs gereboot zijn? wachten op TR_FPGA_communication_error_R == False oid? 
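Review note on the added poll step in station2_sdp_160MHz.txt: a minimal sketch of how the control side could poll after writing FPGA_boot_image_RW, assuming a hypothetical read_versions callable that returns one FPGA_firmware_version_R string per FPGA. The function name, the timeout values and the OSError handling are illustration only; they are not taken from the SC or SDPTR code.

```python
import time
from typing import Callable, Sequence


def wait_for_image(
    read_versions: Callable[[], Sequence[str]],
    expected: str,
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until every FPGA reports a version string containing `expected`.

    read_versions is a hypothetical callable that returns one
    FPGA_firmware_version_R string per FPGA; it stands in for whatever
    client the station control software actually uses.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            versions = read_versions()
        except OSError:
            # Assumption: reads may fail while the FPGAs are rebooting.
            versions = ()
        # Success only when there is at least one reply and all replies match.
        if versions and all(expected in v for v in versions):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

When wait_for_image returns True, the next step in the list is to write FPGA_pps_expected_cnt_RW with 160M or 200M. If FPGA_boot_image_RW is written with its current value, the version string is the same before and after the reboot, so this check alone cannot confirm that the reboot happened; that is the case the JDM question refers to.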
diff --git a/applications/lofar2/doc/prestudy/station2_sdp_transient_buffer.txt b/applications/lofar2/doc/prestudy/station2_sdp_transient_buffer.txt index 19f503f7f4f22edc05a01cefa9b56bb720496035..54de8bba5e3d4725d51b17250891aa178e05f598 100644 --- a/applications/lofar2/doc/prestudy/station2_sdp_transient_buffer.txt +++ b/applications/lofar2/doc/prestudy/station2_sdp_transient_buffer.txt @@ -1,14 +1,35 @@ -Detailed design: Transient Buffer function (LIFT) +Detailed design: Transient Buffer (TBuf) function for LIFT project + + +0) MVP = minimal viable product: +1) DDR4 memory per receiver input +2) Meeting EK-BH 24 okt 2022 [2] +3) TBB (Transient Buffer Board) LOFAR1 +4) TBuf (Transient Buffer) Design +5) TBuf ICD SC-SDP, SDPTR-SDPFW +6) TBuf ICD STAT/SDP-CEP +7) Transient detection (TDet) Design + References: + [1] LIFT requirements: https://plm.astron.nl/polarion/#/project/LOFAR2System/wiki/Overview%20pages/LIFT%20Reference + https://git.astron.nl/desp/hdl/-/blob/L2SDP-857/applications/lofar2/doc/prestudy/lift_sdp_transient_buffer.txt + [2] https://support.astron.nl/confluence/display/L2M/2022-10-24+LIFT+meeting+notes + https://support.astron.nl/confluence/display/L2M/2023-02-08+LIFT+meeting+notes + [3] L1 LOFAR2 Decision: Transport of buffer data from Station to CEP, https://support.astron.nl/confluence/pages/viewpage.action?pageId=94766339 + [4] LOFAR1 TBB: https://support.astron.nl/confluence/display/L2M/Temporary+storage+of+documents+and+papers + https://support.astron.nl/confluence/pages/viewpage.action?spaceKey=L2M&title=Temporary+storage+of+documents+and+papers&preview=/17335979/23069390/TBB_Design_Description_ASTRON_SDD_047.pdf + [5] LOFAR2 PDR: https://support.astron.nl/confluence/pages/viewpage.action?spaceKey=L2M&title=2019-07-01+Meeting+notes%3A+Transient+Buffer+functionality + + 0) MVP = minimal viable product: - - omvat TB, maar nog niet Transient Detectie (TDet) + - omvat TBuf, maar nog niet Transient Detectie (TDet) - needs ring, because 6 antennas 
per event are typically not connected to one FPGA @@ -35,8 +56,8 @@ In LOFAR12060 [1] time series data en pulse data, wat is pulse data : Hoe streng is de 3.33 s, mag 3.2 s ook ? Gebruik een 16 GByte module per FPGA, zodat uitbreiding naar 6.66 s mopgelijkl is door beide slots the gebruiken Uitlezen per receiver input, zodat uitlezen van een deel vd receiver inputs mogelijk is (bijv 12 vd 192 in [1]) -Defineer TB functie per receiver input: -. zodat de TB functie makkelijk uitbreidbaar is naar meer inputs en naar meer DDR4 modules. +Defineer TBuf functie per receiver input: +. zodat de TBuf functie makkelijk uitbreidbaar is naar meer inputs en naar meer DDR4 modules. . data capture en uitlezen van een receiver input onafhankelijk kan van de andere receiver inputs @@ -60,3 +81,215 @@ Design decision 16GByte DDR4 na L2SDP-854, 850 - Buffer lengte versus nof antennes - Self trigger + +3) TBB (Transient Buffer Board) LOFAR1 +- From 2.3 in [4] + . uses 2048 Byte pages + . address based -> typically for write/store + time based -> typically for read/retrieve + . 16 channels free size --> not fragmented, no overlap, nof pages/channel, circular + + +4) Transient Buffer (TBuf) Design + +- buffer raw data, no need to buffer subbands + +- choose fixed 14b data, so not e.g. 8 Msbits for lightning and 8 Lsbits for + cosmic ray. Always using full W_adc = 14b makes design and usage clearer. + +- Station --> CEP --> Data Writer + . SDP output UDP directly to CEP or to LCU so that LCU can pass it on via TCP, to + recover from data loss + . SDP output via 10GbE + . SDP CP for speed dial (= throttle) output, to avoid data loss + +- treat all signal inputs independently (even though X and Y are always needed together) + +- timing + . Use sample sequence number (SSN) or mem_bsn: + - SSN increments by nof_samples_per_page = 4672 + - mem_bsn increments by 1 per page, so per block of nof_samples_per_page = 4672. + . 
SSN counts sample periods (5 ns) since t_epoch = 1970, can fit + 2**64 / (365.25 * 24 * 3600 / 5e-9) > 2922 years + . TBuf uses sop and eop to mark nof_samples_per_page = 4672 + . TBuf does not need sync ? + . Start SSN or mem BSN at same time as SDP BSN by FPGA_processing_enable_RW. + +- CP per signal input buffer + . flexible start and end address (so flexible buffer time per signal input) + . freeze, unfreeze + . no need to wipe (zero) buffer contents after unfreeze ? + +- State + . rst --> stop <--> record + +- Block diagram: + + per si: pack 14b to 64b --> add mem hdr (= ssn or bsn) --> add crc --> pack 64b to 256b + + mux 12 si to 1 --> mux with + 1 MM --> write to DDR4 + + read from DDR4 --> demux to + . 1 MM + . 1 retrieve --> unpack 256b to 64b --> check CRC --> add output hdr --> dump + +- support MP on buffer state + . signal input index + . frozen, buffering, reading + . start address (time), end address (time) + +- Provide direct MM access interface to DDR4 + . New access multiplexer component to interface with io_ddr with: + . write 12 signal input streams for TBuf recording + 1 MM write stream + . read 1 stream for TBuf readout + 1 MM read stream + . Write multiplexer for 12 + 1 = 13 inputs will take ~100 M20K, + because it needs to multiplex and FIFO streams of 256 bit each and + 256 bit requires ceil(256 / 40) = 7 M20K in parallel, so 13 * 7 = 91 M20K. + . One M20K = 20b * 1024 words = 40b * 512 words, 512 words of 256b = + 16 kByte, so FIFO can fit (almost) two 8 kB payloads, which seems + sufficient. + +- Use 1 DDR4 module / FPGA + . Because 16GB is enough for T_tbuf = 3.3 s + . 1 DDR4 @ 200MHz yields 200MHz * 256b/8b = 6.4 GB/s maximum write + access. Sample data from 12 ADCs is 12 * 200MHz * 16b/8b = 4.8 GB/s. + Hence the TBuf function then uses 4.8 / 6.4 = 0.75 of the capacity, + which is fine and leaves sufficient spare capacity for some buffer + read out, because 10Gbps / 8b = maximum 1.25 GB/s. + . 
If we use 2 DDR4 modules / FPGA, then treat them as one big + buffer with extended address space by DDR4 II, so use them + sequentially, rather than in parallel, so that we still have full + freedom of allocating memory space to signal inputs. + +- support partial dump + . lightning >~ 1 s, cosmic ray >~ 1 ms + . dump t0 - t1 + . dump last dt + +- packetize before buffer write or after buffer read? --> before + . packetize at 64b or 256b ? + . 16b -> 64b packetize --> 64b --> 256b store + . data in buffer must have CRC --> 64b CRC ? + . ddr page packet format: SSN + packed data + CRC + - 8 KByte page to have integer number of pages (= slots) in 16G memory, so + that DDR4-I can wrap without a gap or extend to DDR4-II without a gap. + - 14b packed data + - SSN = 64b = 8B + - CRC = 64b = 8B + - 8K - 8 - 8 = 8192 - 16 = 8176 B * 8 / 14b = 4672 samples per page + - 4672 * 5 ns = 23.36 us per page, so ~42.8 pages / ms + + +- dp_offload_tx header is the same for all 12 signal inputs, only si differs, + so create one header for all and modify si field to save logic and RAM + . readout 1 page per tx packet + . add additional eth/ip/udp header and application header + . send packed 14b data + +- 12 input multiplexer with 12 x 256b in and 256b out to write 256b words @ 200 MHz +- use SSN as timestamp, SSN = BSN * N_fft, so can be derived from bsn_source BSN, + or do we need a dp_ssn_source.vhd? + +- unb2c_test_ddr_16G resource usage + . git/hdl/boards/uniboard2c/designs/unb2c_test/revisions/unb2c_test_ddr_16G/unb2c_test_ddr_16G_resource_usage.jpg + . per module: + wr_fifo 13 M20K + tech_ddr 9 M20K + rd_fifo 4 M20K + diag db 0 M20K + diag bg 0 M20K + --> Total 26 M20K/DDR4 module + . board common: + MMM : 69 M20K for Nios memory + ctrl: 42 M20K for MMAP ROM and 1GbE + +- store and send 14b packed data + . so do not use 16b (with 2b sign extension), to optimize for memory usage and + transport capacity (at the expense of requiring tools to observe the payload + contents). + . 
store application packet with CRC in DDR4 + . store packed 14b data for 16/14 = 1.14 times more buffer space (3.3s --> 3.8s) + . send unpacked 16b data to CEP with new CRC + . CRC = 64b, header multiple of 64b, nof samples per payload multiple of 64b + +- Maximum number of packets per dump + . max memory size 16GB + . max payload size 8kB + --> 16G / 8k = 2M packets --> log2(2M) = 21b + . use packet serial number, instead of sop, eop bit fields, to show progress of + the packet dump to CEP + . allocate start_page/nof_pages per si to memory, wrap at max memory size + circular buffer per si, wrap after nof_pages + keep track of nof_recorded pages, when > nof_pages then circular buffer is + full and carries only fresh data + keep track of ssn + page index of last recorded page + + + +5) TBuf ICD SC-SDP, SDPTR-SDPFW + +- Control Points (CP): + . FPGA_tbuf_alloc_RW [pn][si] --> start page, nof_pages (or as separate CP?) + - nof_pages = 0 means si has no buffer, > 0 means si has buffer + . FPGA_tbuf_record_RW [pn][si] --> start/continue (True) or stop (= freeze) (False) recording + . FPGA_tbuf_retrieve_RW --> pn, si, ssn, nof_pre_pages, nof_post_pages (or as separate CP?) + - allow retrieve from only one (pn, si) at a time + - total nof pages = nof_pre_pages + 1 (pointed by ssn) + nof_post_pages + . FPGA_tbuf_output_hdr_eth_destination_mac_RW + . FPGA_tbuf_output_hdr_ip_destination_address_RW + . FPGA_tbuf_output_hdr_udp_destination_port_RW + . FPGA_tbuf_output_enable_RW + +- Monitor Points (MP): + . FPGA_tbuf_total_nof_pages_R --> 16G / 8k = 2M + . FPGA_tbuf_page_size_R --> 8 kByte + . FPGA_tbuf_nof_samples_per_page_R --> 4672 + . FPGA_tbuf_page_period_R --> 23.36 us + . FPGA_tbuf_recording_R [pn][si] + . FPGA_tbuf_retrieving_R [pn][si] + Maybe: + . FPGA_tbuf_last_page [pn][si] --> index of last recorded page + . FPGA_tbuf_last_ssn [pn][si] --> ssn of last recorded page + . 
FPGA_tbuf_nof_recorded_pages[pn][si] --> number of fresh recorded pages <= alloc nof_pages + + + +6) TBuf ICD STAT/SDP-CEP + +- application header fields: + . 8b marker + . 8b version_id + . 16b station_id + . 32b source_info + - 1b antenna_band_index + - 1b nyquist_zone_index + - 1b f_adc --> sample period is 5 ns or 6.25 ns + - 1b memory_error --> based on DDR4 read CRC + - 5b sample_width --> 14b + . 8b signal_input_index + . 16b nof_samples_per_packet + . 24b packet serial number in current dump + . 24b total nof packets in current dump + . 64b SSN = Sample Sequence Number + No need for: + - 32b observation_id --> also not in LOFAR1 + - 5b gn_index --> signal_input_index already provides all this information + + +7) Transient detection (TDet) Design + +- no self triggering yet for MVP + +- will use Hilbert transform of real input and > 30MHz BPF + https://nl.mathworks.com/help/signal/ug/single-sideband-modulation-via-the-hilbert-transform.html + For the FIR Hilbert transformer we will use an odd length filter, which is + computationally more efficient than an even length filter, although even + length filters have smaller passband errors. The savings in odd length + filters arise because several of their coefficients are zero. Also, an odd + length filter requires a shift by an integer time delay, as opposed to the + fractional time delay that is required by an even length filter. For an odd + length filter, the magnitude response of a Hilbert transformer is zero for + w=0 and w=π. For even length filters the magnitude response does not have + to be 0 at π, therefore they have increased bandwidth. So for odd length + filters the useful bandwidth is limited to + 0 < w < π. 
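Review note on sections 4 to 6 of station2_sdp_transient_buffer.txt: the page, bandwidth and header numbers can be cross-checked with a few lines of Python. This is a sketch only. All constant names are made up for the check, 16 GByte is assumed to mean 2**30-based bytes, and the buffer depth assumes the whole module is divided evenly over the 12 signal inputs.

```python
# Cross-check of the TBuf page, bandwidth and header numbers in the notes.
# Names are illustrative; they are not taken from the firmware or the ICD.

W_ADC = 14                 # bits per stored sample
T_SAMPLE = 5e-9            # sample period at 200 MHz, in seconds
PAGE_BYTES = 8 * 1024      # one DDR4 page (= slot)
SSN_BYTES = 8              # 64b sample sequence number
CRC_BYTES = 8              # 64b CRC
MEM_BYTES = 16 * 1024**3   # one 16 GByte DDR4 module (assumed 2**34 bytes)
N_SI = 12                  # signal inputs per FPGA

payload_bytes = PAGE_BYTES - SSN_BYTES - CRC_BYTES     # 8176 bytes
samples_per_page = payload_bytes * 8 // W_ADC          # 4672 samples
page_period_us = samples_per_page * T_SAMPLE * 1e6     # about 23.36 us
pages_per_ms = 1e3 / page_period_us                    # about 42.8
total_pages = MEM_BYTES // PAGE_BYTES                  # 2**21 = 2M pages
serial_bits = (total_pages - 1).bit_length()           # 21 bits

# Buffer depth per signal input if all pages are shared evenly (assumption).
buffer_time_s = (total_pages // N_SI) * samples_per_page * T_SAMPLE

# DDR4 write load: 12 inputs of 16b at 200 MHz versus one 256b wide port.
write_load = (N_SI * 200e6 * 16 / 8) / (200e6 * 256 / 8)

# Application header field widths from section 6, in bits.
header_fields = {
    "marker": 8,
    "version_id": 8,
    "station_id": 16,
    "source_info": 32,
    "signal_input_index": 8,
    "nof_samples_per_packet": 16,
    "packet_serial_number": 24,
    "total_nof_packets": 24,
    "ssn": 64,
}
header_bits = sum(header_fields.values())

if __name__ == "__main__":
    print("samples per page :", samples_per_page)
    print("page period [us] :", round(page_period_us, 2))
    print("pages per ms     :", round(pages_per_ms, 1))
    print("total pages      :", total_pages, "->", serial_bits, "bits")
    print("buffer time [s]  :", round(buffer_time_s, 2))
    print("DDR4 write load  :", write_load)
    print("header bits      :", header_bits, "remainder mod 64:", header_bits % 64)
```

Two points stand out. The listed header fields add up to 200 bits, which is not a multiple of 64b as section 4 requires, so some padding or a wider field would be needed (for example up to 256 bits; that choice is not in the notes). Also, the source_info subfields listed use 9 of its 32 bits, so the remainder is presumably reserved.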
diff --git a/boards/uniboard2c/designs/unb2c_test/revisions/unb2c_test_ddr_16G/hdllib.cfg b/boards/uniboard2c/designs/unb2c_test/revisions/unb2c_test_ddr_16G/hdllib.cfg index a482d069257253a798fb284197b5de0cee7e160c..bb3b67dbb7cd5b3179f39144691daef93b1146c4 100644 --- a/boards/uniboard2c/designs/unb2c_test/revisions/unb2c_test_ddr_16G/hdllib.cfg +++ b/boards/uniboard2c/designs/unb2c_test/revisions/unb2c_test_ddr_16G/hdllib.cfg @@ -33,14 +33,14 @@ quartus_copy_files = ../../src/hex hex quartus_qsf_files = - $RADIOHDL_WORK/boards/uniboard2c/libraries/unb2c_board/quartus/unb2c_board.qsf + $HDL_WORK/boards/uniboard2c/libraries/unb2c_board/quartus/unb2c_board.qsf quartus_sdc_pre_files = quartus/unb2c_test_ddr_16G.sdc - $RADIOHDL_WORK/boards/uniboard2c/libraries/unb2c_board/quartus/unb2c_board_pre.sdc + $HDL_WORK/boards/uniboard2c/libraries/unb2c_board/quartus/unb2c_board_pre.sdc quartus_sdc_files = - $RADIOHDL_WORK/boards/uniboard2c/libraries/unb2c_board/quartus/unb2c_board.sdc + $HDL_WORK/boards/uniboard2c/libraries/unb2c_board/quartus/unb2c_board.sdc quartus_tcl_files = quartus/unb2c_test_ddr_16G_pins.tcl diff --git a/boards/uniboard2c/designs/unb2c_test/revisions/unb2c_test_ddr_16G/quartus/unb2c_test_ddr_16G_pins.tcl b/boards/uniboard2c/designs/unb2c_test/revisions/unb2c_test_ddr_16G/quartus/unb2c_test_ddr_16G_pins.tcl index 776b733cab71d36b5b3bb51beafe10e7b9142542..74859378180bbdd49c6c04a3fa87868be3b01e2e 100644 --- a/boards/uniboard2c/designs/unb2c_test/revisions/unb2c_test_ddr_16G/quartus/unb2c_test_ddr_16G_pins.tcl +++ b/boards/uniboard2c/designs/unb2c_test/revisions/unb2c_test_ddr_16G/quartus/unb2c_test_ddr_16G_pins.tcl @@ -19,7 +19,7 @@ # ############################################################################### -source $::env(RADIOHDL_WORK)/boards/uniboard2c/libraries/unb2c_board/quartus/pinning/unb2c_minimal_pins.tcl +source $::env(HDL_WORK)/boards/uniboard2c/libraries/unb2c_board/quartus/pinning/unb2c_minimal_pins.tcl # module I: set_location_assignment 
PIN_AP20 -to MB_I_OU.a[0] diff --git a/boards/uniboard2c/designs/unb2c_test/revisions/unb2c_test_ddr_16G/unb2c_test_ddr_16G_resource_usage.jpg b/boards/uniboard2c/designs/unb2c_test/revisions/unb2c_test_ddr_16G/unb2c_test_ddr_16G_resource_usage.jpg new file mode 100644 index 0000000000000000000000000000000000000000..297b87f63306b020d272c578db920a4013caa15d Binary files /dev/null and b/boards/uniboard2c/designs/unb2c_test/revisions/unb2c_test_ddr_16G/unb2c_test_ddr_16G_resource_usage.jpg differ diff --git a/doc/erko_howto_tools.txt b/doc/erko_howto_tools.txt index 38e5e51a4bf56ff2cf2535ff9327aedb0123fd50..e640a078f51c23bfb64e95e7f99cf852534d5ecb 100755 --- a/doc/erko_howto_tools.txt +++ b/doc/erko_howto_tools.txt @@ -151,6 +151,14 @@ run_qcomp unb2c unb2c_test_1GbE_II --clk=CLK run_rbf unb2c unb2c_test_1GbE_II ==> All in one: build_image unb2c unb2c_test --rev=unb2c_test_1GbE_II --seed=1,2 +quartus_config unb2c +run_qsys_pro unb2c unb2c_test_ddr_16G +gen_rom_mmap.py --avalon -d unb2c_test -r unb2c_test_ddr_16G +run_reg unb2c unb2c_test_ddr_16G +run_qcomp unb2c unb2c_test_ddr_16G --clk=CLK +run_rbf unb2c unb2c_test_ddr_16G +==> All in one: build_image unb2c unb2c_test --rev=unb2c_test_ddr_16G --seed=1,2 + # Run command line synthesis for dts quartus_config unb2c run_qsys_pro unb2c lofar2_unb2c_sdp_station_full