Commit c162afef authored by Kenneth Hiemstra

Merge branch 'master' into L2SDP-37

parents ca7700e1 6e258b82
2 merge requests: !100 Removed text for XSub that is now written in Confluence Subband correlator..., !77 Resolve L2SDP-37 (merge request)
......@@ -291,133 +291,6 @@ The beamformer function has the following sub functions:
* Subband Correlator
*******************************************************************************
With transport scheme 1 crosslets from different source nodes are combined into one packet.
Scheme 3 packs only local crosslets into a packet. Compared to scheme 1, scheme 3:
- treats the local crosslets and remote crosslets independently
- has small payload and thus more packet overhead, but the packet load still fits on a lane
- has small payload that can be enlarged by transporting more local crosslets, to support
a subband correlator with N_crosslets > 1 per integration interval.
Design decision:
Use transport scheme 3 with N/2 hops where every node sends its local crosslets N/2 hops,
because it is more flexible to have only local crosslets per packet.
Number of square correlator cells per PN:
There are S_pn = 12 local crosslets. A packet contains S_pn = 12 remote crosslets. There are N/2
remote crosslet packets. The local crosslets have to be correlated with the local crosslets and
with each of the N/2 remote crosslet packets. The correlation with the local crosslets
is a square matrix that yields X_sq = S_pn * S_pn = 144 visibilities. For the local-local square
correlator cell the efficiency is (S_pn * (S_pn+1)) / 2 / X_sq = 54%, but for the N/2 other
local-remote square correlator cells the efficiency is 100%. With N = 16 PN for LBA there are
N/2 = 8 remote crosslet packets. Hence together with the local crosslet visibilities this yields
X_pn = (floor(N/2) + 1) * X_sq = (8 + 1) * 144 = 1296 visibilities per PN. In total the subband
correlator calculates N * X_pn = 16 * 1296 = 20736 visibilities. There are
S_lba * (S_lba + 1)/2 = 192 * 193 / 2 = 18528 unique visibilities. The difference 20736 - 18528
= 2208 arises because:
. for any N the N * S_pn*(S_pn-1)/2 = 16 * 12*11/2 = 1056 local-local visibilities are calculated
twice
. for even N the floor(N/2) * S_pn*S_pn = 16/2 * 12*12 = 1152 local-remote visibilities are
calculated twice. For odd N the local-remote visibilities are only calculated once.
As a check, 1056 + 1152 = 2208 indeed.
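The bookkeeping above can be re-derived with a short Python sketch. The function name and the returned keys are illustrative assumptions; only the formulas and the values N = 16 and S_pn = 12 come from the text.

```python
def visibility_counts(n_pn=16, s_pn=12):
    """Count visibilities for the ring scheme with floor(N/2) hops."""
    x_sq = s_pn * s_pn                      # visibilities per square correlator cell
    x_pn = (n_pn // 2 + 1) * x_sq           # one local-local cell plus floor(N/2) local-remote cells
    total = n_pn * x_pn                     # calculated by all PN together
    s_all = n_pn * s_pn                     # all signal inputs (S_lba = 192 for N = 16)
    unique = s_all * (s_all + 1) // 2       # unique visibilities, auto-correlations included
    dup_local = n_pn * s_pn * (s_pn - 1) // 2
    # Only for even N are the local-remote visibilities at hop N/2 calculated twice.
    dup_remote = (n_pn // 2) * x_sq if n_pn % 2 == 0 else 0
    return {
        "x_sq": x_sq,
        "x_pn": x_pn,
        "total": total,
        "unique": unique,
        "dup_local": dup_local,
        "dup_remote": dup_remote,
        "local_cell_efficiency": (s_pn * (s_pn + 1) // 2) / x_sq,
    }


counts = visibility_counts()
print(counts["total"] - counts["unique"], counts["dup_local"] + counts["dup_remote"])
```

Calling `visibility_counts(n_pn=15)` exercises the odd N case, where only the local-local duplicates remain.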
Number of multipliers per crosslet:
The subband correlator needs to finish within one subband period, so within N_fft = 1024 clock
cycles. The X_pn = 1296 visibilities per PN can be calculated using one complex multiplier if
the complex multiplier runs at 1296 / 1024 * 200 MHz > 253 MHz. For an oversampled filterbank with
R_os <= 1.28 this requires 324 MHz, which is too much. All X_pn = 1296 can be calculated using
two complex multipliers running at > 161 MHz. However another option is to use one multiplier
per X_sq = 144 visibilities, so one complex multiplier per correlator cell and N/2 + 1 = 9
correlator cells in parallel. The FPGA has sufficient multipliers to support this scheme and the
spare capacity of each correlator cell can be used to support a subband correlator with more
than 1 subband per integration interval, so N_crosslets > 1.
Design decision:
Use 1 + N/2 parallel correlator cells, for the local-local visibilities and for the local-
remote visibilities for each remote source.
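The clock rates quoted above can be reproduced with the following sketch. It assumes that the required multiplier clock scales linearly with the number of visibilities per subband period and with R_os, which is how the 253 MHz and 324 MHz figures appear to be obtained; the function name is hypothetical.

```python
F_CLK_MHZ = 200.0   # processing clock from the text
N_FFT = 1024        # clock cycles per subband period
R_OS = 1.28         # maximum oversampling ratio from the text


def required_clock_mhz(n_visibilities, n_multipliers=1, r_os=1.0):
    """Clock rate each complex multiplier needs to finish within one subband period."""
    return n_visibilities / N_FFT * F_CLK_MHZ * r_os / n_multipliers


one_multiplier = required_clock_mhz(1296)                   # critically sampled
one_multiplier_os = required_clock_mhz(1296, r_os=R_OS)     # oversampled filterbank
two_multipliers_os = required_clock_mhz(1296, 2, R_OS)      # X_pn shared by two multipliers
one_per_cell_os = required_clock_mhz(144, r_os=R_OS)        # one multiplier per correlator cell
print(one_multiplier, one_multiplier_os, two_multipliers_os, one_per_cell_os)
```

With one multiplier per correlator cell the required rate is far below the 200 MHz clock, which is the spare capacity that the design decision relies on.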
What is the crosslet packet size?
With S_pn = 12 signal inputs per PN and one crosslet per signal input there are 12 crosslets per
packet. A crosslet is a W_crosslet = 16 bit complex value, so P_payload = 12 * 4 = 48 octets
payload, so the effective packet size is P_packet = P_overhead + P_payload = 60 + 48 = 108 octets.
The relative packet overhead for single crosslet payloads is P_overhead / P_packet = 60 / 108 ~=
56%. Note that P_overhead_dp + P_payload = 20 + 48 = 68 octets still meets the minimum Ethernet
payload size requirement of 46 octets.
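A minimal sketch of the packet size arithmetic, assuming that a crosslet consists of a W_crosslet bit real part and a W_crosslet bit imaginary part (4 octets in total). The function and constant names are illustrative only.

```python
P_OVERHEAD = 60        # total packet overhead in octets
P_OVERHEAD_DP = 20     # the part of the overhead that counts as Ethernet payload
ETH_MIN_PAYLOAD = 46   # minimum Ethernet payload size in octets


def crosslet_packet(s_pn=12, w_crosslet=16, n_crosslets=1):
    """Return (payload octets, packet octets, relative overhead)."""
    octets_per_crosslet = 2 * w_crosslet // 8          # real + imaginary part
    payload = n_crosslets * s_pn * octets_per_crosslet
    packet = P_OVERHEAD + payload
    return payload, packet, P_OVERHEAD / packet


payload, packet, overhead = crosslet_packet()
meets_eth_minimum = P_OVERHEAD_DP + payload >= ETH_MIN_PAYLOAD
print(payload, packet, round(overhead, 3), meets_eth_minimum)
```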
Maximum number of crosslets per lane:
There are f_sub = 195312.5 subbands per s, and the packets have to travel N/2 hops. This yields
a packet load of P_packet * f_sub * N/2 = (108 * 8b) * 195312.5 * 16 / 2 = 1.35 Gbps. The data
load of only the payload data is P_payload * f_sub * N/2 = (48 * 8b) * 195312.5 * 16 / 2 =
0.6 Gbps. Hence the small packet size causes a large packet overhead, but is still acceptable,
since it is < L_lane = 7.8125 Gbps, so it fits on a single 10G lane of the ring.
Multiple local crosslets could be transported via separate packets; a lane can then fit about
floor(7.8125 / 1.35) = 5 different crosslets. Packing the local crosslets into a single payload
reduces the packet overhead. The maximum number of crosslets per packet follows from
(P_overhead + x * P_payload) * 8b * f_sub * N/2 < L_lane. For N = 16 this yields x <
(7.8125 Gbps / (16/2) / 195312.5 / 8b - 60) / 48 ~= 11.8. Rounding up to x = 12 crosslets, the payload
size is 12 * 48 = 576 octets and the effective packet size is P_packet = 60 + 576 = 636 octets. The relative
packet overhead for multi crosslet payloads is P_overhead / P_packet = 60 / 636 ~= 9.4%. The
packet load for multi crosslet payloads is (636 * 8b) * 195312.5 * 16/2 = 7.95 Gbps >
L_lane = 7.8125 Gbps, so this just does not fit on a 10GbE lane, due to the still significant
packet overhead. Using x = 11 instead of x = 12 crosslets per packet yields a total crosslet
packet load per lane of ((60 + 11 * 48) * 8b) * 195312.5 * 16/2 = 7.35 Gbps, which does fit on
a lane.
Design decision:
Pack local crosslets into a single payload if N_crosslets > 1, because then the relative packet
overhead is much reduced to support transporting more crosslets per lane (11 instead of 5).
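The lane load figures can be checked with the sketch below. It takes f_sub, L_lane, P_overhead and P_payload from the text; the helper names are hypothetical and the search loop is just one simple way to find the largest x that still fits.

```python
F_SUB = 195312.5     # subbands per second
L_LANE = 7.8125e9    # usable lane rate in bit/s


def lane_load_bps(n_crosslets, n_pn=16, p_overhead=60, p_payload=48):
    """Packet load on a lane when every packet travels N/2 hops."""
    packet_octets = p_overhead + n_crosslets * p_payload
    return packet_octets * 8 * F_SUB * (n_pn // 2)


def max_crosslets_per_packet(n_pn=16):
    """Largest number of crosslets in one payload that still fits on the lane."""
    x = 0
    while lane_load_bps(x + 1, n_pn) < L_LANE:
        x += 1
    return x


single_packet_load = lane_load_bps(1)
separate_packets = int(L_LANE // single_packet_load)   # one crosslet per packet
packed = max_crosslets_per_packet()                    # crosslets packed into one payload
print(single_packet_load / 1e9, separate_packets, packed)
```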
Maximum number of crosslets per correlator cell:
An X_pn correlator cell can correlate floor(N_clk / X_sq) = floor(1024 / 144) = 7 different crosslet frequencies.
With N = 16 for LBA, there need to be P_xc = N/2 + 1 = 9 of these X_pn correlator cells in parallel. One
X_pn correlates the local-local crosslets and the other N/2 = 8 X_pn correlate the local-remote
crosslets. These 9 X_pn in parallel can correlate up to 7 different crosslets. The link can
transport a maximum of 11 crosslets. Hence the processing capacity of 9 X_pn is less than the IO
capacity of one 10GbE lane, so processing, not transport, limits one set of 9 X_pn to 7 different crosslets.
The crosslet data rate on a lane is then ((60 + 7 * 48) * 8b) * 195312.5 * 16/2 = 4.95 Gbps, so a
utilization of 4.95 / 7.8125 = 63%. Another set of 9 X_pn could be used to correlate the remaining
11 - 7 = 4 crosslets that can be transported via that lane. However, if more than N_crosslet = 7
crosslets need to be correlated in parallel per integration interval, then it is easier to allocate
an extra lane and to instantiate an extra set of 9 X_pn to correlate 14 crosslets in parallel in
total.
One X_pn takes one complex multiplier. For N_crosslets = 1 crosslet per integration interval using
1 + N/2 = 9 X_pn uses only 144 / 1024 = 14% of the processing resources. However this is acceptable
because:
- the FPGA has sufficient multipliers
- it provides a clear design
- the spare capacity can be used to process more crosslets per integration interval
Design decision:
Use 1 + N/2 = 9 parallel correlator cells to correlate N_crosslets = 1 crosslet, or up to 7
crosslets in parallel, per integration interval.
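The capacity comparison between processing and IO can be sketched as follows. The function name and the tuple layout are assumptions for illustration; the numbers are those used in the text.

```python
def correlator_capacity(n_pn=16, n_clk=1024, x_sq=144, max_crosslets_on_lane=11):
    """Return (cells in parallel, crosslets per cell, usable crosslets, single-crosslet utilization)."""
    n_cells = n_pn // 2 + 1                           # one local-local plus N/2 local-remote cells
    per_cell = n_clk // x_sq                          # crosslets one cell can process per subband period
    usable = min(per_cell, max_crosslets_on_lane)     # limited by processing or by transport
    utilization_single = x_sq / n_clk                 # cell load when N_crosslets = 1
    return n_cells, per_cell, usable, utilization_single


n_cells, per_cell, usable, utilization_single = correlator_capacity()

# Lane utilization when the 7 crosslets are packed into one payload.
lane_bps = (60 + usable * 48) * 8 * 195312.5 * (16 // 2)
lane_utilization = lane_bps / 7.8125e9
print(n_cells, per_cell, usable, round(utilization_single, 3), round(lane_utilization, 3))
```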
Send more than one time slot per packet?
To reduce the relative packet overhead for single crosslet XC it is an option to put multiple
time slots per payload. Design decision: This is considered too complicated.
What if a packet gets lost?
The local crosslets cannot get lost, but remote packets may get lost. For transit crosslet packets
a lost packet remains lost, because it cannot be replaced. For the subband correlator at this
node the lost remote packets can be replaced by filler data, because the BSN aligner can use the
local input as reference to detect lost packets. The BSN aligner will replace lost remote packets
with filler packets that are flagged. The crosslets in the filler packets contain zero data, so in
the correlator they do not contribute to the visibilities. Each X_pn correlator cell operates on
crosslets from another source. Therefore each X_pn correlator cell has to maintain a count of the
number of valid N_valid and of the number of flagged N_flagged crosslets per integration interval.
The N_valid can be used to weight the visibility relative to the expected number of N_int
crosslets. The N_flagged is used for monitoring. For every integration interval N_int = N_valid
+ N_flagged should be true, by design of the BSN aligner.
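The N_valid and N_flagged bookkeeping can be illustrated with a small software model. This is only a sketch under assumptions: the conjugate is taken on the second input, and the weighting scales the sum by N_int / N_valid; the text fixes neither choice, and all names are hypothetical.

```python
def integrate_visibility(pairs, flags):
    """Integrate one visibility over an integration interval.

    pairs: sequence of (a, b) complex crosslets, one pair per time slot
    flags: sequence of bool, True when the remote crosslet came from a filler packet
    """
    acc = 0j
    n_valid = 0
    n_flagged = 0
    for (a, b), flagged in zip(pairs, flags):
        if flagged:
            n_flagged += 1             # filler data is zero, so it adds nothing to acc
        else:
            acc += a * b.conjugate()   # assumed conjugation convention
            n_valid += 1
    n_int = n_valid + n_flagged        # holds by design of the BSN aligner
    # Assumed weighting: scale to the expected number of N_int crosslets.
    weighted = acc * n_int / n_valid if n_valid else 0j
    return acc, weighted, n_valid, n_flagged


pairs = [(1 + 1j, 1 - 1j), (2 + 0j, 2 + 0j), (3 + 0j, 0j)]
flags = [False, False, True]
acc, weighted, n_valid, n_flagged = integrate_visibility(pairs, flags)
print(acc, weighted, n_valid, n_flagged)
```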
What if T_sq > T_hop latency on ring?
What if T_sub > N/2 * T_hop latency on ring?
......@@ -486,53 +359,7 @@ If the BSN aligners allows direct memory access to its input buffers then the X_
correlator cell can read the crosslets from the BSN aligner in arbitrary order and multiple
times.
X_sq correlator cell:
The X_sq correlator cell has two input streams. One input stream delivers the crosslet from
S_pn = 12 signal inputs on one PN and the other input stream delivers the crosslet from
S_pn = 12 signal inputs on the same PN (for local-local visibilities) or another PN (for the
local-remote visibilities). In total the X_sq calculates X_sq = S_pn * S_pn = 12*12 = 144
visibilities. The crosslets are delivered sequentially using a double for loop, so for each
crosslet i in range(S_pn) on one input and for each crosslet j in range(S_pn) on the other
input calculate the product and integrate the visibility. This calculation sequence requires
that crosslets can be addressed multiple times. For N_crosslets = 1 the X_sq correlator cell
only correlates the first S_pn = 12 crosslets that are delivered on its two inputs. For
N_crosslets > 1 the X_sq continues correlating the next S_pn = 12 crosslets that are delivered
on its two inputs. Hence N_crosslets > 1 merely adds another for loop level to the X_sq, that
loops for k in range(N_crosslets). The visibilities are calculated in order:
k, i, j
0, 0, 0
0, 0, 1
. . .
0, 0,11
0, 1, 0
0, 1, 1
. . .
0, 1,11
. . .
. . .
0,11, 0
0,11, 1
. . .
0,11,11
1, 0, 0
etc.
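The calculation order listed above corresponds to three nested loops. The sketch below is a software model of that order, not the firmware; the flat input layout (crosslet k of signal input i stored at index k * S_pn + i) and the conjugation of the second input are assumptions.

```python
def visibility_order(n_crosslets=1, s_pn=12):
    """Yield (k, i, j) in the order in which the X_sq cell calculates the visibilities."""
    for k in range(n_crosslets):
        for i in range(s_pn):
            for j in range(s_pn):
                yield k, i, j


def accumulate(vis, in_a, in_b, n_crosslets=1, s_pn=12):
    """Add the products for one time slot to the integrated visibilities in vis."""
    for n, (k, i, j) in enumerate(visibility_order(n_crosslets, s_pn)):
        # Each crosslet is addressed S_pn times, hence the need for repeated reads.
        vis[n] += in_a[k * s_pn + i] * in_b[k * s_pn + j].conjugate()
    return vis


order = list(visibility_order(n_crosslets=2))
small = accumulate([0j] * 4, [1 + 0j, 1j], [1 + 0j, 1j], n_crosslets=1, s_pn=2)
print(order[:3], order[144], small)
```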
Support for other (shorter) integration period T_int_x?
- Longer T_int as multiple of 1 s can be supported outside SDP
- Longer T_int can be supported within SDP by:
. Using BSN scheduler
. Reduces M&C data rate
. Should still fit in the number of bits of the visibility
- Shorter T_int < 1 s (PPS):
. Using BSN scheduler
. increases M&C data rate
. should still fit within PPS grid
- Publish T_int_x period ended event message to Station Control
How can it be scaled to more than one crosslet per XST?
- multiple per packet
- multiple instances of one
*******************************************************************************
......
......@@ -63,6 +63,12 @@
-- dp_block_gen. This assumes that the DSP does pass on the valid, that the
-- block size is known and that the first valid at the output corresponds
-- to a sop.
-- . These are related components that try to pass on sosi info from begin to
-- end, without having to pass it on through each step in the sosi data
-- processing.
-- - dp_paged_sop_eop_reg
-- - dp_fifo_info.vhd
-- - dp_block_gen_valid_arr
LIBRARY IEEE, common_lib, technology_lib;
USE IEEE.STD_LOGIC_1164.ALL;
......
......@@ -32,6 +32,12 @@
-- eop_wr_en <= snk_in.eop & snk_in.eop;
-- to capture the input at the first wr_en and hold it for output at the
-- next wr_en.
-- . These are related components that try to pass on sosi info from begin to
-- end, without having to pass it on through each step in the sosi data
-- processing.
-- - dp_paged_sop_eop_reg
-- - dp_fifo_info.vhd
-- - dp_block_gen_valid_arr
LIBRARY IEEE, common_lib;
USE IEEE.STD_LOGIC_1164.ALL;
......