From f43d94c568f82ce8f441143e0497bcd52e4772f9 Mon Sep 17 00:00:00 2001
From: Jan David Mol <mol@astron.nl>
Date: Thu, 23 Nov 2017 08:45:27 +0000
Subject: [PATCH] Task #11059: Data loss: Added impact of payload, and hint to check VLAN IPs

---
 RTCP/Cobalt/GPUProc/doc/data-loss.txt | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/RTCP/Cobalt/GPUProc/doc/data-loss.txt b/RTCP/Cobalt/GPUProc/doc/data-loss.txt
index a6beaa88a0a..dc3ef5cfea7 100644
--- a/RTCP/Cobalt/GPUProc/doc/data-loss.txt
+++ b/RTCP/Cobalt/GPUProc/doc/data-loss.txt
@@ -22,9 +22,15 @@ Total input loss occurs when:
       ||_|\_ 1-digit bord number (0..3, and 6..9 for HBA1)
       | \___ 3-digit station number
       \_____ fixed prefix
+  * For international stations, the receiving COBALT node needs to have the right VLANs configured. If not, the packets will
+    arrive on eth5 (cbt00x-10GB04), but are dropped as the destination IP (belonging to the VLAN) does not exist.
   * As root on COBALT, run "tcpdump -i <interface> udp -c 100", and check if the packets are received and correctly addressed.
+  * For international stations, the receiving COBALT node needs to have the right VLANs configured. If not, the packets will
+    arrive on eth5 (cbt00x-10GB04), but are dropped as the destination IP (belonging to the VLAN) does not exist. If you see
+    packets arriving for VLAN IPs, check with "ip addr" which IPs exist on the node.
+
   * The network drops the datagrams due to routing issues. Trace the station route through the network:
     https://www.astron.nl/lofarwiki/doku.php?id=wanarea:start
@@ -55,6 +61,20 @@ Fractional or total input loss occurs when:
     - "payload error" means the packet is marked as incomplete by the station.
     - "otherwise bad" means the packet header is corrupted.
+  * The impact of payload errors is significant. They arrive scattered over time, and any flagged input is smeared over hundreds of samples
+    during processing due to the FIR filter.
For a 64-channel interferometry observation, we measured the following:
+
+      % payload errors      % visibilities flagged
+      --------------------------------------------
+      3.5%                  91%
+      1.9%                  73%
+      1.5%                  63%
+      1.06%                 44%
+      0.22%                 14%
+      0.19%                 12%
+      0.10%                 6.7%
+      0.002%                0.13%
+
   * COBALT is not running at real time, and is thus unable to keep up with the input data. This triggers many errors,
     but all cases devolve into printing:
       >>> ERROR RTCP.Cobalt.GPUProc - [block 1] Not running at real time! Deadline was 1.23456 seconds ago
-- 
GitLab