Commit 515e0771 authored by Jan David Mol

L2SS-434: More XST/SST faqs, added deep clean procedure

parent b6080e19
Merge request !150: L2SS-434: Add sphinx documentation content
@@ -12,7 +12,11 @@ The docker setup is managed using ``make`` in the ``docker-compose`` directory.
- ``make build <container>`` to rebuild the image for the container,
- ``make build-nocache <container>`` to rebuild the image for the container from scratch,
- ``make restart <container>`` to restart a specific container, for example to effectuate a code change.
- ``make clean`` to remove all images and containers, and the ``tangodb`` volume. To do a deeper clean, we need to remove all volumes and rebuild all containers from scratch::

      make clean
      docker volume prune
      make build-nocache

Since the *Python code is taken from the host when the container starts*, restarting is enough to use the code you have in your local git repo. Rebuilding is unnecessary.
@@ -32,6 +36,8 @@ The networks are defined in ``docker-compose/networks.yml``:
The ``$NETWORK_MODE`` defaults to ``tangonet`` in the ``docker-compose/Makefile``.
.. _corba:

CORBA
````````````````````
......
FAQ
===================================
*Q: My device is unreachable, but the device logs say it's running fine.*
The ``$HOSTNAME`` may have been guessed incorrectly by ``docker-compose/Makefile``, or you accidentally set it to an incorrect value. See :ref:`corba`.
*Q: I get "API_CorbaException: TRANSIENT CORBA system exception: TRANSIENT_NoUsableProfile" when trying to connect to a device.*
The ``$HOSTNAME`` may have been guessed incorrectly by ``docker-compose/Makefile``, or you accidentally set it to an incorrect value. See :ref:`corba`.
*Q: The elk container won't start, saying "max virtual memory areas vm.max_map_count [65530] is too low"?*
The ELK stack needs the ``vm.max_map_count`` sysctl kernel parameter to be at least 262144 to run. See :ref:`elk-kernel-settings`.
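
The following minimal Python sketch checks the current value before you start the elk container. It is not part of the station software. It reads the live setting from ``/proc/sys/vm/max_map_count``, so it works on Linux only. The function names are hypothetical.

.. code-block:: python

   from pathlib import Path

   # Minimum value the ELK stack needs (see the answer above).
   REQUIRED_MAX_MAP_COUNT = 262144


   def read_max_map_count(proc_file: str = "/proc/sys/vm/max_map_count") -> int:
       """Read the live vm.max_map_count value from procfs (Linux only)."""
       return int(Path(proc_file).read_text().strip())


   def max_map_count_ok(value: int) -> bool:
       """Return True if the value is high enough for the ELK stack."""
       return value >= REQUIRED_MAX_MAP_COUNT


   if __name__ == "__main__":
       current = read_max_map_count()
       if max_map_count_ok(current):
           print(f"vm.max_map_count = {current}: OK")
       else:
           # Raising it needs root, e.g. "sysctl -w vm.max_map_count=262144";
           # a value set this way does not survive a reboot.
           print(f"vm.max_map_count = {current}: too low, need >= {REQUIRED_MAX_MAP_COUNT}")
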
*Q: How do I prevent my containers from starting when I boot my computer?*
You have to explicitly stop a container to prevent it from restarting. Use::

    cd docker-compose
    make stop <container>

or plain ``make stop`` to stop all of them.
*Q: Some SST/XST packets do arrive, but not all of them, and/or the matrices remain zero?*
So ``sst.nof_packets_received`` / ``xst.nof_packets_received`` is increasing, which tells you that packets are arriving, but they are apparently dropped or contain zeroes. First, check the following settings:
- ``sdp.TR_fpga_mask_RW[x] == True``, to make sure we're actually configuring the FPGAs,
- ``sdp.FPGA_processing_enabled_R[x] == True``, to verify that the FPGAs are processing, or the values and timestamps will be zero,
- For XSTs, ``xst.FPGA_xst_processing_enabled_R[x] == True``, to verify that the FPGAs are computing XSTs, or the values will be zero.
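
You can script these checks. The following is a minimal sketch that assumes ``sdp`` and ``xst`` are the device proxies used in the expressions above. For example, they can be PyTango ``DeviceProxy`` objects in an itango session, where reading ``proxy.<attribute>`` returns the attribute's value. The function name ``check_processing`` is hypothetical. Any object that exposes these attributes works, which is how the example fakes the devices with ``SimpleNamespace``.

.. code-block:: python

   from types import SimpleNamespace


   def check_processing(sdp, fpga: int, xst=None) -> list:
       """Return the checks from the list above that fail for FPGA number `fpga`.

       Pass `xst` only when debugging XSTs. Each attribute is expected to
       return one value per FPGA, indexed by `fpga`.
       """
       problems = []
       if not sdp.TR_fpga_mask_RW[fpga]:
           problems.append("sdp.TR_fpga_mask_RW is False: this FPGA is not being configured")
       if not sdp.FPGA_processing_enabled_R[fpga]:
           problems.append("sdp.FPGA_processing_enabled_R is False: values and timestamps will be zero")
       if xst is not None and not xst.FPGA_xst_processing_enabled_R[fpga]:
           problems.append("xst.FPGA_xst_processing_enabled_R is False: XST values will be zero")
       return problems


   if __name__ == "__main__":
       # Fake devices for illustration; in itango, pass the real proxies instead.
       sdp = SimpleNamespace(TR_fpga_mask_RW=[True, True], FPGA_processing_enabled_R=[True, False])
       xst = SimpleNamespace(FPGA_xst_processing_enabled_R=[True, True])
       for fpga in range(2):
           print(fpga, check_processing(sdp, fpga, xst) or "OK")
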
Furthermore, the ``sst`` and ``xst`` devices expose several packet counters to indicate where incoming packets were dropped before or during processing:
- ``nof_invalid_packets_R`` increases if packets arrive with an invalid header, or carry the wrong statistic for this device,
- ``nof_packets_dropped_R`` increases if packets could not be processed because the processing queue is full, meaning the CPU cannot keep up with the flow,
- ``nof_payload_errors_R`` increases if the FPGA marked the packet's payload as invalid, which causes the device to discard the packet.
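
These are cumulative counters, so what matters is which of them increase while the problem occurs. The following minimal sketch compares two snapshots taken some seconds apart. The snapshot dictionaries are an assumption: you would fill them yourself from the device, for example with ``getattr(sst, name)`` for each counter name. The name ``nof_packets_received`` is the one used at the start of this answer.

.. code-block:: python

   # Counter names and their meanings, as listed above.
   DROP_COUNTERS = {
       "nof_invalid_packets_R": "invalid header, or the wrong statistic for this device",
       "nof_packets_dropped_R": "processing queue full: the CPU cannot keep up",
       "nof_payload_errors_R": "payload marked invalid by the FPGA, packet discarded",
   }


   def diagnose(before: dict, after: dict) -> list:
       """Report which counters increased between two snapshots of the same device."""
       findings = []
       if after["nof_packets_received"] <= before["nof_packets_received"]:
           findings.append("nof_packets_received did not increase: no packets arrived")
       for name, meaning in DROP_COUNTERS.items():
           delta = after[name] - before[name]
           if delta > 0:
               findings.append(f"{name} +{delta}: {meaning}")
       return findings


   if __name__ == "__main__":
       # Hypothetical snapshot values for illustration only.
       before = {"nof_packets_received": 100, "nof_invalid_packets_R": 0,
                 "nof_packets_dropped_R": 0, "nof_payload_errors_R": 0}
       after = {"nof_packets_received": 160, "nof_invalid_packets_R": 0,
                "nof_packets_dropped_R": 12, "nof_payload_errors_R": 0}
       for line in diagnose(before, after):
           print(line)
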
*Q: I am not receiving any XST and/or SST packets from SDP!*
Are you sure? If ``sst.nof_packets_received`` / ``xst.nof_packets_received`` is actually increasing, the packets are arriving, but are not parsable by the SST/XST device. If so, see the previous question.
Many settings need to be correct for the statistics emitted by the SDP FPGAs to reach our devices correctly. Here is a brief overview:
- ``sdp.TR_fpga_mask_RW[x] == True``, to make sure we're actually configuring the FPGAs,
- ``sdp.FPGA_communication_error_R[x] == False``, to verify the FPGAs can be reached by SDP,
- SSTs:

  - ``sst.FPGA_sst_offload_enable_RW[x] == True``, to verify that the FPGAs are actually emitting the SSTs,
  - ``sst.FPGA_sst_offload_hdr_eth_destination_mac_R[x] == <MAC of your machine's mtu=9000 interface>``, or the FPGAs will not send them to your machine. Use e.g. ``ip addr`` on the host to find the MAC address of your interface, and verify that its MTU is 9000,
  - ``sst.FPGA_sst_offload_hdr_ip_destination_address_R[x] == <IP of your machine's mtu=9000 interface>``, or the packets will be dropped by the network or the kernel of your machine,
  - ``sst.FPGA_sst_offload_hdr_udp_destination_port_R[x] == 5001``, or the packets will not be sent to the port that the SST device listens on.

- XSTs:

  - ``xst.FPGA_xst_offload_enable_RW[x] == True``, to verify that the FPGAs are actually emitting the XSTs,
  - ``xst.FPGA_xst_offload_hdr_eth_destination_mac_R[x] == <MAC of your machine's mtu=9000 interface>``, or the FPGAs will not send them to your machine. Use e.g. ``ip addr`` on the host to find the MAC address of your interface, and verify that its MTU is 9000,
  - ``xst.FPGA_xst_offload_hdr_ip_destination_address_R[x] == <IP of your machine's mtu=9000 interface>``, or the packets will be dropped by the network or the kernel of your machine,
  - ``xst.FPGA_xst_offload_hdr_udp_destination_port_R[x] == 5002``, or the packets will not be sent to the port that the XST device listens on.

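
The SST and XST settings above follow the same naming pattern, so one function can check both. The following is a minimal sketch under two assumptions. First, ``dev`` is the ``sst`` or ``xst`` device proxy. Second, the destination port is exposed as ``FPGA_<kind>_offload_hdr_udp_destination_port_R``; check the attribute list of your device to confirm this name. You must fill in the MAC and IP of your MTU=9000 interface by hand, e.g. from ``ip addr``. The function names and example values are hypothetical.

.. code-block:: python

   from types import SimpleNamespace

   # UDP ports the SST and XST devices listen on (from the list above).
   EXPECTED_PORT = {"sst": 5001, "xst": 5002}


   def _norm_mac(mac: str) -> str:
       """Normalise a MAC address for comparison, e.g. '00-1B-...' -> '00:1b:...'."""
       return mac.strip().lower().replace("-", ":")


   def check_offload(dev, kind: str, fpga: int, my_mac: str, my_ip: str) -> list:
       """Return the offload settings that prevent FPGA `fpga` from reaching this host.

       `kind` is "sst" or "xst"; attribute names follow FPGA_<kind>_offload_...
       """
       prefix = f"FPGA_{kind}_offload"
       problems = []
       if not getattr(dev, f"{prefix}_enable_RW")[fpga]:
           problems.append(f"{prefix}_enable_RW is False: the FPGA is not emitting {kind.upper()}s")
       mac = getattr(dev, f"{prefix}_hdr_eth_destination_mac_R")[fpga]
       if _norm_mac(mac) != _norm_mac(my_mac):
           problems.append(f"destination MAC is {mac}, expected {my_mac}")
       ip = getattr(dev, f"{prefix}_hdr_ip_destination_address_R")[fpga]
       if ip.strip() != my_ip:
           problems.append(f"destination IP is {ip}, expected {my_ip}")
       # Assumed attribute name for the destination UDP port.
       port = getattr(dev, f"{prefix}_hdr_udp_destination_port_R")[fpga]
       if int(port) != EXPECTED_PORT[kind]:
           problems.append(f"destination port is {port}, expected {EXPECTED_PORT[kind]}")
       return problems


   if __name__ == "__main__":
       # Hypothetical values for illustration only.
       xst = SimpleNamespace(
           FPGA_xst_offload_enable_RW=[True],
           FPGA_xst_offload_hdr_eth_destination_mac_R=["00:11:22:33:44:55"],
           FPGA_xst_offload_hdr_ip_destination_address_R=["192.168.0.10"],
           FPGA_xst_offload_hdr_udp_destination_port_R=[5001],
       )
       print(check_offload(xst, "xst", 0, "00:11:22:33:44:55", "192.168.0.10"))
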
If this fails, see the next question.
*Q: I am still not receiving XSTs and/or SSTs, even though the settings appear correct!*
Let's see where the packets get stuck. We assume your MTU=9000 network interface is called ``em2`` (use ``ip addr`` to check):

- Check whether the data arrives on ``em2``. Run ``sudo tcpdump -i em2 -e -nn -vvv -c 10 udp`` to capture the first 10 packets (``-e`` makes tcpdump print the MAC addresses). Verify:

  - The destination MAC must match that of ``em2``,
  - The destination IP must match that of ``em2``,
  - The destination port is correct (5001 for SST, 5002 for XST),
  - The source IP falls within the subnet of ``em2`` (unless ``net.ipv4.conf.em2.rp_filter=0`` is configured),
  - TTL >= 2,
  - If you see no data at all, the network has swallowed it. Try to use a direct network connection, or a hub (which broadcasts all packets, unlike a switch), to see what is being emitted by the FPGAs.

- Check whether the data reaches user space on the host:

  - Turn off the ``sst`` or ``xst`` device. This will not stop the FPGAs from sending.
  - Run ``nc -u -l -p 5001 -vv`` (or port 5002 for XSTs). You should see raw packets being printed.
  - If not, the Linux kernel is swallowing the packets, even before they can be passed to our docker container.

- Check whether the data reaches kernel space in the container:

  - Enter the docker container by running ``docker exec -it device-sst bash``,
  - Run ``sudo bash`` to become root,
  - Run ``apt-get update && apt-get install -y tcpdump`` to install tcpdump,
  - Check whether packets arrive using ``tcpdump -i eth0 udp -c 10 -nn``,
  - If not, Linux is not routing the packets to the docker container.

- Check whether the data reaches user space in the container:

  - Turn off the ``sst`` or ``xst`` device. This will not stop the FPGAs from sending.
  - Enter the docker container by running ``docker exec -it device-sst bash``,
  - Run ``sudo bash`` to become root,
  - Run ``apt-get update && apt-get install -y netcat`` to install netcat,
  - Check whether packets arrive using ``nc -u -l -p 5001 -vv`` (or port 5002 for XSTs),
  - If not, Linux is not routing the packets to the docker container correctly.

- If still no error was found, you have likely hit a bug in our software.
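
If ``nc`` is not available, for example inside a container, a few lines of Python can stand in for the ``nc -u -l -p 5001`` steps above. This minimal sketch is not part of the station software, and the script name ``udp_listen.py`` is hypothetical. It binds the given UDP port on all interfaces and reports the size and sender of up to 10 packets. Like ``nc``, it needs the ``sst`` or ``xst`` device turned off first, and it fails with "address already in use" if another process still holds the port.

.. code-block:: python

   import socket
   import sys


   def open_listener(port: int, host: str = "0.0.0.0") -> socket.socket:
       """Bind a UDP socket on `port`; raises OSError if the port is already taken."""
       sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
       sock.bind((host, port))
       return sock


   def receive_packets(sock: socket.socket, count: int = 10, timeout: float = 5.0) -> list:
       """Return up to `count` (payload_size, sender_address) tuples.

       Stops early if no packet arrives within `timeout` seconds.
       """
       sock.settimeout(timeout)
       received = []
       for _ in range(count):
           try:
               data, sender = sock.recvfrom(65535)
           except socket.timeout:
               break
           received.append((len(data), sender))
       return received


   if __name__ == "__main__":
       # Usage: python3 udp_listen.py [port]   (5001 for SSTs, 5002 for XSTs)
       port = int(sys.argv[1]) if len(sys.argv) > 1 else 5001
       with open_listener(port) as sock:
           packets = receive_packets(sock)
       for size, (ip, src_port) in packets:
           print(f"{size} bytes from {ip}:{src_port}")
       if not packets:
           print(f"no packets received on UDP port {port}")

The script only shows whether packets reach user space. It does not inspect headers, so keep using ``tcpdump`` for the MAC, IP, port, and TTL checks.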
@@ -23,6 +23,7 @@ Even without having access to any LOFAR2.0 hardware, you can install the full st
devices/configure
configure_station
developer
faq
Indices and tables
......
@@ -76,6 +76,8 @@ The following commands start all the software devices to control the station har
See :ref:`boot` for more information on the ``boot`` device.
.. _elk-kernel-settings:

ELK
````
......