diff --git a/docs/source/developer.rst b/docs/source/developer.rst
index 169ed0ef53c0f3630b7f952299b8447964502c0a..e14e3350d10507dc9a425fe42080b1573f4e2674 100644
--- a/docs/source/developer.rst
+++ b/docs/source/developer.rst
@@ -12,7 +12,11 @@ The docker setup is managed using ``make`` in the ``docker-compose`` directory.
 - ``make build <container>`` to rebuild the image for the container,
 - ``make build-nocache <container>`` to rebuild the image for the container from scratch,
 - ``make restart <container>`` to restart a specific container, for example to effectuate a code change.
-- ``make clean`` to remove all images, containers, and volumes.
+- ``make clean`` to remove all images and containers, and the ``tangodb`` volume. To do a deeper clean, we need to remove all volumes and rebuild all containers from scratch::
+
+    make clean
+    docker volume prune
+    make build-nocache
 
 Since the *Python code is taken from the host when the container starts*, restarting is enough to use the code you have in your local git repo. Rebuilding is unnecessary.
 
@@ -32,6 +36,8 @@ The networks are defined in ``docker-compose/networks.yml``:
 
 The ``$NETWORK_MODE`` defaults to ``tangonet`` in the ``docker-compose/Makefile``.
 
+.. _corba:
+
 CORBA
 ````````````````````
 
diff --git a/docs/source/faq.rst b/docs/source/faq.rst
new file mode 100644
index 0000000000000000000000000000000000000000..361aac78c8dad37dc1690757fbeeb1cee62587fb
--- /dev/null
+++ b/docs/source/faq.rst
@@ -0,0 +1,99 @@
+FAQ
+===================================
+
+*Q: My device is unreachable, but the device logs say it's running fine.*
+
+The ``$HOSTNAME`` may have been incorrectly guessed by ``docker-compose/Makefile``, or you accidentally set it to an incorrect value. See :ref:`corba`.
+
+*Q: I get "API_CorbaException: TRANSIENT CORBA system exception: TRANSIENT_NoUsableProfile" when trying to connect to a device.*
+
+The ``$HOSTNAME`` may have been incorrectly guessed by ``docker-compose/Makefile``, or you accidentally set it to an incorrect value. See :ref:`corba`.
+
+*Q: The elk container won't start, saying "max virtual memory areas vm.max_map_count [65530] is too low"?*
+
+The ELK stack needs the ``vm.max_map_count`` sysctl kernel parameter to be at least 262144 to run. See :ref:`elk-kernel-settings`.
+
+*Q: How do I prevent my containers from starting when I boot my computer?*
+
+You have to explicitly stop a container to prevent it from restarting. Use::
+
+    cd docker-compose
+    make stop <container>
+
+or plain ``make stop`` to stop all of them.
+
+*Q: Some SST/XST packets do arrive, but not all, and/or the matrices remain zero?*
+
+In this case ``sst.nof_packets_received`` / ``xst.nof_packets_received`` is increasing, which tells you packets are arriving, but they are apparently dropped or contain zeroes. First, check the following settings:
+
+- ``sdp.TR_fpga_mask_RW[x] == True``, to make sure we're actually configuring the FPGAs,
+- ``sdp.FPGA_processing_enabled_R[x] == True``, to verify that the FPGAs are processing, or the values and timestamps will be zero,
+- For XSTs, ``xst.FPGA_xst_processing_enabled_R[x] == True``, to verify that the FPGAs are computing XSTs, or the values will be zero.
+
+Furthermore, the ``sst`` and ``xst`` devices expose several packet counters to indicate where incoming packets were dropped before or during processing:
+
+- ``nof_invalid_packets_R`` increases if packets arrive with an invalid header, or carry the wrong statistic for this device,
+- ``nof_packets_dropped_R`` increases if packets could not be processed because the processing queue is full, so the CPU cannot keep up with the flow,
+- ``nof_payload_errors_R`` increases if the packet was marked by the FPGA to have an invalid payload, which causes the device to discard the packet.
+
+*Q: I am not receiving any XST and/or SST packets from SDP!*
+
+Are you sure? If ``sst.nof_packets_received`` / ``xst.nof_packets_received`` is actually increasing, the packets are arriving, but are not parsable by the SST/XST device. If so, see the previous question.
+
+Many settings need to be correct for the statistics emitted by the SDP FPGAs to reach our devices correctly. Here is a brief overview:
+
+- ``sdp.TR_fpga_mask_RW[x] == True``, to make sure we're actually configuring the FPGAs,
+- ``sdp.FPGA_communication_error_R[x] == False``, to verify the FPGAs can be reached by SDP,
+- SSTs:
+
+  - ``sst.FPGA_sst_offload_enable_RW[x] == True``, to verify that the FPGAs are actually emitting the SSTs,
+  - ``sst.FPGA_sst_offload_hdr_eth_destination_mac_R[x] == <MAC of your machine's mtu=9000 interface>``, or the FPGAs will not send them to your machine. Use e.g. ``ip addr`` on the host to find the MAC address of your interface, and verify that its MTU is 9000,
+  - ``sst.FPGA_sst_offload_hdr_ip_destination_address_R[x] == <IP of your machine's mtu=9000 interface>``, or the packets will be dropped by the network or the kernel of your machine,
+  - ``sst.FPGA_sst_offload_hdr_udp_destination_port_R[x] == 5001``, or the packets will not be sent to a port that the SST device listens on.
+
+- XSTs:
+
+  - ``xst.FPGA_xst_offload_enable_RW[x] == True``, to verify that the FPGAs are actually emitting the XSTs,
+  - ``xst.FPGA_xst_offload_hdr_eth_destination_mac_R[x] == <MAC of your machine's mtu=9000 interface>``, or the FPGAs will not send them to your machine. Use e.g. ``ip addr`` on the host to find the MAC address of your interface, and verify that its MTU is 9000,
+  - ``xst.FPGA_xst_offload_hdr_ip_destination_address_R[x] == <IP of your machine's mtu=9000 interface>``, or the packets will be dropped by the network or the kernel of your machine,
+  - ``xst.FPGA_xst_offload_hdr_udp_destination_port_R[x] == 5002``, or the packets will not be sent to a port that the XST device listens on.
+
+If this fails, see the next question.
+
+*Q: I am still not receiving XSTs and/or SSTs, even though the settings appear correct!*
+
+Let's see where the packets get stuck. Let us assume your MTU=9000 network interface is called ``em2`` (see ``ip addr`` to check):
+
+- Check whether the data arrives on ``em2``. Run ``tcpdump -i em2 udp -nn -vvv -c 10`` to capture the first 10 packets. Verify:
+
+  - The destination MAC must match that of ``em2``,
+  - The destination IP must match that of ``em2``,
+  - The destination port is correct (5001 for SST, 5002 for XST),
+  - The source IP falls within the netmask of ``em2`` (unless ``net.ipv4.conf.em2.rp_filter=0`` is configured),
+  - TTL >= 2,
+
+- If you see no data at all, the network will have swallowed it. Try to use a direct network connection, or a hub (which broadcasts all packets, unlike a switch), to see what is being emitted by the FPGAs.
+- Check whether the data reaches user space on the host:
+
+  - Turn off the ``sst`` or ``xst`` device. This will not stop the FPGAs from sending.
+  - Run ``nc -u -l -p 5001 -vv`` (or port 5002 for XSTs). You should see raw packets being printed.
+  - If not, the Linux kernel is swallowing the packets, even before they can be sent to our docker container.
+
+- Check whether the data reaches kernel space in the container:
+
+  - Enter the docker device by running ``docker exec -it device-sst bash``.
+  - Run ``sudo bash`` to become root,
+  - Run ``apt-get install -y tcpdump`` to install tcpdump,
+  - Check whether packets arrive using ``tcpdump -i eth0 udp -c 10 -nn``,
+  - If not, Linux is not routing the packets to the docker container.
+
+- Check whether the data reaches user space in the container:
+
+  - Turn off the ``sst`` or ``xst`` device. This will not stop the FPGAs from sending.
+  - Enter the docker device by running ``docker exec -it device-sst bash``.
+  - Run ``sudo bash`` to become root,
+  - Run ``apt-get install -y netcat`` to install netcat,
+  - Check whether packets arrive using ``nc -u -l -p 5001 -vv`` (or port 5002 for XSTs),
+  - If not, Linux is not routing the packets to the docker container correctly.
+
+- If still no error was found, you've likely hit a bug in our software.
diff --git a/docs/source/index.rst b/docs/source/index.rst
index cc731b347308ac5393e431d3f10d2b345474caa9..524d21369c9e0ded662f12a365d479ce3dc39abc 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -23,6 +23,7 @@ Even without having access to any LOFAR2.0 hardware, you can install the full st
    devices/configure
    configure_station
    developer
+   faq
 
 
 Indices and tables
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
index 2cfb177a17d93ef4254ddf440af4ad4aac503934..cb0122ae95cc01de7f55e333345a6ec4d41bc369 100644
--- a/docs/source/installation.rst
+++ b/docs/source/installation.rst
@@ -76,6 +76,8 @@ The following commands start all the software devices to control the station har
 
 See :ref:`boot` for more information on the ``boot`` device.
 
+.. _elk-kernel-settings:
+
 ELK
 ````