The docker setup is managed using ``make`` in the ``docker-compose`` directory.

...

- ``make build <container>`` to rebuild the image for the container,
- ``make build-nocache <container>`` to rebuild the image for the container from scratch,
- ``make restart <container>`` to restart a specific container, for example to apply a code change,
- ``make clean`` to remove all images and containers, and the ``tangodb`` volume. For a deeper clean, remove all volumes and rebuild all containers from scratch::

      make clean
      docker volume prune
      make build-nocache

Since the *Python code is taken from the host when the container starts*, restarting is enough to use the code you have in your local git repo. Rebuilding is unnecessary.

...

The networks are defined in ``docker-compose/networks.yml``:

...

The ``$NETWORK_MODE`` defaults to ``tangonet`` in the ``docker-compose/Makefile``.
*Q: My device is unreachable, but the device logs say it's running fine.*
The ``$HOSTNAME`` may have been guessed incorrectly by ``docker-compose/Makefile``, or you accidentally set it to an incorrect value. See :ref:`corba`.
*Q: I get "API_CorbaException: TRANSIENT CORBA system exception: TRANSIENT_NoUsableProfile" when trying to connect to a device.*
The ``$HOSTNAME`` may have been guessed incorrectly by ``docker-compose/Makefile``, or you accidentally set it to an incorrect value. See :ref:`corba`.
*Q: The elk container won't start, saying "max virtual memory areas vm.max_map_count [65530] is too low"?*
The ELK stack needs the ``vm.max_map_count`` sysctl kernel parameter to be at least 262144 to run. See :ref:`elk-kernel-settings`.
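Because ``vm.max_map_count`` is a kernel setting of the host, you can check it on the host before starting the elk container, for example with ``sysctl vm.max_map_count`` or by reading ``/proc/sys/vm/max_map_count``. A minimal Python sketch of the latter; the helper names are hypothetical and the threshold is the 262144 stated above:

```python
from pathlib import Path

# Minimum value required by the ELK stack, as stated above.
ELK_MIN_MAX_MAP_COUNT = 262144


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the host's current vm.max_map_count from procfs (Linux only)."""
    return int(Path(path).read_text().strip())


def max_map_count_ok(value: int, minimum: int = ELK_MIN_MAX_MAP_COUNT) -> bool:
    """Return True if the value is high enough for the ELK stack."""
    return value >= minimum
```

With the value from the error message above (65530), ``max_map_count_ok`` returns ``False``; on the host, call it as ``max_map_count_ok(read_max_map_count())``.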
*Q: How do I prevent my containers from starting when I boot my computer?*
You have to explicitly stop a container to prevent it from restarting. Use::

    cd docker-compose
    make stop <container>

or plain ``make stop`` to stop all of them.
*Q: Some SST/XST packets arrive, but not all of them, and/or the matrices remain zero?*
In this case, ``sst.nof_packets_received`` / ``xst.nof_packets_received`` is increasing, so packets are arriving, but they are apparently dropped or contain zeroes. First, check the following settings:
- ``sdp.TR_fpga_mask_RW[x] == True``, to make sure we're actually configuring the FPGAs,
- ``sdp.FPGA_processing_enabled_R[x] == True``, to verify that the FPGAs are processing, or the values and timestamps will be zero,
- For XSTs, ``xst.FPGA_xst_processing_enabled_R[x] == True``, to verify that the FPGAs are computing XSTs, or the values will be zero.
Furthermore, the ``sst`` and ``xst`` devices expose several packet counters to indicate where incoming packets were dropped before or during processing:
- ``nof_invalid_packets_R`` increases if packets arrive with an invalid header, or carry the wrong type of statistic for this device,
- ``nof_packets_dropped_R`` increases if packets could not be processed because the processing queue is full, meaning the CPU cannot keep up with the flow,
- ``nof_payload_errors_R`` increases if the FPGA marked the packet as having an invalid payload, which causes the device to discard the packet.
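To see which of these counters is responsible, compare two readings of the same device taken a few seconds apart. A minimal sketch, assuming the counter values have already been read into dictionaries (for example through a PyTango ``DeviceProxy``); ``diagnose_drops`` is a hypothetical helper, not part of the devices, and the readings in the example are made up:

```python
# Drop counters of the sst/xst devices and what an increase means (see the list above).
DROP_REASONS = {
    "nof_invalid_packets_R": "invalid header, or the wrong statistic for this device",
    "nof_packets_dropped_R": "processing queue full: the CPU cannot keep up",
    "nof_payload_errors_R": "payload marked as invalid by the FPGA",
}


def diagnose_drops(before: dict, after: dict) -> list:
    """Return (counter, increase, reason) for every drop counter that increased.

    `before` and `after` map attribute names to values read a few seconds
    apart from the same sst or xst device.
    """
    findings = []
    for name, reason in DROP_REASONS.items():
        increase = after[name] - before[name]
        if increase > 0:
            findings.append((name, increase, reason))
    return findings


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    before = {"nof_invalid_packets_R": 0, "nof_packets_dropped_R": 10, "nof_payload_errors_R": 3}
    after = {"nof_invalid_packets_R": 0, "nof_packets_dropped_R": 250, "nof_payload_errors_R": 3}
    for name, increase, reason in diagnose_drops(before, after):
        print(f"{name} increased by {increase}: {reason}")
    # prints: nof_packets_dropped_R increased by 240: processing queue full: the CPU cannot keep up
```

An empty result while ``nof_packets_received`` keeps increasing points back at the device settings listed above rather than at packet loss.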
*Q: I am not receiving any XSTs and/or SSTs packets from SDP!*
Are you sure? If ``sst.nof_packets_received`` / ``xst.nof_packets_received`` is actually increasing, the packets are arriving, but they are being discarded by the SST/XST device or contain zeroes. If so, see the previous question.
Many settings must be correct for the statistics emitted by the SDP FPGAs to reach our devices. Here is a brief overview:
- ``sdp.TR_fpga_mask_RW[x] == True``, to make sure we're actually configuring the FPGAs,
- ``sdp.FPGA_communication_error_R[x] == False``, to verify the FPGAs can be reached by SDP,
- SSTs:

  - ``sst.FPGA_sst_offload_enable_RW[x] == True``, to verify that the FPGAs are actually emitting the SSTs,
  - ``sst.FPGA_sst_offload_hdr_eth_destination_mac_R[x] == <MAC of your machine's MTU=9000 interface>``, or the FPGAs will not send the packets to your machine. Use, for example, ``ip addr`` on the host to find the MAC address of your interface, and verify that its MTU is 9000,
  - ``sst.FPGA_sst_offload_hdr_ip_destination_address_R[x] == <IP of your machine's MTU=9000 interface>``, or the packets will be dropped by the network or by the kernel of your machine,
  - ``sst.FPGA_sst_offload_hdr_udp_destination_port_R[x] == 5001``, or the packets will not be sent to the port that the SST device listens on.

- XSTs:

  - ``xst.FPGA_xst_offload_enable_RW[x] == True``, to verify that the FPGAs are actually emitting the XSTs,
  - ``xst.FPGA_xst_offload_hdr_eth_destination_mac_R[x] == <MAC of your machine's MTU=9000 interface>``, or the FPGAs will not send the packets to your machine. Use, for example, ``ip addr`` on the host to find the MAC address of your interface, and verify that its MTU is 9000,
  - ``xst.FPGA_xst_offload_hdr_ip_destination_address_R[x] == <IP of your machine's MTU=9000 interface>``, or the packets will be dropped by the network or by the kernel of your machine,
  - ``xst.FPGA_xst_offload_hdr_udp_destination_port_R[x] == 5002``, or the packets will not be sent to the port that the XST device listens on.

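Checking these values by hand for every FPGA is tedious. The sketch below compares them against what Linux reports for the receiving interface under ``/sys/class/net/<interface>/``. It assumes you have already read the device attributes into Python (for example through a PyTango ``DeviceProxy``); ``FpgaOffload``, ``read_interface`` and ``check_offload`` are hypothetical helpers, not part of the devices. FPGAs outside ``sdp.TR_fpga_mask_RW`` are skipped, and ``sdp.FPGA_communication_error_R`` is not checked:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Tuple

# UDP ports the statistics devices listen on, as listed above.
EXPECTED_PORT = {"sst": 5001, "xst": 5002}
EXPECTED_MTU = 9000


@dataclass
class FpgaOffload:
    """Settings of one FPGA (index x), copied from the devices (hypothetical container)."""
    configured: bool       # sdp.TR_fpga_mask_RW[x]
    offload_enabled: bool  # <stat>.FPGA_<stat>_offload_enable_RW[x]
    dest_mac: str          # destination MAC in the offload header
    dest_ip: str           # destination IP in the offload header
    dest_port: int         # destination UDP port in the offload header


def read_interface(iface: str) -> Tuple[str, int]:
    """Return (MAC address, MTU) of a host interface, read from Linux sysfs."""
    base = Path("/sys/class/net") / iface
    mac = (base / "address").read_text().strip().lower()
    mtu = int((base / "mtu").read_text().strip())
    return mac, mtu


def check_offload(stat: str, fpgas: List[FpgaOffload],
                  iface_mac: str, iface_ip: str, iface_mtu: int) -> List[str]:
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    if iface_mtu != EXPECTED_MTU:
        problems.append(f"interface MTU is {iface_mtu}, expected {EXPECTED_MTU}")
    for x, fpga in enumerate(fpgas):
        if not fpga.configured:
            continue  # not in the FPGA mask, so we are not configuring it anyway
        if not fpga.offload_enabled:
            problems.append(f"FPGA {x}: {stat} offload is disabled")
        if fpga.dest_mac.lower() != iface_mac.lower():
            problems.append(f"FPGA {x}: destination MAC {fpga.dest_mac} != {iface_mac}")
        if fpga.dest_ip != iface_ip:
            problems.append(f"FPGA {x}: destination IP {fpga.dest_ip} != {iface_ip}")
        if fpga.dest_port != EXPECTED_PORT[stat]:
            problems.append(f"FPGA {x}: destination port {fpga.dest_port} != {EXPECTED_PORT[stat]}")
    return problems
```

On the host, ``read_interface("em2")`` returns the MAC and MTU to pass in; the IP address can be copied from ``ip addr``. MAC addresses are compared case-insensitively because tools differ in how they print them.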
If all of these settings are correct but packets still do not arrive, see the next question.
*Q: I am still not receiving XSTs and/or SSTs, even though the settings appear correct!*
Let's see where the packets get stuck. Let us assume your MTU=9000 network interface is called ``em2`` (see ``ip addr`` to check):
- Check whether the data arrives on ``em2``. Run ``tcpdump -i em2 udp -nn -vvv -c 10`` to capture the first 10 packets, and verify that:

  - the destination MAC matches that of ``em2``,
  - the destination IP matches that of ``em2``,
  - the destination port is correct (5001 for SST, 5002 for XST),
  - the source IP falls within the subnet of ``em2``, unless reverse-path filtering is disabled with ``net.ipv4.conf.em2.rp_filter=0`` (the kernel uses the maximum of this value and ``net.ipv4.conf.all.rp_filter``, so the latter must be 0 as well),
  - the TTL is at least 2.

  If you see no data at all, the network has probably dropped the packets. Try a direct network connection, or a hub (which, unlike a switch, repeats all packets on all ports), to see what the FPGAs are emitting.

- Check whether the data reaches user space on the host:

  - Turn off the ``sst`` or ``xst`` device. This does not stop the FPGAs from sending.
  - Run ``nc -u -l -p 5001 -vv`` (or port 5002 for XSTs). You should see raw packets being printed.
  - If not, the Linux kernel is dropping the packets before they can even be forwarded to the docker container.

- Check whether the data reaches kernel space in the container:

  - Enter the device's docker container by running ``docker exec -it device-sst bash``,
  - Run ``sudo bash`` to become root,
  - Run ``apt-get install -y tcpdump`` to install tcpdump,