[[_TOC_]]
# HOWTO
## Login to the client VM
Almost all docker containers of the software stack run within the client VM,
which is actually run as a docker container in the development environment.
This container can be identified programmatically as follows. Use the additional
``-q`` parameter to obtain just the container ID:
```
$ docker ps --filter 'name=client.station.nomad.nomad-cluster.jumppad.dev'
CONTAINER ID   IMAGE                     COMMAND                  CREATED         STATUS         PORTS   NAMES
90f6f253fb58   shipyardrun/nomad:1.6.1   "/usr/bin/supervisor…"   3 minutes ago   Up 3 minutes           fee02e87.client.station.nomad.nomad-cluster.jumppad.dev
$
```
You can login interactively to this container using ``sh``:
```
$ CLIENT_CONTAINER_ID=$(docker ps -q --filter 'name=client.station.nomad.nomad-cluster.jumppad.dev')
$ docker exec -it ${CLIENT_CONTAINER_ID} sh
#
```
## Attach to a running client process
The bulk of our client processes run within the client "VM", which has docker running as well:
```
$ docker exec "${CLIENT_CONTAINER_ID}" docker ps -a
CONTAINER ID   IMAGE                                              COMMAND                  CREATED              STATUS              PORTS   NAMES
00ba94025605   git.astron.nl:5000/lofar2.0/tango/grafana:latest   "/run-wrapper.sh"        12 seconds ago       Up 11 seconds               grafana-de5690ec-219f-51d5-2946-f2f5133e9612
c30d367c387f   git.astron.nl:5000/lofar2.0/tango/loki:latest      "/usr/bin/loki -conf…"   About a minute ago   Up About a minute           loki-e9073864-c281-7390-1b03-8c10fa072e73
7d1d7990031c   envoyproxy/envoy:v1.26.4                           "/docker-entrypoint.…"   About a minute ago   Up About a minute           connect-proxy-loki-e9073864-c281-7390-1b03-8c10fa072e73
3f02415bd6f6   gcr.io/google_containers/pause-amd64:3.1           "/pause"                 About a minute ago   Up About a minute           nomad_init_e9073864-c281-7390-1b03-8c10fa072e73
02a644ecf279   git.astron.nl:5000/lofar2.0/tango/postgres:15.4    "docker-entrypoint.s…"   About a minute ago   Up About a minute           postgres-4d957b81-0de1-0bc0-425b-743b28ba6a8e
[...]
```
You can interact with these containers by logging into the client VM, or directly by chaining docker calls:
```
$ docker exec "${CLIENT_CONTAINER_ID}" docker logs grafana-de5690ec-219f-51d5-2946-f2f5133e9612
Wait until grafana is ready...
logger=settings t=2024-02-13T13:14:47.766739611Z level=info msg="Starting Grafana" version=10.3.1 commit=00a22ff8b28550d593ec369ba3da1b25780f0a4a branch=HEAD compiled=2024-01-22T18:40:42Z
logger=settings t=2024-02-13T13:14:47.767369632Z level=warn msg="ngalert feature flag is deprecated: use unified alerting enabled setting instead"
logger=settings t=2024-02-13T13:14:47.767679942Z level=info msg="Config loaded from" file=/usr/share/grafana/conf/defaults.ini
logger=settings t=2024-02-13T13:14:47.767713444Z level=info msg="Config loaded from" file=/etc/grafana/grafana.ini
[...]
```
This allows you to use the regular docker commands like ``attach``, ``logs``, and ``restart``. Note that interactive use requires ``-it`` on the outer command as well: ``docker exec -it "${CLIENT_CONTAINER_ID}" ...``.
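For example, to restart or open a shell in the grafana container from the listing above (the container name is taken from that listing and will differ on your system; the shell example assumes the image ships ``sh``):
```
$ docker exec "${CLIENT_CONTAINER_ID}" docker restart grafana-de5690ec-219f-51d5-2946-f2f5133e9612
$ docker exec -it "${CLIENT_CONTAINER_ID}" docker exec -it grafana-de5690ec-219f-51d5-2946-f2f5133e9612 sh
```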
## Patching a device server live
Sometimes it is handy to modify the tangostationcontrol source code of a running device server. To do so (a condensed command sketch follows the list):
1. Log into the client VM using ``docker exec -it "${CLIENT_CONTAINER_ID}" bash``
2. Find the docker container of the device server (e.g. stationmanager), using ``docker ps -a | grep stationmanager``
3. Enter the device server container with ``docker exec -it <container> bash``
4. Install an editor, e.g. ``sudo apt-get install -y vim``
5. Edit the relevant source file in ``/usr/local/lib/python3.10/dist-packages/tangostationcontrol`` (for Python 3.10)
6. Call the ``restart_device_server()`` command for any device in the changed device server
7. Once restarted, call the ``boot()`` command for all devices in the changed device server to reconfigure them
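The same steps, condensed into a command sketch (the ``stationmanager`` container and the file to patch are illustrative placeholders; substitute the device server you are patching):
```
# on the host: enter the client VM
docker exec -it "${CLIENT_CONTAINER_ID}" bash
# inside the client VM: locate and enter the device server container
docker ps -a | grep stationmanager
docker exec -it <stationmanager-container> bash
# inside the device server container: install an editor and patch the source
sudo apt-get install -y vim
vim /usr/local/lib/python3.10/dist-packages/tangostationcontrol/<file-to-patch>.py
```
Afterwards, call ``restart_device_server()`` and ``boot()`` as described in steps 6 and 7.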
## Login to the server
The nomad and consul management processes run on the server, which
is actually a docker container in the development environment.
This container can be identified programmatically as follows. Use the additional
``-q`` parameter to obtain just the container ID:
```
$ docker ps --filter 'name=server.station.nomad.nomad-cluster.jumppad.dev'
CONTAINER ID   IMAGE                     COMMAND                  CREATED         STATUS         PORTS   NAMES
b75f633c837e   shipyardrun/nomad:1.6.1   "/usr/bin/supervisor…"   2 minutes ago   Up 2 minutes   [...]   server.station.nomad.nomad-cluster.jumppad.dev
$
```
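As with the client VM, you can log in to the server interactively (a sketch mirroring the client example above):
```
$ SERVER_CONTAINER_ID=$(docker ps -q --filter 'name=server.station.nomad.nomad-cluster.jumppad.dev')
$ docker exec -it ${SERVER_CONTAINER_ID} sh
#
```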
## Using nomad: Manage jobs on the client
The server allows you to manage the jobs on the client through Nomad. Each *job* consists of one or more *tasks* that are collectively managed; the tasks are (typically) the docker containers. Each running instance of a job on a client is called an *allocation*.
The nomad server exposes its web UI on http://localhost:4646, allowing interactive browsing and control. There is also a CLI, accessed through
```
$ SERVER_CONTAINER_ID=$(docker ps -q --filter 'name=server.station.nomad.nomad-cluster.jumppad.dev')
$ docker exec "${SERVER_CONTAINER_ID}" nomad
Usage: nomad [-version] [-help] [-autocomplete-(un)install] <command> [args]
[...]
```
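The CLI is a wrapper around Nomad's HTTP API, which is served on the same forwarded port 4646. Plain HTTP requests therefore work too, which can be convenient for scripting (a minimal sketch; ``/v1/jobs`` lists all jobs, output omitted):
```
$ curl -s http://localhost:4646/v1/jobs
```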
To list the status of the jobs, use ``nomad status``:
```
$ docker exec "${SERVER_CONTAINER_ID}" nomad status
ID          Type     Priority  Status   Submit Date
connector   service  50        running  2024-02-13T13:12:02Z
monitoring  service  50        dead     2024-02-13T13:12:09Z
```
To get more info about a job, use ``nomad status <job>``:
```
$ docker exec "${SERVER_CONTAINER_ID}" nomad status monitoring
ID            = monitoring
Name          = monitoring
Submit Date   = 2024-02-13T13:12:09Z
Type          = service
Priority      = 50
Datacenters   = stat
Namespace     = default
Node Pool     = default
Status        = dead
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost  Unknown
grafana     0       0         0        0       1         0     0
loki        0       0         0        1       1         0     0
postgres    0       0         0        0       1         0     0
prometheus  0       0         0        2       0         0     0

Allocations
No allocations placed
```
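For jobs that do have allocations, individual allocations can be inspected with ``nomad alloc status``, and the logs of their tasks retrieved with ``nomad alloc logs``. The allocation ID comes from the job status output; the ID and task name below are placeholders:
```
$ docker exec "${SERVER_CONTAINER_ID}" nomad alloc status <alloc-id>
$ docker exec "${SERVER_CONTAINER_ID}" nomad alloc logs <alloc-id> <task>
```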
To restart a job, use ``nomad job restart <job>``:
```
$ docker exec -t ${SERVER_CONTAINER_ID} nomad job restart connector
==> 2024-02-13T19:32:02Z: Restarting 1 allocation
2024-02-13T19:32:02Z: Restarting running tasks in allocation "e80cdf6f" for group "connector"
==> 2024-02-13T19:32:03Z: Job restart finished
Job restarted successfully!
$ docker exec -t ${SERVER_CONTAINER_ID} nomad job restart monitoring
No allocations to restart
```
The monitoring job cannot be restarted as it is not running.
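To bring a dead job back, its job specification has to be resubmitted with ``nomad job run``. Where the job file lives depends on your deployment and whether it is accessible inside the server container, so the path below is a placeholder:
```
$ docker exec -t "${SERVER_CONTAINER_ID}" nomad job run <path-to-jobfile>
```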
## Clean up lingering resources
To clean up jumppad lingering resources:
```
# tear down the running configuration
.bin/jumppad down
# remove lingering configuration
rm -rf ~/.jumppad
```
A deeper clean might require:
```
# remove the downloaded jumppad binary
rm .bin/jumppad
# force-remove all docker containers
docker ps -a -q | xargs docker rm -f
# clear unused docker data (stopped containers, dangling images, build cache)
docker system prune
# remove unused volumes
docker volume prune
# remove unused networks
docker network prune
```
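To check that the cleanup worked, the following listings should come back (nearly) empty:
```
# list any remaining containers
docker ps -a
# list any remaining volumes and networks
docker volume ls
docker network ls
```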
# FAQ
## a network with the label id: module.nomad.resource.network.station, was not found
Solution: jumppad is confused by a lingering configuration. Clean up any lingering
resources (see [above](#clean-up-lingering-resources)).
## unable to destroy resource Name: station, Type: network
Solution: some containers are still running. Stop them, or force stop all containers using ``docker ps -q -a | xargs docker rm -f``.