From 71034536dc279f63171764d6edb18fb55345abf8 Mon Sep 17 00:00:00 2001
From: Achton Smidt Winther
Date: Wed, 8 Nov 2023 22:24:30 +0100
Subject: [PATCH] Add runbook for RabbitMQ issue.

---
 docs/runbooks/rabbitmq-broker.md | 43 ++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)
 create mode 100644 docs/runbooks/rabbitmq-broker.md

diff --git a/docs/runbooks/rabbitmq-broker.md b/docs/runbooks/rabbitmq-broker.md
new file mode 100644
index 00000000..d1da38d9
--- /dev/null
+++ b/docs/runbooks/rabbitmq-broker.md
@@ -0,0 +1,43 @@
+# RabbitMQ broker force start
+
+## When to use
+
+Use this runbook when PR environments are no longer being created, the
+`lagoon-core-broker-` pods are missing or not running, and the container logs
+contain errors like `Error while waiting for Mnesia tables:
+{timeout_waiting_for_tables`.
+
+This situation is caused by the RabbitMQ broker not starting correctly.
+
+## Prerequisites
+
+* A [dplsh session](using-dplsh.md) with `DPLPLAT_ENV` exported.
+
+## Procedure
+
+You are going to exec into the pod and stop the RabbitMQ application, and then
+start it with [the `force_boot`
+feature](https://www.rabbitmq.com/rabbitmqctl.8.html#force_boot), so that it can
+perform its Mnesia sync correctly.
+
+Exec into the pod:
+
+```shell
+dplsh:~/host_mount$ kubectl -n lagoon-core exec -ti pod/lagoon-core-broker-0 -- sh
+```
+
+Stop RabbitMQ:
+
+```shell
+/ $ rabbitmqctl stop_app
+Stopping rabbit application on node rabbit@lagoon-core-broker-0.lagoon-core-broker-headless.lagoon-core.svc.cluster.local ...
+```
+
+Immediately afterwards, start it using the `force_boot` command:
+
+```shell
+/ $ rabbitmqctl force_boot
+```
+
+Then exit the shell and check the container logs for one of the broker pods. It
+should start without errors.
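The procedure in this runbook could also be run without an interactive shell in the pod. The following is a minimal sketch under stated assumptions, not part of the patch: the helper names (`run`, `broker_ctl`, `force_start_broker`) and the `NAMESPACE`, `POD` and `DRY_RUN` variables are hypothetical, and the namespace and pod name are taken from the commands in the runbook. It issues the same two `rabbitmqctl` commands in the same order and then prints recent container logs. It adds no steps beyond those the runbook describes.

```shell
#!/bin/sh
# Sketch only: NAMESPACE, POD and DRY_RUN are illustrative names,
# not something the runbook or dplsh defines.
NAMESPACE="${NAMESPACE:-lagoon-core}"
POD="${POD:-lagoon-core-broker-0}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run a rabbitmqctl subcommand inside the broker pod.
broker_ctl() {
  run kubectl -n "$NAMESPACE" exec "pod/$POD" -- rabbitmqctl "$@"
}

# Stop the RabbitMQ application, issue force_boot, then show recent logs.
# Each step only runs if the previous one succeeded.
force_start_broker() {
  broker_ctl stop_app &&
    broker_ctl force_boot &&
    run kubectl -n "$NAMESPACE" logs "pod/$POD" --tail=50
}
```

After sourcing the snippet in a dplsh session, setting `DRY_RUN=1` before calling `force_start_broker` prints the `kubectl` commands without running them, which is a cheap way to review what would be executed against the cluster.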