Add how-to troubleshoot for charm deployments #953

berkayoz · 2025-01-13T11:03:39Z

Adds a How-To page on troubleshooting a Canonical Kubernetes charm deployment

louiseschmidtgen

Thanks for your great work Berkay!
Please consider the comments I made on https://github.com/canonical/k8s-snap/pull/943/files and also apply them to this PR.

louiseschmidtgen · 2025-01-13T15:48:48Z

docs/src/charm/howto/troubleshooting.md

+
+Maybe your issue has already been solved? Check out the [troubleshooting reference][charm-troubleshooting-reference] page to see a list of common issues and their solutions. Otherwise continue with this guide to help troubleshoot your {{product}} cluster.
+
+## Verify that the cluster status is ready


Suggested change

## Verify that the cluster status is ready

## Check the cluster status

duplication

louiseschmidtgen · 2025-01-13T15:49:28Z

docs/src/charm/howto/troubleshooting.md

+juju status
+```
+
+You should see output similar to the following:


Suggested change

You should see output similar to the following:

You should see command output similar to the following:

louiseschmidtgen · 2025-01-13T15:58:49Z

docs/src/charm/howto/troubleshooting.md

+0        started  10.94.106.136  juju-380ff2-0  [email protected]      Running
+1        started  10.94.106.154  juju-380ff2-1  [email protected]      Running
+```
+In this example we can glean some information. The `Workload` column will show the status of a given service. The `Message` section will show you the health of a given service in the cluster. During deployment and maintenance these workload statuses will update to reflect what a given node is doing. For example the workload may say `maintenance` while message will describe this maintenance as `Ensuring snap installation`.


Suggested change

In this example we can glean some information. The `Workload` column will show the status of a given service. The `Message` section will show you the health of a given service in the cluster. During deployment and maintenance these workload statuses will update to reflect what a given node is doing. For example the workload may say `maintenance` while message will describe this maintenance as `Ensuring snap installation`.

Interpreting the output:

- The `Workload` column shows the status of a given service.
- The `Message` column details the health of a given service in the cluster.
- The `Agent` column reflects any activity of the Juju agent.

During deployment and maintenance, the workload status reflects the node's activity. For example, the `Workload` column may display `maintenance` while the `Message` column reads `Ensuring snap installation`.
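Possible addition alongside this explanation: a small shell sketch that pulls only the unit name and the `Workload` and `Agent` columns out of `juju status`. It assumes the default tabular output, where each unit row starts with `<application>/<number>` and the leader unit carries a trailing `*`. The function name `list_unit_statuses` is hypothetical, and the parsing may need adjusting if the column layout differs between Juju versions.

```shell
# Sketch (assumes the default tabular `juju status` layout):
# print "<unit> <workload> <agent>" for every unit row.
list_unit_statuses() {
  juju status |
    awk '
      # Unit rows start with "<application>/<number>", optionally followed
      # by "*" on the leader unit. Machine and application rows do not.
      $1 ~ /^[a-z0-9-]+\/[0-9]+\*?$/ {
        sub(/\*$/, "", $1)   # drop the leader marker
        print $1, $2, $3     # unit, Workload, Agent
      }'
}

# Usage (on a machine with the Juju client, pointed at the cluster model):
#   list_unit_statuses
```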

louiseschmidtgen · 2025-01-13T16:03:19Z

docs/src/charm/howto/troubleshooting.md

+In this example we can glean some information. The `Workload` column will show the status of a given service. The `Message` section will show you the health of a given service in the cluster. During deployment and maintenance these workload statuses will update to reflect what a given node is doing. For example the workload may say `maintenance` while message will describe this maintenance as `Ensuring snap installation`.
+
+
+During normal operation the Workload should read `active`, the Agent column (which reflects what the Juju agent is doing) should read `idle`, and the messages will either say `Ready` or another descriptive term.


Suggested change

During normal operation the Workload should read `active`, the Agent column (which reflects what the Juju agent is doing) should read `idle`, and the messages will either say `Ready` or another descriptive term.

During normal cluster operation, the `Workload` column reads `active`, the `Agent` column shows `idle`, and the `Message` column reads either `Ready` or another descriptive message.
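If the guide wants a scriptable version of this "normal operation" check, a hedged sketch: it succeeds only when every unit row in `juju status` is `active`/`idle`, prints the units that are not, and fails if no unit rows are found at all. It assumes the default tabular layout, where unit rows start with `<application>/<number>` (leader suffixed with `*`); `cluster_ready` is a hypothetical helper name.

```shell
# Sketch (assumes the default tabular `juju status` layout):
# succeed only if every unit is active/idle, report the ones that are not.
cluster_ready() {
  juju status |
    awk '
      $1 ~ /^[a-z0-9-]+\/[0-9]+\*?$/ {
        units++
        if ($2 != "active" || $3 != "idle") {
          print "not ready:", $1, $2, $3
          bad++
        }
      }
      # Exit status 1 if no unit rows were found or any unit is not ready.
      END { if (units == 0 || bad > 0) exit 1; exit 0 }'
}

# Usage:
#   cluster_ready && echo "all units active/idle"
```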

louiseschmidtgen · 2025-01-13T16:03:41Z

docs/src/charm/howto/troubleshooting.md

+
+During normal operation the Workload should read `active`, the Agent column (which reflects what the Juju agent is doing) should read `idle`, and the messages will either say `Ready` or another descriptive term.
+
+## Verify that the API server is healthy


Suggested change

## Verify that the API server is healthy

## Verify the API server is healthy

louiseschmidtgen · 2025-01-13T16:14:19Z

docs/src/charm/howto/troubleshooting.md

+juju ssh <k8s/unit#>
+```
+
+Check the status of these services on the failing node by running the following command:


Suggested change

Check the status of these services on the failing node by running the following command:

Check the status of the services on the failing node by running:

Alternatives to "running the following command" are simply "run" or "execute".
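If the guide wants a quicker way to find which service is failing before running `systemctl status`, something like the following could work on the node itself (for example after `juju ssh <k8s/unit#>`). It only uses standard `systemctl list-units` options together with the `snap.k8s.*` unit prefix already used in the guide; `failed_k8s_services` is a hypothetical helper name.

```shell
# Sketch: list failed snap.k8s.* systemd units on the current node.
# Run on the node itself, e.g. after `juju ssh <k8s/unit#>`.
failed_k8s_services() {
  # --plain/--no-legend keep the output to one unit per line, name first.
  systemctl list-units --state=failed --plain --no-legend 'snap.k8s.*' |
    awk '{ print $1 }'
}

# Usage: inspect each failed unit in turn.
#   for unit in $(failed_k8s_services); do sudo systemctl status "$unit"; done
```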

louiseschmidtgen · 2025-01-13T16:16:33Z

docs/src/charm/howto/troubleshooting.md

+sudo systemctl status snap.k8s.<service>
+```
+
+The logs of a failing service can be checked by running the following command:


Suggested change

The logs of a failing service can be checked by running the following command:

Check the logs of a failing service by executing:
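A sketch of the kind of log command this sentence could introduce, assuming the service logs are read from the systemd journal with `journalctl` (the PR's actual command is not visible in this excerpt). `kubelet` in the usage line is only an example service name, the 200-line default is arbitrary, and `k8s_service_logs` is a hypothetical helper.

```shell
# Sketch: show recent journal entries for one snap.k8s.<service> unit.
# $1: service name without the "snap.k8s." prefix (e.g. kubelet)
# $2: number of lines to show (optional, defaults to an arbitrary 200)
k8s_service_logs() {
  local service="$1" lines="${2:-200}"
  sudo journalctl --unit="snap.k8s.${service}" --no-pager --lines="${lines}"
}

# Usage:
#   k8s_service_logs kubelet 500
```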

louiseschmidtgen · 2025-01-13T16:18:46Z

docs/src/charm/howto/troubleshooting.md

+
+## Collecting debug information
+
+To collect comprehensive debug output from your {{product}} cluster, install and run [juju-crashdump][] on a computer that has the Juju client installed, with the current controller and model pointing at your {{product}} deployment.


Suggested change

To collect comprehensive debug output from your {{product}} cluster, install and run [juju-crashdump][] on a computer that has the Juju client installed, with the current controller and model pointing at your {{product}} deployment.

To collect comprehensive debug output from your {{product}} cluster, install and run [juju-crashdump][] on a computer that has the Juju client installed. Please ensure that the current controller and model are pointing at your {{product}} deployment.
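One way to make the "current controller and model" requirement concrete: compare the output of `juju switch`, which without arguments typically prints `<controller>:<owner>/<model>`, with the expected value before running the crashdump. This is a sketch; `check_juju_model` and the `my-controller:k8s` value are hypothetical placeholders, and the `juju-crashdump` flags are the ones already used in this PR.

```shell
# Sketch: check that the current Juju controller/model matches the expected
# "<controller>:<model>" before collecting a crashdump.
check_juju_model() {
  local expected="$1" current controller model
  # Without arguments, `juju switch` prints "<controller>:<owner>/<model>".
  current="$(juju switch)"
  controller="${current%%:*}"
  model="${current#*:}"
  model="${model##*/}"   # drop the "<owner>/" prefix if present
  if [ "${controller}:${model}" != "$expected" ]; then
    echo "current model is ${controller}:${model}, expected ${expected}" >&2
    return 1
  fi
}

# Usage ("my-controller:k8s" is a placeholder):
#   check_juju_model my-controller:k8s && juju-crashdump -a debug-layer -a config
```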

louiseschmidtgen · 2025-01-13T16:19:37Z

docs/src/charm/howto/troubleshooting.md

+juju-crashdump -a debug-layer -a config
+```
+
+Running the `juju-crashdump` script will generate a tarball of debug information that includes systemd unit status and logs, Juju logs, charm unit data, and Kubernetes cluster information. It is recommended that you include this tarball when filing a bug.


Add a link to the systemd docs, and consider adding one for the juju-crashdump docs as well.

louiseschmidtgen · 2025-01-13T16:20:32Z

docs/src/charm/howto/troubleshooting.md

+juju-crashdump -a debug-layer -a config
+```
+
+Running the `juju-crashdump` script will generate a tarball of debug information that includes systemd unit status and logs, Juju logs, charm unit data, and Kubernetes cluster information. It is recommended that you include this tarball when filing a bug.


Suggested change

Running the `juju-crashdump` script will generate a tarball of debug information that includes systemd unit status and logs, Juju logs, charm unit data, and Kubernetes cluster information. It is recommended that you include this tarball when filing a bug.

Running the `juju-crashdump` script will generate a tarball of debug information that includes systemd unit status and logs, Juju logs, charm unit data, and Kubernetes cluster information. Please include the generated tarball when filing a bug.

eaudetcobello · 2025-01-13T16:49:20Z

docs/src/charm/howto/troubleshooting.md

+0        started  10.94.106.136  juju-380ff2-0  [email protected]      Running
+1        started  10.94.106.154  juju-380ff2-1  [email protected]      Running
+```
+In this example we can glean some information. The `Workload` column will show the status of a given service. The `Message` section will show you the health of a given service in the cluster. During deployment and maintenance these workload statuses will update to reflect what a given node is doing. For example the workload may say `maintenance` while message will describe this maintenance as `Ensuring snap installation`.


Suggested change

In this example we can glean some information. The `Workload` column will show the status of a given service. The `Message` section will show you the health of a given service in the cluster. During deployment and maintenance these workload statuses will update to reflect what a given node is doing. For example the workload may say `maintenance` while message will describe this maintenance as `Ensuring snap installation`.

In this example we can glean some information. The `Workload` column will show the status of a given service. The `Message` section will show you the health of a given service in the cluster. During deployment and maintenance these workload statuses will update to reflect what a given node is doing. For example the workload may say `maintenance` while `Message` will describe this maintenance as `Ensuring snap installation`.

Add how-to troubleshoot for charm deployments

398c4b4

berkayoz force-pushed the KU-2407/charm-troubleshooting branch from f185610 to 398c4b4 on January 13, 2025 11:16

louiseschmidtgen reviewed Jan 13, 2025

eaudetcobello reviewed Jan 13, 2025
