-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add how-to troubleshoot for charm deployments #953
base: main
Are you sure you want to change the base?
Conversation
f185610
to
398c4b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your great work Berkay!
Please consider the comments I made on https://github.com/canonical/k8s-snap/pull/943/files and also apply them to this PR.
|
||
Maybe your issue has already been solved? Check out the [troubleshooting reference][charm-troubleshooting-reference] page to see a list of common issues and their solutions. Otherwise continue with this guide to help troubleshoot your {{product}} cluster. | ||
|
||
## Verify that the cluster status is ready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## Verify that the cluster status is ready | |
## Check the cluster status |
duplication
juju status | ||
``` | ||
|
||
You should see output similar to the following: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should see output similar to the following: | |
You should see a command output similar to the following: |
0 started 10.94.106.136 juju-380ff2-0 [email protected] Running | ||
1 started 10.94.106.154 juju-380ff2-1 [email protected] Running | ||
``` | ||
In this example we can glean some information. The `Workload` column will show the status of a given service. The `Message` section will show you the health of a given service in the cluster. During deployment and maintenance these workload statuses will update to reflect what a given node is doing. For example the workload may say `maintenance` while message will describe this maintenance as `Ensuring snap installation`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this example we can glean some information. The `Workload` column will show the status of a given service. The `Message` section will show you the health of a given service in the cluster. During deployment and maintenance these workload statuses will update to reflect what a given node is doing. For example the workload may say `maintenance` while message will describe this maintenance as `Ensuring snap installation`. | |
Interpreting the Output: | |
- The `Workload` column shows the status of a given service. | |
- The `Message` section details the health of a given service in the cluster. | |
- The `Agent` column reflects any activity of the Juju agent. | |
During deployment and maintenance the workload status will reflect the node's activity. An example workload may display `maintenance` along with the message details: `Ensuring snap installation`. |
In this example we can glean some information. The `Workload` column will show the status of a given service. The `Message` section will show you the health of a given service in the cluster. During deployment and maintenance these workload statuses will update to reflect what a given node is doing. For example the workload may say `maintenance` while message will describe this maintenance as `Ensuring snap installation`. | ||
|
||
|
||
During normal operation the Workload should read `active`, the Agent column (which reflects what the Juju agent is doing) should read `idle`, and the messages will either say `Ready` or another descriptive term. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During normal operation the Workload should read `active`, the Agent column (which reflects what the Juju agent is doing) should read `idle`, and the messages will either say `Ready` or another descriptive term. | |
During normal cluster operation the `Workload` column reads `active`, the `Agent` column shows `idle`, and the messages will either read `Ready` or another descriptive term. |
|
||
During normal operation the Workload should read `active`, the Agent column (which reflects what the Juju agent is doing) should read `idle`, and the messages will either say `Ready` or another descriptive term. | ||
|
||
## Verify that the API server is healthy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## Verify that the API server is healthy | |
## Verify the API server is health |
juju ssh <k8s/unit#> | ||
``` | ||
|
||
Check the status of these services on the failing node by running the following command: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the status of these services on the failing node by running the following command: | |
Check the status of the services on the failing node by running: |
Alternate from running the following command
, Some other alternatives are justrun
, or execute
sudo systemctl status snap.k8s.<service> | ||
``` | ||
|
||
The logs of a failing service can be checked by running the following command: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logs of a failing service can be checked by running the following command: | |
Check the logs of a failing service by executing: |
|
||
## Collecting debug information | ||
|
||
To collect comprehensive debug output from your {{product}} cluster, install and run [juju-crashdump][] on a computer that has the Juju client installed, with the current controller and model pointing at your {{product}} deployment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To collect comprehensive debug output from your {{product}} cluster, install and run [juju-crashdump][] on a computer that has the Juju client installed, with the current controller and model pointing at your {{product}} deployment. | |
To collect comprehensive debug output from your {{product}} cluster, install and run [juju-crashdump][] on a computer that has the Juju client installed. Please ensure that the current controller and model are pointing at your {{product}} deployment. |
juju-crashdump -a debug-layer -a config | ||
``` | ||
|
||
Running the `juju-crashdump` script will generate a tarball of debug information that includes systemd unit status and logs, Juju logs, charm unit data, and Kubernetes cluster information. It is recommended that you include this tarball when filing a bug. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a systemd docs link and also consider adding one for the juju-crashdump docs
juju-crashdump -a debug-layer -a config | ||
``` | ||
|
||
Running the `juju-crashdump` script will generate a tarball of debug information that includes systemd unit status and logs, Juju logs, charm unit data, and Kubernetes cluster information. It is recommended that you include this tarball when filing a bug. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running the `juju-crashdump` script will generate a tarball of debug information that includes systemd unit status and logs, Juju logs, charm unit data, and Kubernetes cluster information. It is recommended that you include this tarball when filing a bug. | |
Running the `juju-crashdump` script will generate a tarball of debug information that includes systemd unit status and logs, Juju logs, charm unit data, and Kubernetes cluster information. Please include the generated tarball when filing a bug. |
0 started 10.94.106.136 juju-380ff2-0 [email protected] Running | ||
1 started 10.94.106.154 juju-380ff2-1 [email protected] Running | ||
``` | ||
In this example we can glean some information. The `Workload` column will show the status of a given service. The `Message` section will show you the health of a given service in the cluster. During deployment and maintenance these workload statuses will update to reflect what a given node is doing. For example the workload may say `maintenance` while message will describe this maintenance as `Ensuring snap installation`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this example we can glean some information. The `Workload` column will show the status of a given service. The `Message` section will show you the health of a given service in the cluster. During deployment and maintenance these workload statuses will update to reflect what a given node is doing. For example the workload may say `maintenance` while message will describe this maintenance as `Ensuring snap installation`. | |
In this example we can glean some information. The `Workload` column will show the status of a given service. The `Message` section will show you the health of a given service in the cluster. During deployment and maintenance these workload statuses will update to reflect what a given node is doing. For example the workload may say `maintenance` while `Message` will describe this maintenance as `Ensuring snap installation`. |
Adds a How-To page on troubleshooting a Canonical Kubernetes charm deployment