Job finished successfully message contains only job id #162

Open
mbukatov opened this issue Mar 12, 2018 · 8 comments

@mbukatov
Contributor

Description of the problem

When I open the Events page of the Tendrl UI, I see events like:

Job finished successfully (job_id: 4207477c-8101-4921-b48a-f66c4d028cb8)

I don't immediately see what kind of job it is.

This can be especially confusing when I see a lot of events like that, without any hint of what is wrong (if anything):

[Screenshot: screenshot_20180312_164500]

Note that in the screenshot above, the message about the successfully finished job repeats after a few minutes.

When I tried to dig deeper, I ran the following on the Tendrl server machine:

# grep -R 4207477c-8101-4921-b48a-f66c4d028cb8 /var/log/
/var/log/tendrl/node-agent/node-agent.log:Mar 12 15:56:49 mbukatov-usm1-server tendrl-node-agent: 2018-03-12 15:56:49.766151+00:00 - node_agent - /usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py:169 - process_job - INFO - Node (76bc408b-e51d-4530-8b30-29ee1f153e60)(type: node)(tags: [u'tendrl/node_76bc408b-e51d-4530-8b30-29ee1f153e60', u'tendrl/integration/monitoring', u'tendrl/central-store', u'tendrl/server', u'tendrl/monitor', u'tendrl/node']) will not process job-4207477c-8101-4921-b48a-f66c4d028cb8 (tags: tendrl/node_6f6e2269-bcf4-4889-82c7-9ba8ed8fb152)
/var/log/messages:Mar 12 15:56:49 mbukatov-usm1-server journal: 2018-03-12 15:56:49.766151+00:00 - node_agent - /usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py:169 - process_job - INFO - Node (76bc408b-e51d-4530-8b30-29ee1f153e60)(type: node)(tags: [u'tendrl/node_76bc408b-e51d-4530-8b30-29ee1f153e60', u'tendrl/integration/monitoring', u'tendrl/central-store', u'tendrl/server', u'tendrl/monitor', u'tendrl/node']) will not process job-4207477c-8101-4921-b48a-f66c4d028cb8 (tags: tendrl/node_6f6e2269-bcf4-4889-82c7-9ba8ed8fb152)

I see only a single log message related to this job (with two occurrences, one in the node-agent log and the other in /var/log/messages), which I read as:

Node 76bc408b-e51d-4530-8b30-29ee1f153e60 will not process job 4207477c-8101-4921-b48a-f66c4d028cb8

This doesn't help me much with debugging the event shown above, as it contradicts the original message (job finished successfully).
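To make the log line above easier to read, it can be split into its fields. The following is a minimal Python sketch that assumes the exact layout of the `process_job` INFO line quoted above (this is not a documented log format); `parse_skip_line` and `EXAMPLE_LINE` are illustrative names. It extracts the node id, the node's tags, the job id and the job's tags, so the two tag lists can be compared directly.

```python
import re

# Pattern for the "will not process job" INFO line quoted above. It only
# assumes the layout of that single line, not a documented log format.
SKIP_RE = re.compile(
    r"Node \((?P<node_id>[0-9a-f-]+)\)"
    r"\(type: (?P<node_type>\w+)\)"
    r"\(tags: \[(?P<node_tags>[^\]]*)\]\)"
    r" will not process job-(?P<job_id>[0-9a-f-]+)"
    r" \(tags: (?P<job_tags>[^)]*)\)"
)


def parse_skip_line(line):
    """Return node id, node tags, job id and job tags, or None if no match."""
    m = SKIP_RE.search(line)
    if m is None:
        return None
    # Node tags are a Python 2 list repr: u'tendrl/node_...', u'tendrl/server'
    node_tags = re.findall(r"u?'([^']*)'", m.group("node_tags"))
    # Job tags are printed bare; splitting on commas is an assumption
    # for the case of more than one tag.
    job_tags = [t.strip() for t in m.group("job_tags").split(",") if t.strip()]
    return {
        "node_id": m.group("node_id"),
        "node_tags": node_tags,
        "job_id": m.group("job_id"),
        "job_tags": job_tags,
    }


# The relevant part of the node-agent log line from this issue.
EXAMPLE_LINE = (
    "Node (76bc408b-e51d-4530-8b30-29ee1f153e60)(type: node)"
    "(tags: [u'tendrl/node_76bc408b-e51d-4530-8b30-29ee1f153e60', "
    "u'tendrl/integration/monitoring', u'tendrl/central-store', "
    "u'tendrl/server', u'tendrl/monitor', u'tendrl/node'])"
    " will not process job-4207477c-8101-4921-b48a-f66c4d028cb8"
    " (tags: tendrl/node_6f6e2269-bcf4-4889-82c7-9ba8ed8fb152)"
)

info = parse_skip_line(EXAMPLE_LINE)
# Job tags that the logging node does not carry.
missing = [t for t in info["job_tags"] if t not in info["node_tags"]]
print(missing)  # ['tendrl/node_6f6e2269-bcf4-4889-82c7-9ba8ed8fb152']
```

For this particular line, the job's only tag refers to node 6f6e2269-bcf4-4889-82c7-9ba8ed8fb152, which does not appear among the tags of node 76bc408b-e51d-4530-8b30-29ee1f153e60 that logged the message.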

Expected Result

The event description could contain more details, e.g. the job type, to improve the information delivered to the user.

Moreover, we need a description of the job id and of how to use it for debugging. In my case, I was unable to find any useful details about the event to go further.

Version

On Storage Servers:

# rpm -qa | egrep '(gluster|tendrl)'
glusterfs-api-4.1dev-0.115.git685d440.el7.centos.x86_64
glusterfs-events-4.1dev-0.115.git685d440.el7.centos.x86_64
tendrl-gluster-integration-1.6.1-1.el7.centos.noarch
tendrl-node-agent-1.6.1-1.el7.centos.noarch
python2-gluster-4.1dev-0.115.git685d440.el7.centos.x86_64
tendrl-collectd-selinux-1.5.4-2.el7.centos.noarch
glusterfs-fuse-4.1dev-0.115.git685d440.el7.centos.x86_64
glusterfs-server-4.1dev-0.115.git685d440.el7.centos.x86_64
glusterfs-geo-replication-4.1dev-0.115.git685d440.el7.centos.x86_64
tendrl-commons-1.6.1-1.el7.centos.noarch
glusterfs-libs-4.1dev-0.115.git685d440.el7.centos.x86_64
glusterfs-client-xlators-4.1dev-0.115.git685d440.el7.centos.x86_64
glusterfs-cli-4.1dev-0.115.git685d440.el7.centos.x86_64
tendrl-selinux-1.5.4-2.el7.centos.noarch
glusterfs-4.1dev-0.115.git685d440.el7.centos.x86_64

On Tendrl server:

# rpm -qa | egrep '(gluster|tendrl)'
tendrl-grafana-plugins-1.6.1-1.el7.centos.noarch
tendrl-monitoring-integration-1.6.1-1.el7.centos.noarch
tendrl-notifier-1.6.0-1.el7.centos.noarch
tendrl-api-httpd-1.6.1-1.el7.centos.noarch
tendrl-selinux-1.5.4-2.el7.centos.noarch
tendrl-node-agent-1.6.1-1.el7.centos.noarch
tendrl-ui-1.6.1-1.el7.centos.noarch
tendrl-grafana-selinux-1.5.4-2.el7.centos.noarch
tendrl-commons-1.6.1-1.el7.centos.noarch
tendrl-api-1.6.1-1.el7.centos.noarch

@mbukatov
Contributor Author

@fbalak I reported this as a suggestion to provide a better event description to help with debugging. I haven't reported the underlying problem itself, as it's likely caused by a glusterfs issue.

@r0h4n
Contributor

r0h4n commented Mar 13, 2018

@nthomas-redhat Please fix this along with other log message fixes as discussed at (https://docs.google.com/document/d/138SFPUlRqdLjISMcd-Cts-vWzY7wfTGWi8GhdQHnh0Q/edit)

@mbukatov
Contributor Author

mbukatov commented Mar 13, 2018

At today's architecture sync-up meeting, we decided to address it by:

  • enhancing messages in events and logs so that at least the job type is shown next to the job id
  • providing a debugging guide on how to get more details for a particular job id via curl and the etcd API (in read-only mode, to avoid messing with etcd data)
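A read-only lookup of the kind described in the second bullet could look like the following minimal Python sketch. It assumes an unauthenticated etcd v2 endpoint at `http://localhost:2379` and that Tendrl keeps each job under `/queue/<job_id>`; both are assumptions to verify against the actual deployment, and `job_key_url`, `flatten` and `fetch_job` are hypothetical helper names. It only issues a GET on the etcd v2 keys API (`/v2/keys/...?recursive=true`), which does not modify any data; the equivalent curl call is a plain `curl -s` on the same URL.

```python
import json
from urllib.request import urlopen

# Hypothetical default endpoint; adjust for the real Tendrl server.
ETCD_URL = "http://localhost:2379"


def job_key_url(job_id, base=ETCD_URL):
    # etcd v2 keys API. The "/queue/<job_id>" layout is an assumption
    # about where Tendrl stores jobs, not something confirmed here.
    return "%s/v2/keys/queue/%s?recursive=true" % (base.rstrip("/"), job_id)


def flatten(node, prefix=""):
    """Turn an etcd v2 'node' tree into {relative_key: value}."""
    out = {}
    for child in node.get("nodes", []):
        name = child["key"].rsplit("/", 1)[-1]
        key = prefix + name
        if child.get("dir"):
            # Recurse into sub-directories, keeping the path as a prefix.
            out.update(flatten(child, key + "/"))
        else:
            out[key] = child.get("value")
    return out


def fetch_job(job_id, base=ETCD_URL):
    """Read one job from etcd with a plain GET (no writes)."""
    with urlopen(job_key_url(job_id, base)) as resp:
        body = json.load(resp)
    return flatten(body["node"])
```

For example, `fetch_job("4207477c-8101-4921-b48a-f66c4d028cb8")` would return the job's keys (such as its status and payload, if they are stored there) as a flat dictionary. Flattening the tree keeps the output readable in a terminal, at the cost of dropping etcd metadata such as `modifiedIndex`.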

In the long term, we may need to add a Tendrl API endpoint and enhance the Tendrl UI to show details for a particular job id.

@julienlim
Member

julienlim commented Mar 14, 2018

@r0h4n @mbukatov @nthomas-redhat @a2batic @gnehapk @shirshendu @mcarrano

Looking at this and thinking about event details further, it appears we don't get much from the Events API at the moment: only message, timestamp, message_id, and priority.

Currently we appear to show only the message and timestamp.

+1 @mbukatov on needing more details on the particular job.

Here are some things that occur to me about showing the Event Details.

In the Events List, we should be showing a short event message and not a long, verbose event message. Moreover, the priority should be shown as well.

In the Event Details, we would show the event row/item again but with more details, e.g. the long event message along with its priority (if we don't show it in the Event List). In addition, if the event has a category/type, it would be good to show that too.

E.g.
Short msg == gluster-195d43d86fd38ba5929e44529d1fa0b985f42f03946e0bb5ada6999805556674 is healthy
Long (current) msg == Health status of cluster: gluster-195d43d86fd38ba5929e44529d1fa0b985f42f03946e0bb5ada6999805556674 changed from unhealthy to healthy

If the event reports that a job completed or failed, we should show details about the Flow that was run.

E.g. Current msg == Job finished successfully (job_id: 14e7207a-02d4-4e97-a0c7-214bf71a91e8)
Suggested short msg == (Job ID 14e7207a-02d4-4e97-a0c7-214bf71a91e8) completed successfully
If possible, we should make the Job Name and/or Job ID link to the task details, so that the user can see more about what was performed.
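The suggested short message above could be produced by a small formatter such as the following Python sketch. It assumes job statuses named `finished` and `failed` and a hypothetical `job_name` value (for example, the name of the Flow that was run), which the current events do not carry; `short_job_message` and `OUTCOMES` are illustrative names, not existing Tendrl code.

```python
# Hypothetical mapping from job status to wording for the Events List.
OUTCOMES = {
    "finished": "completed successfully",
    "failed": "failed",
}


def short_job_message(job_id, status, job_name=None):
    """Return a short event message such as '(Job ID ...) completed successfully'.

    job_name is optional because the current events only carry the job id;
    when given, it is placed in front of the job id.
    """
    outcome = OUTCOMES.get(status, status)  # fall back to the raw status
    label = "(Job ID %s)" % job_id
    if job_name:
        label = "%s %s" % (job_name, label)
    return "%s %s" % (label, outcome)


print(short_job_message("14e7207a-02d4-4e97-a0c7-214bf71a91e8", "finished"))
# (Job ID 14e7207a-02d4-4e97-a0c7-214bf71a91e8) completed successfully
```

Keeping the job id in the message while prefixing an optional name would let the UI turn either part into a link to the task details.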

Ideally, the event details would provide enough information to be actionable, with guidance on how to resolve a problem or failure if there is one.

Thoughts?

@mcarrano

I've created an Event Details page to display the details of an event as a drill-down from the events list. It is designed to display the full event message and link to any related resources. See https://redhat.invisionapp.com/share/HVGA7O575AZ#/285313287_Cluster_Details-Event_Detail

I should also note that the Event List, as designed, should display the event severity/priority before the short message.
Let me know if you have any questions.

@julienlim
Member

@r0h4n @mbukatov @nthomas-redhat @a2batic @gnehapk @shirshendu @mcarrano

Please note we've published the Event Details design. See the previous comment by @mcarrano.

@r0h4n
Contributor

r0h4n commented Mar 16, 2018

@julienlim

@nthomas-redhat is working on this issue; we are waiting for updates from him.

r0h4n added this to the Milestone 5 (2018) milestone on Apr 20, 2018
@r0h4n
Contributor

r0h4n commented Apr 20, 2018

@nthomas-redhat please close this if done

Labels: None yet
Projects: None yet
Development: No branches or pull requests

6 participants