various alerts numbers #373
Comments
@ltrilety I can't reproduce this with the latest master; I need more info.
If you are able to reproduce it, please share a screenshot as well.
@ltrilety, please provide the requested information. If the issue is not reproducible, we can close it.
@ltrilety we still need more info.
@GowthamShanmugam I tried it once more. You're right that the scenario didn't trigger the issue right away. However, when I left Tendrl running for about 4 days, the issue was hit again. Checked version:
@GowthamShanmugam nope, not this time. I didn't perform any un-manage of the cluster.
@ltrilety @r0h4n @shtripat The problem is that when glusterd is stopped for some time, the TTL on the volume object deletes the entire volume details, including the alert count for that volume. When glusterd is started again, the alert count is re-initialized to zero, but some warning alerts are still present for that particular volume. As a result, the cluster alert count does not match the volume alert count and the entire alert count calculation goes wrong. Shall we put a TTL only on deleted volumes?
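A minimal sketch of this failure mode, assuming the etcd v2 API via python-etcd and hypothetical key paths (not necessarily the paths Tendrl actually uses): when the alert counter lives inside a volume directory that carries a TTL, the counter expires together with the volume details and comes back as zero.

```python
# Sketch only: python-etcd (etcd v2 API), hypothetical key paths.
import etcd

client = etcd.Client(host='127.0.0.1', port=2379)

VOLUME_DIR = '/clusters/cid-1/Volumes/vol-1'  # hypothetical path

# The sync writes the volume details into a directory with a TTL so that
# stale objects eventually expire.
client.write(VOLUME_DIR, None, dir=True, ttl=60)
client.write(VOLUME_DIR + '/name', 'vol-1')
client.write(VOLUME_DIR + '/alert_counters/warning_count', '3')

# If glusterd stays down longer than the TTL, the whole directory expires,
# counter included. When the volume is re-synced the counter starts from 0
# even though the warning alerts themselves are still open, so the volume
# count and the cluster count drift apart.
```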
I understand that the alert counter is a different object from the volume. Even if the volume gets deleted due to the TTL, I feel the alert counter should still be retained, and that should work out. What do you say?
@r0h4n @shirshendu I discussed this problem with @shtripat. We increment the alert count when a warning alert arrives and decrement it when an info alert arrives. The problem is that if any single increment goes wrong, the entire alert calculation goes wrong. So @shtripat suggested that instead of storing the alert count in etcd, we calculate the alert count in the API and send it to the UI for the particular cluster, which would solve this issue. Suggestions please.
@r0h4n @shirshendu @nthomas-redhat The reason I suggest this is that maintaining counters can always be problematic with TTLs in place. What I suggest is to also maintain the alerts for volumes and bricks at locations in etcd as below
There is no need to maintain the counters at the individual entity level. The object model which REST uses to return details in the GET output for these entities can have additional fields for alert counters, and during the GET call the API layer can actually make two get calls to etcd: one for getting the entity and one for counting the number of alerts for the entity by looking at the alert locations above. @shirshendu thoughts?
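A rough sketch of that idea, again assuming python-etcd and hypothetical key paths rather than Tendrl's actual API code: the GET handler reads the entity and then derives the alert count by listing the alerts filed against it, so nothing ever has to be incremented or decremented.

```python
# Sketch only: derive the alert count at read time instead of persisting it.
import etcd

client = etcd.Client(host='127.0.0.1', port=2379)

def get_volume_with_alert_count(cluster_id, volume_id):
    # First call: the entity itself.
    volume = client.read('/clusters/%s/Volumes/%s' % (cluster_id, volume_id),
                         recursive=True)

    # Second call: the alerts stored against this entity; the count is simply
    # the number of leaf keys found, so no counter is maintained anywhere.
    try:
        alerts = client.read(
            '/alerting/clusters/%s/volumes/%s' % (cluster_id, volume_id),
            recursive=True)
        alert_count = sum(1 for node in alerts.leaves if not node.dir)
    except etcd.EtcdKeyNotFound:
        alert_count = 0

    return volume, alert_count
```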
@shtripat are you suggesting something like: (and I don't think /alerts/cluster/{cid}/bricks/{b_path}/{aid} is required). But @shtripat, in this case a lot of redundancy of alerts will happen, e.g. a brick alert needs to be stored in /alerting/alerts and the same alert would have to be stored in a lot of other places. @shtripat this is not the correct structure when Ceph comes into the picture.
The current alert directory structure does not fit the Concept B design. In Concept A we displayed clusters and nodes separately, but in the Concept B design we display clusters and, inside each cluster, its nodes. The problem is that when an alert is related to a node, only the alert count for that node is increased, while the cluster alert count still shows 0. Since the node is inside the cluster, I feel a node alert should also increment the cluster alert count. The structure which I suggest is
Using this structure, deleting alerts for a deleted volume or node is possible, and the alert count logic is not required either. I don't think much effort is required to make this change. @shirshendu @shtripat @nthomas-redhat @r0h4n suggestions please
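Under a per-entity layout like the one proposed here (the exact paths are not quoted in the thread, so the ones below are hypothetical), cleaning up after a deleted volume reduces to a single recursive delete, and no counter ever needs adjusting:

```python
# Sketch only: hypothetical per-entity alert directory, python-etcd.
import etcd

client = etcd.Client(host='127.0.0.1', port=2379)

def purge_volume_alerts(cluster_id, volume_id):
    """Drop every alert filed against a volume that no longer exists."""
    try:
        client.delete('/alerting/clusters/%s/volumes/%s'
                      % (cluster_id, volume_id),
                      recursive=True, dir=True)
    except etcd.EtcdKeyNotFound:
        pass  # nothing was filed against this volume
```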
Any changes to the alerts structure will need a corresponding change in the API as well, because we need to read the new structure to be able to present it through the API. At this point in the release cycle, I would recommend against changing the fundamental way alerts are saved. @r0h4n, I would like to hear your thoughts here as well. Edit: That said, changes to the API are definitely possible if the final consensus is that we should change the alerts structure.
@GowthamShanmugam as discussed, let's keep the dir structure as below
This way the last 3 would be used for counting purposes only, and the first one for maintaining all the alerts across a cluster.
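As a loose illustration of that split (the exact directory names are not quoted in this thread, so the paths below are assumptions): the full alert body is written once into the cluster-wide directory, and a lightweight reference is written under the affected entity purely so it can be counted.

```python
# Sketch only: hypothetical paths illustrating "one directory for all alerts,
# per-entity directories used only for counting"; not Tendrl's actual layout.
import etcd
import json
import uuid

client = etcd.Client(host='127.0.0.1', port=2379)

def record_volume_alert(cluster_id, volume_id, alert):
    alert_id = str(uuid.uuid4())

    # Full alert body, stored once for the whole cluster.
    client.write('/alerting/clusters/%s/alerts/%s' % (cluster_id, alert_id),
                 json.dumps(alert))

    # Lightweight reference under the volume, used only to count its alerts.
    client.write('/alerting/clusters/%s/volumes/%s/%s'
                 % (cluster_id, volume_id, alert_id), alert_id)
    return alert_id
```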
Please verify and close this issue.
The number of cluster alerts does not always correspond to the other available alert numbers. Moreover, those numbers do not always correspond to the actual state of the cluster.
E.g.
In both cases the cluster was completely up and running.
Reproduction steps:
Tendrl version:
tendrl-commons-1.6.1-1.el7.centos.noarch
tendrl-api-1.6.1-1.el7.centos.noarch
tendrl-ui-1.6.1-1.el7.centos.noarch
tendrl-grafana-selinux-1.5.4-2.el7.centos.noarch
tendrl-ansible-1.6.1-1.el7.centos.noarch
tendrl-notifier-1.6.0-1.el7.centos.noarch
tendrl-node-agent-1.6.1-1.el7.centos.noarch
tendrl-api-httpd-1.6.1-1.el7.centos.noarch
tendrl-selinux-1.5.4-2.el7.centos.noarch
tendrl-grafana-plugins-1.6.1-1.el7.centos.noarch
tendrl-monitoring-integration-1.6.1-1.el7.centos.noarch