-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metric cardinality #43
Comments
I'm trying to imagine who/what these metrics would be used for. If it's for the user to monitory how close they are to a violation we could gauge Another case might be debugging why the plugin has decided to throttle. The throttling has kicked in and the user is trying to trace why and what was observed. Maybe this is better handled with logging. When we decide to change the factor we could log more about the observation, like |
Do we need to know the metrics from the plugin in all 59 brokers when 1 broker has some issue. It is enough to know that a subset of the brokers all agree that that 1 broker has a problem? If so, then we could construct some expression involving broker ids, the cluster size and the number of observations we do want and use that to drive which brokers publish metrics about which other others. Something like |
Yes the cardinality explosion is potentially problematic however there a few separate mitigations here.
In small clusters I think we do want the full breakdown of metrics so that users can identify split brain/connectivity issues or issues in the plug-in. Potential mitigations would be:
Compromising the experience for the majority of users just in case we potentially inconvenience exceptional users seems like the wrong trade off to me. So I suggest we close this for now, and revisit if it actually proves to be an issue for users. |
Now I see that tags like this it makes me wonder about the metric cardinality we have going on here. It's fine for smaller clusters, but it grows like$numBrokers^2$ , so a 60 broker cluster would have 3,600 metrics for each of these.
Originally posted by @tombentley in #42 (comment)
The text was updated successfully, but these errors were encountered: