
Prometheus metrics generating without labels causing "Error on ingesting samples with different value but same timestamp" #666

Open
nicholaskuechler opened this issue Jan 16, 2025 · 0 comments · May be fixed by #668

Environment

  • Python version: 3.12.8
  • Nautobot version: 2.3.16
  • nautobot-ssot version: 3.4.0

Expected Behavior

Enabling Nautobot metrics does not cause kube-prometheus to fire PrometheusDuplicateTimestamps alerts.

Observed Behavior

Enabling Prometheus metrics in the Nautobot Helm chart works, but the metrics produced cause Prometheus to log ingestion warnings:

ts=2025-01-12T06:58:48.519Z caller=scrape.go:1754 level=warn component="scrape manager" scrape_pool=serviceMonitor/nautobot/nautobot-default/0 target=http://10.64.49.49:8080/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=4
ts=2025-01-12T06:59:00.652Z caller=scrape.go:1754 level=warn component="scrape manager" scrape_pool=serviceMonitor/nautobot/nautobot-default/0 target=http://10.64.50.125:8080/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=4
ts=2025-01-12T06:59:48.486Z caller=scrape.go:1754 level=warn component="scrape manager" scrape_pool=serviceMonitor/nautobot/nautobot-default/0 target=http://10.64.49.49:8080/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=4
ts=2025-01-12T07:00:00.145Z caller=scrape.go:1754 level=warn component="scrape manager" scrape_pool=serviceMonitor/nautobot/nautobot-default/0 target=http://10.64.50.125:8080/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=4

While troubleshooting the scraped metrics, I found the duplicates were generated by the SSoT app:

# HELP nautobot_ssot_sync_memory_usage_bytes Nautobot SSoT Sync Memory Usage
# TYPE nautobot_ssot_sync_memory_usage_bytes gauge
nautobot_ssot_sync_memory_usage_bytes{job="",phase=""} 0.0
nautobot_ssot_sync_memory_usage_bytes{job="",phase=""} 0.0
nautobot_ssot_sync_memory_usage_bytes{job="",phase=""} 0.0
nautobot_ssot_sync_memory_usage_bytes{job="",phase=""} 0.0
nautobot_ssot_sync_memory_usage_bytes{job="",phase=""} 0.0

"nautobot_ssot_sync_memory_usage_bytes", "Nautobot SSoT Sync Memory Usage", labels=["phase", "job"]

I wonder if https://github.com/nautobot/nautobot-app-ssot/blob/develop/nautobot_ssot/metrics.py#L141-L142 should just be removed, or maybe changed to include the job label, which would produce unique metrics. Something like this instead of empty labels:

        else:
            # Provide a value for both declared labels ("phase" and "job") so
            # each fallback sample keeps a unique label set.
            memory_gauge.add_metric(labels=["", ".".join(job.natural_key())], value=0)
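
Either option would remove the duplicate {job="",phase=""} series: dropping the lines entirely means jobs with no recorded memory data simply emit no sample, which is probably fine for a gauge like this, while keeping a zero-value sample labelled with the job's natural key preserves a per-job series at the cost of exposing extra zero-valued metrics.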

Steps to Reproduce

  1. Using kube-prometheus stack: https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/README.md
  2. Enable metrics in nautobot helm chart: https://github.com/nautobot/helm-charts/blob/develop/charts/nautobot/values.yaml#L841-L843
  3. Metrics work, but it appears that in certain scenarios they may be produced with empty labels and duplicated series (a quick check for this is sketched after this list):
     1. Prometheus logs: "Error on ingesting samples with different value but same timestamp"
     2. By default, the kube-prometheus stack includes a PrometheusDuplicateTimestamps alert, which fires: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheusduplicatetimestamps/
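
To confirm the duplicates against a running pod, a check like the following can be run against the scrape endpoint (a sketch; the target URL is the example from the log lines above, and prometheus_client must be installed wherever this runs):

from collections import Counter
from urllib.request import urlopen

from prometheus_client.parser import text_string_to_metric_families

text = urlopen("http://10.64.49.49:8080/metrics").read().decode()

# Count every (metric name, label set) pair; anything seen more than once is a
# duplicate series that Prometheus will refuse to ingest in a single scrape.
series = Counter(
    (sample.name, tuple(sorted(sample.labels.items())))
    for family in text_string_to_metric_families(text)
    for sample in family.samples
)
for (name, labels), count in sorted(series.items()):
    if count > 1:
        print(f"duplicate series: {name} {dict(labels)} x{count}")
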
@jdrew82 added the "status: accepted" and "type: bug" labels Jan 16, 2025
@jdrew82 self-assigned this Jan 16, 2025
@jdrew82 linked a pull request Jan 17, 2025 that will close this issue