Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add basic exponential histogram #1736

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

xuan-cao-swi
Copy link
Contributor

@xuan-cao-swi xuan-cao-swi commented Sep 27, 2024

Description

This PR introduces basic functionality for exponential histograms, similar to Python and JavaScript. Most of the code is adapted/copied from Python. The only missing feature compared to other languages is merging.

Exponential histogram merging aims to make bucket sizes consistent. With the current otel-ruby metrics collection mechanism, raw data is sent unmodified to the collector (e.g., from @data_point).

I think it's best to implement the merge feature on the collector side to reduce duplicate work and computation on the client side. Downscaling is implemented on the language side to adjust bucket sizes, avoiding out-of-range indices for very large or small numbers.

Test

Most test cases are copied from Python/JavaScript to ensure the correctness of mapping and downscaling.

I didn't add all test case from Python as I think some of the test cases are redundant (e.g. 0 increment, etc.) but please correct me if I am wrong.

Copy link
Contributor

👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot

Copy link
Contributor

@kaylareopelle kaylareopelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @xuan-cao-swi!

Comment on lines +17 to +18
# Contains the implementation of the ExponentialBucketHistogram aggregation
# https://opentelemetry.io/docs/specs/otel/metrics/data-model/#exponentialhistogram
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line might be too long, but the link might look better in the YARD doc UI...

Suggested change
# Contains the implementation of the ExponentialBucketHistogram aggregation
# https://opentelemetry.io/docs/specs/otel/metrics/data-model/#exponentialhistogram
# Contains the implementation of the {https://opentelemetry.io/docs/specs/otel/metrics/data-model/#exponentialhistogram ExponentialBucketHistogram} aggregation

MIN_SCALE = -10
MAX_SIZE = 160

# The default boundaries is calculated based on default max_size and max_scale value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# The default boundaries is calculated based on default max_size and max_scale value
# The default boundaries are calculated based on default max_size and max_scale values


# The default boundaries is calculated based on default max_size and max_scale value
def initialize(
aggregation_temporality: ENV.fetch('OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE', :delta), # TODO: aggregation_temporality should be renamed to collect_aggregation_temporality for clear definition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand why this should be renamed. Can you tell me more? Is there a spec reference?

record_min_max: true,
zero_threshold: 0
)
@data_points = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe @data_points was refactored out of the other aggregations when views were merged in. Why save them as an instance variable in this aggregation?

end

def validate_scale(scale)
return scale unless scale > MAX_SCALE || scale < MIN_SCALE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how performant it is, but another way to implement this could be:

(MIN_SCALE..MAX_SCALE).cover?(scale)


def downscale(change, positive, negative)
return if change == 0
raise 'Invalid change of scale' if change.negative?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this get caught by the error handler? I'm concerned if we raise, we might crash a user's app.

module Aggregation
module ExponentialHistogram
# Buckets is the fundamental building block of exponential histogram that store bucket/boundary value
class Buckets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this class have test coverage?

Comment on lines +24 to +25
let(:start_time) { (Time.now.to_r * 1_000_000_000).to_i }
let(:end_time) { ((Time.now + 60).to_r * 1_000_000_000).to_i }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using Process.clock_gettime(Process::CLOCK_REALTIME, :nanosecond)?

Comment on lines +69 to +71
# The corresponding Go test is TestAlternatingGrowth1 where:
# agg := NewFloat64(NewConfig(WithMaxSize(4)))
# agg is an instance of github.com/lightstep/otel-launcher-go/lightstep/sdk/metric/aggregator/histogram/structure.Histogram[float64]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I can't get this link to load. Is there a permalink option?

_(exphdps[1].zero_threshold).must_equal(0)
end

it 'rescale_with_alternating_growth_0' do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
it 'rescale_with_alternating_growth_0' do
it 'rescales with alternating growth 0' do

correct_boundary = right_boundary(scale, correct_min_index)

# truffleruby will fail because truffleruby `must_equal` check the exact precision of float point
_(correct_boundary).must_equal(boundary) if RUBY_ENGINE != 'truffleruby'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Truffleruby will compare the exact float point when use must_equal.

For example, for scale =-10, boundary is 5.562684646268003e-309; correct_boundary is 1/179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137216,

Since 5.562684646268003e-309 will assume all number after 3 will be 0, so it's different compare to correct_boundary. Truffleruby will treat them as not equal but other ruby variation will accept it.

Let me know if I miss something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: To do
Development

Successfully merging this pull request may close these issues.

Implement ExponentialBucketHistogram aggregation
2 participants