Propose using a different schema to represent Events in a span #37028
Comments
Hey 👏 with the default mapping mode (which is the mode So for traces, the For that mapping mode there was already a lot of thinking on how data is stored. When it comes to how OTel data is stored in Elasticsearch, with the All I'm saying here is meant for the
The idea is that events are modelled as log records, therefore they'll end up in the logs data stream - that's very natural for log events, may not be for span events, but I think so far, the idea is that all events will end up in the same data stream. You can of course connect back those span events to the original span via the span id. This is aligned with what's stated here in the docs. It's interesting to see that in OTLP, there is a specific type for span events which is totally different from log events.
In any case - with the So, I'd say the Having said that, a few thoughts on the default mapping mode (
That's clearly technically possible, and would indeed be an issue. Honestly, I'd have expected the spec to say something about event name cardinality, but all I can find is this part here:
With that, I'd say that if the user follows the spec, there should not be a cardinality explosion. On the other hand, the events API easily allows this, so I think you raise a good point here.
Component(s)
exporter/elasticsearch
Is your feature request related to a problem? Please describe.
When storing Span `Events` in Elasticsearch, the event `name` becomes the key in the default mapping mode, under which different attributes are stored. For example, if we have events named "my-event-1" and "my-event-2", then in Elasticsearch we'll have `Events.my-event-1.time`, `Events.my-event-2.time`, etc. This does not seem to follow the data format for span events in the OpenTelemetry Collector, where they are modeled as an array of `Span_Event`, and each `Span_Event` contains fields like `time`, `name` and an array of `attributes`.
The issue I see with this approach is that if `name` is given arbitrary values (e.g. random UUIDs), then the number of keys can grow without bound, leading to a mapping field explosion in Elasticsearch.
Describe the solution you'd like
Store span events as an array in Elasticsearch, in which each element is an object with the fields `time`, `name` and an array of `attribute` (with dropped attribute counts as another possible field, like the `Span_Event` class).
Admittedly this format may require nested objects, which may bring their own performance issues, but it more closely resembles the data layout of OpenTelemetry pdata.
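
To make the difference between the two layouts concrete, here is a rough sketch that counts the distinct field paths each one would produce for 100 events with random UUID names. The helper names (`flatten_paths`, `keyed_by_name`, `as_array`) and the exact document shapes are illustrative assumptions, not what the exporter actually emits:

```python
from uuid import uuid4


def flatten_paths(doc, prefix=""):
    """Return the set of leaf field paths in a document.

    Lists are traversed without adding an index to the path, which mirrors
    how an array of objects shares one set of mapped fields.
    """
    paths = set()
    if isinstance(doc, dict):
        for key, value in doc.items():
            paths |= flatten_paths(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(doc, list):
        for item in doc:
            paths |= flatten_paths(item, prefix)
    else:
        paths.add(prefix)
    return paths


def keyed_by_name(events):
    # Approximation of the layout described above: the event name is a field key.
    return {
        "Events": {
            e["name"]: {"time": e["time"], "attributes": e["attributes"]}
            for e in events
        }
    }


def as_array(events):
    # Proposed layout: one object per event; the name is a value, not a key.
    return {
        "Events": [
            {"time": e["time"], "name": e["name"], "attributes": e["attributes"]}
            for e in events
        ]
    }


# 100 events with arbitrary names (assumed input shape, attributes as key/value pairs).
events = [
    {
        "name": str(uuid4()),
        "time": 0,
        "attributes": [{"key": "retry", "value": i}],
    }
    for i in range(100)
]

keyed_fields = flatten_paths(keyed_by_name(events))
array_fields = flatten_paths(as_array(events))

print(len(keyed_fields))     # grows with the number of distinct event names
print(sorted(array_fields))  # stays fixed regardless of event names
```

With the name used as a key, every new event name adds new field paths; with the array layout the set of paths is the same no matter how many distinct names appear.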
Describe alternatives you've considered
No response
Additional context
The schema proposed above would follow the same format as spans: just as we have `Span.Name` and `Span.Attributes`, we'd have `Event.Name` and `Event.Attributes`. This more closely represents the `Event` as defined in OpenTelemetry.
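
As a rough illustration of what such a schema might look like on the Elasticsearch side, the sketch below builds an index mapping body and a query body as plain Python dicts. The field names (`Events`, `Name`, `Time`, `Attributes`, `DroppedAttributesCount`) and the choice of the `flattened` type for attributes are assumptions made for this example, not an agreed design; `nested`, `keyword`, `date`, `flattened` and `long` are existing Elasticsearch field types.

```python
import json

# Hypothetical mapping for the proposed layout. Field names are assumptions
# that mirror the Event.Name / Event.Attributes naming used in this issue.
EVENTS_MAPPING = {
    "mappings": {
        "properties": {
            "Events": {
                # "nested" keeps each event's fields together when querying.
                "type": "nested",
                "properties": {
                    "Name": {"type": "keyword"},
                    "Time": {"type": "date"},
                    # "flattened" maps the whole object as one field, so
                    # arbitrary attribute keys do not add mapped fields.
                    "Attributes": {"type": "flattened"},
                    "DroppedAttributesCount": {"type": "long"},
                },
            }
        }
    }
}


def spans_with_event(name):
    """Build a nested query body matching spans that contain an event with this name."""
    return {
        "query": {
            "nested": {
                "path": "Events",
                "query": {"term": {"Events.Name": name}},
            }
        }
    }


print(json.dumps(spans_with_event("my-event-1"), indent=2))
```

Because the event name is stored as a `keyword` value rather than as part of a field path, the number of mapped fields stays constant; the cost is that queries on events have to go through a `nested` query like the one above.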