
Enhance Kafka exporter to respect max message size #36982

Open
yurishkuro opened this issue Dec 29, 2024 · 8 comments
Assignees
Labels
enhancement New feature or request exporter/kafka good first issue Good for newcomers help wanted Extra attention is needed

Comments

@yurishkuro
Member

Component(s)

exporter/kafka

Is your feature request related to a problem? Please describe.

The exporter has a config option MaxMessageBytes, but it does not respect it itself: it attempts to send the full serialized payload to the Kafka driver, which may reject it based on this setting.

Describe the solution you'd like

Most payloads can be safely split into chunks of a "safe" size that the Kafka driver will accept. For example, Jaeger's integration tests include a test that writes a trace with 10k spans, which is about 3 MB when serialized as JSON. That trace can be trivially split into multiple messages that each fit within the default 1 MB size limit.

Describe alternatives you've considered

No response

Additional context

jaegertracing/jaeger#6437 (comment)

@yurishkuro yurishkuro added enhancement New feature or request needs triage New item requiring triage help wanted Extra attention is needed good first issue Good for newcomers labels Dec 29, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@RenuBhati

Hey @yurishkuro, I would like to contribute to this issue. Can you please assign it to me?

@RenuBhati

Hey @yurishkuro, I’m tied up with another project and it’s taking longer than expected. I won't be able to pick this up. Thanks for your understanding.

@RenuBhati RenuBhati removed their assignment Dec 31, 2024
@LZiHaN

LZiHaN commented Jan 1, 2025

Hi @yurishkuro, is this task still available? I'd like to give it a try; please assign it to me.

@JaredTan95 JaredTan95 removed the needs triage New item requiring triage label Jan 3, 2025
@chahatsagarmain

@LZiHaN are you working on this?

@LZiHaN

LZiHaN commented Jan 5, 2025

> @LZiHaN are you working on this?

Yes, I'm working on it.

@LZiHaN

LZiHaN commented Jan 8, 2025

Hi @yurishkuro ,

I’m working on implementing this feature and wanted to confirm if the approach I’m considering for message splitting and reassembling is feasible.
My plan is as follows:

  1. Splitting the message: when a message exceeds MaxMessageBytes, I will split it into multiple chunks. Each chunk will include the following information in the message headers:
  • message_id (identifies the original message),
  • chunk_index (the position of this chunk within the message),
  • total_chunks (the total number of chunks for the message).
  2. Reassembling the message: on the consumer side, I will group chunks by the message_id header and order them by chunk_index. Once all total_chunks chunks have been received, I will concatenate the Value fields from each chunk to reassemble the original message.

Is this approach viable, and would it work seamlessly with the Kafka producer/consumer setup? Or are there any potential issues with storing this information in the headers and reassembling the message on the consumer side? Looking forward to your feedback.
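The split-and-reassemble idea above could be sketched roughly as follows. This is an illustrative sketch only, not exporter code: the `chunk` type and the `splitPayload`/`reassemble` functions are hypothetical names, and a real implementation would carry the metadata in Kafka record headers rather than struct fields.

```go
package main

import "fmt"

// chunk carries the metadata the proposal would put in Kafka message
// headers: message_id, chunk_index, total_chunks (names from the comment
// above). Hypothetical type for illustration only.
type chunk struct {
	MessageID   string
	ChunkIndex  int
	TotalChunks int
	Value       []byte
}

// splitPayload cuts a serialized payload into chunks of at most maxBytes.
func splitPayload(id string, payload []byte, maxBytes int) []chunk {
	total := (len(payload) + maxBytes - 1) / maxBytes
	chunks := make([]chunk, 0, total)
	for i := 0; i < total; i++ {
		end := (i + 1) * maxBytes
		if end > len(payload) {
			end = len(payload)
		}
		chunks = append(chunks, chunk{id, i, total, payload[i*maxBytes : end]})
	}
	return chunks
}

// reassemble orders chunks by chunk_index and concatenates their Values.
func reassemble(chunks []chunk) []byte {
	ordered := make([][]byte, chunks[0].TotalChunks)
	for _, c := range chunks {
		ordered[c.ChunkIndex] = c.Value
	}
	var out []byte
	for _, v := range ordered {
		out = append(out, v...)
	}
	return out
}

func main() {
	payload := []byte("a 3MB trace would be cut the same way as this string")
	chunks := splitPayload("trace-1", payload, 10)
	fmt.Println(len(chunks), string(reassemble(chunks)) == string(payload))
}
```

Note that this round-trips only if the consumer runs the matching `reassemble` logic, which is exactly the compatibility concern raised in the next comment.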

@yurishkuro
Member Author

@LZiHaN this is a possible approach, but it would be a breaking change: a consumer that does not understand this chunking may not be able to reassemble the message. My idea was that we instead split the spans from the payload into multiple payloads, such that each payload fits within MaxMessageBytes when serialized. It's not entirely simple to implement, because the payload may contain one huge span, but if we can split it this way it is a fully backwards-compatible solution.
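The backwards-compatible splitting suggested here could be sketched as a greedy grouping by serialized size. This is only a sketch under stated assumptions: `splitBySize` is a hypothetical name, and plain byte counts stand in for the marshaled sizes of real pdata spans that the exporter would actually measure.

```go
package main

import "fmt"

// splitBySize greedily groups span sizes into batches whose combined
// size stays within maxBytes. A span that alone exceeds maxBytes becomes
// its own oversized batch, mirroring the "one huge span" caveat: such a
// batch would still be rejected by the driver and needs separate handling.
func splitBySize(spanSizes []int, maxBytes int) [][]int {
	var batches [][]int
	var cur []int
	curSize := 0
	for _, s := range spanSizes {
		if curSize+s > maxBytes && len(cur) > 0 {
			batches = append(batches, cur)
			cur, curSize = nil, 0
		}
		cur = append(cur, s)
		curSize += s
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	// Roughly the Jaeger test case: 10k spans of ~300 bytes is ~3 MB;
	// with a 1 MB limit the greedy grouping yields a handful of batches,
	// each of which serializes under the limit.
	sizes := make([]int, 10000)
	for i := range sizes {
		sizes[i] = 300
	}
	fmt.Println(len(splitBySize(sizes, 1000000)))
}
```

Because each batch is just a smaller payload in the same wire format, existing consumers keep working unchanged, which is what makes this variant backwards compatible.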
