Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Open Tracing support (WIP) #1345

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft

Add Open Tracing support (WIP) #1345

wants to merge 2 commits into from

Conversation

erikvanoosten
Copy link
Collaborator

@erikvanoosten erikvanoosten commented Oct 19, 2024

Add Open Tracing support to zio-kafka. This PR is not finished yet, but it will implement the following changes:

  • consumer and producer aspects
  • insert Open Tracing compatible Kafka headers into each outgoing record
  • pick up Open Tracing compatible Kafka headers from each incoming record

This is still Work In Progress!

Also:

  • new feature: producer aspects
  • use ByteRecord alias in Producer trait

TODO:

  • all code to be verified by OTEL expert
  • add otel bagage support
  • add support in consumer
  • add tests
  • consider not mutating record headers in TracingProducerAspect.traced

@grouzen
Copy link

grouzen commented Oct 21, 2024

I want to answer here the questions asked in Discord

I've done some quick research on Kafka support in OpenTracing and OpenTelemetry, and here is what I found. There is an article in the OTEL blog specifically for Kafka: https://opentelemetry.io/blog/2022/instrument-kafka-clients/. It discusses three approaches to instrumenting Kafka consumers and producers. One of them is wrapping the producer/consumer into tracing implementations. It is what you've done in this PR. My idea is to re-use the already existing Java library opentelemetry-kafka-clients-2.6 which is part of opentelemetry-java-instrumentation (https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/kafka). Opentracing provides its own tracing kafka client implementation. Wrapping the java implementation is preferable in my opinion because it is already tested by many users so why to re-invent the wheel here. We can ship it as aspects to make it more scala/zio idiomatic (that's how you've already done it). We will only need to check one thing - whether the child spans created with the zio-opentelemetry/zio-opentracing will correctly take into account the parent's context. I can say for sure that for this we need to use OpenTelemetry.contextJVM. I'm not sure it will work properly, so maybe we would need to fall back to pure scala solution that uses ContextStorage.fibeRef. But at least we have a reference java implementation that we can copy (this hopefully answers your question about the tracing consumer implementation)

In general, I believe it is good to have both auto and manual instrumentation. Some users might not want to use auto instrumentation, so it is good to give them this option. Moreover, in the case of opentracing, the auto instrumentation agent for Kafka is unavailable (at least I couldn't find it).

Also, as far as I understand, the wrapping approach allows to customize the list of Kafka headers that the user may choose to use as span attributes (at least for OTEL).

Additionally, you can have a look at trace4cats-zio-extras for inspiration.

@erikvanoosten
Copy link
Collaborator Author

erikvanoosten commented Oct 24, 2024

@grouzen Thanks for your analysis. I agree, this definitely needs testing; will the kafka instrumentation from the java agent pick up the right context out of the box?

However, my assumption is that it will not. The zio-kafka producer and consumer run in different threads/fibers than the user application. I do not see how the java agent could support this.

@grouzen
Copy link

grouzen commented Oct 24, 2024

@grouzen Thanks for your analysis. I agree, this definitely needs testing; will the kafka instrumentation from the java agent pick up the right context out of the box?

However, my assumption is that it will not. The zio-kafka producer and consumer run in different threads/fibers than the user application. I do not see how the java agent could support this.

Yes, it will. For this, you need to turn on the ZIO support in the Java agent (open-telemetry/opentelemetry-java-instrumentation#7980) and use the OpenTelemetry.contextJVM layer, which provides a ContextStorage instance that uses the native OTEL Context stored in ThreadLocal.

See also the zio-telemetry docs: https://zio.dev/zio-telemetry/opentelemetry#usage-with-opentelemetry-automatic-instrumentation

@erikvanoosten
Copy link
Collaborator Author

erikvanoosten commented Oct 24, 2024

Yes, it will...

Yeah, I am sure it will pick up some span. My point is, will it pick up the span of the fiber that created the producer, or the span of the fiber that produces records? I strongly suspect it will be the former, while IMHO it should be the latter.

This is because with zio-kafka, the producers and consumers never run on the thread/fiber that actually pushed/polled the records from the kafka consumer/producer.

@grouzen
Copy link

grouzen commented Oct 24, 2024

Yes, it will...

Yeah, I am sure it will pick up some span. My point is, will it pick up the span of the fiber that created the producer, or the span of the fiber that produces records? I strongly suspect it will be the former, while IMHO it should be the latter.

This is because with zio-kafka, the producers and consumers never run on the thread/fiber that actually pushed/polled the records from the kafka consumer/producer.

Ah, it makes sense. I didn't know about this nuance of zio-kafka. In this case I can say it won't work as intended and we need to figure out how to overcome this.

Also:
- producer aspects
- use ByteRecord alias

TODO:
- all code to be verified by OT expert
- add support in consumer
- add tests
- consider not mutating record headers in `TracingProducerAspect.traced`
@joost-de-vries
Copy link

joost-de-vries commented Oct 29, 2024

The java kafka library is already being otel instrumented. If zio-kafka would preserve the fiber context from the application code to the java code (and vice versa for consuming) then we could have tracing 'out of the box'. And we wouldn't need to add aspects in the application code.
That would be nice.
It should be feasible: we do have working tracing with zio -> doobie -> jdbc without aspect code. That works because of the otel instrumentation of jdbc. And the fact that the zio fiber context trace context is the same.
I think java kafka should be a similar case.
Is that an interesting approach?

@erikvanoosten
Copy link
Collaborator Author

erikvanoosten commented Oct 29, 2024

Is that an interesting approach?

It is indeed! It is possible to get all current fiberRefs and then set them on another fiber. That most probably works. However, I am not sure it is a good idea to blindly copy all fiberRefs from one fiber to the next. I would strongly prefer to only copy the fiberRefs that should be copied, so that we do no accidentally ruin some other fiber trickery that is going on.

For this we need some mechanism that allows zio-kafka (and any other zio-library) to discover which fiberRefs need to be copied. The mechanisms should ideally be available in the JVM or in the Zio base library so that zio-kafka does not need to add additional dependencies. For example: a java property with the full class name(s) of the fiberref. Then with a bit of reflection we can select the fiberRefs to copy.

Any other ideas?

@grouzen would you support this in the zio-telemetry libraries?

@grouzen
Copy link

grouzen commented Oct 29, 2024

@grouzen would you support this in the zio-telemetry libraries?

It may be better to keep this functionality in zio-kafka until we realize that other libraries suffer from the same issue. What do you think? In general, I don't mind adding it to zio-telemetry if there is a demand.

@erikvanoosten
Copy link
Collaborator Author

@grouzen would you support this in the zio-telemetry libraries?

It may be better to keep this functionality in zio-kafka until we realize that other libraries suffer from the same issue. What do you think? In general, I don't mind adding it to zio-telemetry if there is a demand.

Yeah, I think we hard-code the list in zio-kafka.

@erikvanoosten
Copy link
Collaborator Author

I started #1356 to explore the discussed alternative where we retain the telemetry context by copying the relevant fiber ref values between fibers inside zio-kafka.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants