[connector/failover] Support queuing data in failover connector #33007
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
Makes sense to me. Can I assign this to you @sinkingpoint? |
Hey @sinkingpoint, thanks for the issue. I see your point, I can add this if you’d like. I’m also planning an update atm to the export flow (to retry by sampling data points instead of switching the entire pipeline), and can add this in tandem. |
I have a patch internally already, so I can PR that tomorrow. If it doesn't work with your expected changes @akats7 then I'm happy to abandon it in favor of your work |
Hey @sinkingpoint, any update on this? |
Hey - apologies, been a bit busy. I think it's best to go with your stuff here. My internal patch is proving weirdly buggy for reasons I can't seem to work out |
@akats7 since this sounds like quite a different solution than proposed in this issue, do you mind opening a new issue to describe this? Then I will close this one in favor of the new one. |
Hey @djaglowski, sure thing. We can probably leave this issue open as well, since it's a separate feature and can be tracked independently. |
Ok, great. I misunderstood it to be one or the other but if both would be useful then we'll leave this open too. |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
Will pick this back up |
Is it worth considering a queue |
I do actually like the idea of decoupling the queueing from the exporter as a component that can be used for use cases similar to this one. To me, though, it might make more sense as a processor that sits in front of the exporter portion of a connector. I don't think a queue connector would be ideal because, for the one-to-many consumption model, it would need some sort of routing logic, which components like the routing connector and round robin connector already provide. I think putting a queue processor in front of the component responsible for the routing achieves the same effect. @djaglowski, what are your thoughts on this? |
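As a rough illustration of the decoupling described above, the sketch below models a queue processor in plain Go: `Consume` only enqueues into a bounded buffer, and a background goroutine forwards items to whatever sits next in the pipeline (for example a routing connector). The `Consumer` interface, the `queueProcessor` type and its drop-on-error behavior are hypothetical simplifications, not the collector's processor API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Consumer is a hypothetical stand-in for the next component in the
// pipeline (e.g. a routing connector); it is not the collector's consumer API.
type Consumer interface {
	Consume(item string) error
}

// errQueueFull is returned when the bounded queue cannot accept more data.
var errQueueFull = errors.New("queue is full")

// queueProcessor decouples the caller from the downstream consumer:
// Consume only enqueues, and a background goroutine forwards items.
type queueProcessor struct {
	queue chan string
	next  Consumer
	wg    sync.WaitGroup
}

func newQueueProcessor(capacity int, next Consumer) *queueProcessor {
	return &queueProcessor{queue: make(chan string, capacity), next: next}
}

// Start launches a single worker that drains the queue into next, in order.
func (q *queueProcessor) Start() {
	q.wg.Add(1)
	go func() {
		defer q.wg.Done()
		for item := range q.queue {
			// A real processor would retry or re-enqueue on error; this sketch drops.
			_ = q.next.Consume(item)
		}
	}()
}

// Consume enqueues without blocking and reports back-pressure as an error.
func (q *queueProcessor) Consume(item string) error {
	select {
	case q.queue <- item:
		return nil
	default:
		return errQueueFull
	}
}

// Shutdown stops accepting data and waits for the queue to drain.
// It must only be called once producers have stopped calling Consume.
func (q *queueProcessor) Shutdown() {
	close(q.queue)
	q.wg.Wait()
}

// recorder is a toy downstream consumer that records what it receives.
type recorder struct {
	mu  sync.Mutex
	got []string
}

func (r *recorder) Consume(item string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.got = append(r.got, item)
	return nil
}

func main() {
	rec := &recorder{}
	qp := newQueueProcessor(10, rec)
	qp.Start()
	for i := 0; i < 3; i++ {
		if err := qp.Consume(fmt.Sprintf("batch-%d", i)); err != nil {
			fmt.Println("rejected:", err)
		}
	}
	qp.Shutdown()
	fmt.Println(rec.got) // [batch-0 batch-1 batch-2]
}
```

Because the queue sits in front of the routing component, the router and the exporters behind it stay unaware of buffering; back-pressure surfaces as `errQueueFull` to the upstream caller.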
I agree that a queueing processor makes more sense than a connector. That said, I don't know whether anyone will sponsor it. It seems like a reasonable idea, but I don't have time to commit to maintaining it. Seems worth pitching as a new component though. |
Makes sense, I’ll create an issue. |
Hey @djaglowski, wanted to get your thoughts. |
@dmitryax, you've spent a lot more time considering the design implications of how we manage queueing of data. Do you have an opinion about whether connectors should support it as well? I'm not very tuned in to the considerations here, but my default is not to add complexity to the connectors framework unless there is a specific reason it needs to be there, so I'd prefer we implement queueing in a processor if, in some cases, we need it earlier in the pipeline. |
I'm not up to speed on the original problem. Do I understand correctly that the goal is to have failover behavior between several exporters? I would imagine that a failover exporter, rather than a connector, would be easier for end users. Making it an exporter would give it all the queuing and batching (experimental for now) capabilities OOTB. The problem is that this failover exporter would need the |
That's an interesting thought. To me, the connector pattern seems a better fit than a wrapper exporter that decides which underlying exporter to use; I don't recall any other exporters doing something similar. Another nice feature of a connector is that it has access to the processor pipeline, so if you fail over to a different exporter you can still do some additional tagging or something along those lines. I do agree that it would be great to have the exporter capabilities OOTB, but I'm not sure it is worth making the component less flexible. |
The original reason to implement failover as a connector instead of an exporter is that it does not have to be tied to any particular type of exporter. Even if you implemented failover as a generic feature of all exporters, similar to the exporter queue, it wouldn't support real use cases where the backup destination is a different type, e.g. try to export with otlp, and if that fails, dump to s3. |
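A minimal sketch of why a connector-style failover is type-agnostic, assuming a hypothetical `Exporter` interface: the priority list can mix any backends, so an OTLP-like stub can fall back to an S3-like stub. The stub types and the `failover` helper are illustrative only, not the failover connector's code.

```go
package main

import (
	"errors"
	"fmt"
)

// Exporter is a hypothetical minimal interface; any backend type can implement it.
type Exporter interface {
	Name() string
	Export(data string) error
}

// otlpStub simulates an OTLP exporter whose backend may be down.
type otlpStub struct{ down bool }

func (o otlpStub) Name() string { return "otlp" }

func (o otlpStub) Export(string) error {
	if o.down {
		return errors.New("otlp: connection refused")
	}
	return nil
}

// s3Stub simulates an S3 exporter that always succeeds.
type s3Stub struct{}

func (s3Stub) Name() string        { return "awss3" }
func (s3Stub) Export(string) error { return nil }

// failover tries exporters in priority order and returns the name of the
// first one that accepted the data, or a joined error if all of them failed.
func failover(data string, levels []Exporter) (string, error) {
	var errs []error
	for _, e := range levels {
		err := e.Export(data)
		if err == nil {
			return e.Name(), nil
		}
		errs = append(errs, err)
	}
	return "", errors.Join(errs...)
}

func main() {
	used, err := failover("logs", []Exporter{otlpStub{down: true}, s3Stub{}})
	fmt.Println(used, err) // awss3 <nil>
}
```

The failover logic only depends on the interface, which is what lets the backup destination be a completely different exporter type (`errors.Join` requires Go 1.20 or later).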
Hello, I'd like to follow up on this issue as this is bothering us too. To summarize:
For point 1, #35803 was proposed but hasn't gained much attention. I also noticed that in open-telemetry/opentelemetry-collector#8122 the team is moving the batching function into the exporter itself, with the goal of eventually deprecating the batch processor. Meanwhile, it seems we are shifting to fully synchronous consumption in the exporter once open-telemetry/opentelemetry-collector#11951 is merged. Given the effort on the exporter side, it feels a bit awkward to propose a new queue processor.

Point 2 reminds me of #36094, which adds the exporter settings to the loadbalancing exporter to make it more robust. I agree that it's valuable to provide the failover capability for all exporters, which also means that we can't add the exporter utility functions the way the loadbalancing exporter does. According to the connector's readme:
Reading it gives me the impression that a connector should implement both the exporter and receiver interfaces and have the capabilities of both. The connector does implement both interfaces, but the receiver interface is very loose (only …).

I put together a rough example with the failover connector, and it works as expected, with all the exporter capabilities:

```go
package failoverconnector

import (
	"context"

	"go.opentelemetry.io/collector/component"
	"go.opentelemetry.io/collector/config/configretry"
	"go.opentelemetry.io/collector/connector"
	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/exporter"
	"go.opentelemetry.io/collector/exporter/exporterbatcher"
	"go.opentelemetry.io/collector/exporter/exporterhelper"
)

type Config struct {
	// failover configs are omitted

	// exporter helper configs
	exporterhelper.TimeoutConfig `mapstructure:",squash"` // squash ensures fields are correctly decoded in embedded struct.
	exporterhelper.QueueConfig   `mapstructure:"sending_queue"`
	RetryConfig                  configretry.BackOffConfig `mapstructure:"retry_on_failure"`
	// Experimental: This configuration is at the early stage of development and may change without backward compatibility
	// until https://github.com/open-telemetry/opentelemetry-collector/issues/8122 is resolved
	BatcherConfig exporterbatcher.Config `mapstructure:"batcher"`
}

func createLogsToLogs(
	ctx context.Context,
	set connector.Settings,
	cfg component.Config,
	logs consumer.Logs,
) (connector.Logs, error) {
	// Build the failover router exactly as before.
	l, err := newLogsToLogs(set, cfg, logs)
	if err != nil {
		return nil, err
	}
	// Reuse the connector's ID and telemetry for the exporterhelper wrapper.
	expSet := exporter.Settings{
		ID:                set.ID,
		TelemetrySettings: set.TelemetrySettings,
		BuildInfo:         set.BuildInfo,
	}
	oCfg := cfg.(*Config)
	// Wrap the connector's ConsumeLogs with timeout, retry, queue and batching.
	// The returned exporter.Logs also satisfies connector.Logs
	// (component.Component + consumer.Logs).
	return exporterhelper.NewLogs(ctx, expSet, cfg,
		l.ConsumeLogs,
		exporterhelper.WithCapabilities(consumer.Capabilities{MutatesData: false}),
		exporterhelper.WithTimeout(oCfg.TimeoutConfig),
		exporterhelper.WithRetry(oCfg.RetryConfig),
		exporterhelper.WithQueue(oCfg.QueueConfig),
		exporterhelper.WithBatcher(oCfg.BatcherConfig),
		exporterhelper.WithStart(l.Start),
		exporterhelper.WithShutdown(l.Shutdown),
	)
}
```

To me the end result is not too bad. The change is minimal, it solves the original issue, and it brings the connector closer to what it claims to be: "an exporter and receiver". I'm not sure whether this is considered an anti-pattern or "adding complexity to the connectors framework". Another downside I can think of (specific to the failover connector) is that the resulting configuration looks like this:

```yaml
connectors:
  failover:
    retry_interval: 10s
    retry_gap: 1s
    max_retries: 10000000
    retry_on_failure:
      enabled: true
      max_elapsed_time: 0
```

We have two sets of "retries" here, with different meanings, which could be confusing. However, I believe most users should be relatively familiar with the …

@djaglowski @dmitryax @akats7, I'd like to hear what you think of this approach, thanks! |
Component(s)
connector/failover
Is your feature request related to a problem? Please describe.
When using the failover connector, we've found it beneficial to disable the sending queue for the underlying exporters, so as to failover more quickly in the event of a total failure. This has the downside of causing us to drop telemetry in the event that all of our exporters are down. In this case, it would be helpful for the failover connector to maintain a sending queue that operates over all the underlying exporters, flushing to whatever exporter comes up.
Describe the solution you'd like
I'd like to bring in a `queueSender` (https://github.com/open-telemetry/opentelemetry-collector/blob/1d52fb9c3cca27f5101b25abebd1a3bfb09bf852/exporter/exporterhelper/queue_sender.go#L31-L43) or something similar to the failover connector, so that new data is enqueued on it before being flushed to the exporters.
Describe alternatives you've considered
We could enable queuing on the underlying exporters, but this has a few disadvantages:
Additional context
No response