Retry connection on 'Remote end closed connection without response' #433
Comments
Hi @septponf! Thank you for reporting this issue. Yes, the retry logic is currently not perfect, and each report like this helps to make it better. I need to explore your issue more, but in general retrying query execution is not safe, even on errors like a terminated connection. If you submitted a query but didn't get a response from the server (for any reason), you don't know at which stage query execution failed. Maybe the server didn't even start processing the request, but it's equally possible that the query was executed successfully and the server failed while sending the response to the library. And it's relatively safe to re-execute read queries (e.g. We do have a list of errors that are safe to retry, and this library still doesn't fully implement it. Right now I'm doing another iteration to improve the retry logic of this library, and I will check what I can do in your case. But keep in mind what I explained above: in some cases, only the user can decide what is safe to retry and what's not.
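To make the distinction above concrete (read queries are relatively safe to re-execute, anything else is not), here is a minimal sketch of a caller-side wrapper. It is not the connector's retry logic. The names `is_read_only`, `execute_with_retry`, and `run_query` are hypothetical, the keyword check is a crude heuristic, and the sketch assumes the dropped connection surfaces as a built-in `ConnectionError` (which `http.client.RemoteDisconnected` subclasses). In a real stack the HTTP library may wrap it in its own exception type.

```python
import time
from http.client import RemoteDisconnected  # subclass of the built-in ConnectionError

# Assumption: statements starting with these keywords only read data.
READ_ONLY_PREFIXES = ("select", "show", "describe", "explain")


def is_read_only(sql: str) -> bool:
    """Crude heuristic: does the statement start with a read-only keyword?"""
    return sql.lstrip().lower().startswith(READ_ONLY_PREFIXES)


def execute_with_retry(run_query, sql, max_attempts=3, base_delay=1.0):
    """Call run_query(sql); retry dropped connections only for read-only SQL.

    run_query is a hypothetical callable supplied by the caller, for example
    a thin function around a cursor's execute-and-fetch sequence.
    """
    attempt = 1
    while True:
        try:
            return run_query(sql)
        except ConnectionError:
            # Writes are never retried: the server may already have run them.
            if not is_read_only(sql) or attempt >= max_attempts:
                raise
            # Exponential backoff before the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1
```

With this shape, an `INSERT` that hits a dropped connection fails immediately and leaves the decision to the user, while a `SELECT` is retried up to `max_attempts` times.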
Thank you @kravets-levko for the swift response. I appreciate you looking into this.
@kravets-levko Found this issue when looking into the same problem with dbt runs against serverless warehouses. I did want to add a comment - on the issue in the dbt-databricks repo @benc-db said the below
You stated in your response to @septponf that we shouldn't retry these because it is not safe - does what @benc-db stated change that? It seems that the two of you are of differing opinions on whether it is truly safe to retry here - if @benc-db is correct and we can be confident that this error means we were never able to try the query, then we should be able to retry. On the other hand, if you are correct and we cannot guarantee that it is safe to retry based on this error message, then we can likely just close the issue on this repo, as it will by necessity need to be handled downstream (or so I would think).
@NodeJSmith since I commented that, I have subsequently seen issues where the connection gets broken but the thrift server does schedule the command for execution ;(
Damn, that's unfortunate. Would there be any way to query the Databricks API for the status of the query using the statement ID, and decide whether to retry based on that, like with the get_status call?
If we have a statement ID, does that mean it was scheduled? I think the core idea makes sense if we have the ID available in cases where we get disconnected. Don't reissue, but just check to see if the server knows about it. That might also fail, because in the scenarios I'm thinking of, the server is so overloaded that we stop getting responses, but it's something to try. @kravets-levko thoughts?
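The "don't reissue, just check whether the server knows about it" idea could look roughly like the sketch below. Everything here is hypothetical: `decide_after_disconnect`, `Outcome`, and the `lookup_status` callable are illustration names, not part of databricks-sql-connector or any Databricks API. The sketch also assumes that a lookup returning `None` means the server never saw the statement, which is exactly the open question raised above.

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    ALREADY_SCHEDULED = "already_scheduled"  # server knows the statement: do not reissue
    SAFE_TO_REISSUE = "safe_to_reissue"      # server has no record of it (assumption)
    UNKNOWN = "unknown"                      # cannot tell: leave the decision to the user


def decide_after_disconnect(
    statement_id: Optional[str],
    lookup_status: Callable[[str], Optional[str]],
) -> Outcome:
    """Decide what to do after the connection dropped mid-request.

    lookup_status is a hypothetical callable that returns a status string
    for a known statement, or None if the server has no record of it.
    """
    if statement_id is None:
        # The connection dropped before an ID was received, so nothing can be checked.
        return Outcome.UNKNOWN
    try:
        status = lookup_status(statement_id)
    except ConnectionError:
        # The status check itself failed, e.g. because the server is overloaded.
        return Outcome.UNKNOWN
    if status is None:
        return Outcome.SAFE_TO_REISSUE
    return Outcome.ALREADY_SCHEDULED
```

Only `SAFE_TO_REISSUE` would justify an automatic retry. The other two outcomes match the cases discussed above, where the command may already be running or the server cannot be reached at all.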
Hi all, do you guys have any workaround for this? We are using
I initially reported this issue against dbt-databricks but was asked by a collaborator to file it here.
So basically, when running dbt requests against a serverless SQL warehouse, we get intermittent errors like the one below, and the DAG execution is aborted.
Runtime Error
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
I believe it should be safe to retry connection and retry execution.
databricks-sql-connector==2.9.5