
[PECO-969] Make sure that DBSQLOperation.fetchChunk returns chunks of requested size #200

Merged
merged 12 commits into main from fix-max-rows-behavior
Nov 28, 2023

Conversation

kravets-levko
Contributor

@kravets-levko kravets-levko commented Nov 1, 2023

PECO-969

The DBSQLOperation.fetchChunk/fetchAll methods support a maxRows parameter, which should define the size of the chunk returned from fetchChunk. This parameter is passed to the server, which uses it to decide how many rows to return. Unfortunately, sometimes (often?) the server returns chunks whose size doesn't match the requested value (chunks can even be bigger). This behavior confuses users of the library, who expect the chunk size to equal maxRows (or be smaller for the last chunk). We used to explain to users how this works (e.g. #155 (comment)), but eventually we had to fix this behavior, especially considering that other connectors already do this right.

The proposed solution is similar to the one implemented in the Python connector. Instead of returning raw chunks, we collect records in a buffer until we have collected enough, then slice off and return a part of this buffer. The remaining records are kept for the next fetchChunk call.

  • Implement correct/expected behavior of the maxRows option for DBSQLOperation.fetchChunk
  • Allow users to still access raw chunks without buffering (this may reduce memory consumption when an exact chunk size is not required)
  • Use raw chunks in DBSQLOperation.fetchAll - since it collects all the data anyway, there's no need for an intermediate buffer; using raw chunks reduces memory consumption
  • Add/update tests
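The buffering approach described above can be sketched roughly as follows. This is an illustrative sketch only, not the library's actual implementation; the names `BufferingFetcher` and `fetchRawChunk` are hypothetical:

```typescript
// Hypothetical sketch of the buffering approach: accumulate raw server chunks
// of arbitrary size in a buffer, then slice off exactly `maxRows` rows per call.
type Row = Record<string, unknown>;

class BufferingFetcher {
  private buffer: Row[] = [];
  private exhausted = false;

  constructor(
    // Illustrative stand-in for the raw fetch; returns null when no more data
    private fetchRawChunk: () => Promise<Row[] | null>,
    private maxRows: number,
  ) {}

  // Returns chunks of exactly `maxRows` rows (the last chunk may be smaller),
  // regardless of the sizes of the raw chunks the server sends back.
  async fetchChunk(): Promise<Row[]> {
    while (this.buffer.length < this.maxRows && !this.exhausted) {
      const rawChunk = await this.fetchRawChunk();
      if (rawChunk === null) {
        this.exhausted = true;
      } else {
        this.buffer.push(...rawChunk);
      }
    }
    // Slice off the requested amount; keep the remainder for the next call
    const result = this.buffer.slice(0, this.maxRows);
    this.buffer = this.buffer.slice(this.maxRows);
    return result;
  }
}
```

With raw server chunks of size 3 and maxRows = 5, a caller would see a chunk of 5 rows followed by a final smaller chunk, which matches the expectation the PR sets for fetchChunk.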

Base automatically changed from refactoring-operation-helpers-3 to main November 14, 2023 22:08
Signed-off-by: Levko Kravets <[email protected]>
Signed-off-by: Levko Kravets <[email protected]>
Contributor

@nithinkdb nithinkdb left a comment


Implementation looks good, but I think that we should raise a ticket for fixing the server issue.

@kravets-levko
Contributor Author

@nithinkdb I doubt it's possible to fix this on the backend, considering all the variations of result formats (e.g. it's simply impossible for CloudFetch). A similar approach is implemented in other drivers to ensure a consistent batch size

@kravets-levko kravets-levko merged commit 9b03de3 into main Nov 28, 2023
5 checks passed
@kravets-levko kravets-levko deleted the fix-max-rows-behavior branch November 28, 2023 11:53
@kravets-levko kravets-levko changed the title Make sure that DBSQLOperation.fetchChunk returns chunks of requested size [PECO-969] Make sure that DBSQLOperation.fetchChunk returns chunks of requested size Nov 28, 2023