
[PECO-969] Make sure that DBSQLOperation.fetchChunk returns chunks of requested size #200

Merged
merged 12 commits into main from fix-max-rows-behavior
Nov 28, 2023

Conversation

kravets-levko
Contributor

@kravets-levko kravets-levko commented Nov 1, 2023

PECO-969

The DBSQLOperation.fetchChunk/fetchAll methods support a maxRows parameter, which should define the size of the chunk returned from fetchChunk. This parameter is passed to the server, which uses it to decide how many rows to return. Unfortunately, sometimes (often?) the server returns chunks whose size doesn't match the requested value (chunks can even be bigger). This behavior confuses users of the library, who expect the chunk size to equal maxRows (or be smaller for the last chunk). We used to explain to users how this works (e.g. #155 (comment)), but eventually we had to fix this behavior, especially considering that other connectors already do this right.

The proposed solution is similar to the one implemented in the Python connector. Instead of returning raw chunks, we collect records in a buffer until we have collected enough, then slice off and return a part of this buffer. The remaining records are kept for the next fetchChunk call.

  • Implement correct/expected behavior of the maxRows option for DBSQLOperation.fetchChunk
  • Allow users to still access raw chunks without buffering (this may reduce memory consumption when an exact chunk size is not required)
  • Use raw chunks in DBSQLOperation.fetchAll - since it collects all the data anyway, there's no need for an intermediate buffer; using raw chunks reduces memory consumption
  • Add/update tests
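The buffering approach described above can be sketched roughly as follows. This is an illustrative sketch only, not the library's actual implementation; the names `BufferingFetcher` and `fetchRawChunk` are hypothetical:

```typescript
// Hypothetical sketch of the buffering approach: accumulate raw server chunks
// of arbitrary size in a buffer, then slice off exactly `maxRows` rows per call.
type Row = Record<string, unknown>;

class BufferingFetcher {
  private buffer: Row[] = [];
  private exhausted = false;

  constructor(
    // Illustrative stand-in for the raw fetch; returns null when no more data
    private fetchRawChunk: () => Promise<Row[] | null>,
    private maxRows: number,
  ) {}

  // Returns chunks of exactly `maxRows` rows (the last chunk may be smaller),
  // regardless of the sizes of the raw chunks the server sends back.
  async fetchChunk(): Promise<Row[]> {
    while (this.buffer.length < this.maxRows && !this.exhausted) {
      const rawChunk = await this.fetchRawChunk();
      if (rawChunk === null) {
        this.exhausted = true;
      } else {
        this.buffer.push(...rawChunk);
      }
    }
    // Slice off the requested amount; keep the remainder for the next call
    const result = this.buffer.slice(0, this.maxRows);
    this.buffer = this.buffer.slice(this.maxRows);
    return result;
  }
}
```

With raw server chunks of size 3 and maxRows = 5, a caller would see a chunk of 5 rows followed by a final smaller chunk, which matches the expectation the PR sets for fetchChunk.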

Base automatically changed from refactoring-operation-helpers-3 to main November 14, 2023 22:08
Signed-off-by: Levko Kravets <[email protected]>
Signed-off-by: Levko Kravets <[email protected]>
Contributor

@nithinkdb nithinkdb left a comment


Implementation looks good, but I think that we should raise a ticket for fixing the server issue.

@kravets-levko
Contributor Author

@nithinkdb I doubt it's possible to fix this on the backend, considering all the variations of result formats (e.g. it's simply impossible for CloudFetch). A similar approach is implemented in other drivers to ensure a consistent batch size

@kravets-levko kravets-levko merged commit 9b03de3 into main Nov 28, 2023
5 checks passed
@kravets-levko kravets-levko deleted the fix-max-rows-behavior branch November 28, 2023 11:53
@kravets-levko kravets-levko changed the title Make sure that DBSQLOperation.fetchChunk returns chunks of requested size [PECO-969] Make sure that DBSQLOperation.fetchChunk returns chunks of requested size Nov 28, 2023