Fix another S3 multipart upload issue with marc exporter (PP-1693) #2053
Changes from 1 commit
@@ -10,7 +10,7 @@
 )
 from palace.manager.sqlalchemy.model.resource import Representation
 from tests.fixtures.redis import RedisFixture
-from tests.fixtures.s3 import S3ServiceFixture
+from tests.fixtures.s3 import S3ServiceFixture, S3ServiceIntegrationFixture


 class MarcUploadManagerFixture:
@@ -189,7 +189,7 @@ def test_sync(self, marc_upload_manager_fixture: MarcUploadManagerFixture):
         ]
         assert upload.upload_id is not None
         assert upload.content_type is Representation.MARC_MEDIA_TYPE
-        [part] = upload.parts
+        [part] = upload.parts.values()
         assert part.content == marc_upload_manager_fixture.test_record1 * 5

         # And the s3 part data and upload_id is synced to redis
@@ -332,3 +332,64 @@ def test__abort(

         # The redis record should have been deleted
         mock_delete.assert_called_once()

+    def test_real_storage_service(
+        self,
+        redis_fixture: RedisFixture,
+        s3_service_integration_fixture: S3ServiceIntegrationFixture,
+    ):
+        """
+        Full end-to-end test of the MarcUploadManager using the real S3Service.
+        """
+        s3_service = s3_service_integration_fixture.public
+        uploads = MarcFileUploadSession(redis_fixture.client, 99)
+        uploader = MarcUploadManager(s3_service, uploads)
+        batch_size = s3_service.MINIMUM_MULTIPART_UPLOAD_SIZE + 1
+
+        with uploader.begin() as locked:
+            assert locked
+
+            # Test all three cases for the complete() method.
+            #
+            # 1. A small record that doesn't need to be uploaded in parts; it
+            #    just gets uploaded directly when complete is called (test1).
+            # 2. A large record that needs to be uploaded in parts; on the first
+            #    sync call its buffer is large enough to trigger the upload. When
+            #    complete is called, there is no data in the buffer, so no final
+            #    part needs to be uploaded (test2).
+            # 3. A large record that needs to be uploaded in parts; on the first
+            #    sync call its buffer is large enough to trigger the upload. When
+            #    complete is called, there is data in the buffer, so a final part
+            #    needs to be uploaded (test3).
+
+            uploader.add_record("test1", b"test_record")
+            uploader.add_record("test2", b"a" * batch_size)
+            uploader.add_record("test3", b"b" * batch_size)
+
+            # Start the sync. This will begin the multipart upload for test2 and test3.
+            uploader.sync()
+
+            # Add some more data
+            uploader.add_record("test1", b"test_record")
+            uploader.add_record("test2", b"a" * batch_size)
+            uploader.add_record("test3", b"b")
+
+            # Complete the uploads
+            completed = uploader.complete()
+
+        assert completed == {"test1", "test2", "test3"}
+        assert uploads.get() == {}
+        assert set(s3_service_integration_fixture.list_objects("public")) == completed
+
+        assert (
+            s3_service_integration_fixture.get_object("public", "test1")
+            == b"test_record" * 2
+        )
+        assert (
+            s3_service_integration_fixture.get_object("public", "test2")
+            == b"a" * batch_size * 2
+        )
+        assert (
+            s3_service_integration_fixture.get_object("public", "test3")
+            == b"b" * batch_size + b"b"
+        )

Review comment (on the three-case explanation above):
Minor/pedantic: It took me a minute to process this explanation because of the sentence structure. Maybe start a new sentence or add a semicolon after "... in parts" at the beginning of each case explanation.

Reply:
I re-worked this comment.
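The three cases the test exercises can be sketched with a toy buffering model. This is a simplified illustration only: the names `MIN_PART_SIZE` and `BufferedUpload`, and the exact sync/complete behavior, are assumptions for the sketch, not the actual MarcUploadManager implementation.

```python
MIN_PART_SIZE = 8  # stand-in for S3's real multipart minimum (5 MiB)

class BufferedUpload:
    """Buffer data for one key; upload a part whenever sync() finds the
    buffer at or above the minimum multipart part size."""

    def __init__(self) -> None:
        self.buffer = b""
        self.parts: list[bytes] = []  # parts already "uploaded"
        self.multipart = False        # True once any part has been uploaded

    def add(self, data: bytes) -> None:
        self.buffer += data

    def sync(self) -> None:
        # Only upload a part if the buffer has reached the minimum size.
        if len(self.buffer) >= MIN_PART_SIZE:
            self.parts.append(self.buffer)
            self.buffer = b""
            self.multipart = True

    def complete(self) -> bytes:
        if not self.multipart:
            # Case 1: small record, uploaded directly in one shot.
            return self.buffer
        if self.buffer:
            # Case 3: leftover buffered data becomes the final part.
            self.parts.append(self.buffer)
        # Case 2 falls through here: buffer empty, no final part needed.
        return b"".join(self.parts)

# Mirror the test1/test2/test3 shapes from the diff above.
u1, u2, u3 = BufferedUpload(), BufferedUpload(), BufferedUpload()
u1.add(b"xy")
u2.add(b"a" * MIN_PART_SIZE)
u3.add(b"b" * MIN_PART_SIZE)
for u in (u1, u2, u3):
    u.sync()  # starts the "multipart upload" for u2 and u3 only
u1.add(b"xy")
u3.add(b"b")

assert u1.complete() == b"xyxy"                       # case 1: direct upload
assert u2.complete() == b"a" * MIN_PART_SIZE          # case 2: no final part
assert u3.complete() == b"b" * MIN_PART_SIZE + b"b"   # case 3: final part
```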
Review comment:
I wonder if it would be better to have this "+1" logic in a single function with the explanation in its docstring and call that from the multiple places that need it. That might make it a little more clear what's going on for future us.
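A minimal sketch of the kind of helper the reviewer is suggesting, assuming the "+1" refers to S3's 1-based multipart part numbering (the helper name `to_part_number` and its placement are hypothetical; the real code may structure this differently):

```python
def to_part_number(index: int) -> int:
    """Convert a zero-based part index to an S3 multipart part number.

    S3 part numbers are 1-based (valid range 1-10000), so code that tracks
    parts in a zero-based list must add one before talking to S3. Keeping
    the "+1" in one documented place avoids repeating the off-by-one logic.
    (Hypothetical helper illustrating the review suggestion.)
    """
    return index + 1

# The first buffered part becomes S3 part number 1, and so on.
assert to_part_number(0) == 1
assert to_part_number(9999) == 10000  # the last part number S3 accepts
```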
Reply:
I simplified the logic here so it's easier to follow, and the part logic with offsets is all encapsulated in the same place.

The simplification comes at the expense of efficiency, since it will require a couple more calls to redis and another call to s3, but that inefficiency probably doesn't make much of a difference here and it's much more readable.