Convert MARC Export to use Celery (PP-1472) #2017

jonathangreen · 2024-08-28T18:05:12Z

Description

Convert the MARC export script to use Celery.

Motivation and Context

This ended up being rather involved, since the export can take quite a long time. Even getting a large enough chunk of data for a single S3 multipart upload can take longer than I was comfortable with.

This PR takes the approach of processing batch_size (default: 500) records in one task, then saving the output to Redis and re-queuing the task to process the next batch_size records. Once the data in Redis is large enough, a multipart upload is started in S3, and the multipart data is cached in Redis. This continues until the file is completely generated.
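For readers skimming the thread, the batch and re-queue loop described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `FakeCache`, `FakeMultipartUpload`, `export_batch`, `run_export`, and `MIN_PART_SIZE` are all hypothetical names, the Redis cache and the S3 upload are replaced by in-memory stand-ins, and a plain loop stands in for Celery re-queuing the task.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Tiny stand-in threshold so the example is easy to follow.
# Real S3 multipart parts (except the last one) must be at least 5 MiB.
MIN_PART_SIZE = 16


@dataclass
class FakeCache:
    """Hypothetical in-memory stand-in for the buffer kept in Redis."""

    buffer: bytes = b""


@dataclass
class FakeMultipartUpload:
    """Hypothetical in-memory stand-in for an S3 multipart upload."""

    parts: list[bytes] = field(default_factory=list)
    completed: bool = False

    def upload_part(self, data: bytes) -> None:
        self.parts.append(data)

    def complete(self) -> None:
        self.completed = True


def export_batch(
    records: list[bytes],
    offset: int,
    batch_size: int,
    cache: FakeCache,
    upload: FakeMultipartUpload,
) -> int | None:
    """Process one batch; return the next offset, or None when finished."""
    batch = records[offset : offset + batch_size]
    cache.buffer += b"".join(batch)

    # A short (or empty) batch means there is nothing left to process.
    finished = len(batch) < batch_size

    # Flush once enough data is buffered, or when the final remainder is left.
    if len(cache.buffer) >= MIN_PART_SIZE or (finished and cache.buffer):
        upload.upload_part(cache.buffer)
        cache.buffer = b""

    if finished:
        upload.complete()
        return None
    return offset + batch_size


def run_export(
    records: list[bytes], batch_size: int = 500
) -> tuple[FakeMultipartUpload, int]:
    """Drive the batches in a loop; each iteration stands in for one re-queued task."""
    cache, upload = FakeCache(), FakeMultipartUpload()
    offset: int | None = 0
    tasks_run = 0
    while offset is not None:
        offset = export_batch(records, offset, batch_size, cache, upload)
        tasks_run += 1
    return upload, tasks_run
```

The point of the pattern is that each task does a bounded amount of work and hands its state (the offset and the buffered bytes) to the next task, so no single task has to run for the whole export.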

How Has This Been Tested?

Tested by generating MARC files locally with some production DB data
Unit tests

Checklist

I have updated the documentation accordingly.
All new and existing tests passed.

codecov · 2024-08-28T18:12:55Z

Codecov Report

Attention: Patch coverage is 98.80342% with 7 lines in your changes missing coverage. Please review.

Project coverage is 90.67%. Comparing base (b190c49) to head (c1098b0).
Report is 1 commit behind head on main.

Files with missing lines	Patch %	Lines
src/palace/manager/service/redis/models/marc.py	97.63%	2 Missing and 2 partials ⚠️
src/palace/manager/marc/exporter.py	97.93%	2 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2017      +/-   ##
==========================================
+ Coverage   90.59%   90.67%   +0.08%     
==========================================
  Files         338      342       +4     
  Lines       40135    40502     +367     
  Branches     8681     8777      +96     
==========================================
+ Hits        36360    36726     +366     
- Misses       2509     2510       +1     
  Partials     1266     1266

dbernstein

That is an absolute beast. I like your batch + re-queueing approach. I think that pattern will likely come in handy in other places.

jonathangreen requested a review from a team August 28, 2024 23:03

jonathangreen marked this pull request as ready for review August 28, 2024 23:03

jonathangreen added the feature New feature label Aug 29, 2024

dbernstein self-requested a review September 3, 2024 15:50

dbernstein approved these changes Sep 3, 2024

jonathangreen added 6 commits September 4, 2024 09:49

Redis json lock added

faea04d

Celery MarcFileExporter implementation

9e615a3

Code review feedback: Update settings comments

252d585

Rename the MarcFileUploads class to MarcFileUploadSession

b75320c

Rename MarcUploader to MarcUploadManager

3576991

Set a state on the upload session, so we can tell if it is being processed

2439737

jonathangreen force-pushed the feature/marc-celery branch from 314fde3 to 2439737 Compare September 4, 2024 13:50

Fix import

c1098b0

jonathangreen merged commit 1ac4d03 into main Sep 4, 2024
20 of 21 checks passed

jonathangreen deleted the feature/marc-celery branch September 4, 2024 14:12

This was referenced Sep 5, 2024

Fix non-canonical protocol not showing settings (PP-1472) #2029

Merged

Migrate MARC exporter protocol name #2030

Merged
