While the toolkit can stream files from local file systems, in many cases users of the toolkit will want to stream publications from an object storage provider, such as Amazon S3 or GCP Cloud Storage. This may seem trivial to implement at first glance ("just hook up S3 to the streamer!"), but doing so efficiently is difficult due to the nature of reading ZIP files (EPUB, CBZ, etc.).
There is already an optional "minimized read" utility in the ZIP archive reader in the toolkit, but it only works well when paired with lower-level optimizations in the reading of the ZIP itself. The following diagram shows how many reads are needed just to generate a WebPub manifest. For local filesystems this is perfectly fine and efficient, but when the reads are performed against a file located across the web, each one becomes a separate request that adds latency. Without optimizations, that latency has a significant impact on the performance of whatever software a user of the go-toolkit is writing, not to mention the added cost of the requests themselves (many object storage providers charge per request). Below is an example of the reads that occur when opening the Moby Dick EPUB file:
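The read amplification described above can be observed locally. As a minimal sketch (not the toolkit's actual reader), the following wraps an in-memory ZIP in a counting `io.ReaderAt` and opens it with the standard library's `archive/zip`; every `ReadAt` call it counts would translate into a separate ranged GET against an object store. The `countingReaderAt` type and file names are illustrative, not part of the go-toolkit.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

// countingReaderAt counts every ReadAt call. When the backing store is
// S3 or GCS instead of memory, each call maps to one ranged GET request.
type countingReaderAt struct {
	r     *bytes.Reader
	calls int
}

func (c *countingReaderAt) ReadAt(p []byte, off int64) (int, error) {
	c.calls++
	return c.r.ReadAt(p, off)
}

// countDirectoryReads builds a tiny EPUB-like ZIP in memory, then opens
// it and reports how many distinct reads were needed just to parse the
// central directory (before any entry content is touched).
func countDirectoryReads() (entries, calls int) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for _, name := range []string{"mimetype", "META-INF/container.xml", "OEBPS/content.opf"} {
		f, _ := w.Create(name)
		f.Write([]byte("placeholder"))
	}
	w.Close()

	cr := &countingReaderAt{r: bytes.NewReader(buf.Bytes())}
	zr, err := zip.NewReader(cr, int64(buf.Len()))
	if err != nil {
		panic(err)
	}
	return len(zr.File), cr.calls
}

func main() {
	entries, calls := countDirectoryReads()
	fmt.Printf("entries: %d, ReadAt calls to list the directory: %d\n", entries, calls)
}
```

Reading actual entries (e.g. the OPF, then each resource needed for the manifest) issues further reads on top of this, which is why batching or coalescing ranges before they hit the network matters so much for cloud-backed publications.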
I plan on porting my cloud storage reading logic to the go-toolkit to address this issue.