While the toolkit can stream files from local file systems, in many cases users of the toolkit will want to stream publications from an object storage provider, such as Amazon S3 or GCP Cloud Storage. This may seem trivial to implement at first glance ("just hook up S3 to the streamer!"), but doing so efficiently is difficult due to the nature of reading ZIP files (EPUB, CBZ, etc.).
There is already an optional "minimized read" utility in the ZIP archive reader in the toolkit, but it only works well when paired with lower-level optimizations in the reading of the ZIP itself. The following diagram shows how many reads are needed just to generate a WebPub manifest. For local filesystems this is perfectly fine and efficient, but when the reads are performed against a file located across the web, each one becomes a separate request that adds latency. Without optimizations, that latency has a significant impact on the performance of whatever software a user of the go-toolkit is writing, not to mention the added cost of the requests themselves (many object storage providers charge per request). Below is an example of the reads that occur when opening the Moby Dick EPUB file:
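The read amplification described above can be observed locally. As a minimal sketch (not the toolkit's actual reader), the following wraps an in-memory ZIP in a counting `io.ReaderAt` and opens it with the standard library's `archive/zip`; every `ReadAt` call it counts would translate into a separate ranged GET against an object store. The `countingReaderAt` type and file names are illustrative, not part of the go-toolkit.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

// countingReaderAt counts every ReadAt call. When the backing store is
// S3 or GCS instead of memory, each call maps to one ranged GET request.
type countingReaderAt struct {
	r     *bytes.Reader
	calls int
}

func (c *countingReaderAt) ReadAt(p []byte, off int64) (int, error) {
	c.calls++
	return c.r.ReadAt(p, off)
}

// countDirectoryReads builds a tiny EPUB-like ZIP in memory, then opens
// it and reports how many distinct reads were needed just to parse the
// central directory (before any entry content is touched).
func countDirectoryReads() (entries, calls int) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for _, name := range []string{"mimetype", "META-INF/container.xml", "OEBPS/content.opf"} {
		f, _ := w.Create(name)
		f.Write([]byte("placeholder"))
	}
	w.Close()

	cr := &countingReaderAt{r: bytes.NewReader(buf.Bytes())}
	zr, err := zip.NewReader(cr, int64(buf.Len()))
	if err != nil {
		panic(err)
	}
	return len(zr.File), cr.calls
}

func main() {
	entries, calls := countDirectoryReads()
	fmt.Printf("entries: %d, ReadAt calls to list the directory: %d\n", entries, calls)
}
```

Reading actual entries (e.g. the OPF, then each resource needed for the manifest) issues further reads on top of this, which is why batching or coalescing ranges before they hit the network matters so much for cloud-backed publications.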
I plan on porting my cloud storage reading logic to the go-toolkit to address this issue.