
split large text files by size #351

Closed
ghost opened this issue Apr 13, 2014 · 4 comments


@ghost

ghost commented Apr 13, 2014

From [email protected] on July 17, 2013 11:44:27

Some producers will have existing content with only one really huge (3MB+) text content document (XHTML or DTBook format). We should have an option in our conversion scripts to split this into several smaller files in the EPUB output. Having several smaller text files improves performance dramatically in reading systems.

The html-utils module contains an XSLT that splits an XHTML document based on its structure, but it would also be nice to have an option to split the text content document based on size in kilobytes.
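A size-based split along these lines could be sketched roughly as follows. This is a minimal illustration in Python, not the Pipeline's actual XSLT; the grouping heuristic, the byte limit, and the idea of reusing the original head for every part are all assumptions for the sake of the example:

```python
# Sketch: split an XHTML document into chunks of roughly max_bytes by
# grouping top-level <body> children. Splits only at element boundaries,
# so a single oversized element is never broken apart.
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def split_by_size(xhtml_string, max_bytes=300_000):
    """Return a list of XHTML strings, each roughly under max_bytes."""
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml_string)
    body = root.find(f"{{{XHTML_NS}}}body")
    chunks, current, size = [], [], 0
    for child in list(body):
        child_size = len(ET.tostring(child))
        # Start a new chunk when adding this child would exceed the limit.
        if current and size + child_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(child)
        size += child_size
    if current:
        chunks.append(current)
    docs = []
    for group in chunks:
        body[:] = group  # reuse the original head/metadata for each part
        docs.append(ET.tostring(root, encoding="unicode"))
    return docs
```

A real implementation would additionally have to rewrite intra-document links and fragment identifiers so references between the resulting files stay valid.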

See also issue #309

Original issue: http://code.google.com/p/daisy-pipeline/issues/detail?id=351

@bertfrees
Member

Should be prioritized at some point, since it causes issues on some reading systems with large books.

@josteinaj
Member

There's a splitter in the nordic migrator, but it doesn't split by size (KB) and it makes certain assumptions about the input.

We'll probably need a better splitter later this year at NLB as we plan to use a single-HTML unzipped EPUB as our master format and then split into multiple files for distribution.

@bertfrees
Member

OK thanks for the info!

@bertfrees
Member

This was fixed by daisy/pipeline-scripts#149
