
split large text files by size #351

Closed
ghost opened this issue Apr 13, 2014 · 4 comments


@ghost

ghost commented Apr 13, 2014

From [email protected] on July 17, 2013 11:44:27

Some producers will have existing content with only one really huge (3MB+) text content document (XHTML or DTBook format). We should have an option in our conversion scripts to split this into several smaller files in the EPUB output. Having several smaller text files improves performance dramatically in reading systems.

The html-utils module contains an XSLT that splits an XHTML document based on its structure, but it would also be nice to have an option to split the text content document based on size in kilobytes.
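A size-based split along these lines could be sketched roughly as follows. This is a minimal illustration in Python, not the Pipeline's actual XSLT; the grouping heuristic, the byte limit, and the idea of reusing the original head for every part are all assumptions for the sake of the example:

```python
# Sketch: split an XHTML document into chunks of roughly max_bytes by
# grouping top-level <body> children. Splits only at element boundaries,
# so a single oversized element is never broken apart.
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def split_by_size(xhtml_string, max_bytes=300_000):
    """Return a list of XHTML strings, each roughly under max_bytes."""
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml_string)
    body = root.find(f"{{{XHTML_NS}}}body")
    chunks, current, size = [], [], 0
    for child in list(body):
        child_size = len(ET.tostring(child))
        # Start a new chunk when adding this child would exceed the limit.
        if current and size + child_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(child)
        size += child_size
    if current:
        chunks.append(current)
    docs = []
    for group in chunks:
        body[:] = group  # reuse the original head/metadata for each part
        docs.append(ET.tostring(root, encoding="unicode"))
    return docs
```

A real implementation would additionally have to rewrite intra-document links and fragment identifiers so references between the resulting files stay valid.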

See also issue #309

Original issue: http://code.google.com/p/daisy-pipeline/issues/detail?id=351

@bertfrees
Member

Should be prioritized at some point, since it causes issues on some reading systems with large books.

@josteinaj
Member

There's a splitter in the nordic migrator, but it doesn't split by size (KB) and it makes certain assumptions about the input.

We'll probably need a better splitter later this year at NLB as we plan to use a single-HTML unzipped EPUB as our master format and then split into multiple files for distribution.

@bertfrees
Member

OK thanks for the info!

@bertfrees
Member

This was fixed by daisy/pipeline-scripts#149
