Media overlays not working optimally when reading word-for-word #80

Open
jansc opened this issue Jun 19, 2024 · 0 comments
Hi!

I'm currently working on automatic creation of word-for-word media overlays for EPUB files by simply matching the corresponding audio book, and found some problems with the current media overlay module in src/electron/renderer/media-overlays.ts.

One problem is that the sound is sometimes a bit choppy, depending on the browser. The reason seems to be that the code relies on the audio element's currentTime, which may be a bit off because some (all?) browsers reduce its precision to avoid fingerprinting. As a result, the following code does not always set the correct value for contiguous, even though the SMIL files have back-to-back begin and end values:

const contiguous =
       _previousAudioUrl === _currentAudioUrl &&
       typeof _previousAudioEnd !== "undefined" &&
       _previousAudioEnd > (timeToSeekTo - 0.02) &&
       _previousAudioEnd <= timeToSeekTo &&
       _currentAudioElement.currentTime >= (timeToSeekTo - 0.1);
       // _currentAudioElement.currentTime <= (timeToSeekTo + 0.5)

I've fixed this by allowing far more variance (2 seconds), but I'm not sure if this breaks media overlays for other EPUB files. What do you think? Where do the constants you used come from?

const contiguous =
       _previousAudioUrl === _currentAudioUrl &&
       typeof _previousAudioEnd !== "undefined" &&
       Math.abs(_previousAudioEnd - timeToSeekTo) <= 2;
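To compare the two checks side by side, here is a minimal standalone sketch. The `ClipState` shape, the function names `isContiguousStrict` and `isContiguousRelaxed`, and the sample values are hypothetical; only the comparisons themselves are taken from the two snippets above.

```typescript
// Hypothetical container for the module-level variables used in the snippets above.
interface ClipState {
  previousAudioUrl: string | undefined;
  currentAudioUrl: string;
  previousAudioEnd: number | undefined; // end of the previous clip, in seconds
  timeToSeekTo: number;                 // begin of the next clip, in seconds
  currentTime: number;                  // value reported by the audio element
}

// Mirrors the original check, including its dependency on currentTime.
function isContiguousStrict(s: ClipState): boolean {
  return s.previousAudioUrl === s.currentAudioUrl &&
    typeof s.previousAudioEnd !== "undefined" &&
    s.previousAudioEnd > (s.timeToSeekTo - 0.02) &&
    s.previousAudioEnd <= s.timeToSeekTo &&
    s.currentTime >= (s.timeToSeekTo - 0.1);
}

// Mirrors the relaxed check: only the SMIL clip boundaries are compared.
function isContiguousRelaxed(s: ClipState, tolerance = 2): boolean {
  return s.previousAudioUrl === s.currentAudioUrl &&
    typeof s.previousAudioEnd !== "undefined" &&
    Math.abs(s.previousAudioEnd - s.timeToSeekTo) <= tolerance;
}

// Back-to-back clips, but the reported currentTime lags the boundary by 0.14 s.
const backToBack: ClipState = {
  previousAudioUrl: "audio/ch1.mp3",
  currentAudioUrl: "audio/ch1.mp3",
  previousAudioEnd: 12.34,
  timeToSeekTo: 12.34,
  currentTime: 12.2,
};

// A real 1.5 s gap between the two clips in the same file.
const withGap: ClipState = { ...backToBack, timeToSeekTo: 13.84, currentTime: 12.34 };

const strictBackToBack = isContiguousStrict(backToBack);   // false
const relaxedBackToBack = isContiguousRelaxed(backToBack); // true
const relaxedWithGap = isContiguousRelaxed(withGap);       // true
```

The last result shows the trade-off behind the question: with a 2-second tolerance, a genuine 1.5-second gap between clips is also treated as contiguous, so any audio in that gap would be played rather than skipped.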

The other problem I was experiencing was that, sometimes, the rendering of the media overlay CSS does not keep up with the audio. I've tried to fix this in two ways:

  1. findNextTextAudioPair() always searches the whole SMIL data from the beginning, even though only the next element is needed. Using a generator function instead avoids this overhead.
  2. I've also added a check to see if the next text/audio pair is already past the audio element's currentTime. If so, simply skip to the next element without doing any rendering, such as changing the CSS class on the element referred to by the text id. The problem was especially visible in the Thorium reader, which I assume is based on r2-navigator-js.

These two fixes seem to help a lot, so now I am able to apply word-for-word highlighting.

Unfortunately, my code is based on the d-i-t-a-reader, which is a fork of your code, so I can't provide you with a pull request. But this is what the generator function looks like:

  // generator function that yields every leaf node (text/audio pair) in the mo tree
  *textAudioPairGenerator(
    mo: MediaOverlayNode
  ): IterableIterator<MediaOverlayNode | undefined> {
    if (!mo.hasOwnProperty("Children") || !mo.Children.length) {
      // leaf node: this is a text/audio pair
      yield mo;
    } else {
      // container node: recurse into the children in document order
      for (const child of mo.Children) {
        yield* this.textAudioPairGenerator(child);
      }
    }
  }

You can simply initialize it when setting the rootMo:

this.mediaOverlayGenerator = this.textAudioPairGenerator(rootMo);
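For anyone who wants to try the traversal outside the reader, here is a self-contained sketch of the same idea. `MoNode` is a hypothetical minimal stand-in for MediaOverlayNode with only the fields the sketch needs, and `leafNodes` and the sample tree are made up for illustration.

```typescript
// Hypothetical minimal stand-in for MediaOverlayNode.
interface MoNode {
  Text?: string;
  Audio?: string;
  Children?: MoNode[];
}

// Depth-first traversal that yields leaf nodes (text/audio pairs) in document order.
function* leafNodes(mo: MoNode): IterableIterator<MoNode> {
  if (!mo.Children || mo.Children.length === 0) {
    yield mo;
  } else {
    for (const child of mo.Children) {
      yield* leafNodes(child);
    }
  }
}

// Made-up tree: one nested container with two words, followed by a single word.
const root: MoNode = {
  Children: [
    { Children: [{ Text: "ch1.xhtml#w1" }, { Text: "ch1.xhtml#w2" }] },
    { Text: "ch1.xhtml#w3" },
  ],
};

const pairs = leafNodes(root);
const firstPair: MoNode = pairs.next().value;                 // the w1 node
const remaining = Array.from(pairs).map((node) => node.Text); // the rest, in order
```

One thing to keep in mind with this design: the iterator is stateful and only moves forward, so a new generator has to be created whenever playback jumps backwards or restarts from another position.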

When trying to get the next media overlay node, I do the following:

      // find next text audio pair where pair.end > currentTime
      let nextTextAudioPair: MediaOverlayNode | undefined;
      while (true) {
        nextTextAudioPair = (this.mediaOverlayGenerator as IterableIterator<MediaOverlayNode>).next().value;
        if (!nextTextAudioPair) {
          break;
        }
        let beginEnd = this.getBeginEndFromNode(nextTextAudioPair);
        if (!beginEnd.end || beginEnd.end > currentTime) {
          break;
        }
      }

getBeginEndFromNode() is just a little helper function that gives me the begin and end values of the current MediaOverlayNode.
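Since the helper itself is not shown, here is a rough sketch of what such a helper and the skip loop could look like together. It assumes that a node's Audio value is a URL ending in a temporal media fragment such as `#t=0.5,1.25` (in seconds); that assumption, the `MoNode` type, and the names `getBeginEnd` and `nextPairAfter` are all hypothetical and not taken from the actual code.

```typescript
// Hypothetical minimal stand-in for MediaOverlayNode.
interface MoNode {
  Text?: string;
  Audio?: string; // assumed form: "audio/ch1.mp3#t=0.5,1.25"
}

interface BeginEnd {
  begin?: number;
  end?: number;
}

// Hypothetical helper: read begin/end (seconds) from a "#t=begin,end" fragment.
function getBeginEnd(node: MoNode): BeginEnd {
  const match = /#t=([0-9.]*)(?:,([0-9.]+))?/.exec(node.Audio || "");
  if (!match) {
    return {};
  }
  const begin = match[1] ? parseFloat(match[1]) : undefined;
  const end = match[2] ? parseFloat(match[2]) : undefined;
  return { begin, end };
}

// Advance the iterator to the first pair whose clip has not ended yet.
// Pairs that ended at or before currentTime are skipped without rendering.
function nextPairAfter(pairs: Iterator<MoNode>, currentTime: number): MoNode | undefined {
  for (let result = pairs.next(); !result.done; result = pairs.next()) {
    const { end } = getBeginEnd(result.value);
    if (end === undefined || end > currentTime) {
      return result.value;
    }
  }
  return undefined; // iterator exhausted
}

// Made-up word-level clips in a single audio file.
const words: MoNode[] = [
  { Text: "ch1.xhtml#w1", Audio: "audio/ch1.mp3#t=0,0.4" },
  { Text: "ch1.xhtml#w2", Audio: "audio/ch1.mp3#t=0.4,0.9" },
  { Text: "ch1.xhtml#w3", Audio: "audio/ch1.mp3#t=0.9,1.5" },
];

const wordIterator = words[Symbol.iterator]();
const current = nextPairAfter(wordIterator, 1.0); // skips w1 and w2
const afterEnd = nextPairAfter(wordIterator, 1.0); // nothing left
```

A node without a parseable end value is returned as-is rather than skipped, which matches the `!beginEnd.end` escape in the loop above for nodes that carry no end time.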

Here is the EPUB file I'm testing with: https://github.com/ravn-no/epub-mo-test/raw/main/alice-in-wonderland-automatic-mo.epub Please note that the automatically created media overlay is not perfect, as I did not bother to edit the original sound file. Also, our system does not officially support English, so I had to write a little hack to create this file.

So I guess my question is whether you think this code has any unwanted consequences for other EPUB files. Do you know where I can get hold of more EPUB files with media overlays for testing? Either way, I think the generator function is a better way of providing the next node, since there is no need to loop over the same nodes over and over. Skipping mo nodes that are already past currentTime might or might not work; I'm not sure of the consequences.
