how to read the original data of opus file into python/numpy array? #76

Open
yongxuUSTC opened this issue May 27, 2023 · 2 comments

@yongxuUSTC

I am not familiar with the format of an Opus file (produced by running opusenc in.wav out.opus).
How many bytes does the header of out.opus occupy? Is everything after it encoded audio data (2 bytes per value)? Are there any non-data bytes at the end of the file?

Could anyone post sample code that reads out.opus into a Python/NumPy array in Uint16 format? I do not need to decode the file; I just need the original encoded data it contains.

Thank you very much

@rillian
Contributor

rillian commented May 27, 2023

It's not as simple as header and tail bytes. The compressed Opus audio packets are split into segments, which are grouped into pages with periodic headers that include timestamps for seeking. The format is documented in

I looked briefly and didn't find a pure Python library that mentioned access to the raw encoded data, although several will decode Opus files to PCM audio in NumPy arrays.

You could, however, use the pyogg.ogg ctypes wrapper to access the libogg C implementation directly and pull the data out that way. It's also possible that some of the higher-level Python libraries have accessible decapsulation functions.
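
As a rough illustration of the page-and-segment structure described above, here is a minimal pure-Python sketch that walks the Ogg pages by hand instead of going through pyogg or libogg. It assumes a single logical stream, skips CRC verification and error recovery, and the function names are made up for this example. In an Ogg Opus file the first two packets should be the OpusHead and OpusTags headers, so the audio packets start at the third one.

```python
import struct

def read_ogg_packets(data):
    """Yield raw packets (bytes) from an in-memory Ogg stream.

    Sketch only: single logical stream, no CRC check, no resync on errors.
    """
    pos = 0
    partial = bytearray()  # packet still being assembled (may span pages)
    while pos + 27 <= len(data):
        if data[pos:pos + 4] != b"OggS":
            raise ValueError("lost page sync at offset %d" % pos)
        # Fixed part of the page header after the 4-byte capture pattern:
        # version, header type, granule position, serial number,
        # page sequence number, CRC, number of segments (little-endian).
        (_version, _header_type, _granule, _serial,
         _sequence, _crc, n_segments) = struct.unpack_from("<BBqIIIB", data, pos + 4)
        lacing = data[pos + 27:pos + 27 + n_segments]
        body = pos + 27 + n_segments
        for size in lacing:
            partial += data[body:body + size]
            body += size
            # A lacing value below 255 ends the current packet;
            # exactly 255 means the packet continues in the next segment.
            if size < 255:
                yield bytes(partial)
                partial = bytearray()
        pos = body

def read_opus_audio_packets(path):
    """Hypothetical helper: return the compressed audio packets of an .opus file."""
    with open(path, "rb") as f:
        packets = list(read_ogg_packets(f.read()))
    if not packets or not packets[0].startswith(b"OpusHead"):
        raise ValueError("not an Ogg Opus stream")
    return packets[2:]  # skip the OpusHead and OpusTags header packets
```

Something like `read_opus_audio_packets("out.opus")` would then return a list of `bytes` objects, one per compressed packet. For anything beyond experimentation, libogg remains the safer route because it handles resynchronisation, checksums and multiplexed streams.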

But you mention Uint16, which is confusing. That doesn't make a lot of sense for the compressed Opus-encoded data, which is a complex, entropy-coded data structure packed into bytes.
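
To make the point about bytes concrete: as far as I understand RFC 6716, the only fixed-layout part of an Opus packet is its first byte (the TOC byte), and everything after it is variable-length, entropy-coded data with no 16-bit word structure. A small sketch that splits that byte into its fields (the function name is just for illustration):

```python
def parse_opus_toc(packet):
    """Split the first (TOC) byte of an Opus packet into its three fields."""
    if not packet:
        raise ValueError("empty packet")
    toc = packet[0]
    config = toc >> 3          # 5 bits: selects mode, bandwidth and frame size
    stereo = bool(toc & 0x04)  # 1 bit: mono (False) or stereo (True)
    code = toc & 0x03          # 2 bits: how many frames the packet carries
    return config, stereo, code
```

If a NumPy view of a packet is still wanted, `numpy.frombuffer(packet, dtype=numpy.uint8)` should give one without copying, but the values are just raw bytes of the compressed bitstream, not samples or codebook indices.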

@yongxuUSTC
Author

Thank you very much for your reply. Is it possible to get a discrete representation (indices or integers?) of the codes in the Opus file, in the same way that recent neural-network-based codecs (e.g., SoundStream, https://arxiv.org/abs/2107.03312) produce discrete representations through RVQ?
