Documentation provides very few usage details #2705

Jonny-Burkholder opened this issue Jan 4, 2025 · 3 comments
Jonny-Burkholder commented Jan 4, 2025

There is a lot of great work being done here, along with plenty of detailed documentation and examples. However, most of that material is quite technical (long if/else chains checking command-line input, etc.), and some examples (the Go bindings) are not actually usable as is. For those of us who are not primarily C++ developers, it can be very difficult to tell from the documentation how to actually use this library. The examples seem to assume that you're building your app inside the whisper repository, or that every user intends to use the whisper repo as a standalone CLI. The README.md has a lot of information about CUDA, OpenVINO, quantization, and other technical details, but beyond running a basic example, there is almost no information on how to use this amazing library in your own code.

It would be very helpful to have simple, clear documentation (for us ASR and C++ beginners) on writing the equivalent of "hello, world": importing whisper into our own projects. For example, a broad step-by-step guide (it wouldn't have to be long, just a few bullet points) on how to replicate a simpler version of the main CLI/transcription functionality in our own repositories, plus a rough overview of the library's API, would make adopting this library much more achievable.

It may be that the information is here somewhere and I'm missing it, or that the answer is that I just need to get better at C++ (which is true). If I'm simply missing it, I'm happy to have that pointed out to me. But many much simpler libraries I've used have very clear usage documentation, and I don't think that's an unreasonable thing to expect. For instance, pocketsphinx (which I'm currently using) has, right up front, a very clear, easy-to-follow example of using live audio in your project that a newcomer to the library can easily replicate.

Thanks to the contributors for this amazing project. I hope this comes across less as a complaint and more as eagerness to use this tool in my own projects.

@mrfragger

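# Convert each .opus file to 16 kHz mono WAV and pipe it straight into whisper.cpp.
# The $set... variables are defined elsewhere; the unquoted ones hold optional flags.
for f in *.opus; do
  sleep .1
  ffmpeg -hide_banner -i "$f" -f wav -ar 16000 -ac 1 - |
    nice ~/whisper/whisper.cpp-1.7.3/build/bin/main -m ~/whisper/models/ggml-$setmodel.bin - -osrt -of "${f%.*}" -t 8 -l "$setlanguage" $translate -ml $maxlength $splitatword $setprintcolors $setng $setfa --prompt "$setprompt"
  # Collect the generated subtitles in srtsubs/.
  [ ! -d srtsubs ] && mkdir srtsubs/
  mv *.srt srtsubs/
done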
  
Hopefully that'll give you an idea. I now actually transcribe one chapter at a time: I split it into 2-minute audio segments, then stitch the resulting VTT files together. At the very end, I stitch all the chapter VTT files into one VTT. Obviously, this is all done automatically.
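
The stitching step described above is not shown in the thread. One way it could work is to shift every cue in segment N by N × 120 seconds before concatenating the files. The following is only a sketch of that idea, not the commenter's actual script: the function names are hypothetical, and it assumes cue lines of the form `HH:MM:SS.mmm --> HH:MM:SS.mmm` with no cue settings after the end time.

```cpp
#include <cstdio>
#include <string>

// Parse "HH:MM:SS.mmm" into milliseconds; returns -1 on malformed input.
// (Hypothetical helper; assumes the hours field is always present.)
long long parse_vtt_ms(const std::string &ts) {
    int h = 0, m = 0, s = 0, ms = 0;
    if (std::sscanf(ts.c_str(), "%d:%d:%d.%d", &h, &m, &s, &ms) != 4) return -1;
    return ((h * 60LL + m) * 60LL + s) * 1000LL + ms;
}

// Format a non-negative millisecond count back into "HH:MM:SS.mmm".
std::string format_vtt_ms(long long total_ms) {
    char buf[32];
    const long long h  = total_ms / 3600000;
    const long long m  = (total_ms / 60000) % 60;
    const long long s  = (total_ms / 1000) % 60;
    const long long ms = total_ms % 1000;
    std::snprintf(buf, sizeof buf, "%02lld:%02lld:%02lld.%03lld", h, m, s, ms);
    return buf;
}

// Shift a cue timing line "start --> end" by offset_ms.
// Lines that are not cue timings (header, text, blanks) pass through unchanged,
// as do lines whose shifted start would become negative.
std::string shift_cue_line(const std::string &line, long long offset_ms) {
    const std::string arrow = " --> ";
    const std::size_t pos = line.find(arrow);
    if (pos == std::string::npos) return line;
    const long long start = parse_vtt_ms(line.substr(0, pos));
    const long long end   = parse_vtt_ms(line.substr(pos + arrow.size()));
    if (start < 0 || end < 0 || start + offset_ms < 0) return line;
    return format_vtt_ms(start + offset_ms) + arrow + format_vtt_ms(end + offset_ms);
}
```

With 2-minute segments, the second segment (index 1) would be passed through `shift_cue_line(line, 120000)` line by line, the third with `240000`, and so on, dropping the repeated `WEBVTT` header from every file but the first.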

@Jonny-Burkholder
Author

Hey thanks! So basically just using ffmpeg to pipe the audio to whisper?

For context, I'm working on a voice assistant that receives audio over a websocket from a smart speaker. My strategy so far is to import whisper.h and try to reverse-engineer the command example.
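
For the websocket case, a likely first step is getting the incoming bytes into the shape the library consumes: `whisper_full` in whisper.h takes a buffer of 32-bit float samples, and the ffmpeg command earlier in the thread feeds the CLI 16 kHz mono audio. The sketch below assumes the speaker sends raw little-endian 16-bit mono PCM at 16 kHz, which the thread does not confirm, and `pcm16le_to_f32` is a hypothetical helper name, not part of whisper.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode little-endian signed 16-bit PCM bytes into floats in [-1, 1).
// Assumption: the payload is raw mono PCM with no container header.
// A trailing odd byte, if any, is ignored.
std::vector<float> pcm16le_to_f32(const std::uint8_t *bytes, std::size_t n_bytes) {
    std::vector<float> out;
    out.reserve(n_bytes / 2);
    for (std::size_t i = 0; i + 1 < n_bytes; i += 2) {
        // Assemble one sample from two bytes, low byte first.
        const std::uint16_t u =
            static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8));
        const std::int16_t s = static_cast<std::int16_t>(u);
        // Scale the 16-bit integer range to floating point.
        out.push_back(static_cast<float>(s) / 32768.0f);
    }
    return out;
}
```

The resulting vector's `data()` and `size()` are the sample pointer and sample count that `whisper_full` expects. If the speaker sends a different sample rate or an encoded format such as Opus, the audio would need to be decoded and resampled first.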

@crummyh
Contributor

crummyh commented Jan 6, 2025

I've had the same issue with the docs while trying to build an app with the Java bindings. More documentation on including whisper.cpp as a library in another project would be very nice.
