Documentation provides very few usage details #2705

Jonny-Burkholder · 2025-01-04T21:21:21Z

There is a lot of great work being done here, and a lot of detailed documentation and examples. However, much of the documentation and many of the examples are quite technical and detailed (long if/else chains checking for command line input, etc.), and some examples (the Go bindings) are not actually usable as is. For those of us who are not primarily C++ developers, it can be very difficult to tell from the documentation how to actually use this library. The examples seem to assume that you're building your app within the whisper repository, or that every user intends to use the whisper repo as a standalone CLI. The README.md has a lot of information about CUDA, OpenVINO, quantization, and other technical details, but beyond running a basic example, there is pretty much no information on how to actually use this amazing library. It would be very, very helpful if there were simple, clear documentation (for us ASR and C++ beginners) on writing the equivalent of "hello, world": importing whisper into our own projects. For example, a broad step-by-step guide (it wouldn't have to be long, just a few bullet points) on how to replicate a simpler version of the main CLI/transcription functionality in our own repositories, plus a rough overview of the library's API, would make adopting this library much more achievable.

It may be that the information is here somewhere and I'm missing it, or that the answer is that I just need to get better at C++ (which is true), and if I'm simply missing the information I'm happy to have that pointed out to me. But many much simpler libraries I've used have very clear usage documentation, and I don't think that's an unreasonable thing to expect. For instance, pocketsphinx (what I'm currently using) leads with a very clear, easy-to-follow example of using live audio in your project that a newcomer to the library can easily replicate.

Thanks to the contributors for this amazing project. I hope this comes across less as a complaint and more as eagerness to use this tool in my own projects.

mrfragger · 2025-01-05T05:38:55Z

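for f in *.opus; do
  sleep .1
  # Decode each file to 16 kHz mono WAV on stdout and pipe it into whisper.cpp's main binary
  # (the lone "-" argument is the input file, here the piped audio)
  ffmpeg -hide_banner -i "$f" -f wav -ar 16000 -ac 1 - |
    nice ~/whisper/whisper.cpp-1.7.3/build/bin/main -m ~/whisper/models/ggml-$setmodel.bin - \
      -osrt -of "${f%.*}" -t 8 -l "$setlanguage" $translate -ml $maxlength \
      $splitatword $setprintcolors $setng $setfa --prompt "$setprompt"
  # Collect the generated subtitles in one directory
  mkdir -p srtsubs
  mv *.srt srtsubs/
done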
  
  Hopefully that'll give you an idea. These days I actually transcribe one chapter at a time: each chapter is split into 2-minute audio segments, and the resulting VTT files are stitched back together. At the very end, all the chapter VTTs are stitched into a single VTT. Obviously this is all done automatically.
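
The stitching step is not shown in the thread. A minimal sketch of the core of it, assuming each segment's VTT starts at 00:00:00.000 and uses `HH:MM:SS.mmm` timestamps, is to shift every cue timing line by the segment's start offset before concatenating. The helper names `vtt_to_ms`, `ms_to_vtt`, and `shift_cue_line` are hypothetical, and any cue settings after the end timestamp are dropped for simplicity.

```cpp
#include <cstdio>
#include <string>

// Parse "HH:MM:SS.mmm" into milliseconds; returns -1 on malformed input.
// Assumes three-digit milliseconds.
long long vtt_to_ms(const std::string& ts) {
    int h = 0, m = 0, s = 0, ms = 0;
    if (std::sscanf(ts.c_str(), "%d:%d:%d.%d", &h, &m, &s, &ms) != 4) return -1;
    return ((h * 60LL + m) * 60 + s) * 1000 + ms;
}

// Format milliseconds back into "HH:MM:SS.mmm".
std::string ms_to_vtt(long long total) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%02lld:%02lld:%02lld.%03lld",
                  total / 3600000, (total / 60000) % 60,
                  (total / 1000) % 60, total % 1000);
    return buf;
}

// Shift a cue timing line ("start --> end") by offset_ms.
// Lines that are not timing lines (header, cue text, blanks) pass through unchanged.
std::string shift_cue_line(const std::string& line, long long offset_ms) {
    const std::string arrow = " --> ";
    const std::string::size_type pos = line.find(arrow);
    if (pos == std::string::npos) return line;
    const long long start = vtt_to_ms(line.substr(0, pos));
    const long long end = vtt_to_ms(line.substr(pos + arrow.size()));
    if (start < 0 || end < 0) return line;
    return ms_to_vtt(start + offset_ms) + arrow + ms_to_vtt(end + offset_ms);
}
```

For 2-minute segments, the n-th segment (counting from zero) would be shifted by `n * 120000` milliseconds, assuming the segments were cut at exact 2-minute boundaries.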

Jonny-Burkholder · 2025-01-05T15:05:48Z

Hey thanks! So basically just using ffmpeg to pipe the audio to whisper?

For context, I'm working on a voice assistant that gets audio through a websocket from a smart speaker. My strategy so far is importing whisper.h and trying to reverse-engineer the command example.
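
For readers in the same position, here is a rough sketch of what the library route might look like. It assumes the websocket delivers 16-bit signed mono PCM at 16 kHz (an assumption about the speaker, not something stated in this thread); `whisper_full()` takes 32-bit float samples, so the buffer has to be converted first. `pcm16_to_float` and `transcribe` are hypothetical helper names, not part of whisper.h, and the whisper calls are compiled only when the header is on the include path. This is a sketch, not the project's recommended usage.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Convert 16-bit signed PCM to floats in [-1, 1), the sample format whisper_full() takes.
std::vector<float> pcm16_to_float(const int16_t* pcm, std::size_t n) {
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<float>(pcm[i]) / 32768.0f;
    }
    return out;
}

#if __has_include("whisper.h")
#include "whisper.h"

// Load a model, run one transcription pass over the samples, and return the text.
// Returns an empty string if the model cannot be loaded or inference fails.
std::string transcribe(const char* model_path, const std::vector<float>& samples) {
    whisper_context* ctx =
        whisper_init_from_file_with_params(model_path, whisper_context_default_params());
    if (ctx == nullptr) return "";

    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    std::string text;
    if (whisper_full(ctx, params, samples.data(), static_cast<int>(samples.size())) == 0) {
        const int n_segments = whisper_full_n_segments(ctx);
        for (int i = 0; i < n_segments; ++i) {
            text += whisper_full_get_segment_text(ctx, i);
        }
    }
    whisper_free(ctx);
    return text;
}
#endif
```

In a long-running assistant it would likely make more sense to load the context once and reuse it across utterances rather than reloading the model on every call as this sketch does.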

crummyh · 2025-01-06T14:39:33Z

I've had the same issue with docs trying to make an app with the Java bindings. More documentation about including whisper.cpp as a library for another project would be very nice.
