Whisper Python performance - Benchmarking #15
And with the two files from:
|
ggerganov/whisper.cpp#7 (comment). Thinking an arg might be better, as with big.LITTLE it's often better to use just the big cores; I've found that's sometimes faster than using all cores. I'm also trying to work out how to get floating-point times so we get fractions of a second.
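For the fractional-second timings mentioned above, a minimal sketch (standard library only) using `time.perf_counter`, which returns a float so sub-second durations come for free:

```python
import time

start = time.perf_counter()
time.sleep(0.1)  # stand-in for the actual inference call
elapsed = time.perf_counter() - start
print(f"Inference took {elapsed:.2f}s")
```

`perf_counter` is monotonic and high-resolution, so it is a better fit for benchmarking than `time.time()`.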
PS:
Guess that is something to do with the tokeniser, which is outside the timings. |
Am I correct that it only decodes one pass of 30 seconds? I can't seem to get a full transcribe of >30 second WAV files; it just continues to the next WAV file in test.py. |
Seems to be something to do with the small model, as it's OK with tiny. |
Looks like indeed only one pass of 30 seconds is transcribed before it loads the next WAV file. |
Yeah, I am talking about the above.
Happens with the small model but it's fine with tiny, and yeah, we only get the first 30 s of the beam search.
|
Yeah, posted that error over at the now-closed issue at whisper.cpp; the small model is not yet correct. |
And with Python
|
I ran test.py and it worked fine with whisper-small.tflite; maybe on the Raspberry Pi the tflite version is a bit older. Loading audio file: ../test-files/en_sh_lights_70pct_4s.wav Inference took 9.42s for 3.58s audio file. Loading audio file: ../test-files/en_speech_jfk_11s.wav |
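As a side note, numbers like "9.42s for 3.58s audio" translate directly into a real-time factor (inference time divided by audio length); a quick sketch:

```python
def real_time_factor(inference_s: float, audio_s: float) -> float:
    """Ratio of inference time to audio duration; >1 means slower than real time."""
    return inference_s / audio_s

# The Raspberry Pi figures quoted above: 9.42 s inference for a 3.58 s clip.
rtf = real_time_factor(9.42, 3.58)
print(f"RTF: {rtf:.2f}")
```

A single RTF number makes it easier to compare runs on different-length WAV files.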
@j1nx pls download the latest whisper-small.tflite and I will also run it with the minimal C++ example |
@nyadla-sys Will do in a bit. That test.py you linked to uses full TensorFlow, however we use tensorflow-lite. Could you flip the # at lines 8 and 9 and try again? (PS: I run TFLite 2.11, however without any custom ops. Perhaps that is what we need.) |
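Rather than flipping comment characters by hand, the choice between full TensorFlow and tflite_runtime could be made at runtime. This is a hypothetical helper (not part of the actual test.py): both packages expose an `Interpreter` class with the same `model_path`/`num_threads` constructor, so a lazy import keeps either package optional:

```python
import importlib

def interpreter_class(runtime: int):
    """Hypothetical helper: map the runtime choice to the Interpreter location.

    1 -> full TensorFlow (tensorflow.lite.Interpreter)
    2 -> the lightweight tflite_runtime package
    """
    if runtime == 2:
        return ("tflite_runtime.interpreter", "Interpreter")
    return ("tensorflow.lite", "Interpreter")

def load_interpreter(runtime: int, model_path: str, num_threads: int = 4):
    """Import the chosen runtime lazily and build an interpreter from it."""
    module_name, cls_name = interpreter_class(runtime)
    interpreter_cls = getattr(importlib.import_module(module_name), cls_name)
    return interpreter_cls(model_path=model_path, num_threads=num_threads)
```

Only the package that is actually selected needs to be installed, which avoids the tensorflow/tflite_runtime conflict mentioned later in the thread.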
PS the guys at tensorflow took pity on me :) |
@nyadla-sys Ran with the latest small model.
Same error. I think it is because custom ops (oneDNN custom operations) are used, as I saw in your output. |
If you do that, don't forget to change line 24 to
Did you notice any difference? I need to check what they've actually changed, since they rewrote a lot without comments. |
Yeah, indeed. Grabbed the snippet here, which has that also flipped. Anyhow, could you or @nyadla-sys check out both? As we do not train, all we need is the tflite runtime for inference. |
I just ran whisper-small.tflite on my Linux Ubuntu machine; please refer to the latest README.md. $ ./minimal ../models/whisper-small.tflite ../samples/jfk.wav n_vocab:50257 mel.n_len3000 mel.n_mel:80 [_extra_token_50258][_extra_token_50259]!! And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country. |
[_extra_token_50258][_extra_token_50259] are tokens; that output stays the same.
The token output changes depending on the English/multilingual model. |
Continuation from here Does this require the |
Please use whisper-small.tflite or whisper-medium.tflite |
These two models, whisper-small.tflite and whisper-medium.tflite, are multilingual. "whisper.tflite" is identical to "whisper-tiny.en.tflite"; I intended to change the name but refrained from doing so because many people are using it in their examples. |
Strange, as I ran those tests with the right multilingual vocab bin. Will double-check in the morning. |
I've updated test.py with new parameters for "--lang", "--runtime" and the tweaks mentioned in the tensorflow issue: $ python3 test.py -h
usage: test.py [-h] [-f FOLDER] [-m MODEL] [-t THREADS] [-l LANG] [-r RUNTIME]
Running Whisper TFlite test inference.
optional arguments:
-h, --help show this help message and exit
-f FOLDER, --folder FOLDER
Folder with WAV input files
-m MODEL, --model MODEL
Path to model
-t THREADS, --threads THREADS
Threads used
-l LANG, --lang LANG Language used
-r RUNTIME, --runtime RUNTIME
Tensorflow runtime, use '1' for tf.lite or '2' for tflite_runtime
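For reference, the parser behind that help output can be sketched as follows; the default values here are my guesses for illustration, not the script's actual ones:

```python
import argparse

# Hypothetical reconstruction of test.py's argument parser from its --help output.
parser = argparse.ArgumentParser(description="Running Whisper TFlite test inference.")
parser.add_argument("-f", "--folder", default="../test-files/",
                    help="Folder with WAV input files")
parser.add_argument("-m", "--model", default="../models/whisper-tiny.en.tflite",
                    help="Path to model")
parser.add_argument("-t", "--threads", type=int, default=2, help="Threads used")
parser.add_argument("-l", "--lang", default="en", help="Language used")
parser.add_argument("-r", "--runtime", type=int, default=1,
                    help="Tensorflow runtime, use '1' for tf.lite or '2' for tflite_runtime")
args = parser.parse_args([])  # pass sys.argv[1:] (or nothing) in the real script
```

`argparse` generates the `-h` output shown above automatically from these declarations.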
On my Rpi400 |
Change the name @nyadla-sys, as we can change things, but I think we are all used to the original naming convention. |
@fquirin dunno why, but the TensorFlow folks rewrote the test script for me; maybe it's because full TF has some special-sauce optimisation with CMake for Arm that we are missing with the TensorFlow Lite Python wheel package build? |
I have updated the model to "whisper-tiny.en.tflite" and I need to update the README. For backward compatibility, I will keep "whisper.tflite" for a while. |
Error with
It must be something like that, yes 🤔 |
Can you provide me with additional information such as the operating system, machine, and whether you are using a minimal C++ build or a Python script? |
Sure: Aarch64, Raspberry Pi 400, 4GB RAM, Debian Bullseye (11), Python script. Maybe my Pi is actually out of memory when using the small model but according to OpenAI 2GB should be fine 🤔 |
I found the binary delegate very picky about the distro it ran on, and it wasn't easy, but I did get it working on a Mali-based board. You can switch between GpuAcc & CpuAcc, and the 8-bit quantised model they provide works, but whether GpuAcc works on non-Mali I don't know, as I tested on a Mali G610. |
@j1nx uploaded the whisper-base.tflite model along with the Colab notebook |
Ok, here some memory data for the tiny model. This is running on a Raspberry Pi4 - 2GB model. Some info first;
Running the minimal C++ binary on the whisper-tiny.en.tflite while watching the memory consumption (don't know about an easy way to record) it shows 185 MB added use at max while running it;
Doing the same but running it with test.py showed 305 MB added use at max.
And finally, using the benchmark tool, which shows the memory consumption itself (watching the mem usage showed 176 MB used):
Will see what we can do with the base model now. |
The base model suffers from the same gather index out of bounds issue;
|
@j1nx Great and thanks for your time on this .. |
Can you try the latest base model that I uploaded yesterday? |
Downloaded the base model this morning, so I am using that latest one. |
No problem. Hopefully we can get the base and small models to work as well. Tiny is fast enough but perhaps base as well. |
Just for others who want to understand VmRSS, RssAnon and RssFile: VmRSS is the total amount of resident memory used by the process, reported by the kernel in kilobytes (kB). |
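For anyone who wants to track those numbers programmatically during a benchmark run, a small sketch that parses them out of `/proc/<pid>/status` text (the kernel reports the values in kB):

```python
def rss_fields(status_text: str) -> dict:
    """Extract VmRSS, RssAnon, RssFile and RssShmem (in kB) from /proc status text."""
    wanted = {"VmRSS", "RssAnon", "RssFile", "RssShmem"}
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            fields[key] = int(rest.split()[0])  # first token after the colon is the kB value
    return fields

# On Linux you would feed it the live file, e.g.:
# with open("/proc/self/status") as f:
#     print(rss_fields(f.read()))
```

VmRSS is the sum of the anonymous (RssAnon), file-backed (RssFile) and shared-memory (RssShmem) resident parts, which is why watching RssAnon alone can understate a model's footprint.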
Very nice explanation. Thanks for that. |
@fquirin create the runtime with bazel: grab the latest from https://github.com/bazelbuild/bazelisk (likely wget it to /usr/bin, chmod a+x, and create a symlink called bazel), then run /home/orangepi/tensorflow_src/tensorflow/lite/tools/pip_package/build_pip_package_with_bazel.sh native
to get a bazel-built tflite_runtime.
Might be interesting to create tflite as an external lib with bazel and then link the tflite_minimal cmake build against it rather than statically? As I dunno what bazel does that cmake doesn't, but see the results above.
I noticed you uploaded new tiny models as well. Hereby the model benchmark tool output on them. The normal one;
The EN one;
The base model still errors out.
But the funny thing, the small model does work?!
Look at that AMAZING low memory usage for the small model !!!! @nyadla-sys Well done / Great work. However...
|
@j1nx I just uploaded new whisper-base.tflite model. |
All good now;
However, just as with the small model, the WER is terrible.
I suspect this has something to do with language selection of some kind. Is it possible for you to convert and upload the ".en" versions of the small and base models as well? |
Just confirmed it;
Could also be the python script perhaps. @StuartIanNaylor @fquirin Perhaps you can have a look at it as well. |
Slightly slower was 1.07s ? |
Indeed, but the WER of the non-EN fixed model is... well, 5%, so I guess something goes wrong there. Will have a look at the C++ binary in the morning to see how that one performs on all models. Will also feed some more different WAV files to all models to gather more info. |
I presume the initial decoder tokens are hardcoded for a translation model/language or something, or just missing, but the start tokens are control tokens
|
Here the output of the C++ binary with the single vocab.bin;
And the same multilingual model but this time with the multilingual vocab.bin;
And to complete the whole test package;
Next up: running the Python inference on some other WAV files with the tiny.en model to get some insight into the WER. |
This also might be of interest to you @nyadla-sys: the base model does translation to English, whereas the tiny and small models just returned the detected language.
|
|
@StuartIanNaylor it looks like the |
Yep, the bazel native build on the RK3588 is almost 50% faster. Also, on the Pi, if compiling on metal you should select native; I haven't tried, but it looks like the other targets are for cross-compiling. I did send you a message about using bazel with an @fquirin mention; I have forgotten where, but it should be in your 'mentions'. |
Hi @StuartIanNaylor, do you have a tflite_runtime wheel file built with the bazel script for download maybe? My Pi seems to be too weak to build it, and I don't understand how the cross-compile is supposed to work because the examples are broken 😕 (the referenced Docker files don't exist). [Edit] I did manage to compile it finally! Had to uninstall the regular tensorflow and tflite packages though to use it 🤔 (some conflict it seems), but it is indeed even faster than "fat" tflite now 😀 |
Good, as my native compile on Armv8.2 prob would not work on a Pi. I am not really sure what the bazel build does differently, but yeah, it is noticeably faster than the cmake compile, which maybe hasn't been updated and creates a legacy build. You shouldn't need Docker though, even though you got it working, as you just specify https://www.tensorflow.org/lite/guide/build_cmake_pip#available_target_names |
Actually could you try my build: download (if you have the same Python version). I'd like to know if they are compatible and perform the same :-). |
Ran in a miniconda env on Ubuntu with Python 3.10, but yeah, it runs OK.
|
Nice 😎 , thanks for testing 👍 |
Running on OpenVoiceOS, RaspberryPi 4 - 2GB model. Using Python 3.10 and Tensorflow-lite 2.11
With the tiny model;