This repository has been archived by the owner on Aug 28, 2023. It is now read-only.

Whisper Python performance - Benchmarking #15

Open
j1nx opened this issue Jan 19, 2023 · 89 comments

Comments

@j1nx commented Jan 19, 2023

Running on OpenVoiceOS, Raspberry Pi 4 (2GB model), using Python 3.10 and TensorFlow Lite 2.11.

With the tiny model:

mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper.tflite -t 4
Importing tensorflow, numpy and torch
Importing whisper
Loading tflite model models/whisper.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

Loading audio file: samples/test.wav
Samplerate: 16000, length: 30.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
 Bili always listens to his mother. He always does what she says. If his mother says,

Inference took 4.74s for 30.0s audio file.

Loading audio file: samples/test_1.wav
Samplerate: 16000, length: 30.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
 David lost his yellow pencil. He could not find it. Where is my yellow pencil? He asked his sister. His sister did not know. I don't know where your pencil is. She said David thought about it. He thought and thought. He used his yellow pencil for before lunch. He used it to write a note to his teacher. The notes said, dear teacher, thank you for helping me, David. He put the note in the envelope where was the envelope?

Inference took 8.57s for 30.0s audio file.

Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
 And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

Inference took 4.28s for 11.0s audio file.
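(For reference, the per-file flow logged above (load, mel, invoke, detokenize) roughly corresponds to this sketch. The helper names come from the openai-whisper package and tflite_runtime; the actual test.py may differ in details.)

import numpy as np
import whisper  # openai-whisper, used here only for audio/mel helpers and the tokenizer
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter("models/whisper.tflite", num_threads=4)
interpreter.allocate_tensors()
input_idx = interpreter.get_input_details()[0]["index"]
output_idx = interpreter.get_output_details()[0]["index"]

audio = whisper.load_audio("samples/jfk.wav")      # 16 kHz mono float32
audio = whisper.pad_or_trim(audio)                 # pad/cut to exactly 30 s
mel = whisper.log_mel_spectrogram(audio).numpy()   # shape (80, 3000)

interpreter.set_tensor(input_idx, np.expand_dims(mel, 0))
interpreter.invoke()
tokens = interpreter.get_tensor(output_idx)        # token ids, shape (1, n)

wtokenizer = whisper.tokenizer.get_tokenizer(False, language="en")
print(wtokenizer.decode([int(t) for t in tokens[0] if t < wtokenizer.eot]))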
@j1nx (Author) commented Jan 19, 2023

And with the two files from https://github.com/fquirin/speech-recognition-experiments/tree/main/test-files:

Loading audio file: samples/en_sh_lights_70pct_4s.wav
Samplerate: 16000, length: 3.575875s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
 Set the lights in the living room to 70%.

Inference took 3.5s for 3.58s audio file.

Loading audio file: samples/en_speech_jfk_11s.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
 And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

Inference took 4.26s for 11.0s audio file.

@StuartIanNaylor commented Jan 19, 2023

ggerganov/whisper.cpp#7 (comment)

Thinking an arg might be better, as with big.LITTLE it is often better to just use the big cores; I have found that is sometimes faster than using all cores.

Also, I have been trying to work out how to get floating-point times so we get fractions of a second.
We have end.tv_sec - start.tv_sec.
What is the best way of adding tv_usec, and is there a plain float time alternative?

orangepi@orangepi5:~/openai-whisper/minimal_build$ ./minimal ../models/whisper.tflite ../samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 1 seconds

[_SOT_][_NOT_] And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

ps

orangepi@orangepi5:~/openai-whisper/minimal_build$ ./minimal ../models/whisper-small.tflite ../samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 12 seconds
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
Aborted

Guess that is something to do with the tokeniser, which is outside the timings.

@j1nx (Author) commented Jan 19, 2023

Am I correct that it only decodes one pass of 30 seconds? I can't seem to get a full transcript of >30-second WAV files; test.py just continues to the next WAV file.
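(The model takes a fixed 30-second window, so longer files would need chunking in the calling script. A minimal sketch, where transcribe_window() is a hypothetical helper wrapping one mel + invoke + decode pass:)

import whisper

CHUNK = 30 * 16000  # samples per 30 s window at 16 kHz

def transcribe_long(path, transcribe_window):
    # transcribe_window() is hypothetical: one interpreter pass over 30 s
    audio = whisper.load_audio(path)
    parts = []
    for start in range(0, len(audio), CHUNK):
        segment = whisper.pad_or_trim(audio[start:start + CHUNK])
        parts.append(transcribe_window(segment))
    return " ".join(parts)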

@StuartIanNaylor commented Jan 19, 2023

orangepi@orangepi5:~/openai-whisper/minimal_build$ ./minimal ../models/whisper.tflite ../samples/test.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 2 seconds

[_SOT_][_NOT_] Bili always listens to his mother. He always does what she says. If his mother says, brush your teeth, Bili brushes his teeth. If his mother says, go to bed, Bili goes to bed. Bili is a very good boy, a good boy listens to his mother. His mother does not have to ask him again. She asks him to do something one time and she does not ask again. Bili is a good boy. He does what his mother asks the first time. She does not have to ask again.

Seems to be something to do with the small model, as it is OK with tiny.

@j1nx (Author) commented Jan 19, 2023

Loading audio file: samples/A_J_Cook_Speech_from_Lansbury's_Labour_Weekly.wav
Samplerate: 16000, length: 188.231125s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
 The field of the workers have put a million miners with their wives and children something like one tends to the whole population of this country have long called a loud progester. If you were to believe all the things that capitalist press pay about us, you would think that we were the most terrible people on earth. They tell you that we are never satisfied. That we are always psychic, that we are never content for our wages, with our hours, or with the hoses we live in. And yet,

Inference took 9.12s for 1.88e+02s audio file.

Looks like indeed only one pass of 30 seconds is transcribed, after which it loads the next WAV file.

@StuartIanNaylor commented:

Yeah, I am talking about the above:

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 12 seconds
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
Aborted

It happens with the small model but is fine with tiny; but yeah, we only get the first 30-second beam search.

orangepi@orangepi5:~/openai-whisper/minimal_build$ ./minimal ../models/whisper.tflite ../samples/gb0.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 2 seconds

[_SOT_][_NOT_] Good morning. This Tuesday is Election Day. After months of spirited debate in vigorous campaigning, the time has come for Americans to make important decisions about our nation's future and encourage all Americans to go to the polls and vote. Election season brings out the spirit of competition between our political parties. And that competition is an essential part of a healthy democracy. But as the campaigns come to a close, Republicans, Democrats, and independents can find common ground on at least one point. Our system of

wget --quiet --show-progress -O samples/gb0.ogg https://upload.wikimedia.org/wikipedia/commons/2/22/George_W._Bush%27s_weekly_radio_address_%28November_1%2C_2008%29.oga
ffmpeg -loglevel -0 -y -i samples/gb0.ogg -ar 16000 -ac 1 -c:a pcm_s16le samples/gb0.wav

@j1nx (Author) commented Jan 19, 2023

Yeah, I posted that error over at the now-closed issue at whisper.cpp.

The small model is not yet correct.

@j1nx (Author) commented Jan 19, 2023

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper.tflite samples/gb0.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 9 seconds

[_SOT_][_NOT_] Good morning. This Tuesday is Election Day. After months of spirited debate in vigorous campaigning, the time has come for Americans to make important decisions about our nation's future and encourage all Americans to go to the polls and vote. Election season brings out the spirit of competition between our political parties. And that competition is an essential part of a healthy democracy. But as the campaigns come to a close, Republicans, Democrats, and independents can find common ground on at least one point. Our system of

And with Python:

Loading audio file: samples/gb0.wav
Samplerate: 16000, length: 127.36s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
 Good morning. This Tuesday is Election Day. After months of spirited debate in vigorous campaigning, the time has come for Americans to make important decisions about our nation's future and encourage all Americans to go to the polls and vote. Election season brings out the spirit of competition between our political parties. And that competition is an essential part of a healthy democracy. But as the campaigns come to a close, Republicans, Democrats, and independents can find common ground on at least one point. Our system of

Inference took 8.75s for 1.27e+02s audio file.

@nyadla-sys (Contributor) commented:

I ran test.py and it worked fine with whisper-small.tflite; maybe the tflite version on the Raspberry Pi is a bit older.
$python test.py
Importing tensorflow and numpy
2023-01-19 12:00:17.805994: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
Importing whisper
Loading tflite model models/whisper.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

Loading audio file: ../test-files/en_sh_lights_70pct_4s.wav
Samplerate: 16000, length: 3.575875s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
!!!

Inference took 9.42s for 3.58s audio file.

Loading audio file: ../test-files/en_speech_jfk_11s.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
!! And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

@nyadla-sys (Contributor) commented:

@j1nx please download the latest whisper-small.tflite; I will also run it using the minimal C++ example.

@j1nx (Author) commented Jan 19, 2023

@nyadla-sys Will do in a bit.

That test.py you linked to uses full TensorFlow; however, we use TensorFlow Lite:
https://github.com/fquirin/speech-recognition-experiments/blob/main/whisper-tflite/test.py#L9

Could you flip the # at lines 8 and 9 and try again?

(PS: I run TFLite 2.11, however without any custom ops. Perhaps that is what we need.)

@StuartIanNaylor commented:
PS the guys at tensorflow took pity on me :)

tensorflow/tensorflow#59273 (comment)

@j1nx (Author) commented Jan 19, 2023

@nyadla-sys Ran with the latest small model

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-small.tflite samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
ERROR: gather index out of bounds
ERROR: Node number 35 (GATHER) failed to invoke.
ERROR: Node number 3435 (WHILE) failed to invoke.
Error at ../minimal.cc:211

Same error.

I think it is because the custom ops (the oneDNN custom operations) are used, as I saw in your output.

@fquirin commented Jan 19, 2023

> Could you flip the # at lines 8 and 9 and try again?

If you do that, don't forget to change line 24 to interpreter = tf.Interpreter(model_path, num_threads=int(args.threads)) as well.
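(Taken together, the line 8/9 flip and the line 24 change amount to selecting the runtime at import time. A sketch of the idea, not the actual test.py:)

try:
    from tflite_runtime.interpreter import Interpreter  # lite runtime only
except ImportError:
    import tensorflow as tf  # full TensorFlow fallback
    Interpreter = tf.lite.Interpreter

# Both runtimes expose the same Interpreter API:
interpreter = Interpreter(model_path="models/whisper.tflite", num_threads=4)
interpreter.allocate_tensors()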

> PS the guys at tensorflow took pity on me :)

Did you notice any difference? I need to check what they've actually changed, since they rewrote a lot without comments.

@j1nx (Author) commented Jan 19, 2023

Yeah, indeed. I grabbed the snippet here, which also has that flipped:
ggerganov/whisper.cpp#7 (comment)

Anyhow, could you or @nyadla-sys check out both? As we do not train, all we need is the tflite runtime for inference.

@nyadla-sys (Contributor) commented:

I just ran whisper-small.tflite on my Linux Ubuntu machine; please refer to the latest README.md.

$ ./minimal ../models/whisper-small.tflite ../samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 26 seconds

[_extra_token_50258][_extra_token_50259]!! And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

@StuartIanNaylor commented:

[_extra_token_50258][_extra_token_50259] are tokens; the output stays the same.

@nyadla-sys (Contributor) commented:

> are tokens; the output stays the same

The token output changes depending on whether the model is English-only or multilingual.

@fquirin commented Jan 19, 2023

> To use a multilingual model in Python, you can simply change the line "wtokenizer = whisper.tokenizer.get_tokenizer(False, language="en")" to "wtokenizer = whisper.tokenizer.get_tokenizer(True, language="en")"

Continuation from here

Does this require the whisper-small.tflite model? Because I've tried that with whisper.tflite (en only?) but the output is completely scrambled and still English when I set for example "de".

@nyadla-sys (Contributor) commented Jan 19, 2023

> Does this require the whisper-small.tflite model?

Please use whisper-small.tflite or whisper-medium.tflite.
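(For reference, a sketch of how the tokenizer choice pairs with the model; the first argument of get_tokenizer selects the multilingual token table:)

import whisper

# English-only models (whisper.tflite, i.e. whisper-tiny.en.tflite):
wtokenizer = whisper.tokenizer.get_tokenizer(False, language="en")

# Multilingual models (whisper-small.tflite, whisper-medium.tflite):
wtokenizer = whisper.tokenizer.get_tokenizer(True, language="de", task="transcribe")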

@nyadla-sys (Contributor) commented Jan 19, 2023

These two models, whisper-small.tflite and whisper-medium.tflite, are multilingual; "whisper.tflite" is identical to "whisper-tiny.en.tflite". I intended to change the name, but I refrained from doing so because many people are using it in their examples.

@j1nx (Author) commented Jan 19, 2023

Strange, as I ran those tests with the right multilingual vocab bin.

Will double-check in the morning.

@fquirin commented Jan 19, 2023

I've updated test.py with new parameters for "--lang" and "--runtime" and the tweaks mentioned in the tensorflow issue:

$ python3 test.py -h
usage: test.py [-h] [-f FOLDER] [-m MODEL] [-t THREADS] [-l LANG] [-r RUNTIME]

Running Whisper TFlite test inference.

optional arguments:
  -h, --help            show this help message and exit
  -f FOLDER, --folder FOLDER
                        Folder with WAV input files
  -m MODEL, --model MODEL
                        Path to model
  -t THREADS, --threads THREADS
                        Threads used
  -l LANG, --lang LANG  Language used
  -r RUNTIME, --runtime RUNTIME
                        Tensorflow runtime, use '1' for tf.lite or '2' for tflite_runtime

On my RPi 400, tflite_runtime is still about 1.5s slower than tf.lite.
I could not test whisper-small.tflite because it keeps crashing (will post error in a minute).

@StuartIanNaylor commented:

Change the name, @nyadla-sys; we can change things, but I think we are all used to the original naming convention.

@StuartIanNaylor commented Jan 19, 2023

@fquirin dunno why, but the TensorFlow team rewrote the test script for me; maybe the full TF has some special-sauce optimisation with CMake for Arm that we are missing with the TensorFlow Lite Python wheel package build?

tensorflow/tensorflow#59273 (comment)

@nyadla-sys (Contributor) commented Jan 19, 2023

> Change the name, @nyadla-sys; we can change things, but I think we are all used to the original naming convention.

I have updated the model to "whisper-tiny.en.tflite" and I need to update the README. For backward compatibility, I will keep "whisper.tflite" for a while.

@fquirin commented Jan 19, 2023

Error with whisper-small.tflite (I think we had this somewhere a few hours ago already?):

Traceback (most recent call last):
  File "/home/pi/whisper-tflite/openai-whisper/test.py", line 93, in <module>
    transcribe(args.folder + file)
  File "/home/pi/whisper-tflite/openai-whisper/test.py", line 68, in transcribe
    interpreter.invoke()
  File "/home/pi/whisper-tflite/venv/lib/python3.9/site-packages/tensorflow/lite/python/interpreter.py", line 917, in invoke
    self._interpreter.Invoke()
RuntimeError: gather index out of boundsNode number 35 (GATHER) failed to invoke.Node number 3435 (WHILE) failed to invoke.

> dunno why, but the TensorFlow team rewrote the test script for me; maybe the full TF has some special-sauce optimisation with CMake for Arm that we are missing with the TensorFlow Lite Python wheel package build?

It must be something like that, yes 🤔

@nyadla-sys (Contributor) commented:
> Error with whisper-small.tflite: RuntimeError: gather index out of bounds […]

Can you provide me with additional information such as the operating system, machine, and whether you are using a minimal C++ build or a Python script?

@fquirin commented Jan 19, 2023

> Can you provide me with additional information such as the operating system, machine, and whether you are using a minimal C++ build or a Python script?

Sure: Aarch64, Raspberry Pi 400, 4GB RAM, Debian Bullseye (11), Python script.

Maybe my Pi is actually out of memory when using the small model but according to OpenAI 2GB should be fine 🤔

@StuartIanNaylor commented Jan 25, 2023

I found the binary delegate very picky about the distro it ran on; it wasn't easy, but I did get it working on a Mali-based board.
The GPU doesn't have to be float-capable, as in my test in https://github.com/StuartIanNaylor/rock5b-wav2letter-bench,
which is just the https://developer.arm.com/documentation/102603/2211 tutorial (it seemed to have quite a lot of typos).

You can switch between GpuAcc and CpuAcc, and the 8-bit quantised model they provide works, but whether GpuAcc works on non-Mali GPUs I don't know, as I tested on a Mali-G610.
With Wav2Letter they omit the GPU setup on the Pi, which makes me think it is Mali-only even though its base is OpenCL, since for the Odroid with a Mali-G52 it is included.

@nyadla-sys (Contributor) commented Jan 26, 2023

@j1nx I uploaded the whisper-base.tflite model along with the Colab notebook.

@j1nx (Author) commented Jan 26, 2023

OK, here is some memory data for the tiny model. This is running on a Raspberry Pi 4, 2GB model.

Some info first:

mycroft@OpenVoiceOS-e3830c:~/whisper $ uname -a
Linux OpenVoiceOS-e3830c 5.15.76-v8 #1 SMP Fri Dec 30 14:58:43 CET 2022 aarch64 GNU/Linux

mycroft@OpenVoiceOS-e3830c:~/whisper $ lscpu
Architecture:            aarch64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               ARM
  Model name:            Cortex-A72
    Model:               3
    Thread(s) per core:  1
    Core(s) per cluster: 4
    Socket(s):           -
    Cluster(s):          1
    Stepping:            r0p3
    CPU max MHz:         1800.0000
    CPU min MHz:         600.0000
    BogoMIPS:            108.00
    Flags:               fp asimd evtstrm crc32 cpuid
Caches (sum of all):
  L1d:                   128 KiB (4 instances)
  L1i:                   192 KiB (4 instances)
  L2:                    1 MiB (1 instance)
Vulnerabilities:
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; __user pointer sanitization
  Spectre v2:            Vulnerable
  Srbds:                 Not affected
  Tsx async abort:       Not affected

mycroft@OpenVoiceOS-e3830c:~/whisper $ sync; dd if=/dev/zero of=tempfile bs=1M count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 20.906206 seconds, 49.0MB/s
mycroft@OpenVoiceOS-e3830c:~/whisper $ sudo sh -c "sync && echo 3 > /proc/sys/vm/drop_caches"
mycroft@OpenVoiceOS-e3830c:~/whisper $ dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 4.949115 seconds, 206.9MB/s

mycroft@OpenVoiceOS-e3830c:~/whisper $ free -h
              total        used        free      shared  buff/cache   available
Mem:           1.8G      182.8M      561.6M        7.1M        1.0G        1.5G
Swap:        359.5M           0      359.5M

mycroft@OpenVoiceOS-e3830c:~/whisper $ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda      8:0    1  29.9G  0 disk
├─sda1   8:1    1    64M  0 part /boot
├─sda2   8:2    1   1.1G  0 part /media/rfs/ro
└─sda3   8:3    1  28.7G  0 part /media/rfs/rw
zram0  254:0    0 359.5M  0 disk [SWAP]

Running the minimal C++ binary on whisper-tiny.en.tflite while watching the memory consumption (I don't know of an easy way to record it), it shows at most 185 MB of added use while running:

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal  models/whisper-tiny.en.tflite samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 4 seconds

[_SOT_][_NOT_] And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

Doing the same but running it with test.py showed at most 305 MB of added use.

mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper-tiny.en.tflite -t 4 -l en -r 2
Importing tflite_runtime
Importing numpy
Importing whisper
Loading tflite model models/whisper-tiny.en.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
 And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

Inference took 4.45s for 11.00s audio file.

And finally, using the benchmark tool, which reports the memory consumption itself (watching the memory usage showed 176 MB used):

mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/whisper-tiny.en.tflite --num_threads=4 --warmup_runs=1 --num_runs=5 --report_peak_memory_footprint=true
STARTING!
Log parameter values verbosely: [0]
Min num runs: [5]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/whisper-tiny.en.tflite]
#threads used for CPU inference: [4]
Loaded model models/whisper-tiny.en.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 40.9627
Initialized session in 15.03ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=1 curr=3044595

Running benchmark for at least 5 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=5 first=2903102 curr=2876513 min=2870115 max=2903102 avg=2.8807e+06 std=12080

Inference timings in us: Init: 15030, First inference: 3044595, Warmup (avg): 3.0446e+06, Inference (avg): 2.8807e+06
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=9 overall=251.41
Overall peak memory footprint (MB) via periodic monitoring: 257.414
Memory status at the end of exeution:
- VmRSS              : 222 MB
+ RssAnnon           : 176 MB
+ RssFile + RssShmem : 46 MB

Will see what we can do with the base model now.

@j1nx (Author) commented Jan 26, 2023

The base model suffers from the same gather index out of bounds issue:

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal  models/whisper-base.tflite samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
ERROR: gather index out of bounds
ERROR: Node number 35 (GATHER) failed to invoke.
ERROR: Node number 1264 (WHILE) failed to invoke.
Error at ../minimal.cc:210

@nyadla-sys (Contributor) commented:
> OK, here is some memory data for the tiny model. […]
@j1nx Great, and thanks for your time on this.

@nyadla-sys (Contributor) commented:
> The base model suffers from the same gather index out of bounds issue: […]

Can you try with the latest base model that I uploaded yesterday?

@j1nx (Author) commented Jan 26, 2023

> Can you try with the latest base model that I uploaded yesterday?

I downloaded the base model this morning, so I am using the latest.

@j1nx (Author) commented Jan 26, 2023

> @j1nx Great, and thanks for your time on this.

No problem. Hopefully we can get the base and small models to work as well. Tiny is fast enough, but perhaps base is as well.

@nyadla-sys (Contributor) commented:
  • VmRSS : 222 MB
  • RssAnnon : 176 MB
  • RssFile + RssShmem : 46 MB

Just for others who want to understand VmRSS, RssAnnon, and RssFile: these values represent the amount of memory used by a process, specifically the process's Resident Set Size (RSS).

VmRSS is the total amount of resident memory used by the process, measured in megabytes (MB).
RssAnnon is the amount of anonymous memory used by the process, measured in megabytes (MB). Anonymous memory is memory that is not associated with a file on disk.
RssFile and RssShmem are the amount of file-backed and shared memory used by the process, respectively, measured in megabytes (MB). File-backed memory is memory that is associated with a file on disk, while shared memory is memory that is shared between multiple processes.
In this case, the process is using a total of 222 MB of memory, with 176 MB of that being anonymous memory and 46 MB being file-backed and shared memory.
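(For anyone who wants to watch these counters directly, a small sketch that reads them from /proc on Linux; note the kernel spells the field "RssAnon" and reports values in kB:)

def rss_breakdown(pid="self"):
    # Read VmRSS and its anonymous/file/shared breakdown from /proc/<pid>/status.
    fields = ("VmRSS", "RssAnon", "RssFile", "RssShmem")
    out = {}
    with open("/proc/%s/status" % pid) as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in fields:
                out[key] = value.strip()  # e.g. "227328 kB"
    return out

print(rss_breakdown())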

@j1nx (Author) commented Jan 26, 2023

Very nice explanation. Thanks for that.

@StuartIanNaylor commented Jan 26, 2023

@fquirin create the runtime with bazel:

Grab the latest bazelisk from https://github.com/bazelbuild/bazelisk (likely wget it to /usr/bin, chmod a+x, and create a symlink called bazel), then set
export USE_BAZEL_VERSION=5.3.0
(find the version in the tensorflow_src .bazelversion file).

/home/orangepi/tensorflow_src/tensorflow/lite/tools/pip_package/build_pip_package_with_bazel.sh native
python test.py --model=../openai-whisper/models/whisper.tflite --folder=../openai-whisper/samples/ --threads=4
cmake tflite_runtime

test.wav
 Bili always listens to his mother. He always does what she says. If his mother says,

Inference took 1.54s
test_1.wav
 David lost his yellow pencil. He could not find it. Where is my yellow pencil? He asked his sister. His sister did not know. I don't know where your pencil is. She said David thought about it. He thought and thought. He used his yellow pencil before lunch. He used it to write a note to his teacher. The notes said, dear teacher, thank you for helping me, David. He put the note in the envelope where was the envelope?

Inference took 2.54s
jfk.wav
 And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

Inference took 1.51s

bazel tflite_runtime

test.wav
 Bili always listens to his mother. He always does what she says. If his mother says,

Inference took 1.08s
test_1.wav
 David lost his yellow pencil. He could not find it. Where is my yellow pencil? He asked his sister. His sister did not know. I don't know where your pencil is. She said David thought about it. He thought and thought. He used his yellow pencil for before lunch. He used it to write a note to his teacher. The notes said, dear teacher, thank you for helping me, David. He put the note in the envelope where was the envelope?

Inference took 2.07s
jfk.wav
 And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

Inference took 1.07s

Might be interesting to create tflite as an external lib with bazel and then link it into the tflite_minimal CMake build rather than statically? I don't know what Bazel does that CMake doesn't, but see the results above.

@j1nx (Author) commented Jan 27, 2023

I noticed you uploaded new tiny models as well. Here is the benchmark tool output for them.

The normal one:

mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/whisper-tiny.tflite --num_threads=4 --warmup_runs=1 --num_runs=5 --report_peak_memory_footprint=true
STARTING!
Log parameter values verbosely: [0]
Min num runs: [5]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/whisper-tiny.tflite]
#threads used for CPU inference: [4]
Loaded model models/whisper-tiny.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 69.3703
Initialized session in 162.167ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=1 curr=4303281

Running benchmark for at least 5 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=5 first=3882868 curr=3828750 min=3828750 max=3882868 avg=3.85199e+06 std=17539

Inference timings in us: Init: 162167, First inference: 4303281, Warmup (avg): 4.30328e+06, Inference (avg): 3.85199e+06
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=10.7656 overall=306.816
Overall peak memory footprint (MB) via periodic monitoring: 312.531
Memory status at the end of exeution:
- VmRSS              : 244 MB
+ RssAnnon           : 171 MB
+ RssFile + RssShmem : 73 MB

The EN one:

mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/whisper-tiny.en.tflite --num_threads=4 --warmup_runs=1 --num_runs=5 --report_peak_memory_footprint=true
STARTING!
Log parameter values verbosely: [0]
Min num runs: [5]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/whisper-tiny.en.tflite]
#threads used for CPU inference: [4]
Loaded model models/whisper-tiny.en.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 41.6556
Initialized session in 119.189ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=1 curr=3247494

Running benchmark for at least 5 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=5 first=2934383 curr=2893835 min=2892774 max=2934383 avg=2.90437e+06 std=15345

Inference timings in us: Init: 119189, First inference: 3247494, Warmup (avg): 3.24749e+06, Inference (avg): 2.90437e+06
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=10.7422 overall=264.406
Overall peak memory footprint (MB) via periodic monitoring: 268.031
Memory status at the end of exeution:
- VmRSS              : 227 MB
+ RssAnnon           : 181 MB
+ RssFile + RssShmem : 46 MB

The base model still errors out.

mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/whisper-base.tflite --num_threads=4 --warmup_runs=1 --num_runs=5 --report_peak_memory_footprint=true
STARTING!
Log parameter values verbosely: [0]
Min num runs: [5]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/whisper-base.tflite]
#threads used for CPU inference: [4]
Loaded model models/whisper-base.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 77.0875
Initialized session in 182.557ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
ERROR: gather index out of bounds
ERROR: Node number 35 (GATHER) failed to invoke.
ERROR: Node number 1264 (WHILE) failed to invoke.
count=1 curr=6372530

Benchmarking failed.

But the funny thing, the small model does work?!

mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/whisper-small.tflite --num_threads=4 --warmup_runs=1 --num_runs=5 --report_peak_memory_footprint=true
STARTING!
Log parameter values verbosely: [0]
Min num runs: [5]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/whisper-small.tflite]
#threads used for CPU inference: [4]
Loaded model models/whisper-small.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 387.698
Initialized session in 152.364ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=1 curr=21885361

Running benchmark for at least 5 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=5 first=20743640 curr=20800234 min=20726743 max=20882339 avg=2.07767e+07 std=59027

Inference timings in us: Init: 152364, First inference: 21885361, Warmup (avg): 2.18854e+07, Inference (avg): 2.07767e+07
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=33.9414 overall=1207.91
Overall peak memory footprint (MB) via periodic monitoring: 1211.55
Memory status at the end of exeution:
- VmRSS              : 752 MB
+ RssAnnon           : 376 MB
+ RssFile + RssShmem : 376 MB

Look at that AMAZING low memory usage for the small model !!!!

@nyadla-sys Well done / Great work.

However...

mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper-small.tflite -t 4 -l en -r 2
Importing tflite_runtime
Importing numpy
Importing whisper
Loading tflite model models/whisper-small.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
th Wiv array Year, high Lend thisyear or D Mic, highendic or D M thisyear.

Inference took 36.53s for 11.00s audio file.

@nyadla-sys (Contributor) commented:

@j1nx I just uploaded a new whisper-base.tflite model.
Could you please try this model on the RPi4 and let me know if you still see the crash issue?

@j1nx (Author) commented Jan 28, 2023

> @j1nx I just uploaded a new whisper-base.tflite model. Could you please try this model on the RPi4 and let me know if you still see the crash issue?

All good now:

mycroft@OpenVoiceOS-e3830c:~/whisper $ ./benchmark_model --graph=models/whisper-base.tflite --num_threads=4 --warmup_runs=1 --num_runs=5 --report_peak_memory_footprint=true
STARTING!
Log parameter values verbosely: [0]
Min num runs: [5]
Num threads: [4]
Min warmup runs: [1]
Report the peak memory footprint: [1]
Graph: [models/whisper-base.tflite]
#threads used for CPU inference: [4]
Loaded model models/whisper-base.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 125.575
Initialized session in 54.729ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=1 curr=10344233

Running benchmark for at least 5 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=5 first=9610296 curr=9466280 min=9466280 max=9654249 avg=9.57554e+06 std=64674

Inference timings in us: Init: 54729, First inference: 10344233, Warmup (avg): 1.03442e+07, Inference (avg): 9.57554e+06
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=15.875 overall=497.102
Overall peak memory footprint (MB) via periodic monitoring: 502.785
Memory status at the end of exeution:
- VmRSS              : 333 MB
+ RssAnnon           : 207 MB
+ RssFile + RssShmem : 126 MB

However, just as with the small model, the WER is terrible.

mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper-base.tflite -t 4 -l en -r 2
Importing tflite_runtime
Importing numpy
Importing whisper
Loading tflite model models/whisper-base.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
th Wiv array Year, high Lend thisyear or D Mic, highendic or D M thisyear.

Inference took 11.85s for 11.00s audio file.

I suspect this has something to do with the language selection somehow.

Is it possible for you to convert and upload the ".en" variants of the small and base models as well?

@j1nx (Author) commented Jan 28, 2023

Just confirmed it:

mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper-tiny.tflite -t 4 -l en -r 2
Importing tflite_runtime
Importing numpy
Importing whisper
Loading tflite model models/whisper-tiny.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
th Wiv array Year high Lend thisyear or D Mic highendic or D M thisyear

Inference took 5.92s for 11.00s audio file.
mycroft@OpenVoiceOS-e3830c:~/whisper $ python3 test.py -f samples/ -m models/whisper-tiny.en.tflite -t 4 -l en -r 2
Importing tflite_runtime
Importing numpy
Importing whisper
Loading tflite model models/whisper-tiny.en.tflite ...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

Loading audio file: samples/jfk.wav
Samplerate: 16000, length: 11.0s
Calculating mel spectrogram...
Invoking interpreter ...
Preparing output data ...
Converting tokens ...
 And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

Inference took 5.13s for 11.00s audio file.

It could also be the Python script, perhaps. @StuartIanNaylor @fquirin perhaps you can have a look at it as well.

@j1nx (Author) commented Jan 28, 2023

This is interesting: I plotted peak memory usage against inference time in seconds.

[plot: peak memory usage vs. inference time]

Both tiny and tiny.en are the smallest, followed by base and small.

@StuartIanNaylor commented Jan 28, 2023

 python test.py --model=../openai-whisper/models/whisper-tiny.tflite --folder=../openai-whisper/samples/ --threads=4

jfk.wav
th Wiv array Year high Lend thisyear or D Mic, highendic or D M thisyear.

Inference took 1.25s
python test.py --model=../openai-whisper/models/whisper-tiny.en.tflite --folder=../openai-whisper/samples/ --threads=4

jfk.wav
 And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

Inference took 1.13s

Slightly slower; it was 1.07s before?

@j1nx (Author) commented Jan 28, 2023

Indeed, but the WER of the non-EN fixed model is... well, 5%, so I guess something goes wrong there.

I will have a look at the C++ binary in the morning to see how it performs on all models.

I will also feed some more different WAV files to all models to gather some more info.

@StuartIanNaylor commented:

I presume the initial decoder tokens are hardcoded for a translation model/language or something, or just missing; the start tokens are control tokens:

decoder_input_ids = torch.tensor([50258, 50266, 50358, 50363]) #<|startoftranscript|><|ja|><|translate|><|notimestamps|>

decoder_input_ids = torch.tensor([50258, 50259, 50359, 50363]) #<|startoftranscript|><|en|><|transcribe|><|notimestamps|>
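(Those ids can be built with the whisper tokenizer rather than hardcoded, which also makes the en-only vs multilingual difference explicit. A sketch, assuming a recent openai-whisper where the tokenizer exposes sot_sequence_including_notimestamps:)

import whisper

tok = whisper.tokenizer.get_tokenizer(True, language="en", task="transcribe")
decoder_input_ids = list(tok.sot_sequence_including_notimestamps)
print(decoder_input_ids)  # [50258, 50259, 50359, 50363] for <|startoftranscript|><|en|><|transcribe|><|notimestamps|>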

@j1nx (Author) commented Jan 29, 2023

Here is the output of the C++ binary with the single vocab.bin:

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-tiny.tflite samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 6 seconds

[_extra_token_50258][_extra_token_50259][_extra_token_50359][_BEG_]th Wiv array Year high Lend thisyear or D Mic, highendic or D M thisyear.[_SOT_]

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-tiny.en.tflite samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 5 seconds

[_SOT_][_NOT_] And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.

And the same multilingual model, but this time with the multilingual vocab.bin:

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-tiny.tflite samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 6 seconds

[_extra_token_50258][_extra_token_50259][_extra_token_50359][_BEG_] And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.[_SOT_]

And to complete the whole test package:

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-tiny.tflite samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 6 seconds

[_extra_token_50258][_extra_token_50259][_extra_token_50359][_BEG_] And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.[_SOT_]

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-base.tflite samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 12 seconds

[_extra_token_50258][_extra_token_50259][_extra_token_50359][_BEG_] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.[_SOT_]

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-small.tflite samples/jfk.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 37 seconds

[_extra_token_50258][_extra_token_50259][_extra_token_50359][_BEG_] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.[_SOT_]

Next up: some other WAV files run through the Python inference with the tiny.en model, to get some insight into the WER.

@j1nx (Author) commented Jan 29, 2023

This might also be of interest to you, @nyadla-sys: the base model does translation to English, whereas the tiny and small models just returned output in the detected language. (Note the base run emitted [_extra_token_50358], the <|translate|> control token, where tiny and small emitted [_extra_token_50359], <|transcribe|>.)

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-tiny.tflite de_speech_thorsten_sample03_8s.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 7 seconds

[_extra_token_50258][_extra_token_50261][_extra_token_50359][_BEG_] Für mich sind alle Menschen gleich unabhängig von Geschlecht, sexuelle Orientierung, Religion, Hautfarbe oder Geo-Kordinaten der Geburt.[_SOT_]

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-base.tflite de_speech_thorsten_sample03_8s.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 12 seconds

[_extra_token_50258][_extra_token_50261][_extra_token_50358][_BEG_] For me, all people are equally independent of gender, sex, orientation, religion, hate, or gender coordinates of birth.[_SOT_]

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-small.tflite de_speech_thorsten_sample03_8s.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 43 seconds

[_extra_token_50258][_extra_token_50261][_extra_token_50359][_BEG_] Für mich sind alle Menschen gleich, unabhängig von Geschlecht, sexueller Orientierung, Religion, Hautfarbe oder Geo-Koordinaten der Geburt.[_SOT_]

@nyadla-sys

This also might be of interest to you @nyadla-sys: the base model translated to English, whereas the tiny and small models just transcribed in the detected language.

Thanks for the information, I will look into it.

@fquirin

fquirin commented Jan 30, 2023

@StuartIanNaylor it looks like the tflite_runtime issue was resolved? Did you understand what the problem was? Should we use this build script as default? 🙂

@StuartIanNaylor

StuartIanNaylor commented Jan 30, 2023

@StuartIanNaylor it looks like the tflite_runtime issue was resolved? Did you understand what the problem was? Should we use this build script as default? 🙂

Yep, the bazel native build on the RK3588 is almost 50% faster. On a Pi, too, if you are compiling on the metal you should select native; I haven't tried, but it looks like the other targets are for cross-compiling.
TensorFlow does this thing where it names new features 'experimental', but many of them can be years old and the current de facto way. So I don't know if it's just optimisation; I have been scratching my head over the minimal build, as that could be the same. The API is complex and it's not always clear what is the latest and greatest, but I think some of the so-called 'experimental' methods also end up in the bazel build. I am thinking the cmake script is an older build method and they have only been updating the bazel one?
Maybe we should build tensorflow as a dynamic lib via bazel and link minimal against it, rather than a static embedded compile where we don't know what secret sauce the bazel build adds. So use the Python bazel-built pip package; I can hack C/C++, but I am nowhere near building that up from scratch. Well, maybe I could, but it would be painful and very hacky.
Also, I think tensorflow has got better with threads, as I am seeing an improvement with x4 threads, where I remember specifically with Sanebow's Pi-DTLN that x2/x4 threads made no difference at all. I only have x4 big cores; see the sketch below.
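A minimal sketch of where that thread count goes, assuming the tflite_runtime Python API (its Interpreter constructor takes a num_threads argument) and the model path used throughout this thread:

# Pass the thread count straight to the TFLite interpreter; on big.LITTLE
# SoCs, matching it to the number of big cores tends to be the sweet spot.
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="models/whisper.tflite", num_threads=4)
interpreter.allocate_tensors()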

I did send you a message about using bazel with an @fquirin mention; I forget where, but it should be in your 'mentions'.

@fquirin

fquirin commented Feb 13, 2023

Hi @StuartIanNaylor, do you have a tflite_runtime wheel file built with the bazel script for download maybe? My Pi seems to be too weak to build it, and I don't understand how the cross-compile is supposed to work, because the examples are broken 😕 (the referenced Docker files don't exist).
I'd require the Python 3.9 aarch64 build 🙃.

[Edit] I did manage to compile it finally! I had to uninstall the regular tensorflow and tflite packages to use it 🤔 (some conflict, it seems), but it is indeed even faster than "fat" tflite now 😀

@StuartIanNaylor

Good, as my native compile on Armv8.2 probably would not work on a Pi.

I am not really sure what the bazel build does differently, but yeah, it is noticeably faster than the cmake compile, which maybe hasn't been updated and creates a legacy build.
I am thinking it's the same for the minimal build, and maybe linking to a bazel-built lib rather than a static lib might also be faster, due to whatever bazel does in the build as opposed to running cmake.

You shouldn't need Docker though, even though you got it working: you just specify one of the target names from https://www.tensorflow.org/lite/guide/build_cmake_pip#available_target_names, armhf or aarch64, and it will go off and cross-compile, or native to build for the host even if the host is one of those architectures; see the example below.
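For reference, the invocation documented in that guide looks roughly like this (script path and target name as given there; verify against your TensorFlow checkout):

PYTHON=python3 tensorflow/lite/tools/pip_package/build_pip_package_with_cmake.sh aarch64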

@fquirin

fquirin commented Feb 16, 2023

Good, as my native compile on Armv8.2 probably would not work on a Pi.

Actually, could you try my build: download (if you have the same Python version)? I'd like to know if they are compatible and perform the same :-).

@StuartIanNaylor

Ran it in a miniconda env, as I am on Ubuntu with Python 3.10, but yeah, it runs OK.
conda create -n tlite-test python=3.9
Totally forgot where we were at, but yeah, it runs.

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
gb1.wav
 My fellow Americans, this day has brought terrible news and great sadness to our country. At 9 o clock this morning, mission control in Houston lost contact with our space shuttle Columbia. A short time later, debris was seen falling from the skies above Texas. The Columbia's lost. There are no survivors.

Inference took 2.1s
mm0.wav
 This is the micro machine man presenting the most miniature motorcade of micro machine. Each one has dramatic details for a picture of precision paint jobs. Plus incredible micro machine pocket place that's physical police station fire station restaurant service station and more. Perfect pocket portable to take any place. And there are many miniature places to play with. Each one comes with its own special edition micro machine vehicle and fantastic features that miraculously move. Raise the bolt. Lift at the airport. Marine a man. The gun turret at the army base. Clean your car at the car. Raise the toll bridge. And these place that's fitted together to form a micro machine world. Micro machine pocket place that's so tremendously tiny so perfectly precise. So doesn't we detailed you on a pocket them all micro machines and micro machine pocket place that sold separately from Glube. The smaller they are, the better they are.

Inference took 3.72s
hp0.wav
 Henry F. Phillips from Wikipedia, the free encyclopedia at en.wicopedia.org.

Inference took 1.5s
gb0.wav
 Good morning. This Tuesday is Election Day. After months of spirited debate in vigorous campaigning, the time has come for Americans to make important decisions about our nation's future and encourage all Americans to go to the polls and vote. Election season brings out the spirit of competition between our political parties. And that competition is an essential part of a healthy democracy. But as the campaigns come to a close, Republicans, Democrats, and independents can find common ground on at least one point. Our system of

Inference took 2.52s

@fquirin

fquirin commented Feb 17, 2023

Nice 😎, thanks for testing 👍
