It is hoped that in future versions, the reasoning framework of CTranslate2 can be supported. :) #2

YLQY · 2023-10-20T02:08:17Z

Hello, I am very happy to finally wait for the demo of UnitySentis. This surprised me. I am a speech recognition algorithm engineer. We often encounter such problems, the inference speed of the model, that is, the problem of real-time performance. Nowadays, very popular large models, such as ChatGPT and whsiper-large-v2. I want to use these models in UnitySentis. We have also tested the onnx model inference speed of Whisper in the Linux environment. Unfortunately, this inference speed is still unbearable for people. So we have two methods. The first is to reduce the size of the model, such as using whisper-small or whisper-base models, but there will still be some loss in effect. We are very happy that we have found an inference library named CTranslate2 (https://github.com/OpenNMT/CTranslate2). We only need to export the model to the format required by ctranslate2, and then with the acceleration of CTranslate2, whisper-large- The real-time rate of v2 has been significantly improved, the RTF value can be reduced to 0.03 (A30 machine), and the running speed under the CPU has also been significantly improved. Moreover, CTranslate2 also supports most of the current mainstream Transformer models, so it would be great if UnitySentis could refer to or integrate CTranslate2 in future versions. :)
I wish UnitySentis will develop better and better :)

Here are some links you may want to use
https://github.com/OpenNMT/CTranslate2
https://opennmt.net/CTranslate2
https://github.com/guillaumekln/faster-whisper/tree/master

LiutaurasVysniauskasUnity · 2023-12-05T14:20:15Z

Thank you for the suggestion! This is known internally as Task 257

elephantpanda · 2024-01-28T08:54:06Z

Hi thanks we have Whisper-Tiny model example now on Hugging Face. Please try it out and give us your feedback. Thanks for the links we'll check them out.

YLQY · 2024-02-23T07:08:12Z

@pauldog Hi, thanks for the Whisper-Tiny model example on Hugging Face.
I tried the following link here
https://huggingface.co/unity/sentis-whisper-tiny
Then I want to ask some questions.

In the file https://huggingface.co/unity/sentis-whisper-tiny/blob/main/RunWhisper.cs, line 215, an error is reported in my Visual Studio. I want to know the encoding of this file.

Then I tried the following:

There will be problems with the decoded results in the sample file,When decoding the given sample audio, [Music] will appear
This decoding result is not the result of the sample audio
https://huggingface.co/unity/sentis-whisper-tiny/blob/main/answering-machine16kHz.wav

Then I tried to use Chinese transcription and modified the Chinese special token id, and some undecodable results appeared.

I feel like these exceptions are related to the encoding of RunWhisper.cs, line 215

I want to know how to obtain the sentis files (AudioDecoder_Tiny.sentis, AudioEncoder_Tiny.sentis) in the sample:)

I tried fine-tuning whisper-tiny model, as well as small, large-v2, etc. But I then used the optimization-cli in hugging face to export it as an onnx model, and then used onnx to unity sentis. I tried the tiny model and replaced the AudioDecoder_Tiny.sentis and AudioEncoder_Tiny.sentis in the sample.
This is the result after I use
optimum-cli export onnx --model ./whisper-tiny --task automatic-speech-recognition whisper-tiny-onnx/
This is my original whisper-tiny model
https://huggingface.co/openai/whisper-tiny

But when I run it I get the following results

So I can’t reproduce my fine-tuned model through examples.

elephantpanda · 2024-02-23T23:34:04Z

Hi, when I download the file and put it in Wordpad I get:

bool IsWhiteSpace(char c) { return !(('!' <= c && c <= '~') || ('¡' <= c && c <= '¬') || ('®' <= c && c <= 'ÿ')); }
If you copy the text from the webpage it doesn't work. We could look into fixing that. I think this should work for all countries. Otherwise you can compare to the standard extended ASCII set of codes from 0...255: https://www.ascii-code.com/

For your second question. You could change the names of the inputs on line 157. To the names of the inputs in your ONNX file. It may just have been exported with different names.

elephantpanda · 2024-02-23T23:41:01Z

Hi, when I download the CS file and open it in WordPad I get:
bool IsWhiteSpace(char c) { return !(('!' <= c && c <= '~') || ('¡' <= c && c <= '¬') || ('®' <= c && c <= 'ÿ')); }
It doesn't work if you copy it off the webpage. It should work for all languages. If not you can compare it to this table all codes are from 0..255. (0x0000...0x00FF) This is probably not the most elegant bit of code and I'm open to suggestions to simplify it.

For your second question you can change the names of the inputs to the inputs that you exported your ONNX file with on line 157-162. So you would change these to "encoder_hidden_states" and "input_ids".

Also, note that some languages may work better than others. It depends on the training data that was used.

YLQY · 2024-02-28T07:42:17Z

@pauldog
Hi,Thank you very much for your help:)
I have solved both problems here according to the method you mentioned.
Regarding the problem of garbled characters, I did the following experiments:
For the results output by whisper, the source code contains one token and one token decoding.

I found that in Chinese, there may be two or three tokens forming a Chinese character. Therefore, if the historical whisper results are integrated and decoded, there will be no problem of garbled characters. The following is the code I modified here and the screenshot of the operation.

YLQY · 2024-03-14T15:13:28Z

@pauldog
Hello, can you share how you obtained the LogMelSepctro.sentis model? How did you obtain its original onnx model? I am trying to use C++ to complete the feature extraction process of Whisper, but it cannot run after being packaged into Android. I am still trying to figure out whether it is a problem with the dll file. This may be a bit difficult for me :(

The following are some of my sharings. After experimental verification, they can run on Windows, but cannot run on the Android platform.
https://github.com/YLQY/FBankInterface

I can only generate encoder and decoder models on Optimum.

YLQY · 2024-03-15T07:54:21Z

@pauldog
Hello, I have successfully run the offline whisper-tiny model on the android side. Many thanks to UnitySentis and the examples given.

I will share this demo later

The following is the direction I want to go in the future. According to the current experience results, recognition effects need to be improved. In addition to fine-tuning the whisper model, there are also quantitative methods. It can greatly reduce the size of the model. The following is the size of my tiny model before and after quantization.

(It can be seen that the volume of the model has been significantly reduced.)

I use huggingface/optimum here to quantify the model (here)
Here are the commands I use:

optimum-cli export onnx --model ./whisper-tiny/ --task automatic-speech-recognition whisper-tiny-onnx
optimum-cli onnxruntime quantize --arm64 --onnx_model whisper-tiny-onnx -o qu_whisper-tiny-onnx

But when I imported the quantized model, UnitySentis reported an error. The following is the error message. It seems that some operators are not supported.

I would like to ask, does UnitySentis not support these currently? Or is there something wrong with my converted and quantified model?

If this problem can be solved, then according to our experience, a whisper-small model with at least 244 M parameters can achieve a good offline effect experience.

Here are some links that may be helpful:
https://huggingface.co/docs/optimum/index
https://github.com/huggingface/optimum

elephantpanda · 2024-03-15T08:47:41Z

Hi @YLQY. You converted them correctly. I can tell you that the Sentis team is working on quantization support. The first step in this support will be the next version 1.4.0 coming out in the next couple of weeks.

YLQY · 2024-03-16T13:13:33Z

@pauldog
Hello, this is my demo, but I found another problem, that is, if I test about 4 audios in a row, the app will crash inexplicably. Why is this? Do you have a certain idea? Is the tensor not released?

YLQY · 2024-04-11T13:17:47Z

Hello @pauldog, I am very happy to see that UnitySentis has version 1.4. I can’t wait to try it out and also found the code update in huggingface. The onnx quantization model I converted through the above method still cannot be loaded. The following is the error message, I hope it will be helpful to you.

PaulBirdUnity · 2024-04-12T21:17:18Z

Hi, you are right some aspects of quantization are still not supported such as QuantizeLinear. Appologies. If an you have an un-quantized model, you can quantize the weights. There is an example as one of the samples in the package manager.
Please also feel free to report this issue here: https://discussions.unity.com/c/ai-beta/sentis/10 as this is the main place where we log our issues.
Also you can vote and write a message here: https://unity.com/roadmap/unity-ai/sentis On the box where it says "quantization for performance" you can click on that to request it and send a comment.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

It is hoped that in future versions, the reasoning framework of CTranslate2 can be supported. :) #2

It is hoped that in future versions, the reasoning framework of CTranslate2 can be supported. :) #2

YLQY commented Oct 20, 2023 •

edited

Loading

LiutaurasVysniauskasUnity commented Dec 5, 2023 •

edited

Loading

elephantpanda commented Jan 28, 2024

YLQY commented Feb 23, 2024 •

edited

Loading

elephantpanda commented Feb 23, 2024

elephantpanda commented Feb 23, 2024 •

edited

Loading

YLQY commented Feb 28, 2024 •

edited

Loading

YLQY commented Mar 14, 2024 •

edited

Loading

YLQY commented Mar 15, 2024 •

edited

Loading

elephantpanda commented Mar 15, 2024

YLQY commented Mar 16, 2024 •

edited

Loading

YLQY commented Apr 11, 2024 •

edited

Loading

PaulBirdUnity commented Apr 12, 2024

It is hoped that in future versions, the reasoning framework of CTranslate2 can be supported. :) #2

It is hoped that in future versions, the reasoning framework of CTranslate2 can be supported. :) #2

Comments

YLQY commented Oct 20, 2023 • edited Loading

LiutaurasVysniauskasUnity commented Dec 5, 2023 • edited Loading

elephantpanda commented Jan 28, 2024

YLQY commented Feb 23, 2024 • edited Loading

elephantpanda commented Feb 23, 2024

elephantpanda commented Feb 23, 2024 • edited Loading

YLQY commented Feb 28, 2024 • edited Loading

YLQY commented Mar 14, 2024 • edited Loading

YLQY commented Mar 15, 2024 • edited Loading

elephantpanda commented Mar 15, 2024

YLQY commented Mar 16, 2024 • edited Loading

YLQY commented Apr 11, 2024 • edited Loading

PaulBirdUnity commented Apr 12, 2024

YLQY commented Oct 20, 2023 •

edited

Loading

LiutaurasVysniauskasUnity commented Dec 5, 2023 •

edited

Loading

YLQY commented Feb 23, 2024 •

edited

Loading

elephantpanda commented Feb 23, 2024 •

edited

Loading

YLQY commented Feb 28, 2024 •

edited

Loading

YLQY commented Mar 14, 2024 •

edited

Loading

YLQY commented Mar 15, 2024 •

edited

Loading

YLQY commented Mar 16, 2024 •

edited

Loading

YLQY commented Apr 11, 2024 •

edited

Loading