I noticed token-level timestamps are supported in #1485, but I'm not clear on how to use them. I tried DTW and hit some errors:

```
➜  whisper.cpp git:(master) ./main -dtw base.en -m models/ggml-base.en.bin -f samples/jfk.wav
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 1
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load: Metal total size = 147.37 MB
whisper_model_load: model size    = 147.37 MB
whisper_backend_init_gpu: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Pro
ggml_metal_init: picking default device: Apple M1 Pro
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name:   Apple M1 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 11453.25 MB
whisper_backend_init: using BLAS backend
whisper_mel_init: n_len = 6000, n_len_org = 6000, n_mel = 80
whisper_init_state: kv self size  =   18.87 MB
whisper_init_state: kv cross size =   18.87 MB
whisper_init_state: kv pad  size  =    3.15 MB
whisper_init_state: alignment heads masks size = 160 B
whisper_init_state: compute buffer (conv)   =   16.26 MB
whisper_init_state: compute buffer (encode) =  135.01 MB
whisper_init_state: compute buffer (cross)  =    4.65 MB
whisper_init_state: compute buffer (decode) =  175.55 MB

system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

whisper_mel_init: n_len = 4100, n_len_org = 1099, n_mel = 80
[00:00:00.000 --> 00:00:11.000]   And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
WHISPER_ASSERT: src/whisper.cpp:7224: nth == 1
WHISPER_ASSERT: src/whisper.cpp:7224: nth == 1
[1]    93332 abort      ./main -dtw base.en -m models/ggml-base.en.bin -f samples/jfk.wav
```
Could someone help to test with DTW-based token level timestamps? Thanks!
See #2271 (comment)
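For anyone landing here, a minimal sketch of enabling DTW token-level timestamps through the C API, based on the `whisper.h` interface introduced in #1485 (field and enum names may differ across versions, and the audio-loading step is elided):

```c
// Sketch: enable DTW token-level timestamps in whisper.cpp
// Assumes the context-params API added in PR #1485; verify names
// against the whisper.h shipped with your checkout.
#include "whisper.h"

int main(void) {
    struct whisper_context_params cparams = whisper_context_default_params();
    cparams.dtw_token_timestamps = true;
    // The alignment-heads preset must match the model being loaded
    // (here: base.en), otherwise the DTW head masks are wrong.
    cparams.dtw_aheads_preset = WHISPER_AHEADS_BASE_EN;

    struct whisper_context * ctx =
        whisper_init_from_file_with_params("models/ggml-base.en.bin", cparams);
    if (ctx == NULL) {
        return 1;
    }

    // ... load 16 kHz mono float PCM into `pcm`/`n_samples`, then run:
    // whisper_full(ctx, whisper_full_default_params(WHISPER_SAMPLING_GREEDY),
    //              pcm, n_samples);
    //
    // After whisper_full(), each token's DTW timestamp is exposed as
    // whisper_token_data.t_dtw (centiseconds; -1 when unavailable) via
    // whisper_full_get_token_data(ctx, i_segment, i_token).

    whisper_free(ctx);
    return 0;
}
```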
Thanks for your reply! It really helped me!