Incorrect timestamps #2271
Fixes ggerganov#2271
- Adds consecutive timestamps after the end of the last segment as the new starting timestamp
- Adds these timestamps to the output when "print-special" is enabled
- Fixes fflush usage in live reporting
I was not able to test this with the special "token_timestamps" option.
@thewh1teagle How did you generate the |
@SimpleVictor |
I found another case of wrong timestamps when word timestamps are enabled.
sam_altman.mp4
Open the details and search for transcript.json:
[
{
"start": 0,
"stop": 19,
"text": ""
},
{
"start": 19,
"stop": 34,
"text": " What"
},
{
"start": 34,
"stop": 48,
"text": " do"
},
{
"start": 48,
"stop": 72,
"text": " you"
},
{
"start": 72,
"stop": 112,
"text": " think"
},
{
"start": 112,
"stop": 151,
"text": " about"
},
{
"start": 151,
"stop": 191,
"text": " like"
},
{
"start": 191,
"stop": 216,
"text": " when"
},
{
"start": 216,
"stop": 248,
"text": " Elon"
},
{
"start": 248,
"stop": 272,
"text": " was"
},
{
"start": 272,
"stop": 336,
"text": " causing"
},
{
"start": 336,
"stop": 384,
"text": " calling"
},
{
"start": 384,
"stop": 408,
"text": " for"
},
{
"start": 408,
"stop": 416,
"text": " a"
},
{
"start": 416,
"stop": 456,
"text": " pause"
},
{
"start": 456,
"stop": 474,
"text": " on"
},
{
"start": 474,
"stop": 494,
"text": " AI"
},
{
"start": 764,
"stop": 514,
"text": " He"
},
{
"start": 514,
"stop": 530,
"text": " was"
},
{
"start": 530,
"stop": 587,
"text": " like"
},
{
"start": 587,
"stop": 670,
"text": " starting"
},
{
"start": 670,
"stop": 711,
"text": " then"
},
{
"start": 711,
"stop": 721,
"text": " a"
},
{
"start": 721,
"stop": 762,
"text": " GI"
},
{
"start": 762,
"stop": 815,
"text": " company"
},
{
"start": 815,
"stop": 867,
"text": " while"
},
{
"start": 867,
"stop": 888,
"text": " he"
},
{
"start": 888,
"stop": 919,
"text": " was"
},
{
"start": 919,
"stop": 971,
"text": " doing"
},
{
"start": 971,
"stop": 1018,
"text": " that"
},
{
"start": 1104,
"stop": 1257,
"text": " Yeah,"
},
{
"start": 1257,
"stop": 1272,
"text": " so"
},
{
"start": 1272,
"stop": 1310,
"text": " didn't"
},
{
"start": 1310,
"stop": 1323,
"text": " he"
},
{
"start": 1323,
"stop": 1357,
"text": " start"
},
{
"start": 1357,
"stop": 1367,
"text": " it"
},
{
"start": 1367,
"stop": 1414,
"text": " like"
},
{
"start": 1414,
"stop": 1431,
"text": " after"
},
{
"start": 1431,
"stop": 1446,
"text": " he"
},
{
"start": 1446,
"stop": 1464,
"text": " was"
},
{
"start": 1464,
"stop": 1512,
"text": " calling"
},
{
"start": 1512,
"stop": 1532,
"text": " for"
},
{
"start": 1532,
"stop": 1553,
"text": " the"
},
{
"start": 1553,
"stop": 1605,
"text": " pause."
},
{
"start": 1605,
"stop": 1628,
"text": " I"
},
{
"start": 1694,
"stop": 1658,
"text": " Think"
},
{
"start": 1658,
"stop": 1694,
"text": " before"
},
{
"start": 1694,
"stop": 1712,
"text": " but"
},
{
"start": 1712,
"stop": 1718,
"text": " I"
},
{
"start": 1718,
"stop": 1748,
"text": " don't"
},
{
"start": 1748,
"stop": 1772,
"text": " know"
},
{
"start": 1772,
"stop": 1784,
"text": " in"
},
{
"start": 1784,
"stop": 1803,
"text": " any"
},
{
"start": 1803,
"stop": 1832,
"text": " cases"
},
{
"start": 1832,
"stop": 1850,
"text": " one"
},
{
"start": 1850,
"stop": 1866,
"text": " of"
},
{
"start": 1866,
"stop": 1892,
"text": " those"
},
{
"start": 1892,
"stop": 1910,
"text": " you"
},
{
"start": 1910,
"stop": 1940,
"text": " can't"
},
{
"start": 1940,
"stop": 1964,
"text": " beat"
},
{
"start": 1964,
"stop": 1981,
"text": " him"
},
{
"start": 1981,
"stop": 2006,
"text": " join"
},
{
"start": 2006,
"stop": 2030,
"text": " them"
},
{
"start": 2030,
"stop": 2084,
"text": " things."
},
{
"start": 2084,
"stop": 2108,
"text": " Um,"
},
{
"start": 2108,
"stop": 2126,
"text": " I"
},
{
"start": 2410,
"stop": 2185,
"text": " Think"
},
{
"start": 2185,
"stop": 2220,
"text": " the"
},
{
"start": 2220,
"stop": 2315,
"text": " instinct"
},
{
"start": 2315,
"stop": 2338,
"text": " of"
},
{
"start": 2338,
"stop": 2430,
"text": " saying"
},
{
"start": 2430,
"stop": 2461,
"text": " like"
},
{
"start": 2461,
"stop": 2518,
"text": " we've"
},
{
"start": 2518,
"stop": 2585,
"text": " really"
},
{
"start": 2585,
"stop": 2620,
"text": " got"
},
{
"start": 2620,
"stop": 2643,
"text": " to"
},
{
"start": 2643,
"stop": 2714,
"text": " figure"
},
{
"start": 2714,
"stop": 2756,
"text": " out"
},
{
"start": 2756,
"stop": 2784,
"text": " how"
},
{
"start": 2784,
"stop": 2820,
"text": " to"
},
{
"start": 2840,
"stop": 2872,
"text": " Make"
},
{
"start": 2872,
"stop": 2920,
"text": " this"
},
{
"start": 2920,
"stop": 2970,
"text": " safe"
},
{
"start": 2970,
"stop": 3008,
"text": " and"
},
{
"start": 3008,
"stop": 3058,
"text": " good"
},
{
"start": 3058,
"stop": 3096,
"text": " and"
},
{
"start": 3096,
"stop": 3164,
"text": " like"
},
{
"start": 3164,
"stop": 3222,
"text": " widely"
},
{
"start": 3222,
"stop": 3272,
"text": " good"
},
{
"start": 3272,
"stop": 3306,
"text": " is"
},
{
"start": 3306,
"stop": 3454,
"text": " really"
},
{
"start": 3454,
"stop": 3486,
"text": " important"
},
{
"start": 3486,
"stop": 3524,
"text": " but"
},
{
"start": 3524,
"stop": 3535,
"text": " I"
},
{
"start": 3535,
"stop": 3606,
"text": " think"
},
{
"start": 3816,
"stop": 3845,
"text": " Calling"
},
{
"start": 3845,
"stop": 3977,
"text": " for"
},
{
"start": 3977,
"stop": 4016,
"text": " a"
},
{
"start": 4108,
"stop": 4078,
"text": " Pause"
},
{
"start": 4078,
"stop": 4091,
"text": " is"
},
{
"start": 4091,
"stop": 4133,
"text": " like"
},
{
"start": 4133,
"stop": 4188,
"text": " naive"
},
{
"start": 4188,
"stop": 4209,
"text": " it"
},
{
"start": 4209,
"stop": 4230,
"text": " at"
},
{
"start": 4230,
"stop": 4273,
"text": " best"
},
{
"start": 4273,
"stop": 4337,
"text": " for"
},
{
"start": 4337,
"stop": 4337,
"text": " the"
},
{
"start": 4337,
"stop": 4402,
"text": " latest"
},
{
"start": 4402,
"stop": 4446,
"text": " tech"
},
{
"start": 4446,
"stop": 4543,
"text": " insights"
},
{
"start": 4543,
"stop": 4586,
"text": " visit"
},
{
"start": 4586,
"stop": 4693,
"text": " em"
},
{
"start": 4693,
"stop": 4704,
"text": " 360"
},
{
"start": 4704,
"stop": 4750,
"text": " tech"
},
{
"start": 4750,
"stop": 4800,
"text": " calm"
},
{
"start": 4800,
"stop": 4800,
"text": ""
},
{
"start": 4800,
"stop": 4843,
"text": " visit"
},
{
"start": 4843,
"stop": 5054,
"text": " EM360tech.com."
},
{
"start": 5054,
"stop": 5054,
"text": ""
},
{
"start": 5054,
"stop": 6054,
"text": " [BLANK_AUDIO]"
}
]
Is there a way we can 'tell' whisper the segments instead of letting it segment the audio itself? The diarization is actually pretty simple, and once I find an approach to use it along with whisper.cpp I can add it to whisper.cpp / implement it in Rust. https://github.com/thewh1teagle/ort-diarize/blob/main/main.py |
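The inconsistent entries in the transcript.json dump above (for example " He", with start 764 and stop 514) can be located mechanically rather than by eye. The following is only a minimal sketch: the function name `find_timestamp_anomalies` is made up for illustration, and it assumes the file is a flat list of objects with `start`, `stop` and `text` keys, as in the dump.

```python
def find_timestamp_anomalies(words):
    """Return (index, reason) pairs for entries whose timestamps look inconsistent."""
    anomalies = []
    prev_stop = None
    for i, word in enumerate(words):
        if word["start"] > word["stop"]:
            # The word would end before it begins.
            anomalies.append((i, "start after stop"))
        elif prev_stop is not None and word["start"] < prev_stop:
            # The word would begin before the previous word ended.
            anomalies.append((i, "starts before previous stop"))
        prev_stop = word["stop"]
    return anomalies


# Three consecutive entries copied from the transcript above.
sample = [
    {"start": 474, "stop": 494, "text": " AI"},
    {"start": 764, "stop": 514, "text": " He"},
    {"start": 514, "stop": 530, "text": " was"},
]

for index, reason in find_timestamp_anomalies(sample):
    print(index, repr(sample[index]["text"]), reason)
```

To check a whole file, the list could be loaded with `json.load` from the standard library and passed to the same function.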
What if I want to use it through JNI? That is, what if I want to use it on Android? |
Can you take a look at this issue? It happens with all the models. You can easily reproduce the timestamp issue by running this and listening to the few seconds of audio:
cd $(mktemp -d)
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build .
cmake --build build
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin
wget https://github.com/thewh1teagle/vibe/releases/download/v2.6.3/normal.wav
./build/bin/main -m ggml-tiny.bin -f normal.wav
ffplay normal.wav
Seems like the issue is here: t0 is updated to be the latest t1, but the next start timestamp shouldn't always be the end timestamp of the last segment:
Lines 6210 to 6217 in 31aea56
Maybe it's a limitation of the model |
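To illustrate the behaviour described in this comment, here is a toy sketch of what happens when every segment's start is taken from the previous segment's end. It is not whisper.cpp's code (the referenced lines are not reproduced here); the function name `chain_segment_starts`, the segment data, and the time units are all hypothetical.

```python
def chain_segment_starts(segments):
    """Rebuild start times so that each segment starts where the previous one ended."""
    chained = []
    t0 = 0
    for seg in segments:
        chained.append({"t0": t0, "t1": seg["t1"], "text": seg["text"]})
        t0 = seg["t1"]  # the next segment inherits this end time as its start
    return chained


# Hypothetical segments with a silent gap between them (arbitrary time units).
actual_segments = [
    {"t0": 0, "t1": 400, "text": "first phrase"},
    {"t0": 700, "t1": 1100, "text": "second phrase"},
]

for actual, chained in zip(actual_segments, chain_segment_starts(actual_segments)):
    absorbed = actual["t0"] - chained["t0"]
    print(f'{actual["text"]}: reported start {chained["t0"]}, '
          f'actual start {actual["t0"]}, silence absorbed {absorbed}')
```

In this toy data the second segment is reported as starting at the end of the first one, so the silence between them is folded into the second segment, which matches the symptom reported in this thread.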
When transcribing the following file, the timestamps are incorrect.
As you can see, the start timestamp of the second segment is the same as the end timestamp of the previous one, although there's a gap of a few seconds between them.
never.give.you.up.mp4
transcript.srt
transcript.json
word_timestamps.json