I am looking to get a time-aligned word-level tran output, but instead of just t

Beam Search Decoding How to Get Beam of Tokens as Output about whisper-timestamped HOT 3 CLOSED

abarcovschi commented on June 10, 2024



Comments (3)

Jeronymous commented on June 10, 2024

Is what I am looking for possible?

Having a set of hypotheses for the words makes sense.
But having timestamps for all the alternatives is not really possible.
Also, the confidence score would be different for each alternative.

And this is more a request for openai-whisper, not whisper-timestamped.


abarcovschi commented on June 10, 2024

I would be happy with having multiple hypotheses for the words and using the timestamps of the top candidate. Also, I don't need confidence scores for the voting algorithm I am implementing.

Do you think you could at least provide as output N hypotheses in full, instead of word-by-word N hypotheses? Or is this also an openai-whisper request? I just don't know how I would then proceed to extract timestamps for each word in each full hypothesis.
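One possible way to reuse the top candidate's timestamps for the other full hypotheses is to align each alternative against the top hypothesis at the word level and copy or interpolate the timings. The sketch below is only an illustration under assumptions, not something whisper-timestamped or openai-whisper provides: the function name `transfer_timestamps` is hypothetical, the top hypothesis is assumed to be a list of word dicts with "text", "start" and "end" keys, and each alternative is assumed to be already split into words. The alignment uses Python's standard difflib rather than any acoustic information, so timings for substituted or inserted words are rough estimates.

```python
from difflib import SequenceMatcher


def transfer_timestamps(top_words, alt_tokens):
    """Give each word of an alternative hypothesis a (start, end) taken from
    the top hypothesis. Hypothetical helper, not part of any library.

    top_words:  list of {"text": str, "start": float, "end": float}
    alt_tokens: list of str (the alternative hypothesis, split into words)
    """
    top_tokens = [w["text"].lower() for w in top_words]
    alt_lower = [t.lower() for t in alt_tokens]
    matcher = SequenceMatcher(a=top_tokens, b=alt_lower, autojunk=False)

    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            # Words that exist only in the top hypothesis: nothing to emit.
            continue
        if tag == "equal":
            # Identical words: copy the timings one to one.
            for i, j in zip(range(i1, i2), range(j1, j2)):
                result.append({"text": alt_tokens[j],
                               "start": top_words[i]["start"],
                               "end": top_words[i]["end"]})
            continue
        if tag == "insert":
            # Extra words in the alternative: zero-length span at the boundary.
            if i1 > 0:
                start = end = top_words[i1 - 1]["end"]
            elif top_words:
                start = end = top_words[0]["start"]
            else:
                start = end = 0.0
        else:  # "replace": share the time span of the replaced top words
            start = top_words[i1]["start"]
            end = top_words[i2 - 1]["end"]
        n = j2 - j1
        step = (end - start) / n
        for k in range(n):
            piece_end = end if k == n - 1 else start + (k + 1) * step
            result.append({"text": alt_tokens[j1 + k],
                           "start": start + k * step,
                           "end": piece_end})
    return result
```

For example, with a top hypothesis "the cat sat" and an alternative "the cap sat", the word "cap" would receive the start and end times of "cat". Words that match exactly keep the original timings, so only the differing regions are approximated.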


abarcovschi commented on June 10, 2024

I managed to modify the openai-whisper transcribe.py and decoding.py files to return multiple hypotheses to whisper-timestamped transcribe.py. From there, I was able to extract word-level time alignments for each hypothesis by modifying some of the code. I can provide more details if anyone needs to extract word-level time alignments for multiple hypotheses.
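For readers wondering how several hypotheses might then feed a confidence-free vote, here is a minimal sketch. It is not the poster's algorithm, which is not described in this thread; the function name `vote_words` and the input format (plain word lists, top hypothesis first) are assumptions made for illustration. Each alternative is aligned to the top hypothesis with difflib, every aligned word casts one vote for its position, and ties fall back to the top hypothesis, so the top candidate's timestamps can be kept for the winning words.

```python
from collections import Counter
from difflib import SequenceMatcher


def vote_words(top_tokens, alternatives):
    """Majority vote per word position of the top hypothesis.
    Hypothetical helper, not part of any library.

    top_tokens:   list of str, the best hypothesis
    alternatives: list of lists of str, the other hypotheses
    """
    # The top hypothesis votes first, so it wins ties.
    votes = [Counter({word: 1}) for word in top_tokens]

    for alt in alternatives:
        matcher = SequenceMatcher(a=top_tokens, b=alt, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                for i in range(i1, i2):
                    votes[i][top_tokens[i]] += 1
            elif tag == "replace" and (i2 - i1) == (j2 - j1):
                # Same-length substitution: pair the words position by position.
                for i, j in zip(range(i1, i2), range(j1, j2)):
                    votes[i][alt[j]] += 1
            # Insertions, deletions and uneven replacements cast no vote here.

    return [counter.most_common(1)[0][0] for counter in votes]
```

Because the output always has the same length as the top hypothesis, each voted word can simply inherit the start and end time of the word it replaces. Handling inserted or deleted words would need a fuller alignment scheme than this sketch provides.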

