theblackcat102 / edgedict

288.0 27.0 44.0 5.67 MB

Working online speech recognition based on RNN Transducer. (Trained models are available under Releases.)

License: Other

Python 98.48% Shell 0.40% Cython 1.12%

speech-recognition rnn-transducer speech-to-text asr online-speech-recognition openvino speech

edgedict's Introduction

Programming is the world of imagination

~~65% research, 35% software dev~~

Research Scientist at Appier

Improving LLM performance as an agent in the real world
LLM internals

Currently working on these side project topics

Data mining
- url domain-region mapping
- reranking search engine using neural nets
local assistant (LLM) for handling my daily info aggregation and search

My huggingface profile

edgedict's Issues

zh training flagfile

Hi,
I got the zh_70_medium pretrained model from v1.
Could I ask which training flagfile and what training data were used?
Thank you very much!

absl flagfile adding into args.py

Hi, I want to load the flagfile from inside the Python file so that I don't have to pass command-line arguments to it. Any idea how to do this?
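
A minimal sketch of one way to do this, assuming the flags are defined with absl as in the repo's scripts: calling `FLAGS` on a hand-built argument list parses it like a command line, so `--flagfile=...` can be injected from Python. The two flag definitions mirror the ones quoted from `export_onnx.py` in another issue on this page; the helper `parse_with_flagfile` and the temporary flagfile are made up for illustration.

```python
import os
import tempfile

from absl import flags

FLAGS = flags.FLAGS
# Flag definitions copied from the export_onnx.py excerpt quoted on this page.
flags.DEFINE_string('model_name', 'last.pt', help='checkpoint name')
flags.DEFINE_integer('step_n_frame', 2, help='input frame(stacked)')


def parse_with_flagfile(flagfile_path, extra_args=()):
    """Hypothetical helper: parse flags from a file instead of sys.argv."""
    # The first element plays the role of the program name and is ignored.
    argv = ['prog', '--flagfile=' + flagfile_path, *extra_args]
    return FLAGS(argv)


# Demo with a throwaway flagfile (one --name=value entry per line).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'flagfile.txt')
    with open(path, 'w') as f:
        f.write('--model_name=english_43_medium.pt\n--step_n_frame=10\n')
    parse_with_flagfile(path)

print(FLAGS.model_name, FLAGS.step_n_frame)
```

If the script is started through `app.run`, passing `argv=[sys.argv[0], '--flagfile=...']` to `app.run(main, argv=...)` should have the same effect.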

time performance details?

Hi, could you please give some details about how you measured the average times for RNN-T? For example, what is each "Avg" averaged over?
Thanks

Avg Encoding Time | Avg Decoding Time | Avg Joint Time
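
One plausible reading of these columns, not confirmed by the maintainer, is the mean wall-clock time per call of each stage. The sketch below shows how such numbers could be collected; `StageTimer` and the three `fake_*` functions are placeholders invented here, not code from the repo.

```python
import time
from collections import defaultdict


class StageTimer:
    """Accumulates wall-clock time per named stage and reports the mean."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def time_call(self, stage, fn, *args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        self.totals[stage] += time.perf_counter() - start
        self.counts[stage] += 1
        return result

    def averages(self):
        # Mean seconds per call for every stage seen so far.
        return {s: self.totals[s] / self.counts[s] for s in self.totals}


# Placeholder stages standing in for encoder, prediction network and joint.
def fake_encoder(frame):
    return [x * 2 for x in frame]


def fake_decoder(token):
    return [float(token)] * 4


def fake_joint(h_enc, h_dec):
    return sum(h_enc) + sum(h_dec)


timer = StageTimer()
for _ in range(100):
    h = timer.time_call('encoding', fake_encoder, [0.1] * 80)
    g = timer.time_call('decoding', fake_decoder, 0)
    timer.time_call('joint', fake_joint, h, g)

avg = timer.averages()
print(avg)
```

If the stages run on a GPU, timings like these are only meaningful when the device is synchronized before each clock read (for example with `torch.cuda.synchronize()`).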

Could you please update the README.md to include a tutorial on using microphone for online data?

As far as I understand, the project works with YouTube videos. It would also be good to have a small recipe for using a microphone for real-time transcription. Microphone input is the first thing that comes to my mind when I think of 'streaming/online' ASR. Thank you!
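
Until such a recipe exists, the general shape of a streaming loop can be sketched as below. This is only an illustration under stated assumptions: the audio is simulated with a list of samples at an assumed 16 kHz, `fake_transcribe` stands in for whatever per-chunk model call the project would expose, and real microphone capture would need an audio library that is not shown here.

```python
from typing import Callable, Iterable, Iterator, Sequence


def iter_chunks(samples: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-size chunks; the last one may be shorter."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def stream_transcribe(chunks: Iterable[Sequence[float]],
                      transcribe_chunk: Callable[[Sequence[float]], str]) -> str:
    """Feed chunks to a recognizer one at a time and join the partial texts."""
    pieces = []
    for chunk in chunks:
        text = transcribe_chunk(chunk)
        if text:
            pieces.append(text)
    return ' '.join(pieces)


def fake_transcribe(chunk: Sequence[float]) -> str:
    # Hypothetical stand-in for a model call; it just reports the chunk size.
    return f"<{len(chunk)} samples>"


audio = [0.0] * 20000  # 1.25 s of silence at the assumed 16 kHz rate
result = stream_transcribe(iter_chunks(audio, 8000), fake_transcribe)
print(result)
```

With a real microphone, `iter_chunks` would be replaced by a generator that blocks until the next buffer of samples arrives.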

I believe there is something wrong with the model

I have my own RNN-T project and I borrowed heavily from your model. I also can only reach around 60% WER, and when inspecting the outputs, the model seems to be outputting way too many blank tokens. I can't even get it to overfit a few examples: while the loss goes to nearly 0, the output transcriptions are mostly blanks, which cuts the transcription off and results in a garbage WER. I am currently looking into what is wrong, perhaps something in the greedy_decode function. Please let me know if you have found anything.
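
For comparison while debugging, here is a minimal sketch of the standard greedy RNN-T decoding loop. It is not the repo's `greedy_decode`: `toy_joint` is an invented stand-in for the joint network, and the prediction-network state is reduced to the last emitted token. The key property is that a blank advances to the next encoder frame, while a non-blank is emitted and the same frame is queried again.

```python
BLANK = 0


def toy_joint(frame, last_token, vocab_size=4):
    """Invented joint: favors `frame` as the next token unless it was just emitted."""
    scores = [0.0] * vocab_size
    if frame != BLANK and frame != last_token:
        scores[frame] = 1.0
    else:
        scores[BLANK] = 1.0
    return scores


def greedy_decode(frames, joint, max_symbols_per_frame=3):
    """Return (emitted tokens, number of blanks chosen)."""
    tokens = []
    blanks = 0
    last_token = BLANK  # stands in for the prediction-network state
    for frame in frames:
        for _ in range(max_symbols_per_frame):
            scores = joint(frame, last_token)
            best = max(range(len(scores)), key=scores.__getitem__)
            if best == BLANK:
                blanks += 1
                break  # blank: move on to the next encoder frame
            tokens.append(best)  # non-blank: emit and stay on this frame
            last_token = best
    return tokens, blanks


tokens, blanks = greedy_decode([1, 0, 2, 2, 0], toy_joint)
print(tokens, blanks)
```

Counting blanks alongside the emitted tokens, as done here, is one cheap way to check whether decoding or the model itself is producing the excess blanks.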

Online decoding for microphone input

Hi,

Great work! I was wondering if there is a way to stream microphone input and decode it live?
Adding such functionality would be very helpful.

Best

CPU possible

Do you know if it is possible to run the code on CPU? Thanks!

Missing parameter definition in "stream.py" for edgedict-0.1

I tried to use this command:
python stream.py --flagfile ./flagfiles/E6D2_LARGE_Batch.txt
--name rnnt-m-bpe
--model_name english_43_medium.pt
--path 3729-6852-0035.flac
It reports that these parameters are not recognized.
Can you provide the correct stream.py?

Training Error

I am running train.py and get the error below. I dug in and found that the input to the norm layer in models.py does not have the correct dimensions. There are places where the input dimensions are swapped. I tried fixing it, but then other parameters became wrong.
Can you suggest a fix? I have installed everything as described in README.md. I guess the code in models.py needs to be fixed. Please let me know.

Traceback (most recent call last):
  File "/content/edgedict/train.py", line 385, in <module>
    app.run(main)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/content/edgedict/train.py", line 381, in main
    trainer.train(start_step=step)
  File "/content/edgedict/train.py", line 177, in train
    val_loss, wer, pred_seqs, true_seqs = self.evaluate()
  File "/content/edgedict/train.py", line 282, in evaluate
    loss, wer, pred_seq, true_seq = self.evaluate_step(batch)
  File "/content/edgedict/train.py", line 309, in evaluate_step
    loss = self.model(xs, ys, xlen, ylen)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/edgedict/rnnt/models.py", line 232, in forward
    h_enc, _ = self.encoder(xs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/edgedict/rnnt/models.py", line 132, in forward
    xs = self.norm(xs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/normalization.py", line 153, in forward
    input, self.normalized_shape, self.weight, self.bias, self.eps)
  File "/usr/local/lib/python3.7/dist-packages/apex/amp/wrap.py", line 28, in wrapper
    return orig_fn(*new_args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 1696, in layer_norm
    torch.backends.cudnn.enabled)
RuntimeError: Given normalized_shape=[128], expected input with shape [*, 128], but got input of size[2, 128, 126]

the export_onnx.py error with input parameter

Maybe you uploaded the wrong script?
I failed to convert the .pt checkpoint to ONNX using export_onnx.py.

In the script, two input parameters are defined:
flags.DEFINE_string('model_name', "last.pt", help='checkpoint name')
flags.DEFINE_integer('step_n_frame', 2, help='input frame(stacked)')

However, the command uses three parameters:
python export_onnx.py \
    --flagfile ./logs/E6D2-smallbatch/flagfile.txt \
    --step 15000 \
    --step_n_frame 10

Which one is the right script? Please update.

Regarding Mixed precision RNNTLoss

Hi there,

Thanks for sharing your work.

At the bottom of your README, you say you have already "Modify wraprnnt-pytorch to compatible with apex mixed precision".

But I noticed that warprnnt is still compiled from Hawkaaron's original repo. So is that source compatible with mixed precision to begin with? If you made the changes, where are they?

Thanks,
Kevin

RNNT loss is constantly zero

I am trying to train the model, but the loss is constantly 0. I found that the loss output from the PyTorch binding of WarpRNNT is a zero tensor. I am getting the error 'src/binding.cpp:151: unsupported data type'. Is this the reason why WarpRNNT is not able to calculate the loss? (I changed the datatype from float16 to float32 because the sum() operation was not possible with float16.) Any pointers are appreciated.

stream.py with live recording not working

I have tried to use stream.py with live recording, but it's not working. When I pass an audio path, it converts the audio to text, but it does not work live.

FileNotFoundError: No such file or directory: ' im_model.pt '.

Hello sir,
I used this command:
python stream.py --flagfile ./flagfiles/E6D2_LARGE_Batch.txt
--name rnnt-m-bpe
--model_name english_43_medium.pt
--path 3729-6852-0035.flac
but I get an error: FileNotFoundError: No such file or directory: ' im_model.pt '.
Can you give me some advice, sir? Looking forward to your reply.
