wenet-e2e / wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Home Page: https://wenet-e2e.github.io/wenet/
License: Apache License 2.0
Have you tested the difference between these two learning rate schedules (NoamLR and WarmupLR)? When I use NoamLR to train the CTC/AED joint model, it seems quite hard to train.
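For reference, here is a minimal sketch of the two schedules as commonly defined (treat the exact wenet class names and default values as assumptions): both decay as step^-0.5 after warmup, and the main difference is how the peak value is set, with Noam tying it to d_model and WarmupLR tying it to the configured base learning rate.

def noam_lr(step, base_lr=1.0, d_model=256, warmup_steps=25000):
    # Noam schedule from "Attention Is All You Need"; peak depends on d_model.
    step = max(step, 1)
    return base_lr * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

def warmup_lr(step, base_lr=0.002, warmup_steps=25000):
    # Warmup schedule whose peak equals base_lr at step == warmup_steps.
    step = max(step, 1)
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

for s in (100, 25000, 100000):
    print(s, noam_lr(s), warmup_lr(s))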
I created the conda env according to the following:
conda create -n wenet python=3.8
conda activate wenet
pip install -r requirements.txt
conda install pytorch==1.6.0 cudatoolkit=10.1 torchaudio -c pytorch
cmake is OK, but an error occurs during make; the error is:
/data/home/yezj/github/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::replace(unsigned long, unsigned long, char const*, unsigned long)@GLIBCXX_3.4.21'
/data/home/yezj/github/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `std::_Sp_locker::_Sp_locker(void const*)@GLIBCXX_3.4.21'
/data/home/yezj/github/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(unsigned long, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const@GLIBCXX_3.4.21'
/data/home/yezj/github/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `std::basic_ofstream<char, std::char_traits<char> >::basic_ofstream(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::_Ios_Openmode)@GLIBCXX_3.4.21'
.....
My OS is CentOS 7.5, with gcc 7.3.1 and cmake 3.19.4.
Thank you.
Thank you for making your code public. Is it ready to run now? There are many core dumps when training with DDP. When training on a single GPU, TorchScript also produces errors, such as Unknown type name 'torch.device'. If I skip torch.jit.script, I also get errors. My PyTorch version is 1.7.0.
When I used my own data to train the Conformer network, the following error occurred at the beginning of training. How should I solve it? The batch size has been set very small (4), and my GPU is a 32 GB card (NVIDIA V100).
...
2021-03-15 15:34:05,287 INFO Checkpoint: save to checkpoint exp/conformer/init.pt
2021-03-15 15:34:06,656 INFO Epoch 0 TRAIN info lr 4e-08
2021-03-15 15:34:06,657 INFO using accumulate grad, new batch size is 1 timeslarger than before
2021-03-15 15:34:12,573 DEBUG TRAIN Batch 0/19074 loss 525.971741 loss_att 291.681671 loss_ctc 1072.648438 lr 0.00000004 rank 0
2021-03-15 15:34:36,776 DEBUG TRAIN Batch 100/19074 loss 610.763184 loss_att 431.314453 loss_ctc 1029.476929 lr 0.00000404 rank 0
2021-03-15 15:35:00,639 DEBUG TRAIN Batch 200/19074 loss 29.456654 loss_att 28.791754 loss_ctc 31.008080 lr 0.00000804 rank 0
*** Error in `python': corrupted size vs. prev_size: 0x0000560a10045d70 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777f5)[0x7f9ca30ea7f5]
/lib/x86_64-linux-gnu/libc.so.6(+0x80e0b)[0x7f9ca30f3e0b]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f9ca30f758c]
/home/sine/anaconda3/envs/wenet/lib/python3.8/site-packages/torchaudio/_torchaudio.so(_ZN5torch5audio18build_flow_effectsERKSsN2at6TensorEbP16sox_signalinfo_tP18sox_encodinginfo_tPKcSt6vectorINS0_9SoxEffectESaISC_EEi+0xfec)[0x7f9c3f46317c]
/home/sine/anaconda3/envs/wenet/lib/python3.8/site-packages/torchaudio/_torchaudio.so(+0x86dc3)[0x7f9c3f481dc3]
/home/sine/anaconda3/envs/wenet/lib/python3.8/site-packages/torchaudio/_torchaudio.so(+0x7bbfa)[0x7f9c3f476bfa]
python(PyCFunction_Call+0x58)[0x560a027b62d8]
python(_PyObject_MakeTpCall+0x23c)[0x560a027a5edc]
python(_PyEval_EvalFrameDefault+0x45a9)[0x560a02831879]
python(_PyEval_EvalCodeWithName+0x300)[0x560a027fb760]
python(_PyFunction_Vectorcall+0x1e3)[0x560a027fc593]
python(+0x10399c)[0x560a0276599c]
python(_PyFunction_Vectorcall+0x10b)[0x560a027fc4bb]
python(+0x10425f)[0x560a0276625f]
python(_PyEval_EvalCodeWithName+0x8b1)[0x560a027fbd11]
python(_PyFunction_Vectorcall+0x1e3)[0x560a027fc593]
python(+0x10425f)[0x560a0276625f]
python(PyEval_EvalCodeWithName+0x8b1)[0x560a027fbd11]
...
...
...
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Traceback (most recent call last):
File "wenet/bin/train.py", line 211, in
executor.train(model, optimizer, scheduler, train_data_loader, device,
File "/home/sine/wenet/wenet-main/examples/accent_reg/s0/wenet/utils/executor.py", line 63, in train
optimizer.zero_grad()
File "/home/sine/anaconda3/envs/wenet/lib/python3.8/site-packages/torch/optim/optimizer.py", line 171, in zero_grad
p.grad.detach()
File "/home/sine/anaconda3/envs/wenet/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 30256) is killed by signal: Aborted.
In x = x * self.xscale, self.xscale = 16 at this point. Is there any paper that explains this? Thanks.
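This is the embedding scaling from "Attention Is All You Need" (Section 3.4), where the embedding output is multiplied by sqrt(d_model) before the positional encoding is added, so its magnitude stays comparable to the encoding; with d_model = 256, sqrt(256) = 16. A minimal sketch (names are illustrative, not the exact wenet code):

import math
import torch

d_model = 256
xscale = math.sqrt(d_model)        # 16.0 when d_model = 256
x = torch.randn(8, 100, d_model)   # (batch, time, d_model)
x = x * xscale                     # scale before adding the positional encoding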
Thank you for this great work.
I trained aishell following aishell/s0 and got final.zip.
I want to try the x86 runtime, but I get an error.
The command is:
./build/decoder_main --chunk_size -1 --wav_path /root/A2_0.wav --model_path ./final.zip --dict_path ./words.txt
the error is:
terminate called after throwing an instance of 'std::runtime_error'
what(): The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/torch/nn/functional.py", line 38, in forward_encoder_chunk
ret = ret2
else:
output = torch.matmul(input, torch.t(weight))
~~~~~~~~~~~~ <--- HERE
if torch.isnot(bias, None):
bias1 = unchecked_cast(Tensor, bias)
Traceback of TorchScript, original code (most recent call last):
File "/data/Softwares/miniconda3/envs/wenet/lib/python3.8/site-packages/torch/nn/functional.py", line 1676, in forward_encoder_chunk
ret = torch.addmm(bias, input, weight.t())
else:
output = input.matmul(weight.t())
~~~~~~~~~~~~ <--- HERE
if bias is not None:
output += bias
RuntimeError: size mismatch, m1: [244 x 4864], m2: [5120 x 256] at ../aten/src/TH/generic/THTensorMath.cpp:41
Did I make some mistake? Or do I need to change the configuration?
Thank you
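For what it's worth, the size mismatch (m1: [244 x 4864] vs m2: [5120 x 256]) suggests the feature dimension used at decode time differs from the one the model was exported with. Assuming an espnet/wenet-style Conv2dSubsampling4 front end, the linear layer after the two stride-2 convolutions takes odim * (((idim - 1) // 2 - 1) // 2) inputs, and 4864 is exactly what 80-dim fbank gives with odim = 256, while 5120 implies a different idim. A quick check:

def conv2d_subsampling4_linear_in(idim, odim=256):
    # Hypothetical helper: input size of the linear layer after two
    # kernel-3, stride-2 convolutions (espnet/wenet-style Conv2dSubsampling4).
    return odim * (((idim - 1) // 2 - 1) // 2)

print(conv2d_subsampling4_linear_in(80))   # 4864, matching m1's inner dimension
# m2 expects 5120 = 256 * 20, i.e. a larger input feature dimension,
# so the exported model and the runtime feature config likely disagree.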
I downloaded final.zip and words.txt, put them into the Assets folder, and compiled the project into an APK to run on Android. When I open the app and click the button, it reports an error and crashes.
The error message is as follows:
26715-27453/com.mobvoi.wenet E/libc++abi: terminating with uncaught exception of type c10::Error: Expected at most 5 argument(s) for operator 'forward_encoder_chunk', but received 7 argument(s). Declaration: forward_encoder_chunk(torch.wenet.transformer.asr_model.___torch_mangle_21.ASRModel self, Tensor xs, Tensor? subsampling_cache=None, Tensor[]? elayers_output_cache=None, Tensor[]? conformer_cnn_cache=None) -> ((Tensor, Tensor, Tensor[], Tensor[]))
Exception raised from checkAndNormalizeInputs at ../aten/src/ATen/core/function_schema_inl.h:245 (most recent call first):
(no backtrace available)
26715-27453/com.mobvoi.wenet A/libc: Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 27453 (om.mobvoi.wenet), pid 26715 (om.mobvoi.wenet)
There is no wenet/bin/recognize.py file in the main branch now. If I run any script to decode, it reports as below:
python: can't open file 'wenet/bin/recognize.py': [Errno 2] No such file or directory
Hi, I compiled the runtime/x86 code on the Windows platform using VS2017, but when I run the decoder_main demo, it gets an empty recognition result. Why?
Thank you for your nice work!
I found two differences between wenet and espnet:
Describe the bug
I got an error related to the input file size (600 sec) with the offline demo on the server runtime.
But I have no error with 60 sec.
With the streaming demo, I used the same wav file (600 sec) and the server hung up.
To Reproduce
Steps to reproduce the behavior:
Go to...
cd wenet/runtime/server/x86
Run this command...
export GLOG_logtostderr=1
export GLOG_v=2
#wav_scp=raw_wav/test.scp
wav_path=
model_dir=
./build/decoder_main \
--chunk_size -1 \
--wav_path $wav_path \
--model_path $model_dir/final.zip \
--dict_path $model_dir/words.txt 2>&1 | tee log.txt
Get this error (some logs were added by me).
$ bash offline_recog.sh
I0322 07:28:19.092447 5399 torch_asr_model.cc:36] torch model info subsampling_rate 4 right context 6 sos 11175 eos 11175
I0322 07:28:19.111845 5399 feature_pipeline.h:43] feature pipeline config num_bins 80 frame_length 400frame_shift160
I0322 07:28:19.112640 5399 decoder_main.cc:74] wav raw_wav/wani.wav
I0322 07:28:19.113868 5399 wav.h:73] wav header info: data size 36
I0322 07:28:19.114097 5399 decoder_main.cc:79] read 18 samples, 1 channels, 16 bits, So we got the length of data is 18
I0322 07:28:19.114109 5399 fbank.h:133] Get the 18 samples
I0322 07:28:19.114113 5399 feature_pipeline.cc:39] add 0 frames
I0322 07:28:19.114121 5399 decoder_main.cc:83] num frames 0
I0322 07:28:19.114135 5399 torch_asr_decoder.cc:60] AdvanceDecoding
I0322 07:28:19.114140 5399 torch_asr_decoder.cc:78] Required 2147483647 get 0
terminate called after throwing an instance of 'c10::Error'
what(): There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CPU, QuantizedCPU, Autograd, Profiler, Tracer, Autocast]
Exception raised from reportError at ../aten/src/ATen/core/dispatch/Dispatcher.cpp:306 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x68 (0x7f9ffa3eaeb8 in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libc10.so)
frame #1: c10::Dispatcher::reportError(c10::DispatchTable const&, c10::DispatchKey) + 0x18f (0x7f9ffb12780f in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #2: at::_cat(c10::ArrayRef<at::Tensor>, long) + 0x203 (0x7f9ffb8bf373 in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #3: at::native::cat(c10::ArrayRef<at::Tensor>, long) + 0xbd (0x7f9ffb53f4ad in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x135fec6 (0x7f9ffb977ec6 in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0xac4c3c (0x7f9ffb0dcc3c in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #6: at::cat(c10::ArrayRef<at::Tensor>, long) + 0x117 (0x7f9ffb8bf067 in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x2ef7d5d (0x7f9ffd50fd5d in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0xac4c3c (0x7f9ffb0dcc3c in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #9: at::cat(c10::ArrayRef<at::Tensor>, long) + 0x117 (0x7f9ffb8bf067 in /disk107/code/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x40f79 (0x5650037f1f79 in ./build/decoder_main)
frame #11: <unknown function> + 0x3f57d (0x5650037f057d in ./build/decoder_main)
frame #12: <unknown function> + 0xe979 (0x5650037bf979 in ./build/decoder_main)
frame #13: __libc_start_main + 0xe6 (0x7f9ff926dbf6 in /lib/x86_64-linux-gnu/libc.so.6)
frame #14: <unknown function> + 0xde59 (0x5650037bee59 in ./build/decoder_main)
When I pass some wav files to decoder_main,
I may encounter exceptions as follows:
I0317 14:59:13.375226 1310 torch_asr_model.cc:36] torch model info subsampling_rate 4 right context 6 sos 4232 eos 4232
I0317 14:59:13.379293 1310 decoder_main.cc:80] num frames 0
I0317 14:59:13.379323 1310 torch_asr_decoder.cc:77] Required 2147483647 get 0
terminate called after throwing an instance of 'c10::Error'
what(): There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CPU, QuantizedCPU, Autograd, Profiler, Tracer, Autocast]
I guess it's an error caused by frontend/wav.h. In wav.h:
// wav.h
...
// line 31
struct WavHeader {
char riff[4]; // "riff"
unsigned int size;
char wav[4]; // "WAVE"
char fmt[4]; // "fmt "
unsigned int fmt_size;
uint16_t format;
uint16_t channels;
unsigned int sample_rate;
unsigned int bytes_per_second;
uint16_t block_size;
uint16_t bit;
char data[4]; // "data"
unsigned int data_size;
};
...
// line 56
fread(&header, 1, sizeof(header), fp);
...
// line 72
int num_data = header.data_size / (bits_per_sample_ / 8);
data_ = new float[num_data];
num_sample_ = num_data / num_channel_;
The struct assumes a fixed RIFF-fmt-data chunk layout, but sometimes a perfectly valid wav file contains other chunks in its header, such as a fact chunk or a LIST chunk. When we process audio files with ffmpeg, or with pydub (which is based on ffmpeg), there is a high probability that a LIST chunk gets encoded into the generated wav file; you can take this link as a reference.
A better way, I guess, is to read the 4-byte chunk ID to detect which chunk the following part is and then process it; once the data chunk is found, we can continue with the next steps.
I'm not familiar with C/C++ coding and not sure if my analysis is correct, but if it is, I hope you can fix it or add a note in the README. It would be of great help, thanks :-)
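A minimal Python sketch of the chunk-scanning idea (illustrative only; the real fix would live in the C++ wav.h reader):

import struct

def read_wav_data(path):
    # Walk the RIFF chunks until the "data" chunk instead of assuming a
    # fixed fmt/data layout; LIST, fact, etc. are simply skipped.
    with open(path, "rb") as f:
        riff, _size, wave = struct.unpack("<4sI4s", f.read(12))
        assert riff == b"RIFF" and wave == b"WAVE"
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no data chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                fmt_bytes = f.read(chunk_size)   # channels, rate, bits live here
            elif chunk_id == b"data":
                return f.read(chunk_size)        # raw PCM samples
            else:
                f.seek(chunk_size, 1)            # skip LIST, fact, ...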
It isn't explained here how to do this:
https://github.com/mobvoi/wenet/blob/main/examples/aishell/s0/README.md
HEAD is now at 96a2f23 Merge pull request #419 from shinh/release-0-4-0
[ 33%] No patch step for 'glog-populate'
[ 44%] Performing update step for 'glog-populate'
[ 55%] No configure step for 'glog-populate'
[ 66%] No build step for 'glog-populate'
[ 77%] No install step for 'glog-populate'
[ 88%] No test step for 'glog-populate'
[100%] Completed 'glog-populate'
[100%] Built target glog-populate
CMake Error at /usr/lib/x86_64-linux-gnu/cmake/gflags/gflags-targets.cmake:37 (message):
Some (but not all) targets in this export set were already defined.
I tried to use multiprocessing decoding (run.pl, as in espnet), but it's extremely slow. What's the possible reason?
Use TorchAudio for feature extraction
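For context, a minimal sketch of kaldi-compatible fbank extraction with torchaudio (parameter values are illustrative; the 1 << 15 scaling assumes torchaudio.load returns floats in [-1, 1] while the kaldi-compatible functions expect 16-bit-range samples):

import torchaudio
import torchaudio.compliance.kaldi as kaldi

waveform, sample_rate = torchaudio.load("example.wav")
waveform = waveform * (1 << 15)          # back to 16-bit sample range
feats = kaldi.fbank(waveform,
                    num_mel_bins=80,
                    frame_length=25,
                    frame_shift=10,
                    sample_frequency=sample_rate)
print(feats.shape)                       # (num_frames, 80)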
Can I train the model using a single GPU?
Maybe you could change CMakeLists.txt to copy from the "core" directory instead for Windows users?
I can't understand the forward chunk by chunk function in encoder.py.
So I have some questions about how the right context parameter is calculated
and how it is used in the chunk-by-chunk forward.
Can anyone help me with this?
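For reference, a hedged sketch of how the chunk windowing usually relates to the right context (based on my reading of the chunk-by-chunk forward; treat the exact names and numbers as assumptions):

subsampling_rate = 4        # Conv2dSubsampling4: one output frame per 4 input frames
right_context = 6           # extra future input frames the subsampling layer needs
context = right_context + 1             # current frame plus its right context
decoding_chunk_size = 16                # encoder output frames produced per chunk

# Input frames consumed per chunk, and the hop between consecutive chunks:
decoding_window = (decoding_chunk_size - 1) * subsampling_rate + context
stride = decoding_chunk_size * subsampling_rate
print(decoding_window, stride)          # 67 64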
Is there any possibility of training the model on the alphabet rather than on tokens?
Many thanks!
Is your feature request related to a problem? Please describe.
Low-resource languages and deep domain use cases need more efficient models.
Describe the solution you'd like
Hugging Face is working on integrating ASR models such as wav2vec 2.0 and Speech Transformer into their transformers library.
Describe alternatives you've considered
The fairseq implementation of wav2vec has more dependencies, is more complex to use, and is less readable.
Additional context
Integrating Hugging Face models gives you pretrained models from the model hub and support for multiple models with less code.
cmake version 3.19.4
gcc version 9.3.0 (GCC)
cmake succeeds, but make fails when building the runtime server:
wenet-main/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `lgammaf@GLIBC_2.23'
wenet-main/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `lgamma@GLIBC_2.23'
please look at it, thanks!
The utterance 早上好你叫什么名字去机场要怎么走 ("Good morning, what's your name, how do I get to the airport?")
is recognized as:
宝上方你这什欢名次却具残要车
The accuracy is only about 25%. Is this because no language model has been added?
root@e62b3865c7cc:~/data/project/wenet/examples/aishell/s0# ./run.sh
./run.sh: init method is file:///root/data/project/wenet/examples/aishell/s0/exp/sp_spec_aug/ddp_init
wenet/bin/train.py:76: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
configs = yaml.load(fin)
Traceback (most recent call last):
File "wenet/bin/train.py", line 82, in
**configs['spec_aug_conf'],
KeyError: 'spec_aug_conf'
wenet/bin/train.py:76: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
configs = yaml.load(fin)
Traceback (most recent call last):
File "wenet/bin/train.py", line 82, in
**configs['spec_aug_conf'],
KeyError: 'spec_aug_conf'
do model average and final checkpoint is exp/sp_spec_aug/avg_10.pt
Namespace(dst_model='exp/sp_spec_aug/avg_10.pt', max_epoch=65536, min_epoch=0, num=10, src_path='exp/sp_spec_aug', val_best=True)
Traceback (most recent call last):
File "wenet/bin/average_model.py", line 47, in
sort_idx = np.argsort(val_scores[:, -1])
IndexError: too many indices for array
Traceback (most recent call last):
File "wenet/bin/recognize.py", line 81, in
with open(args.config, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/train.yaml'
Traceback (most recent call last):
File "wenet/bin/recognize.py", line 81, in
with open(args.config, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/train.yaml'
Traceback (most recent call last):
File "wenet/bin/recognize.py", line 81, in
with open(args.config, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/train.yaml'
Traceback (most recent call last):
File "wenet/bin/recognize.py", line 81, in
with open(args.config, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/train.yaml'
./run.sh: line 165: python2: command not found
./run.sh: line 165: python2: command not found
./run.sh: line 165: python2: command not found
./run.sh: line 165: python2: command not found
Traceback (most recent call last):
File "wenet/bin/export_jit.py", line 29, in
with open(args.config, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/train.yaml'
The following sed command in the librispeech s0 recipe switches all instances of dynamic to static, which, if a train_unified_conformer.yml recipe is used, causes use_dynamic_chunk: true to be replaced with use_static_chunk: true at eval time:
TypeError: __init__() got an unexpected keyword argument 'use_static_chunk'
This is easily fixed by modifying the above run.sh line, but I wonder if there is something else going on that I missed that could still affect the accuracy of the model at training/decode time. It seems to work OK, but I don't find this in the aishell run.sh for either recipe.
The encoder outputs are fed into decoder entirely, so the encoder-decoder attention attends to the whole sequence. Right? Why not use monotonic attention?
I cloned the latest version of wenet and wanted to run the aishell1 demo on two GPUs, but after running it the following error is reported.
A single GPU works fine; it seems a flock function is missing?
Traceback (most recent call last):
Traceback (most recent call last):
File "wenet/bin/train.py", line 185, in
File "wenet/bin/train.py", line 185, in
model = torch.nn.parallel.DistributedDataParallel(
File "/dev/conda_py38/envs/wenet/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 331, in init
model = torch.nn.parallel.DistributedDataParallel(
File "/dev/conda_py38/envs/wenet/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 331, in init
self._distributed_broadcast_coalesced(
File "/dev/conda_py38/envs/wenet/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 549, in _distributed_broadcast_coalesced
self._distributed_broadcast_coalesced(
File "/dev/conda_py38/envs/wenet/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 549, in _distributed_broadcast_coalesced
dist._broadcast_coalesced(self.process_group, tensors, buffer_size)
RuntimeError: flock: Function not implemented
dist._broadcast_coalesced(self.process_group, tensors, buffer_size)
RuntimeError: flock: Function not implemented
terminate called after throwing an instance of 'std::system_error'
what(): flock: Function not implemented
terminate called after throwing an instance of 'std::system_error'
what(): flock: Function not implemented
The subsampled mask obtained with mask[:, :, :-2:2][:, :, :-2:2] is inconsistent with the length given by PyTorch's convolution formula. A concrete example:
lens = torch.LongTensor([[24], [40], [60], [100]]).cpu()
print(compute_conv_length(compute_conv_length(lens, kernel_size=3, stride=2), kernel_size=3, stride=2))
The result is: tensor([[ 5],
[ 9],
[14],
[24]])
mask = make_mask_by_length(a, lens).unsqueeze(-2)
new_mask = mask[:, :, :-2:2][:, :, :-2:2]
Convert the boolean values to int, sum, and print:
print(torch.sum(new_mask.int(), -1))
tensor([[ 6],
[10],
[15],
[24]])
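A self-contained sketch reproducing the mismatch, reimplementing the reporter's own helpers (compute_conv_length, make_mask_by_length) inline:

import torch

def conv_out_len(lens, kernel_size=3, stride=2):
    # Standard convolution output-length formula (no padding, no dilation).
    return (lens - kernel_size) // stride + 1

lens = torch.tensor([24, 40, 60, 100])
print(conv_out_len(conv_out_len(lens)))          # tensor([ 5,  9, 14, 24])

T = int(lens.max())
mask = (torch.arange(T).unsqueeze(0) < lens.unsqueeze(1)).unsqueeze(1)   # (B, 1, T)
sub_mask = mask[:, :, :-2:2][:, :, :-2:2]
print(sub_mask.int().sum(-1))                    # tensor([[ 6], [10], [15], [24]])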
I followed the instructions line by line, and when running bash run.sh --stage 3 --stop-stage 3, it showed "no such file or directory". What can I do? Thanks.
Describe the bug
tools/format_data.sh --nj 32 --feat-type wav --feat raw_wav/dev/wav.scp raw_wav/dev data/dict/lang_char.txt
split: illegal option -- -
usage: split [-a sufflen] [-b byte_count] [-l line_count] [-p pattern]
[file [prefix]]
ls: raw_wav/dev/log/wav_.slice: No such file or directory
cat: raw_wav/dev/log/wav_.shape: No such file or directory
tools/format_data.sh --nj 32 --feat-type wav --feat raw_wav/test/wav.scp raw_wav/test data/dict/lang_char.txt
split: illegal option -- -
usage: split [-a sufflen] [-b byte_count] [-l line_count] [-p pattern]
[file [prefix]]
ls: raw_wav/test/log/wav_.slice: No such file or directory
cat: raw_wav/test/log/wav_.shape: No such file or directory
tools/format_data.sh --nj 32 --feat-type wav --feat raw_wav/train/wav.scp raw_wav/train data/dict/lang_char.txt
split: illegal option -- -
usage: split [-a sufflen] [-b byte_count] [-l line_count] [-p pattern]
[file [prefix]]
ls: raw_wav/train/log/wav_.slice: No such file or directory
cat: raw_wav/train/log/wav_.shape: No such file or directory
Desktop (please complete the following information):
pre-trained model : http://mobvoi-speech-public.ufile.ucloud.cn/public/wenet/aishell/20210204_unified_transformer_exp.tar.gz
test data: ai-shell1
The K40c does not support torch 1.6.0.
I have installed torch 1.3.0, but the following error occurred:
site-packages/torch/utils/data/_utils/signal_handling.py", line 737, in _try_get_data raise RuntimeError
RuntimeError: DataLoader worker (pid 133734) is killed by signal: Aborted
Thanks for sharing such great work.
I'd like to ask: will you share a pretrained multi-cn Conformer model?
In tools/format_data.sh, line 89:
should "${trans_type}" == "ch_char_en_bpe" be written as "${trans_type}" == "cn_char_en_bpe"?
Hi, here is the problem: I tried to compile the latest WeNet Android code you offered on my machine and added the final.zip and words.txt files to the requested directory. However, it just crashes (both on the Android Studio emulator and my personal cellphone), always after I grant the recording permission. I did some online searching to try to find out [1], but sadly nothing worked. Here are the errors that come back to me.
My environment is: Windows 10, SDK 6.0 - 9.0, CMake 3.18.1.
E/AndroidRuntime: FATAL EXCEPTION: main
Process: com.mobvoi.wenet, PID: 11753
java.lang.UnsatisfiedLinkError: dalvik.system.PathClassLoader[DexPathList[[zip file "/data/app/com.mobvoi.wenet-VaKqTfUA4p7TCtqlZS2Gtg==/base.apk"],nativeLibraryDirectories=[/data/app/com.mobvoi.wenet-VaKqTfUA4p7TCtqlZS2Gtg==/lib/arm, /data/app/com.mobvoi.wenet-VaKqTfUA4p7TCtqlZS2Gtg==/base.apk!/lib/armeabi-v7a, /system/lib]]] couldn't find "libwenet.so"
at java.lang.Runtime.loadLibrary0(Runtime.java:1012)
at java.lang.System.loadLibrary(System.java:1669)
at com.mobvoi.wenet.Recognize.<clinit>(Recognize.java:6)
at com.mobvoi.wenet.Recognize.init(Native Method)
at com.mobvoi.wenet.MainActivity.onCreate(MainActivity.java:88)
at android.app.Activity.performCreate(Activity.java:7136)
at android.app.Activity.performCreate(Activity.java:7127)
at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1271)
at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2893)
at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3048)
at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:78)
at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:108)
at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:68)
at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1808)
at android.os.Handler.dispatchMessage(Handler.java:106)
at android.os.Looper.loop(Looper.java:193)
at android.app.ActivityThread.main(ActivityThread.java:6669)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858)
OSError: dlopen(/opt/anaconda3/envs/wenet/lib/python3.8/site-packages/torch/lib/libtorch_global_deps.dylib, 10): Library not loaded: @rpath/libmkl_intel_lp64.dylib
Referenced from: /opt/anaconda3/envs/wenet/lib/python3.8/site-packages/torch/lib/libtorch_global_deps.dylib
Reason: image not found
Hi, WeNet is amazing!
But I wonder whether this model supports English, because we have to tokenize the sentence with " ".
Could I use "<SPACE!>" to represent " " in a sentence, like "H e l l o <SPACE!> w o r l d", or something like that?
Looking forward to your reply.
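For illustration, a minimal sketch of character-level tokenization with an explicit space symbol ("<SPACE!>" is just the symbol proposed above; any reserved token works as long as it also appears in the training dictionary, and BPE via sentencepiece is the more common choice for English):

SPACE = "<SPACE!>"

def char_tokenize(text):
    # Replace real spaces with the reserved symbol, keep every other character.
    return [SPACE if ch == " " else ch for ch in text]

print(" ".join(char_tokenize("Hello world")))
# H e l l o <SPACE!> w o r l d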
I notice a chunk-1 degradation of about 30% (from 5.51 to 7.83) in the "Unified Dynamic chunk" section for CTC greedy search.
Please let me know if my understanding is incorrect:
the chunk-1 delay converts to 12 layers * 1 chunk * 40 ms (conv2d) = 480 ms.
Do you support complementary language models, or do you plan to? I didn't notice any examples or related code in the repo.
It would be great if wenet had a parallel inference program to speed up the decoding procedure.
Traceback (most recent call last):
File "wenet/bin/train.py", line 209, in
executor.train(model, optimizer, scheduler, train_data_loader, device,
File "/media/dayu/D/nlp/wenet/examples/aishell/s0/wenet/utils/executor.py", line 35, in train
loss, loss_att, loss_ctc = model(feats, feats_lengths, target,
File "/home/dayu/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/media/dayu/D/nlp/wenet/examples/aishell/s0/wenet/transformer/asr_model.py", line 89, in forward
encoder_out, encoder_mask = self.encoder(speech, speech_lengths)
File "/home/dayu/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/media/dayu/D/nlp/wenet/examples/aishell/s0/wenet/transformer/encoder.py", line 133, in forward
masks = ~make_pad_mask(xs_lens).unsqueeze(1) # (B, 1, L)
File "/media/dayu/D/nlp/wenet/examples/aishell/s0/wenet/utils/mask.py", line 140, in make_pad_mask
max_len = int(lengths.max().item())
RuntimeError: CUDA error: no kernel image is available for execution on the device
My environment is Ubuntu 20.04, CUDA 11.1, torch 1.7.1.
I followed the aishell tutorial; errors occurred at the bash run.sh --stage 4 --stop-stage 5 step.
run.sh: init method is file:///home/dapeng/PycharmProjects/wenet/examples/aishell/s0/exp/sp_spec_aug/ddp_init
File "wenet/bin/train.py", line 81
collate_func = CollateFunc(**configs['collate_conf'],
^
SyntaxError: invalid syntax
do model average and final checkpoint is exp/sp_spec_aug/avg_10.pt
Traceback (most recent call last):
File "wenet/bin/average_model.py", line 7, in <module>
import yaml
ImportError: No module named yaml
File "wenet/bin/recognize.py", line 87
test_collate_func = CollateFunc(**test_collate_conf, cmvn=args.cmvn)
^
SyntaxError: invalid syntax
File "wenet/bin/recognize.py", line 87
test_collate_func = CollateFunc(**test_collate_conf, cmvn=args.cmvn)
^
SyntaxError: invalid syntax
File " File "wenet/bin/recognize.pywenet/bin/recognize.py", line ", line 8787
test_collate_func = CollateFunc(**test_collate_conf, cmvn=args.cmvn)
test_collate_func = CollateFunc(**test_collate_conf, cmvn=args.cmvn)
^
^
SyntaxErrorSyntaxError: : invalid syntaxinvalid syntax
Traceback (most recent call last):
File "tools/compute-wer.py", line 365, in <module>
with codecs.open(hyp_file, 'r', 'utf-8') as fh:
File "/usr/lib/python2.7/codecs.py", line 898, in open
file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/test_ctc_prefix_beam_search/text'
Traceback (most recent call last):
File "tools/compute-wer.py", line 365, in <module>
with codecs.open(hyp_file, 'r', 'utf-8') as fh:
File "/usr/lib/python2.7/codecs.py", line 898, in open
file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/test_attention_rescoring/text'
Traceback (most recent call last):
File "tools/compute-wer.py", line 365, in <module>
Traceback (most recent call last):
with codecs.open(hyp_file, 'r', 'utf-8') as fh:
File "tools/compute-wer.py", line 365, in <module>
File "/usr/lib/python2.7/codecs.py", line 898, in open
with codecs.open(hyp_file, 'r', 'utf-8') as fh:
File "/usr/lib/python2.7/codecs.py", line 898, in open
file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/test_ctc_greedy_search/text'
file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: 'exp/sp_spec_aug/test_attention/text'
When I run stage 4 of the multi_cn run.sh,
it gives an error: raw_wav/train/global_cmvn': No such file or directory
In run.sh line 225, I think the path should be