duj12 / asr-2pass

ASR 2-pass onnxruntime and websocket server, based on FunASR (https://github.com/alibaba-damo-academy/FunASR).

asr-2pass's Introduction

Data segmentation, transcription, and filtering pipeline

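src_dir=/path/to/your/src/audio/  # directory containing the raw long audio/video files to be cleaned
tgt_dir=/path/to/your/tgt/audio   # output directory for the cleaned short audio clips, transcripts, and other Kaldi-format data
bash ./run_seg_asr_filter.sh  $src_dir  $tgt_dir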

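Quick transcription tutorial

  1. Start the server. It is compiled automatically on the first launch.
bash ./run_prepare_server.sh
  2. In another terminal window, start the transcription.
audio_dir=/path/to/your/audios  # absolute path of the folder containing the audio files to transcribe
bash ./run_transcribe_audio.sh $audio_dir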

Service deployment and usage

0, First, clone the asr-2pass project

1, Build and start the server

cd asr-2pass/websocket

# On the first run, the following script builds the websocket server with onnxruntime and downloads the required libraries and models.
# The default port is 10095; change it if needed.
bash ./run_server_2pass.sh  &

2, Start the h5 service

cd ../html5
# Prepare a Python environment yourself beforehand.
python h5Server.py  &
# Note the IP and port for the next step. The default port is 1337.

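3, Use the ASR service in a browser

Paste "https://xxx.xxx.xx.xx:xxxx/static/asr-2pass-demo.html" into the browser's address bar.

Replace the IP and port with the address obtained in the previous step, for example https://192.168.89.53:1337/static/asr-2pass-demo.html

For the ASR service address, enter the address and port of the server started in step 1, for example wss://192.168.89.53:10095

Once started, the page looks like the figure below:

4, Other clients: see clients. cpp, h5, java, and python clients are currently supported.

5, Server-side parameters

--download-model-dir  Model download directory; if the model paths below cannot be found, the models are downloaded from ModelScope
--model-dir  Path of the non-streaming (offline) ASR model
--online-model-dir  Path of the streaming ASR model
--quantize  True uses the quantized ASR model, False the non-quantized one; default is True
--vad-dir  Path of the VAD model
--vad-quant   True uses the quantized VAD model, False the non-quantized one; default is True
--punc-dir  Path of the punctuation model
--punc-quant   True uses the quantized PUNC model, False the non-quantized one; default is True
--itn-model-dir  Path of the inverse text normalization model
--port  Port the server listens on; default is 10095
--decoder-thread-num  Number of inference threads started by the server; default is 8. It can be set to the number of cores or twice the number of cores.
--io-thread-num  Number of IO threads started by the server; default is 1. It can be set to 1/4 of the number of cores.
--certfile  SSL certificate file; default is ../../../ssl_key/server.crt. Set it to "" to disable SSL.
--keyfile   SSL key file; default is ../../../ssl_key/server.key. Set it to "" to disable SSL.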
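
As a rough illustration of the thread-count guidance above, the following Python sketch derives `--decoder-thread-num` and `--io-thread-num` from the number of CPU cores and assembles an argument list. The binary path `./funasr-wss-server-2pass` and the helper name `build_server_args` are assumptions made for this example; only the flag names and the sizing rules come from the list above.

```python
import os


def build_server_args(port=10095, cores=None):
    """Assemble a server command line using the sizing rules from the flag list."""
    cores = cores or os.cpu_count() or 1
    return [
        "./funasr-wss-server-2pass",  # hypothetical binary path, adjust to your build
        "--port", str(port),
        # The flag list suggests the core count (or twice the core count).
        "--decoder-thread-num", str(cores),
        # The flag list suggests a quarter of the core count, at least one thread.
        "--io-thread-num", str(max(1, cores // 4)),
    ]


if __name__ == "__main__":
    print(" ".join(build_server_args()))
```

The returned list could be passed to `subprocess.run` once the path points at the real executable.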

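WebSocket communication protocol

Real-time speech recognition

System architecture diagram

Sending data from the client to the server

Message format

Configuration parameters and meta information are sent as JSON; audio data is sent as bytes.

First message

The message is (it must be serialized as JSON):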

{"mode": "2pass", "wav_name": "wav_name", "is_speaking": True, "wav_format":"pcm", "chunk_size":[5,10,5], "audio_fs": 16000}

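Parameters:

`mode`: `offline` selects single-utterance recognition; `online` selects real-time speech recognition; `2pass` selects real-time speech recognition with the offline model correcting the result at the end of each utterance.
`wav_name`: name of the audio file to be recognized
`wav_format`: file extension of the audio/video, for example pcm, mp3, or mp4 (note: version 1.0 supports only pcm audio streams)
`is_speaking`: marks a sentence endpoint, for example a VAD split point or the end of a wav
`chunk_size`: latency configuration of the streaming model. `[5,10,5]` means the current chunk is 600 ms, with 300 ms of look-back and 300 ms of look-ahead. A larger center chunk gives better recognition performance but also higher latency.
`audio_fs`: audio sampling rate; required when the input audio is pcm data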
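
The parameters above can be sketched in Python as follows. The 60 ms unit is inferred from the `[5,10,5]` example (10 units for 600 ms, 5 units for 300 ms) and is not stated explicitly in this document; the helper names `chunk_timing_ms` and `first_message` are made up for illustration.

```python
import json

# Inferred from the chunk_size example: 10 units -> 600 ms, 5 units -> 300 ms.
UNIT_MS = 60


def chunk_timing_ms(chunk_size):
    """Translate a chunk_size triple into look-back / current / look-ahead milliseconds."""
    look_back, current, look_ahead = chunk_size
    return {
        "look_back": look_back * UNIT_MS,
        "current": current * UNIT_MS,
        "look_ahead": look_ahead * UNIT_MS,
    }


def first_message(wav_name, mode="2pass", chunk_size=(5, 10, 5), audio_fs=16000):
    """Build the JSON text for the first message of a streaming session."""
    return json.dumps({
        "mode": mode,
        "wav_name": wav_name,
        "is_speaking": True,
        "wav_format": "pcm",
        "chunk_size": list(chunk_size),
        "audio_fs": audio_fs,
    })


if __name__ == "__main__":
    print(chunk_timing_ms([5, 10, 5]))
    print(first_message("demo"))
```

Note that `json.dumps` turns Python's `True` into the JSON literal `true`, which is what goes over the wire.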

Sending audio data

Strip the header from the audio and send the remaining bytes directly. Supported sampling rates are 8000 (set `audio_fs` to 8000 in the message) and 16000.
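
One way to obtain header-free bytes is Python's standard `wave` module, which reads only the sample frames of a WAV file. The sketch below assumes the input is a WAV container and uses a 600 ms chunk length to match the `[5,10,5]` example; both choices, and the helper names, are illustrative assumptions rather than requirements of the server.

```python
import io
import wave


def sample_rate(wav_bytes):
    """Return the sampling rate stored in the WAV header (use it for audio_fs)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getframerate()


def pcm_chunks(wav_bytes, chunk_ms=600):
    """Yield raw PCM byte chunks of about chunk_ms each, without the WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        frames_per_chunk = wf.getframerate() * chunk_ms // 1000
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data
```

Each yielded chunk would be sent as one binary websocket message.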

Sending the end flag

After all audio data has been sent, send an end flag (it must be serialized as JSON):

{"is_speaking": False}
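
The client-to-server sequence described above (configuration, audio bytes, end flag) can be sketched independently of any websocket library. Here `send` is a placeholder for whatever function your websocket client uses to transmit a text or binary message; it and the name `stream_session` are assumptions for this example.

```python
import json


def stream_session(send, config, audio_chunks):
    """Send one streaming session: JSON config, raw audio bytes, then the end flag."""
    send(json.dumps(config))                  # first message, JSON text
    for chunk in audio_chunks:
        send(bytes(chunk))                    # audio data, raw bytes
    send(json.dumps({"is_speaking": False}))  # end flag, JSON text


if __name__ == "__main__":
    sent = []
    stream_session(
        sent.append,
        {"mode": "2pass", "wav_name": "demo", "is_speaking": True,
         "wav_format": "pcm", "chunk_size": [5, 10, 5], "audio_fs": 16000},
        [b"\x00\x00" * 4, b"\x00\x00" * 2],
    )
    print(len(sent), "messages queued")
```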

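Sending data from the server to the client

Sending recognition results

The message is (serialized as JSON):

{"mode": "2pass-online", "wav_name": "wav_name", "text": "asr outputs", "is_final": True}

Parameters:

`mode`: inference mode; `2pass-online` marks a real-time recognition result, `2pass-offline` marks a second-pass corrected result
`wav_name`: name of the audio file being recognized
`text`: recognized text output
`is_final`: indicates that recognition has finished
`timestamp`: returned only if the AM is a timestamp model; holds the timestamps in the format "[[100,200], [200,500]]" (ms)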
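
A client typically has to merge the two result types into one transcript. The following sketch shows one possible way to do that. It assumes that each `2pass-online` message carries only the newly recognized text and that a `2pass-offline` message replaces all streaming text received since the previous correction; neither behavior is spelled out in this document, so verify it against your server. The class name `TwoPassTranscript` is hypothetical.

```python
import json


class TwoPassTranscript:
    """Merge streaming (2pass-online) and corrected (2pass-offline) results."""

    def __init__(self):
        self.confirmed = []  # sentences corrected by the offline model
        self.pending = ""    # streaming text not yet corrected

    def feed(self, raw):
        """Consume one JSON message from the server and return the current transcript."""
        msg = json.loads(raw)
        mode = msg.get("mode")
        text = msg.get("text", "")
        if mode == "2pass-online":
            self.pending += text          # assumption: partial results are incremental
        elif mode == "2pass-offline":
            self.confirmed.append(text)   # corrected sentence replaces the partials
            self.pending = ""
        return "".join(self.confirmed) + self.pending
```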

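The relationship between input audio chunks and output text is illustrated below:

Offline file transcription

Sending data from the client to the server

Message format

Configuration parameters and meta information are sent as JSON; audio data is sent as bytes.

First message

The message is (it must be serialized as JSON):

{"mode": "offline", "wav_name": "wav_name","wav_format":"pcm","is_speaking": True,"hotwords":"阿里巴巴 达摩院 阿里云"}

Parameters:

`mode`: `offline` selects offline file transcription
`wav_name`: name of the audio file to be recognized
`wav_format`: file extension of the audio/video, for example pcm, mp3, or mp4
`is_speaking`: `False` marks a sentence endpoint, for example a VAD split point or the end of a wav
`audio_fs`: audio sampling rate; required when the input audio is pcm data
`hotwords`: required if the AM is a hotword model; the hotwords are sent to the server as a single string separated by " ", for example "语音识别 热词 时间戳"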
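
A minimal sketch of building the offline first message, including the space-separated hotword string, might look like this. The function name `offline_first_message` and its keyword arguments are illustrative; the field names come from the parameter list above.

```python
import json


def offline_first_message(wav_name, wav_format="pcm", hotwords=(), audio_fs=None):
    """Build the JSON text for the first message of an offline transcription."""
    msg = {
        "mode": "offline",
        "wav_name": wav_name,
        "wav_format": wav_format,
        "is_speaking": True,
    }
    if audio_fs is not None:
        msg["audio_fs"] = audio_fs             # needed for raw pcm input
    if hotwords:
        msg["hotwords"] = " ".join(hotwords)   # hotwords separated by a single space
    # ensure_ascii=False keeps non-ASCII hotwords readable in the JSON text.
    return json.dumps(msg, ensure_ascii=False)


if __name__ == "__main__":
    print(offline_first_message("demo.wav", hotwords=["阿里巴巴", "达摩院", "阿里云"], audio_fs=16000))
```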

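Sending audio data

For pcm, send the audio data directly. For other formats, send the header together with the audio/video bytes. Multiple sampling rates and audio/video formats are supported.

Sending the audio end flag

After all audio data has been sent, send an end flag (it must be serialized as JSON):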

{"is_speaking": False}

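Sending data from the server to the client

Sending recognition results

The message is (serialized as JSON):

{"mode": "offline", "wav_name": "wav_name", "text": "asr outputs", "is_final": True,"timestamp":"[[100,200], [200,500]]"}

Parameters:

`mode`: `offline` marks offline file transcription
`wav_name`: name of the audio file being recognized
`text`: recognized text output
`is_final`: indicates that recognition has finished
`timestamp`: returned only if the AM is a timestamp model; holds the timestamps in the format "[[100,200], [200,500]]" (ms)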
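
Since the example above shows `timestamp` as a JSON-encoded string, a client may need to decode it a second time. The sketch below handles both a string and an already decoded list, and converts milliseconds to seconds; the helper name `timestamps_to_seconds` is an assumption for illustration.

```python
import json


def timestamps_to_seconds(result):
    """Return the [start, end] millisecond pairs of a result as (start_s, end_s) tuples."""
    ts = result.get("timestamp")
    if ts is None:
        return []                # the model does not return timestamps
    if isinstance(ts, str):
        ts = json.loads(ts)      # the field may arrive as a JSON-encoded string
    return [(start / 1000.0, end / 1000.0) for start, end in ts]


if __name__ == "__main__":
    raw = '{"mode": "offline", "wav_name": "demo", "text": "asr outputs", "is_final": true, "timestamp": "[[100,200], [200,500]]"}'
    print(timestamps_to_seconds(json.loads(raw)))
```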

asr-2pass's Issues

Error while preparing the environment

Thanks for your work! Running the first command fails with:
-- Configuring incomplete, errors occurred!
make: Makefile: No such file or directory
make: *** No rule to make target 'Makefile'. Stop.
./run_server_offline.sh: line 51: build/bin/funasr-wss-server: No such file or directory

Client connection makes the server exit immediately

Command run on the client:
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "D:\audios\chat.wav"

Server startup log:
~/ASR-2Pass/websocket$ bash ./run_server_2pass.sh
I20240426 18:54:51.956411 17392 funasr-wss-server-2pass.cpp:21] model-dir : damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx
I20240426 18:54:51.956475 17392 funasr-wss-server-2pass.cpp:21] online-model-dir : damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online-onnx
I20240426 18:54:51.956492 17392 funasr-wss-server-2pass.cpp:21] quantize : true
I20240426 18:54:51.956506 17392 funasr-wss-server-2pass.cpp:21] vad-dir : damo/speech_fsmn_vad_zh-cn-16k-common-onnx
I20240426 18:54:51.956511 17392 funasr-wss-server-2pass.cpp:21] vad-quant : true
I20240426 18:54:51.956521 17392 funasr-wss-server-2pass.cpp:21] punc-dir : damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx
I20240426 18:54:51.956533 17392 funasr-wss-server-2pass.cpp:21] punc-quant : true
I20240426 18:54:51.956538 17392 funasr-wss-server-2pass.cpp:21] itn-model-dir : damo/fst_itn_zh
I20240426 18:54:51.956542 17392 funasr-wss-server-2pass.cpp:21] offline-model-revision : v1.2.1
I20240426 18:54:51.956544 17392 funasr-wss-server-2pass.cpp:21] online-model-revision : v1.0.6
I20240426 18:54:51.956547 17392 funasr-wss-server-2pass.cpp:21] vad-revision : v1.2.0
I20240426 18:54:51.956558 17392 funasr-wss-server-2pass.cpp:21] punc-revision : v1.0.2
I20240426 18:54:51.956576 17392 funasr-wss-server-2pass.cpp:181] Download model: damo/speech_fsmn_vad_zh-cn-16k-common-onnx from modelscope:
I20240426 18:54:51.956583 17392 funasr-wss-server-2pass.cpp:207] Set vad-dir : models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx
I20240426 18:54:51.956620 17392 funasr-wss-server-2pass.cpp:246] Download model: damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx from modelscope :
I20240426 18:54:51.956635 17392 funasr-wss-server-2pass.cpp:271] Set model-dir : models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx
I20240426 18:54:51.956648 17392 funasr-wss-server-2pass.cpp:290] Download model: damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online-onnx from modelscope :
I20240426 18:54:51.956656 17392 funasr-wss-server-2pass.cpp:315] Set online-model-dir : models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online-onnx
I20240426 18:54:51.956660 17392 funasr-wss-server-2pass.cpp:334] Download model: damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx from modelscope :
I20240426 18:54:51.956673 17392 funasr-wss-server-2pass.cpp:359] Set punc-dir : models/damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx
I20240426 18:54:51.956681 17392 funasr-wss-server-2pass.cpp:382] Set itn-model-dir : models/damo/fst_itn_zh
I20240426 18:54:51.957080 17392 funasr-wss-server-2pass.cpp:424] SSL is opened!
certfile path is ../ssl_key/server.crt
I20240426 18:54:51.957162 17392 websocket-server-2pass.cpp:25] on_tls_init called with hdl: 0x557ceea483a0
I20240426 18:54:51.957180 17392 websocket-server-2pass.cpp:26] using TLS mode: Mozilla Intermediate
I20240426 18:54:51.957898 17392 tpass-stream.cpp:23] VAD model file is not exist, skip load vad model.
I20240426 18:54:52.426906 17392 paraformer.cpp:93] Successfully load model from models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online-onnx/model_quant.onnx
I20240426 18:54:52.540336 17392 paraformer.cpp:101] Successfully load model from models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online-onnx/decoder_quant.onnx
I20240426 18:54:53.185487 17392 paraformer.cpp:156] Successfully load model from models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx/model_quant.onnx
I20240426 18:54:53.347113 17392 ct-transformer-online.cpp:21] Successfully load model from models/damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx/model_quant.onnx
I20240426 18:54:53.354259 17392 websocket-server-2pass.cpp:479] initAsr run check_and_clean_connection
I20240426 18:54:53.354368 17392 websocket-server-2pass.cpp:482] initAsr run check_and_clean_connection finished
asr model init finished. listen on port:10095
I20240426 18:55:17.533152 17422 websocket-server-2pass.cpp:25] on_tls_init called with hdl: 0x7fb074010c30
I20240426 18:55:17.533193 17422 websocket-server-2pass.cpp:26] using TLS mode: Mozilla Intermediate
I20240426 18:55:17.550241 17425 websocket-server-2pass.cpp:365] hotwords:
E20240426 18:55:17.550359 17425 tpass-online-stream.cpp:10] vad_handle is null

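The log reports "VAD model file is not exist, skip load vad model" and "vad_handle is null". The VAD directory is set to models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx, and it contains the following files (from https://www.modelscope.cn/models/iic/speech_fsmn_vad_zh-cn-16k-common-onnx/files):

[image: file listing of the VAD model directory]

The screenshot above also does not contain the files defined in precomp.h:
#define VAD_CMVN_NAME "vad.mvn"
#define VAD_CONFIG_NAME "vad.yaml"

I tried renaming am.mvn and config.yaml in that directory to vad.mvn and vad.yaml, but startup then fails with the error below. Where are these two files supposed to come from?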
I20240426 18:51:17.395659 17350 funasr-wss-server-2pass.cpp:382] Set itn-model-dir : models/damo/fst_itn_zh
I20240426 18:51:17.396056 17350 funasr-wss-server-2pass.cpp:424] SSL is opened!
certfile path is ../ssl_key/server.crt
I20240426 18:51:17.396142 17350 websocket-server-2pass.cpp:25] on_tls_init called with hdl: 0x55e0fe9d73a0
I20240426 18:51:17.396157 17350 websocket-server-2pass.cpp:26] using TLS mode: Mozilla Intermediate
I20240426 18:51:17.408376 17350 fsmn-vad.cpp:58] Successfully load model from models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/model_quant.onnx
E20240426 18:51:17.419442 17350 fsmn-vad.cpp:49] Error when load argument from vad config YAML.

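Docker deployment

I am trying to deploy on a server with Docker, but I am a beginner. I found your image on Docker Hub but do not know how to use or deploy it. In PowerShell I ran "docker run -dit -p 10095:10095 -p 1337:1337 --net=host 48f3d531e19ae36104fc4d7095db7bf01b0086eeb83136efed715cb2a306d999 /bin/bash". I ran it this way because with a plain docker run the container always exits by itself. Accessing "https://localhost/static/asr-2pass-demo.html" in the browser also always fails, saying the connection was refused. I have spent a long time on this without solving it. What should I do, and how do I use the image I pulled from Docker Hub?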
