Comments (8)
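Thanks to nl8590687 for the advice and pointers on building an ASR service. As for TF-Lite, as long as the model can be converted, hardware acceleration is available on Android versions that support it. I believe the same holds for Core ML, though I have not looked into that part yet.
I have implemented inference on Android using a model trained with ASRT (converted to TF-Lite).
https://github.com/Evanston0624/ASRT_model_Android/tree/main
A README will be created later; the main code is at:
https://github.com/Evanston0624/ASRT_model_Android/tree/main/app/src/main/java/com/example/myapplication
The repository above implements:
- Data preprocessing (loading the audio format > spectrogram > padding)
- Loading the model
- Calling the model to obtain its output
- A custom ctc_decode (this differs from the Keras ctc_decoder that ASRT calls, but the results were fine in my small-sample tests)
**For the first three stages, the output values on a small sample match those from Python.
**The data flow through the buffers may have problems, and input validation may be incomplete.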
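For readers unfamiliar with the ctc_decode step listed above, here is a minimal Python sketch of greedy (best-path) CTC decoding: take the most probable symbol per frame, merge consecutive repeats, then drop blanks. It is not the code from the Android repository. The function name `ctc_greedy_decode` and the toy probabilities are made up for illustration, and placing the blank at the last index is an assumption.

```python
def ctc_greedy_decode(probs, blank_index):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    probs is a list of frames; each frame is a list of per-symbol scores.
    blank_index is the index of the CTC blank (assumed, not taken from ASRT).
    """
    decoded = []
    previous = None
    for frame in probs:
        # Index of the highest-scoring symbol in this frame
        best = max(range(len(frame)), key=frame.__getitem__)
        # Emit only when the symbol changes and is not the blank
        if best != previous and best != blank_index:
            decoded.append(best)
        previous = best
    return decoded


# Toy example with 3 symbols, where index 2 plays the role of the blank
probs = [
    [0.1, 0.2, 0.7],     # blank
    [0.8, 0.1, 0.1],     # symbol 0
    [0.7, 0.2, 0.1],     # symbol 0 repeated, merged with the previous frame
    [0.1, 0.1, 0.8],     # blank separates the two runs of symbol 0
    [0.9, 0.05, 0.05],   # symbol 0 again, emitted as a new symbol
    [0.1, 0.8, 0.1],     # symbol 1
]
print(ctc_greedy_decode(probs, blank_index=2))  # [0, 0, 1]
```

A beam-search decoder can give different results on ambiguous frames, so agreement on small samples does not guarantee identical output in general.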
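Not yet implemented:
- Converting phonemes to words
- Runtime performance testing
The code for converting an ASRT-trained model to TF-Lite will be added later.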
from asrt_speechrecognition.
Addendum:
The same model does predict the correct output when run through predict_speech_file.py:
import os

from speech_model import ModelSpeech
from model_zoo.speech_model.keras_backend import SpeechModel251BN
from speech_features import Spectrogram
from language_model3 import ModelLanguage

# Hide all GPUs so that inference runs on the CPU
os.environ["CUDA_VISIBLE_DEVICES"] = ""

AUDIO_LENGTH = 1600
AUDIO_FEATURE_LENGTH = 200
CHANNELS = 1
# The default pinyin output size is 1428 (1427 pinyin + 1 blank); this script uses 1431
OUTPUT_SIZE = 1431

sm251bn = SpeechModel251BN(
    input_shape=(AUDIO_LENGTH, AUDIO_FEATURE_LENGTH, CHANNELS),
    output_size=OUTPUT_SIZE
)
feat = Spectrogram()
ms = ModelSpeech(sm251bn, feat, max_label_length=64)

now_path = os.path.abspath(os.getcwd())
ms.load_model(now_path + '/save_models/SpeechModel251bn_cv/' + 'SpeechModel251bn_epoch40.model.base.h5')
res = ms.recognize_speech_from_file('test1.wav')
print('*[Info] Acoustic model speech recognition result:\n', res)
from asrt_speechrecognition.
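Update:
It works once I read the audio file with the original from utils.ops import read_wav_data:
def load_audio(audio_path):
    # Use ASRT's own WAV reader so the samples match the training pipeline
    from utils.ops import read_wav_data
    wav_signal, sample_rate, _, _ = read_wav_data(audio_path)
    return wav_signal, sample_rate
For the spectrogram conversion, I have switched back to the original run method of the Spectrogram class: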
# load audio
from speech_features import Spectrogram
data_pre = Spectrogram()
# Load the audio data directly from the file and convert it to the required format
audio_path = 'test1.wav'  # replace with the path to your audio file
wav_signal, sample_rate = load_audio(audio_path)
# audio preprocessing
# audio_features = data_pre.onnx_run(wavsignal=wav_signal, fs=sample_rate)
audio_features = data_pre.run(wavsignal=wav_signal, fs=sample_rate)
audio_features = adaptive_padding(input_data=audio_features, target_length=1600)
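The snippet above calls `adaptive_padding`, whose definition is not shown in this thread. The following is only a guess at what such a helper might do, under these assumptions: the features arrive as a 2-D array of shape (frames, 200), the model expects a batch of shape (1, 1600, 200, 1), and shorter inputs are zero-padded while longer ones are truncated. The author's real implementation may differ.

```python
import numpy as np


def adaptive_padding(input_data, target_length):
    """Zero-pad or truncate the time axis, then add batch and channel axes.

    Hypothetical stand-in for the helper used in the thread; the layout
    (1, target_length, feature_dim, 1) is an assumption.
    """
    features = np.asarray(input_data, dtype=np.float32)
    frames, feature_dim = features.shape
    padded = np.zeros((1, target_length, feature_dim, 1), dtype=np.float32)
    # Copy at most target_length frames; anything beyond that is dropped
    usable = min(frames, target_length)
    padded[0, :usable, :, 0] = features[:usable]
    return padded


# Toy input: 3 frames of 200 features each
example = np.ones((3, 200), dtype=np.float32)
batch = adaptive_padding(input_data=example, target_length=1600)
print(batch.shape)  # (1, 1600, 200, 1)
```

Truncating silently loses audio, so a real pipeline might instead split long recordings into chunks before padding.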
from asrt_speechrecognition.
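Running the model directly on the phone is not recommended: both compute performance and the installation and configuration of dependencies become fairly complicated. The best approach is to deploy the model on a server and have the phone call it through the API. See the related articles on the AI柠檬 (AI Lemon) blog for details.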
from asrt_speechrecognition.
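If you really must deploy on the phone, that is possible too, but you will need to reimplement the inference yourself in a framework supported by the target platform.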
from asrt_speechrecognition.
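Hello, our server's response time under multi-user load is worse than expected. I later wrote my own multi-process program that uses sockets (TCP + UDP) to handle registration and the transfer of voice packets, but with multiple user groups the response is still worse than expected. (These problems may stem from our hardware configuration, network, and so on.)
So I want to use ONNX and TF-Lite for inference on mobile devices. I have just tested this and can already produce results; I will post the code (Python test code) later. After that I will probably develop the app in Java, and the work should break down as follows:
- Read the audio file
- Convert it to a spectrogram
- TF-Lite model inference
- ONNX model inference
- CTC decoding
I think that if the differences between the data formats in Python and Java can be pinned down, it should run correctly.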
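One way to pin down such format differences is to dump intermediate values from Python in an explicit binary layout and compare them with what the Java side computes. The sketch below writes floats as little-endian float32, which matters because Java's `ByteBuffer` defaults to big-endian byte order. The helper names and the file name are hypothetical.

```python
import os
import struct
import tempfile


def dump_float32_le(values, path):
    """Write a flat sequence of floats as little-endian float32."""
    with open(path, "wb") as f:
        f.write(struct.pack("<%df" % len(values), *values))


def load_float32_le(path):
    """Read back a file written by dump_float32_le."""
    with open(path, "rb") as f:
        raw = f.read()
    return list(struct.unpack("<%df" % (len(raw) // 4), raw))


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    if len(a) != len(b):
        raise ValueError("length mismatch: %d vs %d" % (len(a), len(b)))
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


# Round trip with values that float32 represents exactly
path = os.path.join(tempfile.gettempdir(), "asrt_features_reference.bin")
reference = [0.5, -1.25, 3.0]
dump_float32_le(reference, path)
restored = load_float32_le(path)
print(max_abs_diff(reference, restored))  # 0.0
```

On the Java side, the same file could be read through a `ByteBuffer` with its order set to `ByteOrder.LITTLE_ENDIAN`, and each stage (audio samples, spectrogram, model output) compared against the Python reference within a small tolerance.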
from asrt_speechrecognition.
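A single process has only one computation-graph resource, so slow responses under concurrent multi-user calls are entirely normal. What you need is a multi-instance cluster deployment with load balancing, not merely a change of communication protocol. Deploying AI models is inherently resource-intensive.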
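As a rough illustration of the load-balancing idea, here is a minimal round-robin selector that spreads requests across several server instances. The class name and the backend addresses are hypothetical, and a production deployment would more likely rely on a dedicated load balancer in front of the instances than on application code like this.

```python
import itertools
import threading


class RoundRobinBalancer:
    """Hand out backend addresses in rotation; safe to call from many threads."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)
        self._lock = threading.Lock()

    def next_backend(self):
        # The lock keeps concurrent callers from advancing the iterator at once
        with self._lock:
            return next(self._cycle)


# Hypothetical addresses of three independently running server instances
backends = [
    "http://10.0.0.1:20001",
    "http://10.0.0.2:20001",
    "http://10.0.0.3:20001",
]
balancer = RoundRobinBalancer(backends)
picked = [balancer.next_backend() for _ in range(5)]
print(picked)
```

Each instance holds its own copy of the model, so throughput scales with the number of instances at the cost of proportionally more memory and compute.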
from asrt_speechrecognition.