
ibm / train-custom-speech-model


Create a custom Watson Speech to Text model using specialized domain data

Home Page: https://developer.ibm.com/patterns/customize-and-continuously-train-your-own-watson-speech-service/

License: Apache License 2.0

TypeScript 27.39% HTML 1.31% CSS 4.99% JavaScript 50.75% Python 15.37% sed 0.19%
ibm-cloud watson-services nodejs reactjs ibmcode watson-speech-to-text

train-custom-speech-model's Introduction


Create a custom Watson Speech to Text model using specialized domain data

In this code pattern, we will create a custom speech-to-text model. The Watson Speech to Text service is among the best in the industry. However, like other cloud speech services, it is trained on general conversational speech for general use, so it may not perform well in specialized domains such as medicine, law, or sports. To improve the accuracy of the speech-to-text service, you can leverage transfer learning by training the existing AI model with new data from your domain.

In this example, we will use a medical speech data set to illustrate the process. The data is provided by ezDI and includes 16 hours of medical dictation in both audio and text files.

When the reader has completed this code pattern, they will understand how to:

  • Prepare audio data and transcription text for training a speech-to-text model.
  • Work with the Watson Speech to Text service through API calls.
  • Train a custom speech-to-text model with a data set.
  • Enhance the model with continuous user feedback.

[Architecture diagram]

Flow

  1. The user downloads the custom medical dictation data set from ezDI and prepares the audio and text data for training.
  2. The user interacts with the Watson Speech to Text service via the provided application UI or by executing command line Python scripts.
  3. The user requests the custom data be used to create and train a language and acoustic Watson Speech to Text model.
  4. The user interactively tests the new custom model by submitting audio files and verifying the text transcription returned from the model.
  5. If the text transcription is not correct, the user can make corrections and resubmit the updated data for additional training.
  6. Several users can work on the same custom model at the same time.

Featured technologies

  • Node.js: An open-source JavaScript run-time environment for executing server-side JavaScript code.
  • React: A JavaScript library for building User Interfaces.
  • Watson Speech recognition: Advanced models for processing audio signals and language context to accurately transcribe spoken voice into text.
  • Watson Speech customization: The ability to further train the model to improve accuracy for your specialized domain.
  • AI in medical services: Save time for medical care providers by automating tasks such as entering data into an Electronic Medical Record.

Watch the Video

[Video]

Steps

  1. Clone the repo
  2. Create IBM Cloud services
  3. Configure credentials
  4. Download and prepare the data
  5. Train the models
  6. Transcribe your dictation
  7. Correct the transcription

1. Clone the repo

git clone https://github.com/IBM/Train-Custom-Speech-Model

2. Create IBM Cloud services

Create an instance of the Watson Speech to Text service.

Note: In order to perform customization, you will need to select the Standard paid plan.

3. Configure credentials

From your Watson Speech to Text service instance, select the Service Credentials tab.

If no credentials exist, select the New Credential button to create a new set of credentials.

Save off the apikey and url values as they will be needed in future steps.

4. Download and prepare the data

Download the ezDI Medical Dictation Dataset which is a zip file containing both the audio and text files.

Extract the zip file, moving the Documents and Audio directories into the data directory located at the root of this project.

The structure should look like:

Train-Custom-Speech-Model
  |__ data
      |__ Audio
      |     |__ 1.wav
      |     |__ ...
      |__ Documents
            |__ 1.rtf
            |__ ...

The transcription files stored in the Documents directory are in RTF format and need to be converted to plain text. You can use the convert_rtf.py Python script to convert them all to txt files. Run the following commands from the data directory to create a virtual environment, install dependencies, and run the conversion script. Note: you must have Python 3 installed.

python3 -m venv .venv
source .venv/bin/activate
pip install striprtf
python convert_rtf.py

The data needs careful preparation since our deep learning model will only be as good as the data used in the training. Preparation may include steps such as removing erroneous words in the text, bad audio recordings, etc. These steps are typically very time-consuming when dealing with large datasets.

Although the dataset from ezDI is already curated, a quick scan of the text transcription files will reveal some filler text that would not help the training. These unwanted text strings have been collected in the file data/fixup.sed and can be removed from the text files by using the sed utility.

Also, for the purpose of training, we will need to combine all text files into a single package, called a corpus file.

To remove the unwanted text strings and to combine all of the text files into a single corpus file, perform the following command:

sed -f fixup.sed Documents/*.txt > corpus-1.txt

For the audio files, we can archive them as zip or tar files. Since the Watson Speech to Text API has a limit of 100 MB per archive file, we will need to split the audio files into 3 zip files. We will also set aside the first 5 audio files for testing.

zip audio-set1.zip -xi Audio/[6-9].wav Audio/[1-7][0-9].wav
zip audio-set2.zip -xi Audio/[8-9][0-9].wav Audio/1[0-6][0-9].wav
zip audio-set3.zip -xi Audio/1[7-9][0-9].wav Audio/2[0-4][0-9].wav
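
If you prefer not to hand-pick the wildcard ranges, the same split can be scripted. The following is a minimal sketch (not part of the repo) that groups the training wav files into archives under the 100 MB limit; it assumes the files are named 1.wav, 2.wav, ... as shown in the directory layout above, and holds out the first 5 files for testing.

import glob
import os
import zipfile

MAX_BYTES = 100 * 1024 * 1024  # Watson Speech to Text per-archive limit

# Sort numerically (1.wav, 2.wav, ...) and hold out the first 5 files for testing.
wav_files = sorted(glob.glob("Audio/*.wav"),
                   key=lambda p: int(os.path.splitext(os.path.basename(p))[0]))
training_files = wav_files[5:]

batch, batch_bytes, index = [], 0, 1
for path in training_files:
    size = os.path.getsize(path)
    # Start a new archive when adding this file would exceed the limit.
    if batch and batch_bytes + size > MAX_BYTES:
        with zipfile.ZipFile(f"audio-set{index}.zip", "w") as zf:
            for f in batch:
                zf.write(f)
        batch, batch_bytes, index = [], 0, index + 1
    batch.append(path)
    batch_bytes += size

if batch:
    with zipfile.ZipFile(f"audio-set{index}.zip", "w") as zf:
        for f in batch:
            zf.write(f)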

5. Train the models

To train the language and acoustic models, you can either run the application or use the command line interface. You can also mix the two as desired, since both work with the same data files and services.

a. Run the application

The application is a Node.js web service running locally with a GUI implemented in React.

To allow the web service to connect to your Watson Speech to Text service, create a file named services.json in the root directory by copying the sample file services.sample.json. Update the apikey and url fields in the newly created file with the values you retrieved in Step 3.

{
  "services": {
    "code-pattern-custom-language-model": [
      {
        "credentials": {
          "apikey": "<your api key>",
          "url": "<your api url>"
        },
        "label": "speech_to_text",
        "name": "code-pattern-custom-language-model"
      }
    ]
  }
}

The application requires a local login. The local user accounts are defined in the file model/user.json. The pre-defined user/password pairs are user1/user1 and user2/user2. The langModel and acousticModel fields are the names of your custom language and acoustic models, which will be created upon logging in if they do not already exist. You can change the baseModel field if the base model you are working with is different from our default. Here is an example of user3 using Korean as the base language for transcribing. See Supported language models.

{
	"user3": {
		"password": "user3",
		"langModel": "custom-korean-language",
		"acousticModel": "custom-korean-acoustic",
		"baseModel": "ko-KR_NarrowbandModel"
	}
}

Install and start the application by running the following commands in the root directory:

npm install
npm run dev

The local Node.js web server will automatically open your browser to http://localhost:3000.

[Screenshot: main page]

Before training the model, you must add the corpus and audio files. The files can be uploaded using the panels displayed in the Corpora and Audio tabs of the application UI.

Then select the Train tab to show the training options. Train both the Language Model and Acoustic Model.

[Screenshot: training status]

Note: Training the acoustic model can potentially take hours to complete.

b. Use the Command Line interface

If you prefer to use the command line, set the following environment variables. Update the <your-iam-api-key> and <your-url> values with the values retrieved in Step 3.

export USERNAME=apikey
export PASSWORD=<your-iam-api-key>
export STT_ENDPOINT=<your-url>

To keep all of the generated data files in the proper directory, set the current directory to data before executing any of the following commands:

cd data

Note: For a more detailed description of the available commands, see the README located in the cmd directory.

Install dependencies

The Python scripts use the package requests. If you don't have it already, install it with:

pip install requests
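
All of the cmd scripts talk to the Watson Speech to Text REST API using these credentials. As a point of reference, here is a minimal sketch (not one of the repo's scripts) that lists your custom language models directly, assuming the USERNAME, PASSWORD, and STT_ENDPOINT variables set above:

import os

import requests

username = os.environ["USERNAME"]      # the literal string "apikey" for IAM credentials
password = os.environ["PASSWORD"]      # your IAM API key
endpoint = os.environ["STT_ENDPOINT"]  # the url value from Step 3

response = requests.get(f"{endpoint}/v1/customizations",
                        auth=(username, password))
response.raise_for_status()
for model in response.json().get("customizations", []):
    print(model["customization_id"], model["name"], model["status"])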

Train the language model

To create your custom language model and your corpus of medical dictation, run:

python ../cmd/create_language_model.py "custom-model-1"

Note that we are naming our language model "custom-model-1", just to be consistent with the default name that will be used by the application if logged in as user1.

This script will return the ID of your custom model. Use it to set the following environment variable:

export LANGUAGE_ID=<id_for_your_model>

Note: You can also obtain the ID by using the following command:

python ../cmd/list_language_model.py

The custom model will stay in the "pending" state until a corpus of text is added. Add the medical transcription file we created in an earlier step.

python ../cmd/add_corpus.py corpus-1.txt
python ../cmd/list_corpus.py

This step will also save a new list of Out-Of-Vocabulary words to a file (the file will be created in the current directory and will end in OOVs.corpus). Out-Of-Vocabulary words are words that are not part of the base Watson Speech to Text service, but will be added and used to train the language model. It may be useful to check the words in the file to see if there are any unexpected words that you don't want to train the model with.
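
If you want to inspect those Out-Of-Vocabulary words programmatically rather than reading the OOVs.corpus file, a short sketch like the following should work (it queries the words endpoint for words that came from corpora; the endpoint and word_type parameter come from the Watson Speech to Text API, not from this repo's scripts):

import os

import requests

apikey = os.environ["PASSWORD"]
endpoint = os.environ["STT_ENDPOINT"]
language_id = os.environ["LANGUAGE_ID"]

# List only the words that were extracted from the uploaded corpora.
response = requests.get(f"{endpoint}/v1/customizations/{language_id}/words",
                        params={"word_type": "corpora"},
                        auth=("apikey", apikey))
response.raise_for_status()
for word in response.json().get("words", []):
    print(word["word"])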

The status of the custom language model should now be set to "ready". Now we can train the language model using the medical transcription.

python ../cmd/train_language_model.py

Training is asynchronous and may take some time depending on the system workload. You can check for completion with cmd/list_language_model.py. When training is complete, the status will change from "training" to "available".
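
If you prefer to script the wait instead of re-running list_language_model.py by hand, a simple polling loop along these lines should do (assuming the same credentials and LANGUAGE_ID as above):

import os
import time

import requests

apikey = os.environ["PASSWORD"]
endpoint = os.environ["STT_ENDPOINT"]
language_id = os.environ["LANGUAGE_ID"]

while True:
    response = requests.get(f"{endpoint}/v1/customizations/{language_id}",
                            auth=("apikey", apikey))
    response.raise_for_status()
    status = response.json()["status"]
    print("language model status:", status)
    if status != "training":
        break          # "available" means training finished; "failed" means it did not
    time.sleep(30)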

Train the acoustic model

Create the custom acoustic model based on the custom language model.

Note: Since the audio files are sampled at an 8 kHz rate, we will need to create a narrowband model, which is coded in the create_acoustic_model.py Python script.

python ../cmd/create_acoustic_model.py "acoustic-model-1"

Note that we are naming our acoustic model "acoustic-model-1", just to be consistent with the default name that will be used by the application if logged in as user1.

This script will return the ID of your custom acoustic model. Use it to set the following environment variable:

export ACOUSTIC_ID=<id_for_your_model>

The custom acoustic model will be in the "pending" state until some audio data is added. Add the 3 zip files containing the audio clips with the following commands:

python ../cmd/add_audio.py audio-set1.zip
python ../cmd/add_audio.py audio-set2.zip
python ../cmd/add_audio.py audio-set3.zip
python ../cmd/list_audio.py

Note: It may take some time to process each audio file. If processing has not completed yet, the command will return a 409 error message; in this case, simply retry later.
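
If you want to automate the retries, the sketch below uploads an archive directly against the REST API and retries while the service returns 409. The endpoint and the Contained-Content-Type header come from the Watson Speech to Text API documentation rather than from this repo's scripts:

import os
import time

import requests

apikey = os.environ["PASSWORD"]
endpoint = os.environ["STT_ENDPOINT"]
acoustic_id = os.environ["ACOUSTIC_ID"]

def add_audio_archive(zip_path, retries=10, wait=60):
    name = os.path.splitext(os.path.basename(zip_path))[0]
    url = f"{endpoint}/v1/acoustic_customizations/{acoustic_id}/audio/{name}"
    headers = {"Content-Type": "application/zip",
               "Contained-Content-Type": "audio/wav"}
    for _ in range(retries):
        with open(zip_path, "rb") as f:
            response = requests.post(url, auth=("apikey", apikey),
                                     headers=headers, data=f)
        if response.status_code != 409:   # 409 means a previous upload is still processing
            response.raise_for_status()
            return
        time.sleep(wait)
    raise RuntimeError(f"Service still busy after {retries} attempts: {zip_path}")

for archive in ["audio-set1.zip", "audio-set2.zip", "audio-set3.zip"]:
    add_audio_archive(archive)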

When the status of the custom acoustic model is set to "ready", you can start the training by running:

python ../cmd/train_acoustic_model.py

Training the acoustic model is asynchronous and can potentially take hours to complete. To determine when training is completed, you can query the model and check if the status has changed from "training" to "available".

python ../cmd/list_acoustic_model.py

6. Transcribe your dictation

To try out the model, either create your own recorded medical dictation in wav format (use an 8 kHz sampling rate), or use one of the first 5 test wav files located in data/Audio (remember, we left those out of the data set used to train the model).

If running the application, click on the Transcribe tab and then browse to your wav file. You can select any combination of base or custom models for language and acoustic. Using the custom models for both should give the best result.

If using the command line, enter the following:

python ../cmd/transcribe.py <my_dictation.wav>

As with the application, you can set or unset the environment variables LANGUAGE_ID and ACOUSTIC_ID to select any combination of base or custom models for language and acoustic. If the corresponding variable is unset, the base model will be used. The transcription will be displayed on the terminal and also written to a file with the same name as the audio file but with the file extension .transcript.
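
Under the hood, transcription is a single request to the recognize endpoint. The sketch below is a rough stand-in for transcribe.py (not the script itself) showing how the custom model IDs are passed as query parameters; my_dictation.wav and the en-US_NarrowbandModel base model are assumptions based on the audio used in this pattern:

import os

import requests

apikey = os.environ["PASSWORD"]
endpoint = os.environ["STT_ENDPOINT"]

# The custom models in this pattern are built on the narrowband base model.
params = {"model": "en-US_NarrowbandModel"}
if os.environ.get("LANGUAGE_ID"):
    params["language_customization_id"] = os.environ["LANGUAGE_ID"]
if os.environ.get("ACOUSTIC_ID"):
    params["acoustic_customization_id"] = os.environ["ACOUSTIC_ID"]

with open("my_dictation.wav", "rb") as f:
    response = requests.post(f"{endpoint}/v1/recognize",
                             params=params,
                             headers={"Content-Type": "audio/wav"},
                             auth=("apikey", apikey),
                             data=f)
response.raise_for_status()
for result in response.json().get("results", []):
    print(result["alternatives"][0]["transcript"])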

7. Correct the transcription

If you detect errors in the transcribed text, you can re-train the models by submitting corrected transcriptions.

If using the application, correct the transcribed text from the Transcribe panel.

[Screenshot: re-train panel]

If the audio file being transcribed is not already included in the acoustic model, check the Add audio file to acoustic model checkbox.

Enter a corpus name, and hit Submit.

The language and acoustic models will be re-trained with the new files.

If using the command line, you can directly edit the transcription output file generated in the previous step. You can then add the corrected text as a new corpus, and add the audio file as a new audio source.

Note: If correcting multiple transcriptions, it will be more efficient to aggregate the corrected text files and audio clips before re-training the models. (See Step #4 for examples on how to aggregate the files, and Step #5 for how to re-train the models using the command line)

Sample output

  • The main GUI screen:

[Screenshot: main GUI screen]

  • Status for training of the models:

[Screenshot: training status]

  • List of "Out of Vocabulary" words determined during training:

[Screenshot: custom words list]

Note: These are the words that are not a part of the base Watson Speech to Text service, but will be added to the language model.

Troubleshooting

  • Error: Please set your username in the environment variable USERNAME. If you use IAM service credentials, set USERNAME to the string "apikey" and set PASSWORD to the value of your IAM API key.

    If you choose to use the command line, make sure you set up your environment variables.

  • 409 error message.

    This indicates the service is busy. Try the command again later.

  • Error uploading the audio files:

    Since the audio files are large (70-90 MB), you may encounter errors when uploading them over an unstable network connection. In this case, you can break the files up into smaller archives and upload those. The training for the acoustic model will work the same way.

    For example, the command to zip the first audio file as described above:

    zip audio-set1.zip -xi Audio/[6-9].wav Audio/[1-7][0-9].wav

    To break up into two smaller files, adjust the file name patterns as appropriate:

    zip audio-set1a.zip -xi Audio/[6-9].wav Audio/[1-3][0-9].wav
    zip audio-set1b.zip -xi Audio/[4-7][0-9].wav

Deploy on IBM Cloud

Instructions for deploying the web application on Cloud Foundry can be found here.

Learn more

  • Artificial Intelligence Code Patterns: Enjoyed this Code Pattern? Check out our other AI Code Patterns
  • AI and Data Code Pattern Playlist: Bookmark our playlist with all of our Code Pattern videos
  • With Watson: Want to take your Watson app to the next level? Looking to utilize Watson Brand assets? Join the With Watson program to leverage exclusive brand, marketing, and tech resources to amplify and accelerate your Watson embedded commercial solution.

License

This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.

Apache Software License (ASL) FAQ

train-custom-speech-model's People

Contributors

dnordfors, imgbot[bot], kant, pvaneck, rhagarty, scottdangelo, stevemart, tedhtchang, tonanhngo, yhwang


train-custom-speech-model's Issues

Error with audio training, Unhandled Rejection (TypeError): data.data is undefined

Audio training threw this error 76 times:

Unhandled Rejection (TypeError): data.data is undefined
_callee6$/</<
C:/Users/x/Desktop/Train-Custom-Speech-Model-master/client/src/pages/Train.js:180

177 | .then((response) => {
178 | response.json().then((data) => {
179 | this.setState({ acousticModelData: data.data });

180 | let isNotActive = this.checkModelStatusDone(data.data.status);
| ^ 181 | // If polling and if the model is no longer in an active state, stop
182 | // polling.
183 | if (isNotActive && poll) {

Compiled version:

_callee6$/</<
http://localhost:3000/static/js/main.chunk.js:3886:37

3883 | acousticModelData: data.data
3884 | });
3885 |

3886 | var isNotActive = _this.checkModelStatusDone(data.data.status); // If polling and if the model is no longer in an active state, stop
| ^ 3887 | // polling.
3888 |
3889 |

Language Model Error Error initializing the training: failed to train the model

I successfully trained the sample corpus and audio.
I then deleted those files.
I uploaded my own set of text files and mp3 files successfully.
However, when I try to train either the corpus or audio model (which both say ready), it says:
Language Model Error
Error initializing the training: failed to train the model

Does the system first need to be cleared in some way, since I deleted the old sample text and audio files? How do we run this on a new set of data? Thank you.

create language model fails

cmd/create_language_model.py fails with the following message:

Please set your username in the environment variable USERNAME

This is due to the apikey needing to be used consistently. services.json doesn't use USERNAME and PASSWORD, so all of the Python scripts need to be modified to accept apikey.

Custom Speech Recognition Model

How can I build a speech recognition model in Python using the ezDI dataset without IBM STT?
My aim is to build a speech recognition model that can recognize medical terms more efficiently while also handling the normal day-to-day conversational language model.
How can I build it from scratch in Python?

Python code only works if service location is Dallas

A couple of suggestions here:

  • Explicitly mention in the README's instructions that the service location is required to be Dallas
  • Provide a service location to uri mapping (would likely unknowingly become outdated)

From the README, step number 5:

To train the language and acoustic models, you can either run the application or use the command line interface. Or you can mix as desired, since both are working with the same data files and services.

The above is simply not accurate if the service location was set to something other than Dallas. I suggest some thought be put into that statement to more accurately reflect the fact that the location has been hardcoded in the Python source.

Add error alerts to GUI when API call fails.

Only the transcribe page has the dismissible alert which shows the user if an error has occurred. For a better user experience, we need to have these everywhere.

The main ones to cover are:

  • Transcribe Submission
  • Upload Corpus
  • List Corpus
  • Upload Audio
  • List Audio
  • List Words
  • Train Language Model
  • Train Acoustic Model

I can't get this installed

I can't install it, I get a lot of errors:

Screenshot from 2023-12-25 13-29-22

Please tell me which version of node.js this works with and if it works at all?

Login fails when not from localhost:3000

When connecting at localhost:3000, everything is fine. But when using the machine's IP address:3000, even from the same machine (and also using 127.0.0.1:3000), login doesn't work.
It seems to be a problem with CORS. Message displayed by Safari:

Could not authenticate: Origin http://192.168.1.14:3000 is not allowed by Access-Control-Allow-Origin.

Message in console:

[0] [2019-11-07T07:32:21.653Z]  WARN: express/21655 on fpsmacbook.home: ::1 <-- GET /api/user HTTP/1.1 401 49 http://192.168.1.14:3000/ Safari 13.0 Mac OS X 10.15.1 2.639276 ms (req_id=52f4d1ad-0fe9-4299-a05c-4ad5d5860ab1, remote-address=::1, ip=::1, method=GET, url=/api/user, referer=http://192.168.1.14:3000/, body={}, short-body={}, http-version=1.1, response-time=2.639276, status-code=401, incoming=<--)

(Using the latest macOS release)

Don't know whether this is related, but I have several errors displayed in the console when starting:

> [email protected] dev /Users/fps/_fps/GitRepositories/Train-Custom-Speech-Model
> npm run build && concurrently --kill-others-on-fail "npm run start" "npm run client"


> [email protected] build /Users/fps/_fps/GitRepositories/Train-Custom-Speech-Model
> npm run build-client && tsc


> [email protected] build-client /Users/fps/_fps/GitRepositories/Train-Custom-Speech-Model
> cd client && npm install


> [email protected] install /Users/fps/_fps/GitRepositories/Train-Custom-Speech-Model/client/node_modules/fsevents
> node install

node-pre-gyp WARN Tried to download(404): https://fsevents-binaries.s3-us-west-2.amazonaws.com/v1.2.4/fse-v1.2.4-node-v72-darwin-x64.tar.gz 
node-pre-gyp WARN Pre-built binaries not found for [email protected] and [email protected] (node-v72 ABI, unknown) (falling back to source compile with node-gyp) 
No receipt for 'com.apple.pkg.CLTools_Executables' found at '/'.

No receipt for 'com.apple.pkg.DeveloperToolsCLILeo' found at '/'.

No receipt for 'com.apple.pkg.DeveloperToolsCLI' found at '/'.

gyp: No Xcode or CLT version detected!
gyp ERR! configure error 
gyp ERR! stack Error: `gyp` failed with exit code: 1
gyp ERR! stack     at ChildProcess.onCpExit (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/configure.js:351:16)
gyp ERR! stack     at ChildProcess.emit (events.js:210:5)
gyp ERR! stack     at Process.ChildProcess._handle.onexit (internal/child_process.js:272:12)
gyp ERR! System Darwin 19.0.0
gyp ERR! command "/usr/local/bin/node" "/usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "configure" "--fallback-to-build" "--module=/Users/fps/_fps/GitRepositories/Train-Custom-Speech-Model/client/node_modules/fsevents/lib/binding/Release/node-v72-darwin-x64/fse.node" "--module_name=fse" "--module_path=/Users/fps/_fps/GitRepositories/Train-Custom-Speech-Model/client/node_modules/fsevents/lib/binding/Release/node-v72-darwin-x64" "--napi_version=5" "--node_abi_napi=napi"
gyp ERR! cwd /Users/fps/_fps/GitRepositories/Train-Custom-Speech-Model/client/node_modules/fsevents
gyp ERR! node -v v12.13.0
gyp ERR! node-gyp -v v5.0.5
gyp ERR! not ok 
node-pre-gyp ERR! build error 
node-pre-gyp ERR! stack Error: Failed to execute '/usr/local/bin/node /usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js configure --fallback-to-build --module=/Users/fps/_fps/GitRepositories/Train-Custom-Speech-Model/client/node_modules/fsevents/lib/binding/Release/node-v72-darwin-x64/fse.node --module_name=fse --module_path=/Users/fps/_fps/GitRepositories/Train-Custom-Speech-Model/client/node_modules/fsevents/lib/binding/Release/node-v72-darwin-x64 --napi_version=5 --node_abi_napi=napi' (1)
node-pre-gyp ERR! stack     at ChildProcess.<anonymous> (/Users/fps/_fps/GitRepositories/Train-Custom-Speech-Model/client/node_modules/fsevents/node_modules/node-pre-gyp/lib/util/compile.js:83:29)
node-pre-gyp ERR! stack     at ChildProcess.emit (events.js:210:5)
node-pre-gyp ERR! stack     at maybeClose (internal/child_process.js:1021:16)
node-pre-gyp ERR! stack     at Process.ChildProcess._handle.onexit (internal/child_process.js:283:5)
node-pre-gyp ERR! System Darwin 19.0.0
node-pre-gyp ERR! command "/usr/local/bin/node" "/Users/fps/_fps/GitRepositories/Train-Custom-Speech-Model/client/node_modules/fsevents/node_modules/node-pre-gyp/bin/node-pre-gyp" "install" "--fallback-to-build"
node-pre-gyp ERR! cwd /Users/fps/_fps/GitRepositories/Train-Custom-Speech-Model/client/node_modules/fsevents
node-pre-gyp ERR! node -v v12.13.0
node-pre-gyp ERR! node-pre-gyp -v v0.10.0
node-pre-gyp ERR! not ok 
Failed to execute '/usr/local/bin/node /usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js configure --fallback-to-build --module=/Users/fps/_fps/GitRepositories/Train-Custom-Speech-Model/client/node_modules/fsevents/lib/binding/Release/node-v72-darwin-x64/fse.node --module_name=fse --module_path=/Users/fps/_fps/GitRepositories/Train-Custom-Speech-Model/client/node_modules/fsevents/lib/binding/Release/node-v72-darwin-x64 --napi_version=5 --node_abi_napi=napi' (1)
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: [email protected] (node_modules/fsevents):
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: [email protected] install: `node install`
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: Exit status 1

audited 37839 packages in 12.813s
found 69 vulnerabilities (63 low, 6 high)
  run `npm audit fix` to fix them, or `npm audit` for details
[0] 
[0] > [email protected] start /Users/fps/_fps/GitRepositories/Train-Custom-Speech-Model
[0] > npm run serve
[0] 
[1] 
[1] > [email protected] client /Users/fps/_fps/GitRepositories/Train-Custom-Speech-Model
[1] > cd client && npm run start
[1] 
[1] 
[1] > [email protected] start /Users/fps/_fps/GitRepositories/Train-Custom-Speech-Model/client
[1] > react-scripts start
[1] 
[0] 
[0] > [email protected] serve /Users/fps/_fps/GitRepositories/Train-Custom-Speech-Model
[0] > node dist/server.js | bunyan
[0] 
[0]   App is running at http://localhost:5000       in development mode
[0]   Press CTRL-C to stop
[1] Starting the development server...
[1] 
[1] Compiled successfully!
[1] 
[1] You can now view speech-app in the browser.
[1] 
[1]   Local:            http://localhost:3000/
[1]   On Your Network:  http://192.168.1.14:3000/
[1] 
[1] Note that the development build is not optimized.
[1] To create a production build, use npm run build.

readme issues

Step 1. Create Watson Service

  • the description doesn't match the contents of the latest services.sample.json file. USERNAME and PASSWORD are not found in the file.

Step 2. Set up your code

  • the newly created file "services.json" needs to be in the root, not 'client'.
  • need to do 'npm install' at the root level also.
  • I did not need to launch the browser, it came up automatically

How to use the custom trained model in the IBM cloud as my default STT model?

I followed the steps provided in read me and successfully trained the model. Now, I want to use this newly trained model as my default model in my IBM cloud instance and use it as a Speech to Text service. How can I do that? This information is not available in the Readme.

Is the local web app directly training the base model in the cloud?

The reason I want to do this is: the base model is not able to identify some of my domain-specific words.

Custom words and grammars

Do you have any plans to support training custom words and grammars in this repo?

Firstly, I didn't know that just uploading the corpora would generate the custom words automatically.

But when I read the API reference, we can train custom words by adding "sounds_like" and "display_as" fields, as well as grammars, so it will be very helpful when those training features are supported in this repo.

This repo helped me a lot to understand STT faster and better.

Much appreciated :)

npm run dev fails because of error TS2339

I followed the tutorial to run the application but it fails when I run npm run dev. I am a total newbie with NodeJs, so I'm a bit afraid to fix errors. I tried to run the application both on Linux and MacOS with node version 8 (like the configuration in the CI) and the latest LTS node version (14).

This is the error output
$ npm run dev

[email protected] dev /Users/me/Documents/deepdrone/Train-Custom-Speech-Model
npm run build && concurrently --kill-others-on-fail "npm run start" "npm run client"

[email protected] build /Users/me/Documents/deepdrone/Train-Custom-Speech-Model
npm run build-client && tsc

[email protected] build-client /Users/me/Documents/deepdrone/Train-Custom-Speech-Model
cd client && npm install

audited 2098 packages in 10.646s

43 packages are looking for funding
run npm fund for details

found 92 vulnerabilities (82 low, 2 moderate, 8 high)
run npm audit fix to fix them, or npm audit for details
server/controllers/user.ts:42:36 - error TS2339: Property 'returnTo' does not exist on type 'Session & Partial'.

42 const returnTo = req.session.returnTo || '/';
~~~~~~~~

server/controllers/user.ts:43:26 - error TS2339: Property 'returnTo' does not exist on type 'Session & Partial'.

43 delete req.session.returnTo;
~~~~~~~~

Found 2 errors.

npm ERR! code ELIFECYCLE
npm ERR! errno 2
npm ERR! [email protected] build: npm run build-client && tsc
npm ERR! Exit status 2
npm ERR!
npm ERR! Failed at the [email protected] build script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR! /Users/me/.npm/_logs/2020-11-27T12_39_35_026Z-debug.log
npm ERR! code ELIFECYCLE
npm ERR! errno 2
npm ERR! [email protected] dev: npm run build && concurrently --kill-others-on-fail "npm run start" "npm run client"
npm ERR! Exit status 2
npm ERR!
npm ERR! Failed at the [email protected] dev script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR! /Users/me/.npm/_logs/2020-11-27T12_39_35_054Z-debug.log

The error is about TypeScript. I wonder how the build in the CI can succeed when it fails on my computer.

$ nvm --version
0.37.1
$ npm --version
6.14.19
$ node --version
v8.17.0

Has anyone experienced the same issue? And is there a fix to run the application and continue with the tutorial?

Notify user about minimum audio length

The UI doesn't tell the user what went wrong when attempting to train a model without the minimum of 10 minutes of audio. The error message is generic. The error message can be found in the server logs, but it would be helpful to tell the user about this.

We can either put a disclaimer about the minimum length requirement or propagate this error message from Watson STT.

Audio preprocessing

Hey, what are the possible audio pre-processing steps that can be used to improve transcript quality? Is there any library in Python for denoising or audio enhancement without using deep learning (as it is taking a lot of time for a small audio clip)?

Deleting and adding a 2nd corpus

While playing with the app, I noted a few things.

1. I added a new corpus to the language model, then did the training, and all worked fine.
2. Next I added a second corpus using the same sound file and updated a few other things. Then when I do the training again and use the base model, I only see my first corpus taking effect.
3. Next I deleted all corpora, yet I am still able to run the custom language model and get the first corpus's output. There seems to be a bug with adding a 2nd corpus and removing corpora.

4. Lastly, when all corpora are deleted, I see the pending status for my model training.

[Screenshot]

Transcribe file > 15mb

The transcribe section won't accept a file larger than 15 MB. Do you have instructions for utilizing the trained models on an mp3 in a bucket or somewhere? My goal is to run this on a 1-hour audio file that is larger than 15 MB. Currently unable to do this in the GUI. Thanks.

GUI fails when training acoustic model

status in GUI reads: "training", but see the following errors in the console:

00/train Chrome 71.0 Mac OS X 10.14.0 538.350548 ms (req_id=01eeef73-f715-4bc3-b007-c44a34baec8f, remote-address=::ffff:127.0.0.1, ip=::ffff:127.0.0.1, method=GET, url=/api/acoustic-model, referer=http://localhost:3000/train, body={}, short-body={}, http-version=1.1, response-time=538.350548, status-code=200, incoming=<--)
[0] [2019-02-26T22:26:16.412Z]  INFO: logger/43680 on Richs-MBP-15.attlocal.net: ::ffff:127.0.0.1 <-- GET /api/acoustic-model HTTP/1.1 200 369 http://localhost:3000/train Chrome 71.0 Mac OS X 10.14.0 566.323107 ms (req_id=7cbc3d9e-4b51-441b-bf93-6dbb7cecbe30, remote-address=::ffff:127.0.0.1, ip=::ffff:127.0.0.1, method=GET, url=/api/acoustic-model, referer=http://localhost:3000/train, body={}, short-body={}, http-version=1.1, response-time=566.323107, status-code=200, incoming=<--)
[0] (node:43680) UnhandledPromiseRejectionWarning: RangeError [ERR_HTTP_INVALID_STATUS_CODE]: Invalid status code: ETIMEDOUT
[0]     at ServerResponse.writeHead (_http_server.js:208:11)
[0]     at ServerResponse.writeHead (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/on-headers/index.js:55:19)
[0]     at ServerResponse.writeHead (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/on-headers/index.js:55:19)
[0]     at ServerResponse._implicitHeader (_http_server.js:199:8)
[0]     at ServerResponse.write (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/compression/index.js:84:14)
[0]     at writetop (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/express-session/index.js:290:26)
[0]     at ServerResponse.end (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/express-session/index.js:338:16)
[0]     at ServerResponse.send (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/express/lib/response.js:221:10)
[0]     at ServerResponse.json (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/express/lib/response.js:267:15)
[0]     at /Users/rhagarty/journeys/Train-Custom-Speech-Model/dist/controllers/api.js:123:70
[0] (node:43680) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
[0] (node:43680) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
[0] _http_server.js:208
[0]     throw new ERR_HTTP_INVALID_STATUS_CODE(originalStatusCode);
[0]     ^
[0]
[0] RangeError [ERR_HTTP_INVALID_STATUS_CODE]: Invalid status code: ETIMEDOUT
[0]     at ServerResponse.writeHead (_http_server.js:208:11)
[0]     at ServerResponse.writeHead (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/on-headers/index.js:55:19)
[0]     at ServerResponse.writeHead (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/on-headers/index.js:55:19)
[0]     at ServerResponse._implicitHeader (_http_server.js:199:8)
[0]     at ServerResponse.end (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/compression/index.js:103:14)
[0]     at writeend (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/express-session/index.js:261:22)
[0]     at Immediate.onsave (/Users/rhagarty/journeys/Train-Custom-Speech-Model/node_modules/express-session/index.js:335:11)
[0]     at processImmediate (timers.js:632:19)
[1] Proxy error: Could not proxy request /api/acoustic-model from localhost:3000 to http://localhost:5000/.
[1] See https://nodejs.org/api/errors.html#errors_common_system_errors for more information (ECONNRESET).

If I run list_acoustic_model.py, I get the following:

Getting custom acoustic models...
Get acoustice models returns:  200
{"customizations": [
   {
      "owner": "4493493e-2a2a-4eb1-8845-a915035a4142",
      "base_model_name": "en-US_NarrowbandModel",
      "customization_id": "43bc9966-fe48-4f9b-8d88-405091e4c5a9",
      "versions": ["en-US_NarrowbandModel.v2018-07-31"],
      "created": "2019-02-25T23:09:02.767Z",
      "name": "Acoustic model 1",
      "description": "My narrowband acoustic model",
      "progress": 0,
      "language": "en-US",
      "status": "ready"
   },
   {
      "owner": "4493493e-2a2a-4eb1-8845-a915035a4142",
      "base_model_name": "en-US_NarrowbandModel",
      "customization_id": "edb9a7ce-082c-42b3-8289-a8f4fa5eeffb",
      "versions": ["en-US_NarrowbandModel.v2018-07-31"],
      "created": "2019-02-25T23:37:32.070Z",
      "name": "Acoustic-model-1",
      "description": "Custom acoustic model for user1",
      "progress": 0,
      "language": "en-US",
      "status": "training"
   }
]}

How to connect from behind an enterprise's firewall? (proxy settings)

I didn't find how to connect from my company's intranet (we connect to the internet through a proxy with a username and password). I am able to use the app with a direct connection to the internet. I am also able to run speech-to-text from behind my company's firewall using curl, or using the Python commands.

I tried setting the usual env variables (HTTP_PROXY etc). I also tried to set npm config with:

npm config set proxy ...
npm config set https-proxy ...

without success. Maybe I missed something in the documentation?
Or maybe it is related to the errors I see displayed in the console when starting? (see my previous message)

Thanks,

fps
