chiachunfu / speech


TensorFlow on mobile with speech-to-text DL models.

Jupyter Notebook 1.49% Python 2.47% Java 64.50% CMake 0.21% C++ 30.90% C 0.43%

speech's Issues

Can't Find the Training File

How did you train your model? Could you please provide the training file as well?

In the Data_Process.ipynb file, when calculating m, v and s, you used these MFCC parameters:

audio = mfcc(read_audio_from_filename(file, 16000), samplerate=16000, winlen=0.025,
             winstep=0.01, numcep=39, nfilt=40)

But when I run the other cells on my data:

    inputs = convert_wav_mfcc(wav_path, 16000)
    normalize_inputs = (inputs - m)/s

This throws an exception saying the shapes don't match, so I changed the function convert_wav_mfcc to use the same parameters:

samplerate = 16000
winlen = 0.025
winstep = 0.01
numcep = 39
nfilt = 40

def convert_wav_mfcc(file, fs=16000):
    """Turn raw audio data into MFCC with sample rate=fs."""
    inputs = mfcc(read_audio_from_filename(file, fs), samplerate=fs, winlen=winlen, winstep=winstep, numcep=numcep, nfilt=nfilt)
    return inputs

Now everything works fine.
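The fix works because the statistics and the features must come from the same MFCC settings. Assuming m and s hold one mean and one standard deviation per MFCC coefficient (computed with numcep=39), every frame passed to (inputs - m)/s must also have 39 coefficients. Since the app side of this repository is Java, here is a minimal Java sketch of that per-coefficient normalization with an explicit shape check; the class and method names are hypothetical and this is not the notebook's code.

```java
import java.util.Arrays;

class MfccNormalize {
    /**
     * Normalizes each MFCC frame per coefficient: (x - mean) / std.
     * frames has shape [numFrames][numCoeffs]; mean and std have length numCoeffs.
     * Assumption: mean/std were computed with the same MFCC settings (e.g. numcep=39).
     */
    static double[][] normalize(double[][] frames, double[] mean, double[] std) {
        if (mean.length != std.length) {
            throw new IllegalArgumentException("mean and std lengths differ");
        }
        double[][] out = new double[frames.length][];
        for (int t = 0; t < frames.length; t++) {
            if (frames[t].length != mean.length) {
                // Same kind of failure as in the notebook: features built with another numcep
                throw new IllegalArgumentException(
                        "frame has " + frames[t].length + " coefficients, expected " + mean.length);
            }
            out[t] = new double[mean.length];
            for (int c = 0; c < mean.length; c++) {
                out[t][c] = (frames[t][c] - mean[c]) / std[c];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] frames = {{1.0, 4.0}, {3.0, 8.0}};
        double[] mean = {2.0, 6.0};
        double[] std = {1.0, 2.0};
        // prints [[-1.0, -1.0], [1.0, 1.0]]
        System.out.println(Arrays.deepToString(normalize(frames, mean, std)));
    }
}
```

Failing loudly on a coefficient-count mismatch surfaces the same problem as the notebook's shape error, but with a message that names the expected count.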

Can't increase the number of layers

I am trying to increase the number of layers in lstm_ctc.py, but I get an error about the dimensions of the cell (Dimensions must be equal, but are 1024 and 551).
Note that I used different training data than VCTK.

license on librosa (MFCC JAVA)

Hi, I have the same question as #7 ("Is the java adapted librosa library free for commercial use?").
We just want to apply the librosa feature extraction part (MFCC) on Android devices, but could not find any license confirmation.
We know we can use the original librosa commercially, but is it possible to use your Java version of the code commercially?

All outputScores array values are 0, and result is ""

Hello. I modified the code to recognize a 16 kHz .wav file. The code that converts the audio file to a byte array is as follows:

	String filePath = "G:\\Corpus\\North_Wind_and_the_Sun_passage\\M1\\nws_1.wav";
	ByteArrayOutputStream out = new ByteArrayOutputStream();
	BufferedInputStream in = new BufferedInputStream(new FileInputStream(filePath));
	int read;
	byte[] buff = new byte[1024];
	while ((read = in.read()) > 0) {
		out.write(buff, 0, read);

	}
	out.flush();
	byte[] audioBytes = out.toByteArray();
	System.out.println("audioBytes:" + audioBytes.length);
	return audioBytes;

	byte[] inputBuffer = new byte[RECORDING_LENGTH];
	System.arraycopy(audioBytes, 0, inputBuffer, 0, audioBytes.length);

However, all values in the outputScores array are 0, and the result is "".
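One likely cause is visible in the reading loop: `in.read()` with no arguments returns a single byte value (0 to 255), not a byte count, so `buff` is never filled and the loop copies chunks of zeros; it also stops at the first zero byte in the file. A minimal sketch of a corrected reader follows, assuming a canonical PCM WAV file with a 44-byte header and 16-bit little-endian mono samples. The class and method names are hypothetical, and whether the model expects floats scaled to [-1, 1) is also an assumption, not something the repository confirms.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class WavReader {
    // Assumption: a canonical PCM WAV file whose header is exactly 44 bytes.
    static final int WAV_HEADER_BYTES = 44;

    /** Reads the whole stream; read(buff) returns the number of bytes read, or -1 at EOF. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buff = new byte[1024];
        int read;
        while ((read = in.read(buff)) != -1) {
            out.write(buff, 0, read);
        }
        return out.toByteArray();
    }

    /** Skips the header and converts 16-bit little-endian mono PCM to floats in [-1, 1). */
    static float[] pcm16ToFloat(byte[] wavBytes) {
        int numSamples = (wavBytes.length - WAV_HEADER_BYTES) / 2;
        if (numSamples <= 0) {
            return new float[0];
        }
        ByteBuffer bb = ByteBuffer.wrap(wavBytes, WAV_HEADER_BYTES, numSamples * 2)
                .order(ByteOrder.LITTLE_ENDIAN);
        float[] samples = new float[numSamples];
        for (int i = 0; i < numSamples; i++) {
            samples[i] = bb.getShort() / 32768.0f;
        }
        return samples;
    }
}
```

For example, `try (InputStream in = new BufferedInputStream(new FileInputStream(filePath))) { audioBytes = WavReader.readAll(in); }` reads the file and closes the stream. When copying into a fixed-size buffer, `Math.min(audioBytes.length, RECORDING_LENGTH)` as the length avoids an out-of-bounds copy for long files.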

Is the java adapted librosa library free for commercial use?

There is no mention of a license for this project. Is it nevertheless allowed to adapt the code and use it commercially?

I am only interested in the code for the MFCC extraction.

how to convert librosa into java?

According to your README, you converted librosa into Java? So you ported it based on the librosa source code, right?

Not able to decode

Hi,
I am trying to use my own model in this app. When I speak into the app, it extracts and prints the MFCC features, but then crashes with the following error:

04-03 12:56:16.754 24654-24815/org.tensorflow.demo E/TensorFlowInferenceInterface: Failed to run TensorFlow inference with inputs:[SeqLen], outputs:[SparseToDense]

--------- beginning of crash

04-03 12:56:16.755 24654-24815/org.tensorflow.demo E/AndroidRuntime: FATAL EXCEPTION: Thread-7556
Process: org.tensorflow.demo, PID: 24654
java.lang.IllegalArgumentException: Expects arg[0] to be int32 but float is provided
at org.tensorflow.Session.run(Native Method)
at org.tensorflow.Session.access$100(Session.java:48)
at org.tensorflow.Session$Runner.runHelper(Session.java:314)
at org.tensorflow.Session$Runner.run(Session.java:264)
at org.tensorflow.contrib.android.TensorFlowInferenceInterface.run(TensorFlowInferenceInterface.java:228)
at org.tensorflow.contrib.android.TensorFlowInferenceInterface.run(TensorFlowInferenceInterface.java:197)
at org.tensorflow.contrib.android.TensorFlowInferenceInterface.run(TensorFlowInferenceInterface.java:187)
at org.tensorflow.demo.SpeechActivity.recognize(SpeechActivity.java:229)
at org.tensorflow.demo.SpeechActivity.access$100(SpeechActivity.java:48)
at org.tensorflow.demo.SpeechActivity$3.run(SpeechActivity.java:193)
at java.lang.Thread.run(Thread.java:818)

Can someone please help me fix this?
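The exception states that arg[0], the SeqLen input named in the log line above it, expects int32 but received a float tensor. In the TensorFlow Android contrib API, `TensorFlowInferenceInterface.feed(...)` is overloaded by Java array type (a `float[]` produces a float tensor, an `int[]` an int32 tensor), so one likely fix is to feed the sequence lengths as an `int[]`. A minimal sketch, assuming a batch of one utterance whose length is its number of MFCC frames; the class, method and frame counts are hypothetical, and the feed call is left as a comment because it needs the TensorFlow Android library.

```java
class SeqLenInput {
    /**
     * Builds the SeqLen input for a batch of one utterance: its number of MFCC frames,
     * stored in an int[] so that an int32 tensor is created when it is fed.
     */
    static int[] seqLenFor(float[][] mfccFrames) {
        return new int[] {mfccFrames.length};
    }

    public static void main(String[] args) {
        float[][] mfccFrames = new float[120][39]; // hypothetical: 120 frames x 39 coefficients
        int[] seqLen = seqLenFor(mfccFrames);
        // With the TensorFlow Android library on the classpath, passing an int[] selects the
        // int32 overload, e.g.: inferenceInterface.feed("SeqLen", seqLen, 1);
        System.out.println("SeqLen = " + seqLen[0]);
    }
}
```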

Can this repository be used on PC?

Hi. I need to make a PC version of the speech recognition code. Can this repository be used on PC? If not, how should I modify it?

can't deploy wavenet model on android

After training the WaveNet model on my own data, I used export_wave_net.py to get a .pb file. When I replace your model in the Java main activity with the new .pb file, the app crashes with a message that TensorFlow Demo has stopped working. What can I do now?

MFCC window size

Hi, I came across this repo because I needed to port an MFCC calculation from librosa to Java. I found your class very useful, although I had a minor problem with the window size.
Since my pretrained models did not use the default window size (n_fft), I made a small change to MFCC.java so that it behaves like the original librosa and produces the same results.
I simply want to share this tweak in case someone needs it in the future:

	// Marcos: non-default window size
	private final static int       n_win                = 1600;
...
	private double[] getWindow(){
		//Return a Hann window for even n_fft.
		//The Hann window is a taper formed by using a raised cosine or sine-squared
		//with ends that touch zero.
		double[] win = new double[/*n_fft*/ n_win];
		for (int i = 0; i < /*n_fft*/n_win; i++){
			win[i] = 0.5 - 0.5 * Math.cos(2.0*Math.PI*i/(/*n_fft*/n_win));
		}

		// Marcos: center-pad win to n_fft (see librosa spectrum.py)
		if (n_win < n_fft) {
			double[] padded_win = new double[n_fft];
			int lpad = (n_fft - n_win) / 2;
			int rpad = n_fft - n_win - lpad;
			for (int l=0;l<lpad;l++)
				padded_win[l] = 0.0;
			for (int m=0;m<n_win;m++)
				padded_win[lpad+m] = win[m];
			for (int r=0;r<rpad;r++)
				padded_win[lpad+n_win+r] = 0.0;
			return padded_win;
		}
		else return win;
	}
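The tweak above can also be written as a standalone sketch: a periodic Hann window of length winLength (the same formula as getWindow above, which corresponds to librosa's default `fftbins=True` window) placed in the middle of an nFft-long zero array, using the same left/right split as `librosa.util.pad_center`. The names PaddedHann, paddedHann, winLength and nFft are hypothetical; this is not the repository's MFCC.java.

```java
import java.util.Arrays;

class PaddedHann {
    /** Periodic Hann window of length winLength, zero-padded at both ends to nFft. */
    static double[] paddedHann(int winLength, int nFft) {
        if (winLength > nFft) {
            throw new IllegalArgumentException("winLength must be <= nFft");
        }
        double[] padded = new double[nFft]; // Java arrays start zero-filled
        int lpad = (nFft - winLength) / 2;  // same split as librosa.util.pad_center
        for (int i = 0; i < winLength; i++) {
            padded[lpad + i] = 0.5 - 0.5 * Math.cos(2.0 * Math.PI * i / winLength);
        }
        return padded;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(paddedHann(4, 8)));
    }
}
```

Because Java arrays are zero-initialized, only the window samples need to be written, so the explicit zero-filling loops are unnecessary. With a hypothetical nFft of 2048, `paddedHann(1600, 2048)` places 224 zeros on each side of the window.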

Can this code be modified into a PC version?

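Hello. Can this code only be used on Android? I tried to modify it into a PC version and compiled it with Eclipse, but the following error occurred when loading the model: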
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Exception in thread "main" java.lang.NoClassDefFoundError: android/util/Log
at org.tensorflow.contrib.android.TensorFlowInferenceInterface.prepareNativeRuntime(TensorFlowInferenceInterface.java:505)
at org.tensorflow.contrib.android.TensorFlowInferenceInterface.<init>(TensorFlowInferenceInterface.java:124)
at speechRecognition.main.main(main.java:40)
Caused by: java.lang.ClassNotFoundException: android.util.Log
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 3 more

The model-loading code is as follows:
TensorFlowInferenceInterface inferenceInterface;
InputStream is = new FileInputStream(MODEL_FILENAME);
inferenceInterface = new TensorFlowInferenceInterface(is);
