
speech's People

Contributors

chiachunfu

speech's Issues

Not able to decode

Hi,
I am trying to use my own model in this app. When I speak into the app, it extracts and prints the MFCC features, but then crashes with the following error:

04-03 12:56:16.754 24654-24815/org.tensorflow.demo E/TensorFlowInferenceInterface: Failed to run TensorFlow inference with inputs:[SeqLen], outputs:[SparseToDense]

--------- beginning of crash

04-03 12:56:16.755 24654-24815/org.tensorflow.demo E/AndroidRuntime: FATAL EXCEPTION: Thread-7556
Process: org.tensorflow.demo, PID: 24654
java.lang.IllegalArgumentException: Expects arg[0] to be int32 but float is provided
at org.tensorflow.Session.run(Native Method)
at org.tensorflow.Session.access$100(Session.java:48)
at org.tensorflow.Session$Runner.runHelper(Session.java:314)
at org.tensorflow.Session$Runner.run(Session.java:264)
at org.tensorflow.contrib.android.TensorFlowInferenceInterface.run(TensorFlowInferenceInterface.java:228)
at org.tensorflow.contrib.android.TensorFlowInferenceInterface.run(TensorFlowInferenceInterface.java:197)
at org.tensorflow.contrib.android.TensorFlowInferenceInterface.run(TensorFlowInferenceInterface.java:187)
at org.tensorflow.demo.SpeechActivity.recognize(SpeechActivity.java:229)
at org.tensorflow.demo.SpeechActivity.access$100(SpeechActivity.java:48)
at org.tensorflow.demo.SpeechActivity$3.run(SpeechActivity.java:193)
at java.lang.Thread.run(Thread.java:818)

Can someone please help me fix this?

Can this repository be used on PC?

Hi. I need to make a PC version of the speech recognition code. Can this repository be used on PC? If not, how would I need to modify it?

Can't deploy WaveNet model on Android

After training the WaveNet model on my own data, I used export_wave_net.py to get a .pb file. When I replace your model with the new .pb in the Java main activity, the app crashes with a message saying that TensorFlow Demo has stopped working. What can I do now?

Can't increase the number of layers

I am trying to increase the number of layers in lstm_ctc.py, but I get an error about the dimensions of the cell ("Dimensions must be equal, but are 1024 and 551").
Note that I used training data other than VCTK.

MFCC window size

Hi, I came across this repo because I needed to port an MFCC calculation from librosa to Java. I found your class very useful, although I had a minor problem with the window size.
As my pretrained models did not use the default window size (n_fft), I made a small change to MFCC.java so that it works the same way as the original librosa and produces the same results.
I simply want to share this tweak in case someone needs it in the future:

	// Marcos not default window size
	private final static int       n_win                = 1600;
...
	private double[] getWindow(){
		//Return a Hann window for even n_fft.
		//The Hann window is a taper formed by using a raised cosine or sine-squared
		//with ends that touch zero.
		double[] win = new double[/*n_fft*/ n_win];
		for (int i = 0; i < /*n_fft*/n_win; i++){
			win[i] = 0.5 - 0.5 * Math.cos(2.0*Math.PI*i/(/*n_fft*/n_win));
		}

		// Marcos: Pad center win to n_fft (see librosa spectrum.py)
		if (n_win < n_fft) {
			double[] padded_win = new double[n_fft];
			int lpad = (n_fft - n_win) / 2;
			int rpad = n_fft - n_win - lpad;
			for (int l=0;l<lpad;l++)
				padded_win[l] = 0.0;
			for (int m=0;m<n_win;m++)
				padded_win[lpad+m] = win[m];
			for (int r=0;r<rpad;r++)
				padded_win[lpad+n_win+r] = 0.0;
			return padded_win;
		}
		else return win;
	}
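The tweak above can be tried in isolation. What follows is a minimal standalone sketch, not the repository's MFCC.java: `HannWindow` and `paddedHann` are hypothetical names, and the window and FFT sizes are passed as parameters instead of being read from class constants. It keeps the same formula as the post (dividing by the window length) and the same center padding. One difference is a deliberate choice of this sketch: it rejects a window longer than the FFT size rather than returning it unpadded.

```java
final class HannWindow {
    // Builds a Hann window of length nWin and zero-pads it symmetrically
    // to length nFft, mirroring the center-padding step in the post above.
    static double[] paddedHann(int nWin, int nFft) {
        if (nWin <= 0 || nWin > nFft) {
            // Assumption of this sketch: a window longer than the FFT is an error.
            throw new IllegalArgumentException("need 0 < nWin <= nFft");
        }
        // Java zero-initialises arrays, so only the taper has to be written.
        double[] padded = new double[nFft];
        int lpad = (nFft - nWin) / 2; // any odd remainder goes to the right side
        for (int i = 0; i < nWin; i++) {
            padded[lpad + i] = 0.5 - 0.5 * Math.cos(2.0 * Math.PI * i / nWin);
        }
        return padded;
    }

    public static void main(String[] args) {
        // 1600 is the window size from the post; 2048 is only an illustrative FFT size.
        double[] w = paddedHann(1600, 2048);
        System.out.println("window length: " + w.length);
    }
}
```

Because the padded array starts out as zeros, the explicit loops that write 0.0 into the left and right margins are not needed.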

All outputScores array values are 0, and result is ""

Hello. I modified the code to recognize a 16 kHz .wav file. The code that converts the audio file to a byte array is as follows:

	String filepath = "G:\\Corpus\\North_Wind_and_the_Sun_passage\\M1\\nws_1.wav";
	ByteArrayOutputStream out = new ByteArrayOutputStream();
	BufferedInputStream in = new BufferedInputStream(new FileInputStream(filePath));
	int read;
	byte[] buff = new byte[1024];
	while ((read = in.read()) > 0) {
		out.write(buff, 0, read);

	}
	out.flush();
	byte[] audioBytes = out.toByteArray();
	System.out.println("audioBytes:" + audioBytes.length);
	return audioBytes;

	byte[] inputBuffer = new byte[RECORDING_LENGTH];
	System.arraycopy(audioBytes, 0, inputBuffer, 0, audioBytes.length);

However, all values in the outputScores array are 0, and the result is "".
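One possible cause, offered as a guess and not a confirmed diagnosis: the loop in the snippet calls `in.read()` with no argument. That call returns a single byte value, and the loop then writes that many bytes from `buff`, which is never filled and so holds only zeros. The loop also stops at the first zero byte in the file. A minimal sketch of a read loop that fills the buffer first is shown here. `WavBytes`, `readAllBytes` and `pcm16ToShorts` are hypothetical names, and the 16-bit little-endian PCM layout is an assumption about the file, not something stated in the post.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class WavBytes {
    // Reads the whole stream: read(buff) fills the buffer and returns the
    // number of bytes actually read, or -1 at end of stream.
    static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buff = new byte[1024];
        int read;
        while ((read = in.read(buff)) != -1) {
            out.write(buff, 0, read);
        }
        return out.toByteArray();
    }

    // Interprets the bytes from offset onward as 16-bit little-endian samples.
    // Assumption: the audio is 16-bit PCM; a trailing odd byte is ignored.
    static short[] pcm16ToShorts(byte[] data, int offset) {
        int n = (data.length - offset) / 2;
        short[] samples = new short[n];
        ByteBuffer.wrap(data, offset, n * 2)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer()
                .get(samples);
        return samples;
    }
}
```

The `offset` parameter is there to skip the file header. Many simple WAV files use a 44-byte header, but header sizes vary, so that figure should be checked against the actual file before relying on it.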

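Can this code be modified into a PC version?

Hello. Can this code only be used on Android? I tried to port it to PC and compiled the code with Eclipse, but the following error occurred when loading the model: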
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Exception in thread "main" java.lang.NoClassDefFoundError: android/util/Log
at org.tensorflow.contrib.android.TensorFlowInferenceInterface.prepareNativeRuntime(TensorFlowInferenceInterface.java:505)
at org.tensorflow.contrib.android.TensorFlowInferenceInterface.<init>(TensorFlowInferenceInterface.java:124)
at speechRecognition.main.main(main.java:40)
Caused by: java.lang.ClassNotFoundException: android.util.Log
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 3 more

The model-loading code is as follows:
TensorFlowInferenceInterface inferenceInterface;
InputStream is = new FileInputStream(MODEL_FILENAME);
inferenceInterface = new TensorFlowInferenceInterface(is);

Can't Find the Training File

How did you train your model? Could you please provide the training file as well?

In the Data_Process.ipynb file, when calculating m, v, and s, you used these mfcc parameters:

audio = mfcc(read_audio_from_filename(file, 16000),samplerate=16000,winlen=0.025,winstep=0.01,numcep=39,
                 nfilt=40)

But when I run the other cells on my data,

    inputs = convert_wav_mfcc(wav_path, 16000)
    normalize_inputs = (inputs - m)/s

this throws an exception because the shapes don't match, so I changed the function convert_wav_mfcc to this:

samplerate = 16000
winlen = 0.025
winstep = 0.01
numcep = 39
nfilt = 40

def convert_wav_mfcc(file, fs=16000):
    """Turn raw audio data into MFCC with sample rate=fs."""
    inputs = mfcc(read_audio_from_filename(file, fs), samplerate=fs, winlen=winlen, winstep=winstep, numcep=numcep, nfilt=nfilt)
    return inputs

Now everything works fine.
