sancha / jrae

I re-implemented a semi-supervised recursive autoencoder in Java. I think it is a pretty nice technique. Check it out, or fork it!

Home Page: http://www.socher.org/index.php/Main/Semi-SupervisedRecursiveAutoencodersForPredictingSentimentDistributions

MATLAB 1.66% Shell 0.24% Java 98.10%

jrae's Issues

Question about an exception

Hi, nice job. I'm also working on sentiment analysis and am trying to see how your model performs on our dataset. I tried to run it with the default parameters, but got the exception "LAPACK Java.raxpy: Parameters for x aren't valid! (n = 50, dx.length = 207050, dxIdx = 207050, incx = 1)". I've searched online but don't have a clue. Do you know how to fix it? I'm running it on Windows 7 64-bit. Thanks! Z
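For context on the message: a BLAS-style axpy computes y := a*x + y over n elements, reading x at positions dxIdx, dxIdx + incx, ..., dxIdx + (n - 1)*incx, so every one of those positions has to lie inside x. In the reported message dxIdx equals dx.length (207050), so even the first read would start one element past the end of the array. The sketch below is a hypothetical plain-Java axpy with that bounds check (positive increments only). It is not jblas's implementation and does not show where JRAE makes the call.

```java
class AxpySketch {
    // True if positions idx, idx + inc, ..., idx + (n - 1) * inc all fall in [0, length).
    // Only positive increments are handled in this sketch.
    static boolean validRange(int n, int length, int idx, int inc) {
        if (n <= 0) {
            return true; // nothing is read
        }
        if (inc <= 0 || idx < 0) {
            return false;
        }
        long last = idx + (long) (n - 1) * inc; // long avoids int overflow
        return idx < length && last < length;
    }

    // y[yIdx + i * incy] += a * x[xIdx + i * incx] for i = 0 .. n - 1.
    static void axpy(int n, double a, double[] x, int xIdx, int incx,
                     double[] y, int yIdx, int incy) {
        if (!validRange(n, x.length, xIdx, incx)) {
            throw new IllegalArgumentException("x range out of bounds: n=" + n
                    + ", x.length=" + x.length + ", xIdx=" + xIdx + ", incx=" + incx);
        }
        if (!validRange(n, y.length, yIdx, incy)) {
            throw new IllegalArgumentException("y range out of bounds: n=" + n
                    + ", y.length=" + y.length + ", yIdx=" + yIdx + ", incy=" + incy);
        }
        for (int i = 0; i < n; i++) {
            y[yIdx + i * incy] += a * x[xIdx + i * incx];
        }
    }

    public static void main(String[] args) {
        // Values from the reported exception: the start offset equals the array length.
        System.out.println(validRange(50, 207050, 207050, 1)); // false
        System.out.println(validRange(50, 207050, 207000, 1)); // true: last index read is 207049
    }
}
```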

Convergence Problem with FullRun

Hi sancha,
I am using RC3 and have good results with the RAEBuilder class.
However, when I tried the FullRun class (my project needs to run RAE with K-fold cross-validation), I ran into a convergence problem.
In particular, I set -MaxIterations 70, but the program could not finish the first fold and reported:
QNMinimizer terminated without converging
FileNotFoundException: data/mov/data/mov/tunedTheta.rae.0.2.0.5.rae

I have also attached a screenshot from my machine.
Do you have any solutions for this situation?
Best,
Phuong
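The missing file in the error, data/mov/data/mov/tunedTheta.rae.0.2.0.5.rae, repeats the data/mov/ prefix. One way that can happen is joining a data directory with a model path that already starts with that directory. How FullRun actually builds this path is not shown here, so treat the following as an illustration of the symptom only: a small java.nio.file sketch in which naiveJoin and join are hypothetical helpers, one reproducing the doubled prefix and one guarding against it.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class ModelPaths {
    // Hypothetical helper: always resolves the model file against the data directory.
    // If modelFile already starts with dataDir, the prefix appears twice.
    static Path naiveJoin(String dataDir, String modelFile) {
        return Paths.get(dataDir).resolve(modelFile);
    }

    // Hypothetical guarded variant: only prefixes dataDir when the model path
    // is relative and does not already start with it (assumed intent).
    static Path join(String dataDir, String modelFile) {
        Path dir = Paths.get(dataDir).normalize();
        Path file = Paths.get(modelFile).normalize();
        if (file.isAbsolute() || file.startsWith(dir)) {
            return file;
        }
        return dir.resolve(file);
    }

    public static void main(String[] args) {
        System.out.println(naiveJoin("data/mov", "data/mov/tunedTheta.rae"));
        System.out.println(join("data/mov", "data/mov/tunedTheta.rae"));
    }
}
```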

Other UTF-8 Languages

Hi, I'm using this package for research comparing sentiment analysis across multiple languages, and I was wondering why you used the RobustTokenizer for tokenization. It doesn't work well at all for UTF-8-encoded non-Latin text, so I went through and made some of the parsing methods more UTF-8 friendly and switched the tokenizer to the PTBTokenizer, which works much better for non-Latin alphabets (e.g. Russian, which is what I had to modify the package for).
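The charset side of that change can be illustrated with the standard library alone. On Java versions before 18, a reader created without an explicit charset falls back to the platform default, which on many Windows setups is not UTF-8, so Cyrillic text can be mangled before any tokenizer sees it. A minimal sketch, assuming the input is UTF-8; the helper name readLines is hypothetical, and the tokenizer switch itself (which needs Stanford CoreNLP) is not shown.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class Utf8Lines {
    // Reads all lines from a stream using an explicit charset instead of the platform default.
    static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Russian "film", written with escapes so the source file itself stays ASCII.
        String word = "\u0444\u0438\u043b\u044c\u043c";
        byte[] utf8 = (word + "\n").getBytes(StandardCharsets.UTF_8);
        String right = readLines(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8).get(0);
        String wrong = readLines(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1).get(0);
        System.out.println(right.equals(word)); // true
        System.out.println(wrong.equals(word)); // false: each byte decoded as a separate Latin-1 char
    }
}
```

For files on disk, java.nio.file.Files.newBufferedReader(path, StandardCharsets.UTF_8) gives the same explicit-charset behaviour.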

Looks great, build instructions

This code looks great. Thanks so much for releasing it. I've been trying to do a similar thing based on the MATLAB code (though not in Java). It's been a while since I did anything in Java; could you add build instructions and verify that run.sh works?

Here's what I have done so far, but I think I may need to make a manifest:

mkdir jar

javac `find ./src | grep .java` -Xlint:unchecked -cp libs/jblas-1.2.0.jar:libs/jmatio-0.2.jar:libs/joda-time.jar:libs/junit-4.10.jar:libs/log4j-1.2.16.jar:libs/stanford-corenlp-2012-01-08.jar

jar cf jar/jrae.jar src

Any help here would be great. Sorry to bug you. Thanks!
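On the manifest question: an executable jar needs a Main-Class entry in META-INF/MANIFEST.MF, and its Class-Path entries are resolved relative to the jar's own location. Separately, `jar cf jar/jrae.jar src` stores entries under a src/ prefix (for example src/main/RAEBuilder.class when javac writes classes next to the sources), which does not match the package layout the class loader expects. The sketch below builds such a manifest with java.util.jar.Manifest. The main class main.RAEBuilder is taken from the run.sh in a later issue on this page, and the ../libs prefix assumes the jar stays in jar/ next to libs/, as in the commands above; ManifestSketch itself is a hypothetical helper.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

class ManifestSketch {
    // Library jars from the javac command above.
    static final String[] LIBS = {
        "jblas-1.2.0.jar", "jmatio-0.2.jar", "joda-time.jar",
        "junit-4.10.jar", "log4j-1.2.16.jar", "stanford-corenlp-2012-01-08.jar"
    };

    static Manifest build(String mainClass, String libDir) {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // The Manifest.write javadoc requires Manifest-Version to be set first.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        StringBuilder classPath = new StringBuilder();
        for (String lib : LIBS) {
            if (classPath.length() > 0) {
                classPath.append(' '); // Class-Path entries are space-separated
            }
            classPath.append(libDir).append('/').append(lib);
        }
        attrs.put(Attributes.Name.CLASS_PATH, classPath.toString());
        return manifest;
    }

    static byte[] toBytes(Manifest manifest) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.write(out); // long lines are wrapped to 72 bytes automatically
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Manifest m = build("main.RAEBuilder", "../libs");
        System.out.print(new String(toBytes(m), StandardCharsets.UTF_8));
    }
}
```

The printed text could be saved as, say, manifest.txt and passed to the jar tool's m option (jar cfm jar/jrae.jar manifest.txt -C bin .), assuming the classes were first compiled into bin/ with javac -d bin.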

Pretraining on unlabeled + labeled data

I have a classification task with some labeled data and a large amount of unlabeled but related data. Is there a way I can use both of them in JRAE? I have used JRAE for text classification tasks, and I'm trying to measure the effect of pretraining on background text on accuracy.
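As we read the semi-supervised RAE paper linked as this project's home page, each tree node's error is a weighted sum alpha * E_rec + (1 - alpha) * E_cE, where E_rec = 1/2 * ||[c1; c2] - [c1'; c2']||^2 is the reconstruction error of the two children and E_cE is the cross-entropy of the node's predicted label distribution. Only the second term needs labels, which is what makes unlabeled text usable in principle. The sketch below computes that per-node error in plain Java. Treating a missing target as "unlabeled, reconstruction term only" is our assumption for illustration, not a JRAE option; whether JRAE exposes such a mode is not shown on this page.

```java
class RaeObjective {
    // E_rec = 1/2 * ||input - reconstruction||^2, with input = [c1; c2].
    static double reconstructionError(double[] input, double[] reconstruction) {
        double sum = 0.0;
        for (int i = 0; i < input.length; i++) {
            double d = input[i] - reconstruction[i];
            sum += d * d;
        }
        return 0.5 * sum;
    }

    // E_cE = -sum_k t_k * log(d_k); terms with t_k = 0 contribute nothing.
    static double crossEntropy(double[] target, double[] predicted) {
        double sum = 0.0;
        for (int k = 0; k < target.length; k++) {
            if (target[k] > 0) {
                sum -= target[k] * Math.log(predicted[k]);
            }
        }
        return sum;
    }

    // Labeled node: alpha * E_rec + (1 - alpha) * E_cE.
    // Hypothetical convention: target == null marks an unlabeled node,
    // which contributes only alpha * E_rec.
    static double nodeError(double[] input, double[] reconstruction,
                            double[] target, double[] predicted, double alpha) {
        double rec = alpha * reconstructionError(input, reconstruction);
        if (target == null) {
            return rec;
        }
        return rec + (1 - alpha) * crossEntropy(target, predicted);
    }
}
```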

RAEBuilder_test wrong

The following test_demo code seems to have a small problem:

-DataDir data/
-MaxIterations 80
-ModelFile data/mov/tunedTheta.rae
-ClassifierFile data/mov/Softmax.clf
-NumCores 2
-TrainModel False
-ProbabilitiesOutputFile data/tiny/prob.out

What is the correct test code?

Licence #2

I would like to "reopen" the old issue about the licence.

As I understand it, you are not against the idea of licensing your code under a more permissive licence. And that is possible even when linking to GPL code:

http://stackoverflow.com/questions/1098051/interfacing-with-gpl-applications-from-mit-licensed-code-is-a-dual-license-una

"As long as you own the copyrights to all the code in your project, there's nothing stopping you from releasing your whole project as dual MIT and GPL licensed. Dual licensing means that your users get a choice of what license to take your code under (so you're not restricting them at all - actually just giving them more options than straight-MIT-licensed).

The GPL code you're linking to just says "you have to let your users have the option of licensing your code under GPL". That's satisfied by a dual-license.

What would stop you from pursuing a dual license was if you had actually pasted slabs of GPL code into your project, since you couldn't re-license that code to people as MIT."

Such dual licensing would be a great help for all of us who cannot use your work because of the viral aspect of the GPL.

System requirements for training on a larger corpus

Hi,
I am trying to run the training code on a subset of the Europarl corpus with 1.9 million sentences. With 8 GB of RAM, the code has been running for 3 days, and I suspect that the allocated memory is insufficient. Would it be possible to provide an estimate of how much memory is needed to run the code?
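One thing worth checking before sizing hardware: a Java process can only use as much heap as its -Xmx setting (or the JVM's default, which is typically a fraction of physical RAM) allows, regardless of how much memory the machine has. A minimal sketch, with the hypothetical class name HeapReport, that reports the limits the running JVM actually sees; run it with the same -Xms/-Xmx flags as the training command. It does not estimate how much memory the training itself needs.

```java
class HeapReport {
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap the JVM will attempt to use (Long.MAX_VALUE if unlimited).
        System.out.println("max heap:            " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory(): heap currently reserved; freeMemory(): unused part of it.
        System.out.println("committed heap:      " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("free (of committed): " + toMiB(rt.freeMemory()) + " MiB");
        System.out.println("processors:          " + rt.availableProcessors());
    }
}
```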

How can we feed our own word embeddings to the RAE?
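Purely as an illustration of the data-preparation side, assuming the embeddings arrive as a text file with one word per line followed by its vector components (a common layout, not a JRAE input format), the sketch below parses such lines and stacks them into a d x V matrix with one column per vocabulary word, which is how we recall the paper laying out its embedding matrix L. EmbeddingLoader, parse and toMatrix are hypothetical names, the vocabulary order would have to match whatever indices JRAE assigns, and how JRAE would load such a matrix is not shown.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EmbeddingLoader {
    // Parses lines of the form "word v1 v2 ... vd" (assumed format); all vectors must share d.
    static Map<String, double[]> parse(List<String> lines) {
        Map<String, double[]> vectors = new HashMap<>();
        int dim = -1;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\\s+");
            if (parts.length < 2) {
                throw new IllegalArgumentException("no vector on line: " + line);
            }
            if (dim < 0) {
                dim = parts.length - 1;
            } else if (parts.length - 1 != dim) {
                throw new IllegalArgumentException("expected " + dim + " values on line: " + line);
            }
            double[] v = new double[dim];
            for (int i = 0; i < dim; i++) {
                v[i] = Double.parseDouble(parts[i + 1]);
            }
            vectors.put(parts[0], v);
        }
        return vectors;
    }

    // Builds a dim x vocab.size() matrix, one column per word in vocabulary order.
    // Words without a pretrained vector stay as zero columns (assumption; small
    // random values are a common alternative).
    static double[][] toMatrix(Map<String, double[]> vectors, List<String> vocab, int dim) {
        double[][] matrix = new double[dim][vocab.size()];
        for (int col = 0; col < vocab.size(); col++) {
            double[] v = vectors.get(vocab.get(col));
            if (v == null) {
                continue;
            }
            if (v.length != dim) {
                throw new IllegalArgumentException("vector for " + vocab.get(col) + " has length " + v.length);
            }
            for (int row = 0; row < dim; row++) {
                matrix[row][col] = v[row];
            }
        }
        return matrix;
    }
}
```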

Why not a sparse autoencoder?

I understand this may not be the right place to ask the question!

But I'm wondering why a simple autoencoder is used instead of a sparse autoencoder in this paper.
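For readers comparing the two: a sparse autoencoder, in its common formulation, adds a penalty beta * sum_j KL(rho || rhoHat_j) that pushes each hidden unit's mean activation rhoHat_j over the training examples toward a small target rho. The sketch below computes that penalty. It assumes sigmoid units with activations strictly between 0 and 1, which does not directly fit a tanh nonlinearity such as the one the RAE paper gives as an example. It illustrates the technique only and says nothing about why the authors chose a plain autoencoder; SparsityPenalty is a hypothetical name.

```java
class SparsityPenalty {
    // KL divergence between Bernoulli(rho) and Bernoulli(rhoHat); both must lie in (0, 1).
    static double kl(double rho, double rhoHat) {
        return rho * Math.log(rho / rhoHat)
                + (1 - rho) * Math.log((1 - rho) / (1 - rhoHat));
    }

    // activations[i][j] = activation of hidden unit j on example i (sigmoid outputs assumed).
    static double penalty(double[][] activations, double rho, double beta) {
        int examples = activations.length;
        int hidden = activations[0].length;
        double total = 0.0;
        for (int j = 0; j < hidden; j++) {
            double mean = 0.0;
            for (int i = 0; i < examples; i++) {
                mean += activations[i][j];
            }
            mean /= examples; // rhoHat_j: average activation of unit j
            total += kl(rho, mean);
        }
        return beta * total;
    }
}
```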

Accuracy is lower than reported in the paper, or am I missing something?

Hi,
I'm running JRAE's movie review example with this run.sh file:

#!/bin/bash

javac -d bin/ -classpath '.:libs/*' -Xlint `find src | grep 'java$'`

java -Xms1g -Xmx30g -XX:+UseTLAB -XX:+UseConcMarkSweepGC -cp '.:bin/:libs/*' main.RAEBuilder \
-DataDir data/mov \
-MaxIterations 20 \
-ModelFile data/mov/tunedTheta.rae \
-ClassifierFile data/mov/Softmax.clf \
-NumCores 20 \
-TrainModel True \
-ProbabilitiesOutputFile data/mov/prob.out \
-TreeDumpDir data/mov/trees

But the problem is that every time I run the experiment I get a different accuracy (e.g. 59%, 60%, 71%), and I never reach the roughly 77% reported in the paper.
I don't know whether I'm missing something or whether there is some tuning I can do.

Sorry if this is not the right place to ask.

Do you have any idea of this? Any comments or suggestions are very helpful.
Thanks in advance.
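Run-to-run spread like this often comes from random parameter initialization (and any random shuffling or splitting of the data); where JRAE seeds its random number generator is not shown on this page. A minimal sketch of the general practice, with hypothetical helper names: draw initial weights from a java.util.Random with a fixed seed so a run can be repeated exactly, and report the mean and standard deviation over several seeds rather than a single number. The uniform range sqrt(6 / (rows + cols)) is a common heuristic used here as an assumption, not necessarily JRAE's choice.

```java
import java.util.Random;

class SeededRuns {
    // Uniform initialization in [-r, r] with r = sqrt(6 / (rows + cols)) (assumed heuristic).
    static double[][] initWeights(int rows, int cols, long seed) {
        Random rng = new Random(seed); // same seed -> same sequence -> same weights
        double r = Math.sqrt(6.0 / (rows + cols));
        double[][] w = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                w[i][j] = (2.0 * rng.nextDouble() - 1.0) * r;
            }
        }
        return w;
    }

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (n - 1 in the denominator).
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) {
            ss += (x - m) * (x - m);
        }
        return Math.sqrt(ss / (xs.length - 1));
    }
}
```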
