sancha / jrae
I re-implemented a semi-supervised recursive autoencoder in java. I think it is a pretty nice technique. Check it out! Or fork it
Hi, nice job. I'm also working on sentiment analysis, and I'm trying to see how your model performs on our dataset. I was trying to run it with the default parameters, but I got the exception "LAPACK Java.raxpy: Parameters for x aren't valid! (n = 50, dx.length = 207050, dxIdx = 207050, incx = 1)". I've searched online but don't have a clue. Do you know how to fix it? I'm running it on Win 7 64-bit. Thanks! Z
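A side note on the numbers in that exception (my reading, not confirmed against the jrae source): dxIdx equals dx.length, i.e. the read starts exactly one 50-dimensional vector past the end of the flattened embedding array, and the array length divides evenly into vectors of size n = 50. A quick check of that arithmetic:

```shell
# n = 50 is the vector size; dx.length = 207050 is the flattened array length.
# dxIdx = 207050 == dx.length means the read begins one vector past the last one.
echo $((207050 / 50))   # -> 4141 word vectors, if the array is vocab * 50
echo $((207050 % 50))   # -> 0, so the length is an exact multiple of n
```

This would point at an off-by-one word index (e.g. a lookup for an out-of-vocabulary word) rather than a LAPACK/jblas installation problem.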
Hi sancha,
I am using RC3 and have good results with the RAEBuilder class.
However, when I tried the FullRun class (my project needs to run RAE with K-fold cross-validation), I ran into a convergence problem.
In particular, I set -MaxIterations 70. However, the program could not finish the first fold and reported:
QNMinimizer terminated without converging
FileNotFoundException: data/mov/data/mov/tunedTheta.rae.0.2.0.5.rae
I have also attached a screenshot from my machine.
Do you have any solutions for this situation?
Best,
Phuong
Hi, I'm using this package for some research/comparison of sentiment analysis in multiple languages, and I was wondering why you used the RobustTokenizer for tokenization? It doesn't work well at all for UTF-8 encoding of non-Latin characters, so I actually went through and made some of the parsing methods more UTF-8 friendly and switched the tokenizer to the PTBTokenizer which is working much better for non-Latin alphabets (i.e. Russian, which is what I had to modify the package for).
This code looks great. Thanks so much for releasing it. I've been trying to do a similar thing from the MATLAB code (though not in Java). It's been a while since I did things in Java, so I'm wondering if you could add instructions to build, and verify that run.sh works?
Here's what I did so far, but I think maybe I need to make a manifest:
mkdir jar
javac `find ./src | grep .java` -Xlint:unchecked -cp libs/jblas-1.2.0.jar:libs/jmatio-0.2.jar:libs/joda-time.jar:libs/junit-4.10.jar:libs/log4j-1.2.16.jar:libs/stanford-corenlp-2012-01-08.jar
jar cf jar/jrae.jar src
Any help here would be great. Sorry to bug you. Thanks
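One possible build recipe, sketched under two assumptions: that main.RAEBuilder is the entry point (it is the class used in run.sh elsewhere in these issues), and that compiled classes go in bin/. The jar should be built from the compiled classes, not from src:

```shell
mkdir -p bin jar
# compile every source file against the bundled jars (same libs/ directory as above)
javac -Xlint:unchecked -d bin -cp 'libs/*' $(find src -name '*.java')
# a manifest lets `java -jar` find the entry point; Main-Class here is an assumption
printf 'Main-Class: main.RAEBuilder\nClass-Path: %s\n' "$(echo libs/*.jar)" > MANIFEST.MF
# package the compiled classes together with the manifest
jar cfm jar/jrae.jar MANIFEST.MF -C bin .
```

Note that a manifest Class-Path lists jars relative to where jrae.jar is run from, so the libs/ folder has to sit next to your working directory at runtime.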
I have a classification task with some labeled data and a large amount of unlabeled but related data. Is there a way I can use both of them in JRAE? I have used JRAE for text classification tasks, and I'm trying to measure the effect of background text pretraining on accuracy.
The test_demo code below seems to have a small problem.
-DataDir data/
-MaxIterations 80
-ModelFile data/mov/tunedTheta.rae
-ClassifierFile data/mov/Softmax.clf
-NumCores 2
-TrainModel False
-ProbabilitiesOutputFile data/tiny/prob.out
What is the correct test configuration?
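The flags above mix three directories (data/, data/mov/, and data/tiny/). I can't say which dataset the demo intends, but a consistent invocation would keep all the paths aligned; a sketch assuming the movie-review data (swap in data/tiny everywhere if that is the set you mean):

```shell
# hypothetical corrected test invocation: all paths point at the same dataset
java -cp .:bin/:libs/* main.RAEBuilder \
  -DataDir data/mov \
  -MaxIterations 80 \
  -ModelFile data/mov/tunedTheta.rae \
  -ClassifierFile data/mov/Softmax.clf \
  -NumCores 2 \
  -TrainModel False \
  -ProbabilitiesOutputFile data/mov/prob.out
```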
I would like to "reopen" an old issue about the licence.
As I understand it, you are not against the idea of licensing your code under a more permissive licence. And that is possible even when linking to GPL code:
"As long as you own the copyrights to all the code in your project, there's nothing stopping you from releasing your whole project as dual MIT and GPL licensed. Dual licensing means that your users get a choice of what license to take your code under (so you're not restricting them at all - actually just giving them more options than straight-MIT-licensed).
The GPL code you're linking to just says "you have to let your users have the option of licensing your code under GPL". That's satisfied by a dual-license.
What would stop you from pursuing a dual license was if you had actually pasted slabs of GPL code into your project, since you couldn't re-license that code to people as MIT."
Such dual licensing would be a great help for all of us who cannot use your work because of the viral aspect of the GPL licence.
Hi,
I am trying to run the training code on a subset of the Europarl corpus with 1.9 million sentences. With 8GB of RAM, the code has been running for 3 days, and I suspect the allocated memory is insufficient. Would it be possible to provide an estimate of how much memory would be ideal for running the code?
I understand this may not be the right place to ask the question, but I'm wondering why a simple autoencoder is used instead of a sparse autoencoder model in this paper.
Hi,
I'm running JRAE's movie review example with this run.sh file:
javac -d bin/ -classpath .:libs/* -Xlint `find src | grep java$`
java -Xms1g -Xmx30g -XX:+UseTLAB -XX:+UseConcMarkSweepGC -cp .:bin/:libs/* main.RAEBuilder
-DataDir data/mov
-MaxIterations 20
-ModelFile data/mov/tunedTheta.rae
-ClassifierFile data/mov/Softmax.clf
-NumCores 20
-TrainModel True
-ProbabilitiesOutputFile data/mov/prob.out
-TreeDumpDir data/mov/trees
But the problem is that every time I run the experiment I get a different accuracy (e.g. 59%, 60%, 71%, etc.), and I never reach the roughly 77% reported in the paper.
I don't know if I'm missing something, or whether there is any tuning I can do?
Sorry if this is not the right place to ask.
In jrae/src/io/ParsedReviewData.java, lines 202 and 203 are duplicated code.
I ran the rc3 version with different -minCount parameters, but it seems that it only builds the wordmap.map file once and doesn't rebuild it on subsequent runs, ignoring any change to the -minCount parameter.
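A workaround that seems consistent with this behaviour (the file location is an assumption based on the data directories used elsewhere in these issues — adjust to wherever your run actually writes wordmap.map): delete the cached map before rerunning, so it is rebuilt with the new -minCount.

```shell
# assumed location of the cached word map; removing it forces a rebuild
rm -f data/mov/wordmap.map
# rerun with the new threshold; the map should now reflect -minCount 2
java -cp .:bin/:libs/* main.RAEBuilder -DataDir data/mov -minCount 2 -TrainModel True
```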
This looks really good, but unfortunately I need to understand the license before doing anything with it. Any chance you'd release this with an Apache2 license? http://www.apache.org/licenses/LICENSE-2.0.html
I am trying to use your Java Recursive Autoencoder (JRAE) to reproduce the results reported in Socher 2011 on the Movie Reviews data set.
The result I get using the stable branch is significantly lower (around 73.0%) than the number reported in Socher's paper (77%). I did the 10-fold cross-validation based on the line numbers. The default parameters (MaxIterations=50) were used.
Do you have any idea why? Any comments or suggestions would be very helpful.
Thanks in advance.