jrae's People

Contributors

bryant1410, sancha

jrae's Issues

Question about an exception

Hi, nice job. I'm also working on sentiment analysis and wanted to see how your model performs on our dataset. I ran it with the default parameters but got this exception: "LAPACK Java.raxpy: Parameters for x aren't valid! (n = 50, dx.length = 207050, dxIdx = 207050, incx = 1)". I've searched online but haven't found a clue. Do you know how to fix it? I'm running it on Win 7 64-bit. Thanks! Z
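
The numbers in the message already show why the check fails, whatever the library does internally. Below is a minimal sketch of the kind of range check a strided vector routine performs. The class and method names are illustrative and this is not the library's own code; it also assumes a positive stride. Reading n elements from offset with stride inc touches indices offset through offset + (n - 1) * inc, and all of them must be below the array length.

```java
/** Illustrative range check for a strided vector view; not the library's own code. */
final class StridedRangeCheck {

    /** True when n elements starting at offset with stride inc fit inside an array of the given length. */
    static boolean fits(int n, int length, int offset, int inc) {
        if (n < 0 || offset < 0 || inc <= 0) {
            return false; // assumption: only positive strides are handled in this sketch
        }
        if (n == 0) {
            return true; // an empty view never reads anything
        }
        long last = (long) offset + (long) (n - 1) * inc; // index of the last element read
        return last < length;
    }

    public static void main(String[] args) {
        // Values from the exception message: n = 50, dx.length = 207050, dxIdx = 207050, incx = 1.
        System.out.println(fits(50, 207050, 207050, 1)); // false
        // The last valid 50-element block starts at 207000.
        System.out.println(fits(50, 207050, 207000, 1)); // true
    }
}
```

With the reported values the starting offset equals the array length, so the requested block lies entirely past the end. Since 207050 = 50 × 4141, this is consistent with asking for block number 4141 (zero-based) when only 4141 blocks of 50 values exist; which input triggers that is not visible from the message alone.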

Converging Problem with FullRun

Hi sancha,
I am using RC3 and get good results with the RAEBuilder class.
However, when I tried the FullRun class (my project needs to run RAE with K-fold cross-validation), I ran into a convergence problem.
In particular, I set -MaxIterations 70, but the program could not finish the first fold and reported:
QNMinimizer terminated without converging
FileNotFoundException: data/mov/data/mov/tunedTheta.rae.0.2.0.5.rae

I have also attached a screenshot from my machine.
Do you have a solution for this situation?
Best,
Phuong
screenshot
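
The path in the FileNotFoundException repeats data/mov twice. One way such a path can arise is when a data directory is prefixed onto a model file path that already contains it. The sketch below only illustrates that mechanism. It assumes -DataDir data/mov and -ModelFile data/mov/tunedTheta.rae (values not stated in this report), and ModelPathDemo, naiveJoin and safeJoin are hypothetical names, not FullRun's code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical illustration of how a directory prefix can end up duplicated. */
final class ModelPathDemo {

    /** Naive join: always prefixes the data directory. */
    static Path naiveJoin(String dataDir, String modelFile) {
        return Paths.get(dataDir).resolve(modelFile);
    }

    /** Prefixes the data directory only when the model path does not already start with it. */
    static Path safeJoin(String dataDir, String modelFile) {
        Path dir = Paths.get(dataDir).normalize();
        Path model = Paths.get(modelFile).normalize();
        if (model.isAbsolute() || model.startsWith(dir)) {
            return model;
        }
        return dir.resolve(model);
    }

    public static void main(String[] args) {
        // Assumed argument values, for illustration only.
        System.out.println(naiveJoin("data/mov", "data/mov/tunedTheta.rae"));
        System.out.println(safeJoin("data/mov", "data/mov/tunedTheta.rae"));
    }
}
```

If this is what happens, passing a model file name relative to the data directory may avoid the missing-file error, though the "terminated without converging" message is a separate matter.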

Other UTF-8 Languages

Hi, I'm using this package for research comparing sentiment analysis across multiple languages, and I was wondering why you chose the RobustTokenizer for tokenization. It doesn't work well for UTF-8 text with non-Latin characters, so I went through and made some of the parsing methods more UTF-8 friendly and switched the tokenizer to the PTBTokenizer, which works much better for non-Latin alphabets (in my case Russian, which is what I had to modify the package for).
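
To illustrate what "UTF-8 friendly" tokenization can mean, here is a minimal standard-library sketch that splits on Unicode letter, mark and digit classes instead of ASCII ranges. It is neither RobustTokenizer nor PTBTokenizer, and UnicodeTokenizerSketch is a hypothetical name. The sample string is written with Unicode escapes so the source file stays ASCII.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical Unicode-aware tokenizer; a stand-in, not the project's tokenizer. */
final class UnicodeTokenizerSketch {

    // A token is a run of letters, combining marks and digits in any script,
    // or a single non-whitespace symbol such as punctuation.
    private static final Pattern TOKEN = Pattern.compile(
            "[\\p{L}\\p{M}\\p{N}]+|[^\\s\\p{L}\\p{M}\\p{N}]",
            Pattern.UNICODE_CHARACTER_CLASS);

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two Cyrillic words followed by an exclamation mark.
        String sample = "\u042d\u0442\u043e \u0444\u0438\u043b\u044c\u043c!";
        System.out.println(tokenize(sample).size()); // 3
    }
}
```

When reading input files, the charset should also be passed explicitly (for example StandardCharsets.UTF_8) rather than relying on the platform default, which differs between systems.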

Looks great, build instructions

This code looks great. Thanks so much for releasing it. I've been trying to do a similar thing from the MATLAB code (though not in Java). It's been a while since I did things in Java, so I'm wondering if you could add build instructions and verify that run.sh works?

Here's what I did so far, but I think maybe I need to make a manifest:

mkdir jar

javac `find ./src | grep .java` -Xlint:unchecked -cp libs/jblas-1.2.0.jar:libs/jmatio-0.2.jar:libs/joda-time.jar:libs/junit-4.10.jar:libs/log4j-1.2.16.jar:libs/stanford-corenlp-2012-01-08.jar

jar cf jar/jrae.jar src

Any help here would be great. Sorry to bug you. Thanks

Pretraining on unlabeled + labeled data

I have a classification task with some labeled data and a large amount of unlabeled but related data. Is there a way I can use both of them in JRAE? I have used JRAE for text classification tasks and I'm trying to measure the effect of pretraining on background text on accuracy.

RAEBuilder_test wrong

The following test_demo configuration seems to have a small problem:

-DataDir data/
-MaxIterations 80
-ModelFile data/mov/tunedTheta.rae
-ClassifierFile data/mov/Softmax.clf
-NumCores 2
-TrainModel False
-ProbabilitiesOutputFile data/tiny/prob.out

What is the correct test code?

Licence #2

I would like to "reopen" the old issue about the licence.

As I understand it, you are not against the idea of licensing your code under a more permissive licence. That is possible even when linking to GPL code:

http://stackoverflow.com/questions/1098051/interfacing-with-gpl-applications-from-mit-licensed-code-is-a-dual-license-una

"As long as you own the copyrights to all the code in your project, there's nothing stopping you from releasing your whole project as dual MIT and GPL licensed. Dual licensing means that your users get a choice of what license to take your code under (so you're not restricting them at all - actually just giving them more options than straight-MIT-licensed).

The GPL code you're linking to just says "you have to let your users have the option of licensing your code under GPL". That's satisfied by a dual-license.

What would stop you from pursuing a dual license was if you had actually pasted slabs of GPL code into your project, since you couldn't re-license that code to people as MIT."

Such dual licensing would be a great help to all of us who cannot use your work because of the viral aspect of the GPL licence.

System requirements for training on a larger corpus

Hi,
I am trying to run the training code on a subset of the Europarl corpus with 1.9 million sentences. With 8 GB of RAM, the code has been running for 3 days and I suspect that the allocated memory is insufficient. Could you provide an estimate of how much memory is needed to run the code on a corpus of this size?

Why not a sparse auto-encoder?

I understand this may not be the right place to ask this question.

But I'm wondering why a simple auto-encoder is used instead of a sparse auto-encoder model in this paper.

Accuracy is lower than what is declared in the paper, or am I missing something?

Hi,
I'm running JRAE's movie review example with this run.sh file:

#!/bin/bash

javac -d bin/ -classpath .:libs/* -Xlint $(find src | grep java$)

java -Xms1g -Xmx30g -XX:+UseTLAB -XX:+UseConcMarkSweepGC -cp .:bin/:libs/* main.RAEBuilder \
-DataDir data/mov \
-MaxIterations 20 \
-ModelFile data/mov/tunedTheta.rae \
-ClassifierFile data/mov/Softmax.clf \
-NumCores 20 \
-TrainModel True \
-ProbabilitiesOutputFile data/mov/prob.out \
-TreeDumpDir data/mov/trees

The problem is that every time I run the experiment I get a different accuracy (e.g. 59%, 60%, 71%), but I have never reached the roughly 77% declared in the paper.
I don't know if I'm missing something, or whether there is any tuning I can do.

Sorry if this is not the right place to ask.
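
Run-to-run variation like this often comes from unseeded random initialisation, though that is not confirmed for JRAE here, and multi-threaded training can add its own nondeterminism. As a sketch of the general idea only, the hypothetical helper below (SeededInitSketch and initTheta are not JRAE names) fills a parameter vector from a seeded generator, so two runs with the same seed start from identical weights.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical seeded parameter initialisation; not JRAE's own code. */
final class SeededInitSketch {

    /** Fills a vector with values drawn uniformly from [-range, range) using a fixed seed. */
    static double[] initTheta(int size, double range, long seed) {
        Random rng = new Random(seed);
        double[] theta = new double[size];
        for (int i = 0; i < size; i++) {
            theta[i] = (rng.nextDouble() * 2.0 - 1.0) * range;
        }
        return theta;
    }

    public static void main(String[] args) {
        double[] first = initTheta(5, 0.05, 42L);
        double[] second = initTheta(5, 0.05, 42L);
        System.out.println(Arrays.equals(first, second)); // true
    }
}
```

Reporting the mean and spread over several seeds is usually more informative than a single run when comparing against a published number.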

duplicate code

In jrae/src/io/ParsedReviewData.java, lines 202 and 203 contain duplicated code.

wordmap.map should be rebuilt in each run

I ran the rc3 version with different -minCount values, but it seems to build the wordmap.map file only once and not rebuild it on later runs, ignoring any change to the -minCount parameter.
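
One possible fix, sketched under assumptions: make the cache file name depend on the parameter, so a run with a different -minCount cannot pick up a stale map. WordMapCacheSketch, cacheFileName and needsRebuild are hypothetical names and the file naming scheme is invented for illustration; the project's actual caching code is not shown in this issue.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical cache naming keyed on minCount; not the project's code. */
final class WordMapCacheSketch {

    /** Cache file name that encodes the minCount used to build the map. */
    static String cacheFileName(int minCount) {
        return "wordmap.minCount" + minCount + ".map";
    }

    /** True when no map built with this minCount exists in the data directory yet. */
    static boolean needsRebuild(Path dataDir, int minCount) {
        return !Files.exists(dataDir.resolve(cacheFileName(minCount)));
    }

    public static void main(String[] args) {
        Path dataDir = Paths.get("data", "mov");
        // A different minCount maps to a different file, so it triggers its own build.
        System.out.println(cacheFileName(5)); // wordmap.minCount5.map
        System.out.println(needsRebuild(dataDir, 5));
    }
}
```

Until something like this exists, deleting wordmap.map by hand before changing -minCount should have the same effect, assuming the file is simply reused when present.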

Reproduce the result in Socher 2011 using jrae

I am trying to use your Java Recursive Autoencoder (JRAE) to reproduce the results reported in Socher 2011 on the Movie Reviews data set.

The result I get using the stable branch is significantly lower (around 73.0%) than the number reported in Socher's paper (77%). I did 10-fold cross-validation based on line numbers, using the default parameters (MaxIterations=50).

Do you have any idea why? Any comments or suggestions would be very helpful.
Thanks in advance.
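
For reference, one common reading of "10-fold cross-validation based on line numbers" is that line i of the data file goes to test fold i mod k. The sketch below shows that split. It is an assumption about the setup described above, not the FullRun implementation, and LineFoldSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical line-number based k-fold split; an assumption, not the project's code. */
final class LineFoldSketch {

    /** Zero-based line indices held out for testing in the given fold. */
    static List<Integer> testLines(int numLines, int k, int fold) {
        List<Integer> lines = new ArrayList<>();
        for (int i = 0; i < numLines; i++) {
            if (i % k == fold) {
                lines.add(i);
            }
        }
        return lines;
    }

    /** Zero-based line indices used for training in the given fold. */
    static List<Integer> trainLines(int numLines, int k, int fold) {
        List<Integer> lines = new ArrayList<>();
        for (int i = 0; i < numLines; i++) {
            if (i % k != fold) {
                lines.add(i);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(testLines(25, 10, 3));         // [3, 13, 23]
        System.out.println(trainLines(25, 10, 3).size()); // 22
    }
}
```

If the data file lists all examples of one class before the other, a contiguous split (lines 0 to n/k - 1 in fold 0, and so on) gives very unbalanced folds, so the exact split used can affect the reported accuracy.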
