
liblinear-java's People

Contributors

bwaldvogel, cheusov, electrum, kzn, numb3r3, salimm, tandronicus, vbogach


liblinear-java's Issues

Setting

In programmatic usage of the library, it is not possible to set the option for predicting probabilities (the -b option on the command line, which sets flag_predict_probability in Predict.java).
I think you should add a new field (say, flag_predict_probability) to the Parameter class and handle the option there, so that one can predict with probabilities.
Daniel
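For what it's worth, the Linear class does expose predictProbability(Model, Feature[], double[]) for probability-capable solvers such as L2R_LR, so probabilities can be obtained programmatically even without a flag on Parameter. As background for what that call computes: in binary logistic regression the probability is the sigmoid of the decision value. A pure-Java sketch with made-up weights (illustrative only, not the library's code):

```java
public class SigmoidDemo {
    // Binary logistic regression: P(y = +1 | x) = 1 / (1 + exp(-w.x))
    public static double sigmoid(double decisionValue) {
        return 1.0 / (1.0 + Math.exp(-decisionValue));
    }

    public static void main(String[] args) {
        double[] w = {0.5, -1.2}; // made-up weights
        double[] x = {2.0, 1.0};  // made-up dense instance
        double dec = 0.0;
        for (int i = 0; i < w.length; i++) {
            dec += w[i] * x[i];
        }
        // dec = 0.5*2.0 + (-1.2)*1.0 = -0.2
        System.out.println(sigmoid(dec)); // about 0.45
    }
}
```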

Different Results With the Same Experiment

I get slightly different results by running the same experiment (LogReg L1, reg=0.3) each time. Is that possible or must there be a bug either with the library or with my code?

This is not the case with LogReg L2: there I get exactly the same results every time. I am also testing the other LogReg L1 implementations (StanfordNLP and Smile); both produce deterministic results.
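This is expected rather than a bug: liblinear's L1-regularized coordinate-descent solvers visit the training instances in a randomly shuffled order, while the L2R_LR trust-region solver is deterministic, which matches what you observe. liblinear-java holds that RNG as static state in the Linear class, and (depending on your version) provides a way to reset it to a fixed seed, e.g. Linear.resetRandom(), before each training run. The underlying principle in plain Java:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SeededShuffleDemo {
    // Shuffle a copy of the list with an explicitly seeded RNG.
    public static List<Integer> shuffled(List<Integer> data, long seed) {
        List<Integer> copy = new ArrayList<>(data);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        // Same seed -> identical visiting order on every run. Resetting
        // liblinear's RNG before each training call has the same effect.
        System.out.println(shuffled(data, 42L).equals(shuffled(data, 42L)));
    }
}
```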

How to obtain the support vectors

Hi --

I am using Weka's LibLINEAR class and want to obtain the support vectors after the classifier has been trained.

Is there an example showing how this can be done?

Thanks,
Haimonti

Linear.loadModel(Reader) should not close the reader

It should be the client's responsibility to decide whether or not to close the reader passed to Linear.loadModel(Reader).

Reason: I'm storing the model in a zip file with other information. If Linear.loadModel(Reader) closes the reader, this will close the zip input stream, which breaks my code.
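Until the library changes, one workaround is to hand loadModel a wrapper whose close() is a no-op, so the underlying zip stream survives. A sketch using only the JDK:

```java
import java.io.FilterReader;
import java.io.Reader;
import java.io.StringReader;

public class NonClosingReader extends FilterReader {

    public NonClosingReader(Reader in) {
        super(in);
    }

    @Override
    public void close() {
        // Intentionally do nothing: the caller stays responsible for
        // closing the underlying reader (e.g. a zip entry stream).
    }

    public static void main(String[] args) throws Exception {
        StringReader underlying = new StringReader("model data");
        Reader r = new NonClosingReader(underlying);
        // A call like Linear.loadModel(r) would go here; even if it
        // closes r, the underlying reader stays usable:
        r.close();
        System.out.println((char) underlying.read()); // 'm'
    }
}
```

Pass new NonClosingReader(zipReader) to Linear.loadModel(...) and close the zip stream yourself afterwards.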

Is a bias term added to the features automatically?

I am maintaining code that I mostly didn't write, which uses liblinear for logistic regression. My understanding from the documentation was that setting bias to a value greater than 0 results in a synthetic feature being added, but I cannot see anywhere in the code where this feature is added, either during training or prediction. Is it required to both set the bias parameter and also manually add the synthetic feature node during training and prediction?

Bias parameter not used in Linear.predictValues()

I am using a logistic regression model with 2 features and a bias.

I would expect the score to be calculated as

w1*x1 + w2*x2 + bias

but looking at

dec_values[i] += w[(idx - 1) * nr_w + i] * lx.getValue();
it seems like the bias parameter is never added to dec_values.

Am I right to think the bias parameter should contribute to the score or is my understanding incorrect?
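For reference (worth verifying against your model file): when the model was trained with bias >= 0, the bias is represented as one extra synthetic feature with index n + 1 and value bias, so its weight sits at the end of w and is accumulated by the same loop rather than being a separate additive term. Whether your Feature[] input includes that synthetic node is the thing to check. A pure-Java sketch of the arithmetic with made-up numbers:

```java
public class BiasScoreDemo {
    // Decision value when the bias enters as a synthetic last feature:
    // score = w1*x1 + w2*x2 + w_bias * bias
    public static double score(double[] w, double[] x, double bias) {
        double dec = 0.0;
        for (int i = 0; i < x.length; i++) {
            dec += w[i] * x[i];
        }
        if (bias >= 0) {
            dec += w[x.length] * bias; // weight of the synthetic bias feature
        }
        return dec;
    }

    public static void main(String[] args) {
        double[] w = {0.5, -1.0, 0.25}; // last entry: learned bias weight (made up)
        double[] x = {2.0, 1.0};
        System.out.println(score(w, x, 1.0)); // 0.5*2 - 1.0*1 + 0.25*1 = 0.25
    }
}
```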

Documentation For Java API

Hi,

This is a great library. I've been looking for a lightweight, reliable, and commercially friendly Java implementation of LogReg, MaxEnt and SVM for a while, and surprisingly it wasn't as easy as I thought. Could you provide some Java code examples for basic usage of the API? I figured out the following, but I wonder if there is more to know.

        Problem problem = Train.readProblem(new File("train.libsvm"), 1);
        Problem testProb = Train.readProblem(new File("test.libsvm"), 1);

        SolverType solver = SolverType.L2R_LR; // -s 0
        double C = 1.5;    // cost of constraints violation
        double eps = 0.01; // stopping criteria

        Parameter parameter = new Parameter(solver, C, eps);
        final Model model = Linear.train(problem, parameter);
        File modelFile = new File("model");
        model.save(modelFile);
        for (int i = 0; i < testProb.x.length; i++) {
            Feature[] instance = testProb.x[i];
            double prediction = Linear.predict(model, instance);
        }

How to get a per-class score?

Hi, I'm using this library and it works well, but I have a question: how can I get, for each class, the score that an instance belongs to that class? In my data there are some texts that belong to no class at all, and I'd like to apply a threshold to these scores to filter such texts out.
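liblinear-java exposes raw per-class decision values via Linear.predictValues(Model, Feature[], double[]), and calibrated probabilities via Linear.predictProbability for logistic regression solvers. A rejection threshold on those scores can then filter out instances that belong to no known class. A minimal pure-Java sketch of the thresholding step, with made-up scores:

```java
public class RejectOptionDemo {
    // Return the arg-max class, or -1 when even the best score is below
    // the threshold (i.e. the instance belongs to no known class).
    public static int predictWithReject(double[] scores, double threshold) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return scores[best] >= threshold ? best : -1;
    }

    public static void main(String[] args) {
        double[] scores = {0.12, 0.80, 0.08}; // e.g. from predictProbability
        System.out.println(predictWithReject(scores, 0.5)); // class 1
        System.out.println(predictWithReject(scores, 0.9)); // -1: rejected
    }
}
```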

predictValues() does not map labels

predictValues() and therefore predictProbability() fill the score array with scores according to the internal label representation, without being mapped by Model.label[].

First, Model.label[] is private, so it's not possible to infer the mapping from the application, and the number of labels in the internal representation might be less than the actual number of labels due to labels not seen in training. Second, the size of the scores[] array that must be provided to predictValues() is impossible to guess.

Would it be possible to either expose the mapping or map the scores array?

Affects 32c64ff.

No option for no bias feature for Train.readProblem()

I have two options to map my Dataset object to a liblinear Problem object.

1) Convert programmatically. This is what I do normally. I also add a bias=1 feature with this method, and it works just fine.

2) First save the Dataset object as .libsvm and then call Train.readProblem(). However, readProblem() adds a bias feature by default: even if I pass bias=0, an extra feature is added to all instances. This is why I can't reproduce exactly the accuracy I get with the first method; I get slightly worse results.

Here you may recommend not adding the bias value while saving as .libsvm, since it is already going to be added by readProblem(). But I create training and test sets separately, producing two libsvm files. Imagine the training libsvm file has 100 features, while in the test libsvm file no instance has the 100th feature. When I call readProblem() on both, the test problem's dimension is 99, not 100. To prevent this I add the bias feature as the last feature by default, so that both sets have 101 features.

In conclusion, I think the bias feature must be optional for the Train.readProblem() method.

Assertions invalid

Some of the assert statements in the code are not valid. Specifically, in Train.java on lines 269 and 272, the assertions can throw an ArrayIndexOutOfBoundsException if the libsvm file contains rows without any features. Once I turned off assertions in my unit tests, everything worked fine, which suggests the rest of the code handles rows without features correctly.
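Replacing the assertions with explicit checks would make feature-less rows a handled case rather than behaviour that changes with the -ea flag. A sketch of what such guards could look like when splitting a libsvm line (hypothetical helper for illustration, not the actual Train.java code):

```java
public class LibsvmRowParser {
    // Parse one libsvm-format line: "label idx:val idx:val ...".
    // A line may legitimately contain a label and no features.
    public static double parseLabel(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length == 0 || tokens[0].isEmpty()) {
            throw new IllegalArgumentException("empty line");
        }
        return Double.parseDouble(tokens[0]);
    }

    public static int featureCount(String line) {
        String[] tokens = line.trim().split("\\s+");
        // Guard instead of assert: zero features is valid input.
        return Math.max(0, tokens.length - 1);
    }

    public static void main(String[] args) {
        System.out.println(parseLabel("1"));         // label-only row: 1.0
        System.out.println(featureCount("1"));       // 0 features
        System.out.println(featureCount("1 3:0.5")); // 1 feature
    }
}
```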

Implement init_sol setter in Parameter

There is currently a field init_sol in the Parameter class that is used to initialize the model weights. However, there is no setter for this field. Could you implement one?

Thread safety problem in predict: flag_predict_probability shouldn't be static

The flag_predict_probability boolean in predict is declared static. This causes a problem when running two different predict jobs in the same Java process with different probability options: the setting from the second call overwrites the one from the first. This may cause a prediction using a non-probability-capable solver type to fail, despite being called with correct parameters, if a simultaneous job runs with probabilities enabled.
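The hazard is the usual one with static mutable configuration: every caller in the JVM shares one field, so the fix is to thread the option through as a per-call parameter or per-job object. A minimal sketch of the failure mode (hypothetical names, shown sequentially rather than truly concurrently for clarity):

```java
public class StaticFlagDemo {
    // Mimics Predict.flag_predict_probability: one field shared JVM-wide.
    static boolean flagPredictProbability = false;

    // Each job "configures itself" by mutating the shared field.
    public static boolean runJob(boolean wantProbability) {
        flagPredictProbability = wantProbability;
        // ... a concurrent job may overwrite the flag right here ...
        return flagPredictProbability;
    }

    public static void main(String[] args) {
        runJob(true);   // job A asks for probabilities
        runJob(false);  // job B flips the shared flag
        // Any later read by job A now sees job B's setting:
        System.out.println(flagPredictProbability); // false, even for job A
    }
}
```

Passing the flag as a method argument (or holding it in a per-job object) removes the shared state entirely.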

NPE when setting -wi parameter

    java -cp /home/pebble/.m2/repository/de/bwaldvogel/liblinear/1.8/liblinear-1.8.jar de.bwaldvogel.liblinear.Train -v 10 -c 10 -w1 2 vector
    Exception in thread "main" java.lang.NullPointerException
        at java.lang.System.arraycopy(Native Method)
        at de.bwaldvogel.liblinear.Train.parse_command_line(Train.java:123)
        at de.bwaldvogel.liblinear.Train.run(Train.java:287)
        at de.bwaldvogel.liblinear.Train.main(Train.java:19)

whereas the native liblinear implementation behaves as expected

Bias term is added by default

The following code will result in a Model object where model.nr_feature = n - 1, even if the number of features in the dataset is n excluding the bias term.

problem.l = l
problem.n = n
problem.x = x
problem.y = y
...
Model model = Linear.train(problem, parameter);

This is because in the above code the bias term is added implicitly (the default value for problem.bias is 0), and Linear.train() contains the line if (prob.bias >= 0) model.nr_feature = n - 1;. To avoid this we can change the line problem.n = n to problem.n = n + 1. But the Java API documentation is misleading, and the CLI documentation says the default value for bias is -1, which makes one think that is also the case for programmatic access.
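In other words, with the current behaviour, constructing a Problem by hand with a bias means three coordinated steps: set problem.bias to the bias value, append a synthetic node with index n + 1 and value bias to every row, and set problem.n = n + 1. A sketch of the appending step on a plain index/value representation (not the library's FeatureNode type, just illustrative):

```java
import java.util.Arrays;

public class AppendBiasDemo {
    // Sparse row as parallel index/value arrays (1-based feature indices).
    public static int[] appendBiasIndex(int[] indices, int n) {
        int[] out = Arrays.copyOf(indices, indices.length + 1);
        out[indices.length] = n + 1; // synthetic bias feature gets index n+1
        return out;
    }

    public static double[] appendBiasValue(double[] values, double bias) {
        double[] out = Arrays.copyOf(values, values.length + 1);
        out[values.length] = bias;
        return out;
    }

    public static void main(String[] args) {
        int n = 100; // original feature count; problem.n must become 101
        int[] idx = {3, 17, 100};
        double[] val = {0.5, 1.0, 2.0};
        System.out.println(Arrays.toString(appendBiasIndex(idx, n)));
        System.out.println(Arrays.toString(appendBiasValue(val, 1.0)));
    }
}
```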

Performance improvements

This would slightly differ from the original implementation, however, if we replace all instances of

x += y * z;

with

x = Math.fma(y, z, x);

we can get a substantial speedup. For example, replacing all occurrences in just the de.bwaldvogel.liblinear.SparseOperator class gives a 2x speedup on machines with FMA enabled.

However, it would require switching to Java 9 or later.
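As a concrete instance, the sparse dot product could be rewritten as follows (a sketch, not the library's SparseOperator code; Math.fma needs Java 9+ and only pays off on CPUs with a hardware FMA instruction, otherwise it can fall back to a slow software path):

```java
public class FmaDotDemo {
    // Sparse dot product w.x, accumulating with fused multiply-add.
    public static double sparseDot(double[] w, int[] indices, double[] values) {
        double sum = 0.0;
        for (int i = 0; i < indices.length; i++) {
            // One rounding step per term instead of two; also lets the JIT
            // emit hardware FMA instructions where available.
            sum = Math.fma(values[i], w[indices[i]], sum);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] w = {0.5, -1.0, 0.25, 2.0};
        // 0.5*2.0 + 2.0*1.5 = 4.0
        System.out.println(sparseDot(w, new int[]{0, 3}, new double[]{2.0, 1.5}));
    }
}
```

Note the single-rounding semantics mean results can differ in the last bit from the plain x += y * z version, which is the divergence from the original implementation mentioned above.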

NullPointerException when using sparse data to train Model

Hello everyone,

I am currently working on a project that uses your library to train classification models. I build a Problem object with sparse feature vectors (following some of your examples), and when I start training with such a Problem object I get an NPE from the train method while it iterates over the feature vectors.
Looking at your code, features are iterated in a for-each style, but when I follow the exception with a debugger I see a "for i = 0 to n" loop, which will obviously end up pointing at a null feature in a sparse feature vector.

I am on Mac OS X 11.2.3 with Oracle JDK 11.0.6.

Linear's global RNG makes it difficult to reproduce models or track concurrent executions

We use liblinear-java in Tribuo, and it’s working very well. We’re adding a reproducibility package to Tribuo to rebuild Tribuo models from the provenance metadata they carry, and as part of the tests for that package I noticed that liblinear-java has a global RNG that causes some of the algorithms to not produce bit-wise exact reproductions when executed on the same inputs. In general Tribuo tracks all RNG state and manages it to ensure that concurrent training runs use independently tracked streams of random numbers for provenance purposes, and the global shared state in liblinear-java means we can’t effectively track it and so we’ll have to enforce sequential use of liblinear-java via synchronization and consistently reset the RNG to a known state.

Is it possible to move the static random instance in Linear into Problem as an instance field? To preserve the original behaviour it could initialize itself to a Random instance using DEFAULT_RANDOM_SEED, or the code could be modified so it defaults to the global RNG if no instance RNG is present in the Problem. The first option would basically just be a find/replace on random with prob.random, along with adding the extra field to Problem (I think it touches approximately 9 lines). The second option would be a little more involved as it requires guards on the 8 uses of random and thus would slightly increase divergence from the C++ liblinear, so might not be as desirable from a maintainability perspective. However it would preserve the existing behaviour exactly for users who don’t set the random field on Problem. We’d be happy to contribute either patch if you’d accept it.
