
liblinear-java's People

Contributors

bwaldvogel, cheusov, electrum, kzn, numb3r3, salimm, tandronicus, vbogach


liblinear-java's Issues

Setting

In programmatic usage of the library, it is not possible to set the option for predicting probabilities (the -b option on the command line, which sets flag_predict_probability in Predict.java).
I think you should add a new field (say, flag_predict_probability) to the Parameter class and handle the option there, so that one can predict with probabilities.
Daniel
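For what it's worth, the Linear class does expose predictProbability(Model, Feature[], double[]) for probability-capable solvers such as L2R_LR, so probabilities can be obtained programmatically even without a flag on Parameter. As background for what that call computes: in binary logistic regression the probability is the sigmoid of the decision value. A pure-Java sketch with made-up weights (illustrative only, not the library's code):

```java
public class SigmoidDemo {
    // Binary logistic regression: P(y = +1 | x) = 1 / (1 + exp(-w.x))
    public static double sigmoid(double decisionValue) {
        return 1.0 / (1.0 + Math.exp(-decisionValue));
    }

    public static void main(String[] args) {
        double[] w = {0.5, -1.2}; // made-up weights
        double[] x = {2.0, 1.0};  // made-up dense instance
        double dec = 0.0;
        for (int i = 0; i < w.length; i++) {
            dec += w[i] * x[i];
        }
        // dec = 0.5*2.0 + (-1.2)*1.0 = -0.2
        System.out.println(sigmoid(dec)); // about 0.45
    }
}
```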

Different Results With the Same Experiment

I get slightly different results by running the same experiment (LogReg L1, reg=0.3) each time. Is that possible or must there be a bug either with the library or with my code?

This is not the case with LogReg L2: there I get exactly the same results every time. I am also testing the other LogReg L1 implementations (StanfordNLP and Smile); both produce deterministic results.
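This is expected rather than a bug: liblinear's L1-regularized coordinate-descent solvers visit the training instances in a randomly shuffled order, while the L2R_LR trust-region solver is deterministic, which matches what you observe. liblinear-java holds that RNG as static state in the Linear class, and (depending on your version) provides a way to reset it to a fixed seed, e.g. Linear.resetRandom(), before each training run. The underlying principle in plain Java:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SeededShuffleDemo {
    // Shuffle a copy of the list with an explicitly seeded RNG.
    public static List<Integer> shuffled(List<Integer> data, long seed) {
        List<Integer> copy = new ArrayList<>(data);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        // Same seed -> identical visiting order on every run. Resetting
        // liblinear's RNG before each training call has the same effect.
        System.out.println(shuffled(data, 42L).equals(shuffled(data, 42L)));
    }
}
```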

How to obtain the support vectors

Hi --

I am using Weka's LibLINEAR class and want to obtain the support vectors after the classifier has been trained.

Is there an example showing how this can be done?

Thanks,
Haimonti

Linear.loadModel(Reader) should not close the reader

It should be the client's responsibility to decide whether or not to close the reader passed to Linear.loadModel(Reader).

Reason: I'm storing the model in a zip file with other information. If Linear.loadModel(Reader) closes the reader, this will close the zip input stream, which breaks my code.
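Until the library changes, one workaround is to hand loadModel a wrapper whose close() is a no-op, so the underlying zip stream survives. A sketch using only the JDK:

```java
import java.io.FilterReader;
import java.io.Reader;
import java.io.StringReader;

public class NonClosingReader extends FilterReader {

    public NonClosingReader(Reader in) {
        super(in);
    }

    @Override
    public void close() {
        // Intentionally do nothing: the caller stays responsible for
        // closing the underlying reader (e.g. a zip entry stream).
    }

    public static void main(String[] args) throws Exception {
        StringReader underlying = new StringReader("model data");
        Reader r = new NonClosingReader(underlying);
        // A call like Linear.loadModel(r) would go here; even if it
        // closes r, the underlying reader stays usable:
        r.close();
        System.out.println((char) underlying.read()); // 'm'
    }
}
```

Pass new NonClosingReader(zipReader) to Linear.loadModel(...) and close the zip stream yourself afterwards.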

Is a bias term added to the features automatically?

I am maintaining code that I mostly didn't write, which uses liblinear for logistic regression. My understanding from the documentation was that setting bias to a value greater than 0 results in a synthetic feature being added, but I cannot see anywhere in the code where this feature is added, either during training or prediction. Is it required to both set the bias parameter and also manually add the synthetic feature node during training and prediction?

Bias parameter not used in Linear.predictValues()

I am using a logistic regression model with 2 features and a bias.

I would expect the score to be calculated as

w1*x1 + w2*x2 + bias

but looking at

dec_values[i] += w[(idx - 1) * nr_w + i] * lx.getValue();
it seems like the bias parameter is never added to dec_values.

Am I right to think the bias parameter should contribute to the score or is my understanding incorrect?
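For reference (worth verifying against your model file): when the model was trained with bias >= 0, the bias is represented as one extra synthetic feature with index n + 1 and value bias, so its weight sits at the end of w and is accumulated by the same loop rather than being a separate additive term. Whether your Feature[] input includes that synthetic node is the thing to check. A pure-Java sketch of the arithmetic with made-up numbers:

```java
public class BiasScoreDemo {
    // Decision value when the bias enters as a synthetic last feature:
    // score = w1*x1 + w2*x2 + w_bias * bias
    public static double score(double[] w, double[] x, double bias) {
        double dec = 0.0;
        for (int i = 0; i < x.length; i++) {
            dec += w[i] * x[i];
        }
        if (bias >= 0) {
            dec += w[x.length] * bias; // weight of the synthetic bias feature
        }
        return dec;
    }

    public static void main(String[] args) {
        double[] w = {0.5, -1.0, 0.25}; // last entry: learned bias weight (made up)
        double[] x = {2.0, 1.0};
        System.out.println(score(w, x, 1.0)); // 0.5*2 - 1.0*1 + 0.25*1 = 0.25
    }
}
```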

Documentation For Java API

Hi,

This is a great library. I've been looking for a lightweight, reliable, and commercially friendly Java implementation of LogReg, MaxEnt and SVM for a while, and surprisingly it wasn't as easy as I thought. Could you provide some Java code examples for basic usage of the API? I figured out the following, but I wonder if there is more to know.

        Problem problem = Train.readProblem(new File("train.libsvm"), 1);
        Problem testProb = Train.readProblem(new File("test.libsvm"), 1);

        SolverType solver = SolverType.L2R_LR; // -s 0
        double C = 1.5;    // cost of constraints violation
        double eps = 0.01; // stopping criteria

        Parameter parameter = new Parameter(solver, C, eps);
        final Model model = Linear.train(problem, parameter);
        File modelFile = new File("model");
        model.save(modelFile);
        for (int i = 0; i < testProb.x.length; i++) {
            Feature[] instance = testProb.x[i];
            double prediction = Linear.predict(model, instance);
        }

How to get a per-class score?

Hi, I'm using this library and it works well, but I have a question: how can I get, for each class, the score that an instance belongs to that class? In my data there are some texts that belong to no class at all, and I'd like to apply a threshold to these scores to filter such texts out.
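liblinear-java exposes raw per-class decision values via Linear.predictValues(Model, Feature[], double[]), and calibrated probabilities via Linear.predictProbability for logistic regression solvers. A rejection threshold on those scores can then filter out instances that belong to no known class. A minimal pure-Java sketch of the thresholding step, with made-up scores:

```java
public class RejectOptionDemo {
    // Return the arg-max class, or -1 when even the best score is below
    // the threshold (i.e. the instance belongs to no known class).
    public static int predictWithReject(double[] scores, double threshold) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return scores[best] >= threshold ? best : -1;
    }

    public static void main(String[] args) {
        double[] scores = {0.12, 0.80, 0.08}; // e.g. from predictProbability
        System.out.println(predictWithReject(scores, 0.5)); // class 1
        System.out.println(predictWithReject(scores, 0.9)); // -1: rejected
    }
}
```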

predictValues() does not map labels

predictValues() and therefore predictProbability() fill the score array with scores according to the internal label representation, without being mapped by Model.label[].

First, Model.label[] is private, so it's not possible to infer the mapping from the application, and the number of labels in the internal representation might be less than the actual number of labels due to labels not seen in training. Second, the size of the scores[] array that must be provided to predictValues() is impossible to guess.

Would it be possible to either expose the mapping or map the scores array?

Affects 32c64ff.

No option for no bias feature for Train.readProblem()

I have two options to map my Dataset object to a liblinear Problem object.

1) Convert programmatically. This is what I do normally. I also add a bias=1 feature with this method, and it works just fine.

2) First save the Dataset object as .libsvm and then call Train.readProblem(). However, readProblem() adds a bias feature by default: even if I pass bias=0, an extra feature is added to all instances. This is why I can't reproduce exactly the accuracy I get with the first method; I get slightly worse results.

Here you may recommend not adding the bias value while saving as .libsvm, since it is already going to be added by readProblem(). But I create training and test sets separately, producing two libsvm files. Imagine the training libsvm file has 100 features, while in the test libsvm file no instance has the 100th feature. When I call readProblem() on both, the test problem's dimension is 99, not 100. To prevent this I add the bias feature as the last feature by default, so that both sets have 101 features.

In conclusion, I think the bias feature must be optional for the Train.readProblem() method.

Assertions invalid

Some of the assert statements in the code are not valid. Specifically, in Train.java on lines 269 and 272, the assertions can throw an ArrayIndexOutOfBoundsException if the libsvm file contains rows without any features. Once I turned off assertions in my unit tests, everything worked fine, which suggests the rest of the code handles rows without features correctly.
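Replacing the assertions with explicit checks would make feature-less rows a handled case rather than behaviour that changes with the -ea flag. A sketch of what such guards could look like when splitting a libsvm line (hypothetical helper for illustration, not the actual Train.java code):

```java
public class LibsvmRowParser {
    // Parse one libsvm-format line: "label idx:val idx:val ...".
    // A line may legitimately contain a label and no features.
    public static double parseLabel(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length == 0 || tokens[0].isEmpty()) {
            throw new IllegalArgumentException("empty line");
        }
        return Double.parseDouble(tokens[0]);
    }

    public static int featureCount(String line) {
        String[] tokens = line.trim().split("\\s+");
        // Guard instead of assert: zero features is valid input.
        return Math.max(0, tokens.length - 1);
    }

    public static void main(String[] args) {
        System.out.println(parseLabel("1"));         // label-only row: 1.0
        System.out.println(featureCount("1"));       // 0 features
        System.out.println(featureCount("1 3:0.5")); // 1 feature
    }
}
```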

Implement init_sol setter in Parameter

There is currently a field init_sol in the Parameter class that is used to initialize the model weights. However, there is no setter for this field. Could you implement one?

Thread safety problem in predict: flag_predict_probability shouldn't be static

The flag_predict_probability boolean in predict is declared static. This causes a problem when running two different predict jobs in the same Java process with different probability options: the setting from the second call overwrites the one from the first. This may cause a prediction using a non-probability-capable solver type to fail, despite being called with correct parameters, if a simultaneous job runs with probabilities enabled.
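The hazard is the usual one with static mutable configuration: every caller in the JVM shares one field, so the fix is to thread the option through as a per-call parameter or per-job object. A minimal sketch of the failure mode (hypothetical names, shown sequentially rather than truly concurrently for clarity):

```java
public class StaticFlagDemo {
    // Mimics Predict.flag_predict_probability: one field shared JVM-wide.
    static boolean flagPredictProbability = false;

    // Each job "configures itself" by mutating the shared field.
    public static boolean runJob(boolean wantProbability) {
        flagPredictProbability = wantProbability;
        // ... a concurrent job may overwrite the flag right here ...
        return flagPredictProbability;
    }

    public static void main(String[] args) {
        runJob(true);   // job A asks for probabilities
        runJob(false);  // job B flips the shared flag
        // Any later read by job A now sees job B's setting:
        System.out.println(flagPredictProbability); // false, even for job A
    }
}
```

Passing the flag as a method argument (or holding it in a per-job object) removes the shared state entirely.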

NPE when setting -wi parameter

    java -cp /home/pebble/.m2/repository/de/bwaldvogel/liblinear/1.8/liblinear-1.8.jar de.bwaldvogel.liblinear.Train -v 10 -c 10 -w1 2 vector
    Exception in thread "main" java.lang.NullPointerException
        at java.lang.System.arraycopy(Native Method)
        at de.bwaldvogel.liblinear.Train.parse_command_line(Train.java:123)
        at de.bwaldvogel.liblinear.Train.run(Train.java:287)
        at de.bwaldvogel.liblinear.Train.main(Train.java:19)

whereas the native liblinear implementation behaves as expected

Bias term is added by default

The following code will result in a Model object where model.nr_feature = n - 1, even if the number of features in the dataset is n excluding the bias term.

problem.l = l
problem.n = n
problem.x = x
problem.y = y
...
Model model = Linear.train(problem, parameter);

This is because in the above code the bias term is added implicitly (the default value for problem.bias is 0), and Linear.train() contains the line if (prob.bias >= 0) model.nr_feature = n - 1;. To avoid this we can change the line problem.n = n to problem.n = n + 1. But the Java API documentation is misleading, and the CLI documentation says the default value for bias is -1, which makes one think that is also the case for programmatic access.
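In other words, with the current behaviour, constructing a Problem by hand with a bias means three coordinated steps: set problem.bias to the bias value, append a synthetic node with index n + 1 and value bias to every row, and set problem.n = n + 1. A sketch of the appending step on a plain index/value representation (not the library's FeatureNode type, just illustrative):

```java
import java.util.Arrays;

public class AppendBiasDemo {
    // Sparse row as parallel index/value arrays (1-based feature indices).
    public static int[] appendBiasIndex(int[] indices, int n) {
        int[] out = Arrays.copyOf(indices, indices.length + 1);
        out[indices.length] = n + 1; // synthetic bias feature gets index n+1
        return out;
    }

    public static double[] appendBiasValue(double[] values, double bias) {
        double[] out = Arrays.copyOf(values, values.length + 1);
        out[values.length] = bias;
        return out;
    }

    public static void main(String[] args) {
        int n = 100; // original feature count; problem.n must become 101
        int[] idx = {3, 17, 100};
        double[] val = {0.5, 1.0, 2.0};
        System.out.println(Arrays.toString(appendBiasIndex(idx, n)));
        System.out.println(Arrays.toString(appendBiasValue(val, 1.0)));
    }
}
```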

Performance improvements

This would slightly differ from the original implementation, however, if we replace all instances of

x += y * z;

with

x = Math.fma(y, z, x);

we can get a substantial speedup. For example, replacing all occurrences in just the de.bwaldvogel.liblinear.SparseOperator class gives a 2x speedup on machines with FMA enabled.

However, it would require switching to Java 9 or later.
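As a concrete instance, the sparse dot product could be rewritten as follows (a sketch, not the library's SparseOperator code; Math.fma needs Java 9+ and only pays off on CPUs with a hardware FMA instruction, otherwise it can fall back to a slow software path):

```java
public class FmaDotDemo {
    // Sparse dot product w.x, accumulating with fused multiply-add.
    public static double sparseDot(double[] w, int[] indices, double[] values) {
        double sum = 0.0;
        for (int i = 0; i < indices.length; i++) {
            // One rounding step per term instead of two; also lets the JIT
            // emit hardware FMA instructions where available.
            sum = Math.fma(values[i], w[indices[i]], sum);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] w = {0.5, -1.0, 0.25, 2.0};
        // 0.5*2.0 + 2.0*1.5 = 4.0
        System.out.println(sparseDot(w, new int[]{0, 3}, new double[]{2.0, 1.5}));
    }
}
```

Note the single-rounding semantics mean results can differ in the last bit from the plain x += y * z version, which is the divergence from the original implementation mentioned above.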

NullPointerException when using sparse data to train Model

Hello everyone,

I am currently working on a project that uses your library to train classification models. I build a Problem object with sparse feature vectors (following some of your examples), and when I start training with such a Problem object I get an NPE from the train method while it iterates over the feature vectors.
Looking at your code, features are iterated in a for-each style, but when I follow the exception with a debugger I see a "for i = 0 to n" loop, which will obviously end up pointing at a null feature in a sparse feature vector.

I am on Mac OS X 11.2.3 with Oracle JDK 11.0.6.

Linear's global RNG makes it difficult to reproduce models or track concurrent executions

We use liblinear-java in Tribuo, and it’s working very well. We’re adding a reproducibility package to Tribuo to rebuild Tribuo models from the provenance metadata they carry, and as part of the tests for that package I noticed that liblinear-java has a global RNG that causes some of the algorithms to not produce bit-wise exact reproductions when executed on the same inputs. In general Tribuo tracks all RNG state and manages it to ensure that concurrent training runs use independently tracked streams of random numbers for provenance purposes, and the global shared state in liblinear-java means we can’t effectively track it and so we’ll have to enforce sequential use of liblinear-java via synchronization and consistently reset the RNG to a known state.

Is it possible to move the static random instance in Linear into Problem as an instance field? To preserve the original behaviour it could initialize itself to a Random instance using DEFAULT_RANDOM_SEED, or the code could be modified so it defaults to the global RNG if no instance RNG is present in the Problem. The first option would basically just be a find/replace on random with prob.random, along with adding the extra field to Problem (I think it touches approximately 9 lines). The second option would be a little more involved as it requires guards on the 8 uses of random and thus would slightly increase divergence from the C++ liblinear, so might not be as desirable from a maintainability perspective. However it would preserve the existing behaviour exactly for users who don’t set the random field on Problem. We’d be happy to contribute either patch if you’d accept it.
