bwaldvogel / liblinear-java
Java version of LIBLINEAR
Home Page: https://liblinear.bwaldvogel.de
License: BSD 3-Clause "New" or "Revised" License
Any plans to port the new version of liblinear? I saw on the liblinear homepage:
Version 1.91 released on April 26, 2012
http://www.csie.ntu.edu.tw/~cjlin/liblinear/
P.S. Thanks for making a Java version of liblinear available!
In programmatic usage of the code, it is not possible to set the option for predicting probabilities (the -b option on the command line, which sets flag_predict_probability in Predict.java).
I think you should add a new field (say, flag_predict_probability) to the Parameter class and set the option there, so that one can predict with probabilities.
Daniel
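For what it's worth, the library does already expose a programmatic probability entry point in Linear; what seems to be missing is only the Parameter-level flag. A minimal sketch, assuming a model trained with a probability-capable solver such as L2R_LR:

```java
import de.bwaldvogel.liblinear.Feature;
import de.bwaldvogel.liblinear.Linear;
import de.bwaldvogel.liblinear.Model;

public class ProbabilitySketch {
    // Predict with probabilities programmatically, bypassing Predict.java.
    // 'model' must have been trained with a probability-capable solver (e.g. L2R_LR).
    static double[] predictWithProbabilities(Model model, Feature[] instance) {
        double[] probabilities = new double[model.getNrClass()];
        // Returns the predicted label and fills 'probabilities' with one
        // entry per class, ordered according to model.getLabels().
        double label = Linear.predictProbability(model, instance, probabilities);
        System.out.println("predicted label: " + label);
        return probabilities;
    }
}
```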
I get slightly different results when running the same experiment (LogReg L1, reg=0.3) each time. Is that expected, or must there be a bug either in the library or in my code?
This is not the case with LogReg L2: there I get exactly the same results. I am also testing other LogReg L1 implementations (StanfordNLP and Smile); both are deterministic.
Hello, I was looking at cjlin1/libsvm#113, and it mentioned that liblinear would be able to handle this. I have since seen that incremental training requires the following extension: https://www.csie.ntu.edu.tw/~cjlin/papers/ws/
Does liblinear-java have this built in? I am working with a relatively large dataset and can't really load all of it into memory at once.
Hi --
I am using Weka's LibLINEAR class and want to obtain the support vectors after the classifier has been trained.
Is there an example showing how this can be done?
Thanks,
Haimonti
It should be the client's responsibility to decide whether or not to close the reader in Linear.loadModel(Reader).
Reason: I store the model in a zip file together with other information. If Linear.loadModel(Reader) closes the reader, this also closes the zip input stream, which breaks my code.
I am maintaining code, mostly written by others, that uses liblinear for logistic regression. My understanding from the documentation was that setting bias to a value greater than 0 results in a synthetic feature being added, but I cannot see anywhere in the code where this feature is added, either during training or prediction. Is it required to both set the bias parameter to a value greater than 0 and also manually add the synthetic feature node during training and prediction?
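For reference, in the C liblinear the synthetic feature is appended by the input-reading code when bias >= 0. When building a Problem programmatically, a common pattern is to append the node yourself; a hedged sketch (the index arithmetic is my reading of the format, not verified against every version):

```java
import de.bwaldvogel.liblinear.Feature;
import de.bwaldvogel.liblinear.FeatureNode;

public class BiasSketch {
    // Append the synthetic bias node to an instance. 'n' is the feature
    // dimension WITHOUT the bias; the bias node gets index n + 1 and the
    // value of the bias parameter itself.
    static Feature[] withBias(Feature[] instance, int n, double bias) {
        Feature[] out = new Feature[instance.length + 1];
        System.arraycopy(instance, 0, out, 0, instance.length);
        out[instance.length] = new FeatureNode(n + 1, bias);
        return out;
    }
}
```

If you do this, problem.bias should be set to the same bias value and problem.n to n + 1, so training and prediction agree on the dimensionality.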
There currently seems to be no way to suppress the console output produced in Tron.java.
It would be nice to be able to deactivate it.
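If it helps, newer liblinear-java releases expose a static switch for this debug output (my recollection of the API; worth verifying against the version you use):

```java
import de.bwaldvogel.liblinear.Linear;

public class QuietSketch {
    public static void main(String[] args) {
        // Silences the iteration log printed during training (including Tron's).
        Linear.disableDebugOutput();
        // ... train as usual ...
    }
}
```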
I am using a logistic regression model with 2 features and a bias. I would expect the score to be calculated as
w1*x1 + w2*x2 + bias
but looking at the code, it seems like the bias parameter is never added to dec_values. Am I right to think the bias parameter should contribute to the score, or is my understanding incorrect?
Currently, we have to create a file on the disk to be able to train the model. It would be nice if we could create and pass the dataset in memory.
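Note that the Problem object itself can be populated entirely in memory; only Train.readProblem() needs a file. A sketch with made-up toy data:

```java
import de.bwaldvogel.liblinear.*;

public class InMemorySketch {
    public static void main(String[] args) {
        Problem problem = new Problem();
        problem.l = 2;      // number of training instances
        problem.n = 3;      // number of features
        problem.bias = -1;  // no bias term
        problem.x = new Feature[][] {
            { new FeatureNode(1, 0.5), new FeatureNode(3, 1.0) },  // sparse rows
            { new FeatureNode(2, 2.0) },
        };
        problem.y = new double[] { 1, -1 };  // labels

        Parameter parameter = new Parameter(SolverType.L2R_LR, 1.0, 0.01);
        Model model = Linear.train(problem, parameter);
    }
}
```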
Hi,
This is a great library. I've been looking for a lightweight, reliable and commercially friendly Java implementation of LogReg, MaxEnt and SVM for a while, and surprisingly it wasn't as easy as I thought. Could you provide some Java code examples for the basic usage of the API? I figured out the following, but I wonder if there is more to know.
Problem problem = Train.readProblem(new File("train.libsvm"), 1);
Problem testProb = Train.readProblem(new File("test.libsvm"), 1);
SolverType solver = SolverType.L2R_LR; // -s 0
double C = 1.5; // cost of constraints violation
double eps = 0.01; // stopping criteria
Parameter parameter = new Parameter(solver, C, eps);
final Model model = Linear.train(problem, parameter);
File modelFile = new File("model");
model.save(modelFile);
for (int i = 0; i < testProb.x.length; i++) {
    Feature[] instance = testProb.x[i];
    double prediction = Linear.predict(model, instance);
}
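A couple of other entry points may be worth knowing (based on my reading of the API; verify against your version): loading a saved model back, and built-in cross-validation:

```java
import de.bwaldvogel.liblinear.*;
import java.io.File;

public class MoreSketch {
    public static void main(String[] args) throws Exception {
        // Load a previously saved model back from disk.
        Model model = Model.load(new File("model"));

        // k-fold cross-validation fills 'target' with one prediction per instance.
        Problem problem = Train.readProblem(new File("train.libsvm"), 1);
        double[] target = new double[problem.l];
        Parameter parameter = new Parameter(SolverType.L2R_LR, 1.5, 0.01);
        Linear.crossValidation(problem, parameter, 5, target);
    }
}
```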
Hello, I am using this library and it works well, but I have a question: how can I get, for each class, the score of an instance belonging to that class? Some of the texts in my data belong to no class at all, and I would like to apply a threshold to these scores to filter them out.
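Per-class decision values can be obtained with Linear.predictValues(); a sketch, assuming a trained model and a version where Model.getLabels() is public (it maps score positions to labels):

```java
import de.bwaldvogel.liblinear.Feature;
import de.bwaldvogel.liblinear.Linear;
import de.bwaldvogel.liblinear.Model;

public class ScoreSketch {
    static void printScores(Model model, Feature[] instance) {
        double[] decValues = new double[model.getNrClass()];
        double predicted = Linear.predictValues(model, instance, decValues);
        int[] labels = model.getLabels();  // maps score index -> label
        for (int i = 0; i < labels.length; i++) {
            System.out.println("label " + labels[i] + ": " + decValues[i]);
        }
        // A rejection threshold can then be applied to the best score.
    }
}
```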
predictValues() and therefore predictProbability() fill the score array with scores according to the internal label representation, without being mapped by Model.label[].
First, Model.label[] is private, so it's not possible to infer the mapping from the application, and the number of labels in the internal representation might be less than the actual number of labels, due to labels not seen in training. Furthermore, the size of the scores[] array provided to predictValues() is impossible to guess.
Would it be possible to either expose the mapping or map the scores array?
Affects 32c64ff.
I have two options to map my Dataset object to a liblinear Problem object.
1) Convert programmatically. This is what I normally do. I also add a bias=1 feature with this method, and it works just fine.
2) First save the Dataset object as .libsvm and then call Train.readProblem(). However, readProblem() adds a bias feature by default. Even if I pass bias=0, an extra feature is added to all instances. This is why I can't reproduce the exact accuracy I get with the first method; I get slightly worse results.
Here you may recommend not adding the bias value while saving as .libsvm, since it is going to be added by readProblem() anyway. But I create training and test sets separately, producing two libsvm files. Imagine a situation where the training libsvm file has 100 features and no instance in the test libsvm file has the 100th feature. When I call readProblem() to read the training and test problems, the test problem's dimension is 99, not 100. To prevent this I add the bias feature as the last feature by default, so that both sets have 101 features.
In conclusion, I think the bias feature must be optional in the Train.readProblem() method.
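As a possible workaround under the current behaviour: readProblem() takes the bias as its second argument, and in the C liblinear convention a negative bias disables the synthetic column entirely. So passing -1 may already give a bias-free reading (worth verifying, since this report concerns bias=0, which is non-negative and thus still triggers the extra column):

```java
import de.bwaldvogel.liblinear.Problem;
import de.bwaldvogel.liblinear.Train;
import java.io.File;

public class ReadSketch {
    public static void main(String[] args) throws Exception {
        // bias < 0 should follow the C convention of "no bias term",
        // so no extra feature column is appended to either set.
        Problem train = Train.readProblem(new File("train.libsvm"), -1);
        Problem test  = Train.readProblem(new File("test.libsvm"), -1);
    }
}
```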
Some of the assert statements in the code are not valid. Specifically, the assertions on lines 269 and 272 of Train.java can throw ArrayIndexOutOfBoundsException if the libsvm file contains rows without any features. Once I turned off assertions in my unit tests, everything worked fine, so the rest of the code apparently handles rows without features correctly.
There is currently a field init_sol in Parameter class that is used to initialize model weights. However there is no setter for this field. Could you implement the setter?
The flag_predict_probability boolean in Predict is declared static. This causes a problem when running two different predict jobs with different probability options in the same Java process: the setting from the second call overwrites the one from the first. As a result, a prediction using a non-probability-capable solver type may fail, despite being called with correct parameters, if a simultaneous job runs with probabilities enabled.
Hi, could you help me with this problem:
how can I get the separating hyperplane parameters, i.e. the separating hyperplane equation?
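For a binary linear model the hyperplane is w·x + b = 0, where w comes from the model's weight array; a sketch (assuming a two-class model and that bias was enabled at training time, so the last weight is the bias weight):

```java
import de.bwaldvogel.liblinear.Model;

public class HyperplaneSketch {
    static void printHyperplane(Model model) {
        double[] w = model.getFeatureWeights();  // flat weight array
        int n = model.getNrFeature();            // features excluding bias
        for (int i = 0; i < n; i++) {
            System.out.println("w[" + (i + 1) + "] = " + w[i]);
        }
        if (model.getBias() >= 0) {
            // With bias enabled, the extra trailing weight times the bias
            // value gives the intercept term b.
            System.out.println("b = " + w[n] * model.getBias());
        }
    }
}
```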
java -cp /home/pebble/.m2/repository/de/bwaldvogel/liblinear/1.8/liblinear-1.8.jar de.bwaldvogel.liblinear.Train -v 10 -c 10 -w1 2 vector
Exception in thread "main" java.lang.NullPointerException
at java.lang.System.arraycopy(Native Method)
at de.bwaldvogel.liblinear.Train.parse_command_line(Train.java:123)
at de.bwaldvogel.liblinear.Train.run(Train.java:287)
at de.bwaldvogel.liblinear.Train.main(Train.java:19)
whereas the native liblinear implementation behaves as expected
The following code will result in a Model object where model.nr_feature = n-1, even if the number of features in the dataset is n excluding the bias term.
problem.l = l;
problem.n = n;
problem.x = x;
problem.y = y;
...
Model model = Linear.train(problem, parameter);
This is because in the above code the bias term is added implicitly (the default value for problem.bias is 0), and in Linear.train() there is the line if (prob.bias >= 0) model.nr_feature = n - 1;. To avoid this we can change the line problem.n = n to problem.n = n + 1. But the Java API documentation is misleading here, and the CLI documentation says the default value for bias is -1, which makes one think that this is also the case for programmatic access.
liblinear has problems reading libsvm-formatted files that use index 0.
Is there a reason for this or is it just a bug?
If it is a bug could you change line 307 in the Train class to
int indexBefore = -1;
This would differ slightly from the original implementation. However, if we replace all instances of
x += y * z;
with
x = Math.fma(y, z, x);
we can get a huge speedup. For example, replacing all occurrences in just the de.bwaldvogel.liblinear.SparseOperator class yields a 2x speedup on machines with FMA enabled.
However, it would require switching to Java 9.
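For concreteness, Math.fma (Java 9+) computes y*z + x with a single rounding step, which the JIT can compile to a hardware FMA instruction where available; a minimal dot-product comparison:

```java
public class FmaSketch {
    // Plain dot product: one multiply and one add per element, two roundings each.
    static double dot(double[] a, double[] b) {
        double acc = 0.0;
        for (int i = 0; i < a.length; i++) acc += a[i] * b[i];
        return acc;
    }

    // FMA dot product: fused multiply-add, one rounding per element.
    static double dotFma(double[] a, double[] b) {
        double acc = 0.0;
        for (int i = 0; i < a.length; i++) acc = Math.fma(a[i], b[i], acc);
        return acc;
    }

    public static void main(String[] args) {
        double[] a = { 1.0, 2.0, 3.0 };
        double[] b = { 4.0, 5.0, 6.0 };
        System.out.println(dot(a, b));     // 32.0
        System.out.println(dotFma(a, b));  // 32.0 (exact for these inputs)
    }
}
```

One caveat: on hardware without FMA support, Math.fma falls back to a slow correctly-rounded software path, so the speedup is conditional on the CPU, as the report notes.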
Hello everyone,
I am currently working on a project that uses your library to train classification models. I build a Problem object with sparse feature vectors (following some of your examples), and when I start training with such a Problem object I get an NPE from the train method while it iterates over the feature vectors.
Looking at your code, I can see the features are iterated in a "for each" style, but when I follow the exception with a debugger I see "for i = 0 to n" logic, which will obviously end up pointing at a null feature in a sparse feature vector.
I am on Mac OS X 11.2.3 with Oracle JDK 11.0.6.
We use liblinear-java in Tribuo, and it’s working very well. We’re adding a reproducibility package to Tribuo to rebuild Tribuo models from the provenance metadata they carry, and as part of the tests for that package I noticed that liblinear-java has a global RNG that causes some of the algorithms to not produce bit-wise exact reproductions when executed on the same inputs. In general Tribuo tracks all RNG state and manages it to ensure that concurrent training runs use independently tracked streams of random numbers for provenance purposes, and the global shared state in liblinear-java means we can’t effectively track it and so we’ll have to enforce sequential use of liblinear-java via synchronization and consistently reset the RNG to a known state.
Is it possible to move the static random instance in Linear into Problem as an instance field? To preserve the original behaviour it could initialize itself to a Random instance using DEFAULT_RANDOM_SEED, or the code could be modified to default to the global RNG if no instance RNG is present in the Problem. The first option would basically just be a find/replace on random with prob.random, along with adding the extra field to Problem (I think it touches approximately 9 lines). The second option would be a little more involved, as it requires guards on the 8 uses of random and thus would slightly increase divergence from the C++ liblinear, so it might not be as desirable from a maintainability perspective. However, it would preserve the existing behaviour exactly for users who don't set the random field on Problem. We'd be happy to contribute either patch if you'd accept it.
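The first option might look roughly like this inside the library (a hypothetical sketch of the proposed field, not an existing API; the seed constant is assumed to match whatever Linear already uses):

```java
import java.util.Random;

// Hypothetical sketch: Problem carries its own RNG instead of Linear's
// static one, so concurrent training runs draw from independent streams.
public class Problem {
    // ... existing fields: l, n, x, y, bias ...

    // Initializing with the library's existing default seed would preserve
    // current behaviour for callers that never touch this field.
    // (Field name and seed constant are this proposal's, not existing API.)
    public Random random = new Random(/* Linear.DEFAULT_RANDOM_SEED */ 0L);
}

// ...and in Linear, each use of the static 'random' would become
// 'prob.random', e.g.:  int j = i + prob.random.nextInt(l - i);
```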