python-libffm's Introduction

libffm

A Python wrapper for the libffm library.

Quick start

git clone git@github.com:turi-code/GraphLab-Create-SDK.git sdk
git clone git@github.com:turi-code/python-libffm.git ffm
cd ffm
make

To run the following examples you will also need to register for GraphLab Create. This software is free for non-commercial use and has a 30-day free trial otherwise.

After that, try running the basic example:

ipython examples/basic.py

If you want to try a less synthetic example, download the 1TB Criteo dataset. First test things out with a small sample of the dataset.

gzip -cd day_0.gz| head -n 1000000 > criteo-sample.tsv
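
If `gzip` and `head` are not available (for example on Windows), the same sampling step can be sketched in Python with the standard library only. The helper name `sample_head` is hypothetical and not part of this package.

```python
import gzip
import itertools


def sample_head(src_path, dst_path, n_lines=1000000):
    """Copy the first n_lines lines of a gzip file to a plain file.

    Hypothetical helper, roughly equivalent to `gzip -cd src | head -n N > dst`.
    Returns the number of lines actually written.
    """
    written = 0
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # islice stops after n_lines, so the rest of the archive is never read.
        for line in itertools.islice(src, n_lines):
            dst.write(line)
            written += 1
    return written
```

Calling `sample_head('day_0.gz', 'criteo-sample.tsv')` should produce the same sample file as the shell command above.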

Next we have a sample script for performing some of the same types of feature engineering that the contest winners have been using:

ipython examples/criteo_process.py

Train an FFM model on this data.

ipython examples/criteo_sample.py

You should see something like the following (which appears to be overfitting in this example):

PROGRESS: iter   tr_logloss   va_logloss
PROGRESS:    0      0.12794      0.12353
PROGRESS:    1      0.10907      0.12636
PROGRESS:    2      0.09263      0.13318
PROGRESS:    3      0.07679      0.14200
PROGRESS:    4      0.06411      0.15130
PROGRESS:    5      0.05484      0.16034
...
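
In the log above, training logloss keeps falling while validation logloss rises after iteration 0, which is the usual sign of overfitting. A minimal sketch of picking the iteration with the lowest validation logloss from such a log follows. The helper `best_iteration` is hypothetical and assumes the `PROGRESS:` line format shown above.

```python
def best_iteration(log_text):
    """Return (iteration, va_logloss) for the row with the lowest validation logloss."""
    rows = []
    for line in log_text.splitlines():
        parts = line.replace("PROGRESS:", "").split()
        # Keep only data rows: iteration number, training loss, validation loss.
        if len(parts) == 3 and parts[0].isdigit():
            rows.append((int(parts[0]), float(parts[1]), float(parts[2])))
    if not rows:
        raise ValueError("no PROGRESS rows found")
    iteration, _, va_logloss = min(rows, key=lambda r: r[2])
    return iteration, va_logloss


log = """\
PROGRESS: iter   tr_logloss   va_logloss
PROGRESS:    0      0.12794      0.12353
PROGRESS:    1      0.10907      0.12636
PROGRESS:    2      0.09263      0.13318
"""
print(best_iteration(log))  # (0, 0.12353)
```

For this sample the best validation score comes from the first iteration, so fewer iterations (`nr_iters`) or stronger regularization (`lam`) may be worth trying.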

Usage

The package makes it easy to train models directly from SFrames.

import graphlab as gl
import ffm

train = gl.SFrame('examples/small.tr.sframe')
test = gl.SFrame('examples/small.te.sframe')

m = ffm.FFM(lam=.1)
m.fit(train, target='y', nr_iters=50)
yhat = m.predict(test)

Each column is interpreted as a separate "field" in the model. Only dict columns are currently supported, where the keys of each dict are integers that represent the feature id.
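
As an illustration of this layout, the sketch below turns raw categorical values into one dict per field per row, with integer feature ids as keys. The hashing scheme, the bucket count, the value `1.0`, and the helper names are all assumptions made for illustration; they are not this package's own preprocessing (see `examples/criteo_process.py` for that).

```python
import zlib

N_BUCKETS = 2 ** 20  # assumed size of the hashed feature space


def feature_id(field, value, n_buckets=N_BUCKETS):
    """Map a (field, value) pair to a stable integer feature id via CRC32."""
    key = "{}={}".format(field, value).encode("utf-8")
    # Mask to an unsigned 32-bit value so the result is the same on Python 2 and 3.
    return (zlib.crc32(key) & 0xFFFFFFFF) % n_buckets


def to_field_dicts(rows, fields):
    """Build one list of {feature_id: 1.0} dicts per field (one dict per row)."""
    columns = {f: [] for f in fields}
    for row in rows:
        for f in fields:
            columns[f].append({feature_id(f, row[f]): 1.0})
    return columns


# Hypothetical raw data with two fields.
rows = [
    {"site": "a.com", "device": "mobile"},
    {"site": "b.com", "device": "mobile"},
]
columns = to_field_dicts(rows, ["site", "device"])
```

Each list in `columns` has the dict-with-integer-keys shape described above, so it could presumably be used as one SFrame column per field alongside a target column.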

Code

  • libffm.cpp: uses C++ macros provided by Turi's SDK to wrap libffm's methods as Python classes and methods.
  • ffm.py: a scikit-learn-style wrapper.
  • lib/: the original library, where cout statements have been replaced with Turi's progress_stream to allow progress printing to Python.
  • examples/: example scripts for training models using the sample data provided with the original package as well as with data similar to Kaggle's criteo competition.

More details

For more on how and why we made this, see the blog post.

License

This package is provided under the 3-clause BSD license.

python-libffm's People

Contributors

biancamo, chrisdubois, timmuss

python-libffm's Issues

Error in Make process: 'No such file or directory'

I am running the quick start procedures on Windows terminal. After I do 'cd ffm' and then 'make', I receive the following error:

'fatal error: parallel/winpthreadsll.h: No such file or directory'

I attached a screenshot of the full output below.

ffmscreen

make error

I got an error like:
../sdk/graphlab/parallel/winpthreadsll.h:50:46: fatal error: cross_platform/windows_wrapper.hpp: No such file or directory

I'm on Windows 10. How can I fix this? Thanks.

compile error from Ubuntu system

I guess I don't have all the dependencies installed. I have installed GraphLab Create, but here is my error:

g++ -o libffm.so -O3 -std=c++11 -I ../sdk -shared -fPIC -march=native -fopenmp src/libffm.cpp lib/ffm.o
src/libffm.cpp: In function 'graphlab::gl_sarray predict_sframe(ffm::ffm_model_, graphlab::gl_sframe, std::string, std::vector<std::basic_string<char> >)':
src/libffm.cpp:53:63: error: call of overloaded 'get_column_index(graphlab::gl_sframe&, std::string&)' is ambiguous
size_t target_col_idx = get_column_index(data, target_column);
^
src/libffm.cpp:53:63: note: candidates are:
src/libffm.cpp:22:8: note: size_t get_column_index(graphlab::gl_sframe, std::string)
size_t get_column_index(gl_sframe sf, string colname) {
^
In file included from src/libffm.cpp:16:0:
../sdk/../ffm/lib/ffm.h:22:8: note: size_t ffm::get_column_index(graphlab::gl_sframe, std::string)
size_t get_column_index(graphlab::gl_sframe sf, std::string colname) {
^
src/libffm.cpp:56:58: error: call of overloaded 'get_column_index(graphlab::gl_sframe&, std::basic_string<char>&)' is ambiguous
feature_col_idxs.push_back(get_column_index(data, col));
^
src/libffm.cpp:56:58: note: candidates are:
src/libffm.cpp:22:8: note: size_t get_column_index(graphlab::gl_sframe, std::string)
size_t get_column_index(gl_sframe sf, string colname) {
^
In file included from src/libffm.cpp:16:0:
../sdk/../ffm/lib/ffm.h:22:8: note: size_t ffm::get_column_index(graphlab::gl_sframe, std::string)
size_t get_column_index(graphlab::gl_sframe sf, std::string colname) {
^
make: *** [libffm.so] Error 1

libffm import error

I got an error when trying to import libffm within Python:
ImportError: ./libffm.so: undefined symbol: _ZTIN8graphlab18toolkit_class_baseE

Note that I have cloned and built GraphLab-Create-SDK and python-libffm (added to PYTHONPATH), and have also registered and installed graphlab via pip.

Has anyone else hit the same error?

Cannot install libffm

I get the following error during make:

../sdk/graphlab/cppipc/server/comm_server.hpp:22:41: fatal error: nanosockets/socket_errors.hpp: No such file or directory
compilation terminated.
Makefile:16: recipe for target 'libffm.so' failed
make: *** [libffm.so] Error 1

error 'ValueError: sample larger than population' running BPR() model

Hi all,

I've run libfm on Ubuntu using the dataset detailed below.
The Random model runs OK.
However, when I try the rest of the models in my list (I followed the models_example.py example provided here), i.e. BPR, TFIDFModel, Popularity, TensorCoFi, ..., the error "ValueError: sample larger than population" is triggered with every model.

Does anyone know what could be the source of this problem? Any suggestions?
There are many posts online about this error in Python, but the answers and potential causes they describe don't seem to apply to this case, so it is unclear to me.
By the way, the dataset has more than 5K rows.

Thanks in advance,
regards,
R.
------------------ test:

python modeltest2.py
user item rating time title
0 1123 0 2 838985046 NameFilm
1 1107 0 1 838985046 NameFilm
2 1107 0 1 838985046 NameFilm
3 1107 0 2 838985046 NameFilm
4 1107 1 1 838985046 NameFilm
0:00:00.083082 Random [0.262394934911661]
0:00:09.887563 BPR (dim=10,iter=15,reg=0.0001,eta=0.001)

Traceback (most recent call last):
  File "modeltest2.py", line 57, in <module>
    print evaluator.evaluate_model(m, testing, all_items=items,)
  File "build/bdist.linux-x86_64/egg/testfm/evaluation/evaluator.py", line 83, in evaluate_model
  File "build/bdist.linux-x86_64/egg/testfm/evaluation/evaluator.py", line 30, in partial_measure
  File "/usr/lib/python2.7/random.py", line 321, in sample
    raise ValueError("sample larger than population")
ValueError: sample larger than population
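
The traceback ends in the standard library's `random.sample`, which raises `ValueError` whenever the requested sample size exceeds the population size. Below is a minimal sketch of that failure and one possible guard. How `testfm` chooses its sample size is not visible in the traceback, so the variable names and values here are hypothetical.

```python
import random

items = [0, 1, 2]  # hypothetical pool of candidate items
k = 5              # hypothetical requested sample size

try:
    random.sample(items, k)
    raised = False
except ValueError:
    raised = True  # k > len(items) triggers the error

# One possible guard: never ask for more than the population holds.
safe = random.sample(items, min(k, len(items)))
```

If the dataset has fewer distinct items than the evaluator tries to sample, this error would follow regardless of the model, so counting the distinct items passed as `all_items` may be a reasonable first check.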
