cdfm's Introduction

Combination-dependent Factorization Machines

License: MIT

requirements

  • CPython 3.6.x, 3.7.x

dependencies

install

$ pip install cdfm

usage

1. prepare your dataset

The dataset format is similar to the SVM-rank format. The difference is that every line must also specify an eid (an entity identifier). A line is defined as follows, where the | symbol means OR (so <str>|<int> means the value is either a str or an int).

fundamentals

<line>     .=. <label> qid:<qid> eid:<eid> <features> # <comments>

<label>    .=. <float>|<str as a class>
<qid>      .=. <str>|<int>
<eid>      .=. <str>|<int>
<features> .=. <dim>:<value> <dim>:<value> ...
<dim>      .=. <0 or a natural number>
<value>    .=. <float>
<comments> .=. <any text>

Here is an example. The trailing comment is optional.

0.5 qid:1 eid:x 1:0.1 2:-0.2 3:0.3 # comment A
0.0 qid:1 eid:y 1:-0.1 2:0.2 4:0.4
-0.5 qid:1 eid:z 2:-0.2 3:0.3 4:-0.4 # comment C
0.5 qid:2 eid:y 1:0.1 2:-0.2 3:0.3
0.0 qid:2 eid:z 1:-0.1 2:0.2 4:0.4
-0.5 qid:2 eid:w 2:-0.2 3:0.3 4:-0.4 # comment E
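
To make the line format concrete, here is a minimal parsing sketch in plain Python. It is not the parser that cdfm ships (use `load_cdfmdata` for real data); the names `FeatureLine` and `parse_feature_line` are hypothetical, and the label is kept as a string because it may be either a float or a class name.

```python
from typing import Dict, NamedTuple


class FeatureLine(NamedTuple):
    """Hypothetical container for one parsed line (not part of cdfm)."""
    label: str
    qid: str
    eid: str
    features: Dict[int, float]
    comment: str


def parse_feature_line(line: str) -> FeatureLine:
    """Parse '<label> qid:<qid> eid:<eid> <dim>:<value> ... # <comments>'."""
    # Everything after the first '#' is treated as a free-text comment.
    body, _, comment = line.partition('#')
    label, qid_token, eid_token, *pairs = body.split()
    if not qid_token.startswith('qid:') or not eid_token.startswith('eid:'):
        raise ValueError(f'malformed line: {line!r}')
    features = {}
    for pair in pairs:
        dim, _, value = pair.partition(':')
        features[int(dim)] = float(value)
    return FeatureLine(label, qid_token[4:], eid_token[4:],
                       features, comment.strip())


parsed = parse_feature_line('0.5 qid:1 eid:x 1:0.1 2:-0.2 3:0.3 # comment A')
print(parsed)
```

Dimensions that do not appear in a line (such as 4 in the first example line) are simply absent from the dictionary, which matches the sparse style of the SVM-rank format.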

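distance factors

Additionally, you can supply distances between pairs of entities in the same group. Each line describes one pair: eid names one entity and cid names the other. These lines carry no label.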

<line>     .=. qid:<qid> eid:<eid> cid:<cid> <factors> # <comments>

<cid>      .=. <str>|<int>
<factors>  .=. <dim>:<value> <dim>:<value> ...
<dim>      .=. <0 or a natural number>
<value>    .=. <float>
<comments> .=. <any text>

Here is an example.

qid:3 eid:x cid:y 1:0.5 2:-0.3 3:1.2 # comment A
qid:3 eid:x cid:z 1:0.0 2:0.2 3:0.8 # comment B
qid:3 eid:y cid:z 1:0.2 2:0.3 3:-0.7 # comment C

2. load your dataset

# build_cdfmdata is used below; importing it from cdfm.utils is an assumption
from cdfm.utils import build_cdfmdata, load_cdfmdata

# loading dataset as a DataFrame object
# 1. features
features_path = '/path/to/features'
n_dimensions = 10
features = load_cdfmdata(features_path, n_dimensions)
# features.columns
# >>> Index(['label', 'qid', 'eid', 'features'], dtype='object')

# 2. proximities
proximities_path = '/path/to/proximities'
n_dimensions = 2
proximities = load_cdfmdata(proximities_path, n_dimensions, mode='proximity')
# proximities.columns
# >>> Index(['qid', 'eid', 'cid', 'proximities'], dtype='object')

# some preprocessing here...

# Finally, build a dataset. Choose one of the following:
train = build_cdfmdata(features)  # using features only
train = build_cdfmdata(features, proximities)  # using features and proximities

3. fit the model

from cdfm.models import CDFMRanker

# define your model
model = CDFMRanker(k=8, n_iter=300, init_eta=1e-2)
# fit the model; epoch losses are printed when verbose is True
model.fit(train, verbose=True)

4. save the model

import pickle

with open('/path/to/file.pkl', mode='wb') as fp:
    pickle.dump(model, fp)
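
Loading the model back is the mirror image of saving it. The round trip below is a minimal sketch that runs without cdfm installed: a plain dict stands in for the fitted `CDFMRanker`, and a temporary directory replaces `/path/to/file.pkl`. Both are assumptions made for illustration only.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a fitted model so the sketch runs without cdfm installed.
model = {'k': 8, 'n_iter': 300, 'init_eta': 1e-2}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / 'file.pkl'

    # Save, exactly as in the step above.
    with open(model_path, mode='wb') as fp:
        pickle.dump(model, fp)

    # Load: open in binary read mode and call pickle.load.
    with open(model_path, mode='rb') as fp:
        restored = pickle.load(fp)

print(restored == model)
```

Only unpickle files you trust, since `pickle.load` can execute arbitrary code. To load a real model, the `cdfm` package must be importable in the loading process.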

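5. make predictions

# load the test dataset (test_path is a placeholder for your own file)
test_path = '/path/to/test'
n_dimensions = 10  # must match the feature dimensionality used for training
test_df = load_cdfmdata(test_path, n_dimensions)
test = build_cdfmdata(test_df)
pred = model.predict(test)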
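
Predictions from a ranker are usually consumed per group. The sketch below shows one way to turn scores into a ranking for each qid. It assumes that `pred` is a flat sequence of scores aligned row by row with the test data and that a higher score means a better rank; neither is confirmed by the text above. The helper `rank_by_group` is hypothetical and the scores are made-up illustrative values.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rank_by_group(qids: Sequence[str],
                  eids: Sequence[str],
                  scores: Sequence[float]) -> Dict[str, List[Tuple[str, float]]]:
    """Group (eid, score) pairs by qid and sort each group by score, descending."""
    grouped: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for qid, eid, score in zip(qids, eids, scores):
        grouped[qid].append((eid, score))
    return {
        qid: sorted(items, key=lambda item: item[1], reverse=True)
        for qid, items in grouped.items()
    }


# Illustrative values only; in practice these would come from test_df and pred.
ranking = rank_by_group(
    qids=['1', '1', '1', '2', '2'],
    eids=['x', 'y', 'z', 'y', 'z'],
    scores=[0.1, 0.7, -0.2, 0.3, 0.4],
)
print(ranking['1'])
```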

examples

A tutorial using the NAR horse racing dataset.

# pwd
# >>> path/to/cdfm
$ mkdir dumps
$ mkdir dumps/models       # fitted models are pickled here
$ mkdir dumps/predictions  # evaluation datasets are dumped here by pandas

$ python example.py --k 2 --n-iter 100

development

# 1. install development dependencies
$ pip install -e .[dev]

# 2. linting
$ pylint cdfm  # check pylintrc for more details...

# 3. type checking
$ mypy @mypy_check_files --config-file=mypy.ini

# 4. testing
$ pytest

cdfm's People

Contributors

moriaki3193

cdfm's Issues

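Add the derivative expressions

  • Each function takes pointwise input, as Eqn does
  • Implement the derivatives of the following expressions with respect to each parameter
    • Iec
    • Ief
    • Iff
    • p_Iec
  • Write a corresponding test for each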

Implement CDFMRanker

  • Optimize with stochastic gradient descent (not mini-batch)
  • Re-check that the loss function and its derivatives are correct
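
For readers unfamiliar with the distinction, per-sample stochastic gradient descent updates the parameters after every single example rather than after a batch. The sketch below shows that update pattern on a plain linear model with squared loss. It is a generic illustration only: the CDFM model and its loss are not reproduced here, and `sgd_fit` and its parameters are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (feature vector, target)


def sgd_fit(samples: Sequence[Sample], n_features: int,
            n_iter: int = 100, eta: float = 0.01, seed: int = 0) -> List[float]:
    """Fit a linear model with per-sample SGD on 0.5 * (prediction - y) ** 2."""
    rng = random.Random(seed)
    weights = [0.0] * n_features
    order = list(range(len(samples)))
    for _ in range(n_iter):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for index in order:
            x, y = samples[index]
            error = sum(w * v for w, v in zip(weights, x)) - y
            # The gradient of the loss w.r.t. weights[j] is error * x[j];
            # the update uses this one sample only (no mini-batch averaging).
            for j, v in enumerate(x):
                weights[j] -= eta * error * v
    return weights


samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
weights = sgd_fit(samples, n_features=2, n_iter=500, eta=0.1)
print(weights)
```

The three samples are consistent with the weights (2, -1), so the fitted values should land very close to that solution.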

Implement the Record class

  • Implement it as a NamedTuple with the following fields
    • qid
    • eid
    • cids
    • features
    • proximities
  • Accessing attributes by index gives better performance
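
A minimal sketch of such a record follows. The field names come from the list above, but the field types are assumptions chosen for illustration, and this is not necessarily how the class is defined in cdfm.

```python
from typing import Dict, NamedTuple, Tuple


class Record(NamedTuple):
    """Sketch of the record described above; all field types are assumed."""
    qid: str
    eid: str
    cids: Tuple[str, ...]
    features: Dict[int, float]
    proximities: Dict[str, Dict[int, float]]


record = Record(
    qid='3',
    eid='x',
    cids=('y', 'z'),
    features={1: 0.1, 2: -0.2},
    proximities={'y': {1: 0.5}, 'z': {1: 0.0}},
)

# Fields can be read by name or by position; both return the same object.
print(record.eid, record[1])

# Positional unpacking is also available because a NamedTuple is a tuple.
qid, eid, cids, features, proximities = record
```

Whether indexing is measurably faster than attribute access depends on the interpreter version, so it is worth confirming with `timeit` on the target CPython before relying on it.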

Add the model equations

  • All inputs describe a single instance (pointwise input)
  • Add the corresponding test code
  • Profile each function
