
prediction-flow's Introduction

prediction-flow

prediction-flow is a Python package that provides modern deep-learning based CTR models. The models are implemented in PyTorch.

how to use

  • Install using pip.
pip install prediction-flow

feature

how to define feature

All feature types take two parameters, name and column_flow. The name parameter selects the raw column from the input data frame. The column_flow parameter is either a single transformer or a list of transformers. Transformers pre-process the column data before the model is trained.

  • dense number feature
Number('age', StandardScaler())
Number('ctr', None)
  • sparse category feature
Category('movieId', CategoryEncoder(min_cnt=1))
  • var length sequence feature
Sequence('genres', SequenceEncoder(sep='|', min_cnt=1))
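
To make the "single transformer or a list of transformers" rule concrete, here is a minimal sketch that does not use prediction-flow at all. The helper apply_column_flow and the two toy transformers are hypothetical names introduced only for illustration, and plain callables stand in for the library's transformer objects, so treat it as a picture of the idea rather than the package's implementation.

```python
import math


def apply_column_flow(values, column_flow):
    """Hypothetical helper: run None, one transformer, or a list of them in order."""
    if column_flow is None:
        return list(values)  # like Number('ctr', None): data passes through unchanged
    steps = column_flow if isinstance(column_flow, (list, tuple)) else [column_flow]
    out = list(values)
    for step in steps:
        out = step(out)
    return out


def log1p_all(xs):
    """Toy log scaler (assumption: inputs are non-negative and non-null)."""
    return [math.log1p(x) for x in xs]


def standardize(xs):
    """Toy standard scaler: zero mean, unit variance."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero for constant columns
    return [(x - mean) / std for x in xs]


ages = [18.0, 25.0, 40.0]
raw = apply_column_flow(ages, None)                        # no pre-processing
scaled = apply_column_flow(ages, standardize)              # single transformer
chained = apply_column_flow(ages, [log1p_all, standardize])  # list, applied left to right
```

With a list, the output of each step feeds the next one, so the order of the transformers matters.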

transformer

The following transformers are provided now.

| transformer | supported feature type | detail |
| --- | --- | --- |
| StandardScaler | Number | Wrapper of scikit-learn's StandardScaler. Null values must be filled in advance. |
| LogTransformer | Number | Log scaler. Null values must be filled in advance. |
| CategoryEncoder | Category | Converts str values to int. Null values must be filled in advance using '__UNKNOWN__'. |
| SequenceEncoder | Sequence | Converts sequence str values to int. Null values must be filled in advance using '__UNKNOWN__'. |
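
The null-filling requirement and the min_cnt parameter can be pictured with a small standalone sketch. It is not the library's CategoryEncoder: the function names, the choice of index 0 for '__UNKNOWN__', and the rule that values seen fewer than min_cnt times fall back to that index are all assumptions made for illustration.

```python
from collections import Counter

UNKNOWN = '__UNKNOWN__'


def fill_nulls(values):
    """Replace missing entries with the placeholder before encoding."""
    return [UNKNOWN if v is None else v for v in values]


def fit_vocab(values, min_cnt=1):
    """Map each sufficiently frequent string to an int (assumption: 0 is reserved)."""
    counts = Counter(values)
    vocab = {UNKNOWN: 0}
    for word in sorted(counts):
        if word != UNKNOWN and counts[word] >= min_cnt:
            vocab[word] = len(vocab)
    return vocab


def encode(values, vocab):
    """Look up each value; rare or unseen values share the placeholder index."""
    return [vocab.get(v, vocab[UNKNOWN]) for v in values]


movie_ids = ['m1', None, 'm2', 'm1', 'm3']
filled = fill_nulls(movie_ids)
vocab = fit_vocab(filled, min_cnt=2)
codes = encode(filled, vocab)
```

Here only 'm1' appears at least twice, so every other value, including the filled null, is encoded with the placeholder index.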

model

| model | reference |
| --- | --- |
| DNN | - |
| Wide & Deep | [DLRS 2016] Wide & Deep Learning for Recommender Systems |
| DeepFM | [IJCAI 2017] DeepFM: A Factorization-Machine based Neural Network for CTR Prediction |
| DIN | [KDD 2018] Deep Interest Network for Click-Through Rate Prediction |
| DNN + GRU + GRU + Attention | [AAAI 2019] Deep Interest Evolution Network for Click-Through Rate Prediction |
| DNN + GRU + AIGRU | [AAAI 2019] Deep Interest Evolution Network for Click-Through Rate Prediction |
| DNN + GRU + AGRU | [AAAI 2019] Deep Interest Evolution Network for Click-Through Rate Prediction |
| DNN + GRU + AUGRU | [AAAI 2019] Deep Interest Evolution Network for Click-Through Rate Prediction |
| DIEN | [AAAI 2019] Deep Interest Evolution Network for Click-Through Rate Prediction |
| OTHER | TODO |

example

movielens-1M

This dataset is only used to check that the code runs; the resulting accuracy is not meaningful.

amazon

accuracy

benchmark

acknowledge and reference

  • Following the design of DeepCTR, features are divided into dense (class Number), sparse (class Category), and sequence (class Sequence) types.

prediction-flow's People

Contributors

dydcfg, github-hongweizhang


prediction-flow's Issues

pip install failed due to pandas issue

When I install with pip, an error says that the pandas build failed.
But my pandas is up to date. Is there a way to use the package via "git clone"?

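Out of memory

I am training the DNN model on my own dataset, but memory usage grows with every epoch, and after about 10 epochs the process runs out of memory. How can I solve this?

Memory: 16 GB
Dataset size: 400 MB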

Incremental training

I am interested in using the DIEN model. What strategies are there for incremental training, so that the model can fit new user interactions and new items without being retrained from scratch?

I read the paper of the model and they only discuss how to reduce latency for model serving and not how to do incremental training.

RuntimeError: Expected object of scalar type Float but got scalar type Long for sequence element 1 in sequence argument at position #1 'tensors'

Traceback (most recent call last):
  File "D:/项目/CTR/video-click-contest/src/model/Flow_DeepFM.py", line 272, in <module>
    fit(10, model, loss_func, optimizer, train_loader, valid_loader, notebook=True, auxiliary_loss_rate=0.1)
  File "D:\项目\CTR\video-click-contest\prediction_flow\pytorch\functions.py", line 49, in fit
    pred = model(batch)
  File "D:\Programs\Anaconda\envs\python3.6\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\项目\CTR\video-click-contest\prediction_flow\pytorch\deepfm.py", line 132, in forward
    linear_concat = torch.cat(number_inputs, dim=1)
RuntimeError: Expected object of scalar type Float but got scalar type Long for sequence element 1 in sequence argument at position #1 'tensors'

This error occurs when running on my own dataset.
The way the model inputs are constructed should be fine.
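
The message says that one of the tensors handed to torch.cat is integer-typed (Long) while the others are Float, which would happen if a Number column holds integers. A minimal sketch of one way around it, assuming the inputs really are a list of 2-D tensors as in the traceback: cast each tensor to float before concatenating. The function name and the sample tensors are hypothetical and this is not a patch to deepfm.py.

```python
import torch


def concat_number_inputs(number_inputs):
    """Cast every tensor to float32 so that all inputs share one dtype."""
    return torch.cat([t.float() for t in number_inputs], dim=1)


age = torch.tensor([[0.5], [1.2]])   # float32 column
clicks = torch.tensor([[3], [7]])    # int64 (Long) column, the kind that triggers the error
merged = concat_number_inputs([age, clicks])
```

Casting the integer column to a float type in the input data frame before the data loader is built should have the same effect without touching model code.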

Possibility of adding DICE to replace prelu? Also, a small bug for the GPU implementation for newer PyTorch versions

Hi,

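Thanks for the code. Are there plans to add the Dice activation function to DIN/DIEN in the future, as a replacement for the PReLU that torch provides?

A small issue I found when running on GPU (torch 1.9.0):
Because of new behaviour introduced by the PyTorch update, DIEN.py needs the line
keys_length=keys_length.to(torch.device("cpu"))
to satisfy the change in torch (the lengths must be a non-CUDA, CPU long tensor). After that, add
keys_length=keys_length.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))
in interest.py to move it back to the GPU.

Thank you very much!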
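
For reference, Dice as described in the DIN paper gates each input with a sigmoid of its batch-normalised value: f(s) = p(s)·s + (1 − p(s))·α·s. The following is a rough sketch under that description, not code from prediction-flow. The class name, the eps default, the zero initialisation of alpha, and the use of BatchNorm1d on 2-D input are assumptions.

```python
import torch
from torch import nn


class Dice(nn.Module):
    """Sketch of the Dice activation for inputs of shape (batch, num_features)."""

    def __init__(self, num_features, eps=1e-9):
        super().__init__()
        # Normalisation without learnable scale/shift supplies E[s] and Var[s].
        self.bn = nn.BatchNorm1d(num_features, eps=eps, affine=False)
        # Learnable slope for the "negative" branch, as in PReLU.
        self.alpha = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        p = torch.sigmoid(self.bn(x))
        return p * x + (1 - p) * self.alpha * x


activation = Dice(num_features=4)
output = activation(torch.randn(8, 4))
```

A layer like this could be swapped in wherever nn.PReLU is applied to 2-D activations; sequence-shaped inputs would need a reshape first.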

How to deal with multiple labels?

I am stuck implementing the ESMM model on top of prediction-flow. It has two labels, click and post-click conversion. It seems that create_dataloader_fn would have to be rewritten?

In the DIEN paper, the Book dataset reaches 84. Why can't I reach 84 even though I use the auxiliary loss?


attendance one key and many values

pairs=[{'ad': 'q_topic_1', 'pos_hist': 'm_interested_topics'},
{'ad': 'q_topic_2', 'pos_hist': 'm_interested_topics'},
{'ad': 'q_topic_3', 'pos_hist': 'm_interested_topics'},
{'ad': 'q_topic_4', 'pos_hist': 'm_interested_topics'},
{'ad': 'q_topic_5', 'pos_hist': 'm_interested_topics'}
],

RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor

Describe the bug
When I run DIEN model, I get this error. I tried lengths.cpu(), lengths.to('cpu') but none of them work. Would you provide a solution for this?

DIEN_augru
HBox(children=(FloatProgress(value=0.0, description='training routine', max=2.0, style=ProgressStyle(descripti…
HBox(children=(FloatProgress(value=0.0, description='train', max=8486.0, style=ProgressStyle(description_width…
HBox(children=(FloatProgress(value=0.0, description='valid', max=947.0, style=ProgressStyle(description_width=…
GPU is available, transfer model to GPU.
Traceback (most recent call last):

  File "<ipython-input-47-cf075d4611bd>", line 18, in <module>
    scores1, model_loss_curves1 = run(models)

  File "<ipython-input-47-cf075d4611bd>", line 9, in run
    train_loader, valid_loader, notebook=True, auxiliary_loss_rate=1)

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/prediction_flow/pytorch/functions.py", line 57, in fit
    pred = model(batch)

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/prediction_flow/pytorch/dien.py", line 100, in forward
    query, pos_hist, keys_length, neg_hist))

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/prediction_flow/pytorch/nn/interest.py", line 235, in forward
    enforce_sorted=False)

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/torch/nn/utils/rnn.py", line 244, in pack_padded_sequence
    _VF._pack_padded_sequence(input, lengths, batch_first)

RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor

Additional context
PyTorch v1.7.1
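
The error comes from pack_padded_sequence, which in recent PyTorch versions expects the lengths argument as a CPU int64 tensor even when the padded batch lives on the GPU. Note that .cpu() and .to('cpu') return a new tensor instead of changing the original in place, so the converted tensor has to be the one actually passed to the call, at the point in interest.py shown in the traceback. A minimal sketch with a hypothetical wrapper name and toy shapes:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence


def pack_for_rnn(padded, lengths):
    """Pack a (batch, time, features) tensor; lengths may live on any device."""
    cpu_lengths = lengths.to(device='cpu', dtype=torch.int64)  # new tensor, original untouched
    return pack_padded_sequence(
        padded, cpu_lengths, batch_first=True, enforce_sorted=False)


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
padded = torch.zeros(2, 3, 4, device=device)    # two sequences, padded to length 3
lengths = torch.tensor([3, 1], device=device)   # true lengths, on the same device as the data
packed = pack_for_rnn(padded, lengths)
```

The packed data stays on the original device; only the small lengths tensor is copied to the CPU.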

What AUC does DeepFM reach on movielens-1M?

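What AUC does your DeepFM reach on movielens-1M? I see others online routinely reporting around 0.88, while my own implementation only gets an AUC of about 0.77 to 0.85 on the test set. Maybe my hyperparameters are not well tuned.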


RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)

An error is raised when running movielens-1m.ipynb:
Traceback (most recent call last):
  File "D:/项目/CTR/prediction-flow/examples/movielens/movielens-1m.py", line 118, in <module>
    fit(10, model, loss_func, optimizer, train_loader, valid_loader, notebook=True, auxiliary_loss_rate=0.1)
  File "D:\项目\CTR\prediction-flow\prediction_flow\pytorch\functions.py", line 57, in fit
    pred = model(batch)
  File "D:\Programs\Anaconda\envs\python3.6\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\项目\CTR\prediction-flow\prediction_flow\pytorch\interest_net.py", line 200, in forward
    feature.name](x[feature.name])
  File "D:\Programs\Anaconda\envs\python3.6\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Programs\Anaconda\envs\python3.6\lib\site-packages\torch\nn\modules\sparse.py", line 114, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "D:\Programs\Anaconda\envs\python3.6\lib\site-packages\torch\nn\functional.py", line 1467, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)
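
In the PyTorch version shown in this traceback, the embedding lookup only accepts 64-bit (Long) indices. One plausible cause on Windows, offered here as an assumption and not a confirmed diagnosis, is that the encoded category ids arrive as 32-bit integers because that used to be NumPy's default integer width on that platform. A minimal sketch of the usual remedy, casting the ids with .long() before the lookup:

```python
import torch
from torch import nn

embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)

# 32-bit ids, the kind that produced "got ... IntTensor instead".
ids = torch.tensor([1, 5, 7], dtype=torch.int32)

# Cast to int64 (Long) before the embedding lookup.
vectors = embedding(ids.long())
```

The same cast can be applied once when the batch tensors are created, so that the model code itself does not need to change.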
