deepctr-torch's People

Contributors

chenkkkk, gaohongkui, jyihuo, shenweichen, tangaqi, uestc7d, weiyucheng, wutongzhang, zanshuxun, zengai, zhangyuef


deepctr-torch's Issues

model save issue


Describe the question(问题描述)
Saving the whole model object with torch.save fails with the following error:
    Traceback (most recent call last):
      File "test.py", line 113, in <module>
        torch.save(model, savePath + "DeepFM.h5")
      File "/home/zhangkai/.conda/envs/pytorchDeepCTR/lib/python3.7/site-packages/torch/serialization.py", line 224, in save
        return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
      File "/home/zhangkai/.conda/envs/pytorchDeepCTR/lib/python3.7/site-packages/torch/serialization.py", line 149, in _with_file_like
        return body(f)
      File "/home/zhangkai/.conda/envs/pytorchDeepCTR/lib/python3.7/site-packages/torch/serialization.py", line 224, in <lambda>
        return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
      File "/home/zhangkai/.conda/envs/pytorchDeepCTR/lib/python3.7/site-packages/torch/serialization.py", line 296, in _save
        pickler.dump(obj)
    AttributeError: Can't pickle local object 'BaseModel._get_metrics.<locals>.<lambda>'

Operating environment(运行环境):

  • python version [e.g. 3.6]
  • torch version [e.g. 1.2.0,]
  • deepctr-torch version [e.g. 0.1.0,]
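
A common workaround (not from the original thread) is to serialize only the state_dict rather than pickling the whole model object, which avoids the local lambda created inside BaseModel._get_metrics. A minimal sketch, assuming `model` is the trained DeepFM instance from the traceback above:

    import torch

    # Save only the parameters instead of the whole model object.
    torch.save(model.state_dict(), "DeepFM.pt")

    # Rebuild the model with the same feature columns, then restore the weights.
    model.load_state_dict(torch.load("DeepFM.pt"))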

deepctr_torch (version 0.2.1) DIN does not support PyTorch 1.5.0 and above

Describe the bug(问题描述)
Running the DIN example code with PyTorch 1.5.0 produces the error below:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

To solve(解决)

  1. Reinstall PyTorch 1.4.0

Operating environment(运行环境):

  • python version [e.g. 3.7]
  • torch version [e.g. 1.5.0]
  • deepctr-torch version [e.g. 0.2.0,]


Provide learning rate parameter interface

I found that there is no interface for users to adjust the learning rate for a specific algorithm. Could you provide an lr parameter when instantiating an algorithm class so that we can change this value?
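
One way to control the learning rate today, assuming your installed deepctr-torch version lets compile() accept a ready-built torch.optim optimizer instead of a name string (worth verifying against your release), is sketched below; `model` stands for an already-constructed model such as DeepFM:

    import torch

    # Build the optimizer yourself with a custom learning rate and hand it to
    # compile() in place of the "adam" string.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    model.compile(optimizer, "binary_crossentropy", metrics=["binary_crossentropy", "auc"])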

How to export DeepCTR model to onnx ?


Describe the question(问题描述)
I would like to export model to onnx format, so I can use it for example with Apache PredictionIO

Additional context
I tried with:

    dummy_input = np.array([np.array([[0]]), np.array([[0]])])
    pred_ans = model.predict(dummy_input, 100)
    print(pred_ans)
    torch.onnx.export(model, (dummy_input), "deepCTR.onnx")

predict works well; I am getting [[0.0065056]].
But with onnx.export, I get an error with this trace:

RuntimeError: Only tuples, lists and Variables supported as JIT inputs/outputs. Dictionaries and strings are also accepted but their usage is not recommended. But got unsupported type numpy.ndarray
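
A sketch of a possible fix for the unsupported-type error (an assumption, not a verified export path): convert the numpy input into a single torch tensor shaped the way model.forward expects before calling torch.onnx.export. The export may still fail later on unsupported operators; this only addresses the numpy.ndarray complaint. `model` is the trained model from above.

    import numpy as np
    import torch

    # Build one concatenated float tensor: forward(X) takes a single tensor with
    # the features stacked along the last axis (as model.predict does internally).
    dummy_input = torch.from_numpy(
        np.concatenate([np.array([[0]]), np.array([[0]])], axis=-1)
    ).float()
    torch.onnx.export(model, (dummy_input,), "deepCTR.onnx")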

Operating environment(运行环境):

  • python version: 3.7
  • torch version: 1.4.0
  • deepctr-torch version: 0.1.0

where is the QR code

Describe the question
I came from Zhihu but could not find the QR code you mentioned. Is there a problem with GitHub, e.g. some restriction or something else?

multi gpus

How to run the demo with multiple GPUs?
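
For reference, recent deepctr-torch releases expose a gpus argument on the model constructor (it wraps the network in nn.DataParallel internally); treat the exact argument name and availability as an assumption to verify against your installed version. A minimal sketch, with `linear_feature_columns` / `dnn_feature_columns` built as in the bundled examples:

    from deepctr_torch.models import DeepFM

    # Put the model on the first GPU and let DataParallel split batches across
    # the listed device ids during fit().
    model = DeepFM(linear_feature_columns, dnn_feature_columns,
                   task='binary', device='cuda:0', gpus=[0, 1])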

Issues with the DIN model

Describe the bug(问题描述)
  1. In the forward of din.py, keys_length is the constant 1 and does not represent the sequence length.
  2. Some sparse features are looked up twice; the mask feature list in the embedding lookup has no effect.


Operating environment(运行环境):

  • python version [e.g. 3.5, 3.6]
  • torch version [e.g. 1.1.0, 1.2.0]
  • deepctr-torch version [e.g. 0.1.0,]


The output shape of the multi-head attention in AutoInt should be (batch_size, field_num, atten_embedding)

Describe the bug(问题描述)
The feature-shape flow in multi-head attention is:
(batch_size, field_num, embedding_size) -> (batch_size*h, field_num, embedding_size/h)
-> (batch_size, field_num, embedding_size)

You can refer to the implementation of multi-head attention that the author referenced:
https://github.com/Kyubyong/transformer/blob/fb023bb097e08d53baf25b46a9da490beba51a21/modules.py#L153

Also, you can use torch.nn.MultiheadAttention directly in PyTorch 1.1.0:
https://pytorch.org/docs/1.1.0/_modules/torch/nn/modules/activation.html#MultiheadAttention
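
A small standalone sketch of that suggestion (not the library's InteractingLayer): torch.nn.MultiheadAttention keeps the embedding dimension intact, so the output stays (batch_size, field_num, embedding_size).

    import torch
    import torch.nn as nn

    batch_size, field_num, embedding_size, num_heads = 32, 26, 8, 2
    x = torch.randn(batch_size, field_num, embedding_size)

    # nn.MultiheadAttention (without batch_first) expects (seq_len, batch, embed_dim);
    # here the "sequence" axis is the field axis.
    attn = nn.MultiheadAttention(embed_dim=embedding_size, num_heads=num_heads)
    out, _ = attn(x.transpose(0, 1), x.transpose(0, 1), x.transpose(0, 1))
    out = out.transpose(0, 1)
    print(out.shape)   # torch.Size([32, 26, 8])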

Thank you!

If I want to implement multi-class classification, where should I make changes?



Operating environment(运行环境):

  • python version [e.g. 3.6]
  • torch version [e.g. 1.2.0,]
  • deepctr-torch version [e.g. 0.1.0,]

Why does the run hang for no apparent reason?

Describe the bug(问题描述)
When running the Criteo classification example, training hangs after the first epoch finishes and does not continue, for no apparent reason. The other examples behave the same: once the first epoch's tqdm bar completes, it hangs.

Operating environment(运行环境):

  • python 3.6
  • pytorch 1.3.1

ZeroDivisionError: float division by zero

I get the error below when running the example on Windows 10. The model is FiBiNET.

python .\examples\run_classification_criteo.py

Traceback (most recent call last):
  File ".\examples\run_classification_criteo.py", line 54, in <module>
    l2_reg_embedding=1e-5, device=device)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\deepctr_torch\models\fibinet.py", line 53, in __init__
    self.SE = SENETLayer(self.filed_size, reduction_ratio, seed, device)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\deepctr_torch\layers\interaction.py", line 81, in __init__
    nn.Linear(self.filed_size, self.reduction_size, bias=False),
  File "C:\Users\civil\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\linear.py", line 77, in __init__
    self.reset_parameters()
  File "C:\Users\civil\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\linear.py", line 80, in reset_parameters
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
  File "C:\Users\civil\AppData\Roaming\Python\Python37\site-packages\torch\nn\init.py", line 316, in kaiming_uniform_
    std = gain / math.sqrt(fan)
ZeroDivisionError: float division by zero

But I just tried DeepFM, and it shows no float-division-by-zero issue.
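
One possible explanation (an assumption on my part, not confirmed in this thread): SENETLayer computes reduction_size as filed_size // reduction_ratio, which rounds down to 0 whenever there are fewer sparse fields than the reduction ratio, so nn.Linear ends up with a zero-sized weight and the Kaiming initializer divides by zero. A guarded sketch:

    import torch.nn as nn

    filed_size, reduction_ratio = 2, 3
    # filed_size // reduction_ratio == 0 here, which reproduces the zero-sized
    # Linear; clamping the reduction size to at least 1 avoids it.
    reduction_size = max(1, filed_size // reduction_ratio)
    excitation = nn.Sequential(
        nn.Linear(filed_size, reduction_size, bias=False),
        nn.ReLU(),
        nn.Linear(reduction_size, filed_size, bias=False),
        nn.ReLU(),
    )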

Planning to add DIN and DIEN in Torch?

Describe the question(问题描述)
Are you planning to add a PyTorch implementation of DIN and DIEN? If so, what's the current time horizon? Or how could one go about helping with that?

Additional context
Since there already exists a DIEN implementation in Tensorflow, this could be used as a great reference point for converting the model to PyTorch.

alpha should be wrapped in nn.Parameter in Dice

Since model.parameters() is what the optimizer uses, alpha in Dice should be wrapped in nn.parameter.Parameter, just like alpha in PReLU. Otherwise this parameter stays fixed at its initial value throughout training.
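
An illustrative sketch of the suggested change (not the exact library code): registering alpha as an nn.Parameter makes it show up in model.parameters(), so the optimizer actually updates it.

    import torch
    import torch.nn as nn

    class DiceLike(nn.Module):
        """Dice-style activation for 2-D inputs (batch_size, emb_size)."""
        def __init__(self, emb_size, epsilon=1e-8):
            super().__init__()
            self.bn = nn.BatchNorm1d(emb_size, eps=epsilon)
            # Before: self.alpha = torch.zeros((emb_size,))  -> never trained.
            self.alpha = nn.Parameter(torch.zeros((emb_size,)))

        def forward(self, x):
            p = torch.sigmoid(self.bn(x))
            return self.alpha * (1.0 - p) * x + p * x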

a question about whether self.feature_index is in CPU or GPU

Hello,
I am relatively new to PyTorch. I have a question about data residing on the CPU vs. the GPU, for example in this part of the source code:

def input_from_feature_columns(self, X, feature_columns, embedding_dict, support_dense=True):

In particular,

sparse_embedding_list = [embedding_dict[feat.embedding_name]( X[:, self.feature_index[feat.name][0]:self.feature_index[feat.name][1]].long()) for feat in sparse_feature_columns]

It is pretty clear that X is the minibatch tensor that has already been moved to the GPU. embedding_dict is an nn.ModuleDict and feature_index is a plain Python OrderedDict. From some basic reading, it sounds like a dictionary object cannot be migrated to the GPU directly; only the tensor for each key can be copied one by one. So I am not sure, for example, what self.to(device) in the constructor __init__() of BaseModel does for its dictionary attributes.

That leads to my more specific question: when sparse_embedding_list is built, how does PyTorch handle this? Is everything already on the GPU when __init__() is called, or is the value of self.feature_index[feat.name] copied to the GPU at runtime, and does that incur any efficiency cost?

Thank you for your help!

Occasional warning when importing the package

Test case:

from deepctr_torch.models import WDL,DeepFM
from deepctr_torch.inputs import  SparseFeat, DenseFeat,get_fixlen_feature_names

Then the following warning appears:
/usr/local/anaconda3/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

My Python version is 3.7.
The problem does not seem serious, but I would like to find out where it comes from.

Linear model weight parameters (CUDA) are not included in state_dict()

Describe the bug
Linear model weight parameters are not included in state_dict() after being sent to CUDA. I found this problem when trying to reload model parameters that had been serialized with torch.save(model.state_dict(), PATH).

To Reproduce
The tensor is sent to the other device after nn.Parameter is created, so the returned instance is no longer a registered, trainable parameter. To avoid this, .to(device) should be called first when creating the tensor, followed by wrapping it in nn.Parameter.

    self.weight = nn.Parameter(torch.Tensor(sum(fc.dimension for fc in self.dense_feature_columns), 1)).to(device)
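
A standalone sketch of the corrected ordering described above (illustrative, not the exact library patch): create the tensor on the target device first, then wrap it in nn.Parameter so it is registered on the module and appears in state_dict().

    import torch
    import torch.nn as nn

    class LinearPart(nn.Module):
        def __init__(self, dense_dim, device="cpu"):
            super().__init__()
            # Device placement happens before nn.Parameter, so the parameter
            # stays registered on the module.
            self.weight = nn.Parameter(torch.empty(dense_dim, 1, device=device))
            nn.init.normal_(self.weight, mean=0.0, std=1e-4)

    m = LinearPart(13)
    print("weight" in m.state_dict())   # True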

A similar issue can be found on the PyTorch forum:
https://discuss.pytorch.org/t/cuda-parameter-not-included-in-state-dict/30801

Operating environment:

  • python version 3.7
  • torch version 1.4.0
  • deepctr-torch version 1.0.3

How should input data be processed for real-time prediction after training the model?


Describe the question(问题描述)
After training the model I want to do real-time prediction. How should the input data be processed, especially the sparse features? I am not sure whether the feature mapping table is saved during training.
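
The library itself does not persist the category-to-index mapping; the LabelEncoder objects fitted at training time are that mapping table. One workable approach (a sketch under that assumption, with `sparse_features`, `data` and `new_data` playing the roles they have in the example scripts) is to save the encoders alongside the model and reuse them at serving time:

    import pickle
    from sklearn.preprocessing import LabelEncoder

    # Fit and persist one encoder per sparse feature at training time.
    encoders = {}
    for feat in sparse_features:
        lbe = LabelEncoder()
        data[feat] = lbe.fit_transform(data[feat])
        encoders[feat] = lbe
    with open("label_encoders.pkl", "wb") as f:
        pickle.dump(encoders, f)

    # At prediction time, reload and apply the same mapping to the new samples.
    with open("label_encoders.pkl", "rb") as f:
        encoders = pickle.load(f)
    for feat in sparse_features:
        new_data[feat] = encoders[feat].transform(new_data[feat])

Note that LabelEncoder.transform raises on categories it has never seen, so unseen values need a fallback (see the index-out-of-range issue further down).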

Operating environment(运行环境):

  • python version [e.g. 3.6]
  • torch version [e.g. 1.2.0,]
  • deepctr-torch version [e.g. 0.1.0,]

A question about the AUC reduction after the version update

Describe the question
I have tried versions 0.2.1 and 0.2.3 of DeepCTR-Torch. With the same dataset and parameters, the AUC and loss differ significantly between versions. In this experiment I tried FiBiNET and NFM; the problem I encountered is that the AUC under version 0.2.3 is lower than under version 0.2.1.

Additional context
I have uploaded the two versions of the project to Baidu Netdisk. The test data is a randomly selected 25 MB Criteo dataset. After downloading, run run.py in each project to see the different AUC results. This is the download link:

Link: https://pan.baidu.com/s/1191pHvL3wMaCM5TsAo4jgA (extraction code: hexw)

Please download and troubleshoot this issue, thank you again for your contribution.

Operating environment

  • python version 3.6.5
  • torch version 1.6.0
  • deepctr-torch version 0.2.1 & 0.2.3

Why are all the test-sample predictions in pred_ans less than 1?

Describe the bug(问题描述)
Running the example run_regression_movielens.py, all the predictions in pred_ans (line 45) are very small; over several runs they are almost all below 1, and the MSE is very large. Is this because the dataset is too small or because the model parameters are not well tuned?

pred_ans (each row is the predicted rating for one test sample):
array([[0.301614 ],
[0.3022644 ],
[0.32717213],
[0.2999353 ],
[0.30203775],
[0.3013655 ],
[0.31287715],
[0.3015241 ],
[0.28850323],
[0.30220094],
[0.30088642],
[0.30200478],
[0.30210623],
[0.30099627],
[0.30173683],
[0.30157563],
[0.31417343],
[0.3014299 ],
[0.300555 ],
[0.3021357 ],
[0.30141425],
[0.30120167],
[0.31250164],
[0.30241737],
[0.3014774 ],
[0.28925306],
[0.30212253],
[0.3021568 ],
[0.30231932],
[0.30191606],
[0.3141842 ],
[0.30059364],
[0.30211878],
[0.30189154],
[0.30140838],
[0.30042845],
[0.30151388],
[0.30248943],
[0.32791406],
[0.3018949 ]], dtype=float32)

The true rating values on the test set are: [3, 5, 3, 5, 4, 4, 5, 5, 4, 3, 3, 3, 3, 2, 3, 2, 4, 5, 4, 2, 3, 3, 4, 1, 5, 4, 3, 2, 2, 4, 3, 5, 2, 3, 3, 3, 3, 1, 3, 4]

Pytest for AFM model is not stable

Describe the bug(问题描述)

If we run pytest several times, some runs fail in tests/models/AFM_test.py.

To Reproduce(复现步骤)

Steps to reproduce the behavior:

  1. Go to main directory: DeepCTR-Torch/
  2. Run pytest in command line for multiple times
  3. See error

Operating environment(运行环境):

  • python version: 3.6.4
  • torch version: 1.3.1
  • deepctr-torch version: dev branch

Additional context

[Screenshots of the failing test output and error details were attached to the original issue.]

When the prediction value is very small, the logloss metric raises a calculation error

Describe the bug(问题描述)
/usr/local/anaconda3/lib/python3.7/site-packages/sklearn/metrics/classification.py:2174: RuntimeWarning: divide by zero encountered in log
loss = -(transformed_labels * np.log(y_pred)).sum(axis=1)

To Reproduce(复现步骤)
Steps to reproduce the behavior:

model = xDeepFM(linear_feature_columns,dnn_feature_columns,task='binary',device=device)
model.compile("adam", "binary_crossentropy",
              metrics=['log_loss'], )

Operating environment(运行环境):

  • python version [3.7]
  • torch version [1.1.0]
  • deepctr-torch version 0.1.2
  • sklearn version 0.0

Additional context
My model and data produce very small predicted values like 0.0001 and values close to 1 like 0.9998; exact 0 and 1 even appear, which makes the sklearn metric calculation fail.
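
A small workaround sketch (not the library's built-in behaviour): clip the predictions away from exactly 0 and 1 before computing the log loss, so np.log never receives a zero.

    import numpy as np
    from sklearn.metrics import log_loss

    eps = 1e-7
    y_true = np.array([0, 1, 1, 0])
    y_pred = np.array([0.0, 0.9998, 1.0, 0.0001])   # extreme values like the ones reported
    print(log_loss(y_true, np.clip(y_pred, eps, 1 - eps)))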

Error when running the FAQ example "How to add a long dense feature vector as a input to the model?"

Describe the bug(问题描述)

To Reproduce(复现步骤)
Steps to reproduce the behavior:
The code provided at https://deepctr-torch.readthedocs.io/en/latest/FAQ.html#how-to-add-a-long-dense-feature-vector-as-a-input-to-the-model fails to run. The line in basemodel.py

    np.hstack(list(map(lambda x: np.expand_dims(x, axis=1), x))))

raises

    ValueError: all the input array dimensions except for the concatenation axis must match exactly

Operating environment(运行环境):

  • python version 3.7
  • torch version 1.2.0
  • deepctr-torch version 0.1.1

Additional context
Add any other context about the problem here.

Example error run_classification_criteo.py

Describe the bug(问题描述)
cuda ready...
cuda:0
Train on 160 samples, validate on 0 samples, 5 steps per epoch
    Traceback (most recent call last):
      File "run_classification_criteo.py", line 62, in <module>
        batch_size=32, epochs=10, validation_split=0.0, verbose=2)
      File "/home/xinyu/anaconda3/lib/python3.7/site-packages/deepctr_torch/models/basemodel.py", line 224, in fit
        total_loss.backward(retain_graph=True)
      File "/home/xinyu/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/home/xinyu/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
        allow_unreachable=True)  # allow_unreachable flag
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [128]], which is output 0 of SelectBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

To Reproduce(复现步骤)
Steps to reproduce the behavior:

  1. Just run run_classification_criteo.py

Operating environment(运行环境):

  • python version [3.7]
  • torch version [1.5.0]


Is sharing embeddings between variable-length sequence features and sparse features still unsupported?


Operating environment(运行环境):

  • python version [e.g. 3.6]
  • torch version [e.g. 1.2.0,]
  • deepctr-torch version [e.g. 0.1.0,]

How can I do early stopping with the deepctr-torch version?


Operating environment(运行环境):

  • python version [e.g. 3.6]
  • torch version [e.g. 1.2.0,]
  • deepctr-torch version [e.g. 0.1.0,]
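
For reference, newer deepctr-torch releases ship Keras-style callbacks and a fit() that accepts a callbacks argument; whether your installed version has them is an assumption to verify in the docs FAQ. A minimal sketch, with `model` and the training inputs prepared as in the example scripts:

    from deepctr_torch.callbacks import EarlyStopping

    # Stop training when the monitored validation metric stops improving.
    es = EarlyStopping(monitor='val_auc', min_delta=0, patience=2, mode='max', verbose=1)
    model.fit(train_model_input, train[target].values,
              batch_size=256, epochs=20, validation_split=0.2, callbacks=[es])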

The AFM bias is not initialized, leading to different experimental results between runs

Describe the bug(问题描述)
The AFM bias is not initialized, leading to different experimental results between runs.

To Reproduce(复现步骤)
Because the bias here is not initialized but still receives gradients, the results differ greatly between runs regardless of whether the attention network is used. At first I thought it was a random-seed problem, but the issue remained after fixing the seed. Finally I found that a parameter had not been initialized.

Operating environment(运行环境):

  • python version [3.8]
  • torch version [1.4.0]
  • deepctr-torch version [0.8.0,]

Additional context
You can add the following:

for tensor in [self.attention_b]:
    nn.init.zeros_(tensor,)

index out of range error when I use trained model to predict on another data set

I trained a classification model using DeepFM and it worked well.
When I load the trained model and use it to predict on another dataset, an error occurs; the message is below.

The root cause is in functional.py, triggered by the statement below: len(weight) = 35 and an element of the input is 35, so an index-out-of-range error occurs. How can I fix this?

    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

Error Message
    Traceback (most recent call last):
      File "D:/DeepCTR-Torch/examples/run_classification_churn.py", line 134, in <module>
        pred_ans = model.predict(test_model_input, 256)
      File "D:\DeepCTR-Torch\deepctr_torch\models\basemodel.py", line 300, in predict
        y_pred = model(x).cpu().data.numpy()  # .squeeze()
      File "D:\anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "D:\DeepCTR-Torch\deepctr_torch\models\deepfm.py", line 73, in forward
        self.embedding_dict)
      File "D:\DeepCTR-Torch\deepctr_torch\models\basemodel.py", line 324, in input_from_feature_columns
        feat in sparse_feature_columns]
      File "D:\DeepCTR-Torch\deepctr_torch\models\basemodel.py", line 324, in <listcomp>
        feat in sparse_feature_columns]
      File "D:\anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "D:\anaconda3\envs\torch\lib\site-packages\torch\nn\modules\sparse.py", line 114, in forward
        self.norm_type, self.scale_grad_by_freq, self.sparse)
      File "D:\anaconda3\envs\torch\lib\site-packages\torch\nn\functional.py", line 1485, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    RuntimeError: index out of range: Tried to access index 35 out of table with 34 rows. at C:\w\1\s\tmp_conda_3.6_171155\conda\conda-bld\pytorch_1570813991702\work\aten\src\TH/generic/THTensorEvenMoreMath.cpp:418
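
The embedding table only has as many rows as the training-time vocabulary, so indices produced from a new dataset must stay inside that range. A sketch of one common approach (my suggestion, not from the thread): build the category-to-index mapping once at training time with index 0 reserved for unknowns, size the SparseFeat vocabulary as len(mapping) + 1, and reuse the mapping at prediction time.

    import numpy as np

    train_values = ["a", "b", "c"]
    # Index 0 is reserved for "unknown"; vocabulary_size should be len(mapping) + 1.
    mapping = {v: i + 1 for i, v in enumerate(sorted(set(train_values)))}

    def encode(values, mapping):
        # Unseen categories fall back to the reserved index 0.
        return np.array([mapping.get(v, 0) for v in values])

    print(encode(["a", "d", "c"], mapping))   # [1 0 3]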

How can I use the GPU?

Hello, I ran several of the programs in examples and checked GPU activity with nvprof; they all report: No CUDA Application was profiled.
Can any of the models use the GPU? If so, how can I display GPU performance during the run with a tool such as nvprof?
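
The examples run on the GPU only if a CUDA device string is passed to the model constructor; a minimal sketch following the pattern in run_classification_criteo.py (with the feature column lists built as in that script):

    import torch
    from deepctr_torch.models import DeepFM

    device = 'cpu'
    if torch.cuda.is_available():
        print('cuda ready...')
        device = 'cuda:0'

    # All parameters and the training loop then live on the chosen device.
    model = DeepFM(linear_feature_columns, dnn_feature_columns,
                   task='binary', device=device)

With the model on cuda:0, profilers such as nvprof (or nvidia-smi) should then see CUDA activity during fit().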

Is softmax dim = 1 in InteractingLayer wrong? It feels like it should be 3.


Operating environment(运行环境):

  • python version [e.g. 3.5, 3.6]
  • torch version [e.g. 1.1.0, 1.2.0]
  • deepctr-torch version [e.g. 0.1.0,]


Pytorch 1.7.0 RuntimeError

Describe the bug(问题描述)
I installed deepctr-torch using pip install -U deepctr-torch and ran the Criteo example, but got an error:
RuntimeError: Output 0 of UnbindBackward is a view and its base or another view of its base has been modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.

To Reproduce(复现步骤)
Steps to reproduce the behavior:

  1. conda create -n deepctr-torch
  2. conda activate deepctr-torch
  3. pip install -U deepctr-torch
  4. git clone https://github.com/shenweichen/DeepCTR-Torch.git
  5. cd ./DeepCTR-Torch/examples
  6. python run_classification_criteo.py
  7. get the RuntimeError

Operating environment(运行环境):

  • python version [3.8.3]
  • torch version [1.7.0]
  • deepctr-torch version [0.2.3]
  • deepctr version [0.8.2]

Additional context
Roll back pytorch to 1.6.0 and it runs normally.

Questions about the fit method of the BaseModel class


Additional context
    with tqdm(enumerate(train_loader), disable=verbose != 1) as t:
        for index, (x_train, y_train) in t:
            x = x_train.to(self.device).float()
            y = y_train.to(self.device).float()

            y_pred = model(x).squeeze()

            optim.zero_grad()
            loss = loss_func(y_pred, y.squeeze(), reduction='sum')

            total_loss = loss + self.reg_loss + self.aux_loss

            loss_epoch += loss.item()
            total_loss_epoch += total_loss.item()
            total_loss.backward(retain_graph=True)
            optim.step()

1. self.reg_loss and self.aux_loss should be recomputed on every forward pass, right? So far I only see them computed once during initialization.
2. What is the purpose of retaining the computation graph? There is no second backward pass. Is it related to question 1? Can the value of reg_loss be updated automatically?

TensorDataset with DataLoader could lead bad performance

Describe the bug(问题描述)
TensorDataset with DataLoader leads to slow data reading. Using simple tensor slicing can give a ~4x speed improvement (especially with large batch sizes).

Additional context
Related code:

    train_tensor_data = Data.TensorDataset(
        torch.from_numpy(np.concatenate(x, axis=-1)),
        torch.from_numpy(y))
    if batch_size is None:
        batch_size = 256
    train_loader = DataLoader(
        dataset=train_tensor_data, shuffle=shuffle, batch_size=batch_size)

Related discussion:

pytorch/pytorch#4959
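
A minimal sketch of the slicing alternative suggested above (illustrative only, not the library's current fit() loop): shuffle the indices once per epoch and slice whole batches directly, which avoids the per-sample __getitem__ and collation overhead of TensorDataset + DataLoader.

    import torch

    x = torch.randn(10_000, 39)                     # hypothetical concatenated feature matrix
    y = torch.randint(0, 2, (10_000, 1)).float()
    batch_size = 256

    perm = torch.randperm(x.size(0))
    for start in range(0, x.size(0), batch_size):
        idx = perm[start:start + batch_size]
        x_batch, y_batch = x[idx], y[idx]
        # ... forward / backward on (x_batch, y_batch)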

Categorical Columns Handling + Few specific questions

Hi,
I'd like to ask few questions about the algorithm:

  1. Categorical Columns handling -
    Suppose I have a dataset with appID, advertisementID, country, deviceOS and a label (CTR), e.g. 123, 1234, US, Android, 0.15 (at some aggregation level).
    A categorical column could also be userId, for example (userId <> advertisementID instead of appId <> advertisementID).
    How are appID and advertisementID handled?
    I read that the embedding you are doing works well for high-cardinality values (e.g. appId / advID with thousands of possible values).

  2. Data Limitations:
    Is there any limitation on the number of categorical columns, the number of columns overall, or the number of data points?
    I ask because I have been using Spark with distributed training until now, so I am wondering what the best way to migrate would be.

  3. Unseen values:
    Also, what about categories that were not in the training set but appear in the test set? How should one handle them?
    I would be happy to get on a call with any of you to clarify these questions, so that I could then contribute some documentation to your site along the lines of "this is the base dataset" ---> "this is the new dataset", or something like that.
