
tensorflow-deepfm's Introduction

tensorflow-DeepFM

This project includes a TensorFlow implementation of DeepFM [1].

NEWS

Usage

Input Format

This implementation requires the input data in the following format:

  • Xi: [[ind1_1, ind1_2, ...], [ind2_1, ind2_2, ...], ..., [indi_1, indi_2, ..., indi_j, ...], ...]
    • indi_j is the feature index of feature field j of sample i in the dataset
  • Xv: [[val1_1, val1_2, ...], [val2_1, val2_2, ...], ..., [vali_1, vali_2, ..., vali_j, ...], ...]
    • vali_j is the feature value of feature field j of sample i in the dataset
    • vali_j can be either binary (1/0, for binary/categorical features) or float (e.g., 10.24, for numerical features)
  • y: target of each sample in the dataset (1/0 for classification, numeric value for regression)

Please see example/DataReader.py for an example of how to prepare the data in the required format for DeepFM.
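
As a concrete toy illustration (the indices below are hypothetical; in practice they come from a feature dictionary such as the one built in example/DataReader.py), consider three samples with two feature fields, where field 0 is categorical (the index selects the category, the value is 1.0) and field 1 is numeric (the index is fixed, the value is the raw number):

Xi = [[0, 3], [1, 3], [2, 3]]                 # feature indices, one per field
Xv = [[1.0, 0.5], [1.0, 10.24], [1.0, -2.3]]  # 1.0 for categorical, raw value for numeric
y = [1, 0, 1]                                 # binary targets (classification)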

Init and train a model

import tensorflow as tf
from sklearn.metrics import roc_auc_score

from DeepFM import DeepFM

# params
dfm_params = {
    "use_fm": True,
    "use_deep": True,
    "embedding_size": 8,
    "dropout_fm": [1.0, 1.0],
    "deep_layers": [32, 32],
    "dropout_deep": [0.5, 0.5, 0.5],
    "deep_layers_activation": tf.nn.relu,
    "epoch": 30,
    "batch_size": 1024,
    "learning_rate": 0.001,
    "optimizer_type": "adam",
    "batch_norm": 1,
    "batch_norm_decay": 0.995,
    "l2_reg": 0.01,
    "verbose": True,
    "eval_metric": roc_auc_score,
    "random_seed": 2017
}

# prepare training and validation data in the required format
Xi_train, Xv_train, y_train = prepare(...)
Xi_valid, Xv_valid, y_valid = prepare(...)

# init a DeepFM model
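# note: dfm_params must also contain "feature_size" (total number of
# distinct feature indices) and "field_size" (number of feature fields);
# see example/main.py, where they are derived from the parsed data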
dfm = DeepFM(**dfm_params)

# fit a DeepFM model
dfm.fit(Xi_train, Xv_train, y_train)

# make prediction
dfm.predict(Xi_valid, Xv_valid)

# evaluate a trained model
dfm.evaluate(Xi_valid, Xv_valid, y_valid)

You can use early stopping during training as follows:

dfm.fit(Xi_train, Xv_train, y_train, Xi_valid, Xv_valid, y_valid, early_stopping=True)

You can refit the model on the whole training and validation sets as follows:

dfm.fit(Xi_train, Xv_train, y_train, Xi_valid, Xv_valid, y_valid, early_stopping=True, refit=True)

You can use the FM or the deep component alone by setting the parameter use_fm or use_deep to False, for example:
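
dfm_params["use_deep"] = False  # FM component only
# or
dfm_params["use_fm"] = False    # deep (DNN) component only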

Regression

This implementation also supports regression tasks. To use DeepFM for regression, set loss_type to "mse". Accordingly, you should use a regression eval_metric, e.g., mean squared error or mean absolute error.
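
A minimal sketch of a regression setup (assuming sklearn's mean_absolute_error as the metric; greater_is_better is the constructor flag used for evaluation/early stopping and must be False for error metrics):

from sklearn.metrics import mean_absolute_error

dfm_params["loss_type"] = "mse"                  # squared-error training loss
dfm_params["eval_metric"] = mean_absolute_error
dfm_params["greater_is_better"] = False          # lower error is better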

Example

The example folder includes an example usage of the DeepFM/FM/DNN models for Porto Seguro's Safe Driver Prediction competition on Kaggle.

Please download the data from the competition website and put the files into the example/data folder.

To train a DeepFM model on this dataset, run

$ cd example
$ python main.py

Please see example/DataReader.py for how to parse the raw dataset into the required format for DeepFM.

Performance

DeepFM

[performance figure: dfm]

FM

[performance figure: fm]

DNN

[performance figure: dnn]

Some tips

  • You should tune the parameters of each model in order to get reasonable performance.
  • You can also try to ensemble these models, or ensemble them with other models (e.g., XGBoost or LightGBM); see the sketch below.
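
A minimal sketch of a simple blend (assuming dfm and fm are already-trained DeepFM instances, with fm constructed with use_deep=False, and Xi_test/Xv_test prepared as above; the equal weights are illustrative, not tuned):

p_dfm = dfm.predict(Xi_test, Xv_test)
p_fm = fm.predict(Xi_test, Xv_test)
p_blend = 0.5 * p_dfm + 0.5 * p_fm  # average the predicted probabilities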

Reference

[1] DeepFM: A Factorization-Machine based Neural Network for CTR Prediction, Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He.

Acknowledgments

This project draws inspiration from the following projects:

License

MIT

tensorflow-deepfm's People

Contributors

chenglongchen


tensorflow-deepfm's Issues

TypeError: Expected string passed to parameter 'tensor_names' of op 'SaveV2', got ['Variable', 'Variable/Adam', 'Variable/Adam_1', 'Variable_1', 'Variable_1/Adam', 'Variable_1/Adam_1', 'Variable_2', 'Variable_2/Adam', 'Variab

Hi everyone:
TypeError: Expected string passed to parameter 'tensor_names' of op 'SaveV2', got ['Variable', 'Variable/Adam', 'Variable/Adam_1', 'Variable_1', 'Variable_1/Adam', 'Variable_1/Adam_1', 'Variable_2', 'Variable_2/Adam', 'Variab
Why does this error occur?
The full error message is:
Traceback (most recent call last):
File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 511, in _apply_op_helper
preferred_dtype=default_dtype)
File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1175, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\constant_op.py", line 304, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\constant_op.py", line 245, in constant
allow_broadcast=True)
File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\constant_op.py", line 283, in _constant_impl
allow_broadcast=allow_broadcast))
File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\tensor_util.py", line 501, in make_tensor_proto
(dtype, nparray.dtype, values))
TypeError: Incompatible types: <dtype: 'string'> vs. object. Value is ['Variable', 'Variable/Adam', 'Variable/Adam_1', 'Variable_1', 'Variable_1/Adam', 'Variable_1/Adam_1', 'Variable_2', 'Variable_2/Adam', 'Variable_2/Adam_1', 'Variable_3', 'Variable_3/Adam', 'Variable_3/Adam_1', 'Variable_4', 'Variable_4/Adam', 'Variable_4/Adam_1', 'Variable_5', 'Variable_5/Adam', 'Variable_5/Adam_1', 'beta1_power', 'beta2_power', 'bn_0/beta', 'bn_0/beta/Adam', 'bn_0/beta/Adam_1', 'bn_0/gamma', 'bn_0/gamma/Adam', 'bn_0/gamma/Adam_1', 'bn_0/moving_mean', 'bn_0/moving_variance', 'bn_1/beta', 'bn_1/beta/Adam', 'bn_1/beta/Adam_1', 'bn_1/gamma', 'bn_1/gamma/Adam', 'bn_1/gamma/Adam_1', 'bn_1/moving_mean', 'bn_1/moving_variance', 'feature_bias', 'feature_bias/Adam', 'feature_bias/Adam_1', 'feature_embeddings', 'feature_embeddings/Adam', 'feature_embeddings/Adam_1']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:/pycharm project/2/main.py", line 148, in
y_train_dfm, y_test_dfm = _run_base_model_dfm(dfTrain, dfTest, folds, dfm_params)
File "E:/pycharm project/2/main.py", line 68, in _run_base_model_dfm
dfm = DeepFM(**dfm_params)
File "E:\pycharm project\2\DeepFM.py", line 61, in init
self._init_graph()
File "E:\pycharm project\2\DeepFM.py", line 156, in _init_graph
self.saver = tf.train.Saver()
File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 832, in init
self.build()
File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 844, in build
self._build(self._filename, build_save=True, build_restore=True)
File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 881, in _build
build_save=build_save, build_restore=build_restore)
File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 510, in _build_internal
save_tensor = self._AddSaveOps(filename_tensor, saveables)
File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 210, in _AddSaveOps
save = self.save_op(filename_tensor, saveables)
File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 124, in save_op
tensors)
File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 1920, in save_v2
name=name)
File "C:\Program Files\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 520, in _apply_op_helper
repr(values), type(values).name))
TypeError: Expected string passed to parameter 'tensor_names' of op 'SaveV2', got ['Variable', 'Variable/Adam', 'Variable/Adam_1', 'Variable_1', 'Variable_1/Adam', 'Variable_1/Adam_1', 'Variable_2', 'Variable_2/Adam', 'Variable_2/Adam_1', 'Variable_3', 'Variable_3/Adam', 'Variable_3/Adam_1', 'Variable_4', 'Variable_4/Adam', 'Variable_4/Adam_1', 'Variable_5', 'Variable_5/Adam', 'Variable_5/Adam_1', 'beta1_power', 'beta2_power', 'bn_0/beta', 'bn_0/beta/Adam', 'bn_0/beta/Adam_1', 'bn_0/gamma', 'bn_0/gamma/Adam', 'bn_0/gamma/Adam_1', 'bn_0/moving_mean', 'bn_0/moving_variance', 'bn_1/beta', 'bn_1/beta/Adam', 'bn_1/beta/Adam_1', 'bn_1/gamma', 'bn_1/gamma/Adam', 'bn_1/gamma/Adam_1', 'bn_1/moving_mean', 'bn_1/moving_variance', 'feature_bias', 'feature_bias/Adam', 'feature_bias/Adam_1', 'feature_embeddings', 'feature_embeddings/Adam', 'feature_embeddings/Adam_1'] of type 'list' instead.

Process finished with exit code 1

GPU utilization is very low

Hey, I am training the example on a V100 card, and GPU utilization tops out at about 12%. What could be the cause? Is the code written to utilize the GPU fully, or is modification required? Is it because of the small dataset? Please give suggestions.

A question about the vector representation of features

def _initialize_weights(self):
    weights = dict()
    # embeddings (FM component)
    weights["feature_embeddings"] = tf.Variable(
        tf.random_normal([self.feature_size, self.embedding_size], 0.0, 0.01),
        name="feature_embeddings")  # feature_size * K

The above is the weight-initialization method. But how can we directly initialize a matrix of feature representations, weights["feature_embeddings"]? Shouldn't the vector representing each feature be obtained from a corresponding embedding-layer network?
Also, why does weights["feature_embeddings"] change as training proceeds?
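
For context, this is roughly how that matrix is used later in the graph (a paraphrase of the lookup in DeepFM.py's _init_graph, with self. prefixes dropped): the matrix itself acts as the embedding layer, each row being one feature's vector; rows are selected by feature index and receive gradient updates like any other weight, which is why the matrix changes during training.

embeddings = tf.nn.embedding_lookup(weights["feature_embeddings"],
                                    feat_index)            # None * F * K
feat_value = tf.reshape(feat_value, shape=[-1, field_size, 1])
embeddings = tf.multiply(embeddings, feat_value)           # scale rows by feature values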

How to handle sequence features

Hello,
I have some sequence features, such as a user's purchase history and search history.
What is a good way to handle these features with DeepFM?

MAE loss doesn't decrease?

Not sure why, but when I use DeepFM for a regression problem, the MAE does not improve.
Example:

import numpy as np
from DeepFM import DeepFM
import tensorflow as tf
import pandas as pd
from sklearn.cross_validation import KFold
from sklearn.metrics import mean_absolute_error
from DataReader import FeatureDictionary, DataParser

train = pd.DataFrame(np.random.randn(100, 100).astype('float'))
y = pd.DataFrame(np.random.randn(100, 1).astype('float'))
train['loss'] = y
test = pd.DataFrame(np.random.randn(100, 100).astype('float'))
cols = [c for c in train.columns if 'loss' not in c]
fd = FeatureDictionary(dfTrain=train, dfTest=test, numeric_cols=cols, ignore_cols='loss')
data_parser = DataParser(feat_dict=fd)
Xi_train, Xv_train, y_train = data_parser.parse(df=train, has_label=True)
Xi_test, Xv_test = data_parser.parse(df=test)

dfm_params = {
    "use_fm": True,
    "use_deep": True,
    "embedding_size": 8,
    "dropout_fm": [1.0, 1.0],
    "deep_layers": [32, 32],
    "dropout_deep": [0.5, 0.5, 0.5],
    "deep_layers_activation": tf.nn.relu,
    "epoch": 30,
    "batch_size": 1024,
    "learning_rate": 0.001,
    "optimizer_type": "adam",
    "batch_norm": 1,
    "batch_norm_decay": 0.995,
    "l2_reg": 0.01,
    "verbose": True,
    "metric": mean_absolute_error,
    "random_seed": 42,
    "greater_is_better": False
}
dfm_params["feature_size"] = fd.feat_dim
dfm_params["field_size"] = len(Xi_train[0])

get = lambda x, l: [x[i] for i in l]
del train['loss']
train = train.values
y = y.values
y = y.reshape((-1))
folds = KFold(len(y), 8, shuffle=True, random_state=2016)
for i, (train_idx, valid_idx) in enumerate(folds):
    Xi_train_, Xv_train_, y_train_ = get(Xi_train, train_idx), get(Xv_train, train_idx), get(y_train, train_idx)
    Xi_valid_, Xv_valid_, y_valid_ = get(Xi_train, valid_idx), get(Xv_train, valid_idx), get(y_train, valid_idx)
    dfm = DeepFM(**dfm_params)
    dfm.fit(Xi_train_, Xv_train_, y_train_, Xi_valid_, Xv_valid_, y_valid_)

[1] train-result=0.9263, valid-result=0.9745 [0.0 s]
[2] train-result=0.9263, valid-result=0.9745 [0.0 s]
[3] train-result=0.9263, valid-result=0.9745 [0.0 s]
[4] train-result=0.9263, valid-result=0.9745 [0.0 s]
[5] train-result=0.9263, valid-result=0.9745 [0.0 s]
[6] train-result=0.9263, valid-result=0.9745 [0.0 s]
[7] train-result=0.9263, valid-result=0.9746 [0.0 s]
[8] train-result=0.9263, valid-result=0.9746 [0.0 s]
[9] train-result=0.9263, valid-result=0.9746 [0.0 s]
[10] train-result=0.9263, valid-result=0.9746 [0.0 s]
[11] train-result=0.9264, valid-result=0.9746 [0.0 s]
[12] train-result=0.9264, valid-result=0.9746 [0.0 s]
[13] train-result=0.9264, valid-result=0.9746 [0.0 s]
[14] train-result=0.9264, valid-result=0.9746 [0.0 s]
[15] train-result=0.9264, valid-result=0.9746 [0.0 s]
[16] train-result=0.9264, valid-result=0.9747 [0.0 s]
[17] train-result=0.9264, valid-result=0.9747 [0.0 s]
[18] train-result=0.9264, valid-result=0.9747 [0.0 s]
[19] train-result=0.9264, valid-result=0.9747 [0.0 s]
[20] train-result=0.9264, valid-result=0.9747 [0.0 s]
[21] train-result=0.9264, valid-result=0.9747 [0.0 s]
[22] train-result=0.9264, valid-result=0.9747 [0.0 s]
[23] train-result=0.9264, valid-result=0.9747 [0.0 s]
[24] train-result=0.9264, valid-result=0.9747 [0.0 s]
[25] train-result=0.9264, valid-result=0.9747 [0.0 s]
[26] train-result=0.9264, valid-result=0.9747 [0.0 s]
[27] train-result=0.9264, valid-result=0.9748 [0.0 s]
[28] train-result=0.9264, valid-result=0.9748 [0.0 s]
[29] train-result=0.9264, valid-result=0.9748 [0.0 s]
[30] train-result=0.9264, valid-result=0.9748 [0.0 s]

Model structure

Hi, my reading of your code: first_order is F-dimensional, second_order is K-dimensional, plus the output of the deep layers. These three are concatenated and then a weighted output is produced?
This seems different from the paper...

Should int/long features be put into NUMERIC_COLS?

Looking at the code, integer features seem to be handled the same way as categorical and binary features. Should integers be converted to float and then put into NUMERIC_COLS?
I tried putting integer features directly into NUMERIC_COLS, and the results looked wrong.

About line 89 of the code

self.y_first_order = tf.reduce_sum(tf.multiply(self.y_first_order, feat_value), 2) # None * F

Shouldn't the multiply here be add? In wx + b, feat_value is wx and y_first_order is b.

What do people think?

Why does predicting the same test set give different predict_prob?

After training a DeepFM model, I call predict() many times on the same test set, but the returned predict_prob differs slightly each time.
For example:

array([ 0.3498897 , 0.30687785, 0.34534037], dtype=float32) #first time call predict()

array([ 0.3498849 , 0.30688545, 0.34534937], dtype=float32) #second time call predict()

No module named yellowfin

(tensorflow) ➜  example git:(master) ✗ python main.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Traceback (most recent call last):
  File "main.py", line 16, in <module>
    from DeepFM import DeepFM
  File "../DeepFM.py", line 15, in <module>
    from yellowfin import YFOptimizer

The input of the concat layer

Hello, a question.
Looking at the code, the concat layer concatenates the FM first-order vector, the FM second-order vector, and the last hidden layer of the DNN part, and then applies a fully connected layer and a sigmoid to produce the final output. But in the original paper it seems that the FM first-order part, the FM second-order part, and the DNN output are summed, and the sigmoid is applied to that sum.

Is there a performance advantage to implementing it this way?
Many thanks.

Reproducing the online score

Running this code I only get about 0.27 online. What adjustments should I make?

Field-aware?

Figure 4 in the DeepFM paper seems to indicate that the embeddings are "field-aware", but looking at the code it appears that field information is not used. Can you kindly confirm?

In DataParser's parse function, why dfv[col] = 1. ?

Hello, there is a piece of code I don't understand: for all non-NUMERIC_COLS features, why is dfv[col] = 1. ?

for col in dfi.columns:
    if col in self.feat_dict.ignore_cols:
        dfi.drop(col, axis=1, inplace=True)
        dfv.drop(col, axis=1, inplace=True)
        continue
    if col in self.feat_dict.numeric_cols:
        dfi[col] = self.feat_dict.feat_dict[col]
    else:
        dfi[col] = dfi[col].map(self.feat_dict.feat_dict[col])
        dfv[col] = 1.

Embedding of continuous features

Hi, looking at the implementation I have a question: a continuous feature is multiplied by a trainable K-dimensional embedding vector (one number decomposed into several). Doesn't this also introduce an information loss for continuous features?

Sorry, but I still have to complain

In DeepFM.py, feature_bias is actually the w of the first-order term wx + b, which explains line 89: "self.y_first_order = tf.reduce_sum(tf.multiply(self.y_first_order, feat_value), 2)". The b is concat_bias.

The name "feature_bias" is misleading.

Bug in the code

The code in the preprocessing is rather amusing:
cols = [c for c in dfTrain.columns if c not in ["id", "label"]]
cols = [c for c in cols if (not c in config.IGNORE_COLS)]
cols keeps the column names that are not in IGNORE_COLS, yet the later
cat_features_indices = [i for i,c in enumerate(cols) if c in config.CATEGORICAL_COLS]
tries to select from cols the columns that are in the ignore list, so it is bound to come out empty.

Is there a DeepFM implementation that supports sparse input?

sparse_index = tf.placeholder(tf.int64, [None, 2])
sparse_ids = tf.placeholder(tf.int64, [None])
sparse_values = tf.placeholder(tf.float32, [None])
sparse_shape = tf.placeholder(tf.int64, [2])
ids = tf.SparseTensor(sparse_index, sparse_ids, sparse_shape)
values = tf.SparseTensor(sparse_index, sparse_values, sparse_shape)

i.e., sparse input like this.

Error during training

I got the following error during training, and it is hard to tell where the problem is. Any advice is appreciated:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-187-236f2a6678c7> in <module>()
----> 1 dfm.fit(Xi,Xv,y_out)

/nfs/private/HM-PROJECT/DeepFM.py in fit(self, Xi_train, Xv_train, y_train, Xi_valid, Xv_valid, y_valid, early_stopping, refit)
    282             for i in range(total_batch):
    283                 Xi_batch, Xv_batch, y_batch = self.get_batch(Xi_train, Xv_train, y_train, self.batch_size, i)
--> 284                 self.fit_on_batch(Xi_batch, Xv_batch, y_batch)
    285 
    286             # evaluate training and validation datasets

/nfs/private/HM-PROJECT/DeepFM.py in fit_on_batch(self, Xi, Xv, y)
    254                      self.dropout_keep_deep: self.dropout_deep,
    255                      self.train_phase: True}
--> 256         loss, opt = self.sess.run((self.loss, self.optimizer), feed_dict=feed_dict)
    257         return loss
    258 

/home/luban/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    898     try:
    899       result = self._run(None, fetches, feed_dict, options_ptr,
--> 900                          run_metadata_ptr)
    901       if run_metadata:
    902         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/home/luban/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1133     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1134       results = self._do_run(handle, final_targets, final_fetches,
-> 1135                              feed_dict_tensor, options, run_metadata)
   1136     else:
   1137       results = []

/home/luban/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1314     if handle is None:
   1315       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1316                            run_metadata)
   1317     else:
   1318       return self._do_call(_prun_fn, handle, feeds, fetches)

/home/luban/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1333         except KeyError:
   1334           pass
-> 1335       raise type(e)(node_def, op, message)
   1336 
   1337   def _extend_graph(self):

InvalidArgumentError: slice index 3 of dimension 0 out of bounds.
	 [[Node: strided_slice_5 = StridedSlice[Index=DT_INT32, T=DT_FLOAT, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_dropout_keep_deep_0_0/_11, strided_slice_5/stack, strided_slice_5/stack_1, strided_slice/stack_1)]]

What is the difference between the three feature types, and how do I distinguish them?

Hi Chen,
First of all, thank you for open-sourcing this DeepFM code; it has helped me a great deal, and I plan to use it for model training and validation in my graduation project.
My idea is to apply the code to a new Kaggle competition.
In the code you provide there are the following three column categories. What is the difference between them, especially IGNORE_COLS: what kind of features should go there? Looking forward to your reply.

# types of columns of the dataset dataframe

CATEGORICAL_COLS = [
    # 'ps_ind_02_cat', 'ps_ind_04_cat', 'ps_ind_05_cat',
    # 'ps_car_01_cat', 'ps_car_02_cat', 'ps_car_03_cat',
    # 'ps_car_04_cat', 'ps_car_05_cat', 'ps_car_06_cat',
    # 'ps_car_07_cat', 'ps_car_08_cat', 'ps_car_09_cat',
    # 'ps_car_10_cat', 'ps_car_11_cat',
]

NUMERIC_COLS = [
    # # binary
    # "ps_ind_06_bin", "ps_ind_07_bin", "ps_ind_08_bin",
    # "ps_ind_09_bin", "ps_ind_10_bin", "ps_ind_11_bin",
    # "ps_ind_12_bin", "ps_ind_13_bin", "ps_ind_16_bin",
    # "ps_ind_17_bin", "ps_ind_18_bin",
    # "ps_calc_15_bin", "ps_calc_16_bin", "ps_calc_17_bin",
    # "ps_calc_18_bin", "ps_calc_19_bin", "ps_calc_20_bin",
    # numeric
    "ps_reg_01", "ps_reg_02", "ps_reg_03",
    "ps_car_12", "ps_car_13", "ps_car_14", "ps_car_15",

    # feature engineering
    "missing_feat", "ps_car_13_x_ps_reg_03",
]

IGNORE_COLS = [
    "id", "target",
    "ps_calc_01", "ps_calc_02", "ps_calc_03", "ps_calc_04",
    "ps_calc_05", "ps_calc_06", "ps_calc_07", "ps_calc_08",
    "ps_calc_09", "ps_calc_10", "ps_calc_11", "ps_calc_12",
    "ps_calc_13", "ps_calc_14",
    "ps_calc_15_bin", "ps_calc_16_bin", "ps_calc_17_bin",
    "ps_calc_18_bin", "ps_calc_19_bin", "ps_calc_20_bin"
]

How to handle pre-trained embeddings?

If I have pre-trained vectors that I want to use for training, but I don't want the model to fine-tune them end-to-end, what should I do?
It seems I need to modify the code myself, like this:

tf.get_variable(name="pre_trained_embedding",
                shape=(ids_num, embedding_dimension),
                initializer=tf.constant_initializer(preTrainedEmbeddingData),
                trainable=False)

Then concatenate this embedding result with the other inputs. Is this the right way to think about it?
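
For what it's worth, a minimal sketch of that idea applied to this model's weight dictionary (assuming preTrainedEmbeddingData is a numpy array of shape [feature_size, embedding_size]; this would replace the tf.Variable initialization in _initialize_weights, and is a sketch rather than the author's confirmed approach):

weights["feature_embeddings"] = tf.get_variable(
    name="feature_embeddings",
    shape=(self.feature_size, self.embedding_size),
    initializer=tf.constant_initializer(preTrainedEmbeddingData),
    trainable=False)  # trainable=False keeps the rows frozen during training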

How to export the model

I export the DeepFM model with the following code, but I find that freeze_model_dir/variables is empty. How should I export DeepFM as a pb model?

def freeze_model(self):
    freeze_model_dir = "freeze_model_dir"
    save_dir = 'checkpoints/'
    save_path = os.path.join(save_dir, 'best_validation')
    start_time = time()
    print(tf.trainable_variables())
    print("freeze model...")
    SIGNATURE_NAME = "serving_default"
    builder = tf.saved_model.builder.SavedModelBuilder(freeze_model_dir)
    inputs = {'feat_index': tf.saved_model.utils.build_tensor_info(self.feat_index),
              'feat_value': tf.saved_model.utils.build_tensor_info(self.feat_value),
              'dropout_keep_fm': tf.saved_model.utils.build_tensor_info(self.dropout_keep_fm),
              'dropput_keep_deep': tf.saved_model.utils.build_tensor_info(self.dropout_keep_deep),
              'train_phase': tf.saved_model.utils.build_tensor_info(self.train_phase)}
    outputs = {'y_pred': tf.saved_model.utils.build_tensor_info(self.y_pred)}
    builder.add_meta_graph_and_variables(
        self.sess, [tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                tf.saved_model.signature_def_utils.build_signature_def(
                    inputs, outputs,
                    tf.saved_model.signature_constants.PREDICT_METHOD_NAME)
        },
        main_op=tf.tables_initializer(),
        strip_default_attrs=True)
    builder.save()

The AUC Result

May I know why the AUC score on the Kaggle dataset, and on my own dataset, is so strange that it never goes above 0.5?
Thanks a lot~

from sklearn.metrics import roc_auc_score

[1] train-result=0.0130, valid-result=0.0551 [33.9 s]
[2] train-result=0.2280, valid-result=0.2518 [35.6 s]
[3] train-result=0.2539, valid-result=0.2593 [33.7 s]
[4] train-result=0.2590, valid-result=0.2629 [34.0 s]
[5] train-result=0.2559, valid-result=0.2561 [36.0 s]
[6] train-result=0.2588, valid-result=0.2594 [34.4 s]
[7] train-result=0.2579, valid-result=0.2587 [33.2 s]
[8] train-result=0.2640, valid-result=0.2626 [33.6 s]
[9] train-result=0.2671, valid-result=0.2638 [34.8 s]
[10] train-result=0.2619, valid-result=0.2601 [35.0 s]
[11] train-result=0.2667, valid-result=0.2609 [36.3 s]
[12] train-result=0.2664, valid-result=0.2627 [37.6 s]
[13] train-result=0.2687, valid-result=0.2650 [34.1 s]
[14] train-result=0.2724, valid-result=0.2692 [33.9 s]
[15] train-result=0.2689, valid-result=0.2663 [34.0 s]
[16] train-result=0.2710, valid-result=0.2671 [33.2 s]
[17] train-result=0.2677, valid-result=0.2655 [33.1 s]
[18] train-result=0.2694, valid-result=0.2674 [34.5 s]

Can someone explain these parameters?

dfm_params = {
    "use_fm": True,                        # include the FM component
    "use_deep": True,                      # include the deep (DNN) component
    "embedding_size": 8,                   # size K of each feature embedding
    "dropout_fm": [1.0, 1.0],              # keep probabilities for the FM part
    "deep_layers": [32, 32],               # hidden-layer sizes of the deep part
    "dropout_deep": [0.5, 0.5, 0.5],       # keep probabilities (input layer + each hidden layer)
    "deep_layers_activation": tf.nn.relu,  # activation function of the deep layers
    "epoch": 30,                           # number of training epochs
    "batch_size": 1024,                    # minibatch size
    "learning_rate": 0.001,
    "optimizer_type": "adam",
    "batch_norm": 1,                       # whether to apply batch normalization
    "batch_norm_decay": 0.995,             # decay of the batch-norm moving averages
    "l2_reg": 0.01,                        # L2 regularization strength
    "verbose": True,                       # print per-epoch evaluation results
    "eval_metric": gini_norm,              # metric for the per-epoch train/valid results
    "random_seed": config.RANDOM_SEED
}

I'm a beginner; I'd appreciate an explanation that doesn't go too deep into the low-level details.

Input Dimension Error

Hello,

In your paper it is written that "the lengths of different input field vectors can be different", which means the input data Xi can contain lists of different lengths. I tried to run the code with such inputs, but it gave the following error: "ValueError: setting an array element with a sequence.". Can you please suggest what to do?

Thanks in advance

Shouldn't the initial input of the deep component be the raw embedding result?

self.y_deep = tf.reshape(self.embeddings, shape=[-1, self.field_size * self.embedding_size]) # None * (F*K)

This is the very first input of the deep component. But at this point the embeddings are no longer the raw result of the embedding lookup: they have already been multiplied by the input x, presumably to make it convenient to compute the FM second-order interaction part.

The paper does not say what the deep component's input is, but from the architecture figure it looks like it should be the raw embedding result.

Hoping for an answer, many thanks~

Convergence is extremely slow

About 40 features, 20+ of them numeric; 5 million training samples and 1 million test samples in total.
One epoch takes about 200 seconds. I stopped the run at epoch 126 because the speed was unbearable. As you can see it does keep converging, just very slowly. Any suggestions? I have already tried various optimizers, learning rates, and batch sizes.
feature_size: 533
field_size: 38
#params: 15692
[1] train.csv-result=0.2512, valid-result=0.2515 [202.4 s]
[2] train.csv-result=0.2572, valid-result=0.2561 [190.0 s]
[3] train.csv-result=0.2578, valid-result=0.2568 [200.4 s]
[4] train.csv-result=0.2598, valid-result=0.2583 [188.3 s]
[5] train.csv-result=0.2582, valid-result=0.2569 [191.0 s]
[6] train.csv-result=0.2611, valid-result=0.2598 [201.3 s]
[7] train.csv-result=0.2594, valid-result=0.2578 [190.5 s]
[8] train.csv-result=0.2662, valid-result=0.2648 [182.1 s]
[9] train.csv-result=0.2639, valid-result=0.2623 [192.6 s]
[10] train.csv-result=0.2655, valid-result=0.2639 [190.7 s]
[11] train.csv-result=0.2671, valid-result=0.2654 [201.4 s]
[12] train.csv-result=0.2662, valid-result=0.2644 [190.8 s]
[13] train.csv-result=0.2674, valid-result=0.2653 [191.0 s]
[14] train.csv-result=0.2686, valid-result=0.2665 [200.5 s]
[15] train.csv-result=0.2678, valid-result=0.2657 [189.6 s]
[16] train.csv-result=0.2691, valid-result=0.2664 [200.2 s]
[17] train.csv-result=0.2704, valid-result=0.2676 [183.4 s]
[18] train.csv-result=0.2717, valid-result=0.2692 [173.5 s]
[19] train.csv-result=0.2721, valid-result=0.2688 [183.9 s]
[20] train.csv-result=0.2744, valid-result=0.2709 [173.3 s]
[21] train.csv-result=0.2757, valid-result=0.2718 [182.0 s]
[22] train.csv-result=0.2778, valid-result=0.2734 [173.0 s]
[23] train.csv-result=0.2803, valid-result=0.2753 [173.0 s]
[24] train.csv-result=0.2806, valid-result=0.2747 [183.5 s]
[25] train.csv-result=0.2852, valid-result=0.2792 [173.4 s]
[26] train.csv-result=0.2882, valid-result=0.2813 [184.0 s]
[27] train.csv-result=0.2890, valid-result=0.2818 [174.0 s]
[28] train.csv-result=0.2909, valid-result=0.2834 [174.2 s]
[29] train.csv-result=0.2921, valid-result=0.2841 [184.5 s]
[30] train.csv-result=0.2940, valid-result=0.2860 [174.8 s]
[31] train.csv-result=0.2945, valid-result=0.2862 [186.5 s]
[32] train.csv-result=0.2946, valid-result=0.2859 [191.1 s]
[33] train.csv-result=0.2966, valid-result=0.2879 [190.2 s]
[34] train.csv-result=0.2976, valid-result=0.2888 [200.9 s]
[35] train.csv-result=0.2984, valid-result=0.2892 [190.7 s]
[36] train.csv-result=0.2979, valid-result=0.2885 [201.8 s]
[37] train.csv-result=0.2987, valid-result=0.2896 [191.1 s]
[38] train.csv-result=0.2997, valid-result=0.2899 [191.4 s]
[39] train.csv-result=0.3006, valid-result=0.2910 [201.6 s]
[40] train.csv-result=0.3011, valid-result=0.2914 [191.4 s]
[41] train.csv-result=0.3006, valid-result=0.2902 [201.5 s]
[42] train.csv-result=0.3018, valid-result=0.2919 [190.5 s]
[43] train.csv-result=0.3022, valid-result=0.2912 [189.5 s]
[44] train.csv-result=0.3029, valid-result=0.2926 [200.5 s]
[45] train.csv-result=0.3027, valid-result=0.2923 [190.9 s]
[46] train.csv-result=0.3031, valid-result=0.2922 [200.6 s]
[47] train.csv-result=0.3037, valid-result=0.2930 [190.0 s]
[48] train.csv-result=0.3037, valid-result=0.2929 [190.1 s]
[49] train.csv-result=0.3043, valid-result=0.2929 [200.6 s]
[50] train.csv-result=0.3039, valid-result=0.2932 [190.7 s]
[51] train.csv-result=0.3048, valid-result=0.2939 [200.9 s]
[52] train.csv-result=0.3046, valid-result=0.2926 [190.3 s]
[53] train.csv-result=0.3050, valid-result=0.2936 [189.6 s]
[54] train.csv-result=0.3057, valid-result=0.2940 [200.4 s]
[55] train.csv-result=0.3055, valid-result=0.2944 [190.5 s]
[56] train.csv-result=0.3058, valid-result=0.2943 [201.2 s]
[57] train.csv-result=0.3062, valid-result=0.2948 [190.5 s]
[58] train.csv-result=0.3057, valid-result=0.2948 [191.1 s]
[59] train.csv-result=0.3064, valid-result=0.2949 [201.3 s]
[60] train.csv-result=0.3065, valid-result=0.2944 [190.6 s]
[61] train.csv-result=0.3072, valid-result=0.2955 [201.2 s]
[62] train.csv-result=0.3072, valid-result=0.2954 [190.0 s]
[63] train.csv-result=0.3072, valid-result=0.2955 [189.9 s]
[64] train.csv-result=0.3067, valid-result=0.2944 [200.6 s]
[65] train.csv-result=0.3076, valid-result=0.2958 [190.4 s]
[66] train.csv-result=0.3080, valid-result=0.2964 [200.1 s]
[67] train.csv-result=0.3083, valid-result=0.2960 [190.9 s]
[68] train.csv-result=0.3082, valid-result=0.2963 [190.8 s]
[69] train.csv-result=0.3083, valid-result=0.2957 [201.4 s]
[70] train.csv-result=0.3087, valid-result=0.2964 [190.5 s]
[71] train.csv-result=0.3083, valid-result=0.2966 [201.2 s]
[72] train.csv-result=0.3082, valid-result=0.2963 [190.7 s]
[73] train.csv-result=0.3089, valid-result=0.2966 [191.1 s]
[74] train.csv-result=0.3085, valid-result=0.2957 [201.2 s]
[75] train.csv-result=0.3088, valid-result=0.2971 [190.2 s]
[76] train.csv-result=0.3093, valid-result=0.2974 [190.8 s]
[77] train.csv-result=0.3094, valid-result=0.2973 [201.0 s]
[78] train.csv-result=0.3096, valid-result=0.2971 [190.8 s]
[79] train.csv-result=0.3089, valid-result=0.2969 [201.3 s]
[80] train.csv-result=0.3099, valid-result=0.2972 [190.7 s]
[81] train.csv-result=0.3097, valid-result=0.2969 [190.8 s]
[82] train.csv-result=0.3100, valid-result=0.2974 [200.7 s]
[83] train.csv-result=0.3096, valid-result=0.2970 [190.8 s]
[84] train.csv-result=0.3104, valid-result=0.2979 [200.7 s]
[85] train.csv-result=0.3095, valid-result=0.2974 [190.5 s]
[86] train.csv-result=0.3091, valid-result=0.2967 [190.0 s]
[87] train.csv-result=0.3104, valid-result=0.2982 [201.1 s]
[88] train.csv-result=0.3101, valid-result=0.2967 [190.3 s]
[89] train.csv-result=0.3109, valid-result=0.2981 [201.1 s]
[90] train.csv-result=0.3102, valid-result=0.2977 [190.4 s]
[91] train.csv-result=0.3112, valid-result=0.2982 [190.7 s]
[92] train.csv-result=0.3110, valid-result=0.2980 [200.6 s]
[93] train.csv-result=0.3111, valid-result=0.2980 [191.0 s]
[94] train.csv-result=0.3117, valid-result=0.2985 [201.1 s]
[95] train.csv-result=0.3108, valid-result=0.2977 [191.0 s]
[96] train.csv-result=0.3117, valid-result=0.2989 [190.2 s]
[97] train.csv-result=0.3114, valid-result=0.2982 [201.1 s]
[98] train.csv-result=0.3111, valid-result=0.2985 [190.8 s]
[99] train.csv-result=0.3116, valid-result=0.2980 [201.1 s]
[100] train.csv-result=0.3120, valid-result=0.2984 [191.0 s]
[101] train.csv-result=0.3122, valid-result=0.2984 [191.1 s]
[102] train.csv-result=0.3120, valid-result=0.2985 [201.9 s]
[103] train.csv-result=0.3124, valid-result=0.2992 [191.0 s]
[104] train.csv-result=0.3120, valid-result=0.2989 [200.3 s]
[105] train.csv-result=0.3114, valid-result=0.2980 [191.7 s]
[106] train.csv-result=0.3109, valid-result=0.2974 [189.1 s]
[107] train.csv-result=0.3121, valid-result=0.2986 [195.3 s]
[108] train.csv-result=0.3123, valid-result=0.2991 [184.6 s]
[109] train.csv-result=0.3122, valid-result=0.2993 [194.1 s]
[110] train.csv-result=0.3127, valid-result=0.2995 [184.1 s]
[111] train.csv-result=0.3126, valid-result=0.2986 [184.3 s]
[112] train.csv-result=0.3128, valid-result=0.2995 [192.9 s]
[113] train.csv-result=0.3127, valid-result=0.2986 [183.8 s]
[114] train.csv-result=0.3129, valid-result=0.2990 [194.0 s]
[115] train.csv-result=0.3133, valid-result=0.2994 [184.0 s]
[116] train.csv-result=0.3127, valid-result=0.2989 [183.0 s]
[117] train.csv-result=0.3131, valid-result=0.2997 [193.4 s]
[118] train.csv-result=0.3128, valid-result=0.2991 [185.9 s]
[119] train.csv-result=0.3102, valid-result=0.2969 [194.4 s]
[120] train.csv-result=0.3131, valid-result=0.2997 [184.5 s]
[121] train.csv-result=0.3132, valid-result=0.2988 [177.8 s]
[122] train.csv-result=0.3136, valid-result=0.2995 [191.2 s]
[123] train.csv-result=0.3136, valid-result=0.2995 [180.3 s]
[124] train.csv-result=0.3139, valid-result=0.3001 [191.6 s]
[125] train.csv-result=0.3136, valid-result=0.3000 [182.3 s]
[126] train.csv-result=0.3139, valid-result=0.3001 [179.6 s]

Can the input be text features?

Hello, I would like to ask: can the input to the DeepFM model be user and item features represented by review text?

ImportError: No module named 'yellowfin'


/usr/local/lib/python3.5/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Traceback (most recent call last):
File "main.py", line 16, in
from DeepFM import DeepFM
File "../DeepFM.py", line 15, in
from yellowfin import YFOptimizer
ImportError: No module named 'yellowfin'
Hello, I get this error with TensorFlow 1.0, and even after upgrading TF to the latest version the problem persists. How can I solve it?

About line 107 of the code

Line 107 of DeepFM.py is: self.y_deep = tf.nn.dropout(self.y_deep, self.dropout_keep_deep[0]). In my experiments, deleting this line improved the results considerably. After consulting some references, I found that dropout is usually applied to the intermediate nodes of the deep part, whereas this line applies it to the input nodes.

You may want to experiment with this yourself to verify.

Best wishes~
