
deep-ctr-prediction's Issues

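Meaning of the DIN sample data

Hello, could you explain what the dataset used in the DIN code means? Mainly these fields: product_id_att, creative_id_att, user_click_products_att, user_click_products_att. The official code does not use many features, and yours seems more practical, so I would appreciate an explanation of what each field means and what the data format is. Thanks~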

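Questions about producing tfrecord files at industrial scale; hoping the author can help

The usual approach is to generate tfrecord files with Spark and then pull them to a local GPU machine for training. But when the CTR dataset is small (under 20 million rows, under 50 features) and fits in memory, I find that tf.data.TFRecordDataset is actually much slower.

I read the data into memory with pandas and build a data generator with tf.keras.utils.Sequence, which takes only about 20 ms/step.
With tf.data.TFRecordDataset this rises to 2 s/step.
Each epoch has 21 steps in total.

Does the author have any advice for me?
For example, how should the tfrecord be laid out internally? Right now each record holds n features (keys).
Any suggestions along these lines would be welcome. Thank you.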

attention unit

Hello, I have a question. The attention unit computation uses a max_seq_len; does this mean the input data is padded? The behaviour sequences vary in length.
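As a minimal sketch of the general technique (not the repository's code), variable-length behaviour sequences are commonly truncated or right-padded to max_seq_len, with a mask kept alongside so that padded positions receive zero attention weight. The helper names pad_sequences and masked_softmax, the pad value 0, and the use of softmax for normalisation are all assumptions made for illustration.

```python
import math


def pad_sequences(seqs, max_seq_len, pad_value=0):
    """Truncate or right-pad each sequence to max_seq_len.

    Returns (padded, mask), where mask holds 1 for real items and 0 for padding.
    """
    padded, mask = [], []
    for seq in seqs:
        kept = list(seq[:max_seq_len])
        n_pad = max_seq_len - len(kept)
        padded.append(kept + [pad_value] * n_pad)
        mask.append([1] * len(kept) + [0] * n_pad)
    return padded, mask


def masked_softmax(scores, mask_row):
    """Softmax over real positions only; padded positions get weight 0."""
    real = [s for s, m in zip(scores, mask_row) if m]
    if not real:
        return [0.0] * len(scores)
    top = max(real)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical item IDs: one short and one long click history.
padded, mask = pad_sequences([[5, 6], [1, 2, 3, 4, 5]], max_seq_len=4)
weights = masked_softmax([1.0, 1.0, 9.0, 9.0], mask[0])
```

With the mask applied, the large scores at the two padded positions of the first sequence do not contribute to its attention weights.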

din dataset

Could you provide the training data for the DIN model?

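x-deeplearning ESMM AUC calculation bug

It seems that the AUC computation in x-deeplearning computes a per-batch AUC and treats any batch containing only negatives as invalid, discarding it. This differs from what TensorFlow does and can significantly affect the AUC: a batch with no positives still contributes to the global false-positive (FP) count, so AUC should be computed globally.
For details, see: alibaba/x-deeplearning#355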
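The difference can be illustrated with a small plain-Python sketch. It assumes the per-batch scheme averages AUC over the batches where it is defined and silently skips the rest; that is a simplification for illustration, not a copy of the x-deeplearning or TensorFlow implementation, and the function names and toy data are hypothetical.

```python
def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5.

    Returns None when either class is missing, since AUC is then undefined.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_batch_auc(batches):
    """Average per-batch AUC, skipping batches where AUC is undefined."""
    vals = [a for a in (auc(y, s) for y, s in batches) if a is not None]
    return sum(vals) / len(vals)


def global_auc(batches):
    """AUC over all examples pooled together."""
    labels = [y for ys, _ in batches for y in ys]
    scores = [s for _, ss in batches for s in ss]
    return auc(labels, scores)


batches = [
    ([1, 0], [0.6, 0.4]),  # mixed batch, ranked perfectly
    ([0, 0], [0.9, 0.8]),  # negatives only, but scored above the positive
]
```

On this toy data, mean_batch_auc(batches) is 1.0 because the second batch is dropped, while global_auc(batches) is 1/3 because the two highly scored negatives outrank the only positive.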

ESMM does not run at all

The very first error is a missing comma on the line above user_id at line 46 of esmm.py..
Has this code ever actually been run?

Prediction code does not run

user_id = features['user_id']
click_label = features['label']
conversion_label = features['is_conversion']

if mode == tf.estimator.ModeKeys.PREDICT:
    predictions = {
        'ctr_preds': ctr_preds,
        'cvr_preds': cvr_preds,
        'ctcvr_preds': ctcvr_preds,
        'user_id': user_id,
        'click_label': click_label,
        'conversion_label': conversion_label
    }
The code above is lines 37-50 of ESMM. Judging from the code logic, click_label and conversion_label are the prediction targets and should correspond to tensors in the network, so why are they read directly from features? My understanding is that these two labels should be derived from ctr_preds and ctcvr_preds respectively.

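Some questions about model export

Hello, I would like to load the trained model files on their own and use them for prediction, but with tf.estimator I cannot find the input interface. From what I have read, something like the tf.estimator.export.RegressionOutput in your code defines the output interface, but how do I define the input interface, i.e. sess.run(output, feed_dict=???)? Which tensor is output, and how should feed_dict be populated?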

Question about environment setup

Could you tell me your working environment? Python version, TensorFlow version, and estimator.

ESMM training loss keeps oscillating around 400; what is the cause?

Training
INFO:tensorflow:global_step/sec: 12.3102
I0511 16:43:03.123719 139644774647616 basic_session_run_hooks.py:692] global_step/sec: 12.3102
INFO:tensorflow:loss = 472.53354, step = 38300 (8.123 sec)
I0511 16:43:03.124620 139644774647616 basic_session_run_hooks.py:260] loss = 472.53354, step = 38300 (8.123 sec)
INFO:tensorflow:global_step/sec: 13.4449
I0511 16:43:10.561764 139644774647616 basic_session_run_hooks.py:692] global_step/sec: 13.4449
INFO:tensorflow:loss = 496.88828, step = 38400 (7.439 sec)
I0511 16:43:10.563358 139644774647616 basic_session_run_hooks.py:260] loss = 496.88828, step = 38400 (7.439 sec)
INFO:tensorflow:global_step/sec: 13.5721
I0511 16:43:17.929780 139644774647616 basic_session_run_hooks.py:692] global_step/sec: 13.5721
INFO:tensorflow:loss = 494.92902, step = 38500 (7.368 sec)
I0511 16:43:17.931165 139644774647616 basic_session_run_hooks.py:260] loss = 494.92902, step = 38500 (7.368 sec)
INFO:tensorflow:global_step/sec: 10.366
I0511 16:43:27.576712 139644774647616 basic_session_run_hooks.py:692] global_step/sec: 10.366
INFO:tensorflow:loss = 477.77087, step = 38600 (9.647 sec)
I0511 16:43:27.578247 139644774647616 basic_session_run_hooks.py:260] loss = 477.77087, step = 38600 (9.647 sec)
INFO:tensorflow:global_step/sec: 13.6566
I0511 16:43:34.899176 139644774647616 basic_session_run_hooks.py:692] global_step/sec: 13.6566
INFO:tensorflow:loss = 469.46252, step = 38700 (7.322 sec)
I0511 16:43:34.900484 139644774647616 basic_session_run_hooks.py:260] loss = 469.46252, step = 38700 (7.322 sec)
INFO:tensorflow:global_step/sec: 14.8222
I0511 16:43:41.645576 139644774647616 basic_session_run_hooks.py:692] global_step/sec: 14.8222
INFO:tensorflow:loss = 505.62067, step = 38800 (6.746 sec)
I0511 16:43:41.646508 139644774647616 basic_session_run_hooks.py:260] loss = 505.62067, step = 38800 (6.746 sec)
INFO:tensorflow:global_step/sec: 14.7337
I0511 16:43:48.432974 139644774647616 basic_session_run_hooks.py:692] global_step/sec: 14.7337
INFO:tensorflow:loss = 508.70572, step = 38900 (6.788 sec)
I0511 16:43:48.434319 139644774647616 basic_session_run_hooks.py:260] loss = 508.70572, step = 38900 (6.788 sec)
INFO:tensorflow:global_step/sec: 14.245
I0511 16:43:55.452730 139644774647616 basic_session_run_hooks.py:692] global_step/sec: 14.245
INFO:tensorflow:loss = 481.75873, step = 39000 (7.019 sec)
I0511 16:43:55.453657 139644774647616 basic_session_run_hooks.py:260] loss = 481.75873, step = 39000 (7.019 sec)
INFO:tensorflow:global_step/sec: 14.1653
I0511 16:44:02.512451 139644774647616 basic_session_run_hooks.py:692] global_step/sec: 14.1653
INFO:tensorflow:loss = 492.90146, step = 39100 (7.060 sec)
I0511 16:44:02.513763 139644774647616 basic_session_run_hooks.py:260] loss = 492.90146, step = 39100 (7.060 sec)
INFO:tensorflow:global_step/sec: 13.9005
I0511 16:44:09.706491 139644774647616 basic_session_run_hooks.py:692] global_step/sec: 13.9005
INFO:tensorflow:loss = 481.75992, step = 39200 (7.194 sec)
I0511 16:44:09.708160 139644774647616 basic_session_run_hooks.py:260] loss = 481.75992, step = 39200 (7.194 sec)
INFO:tensorflow:global_step/sec: 13.2426
I0511 16:44:17.257735 139644774647616 basic_session_run_hooks.py:692] global_step/sec: 13.2426
INFO:tensorflow:loss = 490.50104, step = 39300 (7.551 sec)
I0511 16:44:17.259049 139644774647616 basic_session_run_hooks.py:260] loss = 490.50104, step = 39300 (7.551 sec)
INFO:tensorflow:global_step/sec: 10.1131
I0511 16:44:27.145826 139644774647616 basic_session_run_hooks.py:692] global_step/sec: 10.1131
INFO:tensorflow:loss = 478.1643, step = 39400 (9.888 sec)
I0511 16:44:27.146829 139644774647616 basic_session_run_hooks.py:260] loss = 478.1643, step = 39400 (9.888 sec)
INFO:tensorflow:global_step/sec: 10.4625
I0511 16:44:36.704025 139644774647616 basic_session_run_hooks.py:692] global_step/sec: 10.4625
INFO:tensorflow:loss = 469.9007, step = 39500 (9.559 sec)
I0511 16:44:36.705528 139644774647616 basic_session_run_hooks.py:260] loss = 469.9007, step = 39500 (9.559 sec)
INFO:tensorflow:global_step/sec: 14.7374
I0511 16:44:43.489141 139644774647616 basic_session_run_hooks.py:692] global_step/sec: 14.7374
INFO:tensorflow:loss = 481.07245, step = 39600 (6.785 sec)
I0511 16:44:43.490077 139644774647616 basic_session_run_hooks.py:260] loss = 481.07245, step = 39600 (6.785 sec)
INFO:tensorflow:global_step/sec: 11.0079
I0511 16:44:52.573785 139644774647616 basic_session_run_hooks.py:692] global_step/sec: 11.0079

Final AUC
ctr_accuracy: 0.62085813
ctr_auc: 0.6696427
cvr_accuracy: 0.9009072
cvr_auc: 0.67000365
global_step: 40000
loss: 488.3895

DIN sequence features

Hello, a question: in the part where you build the sequence features for attention, I see you wrote a din_feature_column.py. Did you intend to build the sequence features with the feature_column approach? The actual implementation, however, is the code below.
last_click_creativeid = tf.string_to_hash_bucket_fast(features["user_click_creatives_att"], 200000)
creativeid_embeddings = tf.get_variable(name="attention_creativeid_embeddings", dtype=tf.float32,
                                        shape=[200000, 20])
last_click_creativeid_emb = tf.nn.embedding_lookup(creativeid_embeddings, last_click_creativeid)
att_creativeid = tf.string_to_hash_bucket_fast(features["creative_id_att"], 200000)
creativeid_emb = tf.nn.embedding_lookup(creativeid_embeddings, att_creativeid)

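Code implementation questions

In the model file esmm.py, under predict mode, there is this:
export_outputs = {
    'regression': tf.estimator.export.RegressionOutput(predictions['cvr_preds'])  # needed for online prediction
}
What is this export actually used for? You already export the model in train.py, so I do not quite understand what export_outputs is for here.

Another question:
session_config = tf.ConfigProto(device_count={'GPU': 1, 'CPU': 10},
                                inter_op_parallelism_threads=10,
                                intra_op_parallelism_threads=10
                                # log_device_placement=True
                                )
Is there any theoretical basis for these multithreading settings, or were they chosen from experience?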

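Feature engineering

Could you explain why the feature engineering is done this way? In particular, the two lists below are defined for bucketing; what is the point of bucketing like this?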
DayShowSegs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 37, 38, 39, 41, 42, 44, 46, 47, 49, 51, 54, 56, 59, 61, 65, 68, 72, 76, 81, 86, 92, 100, 109, 120, 134, 153, 184, 243, 1195]
DayClickSegs = [1, 2, 3, 6, 23]
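Boundary lists like these are typically used to turn a raw count into a small categorical bucket ID that can then be one-hot encoded or embedded. The sketch below is a plain-Python illustration using the left-closed, right-open bucket convention documented for tf.feature_column.bucketized_column; whether the repository applies exactly this convention is an assumption, and bucketize is a hypothetical helper. The uneven spacing of DayShowSegs resembles quantile-style boundaries, though the source does not say how they were derived.

```python
from bisect import bisect_right

DayClickSegs = [1, 2, 3, 6, 23]


def bucketize(value, boundaries):
    """Return the index of the bucket that value falls into.

    Bucket 0 is (-inf, b[0]), bucket i is [b[i-1], b[i]),
    and the last bucket is [b[-1], +inf).
    """
    return bisect_right(boundaries, value)


# Hypothetical daily click counts mapped to bucket IDs.
click_counts = [0, 1, 5, 6, 23, 1000]
bucket_ids = [bucketize(c, DayClickSegs) for c in click_counts]
```

Five boundaries give six buckets, so heavy-tailed counts such as 23 and 1000 end up sharing the last bucket instead of dominating the input scale.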

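Questions about the DeepFM model

  1. build_model_columns seems to return only deep_columns; are the other features not used in DeepFM at all?
  2. Also, for multi-valued features such as tags, how should they be stored and handled in tfrecord?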

Out of memory

The program stays stuck at this point, memory usage then surges, and finally it exits. Has the author run into a similar problem?
20191227163021

about tf-serving

Hi,
After saving the SavedModel by following the official link, I used tensorflow-model-server to load it, but when I query it with requests I get the error below:

>>> headers
{'content-type': 'application/json'}
>>> data
'{"signature_name": "serving_default", "instances": [{"age": [46.0], "education_num": [10.0], "capital_gain": [7688.0], "capital_loss": [0.0], "hours_per_week": [38.0]}, {"age": [24.0], "education_num": [13.0], "capital_gain": [0.0], "capital_loss": [0.0], "hours_per_week": [50.0]}]}'
>>> jp=requests.post(url2,data=data,headers=headers)
>>> jp.text
'{\n    "error": "Failed to process element: 0 key: age of \'instances\' list. Error: INVALID_ARGUMENT: JSON object: does not have named input: age"\n}'
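One possible cause, offered only as a guess, is that the exported signature takes serialized tf.Example protos and therefore has no input named age. If the model was exported with a parsing serving input receiver and serving_default is a classification or regression signature, the TensorFlow Serving REST API accepts named features through the :classify or :regress endpoints, whose request body uses an "examples" list instead of "instances". The sketch below only rewrites the request body shown above; to_examples_payload is a hypothetical helper, and the actual signature should be checked first (for example with saved_model_cli show).

```python
import json


def to_examples_payload(predict_body, default_signature="serving_default"):
    """Convert a :predict style body ("instances") into a
    :classify / :regress style body ("examples")."""
    body = json.loads(predict_body)
    return json.dumps({
        "signature_name": body.get("signature_name", default_signature),
        "examples": body["instances"],
    })


# The request body from the question, shortened to one instance.
data = ('{"signature_name": "serving_default", "instances": '
        '[{"age": [46.0], "education_num": [10.0], "capital_gain": [7688.0], '
        '"capital_loss": [0.0], "hours_per_week": [38.0]}]}')
classify_data = to_examples_payload(data)
```

The rewritten body would be posted to the model's :classify or :regress URL in place of :predict; no request is sent here because the server address in the question is not known.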
