qiaoguan / deep-ctr-prediction
CTR prediction models for ad recommendation, based on deep learning
Home Page: https://github.com/qiaoguan/deep-ctr-prediction
Hello, could you explain the dataset fields used in the DIN code, mainly product_id_att, creative_id_att, and user_click_products_att? The official code does not use this many features, and yours feels more practical, so could you explain their meaning and data format? Thanks!
The usual approach is to generate TFRecords with Spark and pull them down to a local GPU machine. But if the CTR dataset is small (under 20M rows and under 50 features) it fits in memory, and I found tf.data.TFRecordDataset is actually slow.
Reading the data into memory with pandas and building a generator with tf.keras.utils.Sequence takes about 20 ms/step,
but tf.data.TFRecordDataset jumps to 2 s/step,
with 21 steps per epoch in total.
Do you have any advice for me?
For example, what should the internal layout of the TFRecord be? Right now each record holds n feature keys.
Any suggestions appreciated, thanks!
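For data that fits in memory, the pandas + tf.keras.utils.Sequence route described above can be sketched without TensorFlow at all. Below is a minimal Sequence-style generator (my own sketch; the class name and array shapes are made up), exposing the same `__len__`/`__getitem__` interface that `model.fit()` consumes:

```python
import numpy as np

class InMemoryBatchGenerator:
    """Minimal Sequence-style batch generator for data that fits in memory.

    Mirrors the tf.keras.utils.Sequence interface (__len__ / __getitem__),
    shown here with plain numpy arrays.
    """

    def __init__(self, features, labels, batch_size=1024, shuffle=True):
        self.features = features
        self.labels = labels
        self.batch_size = batch_size
        self.indices = np.arange(len(labels))
        if shuffle:
            np.random.shuffle(self.indices)

    def __len__(self):
        # number of batches per epoch, rounding up
        return int(np.ceil(len(self.labels) / self.batch_size))

    def __getitem__(self, idx):
        batch_idx = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
        return self.features[batch_idx], self.labels[batch_idx]

features = np.random.rand(2500, 8).astype(np.float32)
labels = np.random.randint(0, 2, size=2500)
gen = InMemoryBatchGenerator(features, labels, batch_size=1024)
print(len(gen))            # 3 batches: 1024 + 1024 + 452
print(gen[2][0].shape[0])  # last batch holds the remaining 452 rows
```

If you do stay on tf.data, `tf.data.TFRecordDataset(files, num_parallel_reads=...)` combined with `.prefetch()` (and `.cache()` for small datasets after the first epoch) often closes much of the gap, though I have not benchmarked it against this repo's data.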
Hello, a question: when computing the attention unit there is a max_seq_len. Is the input padded to that length? Behavior sequences vary in length.
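Padding to max_seq_len plus a mask is the standard way to batch variable-length behavior sequences; the mask lets the attention unit zero out (or assign a large negative pre-softmax score to) padded positions. A numpy sketch (`pad_and_mask` is my own helper name, not from the repo):

```python
import numpy as np

def pad_and_mask(behavior_seqs, max_seq_len, pad_value=0):
    """Pad variable-length behavior sequences to max_seq_len and build a mask.

    The boolean mask marks real positions, so attention over padded slots
    can be suppressed before the softmax.
    """
    padded = np.full((len(behavior_seqs), max_seq_len), pad_value, dtype=np.int64)
    mask = np.zeros((len(behavior_seqs), max_seq_len), dtype=bool)
    for i, seq in enumerate(behavior_seqs):
        seq = seq[-max_seq_len:]          # keep the most recent clicks if too long
        padded[i, :len(seq)] = seq
        mask[i, :len(seq)] = True
    return padded, mask

seqs = [[5, 9], [3, 1, 4, 1, 5, 9, 2]]
padded, mask = pad_and_mask(seqs, max_seq_len=5)
print(padded)            # [[5 9 0 0 0], [4 1 5 9 2]]
print(mask.sum(axis=1))  # true lengths: [2 5]
```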
A question about train.py: what should the model_fn parameter of tf.estimator.Estimator() be set to, and what does the din_model_fn you pass in mean?
Parameter Server architecture or All-Reduce?
CPU or GPU?
Is there open-source reference code?
Does the TensorFlow source need to be modified?
Which option has the best cost-performance ratio?
For example, with 10M users and 10M items, would there really be 10M × 10M = 10^14 cross-feature combinations?
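The full cross space is never materialized: only the user-item pairs that actually occur in the logs produce features, and a hash bucket caps the embedding table size regardless of the nominal cardinality. A sketch of the hashing trick (`hash_bucket` and `NUM_BUCKETS` are illustrative names; md5 is used here instead of TF's internal fingerprint hash):

```python
import hashlib

NUM_BUCKETS = 1_000_000  # embedding rows are fixed regardless of cross cardinality

def hash_bucket(feature_value, num_buckets=NUM_BUCKETS):
    """Deterministically map a string feature to a bucket id (hashing trick).

    Same idea as tf.string_to_hash_bucket_fast: the 10M x 10M user-item
    cross space collapses into num_buckets rows, at the cost of collisions.
    """
    digest = hashlib.md5(feature_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

cross = "user_12345_x_item_67890"   # a single observed cross feature
bucket = hash_bucket(cross)
print(bucket)  # stable across runs, always within [0, NUM_BUCKETS)
```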
Could you please provide the training data used by the DIN model?
x-deeplearning computes a per-batch AUC and discards any batch containing only negatives as invalid, which differs from TensorFlow's approach. This can significantly distort the AUC: even a batch with no positives contributes to the global false positives, so AUC should be computed globally.
See: alibaba/x-deeplearning#355
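A toy example makes the point concrete: per-batch AUC that discards all-negative batches can report a very different number than the global AUC, because the discarded negatives still compete with positives in the global ranking. The `auc` function below is my own rank-sum (Mann-Whitney) implementation, not code from either repo:

```python
def auc(labels, scores):
    """Exact AUC via the rank-sum (Mann-Whitney) formulation."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined for a single-class batch
    hits = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return hits / (len(pos) * len(neg))

# batch 1 has both classes; batch 2 is all-negative with high scores
batch1 = ([1, 0, 1], [0.9, 0.4, 0.6])
batch2 = ([0, 0], [0.8, 0.7])

# per-batch AUC drops batch 2 entirely (x-deeplearning's "invalid" batch)
print(auc(*batch1))  # 1.0

# global AUC keeps batch 2's negatives, which outrank one positive
labels = batch1[0] + batch2[0]
scores = batch1[1] + batch2[1]
print(auc(labels, scores))  # 4/6 ≈ 0.667
```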
The first error: in esmm.py, line 46, the line above user_id is missing a comma.
Has this code actually been run?
```python
user_id = features['user_id']
click_label = features['label']
conversion_label = features['is_conversion']

if mode == tf.estimator.ModeKeys.PREDICT:
    predictions = {
        'ctr_preds': ctr_preds,
        'cvr_preds': cvr_preds,
        'ctcvr_preds': ctcvr_preds,
        'user_id': user_id,
        'click_label': click_label,
        'conversion_label': conversion_label
    }
```
The code above is esmm.py lines 37-50. From the logic, click_label and conversion_label look like prediction targets, so shouldn't they correspond to tensors in the network? Why are they read directly from features? My understanding is that these two labels should be derived from ctr_preds and ctcvr_preds respectively.
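One plausible reading (my interpretation, not confirmed by the author): click_label and conversion_label are the ground-truth labels passed through unchanged, so that the exported prediction rows carry both scores and labels and an offline job can compute AUC without re-joining the raw data. The model's own outputs follow ESMM's factorization, sketched with illustrative numbers:

```python
# ESMM factorization over the full impression space:
#   pCTCVR = P(click & conversion | x) = P(click | x) * P(conversion | click, x)
ctr_pred = 0.10   # ctr_preds tower:  P(click | impression)
cvr_pred = 0.20   # cvr_preds tower:  P(conversion | click)
ctcvr_pred = ctr_pred * cvr_pred
print(ctcvr_pred)  # 0.02

# click_label / conversion_label, by contrast, are read from `features`
# because they are observed outcomes, not model outputs.
```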
Hello, I want to load the trained model file on its own and use it for prediction, but with tf.estimator I cannot find the input interface. From what I have read, tf.estimator.export.RegressionOutput as used in your code defines the output interface, but how do I define the input interface, i.e. sess.run(output, feed_dict=???)? Which tensor is output, and how should feed_dict be filled?
Could you share your working environment? Python version, TensorFlow version, and the Estimator version.
Hi, a question about the recall stage of the YouTube 2016 paper: can AUC be computed there?
Or, more generally, can AUC be computed for a recall stage at all?
Is there any way to compute AUC for this kind of next-item recommendation?
Thanks a lot!
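One common workaround (general practice, not from this repo): estimate a per-example AUC by scoring the held-out next item against sampled negatives; the fraction of negatives the positive outranks is an estimate of AUC. All names below (`sampled_auc`, the toy embeddings, the scorer) are illustrative:

```python
import random

def sampled_auc(score_fn, user, true_item, all_items, num_neg=100, seed=7):
    """AUC estimate for a retrieval model: fraction of sampled negatives
    ranked below the held-out next item (ties count half)."""
    rng = random.Random(seed)
    pos_score = score_fn(user, true_item)
    negatives = [i for i in all_items if i != true_item]
    sampled = rng.sample(negatives, min(num_neg, len(negatives)))
    wins = sum(pos_score > score_fn(user, n) for n in sampled)
    ties = sum(pos_score == score_fn(user, n) for n in sampled)
    return (wins + 0.5 * ties) / len(sampled)

# toy scorer: dot product of fixed, made-up user/item embeddings
emb = {"u1": [1.0, 0.0], "i1": [0.9, 0.1], "i2": [0.1, 0.9], "i3": [0.2, 0.8]}
score = lambda u, i: sum(a * b for a, b in zip(emb[u], emb[i]))
print(sampled_auc(score, "u1", "i1", ["i1", "i2", "i3"]))  # 1.0 on this toy data
```

Averaging this over held-out (user, next-item) pairs gives a corpus-level number; hit-rate@K over the same negatives is the other metric commonly reported for recall stages.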
Training log:
INFO:tensorflow:global_step/sec: 12.3102
INFO:tensorflow:loss = 472.53354, step = 38300 (8.123 sec)
INFO:tensorflow:global_step/sec: 13.4449
INFO:tensorflow:loss = 496.88828, step = 38400 (7.439 sec)
INFO:tensorflow:global_step/sec: 13.5721
INFO:tensorflow:loss = 494.92902, step = 38500 (7.368 sec)
INFO:tensorflow:global_step/sec: 10.366
INFO:tensorflow:loss = 477.77087, step = 38600 (9.647 sec)
INFO:tensorflow:global_step/sec: 13.6566
INFO:tensorflow:loss = 469.46252, step = 38700 (7.322 sec)
INFO:tensorflow:global_step/sec: 14.8222
INFO:tensorflow:loss = 505.62067, step = 38800 (6.746 sec)
INFO:tensorflow:global_step/sec: 14.7337
INFO:tensorflow:loss = 508.70572, step = 38900 (6.788 sec)
INFO:tensorflow:global_step/sec: 14.245
INFO:tensorflow:loss = 481.75873, step = 39000 (7.019 sec)
INFO:tensorflow:global_step/sec: 14.1653
INFO:tensorflow:loss = 492.90146, step = 39100 (7.060 sec)
INFO:tensorflow:global_step/sec: 13.9005
INFO:tensorflow:loss = 481.75992, step = 39200 (7.194 sec)
INFO:tensorflow:global_step/sec: 13.2426
INFO:tensorflow:loss = 490.50104, step = 39300 (7.551 sec)
INFO:tensorflow:global_step/sec: 10.1131
INFO:tensorflow:loss = 478.1643, step = 39400 (9.888 sec)
INFO:tensorflow:global_step/sec: 10.4625
INFO:tensorflow:loss = 469.9007, step = 39500 (9.559 sec)
INFO:tensorflow:global_step/sec: 14.7374
INFO:tensorflow:loss = 481.07245, step = 39600 (6.785 sec)
INFO:tensorflow:global_step/sec: 11.0079
Final metrics:
ctr_accuracy: 0.62085813
ctr_auc: 0.6696427
cvr_accuracy: 0.9009072
cvr_auc: 0.67000365
global_step: 40000
loss: 488.3895
Hello, a question: for the attention part over sequence features, I see you wrote din_feature_column.py. Did you intend to build the sequence features with the feature_column API? But the actual implementation uses the code below.
```python
last_click_creativeid = tf.string_to_hash_bucket_fast(features["user_click_creatives_att"], 200000)
creativeid_embeddings = tf.get_variable(name="attention_creativeid_embeddings", dtype=tf.float32,
                                        shape=[200000, 20])
last_click_creativeid_emb = tf.nn.embedding_lookup(creativeid_embeddings, last_click_creativeid)

att_creativeid = tf.string_to_hash_bucket_fast(features["creative_id_att"], 200000)
creativeid_emb = tf.nn.embedding_lookup(creativeid_embeddings, att_creativeid)
```
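The key property of that snippet is that both lookups share one embedding table, so the user's historically clicked creatives and the candidate creative live in the same vector space, which is what the attention unit needs. A numpy rendering of the same idea (my sketch; md5 stands in for tf.string_to_hash_bucket_fast, so the bucket ids differ from TF's):

```python
import hashlib
import numpy as np

NUM_BUCKETS, EMB_DIM = 200_000, 20
rng = np.random.default_rng(0)
# one shared table, mirroring attention_creativeid_embeddings in the snippet
creativeid_embeddings = rng.normal(size=(NUM_BUCKETS, EMB_DIM))

def to_bucket(s):
    # stand-in for tf.string_to_hash_bucket_fast (different hash, same idea)
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % NUM_BUCKETS

clicked = ["creative_17", "creative_42"]   # user's historical clicks
candidate = "creative_42"                  # ad being scored

clicked_emb = creativeid_embeddings[[to_bucket(c) for c in clicked]]
candidate_emb = creativeid_embeddings[to_bucket(candidate)]
print(clicked_emb.shape)                           # (2, 20)
# sharing the table means identical ids get identical vectors:
print(np.allclose(clicked_emb[1], candidate_emb))  # True
```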
In the model file esmm.py, under predict mode, there is:
```python
export_outputs = {
    'regression': tf.estimator.export.RegressionOutput(predictions['cvr_preds'])  # needed for online prediction
}
```
What is this export concretely used for? Since train.py already exports the model, I don't quite understand what export_outputs does here.
Another question:
```python
session_config = tf.ConfigProto(device_count={'GPU': 1, 'CPU': 10},
                                inter_op_parallelism_threads=10,
                                intra_op_parallelism_threads=10
                                # log_device_placement=True
                                )
```
Is there any theoretical basis for this thread configuration, or is it an empirical setting?
Could you explain the reasoning behind this part of the feature engineering? In particular, what is the point of bucketizing with the two boundary lists below?
DayShowSegs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 37, 38, 39, 41, 42, 44, 46, 47, 49, 51, 54, 56, 59, 61, 65, 68, 72, 76, 81, 86, 92, 100, 109, 120, 134, 153, 184, 243, 1195]
DayClickSegs = [1, 2, 3, 6, 23]
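The boundaries look like quantile-style cut points: dense where daily counts are common, sparse in the long tail, so each bucket carries roughly similar traffic and the model sees a balanced categorical instead of a heavy-tailed raw count (my reading, not confirmed by the author). Bucketizing against such boundaries is one call with numpy (illustrative input values):

```python
import numpy as np

# quantile-style boundaries: fine-grained at small counts, coarse in the
# long tail, so buckets carry roughly equal traffic
DayClickSegs = [1, 2, 3, 6, 23]

day_clicks = np.array([0, 1, 2, 5, 10, 500])
buckets = np.digitize(day_clicks, DayClickSegs)  # bucket i holds Segs[i-1] <= x < Segs[i]
print(buckets)  # [0 1 2 3 4 5]
```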
Hi,
After saving the SavedModel according to the official guide, I reloaded it with tensorflow-model-server, but when I query it with requests I get the error below:
>>> headers
{'content-type': 'application/json'}
>>> data
'{"signature_name": "serving_default", "instances": [{"age": [46.0], "education_num": [10.0], "capital_gain": [7688.0], "capital_loss": [0.0], "hours_per_week": [38.0]}, {"age": [24.0], "education_num": [13.0], "capital_gain": [0.0], "capital_loss": [0.0], "hours_per_week": [50.0]}]}'
>>> jp=requests.post(url2,data=data,headers=headers)
>>> jp.text
'{\n "error": "Failed to process element: 0 key: age of \'instances\' list. Error: INVALID_ARGUMENT: JSON object: does not have named input: age"\n}'
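That error usually means the exported serving signature does not expose per-feature named inputs (for example, it takes a single serialized tf.Example tensor instead), so the row-format request with raw feature names cannot be matched. Inspecting the signature with `saved_model_cli show --dir <model_dir> --all` (a real TensorFlow CLI) shows the expected input names. For a signature that does accept named features, the columnar "inputs" format is the alternative to "instances"; a sketch of building that payload (values copied from the request above):

```python
import json

# Columnar ("inputs") request body for the TF Serving REST API, assuming the
# signature really does expose these feature names; verify with saved_model_cli.
payload = {
    "signature_name": "serving_default",
    "inputs": {
        "age": [[46.0], [24.0]],
        "education_num": [[10.0], [13.0]],
        "capital_gain": [[7688.0], [0.0]],
        "capital_loss": [[0.0], [0.0]],
        "hours_per_week": [[38.0], [50.0]],
    },
}
data = json.dumps(payload)
print(sorted(json.loads(data)["inputs"]))  # feature names the request carries
```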