deepctr-torch's People

Contributors

chenkkkk, gaohongkui, jyihuo, shenweichen, tangaqi, uestc7d, weiyucheng, wutongzhang, zanshuxun, zengai, zhangyuef


deepctr-torch's Issues

model save issue


Describe the question(问题描述)
Saving the whole model object with torch.save fails with the following error:
    Traceback (most recent call last):
      File "test.py", line 113, in <module>
        torch.save(model, savePath + "DeepFM.h5")
      File "/home/zhangkai/.conda/envs/pytorchDeepCTR/lib/python3.7/site-packages/torch/serialization.py", line 224, in save
        return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
      File "/home/zhangkai/.conda/envs/pytorchDeepCTR/lib/python3.7/site-packages/torch/serialization.py", line 149, in _with_file_like
        return body(f)
      File "/home/zhangkai/.conda/envs/pytorchDeepCTR/lib/python3.7/site-packages/torch/serialization.py", line 224, in <lambda>
        return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
      File "/home/zhangkai/.conda/envs/pytorchDeepCTR/lib/python3.7/site-packages/torch/serialization.py", line 296, in _save
        pickler.dump(obj)
    AttributeError: Can't pickle local object 'BaseModel._get_metrics.<locals>.<lambda>'

Operating environment(运行环境):

  • python version [e.g. 3.6]
  • torch version [e.g. 1.2.0,]
  • deepctr-torch version [e.g. 0.1.0,]
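
A common workaround (not from the original thread) is to serialize only the state_dict rather than pickling the whole model object, which avoids the local lambda created inside BaseModel._get_metrics. A minimal sketch, assuming `model` is the trained DeepFM instance from the traceback above:

    import torch

    # Save only the parameters instead of the whole model object.
    torch.save(model.state_dict(), "DeepFM.pt")

    # Rebuild the model with the same feature columns, then restore the weights.
    model.load_state_dict(torch.load("DeepFM.pt"))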

deepctr_torch (version 0.2.1) DIN does not support PyTorch 1.5.0 and above

Describe the bug(问题描述)
Running the DIN example code with PyTorch 1.5.0 produces the error below:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

To solve(解决)

  1. Reinstall PyTorch 1.4.0

Operating environment(运行环境):

  • python version [e.g. 3.7]
  • torch version [e.g. 1.5.0]
  • deepctr-torch version [e.g. 0.2.0,]


Provide learning rate parameter interface

I found that there is no interface for users to adjust the learning rate for a specific algorithm. Could you provide an lr parameter when instantiating an algorithm class so that we can change this value?
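
One way to control the learning rate today, assuming your installed deepctr-torch version lets compile() accept a ready-built torch.optim optimizer instead of a name string (worth verifying against your release), is sketched below; `model` stands for an already-constructed model such as DeepFM:

    import torch

    # Build the optimizer yourself with a custom learning rate and hand it to
    # compile() in place of the "adam" string.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    model.compile(optimizer, "binary_crossentropy", metrics=["binary_crossentropy", "auc"])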

How to export DeepCTR model to onnx ?


Describe the question(问题描述)
I would like to export model to onnx format, so I can use it for example with Apache PredictionIO

Additional context
I tried with:

    dummy_input = np.array([np.array([[0]]), np.array([[0]])])
    pred_ans = model.predict(dummy_input, 100)
    print(pred_ans)
    torch.onnx.export(model, (dummy_input), "deepCTR.onnx")

predict works well; I am getting [[0.0065056]].
But with onnx.export, I get an error with this trace:

RuntimeError: Only tuples, lists and Variables supported as JIT inputs/outputs. Dictionaries and strings are also accepted but their usage is not recommended. But got unsupported type numpy.ndarray
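
A sketch of a possible fix for the unsupported-type error (an assumption, not a verified export path): convert the numpy input into a single torch tensor shaped the way model.forward expects before calling torch.onnx.export. The export may still fail later on unsupported operators; this only addresses the numpy.ndarray complaint. `model` is the trained model from above.

    import numpy as np
    import torch

    # Build one concatenated float tensor: forward(X) takes a single tensor with
    # the features stacked along the last axis (as model.predict does internally).
    dummy_input = torch.from_numpy(
        np.concatenate([np.array([[0]]), np.array([[0]])], axis=-1)
    ).float()
    torch.onnx.export(model, (dummy_input,), "deepCTR.onnx")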

Operating environment(运行环境):

  • python version: 3.7
  • torch version: 1.4.0
  • deepctr-torch version: 0.1.0

where is the QR code

Describe the question
I came from Zhihu but could not find the QR code you mentioned. Is there a problem with GitHub, e.g. some restriction or something else?

multi gpus

How to run the demo with multiple GPUs?
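
For reference, recent deepctr-torch releases expose a gpus argument on the model constructor (it wraps the network in nn.DataParallel internally); treat the exact argument name and availability as an assumption to verify against your installed version. A minimal sketch, with `linear_feature_columns` / `dnn_feature_columns` built as in the bundled examples:

    from deepctr_torch.models import DeepFM

    # Put the model on the first GPU and let DataParallel split batches across
    # the listed device ids during fit().
    model = DeepFM(linear_feature_columns, dnn_feature_columns,
                   task='binary', device='cuda:0', gpus=[0, 1])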

Issues with the DIN model

Describe the bug(问题描述)
  1. In the forward of din.py, keys_length is the constant 1 and does not represent the sequence length.
  2. Some sparse features are looked up twice; the mask feature list in the embedding lookup has no effect.


Operating environment(运行环境):

  • python version [e.g. 3.5, 3.6]
  • torch version [e.g. 1.1.0, 1.2.0]
  • deepctr-torch version [e.g. 0.1.0,]


The output shape of the multi-head attention in AutoInt should be (batch_size, field_num, atten_embedding)

Describe the bug(问题描述)
The feature-shape flow in multi-head attention is:
(batch_size, field_num, embedding_size) -> (batch_size*h, field_num, embedding_size/h)
-> (batch_size, field_num, embedding_size)

You can refer to the implementation of multi-head attention that the author referenced:
https://github.com/Kyubyong/transformer/blob/fb023bb097e08d53baf25b46a9da490beba51a21/modules.py#L153

Also, you can use torch.nn.MultiheadAttention directly in PyTorch 1.1.0:
https://pytorch.org/docs/1.1.0/_modules/torch/nn/modules/activation.html#MultiheadAttention
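
A small standalone sketch of that suggestion (not the library's InteractingLayer): torch.nn.MultiheadAttention keeps the embedding dimension intact, so the output stays (batch_size, field_num, embedding_size).

    import torch
    import torch.nn as nn

    batch_size, field_num, embedding_size, num_heads = 32, 26, 8, 2
    x = torch.randn(batch_size, field_num, embedding_size)

    # nn.MultiheadAttention (without batch_first) expects (seq_len, batch, embed_dim);
    # here the "sequence" axis is the field axis.
    attn = nn.MultiheadAttention(embed_dim=embedding_size, num_heads=num_heads)
    out, _ = attn(x.transpose(0, 1), x.transpose(0, 1), x.transpose(0, 1))
    out = out.transpose(0, 1)
    print(out.shape)   # torch.Size([32, 26, 8])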

Thank you!

If I want to implement multi-class classification, where should I make changes?



Operating environment(运行环境):

  • python version [e.g. 3.6]
  • torch version [e.g. 1.2.0,]
  • deepctr-torch version [e.g. 0.1.0,]

Why does the run hang for no apparent reason?

Describe the bug(问题描述)
When running the Criteo classification example, training hangs after the first epoch finishes and does not continue, for no apparent reason. The other examples behave the same: once the first epoch's tqdm bar completes, it hangs.

Operating environment(运行环境):

  • python 3.6
  • pytorch 1.3.1

ZeroDivisionError: float division by zero

I get the error below when running the example on Windows 10. The model is FiBiNET.

python .\examples\run_classification_criteo.py

Traceback (most recent call last):
  File ".\examples\run_classification_criteo.py", line 54, in <module>
    l2_reg_embedding=1e-5, device=device)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\deepctr_torch\models\fibinet.py", line 53, in __init__
    self.SE = SENETLayer(self.filed_size, reduction_ratio, seed, device)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\deepctr_torch\layers\interaction.py", line 81, in __init__
    nn.Linear(self.filed_size, self.reduction_size, bias=False),
  File "C:\Users\civil\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\linear.py", line 77, in __init__
    self.reset_parameters()
  File "C:\Users\civil\AppData\Roaming\Python\Python37\site-packages\torch\nn\modules\linear.py", line 80, in reset_parameters
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
  File "C:\Users\civil\AppData\Roaming\Python\Python37\site-packages\torch\nn\init.py", line 316, in kaiming_uniform_
    std = gain / math.sqrt(fan)
ZeroDivisionError: float division by zero

But I just tried DeepFM, and it shows no float-division-by-zero issue.
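
One possible explanation (an assumption on my part, not confirmed in this thread): SENETLayer computes reduction_size as filed_size // reduction_ratio, which rounds down to 0 whenever there are fewer sparse fields than the reduction ratio, so nn.Linear ends up with a zero-sized weight and the Kaiming initializer divides by zero. A guarded sketch:

    import torch.nn as nn

    filed_size, reduction_ratio = 2, 3
    # filed_size // reduction_ratio == 0 here, which reproduces the zero-sized
    # Linear; clamping the reduction size to at least 1 avoids it.
    reduction_size = max(1, filed_size // reduction_ratio)
    excitation = nn.Sequential(
        nn.Linear(filed_size, reduction_size, bias=False),
        nn.ReLU(),
        nn.Linear(reduction_size, filed_size, bias=False),
        nn.ReLU(),
    )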

Planning to add DIN and DIEN in Torch?

Describe the question(问题描述)
Are you planning to add a PyTorch implementation of DIN and DIEN? If so, what's the current time horizon? Or how could one go about helping with that?

Additional context
Since there already exists a DIEN implementation in Tensorflow, this could be used as a great reference point for converting the model to PyTorch.

alpha should be wrapped in nn.Parameter in Dice

Since model.parameters() is what the optimizer uses, alpha in Dice should be wrapped in nn.parameter.Parameter, just like alpha in PReLU. Otherwise this parameter stays fixed at its initial value throughout training.
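
An illustrative sketch of the suggested change (not the exact library code): registering alpha as an nn.Parameter makes it show up in model.parameters(), so the optimizer actually updates it.

    import torch
    import torch.nn as nn

    class DiceLike(nn.Module):
        """Dice-style activation for 2-D inputs (batch_size, emb_size)."""
        def __init__(self, emb_size, epsilon=1e-8):
            super().__init__()
            self.bn = nn.BatchNorm1d(emb_size, eps=epsilon)
            # Before: self.alpha = torch.zeros((emb_size,))  -> never trained.
            self.alpha = nn.Parameter(torch.zeros((emb_size,)))

        def forward(self, x):
            p = torch.sigmoid(self.bn(x))
            return self.alpha * (1.0 - p) * x + p * x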

a question about whether self.feature_index is in CPU or GPU

Hello,
I am relatively new to PyTorch. I have a question about data residing on the CPU vs. the GPU, for example in this part of the source code:

def input_from_feature_columns(self, X, feature_columns, embedding_dict, support_dense=True):

In particular,

sparse_embedding_list = [embedding_dict[feat.embedding_name]( X[:, self.feature_index[feat.name][0]:self.feature_index[feat.name][1]].long()) for feat in sparse_feature_columns]

It is pretty clear that X is the minibatch tensor that has already been moved to the GPU. embedding_dict is an nn.ModuleDict and feature_index is a plain Python OrderedDict. From some basic reading, it sounds like a dictionary object cannot be migrated to the GPU directly; only the tensor for each key can be copied one by one. So I am not sure, for example, what self.to(device) in the constructor __init__() of BaseModel does for its dictionary attributes.

That leads to my more specific question: when sparse_embedding_list is built, how does PyTorch handle this? Is everything already on the GPU when __init__() is called, or is the value of self.feature_index[feat.name] copied to the GPU at runtime, and does that incur any efficiency cost?

Thank you for your help!

Occasional warning when importing the package

Test case:

from deepctr_torch.models import WDL,DeepFM
from deepctr_torch.inputs import  SparseFeat, DenseFeat,get_fixlen_feature_names

Then the following warning appears:
/usr/local/anaconda3/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)

My Python version is 3.7.
The problem does not seem serious, but I would like to find out where it comes from.

Linear model weight parameters (CUDA) are not included in state_dict()

Describe the bug
Linear model weight parameters are not included in state_dict() after being sent to CUDA. I found this problem when trying to reload model parameters that had been serialized with torch.save(model.state_dict(), PATH).

To Reproduce
The tensor is sent to the other device after nn.Parameter is created, so the returned instance is no longer a registered, trainable parameter. To avoid this, .to(device) should be called first when creating the tensor, followed by wrapping it in nn.Parameter.

    self.weight = nn.Parameter(torch.Tensor(sum(fc.dimension for fc in self.dense_feature_columns), 1)).to(device)
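
A standalone sketch of the corrected ordering described above (illustrative, not the exact library patch): create the tensor on the target device first, then wrap it in nn.Parameter so it is registered on the module and appears in state_dict().

    import torch
    import torch.nn as nn

    class LinearPart(nn.Module):
        def __init__(self, dense_dim, device="cpu"):
            super().__init__()
            # Device placement happens before nn.Parameter, so the parameter
            # stays registered on the module.
            self.weight = nn.Parameter(torch.empty(dense_dim, 1, device=device))
            nn.init.normal_(self.weight, mean=0.0, std=1e-4)

    m = LinearPart(13)
    print("weight" in m.state_dict())   # True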

A similar issue can be found on the PyTorch forum:
https://discuss.pytorch.org/t/cuda-parameter-not-included-in-state-dict/30801

Operating environment:

  • python version 3.7
  • torch version 1.4.0
  • deepctr-torch version 1.0.3

How should input data be processed for real-time prediction after training the model?


Describe the question(问题描述)
After training the model I want to do real-time prediction. How should the input data be processed, especially the sparse features? I am not sure whether the feature mapping table is saved during training.
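
The library itself does not persist the category-to-index mapping; the LabelEncoder objects fitted at training time are that mapping table. One workable approach (a sketch under that assumption, with `sparse_features`, `data` and `new_data` playing the roles they have in the example scripts) is to save the encoders alongside the model and reuse them at serving time:

    import pickle
    from sklearn.preprocessing import LabelEncoder

    # Fit and persist one encoder per sparse feature at training time.
    encoders = {}
    for feat in sparse_features:
        lbe = LabelEncoder()
        data[feat] = lbe.fit_transform(data[feat])
        encoders[feat] = lbe
    with open("label_encoders.pkl", "wb") as f:
        pickle.dump(encoders, f)

    # At prediction time, reload and apply the same mapping to the new samples.
    with open("label_encoders.pkl", "rb") as f:
        encoders = pickle.load(f)
    for feat in sparse_features:
        new_data[feat] = encoders[feat].transform(new_data[feat])

Note that LabelEncoder.transform raises on categories it has never seen, so unseen values need a fallback (see the index-out-of-range issue further down).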

Operating environment(运行环境):

  • python version [e.g. 3.6]
  • torch version [e.g. 1.2.0,]
  • deepctr-torch version [e.g. 0.1.0,]

A question about the AUC reduction after the version update

Describe the question
I have tried versions 0.2.1 and 0.2.3 of DeepCTR-Torch. With the same dataset and parameters, the AUC and loss differ significantly between versions. In this experiment I tried FiBiNET and NFM; the problem I encountered is that the AUC under version 0.2.3 is lower than under version 0.2.1.

Additional context
I have uploaded the two versions of the project to Baidu Netdisk. The test data is a randomly selected 25 MB Criteo dataset. After downloading, run run.py in each project to see the different AUC results. This is the download link:

Link: https://pan.baidu.com/s/1191pHvL3wMaCM5TsAo4jgA (extraction code: hexw)

Please download and troubleshoot this issue, thank you again for your contribution.

Operating environment

  • python version 3.6.5
  • torch version 1.6.0
  • deepctr-torch version 0.2.1 & 0.2.3

Why are all the test-sample predictions in pred_ans less than 1?

Describe the bug(问题描述)
Running the example run_regression_movielens.py, all the predictions in pred_ans (line 45) are very small; over several runs they are almost all below 1, and the MSE is very large. Is this because the dataset is too small or because the model parameters are not well tuned?

pred_ans (each row is the predicted rating for one test sample):
array([[0.301614 ],
[0.3022644 ],
[0.32717213],
[0.2999353 ],
[0.30203775],
[0.3013655 ],
[0.31287715],
[0.3015241 ],
[0.28850323],
[0.30220094],
[0.30088642],
[0.30200478],
[0.30210623],
[0.30099627],
[0.30173683],
[0.30157563],
[0.31417343],
[0.3014299 ],
[0.300555 ],
[0.3021357 ],
[0.30141425],
[0.30120167],
[0.31250164],
[0.30241737],
[0.3014774 ],
[0.28925306],
[0.30212253],
[0.3021568 ],
[0.30231932],
[0.30191606],
[0.3141842 ],
[0.30059364],
[0.30211878],
[0.30189154],
[0.30140838],
[0.30042845],
[0.30151388],
[0.30248943],
[0.32791406],
[0.3018949 ]], dtype=float32)

The true rating values on the test set are: [3, 5, 3, 5, 4, 4, 5, 5, 4, 3, 3, 3, 3, 2, 3, 2, 4, 5, 4, 2, 3, 3, 4, 1, 5, 4, 3, 2, 2, 4, 3, 5, 2, 3, 3, 3, 3, 1, 3, 4]

Pytest for AFM model is not stable

Describe the bug(问题描述)

If we run pytest several times, some runs fail in tests/models/AFM_test.py.

To Reproduce(复现步骤)

Steps to reproduce the behavior:

  1. Go to main directory: DeepCTR-Torch/
  2. Run pytest in command line for multiple times
  3. See error

Operating environment(运行环境):

  • python version: 3.6.4
  • torch version: 1.3.1
  • deepctr-torch version: dev branch

Additional context

[Screenshots of the failing test output and error details were attached to the original issue.]

When the prediction value is very small, the logloss metric raises a calculation error

Describe the bug(问题描述)
/usr/local/anaconda3/lib/python3.7/site-packages/sklearn/metrics/classification.py:2174: RuntimeWarning: divide by zero encountered in log
loss = -(transformed_labels * np.log(y_pred)).sum(axis=1)

To Reproduce(复现步骤)
Steps to reproduce the behavior:

model = xDeepFM(linear_feature_columns,dnn_feature_columns,task='binary',device=device)
model.compile("adam", "binary_crossentropy",
              metrics=['log_loss'], )

Operating environment(运行环境):

  • python version [3.7]
  • torch version [1.1.0]
  • deepctr-torch version 0.1.2
  • sklearn version 0.0

Additional context
My model and data produce very small predicted values like 0.0001 and values close to 1 like 0.9998; exact 0 and 1 even appear, which makes the sklearn metric calculation fail.
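
A small workaround sketch (not the library's built-in behaviour): clip the predictions away from exactly 0 and 1 before computing the log loss, so np.log never receives a zero.

    import numpy as np
    from sklearn.metrics import log_loss

    eps = 1e-7
    y_true = np.array([0, 1, 1, 0])
    y_pred = np.array([0.0, 0.9998, 1.0, 0.0001])   # extreme values like the ones reported
    print(log_loss(y_true, np.clip(y_pred, eps, 1 - eps)))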

Error when running the FAQ example "How to add a long dense feature vector as a input to the model?"

Describe the bug(问题描述)

To Reproduce(复现步骤)
Steps to reproduce the behavior:
The code provided at https://deepctr-torch.readthedocs.io/en/latest/FAQ.html#how-to-add-a-long-dense-feature-vector-as-a-input-to-the-model fails to run. The line in basemodel.py

    np.hstack(list(map(lambda x: np.expand_dims(x, axis=1), x))))

raises

    ValueError: all the input array dimensions except for the concatenation axis must match exactly

Operating environment(运行环境):

  • python version 3.7
  • torch version 1.2.0
  • deepctr-torch version 0.1.1

Additional context
Add any other context about the problem here.

Example error run_classification_criteo.py

Describe the bug(问题描述)
cuda ready...
cuda:0
Train on 160 samples, validate on 0 samples, 5 steps per epoch
    Traceback (most recent call last):
      File "run_classification_criteo.py", line 62, in <module>
        batch_size=32, epochs=10, validation_split=0.0, verbose=2)
      File "/home/xinyu/anaconda3/lib/python3.7/site-packages/deepctr_torch/models/basemodel.py", line 224, in fit
        total_loss.backward(retain_graph=True)
      File "/home/xinyu/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/home/xinyu/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
        allow_unreachable=True)  # allow_unreachable flag
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [128]], which is output 0 of SelectBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

To Reproduce(复现步骤)
Steps to reproduce the behavior:

  1. Just run run_classification_criteo.py

Operating environment(运行环境):

  • python version [3.7]
  • torch version [1.5.0]


Is sharing embeddings between variable-length sequence features and sparse features still unsupported?


Operating environment(运行环境):

  • python version [e.g. 3.6]
  • torch version [e.g. 1.2.0,]
  • deepctr-torch version [e.g. 0.1.0,]

How can I do early stopping with the deepctr-torch version?


Operating environment(运行环境):

  • python version [e.g. 3.6]
  • torch version [e.g. 1.2.0,]
  • deepctr-torch version [e.g. 0.1.0,]
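
For reference, newer deepctr-torch releases ship Keras-style callbacks and a fit() that accepts a callbacks argument; whether your installed version has them is an assumption to verify in the docs FAQ. A minimal sketch, with `model` and the training inputs prepared as in the example scripts:

    from deepctr_torch.callbacks import EarlyStopping

    # Stop training when the monitored validation metric stops improving.
    es = EarlyStopping(monitor='val_auc', min_delta=0, patience=2, mode='max', verbose=1)
    model.fit(train_model_input, train[target].values,
              batch_size=256, epochs=20, validation_split=0.2, callbacks=[es])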

The AFM bias is not initialized, leading to different experimental results between runs

Describe the bug(问题描述)
The AFM bias is not initialized, leading to different experimental results between runs.

To Reproduce(复现步骤)
Because the bias here is not initialized but still receives gradients, the results differ greatly between runs regardless of whether the attention network is used. At first I thought it was a random-seed problem, but the issue remained after fixing the seed. Finally I found that a parameter had not been initialized.

Operating environment(运行环境):

  • python version [3.8]
  • torch version [1.4.0]
  • deepctr-torch version [0.8.0,]

Additional context
You can add the following:

for tensor in [self.attention_b]:
    nn.init.zeros_(tensor,)

index out of range error when I use trained model to predict on another data set

I trained a classification model using DeepFM and it worked well.
When I load the trained model and use it to predict on another dataset, an error occurs; the message is below.

The root cause is in functional.py, triggered by the statement below: len(weight) = 35 and an element of the input is 35, so an index-out-of-range error occurs. How can I fix this?

    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

Error Message
    Traceback (most recent call last):
      File "D:/DeepCTR-Torch/examples/run_classification_churn.py", line 134, in <module>
        pred_ans = model.predict(test_model_input, 256)
      File "D:\DeepCTR-Torch\deepctr_torch\models\basemodel.py", line 300, in predict
        y_pred = model(x).cpu().data.numpy()  # .squeeze()
      File "D:\anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "D:\DeepCTR-Torch\deepctr_torch\models\deepfm.py", line 73, in forward
        self.embedding_dict)
      File "D:\DeepCTR-Torch\deepctr_torch\models\basemodel.py", line 324, in input_from_feature_columns
        feat in sparse_feature_columns]
      File "D:\DeepCTR-Torch\deepctr_torch\models\basemodel.py", line 324, in <listcomp>
        feat in sparse_feature_columns]
      File "D:\anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "D:\anaconda3\envs\torch\lib\site-packages\torch\nn\modules\sparse.py", line 114, in forward
        self.norm_type, self.scale_grad_by_freq, self.sparse)
      File "D:\anaconda3\envs\torch\lib\site-packages\torch\nn\functional.py", line 1485, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    RuntimeError: index out of range: Tried to access index 35 out of table with 34 rows. at C:\w\1\s\tmp_conda_3.6_171155\conda\conda-bld\pytorch_1570813991702\work\aten\src\TH/generic/THTensorEvenMoreMath.cpp:418
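
The embedding table only has as many rows as the training-time vocabulary, so indices produced from a new dataset must stay inside that range. A sketch of one common approach (my suggestion, not from the thread): build the category-to-index mapping once at training time with index 0 reserved for unknowns, size the SparseFeat vocabulary as len(mapping) + 1, and reuse the mapping at prediction time.

    import numpy as np

    train_values = ["a", "b", "c"]
    # Index 0 is reserved for "unknown"; vocabulary_size should be len(mapping) + 1.
    mapping = {v: i + 1 for i, v in enumerate(sorted(set(train_values)))}

    def encode(values, mapping):
        # Unseen categories fall back to the reserved index 0.
        return np.array([mapping.get(v, 0) for v in values])

    print(encode(["a", "d", "c"], mapping))   # [1 0 3]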

How can I use the GPU?

Hello, I ran several of the programs in examples and checked GPU activity with nvprof; they all report: No CUDA Application was profiled.
Can any of the models use the GPU? If so, how can I display GPU performance during the run with a tool such as nvprof?
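
The examples run on the GPU only if a CUDA device string is passed to the model constructor; a minimal sketch following the pattern in run_classification_criteo.py (with the feature column lists built as in that script):

    import torch
    from deepctr_torch.models import DeepFM

    device = 'cpu'
    if torch.cuda.is_available():
        print('cuda ready...')
        device = 'cuda:0'

    # All parameters and the training loop then live on the chosen device.
    model = DeepFM(linear_feature_columns, dnn_feature_columns,
                   task='binary', device=device)

With the model on cuda:0, profilers such as nvprof (or nvidia-smi) should then see CUDA activity during fit().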

Is softmax dim = 1 in InteractingLayer wrong? It feels like it should be 3.


Operating environment(运行环境):

  • python version [e.g. 3.5, 3.6]
  • torch version [e.g. 1.1.0, 1.2.0]
  • deepctr-torch version [e.g. 0.1.0,]


Pytorch 1.7.0 RuntimeError

Describe the bug(问题描述)
I installed deepctr-torch using pip install -U deepctr-torch and ran the Criteo example, but got an error:
RuntimeError: Output 0 of UnbindBackward is a view and its base or another view of its base has been modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.

To Reproduce(复现步骤)
Steps to reproduce the behavior:

  1. conda create -n deepctr-torch
  2. conda activate deepctr-torch
  3. pip install -U deepctr-torch
  4. git clone https://github.com/shenweichen/DeepCTR-Torch.git
  5. cd ./DeepCTR-Torch/examples
  6. python run_classification_criteo.py
  7. get the RuntimeError

Operating environment(运行环境):

  • python version [3.8.3]
  • torch version [1.7.0]
  • deepctr-torch version [0.2.3]
  • deepctr version [0.8.2]

Additional context
Roll back pytorch to 1.6.0 and it runs normally.

Questions about the fit method of the BaseModel class


Additional context
    with tqdm(enumerate(train_loader), disable=verbose != 1) as t:
        for index, (x_train, y_train) in t:
            x = x_train.to(self.device).float()
            y = y_train.to(self.device).float()

            y_pred = model(x).squeeze()

            optim.zero_grad()
            loss = loss_func(y_pred, y.squeeze(), reduction='sum')

            total_loss = loss + self.reg_loss + self.aux_loss

            loss_epoch += loss.item()
            total_loss_epoch += total_loss.item()
            total_loss.backward(retain_graph=True)
            optim.step()

1. self.reg_loss and self.aux_loss should be recomputed on every forward pass, right? So far I only see them computed once during initialization.
2. What is the purpose of retaining the computation graph? There is no second backward pass. Is it related to question 1? Can the value of reg_loss be updated automatically?

TensorDataset with DataLoader could lead bad performance

Describe the bug(问题描述)
TensorDataset with DataLoader leads to slow data reading. Using simple tensor slicing can give a ~4x speed improvement (especially with large batch sizes).

Additional context
Related code:

    train_tensor_data = Data.TensorDataset(
        torch.from_numpy(np.concatenate(x, axis=-1)),
        torch.from_numpy(y))
    if batch_size is None:
        batch_size = 256
    train_loader = DataLoader(
        dataset=train_tensor_data, shuffle=shuffle, batch_size=batch_size)

Related discussion:

pytorch/pytorch#4959
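
A minimal sketch of the slicing alternative suggested above (illustrative only, not the library's current fit() loop): shuffle the indices once per epoch and slice whole batches directly, which avoids the per-sample __getitem__ and collation overhead of TensorDataset + DataLoader.

    import torch

    x = torch.randn(10_000, 39)                     # hypothetical concatenated feature matrix
    y = torch.randint(0, 2, (10_000, 1)).float()
    batch_size = 256

    perm = torch.randperm(x.size(0))
    for start in range(0, x.size(0), batch_size):
        idx = perm[start:start + batch_size]
        x_batch, y_batch = x[idx], y[idx]
        # ... forward / backward on (x_batch, y_batch)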

Categorical Columns Handling + Few specific questions

Hi,
I'd like to ask few questions about the algorithm:

  1. Categorical Columns handling -
    Suppose I have a dataset with appID, advertisementID, country, deviceOS and a label (CTR), e.g. 123, 1234, US, Android, 0.15 (at some aggregation level).
    A categorical column could also be userId, for example (userId <> advertisementID instead of appId <> advertisementID).
    How are appID and advertisementID handled?
    I read that the embedding you are doing works well for high-cardinality values (e.g. appId / advID with thousands of possible values).

  2. Data Limitations:
    Is there any limitation on the number of categorical columns, the number of columns overall, or the number of data points?
    I ask because I have been using Spark with distributed training until now, so I am wondering what the best way to migrate would be.

  3. Unseen values:
    Also, what about categories that were not in the training set but appear in the test set? How should one handle them?
    I would be happy to get on a call with any of you to clarify these questions, so that I could then contribute some documentation to your site along the lines of "this is the base dataset" ---> "this is the new dataset", or something like that.
