yandex-research / rtdl
Research on Tabular Deep Learning: Papers & Packages
License: Apache License 2.0
Hi Yury,
Thank you for your excellent work. I have a question about handling categorical features. Do I need to pre-train the embedding layer as part of data preprocessing, or should I just attach the embedding layer to the model and train it together with the model?
When we run !pip install rtdl, it prints:
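A common setup is the second option: the embedding layer is an ordinary trainable part of the model and is optimized end to end with everything else. The following is a minimal sketch in plain PyTorch, not rtdl's own API; the class name TinyTabularModel and all sizes are hypothetical.

```python
import torch
import torch.nn as nn


class TinyTabularModel(nn.Module):
    """Hypothetical model: one embedding table per categorical feature."""

    def __init__(self, n_num_features, cat_cardinalities, d_embedding=8):
        super().__init__()
        self.embeddings = nn.ModuleList(
            [nn.Embedding(c, d_embedding) for c in cat_cardinalities]
        )
        d_in = n_num_features + d_embedding * len(cat_cardinalities)
        self.mlp = nn.Sequential(nn.Linear(d_in, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, x_num, x_cat):
        # x_cat holds integer category codes, one column per categorical feature.
        embedded = [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)]
        return self.mlp(torch.cat([x_num, *embedded], dim=1))


torch.manual_seed(0)
model = TinyTabularModel(n_num_features=3, cat_cardinalities=[4, 7])
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)

x_num = torch.randn(32, 3)
x_cat = torch.stack([torch.randint(0, 4, (32,)), torch.randint(0, 7, (32,))], dim=1)
y = torch.randn(32, 1)

# One ordinary training step also updates the embedding tables.
before = model.embeddings[0].weight.detach().clone()
loss = nn.functional.mse_loss(model(x_num, x_cat), y)
loss.backward()
optimizer.step()
embeddings_changed = not torch.equal(before, model.embeddings[0].weight.detach())
```

Because the embedding tables are registered as submodules, model.parameters() already includes them, so no separate training stage is required in this sketch.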
Collecting rtdl
Downloading rtdl-0.0.13-py3-none-any.whl (23 kB)
Requirement already satisfied: numpy<2,>=1.18 in /usr/local/lib/python3.10/dist-packages (from rtdl) (1.22.4)
Collecting torch<2,>=1.7 (from rtdl)
Downloading torch-1.13.1-cp310-cp310-manylinux1_x86_64.whl (887.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 887.5/887.5 MB 1.9 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch<2,>=1.7->rtdl) (4.7.1)
Collecting nvidia-cuda-runtime-cu11==11.7.99 (from torch<2,>=1.7->rtdl)
Downloading nvidia_cuda_runtime_cu11-11.7.99-py3-none-manylinux1_x86_64.whl (849 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 849.3/849.3 kB 71.1 MB/s eta 0:00:00
Collecting nvidia-cudnn-cu11==8.5.0.96 (from torch<2,>=1.7->rtdl)
Downloading nvidia_cudnn_cu11-8.5.0.96-2-py3-none-manylinux1_x86_64.whl (557.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 557.1/557.1 MB 2.1 MB/s eta 0:00:00
Collecting nvidia-cublas-cu11==11.10.3.66 (from torch<2,>=1.7->rtdl)
Downloading nvidia_cublas_cu11-11.10.3.66-py3-none-manylinux1_x86_64.whl (317.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 317.1/317.1 MB 2.6 MB/s eta 0:00:00
Collecting nvidia-cuda-nvrtc-cu11==11.7.99 (from torch<2,>=1.7->rtdl)
Downloading nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl (21.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.0/21.0 MB 77.3 MB/s eta 0:00:00
Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch<2,>=1.7->rtdl) (67.7.2)
Requirement already satisfied: wheel in /usr/local/lib/python3.10/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch<2,>=1.7->rtdl) (0.40.0)
Installing collected packages: nvidia-cuda-runtime-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cublas-cu11, nvidia-cudnn-cu11, torch, rtdl
Attempting uninstall: torch
Found existing installation: torch 2.0.1+cu118
Uninstalling torch-2.0.1+cu118:
Successfully uninstalled torch-2.0.1+cu118
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.0.2+cu118 requires torch==2.0.1, but you have torch 1.13.1 which is incompatible.
torchdata 0.6.1 requires torch==2.0.1, but you have torch 1.13.1 which is incompatible.
torchtext 0.15.2 requires torch==2.0.1, but you have torch 1.13.1 which is incompatible.
torchvision 0.15.2+cu118 requires torch==2.0.1, but you have torch 1.13.1 which is incompatible.
Successfully installed nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 rtdl-0.0.13 torch-1.13.1
There is a VAE model from Microsoft for tabular data:
Modelling individual column distribution
Modelling copulae between columns.
Hello,
I am trying to use PiecewiseLinearEncoder(). I think I found a few typos. Please check my work.
I first ran into an issue in piecewise_linear_encoding, where line 618 raised "RuntimeError: The size of tensor a (3688) must match the size of tensor b (32) at non-singleton dimension 1".
I dug into the code and found that when PiecewiseLinearEncoder calls piecewise_linear_encoding, the positional arguments indices and ratios are passed in the opposite order from what the function expects.
Additionally, inside piecewise_linear_encoding the code reads bin_edges = as_tensor(bin_ratios), whereas as_tensor(bin_edges) would make more sense.
Can you please check this out? Much appreciated.
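For readers unfamiliar with the encoding under discussion, here is a minimal single-feature sketch of piecewise linear encoding in plain PyTorch. It is an independent illustration and not the code of rtdl's piecewise_linear_encoding; the function name ple_single_feature is hypothetical, and it assumes sorted bin edges for one feature.

```python
import torch


def ple_single_feature(x: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
    """Encode values x of shape (n,) with sorted bin edges of shape (T + 1,).

    Returns a tensor of shape (n, T): bins to the left of the value are 1,
    bins to the right are 0, and the bin containing the value holds the
    fraction of the bin that the value has covered.
    """
    left = edges[:-1]
    width = edges[1:] - edges[:-1]
    out = (x[:, None] - left[None, :]) / width[None, :]
    # Every bin except the first is clamped from below at 0 ...
    out[:, 1:] = out[:, 1:].clamp_min(0.0)
    # ... and every bin except the last is clamped from above at 1.
    out[:, :-1] = out[:, :-1].clamp_max(1.0)
    return out


edges = torch.tensor([0.0, 1.0, 2.0, 3.0])
x = torch.tensor([0.5, 1.5, 3.0])
encoded = ple_single_feature(x, edges)
```

The first and last bins are left unclamped on their outer side so that values outside the range of the edges are extrapolated rather than saturated.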
Hello! I have written a scikit-learn interface for the RTDL package (https://github.com/hengzhe-zhang/scikit-rtdl). I rely on skorch to avoid coding errors, and I set the default parameters based on those presented in your paper. I hope you like it!
I ran your model in Colab for a few hours before Google terminated the session. I used pickle.dump/load to store the trained model. The loaded model can make predictions, but it does not seem to be able to resume training.
if progress.success:
    print(' <<< BEST VALIDATION EPOCH', end='')
    with open(mydrive+jobname, 'wb') as filehandler:
        dump((model, y_std, y_mean), filehandler)
# we could see the result was improving
with open(mydrive+jobname, 'rb') as filehandler:
    model, y_std, y_mean = load(filehandler)
pred = model(batch, None)  # this seems to work
for epoch in range(1, n_epochs + 1):
    for iteration, batch_idx in enumerate(train_loader):
        model.train()
        optimizer.zero_grad()
        x_batch = X['train'][batch_idx]
        y_batch = y['train'][batch_idx]
        loss = loss_fn(apply_model(x_batch).squeeze(1), y_batch)
        loss.backward()
        optimizer.step()
        if iteration % report_frequency == 0:
            print(f'(epoch) {epoch} (batch) {iteration} (loss) {loss.item():.4f}')
# no improvement any more, even when the model was dumped immediately after it was created
What is the right way to store the model so that I can resume the training?
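One possible reason for the stall, offered as a guess: after load, the optimizer may still hold references to the parameters of the old model object, so optimizer.step() never updates the unpickled model. A common PyTorch pattern is to save the state of both the model and the optimizer and to restore both after rebuilding the objects. The sketch below uses a toy nn.Linear model; the helper names save_checkpoint, load_checkpoint and make_model_and_optimizer are hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(path, model, optimizer, epoch):
    # Store plain state dicts instead of pickling the whole model object.
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "epoch": epoch,
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    return checkpoint["epoch"]


def make_model_and_optimizer():
    # The optimizer must be built from the parameters of the model it updates.
    model = nn.Linear(4, 1)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    return model, optimizer


torch.manual_seed(0)
model, optimizer = make_model_and_optimizer()
x, y = torch.randn(16, 4), torch.randn(16, 1)
for _ in range(3):
    optimizer.zero_grad()
    nn.functional.mse_loss(model(x), y).backward()
    optimizer.step()

path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
save_checkpoint(path, model, optimizer, epoch=3)

# After a restart: rebuild the objects first, then restore their state.
resumed_model, resumed_optimizer = make_model_and_optimizer()
start_epoch = load_checkpoint(path, resumed_model, resumed_optimizer) + 1
```

Values such as y_mean and y_std from the snippet above could be stored in the same dictionary. If the whole model is unpickled instead, the optimizer would need to be recreated from the parameters of the loaded model before training continues.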
This project is interesting and I want to use it as the baseline algorithm for my paper. However, it seems that I need to take several steps in order to make a prediction. Would it be possible to provide a scikit-learn interface to make comparisons between different algorithms more convenient?
Hi! I am trying to understand the usage of the Python package zero, which is used in the rtdl example. However, the link in the code comment is no longer available.
Here is the invalid link:
https://yura52.github.io/zero/0.0.4/reference/api/zero.improve_reproducibility.html
Is there any other documentation? Thank you!
Regards.
Namely, uncomment this block. To do that, the function must also take bin_edges as an argument.
"is_last_bin = bin_indices + 1 == torch.as_tensor(list(map(len, bin_edges)))" should be "is_last_bin = bin_indices + 1 == torch.as_tensor(list(map(len, bin_edges))).unsqueeze(-1)", so that tensor broadcasting works.
Hi! Are there any examples of extracting embeddings for custom tabular data?
Regards
The condition here is broken: it should use {int: 'int', float: 'float'}[type(left)] instead of str(type(left)).
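The difference between the two expressions is easy to check in plain Python: str(type(...)) returns the full class representation, not the short type name, so a comparison against 'int' or 'float' never matches. The variable names below are only for illustration.

```python
left = 3

# str() of a type gives its repr, including the "<class ...>" wrapper.
via_str = str(type(left))

# An explicit mapping yields the short name the condition expects.
via_mapping = {int: 'int', float: 'float'}[type(left)]

float_via_str = str(type(2.5))
float_via_mapping = {int: 'int', float: 'float'}[type(2.5)]
```

The mapping raises a KeyError for any other type, which makes unsupported inputs fail loudly.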
I use the sample code to prepare the dataset:
device = 'cpu'
dataset = sklearn.datasets.fetch_california_housing()
task_type = 'regression'
X_all = dataset['data'].astype('float32')
y_all = dataset['target'].astype('float32')
n_classes = None
X = {}
y = {}
X['train'], X['test'], y['train'], y['test'] = sklearn.model_selection.train_test_split(
    X_all, y_all, train_size=0.8
)
X['train'], X['val'], y['train'], y['val'] = sklearn.model_selection.train_test_split(
    X['train'], y['train'], train_size=0.8
)
# not the best way to preprocess features, but enough for the demonstration
preprocess = sklearn.preprocessing.StandardScaler().fit(X['train'])
X = {
    k: torch.tensor(preprocess.fit_transform(v), device=device)
    for k, v in X.items()
}
y = {k: torch.tensor(v, device=device) for k, v in y.items()}
# !!! CRUCIAL for neural networks when solving regression problems !!!
y_mean = y['train'].mean().item()
y_std = y['train'].std().item()
y = {k: (v - y_mean) / y_std for k, v in y.items()}
y = {k: v.float() for k, v in y.items()}
Then I train an LGBMRegressor with the default hyperparameters:
model = lgb.LGBMRegressor()
model.fit(X['train'], y['train'])
But when I evaluate on the test fold, the RMSE is 0.68:
>>> test_pred = model.predict(X['test'])
>>> test_pred = torch.from_numpy(test_pred)
>>> rmse = torch.nn.functional.mse_loss(
...     test_pred.view(-1), y['test'].view(-1)) ** 0.5 * y_std
>>> print(f'Test RMSE: {rmse:.2f}.')
Test RMSE: 0.68.
Even using the model from rtdl gives me 0.56 RMSE:
(epoch) 57 (batch) 0 (loss) 0.1885
(epoch) 57 (batch) 10 (loss) 0.1315
(epoch) 57 (batch) 20 (loss) 0.1735
(epoch) 57 (batch) 30 (loss) 0.1197
(epoch) 57 (batch) 40 (loss) 0.1952
(epoch) 57 (batch) 50 (loss) 0.1167
Epoch 057 | Validation score: 0.7334 | Test score: 0.5612 <<< BEST VALIDATION EPOCH
Is there anything I am missing? How can I reproduce the performance reported in your paper? Thanks!
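One detail in the preprocessing snippet above is worth checking: the scaler is fitted on X['train'], but each split is then passed through preprocess.fit_transform(v), which refits the scaler on that split. Whether this explains the gap to the paper is not established here. The sketch below, on synthetic data with hypothetical variable names, only shows how the two calls differ.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))
X_test = rng.normal(loc=3.0, scale=1.0, size=(200, 2))  # deliberately shifted

# Statistics come from the training split only and are reused for the test split.
scaler = StandardScaler().fit(X_train)
consistent = scaler.transform(X_test)

# fit_transform discards the training statistics and refits on the test split.
refit = StandardScaler().fit_transform(X_test)

consistent_mean = consistent.mean(axis=0)
refit_mean = refit.mean(axis=0)
```

With transform, the shift between the splits is preserved; with fit_transform, every split is recentered to zero mean, so the model sees each split on a slightly different scale.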
Dear maintainers,
I've been using your package for a while now (especially for the FTT model). I've never encountered any trouble, and it helped me boost my performance on a tabular dataset.
Recently, I've been doing some ablation studies, i.e., I've removed some input features to check whether the performance of the model would decrease (and if so, at what point). I've discovered that, when using several GPUs (2x RTX 2080 Ti), the training fails for certain numbers of input features (but not always; it really depends on the number of input features).
I'm using torch.nn.DataParallel to implement data parallelism, and for reasons related to my framework I don't wish to use torch.nn.parallel.DistributedDataParallel.
Here's a minimal reproducible example to prove my point:
import scipy.sparse # scipy is 1.10.1
import torch # torch is 1.13.1
import rtdl # rtdl is 0.0.13
import torchmetrics # torchmetrics is 0.11.4
import multiprocessing
import numpy as np # numpy is 1.24.3
class SparseDataset(torch.utils.data.Dataset):
    def __init__(self, X, y):
        self.X = X # Sparse input data
        self.y = y # Target data
    def __getitem__(self, index):
        X = torch.from_numpy(self.X[index].toarray()[0]).float() # Convert the sparse input data to a dense tensor
        y = self.y[index] # Target data
        return X, y
    def __len__(self):
        return self.X.shape[0] # Number of items in the dataset
class FTT(rtdl.FTTransformer):
    def __init__(self, n_num_features=None, cat_cardinalities=None, d_token=16, n_blocks=1, attention_n_heads=4, attention_dropout=0.3, attention_initialization='kaiming', attention_normalization='LayerNorm', ffn_d_hidden=16, ffn_dropout=0.1, ffn_activation='ReGLU', ffn_normalization='LayerNorm', residual_dropout=0.0, prenormalization=True, first_prenormalization=False, last_layer_query_idx=[-1], n_tokens=None, kv_compression_ratio=0.004, kv_compression_sharing='headwise', head_activation='ReLU', head_normalization='LayerNorm', d_out=None):
        feature_tokenizer = rtdl.FeatureTokenizer(
            n_num_features=n_num_features,
            cat_cardinalities=cat_cardinalities,
            d_token=d_token
        )
        transformer = rtdl.Transformer(
            d_token=d_token,
            n_blocks=n_blocks,
            attention_n_heads=attention_n_heads,
            attention_dropout=attention_dropout,
            attention_initialization=attention_initialization,
            attention_normalization=attention_normalization,
            ffn_d_hidden=ffn_d_hidden,
            ffn_dropout=ffn_dropout,
            ffn_activation=ffn_activation,
            ffn_normalization=ffn_normalization,
            residual_dropout=residual_dropout,
            prenormalization=prenormalization,
            first_prenormalization=first_prenormalization,
            last_layer_query_idx=last_layer_query_idx,
            n_tokens=n_num_features+1,
            kv_compression_ratio=kv_compression_ratio,
            kv_compression_sharing=kv_compression_sharing,
            head_activation=head_activation,
            head_normalization=head_normalization,
            d_out=d_out
        )
        super(FTT, self).__init__(feature_tokenizer, transformer)
    def forward(self, x_num=None, x_cat=None):
        x = self.feature_tokenizer(x_num, x_cat)
        print(f"First forward step --> shape is {x.shape} & device is {x.device}") # For debug 1/7
        x = self.cls_token(x)
        print(f"Second forward step --> shape is {x.shape} & device is {x.device}") # For debug 2/7
        x = self.transformer(x)
        print(f"Third forward step --> shape is {x.shape} & device is {x.device}") # For debug 3/7
        return x
num_train_samples = 803473 # Number of samples in the real training dataset
num_test_samples = 82787 # Number of samples in the real testing dataset
num_input_features = 152 # Number of input features in the real dataset
#num_input_features = 252 # With a different number of features, let's say 100 more, the code works (it depends on the number we put here)
num_classes = 228 # Number of classes in the real dataset
X_train = scipy.sparse.random(num_train_samples, num_input_features, density=0.01, format='csr')
y_train = np.random.randint(0, num_classes, num_train_samples)
X_test = scipy.sparse.random(num_test_samples, num_input_features, density=0.01, format='csr')
y_test = np.random.randint(0, num_classes, num_test_samples)
train_dataset = SparseDataset(X_train, y_train) # Create a train dataset
train_dataloader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=512, shuffle=True, num_workers=multiprocessing.cpu_count()) # Create a train DataLoader
test_dataset = SparseDataset(X_test, y_test) # Create a test dataset
test_dataloader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=512, shuffle=True, num_workers=multiprocessing.cpu_count()) # Create a test DataLoader
model = FTT(n_num_features=num_input_features, d_out=num_classes)
model = torch.nn.DataParallel(model).cuda() # Run the model parallelly and move it to GPU
criterion = torch.nn.CrossEntropyLoss().cuda() # Instantiate loss class and move it to GPU
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.0001) # Instantiate optimizer class
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', factor=0.1, patience=5) # Instantiate step learning scheduler class
print(model) # For debug 4/7
for i in model.named_parameters():
    print(f"{i[0]} -> {i[1].device}") # For debug 5/7
for epoch in range(100):
    for batch, labels in train_dataloader: # Iterate through train dataset
        batch = batch.requires_grad_().cuda() # Load batches with gradient accumulation capabilities
        print(f"Batch shape is {batch.shape} & device is {batch.device}") # For debug 6/7
        labels = labels.cuda() # Use GPU for tensors
        print(f"Labels shape is {labels.shape} & device is {labels.device}") # For debug 7/7
        optimizer.zero_grad() # Clear gradients w.r.t. parameters
        ########################################
        ### THE CODE FAILS ON THE LINE BELOW ###
        ########################################
        outputs = model(x_num=batch, x_cat=None) # Forward pass to get output/logits
        ########################################
        ### THE CODE FAILS ON THE LINE ABOVE ###
        ########################################
        loss = criterion(outputs, labels) # Calculate Loss: softmax --> cross entropy loss
        loss.backward() # Getting gradients w.r.t. parameters
        optimizer.step() # Updating parameters
    metric = torchmetrics.Accuracy(task="multiclass", num_classes=num_classes) # Instantiate the accuracy metric
    metric.cuda() # Move the metric to the device
    model.eval() # Set the model to evaluation mode
    with torch.no_grad(): # Disable gradient computation
        for batch, labels in test_dataloader: # Iterate through the batches in the test dataset
            batch = batch.cuda() # Move the batch to the device
            labels = labels.cuda() # Move the labels to the device
            outputs = model(x_num=batch, x_cat=None) # Forward pass to get output/logits
            accuracy = metric(outputs, labels) # Calculate the accuracy
    accuracy_epoch = metric.compute() * 100 # Compute the overall accuracy for the epoch
    model.train()
    scheduler.step(accuracy)
    print(f'Epoch {epoch} completed.')
    print(f'Accuracy: {accuracy_epoch:.4f}%.')
When my training data (a CSR matrix) has a shape of (803473, 152) (i.e., 803473 samples, each with 152 features), this code fails (on multi-GPU). However, if the training data has a shape of (803473, 252) (I just tried a random number), then it works smoothly.
Here are the logs:
/home/my_user_name/.conda/envs/my_environment_name/lib/python3.11/site-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 32 worker processes in total. Our suggested max number of worker in current system is 16, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
/home/my_user_name/.conda/envs/my_environment_name/lib/python3.11/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
DataParallel(
(module): FTT(
(feature_tokenizer): FeatureTokenizer(
(num_tokenizer): NumericalFeatureTokenizer()
)
(cls_token): CLSToken()
(transformer): Transformer(
(blocks): ModuleList(
(0): ModuleDict(
(attention): MultiheadAttention(
(W_q): Linear(in_features=16, out_features=16, bias=True)
(W_k): Linear(in_features=16, out_features=16, bias=True)
(W_v): Linear(in_features=16, out_features=16, bias=True)
(W_out): Linear(in_features=16, out_features=16, bias=True)
(dropout): Dropout(p=0.3, inplace=False)
)
(ffn): FFN(
(linear_first): Linear(in_features=16, out_features=32, bias=True)
(activation): ReGLU()
(dropout): Dropout(p=0.1, inplace=False)
(linear_second): Linear(in_features=16, out_features=16, bias=True)
)
(attention_residual_dropout): Dropout(p=0.0, inplace=False)
(ffn_residual_dropout): Dropout(p=0.0, inplace=False)
(output): Identity()
(ffn_normalization): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
(key_compression): Linear(in_features=153, out_features=0, bias=False)
(value_compression): Linear(in_features=153, out_features=0, bias=False)
)
)
(head): Head(
(normalization): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
(activation): ReLU()
(linear): Linear(in_features=16, out_features=228, bias=True)
)
)
)
)
module.feature_tokenizer.num_tokenizer.weight -> cuda:0
module.feature_tokenizer.num_tokenizer.bias -> cuda:0
module.cls_token.weight -> cuda:0
module.transformer.blocks.0.attention.W_q.weight -> cuda:0
module.transformer.blocks.0.attention.W_q.bias -> cuda:0
module.transformer.blocks.0.attention.W_k.weight -> cuda:0
module.transformer.blocks.0.attention.W_k.bias -> cuda:0
module.transformer.blocks.0.attention.W_v.weight -> cuda:0
module.transformer.blocks.0.attention.W_v.bias -> cuda:0
module.transformer.blocks.0.attention.W_out.weight -> cuda:0
module.transformer.blocks.0.attention.W_out.bias -> cuda:0
module.transformer.blocks.0.ffn.linear_first.weight -> cuda:0
module.transformer.blocks.0.ffn.linear_first.bias -> cuda:0
module.transformer.blocks.0.ffn.linear_second.weight -> cuda:0
module.transformer.blocks.0.ffn.linear_second.bias -> cuda:0
module.transformer.blocks.0.ffn_normalization.weight -> cuda:0
module.transformer.blocks.0.ffn_normalization.bias -> cuda:0
module.transformer.blocks.0.key_compression.weight -> cuda:0
module.transformer.blocks.0.value_compression.weight -> cuda:0
module.transformer.head.normalization.weight -> cuda:0
module.transformer.head.normalization.bias -> cuda:0
module.transformer.head.linear.weight -> cuda:0
module.transformer.head.linear.bias -> cuda:0
Batch shape is torch.Size([512, 152]) & device is cuda:0
Labels shape is torch.Size([512]) & device is cuda:0
First forward step --> shape is torch.Size([256, 152, 16]) & device is cuda:1
First forward step --> shape is torch.Size([256, 152, 16]) & device is cuda:0
Second forward step --> shape is torch.Size([256, 153, 16]) & device is cuda:1
Second forward step --> shape is torch.Size([256, 153, 16]) & device is cuda:0
Third forward step --> shape is torch.Size([256, 228]) & device is cuda:0
Traceback (most recent call last):
File "/home/my_user_name/my_framework_name/minimal_reproducible_example.py", line 98, in <module>
outputs = model(x_num=batch, x_cat=None) # Forward pass to get output/logits
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/my_user_name/.conda/envs/my_environment_name/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/my_user_name/.conda/envs/my_environment_name/lib/python3.11/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/my_user_name/.conda/envs/my_environment_name/lib/python3.11/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/my_user_name/.conda/envs/my_environment_name/lib/python3.11/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
output.reraise()
File "/home/my_user_name/.conda/envs/my_environment_name/lib/python3.11/site-packages/torch/_utils.py", line 543, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
File "/home/my_user_name/.conda/envs/my_environment_name/lib/python3.11/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
output = module(*input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/my_user_name/.conda/envs/my_environment_name/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/my_user_name/my_framework_name/minimal_reproducible_example.py", line 57, in forward
x = self.transformer(x)
^^^^^^^^^^^^^^^^^^^
File "/home/my_user_name/.conda/envs/my_environment_name/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/my_user_name/.conda/envs/my_environment_name/lib/python3.11/site-packages/rtdl/modules.py", line 1150, in forward
x_residual, _ = layer['attention'](
^^^^^^^^^^^^^^^^^^^
File "/home/my_user_name/.conda/envs/my_environment_name/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/my_user_name/.conda/envs/my_environment_name/lib/python3.11/site-packages/rtdl/modules.py", line 893, in forward
k = key_compression(k.transpose(1, 2)).transpose(1, 2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/my_user_name/.conda/envs/my_environment_name/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/my_user_name/.conda/envs/my_environment_name/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: size mismatch, got 4096, 4096x153,0
Some reminders:
Thanks for your help!
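A detail in the printed model above may be relevant: key_compression and value_compression are shown as Linear(in_features=153, out_features=0), and PyTorch warns about initializing zero-element tensors. This would be consistent with the compressed sequence length being the product n_tokens * kv_compression_ratio truncated to an integer, which is an assumption about the library and not something confirmed here. The arithmetic for the two feature counts in the report, with a hypothetical helper name:

```python
def n_compressed_tokens(n_tokens: int, kv_compression_ratio: float) -> int:
    # Assumption: the compressed length is the product truncated to an integer.
    return int(n_tokens * kv_compression_ratio)


# n_tokens = number of features + 1 for the CLS token, as in the example above.
failing = n_compressed_tokens(152 + 1, 0.004)  # 153 * 0.004 = 0.612
working = n_compressed_tokens(252 + 1, 0.004)  # 253 * 0.004 = 1.012
```

Under that assumption, 152 features give a compression layer with zero outputs, while 252 features give one output, which matches the observation that only the larger input works.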
It seems that LinearEmbeddings uses the output size (the last dimension) for Kaiming initialization rather than the input size, which is inconsistent with both PyTorch's nn.Linear initialization (https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) and other similar layers (e.g. NLinear):
https://github.com/Yura52/rtdl/blob/b354b35d68f28b4f5bbebd2e6d5b1f6cfa91eed1/rtdl/nn/_embeddings.py#L275
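For comparison, the nn.Linear behaviour referenced above can be checked directly: its weights are drawn uniformly from an interval whose bound is 1 / sqrt(in_features), so the scale depends on the input size and not on the output size. The sizes below are arbitrary and chosen only to make the difference visible.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)
d_in, d_out = 4, 256
layer = nn.Linear(d_in, d_out)

# Documented range for nn.Linear weights: U(-1/sqrt(in_features), 1/sqrt(in_features)).
bound_from_input_size = 1.0 / math.sqrt(d_in)    # 0.5
bound_from_output_size = 1.0 / math.sqrt(d_out)  # 0.0625
max_abs_weight = layer.weight.abs().max().item()
```

With 1024 sampled weights, the largest absolute value lands close to the input-based bound and well above the output-based one.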
Hi Yura!
I have run the test code that is given in your code for data.piecewise_linear_encoding.
I get this error: RuntimeError: The size of tensor a (4) must match the size of tensor b (100) at non-singleton dimension 1
The error is caused by is_last_bin = bin_indices + 1 == as_tensor(list(map(len, bin_edges))),
where bin_indices + 1 has shape torch.Size([100, 4]) and as_tensor(list(map(len, bin_edges))) has shape torch.Size([100]).
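The shape mismatch can be reproduced with stand-in tensors of the reported shapes. PyTorch aligns shapes from the last dimension, so [100, 4] against [100] compares 4 with 100 and fails. Adding a trailing dimension of size 1 is one way to make the comparison broadcast; which axis should be expanded depends on the intended layout, which is not determined here. The tensor contents are placeholders.

```python
import torch

# Stand-ins with the shapes from the report; the values are placeholders.
bin_indices = torch.zeros(100, 4, dtype=torch.long)
bin_counts = torch.full((100,), 4)

try:
    bin_indices + 1 == bin_counts  # [100, 4] vs [100]: last dims 4 and 100 differ
    broadcast_failed = False
except RuntimeError:
    broadcast_failed = True

# [100, 4] vs [100, 1] broadcasts along the last dimension.
is_last_bin = bin_indices + 1 == bin_counts.unsqueeze(-1)
```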
Hi,
I am trying to get some visualizations of interpretable results from FT-Transformer, such as feature importance or attention heatmaps. I found some discussion of feature importance in Section 5.3 of the paper, but I don't know how to implement it. Is there a way to do this based on the source code you published? Or could you provide an example implementation?
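As a rough sketch of the averaging idea, assume the attention probabilities have already been collected into one tensor of shape (n_layers, n_samples, n_heads, n_tokens, n_tokens); how to obtain them from the rtdl modules is not shown here. The function name cls_attention_importance, the position of the CLS token (assumed last) and the final renormalization are all assumptions made for illustration.

```python
import torch


def cls_attention_importance(attention_probs: torch.Tensor, cls_index: int = -1) -> torch.Tensor:
    """Average the attention row of the CLS token over layers, samples and heads.

    attention_probs: (n_layers, n_samples, n_heads, n_tokens, n_tokens).
    Returns one non-negative score per feature token, normalized to sum to 1.
    """
    cls_rows = attention_probs[..., cls_index, :]   # (layers, samples, heads, tokens)
    mean_attention = cls_rows.mean(dim=(0, 1, 2))   # (tokens,)
    keep = torch.ones(mean_attention.shape[0], dtype=torch.bool)
    keep[cls_index] = False                         # drop the CLS token itself
    importance = mean_attention[keep]
    return importance / importance.sum()


torch.manual_seed(0)
# Fake attention maps: 2 layers, 10 samples, 4 heads, 5 features + 1 CLS token.
fake_probs = torch.softmax(torch.randn(2, 10, 4, 6, 6), dim=-1)
importance = cls_attention_importance(fake_probs)
```

The resulting vector can be plotted as a bar chart or, before averaging over samples, as a per-sample heatmap.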
Thank you very much!
Hi, you did a great implementation of the tab-transformer. However, when I use your example notebook for a simple regression of sin(x), neither the baseline model nor the FTTransformer gives good results. I don't understand why and would like to know the reason.
Here is the link
Hi! Is it possible to have an example which involves also categorical variables?
Other than the most likely class from argmax, can we also get the probabilities for each class? Currently the predictions often contain many negative values, and I don't know the right way to convert them to probabilities.
The code crashes at this line, because prenormalization is not in self
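If the model outputs are unnormalized scores (logits), as is usual for a network trained with torch.nn.CrossEntropyLoss, negative values are expected, and softmax over the class dimension turns them into probabilities. A minimal sketch with made-up logits:

```python
import torch

# Made-up logits for 2 samples and 3 classes.
logits = torch.tensor([[-1.2, 0.3, -4.0], [2.0, -0.5, -0.5]])

probabilities = torch.softmax(logits, dim=1)   # each row sums to 1
predicted_class = probabilities.argmax(dim=1)  # same result as logits.argmax(dim=1)
```

For a binary classifier with a single output per sample, torch.sigmoid plays the same role.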
On this line, should be module_type and not str
https://github.com/Yura52/rtdl/blob/b130dd2e596c17109bef825bc9c8608e1ae617cc/rtdl/nn/_utils.py#L21