I created a CustomDataset to prepare the training, validation and test sets and make sure they all have the same format (same number of columns after dummies, global normalization, etc.). This works fine.
The DataLoader calls the CustomDataset according to the type of dataset it requires ('train', 'valid' or 'test'), as below:
data_dataset = {x: CustomDataset(csv_file_data=data_dir + 'train.csv',
                                 csv_file_test=data_dir + 'test.csv',
                                 **params,
                                 data=x)
                for x in ['train', 'valid', 'test']}

data_loader = {x: torch.utils.data.DataLoader(data_dataset[x], batch_size=1, shuffle=True)
               for x in ['train', 'valid', 'test']}
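To sanity-check why an extra leading dimension appears later, here is a small self-contained sketch (toy shapes, not my real CustomDataset): the default collate function in DataLoader stacks whatever __getitem__ returns along a new leading batch dimension.

```python
import torch
from torch.utils.data import Dataset, DataLoader

# Toy stand-in mimicking a Dataset whose __getitem__ returns a whole 2-D
# table (5 rows x 3 features) instead of a single row.
class WholeTableDataset(Dataset):
    def __init__(self, n_items=4, n_rows=5, n_feat=3):
        self.n_items = n_items
        self.table = torch.randn(n_rows, n_feat)
        self.target = torch.randn(n_rows, 1)

    def __len__(self):
        return self.n_items

    def __getitem__(self, idx):
        # Every index returns the full table, so each "sample" is 2-D
        return self.table, self.target

loader = DataLoader(WholeTableDataset(), batch_size=1, shuffle=True)
data, target = next(iter(loader))
print(data.shape)    # torch.Size([1, 5, 3]) - batch dim stacked in front
print(target.shape)  # torch.Size([1, 5, 1])
```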
The various sets look like this:
print('TRAINING')
data, lab_target = data_dataset['train'][0]
print('DATASET')
print('Data shape: ', data.shape)
print('Data type: ', type(data))
print('Data size: {}'.format(data.size()))
# print('Example of the features for the 1st entry {}'.format(data[0]))
print('\nTarget at the first row: {}'.format(lab_target.size()))
print('Example of the label for the 1st entry: {}'.format(lab_target[0]))
print()
print('Train Loader type')
train_iter = iter(data_loader['train'])
print(type(train_iter))
datas, labels_target = next(train_iter)  # iterator.next() is gone; use next(iterator)
print('DATALOADER')
print('images shape on batch size = ', datas.size())
print('Example of datas for the 1st entry {}'.format(datas[0].size()))
# print('\nTarget type on batch size = {}'.format(labels_target))
print('Target type on batch size = {}'.format(type(labels_target)))
print('Target shape on batch size = ', labels_target.shape)
print(len(train_iter))
and the output is:
TRAINING
DATASET
Data shape: torch.Size([1095, 288])
Data type: <class 'torch.Tensor'>
Data size: torch.Size([1095, 288])
Target at the first row: torch.Size([1095, 1])
Example of the label for the 1st entry: tensor([208500.])
Train Loader type
<class 'torch.utils.data.dataloader._SingleProcessDataLoaderIter'>
DATALOADER
images shape on batch size = torch.Size([1, 1095, 288])
Example of datas for the 1st entry torch.Size([1095, 288])
Target type on batch size = <class 'torch.Tensor'>
Target shape on batch size = torch.Size([1, 1095, 1])
1460
Here the batch size is 1. If I changed the batch size to 10, for example, the outcome would be:
DATALOADER
images shape on batch size = torch.Size([10, 1095, 288])
Example of datas for the 1st entry torch.Size([1095, 288])
Target type on batch size = <class 'torch.Tensor'>
Target shape on batch size = torch.Size([10, 1095, 1])
146
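The batch count printed at the end is just ceil(dataset_length / batch_size) with the DataLoader default drop_last=False, which is easy to check:

```python
import math

# Number of batches a DataLoader reports for a 1460-item dataset
dataset_len = 1460
for batch_size in (1, 10):
    n_batches = math.ceil(dataset_len / batch_size)
    print(batch_size, n_batches)  # 1 -> 1460, 10 -> 146
```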
The training set has 1460 entries, but I split it into 75% for training and 25% for validation (hence the 1095). 288 is the number of features, including all the dummies.
So my issue is when I train my model (below is an extract of the training loop):
model.train()
for idx, (data, target) in enumerate(loaders['train']):
    if use_cuda:
        data, target = data.cuda(), target.cuda()
    optimizer.zero_grad()
    output = model(data)
    # for name, param in model.named_parameters():
    #     if param.requires_grad:
    #         print(name, param.data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    train_loss = loss.item()  # reuse the computed loss instead of calling criterion again
For each batch it takes all 1095 entries at once, and I don't know how to change this so that the training set is split into mini-batches of rows.
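To illustrate what I mean by splitting into batches, here is a minimal sketch (random stand-in tensors; names are made up) of a Dataset whose __getitem__ returns a single row, so the DataLoader can form batches of rows:

```python
import torch
from torch.utils.data import Dataset, DataLoader

# Hypothetical row-wise Dataset: one item per row of the feature table,
# so the DataLoader's batch dimension indexes rows, not whole tables.
class RowWiseDataset(Dataset):
    def __init__(self, features, targets):
        self.features = features  # tensor [n_rows, n_features]
        self.targets = targets    # tensor [n_rows, 1]

    def __len__(self):
        return self.features.shape[0]  # one item per row

    def __getitem__(self, idx):
        return self.features[idx], self.targets[idx]

features = torch.randn(1095, 288)  # random stand-in for the real data
targets = torch.randn(1095, 1)
loader = DataLoader(RowWiseDataset(features, targets), batch_size=10, shuffle=True)
data, target = next(iter(loader))
print(data.shape)    # torch.Size([10, 288])
print(target.shape)  # torch.Size([10, 1])
```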
I look forward to hearing your inputs.