A resource for learning about Machine learning & Deep Learning

Home Page: https://www.youtube.com/c/AladdinPersson

License: MIT License

Python 60.60% Shell 0.22% Jupyter Notebook 39.19%

pytorch pytorch-implementation pytorch-tutorial pytorch-gan pytorch-examples tensorflow2 tensorflow-tutorials tensorflow-examples machine-learning machine-learning-algorithms pytorch-tutorials

machine-learning-collection's Introduction

Machine Learning Collection

In this repository you will find tutorials and projects related to Machine Learning. I try to make the code as clear as possible, and the goal is for it to be used as a learning resource and as a place to look up solutions to specific problems. For most of them I have also made video explanations on YouTube if you want a walkthrough of the code. If you have any questions or suggestions for future videos, I prefer that you ask them on YouTube. This repository is contribution friendly, so if you feel you want to add something, I'd happily merge a PR 😃

Machine Learning Algorithms
PyTorch Tutorials
TensorFlow Tutorials
- Beginner Tutorials
- Architectures

Machine Learning

Linear Regression - With Gradient Descent ✅
Linear Regression - With Normal Equation ✅
Logistic Regression
Naive Bayes - Gaussian Naive Bayes
K-nearest neighbors
K-means clustering
Support Vector Machine - Using CVXOPT
Neural Network
Decision Tree

PyTorch Tutorials

If you have any specific video suggestion please make a comment on YouTube :)

Basics

Architectures

LeNet5 - CNN architecture
VGG - CNN architecture
Inception v1 - CNN architecture
ResNet - CNN architecture
EfficientNet - CNN architecture

PyTorch Lightning

TensorFlow Tutorials

If you have any specific video suggestion please make a comment on YouTube :)

Beginner Tutorials

Tutorial 1 - Installation, Video Only
Tutorial 2 - Tensor Basics
Tutorial 3 - Neural Network
Tutorial 4 - Convolutional Neural Network
Tutorial 5 - Regularization
Tutorial 6 - RNN, GRU, LSTM
Tutorial 7 - Functional API
Tutorial 8 - Keras Subclassing
Tutorial 9 - Custom Layers
Tutorial 10 - Saving and Loading Models
Tutorial 11 - Transfer Learning
Tutorial 12 - TensorFlow Datasets
Tutorial 13 - Data Augmentation
Tutorial 14 - Callbacks
Tutorial 15 - Custom model.fit
Tutorial 16 - Custom Loops
Tutorial 17 - TensorBoard
Tutorial 18 - Custom Dataset Images
Tutorial 19 - Custom Dataset Text
Tutorial 20 - Classifying Skin Cancer - Beginner Project Example

CNN Architectures


machine-learning-collection's Issues

Training on a subset of Voc

If I want to train on a subset of VOC, do I have to create a dataset containing that subset and its annotations?
Or is it possible to directly do it from this code?

The midpoint of the bounding boxes

Hi, I was watching your video on intersection over union, which helped me a lot. I tried to break down the code to learn it. When I calculated the midpoint with one of the tensors in iou_test.py, t1_box1 = torch.tensor([0.8, 0.1, 0.2, 0.2]), I got different midpoints.
box1_x1 = t1_box1[..., 0:1] - t1_box1[..., 2:3] / 2 gave me 0.7000 and box1_x1 = (t1_box1[..., 0:1] - t1_box1[..., 2:3]) / 2 gave me 0.3000. Which is the recommended midpoint?
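
A small sketch of why the two expressions differ, written in plain Python rather than torch so it runs without dependencies. It assumes the box is stored as (x_mid, y_mid, width, height); the helper name midpoint_to_corners is made up for illustration. Under that assumption, x - w / 2 is the left edge of the box rather than a midpoint, because division binds tighter than subtraction, while (x - w) / 2 has no geometric meaning for this format.

```python
def midpoint_to_corners(box):
    """Convert (x_mid, y_mid, width, height) to (x1, y1, x2, y2)."""
    x, y, w, h = box
    # Half the width/height is subtracted from (or added to) the midpoint.
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


box1 = (0.8, 0.1, 0.2, 0.2)  # the values from t1_box1 in the question
x1, y1, x2, y2 = midpoint_to_corners(box1)

# The parenthesised variant halves the difference instead of the width.
wrong_x1 = (box1[0] - box1[2]) / 2

print(round(x1, 4), round(x2, 4), round(wrong_x1, 4))
```

With these numbers the left edge is about 0.7 and the right edge about 0.9, so the midpoint 0.8 sits halfway between them, which is a quick sanity check for the first form.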

too much time in get_evaluation_bboxes in yolo v3

It takes a very long time.
In get_evaluation_bboxes, the code below takes more than 10 hours to run:

    for idx in range(batch_size):
        nms_boxes = non_max_suppression(
            bboxes[idx],
            iou_threshold=iou_threshold,
            threshold=threshold,
            box_format=box_format,
        )
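
One plausible reason for such run times (an assumption, since the issue does not include profiling data) is that very many boxes pass the confidence threshold, for example early in training, and non-max suppression compares boxes pairwise. The sketch below is a plain-Python stand-in, not the repository's non_max_suppression; the name nms_sketch and the box layout (class_id, score, x1, y1, x2, y2) are chosen for illustration. It shows the two steps that determine the cost: filtering by score first, then the pairwise IoU loop.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corners."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_sketch(boxes, iou_threshold, threshold):
    """boxes: list of (class_id, score, x1, y1, x2, y2) tuples."""
    # Dropping low-confidence boxes first keeps the quadratic loop small.
    candidates = sorted(
        (b for b in boxes if b[1] > threshold), key=lambda b: b[1], reverse=True
    )
    kept = []
    while candidates:
        best = candidates.pop(0)
        kept.append(best)
        # Keep boxes of other classes and boxes that overlap too little.
        candidates = [
            b for b in candidates
            if b[0] != best[0] or iou(best[2:], b[2:]) < iou_threshold
        ]
    return kept


boxes = [
    (0, 0.90, 0, 0, 10, 10),
    (0, 0.80, 1, 1, 10, 10),    # overlaps the first box, same class
    (0, 0.05, 50, 50, 60, 60),  # below the confidence threshold
    (1, 0.70, 0, 0, 10, 10),    # different class, so it survives
]
kept = nms_sketch(boxes, iou_threshold=0.5, threshold=0.4)
print(kept)
```

If this is the cause, raising threshold or evaluating less often during the first epochs would reduce the number of candidates that reach the pairwise loop.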

Decision tree algorithm error

How do I handle this error?

Why do you need to slice captions?

https://github.com/AladdinPerzon/Machine-Learning-Collection/blob/9f3b2a82c0b8b6ba8c16293d8118d8d8c888f8e6/ML/Pytorch/more_advanced/image_captioning/train.py#L82

Hello, thank you for your version of the image captioning solution!
However, one thing is not clear to me. Why would you do that slice? If I understood correctly, captions in that case is a padded batch of captions, so it looks like:
1 1 1 1 1 2
1 1 1 2 0 0
1 1 1 1 2 0

and if you make a slice [:, :-1]
that would be:
1 1 1 1 1
1 1 1 2 0
1 1 1 1 2
(1 is any token, 2 is <EOS>, and 0 is padding)

So if you want to get rid of the <EOS> tokens, that would not work.
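
The observation can be checked with plain Python lists. The sketch below uses the batch from the question in batch-first layout and assumes token id 2 marks the end of a caption. A common reason for such a slice in teacher-forced training is to align the decoder input length with the target length, not to strip end tokens; whether that is the intent at the linked line is an assumption here.

```python
EOS = 2  # token id assumed to mark the end of a caption

batch = [
    [1, 1, 1, 1, 1, 2],
    [1, 1, 1, 2, 0, 0],
    [1, 1, 1, 1, 2, 0],
]

# Batch-first equivalent of captions[:, :-1]: drop the last time step.
decoder_inputs = [row[:-1] for row in batch]
# Shifted copy that a teacher-forced decoder is commonly trained to predict.
shifted_targets = [row[1:] for row in batch]

# Only the longest caption loses its end token; shorter ones keep it.
rows_still_containing_eos = [EOS in row for row in decoder_inputs]

print(decoder_inputs)
print(rows_still_containing_eos)
```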

Yolov3 requirements.txt missing

Hi,

As mentioned in the README for the Yolov3 folder, I was trying to find the requirements file in the repo, but unfortunately I am not able to find it. Can anyone help me out with this?

README.md

Folder structure

I have a question about the gradient propagation of the Discriminator in WGAN-GP.

In the training process of WGAN-GP (train.py), the following gradient propagation was performed.

        fake = gen(noise)
        critic_real = critic(real).reshape(-1)
        critic_fake = critic(fake).reshape(-1)
        gp = gradient_penalty(critic, real, fake, device=device)
        loss_critic = (
            -(torch.mean(critic_real)-torch.mean(critic_fake)) + LAMBDA_GP * gp
        )
        critic.zero_grad()
        loss_critic.backward(retain_graph=True)
        opt_critic.step()

You used the final loss_critic for gradient propagation. I also looked at other people's code, where backward() is called on the terms separately: critic_real.backward() and critic_fake.backward(). What is the difference between these two methods, and which one would you prefer?

Example: Zeleni9/pytorch-wgan, models/wgan_gradient_penalty.py - https://github.com/Zeleni9/pytorch-wgan
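
Because differentiation is linear, gradients accumulated from separate backward calls on each term should match the gradient of their sum, as long as the signs of the terms are the same in both versions. The sketch below illustrates this with finite differences on a toy one-parameter "critic" in plain Python. Everything in it is hypothetical: the critic function, the data, the value of LAMBDA_GP, and the penalty term, which is a simple stand-in and not the real gradient penalty.

```python
LAMBDA_GP = 10.0  # hypothetical value; the issue does not state it

real = [1.0, 2.0, 3.0]  # made-up "real" samples
fake = [0.5, 1.5]       # made-up "fake" samples


def mean(xs):
    return sum(xs) / len(xs)


def critic(w, x):
    # Toy scalar critic with a single parameter w (stand-in for a network).
    return w * x + 0.1 * w * w * x


def term_real(w):
    return -mean([critic(w, x) for x in real])


def term_fake(w):
    return mean([critic(w, x) for x in fake])


def term_gp(w):
    # Toy penalty, not the real gradient penalty.
    return LAMBDA_GP * (w - 1.0) ** 2


def loss_critic(w):
    return term_real(w) + term_fake(w) + term_gp(w)


def numeric_grad(f, w, h=1e-6):
    # Central finite difference approximation of df/dw.
    return (f(w + h) - f(w - h)) / (2 * h)


w = 0.7
grad_of_sum = numeric_grad(loss_critic, w)
sum_of_grads = sum(numeric_grad(t, w) for t in (term_real, term_fake, term_gp))

print(grad_of_sum, sum_of_grads)
```

The two numbers agree up to floating-point error, so the choice between one backward call on the combined loss and several calls on its parts is mostly about readability and memory use rather than the resulting update.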

ITransformer division error

Hey Aladdin, I think the division should be by head_dim; the code is using the full embedding dimension:

Machine-Learning-Collection/ML/Pytorch/more_advanced/transformer_from_scratch/transformer_from_scratch.py

Line 62 in bd4f07f

attention = torch.softmax(energy / (self.embed_size ** (1 / 2)), dim=3)

You are dividing by the square root of 256, but the correct value should be the square root of 64, which is 8.
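
For reference, "Attention Is All You Need" scales the dot products by the square root of d_k, the per-head key dimension. The sketch below compares the two scalings in plain Python. The sizes (embed_size 512 with 8 heads, giving head_dim 64) follow the paper's base configuration and are not necessarily the values used in the script; the energy values are made up.

```python
import math


def softmax(xs):
    # Subtract the maximum for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


embed_size = 512                # assumed value, from the paper's base model
heads = 8                       # assumed value
head_dim = embed_size // heads  # per-head dimension

energy = [8.0, 0.0, -8.0]       # made-up raw attention scores for one query

by_embed = softmax([e / math.sqrt(embed_size) for e in energy])
by_head = softmax([e / math.sqrt(head_dim) for e in energy])

print([round(p, 3) for p in by_embed])
print([round(p, 3) for p in by_head])
```

Dividing by the larger sqrt(embed_size) shrinks the scores more, so the resulting attention distribution is flatter than with sqrt(head_dim). The model can still train either way, but the scaling then differs from the paper.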

Dimension problem in pytorch_lr_scheduler

When running the model in the check accuracy function, the input x has size [128, 1, 28, 28].
It should be [128, 784].

RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[30, 416, 416, 3] to have 3 channels, but got 416 channels instead

RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[30, 416, 416, 3] to have 3 channels, but got 416 channels instead

I get a RuntimeError when trying to train my model.

Has anyone encountered this problem and is able to help with it?

I can't really understand where these dimension problems come from and which parameters I have to check.

Greetings!

U-Net model's accuracy decreases when using model.eval()

Hi ,
I used the code from
https://github.com/aladdinpersson/Machine-Learning-Collection/tree/master/ML/Pytorch/image_segmentation/semantic_segmentation_unet

to build a U-Net and train it, and it works great. But I found that in the inference stage, if I use model.eval(), the accuracy decreases strongly. Once I remove this line and let the model run in train mode, it performs well.
I couldn't find the reason. I have read on some websites that the network may invoke the same BatchNorm layer in different positions, but I can't see that issue in the code.

Does anyone have any ideas?
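
One common cause of a gap between train mode and eval mode (not confirmed for this issue) is that batch normalization uses the statistics of the current batch in train mode but its running averages in eval mode, and those averages can lag behind after only a few updates. The plain-Python sketch below imitates a single normalization channel; the momentum of 0.1, eps of 1e-5, and the initial running statistics of 0 and 1 match PyTorch's BatchNorm defaults, while the activations are made up.

```python
import math

EPS = 1e-5
MOMENTUM = 0.1  # PyTorch's default BatchNorm momentum


def batch_stats(xs):
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, var


def normalize(x, mean, var):
    return (x - mean) / math.sqrt(var + EPS)


running_mean, running_var = 0.0, 1.0  # initial running statistics
batch = [10.0, 12.0, 11.0, 9.0]       # made-up activations

for _ in range(3):  # only a few training steps
    mean, var = batch_stats(batch)
    running_mean = (1 - MOMENTUM) * running_mean + MOMENTUM * mean
    # Simplification: PyTorch uses the unbiased variance for this update.
    running_var = (1 - MOMENTUM) * running_var + MOMENTUM * var

x = 10.5
train_out = normalize(x, *batch_stats(batch))       # train mode: batch statistics
eval_out = normalize(x, running_mean, running_var)  # eval mode: running statistics

print(train_out, eval_out)
```

In this toy case the same activation is normalized to 0 in train mode but to a value above 7 in eval mode. If something similar happens in the U-Net, training for more iterations or with larger batches would be worth trying before changing the architecture.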

ProGan RuntimeError

I downloaded the celeba_hq image dataset, modified config.py (DATASET = 'celeba_hq'), and modified train.py (in main()):
# import sys
# sys.exit()
Then when I ran python train.py I got this error:

return F.conv_transpose2d( RuntimeError: Expected 4-dimensional input for 4-dimensional weight [512, 512, 4, 4], but got 2-dimensional input of size [256, 512] instead

Expected object of scalar type Long but got scalar type Float for sequence element 1 in sequence argument at position #1 'tensors'

Hi,

I rewrote the code along with watching your tutorial. When I run the training procedure, I get the following error:

Traceback (most recent call last):
  File "/home/niko/programs/pycharm-community-2019.2.1/helpers/pydev/pydevd.py", line 1415, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/niko/programs/pycharm-community-2019.2.1/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/niko/workspace/pytorch-and-lightning-tutorials/yolo/train_original.py", line 147, in <module>
    main()
  File "/home/niko/workspace/pytorch-and-lightning-tutorials/yolo/train_original.py", line 126, in main
    train_loader, model, iou_threshold=0.5, threshold=0.4
  File "/home/niko/workspace/pytorch-and-lightning-tutorials/yolo/utils.py", line 255, in get_bboxes
    true_bboxes = cellboxes_to_boxes(labels)
  File "/home/niko/workspace/pytorch-and-lightning-tutorials/yolo/utils.py", line 322, in cellboxes_to_boxes
    converted_pred = convert_cellboxes(out).reshape(out.shape[0], S * S, -1)
  File "/home/niko/workspace/pytorch-and-lightning-tutorials/yolo/utils.py", line 315, in convert_cellboxes
    (predicted_class, best_confidence, converted_bboxes), dim=-1
RuntimeError: Expected object of scalar type Long but got scalar type Float for sequence element 1 in sequence argument at position #1 'tensors'

Then I tried copying the exact same code from your train.py and dataset.py files, but the error still persisted. I guess __getitem__ in dataset.py should return long instead of float types for bounding boxes. Do you know what might be the cause of the error above?

problem with alpha in wgan model when calculate the gp

Image captioning: all training example output is <UNK>

When training for image captioning, in the first epoch, the print_examples function returns the following

Example 1 CORRECT: Dog on a beach by the ocean
Example 1 OUTPUT: chasing stores mossy participates player brush museum phone handle drops native punk buried alongside cellphones very bags hairy paintball mouths mats markings volleyball backpacker dressed backpacks legos light bitten various pillow singing attempt superman weather try gnawing ceiling shaped tree someone phone scarf crouching courtyard cows indoors seeds hits hits
Example 2 CORRECT: Child holding red frisbee outdoors
Example 2 OUTPUT: chasing stores mossy bushes tags hardwood tulips chin lining gnawing taken tinkerbell both kind cable tile colorfully shepherd dangling skinny cake scene tattooed swimmer beverage come points come 23 wheels puppy scenic ring snake one piggy snowboard camera slightly fireworks nature try gnawing ceiling shaped tree someone phone scarf crouching
Example 3 CORRECT: Bus driving by parked cars
Example 3 OUTPUT: trucks each that cheerleader hawk jeeps formal ring skeleton forested various plastic goofy snowmobile dances very wearing seaweed cards kick works baseman past daughter football waterfalls bathroom motorcycle bar bikers phone following kid ring past converse nose nose college wide skyscraper rough holding bending seeds broken kissing follows pouring pouring
Example 4 CORRECT: A small boat in the ocean
Example 4 OUTPUT: chasing stores mossy bushes tags hardwood tulips chin lining gnawing taken tinkerbell both kind cable tile colorfully shepherd dangling skinny cake scene tattooed swimmer beverage come points come 23 wheels puppy scenic ring snake one piggy snowboard camera slightly fireworks nature try gnawing ceiling shaped tree someone phone scarf crouching
Example 5 CORRECT: A cowboy riding a horse in the desert
Example 5 OUTPUT: avoid windsurfing alongside roof between enjoys dimly artists artists others biting upon holding silhouette ascending apples curve tennis o leaves gives dinner chasing picnic pack ceremony kayak kayak office festive hikes covered visible signs dancing construction construction when hiking pillow foot leotard about all pit between stool ear sports cigarette

However, after the first epoch and in all later epochs, the print_examples function returns:

Example 1 CORRECT: Dog on a beach by the ocean                                  
Example 1 OUTPUT: <SOS> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK>
Example 2 CORRECT: Child holding red frisbee outdoors
Example 2 OUTPUT: <SOS> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK>
Example 3 CORRECT: Bus driving by parked cars
Example 3 OUTPUT: <SOS> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK>
Example 4 CORRECT: A small boat in the ocean
Example 4 OUTPUT: <SOS> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK>
Example 5 CORRECT: A cowboy riding a horse in the desert
Example 5 OUTPUT: <SOS> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK>

I'm not sure what's going on.

Clarification - Inference

Hi @aladdinpersson

Thanks for the work you share. Could you please provide a clear explanation of how inference works?
I have watched your videos and still don't understand 100%:
1- How the sequence is produced at training time
2- How the sequence is produced at test time

I saw your inference script, but honestly the whole thing is very blurry to me.

Machine-Learning-Collection/ML/Pytorch/more_advanced/seq2seq_transformer/utils.py

Line 7 in 235beb2

 def translate_sentence(model, sentence, german, english, device, max_length=50): 
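
A generic sketch of the two modes asked about, in plain Python. It is not the body of translate_sentence, which is not shown here; greedy_decode, toy_next_token, and the canned tokens are hypothetical stand-ins. At training time the decoder typically sees the whole target shifted by one position (teacher forcing), while at test time it has to build the sequence one token at a time from its own previous outputs.

```python
def greedy_decode(next_token_fn, sos, eos, max_length=50):
    """Build a sequence token by token until eos or max_length new tokens."""
    tokens = [sos]
    for _ in range(max_length):
        # A real model would score the whole vocabulary here and take the argmax.
        next_token = next_token_fn(tokens)
        tokens.append(next_token)
        if next_token == eos:
            break
    return tokens


CANNED = ["a", "dog", "runs", "<eos>"]  # made-up "model" behaviour


def toy_next_token(tokens):
    # Hypothetical stand-in for a model: emits a fixed sentence step by step.
    return CANNED[len(tokens) - 1]


# Training time (teacher forcing): the input is the target without its last
# token, and the expected output is the target without its first token.
target = ["<sos>", "a", "dog", "runs", "<eos>"]
decoder_input = target[:-1]
expected_output = target[1:]

# Test time: no target exists, so the sequence grows from <sos> on its own.
decoded = greedy_decode(toy_next_token, "<sos>", "<eos>")
print(decoded)
```

The max_length argument plays the same role as in the signature above: it stops decoding if the model never produces the end token.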

Transformer Question, and Request

Learning PyTorch and love your videos. Your code is so clean and your explanations so crisp.

Question/Bug?:
In SelfAttention you split values, keys, and queries by the number of heads, then pass them into a Linear layer with the same input and output dimension. Why not keep the full dimension (i.e., not split) and let the Linear layer do the reduction?
Wouldn't this allow the Linear layer to learn what to take out of the input instead?

btw, https://github.com/tunz/transformer-pytorch/blob/master/model/transformer.py, class MultiHeadAttention(nn.Module) does this (if I interpret their code correctly).

The paper https://arxiv.org/pdf/1706.03762.pdf indicates "learned
linear projections to dk, dk and dv dimensions".

If I'm all wrong, I would love to be corrected, as I am learning.
If I'm right, would also love to know that I'm starting to understand this stuff.

Request:
Starting to understand torch.einsum power but I am sure I am missing a bunch.
Can you do a video on this?

Regards,
John

PROGAN ISSUE

I am using my own grayscale image dataset.

    loop = tqdm(loader, leave=True)
    for batch_idx, (real, _) in enumerate(loop):
        real = real.to(config.DEVICE)
        cur_batch_size = real.shape[0]

I am getting an error in this loop:

ValueError: too many values to unpack (expected 2)
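
This error usually means that each batch is not an (images, labels) pair, so the pattern (real, _) cannot unpack it. A likely cause (an assumption, since the dataset code is not shown) is a custom dataset that returns only images. The plain-Python sketch below reproduces the message with lists standing in for batches; the variable names are made up.

```python
# Each batch is an (images, labels) pair, which is what "real, _" expects.
pair_batches = [(["img0", "img1"], [0, 1])]
# Each batch holds only images, as a dataset without labels would yield.
image_only_batches = [["img0", "img1", "img2"]]

for real, _ in pair_batches:
    pair_batch_size = len(real)

try:
    for real, _ in image_only_batches:
        pass
    error_message = None
except ValueError as exc:
    # Three images cannot be unpacked into the two names "real" and "_".
    error_message = str(exc)

# Possible fix under this assumption: take the batch as a single value.
for real in image_only_batches:
    image_only_batch_size = len(real)

print(error_message)
```

The alternative fix is to make the dataset return a dummy label next to each image, which leaves the training loop unchanged.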

Output becomes zero after optimizer.step() yolo-v1 model

I encountered this error while I was trying to train the model on my local GPU.

Here : Machine-Learning-Collection/ML/Pytorch/object_detection/YOLO/

This is the test script that I used to test the yolo-v1 model:

if __name__ == '__main__':

    csv_file_path = 'PascalVOC_YOLO/100examples.csv'
    img_dir = 'PascalVOC_YOLO/images'
    label_path = 'PascalVOC_YOLO/labels'

    learning_rate = 1e-10
    num_workers = 2
    batch_size = 2
    weight_decay = 1e-4

    sample_dataset = VOCDataset( csv_file_path , img_dir , label_path, transform=transform)

    sample_loader = DataLoader(
        dataset=sample_dataset,
        batch_size=2,
        num_workers=2,
        pin_memory=True,
        shuffle=True,
        drop_last=True,
    )

    device = 'cuda'if torch.cuda.is_available() else 'cpu'

    model = Yolov1(split_size=7, num_boxes=2, num_classes=20).to(device).half()

    optimizer = optim.Adam( model.parameters() , lr=learning_rate , weight_decay=weight_decay)

    loss_func = YoloLoss().to(device)

    for _ in  range(2):

        print('iter : ',_,'\n')

        x , y = next( iter(sample_loader) )

        x , y = Variable(x).to(device).half() , Variable(y).to(device).half()

        # print( 'infinite : ' , torch.isfinite(x))

        # print('x : ',x)

        out = model(x)

        print('out : ',out , '\n')

        loss = loss_func(out,y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        print( 'loss : ',loss , '\n')
        print( 'loss : data ', loss.data , '\n')
        print(' loss : grad ',loss.grad , '\n')

        for name, param in model.named_parameters():
            print(name, torch.isfinite(param.grad).all() , torch.max(abs(param.grad)) )

        print('\n')

Note: I am using half() because of this CUDA error: RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

While running the script I got the output below:
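
One possible explanation for NaN values after optimizer.step() with a model converted by half() (not confirmed for this script) is that the optimizer then works on float16 tensors, and Adam's default eps of 1e-8 is too small to be represented in float16. The denominator of the update can then become exactly zero. The plain-Python sketch below uses the standard struct module to round values to half precision; to_fp16 is a helper written for this illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


adam_default_eps = 1e-8  # default eps of torch.optim.Adam
eps_fp16 = to_fp16(adam_default_eps)
larger_eps_fp16 = to_fp16(1e-4)  # a value that survives the rounding

# With an all-zero gradient history the second-moment estimate is zero,
# so the denominator sqrt(v) + eps collapses to zero in half precision.
second_moment = to_fp16(0.0)
denominator = to_fp16(second_moment ** 0.5 + eps_fp16)

print(eps_fp16, larger_eps_fp16, denominator)
```

If this is what happens, options to try would be keeping the model in float32, using mixed precision through torch.cuda.amp instead of half(), or passing a larger eps to the optimizer.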

iter :  0

out :  tensor([[-0.1432,  0.0819,  0.0342,  ..., -0.0377, -0.0745,  0.1312],
        [ 0.1110, -0.0650,  0.2410,  ..., -0.0765,  0.3328,  0.1908]],
       device='cuda:0', dtype=torch.float16, grad_fn=<AddmmBackward>)

loss :  tensor(1., device='cuda:0', dtype=torch.float16, grad_fn=<ClampBackward>)

loss : data  tensor(1., device='cuda:0', dtype=torch.float16)

test.py:94: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
  print(' loss : grad ',loss.grad , '\n')
 loss : grad  None

darknet.0.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.0.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.0.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.2.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.2.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.2.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.4.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.4.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.4.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.5.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.5.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.5.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.6.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.6.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.6.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.7.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.7.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.7.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.9.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.9.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.9.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.10.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.10.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.10.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.11.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.11.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.11.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.12.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.12.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.12.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.13.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.13.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.13.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.14.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.14.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.14.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.15.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.15.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.15.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.16.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.16.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.16.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.17.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.17.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.17.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.18.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.18.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.18.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.20.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.20.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.20.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.21.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.21.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.21.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.22.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.22.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.22.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.23.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.23.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.23.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.24.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.24.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.24.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.25.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.25.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.25.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.26.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.26.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.26.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.27.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.27.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.27.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
fcs.1.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
fcs.1.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
fcs.4.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
fcs.4.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)


iter :  1

out :  tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',
       dtype=torch.float16, grad_fn=<AddmmBackward>)

[W python_anomaly_mode.cpp:104] Warning: Error detected in MseLossBackward. Traceback of forward call that caused the error:
  File "test.py", line 86, in <module>
    loss = loss_func(out,y)
  File "/home/buckaroo/miniconda3/envs/dev/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/e/workspace/@training/@datasets/cnns/yolo/yolo-v1-pytorch/loss.py", line 120, in forward
    torch.flatten(exists_box * target[..., :20], end_dim=-2,),
  File "/home/buckaroo/miniconda3/envs/dev/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/buckaroo/miniconda3/envs/dev/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 528, in forward
    return F.mse_loss(input, target, reduction=self.reduction)
  File "/home/buckaroo/miniconda3/envs/dev/lib/python3.7/site-packages/torch/nn/functional.py", line 2929, in mse_loss
    return torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
 (function _print_stack)
Traceback (most recent call last):
  File "test.py", line 89, in <module>
    loss.backward()
  File "/home/buckaroo/miniconda3/envs/dev/lib/python3.7/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/buckaroo/miniconda3/envs/dev/lib/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: Function 'MseLossBackward' returned nan values in its 0th output.
(dev) buckaroo@hansolo:/mnt/e/workspace/@training/@datasets/cnns/yolo/yolo-v1-pytorch$ python3 test.py
iter :  0

out :  tensor([[-0.1044, -0.3135, -0.4897,  ..., -0.1079, -0.0055, -0.0380],
        [ 0.1190, -0.3154, -0.0910,  ..., -0.0995, -0.1595, -0.0576]],
       device='cuda:0', dtype=torch.float16, grad_fn=<AddmmBackward>)

loss :  tensor(1.0010, device='cuda:0', dtype=torch.float16, grad_fn=<AddBackward0>)

loss : data  tensor(1.0010, device='cuda:0', dtype=torch.float16)

test.py:94: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
  print(' loss : grad ',loss.grad , '\n')
 loss : grad  None

darknet.0.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.0.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.0.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.2.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.2.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.2.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.4.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.4.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.4.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.5.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.5.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.5.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.6.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.6.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.6.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.7.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.7.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.7.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.9.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.9.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.9.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.10.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.10.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.10.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.11.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.11.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.11.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.12.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.12.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.12.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.13.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.13.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.13.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.14.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.14.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.14.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.15.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.15.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.15.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.16.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.16.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.16.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.17.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.17.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.17.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.18.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.18.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.18.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.20.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.20.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.20.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.21.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.21.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.21.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.22.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.22.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.22.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.23.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.23.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.23.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.24.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.24.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.24.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.25.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.25.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.25.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.26.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.26.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.26.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.27.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.27.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.27.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
fcs.1.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
fcs.1.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
fcs.4.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
fcs.4.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)

iter :  1

out :  tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',
       dtype=torch.float16, grad_fn=<AddmmBackward>)

Observations

As you can see, on the first pass (i.e. before optimizer.step()) the model output is a valid tensor with values.
When iteration 1 begins (i.e. after optimizer.step()), the output becomes nan.

Debug method : 0

After setting torch.autograd.set_detect_anomaly(True) globally,
I got the result below.

[W python_anomaly_mode.cpp:104] Warning: Error detected in MseLossBackward. Traceback of forward call that caused the error:
  File "test.py", line 86, in <module>
    loss = loss_func(out,y)
  File "/home/buckaroo/miniconda3/envs/dev/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/e/workspace/@training/@datasets/cnns/yolo/yolo-v1-pytorch/loss.py", line 120, in forward
    torch.flatten(exists_box * target[..., :20], end_dim=-2,),
  File "/home/buckaroo/miniconda3/envs/dev/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/buckaroo/miniconda3/envs/dev/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 528, in forward
    return F.mse_loss(input, target, reduction=self.reduction)
  File "/home/buckaroo/miniconda3/envs/dev/lib/python3.7/site-packages/torch/nn/functional.py", line 2929, in mse_loss
    return torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
 (function _print_stack)
Traceback (most recent call last):
  File "test.py", line 89, in <module>
    loss.backward()
  File "/home/buckaroo/miniconda3/envs/dev/lib/python3.7/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/buckaroo/miniconda3/envs/dev/lib/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: Function 'MseLossBackward' returned nan values in its 0th output.

So I have tried:

clamping the loss tensors with torch.clamp(value, min=0.0, max=1.0) in loss.py
adding an epsilon (1e-6) inside torch.sqrt(), i.e. torch.sqrt(val + epsilon), in loss.py

But this did not fix my issue.
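
For context on the second attempt above, here is a minimal sketch of why an epsilon is usually placed inside the square root. It is not taken from loss.py: the helper name sqrt_loss_grad and the toy loss are assumptions made for illustration. When the input to torch.sqrt() is exactly zero and the incoming gradient is also zero, the backward pass evaluates 0 / 0 and yields nan, while a small epsilon keeps the gradient finite.

```python
import torch


def sqrt_loss_grad(value: float, eps: float = 0.0) -> float:
    """Gradient of (sqrt(x + eps)) ** 2 with respect to x, evaluated at x = value."""
    x = torch.tensor([value], dtype=torch.float32, requires_grad=True)
    loss = (torch.sqrt(x + eps) ** 2).sum()
    loss.backward()
    return x.grad.item()


# Without epsilon: sqrt's backward divides a zero gradient by 2 * sqrt(0).
grad_plain = sqrt_loss_grad(0.0)
# With epsilon (illustrative value): the denominator stays non-zero.
grad_eps = sqrt_loss_grad(0.0, eps=1e-6)
print(grad_plain, grad_eps)
```

If a check like this already comes back finite for the real loss terms, the square root is probably not the source of the nan, and the search would have to continue elsewhere.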

Reference

Getting NaN values in backward pass

Output of Model is nan every time

Nan Loss coming after some time

Getting Nan after first iteration with custom loss

Weights become NaN values after first batch step

Why nan after backward pass?

NaN values popping up during loss.backward()

Debugging neural networks

So kindly help me debug this issue. Thanks in advance.

ProGAN Pretrained weights link is broken!

When I click to download the pretrained weights, I get redirected to https://github.com/aladdinpersson/Machine-Learning-Collection/tree/master/ML/Pytorch/GANs/ProGAN

How can I obtain the output of an intermediate layer (feature extraction) in Model Subclassing ?

Please refer to the below link for more details on the issue.

https://stackoverflow.com/questions/64471742/skip-some-layers-in-keras-model-during-evaluation-validation-phase

My requirement is to override the evaluation/validation step after each epoch while still using the existing fit function.

Following the link below does not work (when this code is written in the test_step method):
https://keras.io/getting_started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer-feature-extraction

Tensorflow tutorial - video - 8

Hi,
Thanks for the great resource on DL. I found that the code for video tutorial 8 is missing from your repo. Could you please upload it?

Small mistakes in the DCGAN code

In the classes Discriminator(nn.Module) and Generator(nn.Module):

the first and last Conv2d should have bias=False and stride=1;
remove the comment marker from nn.BatchNorm2d(out_channels).

true_boxes list

ML/Pytorch/object detection/metrics/mean_avg_precision.py is exactly what I have been wanting for so long.
Many thanks to you @aladdinpersson

But I just wanted to ask how I can get the true_boxes list while integrating the code into my validation pipeline.
I can already produce the pred_boxes list in my validation code, since I know where I produce the detections, scores, classes, etc.

Does it mean I should make it the same as the pred_boxes list, except for where class_prediction is 0, according to the definition?
Am I right?

Thanks in advance
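
One way to read the question above is sketched here, assuming that true_boxes uses the same row layout as pred_boxes, [image_idx, class_id, score, x1, y1, x2, y2], and is filled from the ground-truth annotations rather than from model output. The helper build_true_boxes and the annotations dictionary are hypothetical names, and the score of 1.0 is a placeholder, not something confirmed by mean_avg_precision.py.

```python
from typing import Dict, List, Tuple

# Hypothetical ground-truth format: image index -> list of (class_id, x1, y1, x2, y2).
Annotation = Tuple[int, float, float, float, float]


def build_true_boxes(annotations: Dict[int, List[Annotation]]) -> List[List[float]]:
    """Flatten per-image ground-truth boxes into one list of rows."""
    true_boxes = []
    for image_idx in sorted(annotations):
        for class_id, x1, y1, x2, y2 in annotations[image_idx]:
            # Assumed row layout: [image_idx, class_id, score, x1, y1, x2, y2].
            # Ground truth has no confidence, so 1.0 is used as a placeholder.
            true_boxes.append([image_idx, class_id, 1.0, x1, y1, x2, y2])
    return true_boxes


annotations = {
    0: [(3, 0.10, 0.20, 0.50, 0.60)],
    1: [(7, 0.30, 0.30, 0.90, 0.80), (3, 0.05, 0.05, 0.25, 0.40)],
}
true_boxes = build_true_boxes(annotations)
print(len(true_boxes))  # 3
```

In a validation loop, the same image index would have to be used for the predictions and the ground truth of a given image so that the two lists can be matched.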

Train set up Yolov3

Hey Aladdin,
Thank you for your great tutorial! I have a question about your Yolov3 training. I've tried training Yolo on Pascal VOC with my own settings, but I'm stuck at a mAP of 57.5. How did you get 78.2 mAP? Can you tell me your settings?

My settings:

BATCH_SIZE = 16
IMAGE_SIZE = 416
LEARNING_RATE = 1e-5
WEIGHT_DECAY = 1e-4
NUM_EPOCHS = 500
CONF_THRESHOLD = 0.2
MAP_IOU_THRESH = 0.5
NMS_IOU_THRESH = 0.45

Getting error while executing Semantic segmentation w. UNET in pytorch

Hi,
I watched your recent tutorial on semantic segmentation with PyTorch. Being new to PyTorch, I was looking for a tutorial with good explanations, especially of the segmentation part, and your tutorial was a great help.
I tried to implement your approach with a UNet network for segmentation on Google Colab, but I am getting an error. I tried to fix it, but had no luck. Can you please help me fix the error?
The error I am getting is:

TypeError Traceback (most recent call last)
in ()
85
86 if __name__ == "__main__":
---> 87 main()

7 frames
in main()
67
68 for epoch in range(Num_epochs):
---> 69 train_fn(train_loader, model, optimizer, loss_fn, scaler)
70
71

in train_fn(loader, model, optimizer, loss_fn, scaler)
2 loop = tqdm(loader)
3
----> 4 for batch_idx, (data, targets) in enumerate(loop):
5 data= data.to(device=device)
6 targets= targets.float().unsqueeze(1).to(device=device)

/usr/local/lib/python3.6/dist-packages/tqdm/std.py in __iter__(self)
1102 fp_write=getattr(self.fp, 'write', sys.stderr.write))
1103
-> 1104 for obj in iterable:
1105 yield obj
1106 # Update and possibly print the progressbar.

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in __next__(self)
433 if self._sampler_iter is None:
434 self._reset()
--> 435 data = self._next_data()
436 self._num_yielded += 1
437 if self._dataset_kind == _DatasetKind.Iterable and \

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in _next_data(self)
473 def _next_data(self):
474 index = self._next_index() # may raise StopIteration
--> 475 data = self._dataset_fetcher.fetch(index) # may raise StopIteration
476 if self._pin_memory:
477 data = _utils.pin_memory.pin_memory(data)

/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
42 def fetch(self, possibly_batched_index):
43 if self.auto_collation:
---> 44 data = [self.dataset[idx] for idx in possibly_batched_index]
45 else:
46 data = self.dataset[possibly_batched_index]

/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py in (.0)
42 def fetch(self, possibly_batched_index):
43 if self.auto_collation:
---> 44 data = [self.dataset[idx] for idx in possibly_batched_index]
45 else:
46 data = self.dataset[possibly_batched_index]

in __getitem__(self, index)
17
18 if self.transform is not None:
---> 19 augmentations= self.transform(image=image, mask=mask)
20 image = augmentations["image"]
21 mask = augmentations["mask"]

TypeError: 'int' object is not callable

RuntimeError: cannot perform reduction function argmax on a tensor with no elements because the operation does not have an identity

Thanks for this amazing tutorial and repo. I have been using this code to perform tumor detection on my data set, and I keep getting the error mentioned in the title. Did anyone else get this error?

Just to mention, I have around 45000 2D images, but only around 4% of them contain tumors; the remaining 96% do not. To annotate the images with no tumors, I have created empty .txt files with the same name as the image. Is this correct? Or should I be using [0, 0, 0, 0, 0] in the annotation files for the images with no tumors?

Thanks in advance.
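
As a sketch of the first option described above (an empty .txt file per tumor-free image), the parser below returns an empty list when a label file has no rows. The function name parse_label_text and the "class x y w h" row layout are assumptions for illustration; whether the rest of the training and evaluation code accepts images with zero boxes is a separate question that this sketch does not settle.

```python
from typing import List


def parse_label_text(text: str) -> List[List[float]]:
    """Parse label rows of the assumed form 'class x y w h', one box per line."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines, so an empty file yields no boxes
        if len(parts) != 5:
            raise ValueError(f"expected 5 values per row, got {len(parts)}: {line!r}")
        class_id = int(float(parts[0]))
        x, y, w, h = (float(value) for value in parts[1:])
        boxes.append([class_id, x, y, w, h])
    return boxes


print(parse_label_text(""))  # []
print(parse_label_text("0 0.5 0.5 0.2 0.3\n"))  # [[0, 0.5, 0.5, 0.2, 0.3]]
```

Note that a row of [0, 0, 0, 0, 0] would parse as a real box of class 0 with zero width and height, which is not the same thing as "no object".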

data loader loss compute problem

According to the YOLO detection idea, the cell that the midpoint (center_x, center_y) falls in is responsible for detecting the object. But the code above does not consider the adjacent grid cells: if they also have an IoU greater than ignore_iou_thresh, will the adjacent grid cells also contribute to the loss, because the code does not set targets[scale_idx][anchor_on_scale, i, j, 0] = -1 for them? I am looking forward to your answer. Thank you in advance.

Unable to perform inference on pretrained weights

Hi Aladdin, great tutorials you have here. For the first time I was really able to understand how to code YOLOv3. But I couldn't find the code for inference, so I decided to write it on my own, and I ran into the following issues.

I tried to use your plot_couple_examples function, but always got a CUDA out-of-memory error, which was really weird because I could train the model with batch size 6 on my 4 GB GTX 1650 GPU.
Then I tried to manually pass an image by reading it with OpenCV and expanding the dimensions to get a batch size of 1, but got the following error:
RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[1, 416, 416, 3] to have 3 channels, but got 416 channels instead
Can you please suggest a fix? I am using your repo as the starting point for my college project.
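
The RuntimeError above reports an input of shape [1, 416, 416, 3], which is the height x width x channel layout that OpenCV returns, while the convolution expects [batch, channels, height, width]. A minimal sketch of the conversion follows; the helper to_model_input, the stand-in image, the single Conv2d layer, and the division by 255 are assumptions for illustration, not the repository's inference code. Inference is wrapped in torch.no_grad(), since not keeping activations for backpropagation generally lowers memory use.

```python
import torch
import torch.nn as nn


def to_model_input(image_hwc: torch.Tensor) -> torch.Tensor:
    """Convert an H x W x C uint8 image into a 1 x C x H x W float batch."""
    chw = image_hwc.permute(2, 0, 1)         # H x W x C -> C x H x W
    return chw.unsqueeze(0).float() / 255.0  # add batch dim; scaling is an assumption


# Stand-in for an image loaded with cv2.imread and resized to 416 x 416.
# Note that cv2.imread returns BGR channel order, so a BGR -> RGB swap may be needed.
image = torch.zeros(416, 416, 3, dtype=torch.uint8)
batch = to_model_input(image)

# Stand-in for the first layer of the network (weight shape [32, 3, 3, 3]).
first_layer = nn.Conv2d(3, 32, kernel_size=3, padding=1)
first_layer.eval()
with torch.no_grad():  # no autograd graph is built during inference
    out = first_layer(batch)

print(batch.shape, out.shape)
```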

DCGAN - Dimension

If I change FEATURES_DISC and FEATURE_GEN to a number other than 64, I still get generated samples of size 64x64.
Is this normal, or is there a way to fix that?
Thank you for your stunning work, btw.
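
A minimal sketch of why this may be expected: in a DCGAN-style generator, the feature-count setting scales the number of channels per layer, while the spatial size follows only from the kernel size, stride, and padding of each transposed convolution. The layer list below is an assumed standard DCGAN layout (one 4/1/0 layer followed by four 4/2/1 layers), not a copy of the repository's model, and the function names are hypothetical.

```python
from typing import List, Tuple

# Assumed (kernel, stride, padding) for each transposed convolution.
LAYERS: List[Tuple[int, int, int]] = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]


def conv_transpose_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def generator_shapes(features_gen: int, img_channels: int = 3) -> List[Tuple[int, int]]:
    """Return (channels, spatial_size) after each layer for a 1x1 noise input."""
    channels = [features_gen * 16, features_gen * 8, features_gen * 4,
                features_gen * 2, img_channels]
    size = 1
    shapes = []
    for out_channels, (kernel, stride, padding) in zip(channels, LAYERS):
        size = conv_transpose_out(size, kernel, stride, padding)
        shapes.append((out_channels, size))
    return shapes


print(generator_shapes(64)[-1])  # (3, 64)
print(generator_shapes(32)[-1])  # (3, 64)
```

Under these assumptions, changing the output resolution would mean adding or removing a stride-2 layer (each one doubles the size) rather than changing the feature count.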

attention = torch.softmax(energy / (self.embed_size ** (1 / 2)), dim=3)

should be
attention = torch.softmax(energy / (self.head_dim ** (1 / 2)), dim=3)
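
To illustrate the difference between the two lines, here is a small pure-Python sketch. In scaled dot-product attention the scores are divided by the square root of the per-head key dimension; with embed_size = 256 and 8 heads (illustrative values, not taken from the repository), dividing by sqrt(embed_size) instead flattens the softmax more than dividing by sqrt(head_dim). The function scaled_softmax is a hypothetical helper.

```python
import math
from typing import List


def scaled_softmax(scores: List[float], d_k: int) -> List[float]:
    """Softmax over scores after dividing each score by sqrt(d_k)."""
    scaled = [s / math.sqrt(d_k) for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


embed_size, heads = 256, 8       # illustrative values
head_dim = embed_size // heads   # 32
energy = [8.0, 4.0, 0.0]         # made-up query-key dot products for one query

by_embed = scaled_softmax(energy, embed_size)  # scaling used in the first line
by_head = scaled_softmax(energy, head_dim)     # scaling proposed in the second line
print([round(p, 3) for p in by_embed])
print([round(p, 3) for p in by_head])
```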

error when running code

When running your code I get this error:

RuntimeError: Expected object of device type cuda but got device type cpu for argument #3 'index' in call to _th_index_select

Any tips? Thank you.

custom image testing

Hi,
I am using method 1 from tutorial 18 (images in subfolders) to load a custom dataset.
My code is running perfectly, but I want to know how I can test the model on my own image (one not included in the dataset).

Transformer implementation end to end training

Idea for new Video

3 ideas:

a cheat sheet, with a video, covering all the functions in PyTorch and the context in which each function is used.
key lessons for deep learning. Example: skip connections good, normalizing data good, dropout good but when used with batch normalization ...
what to expect when training big networks, and how the loss normally behaves.

By the way, your einsum video was perfect.

CycleGAN

Hey there,

With the exception of changing the paths to make it google colab friendly, removing the val load in train, and setting load = False, I copied the files exactly as they are...but I keep getting this error. I'm not sure what I'm doing wrong.

0% 0/6287 [00:00<?, ?it/s]Traceback (most recent call last):
File "train.py", line 150, in <module>
main()
File "train.py", line 141, in main
train_fn(disc_H, disc_Z, gen_Z, gen_H, loader, opt_disc, opt_gen, L1, mse, d_scaler, g_scaler)
File "train.py", line 26, in train_fn
D_H_real = disc_H(image)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/MyDrive/Colab Notebooks/Project1Monet/CycleGAN_from_scratch_RESNET/discriminator_model.py", line 41, in forward
x = self.initial(x)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 119, in forward
input = module(input)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 399, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 394, in _conv_forward
_pair(0), self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [64, 3, 4, 4], expected input[1, 256, 258, 5] to have 3 channels, but got 256 channels instead
0% 0/6287 [00:02<?, ?it/s]

YOLOv3, problems in get_evaluation_bboxes

Hi there! First, thanks for your work!
I would like to know whether this is an issue or whether I am the only one with this problem.
I'm trying to run training on 8 examples with a batch size of 4 (but it is the same even on the training loader):
when I change CONF_THRESHOLD to less than 0.6, training never seems to get past "mean average precision": the print(mapval.item) output never appears.
When I change it to 0.6 or higher, it is computed, but the mAP is always 0, the no-obj accuracy is always 100%, and the obj accuracy is always 0%.

I ran training on the training loader for roughly 70 epochs with CONF_THRESHOLD at 0.6, but the mAP was something like 0.004. I printed some images, but there were several dozen boxes in each image.

I tried to debug it, but without any luck. Can someone elaborate on this?

A few doubts about the ProGAN implementation

Thank you so much for the wonderful implementation of the ProGan model. I just had a couple of doubts though-

In the WSConv2d here, I am not really sure what is happening. Setting bias of Conv layer to None but storing its value before that. I am a bit confused.
Also here, do we have some special reason for choosing a value of 30 for PROGRESSIVE_EPOCHS

Would really appreciate your help.

Can't find the UNET model code.

Please let me know where the UNET model is. I was unable to find it.

Small error in pix2pix for anime dataset

Hey, wanted to let you know that in your pix2pix implementation, when using the anime set, the target image is actually on the left side, instead of the right. So you'll have to switch Column 22/23 in dataset.py. Might wanna add that to the readme or sth. Otherwise really nice implementation and thanks for providing this!

To learn positional encoding

Hi, nice work!
I have a question about the transformer.

I wonder how to change nn.Embedding (for both the inputs and the positional encoding) to use a learned method?

thanks!
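
One common reading of "learned" here is a trainable position table that is added to the token embeddings, as opposed to fixed sinusoidal values. A minimal sketch under that assumption follows; the class name TokenAndPositionEmbedding and the sizes are hypothetical and not taken from the repository.

```python
import torch
import torch.nn as nn


class TokenAndPositionEmbedding(nn.Module):
    """Token embedding plus a learned (trainable) positional embedding."""

    def __init__(self, vocab_size: int, embed_size: int, max_length: int):
        super().__init__()
        self.word_embedding = nn.Embedding(vocab_size, embed_size)
        # One trainable vector per position, updated by the optimizer like any weight.
        self.position_embedding = nn.Embedding(max_length, embed_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        batch_size, seq_len = tokens.shape
        positions = torch.arange(seq_len, device=tokens.device)
        positions = positions.unsqueeze(0).expand(batch_size, seq_len)
        return self.word_embedding(tokens) + self.position_embedding(positions)


embed = TokenAndPositionEmbedding(vocab_size=100, embed_size=16, max_length=50)
tokens = torch.randint(0, 100, (2, 5))  # batch of 2 sequences, 5 token ids each
out = embed(tokens)
print(out.shape)  # torch.Size([2, 5, 16])
```

With this layout, sequences longer than max_length cannot be embedded, which is one trade-off against fixed sinusoidal encodings.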

Model overfitting for 20 classes ( PASCAL VOC 2007 + 2012 dataset )

Hi Aladdin, thank you so much for your video and explanations.
I am currently doing a project on object detection, and your video helped me a lot.
Thank you once again.

I have a problem with overfitting in the model. I am getting a test mAP of 10% and a train mAP of 90%. I trained on PASCAL VOC 2007 + VOC 2012 data.
I have tried every way I could think of to reduce the overfitting (dropout layer, weight decay, 5k more images, data augmentation, pretrained extraction weights, step LR, etc.) and kept everything as close as possible to the original paper.
It has been a month now and I am still not able to figure out why. Could you please help me? (I have used your code for everything.) It would be a great help if you could suggest something with respect to your code.

P.S.: I used the same code and modified it for 2 classes and 5 classes, and got good results: 2 classes, test mAP 50%; 5 classes, test mAP 60%.

Error while using GPU

I have a CUDA-enabled GPU, and using the exact same code as in your transformer-from-scratch code I got the same error as in #9. The error is on line 66. When I inspected your code, I couldn't figure out the problem.

AttributeError: 'Batch' object has no attribute 'src'

https://github.com/aladdinpersson/Machine-Learning-Collection/blob/master/ML/Pytorch/more_advanced/Seq2Seq/seq2seq.py

inp_data = batch.src.to(device)

AttributeError: 'Batch' object has no attribute 'src'

I am using

]$ conda list | grep torchtext
torchtext 0.9.1 pypi_0 pypi

How can I fix this issue in torchtext 0.9.1?

need dev

Hi, I need a dev to build an app with Flutter & TensorFlow. Are you available?

Performance issues in the program

Hello, I found a performance issue in aladdinpersson_Machine-Learning-Collection/ML/TensorFlow/Basics/tutorial7-indepth-functional.py:
train_dataset.map was called without num_parallel_calls.
I think adding it would increase the efficiency of your program.

The same issue also exists in test_dataset.map,
ds_train = ds_train.map(read_image).map(augment).batch(2)
and three other places.

Here is the TensorFlow documentation that supports this.

Looking forward to your reply. By the way, I would be glad to create a PR to fix it if you are too busy.

GTA5 to Cityscapes translation

Hello,
I trained this model to translate GTA5 images to Cityscapes images,
but it gave me poor results.
Can anyone help me improve these results?

ESRGAN pretraining problem

Where can I find the pretrained weights for ESRGAN?

Error in pytorch seq2seq attention

Aladdin,

I watched your excellent youtube videos on transformer implementation with pytorch. When I tried to test in google colab, I got

NotImplementedError in https://github.com/aladdinpersson/Machine-Learning-Collection/blob/master/ML/Pytorch/more_advanced/Seq2Seq_attention/seq2seq_attention.py line 243:

for batch_idx, batch in enumerate(train_iterator):
...

Please advise. Thanks a lot.

Number Parameters EfficientNet-B0

Hey Aladdin, thanks for the awesome YouTube videos.

I was checking the implementation of EfficientNet that you provided, and I noticed that in the final example, the total number of parameters for EfficientNet-B0 is 14,047,366.

Is this correct?
According to Table 2 of the original EfficientNet paper, I thought that Version B0 was supposed to have 5.3M parameters.
Thanks in advance,

PS. In order to calculate the total number of parameters, I inserted the following line of code into the test() function:
print(f'Total Number of Parameters: {sum( p.numel() for p in model.parameters() if p.requires_grad ):,}')
