Comments (13)

yuhuixu1993 commented on July 18, 2024

@xxsgcjwddsg, the subset still has 1000 classes: we sampled 10% and 2.5% of the images in each class for training and validation, respectively. And yes, the hyper-parameters are the architecture weights.
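A minimal sketch of the per-class sampling described above, assuming the full training set is available as a list of (path, class_index) pairs, such as the samples attribute of a torchvision ImageFolder. The function name split_per_class, the fixed seed, and drawing the validation images from the same shuffled list (so the two subsets are disjoint) are illustrative assumptions, not the PC-DARTS code.

```python
import random
from collections import defaultdict


def split_per_class(samples, train_frac=0.10, val_frac=0.025, seed=0):
    """Draw disjoint train/val subsets by taking a fixed fraction of each class.

    `samples` is a list of (path, class_index) pairs (assumed input format).
    """
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for label in sorted(by_class):
        items = by_class[label][:]
        rng.shuffle(items)
        n_train = int(len(items) * train_frac)
        n_val = int(len(items) * val_frac)
        # Consecutive slices of one shuffled list keep train and val disjoint.
        train.extend(items[:n_train])
        val.extend(items[n_train:n_train + n_val])
    return train, val


# Toy dataset: 3 classes x 400 images -> 40 train and 10 val images per class.
samples = [(f"img_{c}_{i}.jpg", c) for c in range(3) for i in range(400)]
train, val = split_per_class(samples)
print(len(train), len(val))
```

Because the fractions are applied within each class, both subsets keep every class even though they are much smaller than the full set.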

from pc-darts.

pawopawo commented on July 18, 2024

Thanks.
How do you search with multiple GPUs? I added model = nn.DataParallel(model) before model = model.cuda(), and model = model.module after it, but the search still runs on only one GPU.

yuhuixu1993 commented on July 18, 2024

@xxsgcjwddsg, you need to comment out this line: torch.cuda.set_device(args.gpu)

pawopawo commented on July 18, 2024

[Screenshot]
Thanks!
I have commented out torch.cuda.set_device(args.gpu), but it still doesn't work.

yuhuixu1993 commented on July 18, 2024

Can you show the errors? I think the error may still come from model.module. For example, the SGD optimizer should use model.parameters(), not model.module.parameters(). Also, did you change the model.module references in the train and validation functions?

pawopawo commented on July 18, 2024

Thanks a lot.
I shouldn't have added "model = model.module" after "model = nn.DataParallel(model).cuda()".
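A minimal sketch of the setup that resolved the problem: wrap the network in nn.DataParallel, move it to the GPU, and keep using the wrapper instead of unwrapping it with .module. TinySearchNet is a hypothetical stand-in for the PC-DARTS search network, and the learning rate and momentum are placeholder values.

```python
import torch
import torch.nn as nn


class TinySearchNet(nn.Module):
    """Hypothetical stand-in for the PC-DARTS search network."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x))


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Wrap, then move to the GPU. Do NOT follow this with `model = model.module`:
# that discards the wrapper and leaves an ordinary single-GPU model.
model = nn.DataParallel(TinySearchNet()).to(device)

# The wrapper's parameters() yields the same tensors as model.module.parameters().
optimizer = torch.optim.SGD(model.parameters(), lr=0.025, momentum=0.9)  # placeholder values

x = torch.randn(8, 3, 32, 32, device=device)
logits = model(x)  # calling the wrapper splits the batch across all visible GPUs
print(type(model).__name__, tuple(logits.shape))
```

By default nn.DataParallel uses every visible GPU; on a machine without CUDA it simply calls the wrapped module, so the sketch also runs on CPU.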

pawopawo commented on July 18, 2024

[Screenshot]
Hi, running architect.step() raises an out-of-memory (OOM) error.
Thanks for your reply!

yuhuixu1993 commented on July 18, 2024

@xxsgcjwddsg, just to make sure: are you using 8 V100 GPUs? Also, I notice that your parameter size is nearly twice mine. How many layers are stacked during the search? We use 8 in our experiments, with 16 initial channels.
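A sketch of how one might compare the two parameter sizes. count_parameters_in_mb assumes the logged size is the parameter count divided by 1e6, as DARTS-style count_parameters_in_MB helpers do; make_stack is a hypothetical conv stack used only to show that the count grows linearly with the number of stacked layers and quadratically with the channel width.

```python
import torch.nn as nn


def count_parameters_in_mb(model: nn.Module) -> float:
    """Trainable parameter count in millions (what DARTS-style logs print as 'MB')."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6


def make_stack(layers: int, channels: int) -> nn.Sequential:
    """Hypothetical stack of bias-free 3x3 convs, used only to show how size scales."""
    return nn.Sequential(*[
        nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        for _ in range(layers)
    ])


base = count_parameters_in_mb(make_stack(layers=8, channels=16))     # 8 * 16 * 16 * 9 weights
deeper = count_parameters_in_mb(make_stack(layers=16, channels=16))  # twice the layers
wider = count_parameters_in_mb(make_stack(layers=8, channels=32))    # twice the channels
print(base, deeper, wider)
```

Doubling the layer count doubles the size, while doubling the width quadruples it, so a roughly 2x larger model points first at the number of stacked layers.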

pawopawo commented on July 18, 2024

[Screenshot]
Thanks for your reply.
I use 8 V100 GPUs, but the first one takes up most of the memory. As the screenshot shows, the batch size is 256.
I think the cause is calling self.model.module._loss(input, target), which runs on only one GPU.
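This suspicion is consistent with how nn.DataParallel works: only its forward() splits the batch across GPUs, so a helper reached through model.module (for example a DARTS-style _loss that calls self(input)) runs the whole batch on one device. A minimal sketch of one alternative, computing the logits through the wrapper and applying the criterion afterwards; SearchNet and parallel_loss are hypothetical names, not PC-DARTS code.

```python
import torch
import torch.nn as nn


class SearchNet(nn.Module):
    """Hypothetical network with a DARTS-style `_loss` helper."""

    def __init__(self, in_features: int = 20, num_classes: int = 5):
        super().__init__()
        self.fc = nn.Linear(in_features, num_classes)
        self._criterion = nn.CrossEntropyLoss()

    def forward(self, x):
        return self.fc(x)

    def _loss(self, x, target):
        # `self` is the bare module here, so this forward bypasses DataParallel
        # and runs the whole batch on a single device.
        return self._criterion(self(x), target)


def parallel_loss(wrapped: nn.DataParallel, x, target):
    """Forward through the wrapper (batch split across GPUs), then apply the criterion."""
    logits = wrapped(x)  # outputs are gathered on the first device in device_ids
    return wrapped.module._criterion(logits, target)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(SearchNet()).to(device)
x = torch.randn(16, 20, device=device)
y = torch.randint(0, 5, (16,), device=device)

single_device = model.module._loss(x, y)  # the pattern suspected above
multi_device = parallel_loss(model, x, y)
print(single_device.item(), multi_device.item())
```

Both calls give the same loss value; the difference is where the forward pass runs. Even so, DataParallel gathers the outputs on the first GPU, so that device still holds somewhat more memory than the others.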

yuhuixu1993 commented on July 18, 2024

@xxsgcjwddsg, hi, maybe you need to remove the .to(... .device) calls, e.g. .to(xtemp.device), in model_search_imagenet.py. I ran this code on my company's machines, so there may be differences between devices. If that works, please tell me and I will update the code.

pawopawo commented on July 18, 2024

[Screenshot]
Thanks for your reply. It doesn't work.
I have another question. The paper randomly samples two subsets, with 10% and 2.5% of the images, from the 1.3M-image ImageNet training set. Is the batch size of valid_queue also 1024? If it is, is architect.step() called only a quarter as often as optimizer.step()?

yuhuixu1993 commented on July 18, 2024

@xxsgcjwddsg, hi, I have no idea for now; I can run the code on 8 V100s (16 GB each). Maybe you can add me on WeChat and we can discuss the details. Yes, and I also tried changing the validation batch size to balance the frequencies, and found no difference. By the way, which version of PyTorch are you using?

HeathHose commented on July 18, 2024

@yuhuixu1993, hi. I notice that the validation batch size is the same as the training batch size. As I understand it, that means the validation set is used four times per epoch?
Or, put another way, the architect steps once for every four optimizer steps?
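The two readings in this question correspond to two different loop structures. A sketch that only counts steps, with plain ranges standing in for the train and validation loaders; run_epoch, arch_every, and the restart-on-exhaustion pattern are illustrative assumptions, and the sketch does not claim to reproduce what the PC-DARTS script does.

```python
def run_epoch(train_loader, val_loader, arch_every=1):
    """Count weight steps, architect steps, and passes started over val_loader.

    arch_every=1 -> architect step every iteration; val_loader restarts when exhausted.
    arch_every=k -> architect step only on every k-th iteration.
    """
    weight_steps = arch_steps = 0
    val_passes = 1
    val_iter = iter(val_loader)
    for step, _train_batch in enumerate(train_loader):
        if step % arch_every == 0:
            try:
                _val_batch = next(val_iter)
            except StopIteration:
                # Validation loader exhausted: start another pass over it.
                val_iter = iter(val_loader)
                val_passes += 1
                _val_batch = next(val_iter)
            arch_steps += 1  # stand-in for architect.step(...)
        weight_steps += 1  # stand-in for optimizer.step()
    return weight_steps, arch_steps, val_passes


print(run_epoch(range(126), range(32)))                # reading 1: reuse the validation set
print(run_epoch(range(126), range(32), arch_every=4))  # reading 2: fewer architect steps
```

With 126 training batches and 32 validation batches, the first call returns (126, 126, 4): one architect step per weight step, with the validation set traversed about four times. The second returns (126, 32, 1): one architect step per four weight steps, each validation batch used once.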
