
pc-darts's Introduction

al-folio

Preview

A simple, clean, and responsive Jekyll theme for academics.



User community

The vibrant community of al-folio users is growing! Academics around the world use this theme for their homepages, blogs, lab pages, as well as webpages for courses, workshops, conferences, meetups, and more. Check out the community webpages below. Feel free to add your own page(s) by sending a PR.

  • Academics
  • Labs
  • Courses: CMU PGM (S-19); CMU DeepRL (F-19, S-20, F-20, S-21, F-21, S-22); CMU MMML (F-20, F-22); CMU AMMML (S-22, S-23); CMU ASI (S-23); CMU Distributed Systems (S-21)
  • Conferences & workshops: ICLR Blog Post Track (2023, 2024); ML Retrospectives (NeurIPS: 2019, 2020; ICML: 2020); HAMLETS (NeurIPS: 2020); ICBINB (NeurIPS: 2020, 2021); Neural Compression (ICLR: 2021); Score Based Methods (NeurIPS: 2022); Images2Symbols (CogSci: 2022); Medical Robotics Junior Faculty Forum (ISMR: 2023); Beyond Vision: Physics meets AI (ICIAP: 2023); Workshop on Diffusion Models (NeurIPS: 2023)

Lighthouse PageSpeed Insights

Desktop

[Lighthouse PageSpeed Insights desktop score]

Run the test yourself: Google Lighthouse PageSpeed Insights

Mobile

[Lighthouse PageSpeed Insights mobile score]

Run the test yourself: Google Lighthouse PageSpeed Insights


Getting started

Want to learn more about Jekyll? Check out this tutorial. Why Jekyll? Read Andrej Karpathy's blog post!

Installing

For installation details please refer to INSTALL.md.

Customizing

For customization details please refer to CUSTOMIZE.md.

Features

Light/Dark Mode

This template has a built-in light/dark mode. It detects the user's preferred color scheme and switches to it automatically. You can also toggle between light and dark mode manually by clicking the sun/moon icon in the top right corner of the page.


CV

There are currently two ways of generating the CV page content. The first is a JSON file located at assets/json/resume.json; it follows a known standard for describing a CV programmatically. The second, currently used as a fallback when the JSON file is not found, is a YAML file located at _data/cv.yml. This was the original way of creating the CV page content, and since it is more human-readable than JSON, we decided to keep it as an option.

In practice this means: if there is no resume data defined in _config.yml and loaded via the JSON file, the contents of _data/cv.yml are loaded as a fallback.
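For orientation, a minimal resume.json might begin like this (a sketch assuming the JSON Resume schema; all values are placeholders):

{
  "basics": {
    "name": "Albert Einstein",
    "label": "Physicist",
    "email": "albert@example.com",
    "summary": "A German-born theoretical physicist."
  }
}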

CV Preview


People

You can create a people page if you want to feature more than one person. Each person can have their own short bio and profile picture, and you can choose whether everyone appears on the same side of the page or on alternating sides.

People Preview


Publications

Your publications page is generated automatically from your BibTeX bibliography. Simply edit _bibliography/papers.bib. You can also add new *.bib files and customize the look of your publications however you like by editing _pages/publications.md. By default, publications are sorted by year with the most recent displayed first. You can change this behavior and more in the Jekyll Scholar section of the _config.yml file.

You can add extra information to a publication: for example, place a PDF file in the assets/pdf/ directory and reference its path in the BibTeX entry via the pdf field. Some of the supported fields are: abstract, altmetric, arxiv, bibtex_show, blog, code, dimensions, doi, eprint, html, isbn, pdf, pmid, poster, slides, supp, video, and website.
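For example, an entry in _bibliography/papers.bib using a few of these fields might look like this (a sketch; the citation key and values are placeholders, and pdf points to a file under assets/pdf/):

@article{einstein1905,
  title       = {On a Heuristic Viewpoint Concerning the Production and Transformation of Light},
  author      = {Einstein, Albert},
  journal     = {Annalen der Physik},
  year        = {1905},
  pdf         = {einstein1905.pdf},
  abstract    = {A short abstract, shown expandable on the publications page.},
  bibtex_show = {true}
}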

Publications Preview


Collections

This Jekyll theme implements collections to let you break up your work into categories. The theme comes with two default collections: news and projects. Items from the news collection are automatically displayed on the home page. Items from the projects collection are displayed in a responsive grid on the projects page.

Projects Preview

You can easily create your own collections: apps, short stories, courses, or whatever your creative work is. To do this, edit the collections in the _config.yml file, create a corresponding folder, and create a landing page for your collection, similar to _pages/projects.md.
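As a sketch, a new stories collection could be declared in _config.yml with standard Jekyll collection settings (the collection name and permalink are illustrative):

collections:
  stories:
    output: true
    permalink: /stories/:name/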


Layouts

al-folio comes with stylish layouts for pages and blog posts.

The iconic style of Distill

The theme allows you to create blog posts in the distill.pub style:

Distill Preview

For more details on how to create distill-styled posts using <d-*> tags, please refer to the example.
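A distill-style post is selected through the post's front matter; a minimal sketch (assuming the theme's layout is named distill, with placeholder metadata) might look like:

---
layout: distill
title: A sample distill-style post
description: An example post using <d-*> tags
date: 2024-01-01
---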

Full support for math & code

al-folio supports fast math typesetting through MathJax and code syntax highlighting in the GitHub style. It also supports Chart.js charts, Mermaid diagrams, and TikZ figures.
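For instance, a display equation in a post body can be written with the usual MathJax delimiters (illustrative):

$$
\int_{-\infty}^{\infty} e^{-x^{2}}\,dx = \sqrt{\pi}
$$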

Photos, Audio, Video and more

Photo formatting is made simple using Bootstrap's grid system. Easily create beautiful grids within your blog posts and project pages, with support for video and audio embeds as well.


Other features

GitHub's repositories and user stats

al-folio uses github-readme-stats and github-profile-trophy to display GitHub repositories and user stats on the /repositories/ page.

Repositories Preview

Edit _data/repositories.yml and change the github_users and github_repos lists to include your own GitHub profile and repositories on the /repositories/ page.
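A sketch of what _data/repositories.yml might contain (usernames and repository names are placeholders):

github_users:
  - yourusername
github_repos:
  - yourusername/yourrepo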

You can also use the following snippets to display this on any other page.

<!-- code for GitHub users -->
{% if site.data.repositories.github_users %}
<div class="repositories d-flex flex-wrap flex-md-row flex-column justify-content-between align-items-center">
  {% for user in site.data.repositories.github_users %} {% include repository/repo_user.liquid username=user %} {% endfor %}
</div>
{% endif %}

<!-- code for GitHub trophies -->
{% if site.repo_trophies.enabled %} {% for user in site.data.repositories.github_users %} {% if site.data.repositories.github_users.size > 1 %}
<h4>{{ user }}</h4>
{% endif %}
<div class="repositories d-flex flex-wrap flex-md-row flex-column justify-content-between align-items-center">
  {% include repository/repo_trophies.liquid username=user %}
</div>
{% endfor %} {% endif %}

<!-- code for GitHub repositories -->
{% if site.data.repositories.github_repos %}
<div class="repositories d-flex flex-wrap flex-md-row flex-column justify-content-between align-items-center">
  {% for repo in site.data.repositories.github_repos %} {% include repository/repo.liquid repository=repo %} {% endfor %}
</div>
{% endif %}

Theming

A variety of beautiful theme colors have been selected for you to choose from. The default is purple, but you can quickly change it by editing the --global-theme-color variable in the _sass/_themes.scss file. Other color variables are listed there as well. The stock theme color options can be found in _sass/_variables.scss. You can also add your own colors to this file, assigning each a name for ease of use across the template.
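For example, overriding the theme color might look roughly like this in _sass/_themes.scss (a sketch; the color value is a placeholder):

html {
  --global-theme-color: #2698ba; /* any color from _sass/_variables.scss, or your own */
}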


Social media previews

al-folio supports preview images on social media. To enable this functionality you will need to set serve_og_meta to true in your _config.yml. Once you have done so, all your site's pages will include Open Graph data in the HTML head element.

You will then need to configure which image to display in your site's social media previews. This can be configured on a per-page basis by setting the og_image page variable. If this variable is not set for an individual page, the theme falls back to a site-wide og_image variable, configurable in your _config.yml. In both the page-specific and site-wide cases, og_image must hold the URL of the image you wish to display in social media previews.
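Putting the two settings together, the relevant _config.yml lines might look like this (a sketch; the image URL is a placeholder):

serve_og_meta: true                                   # include Open Graph data in the HTML head
og_image: https://example.com/assets/img/preview.png  # site-wide fallback preview image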


Atom (RSS-like) Feed

It generates an Atom (RSS-like) feed of your posts, useful for Atom and RSS readers. The feed is reachable by appending /feed.xml to your homepage URL; e.g., assuming your website is mounted at the root, yourusername.github.io/feed.xml.


Related posts

By default, a related-posts section appears at the bottom of blog posts. It is generated by selecting the max_related most recent posts that share at least min_common_tags tags with the current post. If you do not want to display related posts on a specific post, simply add related_posts: false to that post's front matter. To disable it for all posts, set enabled to false in the related_blog_posts section of _config.yml.
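A sketch of the corresponding _config.yml section (option names are taken from the description above; the values are illustrative):

related_blog_posts:
  enabled: true
  max_related: 5       # how many recent posts to consider
  min_common_tags: 1   # minimum tags shared with the current post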


Code quality checks

Currently, we run some checks to ensure that the code quality and generated site are good. The checks are done using GitHub Actions and the following tools:

  • Prettier - check if the formatting of the code follows the style guide
  • lychee - check for broken links
  • Axe (need to run manually) - do some accessibility testing

We decided to keep Axe runs manual because fixing the issues is not straightforward and might be hard for people without web development knowledge.

FAQ

For frequently asked questions, please refer to FAQ.md.

Contributing

Contributions to al-folio are very welcome! Before you get started, please take a look at the guidelines.

If you would like to improve documentation or fix a minor inconsistency or bug, please feel free to send a PR directly to master. For more complex issues/bugs or feature requests, please open an issue using the appropriate template.

Maintainers

Our most active contributors are welcome to join the maintainers team. If you are interested, please reach out!


Maruan

Rohan Deb Sarkar

Amir Pourmand

George

All Contributors

Star History

Star History Chart

License

The theme is available as open source under the terms of the MIT License.

Originally, al-folio was based on the *folio theme (published by Lia Bogoev under the MIT license). Since then, the styles have been fully rewritten and many cool features added.

pc-darts's People

Contributors

yuhuixu1993


pc-darts's Issues

train_search_imagenet lr scheduler is weird

Hi @yuhuixu1993,

I am trying to run train_search_imagenet, but the learning rate decay looks weird.
If the initial_lr is 0.5 and there are warm-up epochs, the lr should be 0.1, 0.2, 0.3, 0.4, 0.5, 0.49xxx, 0.49xxx, ...
However, the lr becomes 0.1, 0.04, 0.024, 0.018, ...
It seems that the factors 1/5, 2/5, 3/5, ... are multiplied into the previous lr rather than into the initial_lr of 0.5.
Am I right?
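A minimal Python sketch of the suspected behavior (hypothetical code, not the repository's actual scheduler) reproduces the reported numbers:

# Hypothetical reconstruction of the suspected warm-up bug.
initial_lr = 0.5
warmup_epochs = 5

# Expected: scale the *initial* lr by the warm-up factor each epoch.
expected = [initial_lr * (e + 1) / warmup_epochs for e in range(warmup_epochs)]
print(expected)  # [0.1, 0.2, 0.3, 0.4, 0.5]

# The reported values match scaling the *previous* lr instead:
lr = initial_lr
observed = []
for e in range(warmup_epochs):
    lr = lr * (e + 1) / warmup_epochs
    observed.append(round(lr, 4))
print(observed)  # [0.1, 0.04, 0.024, 0.0192, 0.0192]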

pytorch version

Tesla V100 which requires CUDA_VERSION >= 9000 for optimal performance and fast startup time, but your PyTorch(0.3.1) was compiled with CUDA_VERSION 8000.

GPU Utilization is Bad

Hi, thanks for your great work. When I search on ImageNet with 4 GPUs, the GPU utilization is really low (<20%). How can I fix this?

Thanks in advance.

search on ImageNet

When searching on ImageNet at seed 0, the network has 1 skip_connect and the FLOPs are 645M. At seed 1, the network has 0 skip_connects and the FLOPs are 712M. I also tested PC_DARTS_image; the FLOPs are 595M.

Testing on different datasets, it seems the FLOPs can be approximated by 700M - 50M × (number of skip_connects). Am I right? Sometimes max_pool and avg_pool show up in the normal cell, and they also reduce the FLOPs by 50M. If that is right, then since all work based on DARTS reports FLOPs below 600M, most of their searched cells must have 2 or more skip_connects.

But when I search, I often get 0 or 1 skip_connect. I wonder if args.epochs should be increased; what number of epochs is suitable?

PC-DARTS on medical images classification

Hello,
I would like to begin by thanking you for this work. As an intern, I am investigating PC-DARTS for my breast-cancer diagnosis task (classification of different cancers) and trying to find the best possible architecture for it.

Since all your tests were done on natural-image datasets, do you have any comments about using PC-DARTS on such medical datasets? Are there hyper-parameters we should choose carefully?

I ran some tests and am saturating at 58% on the validation set. Would end-to-end training tell me whether the cell configuration I got is bad?

Thanks in advance for answering.

Is a channel sampling mask fixed?

Hello, thank you for sharing your code.
While looking into the code, I have a question about the implementation of your partial channel connection idea.

In your code (model_search.py), it seems that the channel_shuffle function only chooses the first quarter of the channels (as does the forward function of the MixedOp class).
Does this mean that the channel sampling mask S_i,j defined in your paper is a fixed mask?

Please answer my question.
Thank you!

A small bug

When train_search.py starts, it repeatedly creates log.txt files, but in fact only the first one created is actually used; the log files created afterwards are all empty.

about the valid_queue

In train_search.py, the architect is trained only after epoch 15. Is it necessary to fetch the validation input before epoch 15?
It would save time not to fetch the validation input before epoch 15.

How was PC_DARTS_cifar obtained?

Hi @yuhuixu1993,

I would appreciate it if you could reply to the following questions:

  1. Was PC_DARTS_cifar searched on CIFAR-10 or CIFAR-100?
  2. Was PC_DARTS_cifar the genotype generated at the last (50th) epoch?

Thanks a lot!!

Best,
Bolian

Out of memory during searching & training.

hi, @yuhuixu1993, thank you for sharing your amazing work.

I tried your code to search for a network on CIFAR-10, following the instruction `python train_search.py`; however, the search procedure ended with a CUDA out-of-memory error, i.e.,

09/01 01:55:07 PM train 000 2.426532e-01 90.234375 100.000000
09/01 01:57:19 PM train 050 2.439179e-01 91.590074 99.862132
09/01 01:59:23 PM train_acc 91.776000
09/01 01:59:23 PM epoch 49 lr 1.000000e-03
09/01 01:59:23 PM genotype = Genotype(normal=[('sep_conv_3x3', 0), ('skip_connect', 1), ('sep_conv_3x3', 0), ('sep_conv_3x3', 1), ('sep_conv_3x3', 1), ('sep_conv_3x3', 0), ('sep_conv_3x3', 0), ('sep_conv_5x5', 1)], normal_concat=range(2, 6), reduce=[('skip_connect', 1), ('max_pool_3x3', 0), ('sep_conv_5x5', 1), ('dil_conv_5x5', 0), ('sep_conv_5x5', 1), ('sep_conv_5x5', 2), ('sep_conv_5x5', 2), ('sep_conv_5x5', 1)], reduce_concat=range(2, 6))
tensor([[0.1646, 0.1040, 0.1085, 0.1185, 0.1612, 0.1110, 0.1120, 0.1202],
        [0.1498, 0.1149, 0.1231, 0.1378, 0.0881, 0.1234, 0.1322, 0.1307],
        [0.1124, 0.1017, 0.1039, 0.1098, 0.1700, 0.1085, 0.1513, 0.1423],
        [0.1205, 0.1086, 0.1074, 0.1125, 0.1670, 0.1348, 0.1133, 0.1358],
        [0.1754, 0.0947, 0.0893, 0.1235, 0.1288, 0.1079, 0.1647, 0.1156],
        [0.1414, 0.0955, 0.0984, 0.1059, 0.1967, 0.1468, 0.0984, 0.1169],
        [0.1193, 0.1157, 0.1068, 0.1123, 0.2055, 0.1308, 0.1107, 0.0988],
        [0.1533, 0.1183, 0.1268, 0.1451, 0.1239, 0.1101, 0.1156, 0.1068],
        [0.1498, 0.1063, 0.1073, 0.1328, 0.1361, 0.1421, 0.1167, 0.1089],
        [0.1132, 0.1281, 0.1156, 0.1162, 0.1608, 0.1445, 0.1159, 0.1056],
        [0.1024, 0.1174, 0.1393, 0.1294, 0.1359, 0.1527, 0.1220, 0.1010],
        [0.1306, 0.1090, 0.1147, 0.1288, 0.1520, 0.1055, 0.1442, 0.1152],
        [0.1462, 0.0942, 0.0924, 0.1288, 0.1084, 0.1127, 0.1749, 0.1425],
        [0.1416, 0.0991, 0.0974, 0.1282, 0.1396, 0.1392, 0.1244, 0.1306]],
       device='cuda:1', grad_fn=<SoftmaxBackward>)
tensor([[0.1148, 0.1495, 0.1392, 0.1084, 0.1359, 0.1406, 0.1165, 0.0951],
        [0.1327, 0.1076, 0.0993, 0.1602, 0.1379, 0.1272, 0.1087, 0.1264],
        [0.1164, 0.1317, 0.1378, 0.1009, 0.1293, 0.1330, 0.1105, 0.1405],
        [0.1259, 0.1005, 0.1057, 0.1288, 0.1212, 0.1532, 0.1372, 0.1273],
        [0.1387, 0.0891, 0.0919, 0.1215, 0.1347, 0.1005, 0.1860, 0.1375],
        [0.1181, 0.1241, 0.1199, 0.1367, 0.1470, 0.1228, 0.1066, 0.1249],
        [0.1344, 0.1070, 0.1094, 0.1165, 0.1187, 0.1590, 0.1319, 0.1233],
        [0.1372, 0.1035, 0.1067, 0.1308, 0.1076, 0.1623, 0.1355, 0.1164],
        [0.1400, 0.1138, 0.1198, 0.1409, 0.1042, 0.1349, 0.1158, 0.1307],
        [0.1309, 0.1175, 0.1291, 0.1270, 0.1265, 0.1102, 0.1504, 0.1083],
        [0.1337, 0.1022, 0.1076, 0.1221, 0.1369, 0.1550, 0.1146, 0.1279],
        [0.1339, 0.0890, 0.0939, 0.1212, 0.1117, 0.1830, 0.1362, 0.1313],
        [0.1431, 0.1034, 0.1148, 0.1350, 0.1092, 0.1270, 0.1375, 0.1300],
        [0.1396, 0.0920, 0.0998, 0.1267, 0.1404, 0.0974, 0.1373, 0.1669]],
       device='cuda:1', grad_fn=<SoftmaxBackward>)
tensor([0.3777, 0.3653, 0.2571], device='cuda:1', grad_fn=<SoftmaxBackward>)
09/01 01:59:27 PM train 000 2.647865e-01 89.843750 100.000000
09/01 02:01:39 PM train 050 2.318434e-01 91.980699 99.892770
09/01 02:03:43 PM train_acc 91.660000
09/01 02:03:43 PM valid 000 5.367675e-01 83.593750 99.218750
Traceback (most recent call last):
  File "train_search.py", line 206, in <module>
    main() 
  File "train_search.py", line 130, in main
    valid_acc, valid_obj = infer(valid_queue, model, criterion)
  File "train_search.py", line 190, in infer
    logits = model(input)
  File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ganji/Documents/work/pc-darts/model_search.py", line 159, in forward
    s0, s1 = s1, cell(s0, s1, weights,weights2)
  File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ganji/Documents/work/pc-darts/model_search.py", line 85, in forward
    s = sum(weights2[offset+j]*self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states))
  File "/home/ganji/Documents/work/pc-darts/model_search.py", line 85, in <genexpr>
    s = sum(weights2[offset+j]*self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states))
  File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ganji/Documents/work/pc-darts/model_search.py", line 44, in forward
    temp1 = sum(w * op(xtemp) for w, op in zip(weights, self._ops))
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 1; 15.90 GiB total capacity; 15.29 GiB already allocated; 15.56 MiB free; 7.17 MiB cached)

Then I continued with the command `python train.py --auxiliary --cutout`, and the training of the searched model also raised an OOM error, i.e.,

➜  pc-darts python train.py --auxiliary --cutout --gpu 1
Experiment dir : eval-EXP-20190901-143341
09/01 02:33:41 PM gpu device = 1
09/01 02:33:41 PM args = Namespace(arch='PCDARTS', auxiliary=True, auxiliary_weight=0.4, batch_size=96, cutout=True, cutout_length=16, data='../data', drop_path_prob=0.3, epochs=600, gpu=1, grad_clip=5, init_channels=36, layers=20, learning_rate=0.025, model_path='saved_models', momentum=0.9, report_freq=50, save='eval-EXP-20190901-143341', seed=0, set='cifar10', weight_decay=0.0003)
108 108 36
108 144 36
144 144 36
144 144 36
144 144 36
144 144 36
144 144 72
144 288 72
288 288 72
288 288 72
288 288 72
288 288 72
288 288 72
288 288 144
288 576 144
576 576 144
576 576 144
576 576 144
576 576 144
576 576 144
09/01 02:33:44 PM param size = 3.634678MB
Files already downloaded and verified
Files already downloaded and verified
/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:82: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
09/01 02:33:46 PM epoch 0 lr 2.499983e-02
train.py:136: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  nn.utils.clip_grad_norm(model.parameters(), args.grad_clip)
09/01 02:33:48 PM train 000 3.258214e+00 8.333333 50.000000
09/01 02:34:18 PM train 050 3.215875e+00 13.623365 56.638069
09/01 02:34:49 PM train 100 3.148910e+00 15.459983 61.984321
09/01 02:35:19 PM train 150 3.054110e+00 18.329194 67.335814
09/01 02:35:49 PM train 200 2.970589e+00 20.677860 71.035445
09/01 02:36:19 PM train 250 2.899201e+00 22.705012 73.725927
09/01 02:36:50 PM train 300 2.842999e+00 24.228266 75.633303
09/01 02:37:20 PM train 350 2.789153e+00 25.741927 77.148620
09/01 02:37:50 PM train 400 2.736966e+00 27.153469 78.514648
09/01 02:38:20 PM train 450 2.694277e+00 28.466832 79.557000
09/01 02:38:50 PM train 500 2.656195e+00 29.663588 80.561790
09/01 02:39:03 PM train_acc 30.111999
09/01 02:39:03 PM valid 000 1.350200e+00 52.083332 91.666664
Traceback (most recent call last):
  File "train.py", line 177, in <module>
    main() 
  File "train.py", line 113, in main
    valid_acc, valid_obj = infer(valid_queue, model, criterion)
  File "train.py", line 161, in infer
    logits, _ = model(input)
  File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ganji/Documents/work/pc-darts/model.py", line 150, in forward
    s0, s1 = s1, cell(s0, s1, self.drop_path_prob)
  File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ganji/Documents/work/pc-darts/model.py", line 51, in forward
    h1 = op1(h1)
  File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ganji/Documents/work/pc-darts/operations.py", line 66, in forward
    return self.op(x)
  File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 343, in forward
    return self.conv2d_forward(input, self.weight)
  File "/home/cvmt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 340, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 1; 15.90 GiB total capacity; 14.26 GiB already allocated; 1.56 MiB free; 1.05 GiB cached)

In addition, my environment is Ubuntu 16.04 + CUDA 10.0 + Python 3.7 + PyTorch 1.2.

Since I am new to NAS, I cannot figure out what causes the OOM error. Could you help fix this error or give some suggestions? Thanks.

Incessantly creates folders

When I run train_search.py, it creates many folders starting with "experiment dir: search". Only one folder is used to save the log; the logs in the other folders are empty. I cannot find the command that creates these folders. How can I turn this off?

About search cost in imagenet

"PC-DARTS allows a direct search on ImageNet (while DARTS failed due to low
stability), and achieves a state-of-the-art top-1 error of 24.2% (under the mobile setting) with only
3.8 GPU-days (11.5 hours) on 8 GPUs for search" the sentence means what?

if use a single gpu, it is 3.8GPUS-days ; if use 8 gpus, it is 11.5 hours ?
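For reference, the two readings are arithmetically consistent:

$$
8~\text{GPUs} \times 11.5~\text{hours} = 92~\text{GPU-hours} = \tfrac{92}{24}~\text{GPU-days} \approx 3.8~\text{GPU-days}
$$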

final accuracy

Hi, I want to know how many training epochs are needed to reach the 2.5% error on CIFAR-10?
Thank you very much!

Question about search on custom dataset

First of all, thank you for your great work.

I have a gender recognition project. The dataset I am using is CelebA. I split the dataset into female and male according to the labels, then followed the parameters you used to search on ImageNet. As mentioned in the paper, during the first 35 of 50 epochs only the weights of the network are updated while the architecture is unchanged; the architecture search is performed in the last 15 epochs.

However, the results show that after 35 epochs, the training/validation/test accuracies decrease dramatically. The architecture search does not bring any improvement but is harmful to the accuracy. The accuracy curves are shown below.

[accuracy curves image]

The parameters I used are shown below; any parameter not shown is the same as the default value in train_search_imagenet.py.

[parameters image]

About "replace input_search, target_search = next(iter(valid_queue))"

Thank you for releasing the code! I noticed that you replaced input_search, target_search = next(iter(valid_queue)); why is the new version much faster? And why is the code in the try block next(valid_queue_iter) instead of next(valid_queue)? Hoping for your reply!

try:
    input_search, target_search = next(valid_queue_iter)
except:
    valid_queue_iter = iter(valid_queue)
    input_search, target_search = next(valid_queue_iter)

Using a 1080 Ti with Python 2 and PyTorch 1.0: out of memory

I ran the code as suggested but cannot get it running; it fails with an "out of memory" error.

Environment:
One GTX 1080 Ti GPU with 11178 MiB of memory.
Python 2.7 and PyTorch 1.0.
I simply downloaded the code and ran python train_search.py.

Would you please tell me where I went wrong?

Why is the architecture modified only after epoch 15?

Hi,
I am wondering why, in the search stage, we begin to update the architecture only after 15 epochs. What is the principal reason?
If not working with CIFAR-10 or ImageNet, can this particular epoch number change? If yes, what does it depend on?

Thanks in advance

module 'genotypes' has no attribute 'DARTS'

I executed python train.py --auxiliary --cutout and got an error; the log shows:
File "train.py", line 72, in main
File "<string>", line 1, in <module>
AttributeError: module 'genotypes' has no attribute 'DARTS'
My PyTorch version is 1.1.0 and my Python version is Python 3.

Why not use more reduction cells?

Thank you for this wonderful code. I have a question: in the paper, on ImageNet you start with three conv layers to reduce the resolution. Why not use reduction cells to reduce the size?

parallel train_search code on ImageNet

hi, @yuhuixu1993, thanks for your good work.
You mentioned: "We use eight Tesla V100 GPUs for search, and the total batch size is 1,024. The entire search process takes around 11.5 hours."
I just want to know: do you have any plan to release your parallel train_search code for ImageNet? Thank you.

test.py throws an error

The model.pt referenced in test.py cannot be found. When I tried changing it to weight.pt, I got an out-of-memory error. What is the problem? model.pt should be a file generated during training, right? But I cannot find it.

ImageNet search question

"To reduce search time, we randomly sample two subsets from the 1.3M training set of ImageNet, with 10% and 2.5% images, respectively. The former one is used for training network weights and the latter for updating hyper-parameters." I have two questions:

  1. 10% images of ImageNet means the subset only has 100 classes?
  2. hyper-parameters means architecture weights?
    Thanks for your reply!

How to derive the final architecture?

Hi Yuhui,

After searching on the CIFAR-10 dataset, I get a genotype that is similar to the one reported in your paper.

However, when I derive the final architecture and calculate the FLOPs and latency, it seems a little strange.

For example, I run

import torch
from model import NetworkCIFAR as Network
import genotypes
from ptflops import get_model_complexity_info  # assuming the ptflops package, which provides this function

genotype = eval("genotypes.%s" % "PCDARTS")

with torch.cuda.device(0):
    model = Network(36, 1000, 14, True, genotype)
    model.drop_path_prob = 0.3
    model.eval()
    flops, params = get_model_complexity_info(model, (3, 224, 224), as_strings=True, print_per_layer_stat=True)
    print("{:<30}  {:<8}".format("Computational complexity: ", flops))
    print("{:<30}  {:<8}".format("Number of parameters: ", params))

The reported model complexity and number of parameters for the searched genotype (with 14 layers) are as follows:

Computational complexity:       20.11 GMac
Number of parameters:           4.3 M  

But when I run the resnet50 for comparison:

import torch
from torchvision.models import resnet50
from ptflops import get_model_complexity_info

with torch.cuda.device(0):
    model = resnet50(pretrained=False)
    flops, params = get_model_complexity_info(model, (3, 224, 224), as_strings=True,
                                              print_per_layer_stat=True)
    print('{:<30}  {:<8}'.format('Computational complexity: ', flops))
    print('{:<30}  {:<8}'.format('Number of parameters: ', params))

The reported model complexity and number of parameters for resnet50 are as follows:

Computational complexity:       4.12 GMac
Number of parameters:           25.56 M 

The FLOPs reported in your paper for the ImageNet setting are 597M, so it seems there is something wrong with my derived final architecture. At your convenience, could you clarify how to derive the final architecture? I am considering deploying the searched model on some hardware devices and adding hardware-aware constraints to the overall design.

Additionally, the latency of the searched genotype (with 14 layers) is nearly ten times that of ResNet-50, which is unacceptable.

I am also an undergraduate from SJTU. Thanks a lot for your help! hahahaha

randomness involved in the channel_shuffle function?

Hi @yuhuixu1993,

It seems the channel_shuffle function evenly redistributes the channels, those that pass through the mixed operations (the first 1/K) and those that do not (the remaining 1-1/K), in a deterministic pattern. Although it can produce much more complex arrangements at later nodes as more channel_shuffle calls compound, there is no randomness involved. Is it possible for the network to get used to the pattern and eventually lose the regularization effect? Why didn't you apply random sampling? Thank you!

import torch

def channel_shuffle(x, groups):
    batchsize, num_channels, height, width = x.data.size()
    channels_per_group = num_channels // groups
    # reshape: (N, C, H, W) -> (N, groups, C/groups, H, W)
    x = x.view(batchsize, groups,
               channels_per_group, height, width)
    # swap the group and per-group channel axes, interleaving the groups
    x = torch.transpose(x, 1, 2).contiguous()
    # flatten back to (N, C, H, W)
    x = x.view(batchsize, -1, height, width)
    return x

In my toy example:

x = torch.Tensor([ [ [ [1,1,1], [1,1,1], [1,1,1] ],
                     [ [2,2,2], [2,2,2], [2,2,2] ],
                     [ [3,3,3], [3,3,3], [3,3,3] ],
                     [ [4,4,4], [4,4,4], [4,4,4] ],
                     [ [5,5,5], [5,5,5], [5,5,5] ],
                     [ [6,6,6], [6,6,6], [6,6,6] ],
                     [ [7,7,7], [7,7,7], [7,7,7] ],
                     [ [8,8,8], [8,8,8], [8,8,8] ] ] ])

the output of channel_shuffle(x, 4) is always

tensor([[[[1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 1.]],

         [[3., 3., 3.],
          [3., 3., 3.],
          [3., 3., 3.]],

         [[5., 5., 5.],
          [5., 5., 5.],
          [5., 5., 5.]],

         [[7., 7., 7.],
          [7., 7., 7.],
          [7., 7., 7.]],

         [[2., 2., 2.],
          [2., 2., 2.],
          [2., 2., 2.]],

         [[4., 4., 4.],
          [4., 4., 4.],
          [4., 4., 4.]],

         [[6., 6., 6.],
          [6., 6., 6.],
          [6., 6., 6.]],

         [[8., 8., 8.],
          [8., 8., 8.],
          [8., 8., 8.]]]])

Search on ImageNet

@yuhuixu1993, thanks for your last timely reply. I am now running the search on ImageNet using 100% of the dataset (batch size = 1024 on 8 V100s). Two days have passed, and at epoch 5 the log shows train_acc 3.305580; is that right? I also have another question: your paper says "Still, a total of 50 epochs are trained and architecture hyper-parameters are frozen during the first 35 epochs," and I am a little confused about this step.

Results on ImageNet

hi @yuhuixu1993, thanks for sharing your excellent work. I am wondering about the search cost when searching on ImageNet.
PC-DARTS uses 12.5% of the ImageNet images for searching in Section 4.3, so would the search cost be 3.8 × 8 = 30.4 GPU-days if 100% of the ImageNet images were used?
Thanks.

save genotype

Thanks for your work.

I was wondering how to save the network found by PC-DARTS. I see that at the beginning of training, utils.create_exp_dir creates a directory with all the .py scripts. However, it seems to me that the genotype of the new model is not saved.

Cannot re-implement your claimed result

Hello, when trying to reproduce your result on CIFAR-10 with your code, I searched 4 times and trained each result with your code for 600 epochs; the best accuracies on the validation set are 96.77, 97.32, 97.35, and 97.21, respectively. But in the paper you report an accuracy of 97.43±0.07 on the test set. There is clearly a significant gap; why does this happen? Hope to get your response, thank you!

Thanks for open-sourcing this

This is a very effective method. After reading through the whole framework, I have two questions:
1. How is the Genotype updated? model_search.py implements the generation of a new Genotype, but how does the next iteration use this new Genotype?
2. Each step only operates on 1/4 of x, and the remaining 3/4 is passed through directly. Can this be regarded as a skip_connection?

about random sample

In Section 4.1, you said:

instead choose the first K channels of xi for operation mixture directly. To compensate, after xj is obtained, we shuffle its channels before using it for further computations.

But I think this is not a random-sampling implementation, because choosing the first K channels and the channel shuffle are both deterministic operations.

I would be glad if you could tell me whether I have misunderstood the implementation.

search accuracy for ImageNet

I am getting 31% validation accuracy after searching directly on ImageNet. Is that consistent with what you got? If not, can you tell me what validation accuracy I should expect at the end of train_search_imagenet?

CIFAR10 test error

Has anybody run the code on CIFAR-10?
My valid_acc on CIFAR-10 is only 97.06.
I just ran
python train.py --auxiliary --cutout
and set the batch_size to 128 (default 96).
So, what is the problem with my experiment? Thanks.

Text error in the paper about network parameter freezing

The following description is extracted from Section 4.2:
"we freeze network hyper-parameters and only allow network parameters to be tuned in the first 15 epochs"

I guess the authors wanted to say that the hyper-parameters are allowed to be tuned only after the first 15 epochs, which is consistent with the code.

We cannot obtain your claimed result on ImageNet after trying many configurations

Hello, we have sent you several emails about this problem before. We first tried using 10% of the training set for training and 2.5% for validation with a linear learning-rate scheduler; with SGDM we trained for 250 epochs, but the result was horrible. Then we tried the whole ImageNet and got an even more horrible result. Later we changed the optimizer and used the whole ImageNet, but we still see a clear accuracy gap. Could you kindly give us more details about how you obtained the results in your paper? We really need your response, or we may have to follow our "wrong" configuration, do fair comparisons, and report our real observations to the conferences. Thank you so much!

Why did you discard the parameter '--unrolled'?

@yuhuixu1993 hi, I just noticed that you dropped the parameter '--unrolled' from the python train_search.py command in the README. I want to know whether this is a careless omission or whether you truly intended it. After all, the derivation of the architecture parameters is incomplete without '--unrolled'.

How to train using multiple GPUs?

Hi,
I have 8 GPUs in one machine. If I use model = nn.DataParallel(model) in train_search.py, it doesn't work. How can I train PC-DARTS using 8 GPUs?
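For context, here is a minimal runnable sketch of the pattern in question (a hypothetical toy module, not PC-DARTS itself). One known wrinkle is that nn.DataParallel only wraps the forward pass, so custom methods such as the search model's arch_parameters() have to be reached through .module:

import torch
import torch.nn as nn

class ToySearchNet(nn.Module):
    """Hypothetical stand-in for the PC-DARTS search network."""
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(8, 2)
        self.alphas = nn.Parameter(1e-3 * torch.randn(4))  # architecture parameters

    def forward(self, x):
        return self.body(x)

    def arch_parameters(self):
        return [self.alphas]

model = ToySearchNet()
if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    model = nn.DataParallel(model).cuda()  # forward() is replicated across GPUs

# Custom attributes are not forwarded by DataParallel; go through .module:
net = model.module if isinstance(model, nn.DataParallel) else model
print(net.arch_parameters())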

Advantage of weight-free operations (skip_connection)

@yuhuixu1993 Hi, thanks for your paper and project. I have studied the DARTS series for a while and was confused by the skip_connection issue (skip_connect appears more and more as the search progresses). I found that in your paper (PC-DARTS) you use the word 'consistent' to explain the advantage of weight-free ops. How should I understand this 'consistent'; does it mean that the outputs across iterations are consistent? And how does this consistency bring advantages; does it have a certain impact on the direction of gradient descent during backpropagation?
Looking forward to your explanation. Sincerely

split for NAS

Thanks for your awesome work.
I notice that some code is used to keep the number of iterations the same in training and search:

assert train_iters==valid_iters

I can't understand it. Is this a convention in NAS?
