
ADVENT's Issues

Is it a typo? (in the loss function)

Hello. I may have found a discrepancy between the code and the paper.

[screenshot: the discriminator loss in the code]
The discriminator training loss in the code labels the source domain as 0 and the target domain as 1.

[screenshot: the discriminator loss in the paper]
However, the corresponding equation in the paper labels the source domain as 1 and the target domain as 0.

Is this the intended notation, or is it a typo?
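
For reference, the code's convention amounts to something like the following minimal sketch (the tensor and function names are mine, not the repository's):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_out_src, d_out_trg):
    """BCE loss for the discriminator under the code's convention:
    source outputs are labeled 0, target outputs are labeled 1.
    As long as both labels are flipped consistently everywhere
    (discriminator and adversarial updates), the paper's convention
    (source = 1, target = 0) should give an equivalent objective."""
    loss_src = F.binary_cross_entropy_with_logits(
        d_out_src, torch.zeros_like(d_out_src))  # source = 0
    loss_trg = F.binary_cross_entropy_with_logits(
        d_out_trg, torch.ones_like(d_out_trg))   # target = 1
    return loss_src + loss_trg
```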

Reproducibility of the results

I'm having trouble reproducing the results when I normalize the input images with mean = (0.485, 0.456, 0.406) and std = (0.229, 0.224, 0.225) and use the pretrained ResNet-101 model from the official PyTorch website. Could you please explain why? Thanks.

Why doesn't the output shape of the discriminator have to be (B, 1, 1, 1)?

From the code, I can see that the discriminator is a fully convolutional network (like the discriminator in DCGAN). But when we feed in a self-information map I(x) of arbitrary size, the output shape is not fixed to (B, C, 1, 1); we might get an output of shape (B, 1, 4, 4) and then create a ground-truth tensor whose elements are all 1 or all 0 (source or target) to calculate the BCE loss.
I don't understand why the discriminator's output shape doesn't have to be (B, 1, 1, 1) before we can use it directly in the BCE loss.
Thank you!
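
For concreteness, here is a minimal sketch of the PatchGAN-style behavior I am describing (the shapes and names are illustrative, not the repository's):

```python
import torch
import torch.nn.functional as F

# Hypothetical fully convolutional discriminator output for a batch of
# self-information maps I(x): shape (B, 1, H', W'), e.g. (B, 1, 4, 4),
# rather than (B, 1, 1, 1).
d_out = torch.randn(4, 1, 4, 4)

# BCE is applied per spatial location against a constant label map, so
# each output cell classifies one receptive-field patch of the input as
# source (0) or target (1), and the loss averages over all cells.
label = torch.ones_like(d_out)  # all-ones map for the target domain
loss = F.binary_cross_entropy_with_logits(d_out, label)
```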

Data normalization before feeding to the Segmentation network

Hi, I looked through the dataloader and training code and noticed that you subtract the ImageNet mean from the input RGB images (GTA5 and Cityscapes) but do not normalize the data to [0, 1] (or to [-0.5, 0.5], for that matter).

I just wanted to know whether I missed something or whether you intentionally chose to skip this normalization before feeding the data to the segmentation network.

Appreciate your response.
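
For reference, my reading of the preprocessing is the sketch below (the function name is mine; IMG_MEAN is the value from the config):

```python
import numpy as np

# The mean from the repo's config (Caffe/BGR convention).
IMG_MEAN = np.array([104.00698793, 116.66876762, 122.67891434], dtype=np.float32)

def preprocess(image_rgb):
    """Caffe-style preprocessing as I understand the dataloader:
    RGB -> BGR and per-channel mean subtraction only, with no rescaling
    to [0, 1], so pixel values stay roughly in [-123, 151].
    `image_rgb` is assumed to be an HxWx3 uint8 array."""
    image = np.asarray(image_rgb, dtype=np.float32)
    image = image[:, :, ::-1] - IMG_MEAN     # RGB -> BGR, subtract mean only
    return image.transpose(2, 0, 1)          # HWC -> CHW for PyTorch
```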

Training Performance and Stability

Hello,

Thank you for your work on this repo! I have a quick question. When I train a model (either MinEnt or AdvEnt), I find that the validation performance varies widely (sometimes by 5-10 mIoU points) from snapshot to snapshot (where snapshots are taken every 2000 iterations with the default learning rate). Do you recall experiencing this type of variation in mIoU across snapshots?

If so, did you just report the score of the best single snapshot on the 500-image val set (i.e. take the best evaluation under the 'best' config for cfg.TEST.MODE)?

It is possible that the performance stabilizes after more steps, but I am currently at iteration 60,000, so that seems unlikely at this point. Thank you!

Note: I have read all the previous issues, and I am not concerned with attaining 43.3 vs. 43.8 or that sort of thing. I am concerned with performance varying more widely, from, say, 41 to 37 to 42 within a few thousand iterations.

Direct entropy minimization for object detection (YOLOv3)

Hello @tuanhungvu ,

I am a student at TU Chemnitz, Germany. I am currently working on my master's thesis, titled 'Unsupervised Domain Adaptation for Object Detection.' I am working on the direct entropy minimization method described in your paper, using YOLOv3 as my base architecture for object detection. I just wanted to confirm whether I am implementing the method correctly for YOLOv3, as the paper defines it for SSD and I could not find any source code for reference.

I am a little confused about the term 'soft-detection map,' which is used to calculate the entropy for object detection. I read the paper and found some similarities between SSD and YOLOv3, but I am not absolutely sure I am using the correct feature map in my implementation. It would be great if you could help me with this.

  1. Could you specify from which exact layer the 'soft-detection' map is taken for SSD? And would you happen to know its equivalent in YOLOv3?

  2. In the equation,
    [screenshot: the entropy equation for detection]
    does C represent the class probabilities for each anchor box, or all the offsets obtained for each anchor box after applying the kernel?

  3. In YOLOv3, feature maps are produced at three different scales. Should the 'feature map' here be the output of the last convolutional layer used for detection, or just the class probabilities obtained after applying a softmax to that feature map, from which the entropy map is then calculated?

I hope I have expressed my question clearly. If you need any additional information, please let me know.
Thanks in advance.
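
For reference, here is a minimal sketch of what I currently compute; the tensor layout is my own assumption, and I impose a softmax as in the SSD formulation even though YOLOv3 natively uses per-class sigmoids:

```python
import math
import torch

def detection_entropy_map(cls_logits):
    """My attempt at a 'soft-detection map' entropy. The layout
    (B, A, C, H, W) for the raw class scores of one YOLOv3 scale
    (A anchors, C classes) is assumed, not taken from the paper."""
    num_classes = cls_logits.shape[2]
    probs = torch.softmax(cls_logits, dim=2)                # per-anchor class probabilities
    ent = -(probs * torch.log2(probs + 1e-30)).sum(dim=2)   # (B, A, H, W)
    return ent / math.log2(num_classes)                     # normalized to [0, 1]
```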

Class-ratio prior loss?

Hello, thanks for sharing this great work. I couldn't find the class-ratio prior loss in the code, and I wasn't sure I understood how it works from the paper. Did I miss something?
Thanks!
Mathilde

One-hot encoding

Dear Author,

Is one-hot encoding not required when training on the source dataset?

Thank you.
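
My current understanding, which may be the reason no one-hot encoding appears, is that PyTorch's nn.CrossEntropyLoss consumes integer class-index maps directly; a minimal sketch with illustrative shapes:

```python
import torch
import torch.nn as nn

# Illustrative shapes; 255 is assumed to be the void label, as in
# Cityscapes. The loss takes integer class-index maps directly.
criterion = nn.CrossEntropyLoss(ignore_index=255)

logits = torch.randn(2, 19, 65, 129)          # (B, C, H, W) raw network output
labels = torch.randint(0, 19, (2, 65, 129))   # (B, H, W) integer class indices
loss = criterion(logits, labels)              # no one-hot encoding needed
```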

Question about the weighting factor of the entropy loss

Hi,
I checked the code and was surprised to find that the weighting factors of the entropy loss and the adversarial loss are both very small (1e-3, compared with the segmentation weighting factor of 1.0), so I wonder whether they really have an effect in practice. Also, for the direct entropy loss, couldn't the network eventually predict every pixel as a single class? It would then have a very low entropy loss but poor performance.
Looking forward to hearing your reply.
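
To make the question concrete, my reading of how the terms combine (the loss names and values are stand-ins for the config defaults as I read them):

```python
import torch

# Stand-ins for the three terms; in training they come from the
# supervised cross-entropy, the target entropy map, and the discriminator.
loss_seg = torch.tensor(2.3)
loss_ent = torch.tensor(0.8)
loss_adv = torch.tensor(0.7)

LAMBDA_ENT = 1e-3  # entropy weight (config default, as I read it)
LAMBDA_ADV = 1e-3  # adversarial weight (config default, as I read it)

# With these weights the regularizers contribute only a small fraction
# of the total, which is what prompts my question.
loss_total = loss_seg + LAMBDA_ENT * loss_ent + LAMBDA_ADV * loss_adv
```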

Question about the calculation of the image mean

I found that in your code IMG_MEAN is set to [104.00698793, 116.66876762, 122.67891434]. Is IMG_MEAN calculated on the source domain only, the target domain only, or both?

source_label=0; target_label=1

@tuanhungvu Hello! I see that in ADVENT/advent/domain_adaptation/train_UDA.py, source_label = 0 and target_label = 1, which is the inverse of what is described in the original paper. This confuses me; I hope the authors can give me an answer. Thanks!

How to perform ensembling?

Very insightful paper. I have one question.
In the paper, the ensemble of MinEnt and AdvEnt achieves better performance.
How did you perform the ensembling?
Did you average the probability maps (after the softmax) or the logit maps (before the softmax) of the two models?

Thanks in advance for your reply.
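
For example, the first variant would look something like this sketch (the function and model names are mine):

```python
import torch

def ensemble_predict(model_a, model_b, image):
    """Averaging the post-softmax probability maps of the two models
    (MinEnt and AdvEnt) and taking the argmax; averaging the logit maps
    before the softmax would be the other reading of 'ensembling'."""
    with torch.no_grad():
        prob_a = torch.softmax(model_a(image), dim=1)  # (B, C, H, W)
        prob_b = torch.softmax(model_b(image), dim=1)
    return ((prob_a + prob_b) / 2).argmax(dim=1)       # (B, H, W) label map
```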

How to implement SYNTHIA -> Cityscapes?

Hello, I am reimplementing the adaptation from SYNTHIA to Cityscapes. Except for the image size [1280, 760], I used exactly the same setup as for GTA5 -> Cityscapes, including the initial parameters, learning rate, number of iterations, etc. But according to my training results, the best mIoU I could get is 39.1% over 16 classes. I would like to ask how I should improve this to reach performance similar to that of your paper, shown below:
[screenshot: SYNTHIA -> Cityscapes results from the paper]

Binary segmentation

Hi, how can I change the dataset to binary segmentation? I have images and masks and can generate a data list, but the format seems quite different from Cityscapes'.

I would be grateful for any guidance!
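
For context, this is roughly the dataset class I have sketched so far (the list format and all names are my own assumptions; mean subtraction and resizing are omitted for brevity):

```python
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class BinarySegDataset(Dataset):
    """Hypothetical minimal dataset: each line of `list_path` is assumed
    to hold 'image_path mask_path'; NUM_CLASSES in the config would be 2."""
    def __init__(self, list_path):
        with open(list_path) as f:
            self.pairs = [line.split() for line in f if line.strip()]

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        img_path, mask_path = self.pairs[idx]
        image = np.asarray(Image.open(img_path).convert('RGB'), dtype=np.float32)
        image = image[:, :, ::-1].transpose(2, 0, 1).copy()   # RGB -> BGR, HWC -> CHW
        mask = np.asarray(Image.open(mask_path).convert('L'))
        mask = (mask > 127).astype(np.int64)                  # 0 = background, 1 = object
        return torch.from_numpy(image), torch.from_numpy(mask)
```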

Question about the segmentation network

Hello @valeoai, have you tried different segmentation networks in ADVENT? So far ADVENT only uses DeepLab as the segmentation network, so I want to ask whether the choice of segmentation network influences the segmentation result. Is it possible to get better results by changing the segmentation network? Thank you in advance.

Performance

Hi. I cloned your repo and ran "python train.py --cfg ./configs/advent.yml", then tested with "python test.py --cfg ./configs/advent.yml", but the test mIoU is only 43.25, which is lower than the paper's. Did you fine-tune the model to get an mIoU of 43.8? Thanks.

About the implementation of AdvEnt+MinEnt

Hi,

I implemented AdvEnt+MinEnt (45.5 in the paper) by combining AdvEnt and MinEnt, but got 42.48% instead.

For GTA5 -> Cityscapes:
I saw the result for MinEnt + ER in the paper. My question is: did you use the +ER or class-ratio prior techniques in the AdvEnt+MinEnt experiment?

Could you please briefly clarify how you implemented the AdvEnt+MinEnt experiment?

Best,
Chang

How to reduce the GPU memory requirement

Hi, I am trying to reproduce the state of the art from your ADVENT paper.
My school's server is undergoing a system update, so for now I only have a single RTX 2070 Super with 8 GB for training. Is there any way to reduce the GPU memory required to train your model?
I have noticed that the batch size is already 1 in the config file.

Looking forward to your reply!
Many thanks!
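
One option I am considering is mixed-precision training with torch.cuda.amp, which is not part of the repository; a self-contained sketch with a dummy model standing in for DeepLab:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Dummy stand-ins for the repo's DeepLab and dataloader, just to keep
# the sketch self-contained; requires a CUDA device.
model = nn.Conv2d(3, 19, kernel_size=3, padding=1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=2.5e-4)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    images = torch.randn(1, 3, 512, 1024, device='cuda')
    labels = torch.randint(0, 19, (1, 512, 1024), device='cuda')
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():        # forward pass in fp16 where safe
        loss = F.cross_entropy(model(images), labels)
    scaler.scale(loss).backward()          # scale to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()
```

Reducing the training crop size in the config would be the other obvious lever, at some cost in accuracy.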

About the implementation of class-ratio priors

Thanks for your great work, but I am a little confused about the class-ratio priors. I can't find their implementation in your project, and I wonder whether the CP implementation matches what I describe (see the sketch after this issue):
First, compute the class distribution from the source labels to get p_s.
Then, pass the post-softmax output for a target-domain image through a global average pooling layer to get the mean class scores, p_x.
Finally, subtract p_x from p_s and sum the per-class differences that are greater than 0.
Besides, I want to ask another question about the loss function L_cp: why must the subtraction result be greater than 0? Perhaps taking the absolute value of the result instead would also help pull the target domain's distribution toward the source domain's.
Thanks!
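
In code, my description above amounts to something like this sketch (all names are mine, since I could not find the implementation):

```python
import torch

def class_ratio_prior_loss(target_logits, p_src):
    """Sketch of the CP loss as I described it: `p_src` is the (C,)
    class distribution precomputed from the source annotations; the
    target prediction is globally average-pooled after the softmax,
    and only shortfalls below the prior (clamped at 0) are penalized."""
    probs = torch.softmax(target_logits, dim=1)   # (B, C, H, W)
    p_trg = probs.mean(dim=(2, 3))                # (B, C): global average pooling
    return torch.clamp(p_src.unsqueeze(0) - p_trg, min=0).sum(dim=1).mean()

# Illustrative usage with a uniform prior over 19 classes.
p_src = torch.full((19,), 1.0 / 19)
loss_cp = class_ratio_prior_loss(torch.randn(2, 19, 65, 129), p_src)
```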

About the entropy minimization

Hello!
I notice that the log you use in the entropy minimization is log2() rather than log(). Have you tried torch.log()?
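
For reference, a sketch of the computation as I understand it (not the repository's exact code):

```python
import math
import torch

def normalized_entropy(logits):
    """Per-pixel entropy with log2; dividing by log2(C) bounds it in
    [0, 1]. Using torch.log instead would only rescale the result by a
    constant ln(2), which the loss weight can absorb."""
    num_classes = logits.shape[1]
    probs = torch.softmax(logits, dim=1)
    ent = -(probs * torch.log2(probs + 1e-30)).sum(dim=1)   # (B, H, W)
    return ent / math.log2(num_classes)
```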

About the code of ADVENT

Hello,

I am trying to reproduce the result reported in the paper (AdvEnt), but the result I got is 42.36 (AdvEnt only, best result). I want to know whether there is any mistake in my configuration, or whether you have ever seen such a result.
[screenshot: evaluation results]

Hyperparameters for training with CycleGAN translated images

Hello, thanks for publishing your code.

I am trying to reproduce your results on training with CycleGAN-translated images, but am unable to. Did you use the same hyperparameters for this run, or did you change them? If so, which hyperparameter values did you change?

Thanks in advance :)
