
Comments (17)

Yuliang-Zou commented on August 20, 2024

Yes. I think you can mainly follow the data generation of `pascal_voc_seg`. It should be straightforward to adapt it to your use case.
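
For context, here is a minimal sketch of building DeepLab-style tfrecords for a custom dataset. The feature keys below follow the usual build_voc2012_data.py convention; verify them against the repo's own data generation script before relying on them:

import os
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def image_seg_to_example(image_path, seg_path, height, width):
    # Pack a JPEG image and its PNG segmentation map into one tf.train.Example.
    with tf.io.gfile.GFile(image_path, 'rb') as f:
        image_data = f.read()
    with tf.io.gfile.GFile(seg_path, 'rb') as f:
        seg_data = f.read()
    return tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': _bytes_feature(image_data),
        'image/filename': _bytes_feature(os.path.basename(image_path).encode()),
        'image/format': _bytes_feature(b'jpeg'),
        'image/height': _int64_feature(height),
        'image/width': _int64_feature(width),
        'image/segmentation/class/encoded': _bytes_feature(seg_data),
        'image/segmentation/class/format': _bytes_feature(b'png'),
    }))

# Example: write one shard.
# with tf.io.TFRecordWriter('train-00000-of-00004.tfrecord') as writer:
#     writer.write(image_seg_to_example(img, seg, h, w).SerializeToString())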

Yuliang-Zou commented on August 20, 2024

Ah, you are right. The one I referred to is for overall accuracy; you should use the one you are referring to for per-class accuracy.

Yuliang-Zou commented on August 20, 2024

TensorFlow sometimes produces misleading error messages...
I think your dataset_dir path may be wrong; you need to point it at the directory that contains the tfrecord files.

saramsv commented on August 20, 2024

@Yuliang-Zou Yeah, it looks like that was the problem. I changed it to dataset/pascal_voc_seg/ and it is running now! Thank you!
So I think my next step should be generating tfrecords for my data using the code here. Is that correct?

saramsv commented on August 20, 2024

@Yuliang-Zou I trained the model on Pascal VOC and have a few issues with it. The loss and accuracy hardly change during training; I have attached a screenshot here. The acc_seg is never above 0.05! I was wondering if you could help me understand what I am doing wrong.
[screenshot: training loss and acc_seg curves]

I ran the code without any changes as follows:

python train_sup.py \
  --logtostderr \
  --train_split="train" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --train_crop_size="513,513" \
  --num_clones=4 \
  --train_batch_size=64 \
  --training_number_of_steps=3000 \
  --fine_tune_batch_norm=true \
  --train_logdir="logs" \
  --dataset_dir="dataset/pascal_voc_seg/"

When the training is done it prints:

Finished training! Saving model to disk.
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
  warnings.warn("Attempting to use a closed FileWriter. "

And when I run eval.py using the following command:

python eval.py \
  --logtostderr \
  --eval_split="val" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --eval_crop_size="513,513" \
  --checkpoint_dir="logs" \
  --eval_logdir="logs_eval" \
  --dataset_dir="dataset/pascal_voc_seg" \
  --max_number_of_evaluations=1

I get:

eval/miou_class_0[0.733180523]
eval/miou_class_1[0]
eval/miou_class_2[0]
eval/miou_class_3[0]
eval/miou_class_4[0]
eval/miou_class_5[0]
eval/miou_class_6[0]
eval/miou_class_7[0]
eval/miou_class_8[0]
eval/miou_class_9[0]
eval/miou_class_10[0]
eval/miou_class_11[0]
eval/miou_class_12[0]
eval/miou_class_13[0]
eval/miou_class_14[0]
eval/miou_class_15[0]
eval/miou_class_16[0]
eval/miou_class_17[0]
eval/miou_class_18[0]
eval/miou_class_19[0]
eval/miou_class_20[0]
eval/miou_overall[0.0349133424]


Thank you so much for your time and help!

Yuliang-Zou commented on August 20, 2024

Hmmm, interesting.
So first of all, your training iterations are not enough. It should be 30k instead of 3k.
If that still does not work, then try setting fine_tune_batch_norm to False.

saramsv commented on August 20, 2024

@Yuliang-Zou Thank you so much for your quick response. I tried 30k; the result was better but not as expected. It only got to ~30% accuracy, as shown in the plot.
[screenshot: training accuracy curve]
I then tried 30k with the pretrained weights from ImageNet and fine_tune_batch_norm=False,

  --fine_tune_batch_norm=False \
  --tf_initial_checkpoint="models/xception/model.ckpt"

and got a more reasonable accuracy (~64%):
[screenshot: training accuracy curve]
And eval.py works too.

eval/miou_class_0[0.910657108]
eval/miou_class_1[0.751154363]
eval/miou_class_2[0.302537024]
eval/miou_class_3[0.761627734]
eval/miou_class_4[0.634970427]
eval/miou_class_5[0.667984486]
eval/miou_class_6[0.877344787]
eval/miou_class_7[0.799465537]
eval/miou_class_8[0.805240393]
eval/miou_class_9[0.233576939]
eval/miou_class_10[0.722814679]
eval/miou_class_11[0.550620198]
eval/miou_class_12[0.763805628]
eval/miou_class_13[0.702824056]
eval/miou_class_14[0.726642609]
eval/miou_class_15[0.736175716]
eval/miou_class_16[0.430694789]
eval/miou_class_17[0.672929585]
eval/miou_class_18[0.43171677]
eval/miou_class_19[0.760207057]
eval/miou_class_20[0.64040792]
eval/miou_overall[0.661114216]

The fact that the accuracy starts from zero tells me that the model isn't reading the weights from the ImageNet checkpoint properly. I also get a lot of warnings with this message:
'W0325 18:40:07.277173 140412294371136 variables.py:672] Checkpoint is missing variable...'

Also, I have another question about the tfrecords for unlabeled images (no annotation and no image-level labels). How do you generate tfrecords for these images? Do the images used for train_aug-00000-of-00010.tfrecord, for example, have image-level labels? If not, how do you generate the tfrecords for them?

I really appreciate your help!

Yuliang-Zou commented on August 20, 2024

Can you provide more information about the missing variables? Although I guess the missing variables are actually from the decoder part (which is not trained on ImageNet). Another thing: the reference performance was achieved on 8 x 2-GPU internal machines. Since you are using a different configuration, you might not be able to get the same numbers.
As for the image-level labels, I actually convert the ground truth segmentation maps to get them.
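
For illustration, here is a minimal sketch of one way to derive a multi-hot image-level label from a ground truth segmentation map (the exact encoding used in wss may differ):

import numpy as np

def seg_map_to_image_label(seg_map, num_classes=21, ignore_label=255):
    # Class ids present in the map, excluding background (0) and ignore pixels.
    present = np.unique(seg_map)
    present = present[(present != 0) & (present != ignore_label)]
    # Multi-hot vector over the foreground classes.
    label = np.zeros(num_classes - 1, dtype=np.float32)
    label[present - 1] = 1.0
    return label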

saramsv commented on August 20, 2024

Yeah, sure. I have attached a file that has the warning messages. And I think you are right about the variables being from the decoder part (at least most of them are).
log.txt

Yeah, I understand that. I am only using 4 V100 GPUs.

But based on your paper, in addition to the pixel-level labeled images, you are also using images with no labels ("We propose a simple one-stage framework to improve semantic segmentation by using a limited amount of pixel-labeled data and sufficient unlabeled data or image-level labeled data").
I am a bit confused by "As for the image-level labels, I actually convert ground truth segmentation maps to get them," because the assumption of your work is that for some images you have nothing (no image-level and no pixel-level labels) and you generate pseudo labels for them. I guess the correct question is: what part of the code takes care of those images, and how do you generate the pseudo-labels? Are they generated as a preprocessing step or during training? If they are generated during training, where do you give them to the program, and in what format?

Looking at your code, these are the relevant parameters to be set when using the unlabeled images. I am assuming that in your case the unlabeled/image-level labeled images are in train_aug-0000* and are given to the program by setting 'train_split_cls'?!
Also, are the following default parameters okay, or do they need to be changed? Another question I have is the difference between your pseudo labels and soft labels.

## Pseudo_seg options.
flags.DEFINE_boolean('weakly', False, 'Using image-level labeled data or not')

flags.DEFINE_string('train_split_cls', 'train_aug',
                    'Which split of the dataset to be used for training (cls)')

# Pseudo label settings.
flags.DEFINE_boolean('soft_pseudo_label', True, 'Use soft pseudo label or not')

flags.DEFINE_float('pseudo_label_threshold', 0.0,
                   'Confidence threshold to filter pseudo labels')

flags.DEFINE_float('unlabeled_weight', 1.0,
                   'Weight of the unlabeled consistency loss')

# Attention settings.
flags.DEFINE_list('att_strides', '15,16', 'Hypercolumn layer strides.')

flags.DEFINE_integer('attention_dim', 128,
                     'Key and query dimension of self-attention module')

flags.DEFINE_boolean('use_attention', True,
                     'Use self-attention for weak augmented branch or not')

flags.DEFINE_boolean('att_v2', True,
                     'Use self-attention v2 or not.')

# Ensemble settings.
flags.DEFINE_enum('pseudo_src', 'avg', ['att', 'avg'],
                  'Pseudo label source, self-attention or average.')

flags.DEFINE_float(
    'temperature', 0.5,
    'Temperature for pseudo label sharpen, only valid when using soft label')

flags.DEFINE_boolean(
    'logit_norm', True,
    'Use logit norm to change the flatness or not')

flags.DEFINE_boolean(
    'cls_with_cls', True,
    'Using samples_cls or samples_seg to train the classifier. Only valid in wss mode.') 

Sorry if I am asking too many questions :)

Yuliang-Zou commented on August 20, 2024

We assume that we have limited pixel-level labeled data and a lot of unlabeled or image-level labeled data. The unlabeled part is easy to understand. For the image-level labels, since VOC itself does not provide this label, we convert some pixel-level labeled data to image-level labeled data (and assume we don't have pixel-level labels for these data). You can take a look here, which is the data loading code that handles this.

Pseudo label is another concept. We first use Grad-CAM to generate a coarse-grained segmentation and then refine it with self-attention. We then combine the predictions from both the decoder and the self-attention Grad-CAM to construct our pseudo label. These pseudo labels are soft (not one-hot) and they are generated on the fly; as training proceeds, their quality improves.
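
As a rough illustration of that fusion (an assumed sketch based on the flags listed above, e.g. pseudo_src='avg' and temperature; not the repo's actual code):

import tensorflow as tf

def fuse_pseudo_label(decoder_logits, att_logits, temperature=0.5):
    # Average the decoder and self-attention probability maps
    # (cf. pseudo_src='avg'), then sharpen the soft label with a
    # temperature, as described by the 'temperature' flag.
    p = 0.5 * (tf.nn.softmax(decoder_logits) + tf.nn.softmax(att_logits))
    sharpened = tf.pow(p, 1.0 / temperature)
    # Renormalize so each pixel's class distribution sums to 1.
    return sharpened / tf.reduce_sum(sharpened, axis=-1, keepdims=True)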

saramsv commented on August 20, 2024

That makes sense, and I understood that you used the pixel-level labels to get image-level labels. What I am not sure about is the following:

  1. Can one use your method with limited pixel-level labeled data and a lot of unlabeled images (where no image-level labels are available, and we cannot use pixel-level labels to generate them, as you did, because there are no pixel-level labels for these images)? From reading your paper, I thought your method could handle this situation as well. But I also saw in def _preprocess_image(self, sample) that the sample contains image and label. What happens when there is no label?
  2. Where in the code is the pseudo-labeling process for unlabeled images done?
  3. Since the pseudo labels are generated on the fly, where in the code are the images/their paths given to the model?
  4. I guess all this boils down to: if I have only limited pixel-level labeled data and a lot of unlabeled images, how do I run your code on it?

Thanks again for your time!

Yuliang-Zou commented on August 20, 2024
  1. We don't have special handling for unlabeled data. We still load the ground truth labels but do not use them for training. If your unlabeled data is "truly unlabeled", you can make fake ground truth segmentation maps by filling all the values with the ignore value (e.g., 255 in most cases); see the sketch after this list. That's also how I handle this when I use the truly unlabeled images from the COCO dataset.
  2. The pseudo label generation code is here. For unlabeled data, we use the model's classification output as image-level labels.
  3. We do not save those pseudo labels, so we don't need additional paths.
  4. As mentioned in 1, you can manually create fake ground truth as placeholders for those unlabeled images. Then you create tfrecords for your pixel-level labeled and unlabeled data separately. Lastly, to get a better result, you can use a stronger pre-trained checkpoint.
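
A minimal sketch of creating such a placeholder map (assuming PIL and the usual DeepLab ignore value of 255):

import numpy as np
from PIL import Image

def make_fake_label(image_path, label_path, ignore_label=255):
    # Build a label map the same size as the image, filled entirely with the
    # ignore value, so that no pixel contributes to the supervised loss.
    width, height = Image.open(image_path).size
    fake = np.full((height, width), ignore_label, dtype=np.uint8)
    Image.fromarray(fake).save(label_path)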

saramsv commented on August 20, 2024

That makes sense. I'll give it a try and see how it goes. Thank you so much for your help!

saramsv commented on August 20, 2024

@Yuliang-Zou Thank you so much for your help! I followed your guidance and it worked. I have one more question though! Is there an easy way to get pixel accuracy, in addition to IoU, from eval.py?

Yuliang-Zou commented on August 20, 2024

I think you can use tf.metrics.accuracy to get that. Just like here.

saramsv commented on August 20, 2024

That is for classification, right? Should I not use something similar to this for pixel accuracy?

saramsv commented on August 20, 2024

BTW, I added

metric_map['eval_pix_acc'] = tf.metrics.accuracy(
        labels=labels, predictions=predictions,
        weights=weights)

after this line and it seems to be working. Not sure if it actually gives me the overall pixel accuracy or not.
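
For reference, a minimal sketch of how the overall and per-class metrics relate (variable names follow the snippet above and are otherwise assumed). As long as weights zeroes out ignore-label pixels, tf.metrics.accuracy should indeed give the overall pixel accuracy:

import tensorflow as tf

def add_accuracy_metrics(metric_map, labels, predictions, weights, num_classes):
    # Overall pixel accuracy: fraction of non-ignored pixels predicted correctly.
    metric_map['eval_pix_acc'] = tf.metrics.accuracy(
        labels=labels, predictions=predictions, weights=weights)
    # Mean per-class pixel accuracy: accuracy computed per class, then averaged.
    metric_map['eval_mean_class_acc'] = tf.metrics.mean_per_class_accuracy(
        labels=labels, predictions=predictions,
        num_classes=num_classes, weights=weights)
    return metric_map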
