Comments (27)
OK, I found the bug. It is at L138 of train_voc12.py:
image_batch_val, label_batch_val = read_data(is_training=False)
This function needs one more argument, such as the split name; otherwise it reads from the training set :)
def read_data(is_training, batch_size=args.batch_size):
    file_pattern = '{}_{}.tfrecord'.format(args.data_name, args.split_name)
I can also reach about 80% mIoU with the code above, but once I corrected the function (by passing the split name as val), the performance is 73.6%, as in my own implementation.
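For anyone following along, here is a minimal sketch of the fix described above, assuming the split is chosen from the is_training flag instead of the global args.split_name. The helper names tfrecord_pattern and pick_split and the data name 'voc12' are hypothetical and only illustrate the idea; the actual TFRecord reading is left out.

```python
def tfrecord_pattern(data_name, split_name):
    """Build the TFRecord file name for one dataset split."""
    return '{}_{}.tfrecord'.format(data_name, split_name)


def pick_split(is_training, train_split='train', val_split='val'):
    """Choose the split from the is_training flag, not from a global argument."""
    return train_split if is_training else val_split


# 'voc12' is an assumed data name, used only for illustration.
train_file = tfrecord_pattern('voc12', pick_split(is_training=True))
val_file = tfrecord_pattern('voc12', pick_split(is_training=False))

# The training and validation pipelines now resolve to different files.
print(train_file)
print(val_file)
```

With this shape, a call like read_data(is_training=False) can no longer silently fall back to the training file.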
from deeplabv3-tensorflow.
Hi, everyone, I made a mistake. @John1231983 is right about the validation, thanks for your patience. I will fix the bug and rerun the program.
from deeplabv3-tensorflow.
@NanqingD Could you look at my problem?
from deeplabv3-tensorflow.
Have you seen 4804235?
from deeplabv3-tensorflow.
Have you tried running it and reproducing the performance? From looking at the code, I am sure the implementation has the issue described above.
from deeplabv3-tensorflow.
Not yet, but 85.41% is better than every ablation in the original paper. Have you followed the training protocol in the readme?
from deeplabv3-tensorflow.
I trained for 4 days but could not reach it, so I cancelled the run. I found another solution: using a pretrained model to speed up convergence. But it is not clear to me whether we should train all parameters (like conv_trainable in the implementation) or just the conv + BN layers in ASPP. Do you know? In my opinion, we first copy the pretrained weights from ResNet, use them as the starting point, and then train the weights again.
from deeplabv3-tensorflow.
Seems that loading pretrained weights is in the tasklist. But I am still curious to know how 85.41% was achieved.
from deeplabv3-tensorflow.
So let's try to run it. I may not be totally right. My guess is that the author mixed the train and val sets together, so the number is like one obtained by training on the train+val set.
from deeplabv3-tensorflow.
OK, of course. If he mixed train and val, these numbers don't make sense.
from deeplabv3-tensorflow.
I think so. Is my understanding of pretraining correct? It means we load the pretrained weights and use them as initial values instead of random initialization; the values are then updated during training.
from deeplabv3-tensorflow.
Good catch!
from deeplabv3-tensorflow.
@bhack: could you rerun his code and let me know your performance?
from deeplabv3-tensorflow.
It will be the next CVPR best paper if the repo author can get an 85% validation result without ImageNet weight initialization. 💯
from deeplabv3-tensorflow.
Hi, I believe the author did get that number, but one line of the implementation was wrong, so the number shown comes from the training set, not the validation set. It is like training for 3 days and then measuring mIoU on the same dataset you trained on, so the performance increases day by day. I guess after 1 week he could reach 99%. It is just one typo in the implementation; the rest is fine.
from deeplabv3-tensorflow.
According to the code in this repo, validation runs every 1000 steps.
If the code is correct, the val_mean_iou looks like the log below.
step 1000, train_mean_iou: 0.038962, val_mean_iou: 0.010141
step 2000, train_mean_iou: 0.042974, val_mean_iou: 0.028284
step 3000, train_mean_iou: 0.042751, val_mean_iou: 0.013903
step 4000, train_mean_iou: 0.046491, val_mean_iou: 0.027592
step 5000, train_mean_iou: 0.051442, val_mean_iou: 0.028335
step 6000, train_mean_iou: 0.063231, val_mean_iou: 0.043679
step 7000, train_mean_iou: 0.068127, val_mean_iou: 0.059629
step 8000, train_mean_iou: 0.079136, val_mean_iou: 0.067333
step 9000, train_mean_iou: 0.090848, val_mean_iou: 0.079447
step 10000, train_mean_iou: 0.091812, val_mean_iou: 0.080594
step 11000, train_mean_iou: 0.094520, val_mean_iou: 0.082270
step 12000, train_mean_iou: 0.091182, val_mean_iou: 0.037890
step 13000, train_mean_iou: 0.094305, val_mean_iou: 0.100399
step 14000, train_mean_iou: 0.108395, val_mean_iou: 0.087806
step 15000, train_mean_iou: 0.130730, val_mean_iou: 0.077178
step 16000, train_mean_iou: 0.151436, val_mean_iou: 0.094387
step 17000, train_mean_iou: 0.162695, val_mean_iou: 0.099858
step 18000, train_mean_iou: 0.161996, val_mean_iou: 0.092451
step 19000, train_mean_iou: 0.166216, val_mean_iou: 0.121211
step 20000, train_mean_iou: 0.173018, val_mean_iou: 0.138802
step 21000, train_mean_iou: 0.178273, val_mean_iou: 0.109637
step 22000, train_mean_iou: 0.165977, val_mean_iou: 0.163120
step 23000, train_mean_iou: 0.157765, val_mean_iou: 0.100000
step 24000, train_mean_iou: 0.170514, val_mean_iou: 0.124348
step 25000, train_mean_iou: 0.198407, val_mean_iou: 0.144772
step 26000, train_mean_iou: 0.222081, val_mean_iou: 0.098426
step 27000, train_mean_iou: 0.237516, val_mean_iou: 0.219594
step 28000, train_mean_iou: 0.227344, val_mean_iou: 0.148576
step 29000, train_mean_iou: 0.237092, val_mean_iou: 0.163101
step 30000, train_mean_iou: 0.246426, val_mean_iou: 0.156120
step 31000, train_mean_iou: 0.238966, val_mean_iou: 0.121964
step 32000, train_mean_iou: 0.232442, val_mean_iou: 0.172701
step 33000, train_mean_iou: 0.212796, val_mean_iou: 0.135206
step 34000, train_mean_iou: 0.221639, val_mean_iou: 0.164141
step 35000, train_mean_iou: 0.232702, val_mean_iou: 0.187685
step 36000, train_mean_iou: 0.258620, val_mean_iou: 0.126137
step 37000, train_mean_iou: 0.286420, val_mean_iou: 0.137220
step 38000, train_mean_iou: 0.281958, val_mean_iou: 0.220993
step 39000, train_mean_iou: 0.294367, val_mean_iou: 0.146129
step 40000, train_mean_iou: 0.286681, val_mean_iou: 0.180327
step 41000, train_mean_iou: 0.291149, val_mean_iou: 0.230863
step 42000, train_mean_iou: 0.268544, val_mean_iou: 0.259150
step 43000, train_mean_iou: 0.302505, val_mean_iou: 0.246976
step 44000, train_mean_iou: 0.264544, val_mean_iou: 0.209577
step 45000, train_mean_iou: 0.265050, val_mean_iou: 0.175589
step 46000, train_mean_iou: 0.282189, val_mean_iou: 0.162485
step 47000, train_mean_iou: 0.314683, val_mean_iou: 0.185510
step 48000, train_mean_iou: 0.316408, val_mean_iou: 0.259217
step 49000, train_mean_iou: 0.331183, val_mean_iou: 0.288642
step 50000, train_mean_iou: 0.330321, val_mean_iou: 0.292073
step 51000, train_mean_iou: 0.331521, val_mean_iou: 0.251862
step 52000, train_mean_iou: 0.321944, val_mean_iou: 0.223568
step 53000, train_mean_iou: 0.324300, val_mean_iou: 0.177001
step 54000, train_mean_iou: 0.313486, val_mean_iou: 0.237518
step 55000, train_mean_iou: 0.301724, val_mean_iou: 0.293159
step 56000, train_mean_iou: 0.319151, val_mean_iou: 0.163533
step 57000, train_mean_iou: 0.330160, val_mean_iou: 0.263894
step 58000, train_mean_iou: 0.361916, val_mean_iou: 0.243117
step 59000, train_mean_iou: 0.366205, val_mean_iou: 0.249002
step 60000, train_mean_iou: 0.367591, val_mean_iou: 0.183255
step 61000, train_mean_iou: 0.366055, val_mean_iou: 0.250439
step 62000, train_mean_iou: 0.366535, val_mean_iou: 0.318944
step 63000, train_mean_iou: 0.363185, val_mean_iou: 0.266321
step 64000, train_mean_iou: 0.344529, val_mean_iou: 0.293947
step 65000, train_mean_iou: 0.342547, val_mean_iou: 0.275548
step 66000, train_mean_iou: 0.346227, val_mean_iou: 0.219023
step 67000, train_mean_iou: 0.355806, val_mean_iou: 0.235097
step 68000, train_mean_iou: 0.382246, val_mean_iou: 0.169041
step 69000, train_mean_iou: 0.393502, val_mean_iou: 0.314861
from deeplabv3-tensorflow.
Great to see the log, @ksnzh. Actually, I ran the code for 3 days and only reached around 60% on validation (I don't remember the exact number). One thing to note is that you have to change the learning rate by hand. For example, after 69k steps (as in your log), you can reduce the learning rate and the performance may improve. Also, it is trained from scratch, so it takes a long time to reach good performance.
from deeplabv3-tensorflow.
@NanqingD Could you please look at the thread?
from deeplabv3-tensorflow.
Hi, everyone.
Inference strategy on val set: The proposed model is trained with output stride = 16, and then during inference we apply output stride = 8 to get more detailed feature map. As shown in Tab. 4, interestingly, when evaluating our best cascaded model with output stride = 8, the performance improves over evaluating with output stride = 16 by 1.39%.
Do you know what this passage means? In my understanding, it means setting the block5 rate to 2 during training and to 1 during inference. Am I right? What do you all think?
from deeplabv3-tensorflow.
No. In training, block 3 is unchanged (same as ResNet), block 4 uses rate 2, and block 5 uses rate 4. In inference, block 3 uses rate 2, block 4 uses rate 4, and block 5 uses rate 8.
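To make those rates concrete, here is a rough sketch of how they fall out of the output stride. It assumes a stem that already downsamples by 4 and five blocks whose nominal strides are (1, 2, 2, 2, 2); the function name atrous_rates is hypothetical, and multi-grid is ignored. Once the feature map has reached the target output stride, each further stride of 2 is replaced by doubling the atrous rate.

```python
def atrous_rates(output_stride, block_strides=(1, 2, 2, 2, 2), stem_stride=4):
    """Return the atrous rate used by each block for a target output stride.

    block_strides and stem_stride are assumptions about the backbone layout,
    not values read from the repo.
    """
    current_stride = stem_stride
    rate = 1
    rates = []
    for stride in block_strides:
        if stride > 1 and current_stride >= output_stride:
            # Target resolution reached: keep the resolution, dilate instead.
            rate *= stride
        else:
            current_stride *= stride
        rates.append(rate)
    return rates


training_rates = atrous_rates(16)   # rates for block 1 .. block 5 in training
inference_rates = atrous_rates(8)   # rates for block 1 .. block 5 in inference
print(training_rates)
print(inference_rates)
```

Under these assumptions, output stride 16 gives rates 2 and 4 for blocks 4 and 5, and output stride 8 gives rates 2, 4 and 8 for blocks 3, 4 and 5, which matches the numbers in the comment above.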
from deeplabv3-tensorflow.
One more thing: if you want to use a pretrained model (now or as a future plan), you have to change the ResNet scope name at line L94 from
scope ='resnet{}'.format(depth)
to
scope ='resnet_v1_{}'.format(depth)
I think it is better to use the pretrained model as the author did, since it converges faster. After changing that line, in train_voc12.py, add this line for the load function:
ckpt_path = './resnet_v1_101.ckpt'
Then add these lines after net,endpoint=deeplabv3(...) in train_voc12.py:
exclude=['resnet_v1_101/aspp/1x1conv', 'resnet_v1_101/aspp/1x1conv/BatchNorm', 'resnet_v1_101/aspp/rate6',
'resnet_v1_101/aspp/rate6/BatchNorm', 'resnet_v1_101/aspp/rate12',
'resnet_v1_101/aspp/rate12/BatchNorm', 'resnet_v1_101/aspp/rate18',
'resnet_v1_101/aspp/rate18/BatchNorm', 'resnet_v1_101/img_pool/1x1conv/BatchNorm',
'resnet_v1_101/img_pool/1x1conv', 'resnet_v1_101/fusion/1x1conv',
'resnet_v1_101/fusion/1x1conv/BatchNorm', 'resnet_v1_101/logits/Conv']
restore_var = slim.get_variables_to_restore(exclude=exclude)
Hope you get a good result. In my case, my modified code reached about 74% mIoU with a batch size of 8.
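As a rough illustration of what the exclude list is for, the sketch below filters variable names by scope prefix, so that the backbone weights are restored from the checkpoint while the new ASPP and logits layers keep their fresh initialization. It is plain Python, not the slim API; the function variables_to_restore and the example variable names are hypothetical, and prefix matching is an assumption about how the exclusion behaves.

```python
def variables_to_restore(all_names, exclude):
    """Keep the variable names that do not start with any excluded scope."""
    return [name for name in all_names
            if not any(name.startswith(scope) for scope in exclude)]


# Hypothetical variable names, for illustration only.
all_names = [
    'resnet_v1_101/conv1/weights',
    'resnet_v1_101/block1/unit_1/bottleneck_v1/conv1/weights',
    'resnet_v1_101/aspp/rate6/weights',
    'resnet_v1_101/logits/Conv/biases',
]
exclude = ['resnet_v1_101/aspp/rate6', 'resnet_v1_101/logits/Conv']

restored = variables_to_restore(all_names, exclude)
print(restored)
```

Only the two backbone entries remain in restored; the excluded layers do not exist in an ImageNet ResNet checkpoint, which is why they have to be left out.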
from deeplabv3-tensorflow.
@John1231983, hi John. I will follow up on your idea in the next few months and try to fix it piece by piece in my free time. Again, thanks for your support.
from deeplabv3-tensorflow.
Good job. One more thing I forgot: in deeplabv3.py, the line aspp = tf.add_n(aspp_list) should be changed from add to concat. I think the author used concatenation.
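A toy sketch of the difference, using plain Python lists in place of tensors. Each inner list stands for the channel values of one ASPP branch at a single pixel; the helper names add_n and concat and the numbers are made up for illustration.

```python
def add_n(branches):
    """Element-wise sum of the branches: the channel count stays the same."""
    return [sum(values) for values in zip(*branches)]


def concat(branches):
    """Channel-wise concatenation: the channel counts add up."""
    return [value for branch in branches for value in branch]


# Three hypothetical ASPP branches with 2 channels each at one pixel.
aspp_list = [[1.0, 2.0], [10.0, 20.0], [100.0, 200.0]]

summed = add_n(aspp_list)
stacked = concat(aspp_list)
print(len(summed), summed)
print(len(stacked), stacked)
```

Summing mixes the branches into 2 channels, while concatenation keeps all 6 channels separate so that a following 1x1 convolution can learn how to fuse them.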
from deeplabv3-tensorflow.
@John1231983 Done. I reran the code for 12 hours; the training mIoU is 77% and the validation mIoU is 64%. I think the sanity check has passed, so I am going to close this issue.
Feel free to (re)open issues if you find more problems.
from deeplabv3-tensorflow.
Good job. Did you run with the pretrained model or from scratch? I trained with the pretrained model and got 73% mIoU on validation using the poly learning rate. If you used the pretrained model, I think your mIoU is still low. You could use the poly schedule instead of the step schedule.
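For reference, a minimal sketch of the poly schedule mentioned above: the base learning rate is scaled by (1 - step / max_steps) ** power. The function name and the example values (base_lr 0.007, 30000 steps, power 0.9) are assumptions for illustration, not settings taken from this thread.

```python
def poly_learning_rate(base_lr, step, max_steps, power=0.9):
    """Polynomial decay from base_lr at step 0 down to 0 at max_steps."""
    step = min(step, max_steps)  # clamp so the rate never goes negative
    return base_lr * (1.0 - step / float(max_steps)) ** power


# Example values are assumed, not taken from the repo.
base_lr, max_steps = 0.007, 30000
schedule = [poly_learning_rate(base_lr, s, max_steps)
            for s in (0, 10000, 20000, 30000)]
print(schedule)
```

Unlike a step schedule, the rate shrinks a little at every step, so there is no need to lower it by hand during training.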
from deeplabv3-tensorflow.
@John1231983, hi John, I'd like to know how you set the learning rate and batch size. If possible, could you share the snapshots/checkpoint with me? My email address is [email protected].
from deeplabv3-tensorflow.
@FlyingIce1 now there is an official reference implementation if you are interested.
from deeplabv3-tensorflow.