Comments (6)

giswqs commented on May 25, 2024

@Geoyi Finally, I was able to run the code on a Microsoft Azure Linux VM with a GPU. It took me a while to install CUDA 9.0, cuDNN 7.0, Python 3.5, tensorflow-gpu 1.5, etc. on a clean Ubuntu Server 16.04. I ran into all kinds of errors during installation, but I was eventually able to solve them. Training is still running, at an average speed of about 1 sec/step.

2018-02-16 15:12:54.550821: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 6a58:00:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-02-16 15:12:54.550889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 6a58:00:00.0, compute capability: 3.7)
INFO:tensorflow:Restoring parameters from ssd_inception_v2_coco_2017_11_17/model.ckpt
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path training/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 0.
INFO:tensorflow:global step 1: loss = 297.4543 (14.441 sec/step)
INFO:tensorflow:global step 2: loss = 288.2479 (1.025 sec/step)
INFO:tensorflow:global step 3: loss = 273.9143 (0.980 sec/step)
INFO:tensorflow:global step 4: loss = 259.0708 (0.992 sec/step)
INFO:tensorflow:global step 5: loss = 245.9408 (0.976 sec/step)
INFO:tensorflow:global step 6: loss = 230.3553 (0.992 sec/step)
INFO:tensorflow:global step 7: loss = 220.4153 (1.007 sec/step)
INFO:tensorflow:global step 8: loss = 209.2201 (0.985 sec/step)
INFO:tensorflow:global step 9: loss = 190.2542 (0.969 sec/step)
INFO:tensorflow:global step 10: loss = 172.4187 (1.023 sec/step)
INFO:tensorflow:global step 11: loss = 153.9014 (1.023 sec/step)
INFO:tensorflow:global step 12: loss = 133.6975 (0.998 sec/step)
INFO:tensorflow:global step 13: loss = 122.8432 (0.994 sec/step)
INFO:tensorflow:global step 14: loss = 104.7338 (0.968 sec/step)
INFO:tensorflow:global step 15: loss = 88.5447 (0.979 sec/step)
INFO:tensorflow:global step 16: loss = 83.2795 (1.028 sec/step)
INFO:tensorflow:global step 17: loss = 63.6085 (1.009 sec/step)
INFO:tensorflow:global step 18: loss = 70.0747 (0.981 sec/step)
INFO:tensorflow:global step 19: loss = 51.9096 (0.992 sec/step)
INFO:tensorflow:global step 20: loss = 41.9647 (1.062 sec/step)
INFO:tensorflow:global step 21: loss = 37.5462 (1.002 sec/step)
INFO:tensorflow:global step 22: loss = 29.1385 (1.056 sec/step)
INFO:tensorflow:global step 23: loss = 30.1743 (1.006 sec/step)
INFO:tensorflow:global step 24: loss = 30.5369 (1.019 sec/step)
INFO:tensorflow:global step 25: loss = 30.2714 (1.000 sec/step)
INFO:tensorflow:global step 26: loss = 23.5878 (1.013 sec/step)
...
INFO:tensorflow:global step 21325: loss = 3.0498 (1.088 sec/step)
INFO:tensorflow:global step 21326: loss = 5.0268 (1.092 sec/step)
INFO:tensorflow:global step 21327: loss = 3.2619 (1.051 sec/step)
INFO:tensorflow:global step 21328: loss = 4.2388 (1.068 sec/step)
INFO:tensorflow:global step 21329: loss = 4.5754 (1.079 sec/step)
INFO:tensorflow:global step 21330: loss = 2.8839 (1.060 sec/step)
INFO:tensorflow:global step 21331: loss = 4.7587 (1.059 sec/step)
INFO:tensorflow:global step 21332: loss = 3.1406 (1.045 sec/step)
INFO:tensorflow:global step 21333: loss = 5.1399 (1.081 sec/step)
INFO:tensorflow:global step 21334: loss = 4.9081 (1.038 sec/step)
INFO:tensorflow:global step 21335: loss = 3.0094 (1.041 sec/step)
INFO:tensorflow:global step 21336: loss = 4.7036 (1.154 sec/step)
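To check a figure like "about 1 sec/step" against a long training log, a small parser helps. This is a minimal sketch that assumes the `global step N: loss = X (Y sec/step)` line format shown above; `parse_steps` and `summarize` are hypothetical helper names, not part of TensorFlow. It leaves the first step out of the timing average by default, since step 1 (14.441 sec) is far slower than the rest, presumably because of one-off startup work.

```python
import re
from statistics import mean

# Matches lines such as:
#   INFO:tensorflow:global step 21336: loss = 4.7036 (1.154 sec/step)
STEP_RE = re.compile(r"global step (\d+): loss = ([\d.]+) \(([\d.]+) sec/step\)")


def parse_steps(lines):
    """Return (step, loss, sec_per_step) tuples for every matching log line."""
    records = []
    for line in lines:
        m = STEP_RE.search(line)
        if m:
            records.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
    return records


def summarize(records, skip_first=1):
    """Summarize parsed steps. The first `skip_first` steps are left out of the
    timing average because they presumably include one-off startup cost."""
    if not records:
        raise ValueError("no 'global step' lines found")
    timed = records[skip_first:] or records  # fall back if too few records
    return {
        "steps": len(records),
        "last_loss": records[-1][1],
        "mean_sec_per_step": mean(t for _, _, t in timed),
    }


if __name__ == "__main__":
    sample = [
        "INFO:tensorflow:global step 1: loss = 297.4543 (14.441 sec/step)",
        "INFO:tensorflow:global step 2: loss = 288.2479 (1.025 sec/step)",
        "INFO:tensorflow:global step 3: loss = 273.9143 (0.980 sec/step)",
    ]
    print(summarize(parse_steps(sample)))
```

For a real run you would pass the lines of a file holding the captured train.py output to `parse_steps`; non-matching lines such as `Starting Queues.` are simply skipped.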

from label-maker.

Geoyi commented on May 25, 2024

Hi @giswqs, glad to see you back trying out our TF object detection case as well.
My bad: I kind of used too many bounding boxes for the test case. To answer your questions:

I only got 196 image tiles for the Mexico example instead of the 385 image tiles stated in your tutorial. I tried on two different computers, and both got the same 196 image tiles.

You might want to replace the bounding box in your config file with:
[-99.17667388916016,19.466430383606728,-99.11865234374999,19.51813278329343]
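To see why changing the bounding box changes the number of image tiles, here is a minimal sketch (our own illustration, not label-maker code) that counts how many Web Mercator tiles a `[west, south, east, north]` bounding box touches at a given zoom, using the standard slippy-map tile formulas. Zoom 17 is an assumption about the walkthrough's config. As we understand it, label-maker keeps only a subset of these tiles (tiles with labels plus sampled background tiles), so this count is an upper bound on the image tiles you end up with, not the 196 or 227 figure itself.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Web Mercator ("slippy map") indices (x, y) of the tile containing a point."""
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tiles_in_bbox(bbox, zoom):
    """Count tiles touched by bbox = [west, south, east, north] at `zoom`."""
    west, south, east, north = bbox
    x0, y0 = lonlat_to_tile(west, north, zoom)  # top-left tile (y grows southward)
    x1, y1 = lonlat_to_tile(east, south, zoom)  # bottom-right tile
    return (x1 - x0 + 1) * (y1 - y0 + 1)


# Bounding box suggested above for the Mexico example.
bbox = [-99.17667388916016, 19.466430383606728, -99.11865234374999, 19.51813278329343]
ZOOM = 17  # assumption: the zoom level set in the walkthrough's config

print(tiles_in_bbox(bbox, ZOOM))
```

Shrinking the box or lowering the zoom can only reduce this count, which is consistent with the different tile counts reported in this thread for different bounding boxes.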

The original folder name in the zip file ssd_inception_v2_coco is ssd_inception_v2_coco_2017_11_17, so I had to change the folder name to ssd_inception_v2_coco. You might want to note this in your tutorial.

Actually, ssd_inception_v2_coco.config is a different file from the folder ssd_inception_v2_coco_2017_11_17; you should just download our config file here. DON'T change the folder name, otherwise you will need to change the path in the config file as well. The folder ssd_inception_v2_coco_2017_11_17 holds the model checkpoint, and the model config file ssd_inception_v2_coco.config is set up to pull everything together for train.py, if that makes sense.
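The folder name matters because the config points train.py at the checkpoint through a relative path; the log in the first comment shows training restoring from `ssd_inception_v2_coco_2017_11_17/model.ckpt`. Here is a minimal sketch that checks whether that path exists relative to the directory you launch train.py from. It assumes the config contains a `fine_tune_checkpoint: "..."` line and that the checkpoint is in TensorFlow's V2 format (a `model.ckpt.index` file alongside the data shards); `checkpoint_prefix` and `checkpoint_present` are hypothetical names.

```python
import os
import re

# Matches e.g.  fine_tune_checkpoint: "ssd_inception_v2_coco_2017_11_17/model.ckpt"
CKPT_RE = re.compile(r'fine_tune_checkpoint\s*:\s*"([^"]+)"')


def checkpoint_prefix(config_text):
    """Return the checkpoint prefix referenced by the pipeline config text."""
    m = CKPT_RE.search(config_text)
    if m is None:
        raise ValueError("no fine_tune_checkpoint entry found")
    return m.group(1)


def checkpoint_present(config_text, base_dir="."):
    """True if <base_dir>/<prefix>.index exists (assumes a V2-format checkpoint)."""
    prefix = os.path.join(base_dir, checkpoint_prefix(config_text))
    return os.path.exists(prefix + ".index")


if __name__ == "__main__":
    cfg_path = "training/ssd_inception_v2_coco.config"
    if os.path.exists(cfg_path):
        with open(cfg_path) as f:
            print(checkpoint_present(f.read()))
```

If this prints `False` after renaming the folder, either restore the original folder name or edit the `fine_tune_checkpoint` path in the config to match.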

module 'tensorflow.contrib.slim.python.slim.data.tfexample_decoder' has no attribute 'BackupHandler'

Not sure how this happened, but you can do:

# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
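Before and after changing PYTHONPATH, it can help to confirm which TensorFlow Python actually imports and whether its `tfexample_decoder` module defines `BackupHandler` at all. This sketch uses only `importlib`; the function name is hypothetical. If the attribute is missing, our assumption is that the installed TensorFlow is older than the object detection code expects, in which case upgrading TensorFlow, rather than changing PYTHONPATH, is the likelier fix.

```python
import importlib

# Module named in the error message above.
DECODER_MODULE = "tensorflow.contrib.slim.python.slim.data.tfexample_decoder"


def backup_handler_status():
    """Report the TensorFlow version, the decoder module's file, and whether it
    defines BackupHandler. Returns None if either module cannot be imported."""
    try:
        tf = importlib.import_module("tensorflow")
        decoder = importlib.import_module(DECODER_MODULE)
    except ImportError:
        return None
    return {
        "tensorflow": getattr(tf, "__version__", None),
        "file": getattr(decoder, "__file__", None),
        "has_backup_handler": hasattr(decoder, "BackupHandler"),
    }


if __name__ == "__main__":
    print(backup_handler_status())
```

The `file` entry shows which installation is being picked up, which is useful when several TensorFlow versions are present on the same machine.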

Let me know if you still run into other problems, good luck!!!

giswqs commented on May 25, 2024

@Geoyi Thanks for your hints! With your new bounding box I now get 227 image tiles. Never mind, now I know what the problem is, and I can play with the bounding box.

Regarding the config file, I did use the provided ssd_inception_v2_coco.config. What I meant was the folder name inside the downloaded zip file ssd_inception_v2_coco. After checking line 152 of ssd_inception_v2_coco.config, I realized that I should not have changed the folder name ssd_inception_v2_coco_2017_11_17.

So the folder structure should probably be:

models/research/object_detection/
├── ssd_inception_v2_coco_2017_11_17/
├── training/
│   └── ssd_inception_v2_coco.config
├── data/
│   ├── train_buildings.record
│   ├── test_buildings.record
│   └── building_od.pbtxt
└── images/
    ├── train/
    └── test/
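Here is a minimal sketch for checking that layout before launching training. It assumes the paths are relative to `models/research/object_detection/`, exactly as in the tree above; `EXPECTED` and `missing_entries` are hypothetical names, and the list would need to change if your record or label-map file names differ.

```python
from pathlib import Path

# Entries from the folder structure above, relative to models/research/object_detection/.
EXPECTED = [
    "ssd_inception_v2_coco_2017_11_17",
    "training/ssd_inception_v2_coco.config",
    "data/train_buildings.record",
    "data/test_buildings.record",
    "data/building_od.pbtxt",
    "images/train",
    "images/test",
]


def missing_entries(root="."):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    print(missing_entries())
```

An empty list means every expected file and folder is in place; a renamed checkpoint folder shows up as a missing `ssd_inception_v2_coco_2017_11_17`.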

I guess the folder structure in your tutorial is missing the _2017_11_17 suffix in the folder name.

I have tried every way I could think of, but I still could not bypass the 'BackupHandler' error. I guess I have to wait until I upgrade to TensorFlow v1.4 and CUDA 9.0 to test the code again.

Geoyi commented on May 25, 2024

Yay, definitely upgrade your TensorFlow, @giswqs. You should install tensorflow-gpu too: you have a GPU backend, and it will speed up training.
You're completely right about the 227 images. That's the number I used to train the model. I tested too many examples, and apparently one of them was too slow and I killed it.
Thanks for the fact check; I will definitely update our walkthrough.

Geoyi commented on May 25, 2024

That's great news, and thanks for sharing it, @giswqs.
Mine was 12 s per step; yours is faster with the GPU. Can't wait to see how it turns out.
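For a rough sense of what that difference means, here is a back-of-the-envelope sketch. The hypothetical `training_hours` simply multiplies steps by seconds per step. It uses 21,336 steps (the last step in the log at the top of this thread) and about 1.07 sec/step, the mean of the last twelve steps shown there; both rates are assumed constant, which real runs only approximate.

```python
def training_hours(steps, sec_per_step):
    """Wall-clock hours for `steps` training steps at a constant rate."""
    return steps * sec_per_step / 3600.0


STEPS = 21336  # last global step shown in the training log at the top of the thread

for label, rate in [("12 s/step", 12.0), ("~1.07 s/step", 1.07)]:
    print(f"{label}: {training_hours(STEPS, rate):.1f} h")
```

At those rates the same number of steps takes roughly 71 hours versus roughly 6.3 hours, a difference of about 11 times.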

drewbo commented on May 25, 2024

The walkthrough was updated in 086ff8c and 1043702.
