@Geoyi Finally, I am able to run the code on a Microsoft Azure Linux VM with a GPU. It took me a while to install CUDA 9.0, cuDNN 7.0, Python 3.5, tensorflow-gpu 1.5, etc. on a clean Ubuntu Server 16.04. I got all kinds of errors during installation, but I was able to solve them eventually. The training is still running at an average speed of about 1 sec/step.
2018-02-16 15:12:54.550821: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 6a58:00:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-02-16 15:12:54.550889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 6a58:00:00.0, compute capability: 3.7)
INFO:tensorflow:Restoring parameters from ssd_inception_v2_coco_2017_11_17/model.ckpt
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path training/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 0.
INFO:tensorflow:global step 1: loss = 297.4543 (14.441 sec/step)
INFO:tensorflow:global step 2: loss = 288.2479 (1.025 sec/step)
INFO:tensorflow:global step 3: loss = 273.9143 (0.980 sec/step)
INFO:tensorflow:global step 4: loss = 259.0708 (0.992 sec/step)
INFO:tensorflow:global step 5: loss = 245.9408 (0.976 sec/step)
INFO:tensorflow:global step 6: loss = 230.3553 (0.992 sec/step)
INFO:tensorflow:global step 7: loss = 220.4153 (1.007 sec/step)
INFO:tensorflow:global step 8: loss = 209.2201 (0.985 sec/step)
INFO:tensorflow:global step 9: loss = 190.2542 (0.969 sec/step)
INFO:tensorflow:global step 10: loss = 172.4187 (1.023 sec/step)
INFO:tensorflow:global step 11: loss = 153.9014 (1.023 sec/step)
INFO:tensorflow:global step 12: loss = 133.6975 (0.998 sec/step)
INFO:tensorflow:global step 13: loss = 122.8432 (0.994 sec/step)
INFO:tensorflow:global step 14: loss = 104.7338 (0.968 sec/step)
INFO:tensorflow:global step 15: loss = 88.5447 (0.979 sec/step)
INFO:tensorflow:global step 16: loss = 83.2795 (1.028 sec/step)
INFO:tensorflow:global step 17: loss = 63.6085 (1.009 sec/step)
INFO:tensorflow:global step 18: loss = 70.0747 (0.981 sec/step)
INFO:tensorflow:global step 19: loss = 51.9096 (0.992 sec/step)
INFO:tensorflow:global step 20: loss = 41.9647 (1.062 sec/step)
INFO:tensorflow:global step 21: loss = 37.5462 (1.002 sec/step)
INFO:tensorflow:global step 22: loss = 29.1385 (1.056 sec/step)
INFO:tensorflow:global step 23: loss = 30.1743 (1.006 sec/step)
INFO:tensorflow:global step 24: loss = 30.5369 (1.019 sec/step)
INFO:tensorflow:global step 25: loss = 30.2714 (1.000 sec/step)
INFO:tensorflow:global step 26: loss = 23.5878 (1.013 sec/step)
...
INFO:tensorflow:global step 21325: loss = 3.0498 (1.088 sec/step)
INFO:tensorflow:global step 21326: loss = 5.0268 (1.092 sec/step)
INFO:tensorflow:global step 21327: loss = 3.2619 (1.051 sec/step)
INFO:tensorflow:global step 21328: loss = 4.2388 (1.068 sec/step)
INFO:tensorflow:global step 21329: loss = 4.5754 (1.079 sec/step)
INFO:tensorflow:global step 21330: loss = 2.8839 (1.060 sec/step)
INFO:tensorflow:global step 21331: loss = 4.7587 (1.059 sec/step)
INFO:tensorflow:global step 21332: loss = 3.1406 (1.045 sec/step)
INFO:tensorflow:global step 21333: loss = 5.1399 (1.081 sec/step)
INFO:tensorflow:global step 21334: loss = 4.9081 (1.038 sec/step)
INFO:tensorflow:global step 21335: loss = 3.0094 (1.041 sec/step)
INFO:tensorflow:global step 21336: loss = 4.7036 (1.154 sec/step)
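The "about 1 sec/step" figure can be checked directly from a saved log. Below is a minimal Python sketch, assuming the log lines follow the format shown above; the function names are hypothetical and not part of TensorFlow. The first step is skipped because its 14 s most likely includes one-time startup cost.

```python
import re

# Matches lines such as:
# INFO:tensorflow:global step 2: loss = 288.2479 (1.025 sec/step)
STEP_RE = re.compile(r"global step (\d+): loss = ([\d.]+) \(([\d.]+) sec/step\)")


def parse_steps(log_lines):
    """Return (step, loss, sec_per_step) tuples for every matching log line."""
    steps = []
    for line in log_lines:
        match = STEP_RE.search(line)
        if match:
            steps.append((int(match.group(1)),
                          float(match.group(2)),
                          float(match.group(3))))
    return steps


def mean_sec_per_step(log_lines, skip_first=1):
    """Average sec/step, ignoring the first `skip_first` steps (startup overhead)."""
    timings = [sec for _, _, sec in parse_steps(log_lines)][skip_first:]
    if not timings:
        raise ValueError("no timed steps found")
    return sum(timings) / len(timings)


sample = [
    "INFO:tensorflow:global step 1: loss = 297.4543 (14.441 sec/step)",
    "INFO:tensorflow:global step 2: loss = 288.2479 (1.025 sec/step)",
    "INFO:tensorflow:global step 3: loss = 273.9143 (0.980 sec/step)",
]
print(mean_sec_per_step(sample))
```

With a real run, read the lines from the file the training output was redirected to and pass them in the same way.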
from label-maker.
Hi @giswqs, glad to see you back to try out our TF object detection case as well.
My bad, I kind of used too many bounding boxes for the test case. To answer your question:
I only got 196 image tiles for the Mexico example instead of 385 image tiles as stated in your tutorial. I tried on two different computers, and both got the same 196 image tiles.
You might want to replace the bounding box in your config file with:
[-99.17667388916016,19.466430383606728,-99.11865234374999,19.51813278329343]
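Why the bounding box changes the tile count: the number of tiles grows with the box area and roughly quadruples with each zoom level. Below is a rough Python sketch using the standard Web Mercator (XYZ) tile formula, assuming the box is ordered [west, south, east, north]. The zoom levels are assumptions, since the zoom isn't stated here. This counts the full tile grid; label-maker only downloads imagery for a subset of it (tiles with labels, plus background tiles depending on the config), so the real count will be lower.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Convert a WGS84 lon/lat to XYZ (slippy map) tile indices at `zoom`."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    # Web Mercator: y = (1 - ln(tan(lat) + sec(lat)) / pi) / 2 * n, and
    # ln(tan(lat) + sec(lat)) == asinh(tan(lat)).
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y


def tiles_in_bbox(bbox, zoom):
    """Count the tiles covering bbox = [west, south, east, north] at `zoom`."""
    west, south, east, north = bbox
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left corner
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right corner
    return (x_max - x_min + 1) * (y_max - y_min + 1)


bbox = [-99.17667388916016, 19.466430383606728, -99.11865234374999, 19.51813278329343]
for zoom in (16, 17, 18):  # assumed zoom levels, for comparison only
    print(zoom, tiles_in_bbox(bbox, zoom))
```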
The original folder name in the zip file ssd_inception_v2_coco is ssd_inception_v2_coco_2017_11_17, so I had to change the folder name to ssd_inception_v2_coco. You might want to note this in your tutorial.
Actually, ssd_inception_v2_coco.config is a different file from the folder ssd_inception_v2_coco_2017_11_17; you should just download our config file here. DON'T change the folder name, otherwise you will need to change the path in the config file as well. The folder ssd_inception_v2_coco_2017_11_17 is the model checkpoint, and the model config file ssd_inception_v2_coco.config is set up to pull everything together for train.py, if that makes sense.
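If the folder is renamed, the checkpoint path in the config no longer matches. Below is a small Python sketch for checking this before launching train.py. It assumes the config stores the path in the Object Detection API's fine_tune_checkpoint field (the log above restores from ssd_inception_v2_coco_2017_11_17/model.ckpt) and that the path is relative to the directory train.py runs from. It uses a regex rather than the API's protobuf parser, and the helper names are hypothetical.

```python
import os
import re

# The pipeline config's train_config block contains a line such as:
#   fine_tune_checkpoint: "ssd_inception_v2_coco_2017_11_17/model.ckpt"
CKPT_RE = re.compile(r'fine_tune_checkpoint:\s*"([^"]+)"')


def checkpoint_dir(config_text):
    """Return the folder that fine_tune_checkpoint points into, or None."""
    match = CKPT_RE.search(config_text)
    if match is None:
        return None
    return os.path.dirname(match.group(1))


def checkpoint_folder_exists(config_path, base_dir="."):
    """True if the checkpoint folder named in the config exists under base_dir."""
    with open(config_path) as f:
        folder = checkpoint_dir(f.read())
    return folder is not None and os.path.isdir(os.path.join(base_dir, folder))
```

For example, checkpoint_folder_exists("training/ssd_inception_v2_coco.config") run from object_detection/ returns False if the checkpoint folder was renamed.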
module 'tensorflow.contrib.slim.python.slim.data.tfexample_decoder' has no attribute 'BackupHandler'
Not sure how this happened, but you can do:
# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
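When running from a Python script or notebook instead of a shell, the same two directories can be added to sys.path for the current process. Below is a minimal sketch; the research path is a hypothetical placeholder. It only changes where modules are found. If the installed TensorFlow itself lacks BackupHandler, this will not help.

```python
import os
import sys


def add_research_paths(research_dir):
    """Append models/research and models/research/slim to sys.path,
    mirroring the PYTHONPATH export above for this Python process only."""
    research_dir = os.path.abspath(research_dir)
    for path in (research_dir, os.path.join(research_dir, "slim")):
        if path not in sys.path:  # avoid duplicate entries on repeated calls
            sys.path.append(path)


# Hypothetical location; point this at your own tensorflow/models/research.
add_research_paths(os.path.expanduser("~/tensorflow/models/research"))
```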
Let me know if you still run into other problems, good luck!!!
@Geoyi Thanks for your hints! I tried your new bounding box, and now I get 227 image tiles. Never mind, now I know what the problem is and I can play with the bounding box.
Regarding the config file, I did use the provided ssd_inception_v2_coco.config. What I meant was the folder name within the downloaded zip file ssd_inception_v2_coco. After checking line 152 of ssd_inception_v2_coco.config, I realized that I should not have changed the folder name ssd_inception_v2_coco_2017_11_17.
So the folder structure should probably be:
models/research/object_detection/
├── ssd_inception_v2_coco_2017_11_17/
├── training/
│   └── ssd_inception_v2_coco.config
├── data/
│   ├── train_buildings.record
│   ├── test_buildings.record
│   └── building_od.pbtxt
└── images/
    ├── train/
    └── test/
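Before starting training, the layout in the tree above can be verified with a short Python sketch. It assumes it runs from models/research/object_detection/ and checks only the entries listed in the tree; the list and function names are hypothetical.

```python
import os

# Entries from the tree above, relative to models/research/object_detection/.
EXPECTED = [
    "ssd_inception_v2_coco_2017_11_17",
    "training/ssd_inception_v2_coco.config",
    "data/train_buildings.record",
    "data/test_buildings.record",
    "data/building_od.pbtxt",
    "images/train",
    "images/test",
]


def missing_paths(root):
    """Return the expected entries that do not exist under `root`."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(root, p))]


if __name__ == "__main__":
    missing = missing_paths(".")
    print("missing:", missing if missing else "nothing")
```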
I guess the folder structure in your tutorial is missing the _2017_11_17 suffix of the folder name.
I have tried every possible way I could, but I still could not bypass the 'BackupHandler' error. I guess I have to wait until I upgrade to TensorFlow v1.4 and CUDA 9.0 to test the code again.
Yay, definitely upgrade your TensorFlow, @giswqs. You should install tensorflow-gpu too: you have a GPU backend, and it will speed up the training.
You're completely right about the 227 images. That's the number I used to train the model. I tested too many examples, and apparently one of them was too slow, so I killed it.
Thanks for the fact check, and I will definitely update our walkthrough.
That's great news, and thanks for sharing it, @giswqs.
Mine was 12 s per step; yours is much faster with the GPU. Can't wait to see how it turns out.
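For scale, the per-step timings translate into wall-clock time as follows. This is a back-of-the-envelope Python sketch, assuming a constant step time and using the step count (21336) reached in the log above; the rates are the ones reported in this thread.

```python
def training_hours(num_steps, sec_per_step):
    """Wall-clock hours for num_steps at a constant sec_per_step."""
    return num_steps * sec_per_step / 3600.0


steps = 21336  # last step shown in the training log
for label, rate in (("reported 12 s/step", 12.0),
                    ("reported ~1.05 s/step on the Tesla K80", 1.05)):
    print(f"{label}: {training_hours(steps, rate):.1f} h")
```

At these rates the GPU run finishes the same number of steps about 11 times faster.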
Walkthrough was updated in 086ff8c and 1043702