developmentseed / label-maker
Data Preparation for Satellite Machine Learning
Home Page: http://devseed.com/label-maker/
License: MIT License
@Geoyi Thank you for the great tutorial! I was able to follow through until the very last step to train the object detection model. Before I ran into the error, I had noticed two minor differences from your tutorial:
ssd_inception_v2_coco_2017_11_17, so I had to change the folder name to ssd_inception_v2_coco. You might want to note this in your tutorial.
I went through the Object Detection Model Setup without any problem. However, when I started to train the model, I got the error AttributeError: module 'tensorflow.contrib.slim.python.slim.data.tfexample_decoder' has no attribute 'BackupHandler', which is similar to the issue reported here. I tried the suggested workaround of downloading tfexample_decoder.py and replacing my local tfexample_decoder.py, but it did not work. It seems the solution is to upgrade to TensorFlow v1.4 or a nightly build, which requires CUDA 9.0. However, I am not yet ready to upgrade CUDA. Any advice will be appreciated! See below for more information about the errors.
What is the top-level directory of the model you are using:
~/tensorflow/models/research/object_detection
OS Platform and Distribution: Linux Ubuntu 16.04, Elementary OS
TensorFlow installed from (source or binary): pip install tensorflow-gpu
TensorFlow version: v1.3.0
Python version: v3.6.3
CUDA/cuDNN version: CUDA 8.0, cudnn 6.0, nVidia driver: 384.111
GPU: nVidia GeForce GTX 1050 Ti 4GB memory
CPU: Intel x86-64 Intel Core i7-4790K @ 3.60GHz x 8, 32GB memory
Exact command to reproduce:
python train.py --logtostderr \
--train_dir=training/ \
--pipeline_config_path=training/ssd_inception_v2_coco.config
Source code / logs:
(py36) qiusheng@wu-office:/media/hdd/Data/developmentseed/models/research/object_detection$ python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_inception_v2_coco.config
Traceback (most recent call last):
File "train.py", line 169, in <module>
tf.app.run()
File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 165, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/media/hdd/Data/developmentseed/models/research/object_detection/trainer.py", line 235, in train
train_config.prefetch_queue_capacity, data_augmentation_options)
File "/media/hdd/Data/developmentseed/models/research/object_detection/trainer.py", line 59, in create_input_queue
tensor_dict = create_tensor_dict_fn()
File "train.py", line 122, in get_next
worker_index=FLAGS.task)).get_next()
File "/media/hdd/Data/developmentseed/models/research/object_detection/builders/dataset_builder.py", line 140, in build
label_map_proto_file=label_map_proto_file)
File "/media/hdd/Data/developmentseed/models/research/object_detection/data_decoders/tf_example_decoder.py", line 153, in __init__
label_handler = slim_example_decoder.BackupHandler(
AttributeError: module 'tensorflow.contrib.slim.python.slim.data.tfexample_decoder' has no attribute 'BackupHandler'
I tried the suggested workaround of replacing the local tfexample_decoder.py and got the following error:
(py36) qiusheng@wu-office:/media/hdd/Data/developmentseed/models/research/object_detection$ python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_inception_v2_coco.config
Traceback (most recent call last):
File "train.py", line 49, in <module>
from object_detection import trainer
File "/media/hdd/Data/developmentseed/models/research/object_detection/trainer.py", line 32, in <module>
from object_detection.utils import variables_helper
File "/media/hdd/Data/developmentseed/models/research/object_detection/utils/variables_helper.py", line 23, in <module>
slim = tf.contrib.slim
File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/util/lazy_loader.py", line 53, in __getattr__
module = self._load()
File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/util/lazy_loader.py", line 42, in _load
module = importlib.import_module(self.__name__)
File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/contrib/__init__.py", line 60, in <module>
from tensorflow.contrib import slim
File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/contrib/slim/__init__.py", line 44, in <module>
from tensorflow.contrib.slim.python.slim.data import tfexample_decoder
File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/data/tfexample_decoder.py", line 7
<!DOCTYPE html>
^
SyntaxError: invalid syntax
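The SyntaxError points at line 7 of the replaced tfexample_decoder.py, which contains `<!DOCTYPE html>`. That suggests the "replacement" file is a saved web page (for example, GitHub's HTML file view) rather than the raw Python source. A minimal sketch for checking a downloaded source file before dropping it into site-packages; `looks_like_html` is a hypothetical helper and the check is only a heuristic:

```python
def looks_like_html(path, probe_bytes=2048):
    """Return True if the first bytes of `path` look like an HTML page.

    Heuristic: pages saved from a browser usually contain <!DOCTYPE html>
    or an <html> tag near the top; Python source files do not.
    """
    with open(path, "rb") as f:
        head = f.read(probe_bytes).lower()
    return b"<!doctype html" in head or b"<html" in head
```

If this returns True for tfexample_decoder.py, re-downloading the file from its raw-content URL (not the blob page) should at least remove the SyntaxError.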
Running through the README verbatim and using the togo configuration successfully produces data.npz, but running the resnet.py example with that data produces this error:
~/p/maps〉python3 resnet.py
Using TensorFlow backend.
x_train shape: (8, 256, 256, 3)
8 train samples
3 test samples
class_weight {0: 1, 1: 0.0}
2018-04-13 10:58:18.219381: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Traceback (most recent call last):
File "resnet.py", line 83, in <module>
workers=4)
File "/usr/local/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/keras/engine/training.py", line 2160, in fit_generator
val_x, val_y, val_sample_weight)
File "/usr/local/lib/python3.6/site-packages/keras/engine/training.py", line 1480, in _standardize_user_data
exception_prefix='target')
File "/usr/local/lib/python3.6/site-packages/keras/engine/training.py", line 123, in _standardize_input_data
str(data_shape))
ValueError: Error when checking target: expected fc1000 to have shape (2,) but got array with shape (3,)
I'm not sure whether resnet.py is intended to work with the togo configuration. If not, is there a configuration that produces data that works with resnet.py?
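The error says the output layer (`fc1000`) was built for 2 classes while the labels have 3 columns. That is consistent with the togo config defining two classes (Roads, Buildings) plus one background column. A minimal sketch that reads the label width from the data instead of hard-coding it, assuming data.npz stores a `y_train` array shaped `(N, num_classes)`; the key name and the helper `infer_num_classes` are assumptions, not confirmed from resnet.py:

```python
import numpy as np


def infer_num_classes(npz_path):
    """Return the label width stored in a label-maker data.npz.

    Assumes a 'y_train' array shaped (N, num_classes): one column per
    class in config.json plus one column for background.
    """
    with np.load(npz_path) as data:
        return int(data["y_train"].shape[1])


# The final layer could then be sized from the data, e.g. (Keras):
# Dense(infer_num_classes("data.npz"), activation="softmax")
```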
I work on Raster Vision at Azavea and this library seems to handle some of the same functionality. We create object detection training chips from GeoTIFFs and human-generated GeoJSON label files, and I'm thinking about how to extend label-maker to do this. Do you think this would be in the scope of label-maker, or is the focus just on generating labels from OSM data? Handling human-generated label files would make the library more general and usable with pre-existing benchmark datasets, for instance. I saw that there's an issue (#13) for handling GeoTIFFs instead of TMS URLs, which would be step one. The next step would be handling GeoJSON labels, or GeoTIFFs in the case of segmentation. Or maybe the right approach is to convert our data into vector tiles, which this library already handles.
Hello there,
I was able to run the building detection demo on my local CPU, but the procedure takes so long that I am wondering whether it is possible to run label-maker + the TensorFlow Object Detection API on AWS. Do you have any suggestions or a rough idea of how I could do that?
Regards
When downloading the mbtiles for united_states_of_america, label-maker download raises an Invalid argument error as follows:
File "/Users/xxx/xxxx/label-maker/label_maker/download.py", line 32, in download_mbtiles
w.write(r.read())
OSError: [Errno 22] Invalid argument
This happens after the mbtiles file appears to have been downloaded.
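`w.write(r.read())` reads the whole response into memory and writes it in one call. For a file as large as the United States QA tiles, one plausible cause of `Errno 22` is that single oversized write (on macOS, multi-gigabyte single writes have been known to fail with EINVAL). A minimal sketch that streams the response to disk in chunks instead; `download_to_file` is a hypothetical helper, and `response` is any file-like object with `.read()`, such as the one returned by `urllib.request.urlopen`:

```python
import shutil


def download_to_file(response, dest_path, chunk_size=16 * 1024 * 1024):
    """Copy a file-like HTTP response to disk in fixed-size chunks.

    Streaming avoids holding the whole file in memory and avoids issuing
    one very large write() call.
    """
    with open(dest_path, "wb") as w:
        shutil.copyfileobj(response, w, chunk_size)
```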
I'm trying to build a quick image classification model with mxnet and label-maker.
label-maker .mx-lenet.py to run with SageMaker;
rasterio/_base.c(522): fatal error C1083: Cannot open include file: "cpl_conv.h": No such file or directory
Given my understanding of how label-maker works, tile extents are static (and based on the mbtiles files), and with ml_type=='object-detection', objects that straddle tile boundaries will be split up in the training data. Ideally, the tile bounds would be generated dynamically so that each tile contains the entire object, which should help the model learn better. However, fixing this would probably complicate the implementation (which I think is elegant, btw), and might not be worth it if few objects straddle tiles, or if you want to be able to detect partial, clipped objects. I was just wondering if this is something you've considered.
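Short of generating tiles dynamically, a cheap diagnostic is to measure how many boxes were cut by a tile edge. A minimal sketch, assuming object-detection labels in the `[xmin, ymin, xmax, ymax, class_id]` pixel format used by the tf_records_generation.py snippet quoted later on this page, and 256 × 256 tiles; both helpers are hypothetical:

```python
def touches_tile_edge(bbox, tile_size=256):
    """True if a [xmin, ymin, xmax, ymax, class_id] box reaches a tile border.

    Such boxes are likely objects straddling two or more tiles that were
    cut off. Coordinates are pixels in [0, tile_size - 1].
    """
    xmin, ymin, xmax, ymax = bbox[:4]
    last = tile_size - 1
    return xmin <= 0 or ymin <= 0 or xmax >= last or ymax >= last


def clipped_fraction(bboxes, tile_size=256):
    """Fraction of boxes touching a tile edge (0.0 for an empty list)."""
    if not bboxes:
        return 0.0
    return sum(touches_tile_edge(b, tile_size) for b in bboxes) / len(bboxes)
```

A small fraction would support the "not worth it" side of the trade-off; a large one would argue for dynamic tile bounds.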
Using the Vietnam example ("bounding_box": [105.42,20.75,106.41,21.53]), I noticed that approximately 200 tiles (out of 2296 downloaded tiles) have very high cloud cover. At least 100 tiles appear to be completely white (100% cloud cover) and have the smallest file sizes (1.7 KB - 3.3 KB). Most image tiles without cloud cover are larger than 5.0 KB. Using these high-cloud-cover image tiles for training could potentially affect the classification accuracy. I know this problem is caused by the Mapbox satellite imagery, but I was wondering whether label-maker could somehow identify these high-cloud-cover tiles and exclude them from training.
BTW, I trained the Vietnam example on my 64-bit Linux machine (i7-4790 CPU, 32 GB RAM, NVIDIA EVGA GeForce GTX 1050 Ti GPU) using the resnet.py script. The training took approximately 76 minutes. The test accuracy is 83.22%.
Thank you for developing such a great tool! It makes my life much easier!
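The file-size pattern described above already makes a usable heuristic, since near-uniform white tiles compress to very small JPEGs. A minimal sketch that flags tiles below a size threshold; the 5 KB cutoff comes from the Vietnam observation and is not a general rule for other imagery or zoom levels, and `small_tiles` is hypothetical:

```python
from pathlib import Path


def small_tiles(tile_dir, max_bytes=5 * 1024, pattern="*.jpg"):
    """Return tile paths whose file size is below `max_bytes`.

    Very small JPEGs are often near-uniform images such as fully clouded
    tiles. Review the list before excluding anything from training.
    """
    return sorted(
        p for p in Path(tile_dir).glob(pattern) if p.stat().st_size < max_bytes
    )
```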
Hi folks!
Over at @observablehq we've been having fun using TensorFlow.js to run lightweight ML models in-browser, like in this recent example.
I'd love to tinker with the same idea using a map-trained model, and wonder if the project might include a TensorFlow SavedModel or Keras pre-trained model as a starting point, or as part of the detailed walk-throughs. Would this be in scope for the project, as a way to play around with the model's activations without needing to set up and train it from scratch?
label-maker has grown a lot in terms of options and examples, and all the new content has made it difficult to parse the README. We should move the docs and examples to one of the autogenerated docs platforms so they're better organized.
We first need to decide on a platform. Python offers some guidance here and the ReadTheDocs platform is also pretty popular. Any other good options?
I am running label-maker version 0.3.1.
When I run label-maker preview -n 3, the following error is thrown:
Traceback (most recent call last):
File "/opt/conda/bin/label-maker", line 11, in <module>
sys.exit(cli())
File "/opt/conda/lib/python3.6/site-packages/label_maker/main.py", line 87, in cli
preview(dest_folder=dest_folder, number=number, **config)
TypeError: preview() missing 1 required positional argument: 'imagery_offset'
I don't see anything in the README or the CLI help mentioning imagery_offset in regard to the preview command.
Some initial digging shows that preview() does require imagery_offset as a positional argument. I'm not sure why that isn't being populated.
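`preview(dest_folder=dest_folder, number=number, **config)` only passes the keys present in config.json, so a required `imagery_offset` parameter raises this TypeError whenever the config omits it. A minimal sketch of the usual fix, giving the optional setting a default; the signature below is hypothetical and only illustrates the pattern, it is not label-maker's actual preview():

```python
def preview(dest_folder, number, classes, imagery, ml_type,
            imagery_offset=False, **kwargs):
    """Hypothetical preview() signature.

    imagery_offset defaults to False (no offset), so configs written
    before the option existed keep working; other config keys land in
    kwargs instead of raising.
    """
    return {"dest_folder": dest_folder, "number": number,
            "imagery_offset": imagery_offset}


# A config without imagery_offset (URL is a placeholder).
config = {"classes": [], "imagery": "https://example.com/{z}/{x}/{y}.jpg",
          "ml_type": "classification", "country": "togo"}
result = preview(dest_folder="data", number=3, **config)
```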
label-maker preview -n 3 generates 4 example tiles for each class, not 3.
Probable fix (preview.py, line 60):
if n > number:
-> if n >= number:
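The reported count matches a classic off-by-one: with a counter starting at 0 and a `n > number` check before saving, the loop saves indices 0 through `number`, i.e. `number + 1` tiles. A minimal sketch, assuming preview.py's loop has this shape (the variable names mirror the report; the function itself is hypothetical):

```python
def tiles_to_preview(tiles, number, inclusive_bug=False):
    """Return the tiles a preview loop would save.

    With inclusive_bug=True the loop uses `n > number` (the reported
    behaviour); otherwise it uses the proposed `n >= number`.
    """
    saved = []
    for n, tile in enumerate(tiles):
        if (n > number) if inclusive_bug else (n >= number):
            break
        saved.append(tile)
    return saved
```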
Determining labels for each tile
---
Buildings: 0 features in 0 tiles
Total tiles: 462
Expand either the country or bounding_box property of config.json (or both) to allow for specifying multiple areas to pull training data from.
When I install the cloned package using python setup.py install, there were issues importing new submodules I was working on (i.e., Python couldn't find them). This happened as I was trying to create label_maker/utils/plot_utils.py, even though I put an empty __init__.py file in place and uninstalled/reinstalled the package. Without any changes to the code, pip install -e . worked just fine.
On my computer I use Python 2.7. I've installed Python 3 using brew and set up alias python=python3 in my ~/.bashrc. However, it throws an error 👇:
vir $ pip install label-maker
Collecting label-maker
Collecting mapbox-vector-tile==1.2.0 (from label-maker)
Collecting olefile==0.44 (from label-maker)
Collecting Pillow==4.3.0 (from label-maker)
Using cached Pillow-4.3.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Collecting numpy==1.13.3 (from label-maker)
Using cached numpy-1.13.3-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Collecting pyclipper==1.0.6 (from label-maker)
Using cached pyclipper-1.0.6-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Collecting Shapely==1.6.3 (from label-maker)
Using cached Shapely-1.6.3-cp27-cp27m-macosx_10_9_intel.macosx_10_9_x86_64.whl
Collecting pycurl==7.43.0.1 (from label-maker)
Collecting rasterio==1.0a12 (from label-maker)
Using cached rasterio-1.0a12-cp27-cp27m-macosx_10_9_intel.macosx_10_9_x86_64.whl
Collecting six==1.10.0 (from label-maker)
Using cached six-1.10.0-py2.py3-none-any.whl
Collecting mercantile==1.0.0 (from label-maker)
Using cached mercantile-1.0.0-py2-none-any.whl
Collecting pyproj==1.9.5.1 (from label-maker)
Requirement already satisfied: geojson==2.3.0 in /Library/Python/2.7/site-packages (from label-maker)
Requirement already satisfied: click==6.7 in /Library/Python/2.7/site-packages (from label-maker)
Requirement already satisfied: Cerberus==1.1 in /Library/Python/2.7/site-packages (from label-maker)
Collecting tilepie==0.2.1 (from label-maker)
Using cached tilepie-0.2.1-py2.py3-none-any.whl
Collecting protobuf==3.5.0.post1 (from label-maker)
Using cached protobuf-3.5.0.post1-py2.py3-none-any.whl
Collecting requests==2.11.0 (from label-maker)
Using cached requests-2.11.0-py2.py3-none-any.whl
Requirement already satisfied: humanize==0.5.1 in /Library/Python/2.7/site-packages (from label-maker)
Collecting mbutil==0.3.0 (from label-maker)
Using cached mbutil-0.3.0-py2.py3-none-any.whl
Collecting homura==0.1.5 (from label-maker)
Requirement already satisfied: setuptools in /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python (from mapbox-vector-tile==1.2.0->label-maker)
Collecting future (from mapbox-vector-tile==1.2.0->label-maker)
Requirement already satisfied: affine in /Library/Python/2.7/site-packages (from rasterio==1.0a12->label-maker)
Requirement already satisfied: snuggs>=1.4.1 in /Library/Python/2.7/site-packages (from rasterio==1.0a12->label-maker)
Requirement already satisfied: enum34 in /Library/Python/2.7/site-packages (from rasterio==1.0a12->label-maker)
Collecting attrs (from rasterio==1.0a12->label-maker)
Using cached attrs-17.4.0-py2.py3-none-any.whl
Requirement already satisfied: click-plugins in /Library/Python/2.7/site-packages (from rasterio==1.0a12->label-maker)
Requirement already satisfied: cligj in /Library/Python/2.7/site-packages (from rasterio==1.0a12->label-maker)
Requirement already satisfied: certifi in /Library/Python/2.7/site-packages (from homura==0.1.5->label-maker)
Requirement already satisfied: pyparsing in /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python (from snuggs>=1.4.1->rasterio==1.0a12->label-maker)
Installing collected packages: six, protobuf, future, Shapely, pyclipper, mapbox-vector-tile, olefile, Pillow, numpy, pycurl, attrs, rasterio, mercantile, pyproj, tilepie, requests, mbutil, homura, label-maker
Found existing installation: six 1.4.1
DEPRECATION: Uninstalling a distutils installed project (six) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
Uninstalling six-1.4.1:
Exception:
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/commands/install.py", line 342, in run
prefix=options.prefix_path,
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_set.py", line 778, in install
requirement.uninstall(auto_confirm=True)
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_install.py", line 754, in uninstall
paths_to_remove.remove(auto_confirm)
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_uninstall.py", line 115, in remove
renames(path, new_path)
File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/utils/__init__.py", line 267, in renames
shutil.move(old, new)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 302, in move
copy2(src, real_dst)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 131, in copy2
copystat(src, dst)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 103, in copystat
os.chflags(dst, st.st_flags)
OSError: [Errno 1] Operation not permitted: '/var/folders/6v/__4yg2p950382cycwndc2cf80000gn/T/pip-23SqDP-uninstall/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/six-1.4.1-py2.7.egg-info'
For very large areas with sparse data, it can be computationally burdensome to save this enormous file, much of which may be "thrown out" during the eventual training (if it is background). I'd like to add a --sparse option, which mimics the operation from images, to only save a handful of non-class (background) tiles.
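A minimal sketch of the sampling a --sparse option could perform: keep every tile that has a class and only a random subset of background tiles. The function name, the `is_background` predicate argument and the `background_ratio` parameter are all hypothetical:

```python
import random


def sparse_tiles(labels, is_background, background_ratio=1.0, seed=0):
    """Keep all non-background tiles plus a sample of background tiles.

    labels: mapping of tile id -> label.
    is_background: callable deciding whether a label is background.
    background_ratio: background tiles kept per class tile (1.0 keeps
    roughly as many background tiles as class tiles).
    """
    class_tiles = [t for t, lab in labels.items() if not is_background(lab)]
    background = [t for t, lab in labels.items() if is_background(lab)]
    n_keep = min(len(background), int(len(class_tiles) * background_ratio))
    rng = random.Random(seed)  # seeded so runs are reproducible
    return class_tiles + rng.sample(background, n_keep)
```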
In object detection and segmentation, we have to color bounding boxes and color fills once we get beyond binary labels. We should be colorblind-friendly when choosing those colors.
Options so far:
The most complete solution would be to find/write a function that takes number of colors as input, and spits back an idealized list of RGB/Hex vals. This might be overkill though -- a simpler solution would be to cycle through a fixed set of colors that are known to be color blind friendly (like Tableau 10 or something analogous). The latter solution is also nice because we won't add a dependency.
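A minimal sketch of the fixed-palette option. It swaps in the Okabe-Ito palette, a widely used colorblind-safe set of eight colors, as the "something analogous" to Tableau 10, and needs no new dependency; `class_colors` is a hypothetical helper, and beyond eight classes colors repeat:

```python
from itertools import cycle, islice

# Okabe-Ito palette (colorblind-safe). Black is listed last so it is only
# used after the other seven colors have been assigned.
OKABE_ITO = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
    "#000000",  # black
]


def hex_to_rgb(hex_color):
    """'#RRGGBB' -> (r, g, b) integers in 0-255."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def class_colors(n_classes):
    """RGB colors for n classes, cycling through the palette if n > 8."""
    return [hex_to_rgb(c) for c in islice(cycle(OKABE_ITO), n_classes)]
```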
I'm following the Example Use: A building detector with TensorFlow API, and ran into an error when trying to generate the training/eval data.
What is the top-level directory of the model you are using:
~/tensorflow/models/research/object_detection
OS Platform and Distribution: Linux Ubuntu 16.04, Elementary OS
TensorFlow installed from (source or binary): pip install tensorflow-gpu
TensorFlow version: v1.5.0
Python version: v3.6.0
CUDA/cuDNN version: CUDA 9.0, cudnn 7.0, nVidia driver: 384.111
GPU: nVidia GeForce GTX 1070 Ti 8GB memory
CPU: Intel x86-64 Intel Core i5-6400K @ 2.70GHz x 4, 16GB memory
Exact command to reproduce:
python tf_record_generation.py --label_input=labels.npz --train_rd_path=data/train_buildings.record --test_rd_path=data/test_buildings.record
Sample code / logs:
python tf_record_generation.py --label_input=labels.npz --train_rd_path=data/train_buildings.record --test_rd_path=data/test_buildings.record
You have 221 training tiles and 144 test tiles ready
Traceback (most recent call last):
File "tf_record_generation.py", line 172, in <module>
tf.app.run()
File "/home/xban/.pyenv/versions/general/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 124, in run
_sys.exit(main(argv))
File "tf_record_generation.py", line 143, in main
tf_example = create_tf_example(group, train_dir)
File "tf_record_generation.py", line 56, in create_tf_example
image = Image.open(encoded_jpg_io)
File "/home/xban/.pyenv/versions/general/lib/python3.6/site-packages/PIL/Image.py", line 2572, in open
% (filename if filename else fp))
OSError: cannot identify image file <_io.BytesIO object at 0x7fc87f2ec678>
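PIL raises this error when the bytes are not a decodable image, for example a zero-length file or an error page saved under a .jpg name. A minimal stdlib sketch that lists tiles not starting with the JPEG start-of-image marker (`FF D8 FF`), so they can be re-downloaded or dropped before records are written; `invalid_jpegs` is hypothetical, and a header check will not catch truncated JPEGs:

```python
from pathlib import Path

JPEG_MAGIC = b"\xff\xd8\xff"


def invalid_jpegs(tile_dir, pattern="*.jpg"):
    """Return .jpg files in tile_dir that do not start with a JPEG header."""
    bad = []
    for path in sorted(Path(tile_dir).glob(pattern)):
        with open(path, "rb") as f:
            if f.read(3) != JPEG_MAGIC:
                bad.append(path)
    return bad
```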
I was wondering what the thinking was behind choosing np.save/np.load instead of pickle. I was preparing data for a fairly small region and ended up with 156250 tiles in total, of which about 50000 are in one of my classes. Running label-maker images took a long time, so I started investigating: switching from .npz to a pickle means it starts downloading tiles more or less immediately. Before, I had to wait a very long time (minutes or more?) to get to the point where it starts downloading.
If this sounds like something you'd be open to I'd contribute a PR.
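A minimal sketch of the proposed alternative, assuming the labels are held as a dict mapping tile ids to labels (`np.savez(path, **labels)` stores each keyword as a separate member of the zip archive, whereas pickle writes the dict as one object). The helper names are hypothetical, and pickle files should only be loaded from trusted sources:

```python
import pickle


def save_labels(labels, path):
    """Write a {tile_id: label} dict as a single pickle."""
    with open(path, "wb") as f:
        pickle.dump(labels, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_labels(path):
    """Load a {tile_id: label} dict written by save_labels (trusted files only)."""
    with open(path, "rb") as f:
        return pickle.load(f)
```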
The current rasterization code for segmentation labels doesn't support any additional options. For starters, it would be nice if there were an option to buffer Points and LineStrings so that they had a width greater than 1 pixel.
Proposed syntax would look like this in config.json:
"classes": [
  { "name": "Roads", "filter": ["has", "highway"], "buffer": 3 }
]
where 3 is the number of pixels to buffer the geometry by before rasterizing.
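A minimal pure-Python sketch of what `"buffer": 3` could mean for a LineString: mark every pixel whose center lies within `buffer` pixels of the line. It assumes the geometry is already in tile pixel coordinates; label-maker would more likely use Shapely's `buffer()` with its existing rasterizer, so the helpers here are hypothetical and deliberately simple (and slow):

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project P onto AB and clamp the projection to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rasterize_buffered_line(coords, buffer, size=256):
    """Binary mask (list of rows) of pixels within `buffer` px of a polyline.

    coords: [(x, y), ...] in pixel coordinates; a single point is treated
    as a zero-length segment, so buffering a Point gives a disc.
    """
    segments = list(zip(coords, coords[1:])) or [(coords[0], coords[0])]
    mask = [[0] * size for _ in range(size)]
    for row in range(size):
        for col in range(size):
            cx, cy = col + 0.5, row + 0.5  # pixel center
            if any(point_segment_distance(cx, cy, *a, *b) <= buffer
                   for a, b in segments):
                mask[row][col] = 1
    return mask
```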
Create an example using non-API restricted imagery so users can start using the tool more easily:
Create a better folder structure to separate unit and validation tests, and use the unittest library for both.
I need to rework my https://github.com/jremillard/images-to-osm project to use Mapbox tiles. The problem that label-maker is attempting to solve is right at the center of the planned rework. I just wanted to communicate what label-maker would look like if it was a perfect fit for my needs.
The input data (training) to label-maker should be a set of GeoJSON files. There is a rich and mature existing infrastructure for generating them from OSM and other data sources. They are easy to write code against in any language. Let other tools deal with it.
Label maker config would be
The final output would be intermediate files (training, and validating), not the training images.
When the network is training, the intermediate files can be opened, and single images can be generated on the fly by a Python module. The module would handle either fetching and forming the training images or getting them from the sat image cache. It would stitch the sat images together, crop them correctly, and output bounding boxes, segmentation masks, and instance masks. Generating one image at a time would allow data sets that don't fit into memory to be used, keep performance good, and avoid violating sat image caching licensing restrictions.
If you want to be really nice to people, have an option to write out MS COCO files, since basically everyone is using that data set right now for benchmarking.
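A minimal sketch of the MS COCO option, converting object-detection labels in the `[xmin, ymin, xmax, ymax, class_id]` pixel format (as consumed by tf_records_generation.py) into a COCO-style annotation dict. COCO boxes are `[x, y, width, height]`; the input shape `{tile_id: [bbox, ...]}`, the `{tile_id}.jpg` file naming and both helper names are assumptions, and optional fields such as `segmentation` are omitted:

```python
import json


def to_coco(labels, class_names, tile_size=256):
    """Build a COCO-style dict from {tile_id: [[xmin, ymin, xmax, ymax, cls], ...]}.

    class_names: {class_id: name}. Image and annotation ids are assigned
    sequentially from 1. Values are cast to float/int so numpy scalars
    serialize to JSON.
    """
    images, annotations = [], []
    ann_id = 1
    for image_id, (tile, boxes) in enumerate(sorted(labels.items()), start=1):
        images.append({"id": image_id, "file_name": "{}.jpg".format(tile),
                       "width": tile_size, "height": tile_size})
        for xmin, ymin, xmax, ymax, cls in boxes:
            w, h = float(xmax - xmin), float(ymax - ymin)
            annotations.append({"id": ann_id, "image_id": image_id,
                                "category_id": int(cls),
                                "bbox": [float(xmin), float(ymin), w, h],
                                "area": w * h, "iscrowd": 0})
            ann_id += 1
    categories = [{"id": int(i), "name": n} for i, n in sorted(class_names.items())]
    return {"images": images, "annotations": annotations, "categories": categories}


def write_coco(path, labels, class_names, tile_size=256):
    """Write the COCO-style dict to a JSON file."""
    with open(path, "w") as f:
        json.dump(to_coco(labels, class_names, tile_size), f)
```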
I've been following the walkthrough found here (albeit with a smaller bounding box) and have started a SageMaker Notebook instance. The data.npz file is sitting in the sagemaker folder, and I have no problem reading it when running the relevant sections of mx_lenet_sagemaker.py in a new notebook on the instance. However, when I run the second cell of SageMaker_mx-lenet, I hit the following error:
ValueError: Error training sagemaker-mxnet-2018-07-08-18-12-13-217: Failed Reason: AlgorithmError: uncaught exception during training: 'NoneType' object has no attribute 'read'
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/container_support/training.py", line 36, in start
fw.train()
File "/usr/local/lib/python2.7/dist-packages/mxnet_container/train.py", line 191, in train
model = user_module.train(**kwargs_to_pass)
File "/opt/ml/code/mx_lenet_sagemaker.py", line 92, in train
train_iter, val_iter = prep_data(data_path)
File "/opt/ml/code/mx_lenet_sagemaker.py", line 14, in prep_data
data = np.load(find_file(data_path, 'data.npz'))
File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 402, in load
magic = fid.read(N)
AttributeError: 'NoneType' object has no attribute 'read'
After several hours of trying different fixes, I've had little to no luck debugging. Could you check that the example still runs fine when you try it?
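`'NoneType' object has no attribute 'read'` means `find_file(data_path, 'data.npz')` returned None, i.e. data.npz was not found under the training container's data path. A SageMaker training job runs in a separate container that only sees data from its input channels (copied from S3, typically under /opt/ml/input/data/<channel>), not files in the notebook's folder, so the file usually needs to be uploaded to S3 and passed to `fit()`. A minimal sketch of a `find_file` variant that fails loudly; the real helper in mx_lenet_sagemaker.py may differ, and this version stays Python 2.7 compatible because the traceback shows the container running 2.7:

```python
import os


def find_file(root, name):
    """Return the path of the first file called `name` under `root`.

    Raises IOError listing some of the files that *were* found, instead
    of returning None, so a missing input fails with a readable message.
    """
    seen = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            return os.path.join(dirpath, name)
        seen.extend(os.path.join(dirpath, f) for f in filenames)
    raise IOError("{!r} not found under {!r}; files present: {}".format(
        name, root, seen[:20]))
```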
Hello there,
Do you have any suggestions about this? I am running this on Ubuntu 16.04 with Python 3.5.
Hao
Right now, tile downloads happen in series (synchronously). This is quite slow for larger data sets. We should take advantage of improved Python 3.6 asyncio support or an existing library to parallelize the downloads.
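A minimal sketch of parallel downloads. It swaps in a thread pool from the standard library's concurrent.futures rather than asyncio, since the work is I/O-bound and the existing per-tile download code can be reused unchanged; `download_tile` stands in for whatever function currently fetches and saves one tile (hypothetical here):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(tiles, download_tile, max_workers=8):
    """Run download_tile(tile) for every tile using a thread pool.

    Returns (succeeded, failed); failures are collected rather than
    aborting the whole run, so they can be retried or reported.
    """
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_tile, t): t for t in tiles}
        for future in as_completed(futures):
            tile = futures[future]
            try:
                future.result()
                succeeded.append(tile)
            except Exception as exc:  # keep going; report at the end
                failed.append((tile, exc))
    return succeeded, failed
```

Keeping `max_workers` modest is kinder to imagery providers' rate limits than spawning one request per tile.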
The test for whether a tile is background is whether it matches the zero/background class:
class_match(ml_type, tile_results[k], 0)
For classification and object detection this works fine; however, for segmentation it returns True for every tile that contains at least one background pixel, i.e. every tile that isn't completely covered by features.
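For segmentation, a tile should only count as background when no pixel carries a class. A minimal sketch of that test, assuming segmentation labels are 2-D integer masks in which 0 means background; `is_background_mask` is hypothetical and would replace the `class_match(..., 0)` call only in the segmentation branch:

```python
import numpy as np


def is_background_mask(mask):
    """True only if every pixel in the label mask is background (0)."""
    return not np.any(np.asarray(mask))
```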
When creating segmentation labels, for each matching OSM feature in a tile, we convert the coordinates to pixel coordinates and then rasterize. Unfortunately, because some coordinates lie slightly outside the tile (even though the feature is partially inside), clamping the coordinates to the tile's pixel bounds creates some odd visual artifacts.
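Clamping moves each outside vertex onto the nearest tile edge independently, which can bend slanted edges along the border. Clipping the polygon against the tile rectangle keeps the in-tile part of every edge intact. A minimal sketch using the Sutherland-Hodgman algorithm, swapped in here for illustration; in label-maker, Shapely's `intersection()` with the tile box, or rasterizing the unclamped coordinates and letting the rasterizer drop out-of-range pixels, may be more practical:

```python
def clip_polygon(points, xmin, ymin, xmax, ymax):
    """Clip a polygon (list of (x, y)) to an axis-aligned rectangle.

    Sutherland-Hodgman: clip successively against each of the four edges.
    Returns the clipped vertex list (empty if fully outside).
    """
    def clip(pts, inside, intersect):
        out = []
        for i, cur in enumerate(pts):
            prev = pts[i - 1]  # wraps to the last vertex for i == 0
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))
        return out

    def at_x(x):
        def f(p, q):  # p and q straddle x, so q[0] != p[0]
            t = (x - p[0]) / (q[0] - p[0])
            return (x, p[1] + t * (q[1] - p[1]))
        return f

    def at_y(y):
        def f(p, q):  # p and q straddle y, so q[1] != p[1]
            t = (y - p[1]) / (q[1] - p[1])
            return (p[0] + t * (q[0] - p[0]), y)
        return f

    pts = list(points)
    for inside, intersect in (
        (lambda p: p[0] >= xmin, at_x(xmin)),
        (lambda p: p[0] <= xmax, at_x(xmax)),
        (lambda p: p[1] >= ymin, at_y(ymin)),
        (lambda p: p[1] <= ymax, at_y(ymax)),
    ):
        if not pts:
            break
        pts = clip(pts, inside, intersect)
    return pts
```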
Hello there,
I tried the example "A building detector with TensorFlow API" for Mexico City, following all the default settings for the bounding box and the training model. I got my model trained as well as the test results. My question is about this sentence from the blog page: "There are 227 buildings in the test dataset, and 191 buildings are predicted correctly by the model (84%)."
I was wondering whether this "84%" refers to precision. If not, what does this number mean? I trained my own model for around 20000 steps and selected one of the checkpoints in the "training" folder, but got a precision of only around 20%. How should we select the best checkpoint during training? Or do we just monitor the loss and pick a checkpoint manually?
Best Regards,
Hao
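Reading the blog's sentence literally, 191 correct out of 227 ground-truth buildings is the share of true objects that were found, which is recall rather than precision; precision would also need the number of false-positive detections, which the sentence does not give. The arithmetic, with hypothetical helper names:

```python
def recall(true_positives, false_negatives):
    """Share of ground-truth objects that were detected."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives, false_positives):
    """Share of detections that match a ground-truth object."""
    return true_positives / (true_positives + false_positives)


# Blog example: 227 buildings in the test set, 191 detected correctly.
blog_recall = recall(191, 227 - 191)  # 191 / 227, about 0.841
```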
I ran into problems when testing the two building classification examples (Tanzania and Vietnam) on my Linux machine. All label-maker commands (i.e., download, labels, preview, images) worked fine except the label-maker package command.
For the Vietnam example, the label-maker images command did download 2296 tiles. However, when using the label-maker package command, I got the following errors. I checked the filenames of the tiles shown in the error list below and found that they are not among the 2296 tiles downloaded. I had similar issues with the Tanzania example.
Any help will be appreciated!
************************************* errors *******************************************
(py36) qiusheng@office:/tmp$ label-maker images --dest Vietnam_building --config Vietnam.json
Downloading 2296 tiles to Vietnam_building/tiles
(py36) qiusheng@office:/tmp$ label-maker package --dest Vietnam_building --config Vietnam.json
Couldn't open Vietnam_building/tiles/103935-57783-17.jpg, skipping
Couldn't open Vietnam_building/tiles/104132-57663-17.jpg, skipping
Couldn't open Vietnam_building/tiles/103998-57519-17.jpg, skipping
Couldn't open Vietnam_building/tiles/104007-57744-17.jpg, skipping
Couldn't open Vietnam_building/tiles/103990-57746-17.jpg, skipping
Couldn't open Vietnam_building/tiles/103973-57637-17.jpg, skipping
Couldn't open Vietnam_building/tiles/104061-57529-17.jpg, skipping
Couldn't open Vietnam_building/tiles/103977-57720-17.jpg, skipping
Couldn't open Vietnam_building/tiles/104009-57794-17.jpg, skipping
Couldn't open Vietnam_building/tiles/104224-57747-17.jpg, skipping
......
Couldn't open Vietnam_building/tiles/104194-57708-17.jpg, skipping
Couldn't open Vietnam_building/tiles/104265-57705-17.jpg, skipping
Traceback (most recent call last):
File "/home/qiusheng/anaconda3/envs/py36/bin/label-maker", line 11, in <module>
sys.exit(cli())
File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/site-packages/label_maker/main.py", line 87, in cli
package_directory(dest_folder=dest_folder, **config)
File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/site-packages/label_maker/package.py", line 61, in package_directory
img = Image.open(image_file)
File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/site-packages/PIL/Image.py", line 2572, in open
% (filename if filename else fp))
OSError: cannot identify image file 'Vietnam_building/tiles/104173-57771-17.jpg'
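Since the skipped filenames were not among the downloaded tiles, a quick way to see the mismatch is to compare the tile ids the labels expect with the files actually on disk. A minimal sketch, assuming tiles are saved as `{x}-{y}-{z}.jpg` (as in the log) and that label tile ids use the same `x-y-z` form; `missing_tiles` is hypothetical:

```python
from pathlib import Path


def missing_tiles(tile_ids, tile_dir, ext=".jpg"):
    """Return tile ids with no non-empty image file in tile_dir."""
    tile_dir = Path(tile_dir)
    missing = []
    for tile_id in tile_ids:
        path = tile_dir / "{}{}".format(tile_id, ext)
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(tile_id)
    return missing
```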
Hello there,
I'm opening this new issue because I have some questions about the train/test split step in this example. For the Mexico demo area, the area of interest consists of 227 tiles in total, so if my understanding is right, these 227 tiles should be split into 181 training samples and 46 test samples (using a 0.8 split index). But the output of tf_records_generation.py from the example does not seem to match this.
I had a look at the code for tf_records_generation.py:
for tile in tiles:
    bboxes = labels[tile].tolist()
    width = 256
    height = 256
    if bboxes:
        for bbox in bboxes:
            if bbox[4] == 1:
                cl_str = "building"
                bbox = [max(0, min(255, x)) for x in bbox[0:4]]
                y = ["{}.jpg".format(tile), width, height, cl_str, bbox[0], bbox[1], bbox[2], bbox[3]]
                tf_tiles_info.append(y)
split_index = int(len(tf_tiles_info) * 0.8)
column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
df = pd.DataFrame(tf_tiles_info, columns=column_name)
# shuffle the dataframe
df = df.sample(frac=1)
train_df = df[:split_index]
test_df = df[split_index:]
split_index = int(len(tf_tiles_info) * 0.8): this splits by the number of building features, right? In my view, we have to split the tiles first, and then the building features should follow the tiles they belong to. Otherwise, the test tiles cannot be guaranteed to be independent of the training tiles, and the test results become meaningless. If I made a mistake or misunderstood something, please feel free to point it out.
Best,
Hao
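A minimal sketch of the tile-level split suggested above: shuffle the unique tile filenames, split those 80/20, then assign every box row to the split its tile belongs to, so no tile contributes boxes to both sets. It operates on the same row format as `tf_tiles_info` (filename first); the function name is hypothetical, and with pandas the final filter would be `df[df['filename'].isin(train_files)]`:

```python
import random


def split_by_tile(rows, train_fraction=0.8, seed=0):
    """Split rows like tf_tiles_info into train/test with no shared tiles.

    rows: lists whose first element is the tile filename.
    """
    filenames = sorted({row[0] for row in rows})
    random.Random(seed).shuffle(filenames)
    split_index = int(len(filenames) * train_fraction)
    train_files = set(filenames[:split_index])
    train = [row for row in rows if row[0] in train_files]
    test = [row for row in rows if row[0] not in train_files]
    return train, test
```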
When downloading images from a GeoTIFF, we can apply an offset to our "false tiles" in case the imagery isn't aligned with OpenStreetMap data. This option may help address part of #31 as well
Currently, if you run through the label-maker README example with the togo country config.json, running in order:
label-maker download
label-maker labels
label-maker preview -n 10
label-maker images
label-maker package
The last step produces a warning for a missing file:
Saving QA tiles to data/togo.mbtiles
100% 26.6 MiB 1.9 MiB/s 0:00:00 ETA
Retiling QA Tiles to zoom level 12 (takes a bit)
174704 features, 10870707 bytes of geometry, 4 bytes of separate metadata, 2597853 bytes of string pool
99.9% 12/2060/1976
Determining labels for each tile
---
Roads: 11 tiles
Buildings: 9 tiles
Total tiles: 12
Writing out labels to data/labels.npz
Writing example images to data/examples
Downloading at most 10 tiles for class Roads
Downloading at most 10 tiles for class Buildings
Downloading 11 tiles to data/tiles
Couldn't open data/tiles/2063-1978-12.jpg, skipping
Saving packaged file to data/data.npz
Admittedly I haven't done enough digging yet, but it looks like there may be an off-by-one error here.
Right now the process of obtaining and preparing OSM QA tiles is a bit slow because we:
tippecanoe-decode + stream-filter.js to create a GeoJSON of all features in just the desired bounding box.
tippecanoe to the desired zoom level.
In theory, this would be possible to do much faster (although with more dependencies) if there were a way to fetch arbitrary tiles (or bounding boxes) from OSM QA data through another method (possibly a third-party service).
This is part bug, part standardization, and part feature request. Right now preview doesn't work with TIF extensions. We also use two different lists for deciding whether an imagery source should be handled differently. Finally, we should add .vrt to this list because we can read from Virtual Rasters just like GeoTIFFs.
My recommended approach would be adding a list or function in utils.py and importing that in other scripts.
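A minimal sketch of the shared helper proposed above, with `.vrt` included. It checks only the path component of a URL, so query strings such as `?token=...` do not hide the extension. `TIF_EXTENSIONS` and `is_tif` are hypothetical names, not existing label-maker functions:

```python
import os
from urllib.parse import urlparse

# Imagery that should be read with rasterio instead of fetched as TMS tiles.
TIF_EXTENSIONS = (".tif", ".tiff", ".vrt")


def is_tif(imagery):
    """True if an imagery path or URL points at a GeoTIFF or VRT."""
    path = urlparse(imagery).path or imagery
    return os.path.splitext(path)[1].lower() in TIF_EXTENSIONS
```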
I am attempting to use label-maker images with a COG download endpoint. The tiles that are downloaded have no extension (and are pretty big). I believe this is because the COG is not being identified as a TIF.
An example of the download endpoint url is:
'https://api.planet.com/data/v1/download?token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJRanpYU2t0cENBUjJKR0tZQmdXQ0k4SGtKMlkwY3VPTXdtbVNqRW1xQXlxRVBIYmNDZVNPTXBKd3dYQUFFdDF6MENYbng4bVVXMlVUNG1ubmxDUERtdz09IiwiaXRlbV90eXBlX2lkIjoiUFNPcnRob1RpbGUiLCJ0b2tlbl90eXBlIjoidHlwZWQtaXRlbSIsImV4cCI6MTUyNDg3NzIyMiwiaXRlbV9pZCI6Ijc2MDgxOF80ODQ4NzE4XzIwMTctMDktMTdfMGUyZiIsImFzc2V0X3R5cGUiOiJ2aXN1YWwifQ.W1Zcqw4MJ2A5anAuzkYW5UWS0C8jqjccg62YCSj7mjFtGTCCUsMOZ9HxhUxg9_KpLzt8_GGXn0YHdnvCLxqKew'
The function that determines whether an image is a TIF (here) looks at the extension of the imagery entry in the config file. The URL above, which is the imagery entry in my config file, has no extension, so the image is not identified as a TIF. Because of that, it is downloaded as a TMS (here), which again looks for an extension (and finds none) and applies it to the downloaded tile.
Is there a way to force label-maker images to download my COG from the endpoint as a TIF?
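Since the URL alone can't say whether the endpoint returns a GeoTIFF, one option is to inspect the first bytes of the response: every TIFF (including COGs and BigTIFFs) starts with a fixed byte-order marker and version number. This is a minimal sketch with hypothetical function names; fetching those header bytes (for example with an HTTP Range request, assuming the server honors it) is left to the caller.

```python
import os
from urllib.parse import urlparse

# Byte-order mark + version for classic TIFF (42) and BigTIFF (43),
# little-endian ("II") and big-endian ("MM").
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")


def looks_like_tiff(header):
    """True if the first bytes of a file match a TIFF/BigTIFF signature."""
    return header[:4] in TIFF_SIGNATURES


def is_tif_source(imagery, header=None):
    """Prefer the file's own bytes; fall back to the URL path's extension."""
    if header is not None:
        return looks_like_tiff(header)
    ext = os.path.splitext(urlparse(imagery).path)[1].lower()
    return ext in (".tif", ".tiff")
```

Sniffing the content sidesteps the missing extension entirely, at the cost of one small extra request per imagery source.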
This fails:
~/Builds/label-maker/examples $ label-maker download
It breaks on this relative file path here.
This works:
~/Builds/label-maker $ label-maker download
We should at least add something to the docs, but ideally fix it so it doesn't depend on the current working directory.
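One way to make this agnostic of the working directory is to anchor relative paths at a known directory (the package directory via __file__, or the directory holding the config file) instead of the cwd. A minimal sketch; resolve_path is a hypothetical helper, not an existing label-maker function.

```python
from pathlib import Path


def resolve_path(path, base_dir):
    """Return `path` unchanged if absolute, else anchor it at `base_dir`
    instead of the current working directory."""
    p = Path(path).expanduser()
    return p if p.is_absolute() else (Path(base_dir) / p).resolve()


# Hypothetical usage inside a package module, so the lookup works from any cwd:
#   script = resolve_path("stream-filter.js", Path(__file__).parent)
```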
I am attempting to install label-maker on Ubuntu and the following error message came up. Are the dependencies compatible with Linux systems?
Using cached https://files.pythonhosted.org/packages/77/d9/d272b38e6e25d2686e22f6058820298dadead69340b1c57ff84c87ef81f0/pycurl-7.43.0.1.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 104, in configure_unix
stdout=subprocess.PIPE, stderr=subprocess.PIPE)
File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1344, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'curl-config': 'curl-config'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 841, in <module>
ext = get_extension(sys.argv, split_extension_source=split_extension_source)
File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 508, in get_extension
ext_config = ExtensionConfiguration(argv)
File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 72, in __init__
self.configure()
File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 108, in configure_unix
raise ConfigurationError(msg)
__main__.ConfigurationError: Could not run curl-config: [Errno 2] No such file or directory: 'curl-config': 'curl-config'
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
from apport.fileutils import likely_packaged, get_recent_crashes
File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
from apport.report import Report
File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
import apport.fileutils
File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
from apport.packaging_impl import impl as packaging
File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 23, in <module>
import apt
File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'
Original exception was:
Traceback (most recent call last):
File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 104, in configure_unix
stdout=subprocess.PIPE, stderr=subprocess.PIPE)
File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1344, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'curl-config': 'curl-config'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 841, in <module>
ext = get_extension(sys.argv, split_extension_source=split_extension_source)
File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 508, in get_extension
ext_config = ExtensionConfiguration(argv)
File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 72, in __init__
self.configure()
File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 108, in configure_unix
raise ConfigurationError(msg)
__main__.ConfigurationError: Could not run curl-config: [Errno 2] No such file or directory: 'curl-config': 'curl-config'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-_pmpop_f/pycurl/
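The traceback shows pip building pycurl from source, and pycurl's setup.py failing because the curl-config program is not on the PATH. curl-config ships with the libcurl development headers (on Ubuntu, for example, the libcurl4-openssl-dev package), which suggests a missing system package rather than a Linux incompatibility. A small pre-flight check along these lines (a sketch; the function name is hypothetical) surfaces that before pip fails:

```python
import shutil
import sys


def check_curl_config():
    """Return True if curl-config is on PATH (pycurl's source build needs it)."""
    if shutil.which("curl-config"):
        return True
    print(
        "curl-config not found; install the libcurl development headers "
        "(e.g. `sudo apt-get install libcurl4-openssl-dev` on Ubuntu) "
        "and re-run pip.",
        file=sys.stderr,
    )
    return False
```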
Ideally, run this example on Google's cloud. TPUs (Google's accelerated hardware) don't look ready for prime time yet, but I've signed up for alerts.
In the pylintrc, we should tell pylint to ignore some of the warnings we've been frequently overriding by hand and are okay with ignoring.
So far, I'm thinking these:
Creating a data.npz file may result in too large a file for some projects. At that point it may be better to read the training data in with a generator (example: Keras flow_from_directory). The easiest way to do this now would be to skip the final step (label-maker package) and read directly from labels.npz and the tiles directory. We should create either (1) an example showing how to do this or (2) additional scripts that do this for the user depending on the configuration (or a combination of the two).
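A minimal sketch of the generator approach, reading labels.npz and the tiles directory directly instead of running label-maker package. It assumes each key in labels.npz is a tile id whose image is saved as {x}-{y}-{z}.jpg (as with 2063-1978-12.jpg in the download log), and that Pillow is available to decode JPEGs; tile_batches and load_image are hypothetical names. It loops over the data indefinitely, the style Keras-type training loops usually expect from a generator.

```python
import os

import numpy as np


def tile_batches(labels_path, tiles_dir, batch_size=32, ext="jpg",
                 load_image=None, shuffle=True, seed=None):
    """Yield (images, labels) batches forever, straight from labels.npz + tiles/.

    Hypothetical helper: assumes each key in labels.npz is a tile id such as
    "2063-1978-12" and that its image is saved as {tiles_dir}/{id}.{ext}.
    """
    if load_image is None:
        from PIL import Image  # assumption: Pillow is installed

        def load_image(path):
            return np.asarray(Image.open(path))

    labels = np.load(labels_path)
    # Skip tiles whose image never made it to disk (downloads can fail).
    keys = [k for k in labels.files
            if os.path.isfile(os.path.join(tiles_dir, f"{k}.{ext}"))]
    if not keys:
        raise ValueError("no labelled tiles found in " + tiles_dir)

    rng = np.random.default_rng(seed)
    while True:  # one pass over `keys` per epoch, repeated indefinitely
        order = rng.permutation(len(keys)) if shuffle else np.arange(len(keys))
        for start in range(0, len(keys), batch_size):
            batch = [keys[i] for i in order[start:start + batch_size]]
            images = np.stack([load_image(os.path.join(tiles_dir, f"{k}.{ext}"))
                               for k in batch])
            targets = np.stack([labels[k] for k in batch])
            yield images, targets


# Example: gen = tile_batches("data/labels.npz", "data/tiles", batch_size=16)
```

Only one batch of images is held in memory at a time, which is the point of skipping data.npz.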
label-maker package fails with an error:
Traceback (most recent call last):
File "/Users/xxxx/.virtualenvs/ml_data_gen_py3/bin/label-maker", line 11, in
load_entry_point('label-maker', 'console_scripts', 'label-maker')()
File "/Users/xxx/Documents/Development_Seed/Sat_Deeptrain/label-maker/label_maker/main.py", line 87, in cli
package_directory(dest_folder=dest_folder, **config)
File "/Users/xxx/Documents/Development_Seed/Sat_Deeptrain/label-maker/label_maker/package.py", line 48, in package_directory
for tile in tiles.files:
AttributeError: 'numpy.ndarray' object has no attribute 'files'
The original code there is:
if ml_type == 'object-detection':
    max_features = 0
    for tile in tiles.files:
        features = len(tiles[tile][0])
        if features > max_features:
            max_features = features
and I fixed it with:
if ml_type == 'object-detection':
    max_features = 0
    for tile in labels.files:
        features = len(labels[tile])
        if features > max_features:
            max_features = features
It seems to work after the fix, if I understood the workflow correctly.
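The loop computes max_features, presumably so that tiles with different numbers of boxes can be padded into one rectangular array. The sketch below shows that idea end to end; pad_boxes, the fill value of -1, and the 5-column box layout (assumed here to be [xmin, ymin, xmax, ymax, class_index]) are illustrative assumptions, not a description of what package.py actually does.

```python
import numpy as np


def max_box_count(labels):
    """Largest number of boxes on any single tile (same idea as the fixed loop)."""
    return max((len(labels[tile]) for tile in labels.files), default=0)


def pad_boxes(labels, n_cols=5, fill=-1):
    """Stack per-tile box arrays into one (n_tiles, max_boxes, n_cols) array.

    Hypothetical: assumes every box row has n_cols values; tiles with fewer
    boxes than the maximum are padded with `fill`.
    """
    tiles = list(labels.files)
    out = np.full((len(tiles), max_box_count(labels), n_cols), fill, dtype=float)
    for i, tile in enumerate(tiles):
        # reshape handles tiles with no boxes (an empty array) uniformly
        boxes = np.asarray(labels[tile], dtype=float).reshape(-1, n_cols)
        out[i, :len(boxes)] = boxes
    return tiles, out
```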
In some places we force the dtype to be uint8 (i.e., integers on the interval [0, 255]). See package.py for an example.
In some situations, we might want to package TIFs or raw satellite imagery where the data either isn't made up of integers or doesn't fall in the [0, 255] range. For example, it might be pre-scaled to [0, 1] or extend beyond the standard range of RGB values.
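The risk of a forced cast is easy to see in NumPy: astype(np.uint8) truncates in-range floats toward zero and wraps out-of-range integers modulo 256, so [0, 1]-scaled data collapses to 0s and 1s, and an integer value of 300 becomes 44. A minimal sketch of an opt-in dtype follows; stack_tiles and its dtype=None default are hypothetical, not current label-maker behavior.

```python
import numpy as np


def stack_tiles(tiles, dtype=None):
    """Stack tile arrays, keeping their native dtype unless one is requested.

    Hypothetical option: dtype=None preserves floats or >8-bit integers;
    passing np.uint8 reproduces the current forced cast.
    """
    arr = np.stack([np.asarray(t) for t in tiles])
    return arr if dtype is None else arr.astype(dtype)


# Two tiny "tiles" already scaled to [0, 1]
scaled = [np.array([[0.25, 0.9]]), np.array([[0.5, 1.0]])]
forced = stack_tiles(scaled, dtype=np.uint8)  # values truncated to 0 and 1
kept = stack_tiles(scaled)                    # stays float64
```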
I tried to create labels for Russia:
{
"country": "russia",
"bounding_box": [36.72353152233891, 55.3729665317045, 36.74095515209966, 55.38046857198313],
"zoom": 12,
"classes": [
{ "name": "Buildings", "filter": ["has", "building"] }
],
"imagery": "http://a.tiles.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.jpg?access_token=TOKEN",
"background_ratio": 1,
"ml_type": "classification"
}
and for the US:
{
"country": "united_states_of_america",
"bounding_box": [-99.79180928271484,32.42732216399054,-99.67628117602538,32.51812432193046],
"zoom": 12,
"classes": [
{ "name": "Buildings", "filter": ["has", "building"] }
],
"imagery": "http://a.tiles.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.jpg?access_token=TOKEN",
"background_ratio": 1,
"ml_type": "classification"
}
When running label-maker download, the process seems endless.
I left it running overnight for Russia; in the morning it was still active with no results (the GeoJSON file was 0 bytes).