Some command line examples
- Train on GPU:
python main.py --training_file_pattern=/coco_tfrecord/train* --model_name=efficientdet-d0 --model_dir=/tmp/efficientdet-d0/ --hparams="use_bfloat16=false" --use_tpu=False
- Eval on GPU:
# Assuming /tmp/efficientdet-d0/ contains your checkpoint.
python main.py --mode=eval --model_name=efficientdet-d0 --model_dir=/tmp/efficientdet-d0/ --validation_file_pattern=/coco_tfrecord/val* --val_json_file=/coco_tfrecord/instances_val2017.json --hparams="use_bfloat16=false" --use_tpu=False
- Run inference on a single image:
# pip install pytype pycocotools
python model_inspect.py --runmode=infer --model_name=efficientdet-d0 --ckpt_path=/tmp/efficientdet-d0/ --input_image=/tmp/img1.jpg --output_image_dir=/tmp/det1/
I will add a tutorial colab soon.
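If you would rather drive these runs from Python, the same flags can be assembled into an argument list and passed to subprocess. This is a minimal sketch, not part of the repo: build_cmd and launch are hypothetical helpers, the flags are copied from the commands above, and the /tmp and /coco_tfrecord paths are placeholders (point model_dir for training and evaluation at the same directory). It needs Python 3.8+ for shlex.join.

```python
import shlex
import subprocess


def build_cmd(script, **flags):
    """Return an argv list like ["python", "main.py", "--key=value", ...].

    Hypothetical helper. Booleans render as True/False, matching
    "--use_tpu=False" in the shell examples.
    """
    argv = ["python", script]
    for key, value in flags.items():
        argv.append(f"--{key}={value}")
    return argv


def launch(argv):
    """Run one command. A list argv bypasses the shell, so "train*" is passed through as-is."""
    return subprocess.run(argv, check=True)


# Train on GPU (same flags as the shell example; paths are placeholders).
train_cmd = build_cmd(
    "main.py",
    training_file_pattern="/coco_tfrecord/train*",
    model_name="efficientdet-d0",
    model_dir="/tmp/efficientdet-d0/",
    hparams="use_bfloat16=false",
    use_tpu=False,
)

# Eval on GPU, assuming /tmp/efficientdet-d0/ holds the checkpoint.
eval_cmd = build_cmd(
    "main.py",
    mode="eval",
    model_name="efficientdet-d0",
    model_dir="/tmp/efficientdet-d0/",
    validation_file_pattern="/coco_tfrecord/val*",
    val_json_file="/coco_tfrecord/instances_val2017.json",
    hparams="use_bfloat16=false",
    use_tpu=False,
)

# Inference on a single image.
infer_cmd = build_cmd(
    "model_inspect.py",
    runmode="infer",
    model_name="efficientdet-d0",
    ckpt_path="/tmp/efficientdet-d0/",
    input_image="/tmp/img1.jpg",
    output_image_dir="/tmp/det1/",
)

if __name__ == "__main__":
    for cmd in (train_cmd, eval_cmd, infer_cmd):
        print(shlex.join(cmd))  # shell-quoted, copy-pasteable form
    # To actually start training (from the efficientdet/ directory):
    # launch(train_cmd)
```

Passing a list instead of one command string keeps the shell out of the picture, so the quotes around use_bfloat16=false are not needed and nothing gets re-split or glob-expanded.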
If I only have GPUs, can I use the trained weights provided by this project to test on my own pictures?
@mingxingtan
The flag "--use_tpu=False" uses the CPU instead of the TPU for training, and it is very slow.
Do we need to change some code to train EfficientDet on GPU?
And Edge TPU (Coral)? Also, will it be available on TF Hub?
@liminghuiv That was a wrong comment, and I have just fixed it. Estimator automatically uses a GPU if you have one; otherwise it uses the CPU.
@kaikaizhu, I guess you can, according to https://cloud.google.com/tpu/docs/using-estimator-api:
Models written using TPUEstimator work across CPUs, GPUs, single TPU devices, and whole TPU pods, generally with no code changes.
And TPUEstimator is used in this repo:
https://github.com/google/automl/blob/master/efficientdet/main.py#L239
I am also interested in training on GPU. Is there a tutorial? Thanks a lot.
@mingxingtan Hi, I want to run detection on a list of images given in a 'txt' file. I changed the code of build_input, but it still fails in post-processing. Because the inference batch size is 1, when I send all the images to the model it still treats them as one batch, and then the number of anchors becomes bigger than the index...
So, could you please publish inference code that converts a ckpt to a pb and runs inference with the pb model on multiple images?
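Until multi-image inference is available, one workaround that respects the batch-size-1 limitation described above is to read the paths from the txt file and feed the images to the existing single-image inference path one at a time. This is a minimal sketch, not the repo's code: it assumes one image path per line, read_image_list and iter_batches are hypothetical helpers, and the actual per-image inference call is left to you.

```python
from pathlib import Path
from typing import Iterator, List


def read_image_list(list_file: str) -> List[str]:
    """Read one image path per line from a txt file, skipping blank lines (hypothetical helper)."""
    lines = Path(list_file).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def iter_batches(paths: List[str], batch_size: int = 1) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size paths.

    The default of 1 matches the inference batch size described above; larger
    values only make sense once batched post-processing is supported.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]
```

For example, loop with for (path,) in iter_batches(read_image_list("images.txt")) and call your single-image inference (such as the model_inspect.py --runmode=infer flow) on each path.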
Hi @mingxingtan, thanks for taking a look at it. I checked the efficientdet/main.py code:
line 53:
flags.DEFINE_bool('use_tpu', True, 'Use TPUs rather than CPUs')
It seems that it will use the CPU instead of the GPU if we set use_tpu to False.
Hi @mingxingtan
Thanks so much for sharing your fantastic work.
I have a similar problem: the TPU estimator does not train on my GPU.
I am using TensorFlow 2.0.0 (a Docker image from the official TF Docker Hub).
Although the system has a V100 GPU, it still trains only on the CPU.
Could you give me some tips?
Thank you.
I0529 03:07:38.554332 139669136197440 main.py:383] {'name': 'efficientdet-d0', 'act_type': 'swish', 'image_size': (512, 512), 'input_rand_hflip': True, 'train_scale_min': 0.1, 'train_scale_max': 2.0, 'autoaugment_policy': None, 'use_augmix': False, 'augmix_params': (3, -1, 1), 'num_classes': 20, 'skip_crowd_during_training': True, 'label_id_mapping': None, 'min_level': 3, 'max_level': 7, 'num_scales': 3, 'aspect_ratios': [(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)], 'anchor_scale': 4.0, 'is_training_bn': True, 'momentum': 0.9, 'optimizer': 'sgd', 'learning_rate': 0.08, 'lr_warmup_init': 0.008, 'lr_warmup_epoch': 1.0, 'first_lr_drop_epoch': 200.0, 'second_lr_drop_epoch': 250.0, 'poly_lr_power': 0.9, 'clip_gradients_norm': 10.0, 'num_epochs': 18000, 'data_format': 'channels_last', 'alpha': 0.25, 'gamma': 1.5, 'delta': 0.1, 'box_loss_weight': 50.0, 'iou_loss_type': None, 'iou_loss_weight': 1.0, 'weight_decay': 4e-05, 'strategy': '', 'precision': None, 'box_class_repeats': 3, 'fpn_cell_repeats': 3, 'fpn_num_filters': 64, 'separable_conv': True, 'apply_bn_for_resampling': True, 'conv_after_downsample': False, 'conv_bn_act_pattern': False, 'use_native_resize_op': True, 'pooling_type': None, 'fpn_name': None, 'fpn_weight_method': None, 'fpn_config': None, 'survival_prob': None, 'lr_decay_method': 'cosine', 'moving_average_decay': 0.9998, 'ckpt_var_scope': None, 'var_exclude_expr': '.*/class-predict/.*', 'backbone_name': 'efficientnet-b0', 'backbone_config': None, 'var_freeze_expr': None, 'resnet_depth': 50, 'model_name': 'efficientdet-d0', 'iterations_per_loop': 100, 'model_dir': '../output/exp-001-baseline-d0', 'num_shards': 1, 'num_examples_per_epoch': 2000, 'backbone_ckpt': '/home/appuser/project/pretrained/efficientnet-b0', 'ckpt': None, 'val_json_file': None, 'testdev_dir': None, 'mode': 'train_and_eval', 'DATA_CONF': {'CATEGORIES_IN_RANGE': ['calculus_tooth', 'tooth-decay', 'tooth-whitespot', 'gum-gingivitis', 'stain-tooth-external', 'stain-tooth-internal'], 'EVAL_SCOPE': 
['calculus_tooth', 'tooth-decay', 'tooth-whitespot', 'gum-gingivitis', 'stain-tooth-external', 'stain-tooth-internal'], 'METRIC_SCOPE': ['calculus_tooth', 'tooth-decay', 'tooth-whitespot', 'gum-gingivitis', 'stain-tooth-external'], 'EVAL': ['coco_stack_out/user-data-2020-Apr-R3_B10M3-34.json'], 'IMAGE_BASEDIR': '../data/', 'SUB_MASK_CATEGORY': ['calculus'], 'TRAIN': ['coco_stack_out/web_decay_600-26-full.json', 'coco_stack_out/gingivitis_web_490-31-full.json', 'coco_stack_out/calculus_web_230-28-full.json', 'coco_stack_out/mturk_mar_2020r-30-full.json', 'coco_stack_out/legacy_decay-25-full.json', 'coco_stack_out/mturk50_mar16_ro-37.json', 'coco_stack_out/tooth_crawl_web_A-36.json', 'coco_stack_out/spotty_stain_web_A-35.json']}}
I0529 03:07:38.554489 139669136197440 main.py:274] Starting training cycle, epoch: 0 / 18000.
INFO:tensorflow:Using config: {'_model_dir': '../output/exp-001-baseline-d0', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f0729556cf8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=100, num_shards=1, num_cores_per_replica=8, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=[[1, 4, 2, 1], {'mean_num_positives': None, 'source_ids': None, 'groundtruth_data': None, 'image_scales': None, 'box_targets_3': [1, 4, 2, 1], 'cls_targets_3': [1, 4, 2, 1], 'box_targets_4': [1, 4, 2, 1], 'cls_targets_4': [1, 4, 2, 1], 'box_targets_5': [1, 4, 2, 1], 'cls_targets_5': [1, 4, 2, 1], 'box_targets_6': [1, 4, 2, 1], 'cls_targets_6': [1, 4, 2, 1], 'box_targets_7': [1, 4, 2, 1], 'cls_targets_7': [1, 4, 2, 1]}], eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
I0529 03:07:38.554966 139669136197440 estimator.py:212] Using config: {'_model_dir': '../output/exp-001-baseline-d0', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f0729556cf8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=100, num_shards=1, num_cores_per_replica=8, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=[[1, 4, 2, 1], {'mean_num_positives': None, 'source_ids': None, 'groundtruth_data': None, 'image_scales': None, 'box_targets_3': [1, 4, 2, 1], 'cls_targets_3': [1, 4, 2, 1], 'box_targets_4': [1, 4, 2, 1], 'cls_targets_4': [1, 4, 2, 1], 'box_targets_5': [1, 4, 2, 1], 'cls_targets_5': [1, 4, 2, 1], 'box_targets_6': [1, 4, 2, 1], 'cls_targets_6': [1, 4, 2, 1], 'box_targets_7': [1, 4, 2, 1], 'cls_targets_7': [1, 4, 2, 1]}], eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
I0529 03:07:38.555698 139669136197440 tpu_context.py:221] _TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
W0529 03:07:38.556049 139669136197440 tpu_context.py:223] eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:From /home/appuser/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W0529 03:07:38.561795 139669136197440 deprecation.py:506] From /home/appuser/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /home/appuser/.local/lib/python3.6/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0529 03:07:38.562173 139669136197440 deprecation.py:323] From /home/appuser/.local/lib/python3.6/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
2020-05-29 03:07:38.570108: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-05-29 03:07:38.573494: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-29 03:07:38.574404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:00:1e.0
2020-05-29 03:07:38.574616: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-05-29 03:07:38.575953: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-05-29 03:07:38.577151: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-05-29 03:07:38.577451: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-05-29 03:07:38.579049: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-05-29 03:07:38.580268: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-05-29 03:07:38.584087: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-29 03:07:38.584177: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-29 03:07:38.585136: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-29 03:07:38.586012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
WARNING:tensorflow:From /home/appuser/project/efficientdet/dataloader.py:344: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W0529 03:07:38.623125 139669136197440 deprecation.py:323] From /home/appuser/project/efficientdet/dataloader.py:344: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
WARNING:tensorflow:Entity <function InputReader.__call__.<locals>._dataset_parser at 0x7f073ebbabf8> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: LIVE_VARS_IN
W0529 03:07:39.093873 139669136197440 ag_logging.py:146] Entity <function InputReader.__call__.<locals>._dataset_parser at 0x7f073ebbabf8> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: LIVE_VARS_IN
INFO:tensorflow:Calling model_fn.
I0529 03:07:39.843184 139669136197440 estimator.py:1147] Calling model_fn.
INFO:tensorflow:Running train on CPU
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Find your answer through this link:
https://stackoverflow.com/questions/44232898/memoryerror-in-tensorflow-and-successful-numa-node-read-from-sysfs-had-negativ