Comments (11)

mingxingtan commented on May 17, 2024 45

Some command line examples

  1. Train on GPU:

python main.py --training_file_pattern=/coco_tfrecord/train* --model_name=efficientdet-d0 --model_dir=/tmp/efficientnet/ --hparams="use_bfloat16=false" --use_tpu=False

  2. Eval on GPU:

# Assuming /tmp/efficientdet-d0/ contains your checkpoint.
python main.py --mode=eval --model_name=efficientdet-d0 --model_dir=/tmp/efficientdet-d0/ --validation_file_pattern=/coco_tfrecord/val* --val_json_file=/coco_tfrecord/instances_val2017.json --hparams="use_bfloat16=false" --use_tpu=False

  3. Run inference on a single image:

# pip install pytype pycocotools
python model_inspect.py --runmode=infer --model_name=efficientdet-d0 --ckpt_path=/tmp/efficientdet-d0/ --input_image=/tmp/img1.jpg --output_image_dir=/tmp/det1/
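
The commands above are long enough that a typo in a flag value is easy to miss. A minimal sketch of assembling them from keyword arguments follows; `build_command` is a hypothetical helper, not part of the repo, and the flag names and values are copied from the training example above.

```python
import shlex


def build_command(script, **flags):
    """Return an argv list such as ['python', 'main.py', '--use_tpu=False', ...]."""
    argv = ["python", script]
    for name, value in flags.items():
        # Each keyword becomes one --name=value argument.
        argv.append(f"--{name}={value}")
    return argv


# Flag names and values taken from the training example in this thread.
train_cmd = build_command(
    "main.py",
    training_file_pattern="/coco_tfrecord/train*",
    model_name="efficientdet-d0",
    model_dir="/tmp/efficientnet/",
    hparams="use_bfloat16=false",
    use_tpu=False,
)

# shlex.quote keeps the shell from expanding the '*' in the file pattern.
print(" ".join(shlex.quote(arg) for arg in train_cmd))
```

The list form can also be passed directly to `subprocess.run(train_cmd, check=True)`, which avoids shell quoting altogether.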

I will add a tutorial colab soon.

from automl.

kaikaizhu commented on May 17, 2024 10

If I only have GPUs, can I use the trained weights provided by this project to test my own pictures?

airqj commented on May 17, 2024 9

@mingxingtan
The flag "--use_tpu=False" uses the CPU instead of the TPU for training, and it is very slow.
Do we need to change some code to train EfficientDet on a GPU?

bhack commented on May 17, 2024 2

And Edge TPU (Coral)? Also, will it be available on TF Hub?

mingxingtan commented on May 17, 2024 1

@liminghuiv that comment was wrong, and I have just fixed it. Estimator will automatically use a GPU if you have one; otherwise it uses the CPU.
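
Before launching main.py, it can help to confirm what TensorFlow itself can see. The following is only a sketch: `visible_gpus` is a hypothetical helper name, and it assumes the device-listing call is available either as `tf.config.list_physical_devices` or, on some older releases, under `tf.config.experimental`.

```python
def visible_gpus():
    """Return names of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Some older releases expose the call only under tf.config.experimental.
    lister = getattr(tf.config, "list_physical_devices", None)
    if lister is None:
        lister = tf.config.experimental.list_physical_devices
    return [device.name for device in lister("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif gpus:
    print("GPUs visible to TensorFlow:", gpus)
else:
    print("No GPU visible; training would fall back to the CPU")
```

An empty list usually points at a driver, CUDA, or container setup issue rather than at the training script.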

hoangphucITJP commented on May 17, 2024

@kaikaizhu, I guess you can, according to https://cloud.google.com/tpu/docs/using-estimator-api:

Models written using TPUEstimator work across CPUs, GPUs, single TPU devices, and whole TPU pods, generally with no code changes.

and the TPUEstimator is used in this repo:
https://github.com/google/automl/blob/master/efficientdet/main.py#L239

liminghuiv commented on May 17, 2024

I am also interested in training on a GPU. Is there a tutorial? Thanks a lot.

Jilliansea commented on May 17, 2024

@mingxingtan Hi, I want to run detection on a list of images given in a 'txt' file. I changed the code of build_input, but it still fails in post-processing. Because the inference batch size is 1, when I send all the images to the model they are still treated as one batch, and then the number of anchors becomes bigger than the index...
So, could you please publish inference code that converts the ckpt to a pb and runs inference on multiple images with the pb model?
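
Until such code exists, one workaround is to keep the batch size at 1 and call the single-image command once per entry in the list. The sketch below only builds those commands; `parse_image_list` and `infer_commands` are hypothetical helpers, `/tmp/img2.jpg` is a made-up second path, and the flags are the ones from the single-image example earlier in this thread. It does not convert the checkpoint to a pb.

```python
def parse_image_list(text):
    """Return the non-empty, stripped lines of a listing, one image path per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def infer_commands(image_paths, ckpt_path, output_dir, model_name="efficientdet-d0"):
    """Build one model_inspect.py argv list per image."""
    return [
        [
            "python", "model_inspect.py", "--runmode=infer",
            f"--model_name={model_name}",
            f"--ckpt_path={ckpt_path}",
            f"--input_image={path}",
            f"--output_image_dir={output_dir}",
        ]
        for path in image_paths
    ]


# Stand-in for the contents of the 'txt' file; blank lines are skipped.
listing = "/tmp/img1.jpg\n\n/tmp/img2.jpg\n"
commands = infer_commands(parse_image_list(listing), "/tmp/efficientdet-d0/", "/tmp/det1/")
for cmd in commands:
    print(" ".join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

This reloads the model for every image, so it is slow for long lists; it trades speed for not touching the post-processing code.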

liminghuiv commented on May 17, 2024

Hi @mingxingtan , thanks for taking a look at it. I checked the efficientdet/main.py code:
line 53:
flags.DEFINE_bool('use_tpu', True, 'Use TPUs rather than CPUs')
It seems that it will use the CPU instead of the GPU if we set use_tpu to False.

ruodingt commented on May 17, 2024

Hi @mingxingtan
Thanks so much for sharing your fantastic work.

I have a similar problem: the TPU estimator does not train on my GPU.
I am using TensorFlow 2.0.0 (a Docker image from the official TF Docker Hub).

Although the system has a V100 GPU, it still only trains on the CPU.
Could you give me some tips?

Thank you.


I0529 03:07:38.554332 139669136197440 main.py:383] {'name': 'efficientdet-d0', 'act_type': 'swish', 'image_size': (512, 512), 'input_rand_hflip': True, 'train_scale_min': 0.1, 'train_scale_max': 2.0, 'autoaugment_policy': None, 'use_augmix': False, 'augmix_params': (3, -1, 1), 'num_classes': 20, 'skip_crowd_during_training': True, 'label_id_mapping': None, 'min_level': 3, 'max_level': 7, 'num_scales': 3, 'aspect_ratios': [(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)], 'anchor_scale': 4.0, 'is_training_bn': True, 'momentum': 0.9, 'optimizer': 'sgd', 'learning_rate': 0.08, 'lr_warmup_init': 0.008, 'lr_warmup_epoch': 1.0, 'first_lr_drop_epoch': 200.0, 'second_lr_drop_epoch': 250.0, 'poly_lr_power': 0.9, 'clip_gradients_norm': 10.0, 'num_epochs': 18000, 'data_format': 'channels_last', 'alpha': 0.25, 'gamma': 1.5, 'delta': 0.1, 'box_loss_weight': 50.0, 'iou_loss_type': None, 'iou_loss_weight': 1.0, 'weight_decay': 4e-05, 'strategy': '', 'precision': None, 'box_class_repeats': 3, 'fpn_cell_repeats': 3, 'fpn_num_filters': 64, 'separable_conv': True, 'apply_bn_for_resampling': True, 'conv_after_downsample': False, 'conv_bn_act_pattern': False, 'use_native_resize_op': True, 'pooling_type': None, 'fpn_name': None, 'fpn_weight_method': None, 'fpn_config': None, 'survival_prob': None, 'lr_decay_method': 'cosine', 'moving_average_decay': 0.9998, 'ckpt_var_scope': None, 'var_exclude_expr': '.*/class-predict/.*', 'backbone_name': 'efficientnet-b0', 'backbone_config': None, 'var_freeze_expr': None, 'resnet_depth': 50, 'model_name': 'efficientdet-d0', 'iterations_per_loop': 100, 'model_dir': '../output/exp-001-baseline-d0', 'num_shards': 1, 'num_examples_per_epoch': 2000, 'backbone_ckpt': '/home/appuser/project/pretrained/efficientnet-b0', 'ckpt': None, 'val_json_file': None, 'testdev_dir': None, 'mode': 'train_and_eval', 'DATA_CONF': {'CATEGORIES_IN_RANGE': ['calculus_tooth', 'tooth-decay', 'tooth-whitespot', 'gum-gingivitis', 'stain-tooth-external', 'stain-tooth-internal'], 'EVAL_SCOPE': 
['calculus_tooth', 'tooth-decay', 'tooth-whitespot', 'gum-gingivitis', 'stain-tooth-external', 'stain-tooth-internal'], 'METRIC_SCOPE': ['calculus_tooth', 'tooth-decay', 'tooth-whitespot', 'gum-gingivitis', 'stain-tooth-external'], 'EVAL': ['coco_stack_out/user-data-2020-Apr-R3_B10M3-34.json'], 'IMAGE_BASEDIR': '../data/', 'SUB_MASK_CATEGORY': ['calculus'], 'TRAIN': ['coco_stack_out/web_decay_600-26-full.json', 'coco_stack_out/gingivitis_web_490-31-full.json', 'coco_stack_out/calculus_web_230-28-full.json', 'coco_stack_out/mturk_mar_2020r-30-full.json', 'coco_stack_out/legacy_decay-25-full.json', 'coco_stack_out/mturk50_mar16_ro-37.json', 'coco_stack_out/tooth_crawl_web_A-36.json', 'coco_stack_out/spotty_stain_web_A-35.json']}}
I0529 03:07:38.554489 139669136197440 main.py:274] Starting training cycle, epoch: 0 / 18000.
INFO:tensorflow:Using config: {'_model_dir': '../output/exp-001-baseline-d0', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f0729556cf8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=100, num_shards=1, num_cores_per_replica=8, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=[[1, 4, 2, 1], {'mean_num_positives': None, 'source_ids': None, 'groundtruth_data': None, 'image_scales': None, 'box_targets_3': [1, 4, 2, 1], 'cls_targets_3': [1, 4, 2, 1], 'box_targets_4': [1, 4, 2, 1], 'cls_targets_4': [1, 4, 2, 1], 'box_targets_5': [1, 4, 2, 1], 'cls_targets_5': [1, 4, 2, 1], 'box_targets_6': [1, 4, 2, 1], 'cls_targets_6': [1, 4, 2, 1], 'box_targets_7': [1, 4, 2, 1], 'cls_targets_7': [1, 4, 2, 1]}], eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
I0529 03:07:38.554966 139669136197440 estimator.py:212] Using config: {'_model_dir': '../output/exp-001-baseline-d0', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f0729556cf8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=100, num_shards=1, num_cores_per_replica=8, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=[[1, 4, 2, 1], {'mean_num_positives': None, 'source_ids': None, 'groundtruth_data': None, 'image_scales': None, 'box_targets_3': [1, 4, 2, 1], 'cls_targets_3': [1, 4, 2, 1], 'box_targets_4': [1, 4, 2, 1], 'cls_targets_4': [1, 4, 2, 1], 'box_targets_5': [1, 4, 2, 1], 'cls_targets_5': [1, 4, 2, 1], 'box_targets_6': [1, 4, 2, 1], 'cls_targets_6': [1, 4, 2, 1], 'box_targets_7': [1, 4, 2, 1], 'cls_targets_7': [1, 4, 2, 1]}], eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
I0529 03:07:38.555698 139669136197440 tpu_context.py:221] _TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
W0529 03:07:38.556049 139669136197440 tpu_context.py:223] eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:From /home/appuser/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W0529 03:07:38.561795 139669136197440 deprecation.py:506] From /home/appuser/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /home/appuser/.local/lib/python3.6/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0529 03:07:38.562173 139669136197440 deprecation.py:323] From /home/appuser/.local/lib/python3.6/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
2020-05-29 03:07:38.570108: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-05-29 03:07:38.573494: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-29 03:07:38.574404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:00:1e.0
2020-05-29 03:07:38.574616: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-05-29 03:07:38.575953: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-05-29 03:07:38.577151: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-05-29 03:07:38.577451: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-05-29 03:07:38.579049: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-05-29 03:07:38.580268: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-05-29 03:07:38.584087: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-29 03:07:38.584177: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-29 03:07:38.585136: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-29 03:07:38.586012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
WARNING:tensorflow:From /home/appuser/project/efficientdet/dataloader.py:344: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W0529 03:07:38.623125 139669136197440 deprecation.py:323] From /home/appuser/project/efficientdet/dataloader.py:344: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
WARNING:tensorflow:Entity <function InputReader.__call__.<locals>._dataset_parser at 0x7f073ebbabf8> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: LIVE_VARS_IN
W0529 03:07:39.093873 139669136197440 ag_logging.py:146] Entity <function InputReader.__call__.<locals>._dataset_parser at 0x7f073ebbabf8> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: LIVE_VARS_IN
INFO:tensorflow:Calling model_fn.
I0529 03:07:39.843184 139669136197440 estimator.py:1147] Calling model_fn.
INFO:tensorflow:Running train on CPU
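
The log above contains both GPU discovery lines and the "Running train on CPU" message, which is easy to miss in output this long. A small sketch that pulls those hints out of a log follows; `summarize_device_log` is a hypothetical helper and the sample lines are copied from the log above. Whether the "on CPU" message reflects actual op placement or only that the TPU path is disabled is an assumption to verify separately, for example by watching GPU utilization during training.

```python
import re


def summarize_device_log(log_text):
    """Collect device-related hints from a TensorFlow training log."""
    found = re.findall(r"Found device (\d+) with properties", log_text)
    visible = re.findall(r"Adding visible gpu devices: ([\d, ]+)", log_text)
    return {
        "gpus_found": [int(index) for index in found],
        "gpus_visible": [
            int(index)
            for group in visible
            for index in group.split(",")
            if index.strip()
        ],
        "reports_cpu": "Running train on CPU" in log_text,
    }


# Three lines copied from the log in this comment.
sample = (
    "2020-05-29 03:07:38.574404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: \n"
    "2020-05-29 03:07:38.586012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0\n"
    "INFO:tensorflow:Running train on CPU\n"
)
print(summarize_device_log(sample))
# {'gpus_found': [0], 'gpus_visible': [0], 'reports_cpu': True}
```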

linkrain-a commented on May 17, 2024

successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

Find your answer through this link:
https://stackoverflow.com/questions/44232898/memoryerror-in-tensorflow-and-successful-numa-node-read-from-sysfs-had-negativ
