
jetson-train's Introduction

Jetson Train

This repository contains a step-by-step guide to building and training your own object detection model for the Jetson Nano, Xavier, or any other Jetson model.

You need Ubuntu 18.04 or higher to follow this guide. The repo also includes the output model files for the model I trained on apples and bananas.


Installation:

Make sure you have installed the Python packages below before you start setting up your machine for training:

$ pip3 install opencv-python
$ pip3 install imutils
$ pip3 install matplotlib
$ pip3 install torchvision
$ pip3 install torch
$ pip3 install boto3
$ pip3 install pandas
$ pip3 install urllib3


Step 1:

Clone the repository on your machine. Download and save your test video file in the videos directory. Use the prepare_dataset script to extract images from your test video file. You can adjust the save-image counter in the prepare_dataset script to control how many images are saved; a rough sketch of this frame-extraction step is shown after the list below. Once run, this script will create three directories inside the data directory:

$ JPEGImages: This directory will contain all the images extracted from the test video file.
$ ImageSets: This directory will contain the train and test split files.
$ Annotations: Save all your annotation XML files in this directory.
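For reference, the frame-extraction step boils down to reading the video with OpenCV and writing every Nth frame into JPEGImages. Below is a minimal sketch of the idea, not the actual prepare_dataset code; the function name, output filename pattern, and save_every interval are illustrative.

import os
import cv2

def extract_frames(video_path, output_dir, save_every=10):
    # Save every Nth frame of the video as a JPEG image.
    os.makedirs(output_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break  # end of video
        if frame_idx % save_every == 0:
            cv2.imwrite(os.path.join(output_dir, "frame_%06d.jpg" % saved), frame)
            saved += 1
        frame_idx += 1
    cap.release()
    return saved

# Example: extract_frames("videos/testvideo.mp4", "data/model0110/JPEGImages", save_every=15)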

You can use any tool to annotate your images; just make sure it exports annotations in Pascal VOC format (a sample annotation file is shown after the label list below). Save all your annotation XML files in the Annotations directory. Once done, create a labels.txt file inside your model directory. This file should contain all the label names from your dataset, one per line. For example:

$ object_name1
$ object_name2
$ object_name3
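For reference, a Pascal VOC annotation file is a small XML document per image, roughly like the one below. This is only an illustration; the filename, image size, and box coordinates are made up, and your annotation tool will generate these files for you.

<annotation>
  <folder>JPEGImages</folder>
  <filename>frame_000001.jpg</filename>
  <size>
    <width>1280</width>
    <height>720</height>
    <depth>3</depth>
  </size>
  <object>
    <name>object_name1</name>
    <difficult>0</difficult>
    <bndbox>
      <xmin>100</xmin>
      <ymin>150</ymin>
      <xmax>300</xmax>
      <ymax>400</ymax>
    </bndbox>
  </object>
</annotation>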

Step 2:

To start training, run the command below:

$ python3 train_ssd.py --dataset-type=voc --data=data/{modelname}/ --model-dir=models/{modelname} --batch-size=2 --workers=5 --epochs=500

For example, if your model name is model0110, the command will be:

$ python3 train_ssd.py --dataset-type=voc --data=data/model0110/ --model-dir=models/model0110 --batch-size=2 --workers=5 --epochs=500

This will start the training. You can adjust the number of epochs and workers to suit your requirements.

Step 3:

Once training completes, or your loss is sufficiently low, you can use the results.py script to analyze the results. Running the script will generate a graph of the training loss and also print the best checkpoint.
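Training checkpoints encode the epoch and loss in their filenames, so "best checkpoint" here simply means the one with the lowest loss. As a rough illustration (assuming checkpoint files are named like mb1-ssd-Epoch-<N>-Loss-<loss>.pth; adjust the pattern if your filenames differ), the lowest-loss checkpoint could be found like this:

import re
from pathlib import Path

def best_checkpoint(model_dir):
    # Return (loss, path) for the checkpoint with the lowest loss in its filename.
    pattern = re.compile(r"Epoch-(\d+)-Loss-([\d.]+)\.pth$")
    best = None
    for ckpt in Path(model_dir).glob("*.pth"):
        match = pattern.search(ckpt.name)
        if match:
            loss = float(match.group(2))
            if best is None or loss < best[0]:
                best = (loss, ckpt)
    return best

# Example: print(best_checkpoint("models/model0110"))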


Step 4:

Make sure you have the jetson-inference project installed on your Jetson device. Once you are satisfied with the training results, copy the checkpoint file and labels.txt from your machine to the Jetson Nano or Xavier. Place them inside:

$ /home/username/jetson-inference/python/training/detection/ssd/models

Let's first convert the checkpoint to ONNX format by running the command below from the ssd directory:

$ python3 onnx_export.py --model-dir=models/model0110

This will generate an ONNX file. From here we can use the command below to generate the TensorRT engine file:

$ detectnet --model=models/model0110/ssd-mobilenet.onnx --labels=models/model0110/labels.txt --input-blob=input_0 --output-cvg=scores --output-bbox=boxes /home/rocket/testvideo.mp4

where model0110 is the name of your model. Make sure you replace /home/rocket/testvideo.mp4 with the path to your test video file, or with a webcam/RTSP camera stream. This command can take up to 10-12 minutes to complete the first time, while the TensorRT engine is built.
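For reference, a live camera can be used instead of a video file by replacing the last argument with a camera URI. The exact URI depends on your setup; these are just example inputs supported by jetson-inference (a V4L2 USB webcam and a MIPI CSI camera), and RTSP streams can be passed as rtsp:// URLs in the same way:

$ detectnet --model=models/model0110/ssd-mobilenet.onnx --labels=models/model0110/labels.txt --input-blob=input_0 --output-cvg=scores --output-bbox=boxes /dev/video0
$ detectnet --model=models/model0110/ssd-mobilenet.onnx --labels=models/model0110/labels.txt --input-blob=input_0 --output-cvg=scores --output-bbox=boxes csi://0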

If you want to use the model files I trained for apples and bananas, you can download them from the mymodels directory.


jetson-train's Issues

I am getting an error when I try to run "python3 train_ssd.py --dataset-type=voc --data=data/weldr/ --model-dir=models/model0110 --batch-size=2 --workers=5 --epochs=500"

Traceback (most recent call last):
  File "train_ssd.py", line 13, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/home/kml/rohin/lib/python3.8/site-packages/torch/utils/tensorboard/__init__.py", line 1, in <module>
    import tensorboard
  File "/home/kml/rohin/lib/python3.8/site-packages/tensorboard/__init__.py", line 4, in <module>
    from .writer import FileWriter, SummaryWriter
  File "/home/kml/rohin/lib/python3.8/site-packages/tensorboard/writer.py", line 28, in <module>
    from .summary import scalar, histogram, image, audio, text
  File "/home/kml/rohin/lib/python3.8/site-packages/tensorboard/summary/__init__.py", line 22, in <module>
    from tensorboard.summary import v1  # noqa: F401
  File "/home/kml/rohin/lib/python3.8/site-packages/tensorboard/summary/v1.py", line 21, in <module>
    from tensorboard.plugins.audio import summary as _audio_summary
  File "/home/kml/rohin/lib/python3.8/site-packages/tensorboard/plugins/audio/summary.py", line 34, in <module>
    from tensorboard.plugins.audio import metadata
  File "/home/kml/rohin/lib/python3.8/site-packages/tensorboard/plugins/audio/metadata.py", line 18, in <module>
    from tensorboard.compat.proto import summary_pb2
  File "/home/kml/rohin/lib/python3.8/site-packages/tensorboard/compat/proto/summary_pb2.py", line 17, in <module>
    from tensorboard.compat.proto import histogram_pb2 as tensorboard_dot_compat_dot_proto_dot_histogram__pb2
  File "/home/kml/rohin/lib/python3.8/site-packages/tensorboard/compat/proto/histogram_pb2.py", line 18, in <module>
    DESCRIPTOR = _descriptor.FileDescriptor(
  File "/home/kml/rohin/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 1024, in __new__
    return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "tensorboard/compat/proto/histogram.proto":
  tensorboard.HistogramProto.min: "tensorboard.HistogramProto.min" is already defined in file "tensorboard/src/summary.proto".
  tensorboard.HistogramProto.max: "tensorboard.HistogramProto.max" is already defined in file "tensorboard/src/summary.proto".
  tensorboard.HistogramProto.num: "tensorboard.HistogramProto.num" is already defined in file "tensorboard/src/summary.proto".
  tensorboard.HistogramProto.sum: "tensorboard.HistogramProto.sum" is already defined in file "tensorboard/src/summary.proto".
  tensorboard.HistogramProto.sum_squares: "tensorboard.HistogramProto.sum_squares" is already defined in file "tensorboard/src/summary.proto".
  tensorboard.HistogramProto.bucket_limit: "tensorboard.HistogramProto.bucket_limit" is already defined in file "tensorboard/src/summary.proto".
  tensorboard.HistogramProto.bucket: "tensorboard.HistogramProto.bucket" is already defined in file "tensorboard/src/summary.proto".
  tensorboard.HistogramProto: "tensorboard.HistogramProto" is already defined in file "tensorboard/src/summary.proto".

Unable to train the dataset

LOG:
(Train) D:\Gujarat Fertilizers\Traing\jetson-train-main\jetson-train-main>python train_ssd.py --dataset-type=voc --data=data/model0110/ --model-dir=models/model0110 --batch-size=2 --workers=5 --epochs=500
2023-10-18 12:49:22 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=2, checkpoint_folder='models/model0110', dataset_type='voc', datasets=['data/model0110/'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, log_level='info', lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=500, num_workers=5, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resolution=300, resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, validation_mean_ap=False, weight_decay=0.0005)
2023-10-18 12:49:22 - model resolution 300x300
2023-10-18 12:49:22 - SSDSpec(feature_map_size=19, shrinkage=16, box_sizes=SSDBoxSizes(min=60, max=105), aspect_ratios=[2, 3])
2023-10-18 12:49:22 - SSDSpec(feature_map_size=10, shrinkage=32, box_sizes=SSDBoxSizes(min=105, max=150), aspect_ratios=[2, 3])
2023-10-18 12:49:22 - SSDSpec(feature_map_size=5, shrinkage=64, box_sizes=SSDBoxSizes(min=150, max=195), aspect_ratios=[2, 3])
2023-10-18 12:49:22 - SSDSpec(feature_map_size=3, shrinkage=100, box_sizes=SSDBoxSizes(min=195, max=240), aspect_ratios=[2, 3])
2023-10-18 12:49:22 - SSDSpec(feature_map_size=2, shrinkage=150, box_sizes=SSDBoxSizes(min=240, max=285), aspect_ratios=[2, 3])
2023-10-18 12:49:22 - SSDSpec(feature_map_size=1, shrinkage=300, box_sizes=SSDBoxSizes(min=285, max=330), aspect_ratios=[2, 3])
2023-10-18 12:49:22 - Prepare training datasets.
2023-10-18 12:49:23 - VOC Labels read from file: ('BACKGROUND', 'banna')
2023-10-18 12:49:23 - Stored labels into file models/model0110\labels.txt.
2023-10-18 12:49:23 - Train dataset size: 111
2023-10-18 12:49:23 - Prepare Validation datasets.
2023-10-18 12:49:23 - VOC Labels read from file: ('BACKGROUND', 'banna')
2023-10-18 12:49:23 - Validation dataset size: 12
2023-10-18 12:49:23 - Build network.
2023-10-18 12:49:23 - Init from pretrained ssd models/mobilenet-v1-ssd-mp-0_675.pth
2023-10-18 12:49:23 - Took 0.07 seconds to load the model.
2023-10-18 12:49:23 - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2023-10-18 12:49:23 - Uses CosineAnnealingLR scheduler.
2023-10-18 12:49:23 - Start training from epoch 0.
Traceback (most recent call last):
  File "train_ssd.py", line 396, in <module>
    train(train_loader, net, criterion, optimizer, device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 133, in train
    for i, data in enumerate(loader):
  File "C:\Users\kvar technologies\anaconda3\envs\Train\lib\site-packages\torch\utils\data\dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "C:\Users\kvar technologies\anaconda3\envs\Train\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\kvar technologies\anaconda3\envs\Train\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
    w.start()
  File "C:\Users\kvar technologies\anaconda3\envs\Train\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\kvar technologies\anaconda3\envs\Train\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\kvar technologies\anaconda3\envs\Train\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\kvar technologies\anaconda3\envs\Train\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\kvar technologies\anaconda3\envs\Train\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'TrainAugmentation.__init__.<locals>.<lambda>'

(Train) D:\Gujarat Fertilizers\Traing\jetson-train-main\jetson-train-main>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\kvar technologies\anaconda3\envs\Train\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\kvar technologies\anaconda3\envs\Train\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
