Report for Computer Vision Task: Detecting the largest and the smallest objects in Images / Videos using YOLO.
This report outlines my approaches to fine-tuning the YOLOv7 model to detect the largest and smallest objects in images and videos.
• There is no clear distinction between the smallest and largest bounding boxes; size alone cannot determine it.
• Some objects are exceptionally small (as small as 0.8 pixels), making accurate labeling difficult. They were removed from the data.
• Fine-tuning, then post-processing. I considered two variants:
  – (1) Fine-tune the model to detect the largest and smallest boxes directly from the image, then apply filtering to guarantee a single bounding box prediction per class.
  – (2) Fine-tune the model to detect all objects in the image without distinguishing between them, then select the two bounding boxes with the smallest and largest sizes.
  – Variant (2) yields better results in most cases.
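The filtering step in variant (1) is not specified further in this report. One plausible reading, sketched below as an assumption, is to keep only the highest-confidence detection for each class. The function name `keep_top_per_class` is hypothetical, and the row layout `[x1, y1, x2, y2, confidence, class_id]` is assumed to match the detections after non-maximum suppression.

```python
import numpy as np


def keep_top_per_class(detections):
    """Keep the single highest-confidence detection for each class.

    detections: array-like of shape (N, 6) with rows
        [x1, y1, x2, y2, confidence, class_id]  (assumed layout).
    Returns an (M, 6) array with one row per class present,
    ordered by ascending class id.
    """
    detections = np.asarray(detections, dtype=float)
    if detections.size == 0:
        return np.empty((0, 6))

    kept = []
    # np.unique returns the class ids in sorted order.
    for class_id in np.unique(detections[:, 5]):
        rows = detections[detections[:, 5] == class_id]
        kept.append(rows[np.argmax(rows[:, 4])])
    return np.stack(kept)
```

With two classes (for example "largest" and "smallest"), this yields at most two boxes per image, one per class.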
Install the environment
conda env create -f environment.yml
conda activate amagcvTask1
Install and set up wandb as instructed here
Download the dataset and model, then unzip them to the desired folder
bash download.sh
Model | Precision | Recall | mAP@50 | Link |
---|---|---|---|---|
Two-class, 3 anchors | 0.500 | 0.460 | 0.360 | Link |
Two-class, 5 anchors | 0.546 | 0.483 | 0.391 | Link |
Single-class, 3 anchors | 0.612 | 0.474 | 0.435 | Link |
Single-class, 5 anchors | 0.665 | 0.486 | 0.457 | Link |
Two-class: Fine-tune the model to identify the largest and smallest object boxes within the image, then select the boxes that have the smallest and largest size.
Single-class: Fine-tune the model to detect all objects in the image, then select the two boxes that have the smallest and largest size.
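The selection step shared by both variants can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes boxes are given as `[x1, y1, x2, y2, ...]` in pixels and that "size" means box area. The function name `select_smallest_and_largest` is hypothetical.

```python
import numpy as np


def select_smallest_and_largest(boxes):
    """Return (smallest_box, largest_box) by area, or None if there are no boxes.

    boxes: array-like of shape (N, >=4); the first four columns are
        x1, y1, x2, y2 (assumed layout). Extra columns such as
        confidence are carried through unchanged.
    """
    boxes = np.asarray(boxes, dtype=float)
    if boxes.size == 0:
        return None

    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    areas = widths * heights  # "size" is assumed to mean area here

    return boxes[np.argmin(areas)], boxes[np.argmax(areas)]
```

If an image contains a single detection, the same box is returned as both the smallest and the largest.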
python utils/prepare_dataset.py --output_dir dataset/processed/ --min_size 8
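The internals of `utils/prepare_dataset.py` are not shown in this report. As a rough sketch of what a `--min_size` filter might do, the snippet below drops annotations whose shorter side is below the threshold. It assumes COCO-style boxes `[x, y, width, height]` in pixels and that the threshold applies to the shorter side; both are assumptions, and `drop_tiny_boxes` is a hypothetical name.

```python
def drop_tiny_boxes(annotations, min_size=8.0):
    """Remove annotations whose box is too small to label reliably.

    annotations: list of dicts, each with a "bbox" entry in COCO style
        [x, y, width, height] (assumed format).
    min_size: minimum allowed length in pixels for the shorter side
        (assumed interpretation of --min_size).
    """
    return [
        ann for ann in annotations
        if min(ann["bbox"][2], ann["bbox"][3]) >= min_size
    ]
```

Under these assumptions, a 0.8-pixel object like the one mentioned above would be discarded, while a 20 by 12 pixel object would be kept.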
The data used to train and evaluate the models was uploaded here. Please download it and unzip it into the dataset folder.
Two-class model
python train.py --workers 8 --batch-size 32 --data data/single_class_coco.yaml --img 640 640 --cfg cfg/training/yolov7_5anchors.yaml --weights downloaded_files/yolov7.pt --name singleClass5anchor --hyp data/hyp.scratch.p5.yaml --device 0
Single-class model
python train.py --workers 8 --batch-size 32 --data data/single_class_coco.yaml --img 640 640 --cfg cfg/training/yolov7_5anchors.yaml --weights downloaded_files/yolov7.pt --name singleClass5anchor --hyp data/hyp.scratch.p5.yaml --device 0
python test.py --batch-size 32 --data data/single_class_coco.yaml --weights model.pt --device 0
where model.pt is the path to your model.
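The mAP@50 metric counts a prediction as correct when its intersection over union (IoU) with a ground-truth box of the same class is at least 0.5. The sketch below shows that overlap measure for two boxes in `[x1, y1, x2, y2]` format. It is an illustration only, not the evaluation code used by `test.py`, and `box_iou` is a name chosen here.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```

Because IoU is a ratio, a shift of a few pixels lowers it far more for a small box than for a large one.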
python detect.py --weights model.pt --device 0 --img-size 640 --source inference/images/