ObjectDetectAll is a toolkit for object detection in images, GIFs, and videos. It uses object detection models (YOLOS by default) to detect objects and draw labeled bounding boxes on each supported media format.
- Support for multiple media types: Process images, GIFs, and videos with a single toolkit.
- Object detection configuration: Configure labels and detection thresholds.
- Configurable FFmpeg settings (batch size, audio inclusion, bitrate, duration, etc.).
- Automatic model download: The detection model is downloaded and converted to ONNX and quantized ONNX for faster processing.
- Local and remote files: When URLs are provided (on the command line or in a text file), the media is downloaded automatically.
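As an illustration of the last point, here is a minimal sketch of how an --input value might be classified as a URL, a folder, a text file of entries, or a single media file. The function names (is_url, classify_input) and the rule that a .txt suffix marks a list file are assumptions made for this example, not the project's actual implementation.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_url(value: str) -> bool:
    """Return True if the value looks like an http(s) URL."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def classify_input(value: str) -> str:
    """Classify an --input value as 'url', 'folder', 'list', or 'file'.

    Hypothetical helper: the .txt rule is an assumption for illustration.
    """
    if is_url(value):
        return "url"
    path = Path(value)
    if path.is_dir():
        return "folder"
    if path.suffix.lower() == ".txt":
        return "list"
    return "file"


print(classify_input("https://example.com/cat.jpg"))  # url
print(classify_input("inputs.txt"))                   # list
print(classify_input("clip.mp4"))                     # file
```

Checking for a URL first keeps remote inputs from being probed on the local filesystem.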
This project requires:
- Python 3.6+
- FFmpeg for video processing. Ensure FFmpeg is installed and accessible in your system's PATH.
- Gifsicle for GIF optimization. If it is not installed, the script outputs an unoptimized GIF instead (i.e., the output file can be much larger than the input).
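To confirm that both external tools are reachable before processing, a quick check along these lines can help. This is a standalone sketch using Python's standard shutil.which; the find_tools helper is hypothetical and not part of the project.

```python
import shutil


def find_tools(names):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


tools = find_tools(["ffmpeg", "gifsicle"])
if tools["ffmpeg"] is None:
    print("FFmpeg not found on PATH: video processing will not work.")
if tools["gifsicle"] is None:
    print("Gifsicle not found on PATH: GIFs will be written unoptimized.")
```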
Clone the Repository
git clone https://github.com/AdamCodd/ObjectDetectAll.git
cd ObjectDetectAll
Ensure you have Python 3.6+ installed, then run:
pip install -r requirements.txt
pip install -r requirements-convert.txt
Basic usage examples for processing different media types:
Images (local)
python main.py --input path/to/image.jpg --output path/to/output/directory
Images (remote)
python main.py --input https://upload.wikimedia.org/wikipedia/commons/3/3f/JPEG_example_flower.jpg --output path/to/output/directory
GIFs (local or remote)
python main.py --input path/to/animation.gif --output path/to/output/directory
Videos (local or remote, without audio by default)
python main.py --input path/to/video.mp4 --output path/to/output/directory
Replace the input path and path/to/output/directory with your own paths. Use the --help flag to see all available options:
python main.py --help
To specify object labels for detection (other labels will be ignored):
python main.py --input path/to/media --output path/to/output --labels person car
Adjust detection sensitivity using the --threshold option (default 0.9):
python main.py --input path/to/media --output path/to/output --threshold 0.5
Note: Lowering the threshold below the default of 0.9 keeps more detections but also increases the number of false positives.
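The combined effect of --threshold and --labels can be sketched as a simple filter over detections. The detection dictionaries, the filter_detections function, and the use of >= for the score comparison are assumptions for illustration; the scores are made up and the project's internal logic may differ.

```python
def filter_detections(detections, threshold=0.9, labels=None):
    """Keep detections scoring at least `threshold`, optionally limited to `labels`."""
    wanted = set(labels) if labels else None
    return [
        d for d in detections
        if d["score"] >= threshold and (wanted is None or d["label"] in wanted)
    ]


# Made-up detections, for illustration only.
detections = [
    {"label": "person", "score": 0.97},
    {"label": "car", "score": 0.72},
    {"label": "dog", "score": 0.55},
    {"label": "person", "score": 0.41},
]

print(len(filter_detections(detections)))                          # 1
print(len(filter_detections(detections, threshold=0.5)))           # 3
print(len(filter_detections(detections, 0.5, ["person", "car"])))  # 2
```

At 0.9 only the most confident detection survives; at 0.5 two lower-confidence detections are also kept, which is where false positives tend to appear.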
- --input: URL, path, text file, or folder. (Required)
- --output: Output directory. (Required)
- --labels: Specific object labels to draw (draws all detected objects if omitted).
- --filename: Output filename prefix. (Default: 'out')
- --vcodec: Video codec (defaults based on output format).
- --acodec: Audio codec (defaults based on output format).
- --include_audio: Include original audio (slows processing).
- --duration: Video duration to process (seconds).
- --fps: Video FPS (defaults to source FPS).
- --hwaccel: Hardware acceleration method: cuda, dxva2, qsv, d3d11va, opencl, vulkan.
- --preset: FFmpeg encoding preset. (Default: 'medium')
- --bitrate: Output video bitrate ('auto' for FFmpeg default). (Default: 'auto')
- --batch-size: Batch size for processing. (Default: 10)
- --threads: FFmpeg thread count (0 for auto).
- --threshold: Object detection sensitivity. (Default: 0.9)
- --model: Hugging Face model repository. (Default: 'hustvl/yolos-tiny')
- --unquantized: Use the unquantized ONNX model if present (more accurate but slower).
python main.py --input "./input_folder" --output "./output_folder" --labels "person" "car" --filename "processed" --vcodec "libx264" --preset "medium" --batch-size 5 --threshold 0.8
This example processes all supported media in ./input_folder, drawing boxes around detected "person" and "car" objects, using the libx264 codec for video processing, a medium preset for encoding quality, processing in batches of 5, and using a detection threshold of 0.8. The processed files will be saved in ./output_folder with filenames prefixed with "processed".
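To make the batch size concrete, the sketch below splits a sequence into groups of at most five items. It assumes a batch is a group of video frames handled together; the batched helper is hypothetical and only illustrates the idea, not the project's own code.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


frames = list(range(12))  # stand-ins for 12 decoded video frames
for batch in batched(frames, 5):
    print(len(batch))  # prints 5, then 5, then 2
```

A smaller batch size lowers peak memory use at the cost of more processing rounds.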
This project is licensed under the MIT License - see the LICENSE file for details.
If you'd like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.
This project uses ONNX Runtime, the Hugging Face Transformers library for object detection models, and a slightly modified version of the convert.py script from Transformers.js.