Giter Club home page Giter Club logo

objectdetectall's Introduction

ObjectDetectAll

ObjectDetectAll is a comprehensive toolkit for object detection across various media types, including images, GIFs, and videos. Utilizing state-of-the-art object detection models (YOLOS by default), this project allows users to detect objects and draw bounding boxes with labels across different media formats seamlessly.

Cat detection

Features

  • Support for multiple media types: Process any kind of images, GIFs, and videos with a single toolkit.
  • Object Detection configuration: Configure labels, detection thresholds.
  • Comprehensive FFmpeg settings (batch_size, audio or not, bitrate, duration, etc)
  • Auto-download your detection model and convert it into ONNX and quantized ONNX for faster processing
  • Handle local and remote files. When URLs are provided (CLI or from a text file) the medias are downloaded automatically.

Requirements

This project requires:

  • Python 3.6+
  • FFmpeg for video processing. Ensure FFmpeg is installed and accessible in your system's PATH.
  • Gifsicle for GIF optimization. If not installed, the script will output an unoptimized GIF instead (i.e the output size >> input size).

Installation

Clone the Repository

git clone https://github.com/AdamCodd/ObjectDetectAll.git
cd ObjectDetectAll

Ensure you have Python 3.6+ installed, then run:

pip install -r requirements.txt
pip install -r requirements-convert.txt

Usage

Basic usage examples for processing different media types:

Images (local)

python main.py --input path/to/image.jpg --output path/to/output/directory

Images (remote)

python main.py --input https://upload.wikimedia.org/wikipedia/commons/3/3f/JPEG_example_flower.jpg --output path/to/output/directory

GIFs (local or remote)

python main.py --input path/to/animation.gif --output path/to/output/directory

Videos (local or remote, without audio by default)

python main.py --input path/to/video.mp4 --output path/to/output/directory

Replace path/to/input and path/to/output/directory with your specific paths. Use the --help flag to see all available options:

python main.py --help

Customizing Detection

To specify object labels for detection (others labels will be ignored):

python main.py --input path/to/media --output path/to/output --labels person car

Adjust detection sensitivity using the --threshold option (default 0.9):

python main.py --input path/to/media --output path/to/output --threshold 0.5

NB: If the threshold is decreased (from 0.9), there will be an increase in false positives.

Arguments

  • --input: URL, path, text file, or folder. (Required)
  • --output: Output directory. (Required)
  • --labels: Specific object labels to draw (draws all detected objects if omitted).
  • --filename: Output filename prefix. (Default: 'out')
  • --vcodec: Video codec (defaults based on output format).
  • --acodec: Audio codec (defaults based on output format).
  • --include_audio: Include original audio (slows processing).
  • --duration: Video duration to process (seconds).
  • --fps: Video FPS (defaults to source FPS).
  • --hwaccel: Hardware acceleration method: cuda, dxva2, qsv, d3d11va, opencl, vulkan.
  • --preset: FFmpeg encoding preset. (Default: 'medium')
  • --bitrate: Output video bitrate ('auto' for FFmpeg default). (Default: 'auto')
  • --batch-size: Batch size for processing. (Default: 10)
  • --threads: FFmpeg thread count (0 for auto).
  • --threshold: Object detection sensitivity. (Default: 0.9)
  • --model: Hugging Face model repository. (Default: 'hustvl/yolos-tiny')
  • --unquantized: Use the unquantized ONNX model if present (more accurate but slower).

Example

python script.py --input "./input_folder" --output "./output_folder" --labels "person" "car" --filename "processed" --vcodec "libx264" --preset "medium" --batch-size 5 --threshold 0.8

This example processes all supported media in ./input_folder, drawing boxes around detected "person" and "car" objects, using the libx264 codec for video processing, a medium preset for encoding quality, processing in batches of 5, and using a detection threshold of 0.8. The processed files will be saved in ./output_folder with filenames prefixed with "processed".

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

If you'd like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.

Acknowledgments

This project utilizes ONNX Runtime, the Transformers library from Hugging Face for object detection models and a slightly modified version of the convert.py script from Transformers.js.

objectdetectall's People

Contributors

adamcodd avatar

Watchers

Kostas Georgiou avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.