Described

Described is a simple DAG workflow model, built on LAVIS/blip2, for creating more detailed image captions. I created it to generate better captions for my Stable Diffusion fine-tuning efforts.

Installation

NOTE: The blip2 models are VERY large; ensure you have at least 60 GB of free disk space on your root drive. By default, Hugging Face stores the models in ~/.cache/huggingface.
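A quick way to check the disk space requirement before downloading anything is sketched below. It assumes the cache lives in the default location named above, or under the `HF_HOME` environment variable if you have set it; the helper names are illustrative and not part of described.

```python
import os
import shutil
from pathlib import Path

REQUIRED_GB = 60  # figure taken from the note above


def cache_dir() -> Path:
    # HF_HOME overrides the default cache root when it is set.
    return Path(os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface"))


def free_gb(path: Path) -> float:
    # Walk up to the nearest existing directory so disk_usage does not fail
    # when the cache directory has not been created yet.
    probe = path
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    return shutil.disk_usage(probe).free / 1e9


def has_enough_space(path: Path, required_gb: float = REQUIRED_GB) -> bool:
    return free_gb(path) >= required_gb


if __name__ == "__main__":
    target = cache_dir()
    print(f"{target}: {free_gb(target):.1f} GB free, need {REQUIRED_GB} GB")
```

If the check fails, pointing `HF_HOME` at a larger drive before running the tool is one option.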

source ./venv/bin/activate
pip install -r requirements.txt

Usage

usage: described [-h] [--workflow WORKFLOW] [--model_name MODEL_NAME] [--model_type MODEL_TYPE] --path PATH [--prefix PREFIX] [--suffix SUFFIX]

options:
  -h, --help            show this help message and exit
  --workflow WORKFLOW   The workflow file to use
  --model_name MODEL_NAME
                        One of: blip2_opt, blip2_t5, blip2
  --model_type MODEL_TYPE
                        A compatible model type. One of: blip2_opt(pretrain_opt2.7b, caption_coco_opt2.7b, pretrain_opt6.7b, caption_coco_opt6.7b), blip2_t5(pretrain_flant5xl,
                        caption_coco_flant5xl, pretrain_flant5xxl), blip2(pretrain, coco)
  --path PATH           Path to images to be captioned
  --prefix PREFIX       a string applied at the beginning of each caption
  --suffix SUFFIX       a string applied at the end of each caption
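For readers who want to script around the tool, the options above can be mirrored with `argparse` roughly as follows. This is a sketch reconstructed from the help text, not the project's actual source: the model defaults come from the standard configuration mentioned in this README, while the empty-string defaults for `--prefix` and `--suffix` and the plain concatenation in `decorate` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="described")
    parser.add_argument("--workflow", help="The workflow file to use")
    parser.add_argument("--model_name", default="blip2_t5",
                        help="One of: blip2_opt, blip2_t5, blip2")
    parser.add_argument("--model_type", default="pretrain_flant5xl",
                        help="A compatible model type")
    parser.add_argument("--path", required=True,
                        help="Path to images to be captioned")
    # Assumed defaults: no prefix or suffix unless one is given.
    parser.add_argument("--prefix", default="",
                        help="a string applied at the beginning of each caption")
    parser.add_argument("--suffix", default="",
                        help="a string applied at the end of each caption")
    return parser


def decorate(caption: str, prefix: str, suffix: str) -> str:
    # Assumption: prefix and suffix are joined to the caption verbatim,
    # so any separator (space, comma) must be part of the string itself.
    return f"{prefix}{caption}{suffix}"


if __name__ == "__main__":
    args = build_parser().parse_args(["--path", "/path/to/my/images",
                                      "--prefix", "photo of "])
    print(decorate("a dog on a beach", args.prefix, args.suffix))
```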

To use the standard workflow and model (blip2_t5/pretrain_flant5xl), simply provide the image path:

python described.py --path /path/to/my/images

Captions are saved in the same directory as the source image, under the same name with a .txt extension.
If a caption already exists for an image, that image is skipped.
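The naming and skip rule can be sketched as below, which is handy for checking how many images in a folder still need captions. It assumes the image extension is replaced by `.txt` (so `cat.jpg` maps to `cat.txt`) and uses an illustrative set of image extensions; neither detail is confirmed by the tool's source.

```python
from pathlib import Path

# Assumed set of image types; the tool may accept more or fewer.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def caption_path(image: Path) -> Path:
    # Same directory, same stem, .txt extension.
    return image.with_suffix(".txt")


def images_needing_captions(folder: Path) -> list[Path]:
    # An image is skipped as soon as its caption file exists.
    return sorted(
        p for p in folder.iterdir()
        if p.suffix.lower() in IMAGE_EXTENSIONS and not caption_path(p).exists()
    )


if __name__ == "__main__":
    for image in images_needing_captions(Path(".")):
        print(image, "->", caption_path(image))
```

To regenerate a caption under this rule, delete the existing .txt file first.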

Workflows

The core idea behind described is the workflow. Workflows are defined in JSON and describe a line of questioning that eventually results in a caption. We are still limited by the capabilities of available models, but you should expect captions that are generally superior to blip/blip2 single-question captions.

See The Standard Workflow for an example of usage.
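To make the "line of questioning" idea concrete, here is a toy illustration. The JSON layout (`start`, `nodes`, `branches`, `next`) is entirely hypothetical and is NOT the schema described uses; consult the standard workflow file for the real format. The `ask` callable stands in for a question put to the blip2 model about one image.

```python
import json
from typing import Callable

# Hypothetical workflow layout, invented for this sketch.
WORKFLOW_JSON = """
{
  "start": "subject",
  "nodes": {
    "subject":   {"question": "What is the main subject of the image?", "next": "is_person"},
    "is_person": {"question": "Is the subject a person?",
                  "branches": {"yes": "clothing"}, "next": "setting"},
    "clothing":  {"question": "What is the person wearing?", "next": "setting"},
    "setting":   {"question": "Where was the image taken?", "next": null}
  }
}
"""


def run_workflow(workflow: dict, ask: Callable[[str], str]) -> str:
    answers = []
    node_id = workflow["start"]
    while node_id is not None:
        node = workflow["nodes"][node_id]
        answer = ask(node["question"]).strip()
        branches = node.get("branches")
        # Routing questions steer the walk but add nothing to the caption.
        if branches is None:
            answers.append(answer)
        # Follow a matching branch if there is one, else the default edge.
        node_id = (branches or {}).get(answer.lower(), node["next"])
    return ", ".join(answers)


if __name__ == "__main__":
    canned = {
        "What is the main subject of the image?": "a woman",
        "Is the subject a person?": "yes",
        "What is the person wearing?": "a red coat",
        "Where was the image taken?": "a city street",
    }
    print(run_workflow(json.loads(WORKFLOW_JSON), lambda q: canned[q]))
```

Because each answer decides which question comes next, the caption only includes details that apply to the image at hand, which is the advantage over asking a single fixed question.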

Contributing

These are early days, and I would be very happy to have you help expand the capabilities of described! Most importantly, we need more comprehensive workflows. Anyone can build these with a bit of patience, regardless of technical skill. To contribute, fork this repository and send me a pull request with your changes.

Caveats

  • Hand-crafted captions will remain far superior to those generated automatically, including by this tool. Where described shines is generating captions for large fine-tuning data sets of thousands of images or more.
  • This tool is ultimately limited by the capabilities of the model and what the model understands. Sometimes you must accept that the model is not capable of discerning many kinds of details.
  • Expect several hours to generate 10k captions on a fast GPU, like a 4090.
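For planning a run, a back-of-the-envelope estimate is easy to compute. The numbers in the example (six questions per image, a quarter of a second per question) are hypothetical placeholders, not measurements; time a small batch on your own hardware and substitute the result.

```python
def estimated_hours(num_images: int, questions_per_image: int,
                    seconds_per_question: float) -> float:
    # Total time is the number of model calls times the time per call.
    return num_images * questions_per_image * seconds_per_question / 3600


if __name__ == "__main__":
    # Hypothetical inputs: 10k images, 6 questions each, 0.25 s per question.
    print(f"{estimated_hours(10_000, 6, 0.25):.1f} hours")
```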

Contributors

amorporkian, tjennings
