
x-ck-x / dataset-curation-tool


A tool for downloading from public image boards (those that allow scraping), previewing your images and tags, and editing your images and tags. Additional tabs let you download other code repositories as well as state-of-the-art diffusion and CLIP models for your purposes. Custom datasets can be added!

License: GNU General Public License v3.0

Python 98.41% Shell 1.06% Batchfile 0.53%
auto-tagger captioning-images captioning-videos data-curation dataset-manager downloader imageboard-grabber tagging

dataset-curation-tool's People

Contributors

aswillis, luvoid, x-ck-x

dataset-curation-tool's Issues

Image Resize Augmentation should not be part of the copied file that is saved

If augmentation is selected, image copies are provided to the user as possible augmented data, with tags, that could be used as valid data in the future.
Currently, the resize that must happen before the image is fed to the auto-tagging model interferes with the copy that is supposed to be saved with the newly selected augmentation.
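
A minimal sketch of one way to keep the two concerns apart, not the project's actual code: the selected augmentation produces the copy that gets saved, and the resize is applied only to a temporary image handed to the tagger. The list-of-rows `Image` type and the names `resize_nearest`, `flip_horizontal`, and `augment_and_tag` are hypothetical stand-ins, and tagging the augmented copy (rather than the original) is an assumption.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a real image type: grayscale pixels as rows of ints.
Image = List[List[int]]


def resize_nearest(image: Image, width: int, height: int) -> Image:
    """Nearest-neighbour resize that returns a new image and leaves the input untouched."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def flip_horizontal(image: Image) -> Image:
    """Example augmentation: mirror every row."""
    return [row[::-1] for row in image]


def augment_and_tag(
    image: Image,
    augment: Callable[[Image], Image],
    tagger: Callable[[Image], List[str]],
    model_size: Tuple[int, int],
) -> Tuple[Image, List[str]]:
    """Return the augmented copy to save and the tags predicted for it."""
    augmented = augment(image)  # this is the copy that gets saved
    # The model-input resize lives only in this temporary variable.
    model_input = resize_nearest(augmented, model_size[0], model_size[1])
    return augmented, tagger(model_input)
```

With this split, the saved copy keeps the source resolution no matter what input size the tagging model requires.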

Invalid Tag Support

Valid tags are e621 tags that exist in the csv with a count over 1 on the site.
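
The rule above can be sketched as a small filter. This is an illustration only: the CSV column names `name` and `post_count` are assumptions about the file layout, and `load_valid_tags` and `split_tags` are hypothetical helpers.

```python
import csv
from typing import Iterable, List, Set, TextIO, Tuple


def load_valid_tags(csv_file: TextIO, min_count: int = 1) -> Set[str]:
    """Collect tag names whose count is strictly greater than min_count."""
    reader = csv.DictReader(csv_file)  # assumed header: name,post_count
    return {row["name"] for row in reader if int(row["post_count"]) > min_count}


def split_tags(tags: Iterable[str], valid: Set[str]) -> Tuple[List[str], List[str]]:
    """Partition tags into (valid, invalid) while preserving their order."""
    tags = list(tags)
    return [t for t in tags if t in valid], [t for t in tags if t not in valid]
```

Keeping the invalid half instead of discarding it is what would let a later step re-insert those tags.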

Currently there are two features for adding invalid tags to the dataset, both of which run as post-process steps after tag editing.
I.e. edit the valid tags first, then use the post-process functions on the custom data tab or the download tab to insert invalid tags.

The Planned Changes

  • Create a new supported Invalid Tag/s Category, CSV, Statistics, etc.: this allows the user to add invalid tags directly in the tag editor
  • Additional considerations to let the user re-order the invalid tags however they like (this may be its own feature): a custom gradio component that lets the user easily restructure the final placement of tags in the string (Custom Gradio Component for Tag Editor)

Weird Looping Behavior on Windows

Describe the bug

Tested on both Linux and Windows; the issue only occurs on Windows. The program loops when calling functions on instantiated objects whose classes are defined in other local files.

Reproduction

Downloading any data or running inference with the captioning model triggers this looping behavior with code in the webui.py script.

Tried upgrading to gradio==3.32.0, but on Windows 10 the issue persists.
The issue is OS-related when using gradio, but the loop specifically happens when a function is called on a previously instantiated object in webui.py:

  • with the batch_downloader class, when pressing the download button
  • with the auto_tag class, when pressing the interrogate button

Uploading Frozen State : Batch Upload Feature (`Custom Data Tab`)

I plan to change this so that the user simply ticks a checkbox, which reuses the folder the user is already required to provide, i.e. the path to the original data directory.
Use Case/s:

  • single image upload:
    • upload single image & provide path
  • batch upload:
    • provide the path & tick the checkbox to mark that it is (also) for the batch of images
  • batch upload (NON-Interact):
    • image/s uploaded from the Tag Editor via a selection, using the tag editor path as the default for this feature

Issues importing custom data into the program for tag editing etc.

Various Causes:

  • Auto-tag model not detected, preventing different functions of the `Custom Data Tab` from working properly
  • User tries to import data with invalid tags
  • Another cause (files silently fail to copy; no error): i.e. neither image/s nor tag files are copied over <-> I'll update this as I find out more information

I may have to split up the tab's functionality to reduce the overhead, since there are a lot of moving parts in that Tab alone.
Possibly moving some feature/s currently on that Tab to a new Tab, and/or reformatting the existing layout of that Tab.
#15

Auto-caption feature

An approximation function:

  • A feature that uses various heuristics to determine which tags are best across the available auto-tag/caption models
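
The issue does not say which heuristics are intended. As one possible illustration, the sketch below keeps a tag only when enough models agree on it and their mean confidence is high enough. The function name, both thresholds, and the `{model: {tag: confidence}}` input shape are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def merge_model_tags(
    predictions: Dict[str, Dict[str, float]],
    min_votes: int = 2,
    min_mean_conf: float = 0.5,
) -> List[str]:
    """Combine per-model {tag: confidence} outputs into one ranked tag list."""
    scores = defaultdict(list)
    for model_tags in predictions.values():
        for tag, conf in model_tags.items():
            scores[tag].append(conf)

    kept = []
    for tag, confs in scores.items():
        mean = sum(confs) / len(confs)
        # Heuristic (assumed): require agreement between models and a decent mean score.
        if len(confs) >= min_votes and mean >= min_mean_conf:
            kept.append((mean, tag))

    # Highest mean confidence first; ties broken alphabetically.
    return [tag for mean, tag in sorted(kept, key=lambda p: (-p[0], p[1]))]
```

Setting `min_votes=1` turns the agreement requirement off, which may suit setups where only one model is installed.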

Tag Editor Custom Gradio Component

A UI of color-categorized tags, each represented as a button that can be dragged into a different position to suit the user's requirements.

This will initially show all the tags grouped by category, in the order set on the download tab. The user can then edit the tag/s and their ordering from there as well.

Edge Case Considerations:

  • If the user has already edited the tags in this new section, their changes must persist instead of being replaced by the default order in which the tag/s would otherwise be displayed initially
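
A minimal sketch of that edge case, independent of gradio: a saved user ordering always wins, and the category-grouped default is used only when no edit exists. `TagOrderState`, `default_order`, and the per-image key are hypothetical names chosen for illustration.

```python
from typing import Dict, List


def default_order(
    tags_by_category: Dict[str, List[str]], category_order: List[str]
) -> List[str]:
    """Flatten tags category by category, in the configured category order."""
    return [tag for cat in category_order for tag in tags_by_category.get(cat, [])]


class TagOrderState:
    """Remembers per-image tag orderings that the user has edited."""

    def __init__(self) -> None:
        self._user_orders: Dict[str, List[str]] = {}

    def save_user_order(self, image_id: str, tags: List[str]) -> None:
        self._user_orders[image_id] = list(tags)

    def tags_for(
        self,
        image_id: str,
        tags_by_category: Dict[str, List[str]],
        category_order: List[str],
    ) -> List[str]:
        # A saved edit takes priority over the default grouping.
        if image_id in self._user_orders:
            return list(self._user_orders[image_id])
        return default_order(tags_by_category, category_order)
```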

Support for New Auto-Tagging/Captioning Model/s Available

Several new models are available. The following must be done:

  • add them to the download tab
  • create a config for each model
  • create a generic handler for all the auto-tag/caption model/s
  • GPU support (optional)
  • protobuf (.pb) to .onnx conversion, in order to use the onnx-runtime library
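
The "config per model" and "generic handler" items could look roughly like the sketch below. It is a design illustration under assumptions, not the project's API: `ModelConfig`, its fields, and `TaggerRegistry` are hypothetical, and the predictor is a plain callable so that real backends can be plugged in later.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ModelConfig:
    """Per-model settings (field names are assumptions)."""
    name: str
    input_size: int
    threshold: float
    use_gpu: bool = False  # GPU support is optional


# A predictor maps (image_path, config) to {tag: confidence}.
Predictor = Callable[[str, ModelConfig], Dict[str, float]]


class TaggerRegistry:
    """One entry point for every registered auto-tag/caption model."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[ModelConfig, Predictor]] = {}

    def register(self, config: ModelConfig, predictor: Predictor) -> None:
        self._entries[config.name] = (config, predictor)

    def available(self) -> List[str]:
        return sorted(self._entries)

    def tag(self, model_name: str, image_path: str) -> List[str]:
        config, predictor = self._entries[model_name]
        scores = predictor(image_path, config)
        # Apply the model's own threshold from its config.
        return sorted(t for t, s in scores.items() if s >= config.threshold)
```

Adding a new model then means writing one config and one predictor function, without touching the calling code.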

Augmentation Checkboxes on `Custom Data Tab`

  • Include augmentation data checkbox :: Default is True
  • Augment data only checkbox :: Default is False

This gives user/s the option of forcing augmentation over model production, as well as the option to include the augmented data.
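
One reading of how the two checkboxes might combine, written as a small decision function. The interpretation is an assumption: "augment data only" skips the model and always writes augmented copies, otherwise the model runs and "include augmentation data" decides whether the copies are saved. `ProcessingPlan` and `plan_processing` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingPlan:
    run_model: bool       # run the auto-tag/caption model
    save_augmented: bool  # write augmented copies alongside the data


def plan_processing(
    include_augmented: bool = True,  # checkbox default: True
    augment_only: bool = False,      # checkbox default: False
) -> ProcessingPlan:
    """Translate the two checkbox values into what the tab should do."""
    if augment_only:
        # Assumed: forcing augmentation overrides the other checkbox.
        return ProcessingPlan(run_model=False, save_augmented=True)
    return ProcessingPlan(run_model=True, save_augmented=include_augmented)
```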
