fpgaminer / joytag
The JoyTag Image Tagging Model
License: Apache License 2.0
First of all, nice job!
I noticed in the validation arena you're using my suggested thresholds for my models, and a "default" one for yours.
That's doing your work a disservice.
I think a fairer way to compare the models would be to fix some performance point and see how the other metrics fare.
For my models, for example, I used to choose (by bisection) a threshold where micro-averaged recall and precision matched: if both were higher than for the previous model, then I had a better model.
You could do the same, or bisect towards a threshold that gives a desired precision and evaluate recall, for example.
This also has the side effect of being fairer to augmentations like mixup, which skew prediction confidences towards lower values.
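The bisection idea above can be sketched quickly. This is a hedged, self-contained sketch on synthetic data: `probs` and `labels` are assumed to be arrays of per-tag probabilities and binary ground truth, not anything taken from either codebase.

```python
import numpy as np

def micro_pr(probs, labels, thresh):
    """Micro-averaged precision and recall at a given threshold."""
    preds = probs >= thresh
    tp = np.logical_and(preds, labels).sum()
    precision = tp / max(preds.sum(), 1)
    recall = tp / max(labels.sum(), 1)
    return precision, recall

def balanced_threshold(probs, labels, lo=0.0, hi=1.0, iters=30):
    """Bisect for the threshold where micro precision ~= micro recall.

    Raising the threshold generally raises precision and lowers recall,
    so the sign of (precision - recall) tells us which way to move.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        p, r = micro_pr(probs, labels, mid)
        if p < r:
            lo = mid  # threshold too low: recall still exceeds precision
        else:
            hi = mid
    return (lo + hi) / 2

# Synthetic demo: positives get systematically higher scores.
rng = np.random.default_rng(0)
labels = rng.random((1000, 50)) < 0.1
probs = np.clip(labels * 0.6 + rng.random((1000, 50)) * 0.5, 0, 1)
t = balanced_threshold(probs, labels)
p, r = micro_pr(probs, labels, t)
```

At the returned threshold, precision and recall coincide, so a single pair of numbers summarizes each model at a comparable operating point.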
If I may go on a slight tangent about the discrepancy between my stated scores and the ones in the Arena: I used to use micro averaging, while you're calculating macro averages. Definitely keep using macro averaging for the metrics. I started using it too in my newer codebase over at https://github.com/SmilingWolf/JAX-CV (posting the repo in case you find it useful, should you decide to apply to TRC).
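For anyone comparing the two scores, a quick toy illustration of the difference (the data here is made up, not from either model):

```python
import numpy as np

# Toy multi-label predictions and ground truth: 4 samples, 3 tags.
preds  = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1], [1, 0, 1]], dtype=bool)
labels = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 1], [1, 0, 0]], dtype=bool)

tp = (preds & labels).sum(axis=0)    # true positives per tag: [3, 1, 1]
fp = (preds & ~labels).sum(axis=0)   # false positives per tag: [0, 0, 3]

# Micro: pool all tags into one global count, then divide.
micro_precision = tp.sum() / (tp.sum() + fp.sum())   # 5 / 8 = 0.625

# Macro: compute precision per tag, then average the per-tag scores.
per_tag = tp / np.maximum(tp + fp, 1)                # [1.0, 1.0, 0.25]
macro_precision = per_tag.mean()                     # 0.75
```

Micro averaging weights tags by how often they are predicted, so a noisy frequent tag drags the score down more than under macro averaging, which treats every tag equally.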
I see you have commission and skeb_commission in your tag list. It feels odd to me that you could tell a drawing is a commission just by looking at it.
Please add the Python version plus a requirements.txt (or the spec file for whatever virtual-environment management tool you are using).
This will help ensure the code still runs half a year from now, when you may not be committing as often.
(I don't see it listed anywhere; feel free to correct me if I'm wrong.)
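Capturing the current environment only takes a couple of commands (a sketch; adapt to whichever tool the repo actually uses):

```shell
# Record the interpreter version and the exact package set, so the
# code still runs months from now.
python3 --version > PYTHON_VERSION
python3 -m pip freeze > requirements.txt
# or, if using conda:
#   conda env export --no-builds > environment.yml
```

Committing the two generated files alongside the code is usually enough for others to reconstruct a working environment.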
Also, by the way, the example from the README currently crashes with

RuntimeError: Input type (float) and bias type (c10::Half) should be the same

which can be fixed by moving the input batch onto the same device as the model:

def predict(image: Image.Image):
    image_tensor = prepare_image(image, model.image_size)
    batch = {
        'image': image_tensor.unsqueeze(0),
    }
    batch['image'] = batch['image'].to('cuda')  # THIS here: match the model's device
(The fix was inspired by joytag/validation-arena/validate.py, lines 279 to 286 at commit 67f1261.)

For reference, here is the conda environment spec I used:
name: joytag
channels:
- conda-forge
dependencies:
- _libgcc_mutex=0.1
- _openmp_mutex=4.5
- aiohttp=3.9.1
- aiosignal=1.3.1
- attrs=23.1.0
- aws-c-auth=0.7.8
- aws-c-cal=0.6.9
- aws-c-common=0.9.10
- aws-c-compression=0.2.17
- aws-c-event-stream=0.3.2
- aws-c-http=0.7.15
- aws-c-io=0.13.36
- aws-c-mqtt=0.10.0
- aws-c-s3=0.4.6
- aws-c-sdkutils=0.1.13
- aws-checksums=0.1.17
- aws-crt-cpp=0.25.0
- aws-sdk-cpp=1.11.210
- brotli-python=1.1.0
- bzip2=1.0.8
- c-ares=1.24.0
- ca-certificates=2023.11.17
- certifi=2023.11.17
- charset-normalizer=3.3.2
- colorama=0.4.6
- cuda-cudart=12.2.140
- cuda-cudart_linux-64=12.2.140
- cuda-nvrtc=12.2.140
- cuda-nvtx=12.2.140
- cuda-version=12.2
- cudnn=8.8.0.121
- datasets=2.14.4
- dill=0.3.7
- einops=0.7.0
- filelock=3.13.1
- freetype=2.12.1
- frozenlist=1.4.1
- fsspec=2023.12.2
- gflags=2.2.2
- glog=0.6.0
- gmp=6.3.0
- gmpy2=2.1.2
- huggingface_hub=0.20.0
- icu=73.2
- idna=3.6
- importlib-metadata=7.0.1
- jinja2=3.1.2
- keyutils=1.6.1
- krb5=1.21.2
- lcms2=2.16
- ld_impl_linux-64=2.40
- lerc=4.0.0
- libabseil=20230802.1
- libarrow=14.0.2
- libarrow-acero=14.0.2
- libarrow-dataset=14.0.2
- libarrow-flight=14.0.2
- libarrow-flight-sql=14.0.2
- libarrow-gandiva=14.0.2
- libarrow-substrait=14.0.2
- libblas=3.9.0
- libbrotlicommon=1.1.0
- libbrotlidec=1.1.0
- libbrotlienc=1.1.0
- libcblas=3.9.0
- libcrc32c=1.1.2
- libcublas=12.2.5.6
- libcufft=11.0.8.103
- libcurand=10.3.3.141
- libcurl=8.5.0
- libcusolver=11.5.2.141
- libcusparse=12.1.2.141
- libdeflate=1.19
- libedit=3.1.20191231
- libev=4.33
- libevent=2.1.12
- libexpat=2.5.0
- libffi=3.4.2
- libgcc-ng=13.2.0
- libgfortran-ng=13.2.0
- libgfortran5=13.2.0
- libgoogle-cloud=2.12.0
- libgrpc=1.59.3
- libhwloc=2.9.3
- libiconv=1.17
- libjpeg-turbo=3.0.0
- liblapack=3.9.0
- libllvm15=15.0.7
- libmagma=2.7.2
- libmagma_sparse=2.7.2
- libnghttp2=1.58.0
- libnl=3.9.0
- libnsl=2.0.1
- libnuma=2.0.16
- libnvjitlink=12.2.140
- libopenblas=0.3.25
- libparquet=14.0.2
- libpng=1.6.39
- libprotobuf=4.24.4
- libre2-11=2023.06.02
- libsqlite=3.44.2
- libssh2=1.11.0
- libstdcxx-ng=13.2.0
- libthrift=0.19.0
- libtiff=4.6.0
- libutf8proc=2.8.0
- libuuid=2.38.1
- libuv=1.46.0
- libwebp-base=1.3.2
- libxcb=1.15
- libxcrypt=4.4.36
- libxml2=2.11.6
- libzlib=1.2.13
- llvm-openmp=17.0.6
- lz4-c=1.9.4
- magma=2.7.2
- markupsafe=2.1.3
- mkl=2022.2.1
- mpc=1.3.1
- mpfr=4.2.1
- mpmath=1.3.0
- multidict=6.0.4
- multiprocess=0.70.15
- nccl=2.19.4.1
- ncurses=6.4
- networkx=3.2.1
- numpy=1.26.2
- openjpeg=2.5.0
- openssl=3.2.0
- orc=1.9.2
- packaging=23.2
- pandas=2.1.4
- pillow=10.1.0
- pip=23.3.2
- pthread-stubs=0.4
- pyarrow=14.0.2
- pysocks=1.7.1
- python=3.11.7
- python-dateutil=2.8.2
- python-tzdata=2023.3
- python-xxhash=3.4.1
- python_abi=3.11
- pytorch=2.1.0
- pytorch-gpu=2.1.0
- pytz=2023.3.post1
- pyyaml=6.0.1
- rdma-core=49.0
- re2=2023.06.02
- readline=8.2
- regex=2023.10.3
- requests=2.31.0
- s2n=1.4.1
- safetensors=0.3.3
- setuptools=68.2.2
- six=1.16.0
- sleef=3.5.1
- snappy=1.1.10
- sympy=1.12
- tbb=2021.11.0
- tk=8.6.13
- tokenizers=0.15.0
- torchvision=0.16.1
- tqdm=4.66.1
- transformers=4.36.2
- typing-extensions=4.9.0
- typing_extensions=4.9.0
- tzdata=2023d
- ucx=1.15.0
- urllib3=2.1.0
- wheel=0.42.0
- xorg-libxau=1.0.11
- xorg-libxdmcp=1.1.3
- xxhash=0.8.2
- xz=5.2.6
- yaml=0.2.5
- yarl=1.9.3
- zipp=3.17.0
- zstd=1.5.5
This is probably excessive: the only thing that didn't work out of the box was Python 3.12 (I had to use 3.11); everything else could use the latest available versions.
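Given that, a much smaller spec would probably suffice, something like the following (an untested sketch; package names are taken from the full list above):

```yaml
name: joytag
channels:
  - conda-forge
dependencies:
  - python=3.11      # 3.12 didn't work out of the box
  - pytorch-gpu
  - torchvision
  - transformers
  - datasets
  - einops
  - pillow
  - safetensors
```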
Cool project, btw.
Hello, thank you for sharing this model.
I did a quick and naive comparison between this model and the Danbooru Interrogator in Automatic1111's webui, checking both against the actual tags. The test used the 100 most recent posts from Danbooru; all 88 images that downloaded were tagged successfully (12 posts didn't have a proper URL to download).
These are my current observations:
I'm looking forward to seeing whether this can be integrated into the webui or retrained for even more tags!
Note: the results above used a threshold of 0.5. I just noticed that the documented threshold is 0.4, so I re-ran the same code with the new threshold (the pulled images may not be the same). Success rate was again 88/88 (12 failed to download).
Observation: [comparison charts omitted]
Edit: I updated the charts to reflect the fixed Interrogator code; cleaning the tags was necessary.
Thank you for this model! May I ask if you could provide an ONNX version of it? I want to try using it in Stable Diffusion extensions (e.g. ComfyUI) but have been unable to figure out how to convert it.