nopperl / clip-synthetic-captions
Tiny-scale experiment showing that CLIP models trained using detailed captions generated by multimodal models (CogVLM and LLaVA 1.5) outperform models trained using the original alt-texts on a range of classification and retrieval tasks.
License: Apache License 2.0