Topic: llava Goto Github
Something interesting about llava
llava,RestAI is an AIaaS (AI as a Service) open-source platform built on top of LlamaIndex, Ollama, and HF Pipelines. Supports any public LLM supported by LlamaIndex and any local LLM supported by Ollama, with precise embeddings usage and tuning.
User: apocas
Home Page: https://apocas.github.io/restai/
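Several entries in this list (RestAI above, the llama.cpp servers below) serve LLaVA through Ollama or similar local runtimes. As a minimal sketch, assuming a local Ollama install with the `llava` model pulled, a client would build a JSON request for Ollama's `/api/generate` endpoint with base64-encoded images; only the payload construction is shown here, without actually sending it.

```python
import base64
import json

def build_llava_request(prompt: str, image_bytes: bytes, model: str = "llava") -> str:
    """Return the JSON body Ollama's /api/generate expects:
    a prompt plus base64-encoded images for multimodal models."""
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # request a single complete response
    }
    return json.dumps(payload)

# Any raw image bytes are encoded the same way; real usage would POST this
# body to http://localhost:11434/api/generate on a running Ollama server.
body = build_llava_request("Describe this image.", b"\x89PNG fake bytes")
print(json.loads(body)["model"])
```

The same payload shape works for any Ollama-hosted vision model; only the `model` field changes.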
llava,Docker image for LLaVA: Large Language and Vision Assistant
User: ashleykleynhans
llava,LLaVA: Large Language and Vision Assistant | RunPod Serverless Worker
User: ashleykleynhans
llava,Docker image for SUPIR (Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild)
User: ashleykleynhans
llava,Your all-in-one platform to build and use AI apps effortlessly on your own computer.
Organization: blib-la
llava,ChatGPT's explosive popularity marks a key step toward AGI. This project compiles open-source alternatives to ChatGPT, including large text models and multimodal models, for everyone's convenience.
User: chenking2020
llava,A Python tool to evaluate the performance of VLMs in the medical domain.
User: corentin-ryr
llava,A Framework of Small-scale Large Multimodal Models
Organization: dlcv-buaa
Home Page: https://arxiv.org/abs/2402.14289
llava,FreeGenius AI, an advanced AI assistant that can talk and take multi-step actions. Supports numerous open-source LLMs via Llama.cpp, Ollama, or the Groq Cloud API, with optional integration with AutoGen agents, the OpenAI API, Google Gemini Pro, and unlimited plugins.
User: eliranwong
Home Page: https://letmedoit.ai
llava,SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild
User: fanghua-yu
Home Page: http://supir.xpixel.group/
llava,Chat with large language models about the contents of an image via this native desktop client for Windows, macOS, and Linux.
User: fmxexpress
Home Page: https://www.fmxexpress.com/
llava,[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
User: fuxiaoliu
Home Page: https://fuxiaoliu.github.io/LRV/
llava,Famous Vision Language Models and Their Architectures
User: gokayfem
llava,Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
User: gokayfem
llava,Chain of Images for Intuitively Reasoning
Organization: graphpku
Home Page: https://huggingface.co/spaces/fxmeng/Chain-of-Image
llava,[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
User: haotian-liu
Home Page: https://llava.hliu.cc
llava,Demo python script app to interact with llama.cpp server using whisper API, microphone and webcam devices.
User: herrera-luis
llava,Code for "How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation"
User: jameszhou-gl
Home Page: https://arxiv.org/pdf/2312.07424.pdf
llava,Tag manager and captioner for image datasets
User: jhc13
llava,LLaVA inference with multiple images at once for cross-image analysis.
User: mapluisch
llava,"Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Organization: mbzuai-oryx
Home Page: https://mbzuai-oryx.github.io/Video-ChatGPT
llava,A Multimodal Discord bot with machine learning functions, including LLM chat, Image generation, and Speech Generation capabilities
User: meatfucker
llava,A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷
Organization: modelscope
llava,ms-swift: Use PEFT or full-parameter training to fine-tune 200+ LLMs or 15+ MLLMs
Organization: modelscope
Home Page: https://github.com/modelscope/swift/blob/main/docs/source/LLM/index.md
llava,An extension of the Planner-Actor-Reporter framework applied to autonomous vehicles in Highway-Env and CARLA.
User: oliverc1623
llava,Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4V, Gemini, QwenVLPlus, 30+ HF models, and 15+ benchmarks
Organization: open-compass
Home Page: https://rank.opencompass.org.cn/leaderboard-multimodal
llava,Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretraining models and a diffusion model toolbox. Offers high performance and flexibility.
Organization: paddlepaddle
llava,Image Classification Testing with LLMs
User: robert-mcdermott
llava,Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
Organization: roboflow
Home Page: https://maestro.roboflow.com
llava,Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
Organization: salt-nlp
Home Page: https://llavar.github.io/
llava,A C#/.NET library to run LLMs (🦙 LLaMA/LLaVA) on your local device efficiently.
Organization: scisharp
Home Page: https://scisharp.github.io/LLamaSharp
llava,👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
User: skalskip
llava,Embed arbitrary modalities (images, audio, documents, etc) into large language models.
User: sshh12
llava,🧘🏻♂️ KarmaVLM (相生): A family of high-efficiency, powerful visual language models.
User: thomas-yanxin
llava,[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Organization: tianyi-lab
llava,LLaVA server (llama.cpp).
User: trzy
llava,This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
Organization: ucsc-vlaa
llava,Unified Multi-modal IAA Baseline and Benchmark
User: uniaa-mllm
llava,Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
Organization: unum-cloud
Home Page: https://unum-cloud.github.io/uform/
llava,Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".
User: victorwz
Home Page: https://mlm-filter.github.io/
llava,[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Organization: wisconsinaivision
Home Page: https://vip-llava.github.io/
llava,Kani extension for supporting vision-language models (VLMs). Comes with model-agnostic support for GPT-Vision and LLaVA.
User: zhudotexe
Home Page: https://kani-vision.readthedocs.io/en/latest/