
chat-with-your-data

This repository contains a system for embedding, indexing, and semantically searching personal folders that contain text and image data.
It can also process, analyze, and visualize the data, and it adds features such as clustering, image captioning, and retrieval-augmented generation (RAG).

Components:

Multi-modal Semantic Search:

  • Embedding and indexing text data using the nli-mpnet-base-v2 model.
  • Embedding and indexing image data using the CLIP model.
  • Semantic search for both text and image data (searching images by both image and text queries).
  • Keyword text search to complement the semantic search results.
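To make "semantic search" concrete, here is a minimal pure-Python sketch of ranking indexed files by cosine similarity to a query embedding, alongside a plain keyword match. It is not this repository's code: it assumes every file has already been embedded by a model such as nli-mpnet-base-v2 (text) or CLIP (images), and the three-dimensional vectors, the `Document` type, and the function names are hypothetical stand-ins. Because CLIP places images and text in a shared embedding space, the same cosine ranking is what lets a text query retrieve images.

```python
import math
from dataclasses import dataclass


@dataclass
class Document:
    path: str
    text: str
    # Hypothetical placeholder for a real model embedding (e.g. nli-mpnet-base-v2 or CLIP).
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(index: list[Document], query_embedding: list[float], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank indexed documents by cosine similarity to the query embedding (linear scan)."""
    scored = [(doc.path, cosine_similarity(doc.embedding, query_embedding)) for doc in index]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def keyword_search(index: list[Document], keyword: str) -> list[str]:
    """Case-insensitive substring match on the raw text."""
    needle = keyword.lower()
    return [doc.path for doc in index if needle in doc.text.lower()]


index = [
    Document("notes/cats.txt", "My cat sleeps all day", [0.9, 0.1, 0.0]),
    Document("notes/dogs.txt", "Walking the dog in the park", [0.7, 0.3, 0.1]),
    Document("notes/taxes.txt", "Tax return checklist", [0.0, 0.2, 0.9]),
]
```

For example, `semantic_search(index, [1.0, 0.0, 0.0], top_k=2)` ranks `notes/cats.txt` ahead of `notes/dogs.txt`, while `keyword_search(index, "DOG")` returns only `notes/dogs.txt`. A real index with many files would normally use a vector library rather than a linear scan.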

Clustering and Image Captioning:

  • Clustering image embeddings with a PyTorch KMeans implementation (with GPU support).
  • Image captioning with the BLIP model.
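Clustering groups image embeddings so that similar images end up together. The project uses a PyTorch KMeans implementation on the GPU; purely as an illustration of the underlying algorithm, the sketch below is plain Lloyd's k-means in pure Python. The function name, the "first k points" initialization, and the iteration cap are choices made for this sketch only; GPU implementations commonly use random or k-means++ initialization and batched tensor operations.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the initial centroids are the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids
```

With the toy points `[[0, 0], [0, 1], [10, 10], [10, 11]]` and `k=2`, the two nearby pairs end up in separate clusters after a few iterations.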

Retrieval-Augmented Generation (RAG):

  • Running open-source LLMs through a local Ollama API instance (started with docker-compose).
  • Answering questions based on search results.
  • Summarizing search results.
  • Generating topics for provided image captions.
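In the RAG step, retrieved search results are packed into a prompt and sent to the local LLM. The sketch below shows one way to do that against Ollama's standard `/api/generate` REST endpoint using only the standard library. Assumptions: the container publishes Ollama's default port 11434 on localhost, and the prompt wording and the names `build_rag_prompt` and `ask_ollama` are hypothetical, not taken from this project. The model falls back to `mistral:7b`, the default named in the Windows instructions.

```python
import json
import os
import urllib.request
from typing import Optional

# Assumption: docker-compose exposes Ollama's default port 11434 on localhost.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_rag_prompt(question: str, results: list[str]) -> str:
    """Number the retrieved snippets and ask the model to answer from them only."""
    context = "\n".join(f"[{i}] {snippet}" for i, snippet in enumerate(results, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context is not sufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def ask_ollama(prompt: str, model: Optional[str] = None, timeout: float = 120.0) -> str:
    """POST a non-streaming request to Ollama's /api/generate and return the generated text."""
    payload = {
        # Use the explicit model, else OLLAMA_LLM_MODEL, else the documented default.
        "model": model or os.environ.get("OLLAMA_LLM_MODEL", "mistral:7b"),
        "prompt": prompt,
        "stream": False,  # ask for a single JSON object instead of a stream of chunks
    }
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

Usage, with the Ollama container running: `ask_ollama(build_rag_prompt("What does my cat do?", ["My cat sleeps all day"]))`. Summarizing results or generating topics for image captions follows the same pattern with a different instruction at the top of the prompt.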

Web User Interface Using Gradio:

  • Provides a user-friendly interface for interacting with the system.

Visualization (in the experiments directory):

  • Visualizes data and results.
  • Facilitates exploration of topic relationships through semantic graphs.
  • Applies PCA dimensionality reduction for 2D and 3D visualizations of cluster embeddings.
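PCA reduces high-dimensional embeddings to 2 or 3 coordinates by projecting them onto the directions of largest variance. As a dependency-free illustration only (this is not the project's code; in practice a library implementation such as scikit-learn's `PCA` would be used), the sketch below finds the top components by power iteration on the covariance matrix, with deflation and Gram-Schmidt orthogonalization between components. The function name and its parameters are hypothetical.

```python
import math
import random


def pca(points, n_components=2, iterations=200, seed=0):
    """Project points onto their top principal components via power iteration with deflation."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]

    def orthonormalize(v, basis):
        # Gram-Schmidt: remove components along already-found directions, then normalize.
        for u in basis:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        return [a / norm for a in v] if norm > 0 else v

    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = orthonormalize([rng.random() + 0.1 for _ in range(d)], components)
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            w = orthonormalize(w, components)
            if not any(w):
                break  # no variance left in the remaining directions
            v = w
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: subtract the found component so the next pass finds the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    # Coordinates of each centered point along the components (signs are arbitrary).
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in centered]
```

Calling `pca(embeddings, n_components=2)` gives 2D coordinates for a scatter plot; `n_components=3` gives the 3D variant. Component signs are arbitrary, so plots may appear mirrored between runs of different implementations.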

Backend API Support:

  • Offers a RESTful API for data retrieval and processing.

Download the Example Testing Dataset:

A sample testing dataset can be downloaded from here.

Installation (Linux / macOS, recommended):

Configuration:

cp .env.example .env
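The `.env` file holds settings as `KEY=VALUE` lines. The sketch below shows a minimal way such a file can be parsed and read with defaults. It assumes the file uses the same variable names as the Windows instructions (`OLLAMA_LLM_MODEL`, `DEFAULT_SEARCH_FOLDER_PATH`); the actual contents of `.env.example` and the way the project loads it may differ, and `load_env_file` is a hypothetical helper that ignores quoting and escapes.

```python
import os


def load_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments (no quoting or escapes)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Hypothetical .env contents for illustration.
example = """
# Hypothetical .env contents
OLLAMA_LLM_MODEL=mistral:7b
DEFAULT_SEARCH_FOLDER_PATH=/home/me/dataset
"""

# Real environment variables take precedence over values from the file.
settings = {**load_env_file(example), **os.environ}
model = settings.get("OLLAMA_LLM_MODEL", "mistral:7b")
```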

Starting the System:

./start.sh

Access the web interface at http://127.0.0.1:7860/.

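Running Tests:

Start the backend API first, then run pytest from the tests directory (use a second terminal if the API keeps running in the foreground):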

python ./src/api.py
cd src/tests
pytest

How to Run Manually (Windows):

Please note that the system is primarily designed to run on Linux. Running on Windows may require additional adjustments and is not guaranteed to work seamlessly.

REM Set environment variables (cmd.exe; "#" is not a comment in cmd)
REM OLLAMA_LLM_MODEL defaults to mistral:7b
set OLLAMA_LLM_MODEL=your_model
REM Optional: default folder to search
set DEFAULT_SEARCH_FOLDER_PATH=\path\to\your\dataset\folder

REM Create a virtual environment and install dependencies
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

REM Start the Ollama API and pull the model
docker compose up -d
docker exec -it ollama-api ollama pull %OLLAMA_LLM_MODEL%

REM Start the application
python .\src\app.py

Access the web interface at http://127.0.0.1:7860/.

Contributors:

  • harduex

