
vectordb-recipes's Introduction

VectorDB-recipes


Dive into building GenAI applications! This repository contains examples, applications, starter code, & tutorials to help you kickstart your GenAI projects.
  • These are built using LanceDB, a free, open-source, serverless vectorDB that requires no setup.
  • It integrates with the Python data ecosystem, so you can start using these in your existing data pipelines built on pandas, Arrow, Pydantic, etc.
  • LanceDB has a native TypeScript SDK that lets you run vector search in serverless functions!
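Under the hood, a vector search ranks stored rows by how close their embedding vectors are to a query vector. LanceDB does this for you, with indexes and on-disk storage. The toy sketch below is only a brute-force illustration of the idea. It is not LanceDB's implementation or API, and the `Row` shape and the `bruteForceSearch` helper are hypothetical.

```typescript
// Toy brute-force k-nearest-neighbour search over in-memory vectors.
// Illustrative only: this is NOT how LanceDB executes a search.

interface Row {
  id: number;
  text: string;
  vector: number[];
}

// Squared Euclidean (L2) distance between two equal-length vectors.
function l2Squared(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Score every row against the query and keep the k closest ones.
function bruteForceSearch(
  rows: Row[],
  query: number[],
  k: number
): Array<Row & { distance: number }> {
  return rows
    .map((row) => ({ ...row, distance: l2Squared(row.vector, query) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

const rows: Row[] = [
  { id: 1, text: "foo", vector: [3.1, 4.1] },
  { id: 2, text: "bar", vector: [5.9, 26.5] },
];

console.log(bruteForceSearch(rows, [3.0, 4.0], 1));
```

Scanning every row costs O(n·d) per query. That cost is why vector databases build approximate indexes once tables grow large.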

Join our community for support - Discord | Twitter

This repository is divided into 3 sections:

  • Examples - Get right into the code with minimal introduction, aimed at getting you from an idea to PoC within minutes!
  • Applications - Ready-to-use Python and web apps using applied LLMs, VectorDB and GenAI tools
  • Tutorials - A curated list of tutorials, blogs, Colabs and courses to get you started with GenAI in greater depth.

Examples

Applied examples that get right into the code with minimal introduction, aimed at getting you from an idea to PoC within minutes! Examples are available as:

  • Colab notebooks - that build the application in stages, allowing you to investigate results at every intermediate stage.
  • Python scripts - for cases where you'd like to use the file directly or integrate snippets into your application
  • JS/TS scripts - Some examples are written using LanceDB's native JS library! These scripts/snippets can also be integrated directly into your web applications.

If you're looking for in-depth, tutorial-like examples, check out the Tutorials section!

Example   Notebook & Scripts   Read The Blog!       
Youtube transcript search bot Open In Colab Python JS LLM intermediate
Langchain: Code Docs QA bot Open In Colab Python JS LLM intermediate
Databricks DBRX Website Bot Python Databricks LLM beginner
CLI-based SDK Manual Chatbot with Phidata Python local LLM beginner
TransformersJS Embedding example JS LLM advanced
Inbuilt Hybrid Search Open In Colab LLM beginner
Audio Search Open In Colab Python LLM beginner
Multi-lingual search Open In Colab Python LLM beginner
Hybrid search BM25 & lancedb Open In Colab LLM intermediate Ghost
Search Within Images Open In Colab local LLM intermediate Ghost
Accelerate Vector Search Applications Using OpenVINO Open In Colab local LLM advanced Ghost
Multimodal CLIP: DiffusionDB Open In Colab Python LLM beginner Ghost
Multimodal CLIP: Youtube videos Open In Colab Python LLM beginner Ghost
Multimodal Image + Text Search Open In Colab Python LLM intermediate Ghost
Movie Recommender Open In Colab Python beginner
Product Recommender Open In Colab Python intermediate
Arxiv paper recommender Open In Colab Python LLM beginner
Improve RAG with Re-ranking Open In Colab LLM beginner Ghost
Improve RAG with FLARE Open In Colab Python LLM intermediate Ghost
Improve RAG with HyDE Open In Colab LLM intermediate Ghost
Improve RAG with LOTR Open In Colab LLM intermediate Ghost
Advanced RAG: Parent Document Retriever Open In Colab LLM intermediate Ghost
Query Expansion and Reranker Open In Colab LLM advanced Ghost
RAG Fusion Open In Colab LLM advanced
Contextual-Compression-with-RAG Open In Colab local LLM intermediate Ghost
Instruct-Multitask Open In Colab Python LLM beginner Ghost
Evaluating Prompts with Prompttools Open In Colab LLM local LLM advanced
AI Agents: Reducing Hallucination Open In Colab Python JS LLM advanced Ghost
AI Trends Searcher with CrewAI Open In Colab LLM beginner Ghost
SuperAgent Autogen Open In Colab LLM intermediate Ghost
Sentiment Analysis : Analysing Hotel Reviews Open In Colab local LLM beginner Ghost
Facial Recognition Open In Colab beginner
Imagebind demo app hf spaces intermediate

Projects & Applications

These are ready to use applications built using LanceDB serverless vector database. You can explore these open source projects, use parts of them in your projects or build your applications on top of these.

Project Name Description Screenshot
YOLOExplorer Iterate on your YOLO / CV datasets using SQL, Vector semantic search, and more within seconds YOLOExplorer
Website Chatbot (Deployable Vercel Template) Create a chatbot from the sitemap of any website/docs of your choice. Built using the serverless, native JavaScript vectordb package. Chatbot
Chat with multiple URL/website Conversational AI for any website with Mistral, BGE embeddings & LanceDB webui_aa
Talk with Youtube Video using GPT4 Vision API Talk with Youtube Video using GPT4 Vision API and Langchain demo
Talk with Podcast Talk with Youtube Podcast using Ollama and insanely-fast-whisper demo
Talk with Wikipedia Talk with Wikipedia Pages demo
Talk with Github Talk with Github Codespaces using Qwen1.5 demo
Document Chat with Langroid Talk with your Documents using Langroid demo
HR chatbot HR chatbot - ask your personal queries using a zero-shot ReAct agent & tools image
Advanced Chatbot with Parler TTS This chatbot app uses LanceDB hybrid search, FTS & a reranker with the Parler TTS library. image
Multi-Modal Search Engine Create a Multi-modal search engine app, to search images using both images or text Search
Multimodal Myntra Fashion Search Engine This app uses OpenAI's CLIP to make a search engine that can understand and deal with both written words and pictures. image
Multilingual-RAG Multilingual RAG with Cohere embeddings, supporting 100+ languages image
Fastapi RAG template FastAPI based RAG template with Websocket support image
GTE MLX RAG MLX-based RAG model with LanceDB API support image

Tutorials

Looking to get started with LLMs, vectorDBs, and the world of Generative AI? These in-depth tutorials and courses cover these concepts with practical, follow-along Colabs where possible.

Tutorial Interactive Environment Blog Link
Build RAG from Scratch Open In Colab LLM beginner
Local RAG from Scratch with Llama3 Python local LLM beginner
A Primer on Text Chunking and its Types Open In Colab beginner Ghost
Langchain LlamaIndex Chunking Open In Colab beginner Ghost
NER powered Semantic Search Open In Colab local LLM beginner Ghost
Product Quantization: Compress High Dimensional Vectors intermediate Ghost
Corrective RAG with Langgraph Open In Colab LLM intermediate Ghost
LLMs, RAG, & the missing storage layer for AI intermediate Ghost
Fine-Tuning LLM using PEFT & QLoRA Open In Colab local LLM advanced Ghost
Context-Aware Chatbot using Llama 2 & LanceDB Open In Colab local LLM advanced Ghost
Better RAG with FLARE Open In Colab local LLM LLM advanced Ghost

🌟 New! 🌟 Applied GenAI and VectorDB course on Udacity: Learn about GenAI and vectorDBs using LanceDB in the recently launched Udacity course.

Contributing Examples

If you're working on some cool applications that you'd like to add to this repo, please open a PR!

vectordb-recipes's People

Contributors

akashad98, akashmangoai, albertlockett, ayushexel, dependabot[bot], deshwalmahesh, kadirnar, kaushal07wick, nishant-kumar-2002, nivekt, prashantdixit0, qianzhu, raghavdixit99, tanaymeh, tevinwang, unkn-wn


vectordb-recipes's Issues

Add linting

Black & isort. Pre-commit actions, if it's not disruptive.

Update Lambda examples to use S3 Express

This is currently blocked by apache/arrow-rs#5140

We need:

  1. New API documentation from AWS to know what needs to be updated
  2. arrow-rs has to be updated
  3. arrow-rs needs to be released
  4. datafusion needs to update
  5. lance can then update

To test things out, we can just fork arrow-rs and datafusion to create a custom build.

Invalid argument error: Dictionary replacement detected when writing IPC file format. Arrow IPC files only support a single dictionary for a given field across all batches.

Heyo, me again 👯

I created a little helper file for my use case with LanceDB, but when I run my example I get an error that doesn't help much:

[Error: Invalid argument error: Dictionary replacement detected when writing IPC file format. Arrow IPC files only support a single dictionary for a given field across all batches.]

Here is my code:

lanceDb-retriver.ts

import { OpenAIEmbeddingFunction, connect, } from 'vectordb';
const dbPath = 'assets/db'
let embedFunction;

export interface IngestOptions {
    table: string;
    data: Array<Record<string, unknown>>;
}

export interface RetriveOptions {
    query: string;
    table: string;
    limit?: number;
    filter?: string;
    select?: Array<string>;
}

export interface DeleteOptions {
    table: string;
    filter: string;
}

export interface UpdateOptions {
    table: string;
    data: Record<string, unknown>[]
}

export async function useLocalEmbedding() {
    const { pipeline } = await import('@xenova/transformers');
    const pipe = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

    const embed_fun: any = {};
    embed_fun.sourceColumn = 'text';
    embed_fun.embed = async function (batch) {
        let result = [];
        for (let text of batch) {
            const res = await pipe(text, { pooling: 'mean', normalize: true });
            result.push(Array.from(res['data']));
        }
        return result;
    }

    embedFunction = embed_fun;
}

export function useOpenAiEmbedding(apiKey: string, sourceColumn = 'pageContent') {
    embedFunction = new OpenAIEmbeddingFunction(sourceColumn, apiKey)
}

export async function update(options: UpdateOptions) {
    try {
        const db = await connect(dbPath)

        if ((await db.tableNames()).includes(options.table)) {
            const tbl = await db.openTable(options.table, embedFunction)
            await tbl.overwrite(options.data)
        } else {
            throw new Error("Table does not exist")
        }
    } catch (e) {
        console.error(e);
        throw e;
    }
}

export async function remove(options: DeleteOptions) {
    try {
        const db = await connect(dbPath)

        if ((await db.tableNames()).includes(options.table)) {
            const tbl = await db.openTable(options.table, embedFunction)
            await tbl.delete(options.filter)
        } else {
            throw new Error("Table does not exist")
        }
    } catch (e) {
        console.error(e);
        throw e;
    }
}

export async function ingest(options: IngestOptions) {
    try {
        const db = await connect(dbPath)
        if ((await db.tableNames()).includes(options.table)) {
            const tbl = await db.openTable(options.table, embedFunction)
            await tbl.overwrite(options.data)
        } else {
            await db.createTable(options.table, options.data, embedFunction)
        }
    }
    catch (e) {
        console.error(e);
        throw e;
    }
}

export async function retrive(options: RetriveOptions) {
    try {
        const db = await connect(dbPath)

        if ((await db.tableNames()).includes(options.table)) {
            const tbl = await db.openTable(options.table, embedFunction)
            const build = tbl.search(options.query);

            if (options.filter) {
                build.filter(options.filter)
            }

            if (options.select) {
                build.select(options.select)
            }

            if (options.limit) {
                build.limit(options.limit)
            }

            const results = await build.execute();
            return results;
        } else {
            throw new Error("Table does not exist")
        }
    }
    catch (e) {
        console.error(e);
        throw e;
    }
}

And I call it from another file:

test.ts

import dotenv from 'dotenv';
import { ingest, retrive, useOpenAiEmbedding } from './lanceDb-retriver';
dotenv.config();
const apiKey = process.env.OPENAI_API_KEY;

async function main() {
    console.time('ingest');
    const data = [
        {
            id: 1,
            metadata: {
                title: "Lorem Ipsum Document",
                author: "John Doe",
                date: "2023-09-20"
            },
            pageContent: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed ac ipsum nec justo consequat dignissim. Nulla facilisi. Integer gravida tincidunt turpis eget iaculis."
        },
        {
            id: 2,
            metadata: {
                title: "Technical Report on AI Ethics",
                author: "Jane Smith",
                date: "2023-09-21"
            },
            pageContent: "This document provides an overview of the ethical considerations surrounding artificial intelligence. It covers topics such as bias in machine learning, data privacy, and responsible AI development."
        },
    ];
    useOpenAiEmbedding(apiKey);
    await ingest({
        data,
        table: 'vectors'
    })

    const retriveData = await retrive({
        table: 'vectors',
        query: 'what is lorem?'
    });

    console.log(retriveData);
    console.timeEnd('ingest');
}

main();
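The records in test.ts contain a nested `metadata` object, which presumably becomes a struct column when converted to Arrow. One cheap experiment is to flatten it into top-level columns before calling `ingest`. This is an untested assumption about the cause, not a confirmed fix. A minimal sketch of a hypothetical `flattenRecord` helper (not part of vectordb):

```typescript
// Hypothetical helper (not part of vectordb): flatten one level of nested,
// non-array objects into top-level columns, e.g.
// { metadata: { title: "x" } } -> { metadata_title: "x" }.
// It only reshapes records; whether this avoids the Arrow dictionary error
// is an untested assumption.
function flattenRecord(
  record: Record<string, unknown>,
  separator = "_"
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(record)) {
    const value = record[key];
    const isNestedObject =
      value !== null && typeof value === "object" && !Array.isArray(value);
    if (isNestedObject) {
      // Promote each nested field to its own column, prefixed by the parent key.
      const nested = value as Record<string, unknown>;
      for (const childKey of Object.keys(nested)) {
        out[key + separator + childKey] = nested[childKey];
      }
    } else {
      // Scalars and arrays (such as embedding vectors) are copied unchanged.
      out[key] = value;
    }
  }
  return out;
}

// Usage with the records from test.ts. Wrap the call in a lambda, because
// Array.map would otherwise pass the element index as `separator`:
// const flatData = data.map((record) => flattenRecord(record));
```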

Tag all examples

  • Add [beginner, intermediate, advanced] tags to all examples
  • Try to maintain the ordering: beginner followed by intermediate
  • Group topics together, for example like this:
    [Screenshot 2024-04-15 at 11:29:15 PM]
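The ordering rule above can be sketched as a two-key sort: first by topic group, then by difficulty within each group. The `ExampleRow` shape and the topic names are hypothetical, and placing advanced after intermediate is an assumption extending the stated "beginner followed by intermediate" rule.

```typescript
// Hypothetical shape of one examples-table row; field names are illustrative.
type Level = "beginner" | "intermediate" | "advanced";

interface ExampleRow {
  name: string;
  topic: string; // e.g. "RAG", "Multimodal"
  level: Level;
}

// Assumed difficulty order within a topic group.
const LEVEL_ORDER: Record<Level, number> = { beginner: 0, intermediate: 1, advanced: 2 };

// Keep topics together (ranked by first appearance), then order each
// topic's rows beginner -> intermediate -> advanced.
function orderExamples(rows: ExampleRow[]): ExampleRow[] {
  const topics: string[] = [];
  for (const row of rows) {
    if (topics.indexOf(row.topic) === -1) {
      topics.push(row.topic);
    }
  }
  return rows
    .map((row, index) => ({ row, index }))
    .sort(
      (a, b) =>
        topics.indexOf(a.row.topic) - topics.indexOf(b.row.topic) ||
        LEVEL_ORDER[a.row.level] - LEVEL_ORDER[b.row.level] ||
        a.index - b.index // tie-break on original position keeps the order stable
    )
    .map((entry) => entry.row);
}
```

Ranking topics by first appearance keeps the table's existing group order, so only rows within a group move.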

[User Feedback]: Recipes need better branding, content depth & better introduction.

A power user provided the following critical feedback:

  • Who is this repo aimed at? Add a better intro for each of the tables to set the right expectations.
    A user might come expecting tutorials with a lot of hand-holding. Instead, 2 of the 3 tables are mostly PoCs and standalone applications, which expect users to already know what LLMs, vectorDBs, and RAG are. We've recently added a 3rd table aimed at lengthy tutorials that are actually introductory, but no one can tell that from the table titles at a glance.
  • Why should someone use this repo as opposed to others, like the OpenAI Cookbook?
    It's missing prominent backlinks to LanceDB and doesn't say much about the value props (serverless, no setup/auth, native JS, etc.), so a user landing directly on this repo has no incentive to try it out.
    (This was actually done on purpose initially to get users to try the examples sooner, but maybe it's better to find a middle ground.)

chat with any website app broken

I followed the README but keep getting 422 errors:

-> % curl 'http://localhost:7860/run/predict' \
  -H 'Accept: */*' \
  -H 'Accept-Language: en-US,en' \
  -H 'Cache-Control: no-cache' \
  -H 'Connection: keep-alive' \
  -H 'Content-Type: application/json' \
  -H 'Cookie: PGADMIN_LANGUAGE=en; _ga=GA1.1.919721918.1702334574; _ga_R1FN4KJKJH=GS1.1.1702334574.1.1.1702334813.0.0.0' \
  -H 'Origin: http://localhost:7860' \
  -H 'Pragma: no-cache' \
  -H 'Referer: http://localhost:7860/' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Site: same-origin' \
  -H 'Sec-GPC: 1' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36' \
  -H 'dnt: 1' \
  -H 'sec-ch-ua: "Not_A Brand";v="8", "Chromium";v="120", "Brave";v="120"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  --data-raw '{"data":["https://lancedb.com/ & https://blog.lancedb.com/context-aware-chatbot-using-llama-2-lancedb-as-vector-database-4d771d95c755"],"event_data":null,"fn_index":0,"session_hash":"81i86315img"}' \
  --compressed
{"detail":[{"type":"missing","loc":["body","event_id"],"msg":"Field required","input":{"data":["https://lancedb.com/ & https://blog.lancedb.com/context-aware-chatbot-using-llama-2-lancedb-as-vector-database-4d771d95c755"],"event_data":null,"fn_index":0,"session_hash":"81i86315img"},"url":"https://errors.pydantic.dev/2.4/v/missing"}]}
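The 422 body is a FastAPI-style validation error (its `url` points at the Pydantic docs). It says the request body lacks a required `event_id` field. The port and the `fn_index`/`session_hash` fields suggest a Gradio app, so a mismatch between the Gradio version the app was written for and the installed one is a plausible cause, though that is not confirmed here. To triage such responses, a small sketch that lists the missing fields (the interfaces declare only the fields seen above):

```typescript
// Minimal shape of the validation error body shown above.
interface ValidationIssue {
  type: string;
  loc: Array<string | number>;
  msg: string;
}

interface ValidationErrorBody {
  detail: ValidationIssue[];
}

// Return the dotted location of every field the server reports as missing.
function missingFields(responseText: string): string[] {
  const body = JSON.parse(responseText) as ValidationErrorBody;
  return body.detail
    .filter((issue) => issue.type === "missing")
    .map((issue) => issue.loc.join("."));
}
```

Run on the response above, it yields `["body.event_id"]`.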

openai.Configuration is not a constructor

Heyo, your implementation of OpenAIEmbeddingFunction seems to fail when I try to run it like this:

import { OpenAIEmbeddingFunction, connect } from 'vectordb';
import dotenv from 'dotenv';
dotenv.config();

const dbPath = 'assets/db/lancedb'
const apiKey = process.env.OPENAI_API_KEY
let embedFunction = new OpenAIEmbeddingFunction('info', apiKey)

I get this error:

        const configuration = new openai.Configuration({
                              ^
TypeError: openai.Configuration is not a constructor
    at new OpenAIEmbeddingFunction (...node_modules\vectordb\dist\embedding\openai.js:37:31)
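In the `openai` npm package, `Configuration` (like `OpenAIApi`) belonged to the v3 API and was removed in v4. This error therefore typically means the `new openai.Configuration(...)` call in the stack trace is running against openai@4. Installing a 3.x release of `openai` is one likely workaround. Another is to pass your own embedding object with the same `{ sourceColumn, embed }` shape that `useLocalEmbedding` builds in the previous issue, backed by the v4 client. A minimal sketch of the latter, where the `EmbeddingsClient` interface, the `OpenAIV4Embedding` class, and the default model name are assumptions:

```typescript
// Structural type for the part of the openai v4 client used here, so the
// sketch compiles without the package installed. A client created with
// `new OpenAI({ apiKey })` from openai@4 is expected to match it.
interface EmbeddingsClient {
  embeddings: {
    create(args: { model: string; input: string[] }): Promise<{
      data: Array<{ embedding: number[]; index: number }>;
    }>;
  };
}

// Hypothetical embedding object with the { sourceColumn, embed } shape.
class OpenAIV4Embedding {
  readonly sourceColumn: string;
  private readonly client: EmbeddingsClient;
  private readonly model: string;

  // The default model name is an assumption; use whichever embedding model you need.
  constructor(sourceColumn: string, client: EmbeddingsClient, model = "text-embedding-ada-002") {
    this.sourceColumn = sourceColumn;
    this.client = client;
    this.model = model;
  }

  async embed(batch: string[]): Promise<number[][]> {
    const response = await this.client.embeddings.create({ model: this.model, input: batch });
    // Sort by index so the output vectors line up with the input texts.
    return response.data
      .slice()
      .sort((a, b) => a.index - b.index)
      .map((item) => item.embedding);
  }
}
```

With openai@4 installed, you would construct it as `new OpenAIV4Embedding('info', new OpenAI({ apiKey }))`. You would then pass it wherever the helper file passes `embedFunction`.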

Monthly Recipes audit

Jan -

Things to do:

  • Every example should work without errors
  • The requirements need to be present in the examples directory
  • In case there is a Colab, make sure the first cell installs all requirements
  • The links to Colabs and blogs should actually work

NOTE: When auditing, make sure each example gets tested in a separate env. Otherwise, missing-dependency errors won't be caught in some cases. This can be automated via a script or something.
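One piece of that audit script could be checking that every example ships its requirements. A minimal sketch, assuming a hypothetical layout of `examples/<name>/...` and treating a `requirements.txt` or `package.json` directly inside the example folder as its manifest. It does not cover running each example in a fresh env.

```typescript
// Manifest file names accepted by this sketch (an assumption).
const MANIFESTS = ["requirements.txt", "package.json"];

// Given repository file paths ("/"-separated, relative to the repo root),
// return the example folders under examples/ that have no manifest file.
function examplesMissingRequirements(paths: string[]): string[] {
  const examples: string[] = [];
  const withManifest: string[] = [];
  for (const path of paths) {
    const parts = path.split("/");
    // Skip anything that is not a file inside examples/<name>/.
    if (parts[0] !== "examples" || parts.length < 3) continue;
    const name = parts[1];
    if (examples.indexOf(name) === -1) examples.push(name);
    // Only a manifest directly inside the example folder counts.
    const isTopLevelManifest = parts.length === 3 && MANIFESTS.indexOf(parts[2]) !== -1;
    if (isTopLevelManifest && withManifest.indexOf(name) === -1) {
      withManifest.push(name);
    }
  }
  return examples.filter((name) => withManifest.indexOf(name) === -1).sort();
}
```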
