truefoundry / cognita

RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry

Home Page: https://cognita.truefoundry.com

License: Apache License 2.0

Languages: Python 61.00%, TypeScript 34.00%, SCSS 3.53%, HTML 0.70%, Dockerfile 0.67%, JavaScript 0.09%, Shell 0.01%
Topics: ai, deep-learning, generative-ai, llmops, machine-learning, mlops, model-deployment, python, rag, retrieval-augmented-generation

cognita's Introduction

Cognita

Why use Cognita?

Langchain/LlamaIndex provide easy-to-use abstractions for quick experimentation and prototyping in Jupyter notebooks. But when things move to production, there are additional constraints: components should be modular, easily scalable, and extendable. This is where Cognita comes in. Cognita uses Langchain/LlamaIndex under the hood and provides an organization for your codebase, where each RAG component is modular, API-driven, and easily extendable. Cognita can be used easily in a local setup and, at the same time, offers a production-ready environment along with no-code UI support. Cognita also supports incremental indexing by default.

You can try out Cognita at: https://cognita.truefoundry.com

🎉 What's new in Cognita

  • [June, 2024] Added one-click local deployment of Cognita. You can now run the entire Cognita system using docker-compose, which makes it easier to test and develop locally.
  • [May, 2024] Added support for embedding and reranking using the Infinity server. You can now use hosted services for the variety of embedding and reranking models available on the HuggingFace Hub. This reduces the burden on the main Cognita system and makes it more scalable.
  • [May, 2024] Cleaned up requirements for optional package installations for vector DBs, parsers, embedders, and rerankers.
  • [May, 2024] Conditional Docker builds with arguments for optional package installations.
  • [April, 2024] Support for a multi-modal vision parser using GPT-4.

Introduction

Cognita is an open-source framework to organize your RAG codebase along with a frontend to play around with different RAG customizations. It provides a simple way to organize your codebase so that it becomes easy to test locally while also being able to deploy it in a production-ready environment. The key issues that arise while productionizing a RAG system from a Jupyter notebook are:

  1. Chunking and Embedding Job: The chunking and embedding code usually needs to be abstracted out and deployed as a job. Sometimes the job will need to run on a schedule or be triggered via an event to keep the data updated.
  2. Query Service: The code that generates the answer from the query needs to be wrapped up in an API server like FastAPI and deployed as a service. This service should be able to handle multiple queries at the same time and autoscale with higher traffic.
  3. LLM / Embedding Model Deployment: Often, if we are using open-source models, we load the model in the Jupyter notebook. In production this needs to be hosted as a separate service, and the model needs to be called via an API.
  4. Vector DB deployment: Most testing happens on vector DBs in memory or on disk. However, in production, the DBs need to be deployed in a more scalable and reliable way.

Cognita makes it really easy to customize and experiment with every part of a RAG system while still being able to deploy it in a production-ready way. It also ships with a UI that makes it easier to try out different RAG configurations and see the results in real time. You can use it locally, with or without any Truefoundry components. However, using Truefoundry components makes it easier to test different models and deploy the system in a scalable way. Cognita also allows you to host multiple RAG systems using one app.

Advantages of using Cognita are:

  1. A central reusable repository of parsers, loaders, embedders and retrievers.
  2. Ability for non-technical users to play with the UI: upload documents and perform QnA using modules built by the development team.
  3. Fully API driven - which allows integration with other systems.

    If you use Cognita with the Truefoundry AI Gateway, you get logging, metrics, and a feedback mechanism for your user queries.

Features:

  1. Support for multiple document retrievers that use Similarity Search, Query Decomposition, Document Reranking, etc.
  2. Support for SOTA open-source embeddings and reranking from mixedbread-ai
  3. Support for LLMs via Ollama
  4. Support for incremental indexing that ingests documents in batches (reducing compute burden), keeps track of already indexed documents, and prevents re-indexing of those docs.

🚀 Quickstart: Running Cognita Locally

🐳 Using Docker compose (recommended)

Cognita and all of its services can be run using docker-compose. This is the recommended way to run Cognita locally. Run the following command to start the services:

docker-compose --env-file compose.env up --build
  • The compose file uses compose.env file for environment variables. You can modify it as per your needs.
  • The compose file will start the following services:
    • ollama-server - Starts the local LLM server. compose.env has OLLAMA_MODEL as the environment variable to specify the model.
    • infinity-server - Starts the local embeddings and reranker server. compose.env has INFINITY_EMBEDDING_MODEL and INFINITY_RERANKING_MODEL as environment variables to specify the embedding and reranking models from the HuggingFace Hub.
    • qdrant-server - Starts the local vector DB server.
    • cognita-backend - Starts the FastAPI backend server for Cognita.
    • cognita-frontend - Starts the frontend for Cognita.
  • Once the services are up, you can access the Infinity server at http://localhost:7997, the Qdrant server at http://localhost:6333, the backend at http://localhost:8000, and the frontend at http://localhost:5001.
  • The backend uses the local.metadata.yaml file for configuration; you can modify it as per your needs. The file is used to set up the collection name, different data source paths, and embedder configurations. Before the backend starts, an indexer job runs to index the data sources mentioned in local.metadata.yaml.

⚠️ Note: Currently the UI supports only QnA, not data source and collection creation. These have to be done via local.metadata.yaml; after editing it, restart the docker-compose services. Work is in progress to bring this facility to the UI as well and make the experience seamless.
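
Once the stack is up, a quick way to confirm that the services listed above are reachable is a small Python snippet like the one below. This is only a sketch using the default compose ports; a non-200 status on the root path does not necessarily indicate a problem, it mainly confirms the port is open.

import requests

# Default URLs from compose.env; adjust if you changed the ports.
services = {
    "infinity": "http://localhost:7997",
    "qdrant": "http://localhost:6333",
    "backend": "http://localhost:8000",
    "frontend": "http://localhost:5001",
}

for name, url in services.items():
    try:
        resp = requests.get(url, timeout=5)
        print(f"{name}: {url} -> HTTP {resp.status_code}")
    except requests.RequestException as exc:
        print(f"{name}: {url} -> not reachable ({exc})")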

Cognita from source

You can play around with the code locally using the Python scripts or the UI component that ships with the code.

🐍 Installing Python and Setting Up a Virtual Environment

Before you can use Cognita, you'll need to ensure that Python >=3.10.0 is installed on your system and that you can create a virtual environment for a safer and cleaner project setup.

Setting Up a Virtual Environment

It's recommended to use a virtual environment to avoid conflicts with other projects or system-wide Python packages.

Create a Virtual Environment:

Navigate to your project's directory in the terminal. Run the following command to create a virtual environment named venv (you can name it anything you like):

python3 -m venv ./venv

Activate the Virtual Environment:

  • On Windows, activate the virtual environment by running:
venv\Scripts\activate.bat
  • On macOS and Linux, activate it with:
source venv/bin/activate

Once your virtual environment is activated, you'll see its name in the terminal prompt. Now you're ready to install Cognita using the steps provided in the Quickstart sections.

Remember to deactivate the virtual environment when you're done working with Cognita by simply running deactivate in the terminal.

The following are the instructions for running Cognita locally without any additional Truefoundry dependencies.

Install necessary packages:

In the project root execute the following command:

pip install -r backend/requirements.txt

Install Additional packages:

  • Install packages for additional parsers like PDFTableParser, which uses deepdoctection for table extraction from PDFs. This is optional and can be skipped if you don't need to extract tables from PDFs.

    pip install -r backend/parsers.requirements.txt
    
  • Install packages for vector_db if you want to use SingleStore as the vector DB. This is optional and can be skipped if you don't need to use SingleStore.

    pip install -r backend/vectordb.requirements.txt
    

    Uncomment the respective vector db in backend/modules/vector_db/__init__.py to use it.

Infinity Service:

  • Rerankers and embedders are used via hosted services like Infinity. The respective service files can be found under the embedder and reranker directories.

  • To install the Infinity service, follow the instructions here. You can also run the following command to start a Docker container serving mixedbread-ai embedding and reranking models.

    docker run -it --gpus all \
    -v $PWD/infinity/data:/app/.cache \
    -p 7997:7997 \
    michaelf34/infinity:latest \
    v2 \
    --model-id mixedbread-ai/mxbai-embed-large-v1 \
    --model-id mixedbread-ai/mxbai-rerank-xsmall-v1 \
    --port 7997
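
  • Once the container is running, you can smoke-test it from Python. This is a hedged sketch: Infinity exposes an OpenAI-style /embeddings route (the same route Cognita's embedding service client posts to), but the exact payload and response fields may vary across Infinity versions.

    import requests

    # Illustrative smoke test against the local Infinity server started above.
    resp = requests.post(
        "http://localhost:7997/embeddings",
        json={
            "model": "mixedbread-ai/mxbai-embed-large-v1",  # must match a --model-id passed above
            "input": ["What is the annual fee of this credit card?"],
        },
        timeout=30,
    )
    resp.raise_for_status()
    print(len(resp.json()["data"][0]["embedding"]), "dimensions")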

Setting up the .env file:

  • Create a .env file by copying compose.env and set up the relevant fields. You will need to provide EMBEDDING_SVC_URL and RERANKER_SVC_URL in the .env file; both will be http://localhost:7997 if you are running the Infinity container as above.

Executing the Code:

  • Now we index the data (sample-data/creditcards) by executing the following command from the project root:

    python -m local.ingest
    
  • You can also start a FastAPI server: uvicorn --host 0.0.0.0 --port 8000 backend.server.app:app --reload. The Swagger docs will then be available at http://localhost:8000/. For the local version you need not create data sources or collections or index them via the API, as this is taken care of by the local.metadata.yaml and ingest.py files. You can directly try out the retrievers endpoint.

  • To use the frontend UI for querying, go to the frontend directory (cd frontend) and execute yarn dev to start the UI and play around. Refer to the frontend README for more details. You can then query the documents using the UI hosted at http://localhost:5000/.

These commands make use of the local.metadata.yaml file, where you set up the Qdrant collection name, different data source paths, and embedder configurations. You can try out different retrievers and queries by importing them from backend.modules.query_controllers.example.payload in run.py. To run a query, execute the query script from the project root: python -m local.run. A condensed sketch of what that script does is shown below.
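
For reference, local/run.py boils down to roughly the following. This is a condensed sketch: the controller class name and import paths below are guesses for illustration, and the request fields shown are not the real schema, so check local/run.py and backend/modules/query_controllers/example/payload.py for the actual imports and ready-made payloads.

import asyncio

# Hypothetical condensed version of local/run.py; the import paths below are assumptions.
from backend.modules.query_controllers.example.controller import ExampleQueryController
from backend.modules.query_controllers.example.types import ExampleQueryInput

request = {
    # Illustrative fields only -- import a full payload from payload.py for the real schema.
    "collection_name": "creditcard",
    "query": "What is the annual fee of the Diners Club Black card?",
}

controller = ExampleQueryController()
answer = asyncio.run(controller.answer(ExampleQueryInput(**request)))
print(answer)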

⚒️ Project Architecture

Overall, the architecture of Cognita is composed of several entities:

Cognita Components:

  1. Data Sources - These are the places that contain your documents to be indexed. Usually these are S3 buckets, databases, TrueFoundry Artifacts or even local disk

  2. Metadata Store - This store contains metadata about the collections themselves. A collection refers to a set of documents from one or more data sources combined. For each collection, the collection metadata stores

    • Name of the collection
    • Name of the associated Vector DB collection
    • Linked Data Sources
    • Parsing Configuration for each data source
    • Embedding Model and Configuration to be used
  3. LLM Gateway - This is a central proxy that allows proxying requests to various Embedding and LLM models across many providers with a unified API format. This can be OpenAIChat, OllamaChat, or even TruefoundryChat that uses TF LLM Gateway.

  4. Vector DB - This stores the embeddings and metadata for parsed files for the collection. It can be queried to get similar chunks or exact matches based on filters. We are currently supporting Qdrant and SingleStore as our choice of vector database.

  5. Indexing Job - This is an asynchronous Job responsible for orchestrating the indexing flow. Indexing can be started manually or run regularly on a cron schedule. It will

    • Scans the Data Sources to get the list of documents
    • Checks the Vector DB state to filter out unchanged documents
    • Downloads and parses files to create smaller chunks with associated metadata
    • Embeds those chunks using the AI Gateway and puts them into the Vector DB

      The source code for this is in the backend/indexer/

  6. API Server - This component processes the user query to generate answers with references synchronously. Each application has full control over the retrieval and answer process. Broadly speaking, when a user sends a request

    • The corresponding Query Controller bootstraps retrievers or multi-step agents according to configuration.
    • User's question is processed and embedded using the AI Gateway.
    • One or more retrievers interact with the Vector DB to fetch relevant chunks and metadata.
    • A final answer is formed using an LLM via the AI Gateway.
    • Metadata for relevant documents fetched during the process can be optionally enriched. E.g. adding presigned URLs.

      The code for this component is in backend/server/

Data Indexing:

  1. A cron job on some schedule triggers the Indexing Job
  2. The data sources associated with the collection are scanned for all data points (files)
  3. The job compares the Vector DB state with the data source state to figure out newly added, updated, and deleted files. The new and updated files are downloaded
  4. The newly added and updated files are parsed and chunked into smaller pieces, each with its own metadata
  5. The chunks are embedded using embedding models like text-embedding-ada-002 from OpenAI or mxbai-embed-large-v1 from mixedbread-ai
  6. The embedded chunks are put into the Vector DB with auto-generated and provided metadata (a toy sketch of this flow follows below)
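
The following toy sketch illustrates the idea behind steps 3-6. It is not Cognita's indexer (see backend/indexer/ for that); content hashes stand in for the Vector DB state check, and the embedder is a stub.

import hashlib

def index_incrementally(files, indexed_hashes, embed, chunk_size=512):
    """Re-chunk and re-embed only the files whose content changed."""
    upserts = []
    for path, text in files.items():
        digest = hashlib.sha256(text.encode()).hexdigest()
        if indexed_hashes.get(path) == digest:  # unchanged -> skip re-indexing
            continue
        chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        upserts.extend((path, chunk, embed(chunk)) for chunk in chunks)
        indexed_hashes[path] = digest  # record the new state for the next run
    return upserts

# Dummy usage: one fake document and a stub embedder.
docs = {"creditcards/black.md": "Annual fee, lounge access, rewards... " * 100}
print(len(index_incrementally(docs, {}, embed=lambda text: [0.0, 0.0, 0.0])))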

❓ Question-Answering using API Server:

  1. The user sends a request with their query

  2. It is routed to one of the app's query controllers

  3. One or more retrievers are constructed on top of the Vector DB

  4. Then a Question Answering chain / agent is constructed. It embeds the user query and fetches similar chunks.

  5. A single-shot Question Answering chain just generates an answer given similar chunks. An agent can do multi-step reasoning and use many tools before arriving at an answer. In both cases, the API server uses LLM models (like GPT-3.5, GPT-4, etc.)

  6. Before returning the answer, the metadata for relevant chunks can be updated with things like presigned URLs, surrounding slides, and external data source links.

  7. The answer and relevant document chunks are returned in the response (see the example HTTP request below).

    Note: In case of agents the intermediate steps can also be streamed. It is up to the specific app to decide.
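
A hedged example of exercising this flow over HTTP with Python's requests. The route and payload below are assumptions for illustration; the actual path and request schema depend on the query controller you registered, so check the Swagger docs at http://localhost:8000 before relying on them.

import requests

# Hypothetical request -- the "/retrievers/answer" route and the payload fields are
# illustrative; consult the Swagger docs for the controller's real path and schema.
payload = {
    "collection_name": "creditcard",
    "query": "What is the annual fee of the Diners Club Black card?",
}

resp = requests.post("http://localhost:8000/retrievers/answer", json=payload, timeout=60)
resp.raise_for_status()
result = resp.json()
print(result.get("answer"))
print(result.get("docs"))  # relevant chunks, if the controller returns them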

💻 Code Structure:

The entire codebase lives in backend/

.
|-- Dockerfile
|-- README.md
|-- __init__.py
|-- backend/
|   |-- indexer/
|   |   |-- __init__.py
|   |   |-- indexer.py
|   |   |-- main.py
|   |   `-- types.py
|   |-- modules/
|   |   |-- __init__.py
|   |   |-- dataloaders/
|   |   |   |-- __init__.py
|   |   |   |-- loader.py
|   |   |   |-- localdirloader.py
|   |   |   `-- ...
|   |   |-- embedder/
|   |   |   |-- __init__.py
|   |   |   |-- embedder.py
|   |   |   |-- mixbread_embedder.py
|   |   |   `-- embedding.requirements.txt
|   |   |-- metadata_store/
|   |   |   |-- base.py
|   |   |   |-- client.py
|   |   |   `-- truefoundry.py
|   |   |-- parsers/
|   |   |   |-- __init__.py
|   |   |   |-- parser.py
|   |   |   |-- pdfparser_fast.py
|   |   |   `-- ...
|   |   |-- query_controllers/
|   |   |   |-- default/
|   |   |   |   |-- controller.py
|   |   |   |   `-- types.py
|   |   |   |-- query_controller.py
|   |   |-- reranker/
|   |   |   |-- mxbai_reranker.py
|   |   |   |-- reranker.requirements.txt
|   |   |   `-- ...
|   |   `-- vector_db/
|   |       |-- __init__.py
|   |       |-- base.py
|   |       |-- qdrant.py
|   |       `-- ...
|   |-- requirements.txt
|   |-- server/
|   |   |-- __init__.py
|   |   |-- app.py
|   |   |-- decorators.py
|   |   |-- routers/
|   |   `-- services/
|   |-- settings.py
|   |-- types.py
|   `-- utils.py

Customizing the Code for your use case

Cognita goes by the tagline -

Everything is available and Everything is customizable.

Cognita makes it really easy to switch between parsers, loaders, models and retrievers.

Customizing Dataloaders:

  • You can write your own data loader by inheriting the BaseDataLoader class from backend/modules/dataloaders/loader.py

  • Finally, register the loader in backend/modules/dataloaders/__init__.py

  • To test a data loader on a local directory, copy the following code into test.py in the project root and execute it. Here we show how to test the existing LocalDirLoader:

    from backend.modules.dataloaders import LocalDirLoader
    from backend.types import DataSource

    data_source = DataSource(
        type="local",
        uri="sample-data/creditcards",
    )

    loader = LocalDirLoader()

    loaded_data_pts = loader.load_full_data(
        data_source=data_source,
        dest_dir="test/creditcards",
    )

    for data_pt in loaded_data_pts:
        print(data_pt)
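
  • A skeleton for a custom loader might look like the following. This is only a sketch: the abstract methods you actually need to implement are defined in backend/modules/dataloaders/loader.py, so treat the method and signature shown here as illustrative.

    from backend.modules.dataloaders.loader import BaseDataLoader
    from backend.types import DataSource

    class MyAPILoader(BaseDataLoader):
        """Hypothetical loader that pulls documents from an internal API."""

        def load_full_data(self, data_source: DataSource, dest_dir: str):
            # Fetch documents from data_source.uri, write them under dest_dir,
            # and return data points in the format BaseDataLoader expects.
            raise NotImplementedError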

Customizing Embedder:

  • The codebase currently uses OpenAIEmbeddings, which is registered as the default.
  • You can register your custom embeddings in backend/modules/embedder/__init__.py
  • You can also add your own embedder; an example is given under backend/modules/embedder/mixbread_embedder.py. It inherits the LangChain embedding class.
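
  • A sketch of a custom embedder following the LangChain embedding interface (embed_documents / embed_query). Depending on your LangChain version the base class may be exposed as langchain.embeddings.base.Embeddings instead; registration then happens in backend/modules/embedder/__init__.py as above.

    from typing import List

    from langchain_core.embeddings import Embeddings

    class MyCustomEmbeddings(Embeddings):
        """Hypothetical embedder; replace the bodies with calls to your embedding service."""

        def embed_documents(self, texts: List[str]) -> List[List[float]]:
            # Call your hosted embedding service or local model here.
            raise NotImplementedError

        def embed_query(self, text: str) -> List[float]:
            return self.embed_documents([text])[0]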

Customizing Parsers:

  • You can write your own parser by inheriting the BaseParser class from backend/modules/parsers/parser.py

  • Finally, register the parser in backend/modules/parsers/__init__.py

  • To test a parser on a local file, copy the following code into test.py in the project root and execute it. Here we show how to test the existing MarkdownParser:

    import asyncio
    from backend.modules.parsers import MarkdownParser
    
    parser = MarkdownParser()
    chunks = asyncio.run(
        parser.get_chunks(
            filepath="sample-data/creditcards/diners-club-black.md",
        )
    )
    print(chunks)
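
  • A custom parser follows the same pattern; note that get_chunks is asynchronous, as in the MarkdownParser example above. The exact abstract interface (and any class attributes such as supported file extensions) is defined in backend/modules/parsers/parser.py, so the skeleton below is indicative only.

    from backend.modules.parsers.parser import BaseParser

    class MyCSVParser(BaseParser):
        """Hypothetical parser for .csv files."""

        async def get_chunks(self, filepath: str, *args, **kwargs):
            # Read the file, split it into chunks, and return them in the same
            # format the existing parsers produce.
            raise NotImplementedError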

Adding Custom VectorDB:

  • To add your own interface for a VectorDB, you can inherit BaseVectorDB from backend/modules/vector_db/base.py

  • Register the vectordb under backend/modules/vector_db/__init__.py
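
  • A very rough skeleton is shown below. The abstract methods BaseVectorDB actually requires are defined in backend/modules/vector_db/base.py; the method names and signatures here are placeholders (create_collection exists in the Qdrant implementation, while upsert_documents is an assumption).

    from backend.modules.vector_db.base import BaseVectorDB

    class MyVectorDB(BaseVectorDB):
        """Hypothetical vector DB integration."""

        def create_collection(self, collection_name: str, embeddings):
            # Create the collection/index in your database.
            raise NotImplementedError

        def upsert_documents(self, collection_name: str, documents, embeddings):
            # Write document chunks and their embeddings into the collection.
            raise NotImplementedError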

Rerankers:

  • Rerankers are used to sort relevant documents such that the top k docs can be used as context, effectively reducing the overall context and prompt size.
  • A sample reranker is written under backend/modules/reranker/mxbai_reranker.py
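
  • As a stand-alone illustration of the idea, the snippet below reranks a few made-up documents with sentence-transformers' CrossEncoder.rank, the same call the sample reranker wraps inside a LangChain document compressor. It assumes a recent sentence-transformers release (the rank helper is not available in very old versions).

    from sentence_transformers import CrossEncoder

    # Cross-encoder reranking: score (query, document) pairs and keep the top k.
    model = CrossEncoder("mixedbread-ai/mxbai-rerank-xsmall-v1")

    query = "What is the annual fee of this credit card?"
    docs = [
        "The card has an annual fee, which is waived above a yearly spend threshold.",
        "Complimentary lounge access is available at select airports.",
    ]

    ranked = model.rank(query, docs, return_documents=True, top_k=2)
    for hit in ranked:
        print(round(hit["score"], 3), hit["text"])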

💡 Writing your Query Controller (QnA):

This is the code responsible for implementing the query interface of the RAG application. The methods defined in these query controllers are added as routes to your FastAPI server.

Steps to add your custom Query Controller:

  • Add your Query controller class in backend/modules/query_controllers/

  • Add the query_controller decorator to your class and pass the name of your custom controller as an argument

from backend.server.decorator import query_controller

@query_controller("/my-controller")
class MyCustomController():
    ...
  • Add methods to this controller as per your needs and use our HTTP decorators like post, get, and delete to expose your methods as APIs
from backend.server.decorator import post

@query_controller("/my-controller")
class MyCustomController():
    ...

    @post("/answer")
    def answer(query: str):
        # Write code to express your logic for answer
        # This API will be exposed as POST /my-controller/answer
        ...
  • Import your custom controller class at backend/modules/query_controllers/__init__.py
...
from backend.modules.query_controllers.sample_controller.controller import MyCustomController

As an example, we have implemented a sample controller in backend/modules/query_controllers/example. Please refer to it for a better understanding.

๐Ÿณ Quickstart: Deployment with Truefoundry:

To be able to query your own documents, follow the steps below:

  1. Register at TrueFoundry, follow here

    • Fill up the form and register as an organization (let's say <org_name>)
    • On Submit, you will be redirected to your dashboard endpoint ie https://<org_name>.truefoundry.cloud
    • Complete your email verification
    • Login to the platform at your dashboard endpoint ie. https://<org_name>.truefoundry.cloud

    Note: Keep your dashboard endpoint handy; we will refer to it as "TFY_HOST" and it should have a structure like "https://<org_name>.truefoundry.cloud"

  2. Setup a cluster, use TrueFoundry managed for quick setup

    • Give a unique name to your Cluster and click on Launch Cluster
    • It will take a few minutes to provision a cluster for you
    • On Configure Host Domain section, click Register for the pre-filled IP
    • Next, Add a Docker Registry to push your docker images to.
    • Next, Deploy a Model, you can choose to Skip this step
  3. Add a Storage Integration

  4. Create a ML Repo

    • Navigate to ML Repo tab

    • Click on + New ML Repo button on top-right

    • Give a unique name to your ML Repo (say 'docs-qa-llm')

    • Select Storage Integration

    • On Submit, your ML Repo will be created

      For more details: link

  5. Create a Workspace

    • Navigate to Workspace tab
    • Click on + New Workspace button on top-right
    • Select your Cluster
    • Give a name to your Workspace (say 'docs-qa-llm')
    • Enable ML Repo Access and Add ML Repo Access
    • Select your ML Repo and role as Project Admin
    • On Submit, a new Workspace will be created. You can copy the Workspace FQN by clicking on FQN.

    For more details: link

  6. Deploy RAG Application

    • Navigate to Deployments tab
    • Click on the + New Deployment button on top-right
    • Select Application Catalogue
    • Select your workspace
    • Select RAG Application
    • Fill up the deployment template
      • Give your deployment a Name
      • Add ML Repo
      • You can either add an existing Qdrant DB or create a new one
      • By default, the main branch is used for deployment (you will find this option under Show Advanced fields). You can change the branch name and git repository if required.

        Make sure to re-select the main branch, as the SHA commit does not get updated automatically.

      • Click on Submit and your application will be deployed.

Using the RAG UI:

The following steps showcase how to use the Cognita UI to query documents:

  1. Create Data Source

    • Click on the Data Sources tab
    • Click + New Datasource
    • The data source type can be files from a local directory, a web URL, a GitHub URL, or a Truefoundry artifact FQN.
      • E.g. if Localdir is selected, upload files from your machine and click Submit.
    • The created data sources will be listed in the Data Sources tab.
  2. Create Collection

    • Click on Collections tab
    • Click + New Collection
    • Enter Collection Name
    • Select Embedding Model
    • Add earlier created data source and the necessary configuration
    • Click Process to create the collection and index the data.
  3. As soon as you create the collection, data ingestion begins; you can view its status by selecting your collection in the Collections tab. You can also add additional data sources later on and index them in the collection.

  4. Response generation

    • Select the collection
    • Select the LLM and its configuration
    • Select the document retriever
    • Write the prompt or use the default prompt
    • Ask the query

💖 Open Source Contribution

Your contributions are always welcome! Feel free to contribute ideas, feedback, or create issues and bug reports if you find any! Before contributing, please read the Contribution Guide.

🔮 Future developments

Contributions are welcomed for the following upcoming developments:

  • Support for other vector databases like Chroma, Weaviate, etc
  • Support for Scalar + Binary Quantization embeddings.
  • Support for RAG Evaluation of different retrievers.
  • Support for RAG Visualization.
  • Support for conversational chatbot with context
  • Support for RAG optimized LLMs like stable-lm-3b, dragon-yi-6b, etc
  • Support for GraphDB

cognita's People

Contributors

chiragjn, dependabot[bot], eltociear, innoavator, michaelfeil, nikp1172, s1lv3rj1nx, sayan-truefoundry, seuleeee, supreet02


cognita's Issues

Data Sources are Empty

Hello,
I have seen this issue being raised in other threads without any resolution, so please don't close this until resolved. I have been trying to get this working for hours, and despite setting VITE_QA_FOUNDRY_URL=http://localhost:8000 and local.metadata.yaml, the data sources are still empty. I was trying to review this tool for my channel and it's quite off-putting to waste hours on this. Could I request you to fix this basic functionality of RAG before even making this project public? Thanks

DataSource type not show

After installing using docker compose with the default env, the data source type doesn't show, so I can't add any data. (See screenshot.)

Frontend does not show data source types

Hi, I am attempting to configure Cognita. Followed all steps listed in the README; unfortunately, after running the front-end with 'yarn dev' no data source types are listed. I am unsure what is causing this issue and documentation seems to be missing.

Data source types (localdir, web, etc.) seem to be registered in backend/modules/dataloaders, but this apparently does not come through in the front end.

I added METADATA_STORE_CONFIG='{"provider":"local","config":{"path":"local.metadata.yaml"}}' to the frontend .env (in the .yaml, a data source is listed), but this had no effect either.

Any ideas on how to fix this?

Metadata Aware retrieval

Does Cognita support advanced RAG techniques like vector index auto-retrievers, where the LLM extracts metadata features from the query to enrich vector search with query filters?

DNS resolution failed for qdrant-server:6334

The requirements.txt installs pydantic 2, but there are a fair number of deprecated calls, like parse_obj needing to be replaced with model_validate, and so on.

After attempting to make the changes, I am still having an error I can't resolve.

python -m local.ingest
VECTOR_DB_CONFIG: provider='qdrant' local=None url='http://qdrant-server:6333' api_key=None config=None
METADATA_STORE_CONFIG: provider='local' config={'path': 'local.metadata.yaml'}
Settings: LOG_LEVEL='info' METADATA_STORE_CONFIG=MetadataStoreConfig(provider='local', config={'path': 'local.metadata.yaml'}) VECTOR_DB_CONFIG=VectorDBConfig(provider='qdrant', local=None, url='http://qdrant-server:6333', api_key=None, config=None) TFY_SERVICE_ROOT_PATH='/' TFY_API_KEY='' OPENAI_API_KEY=None TFY_HOST='' TFY_LLM_GATEWAY_URL='/api/llm' EMBEDDING_CACHE_CONFIG=None LOCAL=False OLLAMA_URL='http://ollama-server:11434' EMBEDDING_SVC_URL='http://infinity-server:7997' RERANKER_SVC_URL='http://infinity-server:7997' ollama_model='qwen2:1.5b' ollama_port=11434 infinity_embedding_model='mixedbread-ai/mxbai-embed-large-v1' infinity_reranking_model='mixedbread-ai/mxbai-rerank-xsmall-v1' infinity_batch_size=8 infinity_port=7997 cognita_backend_port=8000 cognita_backend_host='0.0.0.0' cognita_frontend_port=5001 vite_qa_foundry_url='http://cognita-backend:8000' vite_docs_qa_delete_collections='true' vite_docs_qa_standalone_path='/' vite_docs_qa_enable_redirect='false' vite_docs_qa_max_upload_size_mb=200
Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "/home/maxtensor/Documents/cognita/local/ingest.py", line 6, in
from backend.indexer.indexer import ingest_data as ingest_data_to_collection
File "/home/maxtensor/Documents/cognita/backend/indexer/indexer.py", line 14, in
from backend.modules.metadata_store.client import METADATA_STORE_CLIENT
File "/home/maxtensor/Documents/cognita/backend/modules/metadata_store/client.py", line 4, in
METADATA_STORE_CLIENT = get_metadata_store_client(config=settings.METADATA_STORE_CONFIG)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maxtensor/Documents/cognita/backend/modules/metadata_store/base.py", line 212, in get_metadata_store_client
return METADATA_STORE_REGISTRY[config.provider](config=config.config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maxtensor/Documents/cognita/backend/modules/metadata_store/local.py", line 56, in init
associated_data_sources[self.data_source.fqn] = AssociatedDataSources(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/maxtensor/Documents/cognita/venv/lib/python3.11/site-packages/pydantic/main.py", line 176, in init
self.pydantic_validator.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for AssociatedDataSources
data_source_fqn
Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
For further information visit https://errors.pydantic.dev/2.7/v/string_type

Is somebody working on updating to pydantic 2 who knows what is going on?
Or have I just gone down a rabbit hole too deep to get out of?

Outdated package errors with python 3.10 (macOS)

I am struggling with the installation process and I receive the following error:

Command:
cognita % python -m local.ingest

Error:

Traceback (most recent call last):
  File "/Users/directory/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/directory/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/directory/Digital skill academy/cognita/local/ingest.py", line 3, in <module>
    from backend.settings import Settings
  File "/Users/directory/Digital skill academy/cognita/backend/settings.py", line 5, in <module>
    from pydantic import BaseSettings
  File "/Users/directory/miniconda3/lib/python3.10/site-packages/pydantic/__init__.py", line 380, in __getattr__
    return _getattr_migration(attr_name)
  File "/Users/directory/miniconda3/lib/python3.10/site-packages/pydantic/_migration.py", line 296, in wrapper
    raise PydanticImportError(
pydantic.errors.PydanticImportError: `BaseSettings` has been moved to the `pydantic-settings` package. See https://docs.pydantic.dev/2.7/migration/#basesettings-has-moved-to-pydantic-settings for more details.

For further information visit https://errors.pydantic.dev/2.7/u/import-error

System:
MacOS (M3)

Frontend Dockerfile ARG Error

Environment Variables Not Set Correctly in Dockerfile

Description

The current Frontend Dockerfile does not set the environment variables correctly using ARG and ENV instructions. This results in the build failing when these variables are needed during the build process.

  • OS : MacOS (M1)


I'll push it!

Provide a docker compose file

In order to make it easy to try Cognita, it would be awesome to offer & document a quick setup with docker compose and prebuilt docker containers.

cannot add data source

Hi, I have installed Cognita successfully. When I access ip:5001 and add a new data source, it shows me the following (see screenshot).
How can I resolve it?
Many thanks.

Embedding and Data sources empty

Hello Cognita team!
I bumped into the project and found it very interesting. I'm currently running the code on my local machine, but I've noticed that I can't do anything because I can't create new collections.
Both the Embedding and Data sources are empty.
I'm not seeing anything relevant in the console, but I can see a 400 in the browser console.
Any advice on what I could be missing? Thanks for the help!

Switch from `pip` based dependency management to `poetry`

For small projects pip is great, but for larger projects like this whose (flattened) dependency list is long and will surely keep growing more, it is better to move to poetry.

Conda can be a contender but is a very heavy option.

As the main target audience for this project is AI engineers, poetry is more suitable; it is perfect for robust dependency management as well as environment management.

And poetry also helps with publishing your package - I really hope you do that soon.

  • I added a request along that line: #191

References of why Poetry over Pip

CrossEncoder of Sentence Transformers uses best device type available on machine leading to error

Hi Team!!

While I was trying to explore Cognita, I found an error when trying to run the retrieval augmented generation locally using ./local/run.py.
The machine I am using is a MacBook Air, Apple M1 chip, macOS 13.1 (22C65), which has mps.

The CrossEncoder from the sentence_transformers library uses the below function, which returns 'mps' on my machine:

def get_device_name() -> Literal["mps", "cuda", "npu", "hpu", "cpu"]:
    if torch.cuda.is_available():
        return "cuda"
    elif torch.backends.mps.is_available():
        return "mps"
    elif is_torch_npu_available():
        return "npu"
    elif importlib.util.find_spec("habana_frameworks") is not None:
        import habana_frameworks.torch.hpu as hthpu

        if hthpu.is_available():
            return "hpu"
    return "cpu"

I am encountering this error

ERROR:    2024-05-28 01:23:19,041 - controller:answer:336 - Operation 'sign_out_mps()' does not support input type 'int64' in MPS backend.
Traceback (most recent call last):
  File "/Users/malyala/Desktop/cognita/backend/modules/query_controllers/example/controller.py", line 312, in answer
    outputs = await rag_chain_with_source.ainvoke(request.query)
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 2109, in ainvoke
    input = await step.ainvoke(
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 2739, in ainvoke
    results = await asyncio.gather(
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/langchain_core/retrievers.py", line 157, in ainvoke
    return await self.aget_relevant_documents(
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/langchain_core/retrievers.py", line 300, in aget_relevant_documents
    raise e
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/langchain_core/retrievers.py", line 293, in aget_relevant_documents
    result = await self._aget_relevant_documents(
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/langchain/retrievers/multi_query.py", line 104, in _aget_relevant_documents
    documents = await self.aretrieve_documents(queries, run_manager)
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/langchain/retrievers/multi_query.py", line 137, in aretrieve_documents
    document_lists = await asyncio.gather(
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/langchain_core/retrievers.py", line 300, in aget_relevant_documents
    raise e
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/langchain_core/retrievers.py", line 293, in aget_relevant_documents
    result = await self._aget_relevant_documents(
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/langchain/retrievers/contextual_compression.py", line 74, in _aget_relevant_documents
    compressed_docs = await self.base_compressor.acompress_documents(
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/langchain/retrievers/document_compressors/base.py", line 31, in acompress_documents
    return await run_in_executor(
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/langchain_core/runnables/config.py", line 493, in run_in_executor
    return await asyncio.get_running_loop().run_in_executor(
  File "/opt/homebrew/Cellar/[email protected]/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/malyala/Desktop/cognita/backend/modules/reranker/mxbai_reranker.py", line 27, in compress_documents
    reranked_docs = model.rank(query, docs, return_documents=True, top_k=self.top_k)
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py", line 419, in rank
    scores = self.predict(
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py", line 332, in predict
    model_predictions = self.model(**features, return_dict=True)
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py", line 1296, in forward
    outputs = self.deberta(
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py", line 1066, in forward
    encoder_outputs = self.encoder(
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py", line 484, in forward
    relative_pos = self.get_rel_pos(hidden_states, query_states, relative_pos)
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py", line 460, in get_rel_pos
    relative_pos = build_relative_position(
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py", line 583, in build_relative_position
    rel_pos_ids = make_log_bucket_position(rel_pos_ids, bucket_size, max_position)
  File "/Users/malyala/Desktop/env/lib/python3.10/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py", line 546, in make_log_bucket_position
    sign = torch.sign(relative_pos)
TypeError: Operation 'sign_out_mps()' does not support input type 'int64' in MPS backend.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/malyala/Desktop/cognita/./local/run.py", line 41, in <module>
    answer = asyncio.run(controller.answer(ExampleQueryInput(**request)))
  File "/opt/homebrew/Cellar/[email protected]/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/homebrew/Cellar/[email protected]/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/Users/malyala/Desktop/cognita/backend/modules/query_controllers/example/controller.py", line 337, in answer
    raise HTTPException(status_code=500, detail=str(exp))
fastapi.exceptions.HTTPException

Looks like there are some operations (torch.sign) in the deberta_v2 model which do not support some data types (int64) on the MPS backend. Shouldn't there be an env variable to control the device type used while running locally or deploying RAG on any system?

infinity suggestions

Cool work! Might feature it on Social Media / Twitter.

I recently added multi-model deployments that default to the ENV variables. Aka you can set the default env variable
-> --port -> INFINITY_PORT or --batch-size -> INFINITY_BATCH_SIZE. Multiple args can be separated with ;, which works if the cli arg can be overloaded.

--model-id ${INFINITY_EMBEDDING_MODEL} --model-id ${INFINITY_RERANKING_MODEL}
INFINITY_MODEL_ID=mixedbread-ai/mxbai-embed-large-v1;mixedbread-ai/mxbai-rerank-xsmall-v1;

https://huggingface.co/spaces/michaelfeil/infinity/tree/305b1c2b583e9968aa153c45e8e50555af1d9575

Invalid URL '/embeddings': No scheme supplied. Perhaps you meant https:///embeddings?

Don't know what's going on; I have configured everything so it should work locally with Ollama, but this comes up while running
python3 -m local.ingest

return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
  exec(code, run_globals)
File "/home/hans/cognita/local/ingest.py", line 6, in <module>
  from backend.indexer.indexer import ingest_data as ingest_data_to_collection
File "/home/hans/cognita/backend/indexer/indexer.py", line 14, in <module>
  from backend.modules.metadata_store.client import METADATA_STORE_CLIENT
File "/home/hans/cognita/backend/modules/metadata_store/client.py", line 4, in <module>
  METADATA_STORE_CLIENT = get_metadata_store_client(config=settings.METADATA_STORE_CONFIG)
File "/home/hans/cognita/backend/modules/metadata_store/base.py", line 212, in get_metadata_store_client
  return METADATA_STORE_REGISTRY[config.provider](config=config.config)
File "/home/hans/cognita/backend/modules/metadata_store/local.py", line 69, in __init__
  VECTOR_STORE_CLIENT.create_collection(
File "/home/hans/cognita/backend/modules/vector_db/qdrant.py", line 46, in create_collection
  partial_embeddings = embeddings.embed_documents(["Initial document"])
File "/home/hans/cognita/backend/modules/embedder/embedding_svc.py", line 44, in embed_documents
  return self.call_embedding_service(texts, "documents")
File "/home/hans/cognita/backend/modules/embedder/embedding_svc.py", line 38, in call_embedding_service
  response = requests.post(self.url.rstrip("/") + "/embeddings", json=payload)
File "/home/hans/.local/lib/python3.10/site-packages/requests/api.py", line 115, in post
  return request("post", url, data=data, json=json, **kwargs)
File "/home/hans/.local/lib/python3.10/site-packages/requests/api.py", line 59, in request
  return session.request(method=method, url=url, **kwargs)
File "/home/hans/.local/lib/python3.10/site-packages/requests/sessions.py", line 575, in request
  prep = self.prepare_request(req)
File "/home/hans/.local/lib/python3.10/site-packages/requests/sessions.py", line 486, in prepare_request
  p.prepare(
File "/home/hans/.local/lib/python3.10/site-packages/requests/models.py", line 368, in prepare
  self.prepare_url(url, params)
File "/home/hans/.local/lib/python3.10/site-packages/requests/models.py", line 439, in prepare_url
  raise MissingSchema(
requests.exceptions.MissingSchema: Invalid URL '/embeddings': No scheme supplied. Perhaps you meant https:///embeddings?

My .env

METADATA_STORE_CONFIG='{"provider":"local","config":{"path":"local.metadata.yaml"}}'
VECTOR_DB_CONFIG='{"provider":"qdrant","local":"true"}'

DEBUG_MODE=true
LOG_LEVEL="DEBUG"
LOCAL=true

# If Ollama is installed in the system
OLLAMA_URL="http://localhost:11434"

local.metadata.yaml

collection_name: creditcard
data_source:
    type: localdir
    uri: sample-data/creditcards
parser_config:
    chunk_size: 512
    chunk_overlap: 40
    parser_map:
        ".pdf": PdfTableParser
embedder_config:
    provider: embedding-svc
    config:
        model: "mixedbread-ai/mxbai-embed-large-v1"
