
pebblo's Introduction


Pebblo enables developers to safely load data and promote their Gen AI app to deployment without worrying about the organization’s compliance and security requirements. The project identifies semantic topics and entities found in the loaded data and summarizes them in the local UI or in a PDF report.

Pebblo has three components:

  1. Pebblo Server - a REST API application with topic classification, entity classification and reporting features
  2. Pebblo SafeLoader - a thin wrapper around Gen AI frameworks' data loaders
  3. Pebblo SafeRetriever - a retrieval QA chain that enforces identity and semantic rules on vector database retrieval before LLM inference

Pebblo Server

Installation

Using pip

pip install pebblo --extra-index-url https://packages.daxa.ai/simple/

Download Python package

Alternatively, download and install the latest Pebblo Python .whl package from https://packages.daxa.ai/pebblo/0.1.13/pebblo-0.1.13-py3-none-any.whl

Example:

curl -LO "https://packages.daxa.ai/pebblo/0.1.13/pebblo-0.1.13-py3-none-any.whl" 
pip install pebblo-0.1.13-py3-none-any.whl

Run Pebblo Server

pebblo

Pebblo Server now listens on localhost:8000 and accepts data snippets from Gen AI applications for inspection and reporting.
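To confirm that something is accepting connections on localhost:8000 before starting a Gen AI application, a plain TCP connection attempt is enough. The sketch below uses only the Python standard library; `is_server_listening` is a hypothetical helper that checks the port only, not whether the Pebblo API itself responds correctly.

```python
import socket


def is_server_listening(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout or name-resolution failure.
        return False


if __name__ == "__main__":
    status = "reachable" if is_server_listening() else "not reachable"
    print(f"Pebblo Server on localhost:8000 is {status}")
```

A successful check only means a process is bound to the port; an HTTP-level check would be needed to confirm it is Pebblo Server.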

Pebblo Optional Flags
  • --config <file>: specify a configuration file in YAML format.

See the configuration guide for knobs that control Pebblo Server behavior, such as enabling snippet anonymization or selecting a specific report renderer.

Using Docker

docker run -p 8000:8000 docker.daxa.ai/daxaai/pebblo

The local UI can be accessed by pointing the browser to https://localhost:8000.

See the installation guide for details on how to pass a custom config.yaml and how to access PDF reports on the host machine.

Troubleshooting

Refer to the troubleshooting guide.

Pebblo SafeLoader

Langchain

Pebblo SafeLoader is natively supported in the Langchain framework, starting with Langchain version 0.1.7.

Enable Pebblo in Langchain Application

Add the PebbloSafeLoader wrapper to the existing Langchain document loader(s) used in the RAG application. PebbloSafeLoader is interface-compatible with the Langchain BaseLoader, so the application can continue to use the load() and lazy_load() methods as it would on any Langchain document loader.
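The idea of a thin, interface-compatible wrapper can be illustrated without Langchain. In the sketch below, `Document`, `ListLoader` and `InspectingLoader` are hypothetical stand-ins, not PebbloSafeLoader's actual code: the wrapper forwards `load()` and `lazy_load()` to the wrapped loader and observes each document on the way through, which is presumably where a real wrapper would hand snippets to Pebblo Server.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)


class ListLoader:
    """Hypothetical loader serving documents from an in-memory list."""

    def __init__(self, docs: List[Document]):
        self._docs = docs

    def lazy_load(self) -> Iterator[Document]:
        yield from self._docs

    def load(self) -> List[Document]:
        return list(self.lazy_load())


class InspectingLoader:
    """Hypothetical wrapper exposing the same load()/lazy_load() interface."""

    def __init__(self, loader, name: str):
        self.loader = loader
        self.name = name  # app name, analogous to PebbloSafeLoader's `name`
        self.seen = 0  # number of documents observed so far

    def lazy_load(self) -> Iterator[Document]:
        for doc in self.loader.lazy_load():
            self.seen += 1  # a real wrapper would inspect/report the snippet here
            yield doc

    def load(self) -> List[Document]:
        return list(self.lazy_load())


loader = InspectingLoader(ListLoader([Document("a"), Document("b")]), name="acme-corp-rag-1")
docs = loader.load()
print(len(docs), loader.seen)  # 2 2
```

Because the wrapper only delegates, the calling code does not change apart from the constructor call.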

Here is a snippet of a Langchain RAG application that uses CSVLoader, before PebbloSafeLoader is enabled.

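    from langchain_community.document_loaders import CSVLoader
    from langchain_community.vectorstores import Chroma
    from langchain_openai import OpenAIEmbeddings  # requires OPENAI_API_KEY

    file_path = "data.csv"  # placeholder: path to your CSV file

    loader = CSVLoader(file_path)
    documents = loader.load()
    vectordb = Chroma.from_documents(documents, OpenAIEmbeddings())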

Pebblo SafeLoader can be enabled with a few lines of code changes to the snippet above.

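    from langchain_community.document_loaders import CSVLoader
    from langchain_community.document_loaders.pebblo import PebbloSafeLoader
    from langchain_community.vectorstores import Chroma
    from langchain_openai import OpenAIEmbeddings  # requires OPENAI_API_KEY

    file_path = "data.csv"  # placeholder: path to your CSV file

    # Wrap the existing loader; load()/lazy_load() are used exactly as before.
    loader = PebbloSafeLoader(
        CSVLoader(file_path),
        name="acme-corp-rag-1",  # App name (Mandatory)
        owner="Joe Smith",  # Owner (Optional)
        description="Support productivity RAG application",  # Description (Optional)
    )
    documents = loader.load()
    vectordb = Chroma.from_documents(documents, OpenAIEmbeddings())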

See here for samples of RAG applications with Pebblo SafeLoader enabled, and this document for more details.

Pebblo SafeRetriever

Langchain

The PebbloRetrievalQA chain uses a SafeRetriever to ensure that the snippets passed to the LLM as context are retrieved only from documents the user is authorized to access and that are semantically allowed for the Gen AI application.
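Conceptually, this enforcement is a filter applied to retrieved snippets before they reach the LLM. The standalone sketch below assumes each snippet carries the identities authorized to read its source document and the semantic topics found in it; `Snippet`, `filter_snippets`, the identities and the topic names are all hypothetical, and PebbloRetrievalQA's real implementation may apply these rules differently (for example, at query time in the vector database).

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Snippet:
    text: str
    authorized_identities: Set[str] = field(default_factory=set)
    topics: Set[str] = field(default_factory=set)


def filter_snippets(
    snippets: Iterable[Snippet],
    user_identities: Set[str],
    denied_topics: Set[str],
) -> List[Snippet]:
    """Hypothetical filter: keep snippets the user may read and whose topics are allowed."""
    allowed = []
    for s in snippets:
        # Identity rule: the user must share at least one identity (user or group)
        # with the snippet's authorized identities.
        if not (s.authorized_identities & user_identities):
            continue
        # Semantic rule: drop snippets that contain any denied topic.
        if s.topics & denied_topics:
            continue
        allowed.append(s)
    return allowed


retrieved = [
    Snippet("Q3 revenue summary", {"finance@acme.com"}, {"financial-report"}),
    Snippet("Holiday calendar", {"all@acme.com"}, set()),
    Snippet("Merger draft", {"all@acme.com"}, {"mergers-acquisitions"}),
]
safe = filter_snippets(retrieved, {"joe@acme.com", "all@acme.com"}, {"mergers-acquisitions"})
print([s.text for s in safe])  # ['Holiday calendar']
```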

Here is sample code for PebbloRetrievalQA, where the authorized_identities of the user accessing the RAG application are passed in auth_context.

    from langchain_community.chains import PebbloRetrievalQA
    from langchain_community.chains.pebblo_retrieval.models import AuthContext, ChainInput

    # `llm` and `vectordb` are assumed to exist already, e.g. the
    # Chroma vector store built in the SafeLoader example above.
    safe_rag_chain = PebbloRetrievalQA.from_chain_type(
        llm=llm,
        app_name="pebblo-safe-retriever-demo",
        owner="Joe Smith",
        description="Safe RAG demo using Pebblo",
        chain_type="stuff",
        retriever=vectordb.as_retriever(),
        verbose=True,
    )


    def ask(question: str, auth_context: dict):
        # auth_context carries the authorized identities of the requesting user.
        auth_context_obj = AuthContext(**auth_context)
        chain_input_obj = ChainInput(query=question, auth_context=auth_context_obj)
        return safe_rag_chain.invoke(chain_input_obj.dict())

See here for samples of RAG applications with Pebblo SafeRetriever enabled, and this document for more details.

Contribution

Pebblo is an open-source community project. If you want to contribute, see the Contributor Guidelines for more details.

License

Pebblo is released under the MIT License

pebblo's People

Contributors

dineshkrr, dristysrivastava, eltociear, gr8nishan, kumarnitin19, kunaljadhav5, rahul-trip, raj725, rohininn, rutujaac, rutujacopods, shreyas-damle, sid-cd-daxa, siddheshwar-more, srics, sridhar-daxa, yograjopcito

pebblo's Issues

[Enhancement] LlamaIndex SafeRetriever support

Follow-on to #296, which introduced SafeLoader for LlamaIndex.

Add SafeRetriever support with:

  • identity enforcement / filtering on the doc snippets retrieved from the vector DB
  • semantic topic policy enforcement / filtering on the doc snippets retrieved from the vector DB
  • sending app-discover and per-prompt stats to Pebblo Server and Pebblo Cloud

Pebblo Safe Retriever details in Local UI

We would like to show Safe Retriever details in the local UI, with two separate tabs for loader-type and retrieval-type applications.
The Safe Retrieval app listing page would include the following details:

  • Active Users (on mouse-over, show the top users, cut off at N user IDs, N = 3)
  • Retrieved Documents (on mouse-over, show the top retrieved documents, cut off at N documents, N = 3)
  • Retrievals (i.e. cumulative prompt count)
  • VectorDB (on mouse-over, show the first N vector DB names, N = 3)
  • Owner (app owner)
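The "top N" hover lists amount to counting occurrences across retrievals and keeping the most frequent entries. A minimal sketch with `collections.Counter`, assuming each retrieval record is a hypothetical dict holding the user and the documents retrieved for that prompt, and assuming "Active Users" means the number of distinct users:

```python
from collections import Counter
from typing import Dict, List, Tuple

TOP_N = 3  # cut-off used for the hover lists


def top_n(values: List[str], n: int = TOP_N) -> List[Tuple[str, int]]:
    """Return up to n (value, count) pairs, most frequent first."""
    return Counter(values).most_common(n)


def summarize(retrievals: List[Dict]) -> Dict:
    """Hypothetical listing-page summary for one retrieval app."""
    users = [r["user"] for r in retrievals]
    docs = [d for r in retrievals for d in r["documents"]]
    return {
        "active_users": len(set(users)),  # assumed: distinct users
        "top_users": top_n(users),
        "top_documents": top_n(docs),
        "retrievals": len(retrievals),  # cumulative prompt count
    }


retrievals = [
    {"user": "alice", "documents": ["hr.pdf", "pay.csv"]},
    {"user": "bob", "documents": ["hr.pdf"]},
    {"user": "alice", "documents": ["hr.pdf"]},
    {"user": "carol", "documents": ["ops.md"]},
    {"user": "dave", "documents": ["pay.csv"]},
]
summary = summarize(retrievals)
print(summary["top_users"][0], summary["retrievals"])  # ('alice', 2) 5
```

`most_common` orders ties by first appearance, so users with equal counts keep the order in which they were first seen.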

OSError: cannot load library 'pango-1.0-0'

  • Pebblo Server fails in a conda environment due to a Pango (WeasyPrint) dependency issue.
  • After Pebblo installation completes and the user runs Pebblo, the following runtime error occurs:
    WeasyPrint could not import some external libraries. Please carefully follow the installation steps before reporting an issue: https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#installation https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#troubleshooting
  • Refer to the screenshot below for the complete error message.

OSError: [E050] can't find model

Description:

  • After Pebblo installation completes and the user runs Pebblo, the following runtime error occurs: OSError: [E050] can't find model en_core_web_lg
  • Environment: Ubuntu 22.04 VM
  • Refer to the screenshot below for the complete error message.

[Enhancement] Support for Multiple Data Sources

Pebblo (as of version 0.1.12) supports a single data source. Supporting multiple data sources within a single RAG application would be a useful feature.

Description:
When I use multiple data sources in my app, I should be able to see all of them and their details in the Pebblo report.

As part of this feature, the following changes would be needed in the report:

  1. Report Summary: aggregate details across all data sources.
  2. Top Files With Most Findings: add a new column showing which data source each file belongs to.
  3. Data Source: show snippets from all data sources.
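The report changes above mostly mean tagging each finding with its data source and aggregating per source. A sketch, assuming a hypothetical flat list of findings where each entry records the data source, the file and a finding count (the source names `csv` and `gdrive` are illustrative only):

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def summarize_sources(findings: List[dict]) -> Dict[str, int]:
    """Report Summary: total findings per data source."""
    totals: Dict[str, int] = defaultdict(int)
    for f in findings:
        totals[f["data_source"]] += f["count"]
    return dict(totals)


def top_files(findings: List[dict], n: int = 5) -> List[Tuple[str, str, int]]:
    """Top Files With Most Findings, with the extra data-source column."""
    per_file: Dict[Tuple[str, str], int] = defaultdict(int)
    for f in findings:
        per_file[(f["file"], f["data_source"])] += f["count"]
    ranked = sorted(per_file.items(), key=lambda item: item[1], reverse=True)
    return [(file, source, count) for (file, source), count in ranked[:n]]


findings = [
    {"data_source": "csv", "file": "customers.csv", "count": 7},
    {"data_source": "gdrive", "file": "offer-letter.docx", "count": 3},
    {"data_source": "csv", "file": "orders.csv", "count": 2},
    {"data_source": "gdrive", "file": "offer-letter.docx", "count": 1},
]
print(summarize_sources(findings))  # {'csv': 9, 'gdrive': 4}
print(top_files(findings, n=2))  # [('customers.csv', 'csv', 7), ('offer-letter.docx', 'gdrive', 4)]
```

Keying files by `(file, data_source)` keeps two same-named files from different sources apart.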

[Local UI] Add delete app

Add delete app support for SafeLoader and SafeRetriever apps

Tasks

  • Add a delete-app API to Pebblo Server
  • Add a Delete link in the local UI

Linter for UI code

Just as we use the ruff linter for Python code, we would like to have a linter for the UI code.

Dockerize Pebblo

Create a Dockerfile so that Pebblo can be run quickly in a containerized environment.

Tasks:

  • Add Pebblo Dockerfile
  • CI: Add docker image push on release/tag

[Bug] Unable to reach Pebblo Server

Description

When executing the RAG app, we get the error "Unable to reach pebblo server.", but the report is still generated as expected on Pebblo Server.

Error message
$ python3 fin_corp_rag_app.py
Loading RAG documents ...
Unable to reach pebblo server.
Loaded 93 documents ...

Hydrating Vector DB ...
Finished hydrating Vector DB ...

Expected behavior
The app should call the Pebblo APIs, and the PDF report should be generated without any error.

Additional context
Pebblo Server was healthy when this error occurred.

System:

  • OS: Mac
  • GPU/CPU:
  • Pebblo version (commit or version number): 0.1.11
  • Langchain version: 0.1.9
  • DocumentStore:
  • Reader:
  • Retriever:

Pebblo --help should not show an empty progress bar

$ pebblo --help
  0%|                                                                                                      | 0/10 [00:00<?, ?it/s]usage: pebblo [-h] [--config CONFIG]

Pebblo CLI

options:
  -h, --help       show this help message and exit
  --config CONFIG  Config file path
  0%|                                                                                                       | 0/10 [00:00<?, ?it/s]
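One way to avoid this, sketched below under the assumption that the progress bar tracks startup work, is to parse the CLI arguments before any startup begins: argparse handles `--help` by printing usage and exiting, so nothing else runs. The `--config` option and the "Pebblo CLI" description mirror the help output above; `startup` and its 10 steps are hypothetical placeholders for the real initialization.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pebblo", description="Pebblo CLI")
    parser.add_argument("--config", type=str, help="Config file path")
    return parser


def startup(config: Optional[str], steps: int = 10) -> int:
    """Hypothetical initialization; a progress bar would wrap this loop."""
    done = 0
    for _ in range(steps):
        done += 1  # e.g. load config, load models, start the server
    return done


def main(argv: Optional[List[str]] = None) -> int:
    # Parse first: `--help` prints usage and raises SystemExit here,
    # before any progress bar is created.
    args = build_parser().parse_args(argv)
    return startup(args.config)
```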

Add make command to format all/changed files

Description:
We need to streamline our code formatting process by implementing a make command that can format either all files in the repository or only the changed files. This will help maintain consistency in our codebase and make it easier for developers to adhere to our coding standards.


Tasks:

  • Configure the formatting commands (format and format-diff) to use the ruff formatting tool
  • Document the usage of these commands in the project's README or documentation.

Labels: enhancement, formatting
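The format-diff command boils down to listing the Python files changed relative to a base branch and passing only those to ruff. A Python sketch of that logic follows; the base ref `origin/main` and the function names are assumptions, not the project's actual Makefile.

```python
import subprocess
from typing import Iterable, List


def python_files(paths: Iterable[str]) -> List[str]:
    """Keep only Python source files from a list of changed paths."""
    return [p for p in paths if p.endswith((".py", ".pyi"))]


def changed_files(base: str = "origin/main") -> List[str]:
    # Added, copied, modified or renamed files since the merge base with `base`.
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", f"{base}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def format_diff(base: str = "origin/main") -> int:
    """Run `ruff format` on changed Python files; return ruff's exit code."""
    files = python_files(changed_files(base))
    if not files:
        print("No changed Python files to format.")
        return 0
    return subprocess.run(["ruff", "format", *files]).returncode
```

A `format-diff` make target could run a small script that calls `format_diff()`, while the `format` target would simply run `ruff format .` over the whole repository.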

Capture and display topic classification confidence score

Capture the classifier confidence score for each RAG document snippet and display it in the local UI and the PDF report.

Tasks

  • Capture confidence score in backend schema
  • Display confidence score in PDF report
  • Display confidence score in Local UI on SafeLoader and SafeRetriever pages

[Enhancement] Local UI

As of Pebblo 0.1.9, pebblo_report.pdf is the only output of the pebblo package.
It would be useful to have a simple UI running locally on Pebblo Server that shows all discovered apps and gives details about each app (equivalent to pebblo_report.pdf).

Anonymize document snippets in the report

Feature: Document snippet anonymizer

Anonymize document snippets in the Pebblo report. As Pebblo is considered for environments beyond development, anonymization will allow the report to be shared with more app stakeholders.
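As an illustration of the idea only (not Pebblo's anonymizer), the sketch below replaces two easily recognized entity types, email addresses and US Social Security numbers, with placeholder tags using regular expressions. The patterns and tag names are assumptions; a production anonymizer would more likely rely on the entity classifier's findings than on hand-written regexes.

```python
import re
from typing import Dict

# Hypothetical patterns for two entity types; real entity detection is broader.
PATTERNS: Dict[str, re.Pattern] = {
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def anonymize(snippet: str) -> str:
    """Replace each matched entity with an <ENTITY_TYPE> placeholder."""
    for entity, pattern in PATTERNS.items():
        snippet = pattern.sub(f"<{entity}>", snippet)
    return snippet


print(anonymize("Contact joe.smith@acme.com, SSN 123-45-6789."))
# Contact <EMAIL_ADDRESS>, SSN <US_SSN>.
```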

[Enhancement] App History

As of Pebblo 0.1.7, we have good information about the current state of app loading. It would be useful to capture the history of the last 5 loads of the app.
This would provide the following information about each load:

  • Location of the report and report file name
  • How many findings were there in that report
  • How many files were there with findings
  • When the report was generated.
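Keeping only the last five loads is a natural fit for a bounded queue. A minimal sketch, assuming a hypothetical `LoadRecord` that holds the four fields listed above; `collections.deque(maxlen=5)` discards the oldest record automatically when a sixth is appended. The report paths are illustrative.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime
from typing import Deque, List


@dataclass
class LoadRecord:
    report_path: str  # location and file name of the report
    findings: int  # number of findings in that report
    files_with_findings: int  # number of files with findings
    generated_at: datetime  # when the report was generated


class AppHistory:
    """Hypothetical per-app history keeping the most recent loads."""

    def __init__(self, max_loads: int = 5):
        self.loads: Deque[LoadRecord] = deque(maxlen=max_loads)

    def record(self, load: LoadRecord) -> None:
        self.loads.append(load)  # oldest entry is dropped once the deque is full

    def latest_first(self) -> List[LoadRecord]:
        return list(reversed(self.loads))


history = AppHistory()
for i in range(7):
    history.record(LoadRecord(f"reports/acme-corp-rag-1/report-{i}.pdf", i, i, datetime(2024, 1, i + 1)))
print([r.findings for r in history.latest_first()])  # [6, 5, 4, 3, 2]
```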
