chat-with-your-doc

chat-with-your-doc is a demonstration application that leverages the capabilities of Azure OpenAI GPT-4 and LangChain to enable users to chat with their documents. This repository hosts the codebase, instructions, and resources needed to set up and run the application.

Introduction

The primary goal of this project is to simplify interaction with documents and to extract valuable information from them using natural language. The project is built with LangChain and Azure OpenAI GPT-4/ChatGPT to deliver a smooth and natural conversational experience.

Features

  • Upload documents as an external knowledge base for Azure OpenAI GPT-4/ChatGPT.
  • Support various formats, including PDF, DOCX, PPTX, and TXT, among others.
  • Chat with the document content, ask questions, and get relevant answers based on the context.
  • User-friendly interface to ensure seamless interaction.

Todo

  • Show source documents for answers in the web GUI
  • Support streaming of answers
  • Support switching the chain type and streaming LangChain output in the web GUI

Architecture

Installation

To get started with Chat-with-your-doc, follow these steps:

  1. Clone the repository:
git clone https://github.com/linjungz/chat-with-your-doc.git
  2. Change into the chat-with-your-doc directory:
cd chat-with-your-doc
  3. Install the required Python packages:
pip install -r requirements.txt

Configuration

  1. Obtain your Azure OpenAI API key, Endpoint and Deployment Name from the Azure Portal.

  2. Set the environment variables in the .env file:

OPENAI_API_BASE=https://your-endpoint.openai.azure.com
OPENAI_API_KEY=your-key-here
OPENAI_DEPLOYMENT_NAME=your-deployment-name-here
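
As a minimal sketch of how these settings could be checked at startup, the snippet below reads the three variables from the process environment and fails early if any is missing. The helper name load_azure_settings is hypothetical and not part of this repository, and the sketch assumes the values from .env have already been loaded into the environment (for example by a loader such as python-dotenv).

```python
import os

# The three variable names come from the .env example above.
REQUIRED_VARS = ("OPENAI_API_BASE", "OPENAI_API_KEY", "OPENAI_DEPLOYMENT_NAME")


def load_azure_settings(environ=None):
    """Return the Azure OpenAI settings as a dict.

    Hypothetical helper for illustration only. Raises RuntimeError
    if any required variable is unset or empty.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing a plain dict instead of os.environ makes the check easy to exercise without touching the real environment.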

Usage: CLI

The CLI application supports two commands, ingest and chat. The command line interface is built with the Python library typer.

Ingest

This command takes documents as input, splits the text, generates embeddings, and stores them in a FAISS vector store. The vector store is saved locally for later use in chat.

$ python chat_cli.py ingest --help

 Usage: chat_cli.py ingest [OPTIONS] DOC_PATH INDEX_NAME

Arguments:
doc_path        TEXT  Path to the documents to be ingested, support glob pattern [required]
index_name      TEXT  Name of the index to be created [default: None] [required]

Options:
--help          Show this message and exit. 
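
To illustrate the first two ingest steps (resolving DOC_PATH and splitting the text), here is a standard-library sketch. It is not the project's code, which relies on LangChain for loading and splitting. The function names and the chunk_size and overlap defaults are assumptions chosen for the example.

```python
import glob


def expand_doc_path(doc_path):
    """Expand a glob pattern such as 'docs/*.pdf' into a sorted list of paths."""
    return sorted(glob.glob(doc_path))


def split_text(text, chunk_size=1000, overlap=100):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so content near a
    chunk boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

In the real pipeline, each chunk would then be embedded and added to the vector store.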

Chat

This command starts an interactive chat that uses the documents in a vector store as an external knowledge base. You can choose which knowledge base to load for the chat.

$ python chat_cli.py chat --help 

Usage: chat_cli.py chat [OPTIONS]

Options:
--index-name        TEXT  [default: index]
--help                    Show this message and exit.
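
The retrieval step behind the chat can be pictured as a nearest-neighbour search over the stored embeddings. The sketch below is a toy, pure-Python illustration of that idea, not the FAISS implementation. It uses cosine similarity purely as an example metric, and the names cosine_similarity and top_k are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=2):
    """Return the texts of the k chunks most similar to the query.

    `index` is a list of (chunk_text, embedding_vector) pairs, standing in
    for the vector store created by the ingest command.
    """
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

A real vector store performs the same kind of lookup with an optimized index rather than a full sort.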

Usage: Web

Starting the web application opens the user interface in your default web browser. You can then upload a document to create a knowledge base and start a conversation with it.

Gradio is used to build the web GUI quickly, and Hupper is used to ease development.

For development purposes, you can run python watcher.py to start the web GUI; it monitors the source files for changes. Alternatively, run python chat_web.py directly, without file monitoring.

Reference

LangChain is used to build a workflow that interacts with Azure OpenAI GPT-4. ConversationalRetrievalChain is used in this use case to support chat history. You may refer to this link for more detail.

The default chain type is stuff. For more detail, please refer to this link.
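
As a rough illustration of what the stuff chain type means, the sketch below places every retrieved chunk into a single prompt ahead of the question. The function name build_stuff_prompt and the prompt wording are assumptions made for this example; they are not LangChain's actual template.

```python
def build_stuff_prompt(question, chunks):
    """Build one prompt that contains all retrieved chunks plus the question.

    This mirrors the idea of the "stuff" approach: no per-chunk calls and no
    summarization, just a single request with all context included.
    """
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Because everything goes into one request, this approach is simple but limited by the model's context window when many or large chunks are retrieved.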

Credits

License

chat-with-your-doc is released under the MIT License. See the LICENSE file for more details.
