
anil-matcha / chatpdf


Chat with any PDF. Easily upload the PDF documents you'd like to chat with. Instant answers. Ask questions, extract information, and summarize documents with AI. Sources included.

Home Page: https://www.thesamur.ai/?utm_source=github&utm_medium=link&utm_campaign=github_chatpdf

License: MIT License

Python 42.17% Jupyter Notebook 57.83%
chatgpt gpt gpt4 langchain openai pdf chatpdf chatbot chatwithpdf pdfgpt pdf-chat-bot pdftochatbot

chatpdf's Introduction

ChatPDF


Create an app like ChatPDF or PDF.ai in fewer than 10 lines of code

Demo video: PDF to Chatbot - An alternative to PDF AI chatbot (mp4)

Getting Started

The code is up now. ⭐ Star the repo to receive updates.

Replit and Streamlit versions coming soon.

Follow Anil Chandra Naidu Matcha on Twitter for updates.

Subscribe to https://www.youtube.com/@AnilChandraNaiduMatcha for more video tutorials like this.

How to run (steps may vary by OS)

  1. Create a virtual environment in Python: https://docs.python.org/3/library/venv.html

  2. Run "pip install -r requirements.txt"

  3. Set the OPENAI_API_KEY environment variable to your OpenAI key

  4. Run "python main.py"

  5. Change the PDF file and query in the code if you want to try other content
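
A minimal sketch of a check for step 3 that can be run before step 4. The helper name require_api_key is hypothetical and not part of the repo; it only reads the environment and makes no API call.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the key from the environment, or raise a readable error."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running main.py")
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("OPENAI_API_KEY is set")
    except RuntimeError as err:
        print(err)
```

Passing a plain dict as env makes the check easy to exercise without touching the real environment.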

To run the Streamlit app, complete the setup steps above, then run "streamlit run streamlitui.py"

Parts of the Streamlit code are inspired by the code here

Demo link

https://heybot.thesamur.ai/

Also check

Chat with Website code

Chat with CSV code

Chat with Youtube code

ChatGPT in Discord code

chatpdf's People

Contributors

aliceoh, anil-matcha, taherfattahi, vadootvpeer


chatpdf's Issues

Add ChatAudio in README.md

I recently built ChatAudio, which uses ChatPDF as a base but can handle audio.

Please consider adding the link to the README so more people can find it:
ChatAudio
It is hosted as a fork of ChatPDF.

use LLM from huggingface

Hi, I'm trying to use some LLM model from huggingface, for example "lmsys/vicuna-13b-v1.3". The model could be fetched through AutoModelForCausalLM.from_pretrained. However, what's the best way to wrap the model for integration with load_qa_chain?

How is the response from ChatGPT written to the UI?

I see you have an input field with a callback process_input:

st.text_input("Message", key="user_input", disabled=not is_openai_api_key_set(), on_change=process_input)

This callback process_input is:

def process_input():
    if st.session_state["user_input"] and len(st.session_state["user_input"].strip()) > 0:
        user_text = st.session_state["user_input"].strip()
        with st.session_state["thinking_spinner"], st.spinner(f"Thinking"):
            query_text = st.session_state["pdfquery"].ask(user_text)

        st.session_state["messages"].append((user_text, True))
        st.session_state["messages"].append((query_text, False))

Here you only write the response from the model into st.session_state["messages"]. How does the response get written to the UI?

I would appreciate it if you could add an explanation. Thanks.
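
One likely answer, stated as an assumption about streamlitui.py rather than a reading of it: Streamlit runs the on_change callback and then re-executes the whole script, so any loop in the script body that reads st.session_state["messages"] redraws the conversation, including the two tuples the callback just appended. The sketch below imitates that cycle in plain Python with no Streamlit dependency; session_state, render, and rerun_after_input are hypothetical stand-ins, not names from the repo.

```python
# Plain-Python imitation of Streamlit's "callback, then rerun the script" cycle.
session_state = {"messages": [], "user_input": ""}


def process_input(answer_fn):
    """Stand-in for the on_change callback: it only mutates session state."""
    text = session_state["user_input"].strip()
    if text:
        session_state["messages"].append((text, True))
        session_state["messages"].append((answer_fn(text), False))


def render():
    """Stand-in for the script body that is re-executed on every rerun."""
    lines = []
    for msg, is_user in session_state["messages"]:
        lines.append(("You: " if is_user else "Bot: ") + msg)
    return lines


def rerun_after_input(text, answer_fn):
    session_state["user_input"] = text
    process_input(answer_fn)  # 1. the callback runs first
    return render()           # 2. then the script body runs again and draws


print(rerun_after_input("hi", lambda q: "echo " + q))
```

The callback never draws anything itself; the drawing happens because the stored messages are read again on the rerun.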

streamlit_chat error

When I run 'streamlit run streamlitui.py',
this error message appears:

from streamlit_chat import message
ModuleNotFoundError: No module named 'streamlit_chat'
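
The import name streamlit_chat is provided by the PyPI package streamlit-chat, so installing it in the active virtual environment (for example with "pip install streamlit-chat") is the usual fix. A small sketch for checking which imports are missing before launching the app; the helper name missing_modules and the list of module names checked are assumptions, not taken from the repo's requirements.txt.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    for name in missing_modules(["streamlit", "streamlit_chat"]):
        print(f"missing: {name}")
```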

embed_with_retry in 4.0 seconds as it raised RateLimitError:

Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors..
Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors..

need gpt4?
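
The message text points at account quota ("You exceeded your current quota") rather than at the model in use, so switching models would not be expected to clear it; checking the plan and billing details, as the message says, is the first thing to try. A sketch that separates this case from a transient rate limit by inspecting the message; the function name and the matching rule are assumptions for illustration, not part of langchain or the openai package.

```python
def classify_rate_limit(message):
    """Label a RateLimitError message as 'quota' (billing) or 'transient'."""
    if "exceeded your current quota" in message.lower():
        return "quota"
    return "transient"


log_line = (
    "RateLimitError: You exceeded your current quota, "
    "please check your plan and billing details."
)
print(classify_rate_limit(log_line))
```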

Getting issue to install chromadb & hnswlib

-----------------------Error Log of hnswlib---------------------------
ERROR: Failed building wheel for hnswlib
Failed to build hnswlib
ERROR: Could not build wheels for hnswlib, which is required to install pyproject.toml-based projects
WARNING: You are using pip version 21.3.1; however, version 23.1.2 is available.
You should consider upgrading via the 'D:\AI_Projects\PDF_GPT\ChatPDF-main\env\Scripts\python.exe -m pip install --upgrade pip' command.

Unsupported OpenAI-Version header

openai.error.InvalidRequestError: Unsupported OpenAI-Version header provided: 2022-12-01. (HINT: you can provide any of the following supported versions: 2020-10-01, 2020-11-07. Alternatively, you can simply omit this header to use the default version associated with your account.)

_embed_with_retry in 4.0 seconds

D:\anaconda\envs\ChatPDF-main2\python.exe D:/python_project/ChatPDF-main/main.py
Using embedded DuckDB without persistence: data will be transient
Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors..
Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors..
Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors..
Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 8.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on
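
The delays in this log (4.0, 4.0, 4.0, then 8.0 seconds) are consistent with exponential backoff clamped to a minimum of 4 seconds. A sketch of such a schedule follows; the 10-second cap and the function name are assumptions chosen only to illustrate the pattern, and retrying is unlikely to help while the quota error persists.

```python
def backoff_delay(attempt, minimum=4.0, maximum=10.0):
    """Delay before retry number `attempt` (1-based): 2**(attempt - 1), clamped."""
    return max(minimum, min(float(2 ** (attempt - 1)), maximum))


print([backoff_delay(n) for n in range(1, 6)])  # [4.0, 4.0, 4.0, 8.0, 10.0]
```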

TypeError: get_model() got an unexpected keyword argument 'ocr_languages'

Traceback (most recent call last):
  File "D:\python_project\ChatPDF-main\main.py", line 16, in <module>
    pages = loader.load_and_split()
  File "D:\anaconda\envs\ChatPDF-main2\lib\site-packages\langchain\document_loaders\base.py", line 36, in load_and_split
    docs = self.load()
  File "D:\anaconda\envs\ChatPDF-main2\lib\site-packages\langchain\document_loaders\unstructured.py", line 61, in load
    elements = self._get_elements()
  File "D:\anaconda\envs\ChatPDF-main2\lib\site-packages\langchain\document_loaders\pdf.py", line 29, in _get_elements
    return partition_pdf(filename=self.file_path, **self.unstructured_kwargs)
  File "D:\anaconda\envs\ChatPDF-main2\lib\site-packages\unstructured\partition\pdf.py", line 68, in partition_pdf
    return partition_pdf_or_image(
  File "D:\anaconda\envs\ChatPDF-main2\lib\site-packages\unstructured\partition\pdf.py", line 140, in partition_pdf_or_image
    layout_elements = _partition_pdf_or_image_local(
  File "D:\anaconda\envs\ChatPDF-main2\lib\site-packages\unstructured\partition\pdf.py", line 227, in _partition_pdf_or_image_local
    layout = process_file_with_model(
  File "D:\anaconda\envs\ChatPDF-main2\lib\site-packages\unstructured_inference\inference\layout.py", line 377, in process_file_with_model
    model = get_model(model_name, **kwargs)
TypeError: get_model() got an unexpected keyword argument 'ocr_languages'
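
The last frame shows get_model in unstructured_inference rejecting an ocr_languages keyword that was passed down from the partition code in unstructured. That pattern usually means the two packages are at mismatched versions, although the traceback alone does not confirm which versions are involved. The sketch below reproduces the failure mode with stand-in functions and shows how to test whether a callable accepts a keyword; get_model_old and get_model_new are hypothetical, not the library's real signatures.

```python
import inspect


def accepts_keyword(func, name):
    """True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True
    param = params.get(name)
    return param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )


# Hypothetical stand-ins for an older and a newer callee.
def get_model_old(model_name=None):
    return model_name


def get_model_new(model_name=None, **kwargs):
    return model_name


print(accepts_keyword(get_model_old, "ocr_languages"))
print(accepts_keyword(get_model_new, "ocr_languages"))
```

Comparing the installed versions of unstructured and unstructured-inference (for example with "pip show") is a reasonable next step before changing any code.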
