anil-matcha / chatpdf

Chat with any PDF. Easily upload the PDF documents you'd like to chat with. Instant answers. Ask questions, extract information, and summarize documents with AI. Sources included.

Home Page: https://www.thesamur.ai/?utm_source=github&utm_medium=link&utm_campaign=github_chatpdf

License: MIT License

Python 42.17% Jupyter Notebook 57.83%

chatgpt gpt gpt4 langchain openai pdf chatpdf chatbot chatwithpdf pdfgpt pdf-chat-bot pdftochatbot

chatpdf's Introduction

ChatPDF

Create an app like ChatPDF or PDF.ai in fewer than 10 lines of code

Getting Started

Code is up now, ⭐ (Star) the repo to receive updates

Replit and streamlit version coming soon

Follow Anil Chandra Naidu Matcha on Twitter for updates

Subscribe to https://www.youtube.com/@AnilChandraNaiduMatcha for more such video tutorials

How to run (steps may vary by OS)

Create a virtual environment in Python: https://docs.python.org/3/library/venv.html
Run "pip install -r requirements.txt"
Set the OPENAI_API_KEY environment variable to your OpenAI key
Run "python main.py"
To try other content, change the PDF file and the query in the code
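
The OPENAI_API_KEY step is the easiest one to miss. A minimal pre-flight check, assuming only the variable name given in the steps above; the helper name require_openai_key is hypothetical and not part of the repo:

```python
import os


def require_openai_key(env=None):
    """Return the OpenAI key from the environment or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running main.py"
        )
    return key
```

Calling require_openai_key() at the top of a script fails early with a readable message instead of a later API error.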

To run the Streamlit app, run "streamlit run streamlitui.py"

Parts of the Streamlit code are inspired by the code here

Demo link

https://heybot.thesamur.ai/

Also check

Chat with Website code

Chat with CSV code

Chat with Youtube code

ChatGPT in Discord code

chatpdf's Issues

Add ChatAudio in README.md

I recently built ChatAudio, which uses ChatPDF as a base but can handle audio.

Please consider adding the link to the README so more people can find it:
ChatAudio
It is hosted as a fork of ChatPDF.

Hi, I'm trying to use an LLM from Hugging Face, for example "lmsys/vicuna-13b-v1.3". The model can be fetched through AutoModelForCausalLM.from_pretrained. However, what's the best way to wrap the model for integration with load_qa_chain?

How is the response from ChatGPT written to the UI?

I see you have an input field with a callback process_input:

st.text_input("Message", key="user_input", disabled=not is_openai_api_key_set(), on_change=process_input)

This callback process_input is:

def process_input():
    if st.session_state["user_input"] and len(st.session_state["user_input"].strip()) > 0:
        user_text = st.session_state["user_input"].strip()
        with st.session_state["thinking_spinner"], st.spinner(f"Thinking"):
            query_text = st.session_state["pdfquery"].ask(user_text)

        st.session_state["messages"].append((user_text, True))
        st.session_state["messages"].append((query_text, False))

Here you are only writing the response from the model into st.session_state["messages"]. How does the response get written to the UI?

I would appreciate it if you could add an explanation. Thanks.
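
A likely explanation, assuming the usual Streamlit pattern: Streamlit reruns the whole script after an on_change callback finishes, and code in the script body loops over st.session_state["messages"] and draws each entry (in this app presumably with message from streamlit_chat). The sketch below imitates that flow in plain Python with no Streamlit dependency; session_state, fake_ask, and render_messages are hypothetical stand-ins, not the repo's code.

```python
# Plain dict standing in for st.session_state (assumption for illustration).
session_state = {"user_input": "", "messages": []}


def fake_ask(question):
    """Hypothetical stand-in for st.session_state["pdfquery"].ask()."""
    return f"answer to: {question}"


def process_input():
    """Callback: only records the exchange, it draws nothing itself."""
    user_text = session_state["user_input"].strip()
    if user_text:
        session_state["messages"].append((user_text, True))
        session_state["messages"].append((fake_ask(user_text), False))


def render_messages():
    """Runs on every script rerun and draws whatever is stored so far."""
    lines = []
    for text, is_user in session_state["messages"]:
        speaker = "user" if is_user else "bot"
        lines.append(f"{speaker}: {text}")
    return lines


# One interaction: the widget value changes, the callback fires, the script reruns.
session_state["user_input"] = "What is this PDF about?"
process_input()
for line in render_messages():
    print(line)
```

The callback and the drawing code never call each other; the shared message list is the only link between them, which is why the callback alone appears to write nothing to the page.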

streamlit_chat error

When I run 'streamlit run streamlitui.py',
the following error message is shown:

from streamlit_chat import message
ModuleNotFoundError: No module named 'streamlit_chat'
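
This error usually means the streamlit-chat package, which provides the streamlit_chat module, is not installed in the active environment; running "pip install streamlit-chat" is the usual fix. A small sketch for spotting missing modules before launching the app; the helper name and the module list are assumptions, not taken from the repo's requirements.txt:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules the Streamlit UI appears to need (assumed list).
print(missing_modules(["streamlit", "streamlit_chat"]))
```

An empty list means both modules are importable from the interpreter that ran the check, so it is worth running it with the same interpreter that runs Streamlit.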

embed_with_retry in 4.0 seconds as it raised RateLimitError:

Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors..
Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors..

Do I need GPT-4?
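
The message itself points at billing rather than at the model: "You exceeded your current quota" indicates that the account has no remaining credit or has hit its spending limit, so retrying, or switching to GPT-4, is unlikely to help. A rough sketch that separates this case from an ordinary rate limit by inspecting the error text; the function name and the matching rule are assumptions for illustration:

```python
def is_quota_exhausted(error_message):
    """Heuristic: True when the error text reports an exhausted quota.

    Ordinary rate limits are worth retrying; an exhausted quota is not.
    """
    return "exceeded your current quota" in error_message.lower()


msg = (
    "You exceeded your current quota, please check your plan and "
    "billing details."
)
print(is_quota_exhausted(msg))
```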

Issue installing chromadb and hnswlib

-----------------------Error Log of hnswlib---------------------------
ERROR: Failed building wheel for hnswlib
Failed to build hnswlib
ERROR: Could not build wheels for hnswlib, which is required to install pyproject.toml-based projects
WARNING: You are using pip version 21.3.1; however, version 23.1.2 is available.
You should consider upgrading via the 'D:\AI_Projects\PDF_GPT\ChatPDF-main\env\Scripts\python.exe -m pip install --upgrade pip' command.

Spanish documents and Spanish answers to Spanish questions?

Credits

Hello,
The code here is very similar to the project by @viniciusarruda: https://github.com/viniciusarruda/chatpdf/tree/main
You might consider giving him credit for this :)

Unsupported OpenAI-Version header

openai.error.InvalidRequestError: Unsupported OpenAI-Version header provided: 2022-12-01. (HINT: you can provide any of the following supported versions: 2020-10-01, 2020-11-07. Alternatively, you can simply omit this header to use the default version associated with your account.)

error message

Tokens exceed the 4000 limit.

Don't put a Youtube link that has no content

Don't put a Youtube link that has no content. I clicked on the link and there was nothing!

Change `st.divider()` to `st.markdown("---")` and this error goes away

It seems like you're having a problem with the st.divider() function in your Streamlit app, but there's no specific error message provided.

Solution: change st.divider() to st.markdown("---") and the error goes away.
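
The likely cause is that st.divider() only exists in newer Streamlit releases, so on an older install the call fails with an AttributeError. A sketch of a fallback that works either way; draw_divider is a hypothetical helper, and it takes the Streamlit module as an argument so that it can be exercised without Streamlit installed:

```python
def draw_divider(st_module):
    """Draw a horizontal rule, falling back to markdown on older Streamlit."""
    if hasattr(st_module, "divider"):
        st_module.divider()
    else:
        # Older releases have no divider(); a markdown rule looks the same.
        st_module.markdown("---")
```

In the app this would be called as draw_divider(st) after "import streamlit as st". Upgrading Streamlit is the alternative if editing the code is not desirable.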

Good job this repo 👍 , thanks...

This model's maximum context length is 4097 tokens, however you requested 4236 tokens

Thanks for your contribution and work!
Sometimes I get an error like this:
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4236 tokens (3980 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.

How can I solve this problem?
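
The numbers in the message show the overflow: 3980 prompt tokens plus 256 reserved for the completion is 4236, which is 139 over the 4097-token limit. The usual remedies are to send fewer or smaller chunks to the model, or to use a model with a larger context window. A minimal sketch of the first option, assuming the token count of each chunk is already known; the function name, the overhead parameter, and the example counts are hypothetical:

```python
def fit_chunks(chunk_token_counts, max_context=4097, completion_tokens=256, overhead=0):
    """Count how many leading chunks fit while leaving room for the completion.

    `overhead` covers the tokens used by the question and the prompt template.
    """
    budget = max_context - completion_tokens - overhead
    used = 0
    kept = 0
    for count in chunk_token_counts:
        if used + count > budget:
            break
        used += count
        kept += 1
    return kept


# Four chunks of 995 tokens each add up to the 3980 prompt tokens in the error.
print(4097 - 256)                        # 3841 tokens available for the prompt
print(fit_chunks([995, 995, 995, 995]))  # 3 chunks fit in that budget
```

Dropping the fourth chunk brings the prompt to 2985 tokens, comfortably inside the limit. Splitting the PDF into smaller chunks has a similar effect while keeping more distinct passages.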

Your app is having trouble loading the streamlit_chat.streamlit_chat component.

I get the following error after uploading the PDF and asking a question.

_embed_with_retry in 4.0 seconds

D:\anaconda\envs\ChatPDF-main2\python.exe D:/python_project/ChatPDF-main/main.py
Using embedded DuckDB without persistence: data will be transient
Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors..
Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors..
Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors..
Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 8.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on

TypeError: get_model() got an unexpected keyword argument 'ocr_languages'

Traceback (most recent call last):
  File "D:\python_project\ChatPDF-main\main.py", line 16, in <module>
    pages = loader.load_and_split()
  File "D:\anaconda\envs\ChatPDF-main2\lib\site-packages\langchain\document_loaders\base.py", line 36, in load_and_split
    docs = self.load()
  File "D:\anaconda\envs\ChatPDF-main2\lib\site-packages\langchain\document_loaders\unstructured.py", line 61, in load
    elements = self._get_elements()
  File "D:\anaconda\envs\ChatPDF-main2\lib\site-packages\langchain\document_loaders\pdf.py", line 29, in _get_elements
    return partition_pdf(filename=self.file_path, **self.unstructured_kwargs)
  File "D:\anaconda\envs\ChatPDF-main2\lib\site-packages\unstructured\partition\pdf.py", line 68, in partition_pdf
    return partition_pdf_or_image(
  File "D:\anaconda\envs\ChatPDF-main2\lib\site-packages\unstructured\partition\pdf.py", line 140, in partition_pdf_or_image
    layout_elements = _partition_pdf_or_image_local(
  File "D:\anaconda\envs\ChatPDF-main2\lib\site-packages\unstructured\partition\pdf.py", line 227, in _partition_pdf_or_image_local
    layout = process_file_with_model(
  File "D:\anaconda\envs\ChatPDF-main2\lib\site-packages\unstructured_inference\inference\layout.py", line 377, in process_file_with_model
    model = get_model(model_name, **kwargs)
TypeError: get_model() got an unexpected keyword argument 'ocr_languages'
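
An unexpected keyword argument that crosses a package boundary typically indicates a version mismatch: here unstructured passes ocr_languages along, and it ends up in a get_model() in unstructured_inference that does not accept it. Upgrading both packages together is the likely fix. A sketch for checking which versions are installed; the distribution names are assumed to be unstructured and unstructured-inference:

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions(["unstructured", "unstructured-inference", "langchain"]))
```

Comparing the printed versions against the pins in requirements.txt shows whether one of the packages was installed or upgraded separately from the others.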