AnswerPDF is a web app that lets users upload PDF files and then ask questions about their contents. The app uses Gemini Pro, Google Generative AI Embeddings, and FAISS (Facebook AI Similarity Search) to accomplish this.
This project was made as a part of the GenAI Internship at IGDTUW held in association with Sansoftech in the Summer of 2024.
- Multiple PDFs support
- PDF Text Extraction
- Text Chunking
- Vector Store Creation using FAISS and Google GenAI Embeddings
- Question Prompt Generation using Gemini Pro
- Interactive Streamlit interface for user input
- The PDF files uploaded by users are processed using the PyPDF2 library to extract text from each page of the document.
- The extracted text is split into smaller chunks using a recursive character-based text splitter from LangChain. This breaks large texts into more manageable pieces for further processing.
- Google Generative AI Embeddings are used to generate embeddings for each text chunk. These embeddings are then used to create a FAISS vector store, which enables efficient similarity search.
- A conversational chain is set up using LangChain and the Gemini Pro model. This chain takes the user's question and context as input and generates a response based on the content of the uploaded PDF files.
- The application provides an interactive web interface using Streamlit, where users can upload PDF files, input questions, and receive responses summarizing the content of the files.
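The chunk, embed, and retrieve steps above can be illustrated with a minimal, dependency-free sketch. It is not the app's implementation: the fixed-size splitter is a simplification of LangChain's recursive splitter, the word-count `embed` function stands in for Google Generative AI Embeddings, and the brute-force `top_k` ranking stands in for a FAISS index. All function names and parameters here are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def embed(text):
    """Toy embedding: a sparse word-count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(question, chunks, k=2):
    """Return the k chunks most similar to the question (brute-force search)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

In this sketch, the string returned by `build_prompt` is what would be handed to the language model. The overlap between neighbouring chunks is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk.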
- Clone the repository
- cd into the repository
cd AnswerPDF
- Install the necessary dependencies
pip install -r requirements.txt
- Create a Gemini API key from Google AI Studio
- Create a .env file and add your Google API key:
GOOGLE_API_KEY="<your key here>"
- Run the application
streamlit run app.py
- Open the displayed link
- Upload PDF files and click Submit
- Start asking questions about their content!
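If the app fails to start because of the API key, the .env step above can be checked in isolation. The following is a minimal standard-library sketch, assuming the simple `KEY="value"` format shown in the setup steps. How app.py actually loads the key is not shown in this README, and the helper names `load_env_file` and `require_api_key` are hypothetical.

```python
import os


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines and copy them into os.environ."""
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip('"').strip("'")
            # Variables already set in the environment take precedence.
            os.environ.setdefault(key, value)
            loaded[key] = value
    return loaded


def require_api_key(name="GOOGLE_API_KEY"):
    """Return the key, or raise if it is unset or still the placeholder."""
    key = os.environ.get(name, "")
    if not key or key.startswith("<"):
        raise RuntimeError(f"{name} is missing or still a placeholder")
    return key
```

Calling `load_env_file()` followed by `require_api_key()` from the repository root raises a `RuntimeError` if the key was never filled in.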