This repository contains a Python script for processing books and generating summaries, questions, and answers using large language models (LLMs).
The "qa-llamacpp-pipeline" notebook contains all the code required to run the pipeline, which is designed to extract meaningful insights from book PDF files. It runs three naive LLM chains, followed by a regex step, to generate the dataset.
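As a rough illustration of the "three chains, then a regex" idea, here is a minimal sketch. It is not the notebook's actual code: the prompt templates, the `run_pipeline` and `fake_llm` names, the `Qn:`/`An:` output format, and the exact regex are all assumptions made for this example. The model is passed in as a plain callable so the sketch runs without llama.cpp installed.

```python
import re
from typing import Callable, Dict, List

# Hypothetical prompt templates; the notebook's real prompts may differ.
SUMMARY_PROMPT = "Summarize the passage.\n\n{text}"
QUESTION_PROMPT = "Write questions about the summary.\n\n{summary}"
ANSWER_PROMPT = (
    "Answer each question as 'Qn: ...' then 'An: ...'.\n\n"
    "Summary: {summary}\n\nQuestions:\n{questions}"
)

# Assumed output format: "Q1: question" on one line, "A1: answer" on the next.
# Captures (question, answer) pairs; an answer runs until the next "Qn:" or the end.
PAIR_RE = re.compile(
    r"Q\d*:\s*(.+?)\s*\nA\d*:\s*(.+?)\s*(?=\nQ\d*:|\Z)", re.DOTALL
)


def run_pipeline(text: str, llm: Callable[[str], str]) -> List[Dict[str, str]]:
    """Run three chained LLM calls, then parse the final output with a regex."""
    # Chain 1: passage -> summary
    summary = llm(SUMMARY_PROMPT.format(text=text)).strip()
    # Chain 2: summary -> questions (kept as raw text)
    questions = llm(QUESTION_PROMPT.format(summary=summary)).strip()
    # Chain 3: summary + questions -> question/answer text
    raw = llm(ANSWER_PROMPT.format(summary=summary, questions=questions)).strip()
    # Regex step: turn the free-form text into dataset rows
    return [
        {"summary": summary, "question": q, "answer": a}
        for q, a in PAIR_RE.findall(raw)
    ]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch is runnable; returns canned text."""
    if prompt.startswith("Summarize"):
        return "Whales are mammals."
    if prompt.startswith("Write questions"):
        return "What are whales?\nAre whales fish?"
    return (
        "Q1: What are whales?\nA1: They are mammals.\n"
        "Q2: Are whales fish?\nA2: No, they are not.\n"
    )


if __name__ == "__main__":
    for row in run_pipeline("Some passage about whales.", fake_llm):
        print(row)
```

To try it against a real model, replace `fake_llm` with any function that takes a prompt string and returns the generated text. The regex would likely need adjusting to whatever format the chosen model actually emits.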
Sample Output (without regex):
To get started with the QAPipeline using LLMs, follow these steps:
- Clone this repository to Kaggle or your local machine.
- Open the "qa-llamacpp-pipeline" notebook.
- Configure the notebook environment and the input PDF.
- Run the notebook cells to execute the tasks and generate summaries, questions, and answers.

This project used compute generously provided by Kaggle and open large language models available on Hugging Face, run with llama.cpp.