llm-rag-invoice-haystack

RAG Invoice Data Processing with Llama2, Haystack 2, & Docker

Requirements

To run this project, you will need a Hugging Face API token, which should be set in your local environment as follows:

HF_API_TOKEN=''
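
A script can fail early when the token is missing instead of failing later on the first model call. The snippet below is a minimal sketch of such a check, not code from this project; `require_hf_token` is a hypothetical helper name, and only the `HF_API_TOKEN` variable name comes from this README.

```python
import os


def require_hf_token(env=None):
    """Return the Hugging Face token, or raise if it is unset or empty."""
    # `env` defaults to the real process environment; passing a dict makes
    # the helper easy to try out without touching os.environ.
    env = os.environ if env is None else env
    token = env.get("HF_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_API_TOKEN is not set; export it before running.")
    return token
```

Calling `require_hf_token()` at the top of a script would surface a missing or empty token right away.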

Quickstart

The RAG pipeline runs on Haystack, Weaviate, and Hugging Face.

  1. Build the Docker images, containers, and services:

docker-compose up --build

  2. (Optional) Copy text PDF files to the data folder. (An example invoice is provided for demo/testing purposes.)

  3. Open a new CLI tab and open a shell in your running container:

  • Retrieve your CONTAINER_ID with docker ps.
  • Access the container using docker exec -it CONTAINER_ID bash.

  4. Run the script to convert PDF documents to vector embeddings and save them in Weaviate vector storage:

python ingest.py

  5. Run the following script to process inquiries about the data and fetch the answers:

python main.py "What is the invoice number value?"
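
Questions can also ask for a structured answer by appending a JSON-style template, which means escaping quotes on the command line. As a rough sketch (an assumption, not part of this project), the template can be built from a list of field names and passed to main.py as a single argument, so no shell escaping is needed. `build_question` and `ask` are hypothetical helpers, and the sketch assumes main.py is in the current directory and prints its answer to standard output.

```python
import json
import subprocess
import sys


def build_question(question, fields):
    """Append a JSON-style answer template such as {"a": {},"b": {}}."""
    template = "{" + ",".join(f"{json.dumps(name)}: {{}}" for name in fields) + "}"
    return f"{question} use this format for the answer {template}"


def ask(question, fields):
    """Run main.py with the question as one argument (no shell quoting)."""
    # Assumption: main.py sits in the current directory and prints its answer.
    cmd = [sys.executable, "main.py", build_question(question, fields)]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

For example, `build_question("What is the invoice seller name, address and tax ID?", ["seller_name", "address", "tax_id"])` produces the same question text as the first command in the Examples section.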

Examples

python main.py "What is the invoice seller name, address and tax ID? use this format for the answer {\"seller_name\": {},\"address\": {},\"tax_id\": {}}"

Answer:
 {"seller_name": "Chapman, Kim and Green", "address": "64731 James Branch Smithmouth, NC 26872", "tax_id": "949-84-9105"}
==================================================
Time to retrieve answer: 3.551683001991478

python main.py "retrieve invoice IBAN in the format {\"invoice_iban\": {}}"

Answer:
{"invoice_iban": {"GB50ACIE59715038217063"}}
==================================================
Time to retrieve answer: 9.18394808798621

python main.py "retrieve two values: net price and gross worth for the second invoice item in this format: {\"net_price\": {},\"gross_worth\": {}}"

Answer:
{"net_price": {"1.00": 7.50},"gross_worth": {"1.00": 12.99}}
==================================================
Time to retrieve answer: 3.623518834996503
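
The answers above are model output and are not always valid JSON: the IBAN answer, for instance, wraps a bare string in braces. Code that consumes the answers may therefore want to parse defensively. The following is a minimal sketch under that assumption; `parse_answer` is a hypothetical helper, not part of this project.

```python
import json


def parse_answer(text):
    """Return the answer as a dict when it is valid JSON, else None."""
    try:
        value = json.loads(text.strip())
    except json.JSONDecodeError:
        # The model returned something JSON-like but malformed.
        return None
    # Only accept JSON objects; a bare string or number is treated as a miss.
    return value if isinstance(value, dict) else None
```

With the outputs shown above, the first and third answers parse into dictionaries, while the IBAN answer yields `None` and would need to be handled as raw text.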

Credits

This project is based on Andrej Baranovskij's original work.

Changes

  • The code has been refactored to integrate with Haystack 2 and now uses Llama2 hosted on Hugging Face, which simplifies the architecture by eliminating the need for a local LLM.
  • The application is fully containerized using Docker.
  • Response times have improved significantly, and all scenarios in the prompts-structured directory now generate answers.

Contributors

jfagan
