MedBot is a healthcare chatbot that combines large language models (LLMs) with Retrieval-Augmented Generation (RAG) over a knowledge base built from PubMed datasets. It gives immediate responses to questions about health, wellness, symptoms, medications, and general healthcare. By grounding its answers in content retrieved from PubMed, MedBot aims to provide accurate, personalized responses and to bridge the gap between users and healthcare knowledge through a convenient, accessible conversational interface.
This project pairs large language models with Retrieval-Augmented Generation (RAG) over reliable medical datasets collected from PubMed. Evaluation of the bot produced the following metrics:
- 96.7% Context Precision
- 95% Context Recall
- 85% Faithfulness
- 73% Answer Relevancy
- 69.4% Answer Correctness
![Screen Shot 2024-05-29 at 12 30 15 PM](https://private-user-images.githubusercontent.com/150852458/334898788-8d1a6b24-f5b0-4cc2-9c94-02a6d9ca77a7.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjIyNTI5OTIsIm5iZiI6MTcyMjI1MjY5MiwicGF0aCI6Ii8xNTA4NTI0NTgvMzM0ODk4Nzg4LThkMWE2YjI0LWY1YjAtNGNjMi05Yzk0LTAyYTZkOWNhNzdhNy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI5JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyOVQxMTMxMzJaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0yYWFhZTU0M2IxOWYyNTViMjEyYjNkNDk2NDg0MjA3ZjI5M2M5NDFiZjFhMTI1Zjk3ZjZkM2IwMzE1ZGUxNjdiJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.WL-bOtMSiT0JgNiQcvIB8AAT3FAuzhNankL37u5xnVI)
![Screen Shot 2024-05-29 at 12 28 17 PM](https://private-user-images.githubusercontent.com/150852458/334898843-73340c9a-f732-40de-9570-c9c23c375d94.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjIyNTI5OTIsIm5iZiI6MTcyMjI1MjY5MiwicGF0aCI6Ii8xNTA4NTI0NTgvMzM0ODk4ODQzLTczMzQwYzlhLWY3MzItNDBkZS05NTcwLWM5YzIzYzM3NWQ5NC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI5JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyOVQxMTMxMzJaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT03NjcwNTg3YmNlOGY0ODcxNTNkMzUzNmExOWZjOTUzM2E2ZTljMTFkNjAxMTgwNTE4NmU1NjNlZGQxOWYyYTg0JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.C244-8jngbobnYeI3Y6bpVnw-pUhkcgqU_exlIyAVzQ)
Identifying and importing the tools and resources the project requires, including programming languages, machine learning frameworks, data collection tools, and other dependencies.
Collecting and preparing the data relevant to the project: cleaning, preprocessing, and structuring it in a format suitable for analysis or modeling.
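
The cleaning and structuring step can be illustrated with a minimal sketch. The record fields (`id`, `title`, `abstract`), the sample data, and the helper names `clean_abstract` and `prepare_records` are assumptions made for this example; the project's actual PubMed preprocessing may differ.

```python
import re

def clean_abstract(text):
    """Strip markup remnants and collapse whitespace in one abstract."""
    text = re.sub(r"<[^>]+>", " ", text)  # drop stray HTML/XML tags
    text = re.sub(r"\s+", " ", text)      # collapse runs of whitespace
    return text.strip()

def prepare_records(raw_records):
    """Clean abstracts, skip empty ones, and drop exact duplicates."""
    seen = set()
    prepared = []
    for rec in raw_records:
        abstract = clean_abstract(rec.get("abstract", ""))
        if not abstract or abstract in seen:
            continue
        seen.add(abstract)
        prepared.append({
            "id": rec["id"],
            "title": rec.get("title", "").strip(),
            "text": abstract,
        })
    return prepared

# Hypothetical sample records, not taken from the project's dataset.
raw = [
    {"id": "doc-1", "title": " Aspirin and fever ",
     "abstract": "Aspirin  reduces <i>fever</i>\nin adults."},
    {"id": "doc-2", "title": "Empty", "abstract": "   "},
    {"id": "doc-3", "title": "Duplicate",
     "abstract": "Aspirin reduces fever in adults."},
]
records = prepare_records(raw)
print(records)
```

Only the first record survives here: the second has an empty abstract and the third is a duplicate once cleaned.
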
Transforming the collected data into numerical vectors while preserving semantic meaning, typically using word embedding models or contextual embedding models to represent words or sentences as dense vectors.
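
To make the idea of a dense vector concrete, here is a toy sketch that hashes each word into a fixed-size vector and normalizes it. This is a deliberate simplification and an assumption of this example: it captures word overlap only, whereas the word or contextual embedding models described above also capture meaning. The names `embed`, `cosine`, and `DIM` are hypothetical.

```python
import hashlib
import math
import re

DIM = 256  # hypothetical vector size chosen for this sketch

def embed(text, dim=DIM):
    """Map text to a unit-length vector by hashing each word into a bucket."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def cosine(a, b):
    """Cosine similarity; inputs are unit length, so this is a dot product."""
    return sum(x * y for x, y in zip(a, b))

query = embed("metformin lowers blood glucose")
related = embed("metformin lowers glucose levels")
unrelated = embed("ibuprofen relieves joint pain")
print(cosine(query, related), cosine(query, unrelated))
```

Texts that share words land closer together than texts that do not. A real embedding model would also place synonyms and paraphrases close together, which this toy cannot do.
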
Validating the effectiveness of the vector database by querying it with known inputs and verifying that the retrieved vectors match expectations.
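
A known-input check of this kind might look like the sketch below. `TinyVectorStore`, the word-count embedding, the sample documents, and the test queries are all assumptions for illustration; they stand in for whichever vector database and embedding model the project uses.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy sparse embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class TinyVectorStore:
    """In-memory store that ranks documents by cosine similarity."""
    def __init__(self, docs):
        self.items = [(doc_id, embed(text)) for doc_id, text in docs.items()]

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]),
                        reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical documents and known query/answer pairs.
docs = {
    "doc-aspirin": "Aspirin reduces fever and relieves mild pain.",
    "doc-metformin": "Metformin lowers blood glucose in type 2 diabetes.",
    "doc-amoxicillin": "Amoxicillin treats bacterial infections such as strep throat.",
}
known_cases = {
    "Which drug lowers blood glucose?": "doc-metformin",
    "What treats bacterial infections?": "doc-amoxicillin",
    "What reduces fever?": "doc-aspirin",
}

store = TinyVectorStore(docs)
hits = sum(store.search(q, k=1)[0] == expected
           for q, expected in known_cases.items())
hit_rate = hits / len(known_cases)
print(f"top-1 hit rate: {hit_rate:.2f}")
```

A hit rate below 1.0 on queries with known answers points to a problem in the embedding or indexing step before any generation is involved.
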
Assessing the robustness of the vector database to handle paraphrased queries and verifying its ability to accurately retrieve relevant vectors even when the query is rephrased or expressed differently.
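
One way to quantify paraphrase robustness is to compare the top-k results for a query and for its rephrasing. The sketch below uses the Jaccard overlap of the two result sets; the metric choice, the toy word-count embedding, the documents, and the query pairs are assumptions of this example rather than the project's actual test set.

```python
import math
import re
from collections import Counter

# Hypothetical documents for illustration.
DOCS = {
    "doc-aspirin": "Aspirin reduces fever and relieves mild pain.",
    "doc-metformin": "Metformin lowers blood glucose in type 2 diabetes.",
    "doc-amoxicillin": "Amoxicillin treats bacterial infections such as strep throat.",
}

def embed(text):
    """Toy sparse embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query, k=1):
    """Return the ids of the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(DOCS, key=lambda doc_id: cosine(q, embed(DOCS[doc_id])),
                    reverse=True)
    return ranked[:k]

def paraphrase_consistency(pairs, k=1):
    """Average Jaccard overlap between top-k results of a query and its paraphrase."""
    overlaps = []
    for original, paraphrase in pairs:
        a, b = set(top_k(original, k)), set(top_k(paraphrase, k))
        overlaps.append(len(a & b) / len(a | b))
    return sum(overlaps) / len(overlaps)

pairs = [
    ("What lowers blood glucose?",
     "Which medicine brings down glucose in the blood?"),
    ("What reduces fever?",
     "How can a fever be brought down?"),
]
consistency = paraphrase_consistency(pairs, k=1)
print(f"paraphrase consistency: {consistency:.2f}")
```

Because the toy embedding only counts shared words, it would fail on a paraphrase with no words in common with the source document. That failure mode is what a semantic embedding model is expected to handle.
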
Building a pipeline with LangChain that integrates retrieval and generation on top of the vector database.
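
The shape of such a pipeline (retrieve, build a prompt, generate) can be sketched in plain Python. This sketch intentionally does not use LangChain, since its API differs between versions; the retriever is a toy word-overlap ranker and `echo_llm` is a hypothetical stand-in for a real model call. All names and documents here are assumptions for illustration.

```python
import re

# Hypothetical knowledge base for illustration.
DOCS = [
    "Aspirin reduces fever and relieves mild pain.",
    "Metformin lowers blood glucose in type 2 diabetes.",
    "Amoxicillin treats bacterial infections such as strep throat.",
]

def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, k=2):
    """Rank documents by word overlap with the question (toy retriever)."""
    q = tokenize(question)
    ranked = sorted(DOCS, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, contexts):
    """Place the retrieved passages ahead of the question."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}\nAnswer:"
    )

def echo_llm(prompt):
    """Stand-in for an LLM call: returns the top-ranked context line."""
    for line in prompt.splitlines():
        if line.startswith("- "):
            return line[2:]
    return "I don't know."

def rag_answer(question, retriever, llm, k=2):
    """Retrieve context, build the prompt, and generate an answer."""
    contexts = retriever(question, k)
    prompt = build_prompt(question, contexts)
    return {"question": question, "contexts": contexts, "answer": llm(prompt)}

result = rag_answer("Which drug lowers blood glucose?", retrieve, echo_llm)
print(result["answer"])
```

Returning the retrieved contexts alongside the answer keeps the record usable for later evaluation, where metrics compare the answer against the passages it was generated from.
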
Assessing how well the RAG pipeline generates relevant responses to queries, using Ragas (Retrieval Augmented Generation Assessment) or a similar evaluation framework.
Calculating various metrics to assess the efficiency and effectiveness of the RAG pipeline, including faithfulness, context precision, context recall, answer similarity, answer relevancy, and answer correctness. Summarizing and analyzing the evaluation results, highlighting the performance of the RAG pipeline based on the calculated metrics.
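
Two of these metrics can be illustrated with simplified formulas. The sketch below assumes relevance and support labels are already available as hand-written booleans; evaluation frameworks such as Ragas typically derive these judgments with an LLM, so this is an approximation of the idea and not the framework's implementation. The sample labels are invented for the example.

```python
def context_precision(relevance):
    """Mean of precision@k over the ranks k that hold a relevant chunk.

    `relevance` lists, in ranked order, whether each retrieved chunk
    was relevant to the question.
    """
    hits = 0
    score = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            score += hits / k
    return score / hits if hits else 0.0

def context_recall(claims_supported):
    """Fraction of ground-truth claims that the retrieved context supports."""
    if not claims_supported:
        return 0.0
    return sum(claims_supported) / len(claims_supported)

def summarize(samples):
    """Average each metric over all evaluated samples."""
    n = len(samples)
    return {
        "context_precision": sum(context_precision(s["relevance"]) for s in samples) / n,
        "context_recall": sum(context_recall(s["claims"]) for s in samples) / n,
    }

# Hypothetical labels for two evaluated questions.
samples = [
    {"relevance": [True, False, True], "claims": [True, True, False, True]},
    {"relevance": [True, True], "claims": [True, True]},
]
summary = summarize(samples)
print(summary)
```

Context precision rewards relevant chunks that appear early in the ranking, while context recall measures how much of the reference answer the retrieved context covers.
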