
Chatchat

Problem Statement

In the era of information overload, accessing specific and accurate answers to complex questions is challenging and time-consuming, so there is a growing need for advanced solutions. To address this, I propose a cloud-based Question and Answer (Q&A) system: a multifunctional, integrated LLM application that offers different solutions for different tasks.

Previous Efforts

Ollama is a useful tool for getting up and running with large language models locally. It makes it easy to integrate different LLMs locally, but it does not combine solutions for different tasks. Developing an LLM application also raises several recurring problems, e.g.:

  • Cost
    • The OpenAI API costs a lot and has limitations
    • A self-deployed local LLM needs a powerful GPU
  • Hard to get started with
  • Privacy
  • Hallucination and outdated knowledge
  • Task composability

Therefore, I tried to build an application that unifies best practices from different workflows and adapts to different LLMs according to the user's requirements. The app offers three features (a sketch of the retrieval-augmented flow in feature 1 follows this list):

  1. Local knowledge base + GPT-3.5 -- solves hallucination and inaccurate data.
  2. Local LLM baseline (Llama2-7b-chat) -- solves the privacy problem.
  3. Fine-tuned flan-t5 model -- fine-tuned on a specific QA dataset (Nvidia QA); solves the cost problem because it is lightweight.
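
As a rough illustration of feature 1, here is a minimal retrieval-augmented sketch, assuming the openai and sentence-transformers packages; the embedding model name and the answer_with_context helper are illustrative, not the project's actual code.

```python
# Minimal RAG sketch: embed local chunks, retrieve by cosine
# similarity, then ask GPT-3.5 with the retrieved context.
# Assumes `pip install openai sentence-transformers numpy`.
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    # Embeddings are L2-normalized, so the dot product is cosine similarity.
    doc_vecs = encoder.encode(chunks, normalize_embeddings=True)
    q_vec = encoder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(doc_vecs @ q_vec)[::-1][:k]
    return [chunks[i] for i in top]

def answer_with_context(question: str, chunks: list[str]) -> str:
    context = "\n".join(retrieve(question, chunks))
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the given context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```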

Prepare

Environment Requirements

  • Python 3.10+
  • Required dependencies (see requirements.txt)

Installation Steps

After you fork and git clone the project, follow these steps:

  1. Create a virtual environment: python -m venv venv
  2. Activate the virtual environment.
    Windows: venv\Scripts\activate, macOS or Linux: source venv/bin/activate
  3. Install the required packages: pip install -r requirements.txt

Data Sources

The dataset I used for fine-tuning is nvidia-qa, a Q&A dataset of question-answer pairs drawn from NVIDIA documentation, SDKs, and blogs, intended for LLM fine-tuning.

I also have some local manuals in PDF format for RAG purposes; a sketch of turning such PDFs into chunks follows.
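
A minimal sketch of extracting text from a PDF manual, assuming the pypdf package; the fixed-size chunking is a naive placeholder, not necessarily what the project does.

```python
# Sketch: extract text from a local PDF manual and split it into
# chunks for the knowledge base. Assumes `pip install pypdf`.
from pypdf import PdfReader

def pdf_to_chunks(path: str, chunk_size: int = 500) -> list[str]:
    # Concatenate the extracted text of every page (extract_text can
    # return None for image-only pages).
    text = " ".join(page.extract_text() or "" for page in PdfReader(path).pages)
    # Naive fixed-size character chunks; a real pipeline may split smarter.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```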

Data Processing Pipeline

The notebooks are mainly for EDA; all the code used in the notebooks can also be found in the corresponding Python scripts, organized as functions.

Overall, the main data processing pipeline is: split into train and test sets -> data cleaning (convert to lowercase; remove HTML tags, punctuation, and extra whitespace) -> tokenization. A sketch of the cleaning and tokenization step is shown below.
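
This is a minimal sketch of that pipeline, assuming a flan-t5 tokenizer from transformers; the checkpoint name and the question/answer field names are assumptions, not necessarily the project's actual ones.

```python
# Sketch of the cleaning step described above: lowercase, strip HTML
# tags, punctuation, and extra whitespace, then tokenize for seq2seq.
import re
import string
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")  # assumed checkpoint

def clean_text(text: str) -> str:
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)                    # remove HTML tags
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()                # collapse whitespace

def preprocess(example: dict) -> dict:
    # "question"/"answer" are assumed field names for the Q&A pairs.
    inputs = tokenizer(clean_text(example["question"]), truncation=True)
    labels = tokenizer(clean_text(example["answer"]), truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs
```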

Model Fine Tuning and Evaluation

The model was evaluated and compared on the metrics eval_loss, eval_runtime, eval_samples_per_second, and eval_steps_per_second. The results show that the fine-tuned model substantially outperforms the original model:

| Metric                  | Before Fine-Tuning | After Fine-Tuning |
| ----------------------- | ------------------ | ----------------- |
| eval_loss               | 39.2816            | 0.1673            |
| eval_runtime (s)        | 50.671             | 49.9755           |
| eval_samples_per_second | 28.063             | 28.454            |
| eval_steps_per_second   | 7.026              | 7.123             |
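
For context, a minimal fine-tuning sketch using the Hugging Face Seq2SeqTrainer API; the checkpoint and hyperparameters are illustrative assumptions, and train_ds/eval_ds stand for the tokenized splits produced by the preprocessing step above.

```python
# Minimal fine-tuning sketch with the Hugging Face Trainer API;
# hyperparameters are illustrative, not the project's actual settings.
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "google/flan-t5-base"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-nvidia-qa",
    learning_rate=3e-4,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    evaluation_strategy="epoch",
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,  # tokenized train split from the preprocessing step
    eval_dataset=eval_ds,    # tokenized eval split from the preprocessing step
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
print(trainer.evaluate())  # reports eval_loss, eval_runtime, etc.
```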

Traditional Non-DL Approach

Here, I compared with an n-gram model. The main idea of generating text with n-grams is to assume that the last word x_n of the n-gram can be inferred from the other words that appear in the same n-gram (x_{n-1}, x_{n-2}, …, x_1), which are called the context. So the main simplification of the model is that we do not need to keep track of the whole sentence in order to predict the next word; we just need to look back n-1 tokens. A minimal sketch follows.
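
A minimal sketch of this idea with counted trigrams (plain Python, no external libraries); it is illustrative, not the exact baseline code used in the project.

```python
# Minimal n-gram text generation: count (n-1)-token contexts and
# sample the next token proportionally to its observed count.
import random
from collections import Counter, defaultdict

def train_ngrams(tokens: list[str], n: int = 3) -> dict:
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])   # the n-1 preceding tokens
        counts[context][tokens[i + n - 1]] += 1
    return counts

def generate(counts: dict, seed: list[str], length: int = 20, n: int = 3) -> str:
    out = list(seed)
    for _ in range(length):
        context = tuple(out[-(n - 1):])
        if context not in counts:
            break  # unseen context: the model cannot continue
        candidates = counts[context]
        out.append(random.choices(list(candidates),
                                  weights=list(candidates.values()))[0])
    return " ".join(out)
```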

However, compared with popular LLMs, this kind of solution is much worse: the non-DL approach usually cannot match the performance and flexibility of deep learning methods on complex language modeling tasks.

Interesting findings

There are almost no LLM apps that offer the workflow and model selection (especially for open-source LLMs) that users want. I assume this is because each model has its own interface and loading logic. That is why, in this project, I tried my best to implement a unified adapter over different LLMs for the user to choose from; the sketch below illustrates the idea.
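
This is a sketch of the adapter idea, with illustrative class names rather than the project's actual interfaces: every backend exposes the same generate() method, so the UI can switch models freely.

```python
# Unified-adapter sketch: a common interface over API-based and local LLMs.
from abc import ABC, abstractmethod

class LLMAdapter(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class OpenAIAdapter(LLMAdapter):
    def __init__(self, model: str = "gpt-3.5-turbo"):
        from openai import OpenAI
        self.client, self.model = OpenAI(), model

    def generate(self, prompt: str) -> str:
        resp = self.client.chat.completions.create(
            model=self.model, messages=[{"role": "user", "content": prompt}])
        return resp.choices[0].message.content

class LocalHFAdapter(LLMAdapter):
    def __init__(self, model_name: str):
        from transformers import pipeline
        self.pipe = pipeline("text-generation", model=model_name)

    def generate(self, prompt: str) -> str:
        return self.pipe(prompt, max_new_tokens=256)[0]["generated_text"]
```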

There is a problem when the user invokes local LLM inference: if they switch to another interface but do not terminate the current local LLM task, GPU memory keeps increasing, and it cannot be reclaimed unless the process is terminated. In a Jupyter Notebook environment, however, torch.cuda.empty_cache() works well.
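
For example, in a notebook the memory can usually be reclaimed like this (assuming the name model holds the loaded local LLM):

```python
# Free GPU memory after a local inference task: drop all references to
# the model, collect garbage, then release the CUDA caching allocator's
# unused blocks back to the driver.
import gc
import torch

del model            # `model` is assumed to hold the loaded local LLM
gc.collect()
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())  # verify usage actually dropped
```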

Web

Device requirements:

  • Nvidia A10 GPU / T4 GPU
  • Other requirements from the .env file

Start method:

cd src
chainlit run web/main.py -w
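
For reference, the minimal shape of a Chainlit handler such as web/main.py might look like this; the routing logic is a placeholder, not the project's actual code.

```python
# Minimal Chainlit app sketch: receive a chat message, route it to a
# backend, and send the answer back to the browser.
import chainlit as cl

@cl.on_message
async def on_message(message: cl.Message):
    # Route the question to the selected backend (RAG + GPT-3.5,
    # local Llama2, or the fine-tuned flan-t5) in the real app.
    answer = f"Echo: {message.content}"  # placeholder for the real pipeline
    await cl.Message(content=answer).send()
```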


Due to the high cost of the local LLM (it has to run on a powerful GPU), I disabled this feature in my online web demo. If you want to try it, please let me know and I will enable it for a short time. Thank you!

Deployment

Deployed on AWS using ECS and ECR. The endpoint is: http://44.211.116.41:8080/
