This repository contains the code for the Langchain and LlamaIndex chatbots, which translate natural-language questions into SQL queries using Langchain and LlamaIndex, with OpenAI GPT-3.5 Turbo as the LLM engine. The data source for these chatbots is IPEDS (Integrated Postsecondary Education Data System).
IPEDS is a comprehensive data source maintained by the National Center for Education Statistics (NCES), which collects data from all primary providers of postsecondary education in the United States. It covers various aspects of postsecondary education, including enrollment, graduation rates, financial aid, institutional characteristics, and more.
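To make the natural-language-to-SQL flow concrete, here is a minimal sketch using only the Python standard library. It is not the code in this repository: the LLM call is replaced by a hypothetical stand-in function (`fake_llm_to_sql`), and the `institutions` table, its columns, and its rows are made up for illustration rather than taken from the IPEDS schema.

```python
import sqlite3


def fake_llm_to_sql(question: str) -> str:
    """Hypothetical stand-in for the LLM step: returns a fixed query."""
    return "SELECT name FROM institutions WHERE state = 'CA' ORDER BY name"


def answer_question(conn, question, to_sql=fake_llm_to_sql):
    """Turn a question into SQL, then run it and return the rows."""
    sql = to_sql(question)
    # Guard: only run read-only SELECT statements produced by the model.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("refusing to run non-SELECT statement")
    return conn.execute(sql).fetchall()


def build_demo_db():
    """Create an in-memory database with made-up demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE institutions (name TEXT, state TEXT)")
    conn.executemany(
        "INSERT INTO institutions VALUES (?, ?)",
        [("Example College", "CA"), ("Sample University", "NY")],
    )
    return conn


conn = build_demo_db()
print(answer_question(conn, "Which institutions are in California?"))
```

In the real chatbots, the stand-in function would be replaced by a Langchain or LlamaIndex call backed by GPT-3.5 Turbo, and the connection would point at the database given by DB_URL.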
Follow these instructions to get the project up and running on your local machine.
Make sure you have Python 3.11 installed on your system.
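If you are unsure which interpreter is active, a quick check along these lines can help. This is a sketch, not part of the repository; the helper name `is_python_3_11` is hypothetical, and it checks for 3.11 exactly because the project does not state whether other versions are supported.

```python
import sys


def is_python_3_11(version_info=sys.version_info):
    """Return True when the major.minor version is exactly 3.11."""
    return tuple(version_info[:2]) == (3, 11)


if is_python_3_11():
    print("Python 3.11 detected.")
else:
    print(f"Found Python {sys.version_info[0]}.{sys.version_info[1]}; this project asks for 3.11.")
```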
Clone the repository using the following command:
git clone https://github.com/hemadataworksai/ipedsllm.git
# Create a virtual environment (Windows)
python -m venv venv
# Activate the virtual environment (Windows)
venv\Scripts\activate
# Create a virtual environment (macOS/Linux)
python3.11 -m venv venv
# Activate the virtual environment (macOS/Linux)
source venv/bin/activate
Install the project dependencies using pip:
pip install -r requirements.txt
Create a .env file in the root directory of the project and save the following details:
OPENAI_API_KEY=YOUR_OPENAI_API_KEY
DB_URL=YOUR_DATABASE_URL
LANGCHAIN_TRACING_V2=YOUR_LANGCHAIN_TRACING_V2_SETTING
LANGCHAIN_ENDPOINT=YOUR_LANGCHAIN_ENDPOINT
LANGCHAIN_API_KEY=YOUR_LANGCHAIN_API_KEY
Replace YOUR_OPENAI_API_KEY, YOUR_DATABASE_URL, YOUR_LANGCHAIN_TRACING_V2_SETTING, YOUR_LANGCHAIN_ENDPOINT, and YOUR_LANGCHAIN_API_KEY with appropriate values.
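Before launching a chatbot, it can be useful to confirm that every variable is set and that no placeholder was left in place. The sketch below is an assumption-laden helper, not part of the repository: the function names `parse_env_text` and `unset_variables` are hypothetical, the parser handles only simple KEY=VALUE lines, and how the project itself loads the .env file is not shown here.

```python
from pathlib import Path

# Variable names taken from the .env template above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "DB_URL",
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_API_KEY",
)


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def unset_variables(values):
    """List required names that are missing, empty, or still a YOUR_ placeholder."""
    return [
        name
        for name in REQUIRED_VARS
        if not values.get(name) or values[name].startswith("YOUR_")
    ]


env_path = Path(".env")
if env_path.exists():
    problems = unset_variables(parse_env_text(env_path.read_text()))
    print("Variables still to fill in:", problems or "none")
else:
    print("No .env file found in the current directory.")
```

Run it from the project root so that the relative .env path resolves correctly.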
To run the Langchain chatbot with Streamlit, use the following command:
streamlit run src/langchain_chatbot/components/main.py
To run the LlamaIndex chatbot with Streamlit, use the following command:
streamlit run src/llamaIndex_chat/components/main.py