This project applies natural language processing (NLP) and large language models to make it easier to find solutions to data science questions on StackOverflow. It combines GPT-3.5, LangChain, and Pinecone to give users more accurate and efficient responses to their queries.
- GPT-3.5 Integration: Uses OpenAI's GPT-3.5 model to interpret data science queries and generate natural language responses.
- LangChain: Handles query processing, parsing and interpreting each user query.
- Pinecone Database: Uses Pinecone's vector database to efficiently store and retrieve embeddings for query matching and similarity search.
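The way these three pieces fit together can be illustrated with a minimal, dependency-free sketch of a retrieve-then-prompt flow. This is not the project's code: the toy bag-of-words `embed` function stands in for a real embedding model, the in-memory `top_k` search stands in for a Pinecone similarity query, and `build_prompt` only assembles the text that would be sent to GPT-3.5. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (assumed stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (the vector store's role)."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine retrieved snippets and the question into one prompt for the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using the context below.\n\nContext:\n{joined}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "Use pandas merge to join two DataFrames on a key column.",
        "Matplotlib subplots create a grid of axes in one figure.",
        "Use pandas groupby to aggregate rows by a column.",
    ]
    question = "How do I join two pandas DataFrames?"
    print(build_prompt(question, top_k(question, docs, k=1)))
```

In a real deployment the embedding and search steps would be replaced by calls to an embedding model and the vector database; the sketch only shows the order of operations.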
- Clone the repository:
git clone https://github.com/TanmayAT/StackOverflow_DS_LLM-.git
- Install dependencies:
cd StackOverflow_DS_LLM-
pip install -r requirements.txt
- Configure API keys:
  - Obtain API keys for GPT-3.5, LangChain, and Pinecone.
  - Add these keys to the respective configuration files (`gpt_config.json`, `langchain_config.json`, `pinecone_config.json`).
- Run the application:
python main.py
- Access the application through the provided interface or API endpoints.
- `gpt_config.json`: Contains the API key and configuration settings for GPT-3.5.
- `langchain_config.json`: Holds configuration details for Langchain.
- `pinecone_config.json`: Stores the Pinecone API key and configuration parameters.
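As a rough illustration of how these files might be read at startup, the sketch below loads all three and fails early if one is missing or has no key. The file names come from this README; the `api_key` field name and the `load_configs` helper are assumptions, since the actual schema of each file is not documented here.

```python
import json
from pathlib import Path

# File names are taken from the README; the "api_key" field name is an assumption.
CONFIG_FILES = ("gpt_config.json", "langchain_config.json", "pinecone_config.json")


def load_configs(directory: str = ".", required_field: str = "api_key") -> dict[str, dict]:
    """Load each config file, raising early if one is missing or lacks the key field."""
    configs = {}
    for name in CONFIG_FILES:
        path = Path(directory) / name
        if not path.is_file():
            raise FileNotFoundError(f"Missing configuration file: {path}")
        data = json.loads(path.read_text(encoding="utf-8"))
        # Each file is expected to hold a JSON object with a non-empty key entry.
        if not isinstance(data, dict) or not data.get(required_field):
            raise ValueError(f"{name} has no non-empty '{required_field}' entry")
        configs[name] = data
    return configs
```

Calling `load_configs()` from the repository root before starting the application would surface a missing or empty configuration file as a clear error rather than a failed API call later on.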
Contributions are welcome! If you'd like to contribute to this project, please follow these steps:
- Fork the repository.
- Create a new branch (`git checkout -b feature/yourfeature`).
- Make your changes.
- Commit your changes (`git commit -am 'Add new feature'`).
- Push to the branch (`git push origin feature/yourfeature`).
- Create a new Pull Request.