Giter Club home page Giter Club logo

ilhansevval / web_chat_explorer Goto Github PK

View Code? Open in Web Editor NEW
1.0 1.0 0.0 9.03 MB

This project hosts seamlessly integrate web scraping and interactive chat functionalities. Extract information effortlessly from any website. Engage in dynamic conversations with uploaded documents, powered by text similarity search. Streamline your exploration with this versatile Python tool.

License: Apache License 2.0

Python 100.00%
chatbot openai streamlit-webapp webscraping

web_chat_explorer's Introduction

WebChat Explorer

WebChat Explorer is a versatile tool that combines web scraping with interactive chat capabilities, providing users with a seamless experience to explore and interact with information from any website.

Overview

ย 

This project has two main functionalities:

1. Web Scraper:

  • Users can scrape information from any website by providing the desired domain.
  • The HTML of the webpage is retrieved using the requests library.
  • bs4 (Beautiful Soup) parses the HTML code, extracting text and links.
  • The parsed information is presented to the user, with options to download the extracted text or links.

2. Interactive Chat with Text Similarity Search:

  • Users can upload a file and engage in an interactive chat with the system.
  • Text similarity search is performed using scikit-learn to find relevant text chunks based on user queries.
  • The chatbot leverages OpenAI's GPT-3.5 (Chat GPT) language model to generate responses to user queries.
  • The chat history is maintained in the st.session_state dictionary, persisting across Streamlit sessions.

Dependencies

This project relies on the following libraries:

  • streamlit: for building the user interface.
  • openai: for generating responses to user questions.
  • tiktoken: for tokenizing text
  • scikit-learn: for finding the relevant text chunks based on a user's question.
  • numpy: for creating arrays
  • pandas: for creating dataframes
  • bs4: for parsing HTML code.
  • requests: for retreiving the HTML of a webpage.

Usage

Follow these steps to set up and run the project:

  1. Create a virtual environment:
python3 -m venv my_env
source my_env/bin/activate # Mac OS or Linux
.\my_env\Scripts\activate # Windows
  1. Install dependencies:
pip install -r requirements.txt
  1. Run the Streamlit server:
streamlit run main.py
streamlit run app.py

Access the web scraper by navigating to http://localhost:8501 and the conversational chatbot by visiting http://localhost:8502.

How it Works

Web Scraper:

  1. User enters a URL in the input field.
  2. requests retrieves relevant HTML based on the user's URL.
  3. bs4 parses the HTML code into text and links.
  4. The chatbot displays parsed information, providing options to download text or links.

Interactive Chat:

  1. User enters a question in the input field.
  2. Text similarity search using scikit-learn retrieves relevant text chunks.
  3. User's question is added to retrieved text chunks to create an augmented query.
  4. GPT-3.5 generates a response to the augmented query.
  5. The chatbot displays the response, along with the chat history.

The chat history is stored in the st.session_state dictionary for persistence.

web_chat_explorer's People

Contributors

ilhansevval avatar

Stargazers

Pradeep Pawar avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.