skywalkerdarren / chatweb

ChatWeb can crawl web pages, read PDF, DOCX, TXT, and extract the main content, then answer your questions based on the content, or summarize the key points.

License: MIT License

Python 89.19% Jupyter Notebook 9.74% Dockerfile 0.98% Shell 0.10%
chatgpt embedding gpt-35-turbo news-extractor newspaper openai pgvector postgresql vector-database faiss ai gpt crawler docx pdf

chatweb's Introduction

ChatWeb

ChatWeb can crawl any webpage or extract text from PDF, DOCX, and TXT files, and generate an embedding-based summary. It can also answer your questions based on the content of the text. It is implemented with the GPT-3.5 chat API and embedding API, together with a vector database.

Basic Principle

The basic principle is similar to existing projects such as chatPDF and automated customer service AI.

  • Crawl web pages
  • Extract text content
  • Use GPT3.5's embedding API to generate vectors for each paragraph
  • Calculate the similarity score between each paragraph's vector and the entire text's vector to generate a summary
  • Store the vector-text mapping in a vector database
  • Generate keywords from user input
  • Generate a vector from the keywords
  • Use the vector database to perform a nearest neighbor search and return a list of the most similar texts
  • Use GPT3.5's chat API to design a prompt that answers the user's question based on the most similar texts in the list

The idea is to extract relevant content from a large amount of text and then answer questions based on that content, which can achieve a similar effect to breaking through token limits.

An improvement was made to generate vectors based on keywords rather than the user's question, which increases the accuracy of searching for relevant texts.
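
The nearest neighbor step above can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure and uses toy 3-dimensional vectors in place of real embeddings; the names cosine_similarity, top_k, store, and keyword_vec are hypothetical and are not taken from the ChatWeb code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Toy vectors standing in for real paragraph embeddings (assumption).
store = [
    ("The old man fished alone in a skiff.", [0.9, 0.1, 0.0]),
    ("The boy brought him coffee.", [0.1, 0.9, 0.0]),
    ("Sharks attacked the marlin.", [0.7, 0.0, 0.7]),
]
keyword_vec = [1.0, 0.0, 0.1]  # hypothetical embedding of the extracted keywords

print(top_k(keyword_vec, store, k=2))
```

A vector database such as faiss or pgvector does the same ranking, but with an index so that it does not have to compare the query against every stored vector.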

Getting Started

Manual installation:

  • Install Python3
  • Download this repository by running git clone https://github.com/SkywalkerDarren/chatWeb.git
  • Navigate to the directory by running cd chatWeb
  • Copy config.example.json to config.json
  • Edit config.json and set open_ai_key to your OpenAI API key
  • Install dependencies by running pip3 install -r requirements.txt
  • Start the application by running python3 main.py

Docker:

If you prefer, you can also run this project using Docker:

  • Build the container using docker-compose build (only needed once if you are not planning to contribute to this repo).
  • Copy config.example.json to config.json and fill in the required settings. The example config already works with Docker as-is. If OPEN_AI_KEY is not set in your environment variables, you can set the key in config.json, or later when you run the app.
  • Run the container: docker-compose up
  • Open the application in a browser: http://localhost:7860

Set language

  • Edit config.json and set language to English or another language.

Mode Selection

  • Edit config.json and set mode to console, api, or webui to choose the startup mode.
  • In console mode, type /help to view commands.
  • In api mode, the application exposes an API service to external clients. api_port and api_host can be set in config.json.
  • In webui mode, the application serves a web user interface. webui_port can be set in config.json; by default the UI is available at http://127.0.0.1:7860.
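
As an illustration of the mode setting, here is a small sketch that parses a config and rejects any mode other than the three named above. The function read_mode and the inline example config are hypothetical; how ChatWeb itself validates config.json is not shown in this document.

```python
import json

VALID_MODES = {"console", "api", "webui"}  # the three modes named above

def read_mode(config_text):
    """Parse config JSON text and return a validated mode string."""
    config = json.loads(config_text)
    mode = config.get("mode")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    return mode

# Inline stand-in for the contents of config.json (assumption).
example = '{"mode": "webui", "webui_port": 7860}'
print(read_mode(example))
```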

Stream Mode

  • Edit config.json and set use_stream to true.

Setting the Temperature

  • Edit config.json and set temperature to a value between 0 and 1.
  • The smaller the value, the more conservative and stable the response will be. The larger the value, the more daring the response may be, possibly resulting in "hallucinations."
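
To see why a smaller temperature gives more stable output, the sketch below applies the standard temperature-scaled softmax to made-up token scores. It illustrates the general technique only; the scores are invented, the function name is hypothetical, and the model's internal sampling is not something this project controls beyond passing the temperature value.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)
warm = softmax_with_temperature(logits, 1.0)

# At the lower temperature, more probability mass sits on the top candidate.
print(cold[0] > warm[0])
```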

OpenAI Proxy Settings

  • Edit config.json and add open_ai_proxy for your proxy address, for example:
"open_ai_proxy": {
  "http": "socks5://127.0.0.1:1081",
  "https": "socks5://127.0.0.1:1081"
}
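
Before starting the application, it can help to check that each proxy entry is a complete scheme://host:port URL. The sketch below does this with the standard library only; check_proxies is a hypothetical helper, not part of ChatWeb, and it only checks the shape of the URLs, not whether the proxy is reachable.

```python
import json
from urllib.parse import urlsplit

def check_proxies(proxies):
    """Return (scheme, host, port) per entry, raising if a proxy URL is incomplete."""
    parsed = {}
    for name, url in proxies.items():
        parts = urlsplit(url)
        if not parts.scheme or not parts.hostname or parts.port is None:
            raise ValueError(f"{name}: expected scheme://host:port, got {url!r}")
        parsed[name] = (parts.scheme, parts.hostname, parts.port)
    return parsed

# The value of open_ai_proxy from the example above.
fragment = '{"http": "socks5://127.0.0.1:1081", "https": "socks5://127.0.0.1:1081"}'
result = check_proxies(json.loads(fragment))
print(result)
```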

Install PostgreSQL (Optional)

  • Edit config.json and set use_postgres to true.
  • Install PostgreSQL.
    • The default SQL address is postgresql://localhost:5432/mydb, or you can set it in config.json.
  • Install the pgvector plugin.

Compile and install the extension (supports Postgres 11+):

git clone --branch v0.4.0 https://github.com/pgvector/pgvector.git
cd pgvector
make
make install # may need sudo

Then load it in the database where you want to use it:

CREATE EXTENSION vector;
  • Install dependency with pip: pip3 install psycopg2
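
Once the extension is loaded, a nearest neighbor lookup can be written as an ORDER BY on pgvector's <-> (Euclidean distance) operator. The sketch below only builds the SQL text and parameters, so it runs without a database. The table name fragments and the columns content and embedding are hypothetical, since the schema ChatWeb creates is not shown here; the %s placeholders follow psycopg2's parameter style.

```python
def to_pgvector_literal(vector):
    """Format a list of numbers as pgvector's text input, e.g. '[1.0,2.0,3.0]'."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"

# Hypothetical table and column names (assumption, not ChatWeb's real schema).
NEAREST_SQL = (
    "SELECT content FROM fragments "
    "ORDER BY embedding <-> %s::vector "
    "LIMIT %s"
)

def nearest_query_params(query_vector, limit=5):
    """Build the parameter tuple to pass alongside NEAREST_SQL."""
    return (to_pgvector_literal(query_vector), limit)

params = nearest_query_params([0.1, 0.2, 0.3], limit=3)
print(NEAREST_SQL)
print(params)
```

With a live connection, the pair would be passed to a psycopg2 cursor as cursor.execute(NEAREST_SQL, params).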

Example

Please enter the link to the article or the file path of the PDF/TXT/DOCX document: https://gutenberg.ca/ebooks/hemingwaye-oldmanandthesea/hemingwaye-oldmanandthesea-00-e.html
Please wait for 10 seconds until the webpage finishes loading.
The article has been retrieved, and the number of text fragments is: 663
...
=====================================
Query fragments used tokens: 7219, cost: $0.0028876
Query fragments used tokens: 7250, cost: $0.0029000000000000002
Query fragments used tokens: 7188, cost: $0.0028752
Query fragments used tokens: 7177, cost: $0.0028708
Query fragments used tokens: 2378, cost: $0.0009512000000000001
Embeddings have been created with 663 embeddings, using 31212 tokens, costing $0.0124848
The embeddings have been saved.
=====================================
Please enter your query (/help to view commands):
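
The costs in the log above are consistent with a flat rate of $0.0004 per 1,000 tokens (for example, 7219 tokens gives $0.0028876). That rate is inferred from the log, not taken from the ChatWeb source. The sketch below redoes the arithmetic with Decimal, which avoids floating-point artifacts such as the 0.0029000000000000002 seen in the output.

```python
from decimal import Decimal

# Rate inferred from the log above (7219 tokens -> $0.0028876); an assumption.
PRICE_PER_1K_TOKENS = Decimal("0.0004")

def embedding_cost(tokens):
    """Cost in USD for a token count at the inferred per-1K rate."""
    return Decimal(tokens) * PRICE_PER_1K_TOKENS / Decimal(1000)

# Token counts of the five query batches from the log.
batches = [7219, 7250, 7188, 7177, 2378]
total_tokens = sum(batches)

print(total_tokens, embedding_cost(total_tokens))
```

The five batches sum to the 31212 tokens reported in the log, and the computed cost matches the reported $0.0124848.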

TODO

  • Support for pdf/txt/docx files
  • Support for in-memory storage without a database (faiss)
  • Support for Stream
  • Support for API
  • Support for proxies
  • Add Colab support
  • Add language support
  • Support for temperature
  • Support for webui
  • Other features that have not been thought of yet

chatweb's People

Contributors

folook, skywalkerdarren

chatweb's Issues

I installed it with Docker, but it fails at runtime.

Error details:

INFO:     :35837 - "POST /crawler_url HTTP/1.1" 500 Internal Server Error
chatweb-chatweb-1  | ERROR:    Exception in ASGI application
chatweb-chatweb-1  | Traceback (most recent call last):
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py", line 435, in run_asgi
chatweb-chatweb-1  |     result = await app(  # type: ignore[func-returns-value]
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
chatweb-chatweb-1  |     return await self.app(scope, receive, send)
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 276, in __call__
chatweb-chatweb-1  |     await super().__call__(scope, receive, send)
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 122, in __call__
chatweb-chatweb-1  |     await self.middleware_stack(scope, receive, send)
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
chatweb-chatweb-1  |     raise exc
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
chatweb-chatweb-1  |     await self.app(scope, receive, _send)
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
chatweb-chatweb-1  |     raise exc
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
chatweb-chatweb-1  |     await self.app(scope, receive, sender)
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
chatweb-chatweb-1  |     raise e
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
chatweb-chatweb-1  |     await self.app(scope, receive, send)
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 718, in __call__
chatweb-chatweb-1  |     await route.handle(scope, receive, send)
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
chatweb-chatweb-1  |     await self.app(scope, receive, send)
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
chatweb-chatweb-1  |     response = await func(request)
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 237, in app
chatweb-chatweb-1  |     raw_response = await run_endpoint_function(
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
chatweb-chatweb-1  |     return await dependant.call(**values)
chatweb-chatweb-1  |   File "/app/api.py", line 34, in crawler_url
chatweb-chatweb-1  |     contents, lang = web_crawler_newspaper(req.url)
chatweb-chatweb-1  |   File "/app/contents.py", line 14, in web_crawler_newspaper
chatweb-chatweb-1  |     raw_html, lang = _get_raw_html(url)
chatweb-chatweb-1  |   File "/app/contents.py", line 35, in _get_raw_html
chatweb-chatweb-1  |     with webdriver.Chrome(options=chrome_options) as driver:
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/chrome/webdriver.py", line 84, in __init__
chatweb-chatweb-1  |     super().__init__(
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/chromium/webdriver.py", line 101, in __init__
chatweb-chatweb-1  |     self.service.start()
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/common/service.py", line 100, in start
chatweb-chatweb-1  |     self.assert_process_still_running()
chatweb-chatweb-1  |   File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/common/service.py", line 113, in assert_process_still_running
chatweb-chatweb-1  |     raise WebDriverException(f"Service {self._path} unexpectedly exited. Status code was: {return_code}")
chatweb-chatweb-1  | selenium.common.exceptions.WebDriverException: Message: Service /root/.cache/selenium/chromedriver/linux64/114.0.5735.90/chromedriver unexpectedly exited. Status code was: 127

AttributeError: 'generator' object has no attribute 'choices' in ai.py

I am not familiar with your code, so I am reporting the error below:

Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\py39\lib\site-packages\gradio\routes.py", line 414, in run_predict
output = await app.get_blocks().process_api(
File "C:\ProgramData\Anaconda3\envs\py39\lib\site-packages\gradio\blocks.py", line 1323, in process_api
result = await self.call_function(
File "C:\ProgramData\Anaconda3\envs\py39\lib\site-packages\gradio\blocks.py", line 1051, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\ProgramData\Anaconda3\envs\py39\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\ProgramData\Anaconda3\envs\py39\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\ProgramData\Anaconda3\envs\py39\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
result = context.run(func, *args)
File "D:\coding\notNowUsing\nlp2000\News_nltk\Example_test\OpenaiFinancialReports\chatWeb\webui.py", line 130, in respond
kw = self.ai.get_keywords(message)
File "D:\coding\notNowUsing\nlp2000\News_nltk\Example_test\OpenaiFinancialReports\chatWeb\ai.py", line 86, in get_keywords
result = self._chat_stream([
File "D:\coding\notNowUsing\nlp2000\News_nltk\Example_test\OpenaiFinancialReports\chatWeb\ai.py", line 47, in _chat_stream
print(response.choices[0].message.content.strip())
AttributeError: 'generator' object has no attribute 'choices'
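
A possible reading of this traceback: when streaming is enabled, the chat call returns a generator of chunks rather than a single response object, so response.choices does not exist. Each chunk carries an incremental piece of text under choices[0]["delta"], and these have to be iterated and joined. The sketch below shows that pattern with a simulated stream, so it runs without an API key. fake_stream and collect_stream are hypothetical names, and this is not the fix applied in the repository.

```python
def fake_stream():
    """Stand-in for a streamed chat response: yields chunk dicts one at a time."""
    for piece in ["old", " man", ", sea"]:
        yield {"choices": [{"delta": {"content": piece}}]}
    yield {"choices": [{"delta": {}}]}  # a final chunk may carry no content

def collect_stream(chunks):
    """Join the incremental 'delta' contents of a chunk iterator into one string."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
    return "".join(parts).strip()

print(collect_stream(fake_stream()))
```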
