
webglm's Introduction

WebGLM: Towards An Efficient Web-enhanced Question Answering System with Human Preferences

📃 Paper (KDD'23) • 🌐 中文 README • 🤗 HF Repo [WebGLM-10B] [WebGLM-2B] • 📚 Dataset [WebGLM-QA]

This is the official implementation of WebGLM. If you find our open-sourced efforts useful, please 🌟 the repo to encourage our ongoing development!

[Click to watch the demo!]

Read this in 中文.

Update

[2023/06/25] Release ChatGLM2-6B, an updated version of ChatGLM-6B which introduces several new features:

  1. Stronger Performance: we have fully upgraded the ChatGLM2-6B base model. It uses the hybrid objective function of GLM and has undergone pre-training with 1.4T bilingual tokens and human preference alignment training. The evaluation results show that, compared to the first-generation model, ChatGLM2-6B achieves substantial performance improvements on datasets such as MMLU (+23%), CEval (+33%), GSM8K (+571%), and BBH (+60%), showing strong competitiveness among models of the same size.
  2. Longer Context: Based on FlashAttention technique, we have extended the context length of the base model from 2K in ChatGLM-6B to 32K, and trained with a context length of 8K during the dialogue alignment, allowing for more rounds of dialogue. However, the current version of ChatGLM2-6B has limited understanding of single-round ultra-long documents, which we will focus on optimizing in future iterations.
  3. More Efficient Inference: Based on Multi-Query Attention technique, ChatGLM2-6B has more efficient inference speed and lower GPU memory usage: under the official implementation, the inference speed has increased by 42% compared to the first generation; under INT4 quantization, the dialogue length supported by 6G GPU memory has increased from 1K to 8K.

For more details, please refer to ChatGLM2-6B.

Overview


WebGLM aspires to provide an efficient and cost-effective web-enhanced question-answering system using the 10-billion-parameter General Language Model (GLM). It aims to improve real-world application deployment by integrating web search and retrieval capabilities into the pre-trained language model.

Features

  • LLM-augmented Retriever: Enhances the retrieval of relevant web content to better aid in answering questions accurately.
  • Bootstrapped Generator: Generates human-like responses to questions, leveraging the power of the GLM to provide refined answers.
  • Human Preference-aware Scorer: Estimates the quality of generated responses by prioritizing human preferences, ensuring the system produces useful and engaging content.
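To make the data flow between these components concrete, here is a hedged sketch of how they compose. All class and method names below are illustrative stand-ins, not the repo's actual API (which lives under model/):

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class WebGLMPipeline:
    """Illustrative composition of the three WebGLM components described above."""
    retrieve: Callable[[str], List[str]]       # LLM-augmented retriever: web search + filtering
    generate: Callable[[str, List[str]], str]  # bootstrapped generator: drafts a cited answer
    score: Callable[[str, str], float]         # human preference-aware scorer: rates (question, answer)
    num_candidates: int = 4                    # assumed candidate count, for illustration only

    def answer(self, question: str) -> str:
        refs = self.retrieve(question)
        candidates = [self.generate(question, refs) for _ in range(self.num_candidates)]
        return max(candidates, key=lambda a: self.score(question, a))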

News

  • [2023-06-24] We support searching via Bing now!
  • [2023-06-14] We release our code and the paper of WebGLM!

Preparation

Prepare Code and Environments

Clone this repo, and install the Python requirements.

pip install -r requirements.txt

Install Node.js.

apt install nodejs # If you use Ubuntu

Install the Playwright browsers.

playwright install

If the browser environments are not installed on your host, you will need to install them. Do not worry: Playwright will print instructions the first time you run it if they are missing.

Prepare SerpAPI Key

During the search process, we use SerpAPI to get search results. You need to get a SerpAPI key from here.

Then, set the environment variable SERPAPI_KEY to your key.

export SERPAPI_KEY="YOUR KEY"

Alternatively, you can use Bing search with a local browser environment (Playwright). Add --searcher bing to the start command to use Bing search. (See Run as Command Line Interface and Run as Web Service.)

Prepare Retriever Checkpoint

Download the checkpoint from Tsinghua Cloud by running the command below.

You can specify the path to save the checkpoint with --save SAVE_PATH.

python download.py retriever-pretrained-checkpoint

Try WebGLM

Before you run the code, make sure your device has enough free disk space.

Export Environment Variables

Export the environment variable WEBGLM_RETRIEVER_CKPT to the path of the retriever checkpoint. If you have downloaded the retriever checkpoint in the default path, you can simply run the command line below.

export WEBGLM_RETRIEVER_CKPT=./download/retriever-pretrained-checkpoint

Run as Command Line Interface

You can try the WebGLM-2B model by:

python cli_demo.py -w THUDM/WebGLM-2B

Or run the WebGLM-10B model directly:

python cli_demo.py

If you want to use Bing search instead of SerpAPI, you can add --searcher bing to the command line, for example:

python cli_demo.py -w THUDM/WebGLM-2B --searcher bing

Run as Web Service

Run web_demo.py with the same arguments as cli_demo.py to start a web service. For example, you can try the WebGLM-2B model with Bing search by:

python web_demo.py -w THUDM/WebGLM-2B --searcher bing
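For programmatic use, the entry points imported by cli_demo.py and web_demo.py can be called directly. A hedged sketch follows; the exact fields load_model expects are not documented in this README, so the namespace field names below are assumptions mirroring the demo scripts' command-line flags:

import argparse
from model import load_model  # entry point used by cli_demo.py and web_demo.py

# Hypothetical argument namespace; the field names are assumptions based on the
# -w / --searcher flags shown above, not a documented signature.
args = argparse.Namespace(webglm_ckpt_path="THUDM/WebGLM-2B", searcher="bing", device="cuda")
webglm = load_model(args)

# The issue tracebacks further down show stream_query(question) yielding
# intermediate retrieval status and the growing answer.
for results in webglm.stream_query("Why is the sky blue?"):
    print(results)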

Train WebGLM

Train Generator

Prepare Data (WebGLM-QA)

Download the training data (WebGLM-QA) from Tsinghua Cloud by running the command below.

python download.py generator-training-data

It will automatically download all the data and preprocess it into the seq2seq form, ready for immediate use, in ./download.

Training

Please refer to the GLM repo for seq2seq training.

Train Retriever

Prepare Data

Download the training data from Tsinghua Cloud by running the command below.

python download.py retriever-training-data

Training

Run the following command to train the retriever. If you have downloaded the retriever training data to the default path, you can simply run:

python train_retriever.py --train_data_dir ./download/retriever-training-data

Evaluation

You can reproduce our results on TriviaQA, WebQuestions, and NQ Open. Taking TriviaQA as an example, simply run the command below to start the experiment:

bash scripts/triviaqa.sh

Real Application Cases

Here are some examples of WebGLM in real application scenarios.

When will the COVID-19 disappear?

How to balance career and hobbies?

FL Studio and Cubase, which is better?

Is attention better than CNN?

How to survive in the first-tier cities without a high-salary work?

What do you think of version 3.5 of Genshin Impact?

transformers are originated in NLP, but why they can be applied in CV?

Who proposed Music Transformer? How does it work?

What is the backbone of Toolformer?

License

This repository is licensed under the Apache-2.0 License. The use of model weights is subject to the Model_License. All open-sourced data is for research purposes only.

Citation

If you use this code for your research, please cite our paper.

@misc{liu2023webglm,
      title={WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences},
      author={Xiao Liu and Hanyu Lai and Hao Yu and Yifan Xu and Aohan Zeng and Zhengxiao Du and Peng Zhang and Yuxiao Dong and Jie Tang},
      year={2023},
      eprint={2306.07906},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

This repo is simplified for easier deployment.

webglm's People

Contributors

hanyullai • longin-yu • xiao9905


webglm's Issues

Unable to Download Retriever-Pretrained-Checkpoint or Training Data

Hi, I am attempting to replicate the remarkable achievement, but it appears that everything on Tsinghua Cloud has expired. I am unable to download the pre-trained checkpoint by simply clicking the URL in README.md or using the download.py script, nor can I access the training data. Can anyone help me? I would greatly appreciate it.

After installing by following the steps, an error is raised in the search classes?

The error is as follows; please help diagnose:
File "cli_demo.py", line 1, in
from model import load_model, citation_correction
File "/data/home/justinsu/WebGLM/model/init.py", line 1, in
from .modeling_webglm import WebGLM, load_model
File "/data/home/justinsu/WebGLM/model/modeling_webglm.py", line 1, in
from .retriever import ReferenceRetiever
File "/data/home/justinsu/WebGLM/model/retriever/init.py", line 1, in
from .searching import create_searcher
File "/data/home/justinsu/WebGLM/model/retriever/searching/init.py", line 1, in
from .serpapi import Searcher as SerpAPISearcher
File "/data/home/justinsu/WebGLM/model/retriever/searching/serpapi.py", line 3, in
from .searcher import *
File "/data/home/justinsu/WebGLM/model/retriever/searching/searcher.py", line 19, in
class SearcherInterface:
File "/data/home/justinsu/WebGLM/model/retriever/searching/searcher.py", line 20, in SearcherInterface
def search(self, query) -> list[SearchResult]:
TypeError: 'type' object is not subscriptable
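A hedged note on the likely cause: the annotation list[SearchResult] uses built-in generics (PEP 585), which require Python 3.9 or newer; on older interpreters the class body raises exactly this TypeError at definition time. A minimal sketch of a backward-compatible version (the surrounding code is abridged):

from typing import List

class SearcherInterface:
    # On Python < 3.9, `list[SearchResult]` fails at class-definition time;
    # typing.List (or `from __future__ import annotations`) avoids it.
    def search(self, query) -> List["SearchResult"]:
        raise NotImplementedError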

Running on CentOS fails

The error log is as follows:

<launching> /home/opsuser/.cache/ms-playwright/firefox-1350/firefox/firefox -no-remote -headless -profile /tmp/playwright_firefoxdev_profile-Rex3ck -juggler-pipe -silent
<launched> pid=28440
[pid=28440][err] XPCOMGlueLoad error for file /home/opsuser/.cache/ms-playwright/firefox-1350/firefox/libxul.so:
[pid=28440][err] /lib64/libc.so.6: version `GLIBC_2.18' not found (required by /home/opsuser/.cache/ms-playwright/firefox-1350/firefox/libxul.so)
[pid=28440][err] Couldn't load XPCOM.
[pid=28440] <process did exit: exitCode=255, signal=null> [pid=28440] starting temporary directories cleanup

Playwright does not support CentOS: microsoft/playwright#9199

RuntimeError: CUDA driver error: device-side assert triggered

os: ubuntu 18.04
python: 3.9
cuda: 11.8
Launch command: python web_demo.py -w THUDM/WebGLM-2B --searcher bing
question: Where in ** is Dalian located?
error:
WebGLM Initializing...
WebGLM Loaded
Running on local URL: http://0.0.0.0:8032
[System] Searching ...
[System] Count of available urls: 15
[System] Fetching ...
[System] Count of available fetch results: 2147719
[System] Extracting ...
[System] Count of paragraphs: 136
[System] Filtering ...
Input length of input_ids is 1068, but max_length is set to 1024. This can lead to unexpected behavior. You should consider increasing max_new_tokens.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [39,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [39,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
Traceback (most recent call last):
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/gradio/routes.py", line 427, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/gradio/blocks.py", line 1323, in process_api
    result = await self.call_function(
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/gradio/blocks.py", line 1067, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/gradio/utils.py", line 336, in async_iteration
    return await iterator.__anext__()
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/gradio/utils.py", line 329, in __anext__
    return await anyio.to_thread.run_sync(
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/gradio/utils.py", line 312, in run_sync_iterator_async
    return next(iterator)
  File "/data/ssd_workspace/lh/WebGLM/web_demo.py", line 52, in query
    for resp in webglm.stream_query(query):
  File "/data/ssd_workspace/lh/WebGLM/model/modeling_webglm.py", line 49, in stream_query
    outputs = self.model.generate(**inputs, max_length=1024, eos_token_id = self.tokenizer.eop_token_id, pad_token_id=self.tokenizer.eop_token_id)
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/transformers/generation/utils.py", line 1515, in generate
    return self.greedy_search(
  File "/home/aiadmin/anaconda3/envs/webglm_3.9/lib/python3.9/site-packages/transformers/generation/utils.py", line 2385, in greedy_search
    next_tokens.tile(eos_token_id_tensor.shape[0], 1).ne(eos_token_id_tensor.unsqueeze(1)).prod(dim=0)
RuntimeError: CUDA driver error: device-side assert triggered
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [87,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [87,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [87,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...

Running on pure CPU fails: "LayerNormKernelImpl" not implemented for 'Half'

WebGLM Initializing...
WebGLM Loaded
[Enter to Exit] >>> hello
[System] Searching ...
[System] Count of available urls:  10
[System] Fetching ...
[System] Count of available fetch results:  3252593
[System] Extracting ...
[System] Count of paragraphs:  212
[System] Filtering ...
Reference [1](https://dictionary.cambridge.org/dictionary/english/hello): Hello is also used to attract someone’s attention:
Reference [2](https://dictionary.cambridge.org/dictionary/english/hello): Hello is also said at the beginning of a telephone conversation.
Reference [3](https://www.merriam-webster.com/dictionary/hello): They welcomed us with a warm hello.  we said our hellos and got right down to business
Reference [4](https://www.bing.com/dict/search?q=Hello%EF%BC%81&mkt=zh-cn): Hello- nice to meet you. Take a lott- I'll be down in a minute.
Reference [5](https://dictionary.cambridge.org/dictionary/english/hello): (Definition of hello from the Cambridge Academic Content Dictionary © Cambridge University Press)
Traceback (most recent call last):
  File "/home/me/WebGLM/cli_demo.py", line 21, in <module>
    for results in webglm.stream_query(question):
  File "/home/me/WebGLM/model/modeling_webglm.py", line 49, in stream_query
    outputs = self.model.generate(**inputs, max_length=1024, eos_token_id = self.tokenizer.eop_token_id, pad_token_id=self.tokenizer.eop_token_id)
  File "/home/me/p/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/me/p/lib/python3.10/site-packages/transformers/generation/utils.py", line 1522, in generate
    return self.greedy_search(
  File "/home/me/p/lib/python3.10/site-packages/transformers/generation/utils.py", line 2339, in greedy_search
    outputs = self(
  File "/home/me/p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1522, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/me/p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1531, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/hug/modules/transformers_modules/THUDM/WebGLM-2B/cffa6bde032c129824aca963836ba7a03c422990/modeling_glm.py", line 902, in forward
    model_output = self.glm(input_ids, position_ids, attention_mask, mems=mems, **kwargs)
  File "/home/me/p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1522, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/me/p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1531, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/hug/modules/transformers_modules/THUDM/WebGLM-2B/cffa6bde032c129824aca963836ba7a03c422990/modeling_glm.py", line 783, in forward
    transformer_output = self.transformer(embeddings, position_ids, attention_mask, mems)
  File "/home/me/p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1522, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/me/p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1531, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/hug/modules/transformers_modules/THUDM/WebGLM-2B/cffa6bde032c129824aca963836ba7a03c422990/modeling_glm.py", line 595, in forward
    hidden_states = layer(*args, mem=mem_i)
  File "/home/me/p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1522, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/me/p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1531, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/hug/modules/transformers_modules/THUDM/WebGLM-2B/cffa6bde032c129824aca963836ba7a03c422990/modeling_glm.py", line 417, in forward
    layernorm_output = self.input_layernorm(hidden_states)
  File "/home/me/p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1522, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/me/p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1531, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/me/p/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 190, in forward
    return F.layer_norm(
  File "/home/me/p/lib/python3.10/site-packages/torch/nn/functional.py", line 2548, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Run fails

The search step seems to work, but summarization reports an error:

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmulAlgoGetHeuristic( ltHandle, computeDesc.descriptor(), Adesc.descriptor(), Bdesc.descriptor(), Cdesc.descriptor(), Cdesc.descriptor(), preference.descriptor(), 1, &heuristicResult, &returnedResult)

It would be best to provide a ready-to-use Docker image; setting up the environment is too much hassle.

TypeError: 'type' object is not subscriptable

When I run python web_demo.py I get the following error:

File "/workspace/wangs/gpt_task/WebGLM/web_demo.py", line 2, in <module>
    from model import citation_correction, load_model
File "/workspace/wangs/gpt_task/WebGLM/model/__init__.py", line 1, in <module>
    from .modeling_webglm import WebGLM, load_model
File "/workspace/wangs/gpt_task/WebGLM/model/modeling_webglm.py", line 1, in <module>
    from .retriever import ReferenceRetiever
File "/workspace/wangs/gpt_task/WebGLM/model/retriever/__init__.py", line 2, in <module>
    from .searching import Searcher
File "/workspace/wangs/gpt_task/WebGLM/model/retriever/searching/__init__.py", line 45, in <module>
    def dump_results(results: list[SearchResult]):
TypeError: 'type' object is not subscriptable

About training the human preference model

Hi, the paper says the final comparative training uses a scoring/ranking model built from a single linear layer? Does this step not use reinforcement learning?

About the loss in train_retriever.py

In the loss function around line 44 of train_retriever.py, why is the cross_entropy training target torch.arange(0, len(l_pos))?
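A hedged explanation of the pattern: this is standard in-batch-negative contrastive training. Row i of the query-passage similarity matrix scores query i against every passage in the batch, and the matching passage sits on the diagonal at column i, so the cross-entropy target for row i is simply i, i.e. torch.arange(batch_size). A minimal sketch (not the repo's exact code):

import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q_emb, p_emb):
    # q_emb, p_emb: (B, d); p_emb[i] is the positive passage for query i.
    logits = q_emb @ p_emb.T                                   # (B, B) similarity matrix
    targets = torch.arange(len(q_emb), device=q_emb.device)    # positives lie on the diagonal
    return F.cross_entropy(logits, targets)                    # off-diagonal entries act as negatives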

Runtime error about max_new_tokens: how can it be fixed?

The error is as follows:
Input length of input_ids is 1303, but max_length is set to 1024. This can lead to unexpected behavior. You should consider increasing max_new_tokens.
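A hedged workaround sketch: generate is called with max_length=1024 (see the tracebacks in this issue list), so prompts longer than that overflow the model's window and can also trigger the CUDA device-side asserts reported elsewhere. One option is to truncate the prompt, or retrieve fewer/shorter references, before generation; how this interacts with build_inputs_for_generation is not documented here, so the budget below is an assumption:

from transformers import AutoTokenizer

MAX_PROMPT_TOKENS = 768  # assumed budget, leaving room for generated tokens within 1024

tokenizer = AutoTokenizer.from_pretrained("THUDM/WebGLM-2B", trust_remote_code=True)
prompt = "References: ... Question: ..."  # the concatenated references and question
inputs = tokenizer(prompt, truncation=True, max_length=MAX_PROMPT_TOKENS, return_tensors="pt")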

A question about citations

In the demo video, the answers come with citations. How are these citations implemented? Any pointers would be appreciated.
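A hedged outline, inferred from the public entry points rather than documented internals: the retriever numbers its references, the generator is trained on WebGLM-QA to emit inline markers such as [1][2] that index into that list (visible in the sample outputs further down), and model.citation_correction post-processes the generated text. A toy illustration with a hypothetical helper, not the repo's function:

import re
from typing import List

def render_citations(answer: str, references: List[str]) -> str:
    # Toy illustration: link each [k] marker to reference k, dropping out-of-range markers.
    def repl(match):
        k = int(match.group(1))
        return f"[{k}]({references[k - 1]})" if 1 <= k <= len(references) else ""
    return re.sub(r"\[(\d+)\]", repl, answer)

print(render_citations("Tannins break down over time[1].", ["https://example.com/wine"]))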

get_child_watcher raise NotImplementedError

python web_demo.py -d "mps"

python3.10/asyncio/events.py", line 788, in get_child_watcher
return get_event_loop_policy().get_child_watcher()
File "/Users/xjq284/miniconda3/lib/python3.10/asyncio/events.py", line 616, in get_child_watcher
raise NotImplementedError

web_demo fails to run

cli_demo runs normally, but web_demo crashes (core dump) at AutoModel.from_pretrained(query_encoder_path) in ./model/retriever/filtering/contriver.py.

Removing import gradio as gr makes the problem disappear. So is this an incompatibility between gradio and transformers? I have confirmed that the gradio and transformers versions both match requirements.txt.

Question about the labels in the LLM-augmented Retriever

Hi, the paper says the training labels used when augmenting Contriever with the LLM are Rouge-1 precision scores, but in the released code the positive/negative sample labels are 0 and 1. Is the released code incomplete?

 “Therefore, the labels we use for training are the Rouge-1 precision scores of a query-reference pair.”

About the dataset

Where can the "quoted long-formed QA dataset" mentioned in the WebGLM paper be downloaded?

Does WebGLM-10B support running on two 24G GPUs?

WebGLM-2B tests fine, but trying WebGLM-10B reports insufficient GPU memory. How much memory does a single GPU need? I currently have two GPUs with 24G each and don't know how to get the 10B model running.
Also, are INT8/INT4 quantized versions of WebGLM-10B available for download?
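A hedged sketch for sharding across two GPUs with Hugging Face Accelerate (pip install accelerate). The repo id below is an assumption based on the HF links at the top of this page, and this README does not mention official INT8/INT4 WebGLM-10B downloads:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed repo id for WebGLM-10B; check the HF Repo link at the top of this page.
tokenizer = AutoTokenizer.from_pretrained("THUDM/WebGLM", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(
    "THUDM/WebGLM",
    trust_remote_code=True,
    device_map="auto",   # shard the ~10B parameters across both 24G cards
    torch_dtype="auto",  # keep the checkpoint's native precision
)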

Could you explain the exact method used to generate the retriever training data?

Including the GPT-3 prompt and how the labels are computed. I computed Rouge-1 between the queries and references in the training set and the numbers don't match. How were the 0.8 and 0.2 in the example below computed? I can't reproduce them.

example:
{'question': 'Why does wine taste better the older it gets?',
'positive_reference': '"No, wine does not always taste better with age. This is because tannins, which give wine its astringent taste, break down over time. However, some wines may taste better after being exposed to oxygen.',
'positive_label': 0.8,
'negative_reference': 'Does wine taste better with age? The answer is: maybe! We’ll take a look at the why, how, and what of aging wine so you can be a more discerning drinker.',
'negative_label': 0.2}
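A hedged reproduction sketch with the rouge_score package; whether the released labels match depends on tokenization, stemming, and which string is treated as the prediction, so small mismatches are plausible:

from rouge_score import rouge_scorer

# Per the paper quote elsewhere in these issues: "the labels we use for training
# are the Rouge-1 precision scores of a query-reference pair."
scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=False)

query = "Why does wine taste better the older it gets?"
reference = ("No, wine does not always taste better with age. This is because tannins, "
             "which give wine its astringent taste, break down over time.")

# rouge_score's score(target, prediction) reports precision w.r.t. the prediction.
print(scorer.score(query, reference)["rouge1"].precision)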

chardet cannot be found in the Python package index

pip install -r requirements.txt
DEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at Homebrew/homebrew-core#76621
WARNING: Ignoring invalid distribution -ip (/usr/local/lib/python3.6/site-packages)
WARNING: Ignoring invalid distribution -ip (/usr/local/lib/python3.6/site-packages)
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting beautifulsoup4==4.11.2
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c6/ee/16d6f808f5668317d7c23f942091fbc694bcded6aa39678e5167f61b2ba0/beautifulsoup4-4.11.2-py3-none-any.whl (129 kB)
|████████████████████████████████| 129 kB 4.4 MB/s
ERROR: Could not find a version that satisfies the requirement chardet==5.1.0 (from versions: 1.0, 1.0.1, 1.1, 2.1.1, 2.2.1, 2.3.0, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.0.4, 4.0.0, 5.0.0)
ERROR: No matching distribution found for chardet==5.1.0
WARNING: Ignoring invalid distribution -ip (/usr/local/lib/python3.6/site-packages)
WARNING: Ignoring invalid distribution -ip (/usr/local/lib/python3.6/site-packages)

Input length of input_ids exceeds 1024

Local environment set up, using the WebGLM-2B model. Question asked: Is HER2 gene a good target for treating cancer?

The following error occurs:

Input length of input_ids is 1056, but max_length is set to 1024. This can lead to unexpected behavior. You should consider increasing max_new_tokens.
Traceback (most recent call last):
  File "cli_demo.py", line 21, in <module>
    for results in webglm.stream_query(question):
  File "/media/WebGLM/model/modeling_webglm.py", line 49, in stream_query
    outputs = self.model.generate(**inputs, max_length=1024, eos_token_id = self.tokenizer.eop_token_id, pad_token_id=self.tokenizer.eop_token_id)
  File "/usr/local/miniconda3/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/miniconda3/lib/python3.8/site-packages/transformers/generation/utils.py", line 1515, in generate
    return self.greedy_search(
  File "/usr/local/miniconda3/lib/python3.8/site-packages/transformers/generation/utils.py", line 2332, in greedy_search
    outputs = self(
  File "/usr/local/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm.py", line 902, in forward
    model_output = self.glm(input_ids, position_ids, attention_mask, mems=mems, **kwargs)
  File "/usr/local/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm.py", line 783, in forward
    transformer_output = self.transformer(embeddings, position_ids, attention_mask, mems)
  File "/usr/local/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm.py", line 595, in forward
    hidden_states = layer(*args, mem=mem_i)
  File "/usr/local/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_glm.py", line 422, in forward
    layernorm_input = hidden_states + attention_output
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

AttributeError: 'ChatGLMTokenizer' object has no attribute 'build_inputs_for_generation'

When I run python web_demo.py the following problem occurs:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/gradio/routes.py", line 437, in run_predict
output = await app.get_blocks().process_api(
File "/opt/conda/lib/python3.8/site-packages/gradio/blocks.py", line 1352, in process_api
result = await self.call_function(
File "/opt/conda/lib/python3.8/site-packages/gradio/blocks.py", line 1093, in call_function
prediction = await utils.async_iteration(iterator)
File "/opt/conda/lib/python3.8/site-packages/gradio/utils.py", line 341, in async_iteration
return await iterator.anext()
File "/opt/conda/lib/python3.8/site-packages/gradio/utils.py", line 334, in anext
return await anyio.to_thread.run_sync(
File "/opt/conda/lib/python3.8/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/opt/conda/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/opt/conda/lib/python3.8/site-packages/anyio/_backends/asyncio.py", line 807, in run
result = context.run(func, *args)
File "/opt/conda/lib/python3.8/site-packages/gradio/utils.py", line 317, in run_sync_iterator_async
return next(iterator)
File "web_demo.py", line 49, in query
for resp in webglm.stream_query(query):
File "/workspace/wangs/gpt_task/WebGLM/model/modeling_webglm.py", line 46, in stream_query
inputs = self.tokenizer.build_inputs_for_generation(inputs, max_gen_length=1024)
AttributeError: 'ChatGLMTokenizer' object has no attribute 'build_inputs_for_generation'

RuntimeError: CUDA error: no kernel image is available for execution on the device

Ran web_demo.py and got the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 437, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1352, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1093, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 341, in async_iteration
    return await iterator.__anext__()
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 334, in __anext__
    return await anyio.to_thread.run_sync(
  File "/home/kula/.local/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/kula/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/kula/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 317, in run_sync_iterator_async
    return next(iterator)
  File "/srv/chatglm/WebGLM/web_demo.py", line 49, in query
    for resp in webglm.stream_query(query):
  File "/srv/chatglm/WebGLM/model/modeling_webglm.py", line 35, in stream_query
    refs = self.ref_retriever.query(question)
  File "/srv/chatglm/WebGLM/model/retriever/__init__.py", line 50, in query
    return self.filter.produce_references(question, data_list, 5)
  File "/srv/chatglm/WebGLM/model/retriever/filtering/contriver.py", line 82, in produce_references
    topk = self.scorer.select_topk(query, texts, topk)
  File "/srv/chatglm/WebGLM/model/retriever/filtering/contriver.py", line 69, in select_topk
    scores.append(self.score_documents_on_query(query, documents[self.max_batch_size*i:self.max_batch_size*(i+1)]).to('cpu'))
  File "/srv/chatglm/WebGLM/model/retriever/filtering/contriver.py", line 58, in score_documents_on_query
    query_embedding = self.get_query_embeddings([query])[0]
  File "/srv/chatglm/WebGLM/model/retriever/filtering/contriver.py", line 29, in get_query_embeddings
    outputs = self.query_encoder(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/bert/modeling_bert.py", line 993, in forward
    extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 893, in get_extended_attention_mask
    extended_attention_mask = extended_attention_mask.to(dtype=dtype)  # fp16 compatibility
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Running with -d mps gets an error

WebGLM/model/retriever/filtering/contriver.py:36: UserWarning: MPS: no support for int64 reduction ops, casting it to int32 (Triggered internally at /Users/runner/miniforge3/conda-bld/pytorch-recipe_1680607563975/work/aten/src/ATen/native/mps/operations/ReduceOps.mm:144.)
dim=1) / mask.sum(dim=1)[..., None]
loc("varianceEps"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/ff32e6fb-db00-11ed-a068-428477786501/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x825x1xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
[1] 4709 abort python web_demo.py -w model/10B --searcher bing -d mps
/Users/colin/miniforge3/envs/webglm/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

SERPAPI_KEY is not set

[Error] SERPAPI_KEY is not set, please set it to use serpapi
But export SERPAPI_KEY="xxxxx" has already been run.
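A hedged sanity check: the variable must be exported in the same shell session that launches the demo (and before launching it). A quick way to confirm the process actually sees the key:

import os

# Prints whether the current process inherited the key from its shell.
key = os.environ.get("SERPAPI_KEY")
print("SERPAPI_KEY is", "set" if key else "NOT set")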

NotImplementedError when running web_demo.py

When I run web_demo.py it raises the error below, but cli_demo.py works well. Is there something wrong with my configuration?

Query: why musk set the limit of twitter request times?

Output with cli_demo.py:

[System] Searching ...
[System] Count of available urls: 8
[System] Fetching ...
[System] Count of available fetch results: 2315866
[System] Extracting ...
[System] Count of paragraphs: 136
[System] Filtering ...
Reference 1: Twitter owner Elon Musk has limited the number of tweets that users can view each day — restrictions he described as an attempt to prevent unauthorized scraping of potentially valuable data from the social media platform.
Reference 2: SAN FRANCISCO (AP) — Twitter owner Elon Musk has limited the number of tweets that users can view each day — restrictions he described as an attempt to prevent unauthorized scraping of potentially valuable data from the social media platform.
Reference 3: The Tesla and SpaceX CEO, who is executive chairman and CTO of Twitter, said the limits are temporary, but verified accounts will be able to read 8,000 posts per day, unverified accounts will be able to read 800 posts per day and new unverified accounts will be limited to reading 400 posts per day.
Reference 4: FILE - Elon Musk, who owns Twitter, Tesla and SpaceX, speaks at the Vivatech fair, June 16, 2023, in Paris. Thousands of people logged complaints about problems accessing Twitter on Saturday, July 1, after Musk limited most users to viewing 600 tweets a day — restrictions he described as an attempt to prevent unauthorized scraping of potentially valuable data from the site. (AP Photo/Michel Euler, File)
Reference 5: FILE - Elon Musk, who owns Twitter, Tesla and SpaceX, speaks at the Vivatech fair, June 16, 2023, in Paris. Thousands of people logged complaints about problems accessing Twitter on Saturday, July 1, after Musk limited most users to viewing 600 tweets a day — restrictions he described as an attempt to prevent unauthorized scraping of potentially valuable data from the site. (AP Photo/Michel Euler, File)

Elon Musk set the limit of Twitter request times in order to prevent unauthorized scraping of potentially valuable data from the social media platform[1][2].The limits are temporary, with verified accounts being able to read 8,000 posts per day, unverified accounts being able to read 800 posts per day, and new unverified accounts being limited to reading 400 posts per day.[3]

=============================================

Output with web_demo.py

[System] Searching ...
[System] Count of available urls: 8
[System] Fetching ...
Task exception was never retrieved
future: <Task finished name='Task-111' coro=<Connection.run() done, defined at /opt/miniconda39/lib/python3.9/site-packages/playwright/_impl/_connection.py:264> exception=NotImplementedError()>
Traceback (most recent call last):
File "/opt/miniconda39/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 271, in run
await self._transport.connect()
File "/opt/miniconda39/lib/python3.9/site-packages/playwright/_impl/_transport.py", line 127, in connect
raise exc
File "/opt/miniconda39/lib/python3.9/site-packages/playwright/_impl/_transport.py", line 116, in connect
self._proc = await asyncio.create_subprocess_exec(
File "/opt/miniconda39/lib/python3.9/asyncio/subprocess.py", line 236, in create_subprocess_exec
transport, protocol = await loop.subprocess_exec(
File "/opt/miniconda39/lib/python3.9/asyncio/base_events.py", line 1676, in subprocess_exec
transport = await self._make_subprocess_transport(
File "/opt/miniconda39/lib/python3.9/asyncio/unix_events.py", line 188, in _make_subprocess_transport
with events.get_child_watcher() as watcher:
File "/opt/miniconda39/lib/python3.9/asyncio/events.py", line 766, in get_child_watcher
return get_event_loop_policy().get_child_watcher()
File "/opt/miniconda39/lib/python3.9/asyncio/events.py", line 602, in get_child_watcher
raise NotImplementedError
NotImplementedError
Traceback (most recent call last):
File "/opt/miniconda39/lib/python3.9/site-packages/gradio/routes.py", line 422, in run_predict
output = await app.get_blocks().process_api(
File "/opt/miniconda39/lib/python3.9/site-packages/gradio/blocks.py", line 1323, in process_api
result = await self.call_function(
File "/opt/miniconda39/lib/python3.9/site-packages/gradio/blocks.py", line 1067, in call_function
prediction = await utils.async_iteration(iterator)
File "/opt/miniconda39/lib/python3.9/site-packages/gradio/utils.py", line 336, in async_iteration
return await iterator.anext()
File "/opt/miniconda39/lib/python3.9/site-packages/gradio/utils.py", line 329, in anext
return await anyio.to_thread.run_sync(
File "/opt/miniconda39/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/opt/miniconda39/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/opt/miniconda39/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/opt/miniconda39/lib/python3.9/site-packages/gradio/utils.py", line 312, in run_sync_iterator_async
return next(iterator)
File "/train/LLMs/webGLM/WebGLM/web_demo.py", line 49, in query
for resp in webglm.stream_query(query):
File "/train/LLMs/webGLM/WebGLM/model/modeling_webglm.py", line 43, in stream_query
refs = self.ref_retriever.query(question)
File "/train/LLMs/webGLM/WebGLM/model/retriever/init.py", line 27, in query
fetch_results = self.fetcher.fetch(urls)
File "/train/LLMs/webGLM/WebGLM/model/retriever/fetching/init.py", line 27, in fetch
self.loop.run_until_complete(get_raw_pages(urls, close_browser=True))
File "/opt/miniconda39/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/train/LLMs/webGLM/WebGLM/model/retriever/fetching/playwright_based_crawl_new.py", line 50, in get_raw_pages
context = await get_conetent()
File "/train/LLMs/webGLM/WebGLM/model/retriever/fetching/playwright_based_crawl_new.py", line 27, in get_conetent
playwright = await async_playwright().start()
File "/opt/miniconda39/lib/python3.9/site-packages/playwright/async_api/_context_manager.py", line 52, in start
return await self.__aenter__()
File "/opt/miniconda39/lib/python3.9/site-packages/playwright/async_api/_context_manager.py", line 47, in __aenter__
playwright = AsyncPlaywright(next(iter(done)).result())
File "/opt/miniconda39/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 271, in run
await self._transport.connect()
File "/opt/miniconda39/lib/python3.9/site-packages/playwright/_impl/_transport.py", line 127, in connect
raise exc
File "/opt/miniconda39/lib/python3.9/site-packages/playwright/_impl/_transport.py", line 116, in connect
self._proc = await asyncio.create_subprocess_exec(
File "/opt/miniconda39/lib/python3.9/asyncio/subprocess.py", line 236, in create_subprocess_exec
transport, protocol = await loop.subprocess_exec(
File "/opt/miniconda39/lib/python3.9/asyncio/base_events.py", line 1676, in subprocess_exec
transport = await self._make_subprocess_transport(
File "/opt/miniconda39/lib/python3.9/asyncio/unix_events.py", line 188, in _make_subprocess_transport
with events.get_child_watcher() as watcher:
File "/opt/miniconda39/lib/python3.9/asyncio/events.py", line 766, in get_child_watcher
return get_event_loop_policy().get_child_watcher()
File "/opt/miniconda39/lib/python3.9/asyncio/events.py", line 602, in get_child_watcher
raise NotImplementedError
NotImplementedError

"Invalid API key"怎么引起的?

(wenglm) D:\LLM\WebGLM>python web_demo.py
WebGLM Initializing...
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [01:41<00:00, 7.23s/it]
WebGLM Loaded
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
[System] Searching ...
Traceback (most recent call last):
File "D:\ProgramData\anaconda3\envs\wenglm\lib\site-packages\gradio\routes.py", line 427, in run_predict
output = await app.get_blocks().process_api(
File "D:\ProgramData\anaconda3\envs\wenglm\lib\site-packages\gradio\blocks.py", line 1323, in process_api
result = await self.call_function(
File "D:\ProgramData\anaconda3\envs\wenglm\lib\site-packages\gradio\blocks.py", line 1067, in call_function
prediction = await utils.async_iteration(iterator)
File "D:\ProgramData\anaconda3\envs\wenglm\lib\site-packages\gradio\utils.py", line 336, in async_iteration
return await iterator.__anext__()
File "D:\ProgramData\anaconda3\envs\wenglm\lib\site-packages\gradio\utils.py", line 329, in __anext__
return await anyio.to_thread.run_sync(
File "D:\ProgramData\anaconda3\envs\wenglm\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "D:\ProgramData\anaconda3\envs\wenglm\lib\site-packages\anyio_backends_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "D:\ProgramData\anaconda3\envs\wenglm\lib\site-packages\anyio_backends_asyncio.py", line 807, in run
result = context.run(func, *args)
File "D:\ProgramData\anaconda3\envs\wenglm\lib\site-packages\gradio\utils.py", line 312, in run_sync_iterator_async
return next(iterator)
File "D:\LLM\WebGLM\web_demo.py", line 49, in query
for resp in webglm.stream_query(query):
File "D:\LLM\WebGLM\model\modeling_webglm.py", line 35, in stream_query
refs = self.ref_retriever.query(question)
File "D:\LLM\WebGLM\model\retriever_init_.py", line 18, in query
search_results = self.searcher.search(question)
File "D:\LLM\WebGLM\model\retriever\searching_init_.py", line 62, in search
return serp_api(query)
File "D:\LLM\WebGLM\model\retriever\searching_init_.py", line 19, in serp_api
raise Exception("Serpapi returned %d\n%s"%(resp.status_code, resp.text))
Exception: Serpapi returned 401
{
"error": "Invalid API key. Your API key should be here: https://serpapi.com/manage-api-key"
}
————————————————————————————
Environment: Win11
Already ran: set SERPAPI_KEY=“xxxx5217658144a17782a2bxxxxxxxxx2acd5b8002370729e0”
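A hedged observation: Windows cmd's set stores quotes as part of the value, so the full-width quotes “ ” above would be sent to SerpAPI as part of the key, which would explain the 401. A quick check of what the process actually receives:

import os

# If this prints quotes (especially full-width “ ”) around the key, re-run set
# without any quotes, e.g.:  set SERPAPI_KEY=xxxx5217...
print(repr(os.environ.get("SERPAPI_KEY")))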
