thu-keg / omnievent

A comprehensive, unified and modular event extraction toolkit.

Home Page: https://omnievent.readthedocs.io/

License: MIT License

Python 96.72% Shell 2.99% Makefile 0.04% Batchfile 0.05% CSS 0.14% JavaScript 0.03% HTML 0.04%
big-models bmtrain deep-learning event-detection event-extraction huggingface-transformers information-extration natural-language-generation natural-language-processing pytorch

omnievent's People

Contributors

bakser, dependabot[bot], devross, h-peng17, yaof20, zimuwangnlp

omnievent's Issues

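Hello, I found that it cannot be installed on Windows

Following the requirements you listed, I also installed CUDA, but the installation still does not complete. Your installation process requires the lscpu command, which is not available to me. Is there a workaround?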

There is a fatal bug, please fix it.

I installed the library with pip install OmniEvent and found a bug when running the code. The bug is at line 52 of OmniEvent/input_engineering/seq2seq_processor.py. Please fix it promptly.

ERE: Event Relation Extraction

Hello,

Thank you for this amazing toolkit. I have a question as I tried to run the Code 2 example on page 4 in the paper (https://arxiv.org/pdf/2309.14258.pdf)

# event extraction & relation extraction

all_results = infer(text=text, task="EE & ERE")

That part of the code throws the following exception:

Traceback (most recent call last):
  File "/storage/home/grads/ehussein/OmniEvent/test.py", line 20, in <module>
    all_results = infer(text=text, task="ERE")
  File "/storage/home/grads/ehussein/OmniEvent/OmniEvent/infer.py", line 107, in infer
    assert task in ['ED', 'EAE', 'EE']
AssertionError

It seems the toolkit does not support the ERE part yet. Do I need to do something extra to run event relation extraction, or will this part be released soon?

Thank you
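The assertion in the traceback suggests that the installed `infer` accepts only `ED`, `EAE`, and `EE`. A minimal sketch of a pre-check that splits a combined task string such as "EE & ERE" and reports which parts would fail that assertion. `SUPPORTED_TASKS` is copied from the traceback; `split_tasks` is a hypothetical helper and not part of OmniEvent.

```python
# Task names copied from the assertion in OmniEvent/infer.py shown in the traceback.
SUPPORTED_TASKS = ("ED", "EAE", "EE")


def split_tasks(task):
    """Hypothetical helper: split a string like "EE & ERE" into (supported, unsupported)."""
    parts = [p.strip() for p in task.split("&") if p.strip()]
    supported = [p for p in parts if p in SUPPORTED_TASKS]
    unsupported = [p for p in parts if p not in SUPPORTED_TASKS]
    return supported, unsupported


supported, unsupported = split_tasks("EE & ERE")
print("can run:", supported)
print("not available in this version:", unsupported)
```

Calling `infer` only for the supported parts avoids the bare `AssertionError`; whether ERE is available in a newer release is a question for the maintainers.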

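No CUDA GPUs are available

Hello, after running git clone on this repo and installing the library with pip install -e ., running the code fails with "No CUDA GPUs are available". However, I am running the code on a server, and nvidia-smi works normally on the command line.
The problem occurs at line 88 of OmniEvent/examples/ED/token_classification.py: model.cuda()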
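When nvidia-smi works but PyTorch reports no GPUs, two common things to check are whether `CUDA_VISIBLE_DEVICES` hides the devices from the process and whether the installed PyTorch build includes CUDA support. A minimal diagnostic sketch, assuming PyTorch may or may not be importable; `diagnose_cuda` is a hypothetical helper, and treating an empty string or "-1" as "all GPUs hidden" reflects common usage rather than anything stated in the issue.

```python
import os


def diagnose_cuda(env=None):
    """Hypothetical helper: collect facts that usually explain a 'No CUDA GPUs' error."""
    env = os.environ if env is None else env
    visible = env.get("CUDA_VISIBLE_DEVICES")
    report = {
        "CUDA_VISIBLE_DEVICES": visible,
        # An empty value or "-1" is commonly used to hide every GPU from the process.
        "gpus_hidden_by_env": visible in ("", "-1"),
    }
    try:
        import torch
    except ImportError:
        report["torch"] = None  # PyTorch is not installed in this interpreter
        return report
    report["torch"] = torch.__version__
    report["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in diagnose_cuda().items():
        print(f"{key}: {value}")
```

If `torch_cuda_build` is `None`, the installed wheel is CPU-only and `model.cuda()` cannot succeed regardless of the driver.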

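Error on first run after a successful installation

After a successful installation, running the following code raises an error:
`from OmniEvent.infer import infer

# Event Extraction (EE) Task
text = "2022年北京市举办了冬奥会"
results = infer(text=text, task="EE")
print(results[0]["events"])`
The error is as follows: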
Downloading: 0%| | 0.00/1.77G [00:00<?, ?B/s]1901858561
Downloading
Downloading: 100%|████████████████████████████████████████████████████████████████| 1.77G/1.77G [01:14<00:00, 25.4MB/s]
Archive: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed.zip
creating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_5.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/config.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_3.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/pytorch_model.bin
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/spiece.model
extracting: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/latest
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_7.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_0.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/special_tokens_map.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/trainer_state.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/tokenizer.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_4.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/args.yaml
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/zero_to_fp32.py
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_1.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/tokenizer_config.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_2.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_6.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/added_tokens.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/training_args.bin
load from local file: C:\Users\lenovo/.cache/OmniEvent_Model\s2s-mt5-ed tokenizer
download from web, cache will be save to: C:\Users\lenovo/.cache/OmniEvent_Model/s2s-mt5-eae.zip
Downloading: 0%| | 0.00/3.88G [00:00<?, ?B/s]4167695152
Downloading
Downloading: 100%|████████████████████████████████████████████████████████████████| 3.88G/3.88G [03:04<00:00, 22.6MB/s]
Archive: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-eae.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-eae.zip,
and cannot find C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-eae.zip.zip, period.
Traceback (most recent call last):
File "D:\python3.10安装\lib\site-packages\transformers\configuration_utils.py", line 623, in _get_config_dict
resolved_config_file = cached_path(
File "D:\python3.10安装\lib\site-packages\transformers\utils\hub.py", line 284, in cached_path
output_path = get_from_cache(
File "D:\python3.10安装\lib\site-packages\transformers\utils\hub.py", line 562, in get_from_cache
raise ValueError(
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 1, in
File "D:\python3.10安装\lib\site-packages\OmniEvent\infer.py", line 135, in infer
eae_model, eae_tokenizer = get_pretrained("s2s-mt5-eae", device)
File "D:\python3.10安装\lib\site-packages\OmniEvent\infer.py", line 67, in get_pretrained
model = get_model(model_args, model_name_or_path)
File "D:\python3.10安装\lib\site-packages\OmniEvent\infer.py", line 57, in get_model
model = get_model_cls(model_args).from_pretrained(path)
File "D:\python3.10安装\lib\site-packages\transformers\modeling_utils.py", line 1840, in from_pretrained
config, model_kwargs = cls.config_class.from_pretrained(
File "D:\python3.10安装\lib\site-packages\transformers\configuration_utils.py", line 534, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "D:\python3.10安装\lib\site-packages\transformers\configuration_utils.py", line 561, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "D:\python3.10安装\lib\site-packages\transformers\configuration_utils.py", line 656, in _get_config_dict
raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like C:\Users\lenovo/.cache/OmniEvent_Model/s2s-mt5-eae is not the path to a directory containing a config.json file.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
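In the log above, unzip fails on s2s-mt5-eae.zip with "End-of-central-directory signature not found", which usually indicates a truncated or otherwise invalid archive. The later Hugging Face error would then follow from the model directory never being created. A minimal sketch for checking a cached archive and deleting it if it is corrupt so that it can be downloaded again. The cache path is taken from the log; `archive_is_usable` and `drop_if_corrupt` are hypothetical helpers, not part of OmniEvent.

```python
import zipfile
from pathlib import Path


def archive_is_usable(path):
    """Return True only if the file exists, is a zip archive, and passes a CRC check."""
    path = Path(path)
    if not path.is_file() or not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        return zf.testzip() is None  # testzip() returns the first bad member, or None


def drop_if_corrupt(path):
    """Delete a corrupt archive so the next run can fetch it again; report whether it was removed."""
    path = Path(path)
    if path.exists() and not archive_is_usable(path):
        path.unlink()
        return True
    return False


if __name__ == "__main__":
    # Location reported in the log; adjust if your cache lives elsewhere.
    cached = Path.home() / ".cache" / "OmniEvent_Model" / "s2s-mt5-eae.zip"
    print(cached, "usable:", archive_is_usable(cached))
```

Whether OmniEvent re-downloads automatically once the file is gone is an assumption based on the "download from web" message in the log.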

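Cannot find train.sh

Hello, I was training a model by following your documentation. When I reached the step shown in the screenshot, I found that your code does not contain the file train.sh. How should I train the model in this case?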

data_class is undefined in the example code

Hello, while running the README example I hit an error at this line:
logits, labels, metrics, test_dataset = predict(trainer=trainer, tokenizer=tokenizer, data_class=data_class,data_args=data_args,data_file=data_args.test_file,training_args=training_args)
NameError: name 'data_class' is not defined
This variable is never defined. How can I obtain it?
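Another issue on this page calls the same function with `data_class=EDSeq2SeqProcessor`, which suggests that `data_class` is meant to be the processor class itself rather than an instance. A minimal sketch of that pattern with stand-in names: `ToyProcessor` and `predict_sketch` are hypothetical, and the assumption that `predict` builds the test set by calling `data_class(data_args, tokenizer, data_file)` is inferred from how the processors are constructed elsewhere on this page, not confirmed by the source.

```python
class ToyProcessor:
    """Stand-in for a processor class such as EDSeq2SeqProcessor (illustration only)."""

    def __init__(self, data_args, tokenizer, data_file):
        self.data_args = data_args
        self.tokenizer = tokenizer
        self.data_file = data_file


def predict_sketch(data_class, data_args, tokenizer, data_file):
    """Hypothetical outline: the evaluation helper instantiates the class it is given."""
    return data_class(data_args, tokenizer, data_file)


# Pass the class object, not an instance, just as data_class=EDSeq2SeqProcessor would.
test_dataset = predict_sketch(ToyProcessor, data_args=None, tokenizer=None,
                              data_file="test.unified.jsonl")
print(type(test_dataset).__name__, test_dataset.data_file)
```

Under that assumption, defining `data_class` as the same processor class used to build the training set would resolve the `NameError`.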

Constrained Decoding

Is there example code on how to integrate Constrained Decoding for the Seq2Seq example model?
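One generic way to constrain a seq2seq decoder is to restrict each step to tokens that continue some allowed output sequence. This can be driven by a prefix trie. The sketch below is not OmniEvent's implementation: `TokenTrie` is a hypothetical class and the token ids are made up. The callback signature matches the `prefix_allowed_tokens_fn(batch_id, input_ids)` argument of Hugging Face `generate`, and it assumes the first decoder id is the start token.

```python
class TokenTrie:
    """Prefix trie over allowed token-id sequences (illustration only)."""

    def __init__(self, sequences):
        self.root = {}
        for seq in sequences:
            node = self.root
            for tok in seq:
                node = node.setdefault(tok, {})

    def allowed_next(self, prefix):
        """Return the token ids that may follow `prefix`, or [] if the prefix is not allowed."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []
            node = node[tok]
        return sorted(node)


# Made-up ids standing in for tokenized target strings, each ending in an EOS id of 1.
trie = TokenTrie([[5, 7, 1], [5, 9, 1], [6, 1]])


def prefix_allowed_tokens_fn(batch_id, input_ids):
    # Accept either a tensor (as passed by generate) or a plain list.
    ids = input_ids.tolist() if hasattr(input_ids, "tolist") else list(input_ids)
    return trie.allowed_next(ids[1:])  # skip the decoder start token


print(prefix_allowed_tokens_fn(0, [0]))
print(prefix_allowed_tokens_fn(0, [0, 5]))
```

With a real model this function would be passed as `model.generate(..., prefix_allowed_tokens_fn=prefix_allowed_tokens_fn)`. A practical setup for event extraction would also need to allow marker tokens and spans copied from the input, which this sketch leaves out.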

Event Ontology

Hi,
I have been using this library for a project. I am using it for event detection, but I have not found the exact event ontology used to train the model.

Does this ontology comprise event types from both ACE and MAVEN? Or is there any custom event ontology for the Event Detection model? Where can I access the ontology file?

Thank you.

Information on models fine-tuning used in OmniEvent.infer

Hello,

Thank you for this great package!

I would like to know on which datasets, and how, the two models used by OmniEvent.infer were fine-tuned. That is, the two models whose links are accessible in the utils module.

In particular, I noticed that OmniEvent.infer has a "schema" option. I took this to suggest that the models were fine-tuned on all of the available schemas. Yet when digging a bit further, I noticed that none of these schemas has been passed as special_tokens to the tokenizer. I am therefore wondering how the model knows that we are referring to a specific task, that is, the fine-tuning on a specific dataset, when each text is prepended with f"<txt_schema>". For example, given "<maven>The king married the queen", how does the model understand that I want it to focus on what it learned when it was fine-tuned on the maven dataset?

I ran a test with only the EDProcessor class using the schema "maven", and it was indeed treated like any other token.

Thank you
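The difference between a registered special token and an ordinary prefix can be illustrated with a toy tokenizer. The sketch below is not the mT5 SentencePiece tokenizer used by the models: `ToyTokenizer` is a hypothetical stand-in that splits on words and punctuation unless a string has been registered as atomic. With Hugging Face tokenizers the real analogue is `tokenizer.add_special_tokens({"additional_special_tokens": [...]})` followed by `model.resize_token_embeddings(len(tokenizer))`.

```python
import re


class ToyTokenizer:
    """Word/punctuation tokenizer with optional atomic special tokens (illustration only)."""

    def __init__(self):
        self.special_tokens = []

    def add_special_tokens(self, tokens):
        for tok in tokens:
            if tok not in self.special_tokens:
                self.special_tokens.append(tok)

    def tokenize(self, text):
        if self.special_tokens:
            # Split around registered tokens, keeping them as single chunks.
            pattern = "(" + "|".join(re.escape(t) for t in self.special_tokens) + ")"
            chunks = re.split(pattern, text)
        else:
            chunks = [text]
        pieces = []
        for chunk in chunks:
            if chunk in self.special_tokens:
                pieces.append(chunk)
            else:
                pieces.extend(re.findall(r"\w+|[^\w\s]", chunk))
        return pieces


tok = ToyTokenizer()
before = tok.tokenize("<maven>The king married the queen")
tok.add_special_tokens(["<maven>"])
after = tok.tokenize("<maven>The king married the queen")
print(before)
print(after)
```

A prefix that is split into ordinary pieces can still act as a conditioning signal if the model saw the same prefix during fine-tuning. Whether that is how these checkpoints were trained is for the maintainers to confirm.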

Runtime environment

Hello, does this code need to run on Windows or on Linux?

Question / Potential Bug re: Seq2Seq Example

For the evaluation code provided in https://github.com/THU-KEG/OmniEvent/blob/main/examples/EAE/seq2seq.py

logits, labels, metrics, test_dataset = predict(trainer=trainer, tokenizer=tokenizer, data_class=data_class,
                                                data_args=data_args, data_file=data_args.test_file,
                                                training_args=training_args)
preds = get_pred_s2s(logits, tokenizer, pred_types=training_args.data_for_evaluation["pred_types"])

logging.info("\n")
logging.info("{}-EAE Evaluate Mode : {}-{}".format("-" * 25, data_args.eae_eval_mode, "-" * 25))
logging.info("{}-Use Golden Trigger: {}-{}".format("-" * 25, data_args.golden_trigger, "-" * 25))

if data_args.test_exists_labels:
    logging.info("{} test performance before converting: {}".format(data_args.dataset_name, metrics))
    get_ace2005_argument_extraction_s2s(preds, labels, data_args.test_file, data_args, None)

It seems that the labels being passed to get_ace2005_argument_extraction_s2s are still token ids, but the function is expecting it to have been parsed and prepared similar to how preds is formatted. Is there missing code here?

Thanks!

Note: I am adapting this code for RAMS and using the t5-base config.
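If the labels really are still token ids, one common preprocessing step in Hugging Face seq2seq evaluation is to replace the ignore index with the pad token id before decoding, because the tokenizer cannot decode negative ids. A minimal sketch with made-up ids; `restore_pad_ids` is a hypothetical helper, and whether these labels contain -100 or whether the maintainers intended this step is an assumption.

```python
# Value commonly used to mask padded label positions in Hugging Face seq2seq training.
IGNORE_INDEX = -100


def restore_pad_ids(label_ids, pad_token_id):
    """Replace masked positions so the ids can be passed to a tokenizer's decode method."""
    return [[pad_token_id if tok == IGNORE_INDEX else tok for tok in row] for row in label_ids]


labels = [[318, 42, 1, -100, -100], [77, 1, -100, -100, -100]]
clean = restore_pad_ids(labels, pad_token_id=0)
print(clean)
# With a real tokenizer the next step would be:
#   texts = tokenizer.batch_decode(clean, skip_special_tokens=True)
```

The decoded label strings would then need the same parsing that is applied to the predictions before both are handed to the conversion function.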

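Is there a bug at line 136 of base_processor.py?

I found a small bug. Line 136 of /OmniEvent/OmniEvent/input_engineering/base_processor.py reads: input_template: Optional[str][str] = None, but the program raised an error saying that the syntax Optional[str][str] is wrong. I changed it to Optional[str]=None and it then ran.
Is this the right fix?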
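The reported error is consistent with how the typing module behaves: `Optional[str]` has no remaining type parameters, so subscripting it a second time raises `TypeError` when the annotation is evaluated. A minimal sketch that reproduces the failure and shows the corrected annotation; `render` is a hypothetical function used only to carry the annotation and is not the code in base_processor.py.

```python
from typing import Optional


def double_subscript_fails() -> bool:
    """Return True if evaluating Optional[str][str] raises TypeError."""
    try:
        Optional[str][str]  # the annotation reported at line 136
    except TypeError:
        return True
    return False


def render(text: str, input_template: Optional[str] = None) -> str:
    """Hypothetical function using the corrected annotation Optional[str]."""
    if input_template is None:
        return text
    return input_template.format(text=text)


print(double_subscript_fails())
print(render("hello", "Text: {text}"))
```

`Optional[str] = None` expresses "a string or None, defaulting to None", which matches the apparent intent of the original line.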

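Error when running the EAE task

The error shown in the screenshot occurs at runtime.

The cause is that the call to do_event_argument_extraction() at line 134 of OmniEvent/infer.py is missing the 'device' argument.

The corresponding function can be found in seq2seq.py.
After adding the 'device' argument, it runs normally.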

Putting the README steps into a single .py file

from OmniEvent.arguments import DataArguments, ModelArguments, TrainingArguments, ArgumentParser
from OmniEvent.input_engineering.seq2seq_processor import EDSeq2SeqProcessor, type_start, type_end
from OmniEvent.backbone.backbone import get_backbone
from OmniEvent.model.model import get_model
from OmniEvent.evaluation.metric import compute_seq_F1
from OmniEvent.trainer_seq2seq import Seq2SeqTrainer
from OmniEvent.evaluation.utils import predict, get_pred_s2s
from OmniEvent.evaluation.convert_format import get_trigger_detection_s2s
from transformers import T5ForConditionalGeneration, T5TokenizerFast
from ipdb import set_trace

def main():
    # Step 2: Set up the customized configurations
    parser = ArgumentParser((ModelArguments, DataArguments, TrainingArguments))
    model_args, data_args, training_args = parser.parse_yaml_file(yaml_file="config/all-datasets/ed/s2s/duee.yaml")
    training_args.output_dir = 'output/duee/ED/seq2seq/t5-base/'
    data_args.markers = ["<event>", "</event>", type_start, type_end]
    print('================================== Step 2: dataset YAML config loaded ==================================')

    # Step 3: Initialize the model and tokenizer
    model_args.model_name_or_path = '/pretrained_model/t5'
    model = T5ForConditionalGeneration.from_pretrained(model_args.model_name_or_path)
    backbone = model
    tokenizer = T5TokenizerFast.from_pretrained(model_args.model_name_or_path, never_split=data_args.markers)
    config = model.config

    model = get_model(model_args, backbone)
    print("====================== Step 3: model initialization done ====================================")

    # Step 4: Initialize the dataset and evaluation metric
    data_args.train_file = '/data/processed/DuEE1.0/train.unified.jsonl'
    data_args.test_file = "/data/processed/DuEE1.0/test.unified.jsonl"
    data_args.validation_file = "/data/processed/DuEE1.0/valid.unified.jsonl"
    train_dataset = EDSeq2SeqProcessor(data_args, tokenizer, data_args.train_file)
    eval_dataset = EDSeq2SeqProcessor(data_args, tokenizer, data_args.validation_file)
    metric_fn = compute_seq_F1

    # Step 5: Define Trainer and train
    trainer = Seq2SeqTrainer(
        args=training_args,
        model=model,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=metric_fn,
        data_collator=train_dataset.collate_fn,
        tokenizer=tokenizer,
    )
    resume_from_checkpoint = 'OmniEvent-main/output/duee/ED/seq2seq/t5-base/checkpoint-7440'
    if resume_from_checkpoint:
        trainer.train(resume_from_checkpoint)
    else:
        trainer.train()
    print('***************************************** training finished ********************************************')

    # Step 6: Unified Evaluation
    logits, labels, metrics, test_dataset = predict(trainer=trainer, tokenizer=tokenizer, data_class=EDSeq2SeqProcessor,
                                                    data_args=data_args, data_file=data_args.test_file,
                                                    training_args=training_args)
    set_trace()
    # paradigm-dependent metrics
    print("{} test performance before converting: {}".format(test_dataset.dataset_name, metrics["test_micro_f1"]))

    preds = get_pred_s2s(logits, tokenizer)
    # convert to the unified prediction and evaluate
    pred_labels = get_trigger_detection_s2s(preds, labels, data_args.test_file, data_args, None)
    print("{} test performance after converting: {}".format(test_dataset.dataset_name, pred_labels["test_micro_f1"]))


if __name__ == "__main__":
    main()

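Hello, I tried to turn the example in your README into a .py script using the DuEE dataset, but I ran into some problems. For example, the key metrics["test_micro_f1"] is actually metrics["micro_f1"], and its value is 0. Do you have a .py file for this example, and could you share it?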

problems with installation

I wonder which version of transformers should be used.
I get errors such as ModuleNotFoundError: No module named 'BartForConditionalGeneration'
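The issue does not say which package versions are installed, and that is usually the first thing needed to answer a compatibility question. A minimal sketch that prints the installed versions using only the standard library; `installed_version` is a hypothetical helper, and the package names in the loop are the ones mentioned on this page.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("transformers", "torch", "OmniEvent"):
    print(name, installed_version(name))
```

Including this output in the report lets the maintainers compare it against the versions they tested with.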

ace2005-zh-novalue

Hello,

What's the difference between ace2005-zh-novalue.py and ace2005-zh.py in the data processing scripts?

Thanks.
