thu-keg / omnievent
A comprehensive, unified and modular event extraction toolkit.
Home Page: https://omnievent.readthedocs.io/
License: MIT License
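On top of the requirements you listed, I installed CUDA, but the installation still does not complete. Your installation environment needs the lscpu command, which I cannot use. Is there a workaround?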
The OmniEvent demo includes a complete Event Extraction & Event Relation Extraction example, which appears to be based on a model trained on MAVEN-ERE.
Could we obtain this model and use it for inference?
Hello,
Thank you for this amazing toolkit. I have a question about running the Code 2 example on page 4 of the paper (https://arxiv.org/pdf/2309.14258.pdf):
all_results = infer(text=text, task="EE & ERE")
That part of the code throws the following exception:
Traceback (most recent call last):
File "/storage/home/grads/ehussein/OmniEvent/test.py", line 20, in
all_results = infer(text=text, task="ERE")
File "/storage/home/grads/ehussein/OmniEvent/OmniEvent/infer.py", line 107, in infer
assert task in ['ED', 'EAE', 'EE']
AssertionError
The toolkit does not seem to support the ERE part yet. Do I need to do anything extra to run event relation extraction inference, or will this part be released soon?
Thank you
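The traceback above ends in a bare `assert`, so the caller gets no hint about which task names are accepted. As a minimal sketch (not the toolkit's actual code), the same check could raise a descriptive error instead. The tuple of task names is copied from the assertion in the traceback; `check_task` is a hypothetical helper name.

```python
# Task names taken from the assertion shown in the traceback.
SUPPORTED_TASKS = ("ED", "EAE", "EE")


def check_task(task: str) -> str:
    """Hypothetical helper: validate `task` and explain any failure."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(
            f"Unsupported task {task!r}; this version of infer() accepts "
            f"only: {', '.join(SUPPORTED_TASKS)}."
        )
    return task


if __name__ == "__main__":
    print(check_task("EE"))
    try:
        check_task("ERE")
    except ValueError as err:
        print(err)
```

With a check like this, passing `task="ERE"` would fail with a message listing the accepted tasks rather than an empty `AssertionError`.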
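Hello, after cloning this repo with git clone and installing the library with pip install -e ., running the code fails with "No CUDA GPUs are available". However, I am running the code on a server, and nvidia-smi works normally from the command line.
The error occurs at line 88 of OmniEvent/examples/ED/token_classification.py: model.cuda()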
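A working nvidia-smi only shows that the driver sees the GPU; PyTorch can still report no CUDA device, for example with a CPU-only build (where `torch.version.cuda` is `None`) or when `CUDA_VISIBLE_DEVICES` hides the devices. The sketch below collects these facts in one place. `pick_device` and `describe_cuda_env` are hypothetical helper names, not part of OmniEvent, and the import of `torch` is made optional only so that the snippet runs anywhere.

```python
import os

try:
    import torch  # optional here so the sketch still runs without PyTorch
except ImportError:
    torch = None


def pick_device() -> str:
    """Hypothetical helper: return "cuda" only if PyTorch can see a GPU."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"


def describe_cuda_env() -> dict:
    """Collect the settings that most often explain a missing GPU."""
    return {
        "torch_installed": torch is not None,
        # None means PyTorch is missing or is a CPU-only build.
        "torch_cuda_build": getattr(getattr(torch, "version", None), "cuda", None),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "device": pick_device(),
    }


if __name__ == "__main__":
    for key, value in describe_cuda_env().items():
        print(f"{key}: {value}")
```

If the report shows a CPU-only build or an empty `CUDA_VISIBLE_DEVICES`, that would be the first thing to fix; calling `model.to(pick_device())` in place of `model.cuda()` is one way to avoid the hard failure while debugging.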
After a successful installation, running the following code raises an error:
`from OmniEvent.infer import infer
# Event Extraction (EE) Task
text = "2022年北京市举办了冬奥会"
results = infer(text=text, task="EE")
print(results[0]["events"])`
The error output is as follows:
Downloading: 0%| | 0.00/1.77G [00:00<?, ?B/s]1901858561
Downloading
Downloading: 100%|████████████████████████████████████████████████████████████████| 1.77G/1.77G [01:14<00:00, 25.4MB/s]
Archive: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed.zip
creating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_5.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/config.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_3.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/pytorch_model.bin
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/spiece.model
extracting: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/latest
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_7.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_0.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/special_tokens_map.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/trainer_state.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/tokenizer.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_4.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/args.yaml
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/zero_to_fp32.py
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_1.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/tokenizer_config.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_2.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/rng_state_6.pth
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/added_tokens.json
inflating: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-ed/training_args.bin
load from local file: C:\Users\lenovo/.cache/OmniEvent_Model\s2s-mt5-ed tokenizer
download from web, cache will be save to: C:\Users\lenovo/.cache/OmniEvent_Model/s2s-mt5-eae.zip
Downloading: 0%| | 0.00/3.88G [00:00<?, ?B/s]4167695152
Downloading
Downloading: 100%|████████████████████████████████████████████████████████████████| 3.88G/3.88G [03:04<00:00, 22.6MB/s]
Archive: C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-eae.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-eae.zip,
and cannot find C:/Users/lenovo/.cache/OmniEvent_Model/s2s-mt5-eae.zip.zip, period.
Traceback (most recent call last):
File "D:\python3.10安装\lib\site-packages\transformers\configuration_utils.py", line 623, in _get_config_dict
resolved_config_file = cached_path(
File "D:\python3.10安装\lib\site-packages\transformers\utils\hub.py", line 284, in cached_path
output_path = get_from_cache(
File "D:\python3.10安装\lib\site-packages\transformers\utils\hub.py", line 562, in get_from_cache
raise ValueError(
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "", line 1, in
File "D:\python3.10安装\lib\site-packages\OmniEvent\infer.py", line 135, in infer
eae_model, eae_tokenizer = get_pretrained("s2s-mt5-eae", device)
File "D:\python3.10安装\lib\site-packages\OmniEvent\infer.py", line 67, in get_pretrained
model = get_model(model_args, model_name_or_path)
File "D:\python3.10安装\lib\site-packages\OmniEvent\infer.py", line 57, in get_model
model = get_model_cls(model_args).from_pretrained(path)
File "D:\python3.10安装\lib\site-packages\transformers\modeling_utils.py", line 1840, in from_pretrained
config, model_kwargs = cls.config_class.from_pretrained(
File "D:\python3.10安装\lib\site-packages\transformers\configuration_utils.py", line 534, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "D:\python3.10安装\lib\site-packages\transformers\configuration_utils.py", line 561, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "D:\python3.10安装\lib\site-packages\transformers\configuration_utils.py", line 656, in _get_config_dict
raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like C:\Users\lenovo/.cache/OmniEvent_Model/s2s-mt5-eae is not the path to a directory containing a config.json file.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
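In the log above, the s2s-mt5-eae.zip archive could not be read ("End-of-central-directory signature not found"), so no config.json was extracted and the later Hugging Face error follows from that. The cause of the unreadable archive is not confirmed by the log. As a minimal sketch using only the standard library, a cached archive can be validated and deleted when unreadable, so that the next run downloads it again. `ensure_valid_archive` is a hypothetical helper, not part of OmniEvent.

```python
import os
import zipfile


def ensure_valid_archive(zip_path: str) -> bool:
    """Return True if zip_path is a readable zip; otherwise delete it."""
    if not os.path.exists(zip_path):
        return False
    if zipfile.is_zipfile(zip_path):
        with zipfile.ZipFile(zip_path) as archive:
            # testzip() returns the first corrupt member name, or None.
            if archive.testzip() is None:
                return True
    # Unreadable or corrupt: remove it so a fresh download can replace it.
    os.remove(zip_path)
    return False


if __name__ == "__main__":
    # Path taken from the log above; adjust to the local cache location.
    cached = os.path.expanduser("~/.cache/OmniEvent_Model/s2s-mt5-eae.zip")
    print(ensure_valid_archive(cached))
```

Checking every member with `testzip()` reads the whole archive, which takes a while for a multi-gigabyte file but catches partial downloads that still have a valid header.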
I cannot install the deepspeed library on Windows. How can this be resolved?
Hello, while running the README example I hit an error at this line:
logits, labels, metrics, test_dataset = predict(trainer=trainer, tokenizer=tokenizer, data_class=data_class,data_args=data_args,data_file=data_args.test_file,training_args=training_args)
NameError: name 'data_class' is not defined
This variable is never defined. How should I obtain it?
Is there example code on how to integrate Constrained Decoding for the Seq2Seq example model?
Hi,
I have been using this library for a project. I am using it for event detection, but I have not found the exact event ontology used to train the model.
Does this ontology comprise event types from both ACE and MAVEN? Or is there any custom event ontology for the Event Detection model? Where can I access the ontology file?
Thank you.
Hello,
Thank you for this great package!
I would like to know on which datasets, and how, the two models used when running OmniEvent.infer were fine-tuned, that is, the two models whose links are accessible in the utils module.
In particular, I noticed that OmniEvent.infer has a "schema" option. I took this to suggest that the models were fine-tuned on all of the available schemas. Yet, digging a bit further, I noticed that none of these schemas are passed as special_tokens to the tokenizer. So I'm wondering how the model knows that we are referring to a specific task, that is, the fine-tuning on a specific dataset, when each text is prepended with f"<txt_schema>". Concretely, given "<maven>The king married the queen", how does the model understand that I want it to focus on what it learned when being fine-tuned on the maven dataset?
I ran a test only with the EDProcessor class using the schema "maven" and indeed it treated it as any other token.
Thank you
Hello, should this code be run on Windows or on Linux?
For the evaluation code provided in https://github.com/THU-KEG/OmniEvent/blob/main/examples/EAE/seq2seq.py:
logits, labels, metrics, test_dataset = predict(trainer=trainer, tokenizer=tokenizer, data_class=data_class,
                                                data_args=data_args, data_file=data_args.test_file,
                                                training_args=training_args)
preds = get_pred_s2s(logits, tokenizer, pred_types=training_args.data_for_evaluation["pred_types"])
logging.info("\n")
logging.info("{}-EAE Evaluate Mode : {}-{}".format("-" * 25, data_args.eae_eval_mode, "-" * 25))
logging.info("{}-Use Golden Trigger: {}-{}".format("-" * 25, data_args.golden_trigger, "-" * 25))
if data_args.test_exists_labels:
    logging.info("{} test performance before converting: {}".format(data_args.dataset_name, metrics))
    get_ace2005_argument_extraction_s2s(preds, labels, data_args.test_file, data_args, None)
It seems that the labels passed to get_ace2005_argument_extraction_s2s are still token ids, but the function expects them to have been parsed and prepared in the same way preds is formatted. Is there missing code here?
Thanks!
Note: I am adapting this code for RAMS and using the t5-base config.
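If the labels are indeed still token ids at that point, one possible way to bring them closer to the form of preds is to decode them with the tokenizer first. The sketch below assumes the common Hugging Face convention that padded label positions hold -100, and that the tokenizer exposes `pad_token_id` and `batch_decode`. `decode_labels` is a hypothetical helper, and whether get_ace2005_argument_extraction_s2s expects exactly this output is not confirmed here.

```python
from typing import List, Sequence

# Value commonly used by Hugging Face seq2seq collators for ignored positions.
IGNORE_INDEX = -100


def decode_labels(label_ids: Sequence[Sequence[int]], tokenizer) -> List[str]:
    """Hypothetical helper: turn label token ids back into text."""
    # -100 is not a valid token id, so swap it for the pad id before decoding.
    cleaned = [
        [tok if tok != IGNORE_INDEX else tokenizer.pad_token_id for tok in row]
        for row in label_ids
    ]
    return tokenizer.batch_decode(cleaned, skip_special_tokens=True)
```

The decoded strings would then still need the same parsing step that get_pred_s2s applies to the generated sequences before the two sides can be compared.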
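I found a small bug. Line 136 of /OmniEvent/OmniEvent/input_engineering/base_processor.py reads: input_template: Optional[str][str] = None, but the program raised an error saying that the syntax Optional[str][str] is wrong. So I changed it to Optional[str]=None, and then it ran.
Is this change correct?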
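`Optional[str]` is shorthand for `Union[str, None]` and takes exactly one type argument, so subscripting it a second time is rejected when the annotation is evaluated. The short sketch below illustrates the corrected annotation on a hypothetical function (`build_input` is not taken from base_processor.py) and probes the doubled form inside a try block.

```python
from typing import Optional


def build_input(text: str, input_template: Optional[str] = None) -> str:
    """Hypothetical function illustrating the corrected annotation."""
    if input_template is None:
        return text
    return input_template.format(text=text)


# Probe the form reported at line 136; a second subscript is not allowed.
try:
    Optional[str][str]
    double_subscript_ok = True
except TypeError:
    double_subscript_ok = False


if __name__ == "__main__":
    print(build_input("The king married the queen"))
    print(double_subscript_ok)
```

Changing the annotation to `Optional[str] = None` keeps the intended meaning: the argument is either a string or `None`.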
from OmniEvent.arguments import DataArguments, ModelArguments, TrainingArguments, ArgumentParser
from OmniEvent.input_engineering.seq2seq_processor import EDSeq2SeqProcessor, type_start, type_end
from OmniEvent.backbone.backbone import get_backbone
from OmniEvent.model.model import get_model
from OmniEvent.evaluation.metric import compute_seq_F1
from OmniEvent.trainer_seq2seq import Seq2SeqTrainer
from OmniEvent.evaluation.utils import predict, get_pred_s2s
from OmniEvent.evaluation.convert_format import get_trigger_detection_s2s
from transformers import T5ForConditionalGeneration, T5TokenizerFast
from ipdb import set_trace

def main():
    # Step 2: Set up the customized configurations
    parser = ArgumentParser((ModelArguments, DataArguments, TrainingArguments))
    model_args, data_args, training_args = parser.parse_yaml_file(yaml_file="config/all-datasets/ed/s2s/duee.yaml")
    training_args.output_dir = 'output/duee/ED/seq2seq/t5-base/'
    data_args.markers = ["<event>", "</event>", type_start, type_end]
    print('==================================step 2: dataset YAML config loaded==================================')
    # Step 3: Initialize the model and tokenizer
    model_args.model_name_or_path = '/pretrained_model/t5'
    model = T5ForConditionalGeneration.from_pretrained(model_args.model_name_or_path)
    backbone = model
    tokenizer = T5TokenizerFast.from_pretrained(model_args.model_name_or_path, never_split=data_args.markers)
    config = model.config
    model = get_model(model_args, backbone)
    print("======================step 3: model initialization finished====================================")
    # Step 4: Initialize the dataset and evaluation metric
    data_args.train_file = '/data/processed/DuEE1.0/train.unified.jsonl'
    data_args.test_file = "/data/processed/DuEE1.0/test.unified.jsonl"
    data_args.validation_file = "/data/processed/DuEE1.0/valid.unified.jsonl"
    train_dataset = EDSeq2SeqProcessor(data_args, tokenizer, data_args.train_file)
    eval_dataset = EDSeq2SeqProcessor(data_args, tokenizer, data_args.validation_file)
    metric_fn = compute_seq_F1
    # Step 5: Define Trainer and train
    trainer = Seq2SeqTrainer(
        args=training_args,
        model=model,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=metric_fn,
        data_collator=train_dataset.collate_fn,
        tokenizer=tokenizer,
    )
    resume_from_checkpoint = 'OmniEvent-main/output/duee/ED/seq2seq/t5-base/checkpoint-7440'
    if resume_from_checkpoint:
        trainer.train(resume_from_checkpoint)
    else:
        trainer.train()
    print('*****************************************training finished********************************************')
    # Step 6: Unified Evaluation
    logits, labels, metrics, test_dataset = predict(trainer=trainer, tokenizer=tokenizer, data_class=EDSeq2SeqProcessor,
                                                    data_args=data_args, data_file=data_args.test_file,
                                                    training_args=training_args)
    set_trace()
    # paradigm-dependent metrics
    print("{} test performance before converting: {}".format(test_dataset.dataset_name, metrics["test_micro_f1"]))
    preds = get_pred_s2s(logits, tokenizer)
    # convert to the unified prediction and evaluate
    pred_labels = get_trigger_detection_s2s(preds, labels, data_args.test_file, data_args, None)
    print("{} test performance after converting: {}".format(test_dataset.dataset_name, pred_labels["test_micro_f1"]))

if __name__ == "__main__":
    main()
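Hello, I tried turning the example from your README into a .py script using the DuEE dataset, but ran into some problems. For example, the key metrics["test_micro_f1"] is actually metrics["micro_f1"], and its value is 0. Do you have a .py file for this example, and could you share it?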
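The key mismatch described above (`"test_micro_f1"` versus `"micro_f1"`) can be tolerated with a small lookup that tries both spellings. This is only a sketch: `get_metric` is a hypothetical helper, and it does not explain why the reported score is 0.

```python
from typing import Mapping, Optional


def get_metric(metrics: Mapping[str, float], name: str,
               prefix: str = "test_") -> Optional[float]:
    """Hypothetical helper: look up a metric with or without a prefix."""
    for key in (f"{prefix}{name}", name):
        if key in metrics:
            return metrics[key]
    return None


if __name__ == "__main__":
    print(get_metric({"micro_f1": 0.0}, "micro_f1"))
    print(get_metric({"test_micro_f1": 0.5}, "micro_f1"))
```

Printing the full `metrics` dictionary once is also a quick way to see which keys the evaluation actually returns.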
I wonder which version of transformers should be used.
I have problems like ModuleNotFoundError: No module named 'BartForConditionalGeneration'
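Which transformers version the toolkit expects is not stated in this thread. A reasonable first step when reporting such an error is to include the versions that are actually installed. The sketch below uses only the standard library; `installed_version` is a hypothetical helper, and the package list is an assumption based on the libraries mentioned in these issues.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names are assumptions based on the libraries discussed here.
    for name in ("transformers", "torch", "deepspeed"):
        print(f"{name}: {installed_version(name)}")
```

The output can be compared against the versions pinned in the repository's requirements file, if one is provided.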
Hello,
What's the difference between ace2005-zh-novalue.py and ace2005-zh.py in the data processing scripts?
Thanks.