yyding1 / gner

Rethinking Negative Instances for Generative Named Entity Recognition

License: Apache License 2.0

Python 80.53% Shell 11.56% Jupyter Notebook 7.91%

flan-t5 huggingface large-language-models llama named-entity-recognition negatives state-of-the-art text-generation transformer

gner's Issues

Program is stuck in load_dataset when testing

Hi,

I ran into a problem when testing the T5-base model. After running bash scripts/eval_t5_task_adaptation.sh, the program gets stuck at the point shown in the output below.

Do you have any idea what might cause this? Thanks very much!

$ bash scripts/eval_t5_task_adaptation.sh
++ shuf -i25000-30000 -n1
+ port=27486
+ BEAM_SIZE=1
+ MODEL_NAME_OR_PATH=./../../LLM_checkpoint/GNER-T5-base
+ DATA_DIR=data
+ DATA_CONFIG_DIR=configs/dataset_configs/task_adaptation_configs
+ INSTRUCTION_FILE=configs/instruction_configs/instruction.json
+ OUTPUT_DIR=output/flan-t5-base-task-adaptation-beam1
+ RUN_NAME=flan-t5-base-experiment
+ deepspeed --include=localhost:6,7 --master_port 27486 src/run.py --bf16 True --tf32 True --generation_num_beams 1 --do_predict --predict_with_generate --model_name_or_path ./../../LLM_checkpoint/GNER-T5-base --data_dir data --preprocessing_num_workers 4 --data_config_dir configs/dataset_configs/task_adaptation_configs --instruction_file configs/instruction_configs/instruction.json --output_dir output/flan-t5-base-task-adaptation-beam1 --per_device_eval_batch_size 4 --run_name flan-t5-base-experiment --max_source_length 640 --max_target_length 640 --generation_max_length 640 --overwrite_output_dir --overwrite_cache --seed 1234
[2024-03-18 10:27:51,415] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-18 10:28:06,363] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-03-18 10:28:06,409] [INFO] [runner.py:568:main] cmd = /data2/derongxu/anaconda3/envs/py/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbNiwgN119 --master_addr=127.0.0.1 --master_port=27486 --enable_each_rank_log=None src/run.py --bf16 True --tf32 True --generation_num_beams 1 --do_predict --predict_with_generate --model_name_or_path ./../../LLM_checkpoint/GNER-T5-base --data_dir data --preprocessing_num_workers 4 --data_config_dir configs/dataset_configs/task_adaptation_configs --instruction_file configs/instruction_configs/instruction.json --output_dir output/flan-t5-base-task-adaptation-beam1 --per_device_eval_batch_size 4 --run_name flan-t5-base-experiment --max_source_length 640 --max_target_length 640 --generation_max_length 640 --overwrite_output_dir --overwrite_cache --seed 1234
[2024-03-18 10:28:10,530] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-18 10:28:16,303] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [6, 7]}
[2024-03-18 10:28:16,303] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[2024-03-18 10:28:16,303] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2024-03-18 10:28:16,303] [INFO] [launch.py:163:main] dist_world_size=2
[2024-03-18 10:28:16,303] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=6,7
[2024-03-18 10:28:16,318] [INFO] [launch.py:253:main] process 417252 spawned with command: ['/data2/derongxu/anaconda3/envs/py/bin/python', '-u', 'src/run.py', '--local_rank=0', '--bf16', 'True', '--tf32', 'True', '--generation_num_beams', '1', '--do_predict', '--predict_with_generate', '--model_name_or_path', './../../LLM_checkpoint/GNER-T5-base', '--data_dir', 'data', '--preprocessing_num_workers', '4', '--data_config_dir', 'configs/dataset_configs/task_adaptation_configs', '--instruction_file', 'configs/instruction_configs/instruction.json', '--output_dir', 'output/flan-t5-base-task-adaptation-beam1', '--per_device_eval_batch_size', '4', '--run_name', 'flan-t5-base-experiment', '--max_source_length', '640', '--max_target_length', '640', '--generation_max_length', '640', '--overwrite_output_dir', '--overwrite_cache', '--seed', '1234']
[2024-03-18 10:28:16,327] [INFO] [launch.py:253:main] process 417253 spawned with command: ['/data2/derongxu/anaconda3/envs/py/bin/python', '-u', 'src/run.py', '--local_rank=1', '--bf16', 'True', '--tf32', 'True', '--generation_num_beams', '1', '--do_predict', '--predict_with_generate', '--model_name_or_path', './../../LLM_checkpoint/GNER-T5-base', '--data_dir', 'data', '--preprocessing_num_workers', '4', '--data_config_dir', 'configs/dataset_configs/task_adaptation_configs', '--instruction_file', 'configs/instruction_configs/instruction.json', '--output_dir', 'output/flan-t5-base-task-adaptation-beam1', '--per_device_eval_batch_size', '4', '--run_name', 'flan-t5-base-experiment', '--max_source_length', '640', '--max_target_length', '640', '--generation_max_length', '640', '--overwrite_output_dir', '--overwrite_cache', '--seed', '1234']
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
03/18/2024 10:28:35 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
03/18/2024 10:28:35 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=640,
generation_num_beams=1,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=output/flan-t5-base-task-adaptation-beam1/runs/Mar18_10-28-35_sota,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_hf,
optim_args=None,
output_dir=output/flan-t5-base-task-adaptation-beam1,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=4,
per_device_train_batch_size=8,
predict_with_generate=True,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=flan-t5-base-experiment,
save_on_each_node=False,
save_safetensors=False,
save_steps=500,
save_strategy=steps,
save_total_limit=None,
seed=1234,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=True,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
03/18/2024 10:28:35 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: True

About evaluation

Hi,
Congrats on the great work.
I have a question about the evaluation on the OOD benchmark. Do you include the "else" label in the evaluation?

I recently noticed that UniversalNER removes them, but GLiNER does not.

Thank you
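To make the "else" question concrete: in span-level scoring, one common reading of "removing" a label is dropping its entities from both the gold and the predicted sets before computing micro F1. The sketch below assumes a hypothetical (sentence_id, start_word, end_word, label) tuple per entity and exact-span matching; it is illustrative only and is not the evaluation code of GNER, UniversalNER, or GLiNER.

```python
from typing import AbstractSet, Iterable, Tuple

# Hypothetical entity layout (an assumption): (sentence_id, start_word, end_word, label).
Entity = Tuple[int, int, int, str]


def span_f1(
    gold: Iterable[Entity],
    pred: Iterable[Entity],
    ignore_labels: AbstractSet[str] = frozenset(),
) -> Tuple[float, float, float]:
    """Micro precision, recall and F1 over exactly matching entity spans.

    Entities whose label is in ignore_labels are dropped from BOTH the gold
    and the predicted sets before scoring.
    """
    gold_set = {e for e in gold if e[3] not in ignore_labels}
    pred_set = {e for e in pred if e[3] not in ignore_labels}
    tp = len(gold_set & pred_set)  # spans that match on position AND label
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


gold = [(0, 0, 1, "person"), (0, 3, 4, "else"), (1, 2, 2, "location")]
pred = [(0, 0, 1, "person"), (0, 3, 4, "else"), (1, 2, 2, "organization")]
print(span_f1(gold, pred))            # "else" kept: all three scores are about 0.667
print(span_f1(gold, pred, {"else"}))  # "else" removed: (0.5, 0.5, 0.5)
```

Filtering only the gold side (so "else" predictions count as false positives) is another possible convention and gives different numbers, so the choice has to match across the systems being compared.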

Questions about the paper

Hey guys, great work! Thank you for publishing the paper. I'm very impressed with your results, especially for the 250M and 780M models; they look super cool!

I've got several questions:

Am I right that, in your opinion, you got better results than GoLLIE because:

Your prompt is better than GoLLIE's, you found that it's important to generate the entire output (i.e., the full context, instead of what they do), and you made this whole approach viable thanks to LCS, etc.
You resolve the problem of Back Tokenization (which also adds some performance)
Did I miss anything?
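To illustrate the LCS point: if the model regenerates the whole sentence with a label after every word, its output may drop, add, or alter words, so the labels have to be aligned back onto the input words. A minimal sketch, assuming a hypothetical space-separated word(label) output format and word-level longest-common-subsequence alignment; the function names are illustrative and this is not the GNER repository's implementation.

```python
import re
from typing import List, Tuple

# Assumed output item format: "word(label)", e.g. "Paris(B-location)".
PAIR_RE = re.compile(r"^(.*)\((.*)\)$")


def parse_prediction(text: str) -> List[Tuple[str, str]]:
    """Split generated text into (word, label) pairs; malformed items get label 'O'."""
    pairs = []
    for item in text.split():
        m = PAIR_RE.match(item)
        pairs.append((m.group(1), m.group(2)) if m else (item, "O"))
    return pairs


def lcs_align(src: List[str], pred: List[str]) -> List[Tuple[int, int]]:
    """Return (i, j) pairs with src[i] == pred[j] along one longest common subsequence."""
    n, m = len(src), len(pred)
    # dp[i][j] = LCS length of the suffixes src[i:] and pred[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if src[i] == pred[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward through the table to recover one optimal alignment.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if src[i] == pred[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def align_labels(src_words: List[str], generated: str) -> List[str]:
    """Copy labels onto source words matched by the LCS; unmatched words stay 'O'."""
    pairs = parse_prediction(generated)
    labels = ["O"] * len(src_words)
    for i, j in lcs_align(src_words, [w for w, _ in pairs]):
        labels[i] = pairs[j][1]
    return labels


src = ["EU", "rejects", "German", "call"]
print(align_labels(src, "EU(B-organization) reject(O) German(B-misc) call(O)"))
# ['B-organization', 'O', 'B-misc', 'O']
```

The LCS keeps the relative word order fixed, so a hallucinated extra word in the output is skipped rather than shifting every later label by one position.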

When reading the paper, I got the impression that the Hierarchical Matching algorithm could be replaced with teacher forcing: you just make the model generate the correct word (when it's time to generate the next word of the sentence), or you force the model to make an entity prediction, i.e., generate "(", then an entity label, then ")". Why did you use the Hierarchical Matching algorithm? Am I missing something?
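For comparison with the teacher-forcing idea: a matching step runs after unconstrained generation, so it has to cope with whatever the model produced. A minimal sketch of a two-level matcher, assuming the output has already been parsed into (word, label) pairs: level 1 takes labels by position when the model copied the sentence exactly, and level 2 falls back to aligning the two word sequences. The two-level structure is an assumption about what "hierarchical" means here, and Python's difflib.SequenceMatcher (Ratcliff/Obershelp matching blocks, which need not coincide with an exact LCS) is swapped in for the alignment; this is not the paper's algorithm.

```python
import difflib
from typing import List, Tuple


def hierarchical_match(src_words: List[str], pairs: List[Tuple[str, str]]) -> List[str]:
    """Map predicted (word, label) pairs back onto src_words (illustrative sketch)."""
    pred_words = [w for w, _ in pairs]
    # Level 1: the model reproduced the sentence exactly, so labels line up by position.
    if pred_words == src_words:
        return [label for _, label in pairs]
    # Level 2: align the word sequences and copy labels only for matched words.
    labels = ["O"] * len(src_words)
    matcher = difflib.SequenceMatcher(a=src_words, b=pred_words, autojunk=False)
    for block in matcher.get_matching_blocks():  # the final block has size 0
        for k in range(block.size):
            labels[block.a + k] = pairs[block.b + k][1]
    return labels


src = ["EU", "rejects", "German", "call"]
pred = [("EU", "B-organization"), ("reject", "O"), ("German", "B-misc"), ("call", "O")]
print(hierarchical_match(src, pred))  # ['B-organization', 'O', 'B-misc', 'O']
```

In this sketch a source word that the model altered (here "rejects" vs "reject") simply falls back to "O"; constrained decoding would prevent such mismatches, but only by controlling generation itself.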

Thank you very much for your response

P.S. I am also working on NER pre-training using artificial data, but I am primarily interested in pre-training BERT-like encoders with strong token-level embeddings, so mostly the feature-extraction task. We recently released our models and achieved SOTA few-shot results for NER. You can find them and the paper here: https://huggingface.co/collections/numind/paper-65e1f6e14639e2a465af823b
