gner's People

Contributors

yyding1

gner's Issues

Program is stuck in load_dataset when testing

Hi,

I ran into a problem when testing the t5-base model. After running bash scripts/eval_t5_task_adaptation.sh, the program hangs at the point where the output below ends.

Do you have any idea what might cause this? Thanks very much!

$ bash scripts/eval_t5_task_adaptation.sh
++ shuf -i25000-30000 -n1
+ port=27486
+ BEAM_SIZE=1
+ MODEL_NAME_OR_PATH=./../../LLM_checkpoint/GNER-T5-base
+ DATA_DIR=data
+ DATA_CONFIG_DIR=configs/dataset_configs/task_adaptation_configs
+ INSTRUCTION_FILE=configs/instruction_configs/instruction.json
+ OUTPUT_DIR=output/flan-t5-base-task-adaptation-beam1
+ RUN_NAME=flan-t5-base-experiment
+ deepspeed --include=localhost:6,7 --master_port 27486 src/run.py --bf16 True --tf32 True --generation_num_beams 1 --do_predict --predict_with_generate --model_name_or_path ./../../LLM_checkpoint/GNER-T5-base --data_dir data --preprocessing_num_workers 4 --data_config_dir configs/dataset_configs/task_adaptation_configs --instruction_file configs/instruction_configs/instruction.json --output_dir output/flan-t5-base-task-adaptation-beam1 --per_device_eval_batch_size 4 --run_name flan-t5-base-experiment --max_source_length 640 --max_target_length 640 --generation_max_length 640 --overwrite_output_dir --overwrite_cache --seed 1234
[2024-03-18 10:27:51,415] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-18 10:28:06,363] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-03-18 10:28:06,409] [INFO] [runner.py:568:main] cmd = /data2/derongxu/anaconda3/envs/py/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbNiwgN119 --master_addr=127.0.0.1 --master_port=27486 --enable_each_rank_log=None src/run.py --bf16 True --tf32 True --generation_num_beams 1 --do_predict --predict_with_generate --model_name_or_path ./../../LLM_checkpoint/GNER-T5-base --data_dir data --preprocessing_num_workers 4 --data_config_dir configs/dataset_configs/task_adaptation_configs --instruction_file configs/instruction_configs/instruction.json --output_dir output/flan-t5-base-task-adaptation-beam1 --per_device_eval_batch_size 4 --run_name flan-t5-base-experiment --max_source_length 640 --max_target_length 640 --generation_max_length 640 --overwrite_output_dir --overwrite_cache --seed 1234
[2024-03-18 10:28:10,530] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-18 10:28:16,303] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [6, 7]}
[2024-03-18 10:28:16,303] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[2024-03-18 10:28:16,303] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2024-03-18 10:28:16,303] [INFO] [launch.py:163:main] dist_world_size=2
[2024-03-18 10:28:16,303] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=6,7
[2024-03-18 10:28:16,318] [INFO] [launch.py:253:main] process 417252 spawned with command: ['/data2/derongxu/anaconda3/envs/py/bin/python', '-u', 'src/run.py', '--local_rank=0', '--bf16', 'True', '--tf32', 'True', '--generation_num_beams', '1', '--do_predict', '--predict_with_generate', '--model_name_or_path', './../../LLM_checkpoint/GNER-T5-base', '--data_dir', 'data', '--preprocessing_num_workers', '4', '--data_config_dir', 'configs/dataset_configs/task_adaptation_configs', '--instruction_file', 'configs/instruction_configs/instruction.json', '--output_dir', 'output/flan-t5-base-task-adaptation-beam1', '--per_device_eval_batch_size', '4', '--run_name', 'flan-t5-base-experiment', '--max_source_length', '640', '--max_target_length', '640', '--generation_max_length', '640', '--overwrite_output_dir', '--overwrite_cache', '--seed', '1234']
[2024-03-18 10:28:16,327] [INFO] [launch.py:253:main] process 417253 spawned with command: ['/data2/derongxu/anaconda3/envs/py/bin/python', '-u', 'src/run.py', '--local_rank=1', '--bf16', 'True', '--tf32', 'True', '--generation_num_beams', '1', '--do_predict', '--predict_with_generate', '--model_name_or_path', './../../LLM_checkpoint/GNER-T5-base', '--data_dir', 'data', '--preprocessing_num_workers', '4', '--data_config_dir', 'configs/dataset_configs/task_adaptation_configs', '--instruction_file', 'configs/instruction_configs/instruction.json', '--output_dir', 'output/flan-t5-base-task-adaptation-beam1', '--per_device_eval_batch_size', '4', '--run_name', 'flan-t5-base-experiment', '--max_source_length', '640', '--max_target_length', '640', '--generation_max_length', '640', '--overwrite_output_dir', '--overwrite_cache', '--seed', '1234']
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
03/18/2024 10:28:35 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
03/18/2024 10:28:35 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=640,
generation_num_beams=1,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=output/flan-t5-base-task-adaptation-beam1/runs/Mar18_10-28-35_sota,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_hf,
optim_args=None,
output_dir=output/flan-t5-base-task-adaptation-beam1,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=4,
per_device_train_batch_size=8,
predict_with_generate=True,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=flan-t5-base-experiment,
save_on_each_node=False,
save_safetensors=False,
save_steps=500,
save_strategy=steps,
save_total_limit=None,
seed=1234,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=True,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
03/18/2024 10:28:35 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: True
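
One way to narrow down where a process like this is hanging is to have each rank periodically dump its Python stack traces. The sketch below uses only the standard-library faulthandler module; the helper names enable_hang_dump and disable_hang_dump are hypothetical, and calling them near the top of src/run.py is an assumption about where such a hook would go, not something taken from the repository.

```python
import faulthandler
import sys


def enable_hang_dump(timeout_s=300.0, file=sys.stderr):
    """Dump the stack of every thread each time timeout_s seconds elapse.

    Hypothetical helper: if a rank is blocked (for example while loading
    or preprocessing a dataset), the dump shows which call it is blocked in.
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=file)


def disable_hang_dump():
    """Cancel the periodic dump once the suspicious section has finished."""
    faulthandler.cancel_dump_traceback_later()
```

With two ranks, comparing the dumps from both processes can show whether one rank is waiting for the other or whether both are blocked in the same call.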

About evaluation

Hi,
Congrats on the great work.
I have a question about evaluation on the OOD benchmarks. Do you include the "else" label in the evaluation?

I recently noticed that UniversalNER removes it, but GLiNER does not.

Thank you
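
To make concrete why this choice matters, here is a minimal sketch of entity-level F1 with an optional set of ignored labels. It is not the evaluation code of GNER, UniversalNER, or GLiNER; the BIO tag format, the function names, and the toy data are assumptions for illustration only.

```python
def extract_spans(tags):
    """Return the set of (start, end, label) spans in a BIO tag sequence.

    An I- tag that does not continue an open span of the same label is
    treated like O (a simplifying assumption).
    """
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:
            if label is not None:
                spans.add((start, i, label))
            if tag.startswith("B-"):
                start, label = i, tag[2:]
            else:
                start, label = None, None
    return spans


def span_f1(gold_seqs, pred_seqs, ignore_labels=()):
    """Micro-averaged span F1, skipping any span whose label is ignored."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g = {s for s in extract_spans(gold) if s[2] not in ignore_labels}
        p = {s for s in extract_spans(pred) if s[2] not in ignore_labels}
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the prediction misses the single "else" entity.
gold = [["B-person", "I-person", "O", "B-else"]]
pred = [["B-person", "I-person", "O", "O"]]
f1_with_else = span_f1(gold, pred)                            # 2/3
f1_without_else = span_f1(gold, pred, ignore_labels={"else"})  # 1.0
```

On this toy input the score moves from 2/3 to 1.0 once "else" is ignored, so numbers reported under the two conventions are not directly comparable.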

Questions about the paper

Hey guys, great work! Thank you for publishing the paper. Very impressed with your results, especially for 250M and 780M models - they look super cool!

I've got several questions:

  1. Am I right that, in your opinion, you got better results than GoLLIE because:
     • Your prompt is better than GoLLIE's, you found that it's important to generate the entire output (i.e. the full sentence with its context, instead of what they do), and you made this approach actually viable thanks to LCS, etc.
     • You resolved the Back Tokenization problem (which also adds some performance)
     • Did I forget something?
  2. When reading the paper, I got the impression that the Hierarchical Matching algorithm could be replaced with teacher forcing: you either make the model generate the correct word (when it's time to generate the next word of the sentence) or force it to make an entity prediction, i.e. generate "(", then some entity, and then ")". Why did you use the Hierarchical Matching algorithm instead? Am I missing something?

Thank you very much for your response

P.S. I am also working on NER pre-training using artificial data, but I am primarily interested in pre-training BERT-like encoders with great token-level embeddings, so mostly the feature-extraction task. We recently released our models and got SOTA few-shot results for NER. You can find them and the paper here: https://huggingface.co/collections/numind/paper-65e1f6e14639e2a465af823b
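
For readers unfamiliar with the LCS step mentioned in the first question, the sketch below shows a plain longest-common-subsequence alignment that projects labels from a generated word sequence back onto the source words when the model drops or changes a word. It is not the paper's Hierarchical Matching algorithm, only the generic LCS technique; the function names and the example sentence are assumptions.

```python
def lcs_alignment(source, generated):
    """Return (i, j) pairs with source[i] == generated[j] along one LCS."""
    n, m = len(source), len(generated)
    # dp[i][j] = LCS length of source[i:] and generated[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if source[i] == generated[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one optimal alignment.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if source[i] == generated[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def project_labels(source, generated, generated_labels, default="O"):
    """Copy labels of aligned generated words onto the source words."""
    labels = [default] * len(source)
    for i, j in lcs_alignment(source, generated):
        labels[i] = generated_labels[j]
    return labels


# Toy example: the generated output omits the word "visited".
source = ["Barack", "Obama", "visited", "Paris", "."]
generated = ["Barack", "Obama", "Paris", "."]
generated_labels = ["B-person", "I-person", "B-location", "O"]
projected = project_labels(source, generated, generated_labels)
```

Source words with no aligned counterpart fall back to the default label, which is one simple policy; other fallbacks are possible.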
