Comments (30)

t1101675 avatar t1101675 commented on September 6, 2024

The example in the README is the 2.8B-parameter EVA2.0. The model files will definitely be released, but the exact timing depends on BAAI finishing its evaluation of the model, probably in February or March.

from eva.

xiaoqiao avatar xiaoqiao commented on September 6, 2024

jiangliqin avatar jiangliqin commented on September 6, 2024

@t1101675 Will EVA2.0 be released in March?

xwwwwww avatar xwwwwww commented on September 6, 2024

Hi, the model release is still under approval; we hope to finish open-sourcing it in late March.

t1101675 avatar t1101675 commented on September 6, 2024

Our EVA2.0 model is now publicly available, and the relevant links have been added to the README. We have also released the EVA2.0 technical report, which uses experiments to explore several important questions in building dialogue systems with large-scale pre-training. Feel free to check it out!

jiangliqin avatar jiangliqin commented on September 6, 2024

Thank you very much!

jiangliqin avatar jiangliqin commented on September 6, 2024

I tried out EVA2.0 on a few topics, and the output quality seems quite poor. Could you check whether it's a configuration problem on my end? Thanks @t1101675
(screenshot attached)

t1101675 avatar t1101675 commented on September 6, 2024

Could you check whether the model config file has been changed to the eva2.0 one?

jiangliqin avatar jiangliqin commented on September 6, 2024

CONFIG_PATH has been changed to eva2.0_model_config.json.

t1101675 avatar t1101675 commented on September 6, 2024

Something in the configuration is probably wrong; we'll check it locally first.

jiangliqin avatar jiangliqin commented on September 6, 2024

OK, it does look like a configuration problem.

jiangliqin avatar jiangliqin commented on September 6, 2024

I deployed after changing the model parallel degree to 1; you should be able to reproduce it that way.

xwwwwww avatar xwwwwww commented on September 6, 2024

Which script are you running?

jiangliqin avatar jiangliqin commented on September 6, 2024

I changed the model parallel degree with change_mp: python3 src/change_mp.py checkpoints/eva2.0_4 checkpoints/eva2.0 1
Then I ran interactive inference with eva_inference_interactive_beam.
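The operation behind change_mp.py can be illustrated with a minimal sketch. This is a NumPy stand-in, not the real script (which operates on Megatron-style checkpoint dicts): going from model parallel degree 4 to 1 amounts to merging the per-rank weight shards, e.g. concatenating a column-parallel layer's shards along the partitioned output dimension.

```python
import numpy as np

# Illustrative sketch only. A column-parallel layer splits its weight along
# the output dimension, so merging MP=4 shards back to MP=1 is a
# concatenation along that axis, and splitting is the inverse.

def merge_column_parallel(shards):
    """Merge column-parallel weight shards (each [out_i, in]) into one weight."""
    return np.concatenate(shards, axis=0)

def split_column_parallel(weight, mp_size):
    """Split a merged weight back into mp_size column-parallel shards."""
    return np.split(weight, mp_size, axis=0)

full = np.arange(32).reshape(8, 4).astype(np.float32)  # toy weight [out=8, in=4]
shards = split_column_parallel(full, 4)                # 4 shards of shape [2, 4]
merged = merge_column_parallel(shards)
assert np.array_equal(merged, full)                    # round trip is lossless
```

Row-parallel layers are partitioned along the input dimension instead, so the real script has to know each parameter's partition axis.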

xwwwwww avatar xwwwwww commented on September 6, 2024

Hi, I re-downloaded the model we uploaded to BAAI and ran eva_inference_interactive_beam.sh; it runs normally and I could not reproduce your problem.
Is the checkpoints/eva2.0_4 file you mentioned the original download, or did you modify it?

jiangliqin avatar jiangliqin commented on September 6, 2024

checkpoints/eva2.0_4 is the original downloaded file; I only changed its name.
(screenshot attached)

xwwwwww avatar xwwwwww commented on September 6, 2024

Could you post the full log from the run?

jiangliqin avatar jiangliqin commented on September 6, 2024

python -m torch.distributed.launch --master_port 1256 --nproc_per_node 1 /mnt/src/eva_interactive.py --model-config /mnt/src/configs/model/eva2.0_model_config.json --model-parallel-size 1 --load /mnt/checkpoints/eva2.0 --no_load_strict --distributed-backend nccl --weight-decay 1e-2 --clip-grad 1.0 --tokenizer-path /mnt/bpe_dialog_new --temperature 0.9 --top_k 0 --top_p 0.9 --num-beams 4 --length-penalty 1.6 --repetition-penalty 1.6 --rule-path /mnt/rules --fp16 --deepspeed --deepspeed_config /mnt/src/configs/deepspeed/eva_ds_config.json
Loading Model ...
using world size: 1 and model-parallel size: 1

using dynamic loss scaling
[2022-03-21 05:40:58,880] [INFO] [distributed.py:39:init_distributed] Initializing torch distributed with backend: nccl
initializing model parallel with size 1
initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3140 and data parallel seed: 422
building Enc-Dec model ...
number of parameters on model parallel rank 0: 2841044992
DeepSpeed is enabled.
[2022-03-21 05:41:46,998] [INFO] [logging.py:60:log_dist] [Rank 0] DeepSpeed info: version=0.3.9+59e4dbb, git-hash=59e4dbb, git-branch=master
[2022-03-21 05:41:47,031] [INFO] [config.py:705:print] DeepSpeedEngine configuration:
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] activation_checkpointing_config <deepspeed.runtime.activation_checkpointing.config.DeepSpeedActivationCheckpointingConfig object at 0x7fc44d1fc0d0>
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] allreduce_always_fp32 ........ False
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] amp_enabled .................. False
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] amp_params ................... False
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] disable_allgather ............ False
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] dump_state ................... False
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 2000, 'delayed_shift': 4, 'min_scale': 256}
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] elasticity_enabled ........... False
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] fp16_enabled ................. True
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] global_rank .................. 0
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] gradient_accumulation_steps .. 1
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] gradient_clipping ............ 1.0
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] gradient_predivide_factor .... 1.0
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] initial_dynamic_scale ........ 65536
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] loss_scale ................... 0
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] memory_breakdown ............. False
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] optimizer_legacy_fusion ...... False
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] optimizer_name ............... None
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] optimizer_params ............. None
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] pld_enabled .................. False
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] pld_params ................... False
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] prescale_gradients ........... False
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] scheduler_name ............... None
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] scheduler_params ............. None
[2022-03-21 05:41:47,032] [INFO] [config.py:709:print] sparse_attention ............. None
[2022-03-21 05:41:47,033] [INFO] [config.py:709:print] sparse_gradients_enabled ..... False
[2022-03-21 05:41:47,033] [INFO] [config.py:709:print] steps_per_print .............. 10
[2022-03-21 05:41:47,033] [INFO] [config.py:709:print] tensorboard_enabled .......... False
[2022-03-21 05:41:47,033] [INFO] [config.py:709:print] tensorboard_job_name ......... DeepSpeedJobName
[2022-03-21 05:41:47,033] [INFO] [config.py:709:print] tensorboard_output_path ......
[2022-03-21 05:41:47,033] [INFO] [config.py:709:print] train_batch_size ............. 32
[2022-03-21 05:41:47,033] [INFO] [config.py:709:print] train_micro_batch_size_per_gpu 32
[2022-03-21 05:41:47,033] [INFO] [config.py:709:print] wall_clock_breakdown ......... True
[2022-03-21 05:41:47,033] [INFO] [config.py:709:print] world_size ................... 1
[2022-03-21 05:41:47,033] [INFO] [config.py:709:print] zero_allow_untested_optimizer True
[2022-03-21 05:41:47,033] [INFO] [config.py:709:print] zero_config .................. {
"allgather_bucket_size": 500000000,
"allgather_partitions": true,
"contiguous_gradients": false,
"cpu_offload": false,
"elastic_checkpoint": true,
"load_from_fp32_weights": true,
"overlap_comm": false,
"reduce_bucket_size": 500000000,
"reduce_scatter": true,
"stage": 1
}
[2022-03-21 05:41:47,033] [INFO] [config.py:709:print] zero_enabled ................. True
[2022-03-21 05:41:47,033] [INFO] [config.py:709:print] zero_optimization_stage ...... 1
[2022-03-21 05:41:47,033] [INFO] [config.py:711:print] json = {
"activation_checkpointing":{
"contiguous_memory_optimization":false,
"partition_activations":false
},
"fp16":{
"enabled":true,
"hysteresis":4,
"initial_scale_power":16,
"loss_scale":0,
"loss_scale_window":2000,
"min_loss_scale":256
},
"gradient_accumulation_steps":1,
"gradient_clipping":1.0,
"steps_per_print":10,
"train_micro_batch_size_per_gpu":32,
"wall_clock_breakdown":true,
"zero_allow_untested_optimizer":true,
"zero_optimization":{
"stage":1
}
}
[2022-03-21 05:41:47,036] [INFO] [engine.py:1286:_load_checkpoint] rank: 0 loading checkpoint: /mnt/checkpoints/eva2.0/1/mp_rank_00_model_states.pt
[2022-03-21 05:41:53,622] [WARNING] [engine.py:1384:_get_all_zero_checkpoints] Client provided zero checkpoint load paths: ['/mnt/checkpoints/eva2.0/1/zero_pp_rank_0_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_1_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_2_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_3_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_4_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_5_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_6_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_7_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_8_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_9_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_10_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_11_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_12_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_13_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_14_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_15_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_16_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_17_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_18_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_19_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_20_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_21_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_22_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_23_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_24_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_25_mp_rank_00optim_states.pt', 
'/mnt/checkpoints/eva2.0/1/zero_pp_rank_26_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_27_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_28_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_29_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_30_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_31_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_32_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_33_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_34_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_35_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_36_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_37_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_38_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_39_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_40_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_41_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_42_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_43_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_44_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_45_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_46_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_47_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_48_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_49_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_50_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_51_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_52_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_53_mp_rank_00optim_states.pt', 
'/mnt/checkpoints/eva2.0/1/zero_pp_rank_54_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_55_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_56_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_57_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_58_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_59_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_60_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_61_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_62_mp_rank_00optim_states.pt', '/mnt/checkpoints/eva2.0/1/zero_pp_rank_63_mp_rank_00optim_states.pt'] does not exist
successfully loaded /mnt/checkpoints/eva2.0/1/mp_rank_00_model_states.pt
Model Loaded!

xwwwwww avatar xwwwwww commented on September 6, 2024

The log looks fine.
The parallel degree of the original checkpoint is already 1, so please try loading the original files directly, without running change_mp.py on them.

Also, to confirm: are you running this in Docker?

jiangliqin avatar jiangliqin commented on September 6, 2024

eva2.0_model_config.json
eva2.0_base_model_config.json
eva2.0_large_model_config.json
Does the choice among these matter?

jiangliqin avatar jiangliqin commented on September 6, 2024

OK, I'll try the original model. Yes, I'm using Docker.

xwwwwww avatar xwwwwww commented on September 6, 2024

eva2.0_model_config.json eva2.0_base_model_config.json eva2.0_large_model_config.json: does the choice matter?

  • eva2.0_model_config.json corresponds to the xLarge (2.8B) version of EVA2.0.
  • base and large correspond to the base (300M) and large (700M) versions of EVA2.0 respectively; we will release these two smaller models later as well.
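The mapping spelled out above can be summarized in a small snippet (filenames from the repo's config directory; parameter counts as stated in this thread):

```python
# Which config file selects which EVA2.0 variant, per the maintainers' reply.
EVA2_VARIANTS = {
    "eva2.0_model_config.json": ("xLarge", "2.8B"),
    "eva2.0_large_model_config.json": ("large", "700M"),
    "eva2.0_base_model_config.json": ("base", "300M"),
}

for cfg, (name, params) in EVA2_VARIANTS.items():
    print(f"{cfg}: {name} ({params})")
```

Only the xLarge checkpoint was available at the time of this thread, so CONFIG_PATH must point at eva2.0_model_config.json.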

jiangliqin avatar jiangliqin commented on September 6, 2024

It was the change to the model parallel degree that caused the problem; the model chats normally now.

jiangliqin avatar jiangliqin commented on September 6, 2024

I remembered that the default parallel degree used to be 4, so I changed it out of habit. Thank you for your patient help!

xwwwwww avatar xwwwwww commented on September 6, 2024

It was the change to the model parallel degree that caused the problem; the model chats normally now.

We have fixed the change_mp.py script; please pull the latest version and try again. In my testing it should no longer cause this problem.

jiangliqin avatar jiangliqin commented on September 6, 2024

It was the change to the model parallel degree that caused the problem; the model chats normally now.

We have fixed the change_mp.py script; please pull the latest version and try again. In my testing it should no longer cause this problem.

OK, thanks.

jiangliqin avatar jiangliqin commented on September 6, 2024

The paper mentions several problems with the model: Consistency, Knowledge, Safety, and Empathy. Are there plans to improve these later?

t1101675 avatar t1101675 commented on September 6, 2024

We are working on improving these, but since they are fairly cutting-edge research problems with a lot of uncertainty, we cannot yet say when an improved version will be released.

jiangliqin avatar jiangliqin commented on September 6, 2024

About the combination of beam search and top-p sampling hyperparameters: if I want more stable generation at the cost of diversity, how should I adjust the parameters? Tuning different settings, I see a trade-off between stability and diversity.

t1101675 avatar t1101675 commented on September 6, 2024

You can lower the temperature parameter.
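Why lowering the temperature helps can be seen in a minimal NumPy sketch (illustrative only, not the EVA code): dividing the logits by a temperature below 1 sharpens the softmax, so more probability mass lands on the top token; similarly, lowering top_p shrinks the sampling pool. Both reduce diversity.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature; lower temperature -> sharper distribution."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

def top_p_candidates(probs, top_p):
    """Indices of the smallest set of tokens whose cumulative probability >= top_p."""
    order = np.argsort(probs)[::-1]                 # tokens sorted by probability, descending
    csum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(csum, top_p)) + 1  # first position reaching top_p
    return order[:cutoff]

logits = [2.0, 1.0, 0.5, 0.1]                       # toy vocabulary of 4 tokens
p_hot = softmax_with_temperature(logits, 0.9)       # the script's --temperature 0.9
p_cold = softmax_with_temperature(logits, 0.5)      # lower temperature, more stable output
assert p_cold[0] > p_hot[0]                         # top token gains mass as T drops
assert len(top_p_candidates(p_hot, 0.5)) <= len(top_p_candidates(p_hot, 0.95))
```

So for more stable, less diverse output: lower --temperature below 0.9 and/or lower --top_p below 0.9; beam search with more beams also pushes toward the most likely continuation.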
