google-research / pegasus

License: Apache License 2.0

Python 86.58% C++ 13.19% HTML 0.23%

pegasus's Issues

Plan to release large model

Hi authors
First of all, thank you for your awesome work!

I really enjoyed your paper and its newly proposed GSG objective,
and I found your official implementation repository.

Just one question: do you plan to release the large model as well?
I only found the base model pre-trained on the C4 dataset, and I am wondering whether you plan to release the large one.

Thanks,

Hey, I would like to know whether PEGASUS is compatible with AI Platform.
I tried to run it with the reddit_tifu dataset. There are no error logs as such, but the job does not continue and stops after
Saving checkpoints for 0 into gs://movie-genre/text_summarization/ckpt/pegasus_ckpt/reddit_tifu/model.ckpt.
What could be the reason for this, and is there a way to fix it?

The Bash file used to submit the AI Platform job is pasted below.

The configuration file is
trainingInput:
  masterType: complex_model_l_gpu
  scaleTier: CUSTOM

# Google cloud arguments

REGION="us-east1"
BUCKET_NAME="movie-genre"
BUCKET_OBJECT="text_summarization"
DATASET="reddit_tifu_long_transformer"
CKPT="reddit_tifu"
BUCKET_PATH="movie-genre/text_summarization/ckpt"

# Module mode arguments

# Jobs running arguments

TRAIN_METHOD="train"
TRAINER_PACKAGE_PATH="$(pwd)/pegasus/"
now=$(date +"%Y%m%d_%H%M%S")
MAIN_TRAINER_MODULE="pegasus.train"
PACKAGE_STAGING_PATH="gs://$BUCKET_NAME/"

TRAIN_INIT_CHECKPOINT="gs://$BUCKET_PATH/pegasus_ckpt/model.ckpt-1500000"
VOCAB_FILENAME="gs://$BUCKET_PATH/pegasus_ckpt/c4.unigram.newline.10pct.96000.model"

JOB_NAME="PEGASUS_l""$CKPT""$now"
MODEL_DIR="gs://$BUCKET_PATH/pegasus_ckpt/$CKPT"
MODEL_DIRECTORY="gs://$BUCKET_NAME/$BUCKET_OBJECT/pegasus/logs""$JOB_NAME""$now/"
CONFIG="config.yaml"
gcloud ai-platform jobs submit training $JOB_NAME \
--staging-bucket $PACKAGE_STAGING_PATH \
--runtime-version 1.15 \
--python-version 3.7 \
--package-path $TRAINER_PACKAGE_PATH \
--module-name $MAIN_TRAINER_MODULE \
--region $REGION \
--config $CONFIG \
-- \
--params=$DATASET \
--param_overrides=vocab_filename=$VOCAB_FILENAME \
--train_init_checkpoint=$TRAIN_INIT_CHECKPOINT \
--model_dir=$MODEL_DIR

If I use the same Bash file to submit jobs for the xsum and wikihow datasets, it asks me to download the dataset manually: AssertionError: Manual directory /root/tensorflow_datasets/downloads/manual does not exist or is empty. Create it and download/extract dataset artifacts in there. Additional instructions:

For big_patent: "The dataset you're trying to generate is using Apache Beam. Beam"
ValueError: The dataset you're trying to generate is using Apache Beam. Beamdatasets are usually very large and should be generated separately.
Suppose I integrate a Dataflow pipeline: how am I supposed to use it during training?

Locally it was possible to download the dataset into the required folder. On AI Platform, is there a way to read the dataset from a bucket whenever tfds requires a manual download into its root directory?

Error when trying to Download vocab, pretrained and fine-tuned checkpoints.

I get the following error:
TypeError: a bytes-like object is required, not 'str'
when I run
gsutil cp -r gs://pegasus_ckpt/ ckpt/

My python version is 3.6.

Difficulties to reproduce summarization results (CNNDM)

Hi,
Thank you very much for sharing the checkpoints and the code.
I am trying to reproduce the results of summarization datasets (CNNDM/XSum).
For the CNNDM dataset, I used the following command and got results lower than the reported scores.
(I chose the hyperparameter set from the paper, Appendix C.)

python pegasus/bin/evaluate.py --params=cnn_dailymail_transformer \
--param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,batch_size=1,beam_size=8,beam_alpha=0.8 \
--model_dir=ckpt/pegasus_ckpt/cnn_dailymail/model.ckpt-210000 --evaluate_test

and got

rouge1-R,0.458159,0.467907,0.477614
rouge1-P,0.432794,0.443048,0.451834
rouge1-F,0.431578,0.439820,0.447542
rouge2-R,0.218378,0.228153,0.238455
rouge2-P,0.207278,0.217396,0.227431
rouge2-F,0.205811,0.214875,0.224204
rougeL-R,0.322418,0.331526,0.341166
rougeL-P,0.304323,0.313319,0.322083
rougeL-F,0.302923,0.311144,0.319798
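
To compare runs, score lines like the ones above can be parsed into a small table. This is a minimal sketch: it assumes each line has the form `metric,low,mid,high` (reading the three numbers as an interval around a central estimate is an assumption, not something stated in this thread), and `parse_rouge_lines` is a hypothetical helper, not part of the repository.

```python
def parse_rouge_lines(text):
    """Parse lines of the form 'metric,low,mid,high' into a dict of floats."""
    scores = {}
    for line in text.strip().splitlines():
        name, low, mid, high = line.strip().split(",")
        scores[name] = {"low": float(low), "mid": float(mid), "high": float(high)}
    return scores


report = """
rouge1-F,0.431578,0.439820,0.447542
rouge2-F,0.205811,0.214875,0.224204
rougeL-F,0.302923,0.311144,0.319798
"""
scores = parse_rouge_lines(report)
# Print the middle value as a percentage, the scale papers usually report.
for name, s in scores.items():
    print(f"{name}: {100 * s['mid']:.2f}")
```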

If I use

param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,batch_size=1,beam_size=5,beam_alpha=0.6

I got a slightly better score of rougeL-F,0.304459,0.313382,0.321829.

I guess I am missing something, and I think I need to adjust the hyperparameters.
Could you let me know the exact hyperparameters for evaluating on the CNNDM (and XSum) datasets?

ValueError: Could not find trained model in model_dir

Hi,

I have the following folder structure in Colab:

I ran the following command for evaluation:

!python3 pegasus/bin/evaluate.py --params=cnn_dailymail_transformer \
--param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model \
--model_dir=ckpt/pegasus_ckpt/cnn_dailymail

However, the trained model in the model-dir is not being picked up.

The error logs are:

WARNING:tensorflow:From pegasus/bin/evaluate.py:152: The name tf.enable_eager_execution is deprecated. Please use tf.compat.v1.enable_eager_execution instead.

WARNING:tensorflow:From pegasus/bin/evaluate.py:153: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

WARNING:tensorflow:From pegasus/bin/evaluate.py:85: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
W0629 22:40:38.187686 139837068035968 deprecation.py:323] From pegasus/bin/evaluate.py:85: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
WARNING:tensorflow:From /content/pegasus/pegasus/ops/public_parsing_ops.py:93: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

W0629 22:40:38.193506 139837068035968 module_wrapper.py:139] From /content/pegasus/pegasus/ops/public_parsing_ops.py:93: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

WARNING:tensorflow:From /content/pegasus/pegasus/params/estimator_utils.py:49: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

W0629 22:40:38.410753 139837068035968 module_wrapper.py:139] From /content/pegasus/pegasus/params/estimator_utils.py:49: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:Estimator's model_fn (<function _estimator_model_fn.<locals>.model_fn at 0x7f2ddcefd268>) includes params argument, but params are not passed to Estimator.
W0629 22:40:38.411430 139837068035968 estimator.py:1994] Estimator's model_fn (<function _estimator_model_fn.<locals>.model_fn at 0x7f2ddcefd268>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': 'ckpt/pegasus_ckpt/cnn_dailymail', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f2ddcf57be0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
I0629 22:40:38.416057 139837068035968 estimator.py:212] Using config: {'_model_dir': 'ckpt/pegasus_ckpt/cnn_dailymail', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f2ddcf57be0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
I0629 22:40:38.416823 139837068035968 tpu_context.py:220] _TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
W0629 22:40:38.417448 139837068035968 tpu_context.py:222] eval_on_tpu ignored because use_tpu is False.
Traceback (most recent call last):
  File "pegasus/bin/evaluate.py", line 153, in <module>
    tf.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "pegasus/bin/evaluate.py", line 126, in main
    global_step = estimator.get_variable_value("global_step")
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 267, in get_variable_value
    _check_checkpoint_available(self.model_dir)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1929, in _check_checkpoint_available
    'Could not find trained model in model_dir: {}.'.format(model_dir))
ValueError: Could not find trained model in model_dir: ckpt/pegasus_ckpt/cnn_dailymail.
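
The `ValueError` above is raised when the Estimator finds no checkpoint state in `model_dir`. TensorFlow 1.x records the latest checkpoint in a small text file named `checkpoint` that sits next to the `model.ckpt-*` files, so a directory holding only the index and data files can be reported as untrained. The sketch below uses only the standard library to show what a directory contains; `describe_checkpoint_dir` is a hypothetical helper and the file names in the demo are made up for illustration.

```python
import os
import re
import tempfile


def describe_checkpoint_dir(model_dir):
    """Report whether model_dir has a 'checkpoint' state file and which prefixes exist."""
    names = os.listdir(model_dir)
    prefixes = sorted(
        {m.group(1) for n in names
         for m in [re.match(r"(model\.ckpt-\d+)\.index$", n)] if m}
    )
    return {"has_state_file": "checkpoint" in names, "prefixes": prefixes}


# Demo on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as d:
    for name in ["model.ckpt-210000.index", "model.ckpt-210000.data-00000-of-00001"]:
        open(os.path.join(d, name), "w").close()
    info = describe_checkpoint_dir(d)
    print(info)
```

If prefixes are listed but the state file is missing, passing the full prefix (as in `--model_dir=ckpt/pegasus_ckpt/cnn_dailymail/model.ckpt-210000`, which appears in another issue on this page) may be worth trying.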

The output of pip freeze | grep tensor is as follows:

mesh-tensorflow==0.1.16
tensor2tensor==1.15.0
tensorboard==1.15.0
tensorboard-plugin-wit==1.6.0.post3
tensorboardcolab==0.0.22
tensorflow==1.15.3
tensorflow-addons==0.8.3
tensorflow-datasets==2.1.0
tensorflow-estimator==1.15.1
tensorflow-gan==2.0.0
tensorflow-gcs-config==2.2.0
tensorflow-gpu==1.15.2
tensorflow-hub==0.8.0
tensorflow-metadata==0.22.2
tensorflow-privacy==0.2.2
tensorflow-probability==0.7.0
tensorflow-text==1.15.0rc0

TypeError: load() got an unexpected keyword argument 'shuffle_files'

Hi,

I am trying to get the model to work on a GCP Jupyter notebook. I chose settings as close as possible to those in the README. That is,

Environment: TensorFlow: 1.15
Machine type: 8vCPUs, 52GB RAM (n1-highmem-8)
GPU: NVIDIA Tesla V100 x 1
Boot disk size: 500GB

I am running on python 3.7.6

Then I followed the rest of the README file as accurately as possible. Thus in the Jupyter notebook terminal:

git clone https://github.com/google-research/pegasus (works fine)
cd pegasus
export PYTHONPATH=.
pip3 install -r requirements.txt

pip install -e pegasus (This step is not in the README but in #16)

Skipped the install gsutil step as it is installed by default

mkdir ckpt
gsutil cp -r gs://pegasus_ckpt/ ckpt/

Up until here everything seems to work perfectly.

Then I run the fine-tuning command:

python3 pegasus/bin/train.py --params=aeslc_transformer \
--param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model \
--train_init_checkpoint=ckpt/pegasus_ckpt/model.ckpt-1500000 \
--model_dir=ckpt/pegasus_ckpt/aeslc

And I receive the following error:

TypeError: load() got an unexpected keyword argument 'shuffle_files'

I tried running it through the terminal and through a jupyter notebook but both gave me the same error. Do you have any idea what might be causing this?
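
An "unexpected keyword argument" error often means the installed library version differs from the one the calling code was written for. One way to check, sketched here with the standard library only, is to inspect the function's signature. `accepts_keyword` is a hypothetical helper; `old_load` and `new_load` are stand-ins for illustration, not the real `tfds.load`.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def old_load(name, split=None):  # stand-in for an older API
    return name


def new_load(name, split=None, shuffle_files=False):  # stand-in for a newer API
    return name


print(accepts_keyword(old_load, "shuffle_files"))
print(accepts_keyword(new_load, "shuffle_files"))
```

In the real environment the same check would be applied to `tfds.load` after `import tensorflow_datasets as tfds`; a negative result would suggest that the installed `tensorflow-datasets` differs from the version the repository expects.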

Need help for summarization task for the XSum dataset (Out of range error)

Hi,
Congratulations on great work! I appreciate you all for making resources publicly available.

I am currently working on reproducing summarization results using provided checkpoints.
It was very successful for the other datasets.
However, when I tried the XSum dataset, it ended with an Out of range error.

Currently, for the XSum dataset, TensorFlow Datasets (TFDS) requires the preprocessed dataset to be downloaded manually and placed in a specific location, such as ~/tensorflow_datasets/downloads/manual/xsum-extracts-from-downloads.tar.gz.

Since there are a few issues with downloading the XSum dataset from the official repository,
I had to preprocess the dataset myself to match the format.
Here is an example of the preprocessed data (11100448.data).

[XSUM]URL[XSUM]
http://web.archive.org/web/20160404221034/http://www.bbc.co.uk/news/entertainment-arts-11100448

[XSUM]INTRODUCTION[XSUM]
Comedian Frankie Boyle has been given his own series on Channel 4 as part of its comedy-heavy autumn 2010 schedule.

[XSUM]RESTBODY[XSUM]
Frankie Boyle's Tramadol Nights is a six-part series described as "no-holds-barred stand up with pre-filmed sketches".
Peep Show also returns for a seventh series, making it the longest running comedy in Channel 4 history.
...
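
For reference, a file in the format shown above can be split into its three sections with a few lines of Python. This is a sketch that assumes every section starts with a `[XSUM]NAME[XSUM]` marker line, as in the example; `parse_xsum_file` is a hypothetical helper and not the TFDS loader, and the sample text is made up.

```python
import re

MARKER = re.compile(r"^\[XSUM\](\w+)\[XSUM\]$")


def parse_xsum_file(text):
    """Split text into {section_name: body} using [XSUM]NAME[XSUM] marker lines."""
    sections, current = {}, None
    for line in text.splitlines():
        match = MARKER.match(line.strip())
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


sample = """[XSUM]URL[XSUM]
http://example.com/news/1

[XSUM]INTRODUCTION[XSUM]
A one-sentence summary.

[XSUM]RESTBODY[XSUM]
First body sentence.
Second body sentence.
"""
parsed = parse_xsum_file(sample)
print(sorted(parsed))
```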

However, I ran into an Out of range error.
According to the log line, the model was able to find 11334 examples.
(I0526 14:32:49.777998 140570555500352 datasets.py:215] Number of examples for config xsum test is 11334,)

Do you have any idea about solving this error?
Alternatively, I would appreciate it if I could get prediction results (=summarized text) for PEGASUS LARGE (C4) model!!

Thank you very much!
Wonjin

Here is the full log (I removed some unnecessary warning lines):

CUDA_VISIBLE_DEVICES=1 python pegasus/bin/evaluate.py --params=xsum_transformer --param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,batch_size=1,beam_size=5,beam_alpha=0.6 --model_dir=ckpt/pegasus_ckpt/xsum/model.ckpt-30000  --evaluate_test

WARNING:tensorflow:Estimator's model_fn (<function _estimator_model_fn.<locals>.model_fn at 0x7fd86d117c80>) includes params argument, but params are not passed to Estimator.
W0526 14:32:49.484526 140570555500352 estimator.py:1994] Estimator's model_fn (<function _estimator_model_fn.<locals>.model_fn at 0x7fd86d117c80>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': 'ckpt/pegasus_ckpt/xsum', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd86d116a20>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
I0526 14:32:49.485391 140570555500352 estimator.py:212] Using config: {'_model_dir': 'ckpt/pegasus_ckpt/xsum', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd86d116a20>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
I0526 14:32:49.485875 140570555500352 tpu_context.py:220] _TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
W0526 14:32:49.486188 140570555500352 tpu_context.py:222] eval_on_tpu ignored because use_tpu is False.
I0526 14:32:49.493249 140570555500352 dataset_info.py:358] Load dataset info from /home/wonjin/tensorflow_datasets/xsum/1.1.0
I0526 14:32:49.494374 140570555500352 dataset_builder.py:287] Reusing dataset xsum (/home/wonjin/tensorflow_datasets/xsum/1.1.0)
I0526 14:32:49.494525 140570555500352 dataset_builder.py:499] Constructing tf.data.Dataset for split test, from /home/wonjin/tensorflow_datasets/xsum/1.1.0
I0526 14:32:49.777998 140570555500352 datasets.py:215] Number of examples for config xsum test is 11334
2020-05-26 14:32:50.674068: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-05-26 14:32:50.707175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:20:00.0
2020-05-26 14:32:50.707436: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-05-26 14:32:50.708788: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-05-26 14:32:50.709977: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-05-26 14:32:50.710302: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-05-26 14:32:50.711864: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-05-26 14:32:50.713033: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-05-26 14:32:50.716500: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-26 14:32:50.722362: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
INFO:tensorflow:Calling model_fn.
I0526 14:32:51.179434 140570555500352 estimator.py:1148] Calling model_fn.
INFO:tensorflow:Running infer on CPU
I0526 14:32:51.180392 140570555500352 tpu_estimator.py:3124] Running infer on CPU

INFO:tensorflow:Done calling model_fn.
I0526 14:33:01.511505 140570555500352 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Graph was finalized.
I0526 14:33:02.609902 140570555500352 monitored_session.py:240] Graph was finalized.
2020-05-26 14:33:02.611567: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-05-26 14:33:02.648825: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2020-05-26 14:33:02.652197: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x617d2c0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-26 14:33:02.652242: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-05-26 14:33:02.994441: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4ec4b30 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-05-26 14:33:02.994511: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): TITAN RTX, Compute Capability 7.5
2020-05-26 14:33:02.996206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:20:00.0
2020-05-26 14:33:02.996437: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-05-26 14:33:02.996472: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-05-26 14:33:02.996508: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-05-26 14:33:02.996548: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-05-26 14:33:02.996575: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-05-26 14:33:02.996619: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-05-26 14:33:02.996658: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-26 14:33:02.999220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-05-26 14:33:02.999273: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-05-26 14:33:03.001303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-26 14:33:03.001323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2020-05-26 14:33:03.001333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2020-05-26 14:33:03.003746: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8384 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:20:00.0, compute capability: 7.5)
INFO:tensorflow:Restoring parameters from ckpt/pegasus_ckpt/xsum/model.ckpt-30000
I0526 14:33:03.009030 140570555500352 saver.py:1284] Restoring parameters from ckpt/pegasus_ckpt/xsum/model.ckpt-30000
2020-05-26 14:33:06.812386: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Out of range: Read less bytes than requested
ERROR:tensorflow:Error recorded from prediction_loop: 2 root error(s) found.
  (0) Out of range: Read less bytes than requested
         [[node save/RestoreV2 (defined at /home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
         [[save/RestoreV2/_301]]
  (1) Out of range: Read less bytes than requested
         [[node save/RestoreV2 (defined at /home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'save/RestoreV2':
  File "pegasus/bin/evaluate.py", line 153, in <module>
    tf.app.run(main)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "pegasus/bin/evaluate.py", line 144, in main
    FLAGS.enable_logging)
  File "/hdd3/wonjin/pegasus/pegasus/eval/text_eval.py", line 153, in text_eval
    for i, features in enumerate(features_iter):
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3072, in predict
    yield_single_examples=yield_single_examples):
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 638, in predict
    hooks=all_hooks) as mon_sess:
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 1014, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 725, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 1207, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 1212, in _create_session
    return self._sess_creator.create_session()
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 878, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 638, in create_session
    self._scaffold.finalize()
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 229, in finalize
    self._saver = training_saver._get_saver_or_default()  # pylint: disable=protected-access
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 599, in _get_saver_or_default
    saver = Saver(sharded=True, allow_empty=True)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 502, in _build_internal
    restore_sequentially, reshape)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 381, in _AddShardedRestoreOps
    name="restore_shard"))
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
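
The failure in this log occurs while restoring `model.ckpt-30000` (the `save/RestoreV2` op), not while reading examples. One possible cause worth ruling out is an incomplete copy of the checkpoint files. The sketch below compares local file sizes against expected sizes; the expected values are an input the user must supply (for example from a listing of the source bucket with `gsutil ls -l`), `find_size_mismatches` is a hypothetical helper, and the names and sizes in the demo are made up.

```python
import os
import tempfile


def find_size_mismatches(directory, expected_sizes):
    """Return {filename: (expected, actual)} for missing files or files of the wrong size."""
    mismatches = {}
    for name, expected in expected_sizes.items():
        path = os.path.join(directory, name)
        actual = os.path.getsize(path) if os.path.exists(path) else None
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches


# Demo with made-up names and sizes; real values would come from the bucket listing.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "model.ckpt-1.index"), "wb") as f:
        f.write(b"x" * 10)
    with open(os.path.join(d, "model.ckpt-1.data-00000-of-00001"), "wb") as f:
        f.write(b"x" * 50)  # shorter than expected: simulates a truncated copy
    expected = {
        "model.ckpt-1.index": 10,
        "model.ckpt-1.data-00000-of-00001": 100,
        "model.ckpt-1.meta": 5,  # never written: simulates a missing file
    }
    mismatches = find_size_mismatches(d, expected)
    print(mismatches)
```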

Loss and Perplexity during train & eval

Will you add support for computing validation set perplexity? I don't see perplexity in the list of metrics. Also, during training it's a bit hard to gauge where we are in terms of train/eval loss and perplexity, which makes it hard to decide when to stop training. Is there a plan to add this to the code?
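
As a stopgap, perplexity can be derived from a reported loss, assuming that loss is the mean per-token cross-entropy in nats. That is an assumption about what the training loop logs; label smoothing or a different normalisation would change the relationship. Both function names below are hypothetical.

```python
import math


def perplexity(mean_token_nll):
    """Perplexity for a mean per-token negative log-likelihood given in nats."""
    return math.exp(mean_token_nll)


def corpus_perplexity(total_nll, num_tokens):
    """Aggregate over a dataset: sum the token losses first, then exponentiate once."""
    return math.exp(total_nll / num_tokens)


print(perplexity(2.0))
print(corpus_perplexity(total_nll=300.0, num_tokens=150))
```

Averaging per-batch perplexities gives a different (larger or equal) number than exponentiating the corpus-level mean loss, which is why the second function sums before exponentiating.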

Finetuning Loss not decreasing on Custom Summarization Task [Help wanted]

Hi,
First of all, great paper!
Lately, I have been working on abstractive summarization done separately for an agent and a customer, given a conversation transcript between the two. We have around 700-1000 labeled data points in total (conversation transcripts).

Currently, I am fine-tuning the released C4 + HugeNews checkpoint to perform abstractive summarization for speaker-1 (agent).
The input/output format for the encoder/decoder is as follows:

# only the sentences correspond to speaker-1 (agent). Each sentence separated by a full stop
 Input: This is agent sentence-1.  This is agent sentence-2. This is agent sentence-3.

# corresponding ground truth
 output: This is the agent summary

I ran fine-tuning for 20 epochs with a learning rate of 2e-4.
The loss is no longer decreasing: the model is either not converging or stuck in a local minimum.
Any thoughts on how I should approach this problem? Are there any plans to release PEGASUS_base?
Also, we have fine-tuned the T5 model on the same summarization task with the "summarization" prefix, and it converges much faster, in just 5 epochs.

Pegasus Base

Hello there,
Thank you for your quick and always helpful responses.

Do you plan on publishing the PEGASUS_BASE model?

Visualisation of the attention weight

I have a question about visualising the attention weights.
I'd like to visualise the weights in the self-attention layers,
so I want to extract them during inference with the model fine-tuned for arxiv.
Could you show how to do that?

I realise that alignment_BxHxIxM, on line 78 of attention.py, holds the weights.
However, I cannot extract it from the fine-tuned model.

I would appreciate it if you could respond.

Unable to run Readme example

Hi! I am running your work on Google Colab. At this step:
!python3 pegasus/bin/train.py --params=aeslc_transformer \
--param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model \
--train_init_checkpoint=ckpt/pegasus_ckpt/model.ckpt-1500000 \
--model_dir=ckpt/pegasus_ckpt/aeslc
the following error comes up:
Traceback (most recent call last):
  File "pegasus/bin/train.py", line 17, in <module>
    from pegasus.data import infeed
  File "/usr/local/lib/python3.6/dist-packages/pegasus/__init__.py", line 1, in <module>
    from pegasus.parser import *
  File "/usr/local/lib/python3.6/dist-packages/pegasus/parser.py", line 10, in <module>
    from pegasus.rules import _build_rule, ParseError, Lazy
  File "/usr/local/lib/python3.6/dist-packages/pegasus/rules.py", line 62
    print 'pegasus: {}\x1b[2;38;5;241menter {} -> {}\x1b[m'.format(depth, repr(char()), _name)
    ^
SyntaxError: invalid syntax

Is there a cased version?

Hi,
Do you have a cased version of the base (pretrained) model and vocab?

Add JSON as new finetuning dataset

Hi,

The documentation refers to a tutorial on how a new TFDS dataset can be created.

I am trying to create and use my own finetuning dataset. Currently I have a JSON file with text strings and corresponding summary strings which I would like to use for finetuning. Honestly, the tutorial was of very little help to me as I found it very complicated. I was also not able to find any other informative sources on TFDS dataset creation, so I would really appreciate it if you could provide some instructions. I understand it might be a lot to ask, but I would really appreciate your help!
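As a possible first step for the JSON case described above, here is a minimal sketch that only normalises the records into input/target pairs. The JSON field names `text` and `summary` are hypothetical, and the `inputs`/`targets` key names are an assumption about what the fine-tuning pipeline expects; both should be checked against the README. Writing the pairs out as TFRecords or wrapping them in a TFDS builder is a separate step that is not shown here.

```python
import json


def load_pairs(json_text, text_key="text", summary_key="summary"):
    """Parse a JSON array of records into a list of inputs/targets dicts.

    text_key and summary_key are hypothetical field names; adjust them to
    match the actual JSON file.
    """
    records = json.loads(json_text)
    pairs = []
    for record in records:
        text = record.get(text_key, "").strip()
        summary = record.get(summary_key, "").strip()
        # Skip incomplete records rather than training on empty strings.
        if text and summary:
            pairs.append({"inputs": text, "targets": summary})
    return pairs


example = (
    '[{"text": "A long document.", "summary": "Short."},'
    ' {"text": "", "summary": "No document here."}]'
)
print(load_pairs(example))
```

The second record is dropped because its document is empty, so only complete pairs reach the later conversion step.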

testing trained model

Hi there,

I have trained PEGASUS on my own datasets. It achieved excellent results on the evaluation dataset.

But right now, I don't know how to run it on my test dataset. For example, my test dataset has only documents as inputs, without labeled summaries.

Thanks

Will BERT+transformer-decoder be better than tensor2tensor for text generation?

Thank you very much.

dynamic GSR confusing

Hi, thanks for the great work on enhancing the discourse and generation ability of BERT-like models. I am reading the code for GSP sentence selection and masking, and I notice that the GSR for mixed & dynamic training is described as between 15% and 45%:

- the model dynamically choose 15%-45% important sentences to generate

There are two points that confuse me a lot.

related code block

Is the following the related code block?

 } else if (strategy_ == "dynamic_rouge") {
      indices = GetSentenceIndicesByRouge(
          sentences_vec, text, rouge_,
          masked_sentence_ratio_ *
              absl::Uniform(gen, dynamic_mask_min_ratio_, 1.0),
          rouge_noise_ratio_);
...

GSR setting

If it is, I noticed that the GSR here is sampled as masked_sentence_ratio_ times Uniform(dynamic_mask_min_ratio_, 1.0), which does not match the README's description.
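If the snippet above is the relevant code, the effective ratio is the product of masked_sentence_ratio_ and a uniform draw. A range of 15% to 45% would then be consistent with, for example, masked_sentence_ratio_ = 0.45 and dynamic_mask_min_ratio_ = 1/3. Those two values are assumptions chosen to illustrate the arithmetic, not confirmed settings from the repository. A small sketch of the same expression:

```python
import random


def dynamic_mask_ratio(masked_sentence_ratio, dynamic_mask_min_ratio, rng):
    """Mirror the C++ expression: ratio * Uniform(min_ratio, 1.0)."""
    return masked_sentence_ratio * rng.uniform(dynamic_mask_min_ratio, 1.0)


# Hypothetical settings, used only to illustrate the resulting range.
MASKED_SENTENCE_RATIO = 0.45
DYNAMIC_MASK_MIN_RATIO = 1.0 / 3.0

rng = random.Random(0)
samples = [
    dynamic_mask_ratio(MASKED_SENTENCE_RATIO, DYNAMIC_MASK_MIN_RATIO, rng)
    for _ in range(10000)
]
# Under these assumed values every sample lies between about 0.15 and 0.45.
print(min(samples), max(samples))
```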

tfds with limited examples

I am pretty interested in the fine-tuning with limited supervised examples experiments. However, I am not familiar with TensorFlow Datasets. For example, suppose I run an experiment on the AESLC dataset.

The TensorFlow dataset is loaded with:

dataset = all_datasets.get_dataset(input_pattern, training)
dataset = dataset.map(parser, num_parallel_calls=parallelism)
dataset = dataset.unbatch()
dataset = dataset.shuffle(10000)
dataset = dataset.repeat()
dataset = dataset.padded_batch(
        params["batch_size"],
        padded_shapes=shapes,
        drop_remainder=drop_remainder)
dataset = dataset.prefetch(512)

This will load the whole dataset for training. What if I want to use only 50% of the training data? What should I do?
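One way to restrict the pipeline above to a fraction of the data is the standard `take` method of `tf.data.Dataset`, sketched here under assumptions. The helper name `take_fraction` is hypothetical, and the number of training examples has to be supplied by the caller because the helper does not infer it.

```python
def take_fraction(dataset, num_examples, fraction):
    """Keep only the first `fraction` of a dataset with a known size.

    `dataset` is expected to be a tf.data.Dataset; any object with a
    compatible .take(n) method works. `num_examples` is the total number
    of examples and must be provided by the caller.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    keep = max(1, int(num_examples * fraction))
    return dataset.take(keep)
```

The call would need to come before `shuffle` and `repeat`, for example directly after `all_datasets.get_dataset(...)`, because after `repeat()` the dataset is infinite and `take` would no longer select a fixed subset. Note that this keeps the first examples in file order rather than a random sample.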

installation of gsutil

Hi,
thank you for this very interesting contribution!

There might be a small mistake in the documentation of the installation process.
sudo apt install gsutil installed the "GrandStream" manager on Ubuntu 20.04, not Google's gsutil.
However, pip install gsutil worked for me, so maybe you could add it to requirements.txt and remove the apt dependency from the documentation.

Is releasing a Chinese version in the future plans?

Hi, I am impressed with your model. Is releasing a Chinese version part of your future plans?

evaluate.py not using GPU

I ran the setup instructions on a pre-existing GCP machine with CUDA 10.1 and one modification:

mv ckpt/pegasus_ckpt ckpt2

(The instructions don't work as written because they don't account for the pegasus_ckpt subdirectory, or for the need to point --model_dir to a specific checkpoint file, which is the only way I got evaluate.py to run.)

Then, I ran

python3 pegasus/bin/evaluate.py --params=aeslc_transformer \
--param_overrides=vocab_filename=ckpt2/c4.unigram.newline.10pct.96000.model,batch_size=1,beam_size=5,beam_alpha=0.6 --model_dir ckpt2/aeslc/model.ckpt-32000 | tee -a pegasus_output.txt

and it is running on 8 CPU cores, while nvidia-smi shows 0 GPU utilization.

How can I fix that?

Env:

mesh-tensorflow==0.1.13
tensor2tensor==1.15.0
tensorboard==1.15.0
tensorboardX==2.0
tensorflow==1.15.3
tensorflow-datasets==3.0.0
tensorflow-estimator==1.15.1
tensorflow-gan==2.0.0
tensorflow-gpu==1.15.0
tensorflow-hub==0.8.0
tensorflow-metadata==0.21.2
tensorflow-probability==0.7.0
tensorflow-text==1.15.0rc0

Could you please provide a training data example?

I am using the Python code for a text-generation experiment, as it is really well written.

I find I cannot just use the original sentence as the targets.

I think I am missing the BOS_ID and EOS_ID.

Could you please give an example of targets that includes PAD, BOS_ID and EOS_ID?
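To make the question concrete, here is a sketch of what such a target sequence might look like. It assumes PAD = 0 and EOS = 1, no dedicated BOS token, and decoder inputs formed by shifting the targets right by one position. These are common conventions in SentencePiece-based seq2seq code, not values confirmed from this repository, and the token ids are made up.

```python
PAD_ID = 0  # assumed padding id
EOS_ID = 1  # assumed end-of-sequence id


def make_targets(token_ids, max_len):
    """Append EOS, truncate to max_len, then right-pad with PAD."""
    ids = list(token_ids)[: max_len - 1] + [EOS_ID]
    return ids + [PAD_ID] * (max_len - len(ids))


def shift_right(targets):
    """Build decoder inputs: targets shifted right, starting with PAD."""
    return [PAD_ID] + targets[:-1]


targets = make_targets([57, 912, 33], max_len=6)
print(targets)               # [57, 912, 33, 1, 0, 0]
print(shift_right(targets))  # [0, 57, 912, 33, 1, 0]
```

Under these assumptions the leading PAD in the decoder inputs plays the role that a BOS token would play in other implementations.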

Thank you very much.

ERROR "pip install -e pegasus"

Hello, when I run the command "pip install -e pegasus", I got an error:

"ERROR: File "setup.py" not found. Directory cannot be installed in editable mode: ./pegasus-master/pegasus", I want to know how to fix it. Thanks a lot

TensorFlow 2.0 Support

Hi! I was curious whether there will be an upgrade of this repository with TensorFlow 2.0 support because when I tried to install TensorFlow 1.15, I got this error:

ERROR: Could not find a version that satisfies the requirement tensorflow==1.15 (from versions: 2.2.0rc1, 2.2.0rc2, 2.2.0rc3, 2.2.0rc4, 2.2.0)
ERROR: No matching distribution found for tensorflow==1.15

and I also tried

import tensorflow.compat.v1 as tf

but in some files for example in continuous_eval.py there is

from tensorflow.contrib import training as contrib_training

which I am not able to change.
Kindly upgrade the code to be compatible with TensorFlow 2.0.

Not able to run readme example

Followed the steps mentioned in the ReadMe. Installations done.
Trying to run following example =>

python pegasus/bin/evaluate.py --params=aeslc_transformer --param_overrides=vocab_filename=./ckpt/pegasus_ckpt/aeslc/model.ckpt-32000.data-00000-of-00001,batch_size=1,beam_size=5,beam_alpha=0.6 --model_dir=./ckpt/pegasus_ckpt/aeslc

Error/Logs

WARNING:tensorflow:From pegasus/bin/evaluate.py:152: The name tf.enable_eager_execution is deprecated. Please use tf.compat.v1.enable_eager_execution instead.

WARNING:tensorflow:From pegasus/bin/evaluate.py:153: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

WARNING:tensorflow:From pegasus/bin/evaluate.py:85: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
W0623 13:05:34.706781 140089317664512 deprecation.py:323] From pegasus/bin/evaluate.py:85: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
WARNING:tensorflow:From /home/ubuntu/pegasus/pegasus/ops/public_parsing_ops.py:93: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

W0623 13:05:34.710312 140089317664512 module_wrapper.py:139] From /home/ubuntu/pegasus/pegasus/ops/public_parsing_ops.py:93: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

python: /sentencepiece/src/../third_party/protobuf-lite/google/protobuf/stubs/stringpiece.h:230: google::protobuf::StringPiece::StringPiece(const char*, google::protobuf::stringpiece_ssize_type): Assertion `len >= 0' failed.
Fatal Python error: Aborted

Current thread 0x00007f6916042700 (most recent call first):
  File "/root/anaconda3/lib/python3.7/site-packages/sentencepiece.py", line 75 in LoadFromSerializedProto
  File "/home/ubuntu/pegasus/pegasus/ops/public_parsing_ops.py", line 94 in __init__
  File "/home/ubuntu/pegasus/pegasus/ops/public_parsing_ops.py", line 75 in create_text_encoder
  File "/home/ubuntu/pegasus/pegasus/params/public_params.py", line 95 in transformer_params
  File "/home/ubuntu/pegasus/pegasus/params/public_params.py", line 162 in aeslc_transformer
  File "pegasus/bin/evaluate.py", line 110 in main
  File "/root/anaconda3/lib/python3.7/site-packages/absl/app.py", line 250 in _run_main
  File "/root/anaconda3/lib/python3.7/site-packages/absl/app.py", line 299 in run
  File "/root/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40 in run
  File "pegasus/bin/evaluate.py", line 153 in <module>
Aborted (core dumped)

System - Ubuntu 16.04
Python - anaconda3, python 3.7.3
GPU - 16 GB Tesla T4
Tensorflow pkgs

mesh-tensorflow==0.1.13
tensorflow==1.15.3
tensorflow-datasets==3.1.0
tensorflow-estimator==1.15.1
tensorflow-gan==2.0.0
tensorflow-gpu==1.15.2
tensorflow-hub==0.8.0
tensorflow-metadata==0.22.2
tensorflow-probability==0.7.0
tensorflow-text==1.15.0rc0

Where is the attention_mask or causal_mask?

According to https://sshleifer.github.io/blog_v2/jupyter/2020/03/12/bart.html
It seems there is no mask in https://github.com/google-research/pegasus/blob/master/pegasus/models/transformer.py
Thank you very much.

Error when running evaluate.py: ImportError: cannot import name 'ContextManager' (Python 3.5)

I created an instance on google cloud according to your recommendation. When trying to evaluate on the finetuned dataset I get the following error:

ImportError: cannot import name 'ContextManager'
Python 3.5 is installed. Which version is suggested?

Missing comma in setup.py

Hi,

I suspect that the requirements file in the setup.py is missing a comma. I'm not sure whether it is indeed the case as I read that others had gotten the model to work, but just in case I wanted to point it out. Hope it helps!

install_requires=[
        'absl-py',
        'mock',
        'numpy',
        'rouge-score',
        'sacrebleu',
        'sentencepiece',
        'tensorflow==1.15'         <-- I think there should be a comma here
        'tensorflow-text==1.15.0rc0',
        'tfds-nightly',
        'tensor2tensor==1.15.0',
    ],
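The effect of the missing comma can be checked in isolation: Python joins adjacent string literals into a single string, so the list silently ends up one entry shorter. The shortened list below is only for illustration.

```python
install_requires = [
    'sentencepiece',
    'tensorflow==1.15'            # no comma: merged with the next literal
    'tensorflow-text==1.15.0rc0',
    'tfds-nightly',
]

print(len(install_requires))  # 3
print(install_requires[1])    # tensorflow==1.15tensorflow-text==1.15.0rc0
```

No syntax error is raised, which may explain why the problem is easy to overlook.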

EDIT: Markdown

Memory need for finetune on aeslc dataset

Hi there,

I recently read your paper and want to try your code.

After running:

python3 pegasus/bin/train.py --params=aeslc_transformer \
--param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model \
--train_init_checkpoint=ckpt/pegasus_ckpt/model.ckpt-1500000 \
--model_dir=ckpt/pegasus_ckpt/aeslc

I got an "out of memory" error. Therefore, I want to ask what the memory requirement is for fine-tuning with your code.

Extractive Prediction Instead of Abstractive Prediction

Hi!
I have tried to run the pre-trained model to test it on my dataset, which consists of paragraphs as inputs and one-sentence targets. The problem is that the prediction was extracted from the input instead of being generated as expected.

kandarp@kandarp:~/Downloads/pegasus$ python3 pegasus/bin/evaluate.py --params=new_params --param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,batch_size=1,beam_size=5,beam_alpha=0.6 --model_dir=ckpt/pegasus_ckpt

The new_params is the new .tfrecords dataset for testing.

I am getting the following output:

I0612 16:45:32.333093 140653653737856 text_eval.py:126] INPUTS: [0]:
Live in the country and last three years longer than my city friends? Good news indeed, more backing for a lifestyle choice made half a lifetime ago when it seemed a good idea to exchange an Edinburgh terrace for a farm cottage. I knew it was a good idea because I had been there before. Born and reared on a farm I had been seduced for a few years by the idea of being a big shot who lived and worked in a city rather than only going for the day to wave at the buses. True, I was familiar with some of the minor disadvantages of country living such as an iffy private water supply sometimes infiltrated by a range of flora and fauna (including, on one memorable occasion, a dead lamb), the absence of central heating in farmhouses and cottages, and a single track farm road easily blocked by snow, broken-down machinery or escaped livestock. But there were many advantages as I told Liz back in the mid-Seventies. Town born and bred, eight months pregnant and exchanging a warm, substantial Corstorphine terrace for a windswept farm cottage on a much lower income, persuading her that country had it over town might have been difficult.
I0612 16:45:32.334013 140653653737856 text_eval.py:126] TARGETS: Although there are many advantages of country living, it is still difficult to persuade a town- born and bred person to live in the country due to disadvantages and inconvenience of country living life.
I0612 16:45:32.335105 140653653737856 text_eval.py:126] PREDICTIONS: Good news indeed, more backing for a lifestyle choice made half a lifetime ago when it seemed a good idea to exchange an Edinburgh terrace for a farm cottage.

The prediction is the second sentence of the input.

Did I make a mistake, or is this a problem with the model?

Fine-tuned model

About the fine-tuned model: can we consider the model in the directory pegasus_ckpt/arxiv as the model fine-tuned on the arxiv dataset?
There is no description of that, so I am just checking.

Summarization code example?

Hi, I'm very curious about this model. I'd love to know how to generate summaries from it. A code snippet in a python script would be very helpful.

I'd like to input text and get an output summary out.

Also is it possible to specify the length of the summary?

Hi, will there be a Docker image soon?

How does Pegasus perform on other seq2seq downstream tasks?

The GSG pretraining task makes sense in the context of a summarization downstream task - could this model be finetuned on other seq2seq tasks like Question Generation?

Error when running finetuning

Hi! I followed your README instructions and ran the code on Google Colab, but errors show up when running the fine-tuning part:
!python3 pegasus/bin/train.py --params=aeslc_transformer \
--param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model \
--train_init_checkpoint=ckpt/pegasus_ckpt/model.ckpt-1500000 \
--model_dir=ckpt/pegasus_ckpt/aeslc

This error report shows up again and again in the output field:
2020-07-02 18:53:50.204708: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats:
Limit: 11330115994
InUse: 10814302464
MaxInUse: 10881414400
NumAllocs: 4881
MaxAllocSize: 565956352

2020-07-02 18:53:50.204943: W tensorflow/core/common_runtime/bfc_allocator.cc:424] *************************************************************************************************___
2020-07-02 18:53:50.208784: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at softmax_op_gpu.cu.cc:157 : Resource exhausted: OOM when allocating tensor with shape[8,16,1024,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
ERROR:tensorflow:Error recorded from training_loop: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[8,16,1024,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node encoder/layer_5/self_attention/Softmax (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[softmax_cross_entropy_loss/value/_9835]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[8,16,1024,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node encoder/layer_5/self_attention/Softmax (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

Original stack trace for 'encoder/layer_5/self_attention/Softmax':
File "pegasus/bin/train.py", line 94, in <module>
tf.app.run(main)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "pegasus/bin/train.py", line 89, in main
max_steps=train_steps)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3030, in train
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1191, in _train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2857, in _call_model_fn
config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3126, in _model_fn
features, labels, is_export_mode=is_export_mode)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1663, in call_without_tpu
return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1994, in _call_model_fn
estimator_spec = self._model_fn(features=features, **kwargs)
File "/content/pegasus/pegasus/params/estimator_utils.py", line 105, in model_fn
loss, outputs = model_params.model()(features, training)
File "/content/pegasus/pegasus/models/transformer.py", line 91, in __call__
context = self._encode(features, training)
File "/content/pegasus/pegasus/models/transformer.py", line 72, in _encode
None, None)
File "/content/pegasus/pegasus/layers/transformer_block.py", line 100, in stack
decode_i=decode_i)
File "/content/pegasus/pegasus/layers/transformer_block.py", line 65, in __call__
y_BxIxD, bias_BxIxI, training, cache=cache, decode_i=decode_i)
File "/content/pegasus/pegasus/layers/attention.py", line 94, in __call__
x, x, bias, training, cache=cache, decode_i=decode_i)
File "/content/pegasus/pegasus/layers/attention.py", line 78, in __call__
alignment_BxHxIxM = tf.nn.softmax(logits_BxHxIxM)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 2958, in softmax
return _softmax(logits, gen_nn_ops.softmax, axis, name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 2891, in _softmax
return compute_op(logits, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 11376, in softmax
"Softmax", logits=logits, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
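The numbers in the log above can be related with a little arithmetic. The failing allocation is a float tensor of shape [8,16,1024,1024], and the allocator stats report Limit and InUse in bytes. The sketch below assumes 4 bytes per element (32-bit float); the first dimension is presumably the batch size, which is an interpretation rather than something the log states.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor, assuming 32-bit floats by default."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Values copied from the log above.
ATTENTION_SHAPE = (8, 16, 1024, 1024)
LIMIT = 11330115994
IN_USE = 10814302464

size = tensor_bytes(ATTENTION_SHAPE)
free = LIMIT - IN_USE

print(size)            # 536870912 bytes
print(size / 2 ** 20)  # 512.0 MiB
print(free)            # 515813530 bytes left under the limit
print(free < size)     # True: the requested tensor does not fit
```

Since the size scales linearly with the first dimension, halving it would halve this one tensor to 256 MiB, although other activations contribute to the total as well.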

Pre-training on German dataset

Hi,

Thank you for the code! Are you planning on releasing a pre-trained version for German? If not, would we need to pre-train the model using a German dataset?

Thank you

Updating Security Measures

Dear Authors,
Hope you're doing well 😃!

I came across this project and immediately realised its potential. I want to update the project with some secure syntax. Could I get the names of the Python files that are standalone and not imported by other files, so that I can create a dedicated PR? Thanks in advance. 👍

PS: I will try my best to make this project a success.

_registered_params is empty dict

hello,
I was creating a prototype with a pre-trained model, but I get an error when recreating the model using the steps given.

ValueError: Name 'aeslc_transformer' is not registered. Registered names are .

But when I read the source code and debug it in
https://github.com/google-research/pegasus/blob/master/pegasus/params/registry.py

_registered_params = {}

this _registered_params is an empty dictionary.
So _registered_params has to be filled by some config that I am missing right now.

How to train using multiple GPUs?

Hi,
I have multiple GPUs available; how can I train using all GPUs? Do you support model & data parallelism across multiple GPUs?

Fine-tuned Model

Hello, thanks a lot for your great work. I want to understand how the fine-tuned model works. For example, if I want to fine-tune the pretrained model on the supervised cnn/dailymail dataset, is the model a seq2seq model? And do I only load the pretrained word embeddings from PEGASUS and use them as the input to the seq2seq model?

Error when finetuning

Hi! I followed your README instructions and ran the code on Google Colab, but errors show up when running the fine-tuning part:
!python3 pegasus/bin/train.py --params=reddit_tifu_long_transformer \
--param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,batch_size=1 \
--train_init_checkpoint=ckpt/pegasus_ckpt/model.ckpt-1500000 \
--model_dir=ckpt/pegasus_ckpt/reddit_tifu

And the error log here:
Traceback (most recent call last):
  File "pegasus/bin/train.py", line 18, in <module>
    from pegasus.params import all_params # pylint: disable=unused-import
  File "/content/pegasus/pegasus/params/all_params.py", line 18, in <module>
    from pegasus.params import pegasus_params
  File "/content/pegasus/pegasus/params/pegasus_params.py", line 21, in <module>
    from pegasus.eval import text_eval
  File "/content/pegasus/pegasus/eval/text_eval.py", line 22, in <module>
    from pegasus.eval.bleu import bleu_scorer
  File "/content/pegasus/pegasus/eval/bleu/bleu_scorer.py", line 18, in <module>
    import sacrebleu
  File "/usr/local/lib/python3.6/dist-packages/sacrebleu/__init__.py", line 25, in <module>
    from .tokenizers import TOKENIZERS, DEFAULT_TOKENIZER
ModuleNotFoundError: No module named 'sacrebleu.tokenizers'

The first time I ran the code, this error didn't show up and I finished the fine-tuning part successfully. But when I tried to run the code again, I got this error. Why?

checkpoint

Hi, I am not able to get the model checkpoints.
Please help.

Error: ModuleNotFoundError: No module named 'pegasus'

Hi,

I would love to run this model in a Google Colab notebook. I tried following the README instructions but I ran into the following error:

Traceback (most recent call last):
  File "pegasus/bin/train.py", line 17, in <module>
    from pegasus.data import infeed
ModuleNotFoundError: No module named 'pegasus'

My notebook cells are as follows:

!git clone https://github.com/google-research/pegasus

%cd pegasus/

!export PYTHONPATH=.
!pip install -r requirements.txt

!sudo apt-get install gcc python-dev python-setuptools libffi-dev
!sudo apt-get install python-pip
!sudo pip install gsutil

!mkdir ckpt
!gsutil cp -r gs://pegasus_ckpt/ ckpt/

!python3 pegasus/bin/train.py --params=aeslc_transformer \
--param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model \
--train_init_checkpoint=ckpt/pegasus_ckpt/model.ckpt-1500000 \
--model_dir=ckpt/pegasus_ckpt/aeslc

The last cell returns the error shown at the top of this message. I would appreciate any help you could give me.
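One thing worth checking in the cells above: in a notebook each `!` line runs in its own subshell, so a variable exported by `!export PYTHONPATH=.` is gone by the time the later `!python3` cell starts. Setting the variable through `os.environ` changes the notebook process itself, and child processes inherit it. The sketch below demonstrates the difference with a throwaway variable; the name PYTHONPATH_DEMO is hypothetical and only used so the demo has no side effects.

```python
import os
import subprocess
import sys

# Child process that reports whether it can see the demo variable.
CHECK = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('PYTHONPATH_DEMO', 'unset'))",
]


def child_sees():
    """Start a fresh child process and return what it sees."""
    result = subprocess.run(CHECK, capture_output=True, text=True)
    return result.stdout.strip()


# An export inside one shell command ends with that shell.
subprocess.run("export PYTHONPATH_DEMO=.", shell=True)
print(child_sees())  # "unset", provided the variable was not already defined

# A change to os.environ is inherited by later child processes.
os.environ["PYTHONPATH_DEMO"] = "."
print(child_sees())  # "."
```

In the notebook itself the equivalent would be `os.environ["PYTHONPATH"] = "."` in a Python cell, or the IPython magic `%env PYTHONPATH=.`, placed before the `!python3 pegasus/bin/train.py` cell.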

EDIT: Markdown

Tensorflow custom ops: pretrain_parsing_ops

Hi there,
First, thank you for your wonderful work and sharing it to the outside world - it means a lot to many people!
I was wondering - If we want to change something in the pre-training, should we follow the official Tensorflow's guide for adding an operation: i.e. modifying pretrain_parsing_ops.cc and then building it again with bazel? According to the guide, we should also use tf.load_op_library to include the custom operation in python but I don't see that in the code. Maybe something else is happening e.g. creating a python wrapper?
Thanks in advance for the help!

Is the cpp code related to the design of deep model?

https://github.com/google-research/pegasus/tree/master/pegasus/ops

As far as I can tell, the C++ code is about data preprocessing.

So if I want to learn about the model, I do not need to care about the C++ code.

Am I right?

Pretrain pegasus

Hi there,

After reading your paper, I am a little confused about the pretraining procedure of PEGASUS.

As shown in Figure 1 of the paper, given a document, PEGASUS selects target text from the document and uses the rest of the text as the input.

The PEGASUS model masks several tokens in the input and lets the encoder predict the masked tokens with an MLM loss. However, the paper states that the pegasus_large model dropped the MLM loss. Does that mean the final model feeds masked inputs to the encoder, and the decoder uses the shifted target text and the encoder outputs to generate the target text with only the next-token prediction loss? If so, what is the difference between the PEGASUS pretrained model and the BART model (BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension)? The BART model also feeds noised text to its encoder and uses the next-token prediction loss on the decoder outputs.
What is the strategy for selecting target sentences as shown in Fig. 1? Is that the Gap Sentence Generation procedure?

Thanks

training loss

Hi,

Did you just add the two losses together, e.g. the GSG loss and the MLM loss:

Loss = L1 + L2?

Did you use any tricks to balance the two losses?
@JingqingZ Thank you

Test set output summaries for CNN/DM

Hi Jingqing,

I am looking to run some analysis on PEGASUS' output summaries for CNN/DM. Can I get hold of these in any way? (I know I could just run the model to produce them myself but thought I would ask if they are already available before doing this!)

Great work on all this!

All the best,
Alex

successful prediction using evaluate.py but terminal hangs for 30 seconds before exiting to command prompt

Hi,

I was successful in using evaluate.py to generate expected output to "predictions-x.txt"
However I noticed that the script doesn't return to command line for another 30 seconds. The last log I see is

error_handling.py:101] prediction_loop marked as finished

And then it's an awkward 30 seconds to wait. Is this normal behavior? Is there anything I can do to eliminate the lag? Is this just the way TF works?

Thanks

Fine-tuned Model

I'm sorry, I closed issue #35 accidentally.
So I will continue my question here. When I fine-tune the model with the encoder-decoder transformer framework on the cnn/dailymail dataset, are the model structure and initial model parameters the same as in the pretrained model? The parameters of the transformer encoder and decoder are updated during fine-tuning.

I want to make sure whether the fine-tuned model is as follows:
