
hmnet's Introduction


This is the official code for Microsoft's HMNet model paper at EMNLP 2020. It is implemented in PyTorch. The related paper to cite is:

author = {Zhu, Chenguang and Xu, Ruochen and Zeng, Michael and Huang, Xuedong},
title = {A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining},
year = {2020},
month = {November},
url = {},
journal = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing},

Finetune HMNet

We recommend running the model inside a Docker container:

Build docker image

cd Docker
sudo docker build . -t hmnet

Run container from image

sudo nvidia-docker run -it hmnet /bin/bash

Place the pretrained HMNet model at ExampleInitModel/HMNet-pretrained. Please see the document.

Finetune on AMI dataset

CUDA_VISIBLE_DEVICES="0,1,2,3" mpirun -np 4 --allow-run-as-root python train ExampleConf/conf_hmnet_AMI
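The command above pairs four visible devices with -np 4. The following is a minimal sketch, assuming one MPI process per visible GPU (an inference from the command, not something the README states). The helper name launch_settings is hypothetical and not part of the repository.

```python
def launch_settings(gpu_ids):
    """Return the CUDA_VISIBLE_DEVICES value and the mpirun -np count.

    Assumption: one MPI process per visible GPU, as in the command above.
    """
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    visible = ",".join(str(i) for i in gpu_ids)
    return {"CUDA_VISIBLE_DEVICES": visible, "np": len(gpu_ids)}


print(launch_settings([0, 1, 2, 3]))
print(launch_settings([0]))
```

With a single GPU the same rule gives CUDA_VISIBLE_DEVICES="0" and -np 1.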

The training log, model, and settings can be found at ExampleConf/conf_hmnet_AMI_conf~/run_1

Data paths

  • ExampleRawData/meeting_summarization/AMI_proprec: The preprocessed AMI dataset. The *.json files point to the path to each split. Each folder (train, dev or test) contains the compressed chunks of data in the format for infinibatch.

  • ExampleRawData/meeting_summarization/ICSI_proprec: Same as above for ICSI dataset.

  • ExampleInitModel/transfo-xl-wt103: Here we only used the vocabulary from Transformer-XL, provided by Huggingface.
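To inspect the compressed chunks in a split folder, something like the sketch below may help. It assumes each chunk is a gzip-compressed text file with one JSON object per line and a .gz extension; the README does not state the exact layout, so treat this as a guess to verify against the actual files. The function name iter_chunk_records and the demo field name are hypothetical.

```python
import gzip
import json
import os
import tempfile


def iter_chunk_records(split_dir):
    """Yield one parsed record per line from every .gz chunk in split_dir.

    Assumption: each chunk is gzip-compressed text, one JSON object per line.
    """
    for name in sorted(os.listdir(split_dir)):
        if not name.endswith(".gz"):
            continue
        path = os.path.join(split_dir, name)
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    yield json.loads(line)


# Self-contained demo on a throwaway chunk; the "id" field is made up.
with tempfile.TemporaryDirectory() as demo_dir:
    chunk = os.path.join(demo_dir, "chunk_0.jsonl.gz")
    with gzip.open(chunk, "wt", encoding="utf-8") as f:
        f.write(json.dumps({"id": "demo"}) + "\n")
    demo_records = list(iter_chunk_records(demo_dir))
    print(len(demo_records))
```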


Step 1: specify the model path

In ExampleConf/conf_eval_hmnet_AMI, for the line


Replace ### with the real checkpoint path. Use a path relative to the location of this configuration file.
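As an illustration of what "relative to the configuration file" means, the sketch below resolves a relative path against the folder that holds the configuration file. It is not the repository's own code, and the function name resolve_from_conf is hypothetical. The checkpoint path used here is only an example taken from the paths mentioned in this README.

```python
import posixpath


def resolve_from_conf(conf_file, relative_path):
    """Resolve relative_path against the directory that holds conf_file."""
    conf_dir = posixpath.dirname(conf_file)
    return posixpath.normpath(posixpath.join(conf_dir, relative_path))


resolved = resolve_from_conf("ExampleConf/conf_eval_hmnet_AMI",
                             "../ExampleInitModel/HMNet-pretrained")
print(resolved)
```

A path written as ../ExampleInitModel/... in a file under ExampleConf therefore points at the top-level ExampleInitModel folder.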

Step 2: run the evaluate pipeline

CUDA_VISIBLE_DEVICES="0,1,2,3" mpirun -np 4 --allow-run-as-root python evaluate ExampleConf/conf_eval_hmnet_AMI

The decoding results can be found at ExampleConf/conf_eval_hmnet_AMI_conf~/run_1
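The output folder follows the pattern <configuration file>_conf~/run_<n>. A training log quoted in one of the issues shows run_2, which suggests the number grows with each run; that is an assumption here. Under it, a small sketch for locating the most recent run folder could look like this (latest_run_dir is a hypothetical helper):

```python
import os
import re
import tempfile


def latest_run_dir(conf_file):
    """Return the highest-numbered run_<n> folder under '<conf_file>_conf~', or None."""
    base = conf_file + "_conf~"
    if not os.path.isdir(base):
        return None
    runs = []
    for name in os.listdir(base):
        match = re.fullmatch(r"run_(\d+)", name)
        if match and os.path.isdir(os.path.join(base, name)):
            runs.append((int(match.group(1)), name))
    if not runs:
        return None
    # Compare numerically so that run_10 sorts after run_2.
    return os.path.join(base, max(runs)[1])


# Demo on a temporary folder tree that mimics the naming pattern.
with tempfile.TemporaryDirectory() as root:
    conf = os.path.join(root, "conf_eval_hmnet_AMI")
    for n in (1, 2, 10):
        os.makedirs(os.path.join(conf + "_conf~", "run_%d" % n))
    latest_name = os.path.basename(latest_run_dir(conf))
    print(latest_name)
```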


This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.


Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include Microsoft, Azure, DotNet, AspNet, Xamarin, and our GitHub organizations.

If you believe you have found a security vulnerability in any Microsoft-owned repository that meets Microsoft's definition of a security vulnerability, please report it to us as described below.

Reporting Security Issues

Please do not report security vulnerabilities through public GitHub issues.

Instead, please report them to the Microsoft Security Response Center (MSRC) at

If you prefer to submit without logging in, send email to [email protected]. If possible, encrypt your message with our PGP key; please download it from the Microsoft Security Response Center PGP Key page.

You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at

Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:

  • Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
  • Full paths of source file(s) related to the manifestation of the issue
  • The location of the affected source code (tag/branch/commit or direct URL)
  • Any special configuration required to reproduce the issue
  • Step-by-step instructions to reproduce the issue
  • Proof-of-concept or exploit code (if possible)
  • Impact of the issue, including how an attacker might exploit the issue

This information will help us triage your report more quickly.

If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our Microsoft Bug Bounty Program page for more details about our active programs.

Preferred Languages

We prefer all communications to be in English.


Microsoft follows the principle of Coordinated Vulnerability Disclosure.


hmnet's Issues

How to solve cuda out of memory error?

I have encountered errors like this
" RuntimeError: CUDA out of memory. Tried to allocate 2.40 GiB (GPU 3; 15.78 GiB total capacity; 12.06 GiB already allocated; 2.39 GiB free; 212.27 MiB cached) "
when trying to fine-tune the model on both data sets. The same error occurs if I try to evaluate the model with the fine-tuned weights downloaded from the link given in the repo. Can you specify the hardware needed to reproduce this project?
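To read the numbers in that message, a small sketch like the following can help. It only parses the text quoted above and compares the requested size with the reported free memory; it does not query the GPU. The function name parse_oom is hypothetical.

```python
import re

GIB_PER_UNIT = {"MiB": 1 / 1024, "GiB": 1.0}


def parse_oom(message):
    """Extract the sizes reported in a CUDA out-of-memory message, in GiB."""
    fields = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "cached": r"([\d.]+) (MiB|GiB) cached",
    }
    sizes = {}
    for key, pattern in fields.items():
        match = re.search(pattern, message)
        if match:
            sizes[key] = float(match.group(1)) * GIB_PER_UNIT[match.group(2)]
    return sizes


message = ("RuntimeError: CUDA out of memory. Tried to allocate 2.40 GiB "
           "(GPU 3; 15.78 GiB total capacity; 12.06 GiB already allocated; "
           "2.39 GiB free; 212.27 MiB cached)")
sizes = parse_oom(message)
shortfall = sizes["requested"] - sizes["free"]
print(round(shortfall, 2))
```

In this report the request exceeds the free memory by about 0.01 GiB, on a device that already holds roughly 12 GiB of allocations.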

Docker building, Tensor Size issues, may be related to package versions.

Hi, I've tried to build your Docker container using the provided Dockerfile, and it fails at python -m spacy download en because it couldn't link to a required library. To fix this, I changed the Dockerfile to link to the stub at compile time with:

RUN export LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH && \
        ln -s /usr/local/cuda/lib64/stubs/ /usr/local/cuda/lib64/stubs/ && \
        python -m spacy download en

The build then works. The next issue comes in Models/Networks/, where
spacy.load('en', parser = False) fails because the parser keyword has been removed. I fixed this by changing it to
nlp = spacy.load('en_core_web_sm', exclude=['parser']). That also fixed the warning that the en shortcut is deprecated.

The last thing I had to change to get things working was that the Language object from Spacy no longer has tagger and entity fields. I had to access the pipeline to add them as below.

tagger = [x[1] for x in nlp.pipeline if x[0] == 'tagger']
assert len(tagger) == 1
tagger = tagger[0]

entity = [x[1] for x in nlp.pipeline if x[0] == 'ner']
assert len(entity) == 1
entity = entity[0]

POS = {w: i for i, w in enumerate([''] + list(tagger.labels))}
ENT = {w: i for i, w in enumerate([''] + list(entity.move_names))}

Finally, once the code was able to execute, it ran into a tensor size issue with the linked fine-tuned AMI model, which can be seen below:

Error(s) in loading state_dict for MeetingNet_Transformer:
	size mismatch for encoder.pos_embed.weight: copying a param with shape torch.Size([51, 16]) from checkpoint, the shape in current model is torch.Size([50, 16])

I think this may be due to a spacy model change since the code was compiled against a different version.
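The hypothesis can be illustrated with the dictionary construction quoted above: the number of entries is one more than the number of labels the spaCy component exposes. The sketch below assumes the embedding has one row per dictionary entry, which the source does not confirm, and uses made-up label names rather than real spaCy tags.

```python
def build_tag_index(labels):
    """Map '' to 0 and each label to the next index, as in the dictionaries above."""
    return {w: i for i, w in enumerate([""] + list(labels))}


# Hypothetical label sets; real ones come from the loaded spaCy model.
labels_a = ["TAG%d" % i for i in range(50)]
labels_b = labels_a[:-1]  # one label fewer

rows_a = len(build_tag_index(labels_a))
rows_b = len(build_tag_index(labels_b))
print(rows_a, rows_b)
```

Under that assumption, a model version with one label fewer would produce 50 rows instead of 51, which matches the shape of the reported mismatch but does not prove its cause.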

Could you provide a requirements.txt with versions or tell me if I'm wrong and the tensor size error is unrelated to the spacy tags?


cublas runtime error

I'm following the readme to try and Finetune HMNet on the AMI dataset. My only modification to the instructions is that I have only 1 visible device (my full command thus becomes CUDA_VISIBLE_DEVICES="0" mpirun -np 1 --allow-run-as-root python train ExampleConf/conf_hmnet_AMI).

The process exits with an error.

Here's the full output.

{'MODEL': 'MeetingNet_Transformer', 'TASK': 'HMNet', 'CRITERION': 'MLECriterion', 'SEED': 1033, 'RESUME': True, 'MAX_NUM_EPOCHS': 20, 'SAVE_PER_UPDATE_NUM': 400, 'UPDATES_PER_EPOCH': 2000, 'OPTIMIZER': 'RAdam', 'NO_AUTO_LR_SCALING': True, 'START_LEARNING_RATE': 0.001, 'LR_SCHEDULER': 'LnrWrmpInvSqRtDcyScheduler', 'WARMUP_STEPS': 16000, 'WARMUP_INIT_LR': 0.0001, 'WARMUP_END_LR': 0.001, 'GRADIENT_ACCUMULATE_STEP': 20, 'GRAD_CLIPPING': 2, 'USE_REL_DATA_PATH': True, 'TRAIN_FILE': '../ExampleRawData/meeting_summarization/AMI_proprec/train_ami.json', 'DEV_FILE': '../ExampleRawData/meeting_summarization/AMI_proprec/valid_ami.json', 'TEST_FILE': '../ExampleRawData/meeting_summarization/AMI_proprec/test_ami.json', 'ROLE_DICT_FILE': '../ExampleRawData/meeting_summarization/role_dict_ext.json', 'MINI_BATCH': 1, 'MAX_PADDING_RATIO': 1, 'BATCH_READ_AHEAD': 10, 'DOC_SHUFFLE_BUF_SIZE': 10, 'SAMPLE_SHUFFLE_BUFFER_SIZE': 10, 'BATCH_SHUFFLE_BUFFER_SIZE': 10, 'MAX_TRANSCRIPT_WORD': 8300, 'MAX_SENT_LEN': 30, 'MAX_SENT_NUM': 300, 'DROPOUT': 0.1, 'VOCAB_DIM': 512, 'ROLE_SIZE': 32, 'ROLE_DIM': 16, 'POS_DIM': 16, 'ENT_DIM': 16, 'USE_ROLE': True, 'USE_POSENT': True, 'USE_BOS_TOKEN': True, 'USE_EOS_TOKEN': True, 'TRANSFORMER_EMBED_DROPOUT': 0.1, 'TRANSFORMER_RESIDUAL_DROPOUT': 0.1, 'TRANSFORMER_ATTENTION_DROPOUT': 0.1, 'TRANSFORMER_LAYER': 6, 'TRANSFORMER_HEAD': 8, 'TRANSFORMER_POS_DISCOUNT': 80, 'PRE_TOKENIZER': 'TransfoXLTokenizer', 'PRE_TOKENIZER_PATH': '../ExampleInitModel/transfo-xl-wt103', 'PYLEARN_MODEL': '../ExampleInitModel/HMNet-pretrained', 'EXTRA_IDS': 1000, 'BEAM_WIDTH': 6, 'MAX_GEN_LENGTH': 512, 'MIN_GEN_LENGTH': 320, 'EVAL_TOKENIZED': True, 'EVAL_LOWERCASE': True, 'NO_REPEAT_NGRAM_SIZE': 3, 'cuda': True, 'confFile': 'ExampleConf/conf_hmnet_AMI', 'datadir': 'ExampleConf', 'basename': 'conf_hmnet_AMI', 'command': 'train', 'conf_file': 'ExampleConf/conf_hmnet_AMI', 'cluster': 'local', 'dist_init_path': './tmp', 'fp16': False, 'fp16_opt_level': 'O1', 'no_cuda': False}
Using Cuda

Saving logs, model, checkpoint, and evaluation in ExampleConf/conf_hmnet_AMI_conf~/run_2
 1.2.0  is high
Number of GPUs is  1 
Effective batch size is increased from  1  to  1 
Gradient accumulation steps =  20 
Effective batch size =  20 
[9d66c296629d:03515] pml_ucx.c:285  Error: UCP worker does not support MPI_THREAD_MULTIPLE
Select command: train
train on rank 0
Initializing model...
Loading Tokenizer from ExampleConf/../ExampleInitModel/transfo-xl-wt103...
Using pad_token, but it is not set yet.
Using bos_token, but it is not set yet.
Use POS and ENT

Total trainable parameters: 204488240
Loaded data on rank 0.
Using custom optimizer: RAdam
Optimizer parameters: {'lr': 0.001}
Using custom lr scheduler: LnrWrmpInvSqRtDcyScheduler
Lr scheduler parameters: {'warmup_steps': 16000, 'warmup_init_lr': 0.0001, 'warmup_end_lr': 0.001}
Cannot find checkpoint path from conf_hmnet_AMI_resume_checkpoint.json.
Make sure ExampleConf/conf_hmnet_AMI_resume_checkpoint.json exists.
Continue without loading checkpoint
Epoch 0
Traceback (most recent call last):
  File "", line 71, in <module>
  File "/root/HMNet/Models/Trainers/", line 273, in train
  File "/root/HMNet/Models/Trainers/", line 358, in update
    loss =
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/HMNet/Models/Trainers/", line 38, in forward
    output = self.model(batch)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/HMNet/Models/Networks/", line 100, in forward
    outputs = self._forward(**batch)
  File "/root/HMNet/Models/Networks/", line 125, in _forward
    token_encoder_outputs, sent_encoder_outputs = self.encoder(encoder_input_ids, encoder_input_roles, encoder_input_pos, encoder_input_ent)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/HMNet/Models/Networks/", line 1130, in forward
    embedded = self.embedder(vocab_x.view(batch_size, -1))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/HMNet/Models/Networks/", line 387, in forward
    x_pos = self.pos_emb(torch.arange(x_len).type(torch.cuda.FloatTensor)) # len x n_state
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/HMNet/Models/Networks/", line 86, in forward
    sinusoid_inp = torch.ger(pos_seq, self.inv_freq)
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[42501,1],0]
  Exit code:    1
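The batch-size lines near the top of this log are consistent with a simple product of MINI_BATCH, the number of GPUs, and GRADIENT_ACCUMULATE_STEP. The sketch below assumes that formula; it is inferred from the log values (1, 1, and 20 giving 20), not taken from the training code.

```python
def effective_batch_size(mini_batch, num_gpus, grad_accum_steps):
    """Samples contributing to one optimizer update, under the assumed formula."""
    per_step = mini_batch * num_gpus  # samples processed per forward/backward pass
    return per_step * grad_accum_steps


single_gpu = effective_batch_size(mini_batch=1, num_gpus=1, grad_accum_steps=20)
four_gpus = effective_batch_size(mini_batch=1, num_gpus=4, grad_accum_steps=20)
print(single_gpu, four_gpus)
```

If the formula holds, running on one GPU instead of four with the same configuration would shrink the effective batch from 80 to 20.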

tokenizer.convert_ids_to_tokens not generating special tokens with predefined position offset

self.tokenizer = self.tokenizer_class.from_pretrained(self.pretrained_tokenizer_path)
special_tokens_tuple_list = [("eos_token", 128), ("unk_token", 129), ("pad_token", 130), ("bos_token", 131)]
for special_token_name, special_token_id_offset in special_tokens_tuple_list:
    if getattr(self.tokenizer, special_token_name) == None:
        setattr(self.tokenizer, special_token_name, self.tokenizer.convert_ids_to_tokens(len(self.tokenizer)-special_token_id_offset))
        self.config[special_token_name] = self.tokenizer.convert_ids_to_tokens(len(self.tokenizer)-special_token_id_offset)
        self.config[special_token_name+'_id'] = len(self.tokenizer)-special_token_id_offset

This snippet sets default special tokens from id offsets. Later, the special tokens that do not exist in the pretrained tokenizer (pad_token and bos_token) need to be added to the tokenizer. I tried loading the pretrained tokenizer from transfo-xl-wt103 under ExampleInitModel and generating tokens from ids based on the predefined offsets.


The returned tokens turn out to be specific words, not '<pad>' or '<bos>' tokens.

When the token_name is "pad_token" or "bos_token" with offset of "130", "131":
'The return: Islahul 267605,McShan 267604'

May I ask how you chose the offset values for these special tokens? Is it normal that transfo-xl-wt103 doesn't need pad_token and bos_token, or should these special tokens be set up somewhere else?
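The ids reported in this issue follow directly from the arithmetic in the snippet, len(tokenizer) minus the offset. The sketch below reproduces that arithmetic without loading a tokenizer. The vocabulary size 267735 is an assumption inferred from the reported ids (267605 + 130), and special_token_ids is a hypothetical helper.

```python
SPECIAL_TOKEN_OFFSETS = [("eos_token", 128), ("unk_token", 129),
                         ("pad_token", 130), ("bos_token", 131)]


def special_token_ids(vocab_size, offsets=SPECIAL_TOKEN_OFFSETS):
    """Return the id each special token name would map to: vocab_size - offset."""
    return {name: vocab_size - offset for name, offset in offsets}


# Assumed vocabulary size, inferred from the ids reported above.
VOCAB_SIZE = 267735
ids = special_token_ids(VOCAB_SIZE)
print(ids["pad_token"], ids["bos_token"])
```

So the offsets select existing entries near the end of the vocabulary, which is why ordinary words come back instead of dedicated '<pad>' or '<bos>' strings.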

Modules Versions are not specified

Hello. I am trying to run the experiments. Unfortunately, since the versions of the pip modules are not specified in the Dockerfile, I am getting a series of errors. Could you please specify the versions in the Dockerfile? For example, your code is not compatible with the latest spaCy version (3.0.50). I guess you used version 2.3.5 in your code.

The order of token_attn and sent_attn in decoder is different between the code and the paper, in

In the paper, src-tgt attention on sentences is after the src-tgt attention on tokens. However, in the code, the order is opposite.
At line 1000 in,

def forward(self, y, token_enc_key, token_enc_value, sent_enc_key, sent_enc_value):
    query, key, value = self.decoder_splitter(y)
    # batch x len x n_state

    # self-attention
    a = self.attn(query, key, value, None, one_dir_visible=True)
    # batch x len x n_state

    n = self.ln_1(y + a) # residual

    if 'NO_HIERARCHY' in self.opt:
        q = y
        r = n
    else:
        # src-tgt attention on sentences
        q = self.sent_attn(n, sent_enc_key, sent_enc_value, None)
        r = self.ln_3(n + q) # residual
        # batch x len x n_state

    # src-tgt attention on tokens
    o = self.token_attn(r, token_enc_key, token_enc_value, None)
    p = self.ln_2(r + o) # residual
    # batch x len x n_state

    m = self.mlp(p)
    h = self.ln_4(p + m)

I would like to confirm: is this order intended or not?
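The difference raised here can be made concrete by listing the sublayers in order. The sketch below assumes that sentence-level attention is skipped when NO_HIERARCHY is set, and it takes the paper's order from the description in this issue rather than from the paper itself. The names are illustrative labels, not identifiers from the repository.

```python
def decoder_sublayer_order(no_hierarchy=False):
    """List the sublayers in the order the quoted forward() appears to apply them."""
    order = ["self_attn"]
    if not no_hierarchy:
        order.append("sent_attn")   # src-tgt attention on sentences
    order.append("token_attn")      # src-tgt attention on tokens
    order.append("mlp")
    return order


# Order as described for the paper in this issue.
PAPER_ORDER = ["self_attn", "token_attn", "sent_attn", "mlp"]

code_order = decoder_sublayer_order()
print(code_order)
print(code_order == PAPER_ORDER)
```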

Cuda out of memory

Hello. I am trying to reproduce the paper results. I am currently running the code on 2 Tesla V100 GPUs, each with 16 GB of memory, but I still get an out-of-memory error. I also tried decreasing MAX_TRANSCRIPT_WORD to 1000, but it did not help. Could you please let me know what hardware and GPU are required to run it?

How to build a new data set with the same format

Hi, I have successfully run your code, and the results are quite good. Would you be able to open-source the pretraining data? Or can you tell me how to get POS_ID and ENT_ID? Thanks.

Problems while building docker

When I ran sudo docker build . -t hmnet, errors occurred in Step 10/35 : RUN apt-get update && apt-get install -y --allow-change-held-packages --no-install-recommends software-properties-common openssh-client openssh-server pdsh curl sudo net-tools vim iputils-ping wget perl libxml-parser-perl libcudnn7=${CUDNN_VERSION} libnccl2=${NCCL_VERSION} libnccl-dev=${NCCL_VERSION} --allow-downgrades:

E: Unable to locate package libcudnn7
E: Version '2.4.7-1+cuda10.0' for 'libnccl2' was not found
E: Version '2.4.7-1+cuda10.0' for 'libnccl-dev' was not found

It seems that apt can't find these packages inside the Docker build. Is something wrong on my side, or is there a bug in the Dockerfile?
