
neuralcodesum's Introduction

A Transformer-based Approach for Source Code Summarization

Official implementation of our ACL 2020 paper on Source Code Summarization. [arxiv]

Installing C2NL

You may consider installing the C2NL package. C2NL requires Linux and Python 3.6 or higher, and PyTorch 1.3 or newer. Its other dependencies are listed in requirements.txt. CUDA is strongly recommended for speed, but it is not necessary.

Run the following commands to clone the repository and install C2NL:

git clone https://github.com/wasiahmad/NeuralCodeSum.git
cd NeuralCodeSum; pip install -r requirements.txt; python setup.py develop

Training/Testing Models

We provide an RNN-based sequence-to-sequence (Seq2Seq) model implementation along with our Transformer model. To perform training and evaluation, first go to the scripts directory associated with the target dataset.

$ cd  scripts/DATASET_NAME

where the choices for DATASET_NAME are ["java", "python"].

To train/evaluate a model, run:

$ bash script_name.sh GPU_ID MODEL_NAME

For example, to train/evaluate the transformer model, run:

$ bash transformer.sh 0,1 code2jdoc

Generated log files

While training and evaluating the models, a number of files are generated inside a tmp directory. The files are as follows.

  • MODEL_NAME.mdl
    • Model file containing the parameters of the best model.
  • MODEL_NAME.mdl.checkpoint
    • A model checkpoint, in case we need to restart training.
  • MODEL_NAME.txt
    • Log file for training.
  • MODEL_NAME.json
    • The predictions and gold references are dumped during validation.
  • MODEL_NAME_test.txt
    • Log file for evaluation (greedy).
  • MODEL_NAME_test.json
    • The predictions and gold references are dumped during evaluation (greedy).
  • MODEL_NAME_beam.txt
    • Log file for evaluation (beam).
  • MODEL_NAME_beam.json
    • The predictions and gold references are dumped during evaluation (beam).

[Structure of the JSON files] Each line in a JSON file is a JSON object. An example is provided below.

{
    "id": 0,
    "code": "private int current Depth ( ) { try { Integer one Based = ( ( Integer ) DEPTH FIELD . get ( this ) ) ; return one Based - NUM ; } catch ( Illegal Access Exception e ) { throw new Assertion Error ( e ) ; } }",
    "predictions": [
        "returns a 0 - based depth within the object graph of the current object being serialized ."
    ],
    "references": [
        "returns a 0 - based depth within the object graph of the current object being serialized ."
    ],
    "bleu": 1,
    "rouge_l": 1
}
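
Since each line in these files is a standalone JSON object, they can be read as JSON Lines. The sketch below (not part of the repository; the file path is only an example) loads a prediction file and reports the average per-example BLEU.

import json

def load_predictions(path):
    # Read a line-delimited prediction file (one JSON object per line).
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

if __name__ == "__main__":
    records = load_predictions("tmp/code2jdoc_beam.json")
    avg_bleu = sum(r["bleu"] for r in records) / max(len(records), 1)
    print(f"{len(records)} examples, average BLEU = {avg_bleu:.4f}")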

Generating Summaries for Source Code

To generate summaries for source code using a trained model, run the generate.sh script. The input source code file must be placed under the java or python directory, and the value of the DATASET variable must be set manually in the bash script.

Sample Java and Python code files are provided at [data/java/sample.code] and [data/python/sample.code].

$ cd scripts
$ bash generate.sh 0 code2jdoc sample.code

The above command will generate the tmp/code2jdoc_beam.json file, which will contain the predicted summaries.

Running experiments on CPU/GPU/Multi-GPU

  • If GPU_ID is set to -1, CPU will be used.
  • If GPU_ID is set to one specific number, only one GPU will be used.
  • If GPU_ID is set to multiple numbers (e.g., 0,1,2), then those GPUs will be used in parallel.
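
For illustration only, the GPU_ID convention above could be interpreted roughly as follows; this is a hedged sketch of the idea, not the scripts' actual argument handling.

import torch

def interpret_gpu_id(gpu_id: str):
    # Illustrative mapping of the GPU_ID string to a device and a parallel flag.
    if gpu_id.strip() == "-1":
        return torch.device("cpu"), False               # CPU mode
    ids = [int(x) for x in gpu_id.split(",")]
    use_parallel = len(ids) > 1                         # several GPUs -> data parallelism
    return torch.device(f"cuda:{ids[0]}"), use_parallel

print(interpret_gpu_id("0,1"))  # (device(type='cuda', index=0), True)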

Acknowledgement

We borrowed and modified code from DrQA and OpenNMT. We would like to express our gratitude to the authors of these repositories.

Citation

@inproceedings{ahmad2020summarization,
 author = {Ahmad, Wasi Uddin and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
 booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)},
 title = {A Transformer-based Approach for Source Code Summarization},
 year = {2020}
}


neuralcodesum's Issues

problem while trying to test the code

Hi,

I'm an intern trying to find the best solution for commenting code automatically.
To that end, I've been testing all existing solutions, and I came across yours.

I think I've installed all the requirements. I work on WSL with Python 3.8, and I got the following error while trying to execute the code:

~/NeuralCodeSum/scripts/python$ bash transformer.sh 0 code2jdoc                                

============TRAINING============
Traceback (most recent call last):
File "../../main/train.py", line 675, in <module>
set_defaults(args)
File "../../main/train.py", line 153, in set_defaults
raise IOError('No such file: %s' % train_src)
OSError: No such file: ../../data/python/train/code.original_subtoken
============TESTING============
Traceback (most recent call last):                                                                                        
File "../../main/train.py", line 675, in <module>                                                                         
set_defaults(args)                                                                                                   
File "../../main/train.py", line 186, in set_defaults                                                                     
raise IOError('No such file: %s' % dev_src)                                                                         
OSError: No such file: ../../data/python/test/code.original_subtoken
============Beam Search TESTING============  
Traceback (most recent call last):
File "../../main/test.py", line 448, in <module>
set_defaults(args)
File "../../main/test.py", line 151, in set_defaults
raise IOError('No such file: %s' % dev_src)
OSError: No such file: ../../data/python/test/code.original_subtoken

I have surely just failed to understand the reason for this error, but could you please enlighten me?

How to do transfer learning on the pretrained models ?

I obtained the pretrained models from one of the threads where @vrmasrv had posted them.
I am totally new to NLP, so forgive me if I am wrong: is there any possibility of doing transfer learning on this model?
If so, how can I do it?

Model is not learning problem

Hi,

I'm trying to train the Transformer model on a custom dataset; however, the model fails to learn and just outputs the same prediction for all inputs. On top of that, due to the early stopping mechanism, it always stops after 20 epochs and does not continue.

I tried changing the learning rate, the optimizer, and the early stopping setting; however, there is no improvement.

Thanks.

Do you have a trained model?

Hello.

Does your repository come with a trained model? I am being prompted to provide a model path, and I was looking to see if you have provided your trained model.

Thanks.

Question about dataset

Hello.
It's nice to see such a wonderful paper and code.
The code also works well :)

I have a question about the dataset. (Python)
Data source: https://github.com/EdinburghNLP/code-docstring-corpus
Although I know the dataset comes from the GitHub repository above, could you tell me how you built the dataset from it? (How did you parse it?)

If possible, I want the model to work on general code data, not just the train or test datasets.
For that, I need to know how you parse the data, e.g., how you handle the '_' underscore and other characters.
If you know, I would appreciate detailed information.

Thank you:-)

@saikat107
Hello. I checked your reply in issue #19.
If you have further information, could you share in detail how to create the dataset structure?
Thank you:)

how to train a model without copy attention?

Hello~ I'm a student studying code summarization, and I have a question.
Is setting this parameter to False enough to disable copy attention when training the model?
[image]
Are there any other parameters that need to be set?

problem during reproducing

Thank you for your good model and research.

I trained the RNN model and the Transformer model.

However, I got an error message when testing both models at epoch 0.

The error message is:
Traceback (most recent call last):
File "../../main/train.py", line 704, in
main(args)
File "../../main/train.py", line 627, in main
validate_official(args, dev_loader, model, stats, mode='test')
File "../../main/train.py", line 345, in validate_official
mode=mode)
File "../../main/train.py", line 442, in eval_accuracies
meteor, _ = meteor_calculator.compute_score(references, hypotheses)
File "/content/gdrive/My Drive/Transformer/NeuralCodeSum-master/c2nl/eval/meteor/meteor.py", line 76, in compute_score
stat = self._stat(res[i][0], gts[i])
File "/content/gdrive/My Drive/Transformer/NeuralCodeSum-master/c2nl/eval/meteor/meteor.py", line 106, in _stat
self.meteor_p.stdin.flush()
BrokenPipeError: [Errno 32] Broken pipe
""""
Please suggest the cause or solution of the problem.

Take care of your health and thank you for reading !

"""
My Environment
google colab(Ubuntu 18.04.5 LTS)
python 3.6.9
"""

No such file: ../../data/java/test/code.original_subtoken while training a model

When I try to train a model by bash transformer.sh 0,1 code2jdoc, I get the error:

============TRAINING============
Traceback (most recent call last):
  File "../../main/train.py", line 675, in <module>
    set_defaults(args)
  File "../../main/train.py", line 153, in set_defaults
    raise IOError('No such file: %s' % train_src)
OSError: No such file: ../../data/java/train/code.original_subtoken
============TESTING============
Traceback (most recent call last):
  File "../../main/train.py", line 675, in <module>
    set_defaults(args)
  File "../../main/train.py", line 186, in set_defaults
    raise IOError('No such file: %s' % dev_src)
OSError: No such file: ../../data/java/test/code.original_subtoken
============Beam Search TESTING============
Traceback (most recent call last):
  File "../../main/test.py", line 448, in <module>
    set_defaults(args)
  File "../../main/test.py", line 151, in set_defaults
    raise IOError('No such file: %s' % dev_src)
OSError: No such file: ../../data/java/test/code.original_subtoken

So, I did the following steps (which are written in README):

git clone https://github.com/wasiahmad/NeuralCodeSum.git
cd NeuralCodeSum; pip install -r requirements.txt; python setup.py develop
cd  scripts/DATASET_NAME
bash transformer.sh 0,1 code2jdoc

Could you please tell me why I get this error? It seems I need a dataset to train on, but there is no dataset in that folder.
I also downloaded java_with_sbt.zip and java.zip (which you provided on Google Drive), but there was no such file (java/test/code.original_subtoken) in them.

code.original_subtoken file not found

When I am training or evaluating, I get the error:

OSError: No such file: ../../data/python/test/code.original_subtoken

Please suggest where I could find the folders data/python/test and data/python/train.

Question about the training time

Hello, I am running the code to reproduce the results in the paper.
I train the Java baseline model on 4 Tesla P40 GPU cards, and all hyperparameters remain unchanged.
The model takes 2.5 days to train for 200 epochs (~20min per epoch).
Is this training time expected? Can you provide some information about the training time of the baseline models?

Thanks!

ImportError: No module named 'c2nl.eval.ltorank'

Hello, I encountered the following problems while reproducing your work.

sec@WIN-NPQGFCOGD:/mnt/e/NeuralCodeSum/scripts/java$ bash rnn.sh -1 code2doc_rnn
============TRAINING============
Traceback (most recent call last):
File "../../main/train.py", line 27, in <module>
from c2nl.eval.bleu import corpus_bleu
File "/mnt/e/NeuralCodeSum/c2nl/eval/__init__.py", line 3, in <module>
from .ltorank import *
ImportError: No module named 'c2nl.eval.ltorank'
============TESTING============
Traceback (most recent call last):
File "../../main/train.py", line 27, in <module>
from c2nl.eval.bleu import corpus_bleu
File "/mnt/e/NeuralCodeSum/c2nl/eval/__init__.py", line 3, in <module>
from .ltorank import *
ImportError: No module named 'c2nl.eval.ltorank'
============Beam Search TESTING============
Traceback (most recent call last):
File "../../main/test.py", line 22, in <module>
from main.train import compute_eval_score
File "/mnt/e/NeuralCodeSum/main/train.py", line 27, in <module>
from c2nl.eval.bleu import corpus_bleu
File "/mnt/e/NeuralCodeSum/c2nl/eval/__init__.py", line 3, in <module>
from .ltorank import *
ImportError: No module named 'c2nl.eval.ltorank'

I checked the file 'NeuralCodeSum/c2nl/eval/__init__.py' and found the following lines:
from .ltorank import *
from .squad_eval import *

However, there seem to be no 'ltorank.py' and 'squad_eval.py' files. Are these two imports redundant?

Thank you.
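
If those modules are indeed missing from the release, one possible workaround (an assumption, not an official fix) is to simply drop the two imports from c2nl/eval/__init__.py, provided nothing else relies on them:

# c2nl/eval/__init__.py  (suggested workaround, assuming the two modules are unused)
# from .ltorank import *      # module not shipped; comment out or remove
# from .squad_eval import *   # module not shipped; comment out or remove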

Question about parameters used in the process of calculating meteor score.

"""""
Hi, the problem looks related to METEOR computation which is perhaps related to the related package installation. I would suggest to turn everything off related to METEOR calculation and see if the program runs fine in your environment. You may compute METEOR locally later on, on the generated summaries!

One alternative is to use the METEOR from the NLTK library (https://www.nltk.org/_modules/nltk/translate/meteor_score.html).

Originally posted by @wasiahmad in
#14 (comment)

""""""

Thanks for your answer.

It worked. However, I wonder what the parameters (alpha, beta, gamma) should be set to when calculating the METEOR score locally with the NLTK library you recommended.

Thank you for reading!
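
For reference, NLTK's meteor_score exposes those three parameters explicitly, and its defaults are alpha=0.9, beta=3, and gamma=0.5. A minimal usage sketch (assuming a recent NLTK with the WordNet data downloaded and pre-tokenized inputs):

import nltk
from nltk.translate.meteor_score import meteor_score

# One-time setup: nltk.download("wordnet")
reference = "returns a 0 - based depth within the object graph".split()
hypothesis = "returns a zero based depth within the object graph".split()

# alpha, beta, gamma default to 0.9, 3, 0.5 and can be overridden here if needed.
score = meteor_score([reference], hypothesis, alpha=0.9, beta=3, gamma=0.5)
print(round(score, 4))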

GPU mode?

  • I ran it on 2, 4, and 6 GPUs on one node, but I don't understand why it runs slower with more GPUs.

2 GPUs: 2.9 it/s
1 GPU: 3.4 it/s

Summary produced in Spanish instead of English

Hi, I have 20,000 lines of Java code, with all variable/method names following standard conventions, in English. Yet, some of the method summaries produced come out in Spanish, and I don't know why. For example:

validateAssetSerialNumber :::: persists a contentelement el contenido , serialize ldap
isAllAssetSerialNumberValid :::: metodo que contenido de serialize
setStausCompleteForAutoRcuRef :::: metodo donates work of information_schema el violation

Can someone please give me some idea of why this might be happening?

Problem about the symbol in python data.

Hi~ Your work is really great!
I learned from your paper that you preprocessed the original dataset with snake_case and CamelCase splitting, and I also found the .py file you used to implement these two approaches. But I noticed that, compared to code.original, the code in code.original_subtoken does not have symbols like ':', ',', '=', '(', ')', yet it still has some symbols like '{', '[' and so on. Could you please share the scripts, or the strategy you used, to process those symbols, indentation, and whitespace?
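
No official preprocessing script is reproduced here, but for illustration, subtoken splitting along the lines described in the paper can be approximated as below; the exact punctuation and whitespace handling used for the released data may differ.

import re

def split_snake_case(token):
    # "one_based" -> ["one", "based"]
    return [t for t in token.split("_") if t]

def split_camel_case(token):
    # "currentDepth" -> ["current", "Depth"]; falls back to the token itself
    return re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", token) or [token]

def subtokenize(code):
    subtokens = []
    for tok in code.split():
        for part in split_snake_case(tok):
            subtokens.extend(split_camel_case(part))
    return subtokens

print(subtokenize("private int currentDepth ( ) { return one_based - NUM ; }"))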

Questions about javadoc generation

Hello, could you please share your scripts for pre-processing the Java comments? I tried to do it myself, but my resulting sentences look different from yours.

an error with the index_select in beam.py

Hi,

when I run your model, I run into the following error. I would really appreciate it if you could let me know how I can fix this error.

Traceback (most recent call last):
File "../../main/test.py", line 474, in
main(args)
File "../../main/test.py", line 436, in main
validate_official(args, dev_loader, model)
File "../../main/test.py", line 280, in validate_official
ret = translator.translate_batch(batch_inputs)
File "/scratch/ex-fhendija-1/ramin/Baseline/NeuralCodeSum/c2nl/translator/translator.py", line 241, in translate_batch
b.advance(out[:, j],
File "/scratch/ex-fhendija-1/ramin/Baseline/NeuralCodeSum/c2nl/translator/beam.py", line 134, in advance
self.attn.append(attn_out.index_select(0, prev_k))
RuntimeError: index_select(): Expected dtype int64 for index
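
Not an authoritative fix, but this error usually means the beam indices have become a floating-point tensor (in newer PyTorch releases, "/" between integer tensors returns floats). The standalone toy example below illustrates the dtype issue and the usual workaround of floor division plus an int64 cast; in beam.py the analogous change would be on the line computing prevK.

import torch

# Illustration only (not the repository's code): index_select() rejects float indices.
scores = torch.tensor([5, 9, 3])
num_words = 4
bad_index = torch.tensor([9]) / num_words               # float tensor -> index_select fails
good_index = (torch.tensor([9]) // num_words).long()    # int64 tensor -> works
print(scores.index_select(0, good_index))                # tensor([3])
# In beam.py the analogous change is roughly: prevK = bestScoresId // numWords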

Extract the attention weights

Hi,
first of all thanks to @wasiahmad for sharing the code and for the support provided.
I am trying to extract the attention weights from your model (just to observe them and store them in a file). At the moment I am able to get them in the from_batch function of the TranslationBuilder object, as follows.
https://github.com/wasiahmad/NeuralCodeSum/blob/master/c2nl/translator/translation.py#L49

    def from_batch(self, translation_batch, src_raw, targets, src_vocabs):
        batch_size = len(translation_batch["predictions"])
        preds = translation_batch["predictions"]
        pred_score = translation_batch["scores"]
        attn = translation_batch["attention"]

        translations = []
        attentions = [] # CHANGE
        for b in range(batch_size):
            src_vocab = src_vocabs[b] if src_vocabs else None
            pred_sents = [self._build_target_tokens(
                src_vocab, src_raw[b],
                preds[b][n], attn[b][n])
                for n in range(self.n_best)]
            translation = Translation(targets[b], pred_sents,
                                      attn[b], pred_score[b])
            translations.append(translation)
            attentions.append(attn[b]) # CHANGE

        return translations, attentions # CHANGE

Then I propagate them back to the caller until I can manipulate them in test.py and write them to a file.
If there is a better way, I would be grateful if you could share it with me.
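
For the file-writing part, one simple option (a sketch under the assumption that attentions is the list returned by the modified from_batch above) is to move the tensors to the CPU and persist them with torch.save:

import torch

def save_attentions(attentions, path="attentions.pt"):
    # Move everything to CPU so the file can be loaded later without a GPU.
    cpu_attn = [a.cpu() if torch.is_tensor(a) else [t.cpu() for t in a]
                for a in attentions]
    torch.save(cpu_attn, path)

# Later: attentions = torch.load("attentions.pt")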

QUESTION A:
Reading the paper and the arguments, it seems that copy attention is involved. I wanted to know whether the weights I extract are:

  1. copy attention weights(used to copy tokens directly to the output)
  2. attention weights used during prediction
  3. a mix of the two. In this case I would be curious to know how exactly, in particular with reference to the code, because I read the referenced work "Get To The Point: Summarization with Pointer-Generator Networks" (http://arxiv.org/abs/1704.04368) and they say: "We recycle the attention distribution to serve as the copy distribution"; is this the case for you too?

QUESTION B:
In case the extracted one is the copy attention, is it possible to somehow extract an attention that represents the self-attention of the Transformer architecture?

Thanks in advance, I wish you a happy and productive day,

Matteo

RuntimeError: "index_select_out_cuda_impl" not implemented for 'Float While comment generation testing the code

============Generating (Beam)============
05/02/2021 10:06:46 AM: [ COMMAND: ../main/test.py --only_generate True --data_workers 5 --dataset_name java --data_dir ../data/ --model_dir ../tmp --model_name code2jdoc --dev_src sample.code --uncase True --max_examples -1 --max_src_len 150 --max_tgt_len 50 --test_batch_size 64 --beam_size 4 --n_best 1 --block_ngram_repeat 3 --stepwise_penalty False --coverage_penalty none --length_penalty none --beta 0 --gamma 0 --replace_unk ]
05/02/2021 10:06:46 AM: [ ---------------------------------------------------------------------------------------------------- ]
05/02/2021 10:06:46 AM: [ Load and process data files ]
100% 20/20 [00:00<00:00, 76538.39it/s]
100% 20/20 [00:00<00:00, 12735.10it/s]
05/02/2021 10:06:46 AM: [ Num dev examples = 20 ]
05/02/2021 10:06:46 AM: [ ---------------------------------------------------------------------------------------------------- ]
05/02/2021 10:06:46 AM: [ Loading model ../tmp/code2jdoc.mdl ]
05/02/2021 10:06:51 AM: [ ---------------------------------------------------------------------------------------------------- ]
05/02/2021 10:06:51 AM: [ Make data loaders ]
05/02/2021 10:06:51 AM: [ ---------------------------------------------------------------------------------------------------- ]
05/02/2021 10:06:51 AM: [ CONFIG:
{
"attn_type": "general",
"beam_size": 4,
"beta": 0.0,
"bidirection": true,
"block_ngram_repeat": 3,
"char_emsize": 16,
"code_tag_type": "subtoken",
"conditional_decoding": false,
"copy_attn": false,
"coverage_attn": false,
"coverage_penalty": "none",
"cuda": true,
"d_ff": 2048,
"d_k": 64,
"d_v": 64,
"data_dir": "../data/",
"data_workers": 5,
"dataset_name": [
"java"
],
"dev_src": [
"sample.code"
],
"dev_src_files": [
"../data/java/sample.code"
],
"dev_src_tag": null,
"dev_src_tag_files": [
null
],
"dev_tgt": null,
"dev_tgt_files": [
null
],
"dropout": 0.2,
"dropout_emb": 0.2,
"dropout_rnn": 0.2,
"early_stop": 5,
"emsize": 300,
"filter_size": 5,
"fix_embeddings": true,
"force_copy": false,
"gamma": 0.0,
"grad_clipping": 5.0,
"ignore_when_blocking": [],
"layer_wise_attn": false,
"learning_rate": 0.001,
"length_penalty": "none",
"log_file": "../tmp/code2jdoc_beam.txt",
"lr_decay": 0.99,
"max_characters_per_token": 30,
"max_examples": -1,
"max_relative_pos": 0,
"max_src_len": 150,
"max_tgt_len": 50,
"model_dir": "../tmp",
"model_file": "../tmp/code2jdoc.mdl",
"model_name": "code2jdoc",
"model_type": "rnn",
"momentum": 0,
"n_best": 1,
"n_characters": 260,
"nfilters": 100,
"nhid": 200,
"nlayers": 2,
"num_head": 8,
"only_generate": true,
"optimizer": "adam",
"parallel": false,
"pred_file": "../tmp/code2jdoc_beam.json",
"random_seed": 1013,
"reload_decoder_state": null,
"replace_unk": true,
"reuse_copy_attn": false,
"review_attn": false,
"rnn_type": "LSTM",
"share_decoder_embeddings": false,
"sort_by_len": true,
"split_decoder": false,
"src_pos_emb": true,
"stepwise_penalty": false,
"test_batch_size": 64,
"tgt_pos_emb": true,
"trans_drop": 0.2,
"uncase": true,
"use_all_enc_layers": false,
"use_code_type": false,
"use_neg_dist": true,
"use_src_char": false,
"use_src_word": true,
"use_tgt_char": false,
"use_tgt_word": true,
"verbose": false,
"warmup_epochs": 0,
"warmup_steps": 10000,
"weight_decay": 0
} ]
0% 0/1 [00:00<?, ?it/s]tensor([0.7018, 0.7952, 0.4606, 0.9982], device='cuda:0')
Traceback (most recent call last):
File "../main/test.py", line 474, in
main(args)
File "../main/test.py", line 436, in main
validate_official(args, dev_loader, model)
File "../main/test.py", line 280, in validate_official
ret = translator.translate_batch(batch_inputs)
File "/content/drive/My Drive/notebooks/class_doc_gen/class_comment_gen/method_doc_gen/transformer_based_work/NeuralCodeSum/c2nl/translator/translator.py", line 242, in translate_batch
beam_attn.data[:, j, :memory_lengths[j]])
File "/content/drive/My Drive/notebooks/class_doc_gen/class_comment_gen/method_doc_gen/transformer_based_work/NeuralCodeSum/c2nl/translator/beam.py", line 135, in advance
ttt = attn_out.index_select(0, prev_k)
RuntimeError: "index_select_out_cuda_impl" not implemented for 'Float'
0% 0/1 [00:00<?, ?it/s]

Model is not learning and is not generating meaningful output

Hi Wasi,

The paper is very insightful and I want to say thank you for the effort and the finding, first and foremost!

I have tested the code in two environments, Python 3.9 + PyTorch 1.11 and Python 3.7 + PyTorch 1.5.0, and I didn't make any changes other than applying the method mentioned in this thread to resolve the error (RuntimeError: "index_select_out_cuda_impl" not implemented for 'Float' while testing comment generation) during testing.

the suggested changes that I made:

(beam.py line:131 prevK = bestScoresId / numWords). You can modify this line to prevK = bestScoresId // numWords

Model not learning during training process:

I have tried to reproduce the transformer with copy attention result from the paper with the provided dataset, however, both training processes did not give meaningful result:

For python 3.9 + Pytorch 1.11
snippet of the training process looks like:

01/13/2023 06:47:30 PM: [ dev valid official: Epoch = 70 | bleu = 2.30 | rouge_l = 3.62 | Precision = 2.52 | Recall = 7.44 | F1 = 3.33 | examples = 8714 | valid time = 164.42 (s) ]
01/13/2023 06:52:14 PM: [ train: Epoch 71 | perplexity = 404.95 | ml_loss = 105.54 | Time for epoch = 283.49 (s) ]
01/13/2023 06:54:59 PM: [ dev valid official: Epoch = 71 | bleu = 2.30 | rouge_l = 3.62 | Precision = 2.52 | Recall = 7.44 | F1 = 3.33 | examples = 8714 | valid time = 164.19 (s) ]

The above is the last epoch; the metric scores are not far from the epoch-1 scores below, and ml_loss does not show a significant drop:

01/13/2023 10:32:20 AM: [ train: Epoch 1 | perplexity = 12872.61 | ml_loss = 164.32 | Time for epoch = 258.82 (s) ]
01/13/2023 10:35:09 AM: [ dev valid official: Epoch = 1 | bleu = 2.05 | rouge_l = 0.86 | Precision = 0.58 | Recall = 1.92 | F1 = 0.79 | examples = 8714 | valid time = 167.51 (s) ]
01/13/2023 10:35:09 AM: [ Best valid: bleu = 2.05 (epoch 1, 2179 updates) ]

And for python 3.7 + Pytorch 1.5.0
the ml_loss and perplexity are both nan, and net_loss contains nan for both ml_loss and perplexity, across all epochs.

Meanwhile, the training logs from the previous thread, which achieved results close to the paper's, look something like this:
first epoch:

06/12/2020 03:56:03 PM: [ train: Epoch 1 | perplexity = 22026.47 | ml_loss = 452.92 | Time for epoch = 406.76 (s) ]
06/12/2020 03:58:45 PM: [ dev valid official: Epoch = 1 | bleu = 3.31 | rouge_l = 5.53 | Precision = 4.84 | Recall = 11.51 | F1 = 5.52 | examples = 8714 | valid time = 158.37 (s) ]

last epoch:

06/20/2020 08:02:14 AM: [ train: Epoch 122 | perplexity = 2.17 | ml_loss = 15.94 | Time for epoch = 1169.03 (s) ]
06/20/2020 08:08:00 AM: [ dev valid official: Epoch = 122 | bleu = 40.50 | rouge_l = 51.66 | Precision = 58.31 | Recall = 54.11 | F1 = 53.57 | examples = 8714 | valid time = 342.41 (s) ]

Saved model does not output meaningful result

Last but not least, I have also tried using the trained model provided in the previous thread
together with generate.sh to generate predictions for sample.code.
However, the result looks something like this:

"0": [
"( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ("
],
"1": [
", , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,"
],

My guess is that there might be a bug in the decoder base that causes this issue, because both the trained Transformer and the trained RNN suffer from the same problem when I run generate.sh.

Any hint on what might be the problem and where to look to resolve this issue?

Many thanks in advance!

Getting the embedding for the sample code

Hi,

First of all, thank you so much for not only sharing this but promptly responding to queries. It has helped me a lot.

I just have a quick question. Could you possibly point me to where I could retrieve the actual embeddings that generate the final natural language output? I am combing through the codebase as we speak, but if you know, it would save a ton of time.

Thank you!

Error on the file generation

Hi, I am a master's student venturing into the code summarization field.
I wanted to try the code using this command: bash generate.sh 0 code2jdoc sample.code
but an OSError was raised:
Traceback (most recent call last):
File "../main/test.py", line 474, in
main(args)
File "../main/test.py", line 400, in main
raise IOError('No such file: %s' % args.model_file)
OSError: No such file: ../tmp/code2jdoc.mdl

Some of the problems

Dear Professor, thank you for reading! When I run the code following the steps, I run into some problems. Can you help me?
The problem is:
OSError: No such file: /home/zxq/code/NeuralCodeSum/data/java/train/code.original_subtoken
and when I run bash get_data.sh,
I get: FileNotFoundError: No such file or directory: 'train/code.original_subtoken'

For now, I am working around this by downloading the dataset manually.

Split decoder

Hi, is the split decoder part implemented? I tried your code with the argument args.split_decoder set to True and got this error:

Epoch = 1 [perplexity = x.xx, ml_loss = x.xx]: 0% 0/939 [00:00<?, ?it/s]Traceback (most recent call last):
File "../../main/train.py", line 708, in
main(args)
File "../../main/train.py", line 653, in main
train(args, train_loader, model, stats)
File "../../main/train.py", line 283, in train
net_loss = model.update(ex)
File ".../model.py", line 173, in update
example_weights=ex_weights)
File ".../module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File ".../transformer.py", line 435, in forward
**kwargs)
File ".../transformer.py", line 363, in _run_forward_ml
summ_emb)
File ".../module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "...transformer.py", line 295, in forward
return self.decode(tgt_pad_mask, tgt_emb, memory_bank, state)
File "...transformer.py", line 273, in decode
f_t = self.fusion_sigmoid(torch.cat([copier_out, dec_out], dim=-1))
TypeError: expected Tensor as element 0 in argument 0, but got list
Epoch = 1 [perplexity = x.xx, ml_loss = x.xx]: 0% 0/939 [00:01<?, ?it/s]

Should we use torch.stack()? Thanks in advance.
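
For what it's worth, a standalone illustration of the error (not the repository's fix): torch.cat and torch.stack both require every element to be a tensor, so if copier_out arrives as a list of per-layer outputs it has to be reduced to a single tensor before the concatenation.

import torch

dec_out = torch.randn(2, 5, 8)
copier_out = [torch.randn(2, 5, 8) for _ in range(6)]    # e.g. one tensor per decoder layer

# torch.cat([copier_out, dec_out], dim=-1)  # TypeError: expected Tensor, got list
fused = torch.cat([copier_out[-1], dec_out], dim=-1)      # use only the last layer's output
fused_all = torch.cat([torch.stack(copier_out).mean(dim=0), dec_out], dim=-1)
print(fused.shape, fused_all.shape)   # torch.Size([2, 5, 16]) torch.Size([2, 5, 16])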

Questions about the python and java datasets.

Hi @wasiahmad ,

The input to the model in A Transformer-based Approach for Source Code Summarization is a sequence of tokens, but the input to my model is an abstract syntax tree (AST). I need to find the original source code (an executable source code snippet) corresponding to each token sequence and then parse it into an AST.

I have downloaded the data from their original work, but I found that the size of the dataset used in your paper is different from the size of their original dataset. For example, in the train set of the python dataset, the original size exceeds 100,000, while yours is about 50,000.

I want to compare against your model, so I selected the experiment dataset provided with your paper.
Since the token sequences cannot be parsed into ASTs, I need to find the corresponding original source code from their original work.
image

Unfortunately, I cannot find the original source code for all of the token sequences.
If you could provide me with the corresponding original code files (the sizes of your experiment datasets are inconsistent with the original datasets), I believe I can convert them to ASTs and compare my experimental results with yours.

Thank you.

Test this code

Hello, I want to test and run this code.
Could you please send me the steps to run it?
This will help me in my graduation project.
Best regards and thanks in advance.

Potential Bug iterator Pytorch

Hi @wasiahmad

I noticed a potential bug while processing a small testing dataset:

  • testing dataset = 78 functions
  • batch size = 64

At this location:
https://github.com/wasiahmad/NeuralCodeSum/blob/master/main/test.py#L275

def validate_official(args, data_loader, model):
    """Run one full official validation. Uses exact spans and same
    exact match/F1 score computation as in the SQuAD script.
    Extra arguments:
        offsets: The character start/end indices for the tokens in each context.
        texts: Map of qid --> raw text of examples context (matches offsets).
        answers: Map of qid --> list of accepted answers.
    """

    eval_time = Timer()
    translator = build_translator(model, args)
    builder = TranslationBuilder(model.tgt_dict,
                                 n_best=args.n_best,
                                 replace_unk=args.replace_unk)

    # Run through examples
    examples = 0
    trans_dict, sources = dict(), dict()
    with torch.no_grad():
        pbar = tqdm(data_loader)
        for batch_no, ex in enumerate(pbar):
            batch_size = ex['batch_size']  # POTENTIAL BUG
            ids = list(range(batch_no * batch_size,
                             (batch_no * batch_size) + batch_size))
            batch_inputs = prepare_batch(ex, model)

Here we compute the ids based on batch_size, and to compute the correct ids we assume that batch_size is constant. Unfortunately, I noticed with pdb that the batch size coming from the data_loader (PyTorch code) is only constant until the last batch. So, for example, if the dataset has 78 records, the first batch_size is 64 but the second and last is 14, and this messes up the writing of the results: it saves these last 14 predictions to ids that have already been assigned to other records.

Maybe it is a particularity of my pytorch version:
pytorch=1.5.1=py3.6_cuda10.1.243_cudnn7.6.3_0

Anyway there is a quick fix:

def validate_official(args, data_loader, model, batch_size):

where batch_size is taken from args.batch_size and stays constant throughout the entire validation procedure.
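
An alternative sketch of the same idea (a standalone toy example, not a patch against the repository) is to derive the ids from a running offset, so a shorter final batch cannot reuse ids from earlier batches:

def assign_ids(batch_sizes):
    # Assign consecutive, non-overlapping example ids across batches.
    ids_per_batch, offset = [], 0
    for size in batch_sizes:          # size may shrink on the last batch
        ids_per_batch.append(list(range(offset, offset + size)))
        offset += size
    return ids_per_batch

# 78 examples with batch size 64 -> batches of 64 and 14 with non-overlapping ids.
print(assign_ids([64, 14])[1][:3])    # [64, 65, 66], not [14, 15, 16]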

Luckily, this bug is relevant only when the test dataset is small and can invalidate at most 64 records, but on small test sets it is a real problem.

@wasiahmad, let me know if it is clear how to reproduce the problem. I am curious whether you can confirm the presence of the bug or whether something is missing in my reasoning.

Thanks in advance,

Matteo

Upload Pre-Trained model weights

Hi, would it be possible to upload the models you trained for the published paper? I would like to test the code summary generation without having to train a model from scratch, which is quite time- and resource-intensive.

Project dependencies may have API risk issues

Hi, in NeuralCodeSum, inappropriate dependency version constraints can introduce risks.

Below are the dependencies and version constraints that the project is using:

numpy
tqdm
nltk
prettytable
torch>=1.3.0

The version constraint == introduces a risk of dependency conflicts because the dependency scope is too strict.
The version constraints 'no upper bound' and '*' introduce a risk of missing-API errors because the latest versions of the dependencies may remove some APIs.

After further analysis, in this project:
The version constraint of the dependency tqdm can be changed to >=4.36.0,<=4.64.0.
The version constraint of the dependency prettytable can be changed to >=0.6,<=1.0.1.

The above suggestions reduce dependency conflicts as much as possible
while introducing the latest versions as much as possible without causing API errors in the project.

The project's invocations include all of the following methods.

The calling methods from tqdm:
tqdm.tqdm
tqdm.tqdm.set_description
The calling methods from prettytable:
prettytable.PrettyTable
prettytable.PrettyTable.add_row
All other calling methods:
os.path.isfile
i.code_words.size
result.F.relu.view
pad_indices.extend
lengths.tolist
self.named_parameters
self.scores.unsqueeze.expand_as.view
target.view.size
i.self.layer
c2nl.utils.misc.relative_matmul
size.self.tt.FloatTensor.zero_
c2nl.inputters.timer.AverageMeter.update
self.Decoder.super.__init__
torch.nn.ReLU
self.proj
nltk.translate.bleu_score.corpus_bleu
torch.nn.Sigmoid
word_dict.word_to_char_ids
c2nl.utils.misc.tens2sen
torch.matmul
torch.log.squeeze
c2nl.modules.char_embedding.CharEmbedding
torch.min
m.view
b.get_current_origin
target.view.scores.gather.view
wq.expand.expand
nltk.stem.PorterStemmer.stem
s.decode
torch.utils.data.sampler.RandomSampler
self.decode
inp.masked_fill.t
c2nl.modules.util_class.Elementwise
self.offset.align.view.scores.gather.view
BleuScorer
rnn_output.contiguous
self.RNNEncoder.super.__init__
i.code_mask.size
line.rstrip
torch.cat.gt
dev_exs.extend
pos_enc.cuda.expand
os.environ.copy
self.tanh
tokenizer.Tokens
wq.expand.view
self.words
self.type_embeddings
target.lower.split
nltk.translate.bleu_score.SmoothingFunction
cov.clone
self.attn.parameters
self.meteor_p.stdin.write
PositionalEncoding
self.layer_weights
c2nl.inputters.dataset.CommentDataset
scores.append
self.register_buffer
size.self.tt.LongTensor.fill_
b.prediction.index_select
self.transformer_c.init_state
w.word_dict.word_to_char_ids.tolist
sent_states.data.index_select
self._build_target_tokens
self.drop
summ_len.cuda.cuda
c2nl.inputters.constants.UNK.align.eq.float
torch.nn.Tanh
self.score.size
kwargs.get
num.decode.split
char_emb.conv.transpose
self.next_ys.append
code_chars.size
torch.nn.Embedding
logging.StreamHandler
torch.nn.Softmax
batch.x.view.transpose
multiplier.cuda.size
uh.expand.expand
self.dropout.size
res.keys
self._regexp.finditer
setuptools.setup
source.size
self.softmax
init_from_scratch
rvar
self.meteor_p.stdout.readline
fill.append
code_mask_rep.cuda.cuda
IOError
self.network.load_state_dict
alignment.contiguous
torch.FloatTensor
warnings.warn
torch.nn.functional.softmax.size
tgt_pad_mask.unsqueeze.unsqueeze
self.relative_positions_embeddings_k
c2nl.inputters.vocabulary.Vocabulary.add
self._convert_word_to_char_ids
torch.cat
bx.b.out.index_fill_
collections.OrderedDict
old_args.keys
self.RNNDecoderBase.super.__init__
lengths.max
self.MultiHeadedAttention.super.__init__
logging.getLogger
i.summ_words.size.i.summ_word_rep.copy_
out.log.squeeze
multiplier.unsqueeze.expand
tqdm.tqdm
length.range_vec.unsqueeze.expand.transpose.transpose
id.hypotheses.split
i.attn.max
i.predictions.lower
os.path.dirname
self.TransformerEncoder.super.__init__
torch.nn.Linear
code_word_rep.cuda.cuda
getattr
tgt.tgt_chars.torch.Tensor.to.unsqueeze.tolist
beam_scores.view.topk
RuntimeError
torch.abs
emb_dims.extend
torch.tensor
print
torch.Tensor
cov.clone.fill_.cov.torch.min.log.sum
length.torch.arange.unsqueeze
self.optimizer.state_dict
self.slice
torch.cuda.is_available
self.score
args.vars.items
self.src_pos_embeddings
torch.exp.size
x_tz_matmul.reshape.permute
c2nl.modules.copy_generator.CopyGeneratorCriterion
c2nl.inputters.constants.UNK.target.ne.float
torch.bmm
self.__tens2sent
self.rnns.parameters
cov.dim
self.transformer_c.count_parameters
copy.copy
decoder.init_decoder
self.calc_score
scores.self.softmax.to
fn
net_loss.mean.backward
self.tanh.size
p.numel
copy_score.data.masked_fill_
generator.forward
self.network.eval
unshape
source_maps.append
self._single_reflen
hyps.append
acc_dec_outs.append
self.next_ys.size
lengths.lengths.device.max_len.torch.arange.type_as.repeat
self._run_forward_pass
rvar.size
b.advance
unicodedata.normalize
c2nl.eval.bleu.bleu_scorer.BleuScorer
c2nl.config.override_model_args
TransformerEncoderLayer
src.size
beam.scores.add_
prec.append
stem
torch.gt
scores.masked_fill.float
torch.exp.div
self.dropout_2
self.entities
self.embedder
tokenize_with_camel_case
self.dropout
out.log.log
torch.ones
torch.log
args.dev_tgt_files.append
split.unique_javadoc_tokens.update
alignment.cuda.cuda
v.cuda
prettytable.PrettyTable.add_row
self.global_scorer.update_global_state
numpy.mean
c2nl.config.add_model_args
perm.x.permute.contiguous.view
self.tanh.view
read_data
encoder_final.unsqueeze.expand
target.view.scores.gather.view.mul
next
memory_bank.size
x.mean
target.view.view
self.crefs.extend
torch.ones_like
w.self.model.tgt_dict.word_to_char_ids.tolist
t.size
rec.append
torch.stack.squeeze
tgt_len.i.tgt_tensor.copy_
numpy.random.seed
white_space_fix
self._bridge
scores.contiguous.size
blank_arr.append
self.VecEmbedding.super.__init__
eval
self.linear_in.view
new_args.keys
self.model.network.eval
inp_chars.torch.Tensor.to
self.decoder.init_decoder_state
logging.FileHandler.setFormatter
char_emb.transpose.transpose
self.sigmoid.expand_as
c2nl.config.get_model_args
c2nl.utils.copy_utils.replace_unknown
torch.nn.utils.rnn.pad_packed_sequence
e.size
self.decoder
c2nl.inputters.constants.PAD.target.ne.float
none.unsqueeze.unsqueeze
c2nl.encoders.transformer.TransformerEncoder
self.dropout_1
i.batch.size
dec_log_probs.append
enc_dec_attn.mean.mean
code_char_rep.cuda.cuda
output.self.layer_weights.squeeze
self.retest
self.Transformer.super.__init__
validate_official
h_t_.view.size
math.exp
self.relu
c2nl.objects.Summary.append_token
model.network.count_encoder_parameters
torch.max.view
self.score_ratio
tokens.append
math.sqrt.size
argparse.ArgumentParser.parse_args
copy_generator.forward
torch.nn.functional.sigmoid
f.read.strip
references.keys
mask.unsqueeze.unsqueeze
h_t_.view.view
reference.split
c2nl.inputters.timer.Timer
self.transformer_d.init_state
c2nl.encoders.rnn_encoder.RNNEncoder
collections.defaultdict.items
c2nl.translator.penalties.PenaltyBuilder
self.length_penalty
_skip
torch.sort
NotImplementedError
self.reinforce.sample
inp.inp_chars.torch.Tensor.to.unsqueeze
code_len.cuda.cuda
self.intermediate
dict.word_to_char_ids
prediction.split
attr.lower.split
json.dumps
b.prediction.index_add_
tgt.squeeze.clone
tgt_size.data.len.torch.zeros.long
self._init_cache
self.src_vocab.add_tokens
ground_truth.normalize_answer.split
src.strip
multiplier.cuda.cuda
abs
cook_refs
states.view
dict.byte
perm.x.permute.contiguous
self._tokens.append
c2nl.decoders.transformer.TransformerDecoder
argparse.ArgumentParser.add_argument_group
beam.global_state.beam.attn.torch.min.sum
bx.b.out.index_select
setuptools.find_packages
self.feed_forward
ml_loss.sum.sum
self.LayerNorm.super.__init__
inputs.size
t.max
join
b.get_hyp
self.prev_ks.append
torch.stack.transpose
torch.nn.functional.softmax.squeeze
w.params.word_to_char_ids.tolist
c2nl.inputters.dataset.CommentDataset.lengths
emb.size
parameters.numel
self.embedder.size
self._validate
beam.scores.clone.fill_
int
_.detach
TAG_TYPE_MAP.get
dec_out.squeeze
s.split
_get_ngrams
self._bridge.append
summ_word_rep.cuda.cuda
words.torch.Tensor.type_as
fill_b.cuda.cuda
a.repeat
tgt_seq.contiguous.ne
nltk.translate.bleu
self.Seq2seq.super.__init__
sys.stderr.write
args.train_tgt_files.append
self.transformer
i.matches.span
tmp.mul.log
max_index.item
c2nl.inputters.utils.load_data
subprocess.call
target.view.ne
tgt.tgt_chars.torch.Tensor.to.unsqueeze
layer
args.train_src_tag_files.append
torch.arange.unsqueeze
self.GlobalAttention.super.__init__
self.src_highway_net
ValueError
self._pen_is_none
align.view.ne
inp.masked_fill.gt
self.CopyGenerator.super.__init__
args.dev_src_files.append
self.network
args.dev_src_tag_files.append
self.network.register_buffer
c2nl.inputters.vocabulary.UnicodeCharsVocabulary
self.fusion_sigmoid
hypothesis_str.replace.replace.replace
i.self.rnns
attn.append
self.CodeTokenizer.super.__init__
torch.utils.data.DataLoader
self.PositionalEncoding.super.__init__
c2nl.inputters.vector.vectorize
torch.nn.utils.clip_grad_norm_
zip
load_words.update
loss_per_token.item.item
self._stat
time.time
scores.view.gather
list
self.ctest.append
summ_chars.size
sequence.unsqueeze
decoder.init_decoder.beam_update
bottle_hidden
self.scores.unsqueeze.expand_as
self.ctest.extend
torch.LongTensor
encoder_final.unsqueeze.unsqueeze
torch.nn.Conv1d
self.cov_penalty
torch.clamp
load_words.add
x.std
subprocess.Popen
kwargs.byte
compute_eval_score
self.make_embedding._modules.values
new_layer_wise_coverage.append
lengths.device.max_len.torch.arange.type_as
bx.b.out.index_add_
ex.size
x.view
torch.nn.ModuleList
isinstance
emb.dim
self.optimizer.state.values
self._coverage_penalty
c2nl.inputters.timer.AverageMeter
args.dataset_weights.keys
torch.utils.data.sampler.SequentialSampler
map
tgt.gt.float
self.optimizer.load_state_dict
sequence.translate
collections.Counter.update
numpy.argsort
torch.nn.CrossEntropyLoss
blank_b.cuda.cuda
beam.global_state.keys
layer.chunk
c2nl.utils.misc.aeq
summary.vectorize
words.unsqueeze.cuda
self.rnn.parameters
self.linear
memory_bank.batch_size.torch.Tensor.type_as.long.fill_.repeat
state.items
examples.append
tgt_seq.contiguous
align.view.view
copy_info.cpu.numpy
scores.contiguous.contiguous
source_tag.split
src_map.size
self.copier
set.add
state.update_state
source.dim
self.opts.get
i.matches.group
self.Embedder.super.__init__
c2nl.inputters.dataset.SortedBatchSampler
self.reset
torch.stack
refmaxcounts.get
parser.register
tgt.tgt_chars.torch.Tensor.to.unsqueeze.to
self.rnns.append
b.prediction.index_fill_
self.dropout.squeeze
self.network.cuda
tgt.size
self.word_to_char_ids
code_mask_rep.repeat.byte
self.make_embedding
linear
tgt_chars.tolist.torch.Tensor.unsqueeze
f.read
regex.compile
self.compatible
c2nl.inputters.timer.Timer.time
self.tt.LongTensor
line.rstrip.split
self._tokens.insert
torch.exp
x.permute.reshape
summ_len.float.ml_loss.div.mean
ex_weights.cuda.cuda
inputs.split
ml_loss.sum.div
len
self.relative_positions_embeddings_v
attr.lower
split.unique_function_tokens.update
self.normalize
eval_accuracies
self.network.parameters
align.view.eq
c2nl.modules.embeddings.Embeddings
self.tgt_pos_embeddings
self.TransformerEncoderLayer.super.__init__
self._single_reflen.append
src.strip.split
c2nl.decoders.rnn_decoder.RNNDecoder
c2nl.inputters.constants.UNK.target.eq.float
self.meteor_p.wait
summ_len.float
dim.head_count.batch_size.x.view.transpose
self.fscore
torch.ones_like.unsqueeze
enumerate
coverage.dim
dec
beam.prev_ks.beam.global_state.index_select.add
c2nl.objects.Summary
tqdm.tqdm.set_description
self.rnn
bx.blank.torch.Tensor.to
self.src_word_embeddings
i.summ_chars.size
future_mask.triu_.view.triu_
self.Highway.super.__init__
c2nl.utils.misc.generate_relative_positions_matrix
collections.Counter
numpy.zeros
decoder.init_decoder.repeat_beam_size_times
code_len.size
memory_bank.batch_size.torch.Tensor.type_as.long.fill_
attn_out.index_select
kwargs.byte.unsqueeze
code_type_rep.cuda.cuda
torch.optim.Adam
pos_enc.cuda.cuda
self.Embeddings.super.__init__
logging.warning
self.tt.FloatTensor
float
token.split
init_from_scratch.init_optimizer
input_dim.layer.bias.data.fill_
parser.add_argument_group
compute_bleu
self.word_lut.weight.data.copy_
dict
self.output
argparse.ArgumentParser.register
self.optimizer.step
self.attention
layer_scores.unsqueeze.output.transpose.torch.matmul.squeeze
q.size
self.network.cpu
c2nl.utils.copy_utils.align
maxcounts.get
str
c2nl.modules.multi_head_attn.MultiHeadedAttention
_insert
c2nl.models.seq2seq.Seq2seq
out.shape.lengths.sequence_mask.unsqueeze
c2nl.utils.misc.generate_relative_positions_matrix.to
new_test.self.retest.compute_score
encoder_final.unsqueeze.size
d.size
unbottle
enc
key.hypotheses.split
Decoder
i.code_chars.size
Code2NaturalLanguage
c2nl.inputters.vocabulary.Vocabulary.normalize
prediction.normalize_answer.split
torch.save
max_len.torch.arange.unsqueeze
range_vec.unsqueeze.expand
attn_copy.data.masked_fill_
enc_dec_attn.mean.dim
cov.clone.fill_
self.context_attn
e.view
self.transformer.count_parameters
conv
sys.path.append
init_from_scratch.parallelize
self.src_char_embeddings
self.copy_generator
self._check_args
self.linear_query
copy.copy.pop
logging.getLogger.setLevel
vars
self.criterion
beam.global_state.index_select
self.tgt_highway_net
inp.masked_fill.masked_fill
sent_states.data.copy_
vocab_sizes.extend
inputs.view
torch.is_tensor
torch.nn.DataParallel
scores.contiguous.view
numpy.random.shuffle
reqs.strip.split
c2nl.modules.copy_generator.CopyGenerator
self.pe
snake_case_tokenized.extend
hyp.append
cov.size
main.model.Code2NaturalLanguage.load_checkpoint
module
gts.keys
decoder.decode
str.maketrans
torch.cos
self.src_vocab.remove
code_len.repeat
self.linear_context
c2nl.eval.bleu.bleu_scorer.BleuScorer.compute_score
h_s.contiguous.view
c2nl.modules.global_attention.GlobalAttention
main
train
source_map.cuda.cuda
scores.self.softmax.to.squeeze
self.key
multiplier.cuda.unsqueeze
dict.size
self.tok2ind.keys
torch.nn.functional.softmax
r.split
x.transpose
wt.item
self.UnicodeCharsVocabulary.super.__init__
mask.float.squeeze
c2nl.modules.util_class.LayerNorm
init_from_scratch.cuda
alignments.append
exp_score.div.sum
self.linear_out
self.v
torch.zeros
self.tgt_word_embeddings
args.train_src_files.append
self.init_decoder
set
sorted
self.sigmoid
self.network.embedder.tgt_word_embeddings.fix_word_lut
hasattr
self.embedder.squeeze
feat.squeeze
text.lower
b.sort_finished
self.get_hyp
torch.cat.append
tokenize_with_snake_case
count.batch.x.view.transpose.repeat.transpose.contiguous.view
scores.self.softmax.to.chunk
tgt_seq.contiguous.view
i.code_type.size
torch.nn.Sequential
state.state.sequence_mask.unsqueeze
i.code_chars.size.i.code_char_rep.copy_
self.TransformerDecoder.super.__init__
eval_score
load_words
model.network.layer_wise_parameters
torch.matmul.reshape
sum
summ_char_rep.cuda.cuda
Encoder
concat_c.self.linear_out.view
min
self._run_forward_ml
inp.t.contiguous.view
self._initialize_bridge
network.state_dict
self.decoder.decode
open.write
num.decode.split.decode
os.path.abspath
all
x.transpose.contiguous
parser.add_argument_group.add_argument
int.copy_info.cpu.numpy.astype.tolist
self.all_scores.append
self.copy_attn
translations.append
torch.nn.functional.relu
train_exs.extend
self.encoder
scores.view.size
target.view.eq
c2nl.eval.bleu.corpus_bleu
torch.nn.functional.softmax.unsqueeze
refs.append
my_lcs
count_file_lines
h_s.size
any
self.copy_attn.parameters
collections.Counter.most_common
m.group
dim.wquh.view.self.v.view
int.copy_info.cpu.numpy.astype.tolist.cpu
torch.cat.squeeze
self._length_penalty
c2nl.inputters.utils.build_word_and_char_dict
tgt_chars.torch.Tensor.to
out.log.size
torch.mul
i.code_type.size.i.code_type_rep.copy_
re.finditer
c2nl.objects.Summary.prepend_token
self.ratio
logging.getLogger.addHandler
self.layer.parameters
target.lower
self.linear_in
text.split
self.network.train
lower
c2nl.modules.position_ffn.PositionwiseFeedForward
collections.Counter.values
memory_bank.batch_size.torch.Tensor.type_as.long
attention_scores.append
torch.arange
hypotheses.keys
add_train_args
threading.Lock
tgt_seq.cuda.cuda
logging.getLogger.warning
logging.getLogger.info
torch.stack.append
fill_arr.append
self.parameters
net_loss.mean.item
ml_loss.sum.mul
torch.nn.Dropout
init_from_scratch.update
copy_info.cpu.numpy.astype
self.add
PositionalEncoding.unsqueeze
filter_fn
code_mask_rep.repeat.repeat
num.format.rstrip
max
code.vectorize
Translation
count.batch.x.view.transpose.repeat.transpose.contiguous
torch.cuda.device_count
states.size
load_words.append
torch.manual_seed
camel_case_tokenized.extend
b.get_current_state
h_s.contiguous
bx.fill.torch.Tensor.to
num.format.rstrip.rstrip
data.append
cov.clone.fill_.cov.torch.min.log
align.view.size
self.tok2ind.get
f
Code2NaturalLanguage.init_optimizer
exp_score.div.div
beam.scores.clone
attn.squeeze
self.finished.sort
self.TransformerDecoderLayer.super.__init__
ml_loss.sum.view
self.Encoder.super.__init__
numpy.random.random
isinstance.item
ml_loss.sum.mean
self.transformer_c
self.compute_score
_fix_enc_hidden
w.item
self.global_scorer.update_score
i.code_words.size.i.code_word_rep.copy_
source.split
tgt.strip.split
self.scores.unsqueeze
hypothesis_str.replace.replace
idx.start.self.slice.untokenize
multiplier.unsqueeze.unsqueeze.expand
self.network.embedder.src_word_embeddings.fix_word_lut
self.network.mean
self.embedding
logging.FileHandler
self.bridge.parameters
self.encoder.count_parameters
inp.masked_fill.tolist
self.crefs.append
range.append
inp.data.eq
self.word_vec_size.vocabulary.len.torch.FloatTensor.zero_
normalize_answer
hidden.size
c2nl.inputters.constants.UNK.align.ne.float
dict.values
beam.scores.sub_
embedder
hasattr.copy_attn
multiplier.unsqueeze.unsqueeze
c2nl.eval.meteor.Meteor.compute_score
self.form_src_vocab
tmp.mul.mul
self.softmax.size
tgt.tgt_chars.torch.Tensor.to.unsqueeze.repeat
self.linear_copy
logging.Formatter
sent.size
main.model.Code2NaturalLanguage.load
model.network.count_parameters
self.transformer_d.count_parameters
cov.clone.fill_.cov.torch.max.sum
scores.view.view
torch.optim.SGD
torch.load
torch.cat.size
e.data.repeat
batch_size.tgt_words.expand.unsqueeze
self.meteor_p.stdout.readline.strip
inp.t.contiguous
groups.append
self.query
self.generator
self.fusion_gate.squeeze
math.log
prettytable.PrettyTable
self.global_scorer.score
argparse.ArgumentParser
self._activation
future_mask.triu_.view
remove_punc
src_vocabs.append
beam.attn.sum
c2nl.models.transformer.Transformer
shape.transpose
atexit.unregister
k.bleu_list.append
memory_bank.size.memory_bank.size.emb.size.memory_bank.size.torch.zeros.type_as
self.PositionwiseFeedForward.super.__init__
max_len.torch.arange.unsqueeze.float
self.optimizer.zero_grad
tgt_words.data.eq
count.batch.x.view.transpose.repeat.transpose
x.transpose.contiguous.view
argparse.Namespace
attn.size
length.range_vec.unsqueeze.expand.transpose
math.sqrt
c2nl.translator.beam.Beam
line.strip
self.decoder.load_state_dict
uh.expand.view
range
scores.masked_fill.masked_fill
Embedder
c2nl.eval.meteor.Meteor
self._from_beam
init_from_scratch.checkpoint
atexit.register
os.path.join
params.byte.unsqueeze
self.meteor_p.stdout.readline.dec.strip
tgt_words.tgt_chars.to.unsqueeze
c2nl.inputters.vocabulary.Vocabulary
self.fusion_gate
self.model.tgt_dict.word_to_char_ids
word_probs.size
self.TEXT.t.lower
self.layer_norm_2
c2nl.decoders.state.RNNDecoderState
precook
candidate.split
model.network.count_decoder_parameters
main.model.Code2NaturalLanguage
words.torch.Tensor.type_as.unsqueeze
v.lower
rnn_output.size
perm.x.permute.contiguous.permute
self.meteor_p.kill
c2nl.modules.highway.Highway
uuid.uuid4
self.decoder.count_parameters
self.DecoderBase.super.__init__
self.layer_norm
s.encode
lengths.numel
tgt.strip
encoder
blank.append
words.unsqueeze.expand
torch.sin
batch_size.torch.Tensor.type_as
self.__generate_sequence
batch_size.lengths.lengths.device.max_len.torch.arange.type_as.repeat.lt
representations.append
c2nl.eval.rouge.Rouge
sentence.split
self.dropout.split
shape
self.attn.append
human_format
numpy.array
random.choice
self.ind2tok.items
self.ind2tok.get
self.Elementwise.super.__init__
self.CharEmbedding.super.__init__
torch.nn.Parameter
sequence.translate.strip
source.c.torch.cat.view
subprocess.check_output
word_rep.size.torch.arange.type
parser.add_argument
lengths.unsqueeze
tgt_pad_mask.unsqueeze.size
process_examples
super
collections.defaultdict
init_from_scratch.predict
open.close
self.finished.append
self.transformer.init_state
c2nl.objects.Code
code_mask_rep.byte.unsqueeze
nltk.stem.PorterStemmer
words.torch.Tensor.type_as.append
self.cook_append
self.meteor_p.stdin.flush
TransformerDecoderLayer
round
open
var
z.transpose
word.encode
logging.StreamHandler.setFormatter
perm.x.permute.contiguous.size
self.transformer_d
self.value
self._from_beam.append
std_attentions.append
time.strftime
torch.no_grad
self.decoder.init_decoder
self.copier.count_parameters
self.copier.init_decoder_state
h_s.transpose
init_from_scratch.save
format
type
batch.x.view.transpose.repeat
align.data.masked_fill_
cook_test
c2nl.eval.rouge.Rouge.compute_score
iter
rnn_type.nn.getattr
torch.cuda.manual_seed
self.tgt_char_embeddings
shape.size
set_defaults
i.summ_words.size
torch.nn.utils.rnn.pack_padded_sequence
c2nl.utils.copy_utils.make_src_map
self.make_embedding.add_module
self.close
c2nl.utils.misc.sequence_mask
torch.max
i.code_mask.size.i.code_mask_rep.copy_
torch.load.size
c2nl.utils.misc.count_file_lines
torch.tril
align_unk.tmp.mul.mul
i.summ_chars.size.i.summ_char_rep.copy_
psutil.virtual_memory
c2nl.utils.copy_utils.collapse_copy_scores
self.attn
self.shutdown
self.embedder.gt
tuple
self.data.t.self.TEXT_WS.t.join.strip
lengths.size

@developer
Could you please help me check this issue?
May I open a pull request to fix it?
Thank you very much.
