
pt4code's Issues

missing data

Hi,

Thanks for the good work. I am reproducing your work but find that these files are missing:

in codebert.py:
train_dataset = read_answers('/data/czwang/prompt/dataset/train.jsonl')
valid_dataset = read_answers('/data/czwang/prompt/dataset/valid.jsonl')
test_dataset = read_answers('/data/czwang/prompt/dataset/test.jsonl')
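
For context, read_answers presumably expects line-delimited JSON; here is a minimal sketch of such a reader, assuming one JSON object per line with Devign-style fields such as "func" and "target" (an assumption, not confirmed by the repository):

    import json

    def read_answers(filename):
        # Hypothetical reader: one JSON object per line, e.g. {"func": ..., "target": ...}
        examples = []
        with open(filename, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    examples.append(json.loads(line))
        return examples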

Can you please provide the dataset? I download the dataset with command

cd dataset
pip install gdown
gdown https://drive.google.com/uc?id=1x6hoF7G-tSYxg8AFybggypLZgMGDNHfF
cd ..

but it does not correspond to /data/czwang/prompt/dataset/train.jsonl or the other two files.

Thanks!

Question about the prompts?

Hello!

Recently, I came across your work and I found it inspiring.
I was trying to reproduce some of your experiments; however, I did not manage to find any of the prompts you used to conduct them.

For example, given the following snippet of code extracted from the script defect/prompt/prompt_t5_2.py:

promptTemplate = PrefixTuningTemplate(model=plm, tokenizer=tokenizer,
                                      text='{"placeholder":"text_a"}  {"mask"} ',
                                      using_decoder_past_key_values=False, num_token=200)

which part has to be changed in order to integrate one of your hard templates (e.g., "the code [X] is [Z]")?
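
For reference, one way a hard template such as "the code [X] is [Z]" could be plugged in is by swapping the PrefixTuningTemplate above for OpenPrompt's ManualTemplate; the wording below is only an assumption, not the exact template from the paper:

    from openprompt.prompts import ManualTemplate

    # Hypothetical hard template: [X] maps to the code placeholder, [Z] to the mask token.
    promptTemplate = ManualTemplate(tokenizer=tokenizer,
                                    text='The code {"placeholder":"text_a"} is {"mask"}')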

Thanks in advance for any help you can provide.

CUDA error: device-side assert triggered

I am following issue #3 and am able to get the data, but now when I run the script defect/codebert.py
I get a CUDA runtime error. Has anyone seen this before? I checked my CUDA installation, and other models run fine.
I suspect the sequence length, but it is already set to max_seq_length = 512.

Any help would be greatly appreciated!

[screenshot of the CUDA error output omitted]
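
A debugging sketch that may help narrow this down (an assumption about the usual cause, not a confirmed fix): device-side asserts are typically out-of-range indices, so forcing synchronous CUDA launches surfaces the failing kernel, and checking the label ids against the number of classes rules out one common culprit:

    import os

    # Must be set before CUDA is initialized so the failing kernel is reported precisely.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

    # Sanity check with hypothetical variable names: every label must fit the class count.
    # assert all(0 <= label < len(classes) for label in train_labels)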

Questions for multi-GPU prompt tuning

I tried to reproduce the code summarization task with the following command:
python prompt_t5.py --visible_gpu "0,1,2,3" --lang java --train_batch_size 32 --eval_batch_size 16 --max_source_length 256 --max_target_length 128 --log_name=./log/java.log --do_train --do_eval
I only changed the command; all code files remain unchanged.
Unfortunately, the console threw this error:

Traceback (most recent call last):
File "prompt_t5.py", line 538, in
main(my_args)
File "prompt_t5.py", line 286, in main
loss = model(batch)
File "/home/anaconda3/envs/lmass/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/anaconda3/envs/lmass/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/anaconda3/envs/lmass/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/anaconda3/envs/lmass/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
output.reraise()
File "/home/anaconda3/envs/lmass/lib/python3.8/site-packages/torch/_utils.py", line 543, in reraise
raise exception
TypeError: Caught TypeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/anaconda3/envs/lmass/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
output = module(*input, **kwargs)
File "/home/anaconda3/envs/lmass/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/anaconda3/envs/lmass/lib/python3.8/site-packages/openprompt/pipeline_base.py", line 449, in forward
return self._forward(*args, **kwargs)
File "/home/anaconda3/envs/lmass/lib/python3.8/site-packages/openprompt/pipeline_base.py", line 465, in _forward
outputs = self.prompt_model(batch)
File "/home/anaconda3/envs/lmass/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/anaconda3/envs/lmass/lib/python3.8/site-packages/openprompt/pipeline_base.py", line 210, in forward
batch = self.template.process_batch(batch)
File "/home/anaconda3/envs/lmass/lib/python3.8/site-packages/openprompt/prompts/soft_template.py", line 94, in process_batch
inputs_embeds = self.raw_embedding(batch['input_ids'])
File "/home/anaconda3/envs/lmass/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/anaconda3/envs/lmass/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 160, in forward
return F.embedding(
File "/home/anaconda3/envs/lmass/lib/python3.8/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not tuple

Do you have any idea what may trigger this error?
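
One workaround sketch, assuming OpenPrompt's InputFeatures batch does not scatter cleanly under torch.nn.DataParallel (which is what "must be Tensor, not tuple" inside soft_template.process_batch suggests): restrict training to a single GPU, either by passing --visible_gpu "0" to the command above or by hiding the other devices before CUDA is initialized, so the DataParallel replica path is never taken.

    import os

    # Hypothetical workaround: expose only one device before torch initializes CUDA,
    # so prompt_t5.py never wraps the model in nn.DataParallel.
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"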

The results on the code summarization task

Hello, thanks for the good work. I have some questions about the results in the paper:

  1. Why are the results in Table 4 and Table 5 of the paper quite different, even though both concern the code summarization task?
  2. Which template was used to produce the prompt results in Table 4?
  3. The CodeT5 fine-tuning results in Table 4 seem low. For example, your overall result for CodeT5-base fine-tuning is 19.23, but the result reported for CodeT5-base and commonly cited by others is 19.55, and the result I reproduced is also above 19.4.

Questions about the details of the summarization experiments

Hi @adf1178, thank you for your work. I have some questions about the summarization experiments.

  1. How did you choose the number of training epochs, which is 20 for summarization? Overfitting can occur, and since different languages may overfit or underfit at different points, setting them all the same seems inappropriate.
  2. For the low-resource scenarios, where overfitting happens even more easily, how did you set the epochs for training-set sizes of 100, 200, 300, 500, and 1000 for each language? Is it also 20 for each?
  3. For the different lengths of prefix soft prompts, how did you set the epochs? Is it 20 for all lengths, or does it differ by length?
  4. Did you fix the batch size and the seed for all experiments? It seems the seed is always 42 and the batch size for training and validation is 64. The batch size also matters for SoftTemplate (in summarization): in my demo, the result changes with the batch size even when the seed is fixed (see the sketch after this list).
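
A reproducibility sketch for point 4, showing how a seed of 42 would typically be pinned; whether the repository's scripts do exactly this is an assumption:

    import random

    import numpy as np
    import torch

    def set_seed(seed: int = 42):
        # Pin all RNGs so that only the batch size varies between runs.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)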

Cannot download files for defect detection task

Hi, thank you for your outstanding work and released code.

According to the readme.md file, the dataset for defect detection can be downloaded from https://drive.google.com/uc?id=1x6hoF7G-tSYxg8AFybggypLZgMGDNHfF, and then run the defect/prompt/codebert.py for prompt tuning.

I followed the above steps but ran into a problem:
Only a file named function.json is downloaded from the provided URL (https://drive.google.com/uc?id=1x6hoF7G-tSYxg8AFybggypLZgMGDNHfF). When codebert.py reaches read_answers('/data/czwang/prompt/dataset/train.jsonl'), it fails because no train.jsonl, valid.jsonl, or test.jsonl file was downloaded.

How can I download these files?
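
In case it helps others, a preprocessing sketch under the assumption that train.jsonl, valid.jsonl, and test.jsonl are simply line-delimited splits of the downloaded function.json; the 80/10/10 random split is a guess and may not match the split used in the paper:

    import json
    import random

    with open("function.json", encoding="utf-8") as f:
        data = json.load(f)  # assumed to be a single JSON array of examples

    random.seed(42)
    random.shuffle(data)
    n = len(data)
    splits = {
        "train.jsonl": data[: int(0.8 * n)],
        "valid.jsonl": data[int(0.8 * n): int(0.9 * n)],
        "test.jsonl": data[int(0.9 * n):],
    }
    for name, items in splits.items():
        with open(name, "w", encoding="utf-8") as out:
            for item in items:
                out.write(json.dumps(item) + "\n")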

Questions about the hard prompt and soft prompt in your code

Hi,

Sorry, but given my limited coding experience, I still don't understand how to apply prompts like those in your paper to your code.
For example, for the code summarization task there is a hard prompt "Generate comments for [LANG] [X] [Z]". I don't know how to express it in code, and the same goes for the soft prompts.

From what I understand, I'm guessing it has to do with these lines of code:

    promptTemplate = SoftTemplate(model=plm, tokenizer=tokenizer,
                                  text='Code: {"placeholder":"text_a"} Summarization: {"mask"} ',
                                  initialize_from_vocab=True,
                                  num_tokens=50)

For the parameter text, the value Code: {"placeholder":"text_a"} Summarization: {"mask"} seems to differ from the templates in your paper.
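
One way the paper's hard prompt could be expressed through the text argument is sketched below, with ManualTemplate standing in for SoftTemplate; the literal "java" token and the exact wording are assumptions, not the authors' confirmed template:

    from openprompt.prompts import ManualTemplate

    # Hypothetical hard template mirroring "Generate comments for [LANG] [X] [Z]":
    # [LANG] becomes a literal language name, [X] the code placeholder, [Z] the mask.
    promptTemplate = ManualTemplate(tokenizer=tokenizer,
                                    text='Generate comments for java {"placeholder":"text_a"} {"mask"}')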

Some issues with your code

Thank you for presenting a good paper and sharing the code.

I have read your paper and analyzed the code, and I have a few questions.

  1. In the paper, it is explained that the PLM is frozen and prefix tuning is performed, as the title says "No more Finetuning". However, in line 195 of "PT4Code/summarization/prompt_t5.py", freeze_plm is set to False; isn't that fine-tuning, since it also trains the parameters of the PLM? In that case, the experiment seems to compare fine-tuning against fine-tuning with a prompt template added.

  2. In Table 2 of the paper, the beam size is given as 10. However, in line 481 of "PT4Code/summarization/prompt_t5.py", beam_size is not passed to "_, output_sentence = model.generate(batch)", so it seems that beam_size is not used. If beam_size is added, the results differ significantly from those in the paper's table (see the sketch after this list).

  3. I have implemented code translation, but the training parameters are not mentioned in the paper or in the README.md on GitHub, so the performance reported in the paper cannot be reproduced. Could you share the training parameters for all tasks?

Thank you.
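
Regarding point 2, a sketch of how a beam size of 10 might be passed through OpenPrompt's generate call, following the pattern in OpenPrompt's generation tutorials; whether prompt_t5.py wires these kwargs through, and the exact max_length, are assumptions:

    # Hypothetical generation arguments, forwarded to the underlying HuggingFace generate();
    # model and batch are the same objects as in the quoted line 481.
    generation_arguments = {"num_beams": 10, "max_length": 128}
    _, output_sentence = model.generate(batch, **generation_arguments)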

Questions about reproducing the summarization results in Table 4

Thanks for your well-written paper and code!
However, I have a problem running prompt_t5.py to reproduce your Table 4 results for the summarization task. I replaced SoftTemplate in this file with PrefixTuningTemplate, but the results are far from those in Table 4. Could you share more precise experiment settings or template details? Thank you!

Cannot download dataset for Code Summarization

I ran into a problem when executing this command, where LANG = Ruby/Go/PHP/Java/Python/JavaScript:

wget https://s3.amazonaws.com/code-search-net/CodeSearchNet/v2/{LANG}.zip

The error code is 403, and the error message is:

<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>KY0D35ECZ6N9Y6F1</RequestId>
<HostId>1dVA91UlipsQHZWhzjF9ImHkRmg/0z2sZzlLxWLKepD3SbwWTWcarpwAYJoP/1oPPfrFnNB8f/c=</HostId>
</Error>

Should I register an AWS account or do something else?
Thanks!

It seems the released code tunes both model parameters and prompts.

Thanks for your paper and code!

I have a question about the setting in the function call PromptForGeneration(plm=plm, template=promptTemplate, freeze_plm=False, tokenizer=tokenizer, plm_eval_mode=False), specifically the parameter freeze_plm. Since it is set to False, backpropagation appears to update both the prompt parameters and the model parameters. If that is the case, the code seems to perform both fine-tuning and prompt tuning.
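
For comparison, the prompt-tuning-only configuration would presumably just flip that flag; a minimal sketch assuming the same variable names as the quoted call:

    from openprompt import PromptForGeneration

    # Keep the PLM frozen so that backpropagation updates only the prompt parameters.
    model = PromptForGeneration(plm=plm, template=promptTemplate,
                                freeze_plm=True,
                                tokenizer=tokenizer, plm_eval_mode=False)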

Questions about verbalizer setting in defect detection

In file defect/prompt/codebert.py, you defined promptVerbalizer as
promptVerbalizer = ManualVerbalizer(
    classes = classes,
    label_words = {
        "negative": ["clean", "good"],
        "positive": ["defective", "bad"],
        # "negative": ["indefective", "good"],
        # "positive": ["defective", "bad"],
    },
    tokenizer = tokenizer,
)
I'd like to know whether the 'negative' and 'positive' label words were swapped.

Zero shot for Prompt.

How is the vanilla soft prompt obtained for the zero-shot setting in the experiments? Is the hard prompt used directly?
