neulab / external-knowledge-codegen

Code and data for the ACL 2020 paper "Incorporating External Knowledge through Pre-training for Natural Language to Code Generation"

License: Apache License 2.0

Python 87.64% Shell 8.28% JavaScript 1.07% CSS 0.11% HTML 2.90%


external-knowledge-codegen's Issues

Add wget directions?

This section could be made easier to follow if you added command line instructions to do it with wget:

https://github.com/neulab/external-knowledge-codegen#mined-stackoverflow-pairs

ModuleNotFoundError: No module named 'asdl'

Hello,
I am trying to preprocess the data to obtain the .bin file needed for training. Unfortunately, the following error keeps appearing:
(base) lab@master:~/external-knowledge-codegen$ python datasets/conala/dataset.py --pretrain data/conala/conala-mined.jsonl --topk 100000 --include_api apidocs/processed/distsmpl/snippet_15k/goldmine_snippet_count100k_topk1_temp2.jsonl
Traceback (most recent call last):
File "datasets/conala/dataset.py", line 9, in <module>
from asdl.hypothesis import *
ModuleNotFoundError: No module named 'asdl'

I tried installing the asdl library, and the error changed to "ModuleNotFoundError: No module named 'asdl.hypothesis'; 'asdl' is not a package".

I would appreciate any help in getting the model to run successfully.
Thank you.
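
A possible cause, offered as an assumption rather than a confirmed diagnosis: the import `from asdl.hypothesis import *` suggests the repository ships its own top-level `asdl` package, and the second error indicates that the pip-installed `asdl` is an unrelated single module. When a script is launched as `python datasets/conala/dataset.py`, Python puts the script's directory (`datasets/conala`) at the front of `sys.path`, not the directory the command was run from, so a package in the repository root is not found. The sketch below shows one way to put the repository root on the path; the helper name `add_repo_root_to_path` and the `levels_up` parameter are hypothetical and not part of the repository.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(script_path: str, levels_up: int = 2) -> Path:
    """Put an ancestor directory of `script_path` at the front of sys.path.

    For <root>/datasets/conala/dataset.py, parents[0] is datasets/conala,
    parents[1] is datasets and parents[2] is <root>, hence levels_up=2.
    The layout is an assumption about the repository, not a confirmed fact.
    """
    root = Path(script_path).resolve().parents[levels_up]
    root_str = str(root)
    # Avoid duplicate entries, then give the repository root top priority.
    if root_str in sys.path:
        sys.path.remove(root_str)
    sys.path.insert(0, root_str)
    return root
```

Calling `add_repo_root_to_path(__file__)` at the top of the script, before the `asdl` import, would have this effect. Setting `PYTHONPATH=.` when launching the command from the repository root is an equivalent approach that needs no code change, and uninstalling the unrelated pip package avoids it shadowing the local one.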

Resampling python-docs.jsonl but finally get nothing

I am trying to resample python-docs.jsonl using conala-mined and python-docs.jsonl, but the result is empty.

Below are the details of the two runs I tried:
(1) python retrieve.py --method topk --inp ../data/conala/conala-mined.jsonl:python-docs.jsonl --topk 6 --field snippet --temp 6 --out temp6.json

load 593891 from ../data/conala/conala-mined.jsonl
load 12958 from python-docs.jsonl
100% 606849/606849 [04:43<00:00, 2137.19it/s]
most commonly retrieved ids []
(2) python retrieve.py --method dist --inp ../data/conala/conala-mined.jsonl --topk 6 --field snippet --temp 6 --out temp6.json

load 593891 from ../data/conala/conala-mined.jsonl
return _wrapreduction(a, np.maximum, 'max', axis, None, out,
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity

I suspect the cause is that Elasticsearch returns nothing, but I'm not sure.
I installed elasticsearch with "pip install elasticsearch".
Looking forward to your answer!
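
One thing worth checking: `pip install elasticsearch` installs only the Python client library, not the Elasticsearch server itself. Assuming `retrieve.py` queries a running server (not confirmed by this thread), an empty result list would be consistent with no server being reachable or no documents having been indexed. The sketch below uses only the standard library to test whether a server answers; the URL `http://localhost:9200` is the conventional default and is an assumption here, as is the helper name `elasticsearch_is_reachable`.

```python
import json
import urllib.error
import urllib.request


def elasticsearch_is_reachable(url: str = "http://localhost:9200",
                               timeout: float = 2.0) -> bool:
    """Return True if `url` answers with an Elasticsearch-style info document.

    The root endpoint of an Elasticsearch node returns a JSON object that
    includes a "version" field. Any connection failure, HTTP error
    (for example 401 when security is enabled) or non-JSON reply yields False.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            info = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return False
    return isinstance(info, dict) and "version" in info


if __name__ == "__main__":
    print("Elasticsearch reachable:", elasticsearch_is_reachable())
```

If this prints `False`, the retrieval step has nothing to query, which would also explain the zero-size array error in the second run.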

How to make goldmine_snippet_count100k_topk1_temp2.jsonl?

Hello,

I want to run python datasets/conala/dataset.py --pretrain path/to/conala-mined.jsonl --topk 100000 --include_api apidocs/processed/distsmpl/snippet_15k/goldmine_snippet_count100k_topk1_temp2.jsonl

~~However, goldmine_snippet_count100k_topk1_temp2.jsonl is missing (but there are some txt files in the snippet_15k directory).~~
How can I create this jsonl file?

I think retrieve.py with --method dist is supposed to produce the jsonl file, but get_distribution() does not write any jsonl files.

Thank you.

Reproducing results

I am trying to reproduce the numbers reported in the paper so that I can make a fair comparison in a paper I am writing. However, when I run the following command I get a corpus BLEU score of 30.69.

. scripts/conala/test.sh ../external-knowledge-codegen/best_pretrained_models/finetune.mined.retapi.distsmpl.dr0.3.lr0.001.lr_de0.5.lr_da15.beam15.seed0.mined_100000.intent_count100k_topk1_temp5.bin 2>&1
load model from [../external-knowledge-codegen/best_pretrained_models/finetune.mined.retapi.distsmpl.dr0.3.lr0.001.lr_de0.5.lr_da15.beam15.seed0.mined_100000.intent_count100k_topk1_temp5.bin]
Decoding: 100%|██████| 500/500 [02:39<00:00,  3.13it/s]
{'corpus_bleu': 0.30694588794625494, 'oracle_corpus_bleu': 0.4181369862278688, 'avg_sent_bleu': 0.2376696401071103, 'oracle_avg_sent_bleu': 0.3983062032090926, 'exact_match': 0.028, 'oracle_exact_match': 0.084}

My guess is that the reranker is not used when these results are generated.

To work around this, I called the testing code directly to generate hypotheses and evaluated them with the same BLEU functions.

model_file='external_repos/external-knowledge-codegen/best_pretrained_models/finetune.mined.retapi.distsmpl.dr0.3.lr0.001.lr_de0.5.lr_da15.beam15.seed0.mined_100000.intent_count100k_topk1_temp5.bin'
reranker_file = 'external_repos/external-knowledge-codegen/best_pretrained_models/reranker.conala.vocab.src_freq3.code_freq3.mined_100000.intent_count100k_topk1_temp5.bin'
self.parser = StandaloneParser('default_parser',
                              model_file,
                              'conala_example_processor',
                              beam_size=15,
                              cuda=True,
                              reranker_path=reranker_file)

This gives me a similar corpus BLEU score of 30.078 and an average sentence BLEU score of 25.295, computed with NLTK using smooth_fn3.

What sequence of commands is needed to reproduce the score from the paper?
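
For reference, the metrics printed by the test script above can be converted to percentage points and compared with their oracle counterparts. This is a minimal sketch under one assumption: that the `oracle_*` entries score the best hypothesis available in each beam, in which case the gap is an upper bound on what reranking the same beam could recover. The helper names `to_percent` and `oracle_gap` are illustrative only.

```python
# Metrics copied from the output of scripts/conala/test.sh shown above.
metrics = {
    "corpus_bleu": 0.30694588794625494,
    "oracle_corpus_bleu": 0.4181369862278688,
    "avg_sent_bleu": 0.2376696401071103,
    "oracle_avg_sent_bleu": 0.3983062032090926,
    "exact_match": 0.028,
    "oracle_exact_match": 0.084,
}


def to_percent(score: float) -> float:
    """Convert a 0-1 score to percentage points, rounded to two decimals."""
    return round(score * 100, 2)


def oracle_gap(scores: dict, name: str) -> float:
    """Percentage-point gap between the oracle score and the top-1 score."""
    return to_percent(scores["oracle_" + name] - scores[name])


if __name__ == "__main__":
    for name in ("corpus_bleu", "avg_sent_bleu", "exact_match"):
        print(name, to_percent(metrics[name]), "oracle gap:", oracle_gap(metrics, name))
```

The top-1 corpus BLEU corresponds to the 30.69 quoted above, so the number in question is how much of the remaining gap the reranker closes.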
