jun-jie-huang / coclr Goto Github PK


License: MIT License

Python 100.00%

coclr's Issues

CoSQA | Related Queries | Need Guidance

Dear CoSQA authors,

I am trying to improve on the results in your paper for the code search objective: [2105.13239] CoSQA: 20,000+ Web Queries for Code Search and Question Answering (arxiv.org)

(I have already reproduced the results from the above paper.)

I have a few queries in this regard:

I tried re-pretraining with the CSN dataset but did not get the expected performance, possibly because of the hyperparameter values. Could you share the hyperparameters you used? They would help me replicate the performance from scratch and would be a good starting point.
To improve the performance of the existing model, are there any other model variations or suggestions that would put me on the right track?
Could you also recommend any study materials or research papers to look at?

Any insights or pointers that you have in mind would be greatly appreciated.

Thanks & Regards,
Yash Bagdi

The pre-trained model on the task of code search

The "Model Checkpoint" section links to the checkpoint with the best code question answering results. Can you also share your pre-trained model for the code search task?

The data in the CoSQA dataset

Hi, I have a few questions about the CoSQA dataset and how to use it:
1. Table 4 in the paper shows:

There are 20604 queries and 6276 codes. Why do the numbers of codes and queries differ? Is it because one code can answer multiple queries?

2. The paper says, "We fix a code database with 6,267 different codes in CoSQA." How should I understand this? Do you just mean that all 6267 codes are different?

3. The "CoCLR on Code Search" section says, "Step 1: download the checkpoint trained on CodeSearchNet." Does this checkpoint belong to CodeBERT or to CoCLR?

4. The "Model Checkpoint" section says, "You can also use the data in CodeXGLUE code search (WebQueryTest) to train the models by your self." Which models does this refer to? The data in CodeXGLUE/Text-Code/NL-code-search-WebQuery/data/ looks like this:

There is only test_webquery.json, which is a test set. How can it be used to train a model?
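
On question 1, the reading proposed in the question itself is that one code snippet can be paired with several queries. Below is a minimal sketch of how one might check this on a list of query-code pairs. The toy pairs and the function name `summarize_pairs` are illustrative assumptions, not taken from the CoSQA files.

```python
from collections import defaultdict

def summarize_pairs(pairs):
    """Count distinct queries and distinct codes in (query, code) pairs.

    Hypothetical helper: the real CoSQA field names are not shown here,
    so pairs are passed in as plain tuples.
    """
    queries_per_code = defaultdict(set)
    for query, code in pairs:
        queries_per_code[code].add(query)
    n_queries = len({query for query, _ in pairs})
    n_codes = len(queries_per_code)
    return n_queries, n_codes, dict(queries_per_code)

# Toy data (made up): two different queries share the same code.
toy_pairs = [
    ("python reverse a list", "def rev(xs): return xs[::-1]"),
    ("how to flip list order python", "def rev(xs): return xs[::-1]"),
    ("python read file lines", "def read(p): return open(p).read().splitlines()"),
]

n_queries, n_codes, grouped = summarize_pairs(toy_pairs)
print(n_queries, n_codes)
```

If the distinct-code count is smaller than the distinct-query count, at least one code is matched with more than one query, which would be consistent with the interpretation in the question.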

How to reproduce the results in Table 5 of the paper?

Table 5 in the paper shows:

Following the steps in the "CoCLR on Code Search" section, I applied CoCLR to the code search task and got this result:

1. How can I get the results for rows 2 and 3 in Table 5? Should I follow the steps in "Vanilla Model"? Step 2 of "Vanilla Model" says, "To train a search model without CoCLR." What does the model without CoCLR refer to? Is it CodeBERT?

2. The paper says, "Hence, we simply choose RoBERTa-base and CodeBERT as the baseline methods." So the result in row 2 of Table 5 can be obtained with https://github.com/microsoft/CodeBERT/tree/master/CodeBERT/codesearch, right? CodeBERT is then evaluated on the data cosqa-retrieval-test-500.json with the following command, right?

3. Row 3 in Table 5 shows that the data is CSN + CoSQA. Can the CodeSearchNet Python corpus and CoSQA be used for training at the same time? They have different data formats.
The CodeSearchNet data format is:

The CoSQA data format is:

Can CoSQA be used directly to train CodeBERT?
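
When comparing a reproduced number against a reported one, it can help to recompute the metric independently. The checkpoint directory named elsewhere on this page, checkpoint-best-mrr, suggests that checkpoints are selected by mean reciprocal rank (MRR). Below is a minimal sketch of the standard MRR definition, assuming each query has exactly one gold code; the function name and the toy identifiers are illustrative, and this is not the repository's own evaluation script.

```python
def mean_reciprocal_rank(ranked_lists, gold_ids):
    """Average of 1/rank of the gold item over all queries.

    ranked_lists[i] is the list of candidate code ids for query i,
    best first; gold_ids[i] is the single correct id for that query.
    A query whose gold id is absent from its ranking contributes 0.
    """
    if not gold_ids:
        return 0.0
    total = 0.0
    for ranking, gold in zip(ranked_lists, gold_ids):
        if gold in ranking:
            total += 1.0 / (ranking.index(gold) + 1)  # ranks are 1-based
    return total / len(gold_ids)

# Toy example (made up): the gold code "c1" is ranked 1st, 2nd and 3rd.
rankings = [["c1", "c2", "c3"], ["c2", "c1", "c3"], ["c3", "c2", "c1"]]
gold = ["c1", "c1", "c1"]
mrr = mean_reciprocal_rank(rankings, gold)
print(round(mrr, 4))
```

Running the same function over the rankings produced by each model gives numbers that are at least comparable with each other, even if the exact evaluation protocol of the paper differs.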

No such file or directory: './model/search_codebert_switch/checkpoint-best-mrr/pytorch_model.bin'

Hi, I want to try CoCLR on Code Search. When I run "Step 2: training and evaluating", I get an error:

What should I do? Do I need to create my own folder path ./model/search_codebert_switch? Is the model pytorch_model.bin generated by itself, or does it need to be stored in the folder in advance?
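
The thread does not say whether the training script creates this file or expects it to be in place beforehand. Either way, a quick check before launching the run shows whether the path from the error message exists. This is a minimal sketch; `find_checkpoint` is a hypothetical helper, and the demo uses a temporary directory rather than the real ./model folder.

```python
import tempfile
from pathlib import Path

def find_checkpoint(model_dir, subdir="checkpoint-best-mrr",
                    filename="pytorch_model.bin"):
    """Return the checkpoint path if the file exists, otherwise None."""
    path = Path(model_dir) / subdir / filename
    return path if path.is_file() else None

# Demo in a temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "search_codebert_switch"

    # The folder does not exist yet, so the lookup reports a miss.
    missing = find_checkpoint(model_dir)

    # Simulate a checkpoint being placed at the expected location.
    (model_dir / "checkpoint-best-mrr").mkdir(parents=True)
    (model_dir / "checkpoint-best-mrr" / "pytorch_model.bin").write_bytes(b"")
    found = find_checkpoint(model_dir)
    found_name = found.name if found else None

print(missing, found_name)
```

Calling `find_checkpoint("./model/search_codebert_switch")` from the directory where the training command is run would show whether the "No such file or directory" error comes from a missing file or from running in a different working directory.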

training results and training details

Your work is excellent, but I have some questions about the training results and training details.

There are two models in models.py, ModelBinary and BodelContract. Does that mean the former is the vanilla model? The GitHub repository does not describe ModelBinary.
When you train on CodeSearchNet alone, does the code include annotations? If the queries are themselves annotations, shouldn't the code be trained with annotations?
