coclr's People

Contributors

jun-jie-huang

coclr's Issues

CoSQA | Related Queries | Need Guidance

Dear CoSQA authors,

I am trying to improve on the results in your paper for the code search objective: [2105.13239] CoSQA: 20,000+ Web Queries for Code Search and Question Answering (arxiv.org)

(I have already reproduced the results from the above paper.)

I have a few queries in this regard:

  • I tried re-pretraining on the CSN dataset but did not get the expected performance, possibly because of the hyperparameter values. Which hyperparameters did you use? Knowing them would help me replicate the performance from scratch and give me a good starting point.
  • To improve the performance of the existing model, are there any other model variations or suggestions that would put me on the right track?
  • Do you have any recommendations for study materials or research papers to look at?

Any insights or pointers that you have in mind would be greatly appreciated.

Thanks & Regards,
Yash Bagdi

The data in the CoSQA dataset

Hi, I have a few questions about the CoSQA dataset and how to use it:
1. Table 4 in the paper shows:
[screenshot]

There are 20,604 queries and 6,267 codes. Why do the numbers of codes and queries differ? Is it because one code can answer multiple queries?

2. The paper says, "We fix a code database with 6,267 different codes in CoSQA." How should I understand this? Does it simply mean that all 6,267 codes are distinct?

3. The "CoCLR on Code Search" section says, "Step 1: download the checkpoint trained on CodeSearchNet." Is this checkpoint a CodeBERT checkpoint or a CoCLR checkpoint?

4. The "Model Checkpoint" section says, "You can also use the data in CodeXGLUE code search (WebQueryTest) to train the models by your self." Which models does this refer to? The data in CodeXGLUE/Text-Code/NL-code-search-WebQuery/data/ is:
[screenshot]

There is only test_webquery.json, which is a test set. How can it be used to train a model?
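
On question 1, a gap between the two counts is what one would expect if several queries are paired with the same code. A minimal sketch on made-up data, assuming each record carries a query string and a code string (the field names `query` and `code` and the sample rows are hypothetical, not the CoSQA schema):

```python
# Toy query-code pairs; the rows and field names are invented for illustration.
pairs = [
    {"query": "python read file to string", "code": "def read(p): return open(p).read()"},
    {"query": "python load text file", "code": "def read(p): return open(p).read()"},
    {"query": "python reverse a list", "code": "def rev(xs): return xs[::-1]"},
]


def count_unique(records):
    """Return (number of distinct queries, number of distinct codes)."""
    queries = {r["query"] for r in records}
    codes = {r["code"] for r in records}
    return len(queries), len(codes)


n_queries, n_codes = count_unique(pairs)
print(n_queries, n_codes)
```

Here the first two queries share one code, so the number of distinct codes is smaller than the number of queries.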

How to reproduce the results in Table 5 of the paper?

Table 5 in the paper shows:
[image]

Following the steps in the "CoCLR on Code Search" section, I applied CoCLR to the code search task and got this result:
[screenshot]

1. How can I get the results for rows 2 and 3 of Table 5? Should I follow the steps in the "Vanilla Model" section? Step 2 of Vanilla Model says, "To train a search model without CoCLR." What does "the model without CoCLR" refer to? Is it CodeBERT?

2. The paper says, "Hence, we simply choose RoBERTa-base and CodeBERT as the baseline methods." So the result in row 2 of Table 5 can be obtained with https://github.com/microsoft/CodeBERT/tree/master/CodeBERT/codesearch, right? And CodeBERT is then evaluated on cosqa-retrieval-test-500.json with the following command, right?
[screenshot]

3. Row 3 of Table 5 lists the data as CSN + CoSQA. Can the CodeSearchNet Python corpus and CoSQA be used for training at the same time? They have different data formats.
The CodeSearchNet data format is:
[screenshot]

The CoSQA data format is:
[screenshot]
Can CoSQA be used directly to train CodeBERT?
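
On question 3, one possible way to train on two corpora with different layouts is to map both into a shared (query, code, label) form first. The sketch below is only an illustration under assumptions: the field names `docstring`, `code`, `doc`, and `label` are guesses at the two layouts, the sample rows are invented, and treating every CodeSearchNet docstring-code pair as a positive example is an assumption rather than the authors' confirmed procedure.

```python
def from_csn(record):
    """Map a CodeSearchNet-style record to the shared form (assumed fields)."""
    # Assumption: the docstring acts as the query and the pair counts as a match.
    return {"query": record["docstring"], "code": record["code"], "label": 1}


def from_cosqa(record):
    """Map a CoSQA-style record to the shared form (assumed fields)."""
    # Assumption: "doc" holds the web query and "label" is 0 or 1.
    return {"query": record["doc"], "code": record["code"], "label": int(record["label"])}


# Invented sample rows, one list per source layout.
csn_rows = [
    {"docstring": "Return the sum of two numbers.", "code": "def add(a, b): return a + b"},
]
cosqa_rows = [
    {"doc": "python add two numbers", "code": "def add(a, b): return a + b", "label": 1},
    {"doc": "python sort a dict", "code": "def add(a, b): return a + b", "label": 0},
]

# Both sources now share one schema and can be concatenated for training.
merged = [from_csn(r) for r in csn_rows] + [from_cosqa(r) for r in cosqa_rows]
for row in merged:
    print(row["label"], row["query"])
```

Whether mixing the two sources in one pass or training on them in sequence matches the setup behind row 3 is exactly what the question asks the authors to confirm.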

Training results and training details

Your work is excellent, but I have some questions about the training results and training details.

  1. There are two models in models.py, ModelBinary and BodelContract. Does that mean the former is the vanilla model? The GitHub repository does not describe ModelBinary.
  2. When you train on CodeSearchNet alone, does the code include annotations? If the queries are annotations, shouldn't the code be trained with annotations as well?
