chainofeverything's Introduction

Does Adapting Chain-of-Thought Help?

  1. Installation:
$ pip install -r requirements.txt
  1. Evaluation for Codellama and GPT3.5 models:

    HumanEval

    2.1 Generate: produce the test results with the model of your choice, either the CodeLlama 7B or the 34B Instruct variant:

    $ python eval_codellama_humaneval.py --model_name codellama/CodeLlama-7b-Instruct-hf --length 100

    OR if you plan to evaluate GPT3.5

    $ python eval_gpt_humaneval.py 

    2.2 Evaluate: to evaluate with a length other than 100, first change the length on line 57 of human-eval/human_eval/evaluation.py to match the --length value used for generation.

    $ evaluate_functional_correctness results/codellama/humaneval_CodeLlama-7b-Instruct-hf_100.jsonl

    for GPT3.5

    $ evaluate_functional_correctness results/openai/gpt3.5-turbo_100.jsonl

    2.3 Change prompt: different prompts are provided in eval_codellama_humaneval.py (see the file info section below for details).
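Before running evaluate_functional_correctness, it can help to confirm that the results file holds the number of tasks you expect. The upstream human-eval harness reads a JSON Lines file in which each record has a task_id and a completion field. The sketch below assumes the results files written by this repository follow that layout; the helper names and the demo records are hypothetical and only illustrate the check.

```python
import json
import tempfile
from pathlib import Path


def write_samples(path, samples):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for sample in samples:
            f.write(json.dumps(sample) + "\n")


def count_tasks(path):
    """Return the number of distinct task_id values in a samples file."""
    task_ids = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                task_ids.add(json.loads(line)["task_id"])
    return len(task_ids)


# Placeholder completions, only to illustrate the record layout.
demo_samples = [
    {"task_id": "HumanEval/0", "completion": "    return True\n"},
    {"task_id": "HumanEval/1", "completion": "    return []\n"},
]
demo_path = Path(tempfile.mkdtemp()) / "demo_samples.jsonl"
write_samples(demo_path, demo_samples)
print(count_tasks(demo_path))
```

If the count differs from the length configured in evaluation.py, the mismatch is worth resolving before trusting the reported scores.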

    MBPP

    3.1 Generate:

    $ python eval_codellama_mbpp.py --model_name codellama/CodeLlama-7b-Instruct-hf --length 100

    OR if you plan to evaluate GPT3.5

    $ python eval_gpt_mbpp.py 

    3.2 Evaluate: you may need to change the model output directory on line 254 of eval_mbpp.py to match the model you chose.

    $ python eval_mbpp.py
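For orientation, MBPP problems ship with a list of assert statements, and a completion counts as correct when all of them pass. The sketch below shows that idea in its simplest form. It is not the logic of eval_mbpp.py, which is not reproduced here; the function name and the toy problem are hypothetical, and a real harness would add sandboxing and timeouts because exec runs untrusted code.

```python
def passes_tests(candidate_code, test_asserts):
    """Return True if the candidate runs and every assert statement passes."""
    namespace = {}
    try:
        # Define the candidate's functions in an isolated namespace.
        exec(candidate_code, namespace)
        # Run each MBPP-style assert against those definitions.
        for test in test_asserts:
            exec(test, namespace)
    except Exception:
        # Syntax errors, runtime errors, and failed asserts all count as a miss.
        return False
    return True


# Toy problem, made up for illustration.
candidate = "def add(a, b):\n    return a + b\n"
tests = ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"]
print(passes_tests(candidate, tests))
```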
  2. File info:

    1. eval_gpt_mbpp.py: includes functions such as wrap_code_template, wrap_code_template_baseline, wrap_with_steps, one_shot_pseudocode, one_shot_steps, and zero_shot_pseudocode, which construct prompts for the different settings (zero-shot or one-shot, with or without steps and pseudocode). It also includes some other variations we tried with GPT.
    2. eval_gpt_humaneval.py: includes similar functions for testing GPT on HumanEval.
    3. eval_codellama_humaneval.py: includes similar functions, such as construct_codellama_prompt, construct_codellama_prompt_v2, construct_codellama_prompt_oneshot_examples, construct_codellama_comment_prompt_one_shot_psuedocode, and construct_codellama_comment_prompt_one_shot, which create prompts for the baseline, zero-shot steps/pseudocode, and one-shot steps/pseudocode evaluations.
    4. eval_codellama_mbpp.py: also contains prompt functions, such as construct_codellama_prompt, construct_codellama_pseudo_prompt, construct_codellama_pseudo_prompt_example, and construct_codellama_prompt_steps.
    5. humaneval_steps_magicoder.json: the step-by-step examples (solving process) used in the one-shot steps prompt for HumanEval. These were generated by Magicoder.
    6. mbpp_examples_magicoder_reform_v1.json: the step-by-step examples (solving process) used in the one-shot steps prompt for MBPP. These were generated by Magicoder.
    7. humaneval_actualpsuedocode_magicoder_reform_v1.json: the collected set of pseudocode generated by Magicoder for the HumanEval dataset. Each pseudocode was generated using the prompt specified in the paper.
    8. mbpp_actualpsuedocode_magicoder_reform_v1.json: the collected set of pseudocode generated by Magicoder for the MBPP dataset. Each pseudocode was generated using the prompt specified in the paper.
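The prompt functions listed above differ mainly in what they place before the target problem: nothing (baseline), the problem's own pseudocode or steps (zero-shot), or a worked example as well (one-shot). The sketch below illustrates that composition only. build_prompt, its parameters, and the section labels are hypothetical and do not reproduce the wording or signatures of the functions in this repository.

```python
def build_prompt(problem, example=None, pseudocode=None):
    """Assemble a prompt from optional parts, in a fixed order."""
    parts = []
    if example is not None:
        # One-shot setting: show a solved example before the target problem.
        parts.append("Example problem:\n" + example["problem"])
        parts.append("Example pseudocode:\n" + example["pseudocode"])
        parts.append("Example solution:\n" + example["solution"])
    parts.append("Problem:\n" + problem)
    if pseudocode is not None:
        # Pseudocode setting: give the model a plan for the target problem.
        parts.append("Pseudocode:\n" + pseudocode)
    parts.append("Write the Python solution.")
    return "\n\n".join(parts)


# Made-up inputs, only to show the three settings side by side.
demo_example = {
    "problem": "Return the larger of two numbers.",
    "pseudocode": "if a > b return a, else return b",
    "solution": "def larger(a, b):\n    return a if a > b else b",
}
baseline = build_prompt("Add two numbers.")
zero_shot = build_prompt("Add two numbers.", pseudocode="return a plus b")
one_shot = build_prompt("Add two numbers.", example=demo_example,
                        pseudocode="return a plus b")
print(one_shot)
```

Keeping the parts optional means one function covers every setting, which makes it easier to check that the settings differ only in the context they add.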
  3. Misc

    • To run certain scripts you need an OpenAI token and a Hugging Face token to call the required APIs. The OpenAI token can be exported with export OPENAI_API_KEY=<OPENAIKEY> (no spaces around the equals sign). The Hugging Face token can be set through the variable defined at the beginning of each file that uses it (the CodeLlama scripts).
    • The above evaluation was performed on an 80GB A100 GPU with 128GB RAM and 24 CPUs. We acknowledge Mila for supporting us with the compute resources.
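Regarding the tokens mentioned above, a script can fail early with a clear message when a required variable is missing, rather than partway through a long generation run. This is a minimal sketch using only the standard library; require_env is a hypothetical helper, not part of this repository, and the placeholder value below is not a real key.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable or raise a clear error."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the script")
    return value


# A plain dict stands in for the real environment in this demonstration.
demo_env = {"OPENAI_API_KEY": "sk-placeholder"}
print(require_env("OPENAI_API_KEY", demo_env))
```

In a real script, calling require_env("OPENAI_API_KEY") with no second argument reads from the process environment.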

chainofeverything's People

Contributors

sert121, megh-thakkar
