Giter Club home page Giter Club logo

solo-performance-prompting's Introduction

Official Repo of paper Solo Performance Prompting (SPP)

Illustration of Solo Performance Prompting

🔥 News

  • 5/8/2024: Update GPT-3.5 and LLama2 inference code and results for Figure 6, which shows the emergent nature of cognitive synergy.
  • 3/15/2024: This paper has been accepted as a main conference paper at NAACL2024!

Setup

  • Install dependencies
    pip install -r requirements.txt
    
  • Set up OpenAI API configs in config_template.sh and run source config_template.sh to set up the env variables (Note that we are using the Azure API in our experiments)

Quick Start

We provide running scripts for each of the three tasks, please check out the comments in the ".sh" scripts for more information:

  • Trivia Creative Writing: bash scripts/trivia_creative_writing.sh
  • Codenames Collaborative: bash scripts/codenames_collaborative.sh
  • Logic Grid Puzzle: bash scripts/logic_grid_puzzle.sh

Prompts

All prompts can be found in the prompts/ folder.

Datasets

All datasets can be found in the data/ folder.

Paper Experiment Results

Experimental results in the paper for each task can be found in the logs/ folder. gpt4_w_sys_mes and gpt4_wo_sys_mes contains results corresponding to Table 2 in our paper. We also include gpt-3.5 and llama2-13b results corresponding to the results in Figure 6, where the hyperparameters, such as whether or not adding system message, follows the best performing choices in the gpt4 experiments.

Log file formats

  • "test_output_infos": contains evaluation metrics for each instance, e.g., # correct answers mentioned.
  • `"prompt"``: full input prompt for the API call. (for Codenames task, there are two API calls for each instance)
  • "*raw_responses": raw responses from each API call.
  • "*parsing_flag": whether the raw response is successfully parsed. (for Codenames task, this field is seperated into "parsing_success_flag_spymaster" and "parsing_success_flag_guesser")
  • "unwrapped_output": parsed output that will be used for computing evaluation metrics. (for Codenames task, this field is seperated into "spymaster_output" and "guesser_output"; there is an additional field named "hint_word" which is parsed from the spymaster's output and inserted into the Guesser's input; the evaluation metric is computed based on the "guesser_output")
  • "task data": data for the current task instance, e.g., quetions, answers, target words, etc.
  • "usage": logging for the number of tokens and cost spended so far.
  • other self-explanatory config fields: "model", "method", "temperature", etc.

Citations

Please cite the paper and star this repo if you find this work interesting/helpful.

@article{wang2023unleashing,
  title={Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration},
  author={Wang, Zhenhailong and Mao, Shaoguang and Wu, Wenshan and Ge, Tao and Wei, Furu and Ji, Heng},
  journal={arXiv preprint arXiv:2307.05300},
  year={2023}
}

Acknowledgements

This codebase referenced the structure of the Tree-of-thought official repo. We thank the authors for their open-sourcing efforts.

solo-performance-prompting's People

Contributors

mikewangwzhl avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

solo-performance-prompting's Issues

Use without Azure?

Thank you so much for open-sourcing this repo, this is great!

I am trying to use the repo without Azure, but with no success yet. You note in the Readme that you ran the model on Azure. Have you tested whether the code runs if you set the Azure flag to false? I am getting somewhat inscrutable error messages from the open-ai API:

INFO:openai:error_code=None error_message='Invalid URL (POST /v1/engines/gpt4-32k/chat/completions)' error_param=None error_type=invalid_request_error message='OpenAI API error received' stream_error=False

running on llama-2

Hi Mike!

This is amazing work - thanks for sharing. I'm a beginner when it comes to prompt engineering and LLMs, and am inspired by your project.
I'm looking to test the prompting techniques on LLama-2. Are there any tips and pointers on how to run this code, the files to make amendments with and possible dependencies that I should keep an eye on? Sorry in advance if its already mentioned in the readme.md file

The baseline of CoT is not not in few-shot setting?

Hi there :)

I thoroughly read your work. Your work is awesome and intersting.

However, there is demonstrations in SPP (few-shot setting) but no demonstrations in the CoT baselines (zero-shot setting). So, the improvement may be attributed to the demonstrations? Could you provide the results in the CoT prompting with the same demonstrations (few-shot setting)?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.