Giter Club home page Giter Club logo

code_contests's Introduction

CodeContests

CodeContests is a competitive programming dataset for machine-learning. This dataset was used when training AlphaCode. AlphaCode has been published in Science, with a preprint on arXiv.

It consists of programming problems, from a variety of sources:

Site URL Source
Aizu https://judge.u-aizu.ac.jp CodeNet
AtCoder https://atcoder.jp CodeNet
CodeChef https://www.codechef.com description2code
Codeforces https://codeforces.com description2code and Codeforces
HackerEarth https://www.hackerearth.com description2code

Problems include test cases in the form of paired inputs and outputs, as well as both correct and incorrect human solutions in a variety of languages.

Downloading the dataset

Install the Cloud SDK, which provides the gsutil utility. You can then download the full data (~3GiB) with, e.g:

gsutil -m cp -r gs://dm-code_contests /tmp

The data consists of ContestProblem protocol buffers in Riegeli format. See contest_problem.proto for the protocol buffer definition and documentation of its fields.

The dataset contains three splits:

Split Filename
Training code_contests_train.riegeli-*-of-00128
Validation code_contests_valid.riegeli
Test code_contests_test.riegeli

There is example code for iterating over the dataset in C++ (in print_names.cc) and Python (in print_names_and_sources.py). For example, you can print the source and name of each problem in the validation data by installing bazel and then running:

bazel run -c opt \
  :print_names_and_sources /tmp/dm-code_contests/code_contests_valid.riegeli

Or do the same for the training data with the following command (which will print around 13000 lines of output):

bazel run -c opt \
  :print_names_and_sources /tmp/dm-code_contests/code_contests_train.riegeli*

Executing and evaluating solutions

The execution subdirectory contains code for executing a solution and evaluating whether it solves a problem. solve_example demonstrates this functionality, and can be run with e.g.

bazel run -c opt execution:solve_example -- \
  /tmp/dm-code_contests/code_contests_valid.riegeli

The execution code defaults to using Python 3.9 and 2.7, located at /usr/bin/python3.9 and /usr/bin/python2.7, with standard libraries at /usr/lib/python3.9 and /usr/lib/python2.7. These can be changed with the flags defined in py_locations.cc, for example:

bazel run -c opt execution:solve_example -- \
  --valid_path=/tmp/dm-code_contests/code_contests_valid.riegeli \
  --python3_path=/usr/bin/python3.10 --python3_library_paths=/usr/lib/python3.10

Supported platforms

This repository is supported on Linux, compiled with clang.

Citing this work

If you use this dataset or code, please cite this paper:

@article{
  doi:10.1126/science.abq1158,
  author = {Yujia Li  and David Choi  and Junyoung Chung  and Nate Kushman  and Julian Schrittwieser  and R{\'e}mi Leblond  and Tom Eccles  and James Keeling  and Felix Gimeno  and Agustin Dal Lago  and Thomas Hubert  and Peter Choy  and Cyprien de Masson d’Autume  and Igor Babuschkin  and Xinyun Chen  and Po-Sen Huang  and Johannes Welbl  and Sven Gowal  and Alexey Cherepanov  and James Molloy  and Daniel J. Mankowitz  and Esme Sutherland Robson  and Pushmeet Kohli  and Nando de Freitas  and Koray Kavukcuoglu  and Oriol Vinyals },
  title = {Competition-level code generation with AlphaCode},
  journal = {Science},
  volume = {378},
  number = {6624},
  pages = {1092-1097},
  year = {2022},
  doi = {10.1126/science.abq1158},
  URL = {https://www.science.org/doi/abs/10.1126/science.abq1158},
  eprint = {https://www.science.org/doi/pdf/10.1126/science.abq1158},
  abstract = {Programming is a powerful and ubiquitous problem-solving tool. Systems that can assist programmers or even generate programs themselves could make programming more productive and accessible. Recent transformer-based neural network models show impressive code generation abilities yet still perform poorly on more complex tasks requiring problem-solving skills, such as competitive programming problems. Here, we introduce AlphaCode, a system for code generation that achieved an average ranking in the top 54.3\% in simulated evaluations on recent programming competitions on the Codeforces platform. AlphaCode solves problems by generating millions of diverse programs using specially trained transformer-based networks and then filtering and clustering those programs to a maximum of just 10 submissions. This result marks the first time an artificial intelligence system has performed competitively in programming competitions. Computer programming competitions are popular tests among programmers that require critical thinking informed by experience and creating solutions to unforeseen problems, both of which are key aspects of human intelligence but challenging to mimic by machine learning models. Using self-supervised learning and an encoder-decoder transformer architecture, Li et al. developed AlphaCode, a deep-learning model that can achieve approximately human-level performance on the Codeforces platform, which regularly hosts these competitions and attracts numerous participants worldwide (see the Perspective by Kolter). The development of such coding platforms could have a huge impact on programmers’ productivity. It may even change the culture of programming by shifting human work to formulating problems, with machine learning being the main one responsible for generating and executing codes. —YS Modern machine learning systems can achieve average human-level performance in popular competitive programming contests.}}

License

The code is licensed under the Apache 2.0 License.

All non-code materials provided are made available under the terms of the CC BY 4.0 license (Creative Commons Attribution 4.0 International license).

We gratefully acknowledge the contributions of the following:

Use of the third-party software, libraries code or data may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code may be subject to any such terms. We make no representations here with respect to rights or abilities to use any such materials.

Disclaimer

This is not an official Google product.

code_contests's People

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.