llemma_formal2formal's Introduction

`llemma` formal2formal

Scripts for the Lean formal2formal (tactic prediction) experiments in
Llemma: An Open Language Model For Mathematics [Azerbayev et al., 2023].

Setup

Install Python packages:

pip install -r requirements.txt

Install Lean:

# from https://leanprover-community.github.io/install/linux.html
# After running this command, select (2), then `nightly`, then `y`:
curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh
source $HOME/.elan/env
lake

Configure LeanDojo:

export CONTAINER="native"
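An `export` only affects the current shell session. When working from a notebook or a Python launcher instead, the same setting can be applied to the process environment. This is a minimal sketch, assuming LeanDojo reads `CONTAINER` from the environment of the Python process (which is what the `export` above implies) and that the variable is set before LeanDojo is imported:

```python
import os

# Equivalent of `export CONTAINER="native"` for this Python process only.
# Assumption: set this before importing LeanDojo so the setting is visible to it.
os.environ["CONTAINER"] = "native"

print(os.environ["CONTAINER"])
```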

Run

See scripts

Compute metrics

python compute_metrics.py
==>

codellama7b_minif2f_test     0.20491803278688525    50    244
codellama34b_minif2f_test    0.22131147540983606    54    244
llemma7b_minif2f_test        0.26229508196721313    64    244
llemma34b_minif2f_test       0.2581967213114754     63    244
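The output has no column headers. The numbers are consistent with reading the columns as run name, pass rate, theorems proved, and total theorems, since each rate equals the third column divided by the fourth. That reading is an inference from the values, not something the script output states. A small sketch that checks it and prints the rates as percentages (`parse_rows` is a hypothetical helper, not part of the repository):

```python
# Rows copied from the compute_metrics.py output above.
# Assumed column meaning: name, pass rate, proved, total (inferred from the numbers).
ROWS = """\
codellama7b_minif2f_test 0.20491803278688525 50 244
codellama34b_minif2f_test 0.22131147540983606 54 244
llemma7b_minif2f_test 0.26229508196721313 64 244
llemma34b_minif2f_test 0.2581967213114754 63 244
"""


def parse_rows(text):
    """Split whitespace-separated rows into (name, rate, proved, total) tuples."""
    parsed = []
    for line in text.strip().splitlines():
        name, rate, proved, total = line.split()
        parsed.append((name, float(rate), int(proved), int(total)))
    return parsed


results = parse_rows(ROWS)
for name, rate, proved, total in results:
    # The reported rate should match proved / total up to floating-point error.
    assert abs(rate - proved / total) < 1e-12
    print(f"{name:<28} {proved:>3}/{total}  {100 * rate:.1f}%")
```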

Troubleshooting

We observe a Ray error when running the 34b script (with vLLM `--tp-degree` > 1) on an untraced LeanDojo repo. As a workaround, first run the 7b script with `--tp-degree 1` so that LeanDojo finishes tracing the repo, then run the 34b script with `--tp-degree` > 1.

Citation

Please cite the following:

@misc{azerbayev2023llemma,
      title={Llemma: An Open Language Model For Mathematics}, 
      author={Zhangir Azerbayev and Hailey Schoelkopf and Keiran Paster and Marco Dos Santos and Stephen McAleer and Albert Q. Jiang and Jia Deng and Stella Biderman and Sean Welleck},
      year={2023},
      eprint={2310.10631},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

llemma_formal2formal's Issues

Are there any other proof search algorithms in Python with LLMs?

I'm looking for code like this one https://github.com/wellecks/llemma_formal2formal/blob/f96fdb1642b2f21bed85c19640b6f1511abb1116/proofsearch.py but for other proof search algorithms. Do any exist?

The script uses temp=0.0 while sampling 20 times, which is weird

Is that all greedy decoding?
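To make the question concrete: under plain temperature-scaled sampling, a temperature of 0 is conventionally treated as argmax, so 20 independent draws would all return the same token. The sketch below illustrates only that general behaviour with made-up logits. It does not show what the repository's script or its decoding backend actually does with temp=0.0, which may differ (for example, if a different decoding mode is used).

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Sample an index from logits with temperature scaling.

    A temperature of 0 is treated as greedy decoding (argmax).
    """
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(0)
logits = [1.0, 2.5, 0.3]  # hypothetical scores for three candidate tactics

# 20 draws at temperature 0 collapse to a single distinct candidate.
greedy_samples = {sample_token(logits, 0.0, rng) for _ in range(20)}
# Draws at temperature 1 spread over several candidates.
warm_samples = {sample_token(logits, 1.0, rng) for _ in range(200)}

print(greedy_samples, warm_samples)
```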

A better architecture might be one in which the language model determines possible methods for solving mathematical problems. Has anyone thought about this direction?

A better architecture might be one in which the language model first determines possible methods for solving a mathematical problem, then performs specific operations that turn the solution into a chain of thoughts and fills in that chain. The transformation step should be more general, so that the thinking is more divergent, while the step that fills in the chain of thought needs to be more accurate.

This is a bit like the ChatGPT + Mathematica architecture. The problem with that architecture is that Mathematica relies too heavily on hard coding and often reports errors, and ChatGPT is not specifically trained to split a mathematical problem into a step-by-step solution.

Performance on Minif2f-valid

Thank you for your valuable contribution to formal theorem proving. I would like to cite your work but I didn't find a reported result on the pass rate of minif2f-valid. I am sorry I don't have enough resources now to reproduce the results. So I wonder if you have done any evaluation on minif2f-valid. What is the performance? Thank you in advance!
