
codellama's Introduction

Introducing Code Llama

Code Llama is a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B, 34B and 70B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. The 7B and 13B Code Llama and Code Llama - Instruct variants support infilling based on surrounding content. Code Llama was developed by fine-tuning Llama 2 using a higher sampling of code. As with Llama 2, we applied considerable safety mitigations to the fine-tuned versions of the model. For detailed information on model training, architecture and parameters, evaluations, responsible AI and safety, refer to our research paper. Output generated by code generation features of the Llama Materials, including Code Llama, may be subject to third party licenses, including, without limitation, open source licenses.

We are unlocking the power of large language models and our latest version of Code Llama is now accessible to individuals, creators, researchers and businesses of all sizes so that they can experiment, innovate and scale their ideas responsibly. This release includes model weights and starting code for pretrained and fine-tuned Llama language models — ranging from 7B to 70B parameters.

This repository is intended as a minimal example to load Code Llama models and run inference.

Download

In order to download the model weights and tokenizers, please visit the Meta website and accept our License.

Once your request is approved, you will receive a signed URL over email. Then run the download.sh script, passing the URL provided when prompted to start the download. Make sure that you copy the URL text itself; do not use the 'Copy link address' option when you right-click the URL. If the copied URL text starts with https://download.llamameta.net, you copied it correctly. If the copied URL text starts with https://l.facebook.com, you copied it the wrong way.

Prerequisites: make sure you have wget and md5sum installed. Then run the script: bash download.sh.

Keep in mind that the links expire after 24 hours and a certain amount of downloads. If you start seeing errors such as 403: Forbidden, you can always re-request a link.

Model sizes

Model Size
7B ~12.55GB
13B 24GB
34B 63GB
70B 131GB

Setup

In a conda environment with PyTorch / CUDA available, clone the repo and run in the top-level directory:

pip install -e .

Inference

Different models require different model-parallel (MP) values:

Model MP
7B 1
13B 2
34B 4
70B 8

All models, except the 70B Python and Instruct versions, support sequence lengths up to 100,000 tokens, but we pre-allocate the cache according to the max_seq_len and max_batch_size values, so set those according to your hardware and use case.
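As a rough illustration of why this matters (assuming the usual 7B dimensions of 32 layers and a hidden size of 4096, with fp16 caches): the key/value cache costs about 2 × 32 × 4096 × 2 bytes ≈ 0.5 MB per token per sequence, so max_seq_len 16384 with max_batch_size 4 already reserves on the order of 32 GB for the cache alone.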

Pretrained Code Models

The Code Llama and Code Llama - Python models are not fine-tuned to follow instructions. They should be prompted so that the expected answer is the natural continuation of the prompt.

See example_completion.py for some examples. To illustrate, see the command below to run it with the CodeLlama-7b model (nproc_per_node needs to be set to the MP value):

torchrun --nproc_per_node 1 example_completion.py \
    --ckpt_dir CodeLlama-7b/ \
    --tokenizer_path CodeLlama-7b/tokenizer.model \
    --max_seq_len 128 --max_batch_size 4

Pretrained code models are: the Code Llama models CodeLlama-7b, CodeLlama-13b, CodeLlama-34b, CodeLlama-70b and the Code Llama - Python models CodeLlama-7b-Python, CodeLlama-13b-Python, CodeLlama-34b-Python, CodeLlama-70b-Python.
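For reference, a minimal Python sketch of the completion flow used by example_completion.py (the Llama.build() and text_completion() calls come from this repo's llama package; argument names may differ slightly between versions, and the script still needs to be launched with torchrun as shown above):

    # Completion sketch modeled on example_completion.py: prompts are plain
    # continuations, not instructions, so the model simply continues the text.
    from llama import Llama

    generator = Llama.build(
        ckpt_dir="CodeLlama-7b/",
        tokenizer_path="CodeLlama-7b/tokenizer.model",
        max_seq_len=128,
        max_batch_size=4,
    )

    prompts = ["def fibonacci(n: int) -> int:\n    "]
    results = generator.text_completion(prompts, max_gen_len=64, temperature=0.2, top_p=0.9)
    for prompt, result in zip(prompts, results):
        print(prompt + result["generation"])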

Code Infilling

Code Llama and Code Llama - Instruct 7B and 13B models are capable of filling in code given the surrounding context.

See example_infilling.py for some examples. The CodeLlama-7b model can be run for infilling with the command below (nproc_per_node needs to be set to the MP value):

torchrun --nproc_per_node 1 example_infilling.py \
    --ckpt_dir CodeLlama-7b/ \
    --tokenizer_path CodeLlama-7b/tokenizer.model \
    --max_seq_len 192 --max_batch_size 4

Pretrained infilling models are: the Code Llama models CodeLlama-7b and CodeLlama-13b and the Code Llama - Instruct models CodeLlama-7b-Instruct, CodeLlama-13b-Instruct.
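For reference, a minimal Python sketch of the infilling flow, modeled on example_infilling.py (the text_infilling() helper, its argument names, and the result key are taken from this repo's generation code and may differ between versions):

    # Infilling sketch: the model generates the code between `prefix` and `suffix`.
    from llama import Llama

    generator = Llama.build(
        ckpt_dir="CodeLlama-7b/",
        tokenizer_path="CodeLlama-7b/tokenizer.model",
        max_seq_len=192,
        max_batch_size=4,
    )

    prefix = 'def remove_non_ascii(s: str) -> str:\n    """ '
    suffix = '\n    return result\n'

    results = generator.text_infilling(
        prefixes=[prefix],
        suffixes=[suffix],
        max_gen_len=64,
        temperature=0.2,
        top_p=0.9,
    )
    print(results[0]["generation"])  # the generated middle part (key name assumed)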

Fine-tuned Instruction Models

Code Llama - Instruct models are fine-tuned to follow instructions. To get the expected features and performance for the 7B, 13B and 34B variants, a specific formatting defined in chat_completion() needs to be followed, including the INST and <<SYS>> tags, BOS and EOS tokens, and the whitespaces and linebreaks in between (we recommend calling strip() on inputs to avoid double-spaces). CodeLlama-70b-Instruct requires a separate turn-based prompt format defined in dialog_prompt_tokens(). You can use chat_completion() directly to generate answers with all instruct models; it will automatically perform the required formatting.
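For reference, a rough sketch of the single-turn prompt string that chat_completion() assembles for the 7B/13B/34B Instruct variants (the tag strings match those in llama/generation.py; the BOS token is added by the tokenizer rather than written as text, and multi-turn dialogs and the 70B format are handled differently):

    # Sketch of a single-turn Code Llama - Instruct prompt with a system message.
    system = "Provide answers in Python."               # arbitrary example system prompt
    user = "Write a function that reverses a string."   # arbitrary example user message

    prompt = (
        "[INST] <<SYS>>\n"
        f"{system.strip()}\n"
        "<</SYS>>\n\n"
        f"{user.strip()} [/INST]"
    )
    # chat_completion() encodes this with tokenizer.encode(prompt, bos=True, eos=False)
    # and lets the model generate the answer after the final [/INST].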

You can also deploy additional classifiers for filtering out inputs and outputs that are deemed unsafe. See the llama-recipes repo for an example of how to add a safety checker to the inputs and outputs of your inference code.

Examples using CodeLlama-7b-Instruct:

torchrun --nproc_per_node 1 example_instructions.py \
    --ckpt_dir CodeLlama-7b-Instruct/ \
    --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 4

Fine-tuned instruction-following models are: the Code Llama - Instruct models CodeLlama-7b-Instruct, CodeLlama-13b-Instruct, CodeLlama-34b-Instruct, CodeLlama-70b-Instruct.

Code Llama is a new technology that carries potential risks with use. Testing conducted to date has not — and could not — cover all scenarios. In order to help developers address these risks, we have created the Responsible Use Guide. More details can be found in our research papers as well.

Issues

Please report any software “bug”, or other problems with the models through one of the following means:

Reporting issues with the model: github.com/facebookresearch/codellama
Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
Reporting bugs and security concerns: facebook.com/whitehat/info

Model Card

See MODEL_CARD.md for the model card of Code Llama.

License

Our model and weights are licensed for both researchers and commercial entities, upholding the principles of openness. Our mission is to empower individuals and industry through this opportunity, while fostering an environment of discovery and ethical AI advancements.

See the LICENSE file, as well as our accompanying Acceptable Use Policy.

References

  1. Code Llama Research Paper
  2. Code Llama Blog Post

codellama's People

Contributors

brozi, davidzirinsky, eltociear, jgehring, jspisak, mhaz, mohammad1ta, mpu, newmerator, ninoristeski, syhw


codellama's Issues

Unable to run this morning, yesterday it ran fine

Morning! I need help getting the models to run a second time, on a new instance.

Yesterday, I registered for and downloaded the models onto an AWS sagemaker instance. Everything worked fine and I was able to run

pip install -e .

And from there experiment with the models. I shut down the instance and this morning started it again. I reran the pip installation, but now, everything hangs at this step:

sh-4.2$ torchrun --nproc_per_node 4 example_instructions.py     --ckpt_dir CodeLlama-34b-Instruct/     --tokenizer_path CodeLlama-34b-Instruct/tokenizer.model     --max_seq_len 512 --max_batch_size 4
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
> initializing model parallel with size 4
> initializing ddp with size 1
> initializing pipeline with size 1

This same code would finish loading the model after 8 seconds or so and be good to go. I've tried this with the 7b instruct model, the 13b instruct, and the 34b instruct; all worked fine yesterday, none work today.

How can I make this work? Did I forget some crucial step?

The rest of this bug report basically describes how I arrived at the conclusion that

checkpoint = torch.load(ckpt_path, map_location="cpu")

is not working, and I'm not sure why. Once I get to that point, the RAM usage rises from 1.8GB to 28.9GB, so it looks like it has at least found the first file in the checkpoint. This g5.12xlarge instance has 196GB of RAM and four 24GB GPUs (and everything worked yesterday).

To figure this all out, I went into generation.py in the llama directory, and I added in some line number inspections. I added in lines like:

from inspect import getframeinfo, currentframe

print(f"Got to {getframeinfo(currentframe()).lineno}")

The code in generation.py now looks like:

        print(f"Got to {getframeinfo(currentframe()).lineno}")
        assert len(checkpoints) > 0, f"no checkpoint files found in {ckpt_dir}"
        assert model_parallel_size == len(
            checkpoints
        ), f"Loading a checkpoint for MP={len(checkpoints)} but world size is {model_parallel_size}"
        
        print(f"Got to {getframeinfo(currentframe()).lineno}")
        ckpt_path = checkpoints[get_model_parallel_rank()]
        print(f"Got to {getframeinfo(currentframe()).lineno}")
        checkpoint = torch.load(ckpt_path, map_location="cpu")
        print(f"Got to {getframeinfo(currentframe()).lineno}")
        with open(Path(ckpt_dir) / "params.json", "r") as f:
            params = json.loads(f.read())
        print(f"Got to {getframeinfo(currentframe()).lineno}")
        model_args: ModelArgs = ModelArgs(
            max_seq_len=max_seq_len,
            max_batch_size=max_batch_size,
            **params,
        )
        tokenizer = Tokenizer(model_path=tokenizer_path)
        model_args.vocab_size = tokenizer.n_words
        print(f"Got to {getframeinfo(currentframe()).lineno}")

and the run output looks like:

sh-4.2$ torchrun --nproc_per_node 4 example_instructions.py     --ckpt_dir CodeLlama-34b-Instruct/     --tokenizer_path CodeLlama-34b-Instruct/tokenizer.model     --max_seq_len 512 --max_batch_size 4
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
> initializing model parallel with size 4
> initializing ddp with size 1
> initializing pipeline with size 1
Got to 86
Got to 92
Got to 94

which shows it hangs at checkpoint = torch.load(ckpt_path, map_location="cpu").

My pip freeze:

sh-4.2$ pip freeze
aiobotocore @ file:///home/conda/feedstock_root/build_artifacts/aiobotocore_1691451276487/work
aiofiles==22.1.0
aiohttp @ file:///home/conda/feedstock_root/build_artifacts/aiohttp_1689804989077/work
aioitertools @ file:///home/conda/feedstock_root/build_artifacts/aioitertools_1663521246073/work
aiosignal @ file:///home/conda/feedstock_root/build_artifacts/aiosignal_1667935791922/work
aiosqlite==0.19.0
anyio @ file:///home/conda/feedstock_root/build_artifacts/anyio_1688651106312/work/dist
argon2-cffi @ file:///home/conda/feedstock_root/build_artifacts/argon2-cffi_1640817743617/work
argon2-cffi-bindings @ file:///home/conda/feedstock_root/build_artifacts/argon2-cffi-bindings_1666850768662/work
arrow @ file:///home/conda/feedstock_root/build_artifacts/arrow_1662382474514/work
astroid==2.15.6
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1670263926556/work
async-timeout @ file:///home/conda/feedstock_root/build_artifacts/async-timeout_1691763562544/work
attrs @ file:///home/conda/feedstock_root/build_artifacts/attrs_1683424013410/work
autopep8==2.0.2
autovizwidget @ file:///home/conda/feedstock_root/build_artifacts/autovizwidget_1680800327357/work
awscli==1.29.28
Babel==2.12.1
backcall @ file:///home/conda/feedstock_root/build_artifacts/backcall_1592338393461/work
backports.functools-lru-cache @ file:///home/conda/feedstock_root/build_artifacts/backports.functools_lru_cache_1687772187254/work
beautifulsoup4 @ file:///home/conda/feedstock_root/build_artifacts/beautifulsoup4_1680888073205/work
bleach @ file:///home/conda/feedstock_root/build_artifacts/bleach_1674535352125/work
boto3==1.28.28
botocore==1.31.28
brotlipy @ file:///home/conda/feedstock_root/build_artifacts/brotlipy_1666764671472/work
cached-property @ file:///home/conda/feedstock_root/build_artifacts/cached_property_1615209429212/work
certifi==2023.7.22
cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1671179353105/work
charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1688813409104/work
cloudpickle==2.2.1
cmake==3.27.2
-e git+ssh://[email protected]/facebookresearch/codellama.git@cb51c14ec761370ba2e2bc351374a79265d0465e#egg=codellama
colorama==0.4.4
comm @ file:///home/conda/feedstock_root/build_artifacts/comm_1691044910542/work
contextlib2==21.6.0
cryptography @ file:///home/conda/feedstock_root/build_artifacts/cryptography-split_1672672382195/work
debugpy @ file:///home/conda/feedstock_root/build_artifacts/debugpy_1691021228385/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
defusedxml @ file:///home/conda/feedstock_root/build_artifacts/defusedxml_1615232257335/work
dill==0.3.7
docker==6.1.3
docstring-to-markdown==0.12
docutils==0.16
entrypoints @ file:///home/conda/feedstock_root/build_artifacts/entrypoints_1643888246732/work
environment-kernels==1.2.0
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1692026125334/work
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1667317341051/work
fairscale==0.4.13
fastjsonschema @ file:///home/conda/feedstock_root/build_artifacts/python-fastjsonschema_1690055433477/work/dist
filelock==3.12.3
fire==0.5.0
flit_core @ file:///home/conda/feedstock_root/build_artifacts/flit-core_1684084314667/work/source/flit_core
fqdn @ file:///home/conda/feedstock_root/build_artifacts/fqdn_1638810296540/work/dist
frozenlist @ file:///home/conda/feedstock_root/build_artifacts/frozenlist_1689244399117/work
fsspec @ file:///home/conda/feedstock_root/build_artifacts/fsspec_1626188337504/work
gitdb==4.0.10
GitPython==3.1.32
google-pasta==0.2.0
hdijupyterutils @ file:///home/conda/feedstock_root/build_artifacts/hdijupyterutils_1680800332182/work
idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1663625384323/work
importlib-metadata==6.8.0
importlib-resources @ file:///home/conda/feedstock_root/build_artifacts/importlib_resources_1691408075105/work
ipykernel==5.5.6
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1685727741709/work
ipython-genutils==0.2.0
ipywidgets @ file:///home/conda/feedstock_root/build_artifacts/ipywidgets_1690877070294/work
isoduration @ file:///home/conda/feedstock_root/build_artifacts/isoduration_1638811571363/work/dist
isort==5.12.0
jedi==0.18.2
Jinja2 @ file:///home/conda/feedstock_root/build_artifacts/jinja2_1654302431367/work
jmespath @ file:///home/conda/feedstock_root/build_artifacts/jmespath_1655568249366/work
json5==0.9.14
jsonpointer==2.0
jsonschema @ file:///home/conda/feedstock_root/build_artifacts/jsonschema-meta_1691761378595/work
jsonschema-specifications @ file:///home/conda/feedstock_root/build_artifacts/jsonschema-specifications_1689701150890/work
jupyter @ file:///home/conda/feedstock_root/build_artifacts/jupyter_1670249595582/work
jupyter-console @ file:///home/conda/feedstock_root/build_artifacts/jupyter_console_1678118109161/work
jupyter-events @ file:///home/conda/feedstock_root/build_artifacts/jupyter_events_1691505939576/work
jupyter-lsp==2.2.0
jupyter-server-mathjax==0.2.6
jupyter-server-proxy @ git+https://github.com/jupyterhub/jupyter-server-proxy@2d7dd346bb595106b417476de870a348943f3c70
jupyter-ydoc==0.2.5
jupyter_client==7.4.9
jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1686775611663/work
jupyter_server @ file:///home/conda/feedstock_root/build_artifacts/jupyter_server_1692108700252/work
jupyter_server_fileid==0.9.0
jupyter_server_terminals @ file:///home/conda/feedstock_root/build_artifacts/jupyter_server_terminals_1673491454549/work
jupyter_server_ydoc==0.8.0
jupyterlab==3.6.5
jupyterlab-git==0.41.0
jupyterlab-lsp==4.2.0
jupyterlab-pygments @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_pygments_1649936611996/work
jupyterlab-widgets @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_widgets_1688489450369/work
jupyterlab_server==2.24.0
lazy-object-proxy==1.9.0
lit==16.0.6
MarkupSafe @ file:///home/conda/feedstock_root/build_artifacts/markupsafe_1685769049201/work
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1660814786464/work
mccabe==0.7.0
mistune @ file:///home/conda/feedstock_root/build_artifacts/mistune_1675771498296/work
mock @ file:///home/conda/feedstock_root/build_artifacts/mock_1689092066756/work
mpmath==1.3.0
multidict @ file:///home/conda/feedstock_root/build_artifacts/multidict_1672339403932/work
multiprocess==0.70.15
nb-conda @ file:///home/conda/feedstock_root/build_artifacts/nb_conda_1654442778977/work
nb-conda-kernels @ file:///home/conda/feedstock_root/build_artifacts/nb_conda_kernels_1667060632461/work
nbclassic @ file:///home/conda/feedstock_root/build_artifacts/nbclassic_1675369808718/work
nbclient @ file:///home/conda/feedstock_root/build_artifacts/nbclient_1684790896106/work
nbconvert @ file:///home/conda/feedstock_root/build_artifacts/nbconvert-meta_1674590374792/work
nbdime==3.2.1
nbexamples @ file:///opt/nbexamples
nbformat @ file:///home/conda/feedstock_root/build_artifacts/nbformat_1690814868471/work
nest-asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1664684991461/work
networkx==3.1
nose @ file:///home/conda/feedstock_root/build_artifacts/nose_1602434998960/work
notebook @ file:///home/conda/feedstock_root/build_artifacts/notebook_1691436218243/work
notebook_shim @ file:///home/conda/feedstock_root/build_artifacts/notebook-shim_1682360583588/work
numpy @ file:///home/conda/feedstock_root/build_artifacts/numpy_1691056231492/work
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
overrides @ file:///home/conda/feedstock_root/build_artifacts/overrides_1691338815398/work
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1681337016113/work
pandas @ file:///home/conda/feedstock_root/build_artifacts/pandas_1688740542634/work
pandocfilters @ file:///home/conda/feedstock_root/build_artifacts/pandocfilters_1631603243851/work
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1638334955874/work
pathos==0.3.1
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1667297516076/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
pid==3.0.4
pkgutil_resolve_name @ file:///home/conda/feedstock_root/build_artifacts/pkgutil-resolve-name_1633981968097/work
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1690813113769/work
plotly @ file:///home/conda/feedstock_root/build_artifacts/plotly_1692220561510/work
pluggy==1.2.0
pox==0.3.3
ppft==1.7.6.7
prometheus-client @ file:///home/conda/feedstock_root/build_artifacts/prometheus_client_1689032443210/work
prompt-toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1688565951714/work
protobuf==4.23.4
psutil @ file:///home/conda/feedstock_root/build_artifacts/psutil_1681775027942/work
psycopg2 @ file:///home/conda/feedstock_root/build_artifacts/psycopg2-split_1667025517155/work
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure-eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1642875951954/work
py4j==0.10.9.5
pyasn1==0.5.0
pycodestyle==2.10.0
pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1636257122734/work
pydocstyle==6.3.0
pyflakes==3.0.1
pygal==3.0.0
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1691408637400/work
pykerberos @ file:///home/conda/feedstock_root/build_artifacts/pykerberos_1671204518513/work
pylint==2.17.5
pyOpenSSL @ file:///home/conda/feedstock_root/build_artifacts/pyopenssl_1685514481738/work
PyQt5==5.12.3
PyQt5_sip==4.19.18
PyQtChart==5.12
PyQtWebEngine==5.12.1
PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1661604839144/work
pyspark==3.3.0
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1626286286081/work
python-json-logger @ file:///home/conda/feedstock_root/build_artifacts/python-json-logger_1677079630776/work
python-lsp-jsonrpc==1.0.0
python-lsp-server==1.7.4
pytoolconfig==1.2.5
pytz @ file:///home/conda/feedstock_root/build_artifacts/pytz_1680088766131/work
PyYAML @ file:///home/conda/feedstock_root/build_artifacts/pyyaml_1666772395347/work
pyzmq @ file:///home/conda/feedstock_root/build_artifacts/pyzmq_1666828497229/work
qtconsole @ file:///home/conda/feedstock_root/build_artifacts/qtconsole-base_1683329453903/work
QtPy @ file:///home/conda/feedstock_root/build_artifacts/qtpy_1680148448366/work
referencing @ file:///home/conda/feedstock_root/build_artifacts/referencing_1691337268233/work
requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1684774241324/work
requests-kerberos @ file:///home/conda/feedstock_root/build_artifacts/requests-kerberos_1667464887610/work
rfc3339-validator @ file:///home/conda/feedstock_root/build_artifacts/rfc3339-validator_1638811747357/work
rfc3986-validator @ file:///home/conda/feedstock_root/build_artifacts/rfc3986-validator_1598024191506/work
rope==1.9.0
rpds-py @ file:///home/conda/feedstock_root/build_artifacts/rpds-py_1689705060450/work
rsa==4.7.2
s3fs @ file:///home/conda/feedstock_root/build_artifacts/s3fs_1626193591467/work
s3transfer @ file:///home/conda/feedstock_root/build_artifacts/s3transfer_1692149178344/work
sagemaker==2.177.1
sagemaker-experiments==0.1.45
sagemaker-nbi-agent @ file:///opt/sagemaker_nbi_agent
sagemaker-pyspark==1.4.5
schema==0.7.5
Send2Trash @ file:///home/conda/feedstock_root/build_artifacts/send2trash_1682601222253/work
sentencepiece==0.1.99
simpervisor==1.0.0
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
smdebug-rulesconfig==1.0.1
smmap==5.0.0
sniffio @ file:///home/conda/feedstock_root/build_artifacts/sniffio_1662051266223/work
snowballstemmer==2.2.0
soupsieve @ file:///home/conda/feedstock_root/build_artifacts/soupsieve_1658207591808/work
sparkmagic @ file:///home/conda/feedstock_root/build_artifacts/sparkmagic_1680849855330/work/sparkmagic
stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
sympy==1.12
tblib==1.7.0
tenacity @ file:///home/conda/feedstock_root/build_artifacts/tenacity_1692026804430/work
termcolor==2.3.0
terminado @ file:///home/conda/feedstock_root/build_artifacts/terminado_1670253674810/work
tinycss2 @ file:///home/conda/feedstock_root/build_artifacts/tinycss2_1666100256010/work
tomli==2.0.1
tomlkit==0.12.1
torch==2.0.1
tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1684150054582/work
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1675110562325/work
triton==2.0.0
typing-utils @ file:///home/conda/feedstock_root/build_artifacts/typing_utils_1622899189314/work
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1688315532570/work
tzdata @ file:///home/conda/feedstock_root/build_artifacts/python-tzdata_1680081134351/work
ujson==5.8.0
uri-template @ file:///home/conda/feedstock_root/build_artifacts/uri-template_1688655812972/work/dist
urllib3==1.26.14
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1673864653149/work
webcolors @ file:///home/conda/feedstock_root/build_artifacts/webcolors_1679900785843/work
webencodings==0.5.1
websocket-client @ file:///home/conda/feedstock_root/build_artifacts/websocket-client_1687789148259/work
widgetsnbextension @ file:///home/conda/feedstock_root/build_artifacts/widgetsnbextension_1688504439014/work
wrapt @ file:///home/conda/feedstock_root/build_artifacts/wrapt_1677485519705/work
y-py==0.6.0
yarl @ file:///home/conda/feedstock_root/build_artifacts/yarl_1685191749966/work
ypy-websocket==0.8.4
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1689374466814/work

how to use this model? Is it the same as the HF version?

How do I use this model? Is it the same as the HF version?

From the README, I know I must use torchrun to run the examples. Is it possible to run them with plain python, and how would I write an example that runs with python instead of torchrun? All the results I find lately use the HF version; is it possible to use the originally downloaded model?
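A minimal sketch of one workaround (not an official interface): with a single process, torchrun only sets a few environment variables before running the script, so you can set them yourself and call the example's main() directly with plain python. The variable names below are what torch.distributed's env:// initialization expects; the main() keyword arguments mirror the example script and may differ between versions:

    # Single-process launch without torchrun (sketch). Run from the repo root.
    import os

    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    os.environ.setdefault("LOCAL_RANK", "0")
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    from example_completion import main

    main(
        ckpt_dir="CodeLlama-7b/",
        tokenizer_path="CodeLlama-7b/tokenizer.model",
        max_seq_len=128,
        max_batch_size=4,
    )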

Issue with downloading models

Hi all,

I am actually struggling with downloading models. When I paste the link when prompted and after tapping 'enter' to download all models I receive the following:

download.sh: 13: [[: not found
Downloading LICENSE and Acceptable Usage Policy
download.sh: 18: Bad substitution

Tried twice with different links, after removing the cloned repo from HDD.

Inference on multi-gpu

Tried to run:

torchrun --nproc_per_node 1 codellama/example_instructions.py \
     --ckpt_dir /home/ubuntu/model/ \
     --tokenizer_path /home/ubuntu/model/tokenizer.model \
     --max_seq_len 4512 --max_batch_size 4

I have a long prompt (4000 tokens).
I have 4 Nvidia A10G each with 300W and 24GB VRAM. However I see only one GPU being used (on nvidia-smi).
The error I get is:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 302.00 MiB (GPU 0; 22.19 GiB total capacity; 21.65 GiB already allocated; 175.50 MiB free; 21.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 33763) of binary:

The whole tracelog is:

 torchrun --nproc_per_node 1 codellama/example_instructions.py \
>     --ckpt_dir /home/ubuntu/model/ \
>     --tokenizer_path /home/ubuntu/model/tokenizer.model \
>     --max_seq_len 4512 --max_batch_size 4
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1

Loaded in 7.07 seconds
Traceback (most recent call last):
  File "/home/ubuntu/codellama/example_instructions.py", line 114, in <module>
    fire.Fire(main)
  File "/home/ubuntu/venv/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/ubuntu/venv/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/ubuntu/venv/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/ubuntu/codellama/example_instructions.py", line 97, in main
    results = generator.chat_completion(
  File "/home/ubuntu/codellama/llama/generation.py", line 335, in chat_completion
    generation_tokens, generation_logprobs = self.generate(
  File "/home/ubuntu/venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/codellama/llama/generation.py", line 148, in generate
    logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos)
  File "/home/ubuntu/venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/codellama/llama/model.py", line 288, in forward
    h = layer(h, start_pos, freqs_cis, mask)
  File "/home/ubuntu/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/codellama/llama/model.py", line 240, in forward
    h = x + self.attention.forward(
  File "/home/ubuntu/codellama/llama/model.py", line 181, in forward
    scores = F.softmax(scores.float(), dim=-1).type_as(xq)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 302.00 MiB (GPU 0; 22.19 GiB total capacity; 21.65 GiB already allocated; 175.50 MiB free; 21.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 33763) of binary: /home/ubuntu/venv/bin/python
Traceback (most recent call last):
  File "/home/ubuntu/venv/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/venv/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/ubuntu/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/ubuntu/venv/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/ubuntu/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/ubuntu/venv/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
codellama/example_instructions.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-08-25_04:41:10
  host      : ip-172-31-92-135.ec2.internal
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 33763)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Installed md5sum, but there is an error

I am using Ubuntu 22.04 LTS. I followed the instructions in the README.md to run download.sh and selected all the models. The installation went smoothly until it reached CodeLlama-34b/checklist.chk, where I encountered an md5sum error: "md5sum: checklist.chk: No such file or directory". When I checked with md5sum --version, it was indeed installed. Can you please tell me what steps I might have missed?

Getting Issue in installation :(

Error :

 root@Indra:/home/gagan/projecta/codellama-gagan-singh# bash download.sh
url:https://download2.llamameta.net/*?Policy=eyJTdGF0ZW1lbnQiOlt7InVuaXF1ZV9oYXNoIjoidHg5ejl1ZHJqN2NkMHN1anNoa3dlaWx3IiwiUmVzb3VyY2UiOiJodHRwczpcL1wvZG93bmxvYWQyLmxsYW1hbWV0YS5uZXRcLyoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2OTMxMETCCCCCCCCCCCCCCCCCCC&Download-Request-ID=8211ABCDDEMO

Enter the list of models to download without spaces (7b,13b,34b,7b-Python,13b-Python,34b-Python,7b-Instruct,13b-Instruct,34b-Instruct), or press Enter for all:
Downloading LICENSE and Acceptable Usage Policy
wget: missing URL
Usage: wget [OPTION]... [URL]...

Try `wget --help' for more options.
wget: missing URL

What's the issue? In download.sh I pasted the link as:

read -p "url:https://download2.llamameta.net/*?Policy=eyJTdGF0ZW1lbnQiOlt7InVuaXF1ZV9oYXNoIjoiiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2OTMxMDcwNjl9fX1dfQ__&Signature=Bw-dDvS72Gak1eb0GqeJKU5iD887fN022O6uBGfpvlF9eiMPfx7jDNu1kSre2gbEwHi%7EW15B8Ns8-%7E2fuJqa9t9QEhthKA__&Key-Pair-Id=K15QRJLYKIFSLZ&Download-Request-ID=821102959DEMOO" PRESIGNED_URL
echo ""
ALL_MODELS="7b,13b,34b,7b-Python,13b-Python,34b-Python,7b-Instruct,13b-Instruct,34b-Instruct"
read -p "Enter the list of models to download without spaces ($ALL_MODELS), or press Enter for all: " MODEL_SIZE
TARGET_FOLDER="."             # where all files should end up
mkdir -p ${TARGET_FOLDER}

I have changed the token to a random string. Please help.

Could not parse check file 'checklist.chk'

Getting this on MacOS when attempting to download

HTTP request sent, awaiting response... 200 OK
Length: 6489 (6.3K) [text/html]
Saving to: ‘./CodeLlama-13b-Instruct/checklist.chk’

./CodeLlama-13b-Instruct/checkl 100%[=====================================================>]   6.34K  --.-KB/s    in 0s

2023-08-24 13:51:33 (35.0 MB/s) - ‘./CodeLlama-13b-Instruct/checklist.chk’ saved [6489/6489]

Checking checksums
Could not parse check file 'checklist.chk' (2)

unable to download the model weights

I was unable to download the authorized model within the 24-hour timeframe specified in the authorization email.

--2023-08-29 22:06:38-- https://download2.llamameta.net/LICENSE?Policy=eyJTdGF0ZW1lbnQiOlt7InVuaX......Request-ID=724013916200119
Resolving download2.llamameta.net (download2.llamameta.net)... 99.84.133.49, 99.84.133.23, 99.84.133.48, ...
Connecting to download2.llamameta.net (download2.llamameta.net)|99.84.133.49|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7020 (6.9K) [binary/octet-stream]
Saving to: './LICENSE'

./LICENSE 100%[===================================================================================================================>] 6.86K --.-KB/s in 0s

2023-08-29 22:06:39 (163 MB/s) - './LICENSE' saved [7020/7020]

--2023-08-29 22:06:39-- https://download2.llamameta.net/LICENSE?Policy=eyJTdGF0ZW1lbnQiOlt7InVuaX......&Request-ID=724013916200119
Resolving download2.llamameta.net (download2.llamameta.net)... 99.84.133.49, 99.84.133.23, 99.84.133.48, ...
Connecting to download2.llamameta.net (download2.llamameta.net)|99.84.133.49|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4790 (4.7K) [text/markdown]
Saving to: './USE_POLICY.md'

./USE_POLICY.md 100%[===================================================================================================================>] 4.68K --.-KB/s in 0s

2023-08-29 22:06:39 (139 MB/s) - './USE_POLICY.md' saved [4790/4790]

Downloading tokenizer
--2023-08-29 22:06:39-- https://download2.llamameta.net/LICENSE?Policy=eyJTdGF0ZW1lbnQiOlt7InVuaX......&Request-ID=724013916200119
Resolving download2.llamameta.net (download2.llamameta.net)... 99.84.133.49, 99.84.133.23, 99.84.133.48, ...
Connecting to download2.llamameta.net (download2.llamameta.net)|99.84.133.49|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-08-29 22:06:41 ERROR 403: Forbidden.

--2023-08-29 22:06:41-- https://download2.llamameta.net/LICENSE?Policy=eyJTdGF0ZW1lbnQiOlt7InVuaX......&Request-ID=724013916200119
Reusing existing connection to download2.llamameta.net:443.
HTTP request sent, awaiting response... 403 Forbidden
2023-08-29 22:06:41 ERROR 403: Forbidden.

access through HuggingFace?

Thanks so much for your work and for opening up access to Code Llama.

We received access for Llama2 previously and access the models via HF. Is it possible to do the same with Code Llama?
(We just requested access for Code Llama via your submission form).

Give an explicit example of the instruction prompt structure in the readme

Currently the readme is pointing newcomers to generation.py, where they have to deduce the correct prompt structure for the instruction model from this code:

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

[...]

for dialog in dialogs:
    unsafe_requests.append(
        any([tag in msg["content"] for tag in SPECIAL_TAGS for msg in dialog])
    )
    if dialog[0]["role"] == "system":
        dialog = [
            {
                "role": dialog[1]["role"],
                "content": B_SYS
                + dialog[0]["content"]
                + E_SYS
                + dialog[1]["content"],
            }
        ] + dialog[2:]
    assert all([msg["role"] == "user" for msg in dialog[::2]]) and all(
        [msg["role"] == "assistant" for msg in dialog[1::2]]
    ), (
        "model only supports 'system', 'user' and 'assistant' roles, "
        "starting with 'system', then 'user' and alternating (u/a/u/a/u...)"
    )
    dialog_tokens: List[int] = sum(
        [
            self.tokenizer.encode(
                f"{B_INST} {(prompt['content']).strip()} {E_INST} {(answer['content']).strip()} ",
                bos=True,
                eos=True,
            )
            for prompt, answer in zip(
                dialog[::2],
                dialog[1::2],
            )
        ],
        [],
    )
    assert (
        dialog[-1]["role"] == "user"
    ), f"Last message must be from user, got {dialog[-1]['role']}"
    dialog_tokens += self.tokenizer.encode(
        f"{B_INST} {(dialog[-1]['content']).strip()} {E_INST}",
        bos=True,
        eos=False,
    )
    prompt_tokens.append(dialog_tokens)

This seems unnecessarily obscure. Is there a specific reason to not just give an example?
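For what it's worth, here is a sketch of the token layout the loop above produces for a dialog with one completed exchange plus a new user message (<s> and </s> stand for the BOS/EOS tokens added by tokenizer.encode; they are shown as text only for illustration):

    <s>[INST] <<SYS>>
    {system prompt}
    <</SYS>>

    {first user message} [/INST] {first assistant answer} </s>
    <s>[INST] {second user message} [/INST]

That is, each completed (user, assistant) pair is encoded with both BOS and EOS, and the trailing user message is encoded with BOS only, so the model's reply is the natural continuation after the final [/INST].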

Error when running on Windows 10

I found this in #55,
but it is closed and not resolved.

env:

win10+conda(pytorch-gpu+python3.11)+powershell

error:


(pytorch-gpu) PS F:\aiProject\codellama> torchrun --nproc_per_node 1 example_completion.py --ckpt_dir .\CodeLlama-34b-Python\ --tokenizer_path .\CodeLlama-34b-Python\tokenizer.model --max_seq_len 512 --max_batch_size 4
NOTE: Redirects are currently not supported in Windows or MacOs.
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - The requested address is not valid in its context.).
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - The requested address is not valid in its context.).
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - The requested address is not valid in its context.).
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
  File "F:\aiProject\codellama\example_completion.py", line 55, in <module>
    fire.Fire(main)
  File "C:\Users\b\.conda\envs\pytorch-gpu\Lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\b\.conda\envs\pytorch-gpu\Lib\site-packages\fire\core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\b\.conda\envs\pytorch-gpu\Lib\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "F:\aiProject\codellama\example_completion.py", line 20, in main
    generator = Llama.build(
                ^^^^^^^^^^^^
  File "F:\aiProject\codellama\llama\generation.py", line 68, in build
    torch.distributed.init_process_group("nccl")
  File "C:\Users\b\.conda\envs\pytorch-gpu\Lib\site-packages\torch\distributed\distributed_c10d.py", line 907, in init_process_group
    default_pg = _new_process_group_helper(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\b\.conda\envs\pytorch-gpu\Lib\site-packages\torch\distributed\distributed_c10d.py", line 1013, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 27024) of binary: C:\Users\b\.conda\envs\pytorch-gpu\python.exe
Traceback (most recent call last):
  File "C:\Users\b\.conda\envs\pytorch-gpu\Scripts\torchrun-script.py", line 33, in <module>
    sys.exit(load_entry_point('torch==2.0.1', 'console_scripts', 'torchrun')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\b\.conda\envs\pytorch-gpu\Lib\site-packages\torch\distributed\elastic\multiprocessing\errors\__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\b\.conda\envs\pytorch-gpu\Lib\site-packages\torch\distributed\run.py", line 794, in main
    run(args)
  File "C:\Users\b\.conda\envs\pytorch-gpu\Lib\site-packages\torch\distributed\run.py", line 785, in run
    elastic_launch(
  File "C:\Users\b\.conda\envs\pytorch-gpu\Lib\site-packages\torch\distributed\launcher\api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\b\.conda\envs\pytorch-gpu\Lib\site-packages\torch\distributed\launcher\api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_completion.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-08-30_10:00:59
  host      : Administrator
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 27024)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

How do I get this to work?

Thanks.

Requesting a Colab file to run code llama

Hey,
Thank you very much for such great work. It would be great if someone could create a Colab notebook where we can run this model.
In some regions we are not able to download the model weights and run it.
Thanking you in advance. :)

Enhancement on file: download.sh

Changes :

Prerequisite Checks: Added a function check_prerequisites to verify if wget and md5sum are installed. It also offers to install these packages if they are missing.

Log Function: Introduced a log function to handle all output messages. This makes it easier to control the output format.

Color Coding: Added color coding to output messages for better visual differentiation. Green is used for success messages, red for errors, and yellow for ongoing processes.

Download Function: Introduced a download_file function that abstracts the download logic. It uses wget with the --quiet and --show-progress flags.

Model Download Function: Introduced a download_model function to handle the downloading of individual models. This improves code modularity and readability.

Checksum Verification: Added a visible checksum verification step after each model is downloaded. The checksum verification was previously silent; now it explicitly logs whether each file is OK or not.

User Prompts: Revised user prompts for better clarity, including providing example inputs for the list of models.

Press 'c' to Cancel: Added an option for the user to cancel the operation after entering the models to download.

Code Comments: Introduced comments to explain critical sections of the code, improving readability and maintainability.

Silent Checksum Verification: Used --quiet flag for md5sum to reduce noise in the output.

Multi-shard Download: Adjusted the download logic to handle multi-shard models by looping through each shard.

torchrun --nproc_per_node 2 example_instructions.py --ckpt_dir CodeLlama-13b-Instruct/ --tokenizer_path CodeLlama-13b-Instruct/tokenizer.model --max_seq_len 8192 --max_batch_size 4

WARNING:torch.distributed.run:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


initializing model parallel with size 2
initializing ddp with size 1
initializing pipeline with size 1
Traceback (most recent call last):
File "/home/azureuser/codellama/example_instructions.py", line 68, in
fire.Fire(main)
File "/home/azureuser/.local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/azureuser/.local/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/azureuser/.local/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/azureuser/codellama/example_instructions.py", line 20, in main
generator = Llama.build(
File "/home/azureuser/codellama/llama/generation.py", line 90, in build
checkpoint = torch.load(ckpt_path, map_location="cpu")
File "/home/azureuser/.local/lib/python3.10/site-packages/torch/serialization.py", line 815, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/azureuser/.local/lib/python3.10/site-packages/torch/serialization.py", line 1033, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
Traceback (most recent call last):
File "/home/azureuser/codellama/example_instructions.py", line 68, in
fire.Fire(main)
File "/home/azureuser/.local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/azureuser/.local/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/azureuser/.local/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/azureuser/codellama/example_instructions.py", line 20, in main
generator = Llama.build(
File "/home/azureuser/codellama/llama/generation.py", line 75, in build
torch.cuda.set_device(local_rank)
File "/home/azureuser/.local/lib/python3.10/site-packages/torch/cuda/init.py", line 350, in set_device
torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 14881) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/home/azureuser/.local/bin/torchrun", line 8, in
sys.exit(main())
File "/home/azureuser/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/azureuser/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/home/azureuser/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/azureuser/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/azureuser/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

example_instructions.py FAILED

Failures:
[1]:
time : 2023-08-29_13:34:23
host : llm.internal.cloudapp.net
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 14882)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2023-08-29_13:34:23
host : llm.internal.cloudapp.net
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 14881)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

NOTE: Redirects are currently not supported in Windows or MacOs.

Hi guys,

I am trying the CodeLlama-13b-Python model locally on macOS 13.4.1 (M2), and I have made sure that all the packages listed in requirements.txt are installed.

I want to know whether this is a setup issue or whether this model has to be run on Linux.

Can't run examples on Windows 10

Hi,
I've tried to run the examples, but I received this error.

(CodeLlama) PS C:\Users\marce\OneDrive\mah-docs\CodeLlama\codellama> python -m torch.distributed.run --nproc_per_node 1 example_infilling.py --ckpt_dir CodeLlama-7b-Python --tokenizer_path ./CodeLlama-7b-Python/tokenizer.model
NOTE: Redirects are currently not supported in Windows or MacOs.
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Traceback (most recent call last):
  File "C:\Users\marce\OneDrive\mah-docs\CodeLlama\codellama\example_infilling.py", line 79, in <module>
    fire.Fire(main)
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\fire\core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\marce\OneDrive\mah-docs\CodeLlama\codellama\example_infilling.py", line 18, in main
    generator = Llama.build(
                ^^^^^^^^^^^^
  File "C:\Users\marce\OneDrive\mah-docs\CodeLlama\codellama\llama\generation.py", line 90, in build
    checkpoint = torch.load(ckpt_path, map_location="cpu")
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\serialization.py", line 815, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\serialization.py", line 1033, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_pickle.UnpicklingError: invalid load key, '<'.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 18284) of binary: C:\ProgramData\anaconda3\envs\CodeLlama\python.exe
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\distributed\run.py", line 798, in <module>
    main()
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\distributed\elastic\multiprocessing\errors\__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\distributed\run.py", line 794, in main
    run(args)
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\distributed\run.py", line 785, in run
    elastic_launch(
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\distributed\launcher\api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\distributed\launcher\api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_infilling.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-08-28_12:39:51
  host      : DESKTOP-THP4I5R
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 18284)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs

Unable to locally verify the issuer's authority

Enter the list of models to download without spaces (7b,13b,34b,7b-Python,13b-Python,34b-Python,7b-Instruct,13b-Instruct,34b-Instruct), or press Enter for all: 7b
Downloading LICENSE and Acceptable Usage Policy
--2023-08-24 17:02:44--  {codellama url}
Download-Request-ID=254298234165953
Resolving download2.llamameta.net (download2.llamameta.net)... 54.230.31.124, 54.230.31.3, 54.230.31.14, ...
Connecting to download2.llamameta.net (download2.llamameta.net)|54.230.31.124|:443... connected.
ERROR: cannot verify download2.llamameta.net's certificate, issued by ‘[email protected],CN=FGT3KDT418800895,OU=Certificate Authority,O=Fortinet,L=Sunnyvale,ST=California,C=US’:
  Unable to locally verify the issuer's authority.
To connect to download2.llamameta.net insecurely, use `--no-check-certificate'.

I've requested the link 3 times now and the download.sh script still throws this error when trying to download the model.
Thought it was a configuration issue on my side, but I tried again with normal llama-2 and it worked.

Confusingly, the download.sh script for llama-2 only works for some models. 70B-chat worked, 7B and 13B did not.

I also tried editing the download.sh script to include the --no-check-certificate option on all wget commands, but that then gives me the following error:

  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 403 Forbidden
2023-08-24 16:45:15 ERROR 403: Forbidden.
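The issuer string in the wget output (O=Fortinet, CN=FGT...) indicates that a FortiGate firewall is re-signing the TLS connection, which is why the certificate chain cannot be verified; the later 403 most likely means the pre-signed link had already expired by the time the retry ran. One possible workaround, shown here only as a sketch, is to download a file with Python's requests library while trusting the firewall's root certificate exported from the machine; the URL and CA path below are placeholders, not values from this report.

import requests

# Placeholders: the signed URL comes from the approval email; the CA file is the
# TLS-inspection root certificate exported from this machine's trust store.
signed_url = "https://download2.llamameta.net/...signed-url-from-email..."
firewall_ca = "fortigate-root-ca.pem"

with requests.get(signed_url, stream=True, verify=firewall_ca, timeout=60) as resp:
    resp.raise_for_status()  # a 403 here usually means the signed link has expired
    with open("consolidated.00.pth", "wb") as out:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            out.write(chunk)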

issue with downloading certain files from the Llama 2

Hello all,
I am currently facing an issue with downloading certain files from the Llama 2 repository. I ran the provided download.sh script to fetch the necessary files but encountered errors that prevented the successful download of the following files:

params.json
tokenizer.model
checklist.chk
I have followed the documentation and also made sure that the URL provided in the email was correctly entered. The script runs without issues for other files but fails specifically for these. I have sufficient disk space and my internet connection is stable.

Could you please assist me in resolving this issue? Is there an alternate way to download these specific files or should I perform some additional troubleshooting steps?

Thank you for your time and assistance.

Greedy decoding of CodeLlama

Hi, thanks for the great work! From the interface it seems there is no option like do_sample=False to enable deterministic greedy decoding. I am curious whether this will be supported, or how we could add it ourselves. Thanks!
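In this repository's generation code, sampling is only applied when the temperature is greater than zero; with temperature=0.0 the next token is taken as an argmax over the logits, which is the usual stand-in for do_sample=False. A minimal sketch (the prompt is made up, and the script is assumed to be launched with torchrun like the repo's examples):

from llama import Llama

generator = Llama.build(
    ckpt_dir="CodeLlama-7b/",
    tokenizer_path="CodeLlama-7b/tokenizer.model",
    max_seq_len=256,
    max_batch_size=1,
)
# temperature=0.0 switches generation from top-p sampling to greedy argmax decoding.
results = generator.text_completion(
    ["def fibonacci(n):"],
    max_gen_len=128,
    temperature=0.0,
)
print(results[0]["generation"])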

When running `bash download.sh` on my Windows laptop, this happens

download.sh: line 5: $'\r': command not found
': not a valid identifier: `PRESIGNED_URL

': not a valid identifier: `MODEL_SIZE
download.sh: line 12: $'\r': command not found
download.sh: line 22: syntax error near unexpected token `$'do\r''
'ownload.sh: line 22: `do'
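The $'\r' messages mean download.sh was saved with Windows (CRLF) line endings, which bash in Git Bash cannot parse. Running the file through dos2unix fixes it; as a hedged alternative, the same conversion can be done with a couple of lines of Python before re-running bash download.sh:

from pathlib import Path

# Strip the carriage returns that make bash print "$'\r': command not found".
script = Path("download.sh")
script.write_bytes(script.read_bytes().replace(b"\r\n", b"\n"))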

Model pads response with newlines up to max_length

I tried several of the models through Huggingface, and the response is always padded with newlines up to the number of tokens specified by the max_length argument in model.generate().

I also assign pad_token_id=tokenizer.eos_token_id, so I'm not sure why the model is generating these newline characters.
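With the Hugging Face checkpoints, max_length counts the prompt plus the generation, and decoding only stops early when an EOS token is produced, so a model that keeps emitting newlines simply fills the remaining budget. Bounding the generation with max_new_tokens and stripping the decoded text usually hides the symptom; a hedged sketch (the checkpoint id and prompt are examples, not taken from this report):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"  # example checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("def remove_non_ascii(s: str) -> str:", return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=128,                    # bound new tokens instead of total length
    do_sample=False,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,   # silences the missing-pad-token warning
)
print(tokenizer.decode(out[0], skip_special_tokens=True).rstrip())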

How to trigger the model?

I got the following - how do I trigger the model GUI?

(codegenllama) C:>pip install -e codellama
Obtaining file:///C:/codellama
Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in c:\users\home\anaconda3\envs\codegenllama\lib\site-packages (from codellama==0.0.1) (2.0.1+cu117)
Collecting fairscale (from codellama==0.0.1)
Downloading fairscale-0.4.13.tar.gz (266 kB)
---------------------------------------- 266.3/266.3 kB 5.4 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Collecting fire (from codellama==0.0.1)
Downloading fire-0.5.0.tar.gz (88 kB)
---------------------------------------- 88.3/88.3 kB 5.2 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting sentencepiece (from codellama==0.0.1)
Downloading sentencepiece-0.1.99-cp310-cp310-win_amd64.whl (977 kB)
---------------------------------------- 977.5/977.5 kB 10.3 MB/s eta 0:00:00
Requirement already satisfied: numpy>=1.22.0 in c:\users\home\anaconda3\envs\codegenllama\lib\site-packages (from fairscale->codellama==0.0.1) (1.24.1)
Requirement already satisfied: filelock in c:\users\home\anaconda3\envs\codegenllama\lib\site-packages (from torch->codellama==0.0.1) (3.9.0)
Requirement already satisfied: typing-extensions in c:\users\home\anaconda3\envs\codegenllama\lib\site-packages (from torch->codellama==0.0.1) (4.4.0)
Requirement already satisfied: sympy in c:\users\home\anaconda3\envs\codegenllama\lib\site-packages (from torch->codellama==0.0.1) (1.11.1)
Requirement already satisfied: networkx in c:\users\home\anaconda3\envs\codegenllama\lib\site-packages (from torch->codellama==0.0.1) (3.0)
Requirement already satisfied: jinja2 in c:\users\home\anaconda3\envs\codegenllama\lib\site-packages (from torch->codellama==0.0.1) (3.1.2)
Collecting six (from fire->codellama==0.0.1)
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting termcolor (from fire->codellama==0.0.1)
Downloading termcolor-2.3.0-py3-none-any.whl (6.9 kB)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\home\anaconda3\envs\codegenllama\lib\site-packages (from jinja2->torch->codellama==0.0.1) (2.1.2)
Requirement already satisfied: mpmath>=0.19 in c:\users\home\anaconda3\envs\codegenllama\lib\site-packages (from sympy->torch->codellama==0.0.1) (1.2.1)
Building wheels for collected packages: fairscale, fire
Building wheel for fairscale (pyproject.toml) ... done
Created wheel for fairscale: filename=fairscale-0.4.13-py3-none-any.whl size=332117 sha256=0945c01902555985db3ce29135a6b2eef4448b83305b47dbb614485d15188a24
Stored in directory: c:\users\home\appdata\local\pip\cache\wheels\78\a4\c0\fb0a7ef03cff161611c3fa40c6cf898f76e58ec421b88e8cb3
Building wheel for fire (setup.py) ... done
Created wheel for fire: filename=fire-0.5.0-py2.py3-none-any.whl size=116947 sha256=d2a6f24d8bb7ed4b42b780a7fcdbae383514bf5ef87b37c07ab76410ca582514
Stored in directory: c:\users\home\appdata\local\pip\cache\wheels\90\d4\f7\9404e5db0116bd4d43e5666eaa3e70ab53723e1e3ea40c9a95
Successfully built fairscale fire
Installing collected packages: sentencepiece, termcolor, six, fire, fairscale, codellama
Running setup.py develop for codellama
Successfully installed codellama-0.0.1 fairscale-0.4.13 fire-0.5.0 sentencepiece-0.1.99 six-1.16.0 termcolor-2.3.0

Remembering the previous context

Hi everyone! Is it possible to have a long dialogue with the Instruct model? In other words, to make the model remember the previous context.

For now, my idea is to paste the previous prompt and the model's response into the new request. Is there a more concise or easier way?
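The Instruct model itself is stateless, so "remembering" a conversation works exactly as described here: keep a list of role/content messages, append each reply, and send the whole dialog to chat_completion again, trimming old turns as the total approaches max_seq_len. A rough sketch of that loop (the prompts are made up; run under torchrun as with the repo's example scripts):

from llama import Llama

generator = Llama.build(
    ckpt_dir="CodeLlama-7b-Instruct/",
    tokenizer_path="CodeLlama-7b-Instruct/tokenizer.model",
    max_seq_len=2048,
    max_batch_size=1,
)

dialog = []  # grows turn by turn and carries the context
for user_msg in ["Write a function that reverses a string.",
                 "Now add type hints to it."]:
    dialog.append({"role": "user", "content": user_msg})
    result = generator.chat_completion([dialog], max_gen_len=256, temperature=0.2)[0]
    reply = result["generation"]["content"]
    dialog.append({"role": "assistant", "content": reply})  # feed the answer back next turn
    print(reply)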

SSL error when downloading

Hi,

The download script throws the following error:

Resolving download2.llamameta.net (download2.llamameta.net)... ::ffff:130.226.237.92, 130.226.237.92
Connecting to download2.llamameta.net (download2.llamameta.net)|::ffff:130.226.237.92|:443... connected.
OpenSSL: error:0A000152:SSL routines::unsafe legacy renegotiation disabled
Unable to establish SSL connection.
Checking checksums
md5sum: checklist.chk: no properly formatted MD5 checksum lines found

If I set Options = UnsafeLegacyRenegotiation in the SSL conf, it then throws the error mentioned in #8. Thanks for the help!

Invalid load key error

Cmd line:
torchrun --nproc_per_node 1 example_infilling.py --ckpt_dir CodeLlama-7b-Instruct/ --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model --max_seq_len 512 --max_batch_size 4

Error Raised

> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Traceback (most recent call last):
  File "/home/fran/codellama/example_infilling.py", line 79, in <module>
    fire.Fire(main)
  File "/home/fran/miniconda3/envs/cs-gpt/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/fran/miniconda3/envs/cs-gpt/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/fran/miniconda3/envs/cs-gpt/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/fran/codellama/example_infilling.py", line 18, in main
    generator = Llama.build(
  File "/home/fran/codellama/llama/generation.py", line 90, in build
    checkpoint = torch.load(ckpt_path, map_location="cpu")
  File "/home/fran/miniconda3/envs/cs-gpt/lib/python3.10/site-packages/torch/serialization.py", line 815, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/fran/miniconda3/envs/cs-gpt/lib/python3.10/site-packages/torch/serialization.py", line 1033, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 463834) of binary: /home/fran/miniconda3/envs/cs-gpt/bin/python
Traceback (most recent call last):
  File "/home/fran/miniconda3/envs/cs-gpt/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/fran/miniconda3/envs/cs-gpt/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/fran/miniconda3/envs/cs-gpt/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/fran/miniconda3/envs/cs-gpt/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/fran/miniconda3/envs/cs-gpt/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/fran/miniconda3/envs/cs-gpt/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_infilling.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-08-29_20:39:20
  host      : fran.rtzr.ai
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 463834)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
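An UnpicklingError with load key '<' almost always means the .pth file is not a checkpoint at all but the beginning of an HTML page, typically an error page saved by an interrupted or expired download, so the fix is to re-download the weights and verify them against checklist.chk. A small check to run before re-launching torchrun (the path is only an example):

from pathlib import Path

ckpt = Path("CodeLlama-7b-Instruct/consolidated.00.pth")  # example path
with ckpt.open("rb") as f:
    head = f.read(16)
if head.startswith(b"<"):
    # '<' is the first byte of "<html>" or "<?xml"; real checkpoints start with a
    # pickle/zip header, so this file needs to be downloaded again.
    print("This is an HTML error page, not model weights - re-download it,")
    print("then run `md5sum -c checklist.chk` inside the checkpoint directory.")
else:
    print("Header bytes look binary:", head)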

Where is the 70B Model?

Just a question: Are you planning to release the 70B model in the future or what is the plan?

Size of the model info is missing

It would be helpful to mention how much disk space each pre-trained model takes, so users can tell whether this can easily be done on a laptop or not.

How to install wget and md5sum (Windows user)

I am running all the commands in a Git Bash terminal.
A step-by-step method for installing the model would be very helpful.

download.sh: line 19: wget: command not found
Downloading CodeLlama-7b
download.sh: line 53: wget: command not found
download.sh: line 56: wget: command not found
download.sh: line 57: wget: command not found
download.sh: line 58: wget: command not found
Checking checksums
md5sum: checklist.chk: No such file or directory
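The log shows that md5sum is actually present (it runs, but finds nothing to check because no files were downloaded); only wget is missing from Git Bash. Installing a wget build for Windows (for example via the Chocolatey package manager, or by placing a wget.exe on the PATH) is the straightforward fix. As a hedged stand-in, individual files can also be fetched with Python, which the conda environment already provides; the URL below is a placeholder for one of the per-file links download.sh derives from the emailed URL:

import urllib.request

# Placeholder: substitute the real signed per-file link; each file
# (params.json, tokenizer.model, consolidated.*.pth, checklist.chk) has its own.
signed_url = "https://download.llamameta.net/...signed-url-from-email..."
urllib.request.urlretrieve(signed_url, "CodeLlama-7b/consolidated.00.pth")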

What are the machine requirements for each model?

I want to know the minimum memory/CPU/GPU requirements to run each model relatively fast. I ran the following on my M1:

torchrun --nproc_per_node 1 example_completion.py \
    --ckpt_dir CodeLlama-7b/ \
    --tokenizer_path CodeLlama-7b/tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
NOTE: Redirects are currently not supported in Windows or MacOs.
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 182.59 seconds

and it's taking more than 5 minutes.

Finetuning 7B codellama: Runtime error

I am trying to fine-tune Code Llama with the same approach as Llama 2, using the same fine-tuning script.
I am not sure whether this is right, since neither the repo nor the blog talks about a fine-tuning approach.

I am facing this error: RuntimeError: shape '[-1, 32000]' is invalid for input of size 131073504

RuntimeError Traceback (most recent call last)
Cell In[10], line 29
20 trainer = Trainer(
21 model=model,
22 args=training_args,
(...)
25 callbacks=[profiler_callback] if enable_profiler else [],
26 )
28 # Start training
---> 29 trainer.train()

File /opt/conda/envs/llama_cona_env/lib/python3.8/site-packages/transformers/trainer.py:1662, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1657 self.model_wrapped = self.model
1659 inner_training_loop = find_executable_batch_size(
1660 self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
1661 )
-> 1662 return inner_training_loop(
1663 args=args,
1664 resume_from_checkpoint=resume_from_checkpoint,
1665 trial=trial,
1666 ignore_keys_for_eval=ignore_keys_for_eval,
1667 )

File /opt/conda/envs/llama_cona_env/lib/python3.8/site-packages/transformers/trainer.py:1929, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
1927 tr_loss_step = self.training_step(model, inputs)
1928 else:
-> 1929 tr_loss_step = self.training_step(model, inputs)
1931 if (
1932 args.logging_nan_inf_filter
1933 and not is_torch_tpu_available()
1934 and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
1935 ):
1936 # if loss is nan or inf simply add the average of previous logged losses
1937 tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)

File /opt/conda/envs/llama_cona_env/lib/python3.8/site-packages/transformers/trainer.py:2699, in Trainer.training_step(self, model, inputs)
2696 return loss_mb.reduce_mean().detach().to(self.args.device)
2698 with self.compute_loss_context_manager():
-> 2699 loss = self.compute_loss(model, inputs)
2701 if self.args.n_gpu > 1:
2702 loss = loss.mean() # mean() to average on multi-gpu parallel training

File /opt/conda/envs/llama_cona_env/lib/python3.8/site-packages/transformers/trainer.py:2731, in Trainer.compute_loss(self, model, inputs, return_outputs)
2729 else:
2730 labels = None
-> 2731 outputs = model(**inputs)
2732 # Save past state if it exists
2733 # TODO: this needs to be fixed and made cleaner later.
2734 if self.args.past_index >= 0:

File /opt/conda/envs/llama_cona_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File /opt/conda/envs/llama_cona_env/lib/python3.8/site-packages/peft/peft_model.py:947, in PeftModelForCausalLM.forward(self, input_ids, attention_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict, **kwargs)
936 raise AssertionError("forward in MPTForCausalLM does not support inputs_embeds")
937 return self.base_model(
938 input_ids=input_ids,
939 attention_mask=attention_mask,
(...)
944 **kwargs,
945 )
--> 947 return self.base_model(
948 input_ids=input_ids,
949 attention_mask=attention_mask,
950 inputs_embeds=inputs_embeds,
951 labels=labels,
952 output_attentions=output_attentions,
953 output_hidden_states=output_hidden_states,
954 return_dict=return_dict,
955 **kwargs,
956 )
958 batch_size = input_ids.shape[0]
959 if attention_mask is not None:
960 # concat prompt attention mask

File /opt/conda/envs/llama_cona_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File /opt/conda/envs/llama_cona_env/lib/python3.8/site-packages/accelerate/hooks.py:165, in add_hook_to_module..new_forward(*args, **kwargs)
163 output = old_forward(*args, **kwargs)
164 else:
--> 165 output = old_forward(*args, **kwargs)
166 return module._hf_hook.post_forward(module, output)

File /opt/conda/envs/llama_cona_env/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py:709, in LlamaForCausalLM.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
707 # Flatten the tokens
708 loss_fct = CrossEntropyLoss()
--> 709 shift_logits = shift_logits.view(-1, self.config.vocab_size)
710 shift_labels = shift_labels.view(-1)
711 # Enable model parallelism

RuntimeError: shape '[-1, 32000]' is invalid for input of size 131073504
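A plausible reading of the numbers, inferred from the shapes rather than stated anywhere in this thread: 131,073,504 = 4,094 × 32,016, and 32,016 is the Code Llama tokenizer's vocabulary size, while the config used for the reshape still says vocab_size = 32,000 (the Llama 2 value). The usual remedy is to make the model's vocabulary match the tokenizer before training; a hedged transformers sketch (the checkpoint id is an example):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"  # example checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

print(model.config.vocab_size, len(tokenizer))  # these two numbers must agree
if model.config.vocab_size != len(tokenizer):
    # Resize the embedding and lm_head rows so the loss reshape uses the right vocab size.
    model.resize_token_embeddings(len(tokenizer))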

Scheme missing

I followed the instructions as per the blog and got my download link by email.

After running download.sh, pasting the download link and choosing the models, I get a "Scheme missing" error and the script gets stuck on "Checking checksums".

Run code llama on mac?

Hi,

On Mac, I got the following error:
RuntimeError: Distributed package doesn't have NCCL built in
raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 80731) of binary: /opt/dev/miniconda3/envs/llama/bin/python3.10

I guess this is because CUDA is missing. Is there an option to run it on the CPU?
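The error comes from generation.py calling torch.distributed.init_process_group("nccl"), and the macOS builds of PyTorch ship without NCCL. Running on CPU is possible in principle but needs more than one change; as a hedged starting point, the process group can be initialised with the gloo backend when CUDA is unavailable. The snippet below is a self-contained single-process init of that kind; the hard-coded "nccl" in llama/generation.py would need the same conditional, and the .cuda() and half-precision calls elsewhere in the repo would also need CPU equivalents, so this alone is not sufficient:

import os
import torch
import torch.distributed as dist

# Minimal single-process setup (torchrun normally provides these variables).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

# gloo works on CPU-only builds; nccl requires CUDA.
backend = "nccl" if torch.cuda.is_available() else "gloo"
dist.init_process_group(backend, rank=0, world_size=1)
print("initialized distributed backend:", backend)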

Can't run any inference

I'm trying to run the example inference on Windows 10 with Python 3.10, like this:

(py310) d:\git\codellama>torchrun --nproc_per_node 1 example_instructions.py --ckpt_dir CodeLlama-7b-Instruct/ --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model --max_seq_len 512 --max_batch_size 4

But it seems to be trying to connect to something in Docker that I'm not using. That may be related to NCCL, even though I only have one GPU... I don't understand.
This is the output I get:

NOTE: Redirects are currently not supported in Windows or MacOs.
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - The requested address is not valid in its context.). [this warning is printed four times]
Traceback (most recent call last):
  File "d:\git\codellama\example_instructions.py", line 68, in <module>
    fire.Fire(main)
  File "D:\anaconda3\envs\py310\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "D:\anaconda3\envs\py310\lib\site-packages\fire\core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "D:\anaconda3\envs\py310\lib\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "d:\git\codellama\example_instructions.py", line 20, in main
    generator = Llama.build(
  File "d:\git\codellama\llama\generation.py", line 68, in build
    torch.distributed.init_process_group("nccl")
  File "D:\anaconda3\envs\py310\lib\site-packages\torch\distributed\distributed_c10d.py", line 907, in init_process_group
    default_pg = _new_process_group_helper(
  File "D:\anaconda3\envs\py310\lib\site-packages\torch\distributed\distributed_c10d.py", line 1013, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 13132) of binary: D:\anaconda3\envs\py310\python.exe
Traceback (most recent call last):
  File "D:\anaconda3\envs\py310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\anaconda3\envs\py310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\anaconda3\envs\py310\Scripts\torchrun.exe\__main__.py", line 7, in <module>
  File "D:\anaconda3\envs\py310\lib\site-packages\torch\distributed\elastic\multiprocessing\errors\__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "D:\anaconda3\envs\py310\lib\site-packages\torch\distributed\run.py", line 794, in main
    run(args)
  File "D:\anaconda3\envs\py310\lib\site-packages\torch\distributed\run.py", line 785, in run
    elastic_launch(
  File "D:\anaconda3\envs\py310\lib\site-packages\torch\distributed\launcher\api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "D:\anaconda3\envs\py310\lib\site-packages\torch\distributed\launcher\api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_instructions.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-08-28_09:51:06
  host      : DESKTOP-123456
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 13132)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
