
llama2.jl

Cute Llama

Tired of low-level languages? Ever wanted to run inference on a baby Llama 2 model in pure Julia? Great news: you can now do so in under 300 lines of Julia.

This is a fork of Andrej's llama2.c, ported to Julia in a (for now) slightly hacky form. This README is heavily inspired by the Rust port llama.rs.

Don't want to read? Got your back!

git clone https://github.com/juvi21/llama2.jl && cd llama2.jl && wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin && julia jl_helpers/install_pkg.jl && julia run.jl stories15M.bin tokenizer.bin

How to run?

  1. Grab one of Andrej's baby Llama 2 models pretrained on the TinyStories dataset (see the original instructions):

    wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
    wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories42M.bin
    wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.bin
  2. Make sure you have the tokenizer binary, tokenizer.bin (if not, see tokenizer.py).

  3. Install the required packages (julia jl_helpers/install_pkg.jl), then run run.jl:

    Single-threaded:

    julia run.jl <model> <tokenizer> --temp [temperature]

    Multi-Threaded: In Progress
    CUDA: In Progress
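The --temp flag controls how the next token is sampled. Below is a minimal Julia sketch of temperature sampling, assuming llama2.c-style semantics: a temperature of 0 means greedy argmax, otherwise the logits are divided by the temperature, passed through a softmax, and a token is drawn from the resulting distribution. The helper names softmax and sample_next are hypothetical and are not taken from run.jl.

```julia
using Random

# Numerically stable softmax: subtract the max before exponentiating.
function softmax(x::Vector{Float32})
    m = maximum(x)
    e = exp.(x .- m)
    return e ./ sum(e)
end

# Hypothetical helper: pick the next token from raw logits.
# Returns a 1-based index into `logits`.
# temperature == 0 -> greedy argmax; otherwise scale, softmax, and sample.
function sample_next(logits::Vector{Float32}, temperature::Real;
                     rng::AbstractRNG = Random.default_rng())
    if temperature == 0
        return argmax(logits)
    end
    probs = softmax(logits ./ Float32(temperature))
    r = rand(rng)              # uniform draw in [0, 1)
    cdf = 0.0f0
    for (i, p) in enumerate(probs)
        cdf += p               # walk the cumulative distribution
        if r < cdf
            return i
        end
    end
    return length(probs)       # guard against rounding in the cumulative sum
end

# Usage
logits = Float32[1.0, 3.0, 2.0]
greedy = sample_next(logits, 0)                                # index of the largest logit
sampled = sample_next(logits, 0.9; rng = MersenneTwister(42))  # seeded random draw
```

Lower temperatures sharpen the distribution toward the greedy choice; higher temperatures flatten it and produce more varied stories.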

Performance

On my current workstation, llama2.jl performs quite well. However, I have been away visiting my parents for a few days, so I only had the opportunity to test it on one of my very first, less powerful machines. More testing is coming soon! NOTE: I compiled llama2.c with the basic command from Andrej's README, which is meant for getting started and is not heavily optimized:

gcc -O3 -o run run.c -lm
system                        model            llama2.c -O3 (tok/s)   llama2.c -Ofast (tok/s)   llama2.jl (tok/s)
Ubuntu 22.04, AMD Ryzen 2600  stories15M.bin   85.418752              189.591078                257.445516
Ubuntu 22.04, AMD Ryzen 2600  stories42M.bin   30.761836              78.485688                 92.567484
Ubuntu 22.04, AMD Ryzen 2600  stories110M.bin  11.585283              30.375223                 38.543434
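To put the throughput numbers in relative terms, the short Julia sketch below computes the llama2.jl speedup over both llama2.c builds, using only the tok/s figures from the benchmark table. The names RESULTS and speedup are illustrative and not part of the repository.

```julia
# tok/s figures from the benchmark table (Ubuntu 22.04, AMD Ryzen 2600).
const RESULTS = [
    (model = "stories15M.bin",  c_o3 = 85.418752, c_ofast = 189.591078, jl = 257.445516),
    (model = "stories42M.bin",  c_o3 = 30.761836, c_ofast = 78.485688,  jl = 92.567484),
    (model = "stories110M.bin", c_o3 = 11.585283, c_ofast = 30.375223,  jl = 38.543434),
]

# Speedup of llama2.jl over a baseline throughput (values > 1 mean llama2.jl is faster).
speedup(jl, baseline) = jl / baseline

for r in RESULTS
    println(rpad(r.model, 17),
            "vs -O3: ", round(speedup(r.jl, r.c_o3); digits = 2), "x   ",
            "vs -Ofast: ", round(speedup(r.jl, r.c_ofast); digits = 2), "x")
end
```

These ratios only describe this one older machine; they may look different on faster hardware or with other compiler flags.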

Contributions

Join the dark side and code in Julia. Contributions are highly encouraged!

Contribution Ideas:

  • Make it faster.
  • Add CUDA support.
  • Introduce multi-threaded support.
  • Add custom prompt support.

Art

@Midjourney
