minGrok

This repo is meant as a guide to how xAI's newly open-sourced model Grok-1 works. To see their original implementation, click here. To find the Google Colab notebook that walks through the architecture in excruciating detail as a demonstration for beginners, click here, and check out my YouTube video where I walk through it below. If you're not a beginner (i.e. you're already knowledgeable about decoder-only transformers), then I recommend skimming through model.py and config.py to see all the ways Grok-1 differs from other open-source models like Llama, Mistral, and Gemma.

[Video walkthrough]

Repo Contents

  • The Accompanying Colab Notebook - the teaching material I walk through in my YouTube video
  • minGrok_train-test.ipynb - the notebook where I actually trained the 1m-parameter model. The code here is essentially the same as what's in section 3 of the Colab notebook
  • model.py - contains the nn.Modules used to define minGrok. The code here is essentially the same as what's in section 2 of the Colab notebook
  • config.py - contains minGrok's configuration hyperparameters, as well as comments indicating what full-sized Grok uses
  • tokenizer.py - a very simple tokenizer with a 128-token vocabulary built off of TinyShakespeare's original 65-character vocabulary. By no means should anyone actually use this in production, but it's fine as a simple stand-in given that the purpose of this repo is not to teach tokenization (a quick sketch of the idea follows this list)
  • input.txt - just TinyShakespeare. If I weren't so lazy I would've set all this code to download it directly rather than storing a copy in this repo
  • models/ - a folder of 1m-parameter model(s) that I trained on my MacBook Air. Again, don't expect anything impressive; they're just here for teaching purposes so that you can load them rather than training your own. If you train something bigger, feel free to upload it I guess, but stick with my lazy practice of designating hyperparameters in the title
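
For reference, here is a minimal sketch of the idea behind tokenizer.py. It shows only the character-level core (the real file extends the 65-character base to 128 tokens), and the class and method names are my own illustration, not necessarily tokenizer.py's API:

class CharTokenizer:
    def __init__(self, text):
        self.chars = sorted(set(text))  # TinyShakespeare has 65 unique characters
        self.stoi = {ch: i for i, ch in enumerate(self.chars)}  # char -> id
        self.itos = {i: ch for i, ch in enumerate(self.chars)}  # id -> char

    def encode(self, s):
        return [self.stoi[ch] for ch in s]

    def decode(self, ids):
        return ''.join(self.itos[i] for i in ids)

# round-trip check on the repo's training text
with open('input.txt') as f:
    tok = CharTokenizer(f.read())
assert tok.decode(tok.encode('To be, or not to be')) == 'To be, or not to be'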

ToDo

  • A commenter pointed out that I'd left out MoE-specific training dynamics. In short, to encourage balanced expert utilization rather than over-reliance on a single expert, you need to both add randomness to the Router's logits and add a diversity loss that ensures every expert gets used in every batch (see the sketch after this list). The video will not be changing, but the code has been updated accordingly
  • Grok's FFN inner-dimension multiplier is effectively 5.33. They set it up in an odd way (see the issue below), which is why I missed it, but those comments have also been fixed
  • YouTube commenter @rpbmpn caught my silly brainfart in the attention normalization. Originally I wasn't sure where the 0.08838834764831845 scale factor came from, but they pointed out that it's just the reciprocal of the square root of the head dimension (1/sqrt(128)). I've only added comments and not actually updated the code, because I'm too lazy to train a new model over this one tiny change. If anything bigger comes up that's worth retraining minGrok for, I'll include this.
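
Below is a minimal sketch of those two MoE fixes: noisy router logits plus a Switch-Transformer-style load-balancing ("diversity") loss. The function name, defaults, and shapes here are my own illustration and don't necessarily match the updated model.py:

import torch
import torch.nn.functional as F

def noisy_topk_route(router_logits, top_k=2, noise_std=1.0, training=True):
    # router_logits: (num_tokens, num_experts) raw scores from the Router.
    # Returns each token's chosen experts, their mixing weights, and an
    # auxiliary load-balancing loss to add to the main training loss.
    num_experts = router_logits.shape[-1]

    # fix 1: jitter the logits during training so tokens don't always
    # collapse onto the same expert
    if training and noise_std > 0:
        router_logits = router_logits + noise_std * torch.randn_like(router_logits)

    probs = F.softmax(router_logits, dim=-1)       # (tokens, experts)
    weights, indices = probs.topk(top_k, dim=-1)   # each token's top-k experts

    # fix 2: load-balancing ("diversity") loss, minimized when both the
    # dispatch counts and the router probabilities are uniform over experts
    dispatch = F.one_hot(indices, num_experts).float().sum(dim=1)  # (tokens, experts)
    load = dispatch.mean(dim=0)      # how often each expert gets picked
    importance = probs.mean(dim=0)   # mean router probability per expert
    aux_loss = num_experts * (load * importance).sum()

    return indices, weights, aux_loss

During training you would add aux_loss, scaled by a small coefficient (e.g. 0.01), to the language-modeling loss.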

Check out my socials


minGrok's Issues

Expert / Dense layer inner size

Hello,

First, thank you for the nice implementation! I noticed a small difference from the actual model, which caught my eye when I looked at the original implementation myself.

The FFN inner size for the experts is computed in an unusual way; the actual factor is about 5.33:
https://github.com/xai-org/grok-1/blob/7050ed204b8206bb8645c7b7bbef7252f79561b0/model.py#L85-L89

def ffn_size(emb_size, widening_factor):
    _ffn_size = int(widening_factor * emb_size) * 2 // 3
    _ffn_size = _ffn_size + (8 - _ffn_size) % 8  # ensure it's a multiple of 8
    logger.debug(f"emd_size: {emb_size} adjusted ffn_size: {_ffn_size}")
    return _ffn_size

Regards,

Jan
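
To make the ~5.33 concrete: the 2/3 factor is the usual SwiGLU-style correction, so with Grok-1's widening_factor of 8 the effective multiplier is 8 * 2/3 ≈ 5.33. Plugging in an embedding size of 6144 (48 heads * 128 head dim; both values are my reading of the original repo, so treat them as assumptions):

def ffn_size(emb_size, widening_factor):
    _ffn_size = int(widening_factor * emb_size) * 2 // 3
    return _ffn_size + (8 - _ffn_size) % 8  # round up to a multiple of 8

emb = 6144                                # assumed Grok-1 embedding size
inner = ffn_size(emb, widening_factor=8)  # assumed Grok-1 widening factor
print(inner, inner / emb)                 # 32768 5.333... -> the "effective 5.33x"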
