
llms-from-scratch's Introduction

Build a Large Language Model (From Scratch)

This repository contains the code for developing, pretraining, and finetuning a GPT-like LLM and is the official code repository for the book Build a Large Language Model (From Scratch).

(If you downloaded the code bundle from the Manning website, please consider visiting the official code repository on GitHub at https://github.com/rasbt/LLMs-from-scratch.)



In Build a Large Language Model (From Scratch), you'll discover how LLMs work from the inside out. In this book, I'll guide you step by step through creating your own LLM, explaining each stage with clear text, diagrams, and examples.

The method described in this book for training and developing your own small-but-functional model for educational purposes mirrors the approach used in creating large-scale foundational models such as those behind ChatGPT.



Table of Contents

Please note that this README.md file is a Markdown (.md) file. If you have downloaded this code bundle from the Manning website and are viewing it on your local computer, I recommend using a Markdown editor or previewer for proper viewing. If you haven't installed a Markdown editor yet, MarkText is a good free option.

Alternatively, you can view this and other files on GitHub at https://github.com/rasbt/LLMs-from-scratch.



Tip

If you're seeking guidance on installing Python and Python packages and setting up your code environment, I suggest reading the README.md file located in the setup directory.




| Chapter Title | Main Code (for quick access) | All Code + Supplementary |
|---------------|------------------------------|--------------------------|
| Ch 1: Understanding Large Language Models | No code | - |
| Ch 2: Working with Text Data | - ch02.ipynb<br>- dataloader.ipynb (summary)<br>- exercise-solutions.ipynb | ./ch02 |
| Ch 3: Coding Attention Mechanisms | - ch03.ipynb<br>- multihead-attention.ipynb (summary)<br>- exercise-solutions.ipynb | ./ch03 |
| Ch 4: Implementing a GPT Model from Scratch | - ch04.ipynb<br>- gpt.py (summary)<br>- exercise-solutions.ipynb | ./ch04 |
| Ch 5: Pretraining on Unlabeled Data | - ch05.ipynb<br>- gpt_train.py (summary)<br>- gpt_generate.py (summary)<br>- exercise-solutions.ipynb | ./ch05 |
| Ch 6: Finetuning for Text Classification | Q2 2024 | ... |
| Ch 7: Finetuning with Human Feedback | Q2 2024 | ... |
| Ch 8: Using Large Language Models in Practice | Q2/3 2024 | ... |
| Appendix A: Introduction to PyTorch | - code-part1.ipynb<br>- code-part2.ipynb<br>- DDP-script.py<br>- exercise-solutions.ipynb | ./appendix-A |
| Appendix B: References and Further Reading | No code | - |
| Appendix C: Exercise Solutions | No code | - |
| Appendix D: Adding Bells and Whistles to the Training Loop | - appendix-D.ipynb | ./appendix-D |


Shown below is a mental model summarizing the contents covered in this book.



 

Bonus Material

Several folders contain optional materials as a bonus for interested readers:



 

Citation

If you find this book or code useful for your research, please consider citing it:

@book{build-llms-from-scratch-book,
  author       = {Sebastian Raschka},
  title        = {Build A Large Language Model (From Scratch)},
  publisher    = {Manning},
  year         = {2023},
  isbn         = {978-1633437166},
  url          = {https://www.manning.com/books/build-a-large-language-model-from-scratch},
  note         = {Work in progress},
  github       = {https://github.com/rasbt/LLMs-from-scratch}
}

llms-from-scratch's People

Contributors

d-kleine, debnsuma, eltociear, hammer, intelligence-manifesto, jameslholcombe, joel-foo, pitmonticone, rasbt, rayedbw, shenxiangzhuang, shuyib, taihaozesong, xiaotian0328


llms-from-scratch's Issues

Incorrect code output in the book (2.2 Tokenizing text)

Hi @rasbt,

I found that in the latest book version (v5) there is an incorrect code output in the section "2.2 Tokenizing text":

result = re.split(r'([,.]|\s)', text)
print(result)

We can see that the words and punctuation characters are now separate list entries just
as we wanted:

['Hello', ',', '', ' ', 'world.', ' ', 'This', ',', '', ' ', 'is', ' ', 'a', ' ', 'test.']

and

The resulting whitespace-free output looks like as follows:

['Hello', ',', 'world.', 'This', ',', 'is', 'a', 'test.']

But if we execute the provided notebook, the output is correct.
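
For reference, here is a minimal sketch of what the split produces when run as-is (the sample sentence is assumed from the chapter):

```python
import re

text = "Hello, world. This, is a test."

# The capturing group keeps commas, periods, and whitespace as separate list entries.
result = re.split(r'([,.]|\s)', text)
print(result)
# ['Hello', ',', '', ' ', 'world', '.', '', ' ', 'This', ',', '', ' ', 'is', ' ', 'a', ' ', 'test', '.', '']

# Dropping whitespace-only and empty strings gives the whitespace-free version.
result = [item for item in result if item.strip()]
print(result)
# ['Hello', ',', 'world', '.', 'This', ',', 'is', 'a', 'test', '.']
```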

P.S. It is a great pleasure to explore your new book, especially one about LLMs. Thank you! :)

Thank you.

RuntimeError: size mismatch - ch05/03_bonus_pretraining_on_gutenberg

I have an issue running pretraining_simple.py. I have downloaded ca. 50% of the files from Project Gutenberg via the gutenberg repo and then ran your scripts:

The text data preparation works fine so far:

prepare_dataset.py

root@9db1a84319a3:/workspaces/LLMs-from-scratch/ch05/03_bonus_pretraining_on_gutenberg# python prepare_dataset.py
--data_dir gutenberg/data
--max_size_mb 500
--output_dir gutenberg_preprocessed
16697 file(s) to process.

But when trying to train the model, a shape mismatch occurs. It seems that the data is not being processed batch-wise:

pretraining_simple.py

root@9db1a84319a3:/workspaces/LLMs-from-scratch/ch05/03_bonus_pretraining_on_gutenberg# python pretraining_simple.py --data_dir "gutenberg_preprocessed" --n_epochs 1 --batch_size 4 --output_dir model_checkpoints
Total files: 16
Tokenizing file 1 of 16: gutenberg_preprocessed/combined_1.txt
Training ...
Traceback (most recent call last):
File "/workspaces/LLMs-from-scratch/ch05/03_bonus_pretraining_on_gutenberg/pretraining_simple.py", line 200, in <module>
train_losses, val_losses, tokens_seen = train_model_simple(
File "/workspaces/LLMs-from-scratch/ch05/03_bonus_pretraining_on_gutenberg/pretraining_simple.py", line 110, in train_model_simple
loss = calc_loss_batch(input_batch, target_batch, model, device)
File "/workspaces/LLMs-from-scratch/ch05/03_bonus_pretraining_on_gutenberg/previous_chapters.py", line 247, in calc_loss_batch
loss = torch.nn.functional.cross_entropy(logits.flatten(0, -1), target_batch.flatten())
File "/opt/conda/lib/python3.10/site-packages/torch/nn/functional.py", line 3029, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: size mismatch (got input: [205852672], target: [4096])

I believe the issue comes from the flatten call. In calc_loss_batch() in previous_chapters.py, what do you think about replacing flatten() with view()?

loss = torch.nn.functional.cross_entropy(logits.view(-1, logits.size(-1)), target_batch.view(-1))

Please double-check whether this idea and the output are correct.
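
For illustration, here is a minimal sketch with small dummy tensors (shapes assumed) showing why flatten(0, -1) triggers the size mismatch while keeping the class dimension intact works:

```python
import torch
import torch.nn.functional as F

batch_size, num_tokens, vocab_size = 2, 4, 10   # small dummy shapes for illustration
logits = torch.randn(batch_size, num_tokens, vocab_size)
targets = torch.randint(0, vocab_size, (batch_size, num_tokens))

# flatten(0, -1) collapses *all* dimensions, so the class dimension is lost:
print(logits.flatten(0, -1).shape)   # torch.Size([80]) vs. a target of size 8 -> size mismatch

# Keeping the last (class) dimension intact makes the shapes line up:
loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
print(loss)
```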


I have run the updated script locally on my RTX 3080 Ti; the output is:

root@9db1a84319a3:/workspaces/LLMs-from-scratch/ch05/03_bonus_pretraining_on_gutenberg# python pretraining_simple.py --data_dir "gutenberg_preprocessed" --n_epochs 1 --batch_size 4 --output_dir model_checkpoints
Total files: 16
Tokenizing file 1 of 16: gutenberg_preprocessed/combined_1.txt
Training ...
Ep 1 (Step 0): Train loss 9.952, Val loss 9.663
Every effort moves you
Ep 1 (Step 100): Train loss 6.567, Val loss 6.906
Ep 1 (Step 200): Train loss 6.468, Val loss 6.637
Ep 1 (Step 300): Train loss 6.170, Val loss 6.578
Ep 1 (Step 400): Train loss 5.560, Val loss 6.485
Ep 1 (Step 500): Train loss 5.874, Val loss 6.381
Ep 1 (Step 600): Train loss 5.481, Val loss 6.449
Ep 1 (Step 700): Train loss 5.620, Val loss 6.314
...

Feedback: Strip output from notebooks

This book is a wonderful read; I just wanted to submit one small comment on the notebooks, which may just be personal learning style. It's nice to have to run the actual notebook to get the output, so block by block it's easier to focus on the code without being distracted by output that is already rendered. So maybe there could be two notebooks per chapter, a clean one and a completed one? In the meantime I'm just using nbstripout locally, but I wanted to pass along the feedback.

Inconsistencies in MHA Wrapper Implementation Between Chapter 3 Main Content and Bonus Material

In the notebook ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb, the parameter d_out is not divided by num_heads. As a result, the shape differs from other implementations: [8, 1024, 9216] versus [8, 1024, 768]. Additionally, the implementation lacks the final projection.

It is correctly implemented in ch03/01_main-chapter-code/multihead-attention.ipynb, cells 6 and 7.

This inconsistency leads to a significant performance gap in the subsequent cells.
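
For illustration, a quick sketch of where the 9216 comes from, assuming d_out=768 and num_heads=12 (values implied by the reported shapes):

```python
d_out, num_heads = 768, 12         # assumed from the reported output shapes

# If each head outputs d_out dimensions, concatenating the heads yields num_heads * d_out:
print(num_heads * d_out)           # 9216

# Dividing d_out across the heads keeps the concatenated output at d_out:
head_dim = d_out // num_heads      # 64
print(num_heads * head_dim)        # 768
```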

Encoding/decoding transformation of the text (2.3 Converting tokens into token IDs)

Hi @rasbt,

I noticed that when we decode the following encoded sentence:

"It's the last he painted, you know," Mrs. Gisburn said with
pardonable pride.

We will have additional leading spaces at the start of the sentence and after the apostrophe in the word It' s:

 "It' s the last he painted, you know," Mrs. Gisburn said with
pardonable pride.

Formally, this does not matter in our case, because we do not take spaces into account, but in general we do not precisely restore the original text here, right?
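
For illustration, here is a minimal sketch (not the book's exact tokenizer class) of why a decode step that joins tokens with spaces does not perfectly invert the tokenization:

```python
import re

text = "\"It's the last he painted, you know,\" Mrs. Gisburn said with pardonable pride."

# Tokenize: split on punctuation and whitespace, keep the delimiters, drop empty strings.
tokens = [t for t in re.split(r'([,.:;?_!"()\']|--|\s)', text) if t.strip()]

# Decode: join with spaces, then remove spaces that precede punctuation.
decoded = " ".join(tokens)
decoded = re.sub(r'\s+([,.:;?!"()\'])', r'\1', decoded)
print(decoded)   # note the extra space after the apostrophe: ... It' s ...
```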

Could you please tell me whether you are interested in minor feedback like this, or whether it is not worth new notes or issues?

Thank you.

tiktoken is not running in Jupyter notebook

Hello @rasbt,
Nice to meet you! I've been enjoying your book so far (LLMs from scratch), but I find the examples hard to follow because some of the tools used do not mention which versions you used. I tried to follow along, but packages like tiktoken and PyTorch refuse to work or even install. I tried using conda to create environments with both Python 3.9 and 3.10; both successfully install tiktoken but fail to import it in the Jupyter notebook. The command I ran to attempt the installation was pip install tiktoken.

Can you let me know which versions of Python / tiktoken / PyTorch you were using? Is there any intermediate step I missed?

I am running Windows 11 and a (non-NVIDIA) GPU.
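
Not an answer from the book, but a quick sanity check (a sketch) to confirm that the Jupyter kernel uses the same environment into which pip installed the packages:

```python
import sys
print(sys.executable)   # should point into the conda environment you installed into

import importlib.metadata as md
for pkg in ("tiktoken", "torch"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "is not installed in this environment")
```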

Question about implementation of CausalAttention class (3.5.3 Implementing a compact causal self-attention class)

Hi @rasbt,

This notebook contains the following implementation of CausalAttention:

class CausalAttention(nn.Module):

    def __init__(self, d_in, d_out, block_size, dropout, qkv_bias=False):
        super().__init__()
        self.d_out = d_out
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key   = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.dropout = nn.Dropout(dropout) # New
        self.register_buffer('mask', torch.triu(torch.ones(block_size, block_size), diagonal=1)) # New

    def forward(self, x):
        b, num_tokens, d_in = x.shape # New batch dimension b
        keys = self.W_key(x)
        queries = self.W_query(x)
        values = self.W_value(x)

        attn_scores = queries @ keys.transpose(1, 2) # Changed transpose
        attn_scores.masked_fill_(  # New, _ ops are in-place
            self.mask.bool()[:num_tokens, :num_tokens], -torch.inf)
        attn_weights = torch.softmax(attn_scores / keys.shape[-1]**0.5, dim=-1)
        attn_weights = self.dropout(attn_weights) # New

        context_vec = attn_weights @ values
        return context_vec

I have a question: why do we need the following two lines in the forward() method implementation:

def forward(self, x):
        b, num_tokens, d_in = x.shape # New batch dimension b
        ...
        attn_scores.masked_fill_(  # New, _ ops are in-place
            self.mask.bool()[:num_tokens, :num_tokens], -torch.inf)
        ...

Can we remove the first line and just replace the second one with the following code:

attn_scores.masked_fill_(self.mask.bool(), -torch.inf)

As I understand it, num_tokens equals the batch size, and we already provide the batch size value as an argument, so neither computing x.shape nor the [:num_tokens, :num_tokens] indexing is required.
Is that correct?
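
For context, here is a minimal sketch (shapes assumed, not the book's exact setup) of what the [:num_tokens, :num_tokens] cropping does when a batch is shorter than block_size:

```python
import torch

block_size = 6
mask = torch.triu(torch.ones(block_size, block_size), diagonal=1)

num_tokens = 4                                          # the current batch is shorter than block_size
attn_scores = torch.randn(2, num_tokens, num_tokens)    # (batch, num_tokens, num_tokens)

# Cropping the mask makes its shape match the attention-score matrix of this input.
attn_scores.masked_fill_(mask.bool()[:num_tokens, :num_tokens], -torch.inf)
print(attn_scores.shape)                                # torch.Size([2, 4, 4])
```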

Thank you.

Wrong number of token ids specified in the notebook (2.7 Creating token embeddings)

Hi @rasbt,

There is the following description in this section:

Previously, we have seen how to convert a single token ID into a three-dimensional
embedding vector. Let's now apply that to all four input IDs we defined earlier (torch.tensor([5, 1, 3, 2])):

But there is probably a typo in the notebook: only three tokens are mentioned for the same code (after cell [47]):

To embed all three input_ids values above, we do
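
For reference, a minimal sketch of embedding all four input IDs from the book text (vocabulary size and embedding dimension are assumed from the chapter):

```python
import torch

torch.manual_seed(123)
embedding_layer = torch.nn.Embedding(6, 3)   # 6-token vocabulary, 3-dimensional embeddings (assumed)

input_ids = torch.tensor([5, 1, 3, 2])       # the four input IDs quoted above
print(embedding_layer(input_ids).shape)      # torch.Size([4, 3]) -- one embedding row per input ID
```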

Thank you.

requirements.txt

Hi,

Can you please add a requirements.txt to the repo as well (to set up the environment for the book in one go, without needing to install every package manually)?

In 3.3.1, there seems to be a missing image between "The attention weights and context vector calculation are summarized in the figure below:" and "The code below walks through the figure above step by step."

By convention, the unnormalized attention weights are referred to as "attention scores" whereas the normalized attention scores, which sum to 1, are referred to as "attention weights"
The attention weights and context vector calculation are summarized in the figure below:


Perhaps the sentence needs to be modified.

stride value caused skipping one word

"dataloader = create_dataloader_v1(raw_text, batch_size=8, max_length=4, stride=5, shuffle=False)\n",
This code does skip one word, which is different to the text in the book saying we do not skip a word and do not overlap. stride=4 make it consistent with the book.
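
For illustration, a minimal sketch (made-up token IDs) of how the stride setting changes the sampled windows:

```python
token_ids = list(range(12))   # made-up token IDs
max_length = 4

for stride in (4, 5):
    windows = [token_ids[i:i + max_length]
               for i in range(0, len(token_ids) - max_length, stride)]
    print(f"stride={stride}: {windows}")
# stride=4 tiles the text with no gaps and no overlap;
# stride=5 skips one token between consecutive windows.
```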

Incorrect description of function torch.arange() (2.8 Encoding word positions)

Hi @rasbt,

There is probably a typo in the description of the torch.arange() function here:

As shown in the preceding code example, the input to the pos_embeddings is usually a
placeholder vector torch.arange(block_size), which contains a sequence of numbers
1, 2, ..., up to the maximum input length.

I think you mean the range 0, 1, ..., up to the maximum input length - 1?
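
For reference, a quick check (the block_size value is assumed):

```python
import torch

block_size = 4
print(torch.arange(block_size))   # tensor([0, 1, 2, 3]) -- starts at 0, ends at block_size - 1
```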

Thank you.

Missing encoder.json and vocab.bpe for running bpe_openai_gpt2 (02_bonus_bytepair-encoder/compare-bpe-tiktoken.ipynb)

A FileNotFoundError occurred when trying to instantiate the bpe_openai_gpt2 tokenizer as follows:

--------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[20], line 1
----> 1 orig_tokenizer = get_encoder(model_name="gpt2", models_dir=".")

File ~/localdev/python/LLMs-from-scratch/ch02/02_bonus_bytepair-encoder/bpe_openai_gpt2.py:140, in get_encoder(model_name, models_dir)
    139 def get_encoder(model_name, models_dir):
--> 140     with open(os.path.join(models_dir, model_name, 'encoder.json'), 'r') as f:
    141         encoder = json.load(f)
    142     with open(os.path.join(models_dir, model_name, 'vocab.bpe'), 'r', encoding="utf-8") as f:

FileNotFoundError: [Errno 2] No such file or directory: './gpt2/encoder.json'
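
A possible fix (an assumption on my part, not a confirmed answer): the module also provides a download_vocab helper, which the notebook imports elsewhere, so the vocabulary files can be fetched before calling get_encoder. The directory arguments below are illustrative and may need adjusting to wherever the helper saves the files:

```python
from bpe_openai_gpt2 import get_encoder, download_vocab

download_vocab()   # fetches encoder.json and vocab.bpe (destination depends on the helper)
orig_tokenizer = get_encoder(model_name="gpt2", models_dir=".")  # adjust the paths if needed
```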

Inconsistencies in the unsqueeze operation between the book and the notebook, and whether it is necessary (3.6.2 Implementing multi-head attention with weight splits)

Hi @rasbt,

I found that the book's implementation of the MultiHeadAttention class has the following line:

mask_unsqueezed = mask_bool.unsqueeze(0).unsqueeze(0)

But there is only one unsqueeze operation in the notebook:

mask_unsqueezed = mask_bool.unsqueeze(0)

But as I understand it, we can skip the unsqueeze operation altogether because the masked_fill_() method supports broadcasting.
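
For illustration, a minimal sketch (shapes assumed) showing that a (T, T) boolean mask broadcasts directly against (batch, heads, T, T) attention scores without any unsqueeze calls:

```python
import torch

B, H, T = 2, 4, 6
attn_scores = torch.randn(B, H, T, T)
mask_bool = torch.triu(torch.ones(T, T), diagonal=1).bool()

attn_scores.masked_fill_(mask_bool, -torch.inf)   # broadcasts over the batch and head dimensions
print(attn_scores.shape)                          # torch.Size([2, 4, 6, 6])
```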

Thank you.

Contributions for Chinese simplified version

hi, @rasbt~
This project is awesome and the tutorial structure is rather clear; I was able to get up and running quickly and I'm learning a lot from it. Really appreciate your work! Would you be interested in having a Chinese version of your project, so that LLM learners from China can refer to your work more easily? Maybe I could begin with README-zh.md?

Offering Chinese Translation for 'Build a Large Language Model From Scratch'

Dear Dr. Sebastian Raschka,

Greetings! I am a researcher passionate about machine learning and artificial intelligence. As a native Chinese speaker, I would like to extend my deepest respect and gratitude for the open-source repository of "Build a Large Language Model From Scratch" that you have made available on GitHub. This book is not only comprehensive and beautifully illustrated but also organized in such a manner that beginners like myself find it both intuitive and easy to understand. Your work showcases profound expertise while being incredibly accessible to newcomers, from which I have greatly benefited.

Above all, I am inspired by your passion for AI and open-source software. Motivated by this passion, I have embarked on a project to translate your book and its associated code into Chinese. This effort aims to assist Chinese-speaking learners, like me, in better understanding the process of building large language models. To date, I have completed the translation of the first four chapters. During this process, I have made a concerted effort to clarify any contextual differences and added some foundational knowledge to help beginners grasp the material more effectively.

I am eager to contribute my translated version to the project and wonder if it would be possible to do so by including a link to my forked version in the official GitHub repository's readme or through another method you deem appropriate. My forked version is located at Intelligence-Manifesto/LLMs-from-scratch, which contains the translation work completed so far.

With this letter, I wish to express not only my admiration and thanks for this invaluable book but also seek your guidance and assistance on how I might integrate my work into this admirable open-source project in a suitable manner. How might I contribute my translation so that more Chinese readers can benefit?

Thank you again for your outstanding work and contributions to the open-source community. I look forward to your response.

Sincerely,
Intelligence-Manifesto

Error in the code in Listing A.13 (DDP-script.py)

Hi @rasbt,

I tried to run your DDP script and found that there is an error while executing this script "as-is":

PyTorch version: 2.2.1+cu121
CUDA available: True
Number of GPUs available: 2
Traceback (most recent call last):
  File "/home/user/app/DDP-script.py", line 178, in <module>
    mp.spawn(main, args=(world_size, num_epochs), nprocs=world_size)
  File "/home/user/miniconda/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 241, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
  File "/home/user/miniconda/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
  File "/home/user/miniconda/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 158, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/user/miniconda/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 68, in _wrap
    fn(i, *args)
  File "/home/user/app/DDP-script.py", line 128, in main
    features, labels = features.to(rank), labels.to(rank) # New: use rank
AttributeError: 'int' object has no attribute 'to'

The reason is the following incorrect line:

for features, labels in enumerate(train_loader):

which should instead be:

for idx, (features, labels) in enumerate(train_loader):

or like this (since idx is not used):

for features, labels in train_loader:
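
To illustrate why the original line fails, here is a minimal sketch with placeholder data (not the script's actual loader):

```python
# enumerate yields (index, batch), so unpacking into (features, labels) assigns
# the integer index to `features` -- hence "'int' object has no attribute 'to'".
train_loader = [("features_0", "labels_0"), ("features_1", "labels_1")]  # placeholder batches

for features, labels in enumerate(train_loader):
    print(type(features), labels)   # <class 'int'> ('features_0', 'labels_0')
    break
```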

Thank you.

Inconsistencies between the code in the book and the notebooks (2.6 Data sampling with a sliding window)

Hi @rasbt,

I noticed that in the book you provide the following code, with the function name create_dataloader and the argument stride = max_length + 1, to avoid overlap in the data even for the targets:

dataloader = create_dataloader(raw_text, batch_size=8, max_length=4,
stride=5)
data_iter = iter(dataloader)
inputs, targets = next(data_iter)
print("Inputs:\n", inputs)
print("\nTargets:\n", targets)

But in the main-code Jupyter notebook (cell [43]) and in the dataloader-only notebook (cell [2]), you use a function named create_dataloader_v1 with the argument stride = max_length.

Could you please tell me whether I understand correctly that we need to use stride = max_length + 1 to avoid overfitting? Does the overlap in the targets (when stride = max_length) seriously increase the risk of overfitting?
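
For illustration, a minimal sketch (made-up token IDs) of the overlap in question: with stride = max_length, the last target of one window reappears as the first input of the next, while stride = max_length + 1 avoids this at the cost of one token never appearing as an input:

```python
token_ids = list(range(10))   # made-up token IDs
max_length = 4

for stride in (max_length, max_length + 1):
    print(f"stride={stride}")
    for i in range(0, len(token_ids) - max_length, stride):
        inputs = token_ids[i:i + max_length]
        targets = token_ids[i + 1:i + max_length + 1]
        print("  inputs:", inputs, "targets:", targets)
```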

Thank you.

Output of the cell without variable specified (Embedding Layers and Linear Layers)

Hi @rasbt,

There is a cell [28] in this notebook that shows an output, but no variable to output is specified (probably it was linear.weight, which was deleted after the cell was executed):

torch.manual_seed(123)
linear = torch.nn.Linear(num_idx, out_dim, bias=False)
---
Parameter containing:
tensor([[-0.2039,  0.0166, -0.2483,  0.1886],
        [-0.4260,  0.3665, -0.3634, -0.3975],
        [-0.3159,  0.2264, -0.1847,  0.1871],
        [-0.4244, -0.3034, -0.1836, -0.0983],
        [-0.3814,  0.3274, -0.1179,  0.1605]], requires_grad=True)
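
For reference, a minimal sketch of what the cell presumably looked like before its last expression was deleted (num_idx and out_dim are inferred from the printed 5x4 weight matrix and are assumptions):

```python
import torch

num_idx, out_dim = 4, 5   # assumed from the 5x4 weight matrix shown above

torch.manual_seed(123)
linear = torch.nn.Linear(num_idx, out_dim, bias=False)
linear.weight             # evaluating the weight as the last expression prints it
```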

Thank you.

Several package requirements from bonus material are not specified in requirements.txt (Tokenizers comparison)

Hi @rasbt,

I don't know whether packages used by the bonus-material notebooks, like this notebook with the tokenizer comparison, are intended to be included in requirements.txt, but there are two missing libraries:

  • tqdm (which is required by the import from bpe_openai_gpt2 import get_encoder, download_vocab)
  • transformers

To simplify managing the libraries used for this project, I use Poetry, which is great for tracking all explicit and implicit dependencies; if you want, I can send you my configuration for it.

Thank you.

Inconsistencies in output for dropout section (3.5.2 Masking additional attention weights with dropout)

Hi @rasbt,

I am trying to explore and reproduce Chapter 3 and found that I can't reproduce the results that you show in the notebook and the book, even if I download the notebook and run it without any changes.
The difference appears only starting with the following two cells (I haven't checked the subsequent cells yet):

Cell [31]

torch.manual_seed(123)
dropout = torch.nn.Dropout(0.5) # dropout rate of 50%
example = torch.ones(6, 6) # create a matrix of ones

print(dropout(example))

Your output

tensor([[2., 2., 0., 2., 2., 0.],
        [0., 0., 0., 2., 0., 2.],
        [2., 2., 2., 2., 0., 2.],
        [0., 2., 2., 0., 0., 2.],
        [0., 2., 0., 2., 0., 2.],
        [0., 2., 2., 2., 2., 0.]])

My output

tensor([[2., 2., 2., 2., 2., 2.],
        [0., 2., 0., 0., 0., 0.],
        [0., 0., 2., 0., 2., 0.],
        [2., 2., 0., 0., 0., 2.],
        [2., 0., 0., 0., 0., 2.],
        [0., 2., 0., 0., 0., 0.]])

Cell [32]

torch.manual_seed(123)
print(dropout(attn_weights))

Your output

tensor([[2.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.7599, 0.6194, 0.6206, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.4921, 0.4925, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.3966, 0.0000, 0.3775, 0.0000, 0.0000],
        [0.0000, 0.3327, 0.3331, 0.3084, 0.3331, 0.0000]],
       grad_fn=<MulBackward0>)

My output

tensor([[2.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.8966, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.6206, 0.0000, 0.0000, 0.0000],
        [0.5517, 0.4921, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.4350, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.3327, 0.0000, 0.0000, 0.0000, 0.0000]],
       grad_fn=<MulBackward0>)
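
A likely explanation (an assumption on my part, not from the book): the stream of random numbers consumed by dropout is not guaranteed to be identical across PyTorch versions or device backends, so the same seed can drop different entries on different setups. A quick way to compare setups:

```python
import torch

print(torch.__version__)          # the pattern of dropped entries can differ across versions/devices

torch.manual_seed(123)
dropout = torch.nn.Dropout(0.5)   # dropout rate of 50%
example = torch.ones(6, 6)        # matrix of ones, as in cell [31]
print(dropout(example))           # compare this against the notebook output for your version
```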

Thank you.

Question about number of tokens in ChatGPT (2.5 Byte pair encoding)

Hi @rasbt,

Could you please clarify this sentence:

In fact, the BPE tokenizer that was used to train models such as GPT-2, GPT-3,
and ChatGPT has a total vocabulary size of 50,257, with <|endoftext|> being assigned
the largest token ID.

Which model do you mean by 'ChatGPT'?
I have seen different definitions of this term, and based on these definitions there are different vocabulary sizes:
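
For reference, a quick way to check the figure quoted above with tiktoken (a sketch, not from the book):

```python
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")   # the GPT-2 BPE tokenizer referenced in the quote
print(tokenizer.n_vocab)                    # 50257
print(tokenizer.encode("<|endoftext|>", allowed_special={"<|endoftext|>"}))   # [50256], the largest ID
```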

Thank you.

book feedback

hi @rasbt: fantastic work, and code that is clean and readable.

One small piece of feedback / issue I noticed with the early-access book is that in chapter 3, the manual seed of 789 is missing, which is what brought me here :)

Chapter 5 - Context Size and the DataLoaders

First off, great book!

Second, I noticed a small issue in Section 5.1.1 that stumped me for a bit.

"ctx_len": 256, # Shortened context length (orig: 1024)

If this is set to 1024, the val_loader fails to load with the train_ratio of 0.90. Adjusting it to 0.80 loads the data, but the shapes are mismatched.

Restoring the ctx_len to 256 fixes the issue.

I'm curious as to why this is occurring.
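
A possible explanation, sketched below under the assumption that the chapter trains on the short "the-verdict.txt" text (roughly 5,145 tokens): the validation split is smaller than a single 1024-token window, so the sliding-window dataset yields no validation samples.

```python
total_tokens = 5145      # approximate token count of "the-verdict.txt" (assumed)
train_ratio = 0.90
val_tokens = total_tokens - int(total_tokens * train_ratio)

for ctx_len in (256, 1024):
    # Mirrors the dataset's window loop: range(0, num_tokens - max_length, stride)
    num_samples = len(range(0, val_tokens - ctx_len, ctx_len))
    print(f"ctx_len={ctx_len}: {num_samples} validation sample(s)")
# ctx_len=256 yields at least one sample; ctx_len=1024 yields none,
# so the validation loader has nothing to iterate over.
```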
