Awesome-LLM-Large-Language-Models-Notes


Known LLM models classified by year

Small introduction, paper, code etc.

| Year | Name | Paper | Info | Implementation |
|:-:|:-:|:-:|:-:|:-:|
| 2017 | Transformer | Attention is All you Need | The focus of the original research was on translation tasks. | TensorFlow + article |
| 2018 | GPT | Improving Language Understanding by Generative Pre-Training | The first pretrained Transformer model, fine-tuned on various NLP tasks to obtain state-of-the-art results. | |
| 2018 | BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Another large pretrained model, this one designed to produce better summaries of sentences. | PyTorch |
| 2019 | GPT-2 | Language Models are Unsupervised Multitask Learners | An improved (and bigger) version of GPT that was not immediately publicly released due to ethical concerns. | |
| 2019 | DistilBERT - Distilled BERT | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | A distilled version of BERT that is 60% faster, 40% lighter in memory, and still retains 97% of BERT's performance. | |
| 2019 | BART | BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | Large pretrained models using the same architecture as the original Transformer model. | |
| 2019 | T5 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Large pretrained models using the same architecture as the original Transformer model. | |
| 2019 | ALBERT | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | | |
| 2019 | RoBERTa - A Robustly Optimized BERT Pretraining Approach | RoBERTa: A Robustly Optimized BERT Pretraining Approach | | |
| 2019 | CTRL | CTRL: A Conditional Transformer Language Model for Controllable Generation | | |
| 2019 | Transformer XL | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | Adopts a recurrence mechanism over past states, coupled with relative positional encoding, enabling longer-term dependencies. | |
| 2020 | GPT-3 | Language Models are Few-Shot Learners | An even bigger version of GPT-2 that is able to perform well on a variety of tasks without the need for fine-tuning (called zero-shot learning). | |
| 2020 | ELECTRA | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | | |
| 2020 | mBART | Multilingual Denoising Pre-training for Neural Machine Translation | | |
| 2021 | Gopher | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | | |
| 2022 | ChatGPT/InstructGPT | Training language models to follow instructions with human feedback | This trained language model is much better at following user intentions than GPT-3. The model is fine-tuned using Reinforcement Learning from Human Feedback (RLHF) to achieve conversational dialogue. It was trained on a variety of data written by people so that its responses sound human-like. | |
| 2022 | Chinchilla | Training Compute-Optimal Large Language Models | Uses the same compute budget as Gopher but with 70B parameters and 4x more data. | |
| 2022 | LaMDA - Language Models for Dialog Applications | LaMDA | A family of Transformer-based neural language models specialized for dialog. | |
| 2023 | GPT-4 | GPT-4 Technical Report | The model now accepts multimodal inputs: images and text. | |
| 2023 | BloombergGPT | BloombergGPT: A Large Language Model for Finance | LLM specialised in the financial domain, trained on Bloomberg's extensive data sources. | |

Known LLM models classified by size

| Name | Size (# Parameters) | Training Tokens | Training data |
|:-:|:-:|:-:|:-:|
| Gopher | 280B | 300B | |
| GPT-3 | 175B | | |
| LaMDA | 137B | 168B | 1.56T words of public dialog data and web text |
| Chinchilla | 70B | 1.4T | |
| BloombergGPT | 50B | 363B+345B | |
| Falcon40B | 40B | 1T | 1,000B tokens of RefinedWeb |
  • M = million | B = billion | T = trillion
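
The parameter and token counts above can be turned into two rough comparison figures: training tokens per parameter, and an estimate of training compute. The sketch below is illustrative only. It assumes the common rule of thumb of roughly 6 FLOPs per parameter per training token, which is not stated in this README, and the helper names are made up for this example.

```python
# Parameter and training-token counts copied from the table above
# (B = 1e9, T = 1e12). Models without a token count are left out.
MODELS = {
    "Gopher": (280e9, 300e9),
    "Chinchilla": (70e9, 1.4e12),
    "BloombergGPT": (50e9, 363e9 + 345e9),
    "Falcon40B": (40e9, 1e12),
}


def tokens_per_parameter(params, tokens):
    """How many training tokens the model saw per parameter."""
    return tokens / params


def approx_training_flops(params, tokens):
    """Rough compute estimate.

    Assumption (not from this README): about 6 FLOPs per parameter
    per training token.
    """
    return 6 * params * tokens


for name, (params, tokens) in MODELS.items():
    ratio = tokens_per_parameter(params, tokens)
    flops = approx_training_flops(params, tokens)
    print(f"{name:<13} {ratio:6.1f} tokens/param   ~{flops:.2e} FLOPs")
```

Under this approximation Gopher and Chinchilla land in the same order of magnitude of compute, while Chinchilla sees about 20 tokens per parameter against roughly 1 for Gopher.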

Known LLM models classified by name


Classification by architecture

| Architecture | Models | Tasks |
|:-:|:-:|:-:|
| Encoder-only, aka auto-encoding Transformer models | ALBERT, BERT, DistilBERT, ELECTRA, RoBERTa | Sentence classification, named entity recognition, extractive question answering |
| Decoder-only, aka auto-regressive (or causal) Transformer models | CTRL, GPT, GPT-2, Transformer XL | Text generation given a prompt |
| Encoder-Decoder, aka sequence-to-sequence Transformer models | BART, T5, Marian, mBART | Summarisation, translation, generative question answering |
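
One way to see the difference between these families is the self-attention mask. Encoder-only models let every token attend to every other token, while decoder-only models let a token attend only to itself and earlier positions. Encoder-decoder models use the first kind in the encoder and the second in the decoder. The following is a minimal pure-Python sketch of the two masks; the function names are hypothetical and no particular library is implied.

```python
def bidirectional_mask(n):
    """Encoder-style mask: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    """Decoder-style mask: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def show(mask):
    """Render a mask as rows of 0/1, one row per query position."""
    return "\n".join(" ".join(str(v) for v in row) for row in mask)


print("Encoder-only (auto-encoding):")
print(show(bidirectional_mask(4)))
print("Decoder-only (auto-regressive):")
print(show(causal_mask(4)))
```

The lower-triangular causal mask is what makes decoder-only models suited to generating text one token at a time from a prompt.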

What's so special about HuggingFace?

  • HuggingFace is a popular NLP library that also offers an easy way to deploy models via its Inference API. When you build a model using the HuggingFace library, you can train it and upload it to the Model Hub. Read more about this here.
  • List of notebooks

Must-Read Papers on LLM


Blog articles


Know their limitations!


Start-up funding landscape


Available tutorials


A small note on the notebook rendering

  • Two notebooks are available:
    • One with coloured boxes, located outside the folder GitHub_MD_rendering
    • One in black-and-white, located inside the folder GitHub_MD_rendering

How to run the notebook in Google Colab

  • The easiest option would be for you to clone this repository.
  • Navigate to Google Colab and open the notebook directly from Colab.
  • You can then save it back to GitHub, provided Colab has been granted permission. The whole procedure is automated.
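
As an alternative to the steps above, Colab can usually open a notebook from a public GitHub repository through a URL of the form `https://colab.research.google.com/github/<owner>/<repo>/blob/<branch>/<path>`. The helper below is a small sketch that builds such a URL. The owner, repository, branch and notebook names in the usage line are placeholders, not the actual paths of this repository.

```python
from urllib.parse import quote

COLAB_GITHUB_PREFIX = "https://colab.research.google.com/github"


def colab_url(owner, repo, path, branch="main"):
    """Build a Colab link for a notebook stored in a public GitHub repo.

    All arguments are supplied by the caller; the default branch name
    "main" is an assumption and may differ for a given repository.
    """
    parts = [quote(owner), quote(repo), "blob", quote(branch), quote(path)]
    return COLAB_GITHUB_PREFIX + "/" + "/".join(parts)


# Placeholder values for illustration only.
print(colab_url("some-user", "some-repo", "notebooks/example.ipynb"))
```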

Implementations from scratch


Contributors

kyaiooiayk, publiusau
