
transformerlens's Introduction

TransformerLens


A Library for Mechanistic Interpretability of Generative Language Models. Maintained by Bryce Meyer and created by Neel Nanda.

Read the Docs Here

This is a library for doing mechanistic interpretability of GPT-2 Style language models. The goal of mechanistic interpretability is to take a trained model and reverse engineer the algorithms the model learned during training from its weights.

TransformerLens lets you load in 50+ different open source language models, and exposes the internal activations of the model to you. You can cache any internal activation in the model, and add in functions to edit, remove or replace these activations as the model runs.

Quick Start

Install

pip install transformer_lens

Use

import transformer_lens

# Load a model (eg GPT-2 Small)
model = transformer_lens.HookedTransformer.from_pretrained("gpt2-small")

# Run the model and get logits and activations
logits, activations = model.run_with_cache("Hello World")
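
For example, once the model has been run with run_with_cache, individual activations can be looked up by their hook name, and run_with_hooks lets you intervene on them as the model runs (a quick sketch; the full set of hook names, like blocks.0.attn.hook_pattern, is documented in the Main Demo):

# Inspect a cached activation by its hook name
attn_pattern = activations["blocks.0.attn.hook_pattern"]  # [batch, n_heads, query_pos, key_pos]
print(attn_pattern.shape)

# Intervene on an activation as the model runs
def zero_ablate_head_7(pattern, hook):
    pattern[:, 7, :, :] = 0.0  # zero out head 7's attention pattern in layer 0
    return pattern

logits = model.run_with_hooks(
    "Hello World",
    fwd_hooks=[("blocks.0.attn.hook_pattern", zero_ablate_head_7)],
)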

Key Tutorials

Gallery

Research done involving TransformerLens:

User contributed examples of the library being used in action:

Check out our demos folder for more examples of TransformerLens in practice

Getting Started in Mechanistic Interpretability

Mechanistic interpretability is a very young and small field, and there are a lot of open problems. This means there's a lot of low-hanging fruit and the bar to entry is low - if you would like to help, please try working on one! The standard answer to "why has no one done this yet" is just that there aren't enough people! Key resources:

Support & Community

Contributing Guide

If you have issues, questions, feature requests or bug reports, please search the issues to check if it's already been answered, and if not please raise an issue!

You're also welcome to join the open source mech interp community on Slack. Please use issues for concrete discussions about the package, and Slack for higher bandwidth discussions about eg supporting important new use cases, or if you want to make substantial contributions to the library and want a maintainer's opinion. We'd also love for you to come and share your projects on the Slack!

❗ HookedSAETransformer Removed

HookedSAETransformer was removed from TransformerLens in version 2.0, and its functionality is being moved to SAELens. Please see the accompanying announcement for details on what's new in this release and the future of TransformerLens.

Credits

This library was created by Neel Nanda and is maintained by Bryce Meyer.

The core features of TransformerLens were heavily inspired by the interface to Anthropic's excellent Garcon tool. Credit to Nelson Elhage and Chris Olah for building Garcon and showing the value of good infrastructure for enabling exploratory research!

Creator's Note (Neel Nanda)

I (Neel Nanda) used to work for the Anthropic interpretability team, and I wrote this library because after I left and tried doing independent research, I got extremely frustrated by the state of open source tooling. There's a lot of excellent infrastructure like HuggingFace and DeepSpeed to use or train models, but very little to dig into their internals and reverse engineer how they work. This library tries to solve that, and to make it easy to get into the field even if you don't work at an industry org with real infrastructure! One of the great things about mechanistic interpretability is that you don't need large models or tons of compute. There are lots of important open problems that can be solved with a small model in a Colab notebook!

Citation

Please cite this library as:

@misc{nanda2022transformerlens,
    title = {TransformerLens},
    author = {Neel Nanda and Joseph Bloom},
    year = {2022},
    howpublished = {\url{https://github.com/TransformerLensOrg/TransformerLens}},
}

transformerlens's People

Contributors

adamkarvonen, adamyedidia, afspies, alan-cooney, anthonyduong9, arthurconmy, avariengien, bryce13950, butanium, callummcdougall, ckkissane, cmathw, collingray, dkamm, felhof, glerzing, jaybaileycs, jbloomaus, joelburget, neelnanda-io, richardkronick, rusheb, seuperhakkerja, slavachalnev, smithjessk, soheeyang, tkukurin, ufo-101, vasilgeorgiev39, zshn-gvg


transformerlens's Issues

Create a demo of using pre-trained checkpoints for an interesting task, eg replicating the induction heads paper

There's an MVP demo of using checkpoints in the main demo, but it'd be nice to have something more substantial.

A good option is replicating some of the graphs in [the induction heads paper] for my interpretability-friendly models (or Pythia or Stanford CRFM). Prefix matching score and QK eigenvalue trace are the easiest and should be fast. Loss curves, in-context learning loss, loss by position and PCA of per-token loss should all be doable with approximately the same code; the hard part is just going to be running a range of model checkpoints.

You may want to add a PR to disable HuggingFace caching, and also to only load and analyse eg every 10th checkpoint, to avoid blowing out your computer's memory - by default, HuggingFace caches model weights to the hard drive, and this is pretty expensive if eg using 600 checkpoints of GPT-2 Medium!

Some OPT Tokenizer Confusion

Hi,

When trying to use utils.test_prompt with an OPT model I am running into issues surrounding the prepending of BOS tokens. In particular, I think the call to PreTrainedTokenizerBase sets add_special_tokens to true, regardless of the prepend_bos flag in model.to_str_tokens, which leads to at least one </s> being prepended (and if prepend_bos=True, then two are prepended).

For example:

prompt: "Two young, White males are outside near many bushes"
# Split into prompt = "Two young .... many", answer = "bushes"
Tokenized prompt: ['</s>', '</s>', 'Two', ' young', ',', ' White', ' males', ' are', ' outside', ' near', ' many']
Tokenized answer: ['</s>', ' bushes', '.', '\n']

I am not sure if this is the desired behaviour for OPT?

Even if it is, I think the utils.test_prompt needs to be adjusted to expect the </s> at the beginning of the tokenized answer, in this case.

Happy to make these changes once you confirm what the expected behaviour is.

Thanks!

Add a demo of direct path patching

Direct path patching is like activation patching, but rather than patching in the output of component A everywhere, it acts on pairs of components A and B (where B is in a layer after A): we patch the output of A only into the input of B, and all other components see the old output of A. I want to add a section to Exploratory Analysis Demo demonstrating this for all pairs of heads.

Eg to do direct path patching on the query of head B, we'd add a hook saying patched_B_query = original_B_query + (clean_A_output - corrupted_A_output) @ W_Q / layer_norm_scale
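
A rough sketch of such a hook for a single (A, B) pair, patching only the query of head B (this assumes use_attn_result=True so per-head outputs are cached as hook_result, hand-waves the layer norm scale, and the layer/head indices are arbitrary placeholders - it's meant to illustrate the formula above, not be a polished implementation):

from functools import partial
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2-small")
model.set_use_attn_result(True)  # cache per-head outputs as hook_result

clean_tokens = model.to_tokens("When John and Mary went to the store, John gave a drink to")
corrupted_tokens = model.to_tokens("When John and Mary went to the store, Peter gave a drink to")
_, clean_cache = model.run_with_cache(clean_tokens)
_, corrupted_cache = model.run_with_cache(corrupted_tokens)

layer_A, head_A = 2, 3  # upstream component A (placeholder choice)
layer_B, head_B = 5, 1  # downstream head B, in a later layer

def patch_B_query(q, hook):
    # q: [batch, pos, n_heads, d_head]
    clean_A = clean_cache["result", layer_A][:, :, head_A, :]          # [batch, pos, d_model]
    corrupted_A = corrupted_cache["result", layer_A][:, :, head_A, :]
    scale = corrupted_cache["scale", layer_B, "ln1"]                   # layer norm scale at B's input
    q[:, :, head_B, :] += ((clean_A - corrupted_A) / scale) @ model.W_Q[layer_B, head_B]
    return q

patched_logits = model.run_with_hooks(
    corrupted_tokens,
    fwd_hooks=[(utils.get_act_name("q", layer_B), patch_B_query)],
)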

For reference, an old PR to add it to an early version of the library: #49

No attention QK/OV attributes

The QK and OV matrices can be accessed via the model.QK and model.OV attributes, but not from the individual attention layers. It would be convenient to be able to do that, rather than computing all QK/OV matrices for all layers just to access one.
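
In the meantime, a workaround sketch for getting a single head's QK and OV without materialising all of them, using FactoredMatrix directly (this mirrors how model.QK is built, as far as I can tell):

from transformer_lens import HookedTransformer, FactoredMatrix

model = HookedTransformer.from_pretrained("gpt2-small")
layer, head = 5, 1

# QK = W_Q @ W_K^T and OV = W_V @ W_O, kept in factored (low-rank) form
QK = FactoredMatrix(model.W_Q[layer, head], model.W_K[layer, head].T)
OV = FactoredMatrix(model.W_V[layer, head], model.W_O[layer, head])
print(QK.shape, OV.shape)  # both [d_model, d_model], stored as a product of two factors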

Convert all TorchTyping to JaxTyping

This project uses Patrick Kidger's torchtyping to give tensor types including shapes. He recommends jaxtyping for newer projects (it isn't JAX specific, is better maintained, is more compatible with type checkers, etc.), and it'd be good to update TL to use it.

@dkamm might be up your alley? Sorry if this invalidates your enum work, but might be a more elegant solution!
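
For reference, a small example of what the jaxtyping style looks like (the function itself is just an illustration, not part of TL):

import torch
from jaxtyping import Float

def residual_norm(
    resid: Float[torch.Tensor, "batch pos d_model"],
) -> Float[torch.Tensor, "batch pos"]:
    # Shape annotations are plain strings, so they stay readable and play nicely with type checkers
    return resid.norm(dim=-1)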

Add my interpretability-friendly models to HuggingFace

Add my interpretability-friendly models to HuggingFace (documented here: https://docs.google.com/document/d/1WONBzNqfKIxERejrrPlQMyKqg7jSFW92x5UMXNrMdPo/edit#heading=h.chq47zvs9cii )

This probably looks like adding the HookedTransformer + HookedTransformerConfig classes to HuggingFace AutoModel; I'm not super sure how it works. Ideally this would be able to slot into code using HuggingFace models to eg evaluate them or generate text.

(Not actually a TransformerLens issue per se, but useful!)

Add Documentation + Tests for utils.Slice

I really need to document this one better lol - it's basically a wrapper around Python's slice, but None maps to slice(None) (equivalent to [:], and doesn't add a dummy axis), and an integer n maps to indexing with n, which reduces the number of axes - something Python slices don't support.
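
As a rough illustration of the behaviour described above, in terms of plain tensor indexing (not the utils.Slice API itself):

import torch

x = torch.arange(24).reshape(2, 3, 4)
print(x[slice(None)].shape)  # torch.Size([2, 3, 4]) - the None case keeps every axis, like [:]
print(x[1].shape)            # torch.Size([3, 4])    - the integer case drops the indexed axis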

Add a permanent hooks feature to HookPoint, that isn't deleted when you run `model.reset_hooks()`

Currently, running run_with_hooks or run_with_cache ends by running model.reset_hooks(), which deletes all hooks. This means that if I, eg, want to create a model without positional embeddings, I can't add a hook that just sets pos_embed to zero, without breaking run_with_hooks or run_with_cache. The underlying problem is that PyTorch hooks are global state, but I want run_with_* to present them as local state to the user.

I'd like to add an add_perma_hook method to a HookPoint, which tracks hooks separately from add_hook, so that reset_hooks only deletes the normal hooks and not the perma-hooks (essentially creating a separate class of global state hooks and local state hooks)
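
A sketch of how the proposed API might be used (add_perma_hook is the proposed method name here, not something that exists yet):

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-small")

def zero_pos_embed(pos_embed, hook):
    # Permanently ablate positional embeddings on every forward pass
    return pos_embed * 0.0

model.hook_pos_embed.add_perma_hook(zero_pos_embed)  # proposed: survives reset_hooks()

logits = model("Hello World")  # perma-hook active
model.reset_hooks()            # only removes the normal hooks
logits = model("Hello World")  # perma-hook still active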

Two issues with tests

As discussed in #161, the testing currently:

i) does not provide signal on which tests are failing
ii) seems to be incorrect? They pass locally for me

I'm a bit busy to do the good honest engineering required to improve this ATM :(

Add a `from_pretrained_no_processing` method to `HookedTransformer`

Currently from_pretrained has a bunch of Boolean flags for various simplifications to the transformer weights, and many default to True. I want these to be on by default, but it makes life a pain if you want to load a large model, or study a model exactly as the makers intended (since you need to set 5 ish Boolean flags to False, and it's not robust to new flags being added). I'd like there to be a from_pretrained_no_processing method with the same API as from_pretrained, which acts as a wrapper but sets all Boolean flags to False.
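
A minimal sketch of what the wrapper could look like, assuming fold_ln, center_writing_weights and center_unembed are the processing flags in question (the real method would need to cover whatever the full set of flags is):

from transformer_lens import HookedTransformer

def from_pretrained_no_processing(model_name, **kwargs):
    # Same API as from_pretrained, but with all weight-processing flags forced off
    return HookedTransformer.from_pretrained(
        model_name,
        fold_ln=False,
        center_writing_weights=False,
        center_unembed=False,
        **kwargs,
    )

model = from_pretrained_no_processing("gpt2-small")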

Add tests + better docs for tokenization methods

Add tests that the tokenization methods work (to_tokens, to_string, to_str_tokens, get_token_position)

Go through the documentation and clarify things that are unclear (this is hard for me to do, so even just having someone new to the library flag confusions is helpful!). The behaviour of prepend_bos is the main confusion. Docs can be copied from https://colab.research.google.com/github/neelnanda-io/TransformerLens/blob/v2/Main_Demo.ipynb#scrollTo=GUSyRfQuKmHU
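
A possible starting point for the tests (a sketch only; the expected behaviour around prepend_bos needs confirming first):

import pytest
from transformer_lens import HookedTransformer

@pytest.fixture(scope="module")
def model():
    return HookedTransformer.from_pretrained("gpt2-small")

def test_to_tokens_round_trips(model):
    text = "Hello, world!"
    tokens = model.to_tokens(text)                 # [1, pos], BOS prepended by default
    assert model.to_string(tokens[0, 1:]) == text  # drop the BOS before decoding

def test_to_str_tokens_matches_to_tokens(model):
    text = "Hello, world!"
    assert len(model.to_str_tokens(text)) == model.to_tokens(text).shape[-1]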

Add helper function for activation patching

Add a helper function to implement activation patching (probably in utils.py). Crib from the code here: https://colab.research.google.com/github/neelnanda-io/Easy-Transformer/blob/main/Exploratory_Analysis_Demo.ipynb#scrollTo=JAyQI8Mlt3W7

I'd guess a good API is to input the clean cache, and corrupted prompt/tokens, specify which activation to patch and whether to iterate over positions, layers, head index, etc.

Redwood's IOI codebase is a good example of how you might implement this: https://github.com/redwoodresearch/Easy-Transformer/blob/main/easy_transformer/experiments.py
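
A rough sketch of the kind of helper I have in mind, patching the residual stream at every (layer, position) on an IOI-style prompt pair and recording the logit difference (the API details are exactly what's up for discussion):

import torch
from functools import partial
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2-small")

clean_tokens = model.to_tokens("When John and Mary went to the store, John gave a drink to")
corrupted_tokens = model.to_tokens("When John and Mary went to the store, Peter gave a drink to")
_, clean_cache = model.run_with_cache(clean_tokens)
mary_token = model.to_single_token(" Mary")
john_token = model.to_single_token(" John")

def patch_resid_pre(corrupted_resid, hook, pos, clean_cache):
    # Overwrite the corrupted residual stream with the clean one at a single position
    corrupted_resid[:, pos, :] = clean_cache[hook.name][:, pos, :]
    return corrupted_resid

n_layers, n_pos = model.cfg.n_layers, clean_tokens.shape[1]
results = torch.zeros(n_layers, n_pos)
for layer in range(n_layers):
    for pos in range(n_pos):
        hook_name = utils.get_act_name("resid_pre", layer)
        patched_logits = model.run_with_hooks(
            corrupted_tokens,
            fwd_hooks=[(hook_name, partial(patch_resid_pre, pos=pos, clean_cache=clean_cache))],
        )
        results[layer, pos] = patched_logits[0, -1, mary_token] - patched_logits[0, -1, john_token]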

Commit error: No .pre-commit-config.yaml file was found

I was running into this issue where I couldn't commit anything. It turns out that we previously had a pre-commit rule that used nbdev and then removed it, and because I'd previously installed pre-commit, things broke. I fixed this by running rm -rf .git/hooks in the project root (the project doesn't have any other git hooks it needs).

Cannot install dependencies via poetry (y-py-0.5.5 is yanked)

I'm trying to install the project via poetry install for python 3.9.

I get this error:

  PoetryException

  Failed to install /Users/rusheb/Library/Caches/pypoetry/artifacts/be/5d/9c/38ed00c38e66f11b3f1295c0b4fa2565c954b8e0c8d63deac26e996efa/y_py-0.5.5.tar.gz

  at /opt/homebrew/lib/python3.10/site-packages/poetry/utils/pip.py:58 in pip_install
       54│
       55│     try:
       56│         return environment.run_pip(*args)
       57│     except EnvCommandError as e:
    →  58│         raise PoetryException(f"Failed to install {path.as_posix()}") from e
       59│

  • Installing yarl (1.8.2)
Warning: The file chosen for install of y-py 0.5.5 (y_py-0.5.5.tar.gz) is yanked. Reason for being yanked: Inconsistent wheels

Output of poetry env info:

❯ poetry env info

Virtualenv
Python:         3.9.16
Implementation: CPython
Path:           /Users/rusheb/code/TransformerLens/.venv
Executable:     /Users/rusheb/code/TransformerLens/.venv/bin/python
Valid:          True

Also, I saw this warning at the top of the poetry install output:

Warning: poetry.lock is not consistent with pyproject.toml. You may be getting improper dependencies. Run `poetry lock [--no-update]` to fix it.

I think running poetry lock should fix it. I'll try to raise a fix now.

Make activation functions modules?

I was just merging this in, and the code in question (a screenshot of the activation functions, not reproduced here) makes me a bit sad, as I have to think about 4 different activation functions in the otherwise really simple and clean file. Can all the activation functions be made modules instead? Happy to work on this if there's agreement.

Originally posted by @ArthurConmy in #8 (comment)

Add documentation for utils.get_act_name

I've received complaints that it needs better documentation! I would want to add clear info re the recommended name for each activation. The code is messy because it's designed to be robust to different names for things, to work for different sub-layers and for layers that aren't part of a block, etc.

Some Confusion on Unembedding

Hey,

If one unembeds from intermediate layers of the residual stream, the lack of normalization (coming from layer_norm_pre) leads to different (intuitively, less likely to be sensible) results.

I don't know if there is a warning about this / whether we might want to add an include_normalization flag to model.unembed(...)
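
For reference, a minimal sketch of applying the final layer norm by hand before unembedding (assuming a model loaded with the default settings, where ln_final and unembed are the relevant modules):

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-small")
_, cache = model.run_with_cache("Hello World")

resid = cache["resid_post", 5]                        # residual stream after layer 5
logits_raw = model.unembed(resid)                     # skips normalization entirely
logits_normed = model.unembed(model.ln_final(resid))  # applies the final layer norm first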

Thanks!

Alex

Better docs for model properties

Make this table better and cover key info for the model architecture - whether it uses parallel attention & MLPs, and which positional embedding it uses.

Add text at the bottom documenting the models more qualitatively, can basically copy this glossary: https://docs.google.com/document/d/1WONBzNqfKIxERejrrPlQMyKqg7jSFW92x5UMXNrMdPo/edit#heading=h.chq47zvs9cii

I'd want to add a separate table with training info: include training dataset, number of tokens, whether they were trained with dropout, whether they have checkpoints, whether trained with weight decay.

Torchtyping function help strings are extremely verbose

When torchtyping is used to give a function's type signature, printing the type is extremely verbose, to somewhat ridiculous extents. Eg, here is model.run_with_cache?:

Signature:
model.run_with_cache(
    *model_args,
    return_cache_object=True,
    remove_batch_dim=False,
    **kwargs,
) -> Tuple[Union[NoneType, typing_extensions.Annotated[torch.Tensor, {'__torchtyping__': True, 'details': ('batch', 'pos', 'd_vocab',), 'cls_name': 'TensorType'}], typing_extensions.Annotated[torch.Tensor, {'__torchtyping__': True, 'details': ((),), 'cls_name': 'TensorType'}], typing_extensions.Annotated[torch.Tensor, {'__torchtyping__': True, 'details': ('batch', 'position - 1',), 'cls_name': 'TensorType'}], Tuple[typing_extensions.Annotated[torch.Tensor, {'__torchtyping__': True, 'details': ('batch', 'pos', 'd_vocab',), 'cls_name': 'TensorType'}], Union[typing_extensions.Annotated[torch.Tensor, {'__torchtyping__': True, 'details': ((),), 'cls_name': 'TensorType'}], typing_extensions.Annotated[torch.Tensor, {'__torchtyping__': True, 'details': ('batch', 'position - 1',), 'cls_name': 'TensorType'}]]]], Union[transformer_lens.ActivationCache.ActivationCache, Dict[str, torch.Tensor]]]
Docstring: Wrapper around run_with_cache in HookedRootModule. If return_cache_object is True, this will return an ActivationCache object, with a bunch of useful HookedTransformer specific methods, otherwise it will return a dictionary of activations as in HookedRootModule.

In-place operations on `hook_pos_embed` are dangerous

With torch.set_grad_enabled(False), an in-place operation inside a hook can overwrite the model's positional embeddings:

#%%
from transformer_lens.HookedTransformer import HookedTransformer
import torch
torch.set_grad_enabled(False)

#%%
model = HookedTransformer.from_pretrained("gpt2")

#%%
assert not torch.allclose(
    torch.norm(model.W_pos[0]), torch.tensor(0.0),
)

# %%
def sketchy_remove_pos_embed(z, hook):
    z[:] = 0.0
    return z

_ = model.run_with_hooks(
    "Hello, world",
    fwd_hooks = [("hook_pos_embed", sketchy_remove_pos_embed)],
)

model.reset_hooks()

# %%
assert torch.allclose(
    torch.norm(model.W_pos[0]), torch.tensor(0.0),
)
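
For comparison, a hook that avoids the in-place write (and so leaves W_pos untouched, since the hooked activation can share storage with it) would look something like this:

def safe_remove_pos_embed(z, hook):
    # Return a fresh zero tensor instead of mutating z in place
    return torch.zeros_like(z)

_ = model.run_with_hooks(
    "Hello, world",
    fwd_hooks=[("hook_pos_embed", safe_remove_pos_embed)],
)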

Cache common model weights (eg post LN folding) to shorten model loading times.

Not sure exactly what the best way to cache is; I'd try to copy how HuggingFace transformers does it. The easiest way might be to just create a separate model on HuggingFace with the post-processed weights and to pull the weights from that.

Probably best to cache a version with all the default flags to from_pretrained and just use that by default, otherwise use the existing loading code.

I'd do this for small-ish and important models, eg all 4 GPT-2s and my interpretability friendly models.

Add automatic notebook generation

We should be able to use the python script here: https://github.com/nojvek/vscode-ipynb-py-converter to

i) write notebooks as .py files with #%%
ii) have these automatically converted to .ipynb files on push

This means that a) we can easily test notebooks (since they are .py files, see discussion here) and b) have automatically updated URLs like https://colab.research.google.com/github/neelnanda-io/Easy-Transformer/blob/main/EasyTransformer_Demo.ipynb to display demos.

@alan-cooney this may be of interest since this would work well as another "on push" action like those in #68

Main_Demo.ipynb: plotly rendering / DEVELOPMENT_MODE vs IN_COLAB

In Main_Demo.ipynb, plotly's imshow does not render correctly (for me) when running locally in jupyter lab.
An example is this line in cell 20

imshow(ioi_patching_result, x=token_labels, xaxis="Position", yaxis="Layer", title="Normalized Logit Difference After Patching Residual Stream on the IOI Task")

For me, adding two lines in the Setup section fixes it (screenshot of the fix not included here).
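
The original screenshot isn't reproduced here; as a guess at what the two lines might be, a common way to get plotly rendering in JupyterLab is to set the default renderer explicitly:

import plotly.io as pio
pio.renderers.default = "jupyterlab"  # or "notebook", depending on the environment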

But this also requires that DEVELOPMENT_MODE be True (which does not happen on its own).
Before contributing the fix, I therefore wanted to check: what's the intention behind DEVELOPMENT_MODE vs IN_COLAB?

Currently, these are not always inverses of each other: running locally without changing anything gives DEVELOPMENT_MODE=False and IN_COLAB=False. Should we just have IN_COLAB and get rid of DEVELOPMENT_MODE?

Issues setting up dev environment

I've been having issues setting up my dev environment.

OS: MacOS Monterey
Model Name: MacBook Air
Model Identifier: Mac14,2
Chip: Apple M2

What I was doing

  • I cloned the repo
  • ran poetry config virtualenvs.in-project true and poetry install --with dev
  • after the last command, I got an error (screenshot not included)

What I tried to fix it

  1. Upgrading pytorch
  • Changed the pytorch version to 1.13.1
  • This produced a new error (stack trace)
  2. Installing in a docker container
FROM ubuntu:latest

RUN apt-get -yqq update
RUN apt-get -yqq install git

RUN apt-get -yqq install python3    
RUN apt-get -yqq install python3-pip
RUN apt-get -yqq update

RUN git clone https://github.com/neelnanda-io/TransformerLens.git

WORKDIR /TransformerLens

RUN pip3 install poetry
RUN poetry config virtualenvs.in-project true
# RUN poetry install --with dev

This produced the same nvidia-cudnn-cu11 errors as before, which did not change even after I tried bumping the pytorch version.

Add support for model parallelism

Add support for having a model with layers split across several GPUs.

Make sure the layers (and its HookPoints) know what device they're on, so that hooks can ensure that they aren't needlessly moving information between GPUs. ActivationCache is a dictionary and should work by default.

The MVP would be doing 2 GPUs: putting the embed + first half of layers on GPU 1 and the second half + unembed on GPU 2. This is probably the most that's needed to support eg NeoX?

I'm not sure of the most elegant way of doing this, or how to do this without making the code really messy. I lean towards either adding a method which edits the model to move layers between devices, or making a separate ParallelHookedTransformer class
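
As a very rough illustration of the 2-GPU MVP (generic PyTorch rather than the HookedTransformer forward pass itself; the device names and half-and-half split are assumptions):

import torch.nn as nn

class TwoDeviceStack(nn.Module):
    # Toy illustration: first half of the blocks on one device, second half on another
    def __init__(self, blocks: nn.ModuleList, dev0="cuda:0", dev1="cuda:1"):
        super().__init__()
        half = len(blocks) // 2
        self.first_half = nn.ModuleList(blocks[:half]).to(dev0)
        self.second_half = nn.ModuleList(blocks[half:]).to(dev1)
        self.dev0, self.dev1 = dev0, dev1

    def forward(self, x):
        x = x.to(self.dev0)
        for block in self.first_half:
            x = block(x)
        x = x.to(self.dev1)  # single cross-device transfer at the layer boundary
        for block in self.second_half:
            x = block(x)
        return x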

Add evals for algorithmic-ish tasks like Indirect Object Identification

Add to evals.py support for checking how good a model is at tasks like Indirect Object Identification. Notably, the eval involves generating a synthetic dataset of names as they do in the paper (can copy from their codebase), running the model on it, and returning the average logit diff and accuracy.

Bonus:

  • Add support for prompts of different token length
  • Support multi-token answers.
  • Write this in a generic way, where any generated dataset can be subbed in
  • Give an option to pair up prompts, so "John and Mary went to the store, John gave the bag to" is followed by "John and Mary went to the store, Mary gave the bag to", to avoid biases where the model just favours common names
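
A sketch of the core of such an eval, using a tiny made-up name pool and template (the real version would copy the dataset generation from the IOI paper's codebase):

from itertools import permutations
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-small")

names = [" John", " Mary", " Tom", " Anna"]  # placeholder name pool
template = "When{A} and{B} went to the store,{A} gave the bag to"

logit_diffs = []
for A, B in permutations(names, 2):
    logits = model(template.format(A=A, B=B))   # [1, pos, d_vocab]
    correct = model.to_single_token(B)          # the indirect object
    incorrect = model.to_single_token(A)        # the repeated subject
    logit_diffs.append((logits[0, -1, correct] - logits[0, -1, incorrect]).item())

print("Average logit diff:", sum(logit_diffs) / len(logit_diffs))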

`prepend_bos` inconsistency

Yesterday @epurdy and I were working on an implementation of causal tracing from ROME. We ended up getting stuck for a while because when we ran the model it unexpectedly prepended a BOS token (in contrast to running the tokenizer by itself or with model.to_str_tokens, which doesn't). Looking through the codebase I noticed that of the six functions with a prepend_bos option, they're evenly split on the default. I think these defaults make sense in isolation but may be confusing when used together. Not sure what the right fix is here (or if it needs to be changed at all) but thought I'd at least raise this for discussion.

https://github.com/neelnanda-io/Easy-Transformer/blob/e5790469df3ebe26547c2016f899389e9f7bec30/easy_transformer/EasyTransformer.py#L146

https://github.com/neelnanda-io/Easy-Transformer/blob/e5790469df3ebe26547c2016f899389e9f7bec30/easy_transformer/EasyTransformer.py#L299

https://github.com/neelnanda-io/Easy-Transformer/blob/e5790469df3ebe26547c2016f899389e9f7bec30/easy_transformer/EasyTransformer.py#L345

https://github.com/neelnanda-io/Easy-Transformer/blob/e5790469df3ebe26547c2016f899389e9f7bec30/easy_transformer/EasyTransformer.py#L395

https://github.com/neelnanda-io/Easy-Transformer/blob/e5790469df3ebe26547c2016f899389e9f7bec30/easy_transformer/EasyTransformer.py#L935

https://github.com/neelnanda-io/Easy-Transformer/blob/e5790469df3ebe26547c2016f899389e9f7bec30/easy_transformer/utils.py#L440

Add wrapper integrating HookedTransformer with Google's Learning Interpretability Tool (LIT)

Google have a very cool-looking tool for (mostly non-MI) interpretability of language models, called LIT. It seems designed to be framework agnostic, and to be able to take a wrapper around many kinds of models, with functions to enable various LIT functions. I want to add a wrapper to HookedTransformer such that it can integrate with LIT, ideally for as many LIT functions as possible.

The MVP in mind here would just be a Colab which gets LIT to work with TransformerLens, and maybe showing some things you can do with it. I'm not sure whether this kind of integration should actually be merged into the library, but I'd love for a small demo to exist!

KeyError 'gpt_neox' when loading pythia-125m-deduped

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In [8], line 2
      1 model_name = "pythia-125m-deduped"
----> 2 model = EasyTransformer.from_pretrained(model_name)

File ~/miniconda3/envs/fastai/lib/python3.10/site-packages/easy_transformer/EasyTransformer.py:505, in EasyTransformer.from_pretrained(cls, model_name, fold_ln, center_writing_weights, center_unembed, factored_to_even, checkpoint_index, checkpoint_value, hf_model, device, move_state_dict_to_device, **model_kwargs)
    500 official_model_name = loading.get_official_model_name(model_name)
    502 # Load the config into an EasyTransformerConfig object If loading from a
    503 # checkpoint, the config object will contain the information about the
    504 # checkpoint
--> 505 cfg = loading.get_pretrained_model_config(
    506     official_model_name,
    507     checkpoint_index=checkpoint_index,
    508     checkpoint_value=checkpoint_value,
    509     fold_ln=fold_ln,
    510     device=device,
    511 )
    513 # Get the state dict of the model (ie a mapping of parameter names to tensors), processed to match the EasyTransformer parameter names.
    514 state_dict = loading.get_pretrained_state_dict(
    515     official_model_name, cfg, hf_model
    516 )

File ~/miniconda3/envs/fastai/lib/python3.10/site-packages/easy_transformer/loading_from_pretrained.py:439, in get_pretrained_model_config(model_name, checkpoint_index, checkpoint_value, fold_ln, device)
    437     cfg_dict = convert_neel_model_config(official_model_name)
    438 else:
--> 439     cfg_dict = convert_hf_model_config(official_model_name)
    440 # Processing common to both model types
    441 # Remove any prefix, saying the organization who made a model.
    442 cfg_dict["model_name"] = official_model_name.split("/")[-1]

File ~/miniconda3/envs/fastai/lib/python3.10/site-packages/easy_transformer/loading_from_pretrained.py:271, in convert_hf_model_config(official_model_name)
    269 official_model_name = get_official_model_name(official_model_name)
    270 # Load HuggingFace model config
--> 271 hf_config = AutoConfig.from_pretrained(official_model_name)
    272 architecture = hf_config.architectures[0]
    273 if architecture == "GPTNeoForCausalLM":

File ~/miniconda3/envs/fastai/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py:700, in AutoConfig.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    698     return config_class.from_pretrained(pretrained_model_name_or_path, **kwargs)
    699 elif "model_type" in config_dict:
--> 700     config_class = CONFIG_MAPPING[config_dict["model_type"]]
    701     return config_class.from_dict(config_dict, **kwargs)
    702 else:
    703     # Fallback: use pattern matching on the string.

File ~/miniconda3/envs/fastai/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py:409, in _LazyConfigMapping.__getitem__(self, key)
    407     return self._extra_content[key]
    408 if key not in self._mapping:
--> 409     raise KeyError(key)
    410 value = self._mapping[key]
    411 module_name = model_type_to_module_name(key)

KeyError: 'gpt_neox'

Add tests + better docs to ActivationCache

Add tests that the methods in the ActivationCache class work correctly.

Go through the documentation and clarify things that are unclear (this is hard for me to do, so even just having someone new to the library flag confusions is helpful!)

Add tests + better docs for FactoredMatrix

Add tests that the FactoredMatrix class works (essentially that each of its methods correctly mimics the result for the actual matrix product).

Go through the documentation and clarify things that are unclear (this is hard for me to do, so even just having someone new to the library flag confusions is helpful!)
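
A possible starting point - property-style tests comparing each FactoredMatrix method against the dense product (a sketch only):

import torch
from transformer_lens import FactoredMatrix

def test_factored_matrix_matches_dense_product():
    A = torch.randn(6, 3)
    B = torch.randn(3, 6)
    fm = FactoredMatrix(A, B)
    dense = A @ B
    assert torch.allclose(fm.AB, dense, atol=1e-5)
    assert torch.allclose(fm.T.AB, dense.T, atol=1e-5)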

Add mixed precision inference incl loading

Add the option to load models in bfloat16 and float16. Esp important for large models like GPT-J and GPT-NeoX.

Ideally, load from HuggingFace in this low precision, do weight processing on the CPU, and then move the processed model weights to the GPU. Might be easiest to do the weight processing once and cache the result to HF (see #103).

Add a helper function to display vectors of logits nicely

Often you want to look at vectors over the vocabulary (eg the logits at a specific position). This is >50,000 dimensions, which is hard to interpret! I want there to be nice utils to visualize a vector like this.

An MVP would be a function mapping this to a pandas dataframe, with the token index, token string value, logit, log prob and probability. Either for just the top K, or for the entire vocab.

But I expect there's many ways to make something nice here! One option is to imitate nostalgebraist's graphing style for plot_logit_lens in the transformer_utils library. This takes a layer x position x d_vocab tensor, and visualises it as a layer x position heatmap, printing the string value of the top token in each cell, and colouring by the top token value.
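
A sketch of the dataframe MVP described above (the column choices are just a suggestion):

import pandas as pd
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-small")
logits = model("The Eiffel Tower is in the city of")[0, -1]  # logits at the final position, [d_vocab]

def logits_to_df(logits: torch.Tensor, k: int = 10) -> pd.DataFrame:
    log_probs = logits.log_softmax(dim=-1)
    top = torch.topk(logits, k)
    return pd.DataFrame({
        "token_index": top.indices.tolist(),
        "token": [model.tokenizer.decode([i]) for i in top.indices.tolist()],
        "logit": top.values.tolist(),
        "log_prob": log_probs[top.indices].tolist(),
        "prob": log_probs[top.indices].exp().tolist(),
    })

print(logits_to_df(logits))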

