SAELens
CallumMcDougallGDM committed
Commit e99900b · verified · 1 Parent(s): 7b54a27

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md  +8 -8
README.md CHANGED
@@ -6,15 +6,15 @@ library_name: saelens
 
 # 1. Gemma Scope 2
 
-Gemma Scope 2 is a comprehensive, open suite of sparse autoencoders and transcoders for a range of model 27bs and versions in the Gemma 3 model family. We have SAEs on three different sites (as well as transcoders) for every layer of the pretrained and instruction-tuned models of parameter 27bs 270M, 1B, 4B, 12B and 27B. We also include several multi-layer SAE variants: partial residual stream crosscoders for every base Gemma 3 model, and cross-layer transcoders for the 270M and 1B models.
+Gemma Scope 2 is a comprehensive, open suite of sparse autoencoders and transcoders for a range of model sizes and versions in the Gemma 3 model family. We have SAEs on three different sites (as well as transcoders) for every layer of the pretrained and instruction-tuned models of parameter sizes 270M, 1B, 4B, 12B and 27B. We also include several multi-layer SAE variants: partial residual stream crosscoders for every base Gemma 3 model, and cross-layer transcoders for the 270M and 1B models.
 
 Sparse Autoencoders are a "microscope" of sorts that can help us break down a model's internal activations into the underlying concepts, just as biologists use microscopes to study the individual cells of plants and animals.
 
-See our landing page for details on the whole suite.
+You can read more in our [blog post](https://deepmind.google/blog/gemma-scope-2-helping-the-ai-safety-community-deepen-understanding-of-complex-language-model-behavior), and also see our [landing page](https://huggingface.co/google/gemma-scope-2) for details on the whole suite.
 
 # 2. What Is In This Repo?
 
-This repo contains a specific set of SAEs and transcoders: the ones trained on Gemma V3 {27b_upper} {it_upper}. Every folder here contains a different suite of models. Each of the folders in this page are named for the type of model that was trained:
+This repo contains a specific set of SAEs and transcoders: the ones trained on Gemma V3 27B IT. Every folder here contains a different suite of models. Each of the folders on this page is named for the type of model that was trained:
 
 - Single-layer models
   - `resid_post`, `attn_out` and `mlp_out` contain SAEs at 4 different layers (25%, 50%, 65% and 85% depth) and a variety of widths & L0 values, trained on the model's residual stream, attention output, and MLP output respectively.
@@ -24,7 +24,7 @@ This repo contains a specific set of SAEs and transcoders: the ones trained on G
   - `crosscoder` contains a set of weakly causal crosscoders which were trained on 4 concatenated layers of the residual stream (the same as those we trained our subsets on).
   - `clt` contains a set of cross-layer transcoders, which were trained to reconstruct the whole model's MLP outputs from the residual stream values just before each MLP layer.
 
-So for example, `google/gemma-scope-2-{27b}-{it}/resid_post` contains a range of SAEs trained on the residual stream of `gemma-v3-270m-pt` at 4 different layers.
+So for example, `google/gemma-scope-2-27b-it/resid_post` contains a range of SAEs trained on the residual stream of `gemma-v3-27b-it` at 4 different layers.
 
 # 3. How can I use these SAEs straight away?
 
@@ -32,7 +32,7 @@ So for example, `google/gemma-scope-2-{27b}-{it}/resid_post` contains a range of
 from sae_lens import SAE  # pip install sae-lens
 
 sae, cfg_dict, sparsity = SAE.from_pretrained(
-    release = "gemma-scope-2-{27b}-{it}-resid_post",
+    release = "gemma-scope-2-27b-it-resid_post",
     sae_id = "layer_12_width_16k_l0_small",
 )
 ```
@@ -44,10 +44,10 @@ Unless you're doing full circuit-style analysis, we recommend using SAEs / trans
 - **Width**: our SAEs have widths 16k, 64k, 256k and 1m. You can visit Neuronpedia to get a qualitative sense of what kinds of features you can find at different widths, but we generally recommend using 64k or 256k.
 - **L0**: our SAEs have target L0 values "small" (10-20), "medium" (30-60) or "large" (60-150). You can also look at the `config.json` file saved with every SAE's parameters to check exactly what the L0 is (or just visit the Neuronpedia page!). We generally recommend using "medium", which is useful for most tasks, although this might vary depending on your exact use case. Again, you can visit Neuronpedia to get a sense of what kind of features each model type finds.
 
-# 4. Point of Contact
+# 5. Point of Contact
 
 Point of contact: Callum McDougall
 Contact by email: [email protected]
 
-# 5. Citation
-Paper: (link to go here)
+# 6. Citation
+Paper link [here](https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/gemma-scope-2-helping-the-ai-safety-community-deepen-understanding-of-complex-language-model-behavior/Gemma_Scope_2_Technical_Paper.pdf)
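As a follow-up to the `SAE.from_pretrained` snippet in the README diff above, here is a minimal sketch of what can be done with the returned SAE object. It assumes the standard SAELens `encode`/`decode` API and that the `gemma-scope-2-27b-it-resid_post` release is registered in your installed `sae-lens` version; the random tensor is an illustrative stand-in for real Gemma 3 27B IT residual-stream activations captured at layer 12 (e.g. with TransformerLens or PyTorch hooks).

```python
import torch
from sae_lens import SAE  # pip install sae-lens

# Load the SAE exactly as in the README snippet above.
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="gemma-scope-2-27b-it-resid_post",
    sae_id="layer_12_width_16k_l0_small",
)

d_in = sae.cfg.d_in    # residual-stream width this SAE expects
d_sae = sae.cfg.d_sae  # number of SAE features (the "width", 16k here)

# Stand-in for real layer-12 resid_post activations, shape [batch, seq, d_in].
acts = torch.randn(2, 8, d_in)

feature_acts = sae.encode(acts)   # sparse feature activations, [batch, seq, d_sae]
recon = sae.decode(feature_acts)  # reconstructed activations, [batch, seq, d_in]

# Average number of active features per token (empirical L0), and the top
# features on the last token of the first sequence (indices can be looked up on Neuronpedia).
print("avg L0:", (feature_acts != 0).float().sum(-1).mean().item())
print("top features:", feature_acts[0, -1].topk(5).indices.tolist())
```

If shapes or dtypes don't line up, `cfg_dict` (or the `config.json` stored next to the weights) records the exact values the SAE was trained with.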
 
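On the README's guidance about picking a width and L0, here is a sketch of how you might list the `sae_id` values available in this release and confirm a loaded SAE's width. It assumes SAELens's `get_pretrained_saes_directory` helper, and that this release appears in it for your installed version; the release and `sae_id` strings are taken from the README, everything else is illustrative.

```python
from sae_lens import SAE
from sae_lens.toolkit.pretrained_saes_directory import get_pretrained_saes_directory

release = "gemma-scope-2-27b-it-resid_post"
directory = get_pretrained_saes_directory()

# List the sae_ids in this release; each id encodes layer, width and L0 band,
# e.g. "layer_12_width_16k_l0_small".
if release in directory:
    for sae_id in sorted(directory[release].saes_map):
        print(sae_id)

# The width (d_sae) and other training details are recorded in the SAE config,
# mirroring the config.json stored alongside each SAE's weights.
sae, cfg_dict, _ = SAE.from_pretrained(release=release, sae_id="layer_12_width_16k_l0_small")
print("d_in:", sae.cfg.d_in, "d_sae:", sae.cfg.d_sae)
```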