most_recent_S_attn_pat.ipynb
https://colab.research.google.com/drive/1KaqcS92-BI4FZ7m-r8rCW9tIovxA_s93#scrollTo=VcFgqbcF4YvI
Most Recent S Name Movers
Based on IOI findings, we expect to find:
- Induction heads (because of in-context learning)
- Name mover heads
- Find evidence for name mover heads using copy scores
- Subject-influencing heads (here, this seems to be the most recent subject)
This can be done with GPT-2-small alone.
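The copy-score check mentioned above can be sketched roughly as follows. This is a toy illustration, not the notebook's code: it assumes the IOI-style definition (push each name token's residual state through a head's OV matrix and the unembedding, then count how often that name lands in the top-k logits), it omits the final layer norm, and all matrices and names (`W_OV`, `W_U`, `name_states`) are hypothetical stand-ins rather than GPT-2-small weights.

```python
def matvec(vec, mat):
    """Row vector (length d_in) times matrix (d_in x d_out)."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def copy_score(name_states, name_token_ids, W_OV, W_U, k=5):
    """Fraction of name tokens that appear in the top-k logits after OV + unembed."""
    hits = 0
    for state, tok in zip(name_states, name_token_ids):
        logits = matvec(matvec(state, W_OV), W_U)
        # Indices of the k largest logits.
        top_k = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
        if tok in top_k:
            hits += 1
    return hits / len(name_token_ids)


# Toy setup (hypothetical): 4-token vocabulary, one-hot "residual states".
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
neg_identity = [[-x for x in row] for row in identity]
name_token_ids = [0, 1, 2, 3]
name_states = [identity[t] for t in name_token_ids]

copying_score = copy_score(name_states, name_token_ids, identity, identity, k=1)
anti_copying_score = copy_score(name_states, name_token_ids, neg_identity, identity, k=1)
print(copying_score, anti_copying_score)
```

A head whose OV circuit copies its input scores near 1, while one that suppresses it scores near 0. With real weights, `k=5` is the value to try first, under the assumption above about the IOI definition.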
<<<<<<
Direct Logit Attribution [IOI paper]
- Was logit diff commented on? If so, how?
- The paper only states the average logit difference X over Y examples. The rest of the logit-difference information was used only in the activation patching comparisons (no figures)
- How was activation head patching described?
- Include the heatmap figure
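To make the head-patching idea concrete, here is a minimal sketch under strong assumptions: a toy "model" whose logits are simply the sum of per-head contributions stands in for GPT-2-small, and every name and number is hypothetical. Each head's clean output is copied into the corrupted run, and the effect is reported as the fraction of the clean logit difference that is recovered. Laid out per (layer, head), these values are what a patching heatmap would show.

```python
def logit_diff(logits, correct, wrong):
    """Logit of the correct token minus logit of the wrong token."""
    return logits[correct] - logits[wrong]


def run_model(head_outputs):
    """Toy model (assumption): final logits are the sum of per-head contributions."""
    n_vocab = len(head_outputs[0])
    return [sum(head[i] for head in head_outputs) for i in range(n_vocab)]


def patch_heads(clean_heads, corrupted_heads, correct, wrong):
    """Patch one head at a time from the clean run into the corrupted run."""
    clean_ld = logit_diff(run_model(clean_heads), correct, wrong)
    corrupt_ld = logit_diff(run_model(corrupted_heads), correct, wrong)
    effects = []
    for h in range(len(clean_heads)):
        patched = list(corrupted_heads)
        patched[h] = clean_heads[h]  # swap in the clean activation for head h
        patched_ld = logit_diff(run_model(patched), correct, wrong)
        # 1.0 means clean performance is fully restored, 0.0 means no effect.
        effects.append((patched_ld - corrupt_ld) / (clean_ld - corrupt_ld))
    return effects


# Hypothetical data: vocab index 0 = indirect object name, 1 = subject name.
clean_heads = [[2.0, 0.0], [0.5, 0.5], [0.0, 0.0]]
corrupted_heads = [[0.0, 2.0], [0.5, 0.5], [0.0, 0.0]]

effects = patch_heads(clean_heads, corrupted_heads, correct=0, wrong=1)
print(effects)  # [1.0, 0.0, 0.0]
```

Only head 0 differs between the two runs, so it alone recovers the clean behaviour. In a real transformer, heads also affect each other through later layers, so this additive picture covers direct effects only.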
Direct Logit Attribution [IOI paper]
- Average logit difference X over Y examples
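The average logit difference can be computed along these lines. This is a sketch with made-up logits, assuming the metric is the logit of the indirect object name minus the logit of the subject name, averaged over examples. The field names (`io_id`, `s_id`) are hypothetical, and the numbers below are not the X and Y from the paper.

```python
def average_logit_diff(examples):
    """Mean of (logit of IO name - logit of S name) across examples."""
    diffs = [ex["logits"][ex["io_id"]] - ex["logits"][ex["s_id"]] for ex in examples]
    return sum(diffs) / len(diffs)


# Hypothetical final-position logits over a 3-token vocabulary.
examples = [
    {"logits": [3.0, 1.0, 0.0], "io_id": 0, "s_id": 1},
    {"logits": [0.5, 2.5, 0.0], "io_id": 1, "s_id": 0},
]

avg_ld = average_logit_diff(examples)
print(avg_ld)  # 2.0
```

A positive average means the model prefers the indirect object name over the subject name on these prompts.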
<<<