Custom Dataset for Circuit Discovery
Outline:
Attention Patterns, Scatterplot and Copy Scores
<<<<<<<<<<<<<<<
Attention Patterns
most_recent_S_attn_pat.ipynb
https://colab.research.google.com/drive/1KaqcS92-BI4FZ7m-r8rCW9tIovxA_s93#scrollTo=M93Hy1XdqNKm&line=1&uniqifier=1
For the attention heads that matter, check their attention patterns to see which tokens they are active on (dest token) and which tokens they attend to (source token)
L8H11:
The last token “is” (dest) attends to the latest subject (src)
The second-to-last token “child” (dest) attends to both the 2nd-latest and the latest subject (src), but more strongly to the latest
But in a negative name mover head, the last token “is” (dest) ALSO attends to the latest subject (src), so the attention pattern alone does not tell the two kinds of head apart
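As a rough illustration of reading a pattern by dest and src token, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration only: the prompt tokens, the raw scores, and the helper names `causal_softmax` and `top_sources` are hypothetical and are not taken from the notebook. In practice the pattern for a head such as L8H11 would come from running the model, not from hand-written scores.

```python
import math

def causal_softmax(scores):
    """Row-wise softmax where dest position i only sees src positions j <= i."""
    pattern = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]
        m = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        # Future positions get zero attention weight.
        pattern.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return pattern

def top_sources(pattern, tokens, dest, k=2):
    """Return the k src tokens that dest position `dest` attends to most."""
    row = pattern[dest]
    ranked = sorted(range(dest + 1), key=lambda j: row[j], reverse=True)
    return [(tokens[j], round(row[j], 3)) for j in ranked[:k]]

# Hypothetical prompt: "Alice" is the 2nd-latest subject, "Bob" the latest.
tokens = ["Alice", "and", "Bob", "'s", "child", "is"]
n = len(tokens)

# Made-up raw attention scores, indexed as scores[dest][src].
scores = [[0.0] * n for _ in range(n)]
scores[5][2] = 4.0  # "is" -> "Bob" (latest subject)
scores[4][0] = 2.0  # "child" -> "Alice" (2nd-latest subject)
scores[4][2] = 3.0  # "child" -> "Bob" (latest subject, weighted more)

pattern = causal_softmax(scores)

for dest in (4, 5):
    print(tokens[dest], "->", top_sources(pattern, tokens, dest))
```

Each row of `pattern` is one dest token and each column is one src token, so a head's behaviour is read off by picking the dest rows of interest (here “child” and “is”) and ranking the src columns.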