https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens
What about the 1600-dim vectors produced in the middle of the network, say the output of the 12th layer or the 33rd? If we convert them to vocab space, do the results make sense? The answer is yes.
Convert to vocab space:
logits = model_small.unembed(model_small.ln_final(o))
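A minimal numpy sketch of that unembed step, with toy dimensions; `ln_final` and `W_U` here are stand-ins for the model's actual final layer norm and unembedding matrix (the real versions have learned scale/shift parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_vocab = 8, 16  # toy sizes standing in for e.g. 768 / 50257

def ln_final(o, eps=1e-5):
    # Normalize each residual-stream vector over the model dimension.
    # (The real layer norm also applies learned gamma/beta; omitted here.)
    mu = o.mean(axis=-1, keepdims=True)
    var = o.var(axis=-1, keepdims=True)
    return (o - mu) / np.sqrt(var + eps)

W_U = rng.normal(size=(d_model, d_vocab))  # stand-in unembedding matrix

o = rng.normal(size=(1, 6, d_model))       # residual stream at some layer
logits = ln_final(o) @ W_U                 # project into vocab space
print(logits.shape)                        # (1, 6, 16)
```

The same projection works on the residual stream after any layer, which is what makes the logit-lens plots possible.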
most_recent_S_name_movers_DRAFT.ipynb
z_0.shape = [1, 6, 1280]
Overall, z_0 has a shape of [batch_size, seq_length, embedding_size]. The second dimension (6) represents the seq_length.
logits: [batch_size, sequence_length, vocab_size]
logits[seq_idx, ioi_dataset.word_idx[word][seq_idx]]: ioi_dataset.word_idx[word][seq_idx] retrieves the index of the specific word in the current sequence using the word_idx attribute of ioi_dataset.
E.g., if the subject S is “Mary”, this finds the index of “Mary”, giving the logit of “Mary” for that prompt sequence.
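The indexing above can be sketched with toy tensors; here `word_idx` is a made-up mimic of the `ioi_dataset.word_idx` lookup, and the positions are invented:

```python
import numpy as np

rng = np.random.default_rng(1)
batch, seq_len, d_vocab = 2, 6, 16
logits = rng.normal(size=(batch, seq_len, d_vocab))

# Mimic ioi_dataset.word_idx: for each word role, the token position
# of that word in each sequence (toy values).
word_idx = {"S": np.array([2, 3])}  # subject's position per sequence

seq_idx = 0
# Vocab-sized logit vector at the subject's position in sequence 0;
# indexing it further by the token id of "Mary" would give Mary's logit.
s_logits = logits[seq_idx, word_idx["S"][seq_idx]]
print(s_logits.shape)  # (16,)
```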
Figure out the dims for z in copy_scores to see how skipping the QK matrix “fully attends to the name tokens”
Specifically, we first obtained the state of the residual stream at the position of each name token after the first MLP layer. Then, we multiplied this by the OV matrix of a Name Mover Head (simulating what would happen if the head attended perfectly to that token),
This is z
“position of each name token”: model(ioi_dataset.toks.long())
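The copy-score step can be sketched like this, with toy weights standing in for the head's real W_V, W_O, and the unembedding (layer norms omitted; all values here are random placeholders, not the actual model's):

```python
import numpy as np

rng = np.random.default_rng(2)
d_model, d_head, d_vocab = 8, 4, 16

# Stand-ins for one Name Mover Head's weights and the unembedding.
W_V = rng.normal(size=(d_model, d_head))
W_O = rng.normal(size=(d_head, d_model))
W_OV = W_V @ W_O                     # the head's combined OV circuit
W_U = rng.normal(size=(d_model, d_vocab))

# z: residual stream at a name token's position after the first MLP layer.
z = rng.normal(size=(d_model,))

# "Perfect attention" to that token: skip the QK circuit entirely and
# push z straight through the OV circuit, then unembed.
out = z @ W_OV
copy_logits = out @ W_U              # vocab-space scores, shape (16,)

# The copy score then checks whether the name's token id appears in the
# top-k of these logits.
top5 = np.argsort(-copy_logits)[:5]
print(top5)
```

Because the QK matrix only decides *where* the head attends, replacing it with perfect attention isolates what the OV circuit would copy from the name position.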