Previously: tall_short_neuron_investigation.ipynb
Experiment blocks: EB - Analysis on Inputs for Tall vs Short
tall_short_circuit_draft.ipynb
Working on
Size Comparison Congruence
Done
- State Goals and brainstorm starting points
- Dot product of the ‘large’ and ‘huge’ embeddings, and how that similarity declines
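The “dot prod of large and huge” check above can be sketched with toy vectors. This is a minimal illustration only: `emb_large`, `emb_huge`, and `emb_cat` are made-up stand-ins, not real model embeddings (a real run would pull rows from the model’s embedding matrix, e.g. W_E).

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for token embeddings (hypothetical; real vectors would come
# from the model's embedding matrix for tokens like " large" and " huge").
rng = np.random.default_rng(0)
base = rng.normal(size=64)
emb_large = base + 0.1 * rng.normal(size=64)  # near-synonyms share a direction
emb_huge = base + 0.1 * rng.normal(size=64)
emb_cat = rng.normal(size=64)                 # unrelated control token

print(cosine_sim(emb_large, emb_huge))  # high: synonyms point the same way
print(cosine_sim(emb_large, emb_cat))   # near zero: unrelated
```

The same `cosine_sim` helper works for any pair of directions pulled from a model, so the synonym-vs-control contrast is the reusable part here.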
Future Work
- NOTE: the IOI paper only used single tokens, and attempts at “averaging” multiple tokens did not go well, so just use single tokens
- Find neurons. How do they interact in a circuit with each other?
- How are the neurons’ weights related to features? They ARE in the same direction.
- “External knowledge” like big vs small: isn’t that stored in MLPs? How do MLPs interact with attn heads to combine knowledge with patterns (of moving info by circuits)? Do certain circuits attend to certain MLPs that “process info”? (ROME showed the exact MLPs didn’t matter as long as they were in the middle layers; but is that because “analogous” shifts in attn also mean analogous shifts in MLPs? Or is it due to backups?)
- Find circuit. How does this combine information from MLP weights?
- See ‘pronoun circuits’ for circuits that target not just moved info but external semantic knowledge (e.g. knowing Mary is ‘she’)
- How do you input the tokens to ACDC?
- Query and key weights connect tokens and features
- Check if similar heads are used for “synonyms” and for the “same type of opposites” comparisons (large/small, black/white, man/woman, king/queen, etc.)
- Check how embeddings get more similar over time
- Trace through a token by dot-product sequences. What is obscured (not findable) by this approach? Why must the components it uses be located by activation patching rather than by other methods?
- Dot product of ‘Chihuahua’ with ‘dog’ + ‘small’, like the king/queen analogy
- Give the embedding (or dot products of embeddings) to GPT to see if it recognizes it in some universal way
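The Chihuahua-with-dog+small idea above can be sketched as an analogy check, in the style of king − man + woman ≈ queen. All vectors here are hypothetical stand-ins constructed so the composition holds; real embeddings may or may not behave this way.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings: 'chihuahua' is built from 'dog' and 'small'
# directions plus noise, so the composed vector should match it closely.
rng = np.random.default_rng(1)
d = 64
dog = rng.normal(size=d)
small = rng.normal(size=d)
chihuahua = dog + small + 0.2 * rng.normal(size=d)

composed = dog + small
print(cosine_sim(chihuahua, composed))  # high if composition holds
print(cosine_sim(chihuahua, dog))       # lower: captures only part of the meaning
```

On real embeddings the interesting question is whether the composed similarity beats the single-word similarity, as it does by construction in this toy.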
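The “check how embeddings get more similar over time” item could be operationalized as mean pairwise cosine similarity measured layer by layer. A toy sketch with fake residual-stream states, where each “layer” pulls the states toward one shared direction (that drift is an assumption for illustration, not a claim about any model):

```python
import numpy as np

def pairwise_mean_cosine(X):
    """Mean cosine similarity over all distinct pairs of rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn.T
    n = len(X)
    return float((sims.sum() - n) / (n * (n - 1)))  # exclude the diagonal

# Fake residual-stream states for 5 tokens; each "layer" adds a shared
# direction, so pairwise similarity should rise with depth.
rng = np.random.default_rng(2)
shared = rng.normal(size=64)
states = rng.normal(size=(5, 64))

sims_per_layer = []
for layer in range(4):
    states = states + 0.5 * shared  # hypothetical drift toward a shared direction
    sims_per_layer.append(pairwise_mean_cosine(states))

print([round(s, 3) for s in sims_per_layer])
```

Swapping the fake `states` for cached residual-stream activations at each layer would turn this into the actual measurement.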