Previously: tall_short_neuron_investigation.ipynb
Experiment blocks: EB - Analysis on Inputs for Tall vs Short
tall_short_circuit_draft.ipynb
Working on
Size Comparison Congruence
Done
- State Goals and brainstorm starting points
- Dot product of the ‘large’ and ‘huge’ embeddings, and how that similarity declines
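The “dot prod of large and huge” check above can be sketched with toy vectors. This is a minimal illustration only: `emb_large`, `emb_huge`, and `emb_cat` are made-up stand-ins, not real model embeddings (a real run would pull rows from the model’s embedding matrix, e.g. W_E).

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for token embeddings (hypothetical; real vectors would come
# from the model's embedding matrix for tokens like " large" and " huge").
rng = np.random.default_rng(0)
base = rng.normal(size=64)
emb_large = base + 0.1 * rng.normal(size=64)  # near-synonyms share a direction
emb_huge = base + 0.1 * rng.normal(size=64)
emb_cat = rng.normal(size=64)                 # unrelated control token

print(cosine_sim(emb_large, emb_huge))  # high: synonyms point the same way
print(cosine_sim(emb_large, emb_cat))   # near zero: unrelated
```

The same `cosine_sim` helper works for any pair of directions pulled from a model, so the synonym-vs-control contrast is the reusable part here.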
Future Work
- NOTE: the IOI paper only used single tokens, and attempts at “averaging” multiple tokens did not go well, so just use single tokens
- Find neurons. How do they interact in a circuit with each other?
- How are the neurons’ weights related to features? They ARE in the same direction.
- “External knowledge” like big vs small: isn’t that stored in MLPs? How do MLPs interact with attn heads to combine knowledge with patterns (of moving info by circuits)? Do certain circuits attend to certain MLPs that “process info”? (ROME showed the exact MLPs didn’t matter as long as they were in the middle layers; but is that because “analogous” shifts in attn also mean analogous shifts in MLPs? Or is it due to backups?)
- Find circuit. How does this combine information from MLP weights?
- See ‘pronoun circuits’ for circuits that target not just moved info but external semantic knowledge (e.g. knowing Mary is ‘she’)
- How do you input the tokens to ACDC?
- Query and key weights connect tokens and features
- Check if similar heads are used for “synonyms” and for the “same type of opposites” comparisons (large/small, black/white, man/woman, king/queen, etc.)
- Check how embeddings get more similar over time
- Trace through a token by dot-product sequences. What is obscured (not findable) by this approach? Why must the components it uses be located by activation patching rather than by other methods?
- Dot product of ‘Chihuahua’ with ‘dog’ + ‘small’, like the king/queen analogy
- Give the embedding (or dot products of embeddings) to GPT to see if it recognizes it in some universal way
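The Chihuahua-with-dog+small idea above can be sketched as an analogy check, in the style of king − man + woman ≈ queen. All vectors here are hypothetical stand-ins constructed so the composition holds; real embeddings may or may not behave this way.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings: 'chihuahua' is built from 'dog' and 'small'
# directions plus noise, so the composed vector should match it closely.
rng = np.random.default_rng(1)
d = 64
dog = rng.normal(size=d)
small = rng.normal(size=d)
chihuahua = dog + small + 0.2 * rng.normal(size=d)

composed = dog + small
print(cosine_sim(chihuahua, composed))  # high if composition holds
print(cosine_sim(chihuahua, dog))       # lower: captures only part of the meaning
```

On real embeddings the interesting question is whether the composed similarity beats the single-word similarity, as it does by construction in this toy.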
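The “check how embeddings get more similar over time” item could be operationalized as mean pairwise cosine similarity measured layer by layer. A toy sketch with fake residual-stream states, where each “layer” pulls the states toward one shared direction (that drift is an assumption for illustration, not a claim about any model):

```python
import numpy as np

def pairwise_mean_cosine(X):
    """Mean cosine similarity over all distinct pairs of rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn.T
    n = len(X)
    return float((sims.sum() - n) / (n * (n - 1)))  # exclude the diagonal

# Fake residual-stream states for 5 tokens; each "layer" adds a shared
# direction, so pairwise similarity should rise with depth.
rng = np.random.default_rng(2)
shared = rng.normal(size=64)
states = rng.normal(size=(5, 64))

sims_per_layer = []
for layer in range(4):
    states = states + 0.5 * shared  # hypothetical drift toward a shared direction
    sims_per_layer.append(pairwise_mean_cosine(states))

print([round(s, 3) for s in sims_per_layer])
```

Swapping the fake `states` for cached residual-stream activations at each layer would turn this into the actual measurement.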