Questions/Hypotheses

Questions

When NN adapts by transfer learning, what jnterpretablr neurons are removed and what aren't?

Identify component label by analogous relations

GPT-2-Small Circuits:

Are there consistent patterns in inputs that allow us to identify which subject, among given subjects, the model will output? Motivated by tests from test_prompt_most_recent_S.ipynb
- HYPOTHESIS: It may be that the “default” would output earliest.
How does adding a duplicate name change from outputting “earliest subject” to “latest subject”?
- HYPOTHESIS: ???
MLP:
CNN:

Superposition

If a neuron activates for 2 unrelated objs, is there actually something they have in common? Perhaps down the line in a circuit? Or a “shared role” (inhibition, eg) is used for them?
- Expm by finding commonalities in their circuits

[ Evidence level: ? how to quanify this in comparable ways, if possible ? ]