- EXPM: See what GPT-2-large outputs for prompts designed to elicit an opposite adjective, such as:
- EXPM: Continuing the analysis of the prompt above, observe the logit-lens difference between “short” and “tall” at each layer
- EXPM: Inspect the attention heads for the prompt above
- EXPM: Add a source system to the input and check the logit diffs
- EXPM: Will adding a source system prevent the model from even considering “tall” over “short” in any of its layers?
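
The logit-lens experiments above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the Hugging Face `transformers` and `torch` packages are installed, that " short" and " tall" (with a leading space) each begin with a single representative GPT-2 token, and that the prompt is supplied by the caller, since the notes do not fix one. The function names `logit_lens_diffs` and `layers_preferring_b` are hypothetical helpers introduced here for illustration.

```python
def logit_lens_diffs(prompt, word_a=" short", word_b=" tall",
                     model_name="gpt2-large"):
    """Return logit(word_a) - logit(word_b) at the last position, per layer.

    Index 0 is the embedding output; the last index is the model's real output.
    """
    # Imported lazily so the pure helper below works without these packages.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tok = GPT2TokenizerFast.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name).eval()

    # Assumption: the first BPE token of each word is the one worth comparing.
    id_a = tok.encode(word_a)[0]
    id_b = tok.encode(word_b)[0]

    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)

    diffs = []
    # All entries except the last are raw residual streams: apply the final
    # layer norm, then the unembedding, to read them as logits (logit lens).
    for hidden in out.hidden_states[:-1]:
        logits = model.lm_head(model.transformer.ln_f(hidden[0, -1]))
        diffs.append((logits[id_a] - logits[id_b]).item())

    # The last hidden state already has ln_f applied in this implementation,
    # so use the model's actual output logits instead of normalizing twice.
    final = out.logits[0, -1]
    diffs.append((final[id_a] - final[id_b]).item())
    return diffs


def layers_preferring_b(diffs):
    """Indices of layers where word_b scores strictly higher than word_a."""
    return [i for i, d in enumerate(diffs) if d < 0]
```

To compare conditions, one could call `logit_lens_diffs` once on the bare prompt and once on the same prompt with the source system prepended, then pass each result to `layers_preferring_b`. An empty list for the second run would suggest that no intermediate layer favors “tall” over “short” under this particular readout, though the logit lens is only one lens on the residual stream.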