Size Comparison Congruence - SUMMARIZE:


GPT-2-Small

https://colab.research.google.com/drive/18JcQcn7TKhN-1ULNjqQqvst9yJ6ZDhAA

Dot products of tokens after embedding layer

Histogram of pairwise dot products among 100 random tokens


AVG: 4.9035
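
A minimal sketch of how such pairwise dot products can be computed. It is not the notebook's code: the embedding matrix here is a random stand-in (an assumption), so its mean will not match the 4.9035 above. With the real model, `W_E` would be GPT-2-Small's token embedding matrix (for example, `wte.weight` if Hugging Face `transformers` is used). The helper name `pairwise_dot_products` is hypothetical.

```python
import numpy as np

def pairwise_dot_products(embeddings: np.ndarray) -> np.ndarray:
    """Dot product of every unordered pair of distinct rows."""
    gram = embeddings @ embeddings.T               # (n, n) matrix of all dot products
    i, j = np.triu_indices(len(embeddings), k=1)   # strictly above the diagonal
    return gram[i, j]                              # n * (n - 1) / 2 values

rng = np.random.default_rng(0)

# ASSUMPTION: random stand-in for the embedding matrix (small, for speed).
W_E = rng.normal(size=(1000, 64))

# Sample 100 distinct token ids and collect all pairwise dot products.
token_ids = rng.choice(W_E.shape[0], size=100, replace=False)
dots = pairwise_dot_products(W_E[token_ids])

print("number of pairs:", dots.size)
print("mean dot product:", dots.mean())
```

Note that 100 tokens yield 4950 pairs, so each token contributes to 99 of the values in the histogram.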

Histogram of pairwise dot products, restricted to single-token synonyms of “large”


AVG: >6

We observe that the “large” synonyms have a higher mean pairwise dot product (>6) than the random tokens (4.9035)

TO DO: Run a formal hypothesis test to get a p-value

ISSUE: This takes pairs of tokens as observations, so each token is reused across many pairs. What problems arise from this? [try to identify as many issues as possible]
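
One possible way to get a p-value while treating tokens, not pairs, as the unit of observation is a Monte Carlo resampling test: compare the synonym set's mean pairwise dot product against the same statistic for many random token sets of equal size. The sketch below is an assumption about how this could be done, not the notebook's method. The function names are hypothetical, and the data is synthetic: a random stand-in matrix with an artificially planted cluster playing the role of the “large” synonyms.

```python
import numpy as np

def mean_pairwise_dot(embeddings: np.ndarray) -> float:
    """Mean dot product over all unordered pairs of distinct rows."""
    gram = embeddings @ embeddings.T
    i, j = np.triu_indices(len(embeddings), k=1)
    return float(gram[i, j].mean())

def token_level_resampling_test(W_E, group_ids, n_resamples=500, seed=0):
    """One-sided Monte Carlo p-value for 'group has a higher mean pairwise dot'.

    Each null draw is a random token set of the same size as the group, so the
    dependence between pairs that share a token is present in the null as well.
    """
    rng = np.random.default_rng(seed)
    observed = mean_pairwise_dot(W_E[group_ids])
    null = np.empty(n_resamples)
    for r in range(n_resamples):
        sample = rng.choice(W_E.shape[0], size=len(group_ids), replace=False)
        null[r] = mean_pairwise_dot(W_E[sample])
    # The +1 terms keep the estimated p-value strictly above zero.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_resamples)
    return observed, float(p_value)

# ASSUMPTION: synthetic embeddings; rows 0..9 get a shared offset so they
# stand in for a group of related tokens.
rng = np.random.default_rng(1)
W_E = rng.normal(size=(1000, 64))
group_ids = np.arange(10)
W_E[group_ids] += 2.0

observed, p_value = token_level_resampling_test(W_E, group_ids)
print("observed mean pairwise dot:", observed)
print("one-sided p-value:", p_value)
```

With `n_resamples` draws, the smallest p-value this can report is 1 / (n_resamples + 1), so more draws are needed to resolve very small p-values.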