Size Comparison Congruence - Summary:
GPT-2-Small
https://colab.research.google.com/drive/18JcQcn7TKhN-1ULNjqQqvst9yJ6ZDhAA
Dot products of tokens after embedding layer
- In “large”, we run a few tests and find that adjectives semantically similar to “large” in terms of size, such as “huge”, have a high dot product with “large”. We take the dot product right after the first embedding layer.
- In “average over dataset of single tokens”, we draw 100 random tokens, pass them through the first embedding layer, and plot the histogram of their pairwise dot products.
- In “Dot Product of large synonyms”, we plot the same dot-product histogram, but only for single tokens that are synonyms of “large”.
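The pairwise dot-product histogram described above can be sketched as follows. A random matrix stands in for GPT-2-Small's token embedding table `wte` (in the notebook the rows come from the real model, vocab 50257, d_model 768), so the numbers here are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for GPT-2-Small's token embedding matrix (wte). In the notebook
# these rows come from the actual model; shapes here are shrunk for speed.
d_model = 768
E = rng.normal(scale=0.02, size=(5000, d_model))  # 0.02 is GPT-2's init scale (assumption)

def pairwise_dots(token_ids, E):
    """Dot products of all unordered pairs of embedded tokens."""
    V = E[token_ids]                          # rows right after the embedding layer
    G = V @ V.T                               # Gram matrix of all dot products
    iu = np.triu_indices(len(token_ids), k=1)
    return G[iu]                              # each unordered pair counted once

# 100 random single tokens, as in the "average over dataset" cell
ids = rng.choice(E.shape[0], size=100, replace=False)
dots = pairwise_dots(ids, E)
print(dots.shape, dots.mean())                # 100 choose 2 = 4950 pairs
```

The same `pairwise_dots` call on the synonym token ids would produce the second histogram.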
Histogram of dot product of 100 random tokens
AVG: 4.9035
Histogram of dot products for single tokens that are synonyms of “large”
AVG: > 6
We observe that the “large” synonyms have dot products noticeably higher than the random-token average (> 6 vs. ~4.9).
TO DO: Perform proper hypothesis testing to get a p-value.
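For the TO DO, one option is a permutation test on the difference in mean dot product. The sketch below uses synthetic arrays as stand-ins for the two histograms (the real values come from the notebook cells). Note that pooling and permuting pairs treats them as exchangeable, which ignores the dependence between pairs sharing a token, so this p-value is only approximate:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: dots_all = pairwise dot products of the 100 random tokens,
# dots_syn = pairwise dot products among the "large" synonyms (assumed sizes;
# in the notebook both arrays come from the histogram cells).
dots_all = rng.normal(loc=4.9, scale=1.0, size=4950)
dots_syn = rng.normal(loc=6.2, scale=1.0, size=45)   # e.g. 10 synonyms -> 45 pairs

# One-sided permutation test on the difference of means.
obs = dots_syn.mean() - dots_all.mean()
pooled = np.concatenate([dots_all, dots_syn])
n_syn = len(dots_syn)
perm_diffs = np.empty(10_000)
for i in range(10_000):
    perm = rng.permutation(pooled)
    perm_diffs[i] = perm[:n_syn].mean() - perm[n_syn:].mean()

# +1 in numerator and denominator avoids reporting an exact zero p-value
p_value = (np.sum(perm_diffs >= obs) + 1) / (len(perm_diffs) + 1)
print(f"observed diff = {obs:.3f}, one-sided p = {p_value:.4f}")
```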
ISSUE: This takes pairs as observations, so each token appears in many pairs and the observations are not independent. What problems arise from this? [try to identify as many issues as possible]
- To ChatGPT: I am taking the pairwise dot products of 100 items and making a histogram of the dot products. This takes pairs as observations, meaning the same items are repeated across many pairs. Can I do hypothesis testing on this to see how many standard deviations a subset’s mean is from the overall average? What problems arise from this?
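One way around the pair-dependence issue is to resample at the token level rather than the pair level: under the null, reassign which tokens count as “synonyms” and recompute the mean pairwise dot product of that subset, so every null draw has the same dependence structure as the observed statistic. A sketch with random stand-in embeddings (in the notebook the rows would come from GPT-2’s embedding layer; the numbers here carry no meaning):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in embeddings: 100 "random" tokens plus 10 "synonym" tokens.
d = 768
E_rand = rng.normal(size=(100, d))
E_syn = rng.normal(loc=0.05, size=(10, d))  # small assumed shift for the subset

def mean_pairwise_dot(V):
    G = V @ V.T
    iu = np.triu_indices(len(V), k=1)
    return G[iu].mean()

# Resample *tokens*, not pairs: each null draw picks 10 tokens and recomputes
# the statistic, preserving the dependence between pairs sharing a token.
allV = np.vstack([E_rand, E_syn])
obs = mean_pairwise_dot(E_syn)
n_syn = len(E_syn)
null = np.empty(2000)
for i in range(2000):
    idx = rng.choice(len(allV), size=n_syn, replace=False)
    null[i] = mean_pairwise_dot(allV[idx])

p = (np.sum(null >= obs) + 1) / (len(null) + 1)
print(f"token-level one-sided p = {p:.4f}")
```

With the toy data the p-value itself is meaningless; the point is the resampling scheme, which directly answers the ISSUE above.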