TY - GEN
T1 - Similarity analysis of contextual word representation models
AU - Wu, John M.
AU - Belinkov, Yonatan
AU - Sajjad, Hassan
AU - Durrani, Nadir
AU - Dalvi, Fahim
AU - Glass, James
N1 - Publisher Copyright:
© 2020 Association for Computational Linguistics
PY - 2020
Y1 - 2020
N2 - This paper investigates contextual word representation models from the lens of similarity analysis. Given a collection of trained models, we measure the similarity of their internal representations and attention. Critically, these models come from vastly different architectures. We use existing and novel similarity measures that aim to gauge the level of localization of information in the deep models, and facilitate the investigation of which design factors affect model similarity, without requiring any external linguistic annotation. The analysis reveals that models within the same family are more similar to one another, as may be expected. Surprisingly, different architectures have rather similar representations, but different individual neurons. We also observed differences in information localization in lower and higher layers and found that higher layers are more affected by fine-tuning on downstream tasks.
AB - This paper investigates contextual word representation models from the lens of similarity analysis. Given a collection of trained models, we measure the similarity of their internal representations and attention. Critically, these models come from vastly different architectures. We use existing and novel similarity measures that aim to gauge the level of localization of information in the deep models, and facilitate the investigation of which design factors affect model similarity, without requiring any external linguistic annotation. The analysis reveals that models within the same family are more similar to one another, as may be expected. Surprisingly, different architectures have rather similar representations, but different individual neurons. We also observed differences in information localization in lower and higher layers and found that higher layers are more affected by fine-tuning on downstream tasks.
UR - http://www.scopus.com/inward/record.url?scp=85117917760&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85117917760
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 4638
EP - 4655
BT - ACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
T2 - 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020
Y2 - 5 July 2020 through 10 July 2020
ER -