TY - JOUR
T1 - Neuron-level Interpretation of Deep NLP Models
T2 - A Survey
AU - Sajjad, Hassan
AU - Durrani, Nadir
AU - Dalvi, Fahim
N1 - Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022/11/22
Y1 - 2022/11/22
N2 - The proliferation of Deep Neural Networks in various domains has seen an increased need for interpretability of these models. Preliminary work done along this line, and papers that surveyed such, are focused on high-level representation analysis. However, a recent branch of work has concentrated on interpretability at a more granular level of analyzing neurons within these models. In this paper, we survey the work done on neuron analysis including: i) methods to discover and understand neurons in a network; ii) evaluation methods; iii) major findings including cross architectural comparisons that neuron analysis has unraveled; iv) applications of neuron probing such as: controlling the model, domain adaptation, and so forth; and v) a discussion on open issues and future research directions.
AB - The proliferation of Deep Neural Networks in various domains has seen an increased need for interpretability of these models. Preliminary work done along this line, and papers that surveyed such, are focused on high-level representation analysis. However, a recent branch of work has concentrated on interpretability at a more granular level of analyzing neurons within these models. In this paper, we survey the work done on neuron analysis including: i) methods to discover and understand neurons in a network; ii) evaluation methods; iii) major findings including cross architectural comparisons that neuron analysis has unraveled; iv) applications of neuron probing such as: controlling the model, domain adaptation, and so forth; and v) a discussion on open issues and future research directions.
UR - http://www.scopus.com/inward/record.url?scp=85139701589&partnerID=8YFLogxK
U2 - 10.1162/tacl_a_00519
DO - 10.1162/tacl_a_00519
M3 - Article
AN - SCOPUS:85139701589
SN - 2307-387X
VL - 10
SP - 1285
EP - 1303
JO - Transactions of the Association for Computational Linguistics
JF - Transactions of the Association for Computational Linguistics
ER -