TY - GEN
T1 - Overview of character-based models for natural language processing
AU - Adel, Heike
AU - Asgari, Ehsaneddin
AU - Schütze, Hinrich
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2018.
PY - 2018
Y1 - 2018
N2 - Character-based models have become increasingly popular for a range of natural language processing tasks, especially due to the success of neural networks. They make it possible to model text sequences directly, without the need for tokenization, and therefore enhance the traditional preprocessing pipeline. This paper provides an overview of character-based models for a variety of natural language processing tasks. We group existing work into three categories: tokenization-based approaches, bag-of-n-gram models, and end-to-end models. For each category, we present prominent examples of studies, with a particular focus on recent character-based deep learning work.
AB - Character-based models have become increasingly popular for a range of natural language processing tasks, especially due to the success of neural networks. They make it possible to model text sequences directly, without the need for tokenization, and therefore enhance the traditional preprocessing pipeline. This paper provides an overview of character-based models for a variety of natural language processing tasks. We group existing work into three categories: tokenization-based approaches, bag-of-n-gram models, and end-to-end models. For each category, we present prominent examples of studies, with a particular focus on recent character-based deep learning work.
KW - Document representation
KW - Feature selection
KW - Language models
KW - Natural language generation
KW - Natural language processing
KW - Neural networks
KW - Structured prediction
KW - Supervised learning by classification
UR - http://www.scopus.com/inward/record.url?scp=85055437661&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-77113-7_1
DO - 10.1007/978-3-319-77113-7_1
M3 - Conference contribution
AN - SCOPUS:85055437661
SN - 9783319771120
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 3
EP - 16
BT - Computational Linguistics and Intelligent Text Processing - 18th International Conference, CICLing 2017, Revised Selected Papers
A2 - Gelbukh, Alexander
PB - Springer Verlag
T2 - 18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017
Y2 - 17 April 2017 through 23 April 2017
ER -