TY - GEN
T1 - Towards Bangla Named Entity Recognition
AU - Chowdhury, Shammur Absar
AU - Alam, Firoj
AU - Khan, Naira
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
N2 - Named Entity Recognition (NER) is one of the fundamental problems in Information Extraction; the task is to find the entities mentioned in text. Over the years, there has been significant progress in NER research for resource-rich languages such as English, Chinese, and Italian. Although there are a number of studies on Bangla NER, most of them were conducted almost a decade ago and focused on a single geographical location (i.e., India). Therefore, in this paper, we present a corpus annotated with seven named-entity types, with a particular focus on Bangladeshi Bangla. It is part of the development of the Bangla Content Annotation Bank (B-CAB). We also present baseline results, which can be useful for future research. For the baseline results, we employed word-level, POS, gazetteer, and contextual features along with Conditional Random Fields (CRFs). Our study also includes an exploration of deep neural networks. Additionally, we investigated another large corpus from a different geographical location (i.e., India) and concluded that geography-specific NER is important for a language.
AB - Named Entity Recognition (NER) is one of the fundamental problems in Information Extraction; the task is to find the entities mentioned in text. Over the years, there has been significant progress in NER research for resource-rich languages such as English, Chinese, and Italian. Although there are a number of studies on Bangla NER, most of them were conducted almost a decade ago and focused on a single geographical location (i.e., India). Therefore, in this paper, we present a corpus annotated with seven named-entity types, with a particular focus on Bangladeshi Bangla. It is part of the development of the Bangla Content Annotation Bank (B-CAB). We also present baseline results, which can be useful for future research. For the baseline results, we employed word-level, POS, gazetteer, and contextual features along with Conditional Random Fields (CRFs). Our study also includes an exploration of deep neural networks. Additionally, we investigated another large corpus from a different geographical location (i.e., India) and concluded that geography-specific NER is important for a language.
KW - Bangla
KW - CRF
KW - LSTM
KW - Named Entity Recognition
KW - Neural Network
KW - Sequence Labeling
UR - http://www.scopus.com/inward/record.url?scp=85062858920&partnerID=8YFLogxK
U2 - 10.1109/ICCITECHN.2018.8631931
DO - 10.1109/ICCITECHN.2018.8631931
M3 - Conference contribution
AN - SCOPUS:85062858920
T3 - 2018 21st International Conference of Computer and Information Technology, ICCIT 2018
BT - 2018 21st International Conference of Computer and Information Technology, ICCIT 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 21st International Conference of Computer and Information Technology, ICCIT 2018
Y2 - 21 December 2018 through 23 December 2018
ER -