Bidirectional LSTMs - CRFs networks for bangla POS tagging

Firoj Alam, Shammur Absar Chowdhury, Sheak Rashed Haider Noori

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

28 Citations (Scopus)

Abstract

Part-of-speech (POS) information is one of the fundamental components of the natural language processing pipeline; it helps in extracting higher-level information such as named entities, discourse, and the syntactic structure of a sentence. For some languages, such as English, Dutch, and Chinese, POS tagging is considered a solved problem because of the high accuracy (97%) of existing systems. Significant efforts have been made for such languages, both in making data publicly accessible and in organizing evaluation campaigns. By comparison, there have been far fewer efforts for Bangla (ethnonym: Bangla; exonym: Bengali). In this paper, we present a knowledge-poor approach to POS tagging, which we evaluated using a publicly accessible dataset from LDC. The motivation for our approach is that we did not want to rely on existing resources such as a lexicon or a named entity recognizer when designing the system, as they are not publicly available and are difficult to develop. We did not use any handcrafted features; instead, we employed distributed representations of words and characters. We designed the model using Long Short-Term Memory (LSTM) neural networks followed by Conditional Random Fields (CRFs), and included a pre-trained word embedding model. We obtained promising results, with an accuracy of 86.0%.
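As a rough illustration of the CRF stage mentioned in the abstract, the sketch below shows Viterbi decoding over per-token tag scores. It is not the authors' implementation: the function name `viterbi_decode`, the two-tag set, and all score values are hypothetical, and the emission scores are hand-written stand-ins for what a bidirectional LSTM would produce.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence.

    emissions[t][k]   : score of tag k at token t (toy values here; in a
                        BiLSTM-CRF tagger these would come from the BiLSTM).
    transitions[j][k] : score of moving from tag j to tag k.
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    score = list(emissions[0])      # best score of any path ending in each tag
    backpointers = []               # best previous tag per step and tag
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for k in range(num_tags):
            best_prev = max(range(num_tags),
                            key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_prev] + transitions[best_prev][k]
                             + emissions[t][k])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag to recover the path.
    path = [max(range(num_tags), key=lambda k: score[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy example: tag 0 = NOUN, tag 1 = VERB, three tokens.
TAGS = ["NOUN", "VERB"]
emissions = [[2.0, 0.5], [1.0, 1.2], [0.3, 2.0]]
transitions = [[0.5, 1.0],    # NOUN -> NOUN, NOUN -> VERB
               [1.0, -1.0]]   # VERB -> NOUN, VERB -> VERB
print([TAGS[i] for i in viterbi_decode(emissions, transitions)])
```

In this toy setting, picking the best tag independently for each token would label the second token VERB, while decoding with transition scores labels it NOUN because the VERB-to-VERB transition is penalized. Modelling such dependencies between neighbouring tags is the usual reason for placing a CRF layer on top of per-token scores.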

Original language: English
Title of host publication: 19th International Conference on Computer and Information Technology, ICCIT 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 377-382
Number of pages: 6
ISBN (Electronic): 9781509040896
Publication status: Published - 21 Feb 2017
Externally published: Yes
Event: 19th International Conference on Computer and Information Technology, ICCIT 2016 - Dhaka, Bangladesh
Duration: 18 Dec 2016 → 20 Dec 2016

Keywords

  • Bangla
  • Deep learning
  • POS tagging
