A combination of classifiers for named entity recognition on transcription

Firoj Alam, Roberto Zanoli

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

This paper presents a Named Entity Recognition (NER) system for broadcast news transcriptions in which two different classifiers are set up in a loop, so that the output of one classifier is exploited by the other to refine its decision. The approach is similar to that of Typhoon, a NER system designed for newspaper articles; compared with Typhoon, one of the distinguishing features of our approach is the use of Conditional Random Fields in place of Hidden Markov Models. To build the second classifier, we extracted sentences from a large unlabelled corpus. Another relevant feature is specific to transcriptions: they lack orthographic and punctuation information, which typically results in poor NER performance. For this reason, an additional module for case and punctuation restoration has been developed. This paper describes the system and reports its performance, evaluated through participation in the EVALITA 2011 task of Named Entity Recognition on Transcribed Broadcast News. In addition, the EVALITA 2009 dataset, consisting of newspaper articles, is used to present a comparative analysis of named entity extraction from newspapers and from broadcast news.
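One common way to let a second classifier exploit the output of a first one is stacking: the first classifier's predicted labels are added as extra features for the second, which can then revise them. The sketch below shows only that feature construction, assuming a CRF toolkit that takes each sentence as a list of per-token feature dictionaries (sklearn-crfsuite, for example, accepts this format). The feature names, the +/-1 label window, the example labels, and the helper functions `base_features`, `stacked_features`, and `sentence_features` are all hypothetical illustrations, not the feature set or code described in the paper.

```python
from typing import Dict, List, Optional

# Hypothetical feature map type: feature name -> string value.
Features = Dict[str, str]


def base_features(tokens: List[str], i: int) -> Features:
    """Lexical features for token i (illustrative set, not the paper's)."""
    tok = tokens[i]
    return {
        "word": tok.lower(),
        "is_title": str(tok.istitle()),  # always "False" on lowercased transcripts
        "suffix3": tok[-3:].lower(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>",
    }


def stacked_features(tokens: List[str], i: int, first_pass: List[str]) -> Features:
    """Base features plus the first classifier's labels in a +/-1 token window."""
    feats = base_features(tokens, i)
    for offset in (-1, 0, 1):
        j = i + offset
        label = first_pass[j] if 0 <= j < len(tokens) else "<PAD>"
        feats[f"first_pass[{offset:+d}]"] = label  # keys: [-1], [+0], [+1]
    return feats


def sentence_features(tokens: List[str],
                      first_pass: Optional[List[str]] = None) -> List[Features]:
    """Per-token feature dicts for one sentence.

    Without first_pass: input for the first classifier.
    With first_pass: input for the second classifier, which can then
    revise the first classifier's decisions.
    """
    if first_pass is None:
        return [base_features(tokens, i) for i in range(len(tokens))]
    if len(first_pass) != len(tokens):
        raise ValueError("first_pass must have one label per token")
    return [stacked_features(tokens, i, first_pass) for i in range(len(tokens))]


if __name__ == "__main__":
    # Lowercased, unpunctuated input, as in a raw transcript.
    tokens = ["il", "presidente", "napolitano", "a", "roma"]
    first = ["O", "O", "B-PER", "O", "B-GPE"]  # hypothetical first-pass output
    for feats in sentence_features(tokens, first):
        print(feats["word"], feats["first_pass[+0]"])
```

When training a second classifier this way, the first-pass labels are usually taken from held-out or cross-validated predictions rather than gold labels, so that the second model learns to correct realistic errors; the abstract does not say how the authors handle this.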

Original language: English
Title of host publication: Evaluation of Natural Language and Speech Tools for Italian - International Workshop, EVALITA 2011, Revised Selected Papers
Pages: 107-115
Number of pages: 9
Publication status: Published - 2013
Externally published: Yes
Event: International Workshop on Evaluation of Natural Language and Speech Tools for Italian, EVALITA 2011 - Rome, Italy
Duration: 24 Jan 2012 - 25 Jan 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7689 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Workshop on Evaluation of Natural Language and Speech Tools for Italian, EVALITA 2011
Country/Territory: Italy
City: Rome
Period: 24/01/12 - 25/01/12

Keywords

  • Entity Detection
  • NER on Transcription
  • Named Entity Recognition
