Past, present, future: A computational investigation of the typology of tense in 1000 languages

Ehsaneddin Asgari, Hinrich Schütze

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

34 Citations (Scopus)

Abstract

We present SuperPivot, an analysis method for low-resource languages that occur in a superparallel corpus, i.e., in a corpus that contains an order of magnitude more languages than parallel corpora currently in use. We show that SuperPivot performs well for the crosslingual analysis of the linguistic phenomenon of tense. We produce analysis results for more than 1000 languages, conducting – to the best of our knowledge – the largest crosslingual computational study performed to date. We extend existing methodology for leveraging parallel corpora for typological analysis by overcoming a limiting assumption of earlier work: We only require that a linguistic feature is overtly marked in a few of thousands of languages as opposed to requiring that it be marked in all languages under investigation.

Original language: English
Title of host publication: EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 113-124
Number of pages: 12
ISBN (Electronic): 9781945626838
Publication status: Published - 2017
Externally published: Yes
Event: 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017 - Copenhagen, Denmark
Duration: 9 Sept 2017 to 11 Sept 2017

Publication series

Name: EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings

Conference

Conference: 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017
Country/Territory: Denmark
City: Copenhagen
Period: 9/09/17 to 11/09/17
