TY - GEN
T1 - Classifying Arab names geographically
AU - Mubarak, Hamdy
AU - Darwish, Kareem
N1 - Publisher Copyright:
© ACL 2015. All rights reserved.
PY - 2015
Y1 - 2015
AB - Different names may be popular in different countries. Hence, person names may give a clue to a person's country of origin. Along with other features, mapping names to countries can be helpful in a variety of applications, such as tagging Twitter users by country. This paper describes the collection of Arabic Twitter user names, written either in Arabic or transliterated into Latin characters, along with their stated geographical locations. To classify previously unseen names, we trained naive Bayes and Support Vector Machine (SVM) multi-class classifiers using primarily bag-of-words features. We are able to map Arabic user names to specific Arab countries with 79% accuracy and to specific regions (Gulf, Egypt, Levant, Maghreb, and others) with 94% accuracy. For transliterated Arabic names, the accuracy per country and per region was 67% and 83%, respectively. The approach is generic and language independent and can be used to collect and classify names for other countries or regions; considering language-dependent name features (such as compound names and person titles) yields better results.
UR - http://www.scopus.com/inward/record.url?scp=85120062641&partnerID=8YFLogxK
U2 - 10.18653/v1/w15-3201
DO - 10.18653/v1/w15-3201
M3 - Conference contribution
AN - SCOPUS:85120062641
T3 - 2nd Workshop on Arabic Natural Language Processing, ANLP 2015 - held at 53rd Annual Meeting of the Association for Computational Linguistics, ACL 2015 - Proceedings
SP - 1
EP - 8
BT - 2nd Workshop on Arabic Natural Language Processing, ANLP 2015 - held at 53rd Annual Meeting of the Association for Computational Linguistics, ACL 2015 - Proceedings
A2 - Habash, Nizar
A2 - Vogel, Stephan
A2 - Darwish, Kareem
PB - Association for Computational Linguistics (ACL)
T2 - 2nd Workshop on Arabic Natural Language Processing, ANLP 2015
Y2 - 30 July 2015
ER -