TY - JOUR
T1 - A hybrid domain adaptation approach for identifying crisis-relevant tweets
AU - Mazloom, Reza
AU - Li, Hongmin
AU - Caragea, Doina
AU - Caragea, Cornelia
AU - Imran, Muhammad
PY - 2019/7
Y1 - 2019/7
N2 - Huge amounts of data generated on social media during emergency situations are regarded as a trove of critical information. The use of supervised machine learning techniques in the early stages of a crisis is challenged by the lack of labeled data for that event. Furthermore, supervised models trained on labeled data from a prior crisis may not produce accurate results, due to inherent crisis variations. To address these challenges, the authors propose a hybrid feature-instance-parameter adaptation approach based on matrix factorization, k-nearest neighbors, and self-training. The proposed feature-instance adaptation selects a subset of the source crisis data that is representative of the target crisis data. The selected labeled source data, together with unlabeled target data, are used to learn self-training domain adaptation classifiers for the target crisis. Experimental results show that, overall, the hybrid domain adaptation classifiers perform better than the supervised classifiers learned from the original source data.
AB - Huge amounts of data generated on social media during emergency situations are regarded as a trove of critical information. The use of supervised machine learning techniques in the early stages of a crisis is challenged by the lack of labeled data for that event. Furthermore, supervised models trained on labeled data from a prior crisis may not produce accurate results, due to inherent crisis variations. To address these challenges, the authors propose a hybrid feature-instance-parameter adaptation approach based on matrix factorization, k-nearest neighbors, and self-training. The proposed feature-instance adaptation selects a subset of the source crisis data that is representative of the target crisis data. The selected labeled source data, together with unlabeled target data, are used to learn self-training domain adaptation classifiers for the target crisis. Experimental results show that, overall, the hybrid domain adaptation classifiers perform better than the supervised classifiers learned from the original source data.
U2 - 10.4018/IJISCRAM.2019070101
DO - 10.4018/IJISCRAM.2019070101
M3 - Article
VL - 11
JO - International Journal of Information Systems for Crisis Response and Management
JF - International Journal of Information Systems for Crisis Response and Management
IS - 2
ER -