TY - GEN
T1 - Supervised Cross-Modal Factor Analysis for Multiple Modal Data Classification
AU - Wang, Jingbin
AU - Zhou, Yihua
AU - Duan, Kanghong
AU - Wang, Jim Jing Yan
AU - Bensmail, Halima
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2016/1/12
Y1 - 2016/1/12
AB - In this paper, we study the problem of learning from multi-modal data for the purpose of document classification. In this problem, each document is composed of two different modalities of data, i.e., an image and a text. Cross-modal factor analysis (CFA) has been proposed to project the two modalities of data into a shared data space, so that the classification of an image or a text can be performed directly in this space. A disadvantage of CFA is that it ignores the supervision information. In this paper, we improve CFA by incorporating the supervision information to represent and classify both the image and text modalities of documents. We project both the image and text data into a shared data space by factor analysis, and then train a class label predictor in the shared space to exploit the class label information. The factor analysis parameters and the predictor parameters are learned jointly by solving a single objective function. With this objective function, we minimize the distance between the projections of the image and text of the same document, as well as the classification error of the projections measured by the hinge loss function. The objective function is optimized by an alternating optimization strategy in an iterative algorithm. Experiments on two different multi-modal document data sets show the advantage of the proposed algorithm over other CFA methods.
KW - Cross-modal factor analysis
KW - Multiple modal learning
KW - Supervised learning
UR - http://www.scopus.com/inward/record.url?scp=84964444011&partnerID=8YFLogxK
U2 - 10.1109/SMC.2015.329
DO - 10.1109/SMC.2015.329
M3 - Conference contribution
AN - SCOPUS:84964444011
T3 - Proceedings - 2015 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2015
SP - 1882
EP - 1888
BT - Proceedings - 2015 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE International Conference on Systems, Man, and Cybernetics, SMC 2015
Y2 - 9 October 2015 through 12 October 2015
ER -