TY - JOUR
T1 - Multilabel Image Classification via Feature/Label Co-Projection
AU - Wen, Shiping
AU - Liu, Weiwei
AU - Yang, Yin
AU - Zhou, Pan
AU - Guo, Zhenyuan
AU - Yan, Zheng
AU - Chen, Yiran
AU - Huang, Tingwen
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/11/1
Y1 - 2021/11/1
N2 - This article presents a simple and intuitive solution for multilabel image classification that achieves competitive performance on the popular COCO and PASCAL VOC benchmarks. The main idea is to capture how humans perform this task: we recognize both labels (i.e., objects and attributes) and the correlations among labels at the same time. Here, label recognition is performed by a standard ConvNet pipeline, whereas label correlation modeling is done by projecting both the labels and the image features extracted by the ConvNet into a common latent vector space. Specifically, we carefully design the loss function to ensure that: 1) labels and features that co-appear frequently are close to each other in the latent space and 2) conversely, labels/features that do not appear together are far apart. This information is then combined with the original ConvNet outputs to form the final prediction. The whole model is trained end-to-end, with no supervision other than image-level labels. Experiments show that the proposed method consistently outperforms previous approaches on COCO and PASCAL VOC in terms of mAP, macro/micro precision, recall, and F-measure. Further, our model is highly efficient at test time, with only a small number of additional weights compared to the base model for direct label recognition.
KW - Deep learning
KW - label embedding
KW - multilabel classification
KW - neural network
UR - http://www.scopus.com/inward/record.url?scp=85117468009&partnerID=8YFLogxK
U2 - 10.1109/TSMC.2020.2967071
DO - 10.1109/TSMC.2020.2967071
M3 - Article
AN - SCOPUS:85117468009
SN - 2168-2216
VL - 51
SP - 7250
EP - 7259
JO - IEEE Transactions on Systems, Man, and Cybernetics: Systems
JF - IEEE Transactions on Systems, Man, and Cybernetics: Systems
IS - 11
ER -