Multilabel Image Classification via Feature/Label Co-Projection

Shiping Wen, Weiwei Liu, Yin Yang*, Pan Zhou, Zhenyuan Guo, Zheng Yan, Yiran Chen, Tingwen Huang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

68 Citations (Scopus)

Abstract

This article presents a simple and intuitive solution for multilabel image classification that achieves competitive performance on the popular COCO and PASCAL VOC benchmarks. The main idea is to mimic how humans perform this task: we recognize both labels (i.e., objects and attributes) and the correlations among labels at the same time. Label recognition is performed by a standard ConvNet pipeline, whereas label correlation is modeled by projecting both the labels and the image features extracted by the ConvNet into a common latent vector space. Specifically, we carefully design the loss function to ensure that 1) labels and features that co-appear frequently are close to each other in the latent space and 2) conversely, labels and features that do not appear together are far apart. This information is then combined with the original ConvNet outputs to form the final prediction. The whole model is trained end-to-end, with no supervision beyond image-level labels. Experiments show that the proposed method consistently outperforms previous approaches on COCO and PASCAL VOC in terms of mAP, macro/micro precision, recall, and F-measure. Further, our model is highly efficient at test time, adding only a small number of weights to the base label-recognition model.
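
The co-projection idea described in the abstract can be illustrated with a minimal sketch. The PyTorch module and loss below are a hypothetical reconstruction based only on the abstract, not the authors' code: pooled ConvNet features and learnable label embeddings are projected into a shared latent space, present labels are pulled toward the image's latent vector, absent labels are pushed away, and the resulting similarities are added to the base classifier's logits. All names, the margin values, and the hinge form of the loss are assumptions.

```python
# Hypothetical sketch of feature/label co-projection (not the paper's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoProjectionHead(nn.Module):
    """Projects ConvNet image features and label embeddings into a shared latent space."""
    def __init__(self, feat_dim, num_labels, latent_dim=256):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, latent_dim)       # image feature -> latent
        self.label_emb = nn.Embedding(num_labels, latent_dim)  # one latent vector per label

    def forward(self, feats):
        # feats: (B, feat_dim) pooled ConvNet features
        z_img = F.normalize(self.feat_proj(feats), dim=-1)   # (B, D), unit length
        z_lab = F.normalize(self.label_emb.weight, dim=-1)   # (L, D), unit length
        return z_img @ z_lab.t()                             # (B, L) cosine similarities

def co_projection_loss(sim, targets, pos_margin=0.9, neg_margin=0.2):
    """Hinge-style loss: labels present in the image (targets == 1) should have high
    similarity to the image's latent vector; absent labels should have low similarity."""
    pos = (pos_margin - sim).clamp(min=0) * targets        # pull present labels close
    neg = (sim - neg_margin).clamp(min=0) * (1 - targets)  # push absent labels away
    return (pos + neg).sum() / targets.numel()

# Usage sketch: combine the latent-space similarities with the base ConvNet classifier.
# feats = backbone(images); base_logits = classifier(feats); targets is a multi-hot matrix.
# sim = head(feats)
# loss = F.binary_cross_entropy_with_logits(base_logits + alpha * sim, targets) \
#        + co_projection_loss(sim, targets)
```

At test time, only the small projection layer and the label-embedding table are added on top of the base model, which is consistent with the abstract's claim of a lightweight extension.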

Original language: English
Pages (from-to): 7250-7259
Number of pages: 10
Journal: IEEE Transactions on Systems, Man, and Cybernetics: Systems
Volume: 51
Issue number: 11
DOIs
Publication status: Published - 1 Nov 2021

Keywords

  • Deep learning
  • label embedding
  • multilabel classification
  • neural network
