TY - GEN
T1 - Recognition using visual phrases
AU - Sadeghi, Mohammad Amin
AU - Farhadi, Ali
PY - 2011
Y1 - 2011
AB - In this paper we introduce visual phrases, complex visual composites like "a person riding a horse". Visual phrases often display significantly reduced visual complexity compared to their component objects, because the appearance of those objects can change profoundly when they participate in relations. We introduce a dataset suitable for phrasal recognition that uses familiar PASCAL object categories, and demonstrate significant experimental gains resulting from exploiting visual phrases. We show that a visual phrase detector significantly outperforms a baseline which detects component objects and reasons about relations, even though visual phrase training sets tend to be smaller than those for objects. We argue that any multi-class detection system must decode detector outputs to produce final results; this is usually done with non-maximum suppression. We describe a novel decoding procedure that can account accurately for local context without solving difficult inference problems. We show this decoding procedure outperforms the state of the art. Finally, we show that decoding a combination of phrasal and object detectors produces real improvements in detector results.
UR - http://www.scopus.com/inward/record.url?scp=80052889458&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2011.5995711
DO - 10.1109/CVPR.2011.5995711
M3 - Conference contribution
AN - SCOPUS:80052889458
SN - 9781457703942
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 1745
EP - 1752
BT - 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011
PB - IEEE Computer Society
ER -