TY - JOUR
T1 - End-to-End Detection-Segmentation System for Face Labeling
AU - Wen, Shiping
AU - Dong, Minghui
AU - Yang, Yin
AU - Zhou, Pan
AU - Huang, Tingwen
AU - Chen, Yiran
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2021/6
Y1 - 2021/6
N2 - In this paper, we propose an end-to-end detection-segmentation system for detailed face labeling. Fully convolutional networks (FCNs) have become the mainstream approach to semantic segmentation owing to their state-of-the-art performance. However, a general FCN usually produces smooth and homogeneous results. Moreover, when the semantic categories are extremely unbalanced across samples, as in the face labeling problem, features for some categories cannot be well explored by an FCN. To alleviate these problems, a face image is first encoded into multi-level feature maps by a pyramid FCN; features of the different facial components are then extracted separately according to the bounding box provided by a one-stage detection head. Three class-specific sub-networks process the extracted features to obtain the respective segmentation results. The skin-hair region can be decoded directly from the back end of the pyramid FCN. Finally, the overall segmentation result is obtained by combining the different branches. Moreover, the proposed method, trained on a single-face labeled dataset, can be applied directly to detailed multi-face labeling tasks without any network modification, additional module, or additional data. The overall structure can be trained end to end while maintaining a small network size (12 MB). Experiments show that the proposed method generates more accurate single-face and multi-face labeling results than previous work and achieves state-of-the-art results on the HELEN face dataset.
AB - In this paper, we propose an end-to-end detection-segmentation system for detailed face labeling. Fully convolutional networks (FCNs) have become the mainstream approach to semantic segmentation owing to their state-of-the-art performance. However, a general FCN usually produces smooth and homogeneous results. Moreover, when the semantic categories are extremely unbalanced across samples, as in the face labeling problem, features for some categories cannot be well explored by an FCN. To alleviate these problems, a face image is first encoded into multi-level feature maps by a pyramid FCN; features of the different facial components are then extracted separately according to the bounding box provided by a one-stage detection head. Three class-specific sub-networks process the extracted features to obtain the respective segmentation results. The skin-hair region can be decoded directly from the back end of the pyramid FCN. Finally, the overall segmentation result is obtained by combining the different branches. Moreover, the proposed method, trained on a single-face labeled dataset, can be applied directly to detailed multi-face labeling tasks without any network modification, additional module, or additional data. The overall structure can be trained end to end while maintaining a small network size (12 MB). Experiments show that the proposed method generates more accurate single-face and multi-face labeling results than previous work and achieves state-of-the-art results on the HELEN face dataset.
KW - Detection-segmentation
KW - face labeling
KW - multi-face
UR - http://www.scopus.com/inward/record.url?scp=85100664069&partnerID=8YFLogxK
U2 - 10.1109/TETCI.2019.2947319
DO - 10.1109/TETCI.2019.2947319
M3 - Article
AN - SCOPUS:85100664069
SN - 2471-285X
VL - 5
SP - 457
EP - 467
JO - IEEE Transactions on Emerging Topics in Computational Intelligence
JF - IEEE Transactions on Emerging Topics in Computational Intelligence
IS - 3
M1 - 8892599
ER -