Journal / Conference
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR, 2020)
Keywords
Person Search, Bi-directional Interaction
Abstract
Existing works have designed end-to-end frameworks based on Faster-RCNN for person search. Due to the large receptive fields in deep networks, the feature maps of each proposal, cropped from the stem feature maps, involve redundant context information outside the bounding boxes. However, person search is a fine-grained task which needs accurate appearance information. Such context information can make the model fail to focus on persons, so the learned representations lack the capacity to discriminate various identities. To address this issue, we propose a Siamese network with an additional instance-aware branch, named Bi-directional Interaction Network (BINet). During the training phase, in addition to scene images, BINet also takes person patches as inputs, which help the model discriminate identities based on human appearance. Moreover, two interaction losses are designed to achieve bi-directional interaction between branches at two levels. The interaction can help the model learn more discriminative features for persons in the scene. At the inference stage, only the major branch is applied, so BINet introduces no additional computation. Extensive experiments on two widely used person search benchmarks, CUHK-SYSU and PRW, have shown that our BINet achieves state-of-the-art results among end-to-end methods without loss of efficiency.
Method/Framework
BINet is a Siamese network with two branches: a search branch that takes scene images and an instance-aware branch that takes the corresponding cropped person patches. The common parts of the two branches share parameters. Because each mini-batch pairs scene images with the patches cropped from them, the two branches should respond consistently to any positive RoI. We believe this consistency exists at two levels: the feature level and the prediction level. At the feature level, the representations that the two branches produce for a positive RoI should be embedded close together in the feature space; at the prediction level, the two branches should output the same identity prediction. The interaction losses enforce this consistency and thereby achieve bi-directional interaction between the two branches. During inference, we apply only the search branch (the major branch), so the instance-aware branch adds no cost at test time.
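The two consistency levels can be illustrated with a small sketch. This summary does not give the exact form of the interaction losses, so the code below rests on assumptions: the feature-level term is taken to be one minus the cosine similarity of the two branches' embeddings, and the prediction-level term is taken to be a symmetric KL divergence between their identity distributions. The function names are hypothetical and are not taken from the authors' code.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(logits):
    """Numerically stable softmax over a list of identity logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def feature_interaction_loss(search_feat, instance_feat):
    # ASSUMPTION: feature-level consistency measured as 1 - cosine similarity,
    # so embeddings of the same positive RoI are pulled together.
    return 1.0 - cosine_similarity(search_feat, instance_feat)


def prediction_interaction_loss(search_logits, instance_logits):
    # ASSUMPTION: prediction-level consistency measured as a symmetric KL
    # divergence, so each branch's identity prediction guides the other.
    p = softmax(search_logits)
    q = softmax(instance_logits)
    return kl_divergence(p, q) + kl_divergence(q, p)


if __name__ == "__main__":
    # Toy values standing in for one positive RoI seen by both branches.
    roi_feat, patch_feat = [0.9, 0.1, 0.3], [1.0, 0.0, 0.2]
    roi_logits, patch_logits = [2.0, 0.5, 0.1], [2.5, 0.3, 0.0]
    print(feature_interaction_loss(roi_feat, patch_feat))
    print(prediction_interaction_loss(roi_logits, patch_logits))
```

Both terms are symmetric in their arguments, which is one simple way to let gradients flow in both directions between the branches. In a real training loop they would be added to the usual detection and identification losses, with weights this summary does not specify.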
Experiments
We perform several analytic experiments on CUHK-SYSU and PRW to explore the contribution of each component in our proposed BINet, including the instance-aware branch and the bi-directional interaction losses.
Highlight
- We propose the Bi-directional Interaction Network which can learn to focus on persons in the scene with the guidance of the cropped person patches.
- We design two interaction losses to perform bi-directional interaction between the branches during the backward process. The interaction can make the model learn more discriminative identity representations.
- Our BINet brings significant performance improvements on popular benchmarks without additional parameters or computation at inference. In particular, compared with our baseline, it improves mAP by 3.6% on CUHK-SYSU and 10.3% on PRW.
Citation
@InProceedings{BINet_2020_CVPR,
author = {Dong, Wenkai and Zhang, Zhaoxiang and Tan, Tieniu and Song, Chunfeng},
title = {Bi-directional Interaction Network for Person Search},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year = {2020}
}