Journal / Conference
The IEEE International Conference on Computer Vision (ICCV, 2019)
[PDF link: here]
[Code link: Pending]
Keywords
Scale-sensitive object detection, Global Scale Learning module
Abstract
Scale-sensitive object detection remains a challenging task: most existing methods cannot learn scale explicitly and are not robust to scale variance. In addition, most existing methods are inefficient during training or slow during inference, which makes them unfriendly to real-time applications. In this paper, we propose a practical object detection method with a scale-sensitive network. Our method first predicts a global continuous scale, shared by all positions, for each convolution filter of each network stage. To learn the scale effectively, we average the spatial features and distill the scale from the channels. For fast deployment, we propose a scale decomposition method that transfers the learnt fractional scale into a combination of fixed integral scales for each convolution filter, exploiting dilated convolution. We demonstrate the method on one-stage and two-stage algorithms under different configurations. For practical applications, training our method is efficient and simple, dispensing with complex data sampling or optimization strategies. During testing, the proposed method requires no extra operation and readily supports hardware acceleration such as TensorRT and TVM. On the COCO test-dev, our model achieves 41.5 mAP with a one-stage detector and 42.1 mAP with two-stage detectors based on ResNet-101, outperforming baselines by 2.4 and 2.1 respectively without extra FLOPS.
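The abstract describes the Global Scale Learning (GSL) step as averaging spatial features and distilling a single continuous scale per convolution filter from the channels. The sketch below illustrates that idea only; the linear predictor `w`, `b` and the `exp` parameterization are assumptions for illustration, not the paper's actual module.

```python
import numpy as np

def global_scale_learning(feature_map, w, b):
    """Illustrative sketch of the GSL idea: average over spatial positions,
    then distill one positive, continuous scale per filter from the
    resulting channel statistics. The linear map (w, b) and the exp
    parameterization are hypothetical.

    feature_map: (C, H, W) activations of one network stage.
    Returns one scale per convolution filter, shared by all positions.
    """
    # Average over spatial positions -> one statistic per channel.
    channel_stats = feature_map.mean(axis=(1, 2))  # shape (C,)
    # Distill a positive continuous scale per filter from the channels.
    scales = np.exp(w @ channel_stats + b)         # shape (C,)
    return scales

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 16, 16))
w = rng.normal(scale=0.1, size=(8, 8))
b = np.zeros(8)
scales = global_scale_learning(feat, w, b)
print(scales.shape)  # one scale per filter
```

Because the scale is spatially shared and continuous, it can later be frozen and decomposed into fixed integral dilations for deployment.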
Method/Framework
Overview of the pipeline of our method. We take ResNet-50 as the detector backbone in this example. We first train a GSL network to learn a stable scale for each block in the h and w directions respectively. The learnt scales are decomposed into combinations of integral dilations for each block of interest. Given the groups of integral dilations, we construct a fast-deployment network and finetune it from the GSL network.
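The decomposition step can be sketched as follows. The exact combination rule used in the paper is not spelled out here, so this is an assumed scheme: the filters of a block are split between the two nearest integer dilations, with the split ratio given by the fractional part of the learnt scale.

```python
import math

def decompose_scale(d, num_filters):
    """Hypothetical sketch of scale decomposition: replace a learnt
    fractional scale d by a combination of fixed integral dilations, so the
    deployed network uses only standard dilated convolutions.

    Returns (dilation, filter_count) pairs covering all filters.
    """
    lo, hi = math.floor(d), math.ceil(d)
    frac = d - lo
    n_hi = round(frac * num_filters)  # filters given the larger dilation
    n_lo = num_filters - n_hi
    return [(lo, n_lo), (hi, n_hi)]

# e.g. a learnt scale of 1.25 spread over a block of 64 filters:
print(decompose_scale(1.25, 64))  # [(1, 48), (2, 16)]
```

Since the resulting dilations are integral and fixed, the finetuned fast-deployment network contains only ordinary dilated convolutions, which is what makes it compatible with accelerators like TensorRT and TVM.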
Experiments
We present experimental results on the bounding box detection track of the challenging MS COCO benchmark. For training, we follow common practice and use the MS COCO trainval35k split (the union of the 80k train images and a random 35k subset of the 40k-image val split). Unless otherwise specified, we report ablation studies by evaluating on the minival5k split. The COCO-style Average Precision (AP) averages AP across IoU thresholds from 0.5 to 0.95 with an interval of 0.05, measuring detection performance at a range of localization qualities. Final results are also reported on the test-dev set.
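The COCO-style AP averaging described above can be made concrete with a small example; the per-threshold AP values here are made up purely for illustration.

```python
# COCO-style AP averages per-threshold AP over the 10 IoU thresholds
# 0.50, 0.55, ..., 0.95. The per-threshold APs below are illustrative only.
iou_thresholds = [0.50 + 0.05 * i for i in range(10)]
per_threshold_ap = [0.60 - 0.04 * i for i in range(10)]  # drops as IoU rises

coco_ap = sum(per_threshold_ap) / len(per_threshold_ap)
print(len(iou_thresholds), round(coco_ap, 2))  # 10 thresholds, mean AP 0.42
```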
Highlight
- We design a module to learn a stable global scale for each layer and show that these learnt scales collaborate to significantly help the network handle objects over a large range of scales.
- We propose a dilation decomposition method to transfer a fractional scale into a combination of integral dilations. The decomposed dilations, being integral and fixed, make our fast-deployment network fast and amenable to hardware optimization during inference.
- Our method is widely applicable across different detection frameworks (both one-stage and two-stage) and network architectures, and brings significant, consistent gains on both AP50 and AP without being time-consuming.
Citation
@InProceedings{Peng_2019_ICCV,
author = {Peng, Junran and Sun, Ming and Zhang, Zhaoxiang and Tan, Tieniu and Yan, Junjie},
title = {POD: Practical Object Detection With Scale-Sensitive Network},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}}