Detecting objects using Rolling Convolution and Recurrent Neural Network

Wenqing Huang; YaMing Wang

Authors

Wenqing Huang, School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou
YaMing Wang, School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou

Abstract

Most existing object detection algorithms rely on region proposals to search for targets in an image. The most effective region proposal methods usually require thousands of candidate regions to achieve a high recall rate, which lowers detection efficiency. Although recent region proposal network approaches have yielded good results with only hundreds of proposals, they still struggle with small objects and precise localization, mainly because they use coarse features. We therefore propose a new method that extracts more effective global and multi-scale features to improve detection performance. Because feature maps produced by successive convolutions lose the resolution needed to detect small objects as they gain deeper semantic information, we use rolling convolution (RC) to maintain the high resolution of low-level feature maps and explore objects in greater detail, even without a structure dedicated to combining the features of multiple convolutional layers. Furthermore, we place a recurrent neural network built from multiple gated recurrent units (GRUs) on top of the convolutional layers to highlight useful global context locations that assist detection. On benchmark datasets, the proposed method achieves 78.2% mAP on PASCAL VOC 2007 and 72.3% mAP on PASCAL VOC 2012. These experiments indicate that the method reaches a more advanced level of detection performance.
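The abstract does not specify how the GRUs are wired to the convolutional feature map, so the following is only a minimal sketch of one plausible arrangement: a standard GRU cell swept left to right along each row of a feature map, so that every position receives a state summarizing what lies to its left. The function names (`gru_step`, `sweep_rows`, `init_params`), the sweep direction, the dimensions, and the random weights are all assumptions made for illustration, not the authors' architecture.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, x, U, h, b):
    """Compute W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]

def gru_step(x, h, p):
    """One standard GRU update: input x, previous state h, parameter dict p."""
    z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h, p["bz"])]  # update gate
    r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h, p["br"])]  # reset gate
    gated = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in affine(p["Wh"], x, p["Uh"], gated, p["bh"])]
    # Blend the previous state and the candidate state with the update gate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def init_params(in_dim, hid_dim, rng):
    """Hypothetical small random weights; a real model would learn these."""
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    p = {}
    for gate in "zrh":
        p["W" + gate] = mat(hid_dim, in_dim)
        p["U" + gate] = mat(hid_dim, hid_dim)
        p["b" + gate] = [0.0] * hid_dim
    return p

def sweep_rows(feature_map, p, hid_dim):
    """Run the GRU left to right over each row of a (rows x cols x channels) map."""
    context = []
    for row in feature_map:
        h = [0.0] * hid_dim  # the state is reset at the start of every row
        out_row = []
        for x in row:
            h = gru_step(x, h, p)
            out_row.append(h)
        context.append(out_row)
    return context

rng = random.Random(0)
# Toy feature map: 4 rows, 6 columns, 8 channels (assumed sizes).
fmap = [[[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(6)] for _ in range(4)]
params = init_params(8, 5, rng)
context = sweep_rows(fmap, params, 5)
print(len(context), len(context[0]), len(context[0][0]))  # 4 6 5
```

In this sketch each output vector depends only on the positions at or before it in the same row. A detector that wants context from all directions would typically add sweeps in the other three directions and combine the results; whether the paper does so is not stated in the abstract.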

Published

2024-04-19

Issue

Vol. 65 No. 2 (2019)

Section

Image Processing

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

