In real-world scenes, pedestrians are often occluded or appear small, so a convolutional neural network cannot fully extract their features, and detection results suffer. Data association of the same pedestrian across two adjacent frames is then prone to errors, which degrades tracking performance. To address this problem, a pedestrian tracking algorithm based on the anchor-free idea is improved. A context-information fusion module is proposed to strengthen the model's feature extraction across different receptive fields and to improve detection and tracking performance when pedestrians are small. In addition, so that the model learns to attend to the effective information in the feature layer, a coordinate attention mechanism is introduced to guide the model to learn the weights of different channels and different regions of the feature layer, improving tracking performance when pedestrians are occluded. The tracking performance of the model was verified on the MOT16 dataset. Experimental results show that, compared with other mainstream pedestrian tracking algorithms, the improved algorithm achieves higher tracking accuracy and fewer pedestrian ID switches. Its tracking accuracy is 70.74.
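The abstract reports a tracking accuracy and a count of ID switches without defining either. As a minimal sketch, assuming the reported accuracy refers to MOTA (the CLEAR MOT accuracy score commonly reported on MOT16), the score can be computed from per-frame misses, false positives, and ID switches as shown below. The `FrameErrors` container, the `mota` helper, and the toy counts are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class FrameErrors:
    """Per-frame counts (hypothetical container, not from the paper)."""
    gt: int    # ground-truth pedestrians present in the frame
    fn: int    # missed pedestrians (false negatives)
    fp: int    # spurious detections (false positives)
    idsw: int  # identity switches relative to the previous frame


def mota(frames):
    """MOTA = 1 - (sum of FN + FP + IDSW) / (sum of ground-truth objects)."""
    total_gt = sum(f.gt for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = sum(f.fn + f.fp + f.idsw for f in frames)
    return 1.0 - errors / total_gt


# Toy counts for two frames, chosen only to illustrate the arithmetic.
frames = [
    FrameErrors(gt=10, fn=2, fp=1, idsw=0),
    FrameErrors(gt=10, fn=1, fp=0, idsw=1),
]
score = mota(frames)
print(f"MOTA = {100 * score:.2f}%")  # prints "MOTA = 75.00%"
```

Because ID switches enter the numerator alongside misses and false positives, fewer identity errors during data association raise the score directly, which is consistent with the abstract reporting both quantities together.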
Published in | American Journal of Computer Science and Technology (Volume 4, Issue 4) |
DOI | 10.11648/j.ajcst.20210404.14 |
Page(s) | 111-118 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2021. Published by Science Publishing Group |
Keywords | Pedestrian Tracking, Anchor-Free, Context Information, Attention Mechanism, JDE |
APA Style
Shunliang Xiao, Zanxia Qiang, Weiguang Liu, Xianfu Bao. (2021). Pedestrian Tracking Algorithm Combining Contextual Information and Attention Mechanism. American Journal of Computer Science and Technology, 4(4), 111-118. https://doi.org/10.11648/j.ajcst.20210404.14
ACS Style
Shunliang Xiao; Zanxia Qiang; Weiguang Liu; Xianfu Bao. Pedestrian Tracking Algorithm Combining Contextual Information and Attention Mechanism. Am. J. Comput. Sci. Technol. 2021, 4(4), 111-118. doi: 10.11648/j.ajcst.20210404.14
AMA Style
Shunliang Xiao, Zanxia Qiang, Weiguang Liu, Xianfu Bao. Pedestrian Tracking Algorithm Combining Contextual Information and Attention Mechanism. Am J Comput Sci Technol. 2021;4(4):111-118. doi: 10.11648/j.ajcst.20210404.14
@article{10.11648/j.ajcst.20210404.14,
  author = {Shunliang Xiao and Zanxia Qiang and Weiguang Liu and Xianfu Bao},
  title = {Pedestrian Tracking Algorithm Combining Contextual Information and Attention Mechanism},
  journal = {American Journal of Computer Science and Technology},
  volume = {4},
  number = {4},
  pages = {111-118},
  doi = {10.11648/j.ajcst.20210404.14},
  url = {https://doi.org/10.11648/j.ajcst.20210404.14},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajcst.20210404.14},
  year = {2021}
}
TY - JOUR
T1 - Pedestrian Tracking Algorithm Combining Contextual Information and Attention Mechanism
AU - Shunliang Xiao
AU - Zanxia Qiang
AU - Weiguang Liu
AU - Xianfu Bao
Y1 - 2021/11/05
PY - 2021
N1 - https://doi.org/10.11648/j.ajcst.20210404.14
DO - 10.11648/j.ajcst.20210404.14
T2 - American Journal of Computer Science and Technology
JF - American Journal of Computer Science and Technology
JO - American Journal of Computer Science and Technology
SP - 111
EP - 118
PB - Science Publishing Group
SN - 2640-012X
UR - https://doi.org/10.11648/j.ajcst.20210404.14
VL - 4
IS - 4
ER -