Remus Brad and Raluca Brad
Jou. Artif. Intell. Auto. Intell., 2 (3):385-409
Remus Brad : Universitatea "Lucian Blaga" din Sibiu
Raluca Brad : Universitatea "Lucian Blaga" din Sibiu
Article History: Received on: 12-Nov-25, Accepted on: 10-Dec-25, Published on: 02-Nov-25
Corresponding Author: Remus Brad
Email: remus.brad@ulbsibiu.ro
Citation: Remus Brad (2025). A Comparison of Multi-Object Tracking Methods in Image Sequences. Jou. Artif. Intell. Auto. Intell., 2 (3):385-409
Multi-Object Tracking (MOT) is one of the main tasks in computer vision. It deals with the
real-time detection and tracking of several objects across video frames. This paper discusses
and compares three MOT algorithms, SORT, DeepSORT, and JDE, on pedestrian tracking in
urban scenes. It first gives a brief discussion of key theoretical aspects: online versus
offline tracking, the use of Kalman filters, data association methods, and the use of appearance
features for identity continuity. Building on this theoretical background, the three tracking
algorithms are then evaluated in practice on the MOT16 and MOT17 benchmark datasets.
The algorithms were tested under similar conditions using the standard MOT metrics MOTA,
MOTP, IDF1, and the number of identity switches, along with visual inspection. Results show
that SORT is fast and simple but fails to maintain consistent identities most of the time;
DeepSORT improves on it by adding appearance features; and JDE performs better still by
combining detection and feature embedding in one model, at the cost of increased computational
complexity. Implementation issues are also discussed, and future work will include testing newer
models for better runtime efficiency and adaptability to real-world tracking scenarios.
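
As an illustration of the accuracy metric named above, the following minimal Python sketch computes MOTA from error counts using the standard CLEAR MOT definition, MOTA = 1 - (FN + FP + IDSW) / GT, where the counts are summed over all frames of a sequence. The function name `mota` and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper's experiments or from any evaluation library.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multi-Object Tracking Accuracy under the CLEAR MOT definition.

    All arguments are totals accumulated over every frame of a sequence:
    missed targets, false alarms, identity switches, and ground-truth objects.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


# Hypothetical counts for illustration only (not results from the paper).
score = mota(false_negatives=120, false_positives=60, id_switches=20, num_gt=1000)
print(f"MOTA = {score:.3f}")
```

Because the three error types are summed before normalising, MOTA can become negative when a tracker makes more errors than there are ground-truth objects. It is also dominated by detection errors, which is why identity-focused measures such as IDF1 and the number of identity switches are reported alongside it.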
https://jaiai.org/ |November 2025 Journal of Artificial Intelligence and Autonomous Intelligence