Research on Multi-player Tracking in Soccer Videos by Combining Tracking-Prediction-Detection

Gyong-Jun So; Hak-Song Kim; Man-Chol Ho; Jin-Song Ri

doi:doi:10.11648/j.sr.20251304.15

Research Article |

| Peer-Reviewed

Research on Multi-player Tracking in Soccer Videos by Combining Tracking-Prediction-Detection

Gyong-Jun So, Hak-Song Kim^*, Man-Chol Ho, Jin-Song Ri

Published in Science Research (Volume 13, Issue 4)

Received: 26 May 2025 Accepted: 1 July 2025 Published: 8 August 2025

Views: Downloads:

Download PDF

Share This Article

Twitter
Linked In
Facebook

Abstract

Unlike vehicles or pedestrians, which have relatively predictable motion patterns, soccer players try to confuse each other with unexpected changes in velocity or direction and look almost identical because of their same jerseys in each team, and they can be frequently occluded by others. The accuracy of Multi-player tracking, that is the key issue in the soccer video analysis, is highly dependent on the accuracy of the player detection, and some factors such as video quality, camera long-range defocus, noise, weather or environmental changes can also be the factors that make it difficult to accurately detect players. In this paper, we propose an approach that combines the tracking history, query prediction and detection information reasonably to improve the single-view player tracking performance and then integrates the multi-view analysis information to improve the multi-player tracking performance in soccer videos. First, in 2D single-view multi-object tracking, we combine the deep learning-based player detection/tracking result and their prediction information by the trajectory to decrease the bias and mistakes in the player detection and improve the multi-player tracking performance. And, to improve the robustness of object data association, we use an assignment cost calculation method that integrates L2-distance and Intersection over Union (IOU) between the tracking query region and the detected object region, and also use an identity assignment method that combines Hungarian method and greedy one. Then we integrate the multi-view analysis information to decrease the occlusions of the players dramatically and improve the overall tracking performance significantly. Experimental results on several soccer video datasets show that our approach exhibits significant improvements over the previous approaches in terms of Multi-Object Tracking (MOT) performance metrics such as Multi-Object Tracking Accuracy (MOTA), Number of Identity Switchs (IDS) and Multi-Object Tracking Precision (MOTP).

Published in	Science Research (Volume 13, Issue 4)
DOI	10.11648/j.sr.20251304.15
Page(s)	90-100
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

Soccer Video Analysis, Multi-Player Tracking, Data Association, Player Detection Mistake, Multi-view Analysis

1. Introduction

Soccer (football) is among the world’s most popular sports played by millions of people. Such popularity has led many computer vision researchers to work on soccer video analysis. Soccer video analysis offers the information for team/player performance analysis, referee decision support, video summarization, highlight extraction, and intelligent broadcast. As being an incredibly competitive field, soccer clubs around the world are incorporating video analysis methods as training tools for the development of the team. In the development of teams, playbacks provide unparalleled coverage of key events, enabling teams to understand their own strengths and weaknesses, facilitating strategy development

[1-3]

Team/player performance measurement systems have the potential to reveal aspects of the game that are not obvious to the human eye. Such systems can measure the distance covered by players, speed of movement, number of sprints, and players’ relative positioning with respect to others and then use this data in individual player performance evaluation, fatigue detection, assessment of team’s tactical performance and analysis of the opponents

[2]

Multi-player tracking that accurately tracks multiple players on soccer video in real time is the key issue in performance evaluation, and requires detecting all the players on video, finding their positions at regular intervals, and linking the spatiotemporal data to extract their moving trajectories.

Multi-player tracking in a soccer match is a nontrivial task due to various challenges. Unlike vehicles or pedestrians, which have relatively predictable motion patterns, soccer players try to confuse each other with unexpected changes in velocity or direction and usually run in groups. Moreover, soccer players look almost identical because of their same jerseys in each team, and they are frequently involved in possession challenges and tackles, where they can be occluded by others, resulting in some ambiguities for tracking. Since the trajectory of a tracking object can be propagated toward another tracking object or other occlusion element, the complete occlusion of tracking objects can result in identity switches or identity hijacking of tracking objects

[3, 4]

. Here, the identity refers to a label with a unique integer value assigned to the trajectory of the tracking object to distinguish it from each other.

The accuracy of tracking is highly dependent on the accuracy of object detection, and factors such as video quality, camera long-range defocus, noise, weather change or environmental change can also be factors that make it difficult to accurately detect players

[2, 4, 21]

With recent advances in object detection, most of the state-of-the-art multi-object tracking algorithms adopt a “tracking-by-detection” paradigm

[15, 16]

. Even though, given single frame detection results of the video, different approaches have been proposed to improve data association, motion propagation, and life cycling, most of these works assume the localization accuracy of each detection output. Therefore, data association is usually conducted based on location, optionally combined with abstracted object attributes

[18, 19]

. This bias is a drawback of camera modality with higher localization uncertainty. Although the latest methods incorporate deep learning-based algorithms to improve the association with high-fidelity features such as low-level features from feature point cloud or intermediate features from cameras, these approaches are also highly dependent on the accuracy of localization

[17, 20]

The recently published YOLOv8n object detection model

[5, 7]

has significantly improved the detection performance of objects with different sizes and orientations, but as shown in Figure 1, problems such as uncertainty of detection in the occlusion regions where multiple players are close to each other, and missing detection due to noise and long-range defocus, still arise. Only with the track-query prediction, the short-term missing detection problem can be solved, but the relatively long-term missing detection problem cannot be solved.

Download: Download full-size image

Figure 1. Detection errors of YOLOv8n player detection result.

In this paper, as a solution, we propose an approach that combines the tracking history, query prediction and detection information reasonably to improve the single-view player tracking performance and then integrates the multi-view analysis information to improve the multi-player tracking performance in soccer videos.

It can be seen that the tracking history information in several consecutive frames is relatively reliable compared to the object detection result using only one frame, and the reliability is considered to be proportional to the number of frames in the tracking history. Sometimes, object detection result using only one-frame can be ambiguous and biased. Therefore, combining reasonably the spatiotemporal detection histories of the trajectories, the query prediction information based on them, and the current detection result can improve the reliability of object detection result, gradually decrease its bias, and thus improve the tracking performance. In addition, by excluding false detections in the occlusion regions of several players, the uncertainty in player detection can be decreased to some extent.

Multi-view geometric constraints can exclude false detections in a single view and improve multi-frame association, whereas multi-frame association in each view can compensate for the effect of noise and outliers that hamper multi-view association. Therefore, we attempt to jointly leverage multi-frame and multi-view information. We demonstrate the advantages of our approach by presenting experimental results on several test datasets.

To summarize, our contributions are as follows.

1). In 2D single-view multi-player tracking,

a. We propose an approach that combines the deep learning-based player detection/tracking result and their prediction information by the trajectory to decrease the bias and mistakes in the player detection and improve the multi-player tracking performance.

b. We propose an approach that uses an assignment cost calculation method that integrates L2-distance and IOU between the tracking query region and the detected object region, and also use an identity assignment method that combines Hungarian method and Greedy one, to improve the robustness of object data association.

2). We integrate the multi-view analysis information to decrease the occlusions of the players dramatically and improve the overall tracking performance significantly.

3). Experimental results on soccer video datasets show that our approach exhibits significant improvements over the previous approaches in terms of MOT performance metrics such as MOTA, IDS and MOTP.

2. Outline of Our Approach

In Figure 2, we show the camera placement and view for our soccer video analysis system. The system uses eight video cameras with a resolution of 1920×1088 pixels at a rate of 25 frames per second.

Download: Download full-size image

Figure 2. Camera placement and view for our system.

Download: Download full-size image

Figure 3. Our multi-player tracking framework.

In our multi-player tracking framework (Figure 3), we first detect all the player candidates from the input video at each camera and then run the 2D single-view multi-player tracking module. And then we register all the tracking objects from all views onto one soccer field model via homography transformations, and integrate these results, and then comprehensively analyse them to decrease the player occlusions.

3. Player Candidate Detection and Verification

In this paper, we transfer the human detection model of YOLOv8n to suit the player detection situation in soccer videos and use it to detect all player candidates in video frames from each camera view

[5, 7]

Object detection algorithms typically generate multiple bounding boxes with different confidence scores around an identical object. And then a post-processing technique, Non-Maximum Suppression (NMS) algorithm, filters out the redundant and irrelevant bounding boxes, keeping only the most accurate one with the highest confidence score

[5]

The standard NMS algorithm works as follows. First, filter out the unnecessary bounding boxes whose reliability score is below the threshold, and store the remaining ones in the predicted bounding box list. Next, sort the predicted bounding boxes in the bounding box list by their confidence scores in descending order. And, until the predicted bounding box list is empty, repeat as follows. First, select a bounding box with the highest confidence score in the predicted bounding box list, then put it in the final bounding box list and remove it from the predicted bounding box list. Next, in the predicted bounding box list, find and remove the bounding boxes whose IOU with the selected bounding box is larger than the threshold. Here, the IOU threshold is used to characterize a bounding box cluster containing an identical object, the bounding box sorting by the confidence score is used to select the centre of the bounding box cluster, and the confidence threshold is used to select only the most accurate ones among those cluster centres. The standard NMS algorithm uses the fixed confidence threshold and IOU threshold, which decreases the flexibility of object detection

[5]

To solve this problem, we modify the standard NMS algorithm as follows. First, we further use the lower limit threshold of the confidence scores of the potential player entities to store all of the player entities with low confidence scores. And we use them to correct some of the player detection errors described in the following sections. Verification of such player detection errors is carried out by selecting the candidate bounding box that is most similar to the prediction result by the trajectory among the candidate bounding boxes in the local error region.

The algorithm is as follows.

[Algorithm 1: Modified NMS Algorithm]

Require:

B

, Set of predicted bounding boxes,

S

, confidence scores,

τ

, IoU threshold,

T

, confidence threshold,

T_{l}

, lower bound threshold of confidences. (

T_{l} < T

)

Ensure:

F

, Set of filtered bounding boxes,

F_{c}

, Set of filtered candidate bounding boxes.

1: Initialize:

F \leftarrow \emptyset

F_{c} \leftarrow \emptyset

2: Filter the boxes in

B

with

T_{l}

B \leftarrow {b \in B | S (b) \geq T_{l}}

3: Sort all

b \in B

by their confidence scores in descending order.

4: while

B \neq \emptyset

5: Select the box

b \in B

with the highest confidence score.

6: Add

b

F_{c}

F_{c} \leftarrow F_{c} \cup {b}

7: Remove

b

from

B

B \leftarrow B - {b}

8: for all remaining boxes

r

B

9: Calculate the IOU between

b

and

r

iou \leftarrow IOU (b, r)

10: if

iou \geq τ

then

11: Remove

r

from

B

B \leftarrow B - {r}

12: end if

13: end for

14: end while

15: Filter the boxes in

F_{c}

with

T

F \leftarrow {b \in F_{c} | S (b) \geq T}

16: Remove

F

from

F_{c}

F_{c} \leftarrow F_{c} - F

4. 2D Single-view Multi-player Tracking

2D Single-view MOT, which is the basis for 3D Multi-view Multi-person Tracking (3D MM-Tracking), is still challenging due to a large number of objects that need to be tracked, the occlusions between objects, and the changing appearances of objects over time.

In general, appearance and geometric consistency are two important assumptions used for MOT. Appearance consistency means that the previous appearance of an identical object should be similar to its current appearance, and geometric consistency means that its previous location and shape added to its estimated motion should be approximate to its current location and shape. While appearance-based MOT methods

[9-11]

have achieved promising performance, recent appearance-free MOT solutions

[12, 13]

prove that only using the geometric features can also provide robust tracking results on multiple difficult MOT datasets

[14]

In order to achieve fast online processing, in this paper, we mainly investigate the appearance-free approaches based on geometric consistency, and also show that adding some appearance features can improve the tracking performance.

4.1. Analysis of Multi-player Tracking Errors

Multi-object tracking by detection consists of the iterations of the object detection and data association. Therefore, multi-object tracking errors can be classified as the object detection errors and the data association errors.

From the viewpoint of the appearance of an object in an image and the detection of it, the object detection examples of a detector can generally be classified into True Positives (TPS), False Positives (FPS), False Negatives (FNS) and True Negatives (TNS).

When an object detector says an object detection example is an object, it is a True Positive if in fact it is an object and a False Positive if in fact no object is present. And, when the detector says an object detection example is not an object, it is a True Negative if in fact it is not an object and a False Negative if in fact it is an object.

Data association errors come from the wrong association between the detected objects and the tracking ones. Due to the characteristics of the soccer match, in multi-player tracking in soccer videos, occlusions and separations of the players occur frequently, and their patterns are very complex. Multi-player tracking is a free competition assignment problem because there is no constraint on the assignment between the tracking objects and detected ones. The occlusion is a situation where the correspondence between the tracking objects and new detected ones is n-to-1, and the separation is a situation where the correspondence between them is 1-to-n. Owing to the frequent occlusions and separations of the players during the match, multi-player tracking entails the frequent occurrences of new appearances and disappearances of the tracking objects over time in videos.

Similar to the object detection, we can classify the new appearances and disappearances of the tracking players in videos into the following categories.

First, we classify the newly appeared tracking players in videos into two classes: TPS and FPS. TPS refer to the newly appeared tracking players that are detected by the player detector and in fact appear in the image. They consist of two classes: that come from the outside into the inside of the camera view, and that are separated from the occluded player groups. And FPS refer to the newly appeared tracking players that are detected by the player detector but in fact do not appear in the image. They are the player detection errors caused by the player detector. They also consist of two classes: that in fact do not appear in the image, and that the player detector detects more than two bounding boxes around an identical player.

Next, we classify the disappeared tracking players in videos into two classes: TNS and FNS. TNS refer to the disappeared tracking players that are not detected by the detector and in fact do not appear in the image. They consist of two classes: that go out of the camera view, and that are occluded by other tracking players. (In other cases, there is also the occlusion by the background objects, but we exclude it.) FNS refer to the disappeared tracking players that are not detected by the detector but in fact appear in the image. They are also the player detection errors caused by the player detector.

We can classify all the player detection errors occurred during multi-player tracking to FNS or FPS, and find them in a way that distinguishes from TNS and TPS, respectively.

FNS: The errors that the detector did not detect some players that have been tracked so far and in fact appear in the current frame image (see Figure 1(a)). They are also called the missing detections and should be distinguished from TNS.

FPS: They can be classified into two classes, and should be distinguished from TPS.

a) False detection errors: The errors that are detected by the detector but in fact do not appear in the image.

b) Overlapped detection errors: The errors that the detector detects more than two bounding boxes around an identical player (see Figure 1(b)), or that caused by false detection in some regions where several players are adjacent (see Figure 1(c)), due to some reasons such as NMS processing of the detector, etc. Generally, it is very difficult to find the overlapped detection errors correctly.

In multi-player tracking, such detection errors are the main causes of the segmentation of the trajectories and identity increasing, and also appear to some extent in deep learning-based player detection results with high detection performance.

4.2. Framework of Our Multi-player Tracking Module

In Figure 4, we show the framework of our 2D single-view multi-player tracking module.

Download: Download full-size image

Figure 4. The framework of our 2D single-view multi-player tracking module.

Our framework can be outlined as follows.

First, with the player detection model using YOLOv8n, we detect all players appeared in the current frame image, and then apply Hungarian method

[8]

to perform a 1-to-1 assignment between the current detection results and the tracking results to the previous frame. Next, we use the formula (10) to find and eliminate only obvious errors among the aforementioned overlapped detection errors. And then, for the detected objects to which have not been assigned any identity and for the tracking objects that are not assigned to any detected object, we apply the greedy method to perform the occlusion/separation processes of the tracking objects. Finally, based on the tracking history to the previous frame, the result predicted by it, and the player detection result in the current frame, we find all the player detection errors in the current frame. And then, for each of them, we select a candidate bounding box that is most similar to the result predicted by the tracking history among the candidate bounding boxes that are included in the candidate bounding box list

F_{c}

and also are in its target region, and modify it according to the result.

4.3. 1-to-1 Assignment Between Tracking and Detected Objects

From a descriptive necessity, hereafter, let us denote the sets of the tracking objects and detected ones at time

t

Q_{t} = {q_{i}^{t}}

and

D_{t} = {d_{j}^{t}}

, respectively.

Applying Hungarian method, we assign each of the detected objects in the current frame image

d_{j}^{t} \in D_{t}

to each of the tracking ones up to the previous frame

q_{i}^{t - 1} \in Q_{t - 1}

, and compute the assignment cost

C_{ij}

between them as follows:

C_{ij}^{1} = L_{2} (m_{q_{i}^{t - 1}}, m_{d_{j}^{t}})

(1)

C_{ij}^{2} = L_{2} (m_{{qp}_{i}^{t - 1}}, m_{d_{j}^{t}})

(2)

C_{ij}^{3} = \min_{k \in {1, 2}} C_{ij}^{k}

(3)

C_{ij}^{4} = \frac{S_{(q_{i}^{t - 1}, d_{j}^{t})}^{cross}}{(S_{q_{i}^{t - 1}} + S_{d_{j}^{t}} - S_{(q_{i}^{t - 1}, d_{j}^{t})}^{cross})}

(4)

C_{ij} = (1.0 - C_{ij}^{4}) \times C_{ij}^{3}

(5)

i_{j}^{*} = \underset{i}{argmin} {C_{ij}}

(6)

where

m_{α}

denotes the location of the center of the bounding box of the object

α \in {q_{i}^{t - 1}, d_{j}^{t}}

L_{2} (∙, ∙)

is the

L_{2}

-norm between the two objects,

{qp}_{i}^{t - 1}

is the prediction region of the tracking object

q_{i}^{t - 1}

[16]

S_{α}

denotes the area of the detection region of the object

α

S_{(q_{i}^{t - 1}, d_{j}^{t})}^{cross}

represents the area of the occlusion region of the detection regions of

q_{i}^{t - 1}

and

d_{j}^{t}

We assign the detected object

d_{j}^{t}

to the tracking object

q_{i_{j}^{*}}^{t - 1}

and assign the identity of

q_{i_{j}^{*}}^{t - 1}

d_{j}^{t}

if the minimum assignment cost

C_{i_{j}^{*} j}

is below a threshold.

C_{i_{j}^{*} j} < T_{C}

(7)

where

T_{C}

is the assignment cost threshold, whose value is determined experimentally.

To improve the detection accuracy of the tracking object, the detection information of the tracking object in the image coordinate system of each camera view is expressed as a 12D feature vector

(x, y, w, h, θ, s, \overset{̅}{x}, \overset{̅}{y}, \overset{̅}{w}, \overset{̅}{h}, \overset{̅}{θ}, \overset{̅}{s})

including the central position

(x, y)

, width and height

(w, h)

, direction

(θ)

, reliability score

(s)

of the bounding box, and the rates of change of each of these quantities.

The rate of change of each quantity is obtained as follows:

\overset{̅}{β} = \frac{1}{τ} \sum_{k = 1}^{τ} (β_{t - k} - β_{t - k - 1})

β \in {x, y, w, h, θ, s}

(8)

where

τ = \min (τ_{h}, t - 1)

is the number of prediction frames set to represent the temporal variability of the quantities of interest,

τ_{h}

is its value estimated experimentally, and

t

is the total number of frames of the trajectory of the tracking object.

The final detection information of the tracking object is adjusted as follows.

q_{i}^{t} = (x_{i}^{t} + ∆ x, y_{i}^{t} + ∆ y, w_{i}^{t} + ∆ w, h_{i}^{t} + ∆ h, θ_{i}^{t} + ∆ θ, s_{i}^{t} + ∆ s)

(9)

According to Eq. (9), the object detection bias of the tracking object gradually decreases.

4.4. Exploring and Elimination of the Overlapped Detection Errors

Since it is difficult to find the overlapped detection errors correctly, we find and remove only the obvious errors.

For each of the newly appeared detected object within the camera view, we assign an occlusion label to the occlusion pixels with the neighbouring detected objects that the tracking identity is assigned to.

Let

w

and

h

be the width and height of the bounding box of the detected object of interest, respectively, and let

N_{cross}

the number of pixels assigned the occlusion label inside that region. If the following formula is satisfied, we judge it as an overlapped detection error, and remove it from the detection object list.

\frac{N_{cross}}{w \times h} > T_{fd}

(10)

where

T_{fd}

is a threshold determined experimentally, and we set its value to 0.95 through statistical analysis experiments.

4.5. Occlusion / Separation of the Tracking Objects

At this stage, first, we find the detected objects with no any tracking identity assigned and the tracking objects with no detected object assigned, and then for them, apply the greedy method to perform the occlusion/separation processes of the tracking objects.

1) Occlusion of the tracking objects

First, we find a tracking object with no detected object assigned. If any, we investigate whether there are any detected objects that have already been assigned a tracking identity within its appropriate neighbourhood or not. If any, among them, we find the detected object with the minimum assignment cost, and add the identity of the tracking object of interest to its tracking identity list. And then we delete the tracking object of interest. We repeat these steps until there is no such tracking object.

2) Separation of the tracking objects

First, we find a detected object with no any tracking identity assigned. If any, we investigate whether there are any detected objects to which have already been assigned a tracking identity within its appropriate neighbourhood or not. If any, we find a detected object with the minimum assignment cost, and in its identity list, select the identity of the tracking object with the most similar appearance characteristics to the detected object of interest, and then set it as the tracking identity of the target detected object. Then we delete the target detected object. We repeat these steps until there is no such detected object.

Although we did not describe specially, the team information of a tracking object with a single identity plays an important role in the analysis of the appearance characteristics of a separated object. And “its appropriate neighbourhood” means the neighbourhood determined by formula (7).

4.6. Exploring and Correction of the Player Detection Errors Using the Tracking Results

1) Exploring and Verification of FNS

a. Among the tracking objects, we find all the tracking objects with no detected object assigned, i.e., that have been disappeared in the current frame.

b. Among them, we find and exclude all TNS.

First, we find the disappeared tracking objects due to going out of the camera view, by examining whether their vanishing positions are near the boundaries of the frame image or not, and if any, exclude them.

Next, we find the disappeared tracking objects due to the occlusions by another tracking objects, by exploring the detected object that is assigned simultaneously to more than two different tracking objects, and if any, exclude them.

a. Among the remaining tracking objects, we obtain a minimal bounding rectangle containing both of the detection region (i.e., object detection region in the last frame of the trajectory) of the target tracking object and its prediction region obtained from its trajectory information and the extended Kalman filter

[2]

, and then extend it slightly to get an exploring region for the verification of FNS.

b. In the candidate bounding box list

F_{c}

from the modified NMS, we find the candidate bounding boxes that lie within the exploring region. If any, among them, we choose a candidate bounding box that is most similar to the result predicted from the trajectory information, and use it as a new detected object of the target tracking object in the current frame and continue tracking it. If no, end up tracking it.

2) Exploring and Elimination of the False Detection Errors

a. First, we find all the detected objects with no identity assigned, i.e., the newly appeared detected ones in the current frame.

b. Among them, we find the detected objects that are near the boundaries of the image by examining their position, and exclude them.

c. Among the remaining ones, we find the newly appeared detected objects due to the separation from the occlusion by another tracking objects, by exploring for the detected objects with no tracking objects assigned, and exclude them.

d. For each region of the remaining ones, we examine whether in fact an object exists in that region by exploring the candidate bounding box list

F_{c}

as above, and if not, remove it from the detected object list.

5. Multi-View Analysis

The positions of the players as the results of single-view multi-player detection/tracking, are those in terms of the each camera view coordinates, but what we would like to achieve from the soccer video analysis is the overall match analysis results in terms of the realistic soccer field model coordinates. Therefore, we integrate the information from all the cameras to represent the results in terms of a common model coordinates of the soccer field.

We detect all the player candidates from the input video of each camera view, and execute the player tracking module, and synchronize them in terms of the shooting times, and map them onto a common model of the soccer field.

Given the projections of all players on the field model, we establish the correspondence between the detected players across cameras to integrate them, and analyse the results to decrease the occlusions of the players as much as possible and obtain the overall match analysis results.

5.1. Player Registration Onto the Field Model

A general mathematical expression describing the relationship between the 3D coordinate point of the object being captured and its 2D image point is the camera matrix. If we assume the shooting scene is to be planar, i.e. all three-dimensional coordinate points lie on a plane, then, we can reduce the camera matrix. This is known as homography, or the planar projective transformation

[2]

With the image coordinates of all players and the homography matrices of all cameras, we project all players onto the field model, shown in Figure 5. The colour of the circle indicates the player’s team while the number inside the circle represents the camera that was tracking it.

Since the dimensions of the field model are proportional to the realistic soccer field, we can apply a scaling operation to convert the coordinates of all projections from pixels to meters.

5.2. Player Fusion on the Field Model

As seen in Figure 5, the result of registering the players from multiple cameras produces multiple objects on the field model. Therefore, we need to identify the pairs of objects that belong to the same player in the realistic world, and integrate them.

We use the nearest neighbour method to identify the pairs of objects that belong to the same player in the realistic world. For each player, we compute the L2-distance between every other player from the same team and declare the pair with the smallest distance as the same object if their distance does not exceed a certain threshold (Figure 6).

Download: Download full-size image

Figure 5. Field Model Registration of All Players.

Download: Download full-size image

Figure 6. Player Fusion Result on the field model.

Players that do not meet the criteria above will still remain in the field albeit without a pair. In this case, it is assumed that the player is being tracked by a single camera and thus does not have a closest pair. Such a scenario also occurs when the occlusion handling fails.

5.3. Occlusion Handling Using Multi-view Information

Uncertainty during the occlusion and separation of the players is an important issue that leads to uncertainty in the tracking identity management. The use of player tracking information in different camera views can easily solve this problem.

If the players occluded in a certain camera view appear to be the different players separated in another camera view, we treat them as the separated tracking objects on the field model. In that way, we can separate many occluded objects into the individual players and track them separately and accurately. This approach is relatively simple, but very effective.

6. Experiment Results

6.1. Evaluation Data Sets

We compare our approach with the other tracking methods using the publicly available Institute of Intelligent Systems for Automation (ISSIA) dataset

[2]

. This dataset consists of 3000 frames captured at 25 frames/s by six cameras placed around a stadium in a multi-view configuration. In order to evaluate our method on the large scale, we also experiment on a video dataset captured during 45 min by eight cameras at a full-length soccer stadium, as shown in Figure 2. The evaluation metrics are approximated for this tracking data.

6.2. Evaluation Metrics

In our experiments, to evaluate the tracking results from various perspectives, we evaluate the multi-player tracking performance strictly following the official metrics to employ CLEAR metrics and Identity metrics

[20, 21]

. In detail, IDS (i.e., Number of ID switches) indicate the times of identity jumps, IDF1 (i.e., ID F1 scores) accounts for identity match performance, and MOTA (i.e., Multi-Object Tracking Accuracy) is a combination of false positives, missed targets and IDs. MT (i.e., number of mostly tracked trajectories) measures the number of trajectories whose target was tracked more than 80%, whereas ML (i.e., number of mostly lost trajectories) measures the number of trajectories that have less than 20% target tracked. Among them, the MOTA score is the dominant metric used to measure the overall tracking performance. And MOTP (Multi-Object Tracking Precision) is a summary of overall tracking precision in terms of bounding box overlap between ground-truth and predicted location. It incorporates the Intersection over Union (IOU) measure to assess the quality of the predicted bounding boxes.

6.3. Experiment Setup

For detection of players, we choose the object detection model of YOLOv8n

[5]

and follow the default settings of it, to train and inference bounding boxes for our experimental datasets. We run it on an NVIDIA Geforce RTL 2070 GPU.

For NMS, we set up the parameters of the modified NMS algorithm as

τ = 0.5

T = 0.5

T_{l} = 0.19

. And, for one-to-one assignment between the tracking objects and the detected ones, we set up the assignment cost threshold of the formula (7) as

T_{C} = 25

To identify the pairs of objects on the field model that belong to the same player in the realistic world, we set up the L2-distance threshold at the nearest neighbour method to be less than 5m in terms of the realistic distance. As our soccer field model is 660×450 pixels, we set up it to 30.

6.4. Experiment Results

We explore the following aspects of our framework.

1) Efficacy of our assignment cost calculation method and identity assignment method.

In Table 1, we compare the multi-player tracking performances when L2-distance (Eq. (1)), IOU (Eq. (4)), and our method (Eq. (5)) are applied to the assignment cost calculation between the tracking objects and detected ones, respectively. And in Table 2, we show the multi-player tracking performances when the Hungarian method, the greedy method and our combined approach are applied to the assignment between the tracking objects and detection ones, respectively. In these experiments, we use the object detection model of YOLOv8n and the simple online real-time tracking (SORT) method.

Table 1. MOT performances using the different assignment cost calculation methods (Hungarian method).

approach

IDF1↑ MOTA↑ MT↑ ML↓ IDs↓

Eq. (1)

[16]

86.4% 80.2% 79.6% 12.1% 188

Eq. (4)

[17]

87.1% 80.9% 78.5% 12.8% 186

Eq. (5) (our)

89.3% 81.2% 82.6% 11.3% 178

Table 2. MOT performances using the different assignment methods.

approach

IDF1↑ MOTA↑ MT↑ ML↓ IDs↓

Hungarian

[8]

89.3% 81.2% 82.6% 11.3% 178

Greedy

[1]

88.6% 80.7% 78.3% 13.7% 191

Comb.

92.4% 83.6% 84.5% 9.5% 166

In Table 1, our approach achieves a good result with some improvement on every metric. Using our approach, MOTA, IDF1 and IDS are improved more than 0.3%, 2.2%, and 8, respectively, over the previous methods. And, from it, it can be seen that, using our approach, there is a tendency for all metrics to be improved slightly. In addition, in Table 2, our combined approach also achieves the excellent result with some significant improvements on all metrics. MOTA, IDF1 and IDS are improved more than 2.4%, 3.1%, and 12, respectively, over the other approaches, and other metrics are also improved. Such results indicate the excellent association ability of our algorithm.

2) Efficiency of Exploring and Correction of the Player Detection Errors using the Tracking results.

In this part, we analyse the efficiency of our exploring and correction method of the player detection errors using the tracking results, in terms of the performances of multi-player tracking and player detection.

In Table 3, we show the performances of multi-player tracking and detection when we apply our approach to several object detection methods. In Table 3, both of YOLOv5

[6]

and YOLOv8n

[5]

are the deep learning-based object detection methods. In the experiment, we apply our approach to the object detection/tracking results from these methods and analyse their effectiveness.

Table 3. Applying result of our approach.

approach

IDF1↑ MOTA↑ MOTP↑ IDs↓

YOLOv5

[6]

89.5% 81.7% 75.8% 187

YOLOv8n

[5]

92.4% 83.6% 73.9% 166

YOLOv5 + our

94.7% 93.2% 87.3% 169

YOLOv8n + our

97.2% 94.4% 86.6% 151

As in Table 3, applying this approach, MOTA, IDF1 and IDS are increased, on average, 11.15%, 5.0%, and 16.5, respectively, over the previous methods. Such improvements by this approach are significantly large, compared to the improvements by the above approaches. In Table 3, MOTP is also increased 12.1% on average. This improvement mainly comes from adapting the equation (9). Such significant improvements confirm the necessity and efficiency of our approach for multi-player tracking.

3) Efficiency of multi-view analysis.

In Table 4, we show the multi-player tracking performances, with and without occlusion handling using multi-view analysis information. From Table 4, it can be seen that the multi-player tracking performance on the field model is significantly improved with occlusion handling using multi-view analysis information. And the tracking performance using eight cameras is better than that using six cameras.

Table 4. Efficiency of Multi-view Analysis.

Multi-view analysis	IDF1↑ MOTA↑ MT↑ ML↓ IDs↓
no	97.2% 94.4% 96.2% 4.6% 151
do (6 cameras)	98.7% 98.9% 98.1% 2.2% 35
do (8 cameras)	99.3% 99.4% 98.3% 1.7% 24

7. Conclusion

We propose an approach that combines tracking history, query prediction and detection information reasonably to improve the single-view player tracking performance, and integrates the multi-view analysis information to improve the multi-player tracking performance in soccer videos. In 2D single-view multi-player tracking, combining deep learning-based player detection/tracking information and their prediction information by the trajectories reduces the bias and mistakes in the player detection significantly and therefore, improves the multi-player tracking performance. And, integrating of L2-distance and IOU between the tracking query region and the detected object region and combining Hungarian method and greedy one improve the robustness of object data association. And integration of multi-view analysis information decreases the occlusions of the players dramatically and improves the overall tracking performance significantly. Experimental results show that our approach exhibits significant improvements in terms of several MOT performance metrics such as MOTA and IDS and MOTP.

Abbreviations

IOU	Intersection over Union
MOT	Multi-Object Tracking
MOTA	Multi-Object Tracking Accuracy
IDS	Number of Identity Switchs
MOTP	Multi-Object Tracking Precision
NMS	Non-Maximum Suppression
TPS	True Positives
FPS	False Positives
TNS	True Negatives
FNS	False Negatives

Author Contributions

Gyong-Jun So: Conceptualization, Methodology, Writing-original draft preparation

Hak-Song Kim: Formal analysis, Project administration

Man-Chol Ho: Investigation, Software, Writing-review & editing

Jin-Song Ri: Resources, Data curation, Validation

Conflict of Interest

The authors declare no conflicts of interest.

References

[1]	Manafifard M, Ebadi H, Moghaddam HA. (2017). A survey on player tracking in soccer videos. Computer Vision and Image Understanding. 159, 19-46. https://doi.org/10.1016/j.cviu.2017.02.002
[2]	S. Baysal, P. Duygulu. (2016). Sentioscope: A soccer player tracking system using model field. IEEE Transactions on Circuits and Systems for Video Technology. 26(7), 1350-1362. https://doi.org/10.1109/TCSVT.2015.2455713
[3]	Su, Z.; Li, J.; Chang, J. (2020). Real-time visual tracking using complementary kernel support correlation filters. Front. Comput. Sci. 14, 417-429. https://doi.org/10.1007/s11704-018-8116-1
[4]	C. Direkoglu, M. Sah, and N. E. O’Connor. (2018). Player detection in field sports. Machine Vision and Applications, Springer. 29, 187-206. https://doi.org/10.1007/s100138-017-0893-8
[5]	Juan R. Terven, Diana M. Cordova-Esparza. (2023). A comprehensive review of YOLO: from YOLOV1 and BEYOND: Under review in ACM computing surveys, arXiv preprint arXiv: 2304.00501. 1-34. https://doi.org/10.48553/arXiv.2304.00501
[6]	G. Jocher. (2023). YOLOv5 by Ultralytics. https://github.com/ultralytics/yolov5. Accessed: February 30.
[7]	M. Contributors. (2023). YOLOv8 by MMYOLO. https://github.com/open-mmlab/mmyolo/tree/main/configs/yolov8 Accessed: May 13.
[8]	Quazzafi Rabbani et al. (2019). Modified Hungarian method for unbalanced assignment problem with multiple jobs, Applied Mathematics and Computation. 361, 493-498. https://doi.org/10.1016/j.amc.2019.05.041
[9]	Fan Yang, Zheng Wang, Yang Wu, Sakriani Sakti, and Satoshi Nakamura. (2022). Tackling multiple object tracking with complicated motions—re-designing the integration of motion and appearance. Image and Vision Computing. 124, 1-10. https://doi.org/10.1016/j.mavis.2022.104514
[10]	Fangao Zeng, Bin Dong, Yuang Zhang, Tiancai Wang, Xiangyu Zhang, and Yichen Wei. (2022). Motr: End-to-end multiple-object tracking with transformer. In European Conference on Computer Vision. 659-675. https://doi.org/10.1007/978-3-031-19812-038
[11]	Xingyi Zhou, Tianwei Yin, Vladlen Koltun, and Philipp Krähenbühl. (2022). Global tracking transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8771-8780. https://doi.org/10.48550/arXiv.2203.13250
[12]	Yunhao Du, Yang Song, Bo Yang, and Yanyun Zhao. (2022). Strongsort: Make deepsort great again. arXiv preprint arXiv: 2202.13514. https://doi.org/10.48550/arXiv.2202.13514
[13]	Fan Yang, Shigeyuki Odashima, Shoichi Masui, and Shan Jiang. (2023). Hard to track objects with irregular motions and similar appearances? make it easier by buffering the matching space. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). 1, 4799-4808. https://doi.org/10.48550/arXiv.2211.14317
[14]	Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, et al. (2022). Soccernet 2022 challenges results. In Proceedings of the 5^th International ACM Workshop on Multimedia Content Analysis in Sports, 75-86. https://doi.org/10.48550/arXiv.2210.02365
[15]	Shin-Yi Wen, Yu Yen, Albert Y. Chen. (2020). Human Tracking for Facility Surveillance. CVC 2019, AISC 944. 329-338. https://doi.org/10.1007/978-3-030-17798-0_27
[16]	Zelin Zhao et al. (2022). Tracking Objects as Pixel-wise Distributions, arXiv preprint arXiv: 2207.05518. 1-19. https://doi.org/10.48550/arXiv.2207.05514
[17]	Yifu Zhang, Chunyu Wang, Xinggang Wang, Wenjun Zeng, and Wenyu Liu. (2021). FairMOT: On the fairness of detection and re-identification in multiple object tracking. International Journal of Computer Vision. 1-13. https://doi.org/10.1007/s11263-021-01513-4
[18]	Duy MH Nguyen, Roberto Henschel, Bodo Rosenhahn, Daniel Sonntag, and Paul Swoboda. (2022). Lmgp: Lifted multicut meets geometry projections for multi-camera multi-object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8866-8875. https://doi.org/10.48550/arXiv.2111.11892
[19]	Ziqi Pang, et al. (2023). Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking, arXiv preprint arXiv: 2302.03802. 1-15. https://doi.org/10.48550/arXiv.2302.03802
[20]	Fan Yang, et al. (2024). A Unified Multi-view Multi-person Tracking Framework. Computational Visual Media. 10, 1, 137-160. https://doi.org/10.1007/s41095-023-0334-8
[21]	Ergys Ristani, Francesco Solera, Roger Zou, Rita Cucchiara, and Carlo Tomasi. (2016). Performance measures and a data set for multi-target, multi-camera tracking. In European conference on computer vision. 17-35. https://doi.org/10.48550/arXiv.1609.01775

Cite This Article

Plain Text BibTeX RIS

APA Style

So, G., Kim, H., Ho, M., Ri, J. (2025). Research on Multi-player Tracking in Soccer Videos by Combining Tracking-Prediction-Detection. Science Research, 13(4), 90-100. https://doi.org/10.11648/j.sr.20251304.15

Copy | Download

ACS Style

So, G.; Kim, H.; Ho, M.; Ri, J. Research on Multi-player Tracking in Soccer Videos by Combining Tracking-Prediction-Detection. Sci. Res. 2025, 13(4), 90-100. doi: 10.11648/j.sr.20251304.15

Copy | Download

AMA Style

So G, Kim H, Ho M, Ri J. Research on Multi-player Tracking in Soccer Videos by Combining Tracking-Prediction-Detection. Sci Res. 2025;13(4):90-100. doi: 10.11648/j.sr.20251304.15

Copy | Download

@article{10.11648/j.sr.20251304.15,
  author = {Gyong-Jun So and Hak-Song Kim and Man-Chol Ho and Jin-Song Ri},
  title = {Research on Multi-player Tracking in Soccer Videos by Combining Tracking-Prediction-Detection
},
  journal = {Science Research},
  volume = {13},
  number = {4},
  pages = {90-100},
  doi = {10.11648/j.sr.20251304.15},
  url = {https://doi.org/10.11648/j.sr.20251304.15},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.sr.20251304.15},
  abstract = {Unlike vehicles or pedestrians, which have relatively predictable motion patterns, soccer players try to confuse each other with unexpected changes in velocity or direction and look almost identical because of their same jerseys in each team, and they can be frequently occluded by others. The accuracy of Multi-player tracking, that is the key issue in the soccer video analysis, is highly dependent on the accuracy of the player detection, and some factors such as video quality, camera long-range defocus, noise, weather or environmental changes can also be the factors that make it difficult to accurately detect players. In this paper, we propose an approach that combines the tracking history, query prediction and detection information reasonably to improve the single-view player tracking performance and then integrates the multi-view analysis information to improve the multi-player tracking performance in soccer videos. First, in 2D single-view multi-object tracking, we combine the deep learning-based player detection/tracking result and their prediction information by the trajectory to decrease the bias and mistakes in the player detection and improve the multi-player tracking performance. And, to improve the robustness of object data association, we use an assignment cost calculation method that integrates L2-distance and Intersection over Union (IOU) between the tracking query region and the detected object region, and also use an identity assignment method that combines Hungarian method and greedy one. Then we integrate the multi-view analysis information to decrease the occlusions of the players dramatically and improve the overall tracking performance significantly. Experimental results on several soccer video datasets show that our approach exhibits significant improvements over the previous approaches in terms of Multi-Object Tracking (MOT) performance metrics such as Multi-Object Tracking Accuracy (MOTA), Number of Identity Switchs (IDS) and Multi-Object Tracking Precision (MOTP).},
 year = {2025}
}

Copy | Download

TY  - JOUR
T1  - Research on Multi-player Tracking in Soccer Videos by Combining Tracking-Prediction-Detection

AU  - Gyong-Jun So
AU  - Hak-Song Kim
AU  - Man-Chol Ho
AU  - Jin-Song Ri
Y1  - 2025/08/08
PY  - 2025
N1  - https://doi.org/10.11648/j.sr.20251304.15
DO  - 10.11648/j.sr.20251304.15
T2  - Science Research
JF  - Science Research
JO  - Science Research
SP  - 90
EP  - 100
PB  - Science Publishing Group
SN  - 2329-0927
UR  - https://doi.org/10.11648/j.sr.20251304.15
AB  - Unlike vehicles or pedestrians, which have relatively predictable motion patterns, soccer players try to confuse each other with unexpected changes in velocity or direction and look almost identical because of their same jerseys in each team, and they can be frequently occluded by others. The accuracy of Multi-player tracking, that is the key issue in the soccer video analysis, is highly dependent on the accuracy of the player detection, and some factors such as video quality, camera long-range defocus, noise, weather or environmental changes can also be the factors that make it difficult to accurately detect players. In this paper, we propose an approach that combines the tracking history, query prediction and detection information reasonably to improve the single-view player tracking performance and then integrates the multi-view analysis information to improve the multi-player tracking performance in soccer videos. First, in 2D single-view multi-object tracking, we combine the deep learning-based player detection/tracking result and their prediction information by the trajectory to decrease the bias and mistakes in the player detection and improve the multi-player tracking performance. And, to improve the robustness of object data association, we use an assignment cost calculation method that integrates L2-distance and Intersection over Union (IOU) between the tracking query region and the detected object region, and also use an identity assignment method that combines Hungarian method and greedy one. Then we integrate the multi-view analysis information to decrease the occlusions of the players dramatically and improve the overall tracking performance significantly. Experimental results on several soccer video datasets show that our approach exhibits significant improvements over the previous approaches in terms of Multi-Object Tracking (MOT) performance metrics such as Multi-Object Tracking Accuracy (MOTA), Number of Identity Switchs (IDS) and Multi-Object Tracking Precision (MOTP).
VL  - 13
IS  - 4
ER  -

Copy | Download

Author Information

Gyong-Jun So

AI Technology Laboratory, University of Science, Pyongyang, Democratic People’s Republic of Korea
Hak-Song Kim

AI Technology Laboratory, University of Science, Pyongyang, Democratic People’s Republic of Korea

Contact Email
Man-Chol Ho

AI Technology Laboratory, University of Science, Pyongyang, Democratic People’s Republic of Korea
Jin-Song Ri

AI Technology Laboratory, University of Science, Pyongyang, Democratic People’s Republic of Korea

Download PDF

Submit an Article

Plain Text BibTeX RIS

APA Style

So, G., Kim, H., Ho, M., Ri, J. (2025). Research on Multi-player Tracking in Soccer Videos by Combining Tracking-Prediction-Detection. Science Research, 13(4), 90-100. https://doi.org/10.11648/j.sr.20251304.15

Copy | Download

ACS Style

So, G.; Kim, H.; Ho, M.; Ri, J. Research on Multi-player Tracking in Soccer Videos by Combining Tracking-Prediction-Detection. Sci. Res. 2025, 13(4), 90-100. doi: 10.11648/j.sr.20251304.15

Copy | Download

AMA Style

So G, Kim H, Ho M, Ri J. Research on Multi-player Tracking in Soccer Videos by Combining Tracking-Prediction-Detection. Sci Res. 2025;13(4):90-100. doi: 10.11648/j.sr.20251304.15

Copy | Download

@article{10.11648/j.sr.20251304.15,
  author = {Gyong-Jun So and Hak-Song Kim and Man-Chol Ho and Jin-Song Ri},
  title = {Research on Multi-player Tracking in Soccer Videos by Combining Tracking-Prediction-Detection
},
  journal = {Science Research},
  volume = {13},
  number = {4},
  pages = {90-100},
  doi = {10.11648/j.sr.20251304.15},
  url = {https://doi.org/10.11648/j.sr.20251304.15},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.sr.20251304.15},
  abstract = {Unlike vehicles or pedestrians, which have relatively predictable motion patterns, soccer players try to confuse each other with unexpected changes in velocity or direction and look almost identical because of their same jerseys in each team, and they can be frequently occluded by others. The accuracy of Multi-player tracking, that is the key issue in the soccer video analysis, is highly dependent on the accuracy of the player detection, and some factors such as video quality, camera long-range defocus, noise, weather or environmental changes can also be the factors that make it difficult to accurately detect players. In this paper, we propose an approach that combines the tracking history, query prediction and detection information reasonably to improve the single-view player tracking performance and then integrates the multi-view analysis information to improve the multi-player tracking performance in soccer videos. First, in 2D single-view multi-object tracking, we combine the deep learning-based player detection/tracking result and their prediction information by the trajectory to decrease the bias and mistakes in the player detection and improve the multi-player tracking performance. And, to improve the robustness of object data association, we use an assignment cost calculation method that integrates L2-distance and Intersection over Union (IOU) between the tracking query region and the detected object region, and also use an identity assignment method that combines Hungarian method and greedy one. Then we integrate the multi-view analysis information to decrease the occlusions of the players dramatically and improve the overall tracking performance significantly. Experimental results on several soccer video datasets show that our approach exhibits significant improvements over the previous approaches in terms of Multi-Object Tracking (MOT) performance metrics such as Multi-Object Tracking Accuracy (MOTA), Number of Identity Switchs (IDS) and Multi-Object Tracking Precision (MOTP).},
 year = {2025}
}

Copy | Download

TY  - JOUR
T1  - Research on Multi-player Tracking in Soccer Videos by Combining Tracking-Prediction-Detection

AU  - Gyong-Jun So
AU  - Hak-Song Kim
AU  - Man-Chol Ho
AU  - Jin-Song Ri
Y1  - 2025/08/08
PY  - 2025
N1  - https://doi.org/10.11648/j.sr.20251304.15
DO  - 10.11648/j.sr.20251304.15
T2  - Science Research
JF  - Science Research
JO  - Science Research
SP  - 90
EP  - 100
PB  - Science Publishing Group
SN  - 2329-0927
UR  - https://doi.org/10.11648/j.sr.20251304.15
AB  - Unlike vehicles or pedestrians, which have relatively predictable motion patterns, soccer players try to confuse each other with unexpected changes in velocity or direction and look almost identical because of their same jerseys in each team, and they can be frequently occluded by others. The accuracy of Multi-player tracking, that is the key issue in the soccer video analysis, is highly dependent on the accuracy of the player detection, and some factors such as video quality, camera long-range defocus, noise, weather or environmental changes can also be the factors that make it difficult to accurately detect players. In this paper, we propose an approach that combines the tracking history, query prediction and detection information reasonably to improve the single-view player tracking performance and then integrates the multi-view analysis information to improve the multi-player tracking performance in soccer videos. First, in 2D single-view multi-object tracking, we combine the deep learning-based player detection/tracking result and their prediction information by the trajectory to decrease the bias and mistakes in the player detection and improve the multi-player tracking performance. And, to improve the robustness of object data association, we use an assignment cost calculation method that integrates L2-distance and Intersection over Union (IOU) between the tracking query region and the detected object region, and also use an identity assignment method that combines Hungarian method and greedy one. Then we integrate the multi-view analysis information to decrease the occlusions of the players dramatically and improve the overall tracking performance significantly. Experimental results on several soccer video datasets show that our approach exhibits significant improvements over the previous approaches in terms of Multi-Object Tracking (MOT) performance metrics such as Multi-Object Tracking Accuracy (MOTA), Number of Identity Switchs (IDS) and Multi-Object Tracking Precision (MOTP).
VL  - 13
IS  - 4
ER  -

Copy | Download