UA-Track: Uncertainty-Aware End-to-End 3D Multi-Object Tracking

nuScenes

Real-World

Abstract

3D multiple object tracking (MOT) plays a crucial role in autonomous driving perception. Recent end-to-end query-based trackers, which detect and track objects simultaneously, have shown promising potential for the 3D MOT task. However, existing methods overlook the uncertainty issue, which refers to the lack of precise confidence about the state and location of tracked objects. Uncertainty arises from various factors during motion observation by cameras, especially occlusions and the small size of target objects, and results in inaccurate estimates of an object's position, label, and identity. To this end, we propose an Uncertainty-Aware 3D MOT framework, UA-Track, which tackles the uncertainty problem from multiple aspects. First, we introduce an Uncertainty-aware Probabilistic Decoder to capture the uncertainty in object prediction with probabilistic attention. Second, we propose an Uncertainty-guided Query Denoising strategy to further enhance the training process. Third, we utilize Uncertainty-reduced Query Initialization, which leverages predicted 2D object location and depth information to reduce query uncertainty. As a result, UA-Track achieves state-of-the-art performance on the nuScenes benchmark, i.e., 66.3% AMOTA on the test split, surpassing the previous best end-to-end solution by a significant margin of 8.9% AMOTA.

Method

UA-Track

The UA-Track Framework. To model and capture the uncertainty in object prediction, we introduce an Uncertainty-aware Probabilistic Decoder (blue module). Moreover, we present an Uncertainty-guided Query Denoising strategy (green module) to improve the model's robustness to uncertainty and its convergence during training. We also propose Uncertainty-reduced Query Initialization (yellow module) to improve the query initialization with reduced uncertainty. The proposed Uncertainty-aware Probabilistic Decoder (UPD), Uncertainty-guided Query Denoising (UQD), and Uncertainty-reduced Query Initialization (UQI) are combined to tackle the uncertainty issue.

UPD

Uncertainty-aware Probabilistic Decoder (UPD) architecture. The traditional cross-attention is upgraded with probabilistic attention to quantify the uncertainty. The probabilistic attention uses a multi-layer perceptron (MLP) that takes the query q and key k as input and generates a mean and a standard deviation, which define a Gaussian distribution. The attention value $\alpha$ is then sampled from this Gaussian distribution.
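The probabilistic attention described in this caption can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the two-layer MLP shape, the softplus used to keep the standard deviation positive, the softmax normalization of the sampled values, and all names (`mlp_mean_std`, `probabilistic_attention`, `w_hidden`, `w_out`) are assumptions made for illustration only.

```python
import math
import random


def mlp_mean_std(q, k, w_hidden, w_out):
    """Hypothetical 2-layer MLP mapping concat(q, k) to (mean, std)."""
    x = q + k  # list concatenation of query and key features
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    mean = sum(w * h for w, h in zip(w_out[0], hidden))
    raw = sum(w * h for w, h in zip(w_out[1], hidden))
    std = math.log1p(math.exp(raw))  # softplus keeps std positive (assumption)
    return mean, std


def probabilistic_attention(q, keys, values, w_hidden, w_out, rng):
    """Sample one attention value per key from N(mean, std^2), then aggregate values."""
    sampled, stats = [], []
    for k in keys:
        mean, std = mlp_mean_std(q, k, w_hidden, w_out)
        sampled.append(rng.gauss(mean, std))  # alpha ~ N(mean, std^2)
        stats.append((mean, std))
    # Softmax over the sampled values is an assumption of this sketch.
    peak = max(sampled)
    exps = [math.exp(s - peak) for s in sampled]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return out, weights, stats


# Toy usage with random weights and features (illustrative values only).
rng = random.Random(0)
dim, hidden_dim, num_keys = 4, 8, 3
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(hidden_dim)]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(2)]
q = [rng.uniform(-1, 1) for _ in range(dim)]
keys = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_keys)]
values = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_keys)]
out, weights, stats = probabilistic_attention(q, keys, values, w_hidden, w_out, rng)
```

In this sketch the predicted standard deviation per query-key pair is returned alongside the output, so it can be inspected as a rough per-pair uncertainty signal.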

UQI

Qualitative results of our UQI. The initial queries generated by our UQI module accurately locate the regions of interest for the objects, resulting in more accurate tracking results.
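The abstract states that UQI leverages predicted 2D object location and depth information. One plausible way to turn such predictions into a 3D reference point for a query is standard pinhole back-projection, sketched below. Whether UA-Track uses exactly this step is an assumption; the function names, the box format `(x1, y1, x2, y2)`, and the intrinsics values are hypothetical.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with depth (metres) into camera-frame 3D coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def init_query_centers(boxes_2d, depths, fx, fy, cx, cy):
    """Use each 2D box centre and its predicted depth as a 3D query reference point."""
    centers = []
    for (x1, y1, x2, y2), depth in zip(boxes_2d, depths):
        u = 0.5 * (x1 + x2)
        v = 0.5 * (y1 + y2)
        centers.append(backproject(u, v, depth, fx, fy, cx, cy))
    return centers


# Toy usage with made-up intrinsics and detections.
fx, fy, cx, cy = 1000.0, 1000.0, 800.0, 450.0
boxes_2d = [(700.0, 400.0, 900.0, 500.0), (900.0, 450.0, 1100.0, 650.0)]
depths = [20.0, 10.0]
centers = init_query_centers(boxes_2d, depths, fx, fy, cx, cy)
```

A box centred on the principal point maps to a point straight ahead of the camera, which gives a quick sanity check for the intrinsics convention.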

Results

nuScenes val set

UA-Track outperforms all existing camera-based 3D MOT methods on all metrics.

nuScenes test set

Our UA-Track surpasses the previous best solution by a significant margin of 8.9% AMOTA.

Analysis

Uncertainty quantification results and ablations on the proposed modules of UA-Track. s and σ denote entropy and standard deviation, respectively. Incorporating each uncertainty-aware module helps address the uncertainty issue and leads to a performance gain in tracking.
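For readers unfamiliar with the two uncertainty measures named in this caption, the sketch below computes Shannon entropy of a discrete distribution and the population standard deviation of a set of samples. The page does not specify which distribution or samples the reported s and σ are computed over, so the inputs here are purely illustrative and the function names are hypothetical.

```python
import math
import statistics


def shannon_entropy(probs):
    """Entropy (in nats) of a discrete distribution; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sample_std(samples):
    """Population standard deviation of repeated scalar predictions."""
    return statistics.pstdev(samples)


# Illustrative inputs: a confident and an uncertain distribution, plus some samples.
confident = [1.0, 0.0, 0.0, 0.0]
uncertain = [0.25, 0.25, 0.25, 0.25]
s_confident = shannon_entropy(confident)
s_uncertain = shannon_entropy(uncertain)
sigma = sample_std([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

Lower values of both quantities indicate a more concentrated, and in that sense less uncertain, prediction.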

UA-Track consistently outperforms the state-of-the-art tracker PF-Track under different uncertainty conditions, especially under severe occlusion and with small objects.

Qualitative Results

(a) Tracking results over consecutive frames ($t_i$ to $t_{i+12}$) for an occlusion scenario involving two pedestrians, a situation often encountered in real life. (b) Tracking results on several challenging scenes with uncertainty, including small vehicles and pedestrians (scene 1 and scene 2) and occlusions in spacious and crowded environments (scene 3 and scene 4). Moreover, we plot the attention scores of object queries, which indicate how strongly the model focuses on the target objects. A higher concentration of color represents a higher attention score and a stronger confidence in the corresponding object.

Tracking results on several challenging scenarios with uncertainty, including small target objects and occlusions. Moreover, we plot the attention scores of object queries, which indicate how strongly the model focuses on the target objects. A higher concentration of color represents a higher attention score and a stronger confidence in the corresponding object.

BibTeX


  @misc{zhou2024uatrack,
        title={UA-Track: Uncertainty-Aware End-to-End 3D Multi-Object Tracking}, 
        author={Lijun Zhou and Tao Tang and Pengkun Hao and Zihang He and Kalok Ho and Shuo Gu and Wenbo Hou and Zhihui Hao and Haiyang Sun and Kun Zhan and Peng Jia and Xianpeng Lang and Xiaodan Liang},
        year={2024},
        eprint={2406.02147},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
  }