Prediction Competition
Starting 24.08.2020, we are hosting a Kaggle competition on predicting the future movements of other traffic participants. This page serves as an introduction to it and provides additional information.
Scoring
When taking part in the competition, you will be asked to submit predictions for a private test set (no ground truth is available), and your solutions will be scored by Kaggle. In total, 30,000 USD in prizes is available! As traffic scenes can contain a large amount of ambiguity and uncertainty, we encourage the submission of multi-modal predictions. For scoring, we calculate the negative log-likelihood of the ground truth data given these multi-modal predictions. Let us take a closer look at this. Assume the ground truth positions of a sample trajectory are

$$x_1, \ldots, x_T, y_1, \ldots, y_T$$
and we predict $K$ hypotheses, represented by means

$$\bar{x}^k_1, \ldots, \bar{x}^k_T, \; \bar{y}^k_1, \ldots, \bar{y}^k_T, \quad k = 1, \ldots, K.$$
In addition, we predict confidences $c^k$ of these $K$ hypotheses. We assume the ground truth positions to be modelled by a mixture of multi-dimensional independent Normal distributions over time, yielding the likelihood

$$p\big(x_{1,\ldots,T}, y_{1,\ldots,T} \mid c^{1,\ldots,K}, \bar{x}^{1,\ldots,K}_{1,\ldots,T}, \bar{y}^{1,\ldots,K}_{1,\ldots,T}\big) = \sum_k c^k \, \mathcal{N}\big(x_{1,\ldots,T} \mid \bar{x}^k_{1,\ldots,T}, \Sigma = 1\big) \, \mathcal{N}\big(y_{1,\ldots,T} \mid \bar{y}^k_{1,\ldots,T}, \Sigma = 1\big)$$

$$= \sum_k c^k \prod_t \mathcal{N}\big(x_t \mid \bar{x}^k_t, \sigma = 1\big) \, \mathcal{N}\big(y_t \mid \bar{y}^k_t, \sigma = 1\big),$$
yielding the loss

$$L = -\log p\big(x_{1,\ldots,T}, y_{1,\ldots,T} \mid c^{1,\ldots,K}, \bar{x}^{1,\ldots,K}_{1,\ldots,T}, \bar{y}^{1,\ldots,K}_{1,\ldots,T}\big) = -\log \sum_k e^{\log c^k + \sum_t \log \mathcal{N}(x_t \mid \bar{x}^k_t, \sigma = 1) + \log \mathcal{N}(y_t \mid \bar{y}^k_t, \sigma = 1)}$$

$$= -\log \sum_k e^{\log c^k - \frac{1}{2} \sum_t \big((\bar{x}^k_t - x_t)^2 + (\bar{y}^k_t - y_t)^2\big)}.$$
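To make the loss concrete, here is a minimal NumPy sketch for a single sample. The function and argument names are illustrative only; this is not our reference implementation, which is linked in the next paragraph.

```python
import numpy as np

def naive_neg_multi_log_likelihood(gt, pred, confidences):
    """Negative log-likelihood of a multi-modal prediction (naive version).

    gt:          (T, 2) ground truth positions
    pred:        (K, T, 2) predicted means of the K hypotheses
    confidences: (K,) hypothesis confidences, assumed to sum to 1
    """
    # squared L2 error of each hypothesis, summed over time and x/y
    error = np.sum((gt[None] - pred) ** 2, axis=(1, 2))    # (K,)
    # exponent of each mixture component: log(c^k) - 0.5 * error^k
    exponents = np.log(confidences) - 0.5 * error          # (K,)
    # negative log of the sum of exponentials (can under- or overflow!)
    return -np.log(np.sum(np.exp(exponents)))
```

The direct exponentiation can underflow for large errors, which is exactly the problem addressed by the log-sum-exp trick described below.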
You can find our implementation here, which uses `error` as a placeholder for the exponent

$$\log c^k - \frac{1}{2} \sum_t \big((\bar{x}^k_t - x_t)^2 + (\bar{y}^k_t - y_t)^2\big)$$
and for numerical stability further applies the log-sum-exp trick. Assume we need to calculate the logarithm of a sum of exponentials:

$$\log \sum_i e^{x_i}.$$
Then we rewrite this by subtracting the maximum value $x^*$ from each exponent, resulting in much increased numerical stability:

$$\log \sum_i e^{x_i} = x^* + \log \sum_i e^{x_i - x^*}.$$
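Applied to the loss above, the trick could look as follows; this is again a sketch under the same illustrative naming, not our reference implementation.

```python
import numpy as np

def stable_neg_multi_log_likelihood(gt, pred, confidences):
    """Numerically stable variant of the loss using the log-sum-exp trick.

    gt:          (T, 2) ground truth positions
    pred:        (K, T, 2) predicted means of the K hypotheses
    confidences: (K,) hypothesis confidences, assumed to sum to 1
    """
    error = np.sum((gt[None] - pred) ** 2, axis=(1, 2))    # (K,)
    exponents = np.log(confidences) - 0.5 * error          # (K,)
    max_exponent = exponents.max()                         # x*
    # log sum_k e^{e_k} = x* + log sum_k e^{e_k - x*}
    return -(max_exponent + np.log(np.sum(np.exp(exponents - max_exponent))))
```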
Coordinate Systems for the Competition
Please refer to this doc for a full description of the different coordinate systems used in L5Kit.
The ground truth coordinates for the competition are stored as positional displacements in the world coordinate system. However, you will likely predict relative displacements for the agent of interest either in the agent coordinate system or in the image coordinate system. Before using our utils to write a CSV file for your predictions, convert them into the world coordinate system using the appropriate transformation matrix available as part of the input data and subtract the centroid; see the sketch below.
Yaw is not required/used for this competition.
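As a rough illustration, assuming your model outputs future positions in the agent frame and the dataset sample exposes a world-from-agent matrix and the agent centroid (the exact key names depend on your L5Kit version and dataset setup), the conversion could look like this sketch built on `l5kit.geometry.transform_points`:

```python
from l5kit.geometry import transform_points

def agent_to_world_displacements(pred_agent, world_from_agent, centroid):
    """Convert predicted positions from the agent frame into world-frame
    displacements relative to the agent centroid.

    pred_agent:       (T, 2) predicted future positions in the agent frame
    world_from_agent: (3, 3) agent-to-world transformation matrix
    centroid:         (2,)   agent centroid in world coordinates
    """
    # map the predicted points into world coordinates
    pred_world = transform_points(pred_agent, world_from_agent)
    # subtract the centroid to obtain positional displacements
    return pred_world - centroid
```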
Additional Metrics
Scoring multi-modal prediction models is a highly complex task, and while we chose the metric described above for its elegance and its support for multi-modality, we encourage participants to also employ other metrics when assessing their models. Two examples commonly used in the literature (see our dataset paper or SophieGAN) are the Average Displacement Error (ADE) and the Final Displacement Error (FDE): ADE is the L2 distance between prediction and ground truth averaged over all timesteps, while FDE is the L2 distance between prediction and ground truth evaluated only at the last timestep. As we consider multiple predictions, we offer implementations of both metrics that either average over all hypotheses or use the best hypothesis (oracle variant), ignoring the predicted confidence scores in both cases.
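A minimal sketch of both variants for a single sample could look as follows; the names are illustrative and the arrays follow the same shapes as in the loss sketches above.

```python
import numpy as np

def displacement_errors(gt, pred):
    """ADE and FDE for a multi-modal prediction, ignoring confidences.

    gt:   (T, 2) ground truth positions
    pred: (K, T, 2) predicted positions of the K hypotheses

    Returns the mean-over-hypotheses and oracle (best hypothesis) variants.
    """
    # per-hypothesis, per-timestep L2 distance: (K, T)
    dist = np.linalg.norm(pred - gt[None], axis=-1)
    ade_per_hyp = dist.mean(axis=1)   # averaged over all timesteps
    fde_per_hyp = dist[:, -1]         # last timestep only
    return {
        "ade_mean": ade_per_hyp.mean(), "ade_oracle": ade_per_hyp.min(),
        "fde_mean": fde_per_hyp.mean(), "fde_oracle": fde_per_hyp.min(),
    }
```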