Evaluation Metrics
The ECML 2026 challenge is the newest competition around the Flatland environment.
In this edition, we encourage participants to develop innovative solutions that leverage reinforcement learning, and the scenario setup and evaluation metrics are designed accordingly. However, we are also open to other approaches, e.g. operations research, and encourage participants to benchmark their state-of-the-art algorithms.
⚖ Evaluation metrics
Normalized Episode Rewards
The primary metric is the normalized return of your agents: the higher, the better.
What is the normalized return?
The return is the sum of Flatland’s default rewards that your agents accumulate during an episode, as described in rewards.md.
To normalize the returns, we scale them into the range \([0.0, 1.0]\). To guarantee this, each agent's penalty is capped at max_episode_steps, i.e. its return is clipped at -max_episode_steps. The normalized return makes results comparable across environments of different dimensions and with different numbers of agents.
In code:
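```python
# Clip each agent's return at -max_episode_steps, sum, divide by (max_episode_steps * number of agents), shift by +1.
normalized_reward = sum([max(cumulative_rewards[agent.handle], -self.env.max_episode_steps) for agent in agents]) / (
    self.env.max_episode_steps * self.env.get_num_agents()) + 1
```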
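To make the formula easy to try outside an environment, here is a minimal standalone sketch. The function name `normalized_reward` and its plain-list input are assumptions for illustration and not part of Flatland's API. Like the snippet above, it assumes that per-agent returns are non-positive (penalties only).

```python
from typing import Sequence


def normalized_reward(cumulative_rewards: Sequence[float], max_episode_steps: int) -> float:
    """Normalize per-agent episode returns into [0.0, 1.0] (hypothetical helper).

    Each return is clipped at -max_episode_steps, the clipped returns are summed,
    divided by (max_episode_steps * number of agents), and shifted by +1.
    Assumes every return is <= 0, i.e. rewards are penalties.
    """
    num_agents = len(cumulative_rewards)
    if num_agents == 0 or max_episode_steps <= 0:
        raise ValueError("need at least one agent and a positive max_episode_steps")
    clipped = [max(r, -max_episode_steps) for r in cumulative_rewards]
    return sum(clipped) / (max_episode_steps * num_agents) + 1


# Two agents, max_episode_steps = 100: returns -30 and -250 (the latter is clipped to -100).
# (-30 - 100) / (100 * 2) + 1 = 0.35
print(normalized_reward([-30, -250], 100))
```

Because of the clipping, a single agent with a very large penalty can pull the score down by at most 1/N. This keeps the metric comparable across scenarios with different numbers of agents.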
Episodes end when all trains have reached their targets or when the maximum number of time steps is reached. Therefore:
- The minimum possible value (i.e. the worst) is 0.0, which occurs if none of the agents reach their goal during the episode.
- The maximum possible value (i.e. the best) is 1.0, which occurs if all agents reach their targets and intermediate stops on time, i.e. receive no penalty.
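The two bounds follow directly from the clipping in the formula. Write \(N\) for the number of agents, \(T\) for max_episode_steps and \(R_i\) for the return of agent \(i\), and assume every \(R_i \le 0\) (rewards are penalties). The derivation is then:

\[
-T \le \max(R_i, -T) \le 0
\;\Longrightarrow\;
-NT \le \sum_{i=1}^{N} \max(R_i, -T) \le 0
\;\Longrightarrow\;
0 \le \frac{1}{NT}\sum_{i=1}^{N} \max(R_i, -T) + 1 \le 1 .
\]

The lower bound is attained exactly when every agent's return is at or below \(-T\). The upper bound is attained exactly when every \(R_i = 0\).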
Submission Score
The submission score is the sum of the normalized rewards of all evaluated scenarios.
Evaluation stops when a submission fails to reach the threshold of 25% completed agents within a level (5 scenarios).
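The scoring rule with early stopping can be sketched as follows. This is a hypothetical illustration and not the evaluator's code. It assumes that the completion rate is measured over all agents of a level's scenarios, and that the scenarios of the level that misses the threshold still count toward the score. Both points are assumptions.

```python
from typing import Sequence


def submission_score(
    levels: Sequence[Sequence[float]],
    completion_rates: Sequence[float],
    threshold: float = 0.25,
) -> float:
    """Sum normalized scenario rewards level by level (hypothetical helper).

    levels[k] holds the normalized rewards of the scenarios in level k
    (5 per level in this challenge); completion_rates[k] is the fraction of
    agents that completed (reached their target) over level k.
    Assumption: the failing level's own scenarios are still counted.
    """
    total = 0.0
    for rewards, rate in zip(levels, completion_rates):
        total += sum(rewards)
        if rate < threshold:
            break  # evaluation stops: later levels are not run
    return total


# Level 2 completes only 20% of agents, so level 3 is never evaluated.
print(submission_score([[0.5] * 5, [0.25] * 5, [1.0] * 5], [0.8, 0.2, 0.9]))
```

Under this rule, performance on early levels matters twice. It adds to the score directly, and it decides whether later levels get evaluated at all.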
Factors in the reward function
The factors for the reward function in this competition are:
| factor | value |
|---|---|
| journey not started (cancellation factor) | 5 |
| cancellation time buffer | 0 |
| delay at target | 1 |
| target not reached minimum penalty | 100 |
| intermediate stop not served | 50 |
| intermediate late arrival | 0.5 |
| intermediate early departure | 0.5 |
| collision | 250 |
This configuration is implemented in `flatland.envs.rewards.ECML2026Rewards` and is selected with `--rewards flatland.envs.rewards.ECML2026Rewards`.
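The sketch below mirrors the table in a frozen dataclass, so the factors can be used in local experiments such as reward shaping or sanity checks. The field names are hypothetical and need not match the attributes of `ECML2026Rewards`. The `illustrative_penalty` function assumes that each factor scales linearly with a count (steps or events). That is an assumption made only to show the relative weights, not the official formula from rewards.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardFactors:
    """Factors from the ECML 2026 table (field names are hypothetical)."""
    cancellation_factor: float = 5.0             # journey not started
    cancellation_time_buffer: float = 0.0
    delay_at_target: float = 1.0
    target_not_reached_min_penalty: float = 100.0
    intermediate_stop_not_served: float = 50.0
    intermediate_late_arrival: float = 0.5
    intermediate_early_departure: float = 0.5
    collision: float = 250.0


def illustrative_penalty(f: RewardFactors, target_delay: int, stops_missed: int,
                         late_steps: int, early_steps: int, collisions: int) -> float:
    """Hypothetical linear combination of some factors; NOT the official reward formula."""
    return (f.delay_at_target * target_delay
            + f.intermediate_stop_not_served * stops_missed
            + f.intermediate_late_arrival * late_steps
            + f.intermediate_early_departure * early_steps
            + f.collision * collisions)


# 3 steps late at target, one stop skipped, 4 steps late at intermediate stops.
print(illustrative_penalty(RewardFactors(), target_delay=3, stops_missed=1,
                           late_steps=4, early_steps=0, collisions=0))
```

Even under this simplified view, the weights show that a single collision (250) costs more than skipping several intermediate stops (50 each).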
⛽ Time and Resource limits
The agents have to act within these time limits:
- You are allowed up to 30 minutes per scenario.
- The full evaluation must finish within 5 hours.
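One way for an agent to respect both limits is to track them on its own side and switch to a cheaper policy as the budget runs out. The `TimeBudget` class below is a hypothetical helper, not part of Flatland. Its clock is injectable so the logic can be tested without waiting. It only sees the submission's own wall-clock time, not the evaluator's accounting, so you should keep a safety margin.

```python
import time
from typing import Callable

SCENARIO_LIMIT_S = 30 * 60      # 30 minutes per scenario
TOTAL_LIMIT_S = 5 * 60 * 60     # 5 hours for the full evaluation


class TimeBudget:
    """Track the per-scenario and total wall-clock budgets (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 scenario_limit: float = SCENARIO_LIMIT_S,
                 total_limit: float = TOTAL_LIMIT_S) -> None:
        self.clock = clock
        self.scenario_limit = scenario_limit
        self.total_limit = total_limit
        self.start = clock()
        self.scenario_start = self.start

    def new_scenario(self) -> None:
        """Call at the start of each scenario to reset the per-scenario timer."""
        self.scenario_start = self.clock()

    def remaining(self) -> float:
        """Seconds left before either the scenario limit or the total limit is hit."""
        now = self.clock()
        return min(self.scenario_limit - (now - self.scenario_start),
                   self.total_limit - (now - self.start))
```

In a step loop you might check something like `if budget.remaining() < 60:` and fall back to a fast heuristic. The 60-second margin is an arbitrary example.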
The agents are evaluated in a container with the following resource limits:
- 4 CPU cores
- 15 GB of main memory
- No GPUs are provided.
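Inside a CPU-limited container, `os.cpu_count()` typically reports the host's cores rather than the container's CPU quota. Sizing worker pools from it can therefore oversubscribe the 4 available cores. The sketch below reads the quota from `/sys/fs/cgroup/cpu.max`. That path assumes the evaluation container uses cgroup v2, which we have not confirmed; cgroup v1 exposes the quota differently. If the file cannot be read, the sketch falls back to `os.cpu_count()` capped at 4. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional

MAX_CPUS = 4  # CPU limit from the challenge rules


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse a cgroup v2 cpu.max line: "<quota> <period>" or "max <period>"."""
    quota, period = text.split()
    return None if quota == "max" else int(quota) / int(period)


def usable_cpus(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> int:
    """Number of CPUs to size worker pools with, never more than MAX_CPUS."""
    try:
        quota = parse_cpu_max(Path(cpu_max_path).read_text())
    except (OSError, ValueError):
        quota = None  # file missing (e.g. cgroup v1 or not Linux) or unexpected format
    if quota is not None:
        return max(1, min(int(quota), MAX_CPUS))
    return min(os.cpu_count() or 1, MAX_CPUS)


print(usable_cpus())
```

You could then pass the result as the size of a `multiprocessing.Pool` or as a thread-count setting for a numerical library.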
Detailed overview of resource limits
| Limit[1] | Value | Submission outcome | Details |
|---|---|---|---|
| | | Not created | Shown as an error in the frontend. |
| | | Failure | The submission pod should be listed by now, i.e. pulling has started. |
| | | Failure | The submission pod should have reached the running state by now, i.e. pulling is done. |
| | | Success with termination cause | Per scenario; the evaluation is terminated, and the results do not include the overlong scenario. |
| | | Success with termination cause | All scenarios, excluding the technical overhead for starting pods and running the offline trajectory evaluation; the results do not include the overlong scenario. |
| | | Failure/cleanup | Everything, including the technical overhead for starting the submission pods. |
| | | Success with termination cause | |
| | | Failure | Resource limits for the pod running the submission. |
| | | Failure | Resource limits for the pod running the submission. |
| | | Failure/cleanup | Everything, including the technical overhead for starting the orchestration and evaluation pods. |
📪 Daily Submission Limits and Submission Closure
You can submit up to 2 times per day.