flatland.envs.rewards module#

class flatland.envs.rewards.BasicMultiObjectiveRewards[source]#

Bases: DefaultRewards, Rewards[Tuple[float, float, float]]

Basic MORL (Multi-Objective Reinforcement Learning) rewards with 3 components:
  • default score

  • energy efficiency: negative square of (speed / max_speed)

  • smoothness: negative square of speed differences

For illustration purposes.
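
For example, the three reward components can be scalarised into a single training signal with a weighted sum. A minimal sketch, assuming purely illustrative weights (the helper function and the weights are not part of this module):

    from typing import Tuple

    def scalarise(reward: Tuple[float, float, float],
                  weights: Tuple[float, float, float] = (1.0, 0.1, 0.1)) -> float:
        """Weighted sum of (default score, energy efficiency, smoothness)."""
        return sum(w * r for w, r in zip(weights, reward))

    # A step reward of (-1.0, -0.25, -0.04) collapses to a single scalar.
    print(scalarise((-1.0, -0.25, -0.04)))  # approx. -1.029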

cumulate(*rewards: Tuple[float, float, float]) Tuple[float, float, float][source]#

Cumulate multiple rewards into one.

Parameters#

rewards

Returns#

Cumulative rewards

empty() Tuple[float, float, float][source]#

Return the empty initial value that is neutral for cumulation.

end_of_episode_reward(agent: EnvAgent, distance_map: DistanceMap, elapsed_steps: int) Tuple[float, float, float][source]#

Handles end-of-episode reward for a particular agent.

Parameters#

agent: EnvAgent
distance_map: DistanceMap
elapsed_steps: int

step_reward(agent: EnvAgent, agent_transition_data: AgentTransitionData, distance_map: DistanceMap, elapsed_steps: int) Tuple[float, float, float][source]#

Handles the end-of-step reward for a particular agent.

Parameters#

agent: EnvAgent
agent_transition_data: AgentTransitionData
distance_map: DistanceMap
elapsed_steps: int

class flatland.envs.rewards.DefaultRewards(epsilon: float = 0.01, cancellation_factor: float = 1, cancellation_time_buffer: float = 0, intermediate_not_served_penalty: float = 1, intermediate_late_arrival_penalty_factor: float = 0.2, intermediate_early_departure_penalty_factor: float = 0.5, crash_penalty_factor: float = 0.0)[source]#

Bases: Rewards[float]

Reward Function.

This scoring function is designed to capture key operational metrics such as punctuality, efficiency in responding to disruptions, and safety.

Punctuality and schedule adherence are rewarded based on the differences between actual and target arrival and departure times at each stop, with additional penalties for intermediate stops that are not served and for journeys that are never started.

Safety is enforced through collision penalties that are directly proportional to the train’s speed at impact, ensuring that high-speed operations are managed with extra caution.
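
As an illustration, an instance can be configured through the constructor parameters documented above; how the reward object is then attached to the environment depends on the flatland version and is not shown here. A minimal sketch (the chosen values are arbitrary, and the cumulation semantics are assumed, not guaranteed):

    from flatland.envs.rewards import DefaultRewards

    # Keyword names come from the constructor signature above; the values are
    # arbitrary and only illustrate that the penalties are tunable.
    rewards = DefaultRewards(
        crash_penalty_factor=0.5,
        cancellation_factor=1.5,
        intermediate_not_served_penalty=2,
    )

    # Rewards are folded together starting from a neutral value.
    total = rewards.empty()                      # neutral element (presumably 0.0)
    total = rewards.cumulate(total, -1.0, -0.5)  # accumulate per-step rewards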

cumulate(*rewards: float) float[source]#

Cumulate multiple rewards into one.

Parameters#

rewards

Returns#

Cumulative rewards

empty() float[source]#

Return the empty initial value that is neutral for cumulation.

end_of_episode_reward(agent: EnvAgent, distance_map: DistanceMap, elapsed_steps: int) float[source]#

Handles end-of-episode reward for a particular agent.

Parameters#

agent: EnvAgent
distance_map: DistanceMap
elapsed_steps: int

step_reward(agent: EnvAgent, agent_transition_data: AgentTransitionData, distance_map: DistanceMap, elapsed_steps: int) float[source]#

Handles the end-of-step reward for a particular agent.

Parameters#

agent: EnvAgent
agent_transition_data: AgentTransitionData
distance_map: DistanceMap
elapsed_steps: int

class flatland.envs.rewards.PunctualityRewards[source]#

Bases: Rewards[Tuple[int, int]]

Punctuality: n_stops_on_time / n_stops. An agent is deemed not punctual at a stop if it arrives too late, departs too early, or does not serve the stop at all. If an agent is punctual at a stop, n_stops_on_time is increased by 1.

The implementation returns the tuple (n_stops_on_time, n_stops).
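
For example, the cumulated tuple can be turned into a punctuality ratio. A short sketch, assuming cumulate() sums the tuples element-wise and empty() returns the neutral (0, 0) pair (the per-agent values below are made up):

    from flatland.envs.rewards import PunctualityRewards

    rewards = PunctualityRewards()

    # Per-agent (n_stops_on_time, n_stops) tuples; the numbers are invented.
    per_agent = [(3, 4), (2, 2), (0, 3)]

    n_on_time, n_stops = rewards.cumulate(rewards.empty(), *per_agent)
    ratio = n_on_time / n_stops if n_stops else 1.0
    print(ratio)  # 5 / 9 ≈ 0.56 under the element-wise-sum assumption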

cumulate(*rewards: Tuple[int, int]) Tuple[int, int][source]#

Cumulate multiple rewards into one.

Parameters#

rewards

Returns#

Cumulative rewards

empty() Tuple[int, int][source]#

Return the empty initial value that is neutral for cumulation.

end_of_episode_reward(agent: EnvAgent, distance_map: DistanceMap, elapsed_steps: int) Tuple[int, int][source]#

Handles end-of-episode reward for a particular agent.

Parameters#

agent: EnvAgent
distance_map: DistanceMap
elapsed_steps: int

step_reward(agent: EnvAgent, agent_transition_data: AgentTransitionData, distance_map: DistanceMap, elapsed_steps: int) Tuple[int, int][source]#

Handles the end-of-step reward for a particular agent.

Parameters#

agent: EnvAgent
agent_transition_data: AgentTransitionData
distance_map: DistanceMap
elapsed_steps: int

class flatland.envs.rewards.Rewards[source]#

Bases: Generic[RewardType]

Reward Function Interface.
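
A custom reward function only needs to implement the four methods documented below. A minimal sketch, purely illustrative (a flat -1 per step with no terminal bonus; the class name is made up):

    from flatland.envs.rewards import Rewards

    class ConstantStepPenalty(Rewards[float]):
        """Illustrative subclass: a fixed -1 per step and 0 at episode end."""

        def empty(self) -> float:
            return 0.0

        def cumulate(self, *rewards: float) -> float:
            return sum(rewards)

        def step_reward(self, agent, agent_transition_data, distance_map, elapsed_steps) -> float:
            return -1.0

        def end_of_episode_reward(self, agent, distance_map, elapsed_steps) -> float:
            return 0.0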

cumulate(*rewards: RewardType) RewardType[source]#

Cumulate multiple rewards into one.

Parameters#

rewards

Returns#

Cumulative rewards

empty() RewardType[source]#

Return the empty initial value that is neutral for cumulation.

end_of_episode_reward(agent: EnvAgent, distance_map: DistanceMap, elapsed_steps: int) RewardType[source]#

Handles end-of-episode reward for a particular agent.

Parameters#

agent: EnvAgent
distance_map: DistanceMap
elapsed_steps: int

step_reward(agent: EnvAgent, agent_transition_data: AgentTransitionData, distance_map: DistanceMap, elapsed_steps: int) RewardType[source]#

Handles the end-of-step reward for a particular agent.

Parameters#

agent: EnvAgent
agent_transition_data: AgentTransitionData
distance_map: DistanceMap
elapsed_steps: int

flatland.envs.rewards.defaultdict_set()[source]#