flatland.envs.rewards module#
- class flatland.envs.rewards.BasicMultiObjectiveRewards[source]#
Bases: DefaultRewards, Rewards[Tuple[float, float, float]]
Basic MORL (Multi-Objective Reinforcement Learning) Rewards with 3 items:
- default score
- energy efficiency: minus the square of (speed/max_speed)
- smoothness: minus the square of speed differences
For illustration purposes. A scalarization sketch follows the method listing below.
- cumulate(*rewards: Tuple[float, float, float]) → Tuple[float, float, float] [source]#
Cumulate multiple rewards to one.
Parameters#
rewards
Returns#
Cumulative rewards
- end_of_episode_reward(agent: EnvAgent, distance_map: DistanceMap, elapsed_steps: int) → Tuple[float, float, float] [source]#
Handles end-of-episode reward for a particular agent.
Parameters#
agent: EnvAgent distance_map: DistanceMap elapsed_steps: int
- step_reward(agent: EnvAgent, agent_transition_data: AgentTransitionData, distance_map: DistanceMap, elapsed_steps: int) → Tuple[float, float, float] [source]#
Handles the end-of-step reward for a particular agent.
Parameters#
agent: EnvAgent agent_transition_data: AgentTransitionData distance_map: DistanceMap elapsed_steps: int
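For single-objective training loops, the 3-tuple returned by BasicMultiObjectiveRewards is typically scalarized with a weight vector. The helper below is a hypothetical sketch and not part of flatland; the weights and the example reward values are arbitrary.

```python
from typing import Tuple

def scalarize(reward: Tuple[float, float, float],
              weights: Tuple[float, float, float] = (1.0, 0.1, 0.1)) -> float:
    """Hypothetical helper: weighted sum over
    (default score, energy efficiency, smoothness)."""
    return sum(w * r for w, r in zip(weights, reward))

# e.g. a per-step tuple as returned by BasicMultiObjectiveRewards.step_reward
print(scalarize((-1.0, -0.25, -0.04)))  # 1.0*(-1.0) + 0.1*(-0.25) + 0.1*(-0.04) = -1.029
```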
- class flatland.envs.rewards.DefaultRewards(epsilon: float = 0.01, cancellation_factor: float = 1, cancellation_time_buffer: float = 0, intermediate_not_served_penalty: float = 1, intermediate_late_arrival_penalty_factor: float = 0.2, intermediate_early_departure_penalty_factor: float = 0.5, crash_penalty_factor: float = 0.0)[source]#
Bases: Rewards[float]
Reward Function.
This scoring function is designed to capture key operational metrics such as punctuality, efficiency in responding to disruptions, and safety.
Punctuality and schedule adherence are rewarded based on the difference between actual and target arrival and departure times at each stop, with penalties for intermediate stops not served and for journeys never started.
Safety measures are implemented as penalties for collisions, directly proportional to the train’s speed at impact, so that high-speed operations are managed with extra caution. A construction sketch follows the method listing below.
- cumulate(*rewards: int) → RewardType [source]#
Cumulate multiple rewards to one.
Parameters#
rewards
Returns#
Cumulative rewards
- end_of_episode_reward(agent: EnvAgent, distance_map: DistanceMap, elapsed_steps: int) → float [source]#
Handles end-of-episode reward for a particular agent.
Parameters#
agent: EnvAgent distance_map: DistanceMap elapsed_steps: int
- step_reward(agent: EnvAgent, agent_transition_data: AgentTransitionData, distance_map: DistanceMap, elapsed_steps: int) → float [source]#
Handles the end-of-step reward for a particular agent.
Parameters#
agent: EnvAgent agent_transition_data: AgentTransitionData distance_map: DistanceMap elapsed_steps: int
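The constructor parameters shown in the class signature tune the trade-off between schedule adherence and safety. A minimal construction sketch, assuming only what the signature above documents; the chosen values are illustrative, and how the instance is wired into the environment (e.g. a rewards argument of RailEnv) depends on the flatland version.

```python
from flatland.envs.rewards import DefaultRewards

# Illustrative values only: emphasize safety and intermediate-stop service.
rewards = DefaultRewards(
    crash_penalty_factor=0.5,           # default 0.0: collisions penalized proportionally to impact speed
    intermediate_not_served_penalty=2,  # default 1: skipping an intermediate stop costs more
)

# Per-step scores for one agent can be combined via cumulate():
total = rewards.cumulate(-1.0, -0.5, 0.0)
```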
- class flatland.envs.rewards.PunctualityRewards[source]#
Bases: Rewards[Tuple[int, int]]
Punctuality: n_stops_on_time / n_stops. An agent is deemed not punctual at a stop if it arrives too late, departs too early, or does not serve the stop at all. If an agent is punctual at a stop, n_stops_on_time is increased by 1.
The implementation returns the tuple (n_stops_on_time, n_stops). A sketch for aggregating these tuples into the punctuality ratio follows the method listing below.
- cumulate(*rewards: Tuple[int, int]) → Tuple[int, int] [source]#
Cumulate multiple rewards to one.
Parameters#
rewards
Returns#
Cumulative rewards
- end_of_episode_reward(agent: EnvAgent, distance_map: DistanceMap, elapsed_steps: int) → Tuple[int, int] [source]#
Handles end-of-episode reward for a particular agent.
Parameters#
agent: EnvAgent distance_map: DistanceMap elapsed_steps: int
- step_reward(agent: EnvAgent, agent_transition_data: AgentTransitionData, distance_map: DistanceMap, elapsed_steps: int) → Tuple[int, int] [source]#
Handles the end-of-step reward for a particular agent.
Parameters#
agent: EnvAgent agent_transition_data: AgentTransitionData distance_map: DistanceMap elapsed_steps: int
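Because PunctualityRewards returns the pair (n_stops_on_time, n_stops) rather than the ratio itself, the caller computes the final punctuality score. A sketch of that aggregation, assuming only the tuple format described above; the helper name is not part of flatland.

```python
from typing import Tuple

def punctuality_ratio(*agent_rewards: Tuple[int, int]) -> float:
    """Hypothetical helper: sum (n_stops_on_time, n_stops) pairs over agents
    and return the overall on-time fraction (0.0 if there are no stops)."""
    n_on_time = sum(on_time for on_time, _ in agent_rewards)
    n_stops = sum(stops for _, stops in agent_rewards)
    return n_on_time / n_stops if n_stops else 0.0

print(punctuality_ratio((2, 3), (1, 1)))  # 3 of 4 stops served on time -> 0.75
```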
- class flatland.envs.rewards.Rewards[source]#
Bases: Generic[RewardType]
Reward Function Interface. A minimal custom-implementation sketch follows the method listing below.
- cumulate(*rewards: RewardType) → RewardType [source]#
Cumulate multiple rewards to one.
Parameters#
rewards
Returns#
Cumulative rewards
- end_of_episode_reward(agent: EnvAgent, distance_map: DistanceMap, elapsed_steps: int) → RewardType [source]#
Handles end-of-episode reward for a particular agent.
Parameters#
agent: EnvAgent distance_map: DistanceMap elapsed_steps: int
- step_reward(agent: EnvAgent, agent_transition_data: AgentTransitionData, distance_map: DistanceMap, elapsed_steps: int) → RewardType [source]#
Handles the end-of-step reward for a particular agent.
Parameters#
agent: EnvAgent agent_transition_data: AgentTransitionData distance_map: DistanceMap elapsed_steps: int
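The three methods above are all that a custom reward has to provide. A minimal sketch of a scalar implementation that charges -1 per step and nothing at episode end; the subclass and its behaviour are illustrative, not a flatland built-in, and the unused parameters simply mirror the documented signatures.

```python
from flatland.envs.rewards import Rewards

class StepPenaltyRewards(Rewards[float]):
    """Illustrative custom reward: -1 for every step, 0 at the end of the episode."""

    def step_reward(self, agent, agent_transition_data, distance_map, elapsed_steps) -> float:
        # Constant per-step penalty; agent state and distance map are ignored here.
        return -1.0

    def end_of_episode_reward(self, agent, distance_map, elapsed_steps) -> float:
        return 0.0

    def cumulate(self, *rewards: float) -> float:
        # Combine per-step rewards by summation.
        return sum(rewards)
```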