flatland.envs.rewards module
- class flatland.envs.rewards.Rewards(epsilon: float = 0.01, cancellation_factor: float = 1, cancellation_time_buffer: float = 0, intermediate_not_served_penalty: float = 1, intermediate_late_arrival_penalty_factor: float = 0.2, intermediate_early_departure_penalty_factor: float = 0.5, crash_penalty_factor: float = 0.0)
Bases: object
Reward Function.
This scoring function is designed to capture key operational metrics such as punctuality, efficiency in responding to disruptions, and safety.
Punctuality and schedule adherence are scored from the difference between actual and target arrival times, and between actual and target departure times, at each stop. Penalties also apply to intermediate stops that are not served and to journeys that are never started.
Safety is captured by collision penalties, which are directly proportional to the train’s speed at impact, so that high-speed operations are managed with extra caution.
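The description above can be made concrete with a small standalone sketch. It does not import flatland and is not the library's implementation: `PenaltyFactors`, `stop_penalty`, and `crash_penalty` are hypothetical names, and the formulas and the sign convention (penalties returned as non-positive numbers) are assumptions. Only the parameter names and default values are taken from the `Rewards` signature.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PenaltyFactors:
    """Hypothetical container; defaults copied from the Rewards signature."""
    intermediate_not_served_penalty: float = 1.0
    intermediate_late_arrival_penalty_factor: float = 0.2
    intermediate_early_departure_penalty_factor: float = 0.5
    crash_penalty_factor: float = 0.0


def stop_penalty(
    factors: PenaltyFactors,
    target_arrival: int,
    target_departure: int,
    actual_arrival: Optional[int],
    actual_departure: Optional[int],
) -> float:
    """Assumed penalty (<= 0) for one intermediate stop, in time steps."""
    if actual_arrival is None:
        # The stop was never served: flat penalty (assumed form).
        return -factors.intermediate_not_served_penalty
    # Only lateness on arrival and earliness on departure are penalised here.
    late = max(0, actual_arrival - target_arrival)
    early = 0
    if actual_departure is not None:
        early = max(0, target_departure - actual_departure)
    return -(
        factors.intermediate_late_arrival_penalty_factor * late
        + factors.intermediate_early_departure_penalty_factor * early
    )


def crash_penalty(factors: PenaltyFactors, speed_at_impact: float) -> float:
    """Assumed collision penalty (<= 0), proportional to speed at impact."""
    return -factors.crash_penalty_factor * speed_at_impact
```

Under these assumptions, the default `crash_penalty_factor` of 0.0 makes the collision term vanish, so a caller who wants collisions to be penalised would need to pass a positive factor.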
- end_of_episode_reward(agent: EnvAgent, distance_map: DistanceMap, elapsed_steps: int) → int
Handles end-of-episode reward for a particular agent.
Parameters
agent: EnvAgent
distance_map: DistanceMap
elapsed_steps: int
- step_reward(agent: EnvAgent, agent_transition_data: AgentTransitionData, distance_map: DistanceMap, elapsed_steps: int)
Handles end-of-step reward for a particular agent.
Parameters
agent: EnvAgent
agent_transition_data: AgentTransitionData
distance_map: DistanceMap
elapsed_steps: int
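As a rough illustration of how the two methods might be combined for one agent, the sketch below sums the per-step rewards and then adds the end-of-episode reward. It is duck-typed and does not import flatland. `total_agent_reward` and `StubRewards` are hypothetical, and both the driver loop and the 1-based `elapsed_steps` counter are assumptions rather than the environment's actual behaviour. Only the method names and argument order come from the signatures above.

```python
from typing import Any, Iterable


def total_agent_reward(
    rewards: Any,
    agent: Any,
    transitions: Iterable[Any],
    distance_map: Any,
) -> float:
    """Sum step rewards over an episode, then add the end-of-episode reward."""
    total = 0.0
    elapsed_steps = 0
    # Assumption: one transition record per step, counted from 1.
    for elapsed_steps, transition in enumerate(transitions, start=1):
        total += rewards.step_reward(agent, transition, distance_map, elapsed_steps)
    total += rewards.end_of_episode_reward(agent, distance_map, elapsed_steps)
    return total


class StubRewards:
    """Stand-in exposing the documented method names; NOT the flatland class."""

    def step_reward(self, agent, agent_transition_data, distance_map, elapsed_steps):
        return -1  # arbitrary per-step cost, for illustration only

    def end_of_episode_reward(self, agent, distance_map, elapsed_steps):
        return -elapsed_steps  # arbitrary terminal cost, for illustration only


episode_total = total_agent_reward(StubRewards(), None, [None, None, None], None)
```

Any object exposing the same two methods could be passed in place of `StubRewards`, which keeps the accumulation logic separate from how individual rewards are computed.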