flatland.envs.rewards module#

class flatland.envs.rewards.Rewards[source]#

Bases: object

Reward Function:

Each agent incurs a step_penalty for every time-step taken in the environment, independent of whether the agent moves. Currently, all other penalties, such as the penalties for stopping, starting and invalid actions, are set to 0.

alpha = 0
beta = 0

Reward function parameters:

  • invalid_action_penalty = 0

  • step_penalty = -alpha

  • global_reward = beta

  • epsilon = 0.01 # tolerance to avoid rounding errors

  • stop_penalty = 0 # penalty for stopping a moving agent

  • start_penalty = 0 # penalty for starting a stopped agent

  • intermediate_not_served_penalty = -1

  • intermediate_late_arrival_penalty_factor = 0.2

  • intermediate_early_departure_penalty_factor = 0.5
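With the defaults alpha = 0 and beta = 0, both the per-step penalty and the global reward vanish, leaving the intermediate-stop and end-of-episode terms as the only non-zero contributions. As a rough illustration only (not the library's actual implementation), and with the helper names and the simplified lateness bookkeeping being assumptions of this sketch, the parameters above could combine as follows:

    # Illustrative sketch: how the documented parameters could combine into
    # per-step and intermediate-stop penalties. Helper names are hypothetical;
    # the real computation lives in flatland.envs.rewards.Rewards.
    alpha = 0
    beta = 0
    step_penalty = -alpha          # charged every time-step, moving or not
    global_reward = beta
    intermediate_not_served_penalty = -1
    intermediate_late_arrival_penalty_factor = 0.2
    intermediate_early_departure_penalty_factor = 0.5

    def sketch_step_reward():
        """Per-step cost, independent of whether the agent moved."""
        return step_penalty

    def sketch_intermediate_stop_penalty(served, late_by, left_early_by):
        """Assumed semantics: flat penalty if an intermediate stop is skipped,
        otherwise scaled penalties for arriving late and departing early."""
        if not served:
            return intermediate_not_served_penalty
        return (-intermediate_late_arrival_penalty_factor * late_by
                - intermediate_early_departure_penalty_factor * left_early_by)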

alpha = 0#
beta = 0#
cancellation_factor = 1#
cancellation_time_buffer = 0#
end_of_episode_reward(agent: EnvAgent, distance_map: DistanceMap, elapsed_steps: int) → int[source]#

Handles the end-of-episode reward for a particular agent.

Parameters#

  • agent: EnvAgent

  • distance_map: DistanceMap

  • elapsed_steps: int
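A minimal usage sketch, assuming an already constructed RailEnv whose episode has finished; the attribute names env.agents, env.distance_map and env._elapsed_steps are assumptions made for this illustration:

    from flatland.envs.rewards import Rewards

    def collect_final_rewards(env) -> dict:
        """Gather each agent's end-of-episode reward.

        `env` is assumed to expose `agents`, `distance_map` and
        `_elapsed_steps`; these attribute names are assumptions of this sketch.
        """
        rewards = Rewards()
        return {
            agent.handle: rewards.end_of_episode_reward(
                agent=agent,
                distance_map=env.distance_map,
                elapsed_steps=env._elapsed_steps,
            )
            for agent in env.agents
        }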

epsilon = 0.01#
global_reward = 0#
intermediate_early_departure_penalty_factor = 0.5#
intermediate_late_arrival_penalty_factor = 0.2#
intermediate_not_served_penalty = -1#
invalid_action_penalty = 0#
start_penalty = 0#
step_penalty = 0#
step_reward(agent: EnvAgent, distance_map: DistanceMap, elapsed_steps: int)[source]#

Handles the end-of-step reward for a particular agent.

Parameters#

  • agent: EnvAgent

  • distance_map: DistanceMap

  • elapsed_steps: int
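A companion sketch for the per-step call, under the same assumptions about the environment's attributes; it only illustrates the call signature, not where the library itself invokes step_reward:

    from flatland.envs.rewards import Rewards

    def accumulate_step_rewards(env, rewards: Rewards, totals: dict) -> None:
        """Add each agent's reward for the current time-step to a running total."""
        for agent in env.agents:
            totals[agent.handle] = totals.get(agent.handle, 0) + rewards.step_reward(
                agent=agent,
                distance_map=env.distance_map,
                elapsed_steps=env._elapsed_steps,
            )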

stop_penalty = 0#