flatland.envs.rewards module#
- class flatland.envs.rewards.Rewards[source]#
Bases: object
Reward Function:
It costs each agent a step_penalty for every time-step taken in the environment, independent of the agent's movement. Currently all other penalties, such as the penalties for stopping, starting and invalid actions, are set to 0.
Reward function parameters:
- alpha = 0
- beta = 0
- invalid_action_penalty = 0
- step_penalty = -alpha
- global_reward = beta
- epsilon = avoid rounding errors
- stop_penalty = 0 (penalty for stopping a moving agent)
- start_penalty = 0 (penalty for starting a stopped agent)
- intermediate_not_served_penalty = -1
- intermediate_late_arrival_penalty_factor = 0.2
- intermediate_early_departure_penalty_factor = 0.5
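With the defaults above, the per-step cost is step_penalty = -alpha = 0 and the global term is global_reward = beta = 0, so out of the box only the intermediate-stop penalties and the end-of-episode handling contribute. Below is a minimal sketch of inspecting and (hypothetically) overriding these values; it assumes Rewards has a no-argument constructor and writable attributes, which this page does not explicitly confirm.

```python
from flatland.envs.rewards import Rewards

# Minimal sketch (assumption: no-argument constructor, writable attributes).
rewards = Rewards()
print(rewards.step_penalty)                     # 0, i.e. -alpha with alpha = 0
print(rewards.intermediate_not_served_penalty)  # -1

# Hypothetical tweak, not a documented workflow: charge -1 per time-step.
rewards.step_penalty = -1
```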
- alpha = 0#
- beta = 0#
- cancellation_factor = 1#
- cancellation_time_buffer = 0#
- end_of_episode_reward(agent: EnvAgent, distance_map: DistanceMap, elapsed_steps: int) → int [source]#
Handles end-of-episode reward for a particular agent.
Parameters#
- agent: EnvAgent
- distance_map: DistanceMap
- elapsed_steps: int
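end_of_episode_reward is typically queried once per agent when the episode terminates. A hedged sketch of calling it from outside the environment, assuming a RailEnv that exposes agents, distance_map and _elapsed_steps (these attribute names are assumptions, not guaranteed by this page):

```python
from flatland.envs.rewards import Rewards

rewards = Rewards()

def final_rewards(env):
    """Collect each agent's end-of-episode reward (illustrative only)."""
    return {
        agent.handle: rewards.end_of_episode_reward(
            agent, env.distance_map, env._elapsed_steps
        )
        for agent in env.agents
    }
```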
- epsilon = 0.01#
- global_reward = 0#
- intermediate_early_departure_penalty_factor = 0.5#
- intermediate_late_arrival_penalty_factor = 0.2#
- intermediate_not_served_penalty = -1#
- invalid_action_penalty = 0#
- start_penalty = 0#
- step_penalty = 0#
- step_reward(agent: EnvAgent, distance_map: DistanceMap, elapsed_steps: int)[source]#
Handles the end-of-step reward for a particular agent.
Parameters#
- agent: EnvAgent
- distance_map: DistanceMap
- elapsed_steps: int
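step_reward is the per-time-step counterpart. A hedged sketch of accumulating it over an episode, under the same attribute-name assumptions as above; the summation logic is illustrative only:

```python
from collections import defaultdict
from flatland.envs.rewards import Rewards

rewards = Rewards()

def accumulate_step_rewards(env, totals=None):
    """Add the current step's reward to each agent's running total (illustrative only)."""
    totals = defaultdict(float) if totals is None else totals
    for agent in env.agents:
        totals[agent.handle] += rewards.step_reward(
            agent, env.distance_map, env._elapsed_steps
        )
    return totals
```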
- stop_penalty = 0#