To use Pathmind in your Python-based simulation, you must implement the following methods.


Number of Agents - This can be any integer greater than 0.

def getNumberOfAgents(self):
    return 1

Initialize - Defines initial simulation parameters.

def __init__(self, simulation):
    # Define initial parameter values here.
    # The simulation object is passed in so that the other methods can delegate to it.
    self.simulation = simulation

Reset - Resets the simulation.

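def reset(self):
    # Restore the simulation to its initial state.
    # Assumes your simulation object exposes its own reset() method.
    self.simulation.reset()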

Done - Informs the reinforcement learning agent when a simulation is done. Must return a Boolean.

def isDone(self, agentId):
    return self.simulation.is_done()

Observation Space - Defines the length and bounds of the observations. Observations must be returned as a double array.

# Specify length of double array
# (math.inf requires "import math" at the top of the file)
def getObservationSpace(self):
    return nativerl.Continuous(
        nativerl.FloatVector([-math.inf]),
        nativerl.FloatVector([math.inf]),
        nativerl.SSizeTVector([self.simulation.number_of_observations]),
    )

# Grab observations from simulation
def getObservation(self, agentId):
    return nativerl.Array(nativerl.FloatVector(self.simulation.get_observation()))
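
Because the observation space declares a fixed length, the list returned by your simulation's get_observation() should always have that many entries. The sketch below is one way to enforce this in plain Python. The helper name build_observation is hypothetical and is not part of Pathmind or nativerl. Rejecting NaN values is a design choice made here, not a documented requirement.

```python
import math


def build_observation(values, expected_length):
    """Convert raw simulation values into a fixed-length list of floats.

    Hypothetical helper: call it inside your simulation's get_observation().
    """
    observation = [float(value) for value in values]
    if len(observation) != expected_length:
        raise ValueError(
            f"expected {expected_length} observations, got {len(observation)}"
        )
    if any(math.isnan(value) for value in observation):
        raise ValueError("observations must not contain NaN")
    return observation
```

A simulation could call this as build_observation([queue_length, utilization], self.number_of_observations), where queue_length and utilization stand in for whatever state variables your model tracks.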

Rewards - Must return a double array.

def getReward(self, agentId):
    return self.simulation.get_reward()

Action Space - Specifies total possible actions and how the simulation should execute the next action.

# Specify total number of possible actions
def getActionSpace(self, i):
    return nativerl.Discrete(self.simulation.number_of_actions) if i == 0 else None

# Execute Action
def setNextAction(self, action, agentId):
    self.simulation.action = action.values()[0]
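
Inside the simulation, the stored action usually has to be translated into something the model understands. The sketch below assumes the discrete action arrives as an index from 0 up to, but not including, number_of_actions. That range is an assumption based on common reinforcement learning conventions, so check it against your setup. COMMANDS and decode_action are hypothetical names used only for illustration.

```python
# Hypothetical mapping from action index to a simulation command.
COMMANDS = ("move_left", "move_right")


def decode_action(action_index, commands=COMMANDS):
    """Translate a discrete action index into a named command."""
    index = int(action_index)  # the value may arrive as a float
    if not 0 <= index < len(commands):
        raise ValueError(f"action {action_index!r} is outside 0..{len(commands) - 1}")
    return commands[index]
```

With this mapping, number_of_actions would be len(COMMANDS), which keeps the action space and the decoding logic in agreement.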

Step - Advances the simulation by one step. Each time an action is triggered, this defines how the predicted action impacts the state of the environment.

def step(self):
    return self.simulation.step()
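
The methods above all delegate to a simulation object. As a rough illustration of what that object might look like, here is a minimal sketch of a toy simulation in plain Python, exposing the attributes and methods the wrapper calls (number_of_observations, number_of_actions, action, reset, step, get_observation, get_reward, is_done). The Simulation class, its line-walking logic, the reward values, and the run_episode helper are all hypothetical. The loop shows only one plausible order of calls and does not claim to reproduce how Pathmind drives training.

```python
class Simulation:
    """Hypothetical toy simulation: walk along a line until a target is reached."""

    number_of_observations = 2  # position and remaining distance
    number_of_actions = 2       # 0 = step left, 1 = step right

    def __init__(self, target=5, max_steps=50):
        self.target = target
        self.max_steps = max_steps
        self.action = 0
        self.reset()

    def reset(self):
        # Return to the starting state at the beginning of each episode.
        self.position = 0
        self.steps = 0

    def step(self):
        # Apply the most recently assigned action.
        self.position += 1 if self.action == 1 else -1
        self.steps += 1

    def get_observation(self):
        return [float(self.position), float(self.target - self.position)]

    def get_reward(self):
        # Small penalty per step, bonus on reaching the target.
        return 1.0 if self.position == self.target else -0.1

    def is_done(self):
        return self.position == self.target or self.steps >= self.max_steps


def run_episode(simulation, policy):
    """Run one episode with a policy that maps an observation to an action."""
    simulation.reset()
    total_reward = 0.0
    while not simulation.is_done():
        simulation.action = policy(simulation.get_observation())
        simulation.step()
        total_reward += simulation.get_reward()
    return total_reward
```

Running run_episode(Simulation(target=3), lambda observation: 1) exercises every method once per step, which is a quick way to check a simulation locally before connecting it to Pathmind.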

Advanced Functionality

For some use cases, you may require more advanced reinforcement learning techniques. These are listed below, but we won't describe them in detail in this article.

  • action masking
  • continuous actions
  • tuples
  • skip conditions and agentId
  • tracking simulation metrics