The Environment

General

The environment is a simplified ‘reality simulator’ running in the Unity 3D game engine.

Features:
  • Simulates physics and object collisions
  • Provides cameras for robot vision
  • Environments can easily be swapped to focus on specific aspects of the agent
  • Behaviors and functionality can be programmed into objects
  • Unlimited world size – the world can be as big as the agent can explore
  • Level of detail can be tailored to the agent’s needs
  • The Unity machine learning agents add-on provides a Python API
Properties:
  • Partially (vs. Fully) Observable:   While this is also a function of the robot’s sensor limits, the world is a very large place, and the agent does not have access to all the information in the environment as a matter of course (as opposed to a chess-playing agent, which knows the entire chess board state at all times).  In fact, making an environmental observation is entirely up to the agent, as the environment doesn’t care about any agents within it.
  • Stochastic (vs. Deterministic):   Many of the early test environments have been built as static (as in not-moving) single-agent environments, but they are not entirely deterministic.  The agent can’t predict the exact outcome of all actions all the time.  As a simple example, the first time the agent runs into a wall, the outcome of moving forward is completely unexpected.  The intent is to make the environment more dynamic (things moving around) and more stochastic as it evolves.
  • Sequential (vs. Episodic):   The general environment is sequential.  Once it starts, it keeps going.  There can be episodic elements, however: games that the agent can play through the data port in the environment (such as Cart-Pole, Tic-Tac-Toe, etc.) are mostly episodic in nature.
  • Static (vs. Dynamic):   If the environment can change while an agent is deliberating, then the environment is said to be dynamic for the agent; otherwise, it is static.  The environment is technically static since the agent is allowed to process one full observation/action cycle per time step of the Unity physics engine.   For the time being, the agent is tightly coupled to this cycle, though the two will likely be decoupled in the future.  This tight coupling causes the environment frame rate to vary based on how much the agent has to think through each frame (annoying), but allows development to move forward.
  • Continuous (vs. Discrete):   States in the environment sweep through a range of continuous values and do so smoothly over time.  For example, the agent or any object can be at location (0, 0, 0), or (1, 1, 1), or any location between those two points, down to the floating-point precision of the machine.  States observed through the data port are allowed to be discrete with a limited number of values, such as two-state indicators.
  • Single Agent (vs. Multi-agent):   The environment is capable of having any number of agents, but most scenarios so far have only included a single agent.  Having multiple agents in the environment will certainly happen in the future, including one or more humans via virtual reality.
  • 3-Dimensional (vs. 2-Dimensional):   The environment is three-dimensional.  An example of a two-dimensional environment would be a board game such as chess or checkers.
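
The tightly coupled observation/action cycle described under Static (vs. Dynamic) can be illustrated with a small sketch.  This is not the project’s code and does not use the Unity Python API; ToyWorld, forward_agent, and run_lockstep are hypothetical names, the world is reduced to one dimension, and the step size and sensor range are assumed values.  It only shows the idea that the world waits for the agent each step, that the agent sees just part of the state, and that driving forward into a wall produces an outcome the agent did not anticipate.

```python
from dataclasses import dataclass


@dataclass
class ToyWorld:
    """Hypothetical 1-D stand-in for the Unity scene: one agent and one wall."""
    agent_x: float = 0.0        # continuous position of the agent
    wall_x: float = 1.0         # the agent cannot move past this point
    sensor_range: float = 0.25  # assumed sensor limit
    time_step: float = 0.02     # assumed fixed physics step, in seconds
    steps: int = 0

    def observe(self):
        """Partial observation: the wall is reported only when within sensor range."""
        distance = self.wall_x - self.agent_x
        wall_seen = distance <= self.sensor_range
        return {"x": self.agent_x, "wall_distance": distance if wall_seen else None}

    def step(self, velocity):
        """Advance the physics by one time step; the wall clips forward motion."""
        self.agent_x = min(self.agent_x + velocity * self.time_step, self.wall_x)
        self.steps += 1


def forward_agent(observation):
    """Hypothetical agent that ignores its observation and always drives forward."""
    return 5.0


def run_lockstep(world, agent, n_steps):
    """One full observation/action cycle per physics step (tight coupling)."""
    trace = []
    for _ in range(n_steps):
        observation = world.observe()  # the world does not change while the agent thinks
        action = agent(observation)
        world.step(action)             # physics advances only after the agent has acted
        trace.append(world.agent_x)
    return trace


world = ToyWorld()
trace = run_lockstep(world, forward_agent, 15)
```

Because world.step is called only after the agent returns, a slow agent slows the whole loop, which is the frame-rate effect noted above.  Decoupling would mean advancing the physics on its own clock and letting the agent act on whatever observation is current.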

Software/Hardware:

  • Unity 3D game engine with the Machine Learning plugin (CPU and GPU)
  • C# for the robot, sensors, and game object controllers (CPU)

Images: