The Environment


The environment is a simplified ‘reality simulator’ running in the Unity 3D game engine.

  • Simulates physics and object collisions
  • Provides cameras for robot vision
  • Environments can easily be swapped to focus on specific aspects of the agent
  • Behaviors and functionality can be programmed into objects
  • Unlimited world size – the world can be as big as the agent can explore
  • Level of detail can be tailored to the agent’s needs
  • The Unity machine learning agents add-on provides a Python API
  • Partially (vs. Fully) Observable:   While this is partly a function of the robot's sensor limits, the world is a very large place, and the agent does not have access to all the information in the environment as a matter of course (as opposed to a chess-playing agent, which knows the entire chess board state at all times).  In fact, making an observation is entirely up to the agent; the environment does not push state to any agents within it.
  • Stochastic (vs. Deterministic):   Many of the early test environments have been built as static (nothing moves), single-agent environments, but they are not entirely deterministic: the agent cannot predict the exact outcome of every action.  As a simple example, the first time the agent runs into a wall, the outcome of moving forward is completely unexpected.  The intent is to make the environment more dynamic (things moving around) and more stochastic as it evolves.
  • Sequential (vs. Episodic):   The general environment is sequential.  Once it starts, it keeps going.  There can be episodic elements, however: games that the agent can play through the environment's data port, such as Cart-Pole or Tic-Tac-Toe, are mostly episodic in nature.
  • Dynamic (vs. Static):   If the environment can change while an agent is deliberating, then the environment is said to be dynamic for the agent; otherwise, it is static.  The environment was static up through v0.4 and is now dynamic: the environment runs as a server, and client agents connect and exchange state/observation data asynchronously.
  • Continuous (vs. Discrete):   States in the environment sweep through a range of continuous values and do so smoothly over time.  For example, the agent or any object can be at location (0,0,0), or (1, 1, 1), or any location between those two points, up to the floating point limit of the machine.  States observed through the data port are allowed to be discrete with a limited number of values, such as two-state indicators.
  • Single Agent or Multi-agent:   The environment is capable of supporting multiple agents, limited by modest computer hardware, but most scenarios so far have only included a single agent.  Additionally, the environment can support a human in the environment via virtual reality.
  • 3-Dimensional (vs. 2-Dimensional):   The environment is three dimensional.  An example of a two dimensional environment would be a board game such as chess or checkers.
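Two of the properties above, partial observability and continuous state, can be made concrete with a small sketch.  The Python fragment below is illustrative only; the `world` dictionary, `observe` function, and `sensor_range` parameter are hypothetical names, not part of the actual environment code.  The agent sees only the objects within its sensor range, at arbitrary floating-point 3-D coordinates, while the rest of the world stays hidden.

```python
import math

def observe(world, agent_pos, sensor_range):
    """Return only the objects within sensor_range of the agent.

    The world is far larger than what the agent can sense, so the
    observation is partial, and positions are continuous floats.
    """
    visible = {}
    for name, pos in world.items():
        if math.dist(agent_pos, pos) <= sensor_range:
            visible[name] = pos
    return visible

# A tiny 3-D world: most of it lies outside the agent's sensor range.
world = {
    "wall": (1.0, 0.0, 0.5),
    "ball": (0.3, 0.2, 0.1),
    "tree": (50.0, 0.0, 75.0),   # far away -- not observed from the origin
}
obs = observe(world, agent_pos=(0.0, 0.0, 0.0), sensor_range=2.0)
print(sorted(obs))  # → ['ball', 'wall']
```

Because the environment never pushes state, a real agent would call something like `observe` on every deliberation step; what it gets back depends entirely on where it is and what its sensors can reach.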

Technology Stack

  • Unity 3D game engine with the Machine Learning plugin (CPU and GPU)
  • C# for the robot, sensors, and game object controllers (CPU)
  • Python for the aiCore
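As a rough illustration of the server/client split described earlier (environment as server, agent as client, asynchronous state exchange), the sketch below uses only the Python standard library.  The JSON line protocol, message fields, and port handling are invented for illustration and do not reflect the actual Unity/ML-Agents wire format.

```python
import json
import socket
import threading

HOST = "127.0.0.1"

def serve_one(server_sock):
    """Toy environment server: send one observation, receive one action."""
    conn, _ = server_sock.accept()
    with conn:
        obs = {"position": [0.0, 0.0, 0.0], "tick": 1}
        conn.sendall((json.dumps(obs) + "\n").encode())
        action = json.loads(conn.recv(1024).decode())
        conn.sendall((json.dumps({"ack": action["move"]}) + "\n").encode())

server = socket.socket()
server.bind((HOST, 0))          # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=serve_one, args=(server,), daemon=True).start()

# Client agent: connect, read the observation, reply with an action.
with socket.create_connection((HOST, port)) as client:
    reader = client.makefile("r")
    observation = json.loads(reader.readline())
    client.sendall((json.dumps({"move": "forward"}) + "\n").encode())
    ack = json.loads(reader.readline())

print(observation["tick"], ack["ack"])
```

Running the server on its own thread mirrors the dynamic property: the environment keeps its own clock and does not wait on any one agent, so each client simply connects and exchanges data at its own pace.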