rlpAI v0.4

Version 0.4 is a significant advancement over the v0.3 agent. It combines the cumulative updates from the v0.3 branch (visual object detection, n-dimensional planning, and learning multi-step actions) with major code cleanup and execution speed improvements.

The system architecture (see below) is completely different from v0.3, which was a simple 2D Python class that contained and managed all the world objects. The agent is now a robot paired with an AI core – analogous to a real-world robot containing a computer that runs the AI. The 3D world, built in the Unity editor, provides a rich learning environment for the agent.


  • Operates a robot in a 3D Unity environment
  • Visually detects objects and models the environment
  • Learns by interacting with the environment
  • Plans a series of actions in multiple dimensions to meet an objective

System Architecture

The Unity3D environment is synchronously connected to the Python aiCore ‘brain’ through the Unity ML interface. At initialization, the environment sends an initial observation from the robot to the aiCore. The aiCore processes the observation and sends a set of action commands back to the robot to execute. The environment then advances five frames (about 80 ms of simulated time) while the robot executes the action commands, and sends an updated robot observation back to the aiCore. This cycle continues indefinitely.
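The cycle above can be sketched in plain Python. This is an illustrative stub, not the actual rlpAI code: the class names (`StubEnvironment`, `StubAICore`), method names, and the 16 ms frame time are assumptions made only to mirror the described protocol (five frames per cycle, about 80 ms of simulated time).

```python
# Hypothetical sketch of the synchronous observation/action cycle.
# Names and frame timing are assumptions, not the real rlpAI API.

FRAMES_PER_STEP = 5    # environment advances 5 frames per cycle
FRAME_TIME = 0.016     # assumed ~16 ms per frame -> ~80 ms per cycle

class StubEnvironment:
    """Stands in for the Unity3D side of the Unity ML interface."""
    def __init__(self):
        self.sim_time = 0.0

    def initial_observation(self):
        # At initialization, the environment sends the first observation.
        return {"sim_time": self.sim_time}

    def step(self, actions):
        # Robot executes the action commands over 5 frames,
        # then an updated observation goes back to the aiCore.
        self.sim_time += FRAMES_PER_STEP * FRAME_TIME
        return {"sim_time": self.sim_time}

class StubAICore:
    """Stands in for the Python aiCore 'brain'."""
    def decide(self, observation):
        # The real aiCore would run detection and planning here.
        return {"move": "forward"}

def run(n_cycles):
    env, core = StubEnvironment(), StubAICore()
    obs = env.initial_observation()
    for _ in range(n_cycles):
        actions = core.decide(obs)  # aiCore processes the observation
        obs = env.step(actions)     # robot executes for ~80 ms sim time
    return obs
```

Because the loop is synchronous, simulated time only advances inside `env.step`: three cycles of `run(3)` accumulate about 0.24 s of simulated time, no matter how long `decide` takes in real time.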

The robot can interact with other objects in the environment which have a data port installed, such as the battery charger. This is not unlike R2D2 connecting to the Death Star to stop the trash compactor. Use of the data port was demonstrated in the Multi-step Planning post video.

Future System Improvements:

  • Synchronous operation with the environment (one state update per time step) means the environment runs only as fast as the agent can process each update – usually about 3 frames per second. It also means the agent takes zero time to think, with respect to the passage of time in the environment. This doesn’t represent how the real world works. Ideally, the environment and agent would run asynchronously, and the agent would have to keep up with the world.
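One way to picture that asynchronous alternative is a sketch where the world advances frames on its own clock and the agent only observes whatever frame is current once it finishes thinking. Everything here is hypothetical (class names, frame times); it is not the planned rlpAI design, just an illustration of the difference.

```python
# Hypothetical sketch: the world no longer waits for the agent.
import threading
import time

class AsyncWorld:
    """Advances frames on its own clock, regardless of the agent."""
    def __init__(self, frame_time=0.005):
        self.frame = 0
        self.frame_time = frame_time   # assumed 5 ms per frame
        self._lock = threading.Lock()
        self._running = True

    def run(self):
        while self._running:
            time.sleep(self.frame_time)
            with self._lock:
                self.frame += 1

    def observe(self):
        with self._lock:
            return self.frame

    def stop(self):
        self._running = False

world = AsyncWorld()
thread = threading.Thread(target=world.run, daemon=True)
thread.start()

observations = []
for _ in range(3):
    time.sleep(0.02)                   # "thinking" now costs real time
    observations.append(world.observe())  # frames passed while we thought

world.stop()
thread.join()
```

Unlike the synchronous loop, consecutive observations here skip frames: the world moved on while the agent was thinking, so the agent has to keep up.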