Curriculum training with progressive learning goals is used to help the agent learn specific actions, and to allow it to discover the importance of the order of operations in a multi-step operation. Note that curriculum training doesn’t teach the agent how to do a task, it just puts the agent in a situation to learn something it needs to know. Its up to the agent to learn how it all goes together.
After being trained on how to use the charger, this agent demonstrates (in the video below) the ability charge its batter when needed. To do this, the agent must:
Find a charger,
Connect to the charger via the dataport
Push the ‘Charge’ button using the dataport until the battery is charged
Disconnect from the charger dataport when complete
The AI can navigate around a cluttered room using fwd/cw/ccw movements to a series of waypoints in level0-6.csv, but it cannot intentionally navigate around large obstacles. OpenAI games CartPole_v0, MountainCar_v0, and LunarLander_v2 are implemented as world objects and the agent can navigate to and play these games, with varied success. The agent has been observed to make perfect landings in LunarLander_v2, and has scored above the threshold on the other games within a few tries (less than 4 generally), however consistency is lacking after initial successes.
The AI was given 4 waypoint goals (state = ‘GPS’) in Level 0-0, and it was able to learn how to navigate rapidly (within about 10 steps), and then navigated to each of the 4 goal locations using a simple difference planning engine, which was kind of a hack but was added to the AI for the purpose of demonstrating that the CE/CM works.
The AI was designed to play OpenAI Cartpole-v0. It was not consistent however – usually performing moderately well, on occasion performing very well. Sometimes it performed terribly. Typically though it was able to reach an average of 195 steps in a 500-run test.
Version v0.1 was a move from C# in the Unity3D environment to python, since much of the AI community uses python, it seemed a good shift. The agent was based on a very simplified rlpAI architecture with a scikit-learn MLP Classifier as the centerpiece.
The agent trained the classifier (state input, action output) with some of the results of prior episodes. In the best test, the agent reached 200 steps in less than the first 10 episodes, then hit 200 steps on every episode after that for 500 episodes (bottom right chart).
Vertical axis: Number of steps completed in the Cartpole-V0 game. Horizontal axis: Episode number.