The AI was designed to play OpenAI Gym's CartPole-v0. It was not consistent, however: it usually performed moderately well, occasionally performed very well, and sometimes performed terribly. Typically, though, it was able to reach an average of 195 steps in a 500-run test.
Version v0.1 was a move from C# in the Unity3D environment to Python. Since much of the AI community uses Python, it seemed a good shift. The agent was based on a very simplified rlpAI architecture with a scikit-learn MLPClassifier as the centerpiece.
The agent trained the classifier (state as input, action as output) on some of the results of prior episodes. In the best test, the agent reached 200 steps, the maximum for CartPole-v0, within the first 10 episodes, then hit 200 steps on every episode after that for 500 episodes (bottom right chart).
Vertical axis: Number of steps completed in the CartPole-v0 game.
Horizontal axis: Episode number.
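
The training step described above, fitting the classifier on state-action pairs taken from prior episodes, could look roughly like the sketch below. This is not the original agent's code. The post does not say how "some of the results" were chosen, so keeping the longest-lasting fraction of episodes is an assumption, as are the helper names (`select_training_pairs`, `train_policy`), the hidden layer size, and the synthetic episodes that stand in for real CartPole-v0 recordings.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier


def select_training_pairs(episodes, keep_fraction=0.2):
    """Collect (state, action) pairs from the longest episodes.

    ASSUMPTION: "best" means "lasted the most steps"; the original
    selection rule is not documented.
    Each episode is a list of (state, action) pairs, where state is a
    4-element CartPole observation and action is 0 (left) or 1 (right).
    """
    ranked = sorted(episodes, key=len, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    states, actions = [], []
    for episode in ranked[:n_keep]:
        for state, action in episode:
            states.append(state)
            actions.append(action)
    return np.array(states), np.array(actions)


def train_policy(episodes, keep_fraction=0.2, seed=0):
    """Fit an MLPClassifier mapping state -> action on the kept pairs."""
    X, y = select_training_pairs(episodes, keep_fraction)
    # Hidden layer size and iteration cap are illustrative, not tuned.
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                        random_state=seed)
    clf.fit(X, y)
    return clf


# Synthetic stand-in for recorded episodes (hypothetical data): random
# states, with the action pushing toward the side the pole leans.
rng = np.random.default_rng(0)
episodes = []
for _ in range(20):
    length = int(rng.integers(5, 50))
    episode = []
    for _ in range(length):
        state = rng.normal(size=4)
        action = int(state[2] > 0)  # index 2 is the pole angle
        episode.append((state, action))
    episodes.append(episode)

policy = train_policy(episodes)

# Ask the trained policy for an action given a pole leaning slightly right.
action = int(policy.predict(np.array([[0.0, 0.0, 0.1, 0.0]]))[0])
```

In a real run, the episodes would be recorded from the environment, and the refit would be repeated as new episodes arrive, so the classifier keeps imitating the agent's most successful behavior so far.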