Develop an Artificial General Intelligence (AGI) agent which can learn to function in a complex and somewhat arbitrary environment, make rational decisions and predictions in order to solve problems and meet goals, and eventually interact with and communicate with people through virtual reality.
The agent learns from environmental observations and determines the best actions to take based on what was learned in order to satisfy whatever is motivating it at the time, interacting with objects in the local environment as needed.
The system is general, and cannot have prior knowledge of the environment or itself at startup. Training can be used to accelerate learning, but it doesn’t have to be trained. It must learn everything it needs to know to operate.
The MaCH-SR1 hybrid rocket launch vehicle was a student-driven project under development at the University of Colorado in Boulder in 2001-2002. The core team consisted of eight students, but there were also advisors and many other interested people who contributed in one way or another.
The archived website is here. It was built with late-night telnet sessions and HTML, before WordPress (or anything like it) was really a thing.
If you’ve watched any of the videos of the agent going around doing things, you’ve seen it shoot lasers randomly on occasion. You might be wondering if we really want robots all around us with frickin’ lasers on their heads. Fair enough.
So why put a set of laser cannons on an AGI agent? At least two good reasons:
From a technical perspective, it adds complexity to the experience the agent has in the environment. Firing the lasers heats the bot rapidly, and the bot then cools down slowly. Firing also drains the battery rapidly. The agent now has to figure out whether any of those changing states were causal to anything it intends to accomplish. This forces complexity into the scenario, which is good for learning.
From a safety perspective, it provides a concrete way for the agent to do something wrong. What’s better: build an AI, put it in a robot in the real world, and see if it shoots somebody, or put the AI in a simulated bot in a simulated world and see if it shoots somebody? When (not if) it does, the simulated world is the perfect test bed for making the safety systems more robust before the AI has a chance to do something bad in the real world.
For example, what if the agent learns that to get past a certain type of obstacle, it can either go around it or just shoot it? Then, what if it decides to try that on a human and see what happens? There are many issues surrounding AI safety that need to be dealt with, many of them much more subtle than this scenario. Giving it lasers is one way to approach the problem. More generally, the idea is to set the agent up so that it is able to fail, in order to see how it creatively fails, so that more robust safety systems can be developed.
There are all sorts of ways to imagine an AI running around the pool with knives. This type of approach doesn’t cover all bases, not even close. But it adds to the pile of test methods that will be needed as it gets smarter.
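The heat and battery side effects of firing described above can be sketched as a small state update. This is only an illustration: the `BotState` class, the `step` function, and every constant are assumptions made for the example, not values taken from the actual bot.

```python
from dataclasses import dataclass

# All names and constants here are hypothetical, chosen only for illustration.
HEAT_PER_SHOT = 20.0     # heat added instantly by one shot
COOL_PER_STEP = 1.0      # heat shed per time step (slow cooling)
BATTERY_PER_SHOT = 5.0   # battery percent drained by one shot


@dataclass(frozen=True)
class BotState:
    heat: float = 0.0
    battery: float = 100.0


def step(state: BotState, fire: bool) -> BotState:
    """Advance one time step, optionally firing the lasers."""
    heat = state.heat
    battery = state.battery
    if fire:
        heat += HEAT_PER_SHOT
        battery -= BATTERY_PER_SHOT
    # Cooling is applied every step, so the heat from one shot lingers.
    heat = max(0.0, heat - COOL_PER_STEP)
    return BotState(heat=heat, battery=max(0.0, battery))
```

A single shot changes two state variables at once, and the heat persists for many steps afterward. That is the kind of delayed, overlapping effect the agent has to untangle when deciding what was causal to its goal.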
The agent has been restricted to solving one-dimensional problems so far, but now the planning engine has been updated to allow n-dimensional planning. One simple example of this is being able to intentionally navigate around obstacles, whereas before it could not.
There are significant changes on the whiteboard for the planning engine in the near future which will add more versatility and functionality to the agent’s reasoning ability.
The environment was also recently updated to a boot camp-style training course, to facilitate the initial curriculum training the agent will need before it can do more complex tasks later on.
The AI can navigate around a room between waypoints using fwd/cw/ccw controls. OpenAI games are in place, but sync issues are not debugged yet.
Curriculum training is accomplished with progressive goals designed to help the agent learn specific actions and discover the importance of prerequisite state requirements, from which it builds plans to meet objectives.
After training, this agent demonstrated the ability, upon receiving an internal motivation to charge its battery (a ‘low-battery’ alarm), to find a charger, connect to it, charge, disconnect, and continue on.
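The charging sequence above is a chain of prerequisite states, and building a plan from such a chain can be sketched with simple backward chaining. This is a minimal sketch under stated assumptions: the `ACTIONS` table, the state names, and `build_plan` are all hypothetical, and in the real agent the prerequisites are learned rather than written down by hand.

```python
# Hypothetical table: action name -> (prerequisite states, resulting state).
# In the agent described here, this knowledge would be learned, not hard-coded.
ACTIONS = {
    "find_charger": (set(), "at_charger"),
    "connect": ({"at_charger"}, "connected"),
    "charge": ({"connected"}, "charged"),
    "disconnect": ({"charged"}, "free_to_move"),
}


def build_plan(goal, known_states, actions):
    """Backward-chain from the goal state and return actions in execution order.

    Assumes the prerequisite table has no cycles.
    """
    satisfied = set(known_states)  # copy so the caller's set is not modified
    plan = []

    def achieve(state):
        if state in satisfied:
            return
        for name, (prereqs, result) in actions.items():
            if result == state:
                for prereq in sorted(prereqs):
                    achieve(prereq)  # satisfy prerequisites first
                plan.append(name)
                satisfied.add(state)
                return
        raise ValueError(f"no known action produces state {state!r}")

    achieve(goal)
    return plan
```

Calling `build_plan("free_to_move", set(), ACTIONS)` walks back from the goal to the first action with no unmet prerequisite, then returns the actions in the order they must be executed.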
Moved the project back into the Unity 3D environment from the 2D Python environment it was using. The agent AI is written in Python; the agent bot is in C#.
This video shows the agent learning how to move in the environment. It can only move forward and turn clockwise or counterclockwise. It can’t move backwards or strafe to the sides, so it has to learn how to point in the correct direction before moving forward to get there. The agent learns quickly which actions are most likely to help it achieve its goals.
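To make the movement constraint concrete, here is a hand-written sketch of a bot that can only go forward or turn, together with the turn-then-move behavior the agent has to discover for itself. The step sizes, the pose representation, and the function names are assumptions for illustration. The agent is not given a rule like `choose_action`; it has to learn the equivalent from experience.

```python
import math

TURN_STEP = math.radians(15)  # assumed turn per cw/ccw action
MOVE_STEP = 1.0               # assumed distance per fwd action


def apply_action(pose, action):
    """Return the new (x, y, heading) after one fwd/cw/ccw action."""
    x, y, heading = pose
    if action == "fwd":
        return (x + MOVE_STEP * math.cos(heading),
                y + MOVE_STEP * math.sin(heading),
                heading)
    if action == "ccw":
        return (x, y, heading + TURN_STEP)
    if action == "cw":
        return (x, y, heading - TURN_STEP)
    raise ValueError(f"unknown action {action!r}")


def heading_error(pose, target):
    """Signed angle from the current heading to the target, in (-pi, pi]."""
    x, y, heading = pose
    desired = math.atan2(target[1] - y, target[0] - x)
    return math.atan2(math.sin(desired - heading), math.cos(desired - heading))


def choose_action(pose, target):
    """Turn until roughly facing the target, then move forward."""
    error = heading_error(pose, target)
    if abs(error) <= TURN_STEP / 2:
        return "fwd"
    return "ccw" if error > 0 else "cw"


def drive_to(pose, target, tolerance=0.5, max_steps=200):
    """Apply actions until within `tolerance` of the target; return the action list."""
    actions = []
    for _ in range(max_steps):
        if math.dist(pose[:2], target) < tolerance:
            break
        action = choose_action(pose, target)
        pose = apply_action(pose, action)
        actions.append(action)
    return actions, pose
```

Starting at the origin facing along the x axis with a target straight "up" the y axis, this policy spends its first several actions only turning, which is the behavior visible in the video once the agent has learned it.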
The AI can navigate around a cluttered room using fwd/cw/ccw movements to a series of waypoints in level0-6.csv, but it cannot intentionally navigate around large obstacles. OpenAI games CartPole_v0, MountainCar_v0, and LunarLander_v2 are implemented as world objects, and the agent can navigate to and play these games with varied success. The agent has been observed to make perfect landings in LunarLander_v2 and has scored above the threshold on the other games within a few tries (generally fewer than 4); however, consistency is lacking after the initial successes.
The AI was given 4 waypoint goals (state = ‘GPS’) in Level 0-0. It learned how to navigate rapidly (within about 10 steps) and then navigated to each of the 4 goal locations using a simple difference planning engine. The planning engine was kind of a hack, but it was added to the AI to demonstrate that the CE/CM works.
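The post does not describe how the difference planning engine works internally. One plausible reading, sketched here purely as an assumption, is that the agent first learns the average state change each action causes, then picks whichever action leaves the smallest remaining difference between the current state and the goal. The action names, `learn_effects`, and `plan_step` are hypothetical.

```python
def learn_effects(transitions):
    """Average the observed state change for each action.

    `transitions` is an iterable of (state_before, action, state_after),
    where states are equal-length tuples of numbers (for example, a GPS position).
    """
    sums, counts = {}, {}
    for before, action, after in transitions:
        delta = [a - b for a, b in zip(after, before)]
        if action in sums:
            sums[action] = [s + d for s, d in zip(sums[action], delta)]
        else:
            sums[action] = delta
        counts[action] = counts.get(action, 0) + 1
    return {a: tuple(s / counts[a] for s in sums[a]) for a in sums}


def plan_step(state, goal, effects):
    """Pick the action whose learned effect minimizes the remaining squared difference."""
    def remaining(action):
        return sum((g - (s + e)) ** 2
                   for s, g, e in zip(state, goal, effects[action]))
    return min(effects, key=remaining)
```

With only a handful of observed transitions per action, `learn_effects` already has enough information for `plan_step` to head toward a waypoint, which would be consistent with the agent learning to navigate within about 10 steps.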
The AI was designed to play OpenAI CartPole-v0. It was not consistent, however: it usually performed moderately well, occasionally very well, and sometimes terribly. Typically, though, it was able to reach an average of 195 steps in a 500-run test.
Version v0.1 was a move from C# in the Unity3D environment to Python. Since much of the AI community uses Python, it seemed a good shift. The agent was based on a very simplified rlpAI architecture with a scikit-learn MLPClassifier as the centerpiece.
The agent trained the classifier (state input, action output) with some of the results of prior episodes. In the best test, the agent reached 200 steps within the first 10 episodes, then hit 200 steps on every episode after that for 500 episodes (bottom right chart).
Vertical axis: Number of steps completed in the CartPole-v0 game. Horizontal axis: Episode number.
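The post says the classifier was trained on "some of the results of prior episodes" but not which ones. The sketch below assumes the simplest choice, keeping the longest episodes, and turns them into state inputs and action labels. The function name, the `keep` parameter, and the selection rule are all assumptions, not the author's documented method.

```python
def select_training_pairs(episodes, keep=3):
    """Build (X, y) from the `keep` longest episodes.

    Each episode is a list of (state, action) pairs. In CartPole a longer
    episode means the pole stayed up longer, so length is used as the score.
    Selecting by length is an assumption made for this sketch.
    """
    best = sorted(episodes, key=len, reverse=True)[:keep]
    X = [list(state) for episode in best for state, _ in episode]
    y = [action for episode in best for _, action in episode]
    return X, y
```

The returned `X` (one row of state values per step) and `y` (the action taken at that step) have the shape that a scikit-learn classifier's `fit(X, y)` expects, so they could be passed to an MLPClassifier after each batch of episodes.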