Visual Object Detection

The agent can now visually detect objects and add them to its internal model of the world. All the environmental hints that were pushed to the agent at startup have been removed, so it has to discover everything visually on its own.

In the video below, the agent demonstrates using its new eyes to navigate around obstacles and to find a waypoint hidden out of sight. It starts knowing nothing, as usual, and quickly learns it can’t go through walls. It then searches for the hidden waypoint and learns that it’s at the end of a U-shaped hallway outside the room. It then returns to the starting point and repeats the run, this time much more quickly since it now knows about the walls.

The visual subsystem uses image streams from the color camera and the depth camera on the agent’s robot to detect objects, though it doesn’t need both for basic detection. The depth data is roughly similar to what a LiDAR system would produce.
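The post doesn’t spell out how the depth stream becomes geometry, but a common approach is to back-project each depth pixel through a pinhole camera model. Here is a minimal sketch of that idea; the intrinsics fx, fy, cx, cy are assumed parameter names, not values from the project:

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters) into camera-frame 3D points.

    Pinhole-camera sketch: (fx, fy) are the focal lengths and (cx, cy)
    the principal point, all in pixels. Returns an (N, 3) array, which
    is the LiDAR-like cloud mentioned above.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # drop pixels with no depth reading
```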

When an object is detected, the subsystem calculates a bounding box (above) and passes it to the environmental model, which then tracks the object as a surface point cloud. The images below show the agent’s environmental model compared to the actual environment.
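As a rough sketch of that handoff (the function and parameter names here are illustrative, not the project’s actual API), the detection’s bounding box selects which depth pixels get back-projected into the surface cloud the model tracks, assuming the color and depth frames are registered to the same pixel grid:

```python
import numpy as np

def bbox_to_surface_cloud(depth, bbox, fx, fy, cx, cy):
    """Crop the depth image to a detection's bounding box and
    back-project the cropped pixels into an (N, 3) surface cloud."""
    u0, v0, u1, v1 = bbox              # pixel corners of the box
    crop = depth[v0:v1, u0:u1]
    u, v = np.meshgrid(np.arange(u0, u1), np.arange(v0, v1))
    valid = crop > 0                   # ignore pixels with no reading
    z = crop[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)
```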

Object Detection vs. Recognition

Before it could see, the agent knew that a charger, for example, was distinct from a wall. It could navigate to the charger and learn to charge. Now that the environmental hints are gone, it doesn’t know that a charger is a charger, so learning to charge is harder: it can’t visually differentiate between a wall and a charger. It detects the charger but can’t recognize it (yet).
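To make that concrete, here is a hypothetical shape for what the environmental model can store about a detection today: pure geometry, with the label slot empty until recognition exists. None of these names come from the project.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class TrackedObject:
    """A detected-but-unrecognized object: geometry only.

    A wall and a charger produce entries with the same shape, which
    is exactly why the agent can no longer tell them apart."""
    object_id: int
    cloud: np.ndarray            # (N, 3) surface point cloud
    label: Optional[str] = None  # stays None until recognition (v0.5)
```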

Detecting objects is a precursor to recognizing them. Because the design requirement is that the agent learn everything on its own, it can’t use a pre-trained convolutional neural network (CNN) to recognize objects; it would be impossible to pre-train a CNN on every possible object the agent could ever see. That means it will have to train itself, and to train itself it first needs to be able to detect an object before it can learn to recognize it. Recognition is planned for version 0.5. For now, the agent knows an object is there and knows how to navigate around it, but it doesn’t know exactly what the object is.
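One plausible way detection could bootstrap recognition (a sketch under assumed names, not the project’s stated plan): bank an image crop of every detection, so that by v0.5 the agent has its own unlabeled dataset to cluster or self-label and train a classifier on.

```python
def harvest_crops(color_image, detections, dataset):
    """Save an unlabeled image crop for each detection.

    Every detected object contributes a training example; the agent
    can later cluster or self-label these crops to train its own
    recognizer instead of relying on a pre-trained CNN.
    """
    for det in detections:
        u0, v0, u1, v1 = det.bbox
        dataset.append(color_image[v0:v1, u0:u1])
```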

And so it goes: as new functionality is added to replace old duct tape, the AI has to get smarter to deal with its new self. Spiral development.