Visual Perception for Common Sense Robots

May 2023

Commercially available robots are becoming increasingly versatile in the contexts in which they can be deployed. A few years ago, robots were confined to cages and operated in isolation; today, collaborative robots share tasks with humans in a more natural way.

Yet despite this growing number of situations in which humans and robots work alongside each other, truly hybrid intelligence setups, in which robots and humans collaborate intuitively in dynamic environments, are still nowhere to be seen. By combining computer vision algorithms, knowledge graphs and large language models, we equip robots with a deep contextual understanding of their visual surroundings, so that they can perform actions and interact with humans intuitively.

Contextual understanding in everyday situations

In our everyday lives, we as humans continuously, and most often unconsciously, contextualize the environment in which we find ourselves.

For example, if we are the last person to enter a meeting room, we scan the room for a free spot to sit in. To do so, we weigh various pieces of information: whether there are free chairs, whether there are signs that some of the unoccupied chairs are already reserved (e.g. a notebook placed on the table or a jacket hung over the back of a chair), and whether there is a reason that some of the free chairs should remain free (e.g. because someone sitting there would block the view of the presentation).

This type of common-sense reasoning is part of humans’ everyday interactions with the world. It is intuitive, almost mechanical, and thereby enables humans to interact efficiently in highly diverse contexts.

When it comes to robots, their capabilities to perceive a scene, understand it, infer knowledge from it and adjust their behavior accordingly are still in their infancy. This lack of contextual understanding puts severe limits on the complexity of the tasks robots can effectively master.

Enabling contextual understanding in robots

At the Interactions lab, we are developing technologies that provide robots with contextual understanding and have the potential to unleash a new wave of robotic applications.

To do so, we use computer vision algorithms that let robots visually observe their environment and detect relevant objects in their surroundings. A knowledge graph then allows the robot to understand the relationships among these objects and to infer new knowledge from them.
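As a rough illustration of this idea, the sketch below turns a detector’s bounding boxes into a small graph of spatial relations. All names, coordinates and the on-top-of heuristic are simplified assumptions for this post, not our production pipeline:

```python
# Sketch (illustrative only): build "X on Y" relations from bounding boxes.
# A real system would use a trained object detector and a proper graph store.

# Detections as an object detector might return them: (label, box),
# with boxes as (x1, y1, x2, y2) in pixels and y growing downwards.
detections = [
    ("table",      (120, 300, 520, 480)),
    ("smartphone", (200, 270, 260, 305)),
    ("chair",      (540, 320, 700, 560)),
    ("backpack",   (560, 250, 660, 330)),
]

def rests_on(a, b, tolerance=20):
    """Rough spatial heuristic: box a sits on top of box b."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    horizontal_overlap = ax1 < bx2 and bx1 < ax2
    # a's bottom edge is at (or close to) b's top edge
    return horizontal_overlap and abs(ay2 - by1) <= tolerance

# The knowledge graph as a set of (subject, predicate, object) triples.
graph = {
    (label_a, "on", label_b)
    for label_a, box_a in detections
    for label_b, box_b in detections
    if label_a != label_b and rests_on(box_a, box_b)
}

print(graph)  # {('smartphone', 'on', 'table'), ('backpack', 'on', 'chair')}
```

In practice, such relations would come from learned spatial-relation models rather than box heuristics, but the resulting graph is what all further reasoning operates on.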

How this works in practice can be illustrated with ReliaBot, one of our robots, which we have equipped with the ability to understand the context in which it operates. ReliaBot is a mobile robot that roams around buildings to monitor and manage building occupancy. By visually inspecting a room, ReliaBot can reliably infer whether it is occupied, not only from whether someone is currently sitting in it, but also from visual cues in the environment: a smartphone placed on the table and a backpack on a chair indicate that the room is still in use, even if the owner of the smartphone and the backpack has briefly left (e.g. to grab a coffee). ReliaBot can infer this type of information at “first glance”.
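The “first glance” inference can then be pictured as a handful of rules over such a scene graph. The following is a deliberately simplified sketch; the list of personal items, the labels and the rules are illustrative assumptions:

```python
# Sketch (illustrative only): one-shot occupancy rules over a scene graph.
PERSONAL_ITEMS = {"smartphone", "backpack", "jacket", "notebook"}

def infer_occupancy(graph, detected_labels):
    """Classify a room from a single visual inspection."""
    if "person" in detected_labels:
        return "occupied"                        # someone is in the room now
    for subject, predicate, _ in graph:
        if predicate == "on" and subject in PERSONAL_ITEMS:
            return "in use"                      # belongings left behind
    return "free"

# One inspection of the meeting room: nobody present, but belongings remain.
graph = {("smartphone", "on", "table"), ("backpack", "on", "chair")}
labels = {"table", "chair", "smartphone", "backpack"}
print(infer_occupancy(graph, labels))            # -> 'in use'
```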

By also equipping ReliaBot with a temporal understanding of scenes, more complex use cases can be realized. For example, if ReliaBot passes the meeting room a second time 30 minutes after the first inspection, and the smartphone detected during the first inspection is still in exactly the same place, it can infer that someone forgot their phone in the meeting room and that the room is no longer in use. Thanks to its temporal reasoning ability, it can also “remember” the last person who used the room and interacted with the phone (e.g. by holding it), and thereby infer who the owner of the smartphone must be. Based on this knowledge, ReliaBot could put the phone in a safe place and inform the owner.
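A minimal sketch of this temporal rule is shown below. The observation records, positions and the “holder” field are made-up assumptions for illustration; a real system would track object identity across visits with re-identification rather than exact positions:

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(minutes=30)

# Two inspections of the same room: what was seen, where, and who last held it.
observations = {
    datetime(2023, 5, 8, 10, 0):  {"smartphone": {"position": (1.2, 3.4), "holder": "Alice"}},
    datetime(2023, 5, 8, 10, 30): {"smartphone": {"position": (1.2, 3.4), "holder": None}},
}

def find_forgotten_items(observations):
    """Flag objects that sat in the same spot across two inspections."""
    (t1, first), (t2, second) = sorted(observations.items())
    forgotten = []
    for name, seen_now in second.items():
        seen_before = first.get(name)
        if (seen_before
                and seen_before["position"] == seen_now["position"]
                and t2 - t1 >= STALE_AFTER):
            # The last person seen holding the item is its presumed owner.
            forgotten.append((name, seen_before["holder"]))
    return forgotten

print(find_forgotten_items(observations))  # -> [('smartphone', 'Alice')]
```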

While ReliaBot serves as an illustrative example of how robots can be empowered to fulfill more complex tasks, the scope of possible applications for context-aware robots is vast, ranging from industrial manufacturing to customer care in shopping centers. We therefore expect this technology to open up a large economic potential that cannot currently be realized with robots.

For more information on our work in this field, please reach out to us.