Overview
- A prototype from TUM’s Learning Systems and Robotics Lab combines image recognition with a language model to predict likely locations for missing objects.
- The system constructs a centimeter-accurate, continuously updated 3D map from camera-derived depth cues as it navigates indoor spaces.
- Internet-sourced relationships between household items are translated into the robot’s internal representation, guiding it to prioritize likely surfaces such as tables or window sills over sinks or stovetops.
- The robot compares new camera views with stored images and flags regions where changes appear as high-probability locations, sharpening its targeted search.
- The team plans to add arms and hands so the robot can open cupboards and drawers, and the research is detailed in an IEEE Robotics and Automation Letters paper published March 3.
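The ranking behavior described above can be sketched roughly as follows. This is a purely illustrative toy, not the lab's actual method: the surface names, prior values, and `CHANGE_BOOST` multiplier are all invented for the example, standing in for priors derived from internet-sourced object relations and for the map's change-detection signal.

```python
# Hypothetical prior: how likely a misplaced object is to sit on each surface,
# standing in for relations a language model might supply (e.g. mugs end up
# on tables more often than on stovetops). Values are invented.
SEMANTIC_PRIOR = {
    "table": 0.5,
    "window_sill": 0.3,
    "sink": 0.1,
    "stovetop": 0.1,
}

# Assumed multiplier for regions where the 3D map changed since the last pass.
CHANGE_BOOST = 3.0


def rank_locations(changed_regions):
    """Score each surface by its semantic prior, boosted where a change was seen,
    then normalize into a probability distribution over candidate locations."""
    scores = {
        surface: prior * (CHANGE_BOOST if surface in changed_regions else 1.0)
        for surface, prior in SEMANTIC_PRIOR.items()
    }
    total = sum(scores.values())
    return {surface: score / total for surface, score in scores.items()}


# A detected change on the window sill outranks the higher-prior table.
ranking = rank_locations(changed_regions={"window_sill"})
best = max(ranking, key=ranking.get)
```

In this toy setup, the change signal lets a lower-prior surface overtake a higher-prior one, which mirrors the summary's point that observed changes strengthen the targeted search.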