
Multimodal Autonomous Mobile Robot Testbed for Human-Aware AI, Perception, and Industry 5.0 Applications
SHORT SUMMARY
The testbed is built on the PAL Robotics TIAGo mobile robot, which integrates RGB-D vision on a pan-tilt head, 2D LiDAR, IMU sensors, microphones, and speakers within a Robot Operating System (ROS) ecosystem. The platform supports research in multimodal perception, combining visual, spatial, and auditory data for robust environment understanding.
The system can be tested in manufacturing-like environments, including dedicated university facilities that simulate real industrial conditions, enabling realistic validation of intelligent inspection and autonomous navigation tasks. It supports deployment of deep learning models either directly on the robot (e.g., using an NVIDIA Jetson-based onboard computing unit) or remotely on a high-performance server/workstation for more computationally intensive AI workloads.
Its capabilities support DeepTech applications such as AI-driven scene interpretation, human detection and interaction, SLAM, and adaptive navigation. The pan-tilt mechanism enhances active perception, while audio interfaces enable speech-based interaction and situational awareness.
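As an illustration of what active perception with the pan-tilt head involves, the sketch below computes the pan and tilt angles needed to point the camera at a 3D target. It is a minimal geometric example, not the testbed's actual controller: the frame convention (x forward, y left, z up, origin at the head base), the sign of the tilt angle, and the joint limits are all assumptions and should be replaced with the values from the robot's own description files.

```python
import math
from typing import Tuple

# Hypothetical joint limits in radians (assumed for illustration only;
# take the real values from the robot's URDF).
PAN_LIMITS = (-1.2, 1.2)
TILT_LIMITS = (-0.9, 0.7)


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def look_at_angles(x: float, y: float, z: float) -> Tuple[float, float]:
    """Return (pan, tilt) in radians that point the head at (x, y, z).

    The target is expressed in an assumed head-base frame with x forward,
    y to the left, and z up. Positive tilt is assumed to look upward.
    """
    if x == 0.0 and y == 0.0 and z == 0.0:
        raise ValueError("target coincides with the head origin")
    pan = math.atan2(y, x)                  # rotation about the vertical axis
    tilt = math.atan2(z, math.hypot(x, y))  # elevation above the horizontal plane
    return clamp(pan, *PAN_LIMITS), clamp(tilt, *TILT_LIMITS)
```

For example, `look_at_angles(1.0, 1.0, 0.0)` yields a pan of about 0.785 rad (45 degrees to the left) and zero tilt. Targets behind the robot produce a pan command clamped to the assumed limit, which in practice would be the cue to rotate the mobile base instead.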
Aligned with Industry 5.0 principles, the testbed facilitates human-centric robotics, emphasizing safe collaboration, adaptability, and intelligent behavior in shared manufacturing environments, smart logistics, assistive systems, and inspection tasks.
HOSTING INSTITUTION AND PI INFO
| Name of Host Organization | University of Belgrade – Faculty of Mechanical Engineering |
| Department or Lab | Production Engineering Department – Laboratory for Industrial Robotics and AI |
| Name of Building | N/A |
| Physical Address | Kraljice Marije 16 |
| Website Links | https://cent.mas.bg.ac.rs/ |
| Institutional contact name | Aleksandar Jokic |
| Institutional contact email | ajokic@mas.bg.ac.rs |
APPLICATION CASES
| Application case: | Short description: |
| Autonomous Visual Inspection in Manufacturing Environments | The testbed, based on the PAL Robotics TIAGo operating within a Robot Operating System (ROS) framework, is used for autonomous inspection of industrial environments. The robot navigates factory floors using SLAM while collecting multimodal data (RGB-D, LiDAR, IMU). The pan-tilt camera enables adaptive viewpoint control for detailed inspection of machinery, surfaces, and production lines. AI-based algorithms can be deployed for defect detection, anomaly identification, and condition monitoring. This setup supports Proof-of-Concept validation of intelligent inspection systems, reducing the need for manual checks and enabling continuous monitoring. It is particularly relevant for Industry 5.0 applications where mobile robots assist human operators in quality control and predictive maintenance tasks. |
| Human-Robot Collaboration and Interactive Monitoring | The testbed supports research in human-robot collaboration for smart manufacturing scenarios. Using onboard microphones and speakers, the robot enables basic voice interaction with operators, while its multimodal perception system detects human presence and activities. The platform can be used for experiments in operator assistance, such as guided inspection routines, real-time reporting, and environment monitoring. The pan-tilt head enhances human-aware perception by tracking operators and focusing attention on relevant areas. This application is suitable for DeepTech deployment in human-centric manufacturing, where robots act as intelligent assistants. Additionally, the testbed is used in educational and research activities, including student projects on AI, robotics, and perception, as well as collaboration with SMEs developing inspection and monitoring solutions. |
| Vision-Language-Action (VLA) Models for Autonomous Inspection and Instruction-Based Robot Behavior | The testbed is used to evaluate and deploy Vision-Language-Action (VLA) models on a mobile robotic platform based on the PAL Robotics TIAGo operating within a Robot Operating System (ROS) framework. The robot combines RGB-D perception from a pan-tilt head, LiDAR-based navigation, IMU sensing, and audio interfaces to enable multimodal grounding of language instructions into physical actions. In this setup, operators can provide high-level natural language commands such as “inspect the conveyor belt for defects” or “check the status of the machine on the left,” which are interpreted by VLA models and translated into navigation, perception focus, and inspection actions. The system supports research in grounding language in real-world industrial environments, improving flexibility and usability of autonomous inspection systems. This application is particularly relevant for Industry 5.0, where intuitive human-robot interaction and adaptive autonomy are key enablers for intelligent manufacturing and maintenance workflows. |
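
The instruction-to-action flow described in the last application case can be illustrated with a deliberately simple stand-in. The sketch below is not a VLA model: it uses plain keyword matching to show the kind of structured output (navigation, perception focus, inspection) that such a model would be expected to produce from a natural language command. The `Action` type, the `KNOWN_TARGETS` table, and the location names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Action:
    kind: str    # one of "navigate", "look_at", "inspect"
    target: str  # symbolic name of the location or object


# Hypothetical mapping from phrases in an instruction to named locations.
KNOWN_TARGETS = {
    "conveyor belt": "conveyor_belt",
    "machine on the left": "machine_left",
}


def plan_from_instruction(instruction: str) -> List[Action]:
    """Map an instruction to a navigate / look_at / inspect sequence.

    Keyword matching stands in for the language grounding that a VLA model
    would perform. An empty list means the instruction was not understood.
    """
    text = instruction.lower()
    for phrase, target in KNOWN_TARGETS.items():
        if phrase in text:
            return [
                Action("navigate", target),  # drive the base to the target
                Action("look_at", target),   # orient the pan-tilt head
                Action("inspect", target),   # run the inspection routine
            ]
    return []
```

Calling `plan_from_instruction("Inspect the conveyor belt for defects")` returns three actions targeting `conveyor_belt`, while an unrecognized command returns an empty plan. Keeping the plan as an explicit list of typed actions is one possible design choice that makes a model's output easy to log and check before anything is executed on the robot.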
POTENTIAL STAKEHOLDERS
Non-academic stakeholders
Industrial Partners, Startups, SMEs
Academic stakeholders
PhD students, Researchers







