Researchers have conducted a comprehensive review of recent advances in multimodal natural interaction techniques for Extended Reality (XR) headsets, revealing key trends in spatial computing. The review analyzes how breakthroughs in artificial intelligence (AI) and large language models (LLMs) are transforming how users interact with virtual environments, and offers insights for designing future XR experiences.
A research team led by Feng Lu systematically reviewed 104 papers published since 2022 in six top venues. Their review article was published on December 15, 2025, in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
Context of Spatial Computing
With the adoption of XR headsets like Microsoft HoloLens 2, Meta Quest 3, and Apple Vision Pro, spatial computing technologies are receiving increased attention. Natural human-computer interaction is central to spatial computing, enabling users to interact with virtual elements via methods such as eye tracking, hand gestures, and voice commands.
Interaction Classification and Trends
The review classifies interactions along three axes: application scenarios, operation types, and interaction modalities. Operation types are divided into seven categories, distinguishing between active interactions (user input) and passive interactions (feedback presented to the user). Nine distinct interaction modalities are examined, ranging from unimodal input (e.g., gesture, gaze) to various multimodal combinations.
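To make the classification concrete, here is a minimal Python sketch of how such a taxonomy might be encoded. The class and enum names are illustrative assumptions; the operation types listed cover only those named in this article, not the paper's full seven-category scheme.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Modality(Enum):
    """Unimodal input channels named in the article; combinations are modeled as sets."""
    GESTURE = auto()
    GAZE = auto()
    SPEECH = auto()


class OperationType(Enum):
    """Subset of operation types mentioned in the article (the full taxonomy has seven)."""
    POINTING_AND_SELECTION = auto()
    LOCOMOTION = auto()
    VIEWPORT_CONTROL = auto()
    TYPING = auto()
    QUERYING = auto()


@dataclass(frozen=True)
class InteractionTechnique:
    """One reviewed technique, tagged by operation type, modalities, and whether it is
    active (user input) or passive (feedback presented to the user)."""
    name: str
    operation: OperationType
    modalities: frozenset
    active: bool


# Example: a gaze-plus-pinch selection technique would be tagged like this.
gaze_pinch_select = InteractionTechnique(
    name="gaze + pinch selection",
    operation=OperationType.POINTING_AND_SELECTION,
    modalities=frozenset({Modality.GAZE, Modality.GESTURE}),
    active=True,
)
```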
Statistical analysis of the reviewed literature revealed several trends:
- Hand gesture and eye gaze interactions, including their combined modalities, remain the most prevalent.
- There has been a notable increase in speech-related studies in 2024, linked to LLM advancements.
- Regarding operation types, pointing and selection remains a primary focus, though the number of studies in this area has been decreasing annually.
- Research on locomotion, viewport control, typing, and querying has increased, reflecting growing attention on user experience and LLM integration.
Identified Challenges
The researchers identified several challenges in current natural interaction techniques:
- Gesture-only interactions often require users to adapt to complex paradigms, which can increase cognitive load.
- Eye gaze interactions face the "Midas touch" problem, where users unintentionally select items they are merely looking at (a common mitigation is sketched after this list).
- Speech-based interactions contend with latency and recognition accuracy issues.
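The "Midas touch" problem is commonly mitigated by gating gaze selection on dwell time: a target is committed only after the gaze has rested on it continuously for a threshold duration. The Python sketch below illustrates the idea under the assumption of a per-frame update loop; the `DwellSelector` helper and the 0.8 s threshold are hypothetical, not taken from the reviewed papers.

```python
import time


class DwellSelector:
    """Gaze selection gated by dwell time: a target is committed only after the gaze
    has rested on it continuously for `dwell_s` seconds, so glancing at an item does
    not trigger an unintended selection (the "Midas touch" problem)."""

    def __init__(self, dwell_s: float = 0.8):
        self.dwell_s = dwell_s
        self._target = None   # target currently being dwelled on
        self._since = 0.0     # time at which dwelling on that target began

    def update(self, gazed_target, now=None):
        """Feed the currently gazed-at target each frame; returns the target once the
        dwell threshold is reached, otherwise None."""
        now = time.monotonic() if now is None else now
        if gazed_target != self._target:
            # Gaze moved to a different target: restart the dwell timer.
            self._target, self._since = gazed_target, now
            return None
        if gazed_target is not None and now - self._since >= self.dwell_s:
            # Restart the timer so the same target is not re-selected every frame.
            self._target, self._since = None, now
            return gazed_target
        return None
```

A quick glance shorter than the threshold never triggers a selection, but the dwell threshold itself adds latency, which is one reason multimodal confirmation (see the next section) is attractive.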
Future Research Directions
Based on these findings, the research team suggested potential directions for future research, including:
- Developing more accurate and reliable natural interactions through multimodal integration and error recovery mechanisms (see the sketch after this list).
- Enhancing the naturalness, comfort, and immersion of XR interactions by reducing physical and cognitive load.
- Leveraging AI and LLMs to enable more sophisticated, context-aware interactions.
- Bridging interaction design and practical XR applications to encourage wider adoption.
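As an illustration of the first direction, the sketch below combines gaze (to indicate a candidate target) with a pinch gesture (to confirm) and keeps a small history for explicit error recovery. The `GazePinchSelector` class and its methods are hypothetical, intended only to show the pattern, not an implementation from the reviewed papers.

```python
from collections import deque


class GazePinchSelector:
    """Multimodal selection: gaze indicates the candidate target, a pinch gesture
    confirms it, and a short history enables explicit error recovery (undo).
    Combining modalities avoids the Midas touch problem without long dwell times."""

    def __init__(self, max_history: int = 10):
        self.selected = None
        self._history = deque(maxlen=max_history)
        self._was_pinching = False

    def on_frame(self, gazed_target, pinch_down: bool):
        """Call once per frame; selection fires only on the pinch onset."""
        if pinch_down and not self._was_pinching and gazed_target is not None:
            self._history.append(self.selected)
            self.selected = gazed_target
        self._was_pinching = pinch_down
        return self.selected

    def undo(self):
        """Error recovery: revert the most recent confirmed selection."""
        if self._history:
            self.selected = self._history.pop()
        return self.selected
```

Because confirmation requires a deliberate gesture, merely looking at an item never selects it, and `undo()` lets the user recover from a mis-selection.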
The paper includes detailed illustrations of various interaction techniques, serving as a reference for researchers and practitioners. This review offers insights for designing natural and efficient interaction systems for XR, contributing to the advancement of spatial computing technologies.