16-05-2013, 11:53 AM
Natural Human-Robot Interaction: Audio-Visual Perception of Humans and of their communication modalities
3D TRACKING OF HEAD AND HANDS
• Combined use of skin-color and disparity features. Benefits:
– 3D positions of head and hands
– improved tracking due to a basic 3D model of the human body
• No markers, no manual initialization, no static background modeling
• 10 frames/sec on a 2.6 GHz PC (for a single person)
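To illustrate how skin-color and disparity features can combine into a 3D position, here is a minimal sketch, not the tracker described in the slides. It assumes a rectified pinhole stereo pair, where depth follows Z = f·B/d. The `StereoCamera` class, the camera parameters, and the RGB thresholds in `is_skin` are all hypothetical stand-ins; a real system would more likely use a learned skin-color model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StereoCamera:
    """Hypothetical rectified stereo rig; all values are illustrative assumptions."""
    focal_px: float    # focal length in pixels
    baseline_m: float  # distance between the two cameras in metres
    cx: float          # principal point, x (pixels)
    cy: float          # principal point, y (pixels)


def is_skin(r: int, g: int, b: int) -> bool:
    """Crude rule-of-thumb skin test on 8-bit RGB (assumed thresholds)."""
    return (
        r > 95 and g > 40 and b > 20
        and r > g and r > b
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
    )


def pixel_to_3d(cam: StereoCamera, u: float, v: float,
                disparity_px: float) -> Optional[Tuple[float, float, float]]:
    """Back-project pixel (u, v) with a given disparity to camera coordinates in metres."""
    if disparity_px <= 0:
        return None  # no valid stereo match for this pixel
    z = cam.focal_px * cam.baseline_m / disparity_px  # Z = f * B / d
    x = (u - cam.cx) * z / cam.focal_px
    y = (v - cam.cy) * z / cam.focal_px
    return (x, y, z)


def skin_point_3d(cam: StereoCamera, u: float, v: float,
                  rgb: Tuple[int, int, int],
                  disparity_px: float) -> Optional[Tuple[float, float, float]]:
    """Return a 3D point only for pixels that look like skin and have valid disparity."""
    if not is_skin(*rgb):
        return None
    return pixel_to_3d(cam, u, v, disparity_px)
```

Clustering the resulting 3D points into head and hand candidates, and constraining them with a body model, would be separate steps that this sketch does not cover.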
JANUS-Speech Recognition Toolkit (JRTk)
– Unlimited and Open Vocabulary
– Spontaneous and Conversational Human-Human Speech
– Speaker-Independent
– High Bandwidth, Telephone, Car, Broadcast
– Languages: English, German, Spanish, French, Italian, Swedish,
Portuguese, Korean, Japanese, Serbo-Croatian, Chinese,
Shanghai, Arabic, Turkish, Russian, Tamil, Czech
– Best Performance on Public Benchmarks
• DoD (English) DARPA Hub-5 Test ’96, ’97 (SWB-Task)
• Verbmobil (German) Benchmark ’95-’00 (Travel-Task)
CONCLUSION
• An interface on a humanoid robot should:
– Operate Naturally around Humans
– React to Explicit and Implicit Input
• The full context must be perceived and interpreted:
– Who, What, Where, Why, How?
– Necessary technologies include: Person/Body Tracking,
Identification, Head Pose / Attention, Gesture Recognition,
Speech, Emotions, Language Understanding, Dialogue, …
• These technologies must improve with respect to:
– Robustness (noise, lighting conditions etc.)
– Naturalness