Multimodal Communication Analysis
Written by Dr. Francis Quek
The majority of our work in Computer Vision and Multimedia Systems falls under the purview of Multimodal Analysis, where human tracking and computational analysis provide instrumented access for understanding multimodal communicative behavior. This research relates to the overall philosophy of embodied interaction in HCI: human language and interaction are multimodal because gesture and speech are inseparable parts of a whole. In fact, it is this earlier and ongoing body of research in human multimodal communicative behavior that directed our attention to a more fundamental understanding of human embodiment.
Several ongoing and recently concluded projects lie along this research trajectory. These include:
- Agent-based human tracking, where parts of the human body are modeled using active agents. These agents forage in the video feature space to find candidates for the objects being tracked (each candidate is associated with an agent). The autonomous software agents are able to form coalitions that, as a group, represent a consistent set of body parts constituting the human being tracked.
- The KABAAM project, which investigates the learning of structural temporal event models that describe meaningful human behavior. We employ a genetic programming model that learns by reasoning over temporal events. Our approach supports joint discovery between human experts in behavioral studies and the computational process.
- MacVisSTA, a multimedia interaction and visualization system that enables analysis of multimodal communicative behavior across multiple video and audio channels.
- Vision GPU, where we employ CUDA-based GPU computation to implement our Vector Coherence Mapping (VCM) algorithm. We realized a real-time version of VCM, which posed a set of significant challenges in mapping the algorithm to the GPU computing architecture.
- Meeting Analysis, where we investigated multimodal behavioral cues to understand real discourse among meeting participants.
- Mirror Track, where we investigate the tracking of fingers hovering over and touching a large display. Because the viewing angle is at a low azimuth (exceeding the critical angle for refraction), the display surface becomes reflective, and we exploit this reflection to enable the tracking.
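The temporal reasoning underlying KABAAM's event models can be illustrated with a small sketch that classifies pairs of time intervals into Allen-style relations (before, meets, overlaps, and so on). The function name and the reduction of the inverse relations to a prefix are our own illustrative choices, not KABAAM's actual implementation.

```python
# Hypothetical sketch of interval-based temporal reasoning of the kind
# used to describe structured human behavior (not KABAAM's actual code).

def allen_relation(a, b):
    """Classify the relation between intervals a=(start, end) and
    b=(start, end), returning one of Allen's 13 interval relations."""
    a0, a1 = a
    b0, b1 = b
    if a1 < b0:
        return "before"
    if a1 == b0:
        return "meets"
    if a0 < b0 < a1 < b1:
        return "overlaps"
    if a0 == b0 and a1 < b1:
        return "starts"
    if b0 < a0 and a1 < b1:
        return "during"
    if b0 < a0 and a1 == b1:
        return "finishes"
    if a0 == b0 and a1 == b1:
        return "equals"
    # The remaining six relations are the inverses of the above.
    return "inverse-" + allen_relation(b, a)

# Example: a gesture stroke that overlaps the start of a speech phrase.
print(allen_relation((0.0, 1.2), (0.8, 2.5)))  # overlaps
```

A learned event model can then be expressed as a conjunction of such relations over labeled behavioral intervals (gesture strokes, speech phrases, gaze fixations).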
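The coherence idea behind VCM can be conveyed with a greatly simplified sketch: each tracked point carries a correlation score for every candidate motion vector, and summing ("voting" with) the score maps of spatial neighbors selects a motion that is mutually coherent across the neighborhood. This is hypothetical illustrative code, not the project's CUDA implementation.

```python
# Simplified illustration of coherence voting (not the real VCM/CUDA code):
# each map is a dict from candidate motion vector (dx, dy) to a score.

def coherent_vector(own_map, neighbor_maps):
    """Combine a point's own correlation map with its neighbors' maps by
    summation, then return the candidate vector with the highest vote."""
    votes = dict(own_map)
    for m in neighbor_maps:
        for v, s in m.items():
            votes[v] = votes.get(v, 0.0) + s
    return max(votes, key=votes.get)

# A noisy point whose own best match is (5, 0), but whose neighbors
# agree on (1, 0); voting pulls it toward the coherent motion field.
own = {(5, 0): 0.9, (1, 0): 0.6}
nbrs = [{(1, 0): 0.8, (5, 0): 0.1}, {(1, 0): 0.7}]
print(coherent_vector(own, nbrs))  # (1, 0)
```

The GPU mapping challenge mentioned above comes from doing this correlation and voting densely, per point and per frame, within real-time budgets.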
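The Mirror Track bullet can be made concrete with a small geometric sketch: at a grazing view the screen yields a virtual reflection of the fingertip, and since that reflection is the fingertip mirrored across the screen plane, the apparent gap between the two is twice the hover height; a touch occurs when the gap closes. The function names and the touch threshold below are illustrative assumptions, not the project's actual detector.

```python
# Illustrative sketch (not Mirror Track's actual code): infer hover height
# from a fingertip and its mirror image reflected across the screen plane,
# with positions measured perpendicular to the screen surface at 0.

def hover_height(finger_y, reflection_y):
    """The reflection is the fingertip mirrored across the screen, so the
    fingertip-to-screen distance is half the fingertip-to-reflection gap."""
    return abs(finger_y - reflection_y) / 2.0

def is_touching(finger_y, reflection_y, threshold=1.0):
    """Declare a touch when the inferred hover height falls below a
    (hypothetical) threshold in screen-plane units."""
    return hover_height(finger_y, reflection_y) < threshold

print(hover_height(10.0, -10.0))  # fingertip 10 units above the screen
print(is_touching(0.4, -0.4))     # gap 0.8 -> height 0.4 -> touch
```

This is why the reflective regime matters: the reflection supplies the second observation needed to recover depth from a single grazing camera view.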