- Dr. Francis Quek, Professor, Center for HCI
- Yingen Xiong, graduated PhD student
Note: This is a completed project that is currently dormant.
Meetings are gatherings of humans for the purpose of communication. As such, we argue that understanding human multimodal communicative behavior, and how witting or unwitting visual displays relate to such communication, is critical. We address the ‘Why’, ‘What’, and ‘How’ of meeting video analysis. ‘Why’ are we doing this in the first place – is the audio or speech transcription of the communication not sufficient? ‘What’ are the units of analysis for meeting video – since humans do not use rote presentation of a set of predefined gestural semaphores, what are the entities that we can access, and what models do we need to access them? ‘How’ might one process the video to get a handle on these entities? To enable this research, we are assembling a planning meeting corpus that is coded at multiple levels. This corpus is ‘hypothesis driven’ in that the coding is designed to support the multimodal language theories that undergird our research.
The figure above describes our project in a nutshell. On the left (in yellow), we have the mental model of each participant. Within each subject, discourse construction is a dynamic between mental imagery and the discourse production process. This produces both the speech and the embodied behavior (imagery) within the physical meeting place (in grey). Beyond the individual, there is a series of inter-related processes: shared discourse construction, social interaction, hierarchy, etc.
The subjects in our studies will include real military officers in war-gaming (planning) activity. We select this domain because the planning activity spans multiple sessions, the mission is known, the hierarchical relationships are known, and there is an underpinning doctrine for the planning. This permits us to produce high-confidence codings of the meeting behavior. Military war-games provide real scenarios, and there is tremendous expertise in scenario construction. All meetings adhere to some social-regulating doctrine, but such doctrines are often ill-defined and introduce another element of uncertainty concerning the meeting structure.
We capture synchronized multimedia data of the proceedings in the meeting room, using 10 pair-wise stereo calibrated cameras, wireless fixed-distance directional microphones, table-mounted microphones, and a system of Vicon infrared 3D motion trackers. The subjects in the meeting are instrumented unobtrusively with infrared-reflective markers so that we have ground-truth data for head orientation, shoulder orientation, torso orientation, and hand motion. This helps to bootstrap our coding of the data and to train our vision/video processing systems.
We conduct video processing and analysis research on the multichannel video to extract head orientation, hand motion, and posture (in green). We will perform a multi-layered coding of the data. The speech will be transcribed and meticulously time-aligned. The discourse will be coded psycholinguistically for reference chains, motion descriptions, and discourse structure. We will code the subjects' gestures and interactive/instrumental gaze activity. We will engage in a military operations analysis of the meetings to provide an operations-level coding. In the complex environment of multi-participant multimodal communicative behavior, the coding itself constitutes a significant research trajectory.
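One way to picture a multi-layered, time-aligned coding is as a set of named tiers, each holding labeled time intervals, which can then be queried for events that co-occur across tiers. The sketch below is a minimal illustration under that assumption, not the format used for our corpus. The class names, tier names, and example labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Annotation:
    start: float  # seconds from the start of the recording
    end: float
    label: str


@dataclass
class CodedSession:
    """Hypothetical container: one list of annotations per coding tier."""
    tiers: Dict[str, List[Annotation]] = field(default_factory=dict)

    def add(self, tier: str, start: float, end: float, label: str) -> None:
        if end <= start:
            raise ValueError("annotation must have positive duration")
        self.tiers.setdefault(tier, []).append(Annotation(start, end, label))

    def overlapping(self, tier: str, start: float, end: float) -> List[Annotation]:
        """Annotations on `tier` that share any time with [start, end)."""
        return [a for a in self.tiers.get(tier, [])
                if a.start < end and a.end > start]

    def co_occurring(self, tier_a: str, tier_b: str) -> List[Tuple[Annotation, Annotation]]:
        """All pairs (a, b) from the two tiers whose intervals overlap."""
        return [(a, b)
                for a in self.tiers.get(tier_a, [])
                for b in self.overlapping(tier_b, a.start, a.end)]


# Invented example data, for illustration only.
session = CodedSession()
session.add("speech", 0.0, 1.5, "put the unit here")
session.add("gesture", 1.0, 2.0, "pointing at map")
session.add("gaze", 3.0, 4.0, "looks at commander")

speech_with_gesture = session.co_occurring("speech", "gesture")
speech_with_gaze = session.co_occurring("speech", "gaze")
```

Keeping each coding layer in its own tier lets every layer be produced independently (transcription, psycholinguistic coding, gesture and gaze coding, operations-level coding) while the shared time base ties them together for analysis.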
Our intention is to provide the research and corpora resources to study the multimodal patterning and fusion that provide an avenue to access the social interaction and discourse construction process in the participants.
To address our broad research goals, we have assembled a team of researchers:
- Francis Quek - PI (WSU) and Tom Huang (UIUC) for the vision research component. Quek also provides expertise in HCI and multimodal interaction
- Mary Harper (Purdue) for research in speech and natural language processing
- Bennett Bertenthal and David McNeill (U Chicago) for psychology and psycholinguistics research and coding.
- Ron Tuttle and Clark Groves (Air Force Institute of Technology) for research and expertise in war-gaming and opera
This project is supported by the Advanced Research and Development Activity (ARDA) Maryland Procurement Office – Video Analysis and Content Extraction (VACE) R&D Program.