KABAMM - Knowledge Aided Behavior Analysis through Model Modification
- Project Lead: Dr. Chreston Miller (Graduated PhD Student)
- Dr. Francis Quek
Note: This is a recently completed project that is currently dormant
This project's focus is supporting a domain expert in utilizing his/her knowledge and expertise in the process of analyzing and understanding human behavior patterns within multimodal data.
The project addresses how social scientists may derive patterns of human behavior within multimodal data through a hypothesis-driven model production approach. Multimodal corpora have been and are currently being created to capture the actions of humans interacting in many situations and environments. Such corpora consist of sequences of discrete and continuous event information describing human actions. These corpora create a challenging data space to search and analyze because they are characterized by non-numerical, temporal, descriptive data, e.g., Person A walks up to Person B at time T. We devised an approach that allows one to interactively search and discover temporal behavior patterns within multimodal data.
In this research, we address how human and social scientists may derive patterns of human behavior within multimodal data as these behaviors are situated in time. Such situated analysis considers the study of behavior in time as opposed to looking at behavior in the form of aggregated data divorced from how they occur in context. This multimodal analysis is concerned with searching for and discovering behavior phenomena of interest amongst multiple channels of data (e.g., gaze, speech, and gesture) where a phenomenon's value to an expert performing the analysis may not be based on frequency or statistical significance but on subjective relevance.
There are a multitude of annotated behavior data-sets or corpora (manual and automatic annotations) available as research in multimodal analysis of human behavior expands. Many of these corpora and supporting visualization tools store, organize, and display multimodal data based on the structural nature of behavior. By structure we mean discrete events that hold ordered relations in time that may vary from one occurrence to another. For example, the visualization tools MacVisSTA, ANVIL, and ELAN display multimodal data as interval-based events with some support for continuous signal data. One challenge is searching for and identifying behavior models of interest.
Therefore, we seek to address several questions:
- How can human behavior be modeled effectively within multimodal data? What data representation should be chosen for such modeling?
- How can expert knowledge be leveraged in identification of behavior within multimodal data? How does one explore such data and identify relevant behavior?
Human and social scientists approach data from a theoretical viewpoint, forming hypotheses of what behaviors they see or expect to see in the data. Analyzing a large amount of event data from a multimodal corpus, however, is a tedious endeavor, and it is desirable to apply machine learning approaches to assist in this process. Yet machine learning often takes a tabula rasa approach that cannot take advantage of the knowledge of human experts. Many machine learning approaches produce their own models, requiring only that experts provide labeled data. Furthermore, many patterns defining human behavior (behavior models) are structural in nature, requiring construction of relationships between events and component behaviors. Automated learning, by contrast, is often framed as the learning of weights and parameters, as opposed to the structure of the pattern itself. These observations motivated us to address the problem from a different angle.
Our motivating domain is a subset of multimedia analysis known as multimodal analysis of human behavior. This domain comprises analysis of multiple synchronized media streams (signals), each recording different aspects of human interactions such as gaze, speech, and gesture traces. Generally, each stream represents a modality of interaction (e.g., speech, gaze) and sometimes multiple streams are used to represent different aspects of a modality (e.g., separate motion traces for each hand). An example can be seen below:
However, there are a few challenging nuances of human behavior and its analysis to address. First, human behavior is variant. The idea represented by a behavior interaction, e.g., a greeting between two individuals, may be formulated in the data in many different ways, making modeling difficult. So, how does one identify situated instances of behavior when the way they exist in the data may be unknown? (Figure A below). The second challenge (Figure B) is that every observed behavior has the potential to be relevant to an expert depending on his/her analysis goal(s). Hence, there is no concept of "noise" in the data but rather one of relevance. For example, consider the situation where three students are working together on a math problem, then at some point a door slams nearby and draws their attention. One expert may analyze the co-construction of space based on the students' aligned gaze while another may analyze interrupting events. The door slam is "noise" to the first expert but not the second. Lastly (Figure C), a pattern's value to the expert in the analysis of human behavior may not be based on frequency or statistical significance but on subjective relevance. Our approach is designed to address these challenging nuances.
We approach the problem by looking at patterns of events within multimodal channels. A pattern defines a sequence of behaviors, also called a behavior model. Behaviors are encoded as annotated event intervals with temporal order being implicitly or explicitly defined. An example is a greeting between two individuals with the possible formulation: <A walks up to B>[within 1 second]<A shakes B's hand> and <A says, "Hello">. Here one could potentially evolve the pattern by successively adding/removing relationships with other events, and/or pruning relational connections. However, evolving this pattern without guidance means exploring a large search space, even for a small pattern.
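As an illustration of the greeting formulation above, the following minimal sketch encodes behaviors as annotated event intervals and checks the temporal constraint. The `Event` class, the `matches_greeting` function, and the reading of "and" as temporal overlap between the handshake and the utterance are assumptions made for this example; they are not taken from the project's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An annotated event interval (hypothetical structure)."""
    label: str
    start: float  # seconds
    end: float    # seconds


def matches_greeting(events, max_gap=1.0):
    """Return True if the events contain the greeting formulation.

    Assumed reading: the handshake starts within `max_gap` seconds
    after the walk ends, and the "Hello" overlaps the handshake.
    """
    walks = [e for e in events if e.label == "A walks up to B"]
    shakes = [e for e in events if e.label == "A shakes B's hand"]
    hellos = [e for e in events if e.label == 'A says "Hello"']
    for w in walks:
        for s in shakes:
            # Temporal constraint: [within 1 second] after the walk ends.
            if not (0.0 <= s.start - w.end <= max_gap):
                continue
            # "and" is interpreted here as temporal overlap (an assumption).
            if any(h.start < s.end and s.start < h.end for h in hellos):
                return True
    return False


observed = [
    Event("A walks up to B", 0.0, 2.0),
    Event("A shakes B's hand", 2.5, 4.0),
    Event('A says "Hello"', 2.7, 3.2),
]
print(matches_greeting(observed))
```

A hard-coded matcher like this covers only one formulation of a greeting, which is exactly the limitation that motivates evolving patterns against the data.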
Our solution is founded on creating a formalism of a pattern based on structure, timing, and ordered relationships. We operate on a pattern at the semi-interval level (start or end of an interval). This representation was first introduced by Freksa and later revisited by Mörchen and Fradkin. Semi-intervals are also known as instant-based models (points) in multimedia authoring and synchronization [17,18]. Semi-intervals allow a flexible representation where partial or incomplete knowledge can be handled, since operation is on parts of an interval and not the whole. Patterns are evolved at the semi-interval level, which we call a '1-step' change, representing the smallest change that can occur.
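The semi-interval idea can be sketched as follows. This is a simplified illustration, assuming intervals are given as `(label, start, end)` tuples; the names `SemiInterval`, `to_semi_intervals`, and `one_step_extensions` are hypothetical and only show how a '1-step' change adds a single start or end point to a pattern.

```python
from typing import NamedTuple


class SemiInterval(NamedTuple):
    """One endpoint of an interval (hypothetical structure)."""
    label: str
    kind: str    # "start" or "end"
    time: float  # seconds


def to_semi_intervals(intervals):
    """Split (label, start, end) intervals into time-ordered semi-intervals."""
    points = []
    for label, start, end in intervals:
        points.append(SemiInterval(label, "start", start))
        points.append(SemiInterval(label, "end", end))
    return sorted(points, key=lambda p: p.time)


def one_step_extensions(pattern, candidates):
    """Return every pattern obtained by adding exactly one semi-interval."""
    return [
        sorted(pattern + [c], key=lambda p: p.time)
        for c in candidates
        if c not in pattern
    ]


points = to_semi_intervals([("walk", 0.0, 2.0), ("shake", 1.5, 3.0)])
# A partial pattern: the start of "shake" is known, its end is not.
partial = points[:2]
for extended in one_step_extensions(partial, points):
    print([(p.label, p.kind) for p in extended])
```

Because the pattern holds endpoints rather than whole intervals, `partial` can include the start of an event without committing to when it ends.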
Our method uses real occurrences in data to help constrain the generation of alternatives and produce a convergence to a desired pattern. We engage the expert in an interactive data-driven discovery process to evolve a pattern to a desired formulation. The expert brings to the table ideas and hypotheses with which he/she creates an initial exploration pattern (seed) to start narrowing the possibilities. The challenge is to find likely, relevant extensions to the pattern that somehow recur in the data-set, and to allow the expert to steer the evolution process iteratively. The end result is a pattern representing a behavior model as it exists in the data, and yet reflecting the expert's knowledge.
An overview of our method can be seen below. Given a set of event annotations (e.g., from ELAN or MacVisSTA), create a semi-interval set organized in a database of definitions and instances. This is done offline. Then the expert provides an event sequence of interest (hypothesis) to search for. This sequence is converted into a pattern which contains implicit search criteria. The pattern is given to our Structural and Temporal Inference Search (STIS) component which: performs structural analysis on the pattern, uses the results of the analysis to form search criteria, searches to identify occurrences based on the criteria, and returns a set of occurrences in context. The context provides potential extensions to the pattern. The expert then chooses an extension based on his/her current analysis goal, updates the pattern, and the search process begins again. This marks one full evolution step.
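The search-and-extend loop described above can be sketched in a heavily simplified form. This is not the STIS algorithm: it assumes the semi-interval set is flattened into a single ordered list of tokens (here "+" marks a start and "-" an end), matches a pattern only as a contiguous run, and treats the token that follows each occurrence as its context. The function names and the `choose` callback, which stands in for the expert's decision, are hypothetical.

```python
from collections import Counter


def find_occurrences(stream, pattern):
    """Indices where `pattern` occurs as a contiguous run in `stream`."""
    pattern = list(pattern)
    n = len(pattern)
    return [i for i in range(len(stream) - n + 1) if stream[i:i + n] == pattern]


def suggest_extensions(stream, pattern):
    """Count the token that immediately follows each occurrence."""
    n = len(pattern)
    following = Counter()
    for i in find_occurrences(stream, pattern):
        if i + n < len(stream):
            following[stream[i + n]] += 1
    return following


def evolve(stream, seed, choose, max_steps=10):
    """Repeat: search, present extensions, let `choose` pick one or stop."""
    pattern = list(seed)
    for _ in range(max_steps):
        options = suggest_extensions(stream, pattern)
        choice = choose(pattern, options)  # stands in for the expert
        if choice is None:
            break
        pattern.append(choice)  # one full evolution step
    return pattern


stream = ["walk+", "walk-", "shake+", "hello+", "hello-", "shake-",
          "walk+", "walk-", "shake+", "shake-"]


def expert(pattern, options):
    # This simulated expert is only interested in the start of a handshake.
    return "shake+" if "shake+" in options else None


print(evolve(stream, ["walk+", "walk-"], expert))
```

The extensions come from real occurrences in the stream, so the loop only ever proposes continuations that actually exist in the data, while the callback decides which of them is relevant.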
Related Research Areas
Embodied Interaction, Model Construction and Modification, Behavior Analysis, Recognition Systems
- C. Miller, F. Quek, and L.P. Morency. Search Strategies for Pattern Identification in Multimodal Data: Three Case Studies. ICMR '14, Glasgow, UK. (AR: 39%).
- C. Miller, F. Quek, and L.P. Morency. Interactive Relevance Search and Modeling: Support for Expert-Driven Analysis of Multimodal Data. ICMI ’13, Sydney, Australia. (AR: 38%).
- C. Miller, L.P. Morency and F. Quek. Structural and Temporal Inference Search (STIS): Pattern Identification in Multimodal Data. ICMI 2012 (35.8%).
- C. Miller and F. Quek. Interactive Data-Driven Discovery of Temporal Behavior Models From Events In Media Streams. ACM Multimedia, Oct. 29 - Nov. 2, 2012 (20.2%).
- C. Miller and F. Quek. Toward Multimodal Situated Analysis. ICMI 2011. (47/120, 39%)
- C. Miller. The relationship between mental workload and interface design for petri net environments. In T. Smith-Jackson and T. Coalson, editors, ISE 5604: Human Information Processing Scholar Series 2009-4. TR# VT-ISE-ACE2009-4, pages 55–60. 2011.
- C. Miller, F. Quek, and N. Ramakrishnan. Structuring ordered nominal data for event sequence discovery. In MM ’10: Proceedings of the eighteenth ACM international conference on Multimedia. ACM, 2010. (29/85, 34.12%)
- A. E. Eiben and J. E. Smith. Introduction to Evolutionary Computing. Springer-Verlag, 2003.
- J. F. Allen. Maintaining knowledge about temporal intervals. Commun. ACM, 26(11):832–843, 1983.
- P. Cohen. Fluent learning: Elucidating the structure of episodes. Advances in Intelligent Data Analysis, pages 268–277, 2001.
- G. Guimarães and A. Ultsch. A method for temporal knowledge conversion. Advances in Intelligent Data Analysis, pages 369–380, 1999.
- P. Kam and A. Fu. Discovering temporal patterns for interval-based events. Data Warehousing and Knowledge Discovery, pages 317–326, 2000.
- C. H. Mooney and J. F. Roddick. Mining relationships between interacting episodes. In SDM’04, 2004.
- F. Mörchen. Algorithms for time series knowledge mining. In KDD ’06, pages 668–673, New York, NY, USA, 2006. ACM.
- E. Schwalb and L. Vila. Temporal constraints: A survey. Constraints, 3(2/3):129–149, 1998.
- A. Ultsch. Unification-based temporal grammar. Technical report, Philipps-University Marburg, Germany, 2004.
- C. Freksa. Temporal reasoning based on semi-intervals. Artificial Intelligence, 54(1-2):199–227, 1992.
- F. Mörchen and D. Fradkin. Robust mining of time intervals with semi-interval partial order patterns. In SIAM Conference on Data Mining (SDM), 2010.
- C. Miller, F. Quek, and N. Ramakrishnan. Structuring ordered nominal data for event sequence discovery. In MM ’10: Proceedings of the eighteenth ACM international conference on Multimedia. ACM, 2010.
- P. F. Brown, P. V. deSouza, R. L. Mercer, V. J. D. Pietra, and J. C. Lai. Class-based n-gram models of natural language. Comput. Linguist., 18(4):467–479, 1992.
- R. T. Rose, F. Quek, and Y. Shi. MacVisSTA: a system for multimodal analysis. In ICMI ’04, pages 259–264.
- M. Kipp. Anvil - a generic annotation tool for multimodal dialogue. In Eurospeech, 2001.
- H. Brugman and A. Russel. Annotating multi-media / multimodal resources with ELAN. In Proceedings of LREC, pages 2065–2068, 2004.
- M. C. Buchanan and P. Zellweger. Scheduling multimedia documents using temporal constraints. In NOSSDAV, pages 237–249, 1993.
- M. C. Buchanan and P. T. Zellweger. Automatic temporal layout mechanisms. In ACM Multimedia, pages 341–350, 1993.
- C. Miller, L.P. Morency and F. Quek. Structural and Temporal Inference Search (STIS): Pattern Identification in Multimodal Data. ICMI 2012.
This research has been supported by the National Science Foundation grants: “EAGER: Multimodal Corpus for Vision-Based Meeting Analysis,” 1 August 2010 – July 21, 2013, IIS-1053039 and “Formal Models, Algorithms, And Visualizations,” 15 Sep 2009 – 31 Aug 2012, IIS-0937133.