Grounding in Multi-modal Task-Oriented Collaboration
There have been several proposals for formally modelling this kind of mutuality. When common ground concerns simple beliefs, the most common representation of commonality is iterated belief (A believes X, and A believes B believes X, and A believes B believes A believes X, ...), or access to a shared situation, formulated by [Lewis69] as:
Let us say that it is common knowledge in a population P that X if and only if some state of affairs A holds such that:
1. Everyone in P has reason to believe that A holds.
2. A indicates to everyone in P that everyone in P has reason to believe that A holds.
3. A indicates to everyone in P that X.
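
Written symbolically, the iterated-belief representation is an unbounded conjunction of nested belief statements, which Lewis's shared-basis definition is designed to avoid. In the sketch below, A and B name the two collaborators and $B_A$ is read "A believes that"; the notation is our shorthand, not Lewis's.

```latex
% Iterated-belief rendering of "X is common ground between A and B".
% B_A is read "A believes that"; the notation is a shorthand, not Lewis's.
\[
  B_A X \;\wedge\; B_A B_B X \;\wedge\; B_A B_B B_A X \;\wedge\; \cdots
\]
% and symmetrically from B's side:
\[
  B_B X \;\wedge\; B_B B_A X \;\wedge\; B_B B_A B_B X \;\wedge\; \cdots
\]
```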
Clark and Brennan (1991) discuss grounding in different media. They point out that different media bring different resources and constraints to grounding, as well as different associated costs. They characterise several media (including face-to-face conversation, telephone, video-teleconference, terminal teleconference, and email) according to whether they have the following properties: copresence (the participants can see the same things), visibility (they can see each other), audibility (they can hear each other), cotemporality (messages are received at the same time as they are sent), simultaneity (both parties can send messages at the same time, rather than having to take turns), sequentiality (turns cannot get out of sequence), reviewability (messages can be reviewed after they have first been received), and revisability (the producer can edit the message privately before sending it). They also consider the following costs for these media: formulation costs (deciding exactly what to say), production costs (articulating or typing the message), reception costs (listening to or reading the message, including attention and waiting time), understanding costs (interpreting the message in context), start-up costs (initiating a conversation, including summoning the other partner's attention), delay costs (making the receiver wait during formulation), asynchrony costs (not being able to tell what is being responded to), speaker change costs, fault costs, and repair costs. Since different media have different combinations of these constraints and costs, the principle of least collaborative effort would be expected to predict different styles of grounding in different media.
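
One way to make this characterisation concrete is to treat each medium as a simple feature record over the eight constraints. The sketch below is only an illustration (the class and attribute names are ours, not from an implemented system); the example values follow Clark and Brennan's description of face-to-face conversation and email.

```python
# Illustrative sketch: Clark and Brennan's (1991) media constraints as a feature record.
# Class and attribute names are our paraphrases, not part of any existing system.
from dataclasses import dataclass

@dataclass
class Medium:
    name: str
    copresence: bool      # participants can see the same things
    visibility: bool      # participants can see each other
    audibility: bool      # participants can hear each other
    cotemporality: bool   # messages are received as they are sent
    simultaneity: bool    # both parties can send messages at the same time
    sequentiality: bool   # turns cannot get out of sequence
    reviewability: bool   # messages can be reviewed after reception
    revisability: bool    # messages can be edited privately before sending

face_to_face = Medium(
    name="face-to-face",
    copresence=True, visibility=True, audibility=True, cotemporality=True,
    simultaneity=True, sequentiality=True, reviewability=False, revisability=False,
)
email = Medium(
    name="email",
    copresence=False, visibility=False, audibility=False, cotemporality=False,
    simultaneity=False, sequentiality=False, reviewability=True, revisability=True,
)
```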
In the human-computer collaborative systems that we previously designed, communication was mainly text-based for the machine agent and based on direct manipulation for the human agent. Direct manipulation includes pointing gestures, which are important in grounding, especially for resolving referential ambiguities [Frohlich93]. Since our research goal is to design computational agents capable of grounding, we wanted to reduce the cost of grounding by providing agents with multi-modal communication. In a communicative setting, collaborators take advantage of all the media available to help them in their task. In a face-to-face setting, this includes eye-gaze and gesture as well as speech, but can also include writing notes and drawing schemata. This type of interaction is also becoming increasingly important for computer-mediated and human-computer collaboration. As technologies for communicating with and through computers by modes other than typing and displaying text become more widely available, it becomes more important to study how they can facilitate various aspects of collaboration, including grounding.
Grounding is not a monolithic process. There are many aspects of communicating which involve grounding, and properly communicating and grounding content requires action at multiple levels of interaction. Clark [Clark94] identifies four levels of conversation at which problems for maintaining common ground may arise, ranging from access (level 1) and perception (level 2) through understanding (level 3) to agreement (level 4). The table below summarises, for each level, the grounding acts available to A and to B:
| Grounding act | From A's viewpoint | From B's viewpoint |
|---|---|---|
| Monitoring | Passive/inferential (how A reasons about B's knowledge) | Pro-active (how B can help A to know about B) |
| level 1 | A infers whether B can access X | B tells A what he can access |
| level 2 | A infers whether B has noticed X | B tells (or shows) A that B has perceived X |
| level 3 | A infers whether B understood X | B tells A how B understands X |
| level 4 | A infers whether B (dis)agrees about X | B tells A that B (dis)agrees about X |
| Diagnosis | Active (how A tries to know that B knows X) | Reactive (how B participates in A's grounding) |
| level 1 | A joins B to initiate copresence | B joins A |
| level 2 | A asks B to acknowledge X | B acknowledges X |
| level 3 | A asks B a question about X | B displays understanding or requests repair of X |
| level 4 | A persuades B to agree about X | B (dis)agrees about X |
| Repair | How A repairs B's ignorance of X | How B repairs A's unawareness that B knows X |
| level 1 | A makes X accessible to B | B mentions or manipulates X |
| level 2 | A communicates X to B | B communicates X to A |
| level 3 | A repeats / rephrases / explains X | B repeats / rephrases / explains X |
| level 4 | A argues about X | B argues about X |