The design of MOO agents: Implications from a study on multi-modal collaborative problem solving.

3. Observations & implications

3.1 Matching knowledge and media persistence

Observations. In MOO dialogues and whiteboard items, we distinguish four categories of content: task (subdivided into facts and inferences), management (who has collected what, who goes where, who is where, etc.), technical problems and meta-communication (turn-taking rules, graphical codes, etc.). The distribution of these content categories differs between MOO dialogues and whiteboard notes. In dialogues, 33% of interactions concern task management, versus 9% on the whiteboard. Task management, technical problems and meta-communication utterances have short-term validity, while task-level information is more persistent, especially facts[3]: the former categories represent 45% of MOO dialogues (semi-persistent) versus only 10% of whiteboard items (persistent) (Fig.1). Moreover, the average delay of acknowledgment is shorter in MOO dialogues (48 seconds) than on the whiteboard (70 seconds).

This difference probably reflects the difference in persistence between these two media: there is no urgency to acknowledge information which will (probably) remain on the whiteboard for a long time. We refer to two forms of persistence of knowledge: persistence of validity, i.e. how long some piece of information remains true, and persistence of display, i.e. how long it can be viewed on a particular medium. The latter depends on the medium: MOO information is semi-persistent, since it scrolls up every time a new command is performed; whiteboard information is persistent until explicitly erased.

Figure 1: Categories of content in MOO (left) and whiteboard (right) interactions

Implications. An intelligent agent should select a mode of communication suited to the persistence of the knowledge being discussed. Negotiating non-persistent information through a persistent medium leads to the display of invalid information once its validity has expired (a well-known WWW problem). Conversely, negotiating persistent information through a non-persistent medium increases the memory load, since the medium does not play its role of group memory. Selecting the best mode of communication requires reasoning about the match between the persistence of the knowledge being discussed and the persistence of the medium through which it is discussed. The latter can be determined once and for all. The difficulty lies in reasoning about the persistence of information validity. The issue is not the same as in a truth maintenance system: the point is not only to detect when information is no longer relevant, but to anticipate, while information is being discussed, how long it might remain valid. A simple solution would be to associate levels of persistence with knowledge categories: facts and inferences are more persistent than strategies, plans, and MOO positions.
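The "simple solution" mentioned above can be sketched as a lookup from knowledge category to an expected persistence of validity, matched against the persistence of display of each medium. This is only an illustration under stated assumptions: the three-level scale, the category names, the medium names and the function select_medium are hypothetical and are not part of the system described in this paper.

```python
from enum import IntEnum


class Persistence(IntEnum):
    """Hypothetical three-level scale, used for validity and for display."""
    LOW = 1
    SEMI = 2
    HIGH = 3


# Assumed persistence of validity per knowledge category (illustrative values).
VALIDITY = {
    "fact": Persistence.HIGH,
    "inference": Persistence.HIGH,
    "strategy": Persistence.LOW,
    "plan": Persistence.LOW,
    "position": Persistence.LOW,
}

# Persistence of display per medium: fixed once and for all.
DISPLAY = {
    "whiteboard": Persistence.HIGH,    # stays until explicitly erased
    "moo_dialogue": Persistence.SEMI,  # scrolls up with each new command
}


def select_medium(category: str) -> str:
    """Pick the medium whose display persistence is closest to the
    expected validity persistence of the given knowledge category."""
    wanted = VALIDITY[category]
    return min(DISPLAY, key=lambda medium: abs(DISPLAY[medium] - wanted))
```

With these assumed values, select_medium("fact") yields the whiteboard and select_medium("plan") yields MOO dialogue. The harder problem noted above, anticipating validity for an individual item rather than for its category, is not addressed by such a table.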

3.2 Reasoning on sharedness

Observations. We computed the percentage of utterances produced by Agent-A which are acknowledged by Agent-B. This rate of acknowledgment is 26% for facts, while it is 46% for inferences and 43% for task management utterances. What distinguishes facts from inferences or management issues is the probability of disagreement or misunderstanding: in this task, there is little to disagree about or misunderstand with facts. Moreover, there is a significant interaction effect on the acknowledgment rate between the knowledge category and the mode of negotiation (F=6.09; df=2; p=.001): facts are rarely acknowledged on the whiteboard (Fig.2). Our interpretation is the following. Since there is a low probability of misunderstanding or disagreeing about facts, their acknowledgment in MOO conversation (37%) basically means "ok, I read your message". Acknowledgment simply aims to inform one's partner about shared perception (level 2). Conversely, on the whiteboard, mutual visibility is the default assumption, which makes acknowledgment unnecessary.

Implications. One difficulty we encountered in previous systems was that the artificial agents behaved too rigidly, e.g. checking agreement at every step of the solution. There are several 'social' reasons for not checking agreement, such as demonstrating leadership and avoiding harassment. These results indicate another reason: the medium may itself guarantee a level of grounding sufficient for the interaction. For example, given that mutual visibility (level 2) is guaranteed by the whiteboard and that mutual visibility is enough for facts, since facts are not open to misunderstanding or disagreement (levels 3 and 4), facts are rarely explicitly acknowledged on the whiteboard. Reasoning about mutual visibility is easy to implement for an artificial agent: concerning the whiteboard, it may compare windows; concerning the MOO, simple rules can determine whether two agents see the same thing, given their respective positions and the command being performed. The most difficult aspect is reasoning about the probability that an item of knowledge can be misunderstood or disagreed with.
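The "simple rules" for MOO visibility might look like the following sketch. It assumes, in line with the notes and examples of this paper, that a 'page' reaches its addressee in any room while 'say' and other actions are only perceived by agents in the same room. The Agent class, the function sees and the room names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agent:
    name: str
    room: str


def sees(observer: Agent, actor: Agent, command: str,
         target: Optional[str] = None) -> bool:
    """Assumed visibility rule: does `observer` perceive `command`
    performed by `actor`?"""
    if command == "page":
        # A page is delivered to its addressee wherever he or she is.
        return target == observer.name
    # 'say' and other actions are assumed visible only within the room.
    return observer.room == actor.room
```

For instance, with Sherlock and Hercule both in the bar, sees(sherlock, hercule, "ask") holds, so an action by Hercule can count as an acknowledgment. Once Hercule moves to another room, only a page addressed to Sherlock is perceived.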

Figure 2: Interaction effect on the acknowledgment rate between the mode of interaction and the content of interaction (task knowledge).

3.3 Reasoning on mutual position

Observations. This is a particular case of the previous point. We observed that agents rarely ask their partners where they are located in the virtual space. They also rarely use the "who" command to check position. In fact, the MOO automatically provides information about mutual position[4]. Montandon (1996) showed that, for a task which requires more information about MOO position, when the MOO provides less information, the subjects perform significantly more interactions to get this information. In other words, MOO positions are not ignored, but the medium normally provides enough information more or less automatically, which makes explicit interactions about position very rare. This is confirmed by other observations: (1) When subjects are in different rooms, they acknowledge 34% of utterances, versus 50% when they are in the same room. (2) Subjects often meet in the same room when they have long discussions (making inferences about the facts). (3) Experienced MOO users are more sensitive to space than novices: the average space sensitivity[5] is 75% for novices versus 87% for experienced users (F=4.39; df=1; p=.05). In other words, the MOO spatial metaphor seems to really affect the behavior of subjects.

Implications. Since the spatial metaphor appears to be salient even when it is not functionally important, agents should be able to reason about spatial positions. This reasoning is also useful regarding the previous point: they need to reason about which information can be seen by the partner, and visibility in the MOO is itself bound spatially. Through this spatial metaphor, MOOs provide formal rules to reason about sharedness.
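The space sensitivity factor defined in note [5] can be computed directly from a message log. The sketch below follows that definition; the log format (a list of command/same-room pairs) and the function name are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def space_sensitivity(messages: Iterable[Tuple[str, bool]]) -> float:
    """Share of messages whose command fits the spatial situation:
    'say' when the partners are in the same room, 'page' when they are
    in different rooms, divided by the total number of messages.

    Each message is a (command, same_room) pair (hypothetical format).
    """
    log = list(messages)
    if not log:
        raise ValueError("empty message log")
    fitting = sum(
        1
        for command, same_room in log
        if (command == "say" and same_room)
        or (command == "page" and not same_room)
    )
    return fitting / len(log)
```

An agent could track this ratio for its partner, or apply the same rule when choosing between 'say' and 'page' for its own messages.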

3.4 Negotiation by action

Observations. The subjects acknowledge MOO utterances not only with other MOO utterances but also with MOO actions. The first condition for this type of acknowledgment is visibility: Hercule's utterance can be acknowledged by Sherlock's Action-X only if Hercule sees that Action-X has been performed. In example 1, Sherlock can see that Hercule asks the question (9.6) as Sherlock previously invited him to, since they are in the same room.
9     Bar   S   ' ask him what he was doing las night. i am talking to mr saleve
9.4   Bar   S   ask js about last night
9.6   Bar   H   ask giuz about last night

Example 1: Talk/action acknowledgment (from Pair 16)

The second condition of acknowledgment by action is that the MOO commands enable the speaker to express the dialogue move that (s)he wants to make. In this experiment, only three dialogue moves could be expressed through MOO actions: simple acknowledgment, straight agreement (one agent suggests an action, the other does it) and disagreement. Therefore, the type of information being acknowledged through action is generally decisions about actions, namely spatial moves, asking questions and exchanging objects. Subjects could not use MOO actions for negotiating who was a suspect, because our experimental environment did not include commands conveying this type of information. We could include verbs (e.g. putting a suspect in jail) or objects (e.g. putting handcuffs on a suspect) to indicate the degree of suspicion. In other words, the semantics of 'talking by action' are bound by the semantics of the MOO commands created by the designer.

Implications. Negotiation structures should be represented in terms of dialogue moves (Baker, 1989), which could be turned into interactions either as dialogue utterances or as MOO actions. In everyday life, interpreting an action is no less complex than interpreting natural language, since it relies heavily on the context. However, MOO actions may be simpler to interpret because the designer can define commands with clearly different pragmatic values.
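As a rough illustration of this separation between dialogue moves and their surface form, the sketch below realizes a move either as a MOO action or as an utterance, using the two conditions observed above (the partner must see the action, and a command must exist that expresses the move). The move inventory, the DialogueMove class and the realize function are hypothetical, and only agreement is handled as an action here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Move(Enum):
    ACKNOWLEDGE = "acknowledge"
    AGREE = "agree"
    DISAGREE = "disagree"


@dataclass(frozen=True)
class DialogueMove:
    move: Move
    text: str                      # what to say if the move is verbalized
    command: Optional[str] = None  # MOO command expressing the move, if any


def realize(dm: DialogueMove, partner_sees_actions: bool) -> Tuple[str, str]:
    """Turn a dialogue move into ("action", command) or ("utterance", text).

    Agreement by action is chosen only when a command expresses the move
    and the partner can see it being performed.
    """
    if dm.move is Move.AGREE and dm.command and partner_sees_actions:
        return ("action", dm.command)
    return ("utterance", dm.text)
```

In example 1, Hercule's agreement would be represented as a move with the command "ask giuz about last night". It is realized as that action when both detectives are in the same room and as a verbal "ok" otherwise.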

3.5 Dynamically allocate functions to tools

Observations. Different pairs allocate communicative or problem solving functions (store facts, exchange inferences, coordinate action,...) to different tools, or even re-allocate a function to another tool on the fly. Let us illustrate this with an example. Our subjects carried a MOO notebook recording the answers to all questions asked to suspects. They could exchange their notebooks or merge their contents. Some pairs used these notebooks intensively for sharing facts, while others preferred to systematically report facts on the whiteboard. The former pairs hence had the whiteboard available for sharing inferences, while the latter filled their whiteboard and hence had to discuss inferences through MOO dialogues. The actual [function X tool] matrix varies from one pair to another. It may also vary within a pair as the collaboration progresses, one function being for instance progressively abandoned because the detectives become familiar with another one.

Implications. From sections 3.1 and 3.2, one could infer that the whiteboard and MOO communicative functions could be directly coded in the design of artificial agents. However, the plasticity we observed, i.e. the pairs' ability to self-organize along different configurations, leads us to think that functions such as "share facts" or "negotiate strategy" should be dynamically allocated to a particular medium during interaction.
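One way to picture such dynamic allocation is a [function X tool] matrix that starts from designer defaults and is updated from what the partner is observed to do. The sketch below is an assumption-laden illustration: the class FunctionAllocator, the "most frequently observed tool wins" policy and the function and tool names are hypothetical, not a description of an implemented agent.

```python
from collections import Counter, defaultdict
from typing import Dict


class FunctionAllocator:
    """Keeps a [function x tool] usage matrix and picks a tool per function."""

    def __init__(self, defaults: Dict[str, str]) -> None:
        self._defaults = dict(defaults)       # designer-supplied starting point
        self._usage = defaultdict(Counter)    # function -> tool -> count

    def observe(self, function: str, tool: str) -> None:
        """Record that the pair used `tool` for `function`."""
        self._usage[function][tool] += 1

    def tool_for(self, function: str) -> str:
        """Follow the pair's observed practice, else fall back to the default."""
        counts = self._usage.get(function)
        if counts:
            return counts.most_common(1)[0][0]
        return self._defaults[function]
```

An agent starting with "share facts" on the whiteboard would switch to the notebook after seeing its partner share facts that way. A fuller policy would probably also weight recent observations more heavily, since allocation may change within a pair over time.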

3.6 Deliberately maintaining the task context

Observations. Our initial hypothesis was that the whiteboard would help to disambiguate MOO dialogues, through simple deictic gestures or explanatory graphics. We observed almost no deictic gestures[6] and few explanatory graphics. Instead, information is sometimes negotiated before being put on the whiteboard: grounding is then not achieved through the whiteboard, but appears as a pre-condition for displaying information in a public space. Conversely, whiteboard/talk interactions often aim to ground the information put on the whiteboard (e.g. "why did you put a cross on...?"). The subjects do not draw elaborate graphics, probably because the experimental task was not intrinsically spatial. The main difficulty was the management of the large amount of collected information. Most whiteboards contain a collection of text notes (collected facts and inferences). The main function of the whiteboard was to support individual and group memory. It thereby also plays a regulation role: (1) during the data collection stage, each subject can see on the whiteboard what her partner has done so far; (2) during the data synthesis stage, most pairs use a graphical rule for marking discarded suspects (generally, crossing out notes about these suspects), thereby indicating progress towards the solution. In short, the whiteboard is the central space of coordination, probably because it retains the context (Whittaker et al, 1995). This context is established at the task level: the whiteboard increases mutual knowledge with respect to what has been done and how to do the rest. The context is not established at the conversational level: the mutual understanding of MOO utterances does not seem to rely on whiteboard information. We even observed several cases in which two different contexts are established, i.e. the subjects participate in parallel in two conversations, one on the whiteboard and the other in MOO dialogues.

Implications. The simplicity of the observed graphics is good news for designers (otherwise we would have to design agents able to interpret complex graphics). However, this observation is bound to the selected task: more complex representations would certainly have been drawn for a problem in physics or in geometry. What can nevertheless be generalized is the role of the whiteboard as a tool for maintaining a shared context. This role is due to knowledge persistence on the whiteboard. Artificial MOO agents should both help build this shared context and use it when reasoning.

3.7 Maintaining multiple conversational contexts

Observations. The mechanisms of turn taking in the MOO are very different from those of voice conversation. There is no constraint to wait for one's partner's answer before saying more. Moreover, one can implicitly or explicitly refer to utterances earlier than the last one, since they are still visible on the screen. Hence a MOO conversation between two people is not a simple alternation of turns. The average index of complexity[7] on 'say' and 'page' commands is 0.9 (SD = .06), which indicates an almost complete non-systematicity of turn taking! Interestingly, the average index of complexity is exactly the same for the group of subjects with a high acknowledgment rate and for the group with a low acknowledgment rate (both 0.9 as well). This seems to indicate that the irregularity of turn taking does not really affect acknowledgment. Moreover, we observed simultaneous turns (example 2), interwoven turns (example 3) and even cases where the subjects talk in parallel about one thing in the MOO and another on the whiteboard. This phenomenon is more pronounced when more than two agents interact on the MOO.
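The exact formula of the index of complexity is not given here; note [7] only states its two extreme values. One measure with those properties, shown below purely as an assumption, is the conditional entropy of the next speaker given the current speaker, divided by the entropy of the next speaker. It is 0 when the next speaker is fully predictable and 1 when the current speaker carries no information about the next one. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


def entropy(counts: Iterable[int]) -> float:
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    values = [c for c in counts if c > 0]
    total = sum(values)
    return -sum(c / total * math.log2(c / total) for c in values)


def turn_taking_index(speakers: Sequence[str]) -> float:
    """H(next speaker | current speaker) / H(next speaker).

    An assumed stand-in for the index of note [7], not the authors' formula.
    """
    pairs = list(zip(speakers, speakers[1:]))
    if not pairs:
        raise ValueError("need at least two turns")
    h_next = entropy(Counter(nxt for _, nxt in pairs).values())
    if h_next == 0:
        return 0.0  # a single speaker: the next turn is fully predictable
    h_cond = 0.0
    for cur, n_cur in Counter(cur for cur, _ in pairs).items():
        following = Counter(nxt for c, nxt in pairs if c == cur)
        h_cond += n_cur / len(pairs) * entropy(following.values())
    return h_cond / h_next
```

A strict alternation such as S, H, S, H gives 0, while a sequence in which each speaker is followed equally often by either speaker gives 1.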

87.1   r3   S   She couldn't have stolen the gun, could she?
87.4   r3   S   read giuzeppe from dn1
87.5   r3   H   I'm just checking something.
87.7   r3   H   read Giuzeppe from dn2
88.2   r3   S   No - Mona was in the restaurant till 9.
88.2   r3   H   No, she left around 9:00. She couldn't have stolen the gun.
88.7   r3   S   So Lisa, Rolf, Claire, Giuzeppe and Jacques are still open.
88.9   r3   S   and Oscar
88.9   r3   H   And Oscar...

Example 2: Simultaneous talk at 88.2 and 88.9 (from Pair 18)

88.5   r1     H   page sherlock but what about the gun?
88.8   Priv   S   'Hercule which motive jealousy? He would have killed hans no?
89.3   Priv   S   'Hercule he stole it when the colonel was in the bar
90.3   r1     H   page sherlock Giuzeppe wanted to avoid that one discovers that the painting was fake.

Example 3: XYXY turns (from Pair 11, translated)

Implications. To participate in parallel conversations, artificial MOO agents need to maintain distinct contexts of interaction. If the context were unique, the interwoven turns reported above would either lead to complete misunderstanding, which was not the case, or force the subjects to spend a lot of energy disambiguating references. Let us note that the multiplicity of contexts questions situated cognition theories, in which context is often perceived as a whole.
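A minimal way to keep such contexts apart is to index the conversation history by medium and by thread, so that an incoming turn is interpreted against the last turn of its own thread rather than against the last turn overall. The sketch below assumes that some other component assigns each turn to a thread, which is the hard part and is not shown. The class ContextStore and the thread labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Turn = Tuple[str, str]  # (speaker, text)


class ContextStore:
    """Keeps one history per (medium, thread) instead of a single context."""

    def __init__(self) -> None:
        self._threads: Dict[Tuple[str, str], List[Turn]] = defaultdict(list)

    def add(self, medium: str, thread: str, speaker: str, text: str) -> None:
        self._threads[(medium, thread)].append((speaker, text))

    def last(self, medium: str, thread: str) -> Optional[Turn]:
        """Most recent turn of the given thread, or None if it is empty."""
        history = self._threads.get((medium, thread))
        return history[-1] if history else None
```

Applied to example 3, with the turns assigned to an assumed "gun" thread and an assumed "motive" thread, the turn at 89.3 is read as an answer to the question at 88.5, and the turn at 90.3 as an answer to the question at 88.8, although the two pairs are interwoven.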

[3] If it is true at time t that Helmut is a colonel, this information will still be true at time t+1.
[4] When A pages B, B receives a notice such as "You sense that A is looking for you in room X". Every time A arrives in or leaves a room where B is located, B is informed of A's arrival/departure.
[5] The 'space sensitivity' factor was computed as the number of 'say' commands in the same room plus the number of 'page' commands in different rooms, divided by the total number of messages.
[6] Probably for several reasons: (1) the emitter cannot simultaneously type the message and point on the whiteboard; (2) the receiver cannot look at the same time at the MOO window and at the whiteboard window; (3) the partner's cursor was not visible on the whiteboard.
[7] This index evaluates the regularity of turn taking. Its value would be zero if, knowing the speaker at turn n, we could predict with a probability of 1 who will speak at n+1. Its value would be 1 if knowing the speaker at turn n did not give us any information regarding who will speak at n+1.

The design of MOO agents: Implications from a study on multi-modal collaborative problem solving - 21 MARCH 1997
