Still here I guess. The first step, before getting into the details of this all, is to try to explain why things are so incredibly complex and what we were trying to accomplish.
As usual, it all comes down to flexibility. We'd long been advocates of making groupware customizable and personalizable for end users. This naturally extended to making the toolkit flexible enough that developers can provide the flexibility their end users need. We knew that very little could be set in stone, particularly issues like session management and how data is shared between processes.
Ted O'Grady did his Masters thesis on applying a technique called open implementations to groupware toolkits. The underlying idea is that it is not enough for toolkits to provide an API that developers can use to build their applications; a second API has to be provided to let developers change how that first API behaves. This gives application developers a structured way to rewrite parts of the toolkit when they find that some of the assumptions made by the toolkit's developers don't fly for their particular application.
In GroupKit, this has focused (after too many iterations to count) on letting developers radically redefine the way that environments behave. As you'll see, environments are the core mechanism for doing anything inside GroupKit. They do a lot of different things. The reason they're able to do so many different things is that they are incredibly customizable.
The rest of this architecture overview consists of two parts. The first looks at how GroupKit uses environments and other mechanisms internally to provide its run-time infrastructure. From this you should learn enough to understand how GroupKit actually works, and how to make some basic changes. The second part talks about how to customize environments; if you need to make some really radical changes, you'll probably need to work through this part. Needless to say, for all of this you should be intimately familiar with the user-level facilities provided by GroupKit.
There are essentially three types of processes you'll find running in GroupKit. The first is a central process called the registrar; there's usually only one of these. Its most important job is to maintain the global registry; it also provides a number of other important services such as generating unique ids and hosting central environments.
The second type of process is the session manager, of which there is usually one per active user of the system. The main job of the session managers is starting and joining conferences.
Finally, conference processes run the actual GroupKit tools that most people spend their time using.
Every process is identified by a unique id. The registrar is normally "0". Other processes get their id from the "idgen" service that runs on the registrar; such an id may look something like "pumori_cpsc_ucalgary_ca>>>9357+3".
In the current implementation, processes are interconnected by sockets as follows. Session managers maintain a single socket connection to the registrar (i.e. client-server). Therefore, if session managers wish to communicate with each other, they normally do so via the registrar. Conferences maintain a socket to the registrar, a socket to the session manager that created them, and direct socket connections to every other conference process in their session (e.g. if three people have joined the same brainstorming session, each conference process has sockets to the two others). So conferences communicate with each other in a peer-peer fashion, and also have connections to the central registrar and their session manager.
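To make the socket arithmetic above concrete, here is a small sketch in plain Tcl. The two procedures are made up for illustration (they are not GroupKit commands), and they only count connections under the topology just described.

```tcl
# Hypothetical helpers, not part of GroupKit.

# Sockets held by ONE conference process in a session with the given
# number of participants: one to the registrar, one to the session
# manager that created it, and one to each other conference process.
proc conferenceSocketCount {participants} {
    if {$participants < 1} {
        error "a session needs at least one participant"
    }
    return [expr {2 + ($participants - 1)}]
}

# Number of direct conference-to-conference links in one session
# (each link is shared by the two conference processes it joins).
proc peerLinkCount {participants} {
    return [expr {$participants * ($participants - 1) / 2}]
}

puts [conferenceSocketCount 3]
puts [peerLinkCount 3]
```

For the three-person brainstorming session mentioned above, each conference process holds four sockets, two of which go to its peers.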
Specifically, GroupKit's global registry is a server environment named ::gk::globalRegistry hosted by the registrar. One of the best ways to understand all the pieces of GroupKit is to examine this environment. To do so, start the registrar and a session manager. Open up the "Debug" window in the session manager, and invoke the command "::gk::globalRegistry debug".
The registry tree is divided into a number of sections. We'll explore each of these in turn.
For example, here is the processes tree of the registry showing the registrar, one session manager, and a conference created by that session manager, all of which happen to be running on the same machine:
::gk::globalRegistry
    processes
        0
            host: ratbert
            port: 9357
            usernum: 0
            group: 0
        ratbert>>>9357+2
            host: ratbert
            port: 1108
            usernum: ratbert>>>9357+2
            group: ratbert>>>9357+2
        ratbert>>>9357+4
            host: ratbert
            port: 1110
            usernum: ratbert>>>9357+4
            group: ratbert>>>9357+2
Each group has a unique id (which is what is stored in the "group" field in the processes tree). In the current implementation, the process id of the session manager is also used as the group id. Information on each group is stored under the "groups" key of the global registry, indexed by the group id.
For each group, GroupKit stores the person's name ("username"), a color associated with them ("color"), and various information that is normally associated with their business card ("title", "dept", "company", "phone", "fax", "email", "www", and "office"). Finally, under the "members" key, GroupKit stores a list of all the processes that are members of the group.
Here is an example showing the group holding the session manager and conference processes from the above example:
::gk::globalRegistry
    groups
        ratbert>>>9357+2
            username: Mark Roseman
            color: red
            title: Reluctant Documenter
            dept: Computer Science
            company: University of Calgary
            phone: 403-220-7259
            fax: 403-284-4707
            email: roseman@cpsc.ucalgary.ca
            www: http://www.teamwave.com/~roseman/
            office: Math Science 618
            members:
                ratbert>>>9357+2: ratbert>>>9357+2
                ratbert>>>9357+4: ratbert>>>9357+4
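If it helps to think of the tree in Tcl terms, part of the group entry above can be modelled as a nested dict. This is only a stand-in for illustration (it assumes Tcl 8.5 or later for "dict"); the real data lives in the ::gk::globalRegistry environment, and "groupMembers" is a made-up helper.

```tcl
# A stand-in for part of the "groups" subtree, NOT how GroupKit stores it.
set groups [dict create "ratbert>>>9357+2" [dict create \
    username "Mark Roseman" \
    color red \
    members [dict create \
        "ratbert>>>9357+2" "ratbert>>>9357+2" \
        "ratbert>>>9357+4" "ratbert>>>9357+4"]]]

# Hypothetical helper: list the process ids that belong to a group.
proc groupMembers {groups groupId} {
    return [dict keys [dict get $groups $groupId members]]
}

puts [groupMembers $groups "ratbert>>>9357+2"]
```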
The registrar starts up a number of centralized environments which it maintains, as shown in this registry fragment:
::gk::globalRegistry
    environments
        ::gk::globalRegistry
            type: server
            maintainers
                0: 0
        ::gk::sessionManagers
            type: server
            maintainers
                0: 0
Note that peer-peer environments associated with a particular conference are usually not stored in the global registry, but in another environment specifically for the conference. We'll come to this shortly, but it again illustrates the fact that any environment can be used as a registry.
In this fragment, we see two services provided by the registrar, and a "launcher" service offered by a session manager:
::gk::globalRegistry
    services
        idgenerator
            0x1
                name: global
                processID: 0
        environmenthost
            0x2
                name: global
                processID: 0
        launcher
            ratbert>>>9357+2x2
                name: launcherratbert>>>9357+2
                processID: ratbert>>>9357+2
Things are essentially the same when a new process is created by an existing process, which is done via a "launcher" service. The launcher passes a number of parameters to the newly created process. For example, the group id is provided, so that the new process becomes part of an existing group, rather than creating its own. Also, the new process connects back to the process launcher service, passing its process id to signify that it was created successfully.
Notice the inconsistency two paragraphs up? The new process connects to the idgenerator service to look up its id, but to do that it would need to look up the service in the global registry. Yet before it can connect to the registry it needs its id. Oops. The way around this bootstrap problem is using the "-socket" option of the "service subscriber" command which just assumes that the requested service exists on the other end of a given socket, and doesn't bother to look it up in the registry.
This is actually handled very cleanly using the "addDependent" command of environments. Essentially, this lets you mark a particular node (or subtree) of an environment as dependent on the existence of a process, and when the process goes away, the corresponding nodes will be deleted automatically.
You'll find information about dependent nodes stored in most shared environments under the "_dependents" key. (The leading underscore is a bit of a hack in the environments code; any node whose name starts with it won't show up in a traversal using the "keys" command, but the data is still replicated between instances of environments, which wouldn't be the case if it were stored under the "option" toplevel key.)
Here is an example showing dependents, again with the registrar, one session manager, and one conference running. Items are indexed by process, then by "data" or "option", and within that just by a unique id.
::gk::globalRegistry
    _dependents
        0
            data
                2: environments.::gk::globalRegistry.maintainers.0
                3: environments.::gk::sessionManagers.maintainers.0
                4: environments.::gk::confs.maintainers 0
                5: services.idgenerator.0x1
                6: services.environmenthost.0x2
                11: environments.::gk::confs.ratbert>>>9357+3.maintainers.0
        ratbert>>>9357+2
            option
                8: subnet.defaultserver.clients.ratbert>>>9357+2
            data
                2: groups.ratbert>>>9357+2.members.ratbert>>>9357+2
                5: services.launcher.ratbert>>>9357+2x2
        ratbert>>>9357+4
            option
                13: subnet.defaultserver.clients.ratbert>>>9357+4
            data
                2: groups.ratbert>>>9357+2.members.ratbert>>>9357+4
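The idea behind dependent nodes can be sketched as a toy model in plain Tcl. Nothing here is GroupKit code: a flat dict stands in for the environment, and "regSet", "markDependent" and "processGone" are hypothetical names chosen for the sketch (Tcl 8.5 or later is assumed).

```tcl
# Toy model only; these procs are NOT GroupKit commands.
set registry [dict create]     ;# dotted key -> value
set dependents [dict create]   ;# process id -> list of dependent keys

proc regSet {key value} {
    global registry
    dict set registry $key $value
}

# Record that the node at $key (and anything below it) depends on $procId.
proc markDependent {procId key} {
    global dependents
    dict lappend dependents $procId $key
}

# Called when a process goes away: delete each dependent node or subtree.
proc processGone {procId} {
    global registry dependents
    if {![dict exists $dependents $procId]} { return }
    foreach prefix [dict get $dependents $procId] {
        set sub "$prefix."
        foreach key [dict keys $registry] {
            # match the node itself, or any key underneath it
            if {$key eq $prefix || [string first $sub $key] == 0} {
                dict unset registry $key
            }
        }
    }
    dict unset dependents $procId
}

regSet "groups.ratbert>>>9357+2.members.ratbert>>>9357+2" "ratbert>>>9357+2"
regSet "groups.ratbert>>>9357+2.members.ratbert>>>9357+4" "ratbert>>>9357+4"
markDependent "ratbert>>>9357+4" "groups.ratbert>>>9357+2.members.ratbert>>>9357+4"

processGone "ratbert>>>9357+4"
puts [dict keys $registry]
```

After the conference process disappears, only the session manager's membership entry is left, which is the cleanup the "_dependents" bookkeeping exists to make automatic.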
Environments have a subcommand called "execute" which specifies that the parameters to the subcommand should be evaluated as a Tcl command. The "gk::to" command uses the environment's router (described in excruciating detail below) to run this "execute" subcommand on all of the networked instances of the environment.
This means that if the environment in question uses a replicated architecture, the "gk::to" messages will be distributed in a replicated fashion. If it uses a centralized architecture, broadcasts will be sent around in a centralized fashion. The distribution of broadcasts follows the topology set up by the environment, whatever that may be.
Certain environments, maintained by the registrar process, are also available. The "gk::sessionManagers" environment is used to manage broadcasts, as described in the previous section.
The "gk::confs" environment simply keeps a list of the currently known conferences. Changes to this environment cause the session manager code to generate higher-level events describing exact changes.
When a conference is created, an environment named "gk::confs.$confid" (where $confid is the conference number) is created on the registrar. This environment contains information pertinent to the conference. For example, it holds the current list of users of the conference. It also acts as a registry for any environments created to be used strictly within the conference. As an example, the "gk::awarenessModel" environment is registered with the "gk::confs.$confid" environment, rather than the "gk::globalRegistry".
There are several ways that you can customize environments, depending on exactly what you want to accomplish. These range from adding or changing individual environment commands, through changing which processes those commands are sent to and assembling new network topologies from existing components, to building entirely new types of network configurations.
Individual commands are installed with the "command set" option, which takes the command name and a handler procedure. For example, this adds a "lock" command:
    $env command set lock myLockCmdHandler
    proc myLockCmdHandler {env cmd args} {
        ...
    }
You can also redefine the behavior of the existing commands. This is similar to subclassing them. For example, you could redefine the "set" command, adding functionality but still calling the original, as follows:
    $env command rename set _originalset
    $env command set set myNewSetHandler
    proc myNewSetHandler {env cmd key value} {
        ...
        $env _originalset $key $value
        ...
    }
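The rename-then-redefine pattern is the same trick you can play on ordinary Tcl procedures with the built-in "rename" command. Here is a plain-Tcl analogy that runs without GroupKit; the "store" procedure and the log are made up for the sketch and only stand in for an environment's "set" command.

```tcl
# Plain-Tcl analogy; "store" and "_originalstore" are hypothetical names.
array set data {}
set log {}

proc store {key value} {
    global data
    set data($key) $value
}

# Keep the original under a new name, then install a replacement that
# adds behaviour (here, logging) and delegates to the original.
rename store _originalstore
proc store {key value} {
    global log
    lappend log "set $key"
    _originalstore $key $value
}

store color red
puts "$data(color) / $log"
```

Callers keep using "store" exactly as before, which is the point: the extra behaviour is added without touching any code that invokes the command.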
Each environment has a "router" built into it. The router is responsible for delivering messages to different instances of the shared environment, located among different processes. In particular, a router knows how to send messages to "all" (all instances of the shared environment), "others" (instances of the shared environment except the instance in the local process), and to a particular process, identified by its unique process id. Note that the actual mechanism by which the messages get sent is a separate issue (see the next section); in calling the router you just specify who the messages should go to, not how.
To specify that a particular message should be routed, use the "command routing" option for environments. For example, this code specifies that when invoked locally, the set, delete and execute commands should be sent to all processes containing an instance of the shared environment:
foreach i "set delete execute" { $env command routing $i all }
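To illustrate what the three kinds of destination mean, here is a hedged sketch of how a target might be expanded into a list of process ids. It is a model only: "resolveTarget" is a hypothetical procedure, and a real router hands this work to its subnets rather than keeping one flat list.

```tcl
# Hypothetical model, NOT GroupKit's router implementation.
# target   : "all", "others", or a single process id
# localId  : id of the process doing the routing
# knownIds : ids of every process holding an instance of the environment
proc resolveTarget {target localId knownIds} {
    switch -exact -- $target {
        all {
            return $knownIds
        }
        others {
            set result {}
            foreach id $knownIds {
                if {$id ne $localId} { lappend result $id }
            }
            return $result
        }
        default {
            # anything else is taken to be a single process id
            if {[lsearch -exact $knownIds $target] < 0} {
                error "unknown process id: $target"
            }
            return [list $target]
        }
    }
}

set ids [list 0 "ratbert>>>9357+2" "ratbert>>>9357+4"]
puts [resolveTarget others 0 $ids]
```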
Routers are set up to include one or more attached subnets. Each subnet represents a particular type of network topology. Three pre-made subnets are provided in GroupKit, and others can be defined (see later section).
A "server" subnet resides on an environment in a centralized process, and accepts connections from clients. This subnet therefore has direct knowledge of all its attached clients, and knows how to route messages directly to any of them. A "client" subnet would reside in the client processes, and knows only about the server; to route messages to everyone, it just sends them to the server. Finally, a "peer" subnet is used to establish a fully-replicated topology with no centralized component, where all peers communicate directly with each other.
When setting up a router, each subnet is given a name. This is so you could for example have several subnets of the same type on one router. For example, the router could act as a bridge between two replicated networks by including two different peer subnets.
You can add new subnets using the "router addsubnet" command. This takes the name to assign to a subnet, as well as a handler which implements the subnet. The handler may also take other parameters; for example the handler for "peer" subnets can optionally take the host and port of an existing peer to connect to. Here are two examples:
    $env router addsubnet defaultserver gk::EnvServerSubnet
    $env router addsubnet defaultpeer gk::EnvPeerSubnet $host $port
Note again that the router itself knows nothing about the internals of the subnets; this is completely encapsulated in their handlers. By calling the handlers, the routers can find out enough information (e.g. is a particular process located on a given subnet?) to route messages to and between different subnets. By combining individual subnet components, arbitrary network architectures can be created.
As mentioned above, to define a new subnet type you need to provide a handler. This is a standard Tcl procedure accepting several parameters. The first is the environment of which this subnet will be a part, the second is the name given to the particular subnet (remember that several subnets may be added to the router of a single environment), and the third is an operation for the handler to perform. Any additional parameters required by the operation follow as further parameters.
The first operation to be implemented is "init", which is called when the subnet is first added. You can do anything you want to initialize the subnet here. Any parameters provided by the "router addsubnet" command are passed along here.
The "route" operation is invoked by the router to specify that the subnet should send a message. It takes the following parameters: who the message should be sent to, the message itself, and an optional parameter specifying who this message was received from. The destination of the message will be specified as either "others" (all other processes known by the subnet), "notsender" (all processes except the sender of the message, as specified in the third parameter), or a process id meaning send the message to only that single process. The "notsender" tag essentially helps prevent loops in the routing process.
The third operation is "contains" which is called by the router to determine if a particular process (identified by process id, the only parameter here) can be reached via this particular subnet. It is used by the router for efficiency purposes. The valid responses by the subnet to this query are "yes", "no", or "maybe".
Finally, the "connectfrom" operation is invoked when a subnet on a remote environment attempts to make a connection to the local subnet. You will be passed two parameters, the id of the remote process, and a socket used to connect to the remote process.
Those are all the operations a subnet needs to support. In terms of actual implementation, subnets rely heavily on the "gk::rpc" command to build their communications infrastructure and send messages around. Subnets typically store any necessary housekeeping information in the "option" tree of their environment, under the key "subnet.$subnetname".
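Putting the four operations together, a handler might be shaped roughly like the skeleton below. The parameter order and operation names follow the description above, but everything else is an assumption made so the sketch runs on its own in Tcl 8.5 or later: "logSubnet" is a made-up subnet that merely records what it would send, and it keeps its state in global variables rather than using "gk::rpc" and the environment's "option" tree as a real subnet would.

```tcl
# Skeleton only; storage and the "send" step are stand-ins.
array set logSubnetPeers {}   ;# "$env,$name" -> list of connected process ids
set logSubnetSent {}          ;# list of {processId message} pairs "sent"

proc logSubnet {env name op args} {
    global logSubnetPeers logSubnetSent
    switch -exact -- $op {
        init {
            # args: any extra parameters given to "router addsubnet"
            set logSubnetPeers($env,$name) {}
        }
        connectfrom {
            # args: remote process id, socket to that process
            lassign $args remoteId sock
            lappend logSubnetPeers($env,$name) $remoteId
        }
        contains {
            # args: process id; the answer is "yes", "no" or "maybe"
            lassign $args procId
            if {[lsearch -exact $logSubnetPeers($env,$name) $procId] >= 0} {
                return yes
            }
            return no
        }
        route {
            # args: destination, message, optional sender
            lassign $args dest msg sender
            foreach id $logSubnetPeers($env,$name) {
                if {$dest eq "others"
                    || ($dest eq "notsender" && $id ne $sender)
                    || $dest eq $id} {
                    lappend logSubnetSent [list $id $msg]
                }
            }
        }
        default {
            error "unknown subnet operation: $op"
        }
    }
}

logSubnet env1 demo init
logSubnet env1 demo connectfrom "ratbert>>>9357+2" sock5
logSubnet env1 demo connectfrom "ratbert>>>9357+4" sock6
logSubnet env1 demo route notsender {set color red} "ratbert>>>9357+2"
puts [logSubnet env1 demo contains "ratbert>>>9357+4"]
puts $logSubnetSent
```

In the usage lines, the message arrives from one peer and is routed with "notsender", so it is recorded only for the other peer; that is the loop-prevention behaviour the tag is meant to provide.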
To define an actual environment type, you need to define a procedure that implements the new type; this procedure takes one parameter, the name of the environment which is to have that type applied to it. You also need to register this procedure, so that the gk::environment command knows about the new type.
Here is a fairly simple example, which implements a server environment type. It contains a single server subnet (which clients can connect to), and all commands that modify the environment are routed to everyone. Information about the environment is also stored in the registry, which a client environment type might query to find out how to connect. Users could create an instance of this type of environment using "gk::environment -server".
    proc gk::EnvTypeServer {env} {
        $env router addsubnet defaultserver gk::EnvServerSubnet
        foreach i "set delete execute" {
            $env command routing $i all
        }
        registry addenvironment $env server
    }
    gk::EnvironmentRegisterType -server gk::EnvTypeServer