ABSTRACT

An extension to the seven-layer OSI Reference Model is proposed as a way to facilitate discussions between HCI practitioners on one hand, and application and network developers on the other. The advantages of the present framework are its extension of the OSI model upwards in a fashion consistent with the original OSI vision, and its completeness in capturing all of HCI in the top layers. The 3 HCI layers are conceived as representing three distinct aspects of HCI that can be summarized as 1) what a user wants to do in the abstract sense (i.e., the need), 2) how that need is acted upon by the human, and, 3) the artifacts that the user employs (hardware, software, etc.). This new common conceptual ground can be used to link applications to human needs as a function of network capabilities. The framework also helps in the discovery and localization of application performance problems and optimization opportunities.

INTRODUCTION

We propose a human factors extension to the seven-layer OSI Reference Model [10]. This extension is consistent with the design principles of the OSI model and offers a common conceptual language to facilitate meaningful discussions between the HCI (Human-Computer Interaction) disciplines and those responsible for network and application design. This new common conceptual ground can be used to link applications to human needs as a function of network and device capabilities and to determine within whose purview an issue falls. In the sections that follow we summarize the relevant properties of the OSI Reference Model and then present the human factors motivation for the extension and its properties.

THE OSI MODEL

The quarter-century-old OSI model [10] describes a layered network architecture that spans from the Physical Layer (1) of networking (connectors, wires, voltages, etc.) up to the Application Layer (7) which delivers reconstituted data to the applications (see Table 1). There are several important properties of the model. The duty of each layer is well defined and documented. Each layer's job is to facilitate the transfer of information between adjacent layers. It is possible to implement something new at any layer (or even create sub-layers) provided interlayer semantics are respected. Another useful feature of this schema is to focus attention on the appropriate layer when a problem arises. A trivial example might be packet collision on Ethernet. Although it may affect higher layers, it is not within the scope, for example, of a Layer 7 protocol to deal with this. A thorough treatment of the subtleties of the OSI model is beyond the scope of the present paper, and the reader is encouraged to see [10], if required.

OSI	7.	Application (http, ftp, nfs, pop...)
	6.	Presentation (ps, lz, iso-pp...)
	5.	Session (dns, rpc, pap...)
	4.	Transport (tcp, udp, rtp...)
	3.	Network (ip, dhcp, icmp, aep...)
	2.	Data Link (arp, ppp...)
	1.	Physical (10bt, xDSL, V.42...)

Table 1: The 7-Layer OSI Reference Model [10]

Two additional principles in the specification of the model are that separate layers perform functions that differ in their technology, and, that similar functions are placed in the same layer [10]. This model has proven to be quite successful as a design and teaching tool, as evidenced by its standardization and its longevity.

From the human factors point of view, the 7-layer OSI model is incomplete because a user does not directly interact with, nor perceive, any of these layers in getting some task done. Thus, the OSI layers are invisible to users and the terminology and types of data in these layers are meaningless to them; users do not know or care about the protocols, packets, or stacks. This observation is not meant as a disparagement of the OSI model. It handles concerns in its own domain very well. But there is another domain that we claim sits on top of the OSI layers. This domain is concerned with the reasons and ways in which people interact with data that has been so capably forwarded by the OSI layers. These are the HCI (Human-Computer Interaction) layers shown in Table 2.

HCI	10.	Human Needs (communication, education, acquisition, security, entertainment...)
	9.	Human Performance (perception, cognition, memory, motor control, social...)
	8.	Display (keyboard, GUI/CLI, vocal, bpp, ppi, ppm...)

Table 2: The HCI extension to the OSI model.
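The ten layers and the question of which discipline owns a concern can be sketched in a few lines of Python. This is only an illustration of Tables 1 and 2; the names `Layer` and `domain` are hypothetical and are not part of the OSI specification or of the proposed model.

```python
from enum import IntEnum


class Layer(IntEnum):
    """The seven OSI layers (Table 1) plus the three HCI layers (Table 2)."""
    PHYSICAL = 1
    DATA_LINK = 2
    NETWORK = 3
    TRANSPORT = 4
    SESSION = 5
    PRESENTATION = 6
    APPLICATION = 7
    DISPLAY = 8
    HUMAN_PERFORMANCE = 9
    HUMAN_NEEDS = 10


def domain(layer: Layer) -> str:
    """Return "HCI" for Layers 8-10 and "OSI" for Layers 1-7."""
    return "HCI" if layer >= Layer.DISPLAY else "OSI"


if __name__ == "__main__":
    # Print the stack from the top (Layer 10) down to Layer 1.
    for layer in sorted(Layer, reverse=True):
        print(f"{domain(layer)}  {layer.value:2d}. {layer.name}")
```

Numbering the layers as integers keeps adjacency explicit: a concern raised at one layer can be handed to the layer directly above or below it without ambiguity.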

THE HCI LAYERS

The proposal of these three layers is based in part on the requirement that they be psychologically relevant and accessible (i.e., part of direct human experience, observable, and quantifiable), and that they conform to the 13 principles followed in specification of the OSI layers [10]. These HCI layers (Display, Human Performance, and Human Needs) represent the experience that people have with the devices and services that technology offers. Each layer is examined and justified below.

HCI Layer 10: Human Needs

Layer 10 captures the essence of why a user would interact with technology: to get something done to satisfy a need. That need should be defined in a technology-independent way. Layer 10 needs (communication, acquisition of goods and knowledge, entertainment...) drive the entire value chain. Note that the needs are expressed in a generic form that is independent of the rapidly changing technologies available today (e.g., e-mail, e-commerce). This layer captures the fundamental needs that have existed for a very long time (e.g., [9]). Thus, Layer 10 is one of the slowest changing layers.

Satisfying needs is the key to designing compelling and useful applications and services. The question to ask is: "What human need am I trying to address?" If the Layer 10 need is, for example, human-human interactive communication, then to the extent that Layers 8 and below get in the way of this, the need is not satisfied. There are many possible modes of human-human interaction and one must delve more deeply into the need to understand the requirements and opportunities. Is the need to hear the other person's voice, is immediate interaction expected, is acknowledgment of receipt needed? After all, human-human communication can be accomplished by technologies that range from postal mail, e-mail, text-chat, phone calls, video conferencing and perhaps one day, 3D holographic/force-feedback virtual reality. The critical determination is whether the technology at hand addresses the need. If there is a gap between what the technology can do and what the need requires, is there a fallback, lower-tech mode that can suffice?

People rarely interact with technology for its own sake, nor do they usually know or care what goes on below Layer 8. Surely there are exceptions for highly technical users or when a technology is brand new, but if that is the only attraction of the technology, then such novelty (and perhaps the technology) will rapidly fade. Note that when technology is novel, a desire to experience the technology rather than what it can do is common (recall your first e-mail?).

An important consideration at Layer 10 is the manner in which a need is satisfied. If an application or service provides a compelling and attractive method for satisfying a need, this can lead to high levels of user satisfaction. A useful term for this satisfaction is Quality of Experience (QoE), which is often rated on a scale from "poor" or "unusable" to "excellent". Thus QoE (an HCI layer concept) is determined by the manner and extent to which the service satisfies the need. This is in contrast to Quality of Service (QoS), which is discussed below.

The importance of satisfying human needs was expressed by Shenker [8]: "By what criteria do we evaluate a particular network architecture? The Internet was designed to meet the needs of users, and so any evaluative criteria must reduce, in essence, to the following question: how happy does this architecture make the users?" (p. 1178). Thus, the 'acid test' for any technology is whether it improves QoE either directly (faster download, higher quality reproduction, etc.), or indirectly (cheaper access, fewer failures, better security, etc.).

HCI Layer 9: Human Performance

Layer 9 captures the information processing features and limitations of users. Over 100 years of research in psychophysics, cognitive psychology, learning, and memory have produced an understanding of human performance capacities and limits. Basic findings include the bandpass nature of human auditory and visual processing, fundamental capacity limits on memory and motor performance, thresholds for perception of various types of energy, constraints on attention, temporal perception, and so forth. Many of these are direct results of the properties of the sensory organs and the brain, and are thus not likely to change in the near term. Some are changing, but on fairly long timescales (e.g., there is some suggestion that acceptable delays for web-page download are decreasing as users' typical network access progresses from dialup to broadband to 10 Mb/s office LAN [11]). In addition, though we frequently think of these human performance limits in terms of the general population, there are sub-populations with different Layer 9 characteristics (e.g., anomalous colour vision, motor problems, age-related deficits). The critical point here is that although the Layer 10 needs still must be addressed, the characteristics at Layer 9 may be quite different.

Impressive design optimizations have occurred in places where knowledge of human performance characteristics (Layer 9) has been taken into consideration. For example, vertical refresh rates and colour gamuts for computer monitors are based on knowledge of human temporal and chromatic vision limits (in fact, the reason raster displays produce acceptable images is that, under normal conditions, we cannot resolve the individual triads of pixels nor the rapid temporal attack and decay of the phosphors, thus producing the illusion of an image that is continuous in time and space). Audio and video codecs take advantage of the spatial and acoustic bandpass nature of human perception. There are similar optimization opportunities for interface and network designers who understand the nature of human attention, memory, and time comprehension (see for example Akscyn's law [1] which claims, among other things, that interface response can sometimes be too fast and detrimental to performance).

HCI Layer 8: Display

Layer 8 represents that aspect of the hardware, software, and interfaces that a user experiences. Here at the lowest HCI layer a representation of the data is created out of signals that the human cannot understand directly (packets, bits, etc.) and that representation is displayed on a device of some sort (CRT, printer, force-feedback pointer, etc.) and used as input to Layer 9. Layer 8 also works in the opposite direction to translate user output (mostly motor behaviour such as keystrokes, gestures, and voice) into a form that the OSI layers can understand. A sequence from Layer 10 downward might be: Layer 10 says "I need to agree to that" which is translated in Layer 9 as "I must move a pointer and click", which is then translated by Layer 8 into electrical signals to be fed to the OSI layers as packets. The successful input and output devices at Layer 8 have been engineered with the human senses and physical limitations in mind.

It may seem odd that physical devices such as keyboards and screens, rather than just the software interfaces, are included at Layer 8. There are two reasons. First, OSI Layer 1 (Physical) is also concerned with hardware that is highly substitutable; the higher OSI layers don't care whether packets came across a serial line, coax, or through the air, just as the higher HCI layers don't care whether the data comes from a CRT, a flat panel, or a projector provided it does so without significant loss. Second, users may not differentiate between the hardware aspects and the software interface, which may in fact be the same thing in a kiosk for example. In fact, for novice users, there may be little if any differentiation between the Layer 8 device and the network.

RELATED WORK AND BENEFITS OF THE APPROACH

Related Work

The apposition of some form of human-centric layers on top of the OSI layers is not entirely new. In [2] a model of the hierarchical relationship between the user, applications, and networks was given. They used the term "subjective QoS" to reflect user satisfaction and investigated several Layer 10 needs such as perception of value, security, and confidence. Richards et al. [7] also proposed a schema with user cost and satisfaction on the topmost layer, applications in the middle, and network/hardware on the lowest level. Hints of the layered framework were also present in, for example, [6] and [3], where there is a differentiation between "perceptual QoS" (QoE), application QoS (Layers 7 and 8) and Network QoS (Layers 5 and lower). HCI evaluation methods such as task analysis and GOMS (Goals, Operators, Methods, and Selection rules) [5] also have a similar flavour, although they are often focused on Layer 8 issues and fail to capture the high-level user needs we describe in Layer 10. Also, the term QoE is not new, but is found more frequently in white papers provided by companies who provide web page and server tuning products than it is in research literature.

The advantages of the present framework are its extension of the OSI model upwards in a fashion consistent with the original OSI vision, and its completeness in capturing all of HCI in the top layers. The 3 HCI layers are conceived as representing three distinct aspects of HCI that can be summarized as 1) what a user wants to do in the abstract sense (i.e., the need), 2) how that need is acted upon by the human, and, 3) the artifacts that the user employs (hardware, software, etc.).

In the spirit of the original OSI model principles, there are obviously many plausible sub-layers to each of the HCI layers (see Principles 11 and 12 in [10]). For example, one might propose partitioning Layer 9 into Sensation, Perception, Cognition, Meta-Cognition, and so forth. But these do seem to form a class whose nature is described by Human Performance (Principle 4). Likewise, Layer 8 can be divided into the physical input/output devices such as keyboards, printed pages, microphones and speakers, etc., and the higher-level I/O (input/output) device, for example, the GUI (Graphical User Interface). Again, this distinction can be made, but for the user, the device is the interface because that is what is experienced directly.

Benefit 1: QoS versus QoE

Performance issues in the OSI layers (e.g., physical, transport, etc.) are often referred to as QoS (Quality of Service) issues. But what is "QoS" and how is it used? Consider the results of a haphazard Web search of network vendor sites, technology dictionaries, and press releases for the term "QoS". The results can be grouped into four general uses:

1. QoS as a user-perceived entity:

"... is throwing more bandwidth at its problem areas and believes management and monitoring is the best way to offer users stable QoS."

"Quality of Service (QoS) is a broad term used to describe the overall experience a user or application will receive over a network."

"[QoS is the] ... collective effect of service performances which determine the degree of satisfaction of a user of the service."

2. QoS as a quantified network or application trait:

"QoS ... The performance properties of a network service, possibly including throughput, transit delay, priority. Some protocols allow packets or streams to include QoS requirements."

"This results in unpredictable QoS in a best-effort network."

"In the simplest sense, Quality of Service (QoS) means providing consistent, predictable data delivery service. In other words, satisfying customer application requirements."

3. QoS as a packet or network management mechanism:

"We are told in just about every venue that the Internet needs all sorts of quality-of-service [QoS] mechanisms to make it useful."

"DiffServ provides the IP QoS necessary to support telephony-grade networks."

"Quality of Service (QoS) refers to the classification of packets for the purpose of treating certain classes or flows of packets in a particular way compared to other packets."

4. QoS as an effect of packet or network management mechanisms:

"Quality of Service (QoS) is the ability of a network element (e.g., an application, host or router) to have some level of assurance that its traffic and service requirements can be satisfied."

This looseness of language (having QoS simultaneously be a state, a cause, an effect, a measurement, and a subjective experience) is clearly a difficulty. We propose that in cases where QoS is being used to refer to the effects on the perceptions or opinions of the users, the term "QoE" (Quality of Experience) be used instead. QoE is thus a term relevant to Layers 8-10. The term "QoS" is best understood when it is used to refer to packet or network management practices, and this includes such OSI-level technologies as DiffServ and MPLS. Finally, some other terminology is needed for the other uses of "QoS" that refer to network traits and measurements (perhaps "QoT", Quality of Transmission). In using these terms, then, we can make statements like: "QoS mechanisms can be used to obtain a certain level of QoT that will assure a pleasing and acceptable QoE".

This discussion also points out a critical difference in the language that must be used in relating QoE to the success of QoS implementations. Those who talk about QoS discuss such things as packet drop probability and delay and their higher order moments, i.e., packet loss rates and jitter. They also discuss queuing, bandwidth, tail-drops, and buffer sizes. This is all relevant terminology in their 7-layer domain but most of these terms and concepts are invalid in any discussion of QoE (see [8]). Users experience delay, distortion, and consistency, not network queuing and packet loss.

Consider the user experience of web-browsing. What users see is a page that either loads satisfactorily or takes too long (common estimates for a high QoE are in the 2-10 second range [11]). This delay is directly perceived but the underlying network performance is not. For example, the low layer protocols usually take care of packet loss by resending the lost packets, but this takes time so the loss is experienced as delay. In addition, the user experiences aggregate delay directly as opposed to the individual delay contributors such as serialization, transmission, server lag, etc. Thus, "[F]rom the user's viewpoint, delay is delay. Therefore, any delay due to server processing and data access from multiple sources will have to be considered along with the traditional calculations in taxing the user's patience" [11].
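The point that loss is experienced as delay, and that only the aggregate matters to the user, can be sketched as follows. This is a deliberately simplified illustration, not a model of any real protocol: it assumes that each send attempt is lost independently with a fixed probability and that every loss costs one fixed timeout before the resend. The function names, the contributor names, and all numbers are hypothetical.

```python
def expected_retransmission_delay(loss_prob: float, timeout_s: float) -> float:
    """Expected extra delay (seconds) caused by resending lost packets.

    Assumption: attempts are lost independently with probability loss_prob,
    so the expected number of lost attempts before a success is
    loss_prob / (1 - loss_prob), and each lost attempt costs timeout_s.
    """
    if not 0.0 <= loss_prob < 1.0:
        raise ValueError("loss_prob must be in [0, 1)")
    expected_losses = loss_prob / (1.0 - loss_prob)
    return expected_losses * timeout_s


def perceived_delay(contributors_s: dict, loss_prob: float = 0.0,
                    timeout_s: float = 0.0) -> float:
    """Aggregate delay the user experiences: the individual contributors
    are summed, and packet loss shows up only as additional delay."""
    return sum(contributors_s.values()) + expected_retransmission_delay(
        loss_prob, timeout_s)


if __name__ == "__main__":
    # Illustrative values only; they are not measurements.
    page_load = {"serialization": 0.2, "transmission": 0.3, "server_lag": 1.5}
    total = perceived_delay(page_load, loss_prob=0.1, timeout_s=1.0)
    print(f"user-perceived delay: {total:.2f} s")
```

The user sees only the single number returned by `perceived_delay`; the breakdown in `page_load` is visible to the network or application engineer but not at Layers 8-10.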

Benefit 2: An End-to-End Perspective

Another by-product of the OSI+HCI perspective is a clarification of the oft-used but rarely consistent term "end-to-end". In discussions with network engineers and architects, it has become painfully obvious that their idea of end-to-end frequently means "one-way from this box to that box", perhaps because they map the term onto the scope of their control or responsibility (maybe their OSI layers). Clearly, from the HCI point of view end-to-end spans the full action-to-fulfillment scope. This means that a Layer 10 need proceeds down the HCI layers, through the OSI layers, across the network to a server or other human and then up the reverse path. Therefore, we claim that the only true end-to-end perspective is from Layer 10 through the network/hardware and back again to the same Layer 10. This is based on the earlier assertion that people interact with technology as a way to satisfy Layer 10 needs and that what they experience directly is the sum total of delay (i.e., round-trip delay) and aggregate distortion.

Benefit 3: Category Shifting

With the focus placed clearly on Layer 10 as the driver for the rest of the layers, we can use the OSI+HCI model to design and de-risk applications and services; that is, identify matches and gaps between what the 3 HCI layers require, and what the 7 OSI layers can provide.

For example, one attempt to quantify the delay requirements for the 3 HCI layers proposes that there are 4 general delay categories that are meaningful from a user perspective (see Table 3) [4]. (The exact number of categories and their extents do not limit the value of the present discussion.) For bulk services such as USENET and mailing lists, the delay requirements are easily 100s of seconds or perhaps 100s of minutes because these services are unattended -- the user is not waiting expectantly for the contents. For timely services such as e-mail collection or the start-up of streaming media the requirement is on the order of 10 or so seconds. For responsive applications such as web-browsing, voice messaging, and e-commerce, delays on the order of a few seconds are tolerable. Finally, for highly interactive services (e.g., telnet, voice-calls, remote control) acceptable delays are in the low 100s of milliseconds. It is important to note that only round-trip delay is relevant in HCI terms -- the user does not know or care about OSI issues such as per-hop behaviour nor the wonders of IP routing.

Tolerable Response Times
Interactive	Responsive	Timely	Bulk
±10^-1 s	±10^0 s	±10^1 s	±10^2 s

Table 3: Four general categories of applications based on tolerable delay for acceptable QoE (after [4]).
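One way to use Table 3 is to ask which categories a measured round-trip delay can support. The sketch below treats the order-of-magnitude values in the table as upper bounds, which is an assumption made here for illustration; the table itself gives point estimates rather than hard limits. The names `TOLERABLE_DELAY_S` and `supported_categories` are hypothetical.

```python
# Order-of-magnitude tolerable delays from Table 3, in seconds.
# Treating them as upper bounds is an assumption of this sketch.
TOLERABLE_DELAY_S = {
    "Interactive": 0.1,
    "Responsive": 1.0,
    "Timely": 10.0,
    "Bulk": 100.0,
}


def supported_categories(round_trip_s: float) -> list:
    """Return the Table 3 categories whose tolerable delay is not exceeded
    by the given round-trip delay, most demanding category first."""
    if round_trip_s < 0:
        raise ValueError("round_trip_s must be non-negative")
    return [name for name, limit_s in TOLERABLE_DELAY_S.items()
            if round_trip_s <= limit_s]


if __name__ == "__main__":
    for delay_s in (0.05, 3.0, 500.0):
        print(f"{delay_s:7.2f} s -> {supported_categories(delay_s)}")
```

Only round-trip delay is passed in, in keeping with the observation that per-hop behaviour is not visible at the HCI layers.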

These delay categories are point estimators for acceptable QoE and prescribe what the OSI layers must provide in each case. If the lower layers cannot meet the upper layer requirements (because they are too slow, too bandwidth constrained, etc.), then alternate ways to address Layer 10 needs may be found by shifting the service to a less-demanding category (i.e., a rightward shift in Table 3). Consider, for example, early 2.5G and 3G wireless cellular data networks (GPRS, 1XRTT). It is known that these networks will perform at or below the levels of dialup V.90 modems especially when mobile. Such networks will exhibit relatively low and variable bandwidth (10-60kb/s), long round-trip delay (100s of ms), and periods of disconnectivity due to cell reselection, radio fading, or obstruction. Therefore, common desktop office applications that are designed for networks with high bandwidth (10s or 100s of Mb/s), low round-trip delay (10s of ms or less), and constant connectivity will not provide a high QoE in a wireless environment. Users will experience long delays in downloading e-mail and web pages, failed connections, very slow uploads, and perhaps interfaces that appear to freeze while waiting for data.

From the 10-Layer model perspective, one could say that such network performance (experienced as interface performance) is likely not to satisfy Layer 10, 9, and 8 requirements. However, a Layer 10 focus would cause one to ask "what need was the user trying to address?" and how this can be achieved given what we know about performance of the OSI layers in this wireless case. If the Layer 10 need was human-human communication, then the goal is to find solutions at Layers 9 and below to satisfy this need given the nature of the network. We might disqualify voice communication (interactive) due to excessive delay or packet loss and instead consider alternate methods to achieve the human-human communication. For example, a voice message rather than a voice call might suffice. As the Layer 8 and below resources become more constrained (terminal capabilities, network resources, etc.) one might consider chat, e-mail, or SMS (Short Message Service). In going from voice to messaging to SMS, we shifted the category and may have still satisfied the Layer 10 need (i.e., respected the semantics of the users' intent).
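The category shift described above can be sketched as a simple fallback search for the Layer 10 need of human-human communication. The pairing of each mode with a category follows the examples given earlier in this section (voice calls as interactive, voice messaging as responsive, e-mail as timely, mailing lists as bulk); the numeric limits are the order-of-magnitude values of Table 3 treated as upper bounds, which is an assumption of this sketch. The names `COMMUNICATION_MODES` and `shift_category` are hypothetical.

```python
from typing import Optional

# (mode, Table 3 category, assumed tolerable round-trip delay in seconds),
# ordered from the most demanding mode to the least demanding one.
COMMUNICATION_MODES = [
    ("voice call", "Interactive", 0.1),
    ("voice message", "Responsive", 1.0),
    ("e-mail", "Timely", 10.0),
    ("mailing list", "Bulk", 100.0),
]


def shift_category(round_trip_s: float) -> Optional[str]:
    """Return the most demanding communication mode that the measured
    round-trip delay can still support, or None if no mode fits."""
    for mode, _category, limit_s in COMMUNICATION_MODES:
        if round_trip_s <= limit_s:
            return mode
    return None


if __name__ == "__main__":
    # Illustrative delays only; they are not measurements of any network.
    for delay_s in (0.08, 0.8, 5.0, 1000.0):
        print(f"{delay_s:8.2f} s -> {shift_category(delay_s)}")
```

A fuller treatment would also weigh bandwidth, connectivity, and Layer 9 constraints; delay alone is used here because it is the only requirement that Table 3 quantifies.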

Thus, knowing the need and translating it into an application that can satisfy the need given human perception and network resource limitations can improve the likelihood of a higher QoE. Users may be willing to sacrifice some aspects of resolution (that is, they may tolerate distortion, low-resolution screens, low frame-rate, etc.) to gain economy or speed. In fact, relatively large quality reductions (in the colour, size, and spatial frequency domains) are well tolerated. When network characteristics will not provide a suitable transport to make a given application perform at high QoE, then the goal is to rework the application into something that will fit in the constraints of the OSI Layers, but address the Layer 10 requirements.

CONCLUSION

The new OSI+HCI model (Table 4) provides a consistent language to help bridge different disciplines and serves as an aid in deciding in which discipline a concern falls. It also makes clear that a complete end-to-end perspective involves realizing that user experience is affected by aggregate network and application performance. Finally, the new OSI+HCI reference model provides a strategy for ensuring that the applications can operate satisfactorily within network limitations and still address the Layer 10 needs.

HCI	10.	Human Needs (communication, education, acquisition, security, entertainment...)
	9.	Human Performance (perception, cognition, memory, motor control, social...)
	8.	Display (keyboard, GUI/CLI, vocal, bpp, ppi, ppm...)
OSI	7.	Application (http, ftp, nfs, pop...)
	6.	Presentation (ps, lz, iso-pp...)
	5.	Session (dns, rpc, pap...)
	4.	Transport (tcp, udp, rtp...)
	3.	Network (ip, dhcp, icmp, aep...)
	2.	Data Link (arp, ppp...)
	1.	Physical (10bt, xDSL, V.42...)

Table 4: The complete 10-layer OSI+HCI model.

REFERENCES

1. Akscyn, R., McCracken, D., and Yoder, E. KMS: A distributed hypermedia system for managing knowledge in organizations. Comm. of the ACM, 31 (7), 1988.

2. Bouch, A., and Sasse, M.A. It ain't what you charge, it's the way that you do it: A user perspective of network QoS and pricing. Proc. IFIP/IEEE International Symposium on Integrated Network Management (IM'99), May 1999.

3. Guo, X. and Pattinson, C. Quality of service requirements for multimedia communications. Paper presented at TIME and the WEB, Staffordshire University, 19 June 1997. http://www.hiraeth.com/conf/web97/papers/guo.html

4. ITU-T G.1010 (November 2001). Telecommunication Standardization Sector of ITU. "Series G: Transmission systems and media, digital systems and networks: End-user multimedia QoS categories."

5. John, B.E. Why GOMS? Interactions, 2 (4), Oct. 1995.

6. Nahrstedt, K., and Smith, J. A service kernel for multimedia endpoints. In R. Steinmetz, (Ed.), Multimedia: Advanced Teleservices and High-Speed Communication Architectures. Lecture Notes in Computer Science LNCS-868, pp. 8-22, Springer Verlag, 1994.

7. Richards, A., Rogers, G., Antoniades, M., and Witana, V. Mapping user level QoS from a single parameter. Proc. International Conference on Multimedia Networks and Services (MMNS '98), Nov. 1998. http://citeseer.nj.nec.com/richards98mapping.html

8. Shenker, S. Fundamental design issues for the future internet. IEEE Journal on Selected Areas in Communications, 13 (7), September 1995.

9. Shneiderman, B. Leonardo's Laptop: Human Needs and the New Computing Technologies. MIT Press, 2002.

10. Zimmermann, H. OSI reference model - the ISO model of architecture for open systems interconnection. IEEE Trans. on Communications, COM-28, April 1980.

11. Zona Research. The need for speed II. Zona Market Bulletin, 5, 2001.