06-05-2014, 12:49 PM
Eye Movement-Based Human-Computer Interaction Techniques: Toward Non-Command Interfaces
ABSTRACT
User-computer dialogues are typically one-sided, with the bandwidth from
computer to user far greater than that from user to computer. The movement
of a user’s eyes can provide a convenient, natural, and high-bandwidth source
of additional user input, to help redress this imbalance. We therefore
investigate the introduction of eye movements as a computer input medium. Our
emphasis is on the study of interaction techniques that incorporate eye
movements into the user-computer dialogue in a convenient and natural way. This
chapter describes research at NRL on developing such interaction techniques
and the broader issues raised by non-command-based interaction styles. It
discusses some of the human factors and technical considerations that arise in
trying to use eye movements as an input medium, describes our approach and
the first eye movement-based interaction techniques that we have devised and
implemented in our laboratory, reports our experiences and observations on
them, and considers eye movement-based interaction as an exemplar of a new,
more general class of non-command-based user-computer interaction.
INTRODUCTION
In searching for better interfaces between users and their computers, an additional mode
of communication between the two parties would be of great use. The problem of
human-computer interaction can be viewed as two powerful information processors (human and
computer) attempting to communicate with each other via a narrow-bandwidth, highly constrained
interface [25]. Faster, more natural, more convenient (and, particularly, more parallel, less
sequential) means for users and computers to exchange information are needed to increase the
useful bandwidth across that interface.
Outline
This chapter begins by discussing the non-command interaction style. Then it focuses on
eye movement-based interaction as an instance of this style. It introduces a taxonomy of the
interaction metaphors pertinent to eye movements. It describes research at NRL on developing
and studying eye movement-based interaction techniques. It discusses some of the human
factors and technical considerations that arise in trying to use eye movements as an input medium,
describes our approach and the first eye movement-based interaction techniques that we have
devised and implemented in our laboratory, and reports our experiences and observations on
them. Finally, the chapter returns to the theme of new interaction styles and attempts to
identify and separate out the characteristics of non-command styles and to consider the impact of
these styles on the future of user interface software.
PERSPECTIVES ON EYE MOVEMENT-BASED INTERACTION
As with other areas of user interface design, considerable leverage can be obtained by
drawing analogies that use people’s already-existing skills for operating in the natural
environment and searching for ways to apply them to communicating with a computer. Direct
manipulation interfaces have enjoyed great success, particularly with novice users, largely because
they draw on analogies to existing human skills (pointing, grabbing, moving objects in physical
space), rather than trained behaviors; and virtual realities offer the promise of usefully
exploiting people’s existing physical navigation and manipulation abilities. These notions are more
difficult to extend to eye movement-based interaction, since few objects in the real world
respond to people’s eye movements. The principal exception is, of course, other people: they
detect and respond to being looked at directly and, to a lesser and much less precise degree, to
what else one may be looking at. In describing eye movement-based human-computer
interaction we can draw two distinctions, as shown in Figure 1: one is in the nature of the user’s eye
movements and the other, in the nature of the responses.
The Eye
The retina of the eye is not uniform. Rather, one small portion near its center contains
many densely-packed receptors and thus permits sharp vision, while the rest of the retina
permits only much blurrier vision. That central portion (the fovea) covers a field of view
approximately one degree in diameter (the width of one word in a book held at normal reading
distance or slightly less than the width of your thumb held at the end of your extended arm).
Anything outside that area is seen only with “peripheral vision,” with 15 to 50 percent of the
acuity of the fovea. It follows that, to see an object clearly, it is necessary to move the eye so
that the object appears on the fovea. Conversely, because peripheral vision is so poor relative
to foveal vision and the fovea so small, a person’s eye position gives a rather good indication
(to within the one-degree width of the fovea) of what specific portion of the scene before the
person is being examined.
METHODS FOR MEASURING EYE MOVEMENTS
What to Measure
For human-computer dialogues, we wish to measure visual line of gaze, rather than
simply the position of the eye in space or the relative motion of the eye within the head. Visual
line of gaze is a line radiating forward in space from the eye; the user is looking at something
along that line. To illustrate the difference, suppose an eye-tracking instrument detected a
small lateral motion of the pupil. It could mean either that the user’s head moved in space
(and his or her eye is still looking at nearly the same point) or that the eye rotated with respect
to the head (causing a large change in where the eye is looking). We need to measure where
the eye is pointing in space; not all eye tracking techniques do this. We do not normally
measure how far out along the visual line of gaze the user is focusing (i.e., accommodation),
but when viewing a two-dimensional surface like a computer console, it will be easy to deduce.
Since both eyes generally point together, it is customary to track only one eye.
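The ambiguity described above can be made concrete with a little geometry. The following is a minimal sketch, not part of the original study: it assumes an eyeball radius of 12 mm (a round illustrative figure) and a 24" viewing distance, and compares how far the point of regard moves on the screen when the same 1 mm lateral pupil displacement comes from a head translation versus an eye rotation. The function names are hypothetical.

```python
import math

EYE_RADIUS_MM = 12.0         # assumed eyeball radius (illustrative, not from the text)
VIEW_DIST_MM = 24 * 25.4     # assumed 24-inch viewing distance, in millimetres


def gaze_shift_from_head_translation(pupil_shift_mm: float) -> float:
    """Head slides sideways and the eye does not rotate: the line of gaze
    moves parallel to itself, so the point on the screen moves by the
    same amount as the pupil."""
    return pupil_shift_mm


def gaze_shift_from_eye_rotation(pupil_shift_mm: float) -> float:
    """Eye rotates within a stationary head: the pupil rides on a sphere of
    radius EYE_RADIUS_MM, so a small lateral shift implies a rotation angle,
    which is then projected out to the screen."""
    angle_rad = math.asin(pupil_shift_mm / EYE_RADIUS_MM)
    return VIEW_DIST_MM * math.tan(angle_rad)


if __name__ == "__main__":
    shift = 1.0  # observed lateral pupil motion, in mm
    print("head translation:", gaze_shift_from_head_translation(shift), "mm on screen")
    print("eye rotation:    ", round(gaze_shift_from_eye_rotation(shift), 1), "mm on screen")
```

Under these assumptions the same 1 mm pupil motion corresponds to either about 1 mm or roughly 51 mm (about 2") on the screen, which is why an instrument that reports only pupil position is not sufficient for our purposes.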
Instability in Eye Tracking Equipment
During operation of the eye tracker, there are often moments when the eye position is not
available: the eye tracker fails to obtain an adequate video image of the eye for one or more
frames. This could mean that the user blinked or moved his or her head outside the tracked
region; if so, such information could be passed to the user interface. However, it could also
mean simply that there was a spurious reflection in the video camera or any of a variety of
other momentary artifacts. The two cases may not be distinguishable; hence, it is not clear
how the user interface should respond to brief periods during which the eye tracker reports no
position. The user may indeed have looked away, but he or she may also believe he or she is
looking right at some target on the screen while the system fails to respond.
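Since the text leaves open how an interface should treat such dropouts, the following is only one possible policy, sketched under stated assumptions: runs of missing samples no longer than a threshold are bridged with the last valid position, while longer runs are passed through as missing. The function name `bridge_short_gaps` and the `max_gap` parameter are hypothetical, and a real-time version would have to delay its output by up to `max_gap` frames to know how long a gap turns out to be.

```python
from typing import List, Optional, Tuple

# A sample is an (x, y) screen position, or None when the tracker reported nothing.
Sample = Optional[Tuple[float, float]]


def bridge_short_gaps(samples: List[Sample], max_gap: int) -> List[Sample]:
    """Return a copy of samples in which interior runs of at most max_gap
    missing samples are replaced by the last valid position. Longer runs,
    and runs at the very start or end of the data, are left as None."""
    out = list(samples)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        gap_len = i - start
        is_interior = start > 0 and i < n
        if is_interior and gap_len <= max_gap:
            for j in range(start, i):
                out[j] = out[start - 1]  # hold the last valid position
    return out


if __name__ == "__main__":
    data = [(1.0, 1.0), None, (2.0, 2.0), None, None, None, (3.0, 3.0), None]
    print(bridge_short_gaps(data, max_gap=2))
```

The threshold trades the two failure cases against each other: a larger `max_gap` hides more spurious artifacts but also hides more genuine blinks and look-aways.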
Accuracy and Range
A user generally need not position his or her eye more accurately than the width of the
fovea (about one degree) to see an object sharply. Finer accuracy from an eye tracker might be
needed for studying the operation of the eye muscles but adds little for our purposes. The
eye’s normal jittering further limits the practical accuracy of eye tracking. It is possible to
improve accuracy by averaging over a fixation, but not in a real-time interface.
Despite the servo-controlled mirror mechanism for following the user’s head, we find that
the steadier the user holds his or her head, the better the eye tracker works. We find that we
can generally get two-degree accuracy quite easily, and can sometimes achieve one degree (or
approximately 0.4" or 40 pixels on the screen at a 24" viewing distance). The eye tracker
should thus be viewed as having a resolution much coarser than that of a mouse or most other
pointing devices, perhaps more like a traditional touch screen. An additional problem is that
the range over which the eye can be tracked with this equipment is fairly limited. In our
configuration, it cannot quite cover the surface of a 19" monitor at a 24" viewing distance.
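The conversion from visual angle to screen distance quoted above can be checked with a short calculation. This is a sketch using the standard visual-angle relation (extent = 2 d tan(angle / 2)); the function names are hypothetical, and the display density of 96 pixels per inch is an assumption chosen to be in line with the 40-pixel figure, not a value given in the text.

```python
import math


def angle_to_screen_inches(angle_deg: float, distance_in: float) -> float:
    """Extent on a flat screen subtended by a visual angle centred on the
    line of gaze, for a viewer distance_in inches away."""
    return 2.0 * distance_in * math.tan(math.radians(angle_deg) / 2.0)


def angle_to_pixels(angle_deg: float, distance_in: float, pixels_per_inch: float) -> float:
    """Same extent expressed in pixels; pixels_per_inch is an assumed display density."""
    return angle_to_screen_inches(angle_deg, distance_in) * pixels_per_inch


if __name__ == "__main__":
    for accuracy_deg in (1.0, 2.0):
        inches = angle_to_screen_inches(accuracy_deg, 24.0)
        pixels = angle_to_pixels(accuracy_deg, 24.0, 96.0)
        print(f"{accuracy_deg} degree(s) at 24 in: {inches:.2f} in, about {pixels:.0f} px at 96 ppi")
```

One degree at 24" works out to about 0.42", in agreement with the approximate 0.4" given above; two degrees is roughly twice that.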
Re-assignment of Off-target Fixations
The processing steps described thus far are open-loop in the sense that eye tracker data
are translated into recognized fixations at specific screen locations without reference to what is
displayed on the screen. The next processing step is applied to fixations that lie outside the
boundaries of the objects displayed on the screen. This step uses knowledge of what is
actually on the screen, and serves further to compensate for small inaccuracies in the eye tracker
data. It allows a fixation that is near, but not directly on, an eye-selectable screen object to be
accepted. Given a list of currently displayed objects and their screen extents, the algorithm
will reposition a fixation that lies outside any object, provided it is "reasonably" close to one
object and "reasonably" farther from all other such objects (i.e., not halfway between two
objects, which would lead to unstable behavior). It is important that this procedure be applied
only to fixations detected by the recognition algorithm, not to individual raw eye tracker
position reports.
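The re-assignment step can be sketched as follows. This is an illustration only: the text does not say how "reasonably" close or far is quantified, so the `max_dist` limit (in pixels) and the `margin` ratio below are hypothetical parameters, as are the class and function names. The sketch returns the object a fixation is assigned to rather than a repositioned coordinate.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScreenObject:
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def distance_to(self, x: float, y: float) -> float:
        """Distance from (x, y) to the nearest edge of this rectangle (0 if inside)."""
        dx = max(self.left - x, 0.0, x - self.right)
        dy = max(self.top - y, 0.0, y - self.bottom)
        return math.hypot(dx, dy)


def reassign_fixation(x: float, y: float, objects: List[ScreenObject],
                      max_dist: float = 30.0, margin: float = 2.0) -> Optional[ScreenObject]:
    """Return the object a fixation at (x, y) should be assigned to, or None.

    A fixation already on an object keeps that object. An off-target fixation
    is accepted only if the nearest object is within max_dist and every other
    object is at least margin times farther away."""
    if not objects:
        return None
    ranked = sorted(objects, key=lambda o: o.distance_to(x, y))
    nearest_dist = ranked[0].distance_to(x, y)
    if nearest_dist == 0.0:
        return ranked[0]
    if nearest_dist > max_dist:
        return None  # not "reasonably" close to anything
    if len(ranked) > 1 and ranked[1].distance_to(x, y) < margin * nearest_dist:
        return None  # ambiguous: too close to a second object
    return ranked[0]


if __name__ == "__main__":
    ships = [ScreenObject("A", 0, 0, 100, 100), ScreenObject("B", 200, 0, 300, 100)]
    hit = reassign_fixation(110, 50, ships)
    print(hit.name if hit else None)
```

Requiring a margin between the nearest and second-nearest object is what keeps a fixation midway between two objects from flipping back and forth between them as the data jitter.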
USER INTERFACE MANAGEMENT SYSTEM
In order to make the eye tracker data more tractable for use as input to an interactive user
interface, we turn the output of the recognition algorithm into a stream of tokens. We report
tokens for eye events considered meaningful to the user-computer dialogue, analogous to the
way that raw input from a keyboard (shift key went down, letter a key went down, etc.) is
turned into meaningful events (one ASCII upper case A was typed). We report tokens for the
start, continuation (every 50 ms., in case the dialogue is waiting to respond to a fixation of a
certain duration), and end of each detected fixation. Each such token is tagged with the actual
fixation duration to date, so an interaction technique that expects a fixation of a particular
length will not be skewed by delays in processing by the UIMS (user interface management
system) or by the delay inherent in the fixation recognition algorithm.
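A minimal sketch of such a token stream is given below. The token names, the `EyeToken` fields, and the function `tokens_for_fixation` are hypothetical and do not reproduce the actual UIMS interface. For simplicity the sketch generates all tokens for a fixation whose total length is already known and tags the start token with a duration of zero; a real recognizer would emit tokens as the fixation unfolds, and its start token would already carry whatever time the recognition algorithm needed.

```python
from dataclasses import dataclass
from typing import List

CONTINUE_INTERVAL_MS = 50  # continuation tokens every 50 ms, as described in the text


@dataclass(frozen=True)
class EyeToken:
    kind: str          # "start", "continue", or "end"
    x: float           # fixation location on the screen
    y: float
    duration_ms: int   # fixation duration to date when the token was produced


def tokens_for_fixation(x: float, y: float, total_ms: int) -> List[EyeToken]:
    """Produce the start, continuation, and end tokens for one fixation
    lasting total_ms, each tagged with the duration so far."""
    tokens = [EyeToken("start", x, y, 0)]
    elapsed = CONTINUE_INTERVAL_MS
    while elapsed < total_ms:
        tokens.append(EyeToken("continue", x, y, elapsed))
        elapsed += CONTINUE_INTERVAL_MS
    tokens.append(EyeToken("end", x, y, total_ms))
    return tokens


if __name__ == "__main__":
    for token in tokens_for_fixation(320.0, 200.0, 120):
        print(token)
```

Because every token carries the fixation duration itself, a consumer can compare that value against its own threshold and is unaffected by when the token happens to be delivered.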
Object Selection
This task is to select one object from among several displayed on the screen, for example,
one of several file icons on a desktop or, as shown in Figure 6, one of several ships on a map
in a hypothetical “command and control” system. With a mouse, this is usually done by
pointing at the object and then pressing a button. With the eye tracker, there is no natural
counterpart of the button press. As noted, we rejected using a blink and instead tested two
alternatives. In one, the user looks at the desired object then presses a button on a keypad to
indicate his or her choice. In Figure 6, the user has looked at ship “EF151” and caused it to
be selected (for attribute display, described below). The second alternative uses dwell time: if
the user continues to look at the object for a sufficiently long time, it is selected without
further operations. The two techniques are actually implemented simultaneously, where the
button press is optional and can be used to avoid waiting for the dwell time to expire, much as
an optional menu accelerator key is used to avoid traversing a menu. The idea is that the user
can trade between speed and a free hand: if the user needs speed and can push the button, he
or she need not be delayed by eye tracker dwell time; if the user does not need maximum
speed, then object selection reverts to the more passive eye-only mode using dwell time.
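The combined technique can be sketched as a small event loop. This is an illustration under stated assumptions, not the laboratory implementation: the event format, the function name `select_object`, and the 500 ms dwell threshold are all hypothetical (the text says only that the dwell must be "sufficiently long").

```python
from typing import Iterable, Optional, Tuple

DWELL_THRESHOLD_MS = 500  # hypothetical value; the text does not give a number here

# An event is (kind, object_name, fixation_duration_ms):
#   ("fixation", "EF151", 150)  the user has been looking at EF151 for 150 ms
#   ("fixation", None, 0)       the user is looking at no selectable object
#   ("button", None, 0)         the user pressed the keypad button
Event = Tuple[str, Optional[str], int]


def select_object(events: Iterable[Event],
                  dwell_ms: int = DWELL_THRESHOLD_MS) -> Optional[str]:
    """Return the first object selected, by dwell or by button press, or None."""
    looked_at: Optional[str] = None
    for kind, name, duration in events:
        if kind == "fixation":
            looked_at = name
            if name is not None and duration >= dwell_ms:
                return name       # dwell time expired: eye-only selection
        elif kind == "button" and looked_at is not None:
            return looked_at      # button press short-circuits the dwell wait
    return None


if __name__ == "__main__":
    fast = [("fixation", "EF151", 50), ("button", None, 0)]
    slow = [("fixation", "EF151", 50), ("fixation", "EF151", 500)]
    print(select_object(fast), select_object(slow))
```

Both paths end in the same selection; the button simply lets a user with a free hand skip the remainder of the dwell time.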