ABSTRACT
As smartphones evolve, researchers are studying new techniques to ease human-mobile interaction. We propose EyePhone, a novel "hand-free" interfacing system capable of driving mobile applications/functions using only the user's eye movements and actions (e.g., a wink). EyePhone tracks the user's eye movement across the phone's display using the camera mounted on the front of the phone; more specifically, machine learning algorithms are used to: i) track the eye and infer its position on the mobile phone display as a user views a particular application; and ii) detect eye blinks that emulate mouse clicks to activate the target application under view. We present a prototype implementation of EyePhone on a Nokia N810, which is capable of tracking the position of the eye on the display and mapping this position to an application that is activated by a wink. At no time does the user have to physically touch the phone display.
Categories and Subject Descriptors
C.3 [Special-Purpose and Application-Based Systems]: Real-time and embedded systems
General Terms
Algorithms, Design, Experimentation, Human Factors, Measurement, Performance
1. INTRODUCTION
Human-Computer Interaction (HCI) researchers and phone vendors are continuously searching for new approaches to reduce the effort users exert when accessing applications on limited form factor devices such as mobile phones. The most significant innovation of the past few years is the adoption of touchscreen technology, introduced with the Apple iPhone [1] and recently followed by all the other major vendors, such as Nokia [2] and HTC [3]. The touchscreen has changed the way people interact with their mobile phones because it provides an intuitive way to perform actions using the movement of one or more fingers on the display (e.g., pinching a photo to zoom in and out, or panning to move a map).
Several recent research projects demonstrate new people-to-mobile-phone interaction technologies [4, 5, 6, 7, 8, 9, 10]. For example, to infer and detect gestures made by the user, phones use the on-board accelerometer [4, 7, 8], camera [11, 5], specialized headsets [6], dedicated sensors [9], or radio features [10]. We take a different approach from that found in the literature and propose the EyePhone system, which exploits the eye movement of the user, captured using the phone's front-facing camera, to trigger actions on the phone.
HCI research has made remarkable advances over the last decade [12], facilitating the interaction of people with machines. We believe that human-phone interaction (HPI) presents challenges not typically found in HCI research, specifically related to the phone and how we use it. We use the term HPI to refer to techniques aimed at advancing and facilitating the interaction of people with mobile phones. HPI presents challenges that differ somewhat from traditional HCI challenges. Most HCI technology addresses the interaction between people and computers in "ideal" environments, i.e., where people sit in front of a desktop machine with specialized sensors and cameras centered on them. In contrast, mobile phones are mobile computers with which people interact on the move, under varying conditions and contexts. Any phone sensor, e.g., accelerometer, gyroscope, or camera, used in an HPI technology must take into account the constraints that mobility brings into play. For example, a person walking produces a certain signature in the accelerometer readings that must be filtered out before the accelerometer can be used for gesture recognition (e.g., double tapping the phone to stop an incoming phone call). Similarly, if the phone's camera is adopted in an HPI application [11, 5, 9], the different lighting conditions and the video frames blurred by mobility make the use of the camera to infer events very challenging. For these reasons, HCI technologies need to be extended to be applicable to HPI environments.
In order to address these goals, HPI technology should be less intrusive; that is, i) it should not rely on any external devices other than the mobile phone itself; ii) it should be readily usable with as little user dependency as possible; iii) it should be fast in the inference phase; iv) it should be lightweight in terms of computation; and v) it should preserve the phone user experience, e.g., it should not deplete the phone battery beyond normal operation.
We believe that HPI research advances will produce a leap forward in the way people use their mobile phones by improving people's safety, e.g., reducing distraction, and consequently the risk of accidents, while driving, or by facilitating the use of mobile phones for people with disabilities.
We propose EyePhone, the first system capable of tracking a user's eye and mapping its current position on the display to a function/application on the phone using the phone's front-facing camera. EyePhone allows the user to activate an application by simply "blinking at the app", emulating a mouse click. While other interfaces, such as voice recognition, could be used in a hand-free manner, we focus on exploiting the eye as a driver of the HPI. We believe EyePhone technology is an important alternative to, for example, voice activation systems based on voice recognition, since the performance of a voice recognition system tends to degrade in noisy environments.
The front camera is the only requirement of EyePhone. Most smartphones today are equipped with a front camera, and we expect that many more will be introduced in the future (e.g., Apple iPhone 4G [1]) in support of video conferencing on the phone. The EyePhone system uses machine learning techniques that, after detecting the eye, create a template of the open eye and then use template matching for eye tracking. Correlation matching is exploited for eye wink detection [13]. We implement EyePhone on the Nokia N810 tablet and present experimental results in different settings. These initial results demonstrate that EyePhone is capable of driving the mobile phone. An EyePhone demo can be found at [15].
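Although the paper contains no code (the prototype is written against the C OpenCV library), the two core operations named above can be illustrated with a short sketch using OpenCV's Python bindings. The function names and the 0.6 correlation threshold are our own assumptions, not values from the paper.

```python
# Sketch of template-matching eye tracking and correlation-based wink
# detection. Assumes OpenCV's Python bindings (cv2); the prototype in
# the paper uses the C OpenCV library instead.
import cv2

def track_eye(frame_gray, open_eye_template):
    """Return the best-match location of the open-eye template and
    its normalized correlation score in the current grayscale frame."""
    result = cv2.matchTemplate(frame_gray, open_eye_template,
                               cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    return max_loc, max_val

def is_blink(score, threshold=0.6):
    """When the eye closes, the open-eye template correlates poorly
    with the frame, so a low score is treated as a blink/wink.
    The 0.6 threshold is illustrative only."""
    return score < threshold
```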
The paper is organized as follows. In Section 2, we discuss the challenges encountered in the development of HPI technology. Section 3 presents the design of the EyePhone system, followed by its evaluation in Section 4. Future research directions are reported in Section 5. Section 6 discusses related work and Section 7 finishes with some concluding remarks.
2. HUMAN-PHONE INTERACTION
Human-Phone Interaction represents an extension of the field of HCI, since HPI presents new challenges that need to be addressed, driven specifically by issues of mobility, the form factor of the phone, and its resource limitations (e.g., energy and computation). More specifically, the distinguishing factors of the mobile phone environment are mobility and the lack of sophisticated hardware support, i.e., the specialized headsets, overhead cameras, and dedicated sensors that are often required to realize HCI applications. In what follows, we discuss these issues.
Mobility Challenges. One of the immediate products of mobility is that a mobile phone is moved around through unpredicted contexts, i.e., situations and scenarios that are hard to foresee during the design phase of an HPI application. A mobile phone is subject to uncontrolled movement, i.e., people interact with their mobile phones while stationary, on the move, etc. It is almost impossible to predict how and where people are going to use their mobile phones. An HPI application should be able to operate reliably in any encountered condition. Consider two example HPI applications, one using the accelerometer, the other relying on the phone's camera. Imagine exploiting the accelerometer to infer simple gestures a person can perform with the phone in their hand, e.g., shaking the phone to initiate a phone call, or tapping the phone to reject one [7]. What is challenging is being able to distinguish between the gesture itself and any other action the person might be performing. For example, if a person is running, or if a user tosses their phone onto a sofa, the sudden movement of the phone could produce signatures that are easily confused with a gesture. There are many examples where a classifier could be confused in this way, triggering erroneous actions on the phone. Similarly, if the phone's camera is used to infer a user action [5][9], it becomes important to make the inference algorithm operating on the captured video robust against lighting conditions, which can vary from place to place, and against video frames blurred by the phone's movement. Because HPI application developers cannot assume optimal operating conditions (e.g., requiring a user to stop walking or running before initiating a phone call with a shaking movement), the effects of mobility must be taken into account in order for the HPI application to be reliable and scalable. A minimal filtering step of this kind is sketched below.
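As a concrete illustration of the filtering problem, the following sketch (our own construction, not taken from the paper) separates a slowly varying gait component from the faster transients a deliberate gesture produces:

```python
# Minimal sketch: estimate the slow gait component of an accelerometer
# stream with an exponential moving average and subtract it, leaving
# the fast transients a gesture detector would look at. The smoothing
# factor 0.1 is an illustrative assumption.
def remove_gait(samples, alpha=0.1):
    baseline = samples[0]
    residual = []
    for s in samples:
        baseline = alpha * s + (1 - alpha) * baseline  # track slow motion
        residual.append(s - baseline)                  # keep fast gestures
    return residual
```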
Hardware Challenges. As opposed to HCI applications, an HPI implementation should not rely on any external hardware. Asking people to carry or wear additional hardware in order to use their phone [6] might reduce the penetration of the technology. Moreover, state-of-the-art HCI hardware, such as glasses-mounted cameras or dedicated helmets, is not yet small enough to be comfortably worn for long periods of time. Any HPI application should rely as much as possible on the phone's on-board sensors alone.
Although modern smartphones are becoming more computationally capable [16], they are still limited when running complex machine learning algorithms [14]. HPI solutions should adopt lightweight machine learning techniques in order to run properly and energy efficiently on mobile phones.
3. EYEPHONE DESIGN
One question we address in this paper is how useful a cheap, ubiquitous sensor such as the camera is in building HPI applications. We develop eye tracking and blink detection mechanisms based on algorithms [13, 17] originally designed for desktop machines using USB cameras. We show the limitations of an off-the-shelf HCI technique [13] when used to realize an HPI application on a resource-limited mobile device such as the Nokia N810. The EyePhone algorithmic design breaks down into the following pipeline phases: 1) an eye detection phase; 2) an open eye template creation phase; 3) an eye tracking phase; 4) a blink detection phase. In what follows, we discuss each of the phases in turn.
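A high-level skeleton of this pipeline is sketched below in Python pseudocode of our own. Here camera, crop, and fire_click are hypothetical placeholders; track_eye and is_blink were sketched earlier in the Introduction, and detect_eye is sketched in the next paragraph.

```python
# Skeleton of the four-phase EyePhone pipeline (our sketch). The
# helpers stand in for the algorithms of [13, 17].
def eyephone_loop(camera):
    template = None
    prev = camera.read_gray()
    while True:
        frame = camera.read_gray()
        if template is None:
            candidates = detect_eye(prev, frame)       # phase 1: motion analysis
            if candidates:
                template = crop(frame, candidates[0])  # phase 2: open-eye template
        else:
            pos, score = track_eye(frame, template)    # phase 3: template matching
            if is_blink(score):                        # phase 4: correlation drop
                fire_click(pos)                        # emulate a mouse click
        prev = frame
```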
Eye Detection. This phase consists of finding the contour of the eyes by applying a motion analysis technique which operates on consecutive frames. The eye pair is identified by the left and right eye contours. While the original algorithm [17] identifies the eye pair with almost no error when running on a desktop computer with a fixed camera (see the left image in Figure 1), we obtain errors when the algorithm is implemented on the phone, due to the lower quality of the N810 camera compared to a desktop camera and the movement of the phone in the person's hand.
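A hedged sketch of the motion-analysis step follows. The contour filtering rules of [17] are more involved; the threshold and area bounds below are illustrative assumptions, and the OpenCV 4.x Python API is assumed.

```python
# Difference consecutive grayscale frames, threshold the result, and
# keep contours whose area is plausible for an eye. A pairing rule
# (not shown) would then pick the left/right eye contours.
import cv2

def detect_eye(prev_gray, curr_gray, min_area=50, max_area=800):
    diff = cv2.absdiff(curr_gray, prev_gray)
    _, mask = cv2.threshold(diff, 15, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours
            if min_area < cv2.contourArea(c) < max_area]
```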
4. EVALUATION
In this section, we discuss initial results from the evaluation of the EyePhone prototype. We implement EyePhone on the Nokia N810 [19]. The N810 is equipped with a 400 MHz processor and 128 MB of RAM. The N810 operating system is Maemo 4.1, a Unix-based platform on which we can install both the C OpenCV (Open Source Computer Vision) library [20] and our EyePhone algorithms, which are cross-compiled in the Maemo scratchbox. To intercept the video frames from the camera we rely on GStreamer [21], the main multimedia framework on Maemo platforms. In what follows, we first present results relating to average accuracy for eye tracking and blink detection under different lighting and user movement conditions, to show the performance of EyePhone under different experimental conditions. We also report system measurements, such as CPU and memory usage, battery consumption, and computation time when running EyePhone on the N810. All experiments are repeated five times and average results are shown.
Daylight Exposure Analysis for a Stationary Subject. The first experiment shows the performance of EyePhone when the person is exposed to bright daylight, i.e., in a bright environment, and is stationary. The eye tracking results are shown in Figure 2. The inner white box in each picture, which is a frame taken from the front camera while the person is looking at the N810 display and holding the device in their hand, represents the eye position on the phone display. It is evident that nine different positions for the eye are identified. These nine positions of the eye can be mapped to nine different functions and applications, as shown in Figure 4; the position-to-button mapping is sketched below. Once the eye locks onto a position (i.e., the person is looking at one of the nine buttons on the display), a blink, acting as a mouse click, launches the application corresponding to the button. The accuracy of the eye tracking and blink detection algorithms is reported in Table 1. The results show that we obtain good tracking accuracy for the user's eye. However, the blink detection accuracy oscillates between 67% and 84%. We are studying further improvements to blink detection as part of future work.
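The mapping from a tracked eye position to one of the nine buttons can be as simple as quantizing the coordinates onto a 3x3 grid. A sketch of our own, assuming the matched location has already been scaled to display coordinates:

```python
# Quantize an (x, y) position into one of nine grid cells (0..8),
# row-major, matching a 3x3 button layout such as Figure 4.
def position_to_button(x, y, display_w, display_h):
    col = min(2, x * 3 // display_w)   # 0, 1, or 2
    row = min(2, y * 3 // display_h)
    return row * 3 + col
```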
Artificial Light Exposure for a Stationary Subject.
In this experiment, the person is again not moving but is in an artificially lit environment (i.e., a room with very low daylight penetration from the windows). We want to verify whether different lighting conditions impact the system's performance. The results, shown in Table 1, are comparable to the daylight scenario in a number of cases. However, the accuracy drops: given the poorer lighting conditions, the eye tracking algorithm fails to locate the eyes with higher frequency.
Daylight Exposure for a Walking Person. We carried out an experiment where a person walks outdoors in a bright environment to quantify the impact of the phone's natural movement, that is, the shaking of the phone in the hand induced by the person's gait. We anticipate a drop in the accuracy of the eye tracking algorithm because of the phone movement. This is confirmed by the results shown in Table 1, column 4. Further research is required to make the eye tracking algorithm more robust when a person is using the system on the move.
Impact of Distance Between Eye and Tablet. Since in the current implementation the open eye template is created once at a fixed distance, we evaluate the eye tracking performance when the distance between the eye and the tablet is varied while using EyePhone. We carry out the measurements for the middle-center position on the display (similar results are obtained for the remaining eight positions) when the person is stationary and walking.
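One way to tolerate distance changes, which the current prototype does not implement, would be to match the stored template at several scales and keep the best score. A hedged sketch, again using OpenCV's Python bindings, with illustrative scale factors:

```python
# Multi-scale template matching: resize the open-eye template and keep
# the location with the highest normalized correlation across scales.
import cv2

def track_eye_multiscale(frame_gray, template, scales=(0.8, 1.0, 1.25)):
    best_loc, best_score = None, -1.0
    for s in scales:
        t = cv2.resize(template, None, fx=s, fy=s)
        if t.shape[0] > frame_gray.shape[0] or t.shape[1] > frame_gray.shape[1]:
            continue  # skip scales larger than the frame
        result = cv2.matchTemplate(frame_gray, t, cv2.TM_CCOEFF_NORMED)
        _, val, _, loc = cv2.minMaxLoc(result)
        if val > best_score:
            best_loc, best_score = loc, val
    return best_loc, best_score
```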
4.1 Applications
EyeMenu. An example of an EyePhone application is EyeMenu, shown in Figure 4. EyeMenu is a shortcut to some of the phone's functions. The set of applications in the menu can be customized by the user. The idea is the following: the position of a person's eye is mapped to one of the nine buttons. A button is highlighted when EyePhone detects the eye in the position mapped to that button. If the user blinks their eye, the application associated with the button is launched. Driving the mobile phone user interface with the eyes in this way can facilitate interaction with mobile phones and support people with disabilities.
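Tying the pieces together, the EyeMenu dispatch reduces to a few lines. In this sketch of ours, highlight and launch are hypothetical UI hooks, the application list is an arbitrary example, and 800x480 is the N810's display resolution:

```python
# EyeMenu dispatch: highlight the button under the eye and launch its
# application on a blink, emulating a mouse click. APPS is an example
# menu, not the paper's.
APPS = ["phone", "sms", "email", "web", "maps",
        "music", "photos", "clock", "settings"]

def on_frame(eye_x, eye_y, blinked):
    idx = position_to_button(eye_x, eye_y, display_w=800, display_h=480)
    highlight(idx)           # visual feedback for the locked-on button
    if blinked:
        launch(APPS[idx])    # the blink acts as the click
```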
Car Driver Safety. EyePhone could also be used to detect driver drowsiness and distraction in cars. While car manufacturers are developing technology to improve driver safety by detecting drowsiness and distraction using dedicated sensors and cameras [22], EyePhone could be readily usable for the same purpose even in low-end cars by simply clipping the phone onto the car dashboard.
5. FUTURE WORK
We are currently working on improving the creation of the open eye template and the filtering algorithm for wrong eye contours. The open eye template quality affects the accuracy of the eye tracking and blink detection algorithms. In particular, variations in lighting conditions or movement of the phone in a person's hand might make the one-time template inaccurate, since it no longer matches the current conditions of the user. A template created in a bright environment might match the eye poorly in a darker setting. Similarly, an eye template created when the person is stationary does not match the eye when the person is walking. We observe the implications of the one-time template strategy in the results presented in Section 4. It is important to modify the template generation policy so that the system can either evolve the template according to the encountered context, if the template was generated in a context different from the current one, or create new templates on the fly for each of the encountered settings (e.g., bright, dark, moving, etc.). In both cases the template routine should be fast to compute and should minimize the resources used. A sketch of the latter policy follows.
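One possible realization of the on-the-fly policy (our assumption, not the paper's implementation) is a small dictionary of templates keyed by a coarse context label, created lazily the first time each context is seen:

```python
# Keep one open-eye template per coarse context; a context could be,
# e.g., ("bright", "stationary"), derived from light-sensor and
# accelerometer readings. crop() is a hypothetical helper.
templates = {}

def template_for(context, frame_gray, eye_box):
    if context not in templates:
        templates[context] = crop(frame_gray, eye_box)  # build on the fly
    return templates[context]
```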
A second important issue that we are working on is a filtering algorithm that minimizes false positives (i.e., false eye contours). One way to solve this problem is to use a learning approach instead of a fixed thresholding policy. With a learning strategy the system could adapt the filter over time according to the context it operates in. For example, a semi-supervised learning approach could be adopted, having the system evolve by itself through a re-calibration process every time a completely new environment is encountered. In order to be sure the filter is evolving in the right direction, the user could be brought into the loop by being asked whether the result of the inference is correct. If so, the new filter parameters are accepted; otherwise, they are discarded. Clearly, proper and effective user involvement policies are required, so that prompting is not annoying to the user. One possible shape of this update rule is sketched below.
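As a rough sketch of what such a re-calibration step could look like (entirely our assumption), the filter threshold is nudged toward scores the user confirms and left untouched otherwise:

```python
# Semi-supervised update of a contour-filter threshold: accept the new
# parameter only on user-confirmed inferences, discard it otherwise.
def recalibrate(threshold, observed_score, user_confirmed, step=0.05):
    if user_confirmed:
        threshold += step * (observed_score - threshold)  # accept update
    return threshold                                      # else unchanged
```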