INTRODUCTION
Devices with significant computational power and capabilities can now be easily carried on our bodies. However, their small size typically leads to limited interaction space (e.g., diminutive screens, buttons, and jog wheels) and consequently diminishes their usability and functionality. Since we cannot simply make buttons and screens larger without losing the primary benefit of small size, we consider alternative approaches that enhance interactions with small mobile systems. One option is to opportunistically appropriate surface area from the environment for interactive purposes. For example, [10] describes a technique that allows a small mobile device to turn tables on which it rests into a gestural finger input canvas. However, tables are not always present, and in a mobile context, users are unlikely to want to carry appropriated surfaces with them (at this point, one might as well just have a larger device). There is, however, one surface that has been previously overlooked as an input canvas, and one that happens to always travel with us: our skin.
Appropriating the human body as an input device is appealing not only because we have roughly two square meters of external surface area, but also because much of it is easily accessible by our hands (e.g., arms, upper legs, torso). Furthermore, proprioception – our sense of how our body is configured in three-dimensional space – allows us to accurately interact with our bodies in an eyes-free manner. For example, we can readily flick each of our fingers, touch the tip of our nose, and clap our hands together without visual assistance. Few external input devices can claim this accurate, eyes-free input characteristic and provide such a large interaction area.
In this paper, we present our work on Skinput – a method that allows the body to be appropriated for finger input using a novel, non-invasive, wearable bio-acoustic sensor.
The contributions of this paper are:
1) We describe the design of a novel, wearable sensor for bio-acoustic signal acquisition (Figure 1).
2) We describe an analysis approach that enables our system to resolve the location of finger taps on the body.
3) We assess the robustness and limitations of this system through a user study.
4) We explore the broader space of bio-acoustic input through prototype applications and additional experimentation.
RELATED WORK
Always-Available Input
The primary goal of Skinput is to provide an always-available mobile input system – that is, an input system that does not require a user to carry or pick up a device. A number of alternative approaches have been proposed that operate in this space. Techniques based on computer vision are popular (e.g., [3,26,27]; see [7] for a recent survey). These, however, are computationally expensive and error-prone in mobile scenarios (where, e.g., non-input optical flow is prevalent). Speech input (e.g., [13,15]) is a logical choice for always-available input, but is limited in its precision in unpredictable acoustic environments, and suffers from privacy and scalability issues in shared environments.
Other approaches have taken the form of wearable computing. This typically involves a physical input device built in a form considered to be part of one’s clothing. For example, glove-based input systems (see [25] for a review) allow users to retain most of their natural hand movements, but are cumbersome, uncomfortable, and disruptive to tactile sensation. Post and Orth [22] present a “smart fabric” system that embeds sensors and conductors into fabric, but taking this approach to always-available input necessitates embedding technology in all clothing, which would be prohibitively complex and expensive.
The SixthSense project [19] proposes a mobile, always-available input/output capability by combining projected information with a color-marker-based vision tracking system. This approach is feasible, but suffers from serious occlusion and accuracy limitations. For example, determining whether a finger has tapped a button, or is merely hovering above it, is extraordinarily difficult. In the present work, we briefly explore the combination of on-body sensing with on-body projection.
Bio-Sensing
Skinput leverages the natural acoustic conduction properties of the human body to provide an input system, and is thus related to previous work in the use of biological signals for computer input. Signals traditionally used for diagnostic medicine, such as heart rate and skin resistance, have been appropriated for assessing a user’s emotional state (e.g. [16,17,20]). These features are generally subconsciously driven and cannot be controlled with sufficient precision for direct input. Similarly, brain sensing technologies such as electroencephalography (EEG) and functional near-infrared spectroscopy (fNIR) have been used by HCI researchers to assess cognitive and emotional state (e.g. [9,11,14]); this work also primarily looked at involuntary signals. In contrast, brain signals have been harnessed as a direct input for use by paralyzed patients (e.g. [8,18]), but direct brain-computer interfaces (BCIs) still lack the bandwidth required for everyday computing tasks, and require levels of focus, training, and concentration that are incompatible with typical computer interaction.
There has been less work relating to the intersection of finger input and biological signals. Researchers have harnessed the electrical signals generated by muscle activation during normal hand movement through electromyography (EMG) (e.g. [23,24]). At present, however, this approach typically requires expensive amplification systems and the application of conductive gel for effective signal acquisition, which would limit the acceptability of this approach for most users.
The input technology most closely related to our own is that of Amento et al. [2], who placed contact microphones on a user’s wrist to assess finger movement. However, this work was never formally evaluated and is constrained to finger motions in one hand. The Hambone system [6] employs a similar setup and, through an HMM, yields classification accuracies around 90% for four gestures (e.g., raise heels, snap fingers). False-positive rejection remains untested in both systems. Moreover, both techniques required the placement of sensors near the area of interaction (e.g., the wrist), increasing the degree of invasiveness and visibility.
Finally, bone conduction microphones and headphones – now common consumer technologies – represent an additional bio-sensing technology that is relevant to the present work. These leverage the fact that sound frequencies relevant to human speech propagate well through bone. Bone conduction microphones are typically worn near the ear, where they can sense vibrations propagating from the mouth and larynx during speech. Bone conduction headphones send sound through the bones of the skull and jaw directly to the inner ear, bypassing transmission of sound through the air and outer ear, leaving an unobstructed path for environmental sounds.
Acoustic Input
Our approach is also inspired by systems that leverage acoustic transmission through (non-body) input surfaces. Paradiso et al. [21] measured the arrival time of a sound at multiple sensors to locate hand taps on a glass window. Ishii et al. [12] used a similar approach to localize a ball hitting a table, for computer augmentation of a real-world game. Both of these systems use acoustic time-of-flight for localization, which we explored, but found to be insufficiently robust on the human body, leading to the fingerprinting approach described in this paper.
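To make the time-of-flight idea concrete, the one-dimensional two-sensor case reduces to simple arithmetic on the arrival-time difference. The sketch below illustrates this; the sensor positions and propagation speed are illustrative assumptions, not values from either cited system, and real surfaces (especially the body) violate the constant-speed assumption this arithmetic relies on.

```python
# Hedged sketch: 1-D time-difference-of-arrival (TDOA) localization, the
# style of approach used on rigid surfaces such as glass. All constants
# (sensor positions, wave speed) are illustrative, not measured.

def localize_tap_1d(t1, t2, s1=0.0, s2=1.0, v=500.0):
    """Estimate tap position (m) between two sensors on a line.

    t1, t2 -- arrival times (s) at sensors located at s1, s2 (m)
    v      -- assumed constant propagation speed of the wave (m/s)
    """
    # Path-length difference implied by the arrival-time difference.
    dd = v * (t1 - t2)  # d1 - d2
    # For a tap between the sensors: d1 = x - s1 and d2 = s2 - x, so
    # dd = (x - s1) - (s2 - x)  =>  x = (dd + s1 + s2) / 2
    return (dd + s1 + s2) / 2.0

# A tap at 0.3 m gives d1 = 0.3 m and d2 = 0.7 m, hence these arrival times:
x = localize_tap_1d(t1=0.3 / 500.0, t2=0.7 / 500.0)
```

On the body, soft tissue, bone, and joints conduct at different and unknown speeds, which is why this closed-form localization proved too fragile and a fingerprinting approach was adopted instead.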
SKINPUT
To expand the range of sensing modalities for always-available input systems, we introduce Skinput, a novel input technique that allows the skin to be used as a finger input surface. In our prototype system, we choose to focus on the arm (although the technique could be applied elsewhere). This is an attractive area to appropriate as it provides considerable surface area for interaction, including a contiguous and flat area for projection (discussed subsequently).
Furthermore, the forearm and hands contain a complex assemblage of bones that increases acoustic distinctiveness of different locations. To capture this acoustic information, we developed a wearable armband that is non-invasive and easily removable (Figures 1 and 5).
In this section, we discuss the mechanical phenomena that enable Skinput, with a specific focus on the mechanical properties of the arm. We then describe the Skinput sensor and the processing techniques we use to segment, analyze, and classify bio-acoustic signals.
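The segment–analyze–classify pipeline can be sketched as follows. This is a minimal illustration of the fingerprinting idea only: the sampling rate, band edges, and the nearest-centroid rule are assumptions standing in for the system's actual features and classifier, which are described later in the paper.

```python
# Hedged sketch of a fingerprinting-style pipeline: per-band spectral
# energies form an acoustic "fingerprint" for a segmented tap window,
# and a nearest-centroid rule assigns the tap to the closest trained
# location. Rate, bands, and classifier are illustrative assumptions.
import numpy as np

BANDS = ((0, 50), (50, 100), (100, 200), (200, 500))  # Hz, hypothetical

def fingerprint(window, rate=5500, bands=BANDS):
    """Normalized per-band spectral energy of one tap window."""
    spectrum = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / rate)
    feats = np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                      for lo, hi in bands])
    return feats / (feats.sum() + 1e-9)  # normalize out tapping force

def train(examples):
    """examples: {location: [window, ...]} -> {location: centroid}"""
    return {loc: np.mean([fingerprint(w) for w in ws], axis=0)
            for loc, ws in examples.items()}

def classify(window, centroids):
    f = fingerprint(window)
    return min(centroids, key=lambda loc: np.linalg.norm(f - centroids[loc]))
```

Normalizing the band energies makes the fingerprint roughly invariant to tapping force, so a harder tap at the same location maps to a similar feature vector.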
Bio-Acoustics
When a finger taps the skin, several distinct forms of acoustic energy are produced. Some energy is radiated into the air as sound waves; this energy is not captured by the Skinput system. Among the acoustic energy transmitted through the arm, the most readily visible are transverse waves, created by the displacement of the skin from a finger impact (Figure 2). When filmed with a high-speed camera, these appear as ripples, which propagate outward from the point of contact (see video). The amplitude of these ripples is correlated with both the tapping force and the volume and compliance of soft tissues under the impact area. In general, tapping on soft regions of the arm creates higher-amplitude transverse waves than tapping on bony areas (e.g., wrist, palm, fingers), which have negligible compliance.
In addition to the energy that propagates on the surface of the arm, some energy is transmitted inward, toward the skeleton (Figure 3). These longitudinal (compressive) waves travel through the soft tissues of the arm, exciting the bone, which is much less deformable than the soft tissue but can respond to mechanical excitation by rotating and translating as a rigid body. This excitation vibrates the soft tissues surrounding the entire length of the bone, resulting in new longitudinal waves that propagate outward to the skin.
We highlight these two separate forms of conduction – transverse waves moving directly along the arm surface, and longitudinal waves moving into and out of the bone through soft tissues – because these mechanisms carry energy at different frequencies and over different distances. Roughly speaking, higher frequencies propagate more readily through bone than through soft tissue, and bone conduction carries energy over larger distances than soft tissue conduction. While we do not explicitly model the specific mechanisms of conduction, or depend on these mechanisms for our analysis, we do believe the success of our technique depends on the complex acoustic patterns that result from mixtures of these modalities.
Similarly, we also believe that joints play an important role in making tapped locations acoustically distinct. Bones are held together by ligaments, and joints often include additional biological structures such as fluid cavities. This makes joints behave as acoustic filters. In some cases, these may simply dampen acoustics; in other cases, they will selectively attenuate specific frequencies, creating location-specific acoustic signatures.
Figure 2. Transverse wave propagation: Finger impacts displace the skin, creating transverse waves (ripples). The sensor is activated as the wave passes underneath it.
Figure 3. Longitudinal wave propagation: Finger impacts create longitudinal (compressive) waves that cause internal skeletal structures to vibrate. This, in turn, creates longitudinal waves that emanate outward from the bone (along its entire length) toward the skin.
Sensing
To capture the rich variety of acoustic information described in the previous section, we evaluated many sensing technologies, including bone conduction microphones, conventional
microphones coupled with stethoscopes [10], piezo contact microphones [2], and accelerometers. However, these transducers were engineered for very different applications than measuring acoustics transmitted through the human body. As such, we found them to be lacking in several
significant ways. Foremost, most mechanical sensors are engineered to provide relatively flat response curves over the range of frequencies that is relevant to our signal. This is a desirable property for most applications, where a faithful representation of an input signal – uncolored by the properties of the transducer – is desired. However, because only a specific set of frequencies is conducted through the arm in response to tap input, a flat response curve leads to the capture of irrelevant frequencies and thus to a low signal-to-noise ratio.
While bone conduction microphones might seem a suitable choice for Skinput, these devices are typically engineered for capturing human voice, and filter out energy below the range of human speech (whose lowest frequency is around 85 Hz). Thus, most sensors in this category were not especially sensitive to lower-frequency signals (e.g., 25 Hz), which we found in our empirical pilot studies to be vital in characterizing finger taps.
To overcome these challenges, we moved away from a single sensing element with a flat response curve, to an array of highly tuned vibration sensors. Specifically, we employ small, cantilevered piezo films (MiniSense100, Measurement Specialties, Inc.). By adding small weights to the end of the cantilever, we are able to alter the resonant frequency, allowing the sensing element to be responsive to a unique, narrow, low-frequency band of the acoustic spectrum. Adding more mass lowers the range of excitation to which a sensor responds; we weighted each element such that it aligned with particular frequencies that pilot studies showed to be useful in characterizing bio-acoustic input.
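The effect of adding tip mass can be illustrated by modeling each weighted cantilever as a simple spring-mass resonator, f = (1/2π)·√(k/m). The stiffness and mass values below are illustrative assumptions; in practice, as described above, the elements were tuned empirically against pilot-study data rather than from such a model.

```python
# Hedged sketch: a mass-loaded cantilever approximated as a spring-mass
# system. Stiffness and mass values are illustrative, chosen only to
# show the trend that added tip mass lowers the resonant frequency.
import math

def resonant_frequency_hz(stiffness_n_per_m, mass_kg):
    """Natural frequency f = (1/2*pi) * sqrt(k/m) of a spring-mass system."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)

# Quadrupling the effective tip mass halves the resonant frequency,
# shifting the sensing element toward a lower acoustic band:
base = resonant_frequency_hz(10.0, 0.0005)   # ~22.5 Hz
loaded = resonant_frequency_hz(10.0, 0.002)  # ~11.3 Hz
```

Because f scales with 1/√m, a bank of identically stiff films with graduated tip weights covers a set of distinct, narrow low-frequency bands, which is the design intuition behind the sensor array.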