27-03-2012, 11:02 AM
Keystroke dynamics identity verification: its problems and practical solutions
[EnzheYu_ChoS](2004)Kestroke_Dynamics_Identity_Verification-Its_Problems_and_Practical_Solutions.pdf (Size: 503.46 KB / Downloads: 39)
Introduction
In typing a phrase or a string of characters, the
typing dynamics or timing pattern can be measured
and used for identity verification. More specifically,
a timing vector consists of the keystroke
duration times interleaved with the keystroke interval
times at the accuracy of milliseconds (ms).
If a password of n characters is typed, a (2n+1)-dimensional
timing vector results, which consists of n keystroke duration
times, (n-1) keystroke interval times, and the return key (in
most cases, the return key is meaningless, thus ignored). Fig. 1
illustrates the timing vector when the string "ABCD" (n = 4) is
typed. An actual example of a 9-dimensional timing vector is
[30, 60, 70, -35, 60, 35, 75, 40, 55]. The time unit is in
milliseconds. When a key is stroked before the previous key is
released, the keystroke interval time is represented as
negative (< 0).
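The construction above can be sketched as a few lines of code. This is an illustrative sketch, not the paper's implementation; the function name and the timestamp data are made up, and the return key is ignored as the text suggests.

```python
def timing_vector(events):
    """events: list of (press_ms, release_ms) tuples, one per key typed.

    Returns [dur_1, int_1, dur_2, int_2, ..., dur_n]: n duration times
    interleaved with (n - 1) interval times.  An interval comes out
    negative when the next key is pressed before the previous one is
    released.
    """
    vec = []
    for i, (press, release) in enumerate(events):
        vec.append(release - press)            # keystroke duration
        if i + 1 < len(events):
            next_press = events[i + 1][0]
            vec.append(next_press - release)   # keystroke interval (may be < 0)
    return vec

# "ABCD" typed (n = 4): 4 durations + 3 intervals = 7-dimensional vector,
# matching the first seven entries of the example vector in the text.
events = [(0, 30), (90, 160), (125, 185), (220, 295)]
print(timing_vector(events))  # [30, 60, 70, -35, 60, 35, 75]
```

Note the `-35` entry: the third key was pressed 35 ms before the second key was released, which is exactly the overlap case the text describes.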
Identity verification using SVM novelty detector
User identity verification is a challenging task from
a pattern classification viewpoint. It is a 2-class
(owner vs. imposters) problem, but only the patterns
from the owner are available in advance.
Most previous research (Brown and Rogers, 1993;
Goldberg, 1989; Leggett et al., 1991; Obaidat and
Sadoun, 1997) used both the owner's and imposters'
patterns to train models. Yet, this is
not practical in real-world applications because
there are millions of potential imposters, thus it
is not possible to obtain all the prospective imposter
patterns. Nor is it desirable to publicize one's
password in order to collect potential imposters' timing
vectors, at the risk of fatal intrusion.
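The one-class setup described above can be illustrated with scikit-learn's `OneClassSVM`, used here as a stand-in for the paper's SVM novelty detector: the model is fit on the owner's timing vectors alone, with no imposter data, and then flags unfamiliar vectors as imposters. The data and parameter values below are invented for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic owner data: 50 timing vectors clustered around one typing pattern.
rng = np.random.default_rng(0)
owner = rng.normal(loc=[30, 60, 70], scale=5, size=(50, 3))

# Train on the owner's patterns only -- no imposter patterns required.
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
clf.fit(owner)

print(clf.predict([[31, 58, 72]]))    # close to the owner's pattern -> +1 (accept)
print(clf.predict([[120, 10, 200]]))  # far from the owner's pattern -> -1 (reject)
```

The key point is that `fit` never sees an imposter: the decision boundary is drawn around the owner's own patterns, which is what makes the approach practical when the millions of potential imposters cannot be sampled.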
GA-SVM wrapper approach for feature selection
Novelty detection models are built under the assumption
that the owner’s typing follows a consistent
pattern. But there are always inconsistent
typing patterns attributed to human error. One
popular method of tackling the problem is manual
preprocessing of data, i.e., either removing the
variable(s) whose values are dispersed, or removing
data samples which seem to be inconsistent
with other patterns (Cho et al., 2000). In practice,
however, it is very difficult to correctly identify
noisy data or outliers.
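The manual preprocessing the paragraph describes might look like the following sketch: drop variables whose values are widely dispersed (by coefficient of variation) and drop samples that sit far from the per-feature median. The function name and both thresholds are invented here, which is precisely the paper's point about why hand-tuned cleaning is unreliable.

```python
import numpy as np

def prune(X, max_cv=0.5, max_z=3.5):
    """X: (samples, features) array of one user's timing vectors.

    Returns the cleaned array and a boolean mask of kept features.
    """
    # Remove dispersed variables: high coefficient of variation.
    cv = X.std(axis=0) / np.abs(X.mean(axis=0))
    keep_feat = cv <= max_cv
    Xf = X[:, keep_feat]

    # Remove inconsistent samples: large robust z-score from the median.
    med = np.median(Xf, axis=0)
    mad = np.median(np.abs(Xf - med), axis=0)
    z = np.abs(Xf - med) / (1.4826 * mad)
    keep_samp = (z <= max_z).all(axis=1)
    return Xf[keep_samp], keep_feat

# Third feature is dispersed; last sample is inconsistent in feature 2.
X = np.array([[30, 60, 500],
              [32, 61,  10],
              [29, 59, 900],
              [31, 60, 300],
              [30, 120, 700]], dtype=float)
Xc, kept = prune(X)
print(kept)      # [ True  True False]
print(Xc.shape)  # (4, 2)
```

Both cut-offs must be chosen by hand, and a value that is noise for one user may be a stable habit for another, which motivates the automated GA-SVM wrapper approach this section introduces.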