14-06-2013, 02:32 PM
Active Property Testing
Active Property.pdf (Size: 285.36 KB / Downloads: 22)
Abstract
One motivation for property testing of boolean
functions is the idea that testing can provide a fast preprocessing
step before learning. However, in most machine learning
applications, it is not possible to request for labels of arbitrary
examples constructed by an algorithm. Instead, the dominant
query paradigm in applied machine learning, called active
learning, is one where the algorithm may query for labels, but
only on points in a given (polynomial-sized) unlabeled sample,
drawn from some underlying distribution D. In this work, we
bring this well-studied model to the domain of testing.
We develop both general results for this active testing model
as well as efficient testing algorithms for several important
properties for learning, demonstrating that testing can still
yield substantial benefits in this restricted setting. For example,
we show that testing unions of d intervals can be done with
O(1) label requests in our setting.
INTRODUCTION
Property testing and machine learning have many natural
connections. In property testing, given black-box access
to an unknown boolean function f, one would like with
few queries to distinguish the case that f has some given
property P (belongs to the class of functions P) from the
case that f is far from any function having that property.
Our Results and Structure of this Paper
After formally defining the active testing model in Section
II, we begin in Section III by presenting an active tester
for the class of unions of d intervals on the line. This
tester requires only a constant number of label requests
(independent of d) and applies for any (even unknown)
distribution D.
CONCLUSIONS
In this work we develop and analyze a model of property
testing that parallels the active learning model in machine
learning, in which queries are restricted to be selected
from a given (polynomially) large unlabeled sample. We
demonstrate that a number of important properties for
machine learning can be efficiently tested in this setting
with substantially fewer queries than needed to learn. We
additionally give a combination result allowing one to build
testable properties out of others, as well as develop notions
of intrinsic testing dimension that characterize the number
of queries needed to test, and which we then use to prove
a number of lower bounds.