25-10-2012, 03:52 PM
Machine Learning Techniques for Passive Network Inventory
Abstract
Being able to fingerprint devices and services, i.e.,
remotely identify running code, is a powerful service for both
security assessment and inventory management. This paper
describes two novel fingerprinting techniques supported by
isomorphism-based distances which are adapted for measuring
the similarity between two syntactic trees. The first method
leverages the support vector machines paradigm and requires a
learning stage. The second method operates in an unsupervised
manner thanks to a new classification algorithm derived from
the ROCK and QROCK algorithms. It provides an efficient and
accurate classification. We highlight the use of such classification
techniques for identifying the remote running applications. The
approaches are validated through extensive experiments on
SIP (Session Initiation Protocol) for evaluating the impact of
the different parameters and identifying the best configuration
before applying the techniques to network traces collected by a
real operator.
INTRODUCTION
Given a communication service, device fingerprinting
aims to determine exactly the device version or the
protocol stack implemented by a piece of equipment providing
the service. It is a challenging task that affects
both security assessment and network management, especially
inventory management. Identifying the devices gives
a detailed view of the live equipment on a network, which helps
in planning future actions when needed. For example, if a new
security flaw is discovered for some device types, patching
them has to be fast because of zero-day attacks, but locating
them is not always straightforward. Fingerprinting tools
can also help to discover abnormal devices on a network, for
instance rogue equipment that has to be disconnected.
Since device fingerprinting determines software versions,
tracking copyright infringements is another application of
such techniques. Furthermore, some authentication systems
check the device type, for example by allowing only some
specific hardphones on a VoIP (Voice over IP) network.
FINGERPRINTING FRAMEWORK
Applications
This section highlights some applications that can be supported
by fingerprinting techniques. Figure 1 relates to
security issues: the goal is to detect rogue devices that
spoof the user agent field in their messages. This field is present
in many protocol messages and indicates the device type.
However, it can be spoofed very easily. Since an attacker
uses dedicated tools to perform an attack, the genuine user agent
of such a tool is unlikely to be a usual one, i.e., of the same type
as those of normal users. Hence, spoofing is generally employed. In
Figure 1, the fingerprinting tool captures the traffic between the
devices and the server, or between two clients as, for instance,
in P2P networks. In the first communication, the
fingerprinting result indicates the same type of device as
the announced one, so no countermeasure is applied. The
second case shows an attacker spoofing the user agent field.
By comparing this user agent information with the automatic
identification resulting from fingerprinting, the tool is able
to detect the spoof and can decide to launch a countermeasure,
which can vary from simply logging the event
to denying access to the rogue device through interactive firewall
configuration.
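The comparison described above can be sketched in a few lines. This is only an illustrative sketch, not the tool described in the paper: the names `Verdict` and `check_user_agent`, the `block_on_spoof` flag and the case-insensitive string comparison are all assumptions made for the example, and the fingerprinting result is taken as a given input.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    announced: str   # device type claimed in the user agent field
    identified: str  # device type returned by the fingerprinting step
    spoofed: bool
    action: str      # "none", "log" or "deny" (hypothetical action names)


def check_user_agent(announced: str, identified: str,
                     block_on_spoof: bool = False) -> Verdict:
    """Compare the announced user agent with the fingerprinting result.

    Assumption: both values are plain device-type strings and a
    case-insensitive comparison is enough to decide whether they agree.
    """
    spoofed = announced.strip().lower() != identified.strip().lower()
    if not spoofed:
        action = "none"   # announced type matches: no countermeasure
    elif block_on_spoof:
        action = "deny"   # e.g. trigger a firewall rule (not modeled here)
    else:
        action = "log"    # mildest countermeasure: record the event
    return Verdict(announced, identified, spoofed, action)
```

For instance, `check_user_agent("PhoneA/1.0", "AttackTool", block_on_spoof=True)` yields a verdict whose `action` is `"deny"`, while a matching pair yields `"none"`. The device names are made up for the example.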
Unsupervised classification, problem 2
1) ROCK and QROCK: Our device representation
is based on syntactic trees, which can be viewed as categorical
data, whereas most well-known techniques such as K-means, K-medoids
or density-based algorithms are suited to numerical
values [10]. Unsupervised approaches
dedicated to categorical data have therefore been proposed in the literature,
for instance the ROCK algorithm [11]. This algorithm is
based on a graph representation where two nodes are linked
if they share at least one common neighbor. Two points are
neighbors if their inter-distance is less than a threshold
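
The neighbor and link definitions can be illustrated with a small sketch. It is not the authors' implementation: the function names are hypothetical, a Jaccard distance on token sets stands in for the tree distance used in the paper, and a point is assumed not to count as its own neighbor.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """Stand-in distance (assumption): 1 - |a & b| / |a | b|."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def neighbors(points, distance, threshold):
    """Map each point index to the indices closer than `threshold`."""
    nbrs = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if distance(points[i], points[j]) < threshold:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs


def links(nbrs):
    """Return the pairs (i, j), i < j, sharing at least one common neighbor."""
    return {(i, j) for i, j in combinations(sorted(nbrs), 2)
            if nbrs[i] & nbrs[j]}


# Toy data: three similar token sets and one unrelated set.
points = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "e"}, {"x", "y", "z"}]
nbrs = neighbors(points, jaccard_distance, threshold=0.6)
linked = links(nbrs)
```

With this toy data the first three points are pairwise neighbors (distance 0.5) and each pair shares the third point as a common neighbor, so they are all linked, while the fourth point has no neighbor and no link.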