12-11-2012, 03:31 PM
Robust Correlation of Encrypted Attack Traffic through Stepping Stones by Flow Watermarking
Abstract
Network-based intruders seldom attack their victims
directly from their own computer. Often, they stage their attacks
through intermediate “stepping stones” in order to conceal their
identity and origin. To identify the source of the attack behind
the stepping stone(s), it is necessary to correlate the incoming
and outgoing flows or connections of a stepping stone. To resist
attempts at correlation, the attacker may encrypt or otherwise
manipulate the connection traffic.
Timing-based correlation approaches have been shown to be
quite effective in correlating encrypted connections. However,
timing-based correlation approaches are subject to timing perturbations
that may be deliberately introduced by the attacker
at stepping stones.
In this paper, we propose a novel watermark-based correlation
scheme that is designed specifically to be robust against timing
perturbations. Unlike most previous timing-based correlation
approaches, our watermark-based approach is “active” in that
it embeds a unique watermark into the encrypted flows by
slightly adjusting the timing of selected packets. The unique
watermark that is embedded in the encrypted flow gives us a
number of advantages over passive timing-based correlation in
resisting timing perturbations by the attacker. In contrast to
existing passive correlation approaches, our active watermark-based
correlation does not make any limiting assumptions about
the distribution or random process of the original inter-packet
timing of the packet flow. In theory, our watermark-based
correlation can simultaneously achieve a true positive rate
arbitrarily close to 100% and a false positive rate arbitrarily
close to 0% for sufficiently long flows, despite arbitrarily large
(but bounded) timing perturbations of any distribution by the
attacker.
INTRODUCTION
Network-based attacks have become a serious threat to
the critical information infrastructure on which we depend.
To stop or repel network-based attacks, it is critical to be able to
identify the source of the attack. Attackers, however, go to some
lengths to conceal their identities and origin, using a variety of
countermeasures. As an example, they may spoof the IP source
address of the attack traffic. Methods of tracing spoofed traffic,
generally known as IP traceback [23], [26], [9], [14], have been
developed to address this countermeasure.
Another common and effective countermeasure used by
network-based intruders to hide their identity is to connect
through a sequence of intermediate hosts, or stepping stones,
before attacking the final target. For example, an attacker at host
A may Telnet or SSH into host B, and from there launch an
attack on host C. In effect, the incoming packets of an attack
connection from A to B are forwarded by B, and become outgoing
packets of a connection from B to C. The two connections or
flows are related in such a case. The victim host C can use IP
traceback to determine that the second flow originated from host B,
but traceback will not be able to correlate that with the attack flow
originating from host A. To trace attacks through a stepping stone,
it is necessary to correlate the incoming traffic with the outgoing
traffic at the stepping stone. This would allow the attack to be
traced back to host A in the example.
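As a rough illustration of what correlating the incoming and outgoing traffic at a stepping stone can mean, the sketch below compares the inter-packet delays of two flows with a plain Pearson correlation. This is a generic passive timing comparison chosen for illustration, not a method taken from this paper; the timestamps (in milliseconds), the relay lags, and all function names are made-up assumptions.

```python
import math

def inter_packet_delays(timestamps):
    """Differences between consecutive packet arrival times."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical flow A -> B observed at the stepping stone B (milliseconds).
incoming_a_to_b = [0, 120, 150, 600, 640, 1300, 1350, 2100]
# Assumed small, slightly varying forwarding lag at B for each packet.
relay_lag = [30, 32, 31, 35, 30, 33, 31, 34]
outgoing_b_to_c = [t + lag for t, lag in zip(incoming_a_to_b, relay_lag)]
# An unrelated flow that happens to leave B during the same period.
unrelated_flow = [0, 500, 560, 700, 1500, 1540, 1900, 2000]

related_score = pearson(inter_packet_delays(incoming_a_to_b),
                        inter_packet_delays(outgoing_b_to_c))
unrelated_score = pearson(inter_packet_delays(incoming_a_to_b),
                          inter_packet_delays(unrelated_flow))
```

With these made-up numbers the relayed flow scores far higher than the unrelated one. A comparison of this kind depends on the attacker leaving the timing largely intact.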
RELATED WORK
Existing connection correlation approaches are based on three
different characteristics: 1) host activity; 2) connection content
(i.e., packet payload); and 3) inter-packet timing characteristics.
The host-activity-based approach (e.g., DIDS [25] and CIS [11])
collects and tracks users’ login activity at each stepping stone.
The major drawback of host-activity-based methods is that the
host activity collected from each stepping stone is generally not
trustworthy. Since the attacker is assumed to have full control over
each stepping stone, he/she can easily modify, delete or forge user
login information. This defeats the ability to correlate based on
host activity.
Content-based correlation approaches (e.g., Thumbprinting [27]
and SWT [33]) require that the payload of packets remains
invariant across (i.e., is unchanged by) stepping stones. Since the
attacker can easily transform the connection content by encryption
at the application layer, these approaches are suitable only for
unencrypted connections.
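A toy example may make the limitation concrete. The sketch below hashes the payload bytes of a flow and compares the digests on either side of a stepping stone. It is not the Thumbprinting or SWT algorithm; the SHA-256 digest, the sample payloads, and the single-byte XOR used as a stand-in for application-layer encryption are all assumptions made for illustration.

```python
import hashlib

def thumbprint(payloads):
    """Digest of the concatenated payload bytes of a flow (toy content summary)."""
    digest = hashlib.sha256()
    for chunk in payloads:
        digest.update(chunk)
    return digest.hexdigest()

def xor_relay(payloads, key):
    """Stand-in for re-encryption at a stepping stone. NOT real cryptography."""
    return [bytes(b ^ key for b in chunk) for chunk in payloads]

# Hypothetical payloads seen on the incoming connection.
incoming = [b"ls -la\n", b"cat /etc/passwd\n"]

# Case 1: the stepping stone forwards the payload unchanged.
plain_outgoing = list(incoming)
# Case 2: the stepping stone transforms the payload before forwarding it.
encrypted_outgoing = xor_relay(incoming, key=0x5A)

plain_match = thumbprint(incoming) == thumbprint(plain_outgoing)
encrypted_match = thumbprint(incoming) == thumbprint(encrypted_outgoing)
```

The digests agree only when the payload passes through unchanged, which is why content-based correlation stops working once the attacker encrypts the connection.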
OVERVIEW OF WATERMARK-BASED CORRELATION
Overall Watermark Tracing Model
The watermark tracing approach exploits the observation that
interactive connections (e.g., Telnet, SSH) are bidirectional. The
idea is to watermark the backward traffic (from victim back to
the attacker) of the bidirectional attack connections by slightly
adjusting the timing of selected packets. If the embedded watermark
is both robust and unique, the watermarked backward traffic
can be effectively correlated and traced across stepping stones,
from the victim all the way back to the attacker. As shown in
Figure 1, the attacker may connect through a number of hosts
(H1, ..., Hn) before attacking the final target. Assuming the
attacker has not gained full control of the attack target, the attack
target will initiate the attack tracing after it has detected the attack.
Specifically, the attack target will watermark the backward traffic
of the attack connection, and inform sensors across the network
about the watermark. The sensors across the network will scan
all traffic for the presence of the indicated watermark, and report
to the target if any occurrences of the watermark are detected.
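The excerpt does not spell out how the timing of selected packets is adjusted, so the following is only a minimal sketch of one possible approach: each watermark bit is carried by the delay between a chosen pair of packets, which is stretched to a multiple of a quantization step whose parity encodes the bit. The quantization rule, the integer millisecond timestamps, and the names PAIR_STARTS and STEP_MS (standing in for parameters shared between the target and the sensors) are assumptions, not the paper's scheme.

```python
import random

def embed_bit(ipd, bit, step):
    """Smallest multiple of step that is >= ipd and whose index has parity bit."""
    k = -(-ipd // step)  # integer ceiling of ipd / step
    if k % 2 != bit:
        k += 1
    return k * step

def decode_bit(ipd, step):
    """Round the delay to the nearest multiple of step and read its parity."""
    return ((ipd + step // 2) // step) % 2

def embed_watermark(timestamps, bits, pair_starts, step):
    """Delay packets so the pair (i, i + 1) carries one bit for each i in pair_starts."""
    bit_for_second = {i + 1: b for i, b in zip(pair_starts, bits)}
    marked, shift = [], 0
    for j, t in enumerate(timestamps):
        if j in bit_for_second:
            ipd = t - timestamps[j - 1]
            # Packets can only be delayed, so the shift never decreases.
            shift += embed_bit(ipd, bit_for_second[j], step) - ipd
        marked.append(t + shift)
    return marked

def detect_watermark(timestamps, pair_starts, step):
    """Read one bit from each selected packet pair."""
    return [decode_bit(timestamps[i + 1] - timestamps[i], step) for i in pair_starts]

def add_bounded_jitter(timestamps, max_delay, seed):
    """Hypothetical perturbation: each packet is delayed by 0..max_delay ms."""
    rng = random.Random(seed)
    return [t + rng.randint(0, max_delay) for t in timestamps]

STEP_MS = 100               # assumed quantization step
PAIR_STARTS = [0, 2, 4, 6]  # assumed selection of packet pairs
BITS = [1, 0, 1, 1]         # watermark announced to the sensors

original = [0, 130, 410, 455, 900, 1020, 1600, 1710]
marked = embed_watermark(original, BITS, PAIR_STARTS, STEP_MS)
# Jitter of at most 40 ms per packet stays below half the step (50 ms).
perturbed = add_bounded_jitter(marked, max_delay=40, seed=7)
```

In this sketch a sensor that knows PAIR_STARTS and STEP_MS recovers BITS from both marked and perturbed, because a per-packet delay of at most 40 ms changes each selected delay by less than half a step. The larger the step, the more perturbation is tolerated, at the cost of delaying the flow more.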
LIMITS OF ADVERSARY’S TIMING PERTURBATION
We have shown that there exist special cases of brute-force
timing perturbation that could completely remove any embedded
watermark from any distribution of inter-packet timing.
In this section, we analyze the limitations on the negative
impact of the adversary’s timing perturbations. We assume that the
key parameters of the watermark embedding method are unknown
to the adversary. We first identify the minimum distortion required
for the adversary to completely remove the embedded watermark
and the optimal strategy for doing so. We then analyze the
additional constraints imposed by real-time communication and
their implications for the adversary’s ability to interfere with or
distort the watermark. We show that it is generally infeasible for
the adversary to completely eliminate the embedded watermark
from a flow of packets in real-time.
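One intuition for why bounded perturbation has limited effect on long flows can be checked numerically. It is offered here as an assumption, not as the analysis of this section: if a detector averages the timing change over many packet pairs, independent bounded delays tend to cancel out. The sketch below draws per-packet delays uniformly from a bounded range and measures how much the averaged change in inter-packet delay varies; the uniform distribution, the independence of the delays, and all names and parameters are illustrative assumptions.

```python
import random
import statistics

def averaged_ipd_shift(rng, max_delay, redundancy):
    """Mean change in inter-packet delay over several independently delayed pairs."""
    total = 0.0
    for _ in range(redundancy):
        first_delay = rng.uniform(0, max_delay)
        second_delay = rng.uniform(0, max_delay)
        total += second_delay - first_delay
    return total / redundancy

def spread(max_delay, redundancy, trials=2000, seed=1):
    """Standard deviation of the averaged shift across many simulated flows."""
    rng = random.Random(seed)
    samples = [averaged_ipd_shift(rng, max_delay, redundancy) for _ in range(trials)]
    return statistics.pstdev(samples)

# Per-packet delays of up to one second, observed over 1 pair and over 64 pairs.
spread_single = spread(max_delay=1000.0, redundancy=1)
spread_redundant = spread(max_delay=1000.0, redundancy=64)
```

Under these assumptions the spread shrinks roughly with the square root of the number of pairs averaged. An adversary who knew which pairs were selected could choose correlated delays instead, which is why the assumption that the embedding parameters are unknown to the adversary matters.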