Seminar Topics & Project Ideas On Computer Science Electronics Electrical Mechanical Engineering Civil MBA Medicine Nursing Science Physics Mathematics Chemistry ppt pdf doc presentation downloads and Abstract

Automatic Network Protocol Analysis


Abstract

Protocol reverse engineering is the process of extracting application-level specifications for network protocols. Such specifications are very helpful in a number of security-related contexts. For example, they are needed by intrusion detection systems to perform deep packet inspection, and they allow the implementation of black-box fuzzing tools. Unfortunately, manual reverse engineering is a time-consuming and tedious task. To address this problem, researchers have recently proposed systems that help to automate the process. These systems operate by analyzing traces of network traffic. However, there is limited information available at the network-level, and thus, the accuracy of the results is limited.
In this paper, we present a novel approach to automatic protocol reverse engineering. Our approach works by dynamically monitoring the execution of the application, analyzing how the program is processing the protocol messages that it receives. This is motivated by the insight that an application encodes the complete protocol and represents the authoritative specification of the inputs that it can accept. In a first step, we extract information about the fields of individual messages. Then, we aggregate this information to determine a more general specification of the message format, which can include optional or alternative fields, and repetitions. We have applied our techniques to a number of real-world protocols and server applications. Our results demonstrate that we are able to extract the format specification for different types of messages. Using these specifications, we then automatically generate appropriate parser code.

Introduction

Protocol reverse engineering is the process of extracting application-level protocol specifications. The detailed knowledge of such protocol specifications is invaluable for addressing a number of security problems. For example, it allows the automated generation of protocol fuzzers [24] that perform black-box testing of server programs that accept network input. In addition, protocol specifications are often required for intrusion detection systems [26] that implement deep packet inspection capabilities. These systems typically parse the network stream into segments with application-level semantics, and apply detection rules only to certain parts of the traffic. Generic protocol analyzers such as binpac [25] and GAPA [2] also require protocol grammars as input. Moreover, possessing protocol information helps to identify and understand applications that may communicate over non-standard ports or application data that is encapsulated in other protocols [15, 20]. Finally, knowledge about the differences in the way that certain server applications implement a standard protocol can help a security analyst to perform server fingerprinting [5], or guide testing and security auditing efforts [3].

System design

Automatic protocol reverse engineering is a complex and difficult problem. In the following section, we introduce the problem domain and discuss the specific problems that our techniques address. Then, we provide a high-level overview of the workings of our system.

Problem scope

In [10], the authors introduce a terminology for common protocol idioms that allow a general discussion of the problem of protocol reverse engineering. In particular, the authors observe that most application protocols have a notion of an application session, which allows two hosts to accomplish a specific task. An application session consists of a series of individual messages. These messages can have different types. Each message type is defined by a certain message format specification. A message format specifies a number of fields, for example, length fields, cookies, keywords, or endpoint addresses (such as IP addresses and ports). The structure of the whole application session is determined by the protocol state machine, which specifies the order in which messages of different types can be sent.
Using that terminology, we observe that automatic protocol reverse engineering can target different levels. In the simplest case, the analysis only examines a single message. Here, the goal of the reverse engineering process is to identify the different fields that appear in that message. A more general approach considers a set of messages of a particular type. An analysis process at this level would produce a message format specification that can include optional fields or alternative structures for parts of the message. Finally, in the most general case, the analysis process operates on complete application sessions. In this case, it is not sufficient to extract only the message format specifications; the protocol state machine must be identified as well. Moreover, before individual message formats can be extracted, it is necessary to distinguish between messages of different types.
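
The terminology above can be made concrete with a small sketch. It is only an illustration of the concepts (message formats, sessions, and a protocol state machine), not the authors' system. The toy protocol with HELLO, DATA, and BYE messages, the state names, and the class and method names are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MessageFormat:
    """A message type, defined by the fields its format specifies."""
    name: str
    fields: List[str]


@dataclass
class ProtocolStateMachine:
    """Specifies the order in which messages of different types can be sent."""
    start: str
    # transitions[state][message_type] gives the next state
    transitions: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def allows(self, session: List[str]) -> bool:
        """Return True if every message in the session is permitted in order."""
        state = self.start
        for message_type in session:
            next_state = self.transitions.get(state, {}).get(message_type)
            if next_state is None:
                return False  # this message type may not be sent in this state
            state = next_state
        return True


# Hypothetical toy protocol, invented for illustration only.
formats = [
    MessageFormat("HELLO", ["keyword", "cookie"]),
    MessageFormat("DATA", ["keyword", "length", "payload"]),
    MessageFormat("BYE", ["keyword"]),
]
machine = ProtocolStateMachine(
    start="start",
    transitions={
        "start": {"HELLO": "ready"},
        "ready": {"DATA": "ready", "BYE": "closed"},
    },
)

print(machine.allows(["HELLO", "DATA", "DATA", "BYE"]))
print(machine.allows(["DATA"]))
```

Analysis at the single-message level would recover one instance of a `MessageFormat`. Analysis at the session level would additionally have to recover something like the `transitions` table.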

Analysis of multiple messages

When analyzing a single protocol message, our system breaks up the byte sequence that makes up this message into a number of fields. As mentioned previously, these fields can be nested, and thus, are stored in a hierarchical (tree) structure. The root node of the tree is the complete message. Both length field and delimiter analyses are used to identify parts of the message as scope fields, delimited fields, length fields, or target fields. Input bytes that cannot be attributed to any such field are treated as individual byte fields or, if they are in a delimiter scope and end at a delimiter, as arbitrary-length token fields. We refer to fields that contain other, embedded fields as complex fields. Fields that cannot be divided further are called basic fields. In the tree hierarchy, complex fields are internal nodes, while basic fields are leaf nodes.
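
The field hierarchy described above can be sketched as a simple tree type. This is a minimal illustration and not the data structure used by the authors: the `Field` class, its attribute names, the `"group"` label for the inner complex field, and the sample message bytes are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Field:
    """One node of the field tree, covering message[start:end]."""
    kind: str    # e.g. "message", "length", "target", "byte"
    start: int   # offset of the first byte of the field
    end: int     # offset one past the last byte of the field
    children: List["Field"] = field(default_factory=list)

    @property
    def is_complex(self) -> bool:
        """Complex fields contain embedded fields; basic fields do not."""
        return bool(self.children)

    def leaves(self) -> Iterator["Field"]:
        """Yield the basic (leaf) fields from left to right."""
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()


# Hypothetical message: a one-byte length, five target bytes, one stray byte.
message = b"\x05hello!"
root = Field("message", 0, 7, [
    Field("group", 0, 6, [        # illustrative complex field
        Field("length", 0, 1),
        Field("target", 1, 6),
    ]),
    Field("byte", 6, 7),          # byte not attributed to any other field
])

for leaf in root.leaves():
    print(leaf.kind, message[leaf.start:leaf.end])
```

Storing offsets rather than copies of the bytes keeps every node tied to its position in the original message, which makes it easy to check that the leaves cover the message without gaps.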
It is possible, and common, that different message instances of the same type do not contain the same fields in the same order. For example, in an HTTP GET request, the client can send multiple header lines with different keywords. Moreover, these headers can appear in an almost arbitrary order. Another example is a DNS query where the requested domain name is split into a variable number of parts, depending on the number of dots in the name. By analyzing only a single message, there is no way for the system to determine whether a protocol requires the format to be exactly as seen, or whether there is some flexibility in the way fields can be arranged. To address this question, and to deliver a general and precise message format specification, information from multiple messages must be combined.
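
To illustrate why several messages are needed, the following sketch compares the field sequences of multiple messages of the same type and labels each field as required, optional, or repeated. This is a deliberately naive counting heuristic written for this example. It is not the aggregation technique used by the authors, which this excerpt does not describe, and the function name and the field labels are assumptions.

```python
from collections import Counter
from typing import Dict, List


def summarize_fields(messages: List[List[str]]) -> Dict[str, str]:
    """Label each field name as 'required', 'optional', or 'repeated'."""
    seen_in = Counter()    # number of messages that contain the field
    max_count = Counter()  # most occurrences of the field in one message
    for fields in messages:
        for name, n in Counter(fields).items():
            seen_in[name] += 1
            max_count[name] = max(max_count[name], n)

    summary = {}
    for name in seen_in:
        if max_count[name] > 1:
            summary[name] = "repeated"
        elif seen_in[name] == len(messages):
            summary[name] = "required"
        else:
            summary[name] = "optional"
    return summary


# Hypothetical HTTP GET requests: same headers in a different order,
# plus one header that appears in only one request.
http_requests = [
    ["request-line", "Host", "Accept"],
    ["request-line", "Accept", "Host", "User-Agent"],
]

# Hypothetical DNS query names: the number of name parts varies.
dns_names = [
    ["label", "label", "end-of-name"],
    ["label", "label", "label", "end-of-name"],
]

print(summarize_fields(http_requests))
print(summarize_fields(dns_names))
```

With only the first HTTP request available, every field would look mandatory and fixed in order. The second request is what reveals that `User-Agent` is optional and that `Host` and `Accept` can be swapped.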
