A Virtual Honeypot Framework
1 Introduction
Internet security is increasing in importance as more
and more business is conducted there. Yet, despite
decades of research and experience, we are still unable
to make secure computer systems or even measure their
security.
As a result, exploitation of newly discovered vulnerabilities
often catches us by surprise. Exploit automation
and massive global scanning for vulnerabilities
enable adversaries to compromise computer systems
shortly after vulnerabilities become known [23].
One way to get early warnings of new vulnerabilities
is to install and monitor computer systems on a network
that we expect to be broken into. Every attempt
to contact these systems via the network is suspect.
We call such a system a honeypot. If a honeypot is
compromised, we study the vulnerability that was used
to compromise it. A honeypot may run any operating
system and any number of services. The configured services
determine the vectors an adversary may choose
to compromise the system.
A physical honeypot is a real machine with its own
IP address. A virtual honeypot is simulated by another
machine that responds to network traffic sent to the
virtual honeypot.
Virtual honeypots are attractive because they require
fewer computer systems, which reduces maintenance
costs. Using virtual honeypots, it is possible to
populate a network with hosts running numerous operating
systems. To convince adversaries that a virtual
honeypot is running a given operating system, we need
to simulate the TCP/IP stack of the target operating
system carefully, in order to fool TCP/IP stack fingerprinting
tools like Xprobe [1] or Nmap [7].
This paper describes the design and implementation
of Honeyd, a framework for virtual honeypots that simulates
computer systems at the network level. Honeyd
supports the IP protocol suite [24] and responds to
network requests for its virtual honeypots according to
the services that are configured for each virtual honeypot.
When sending a response packet, Honeyd’s personality
engine makes it match the network behavior
of the configured operating system personality.
This research was conducted by the author while at the Center
for Information Technology Integration of the University of
Michigan.
To simulate real networks, Honeyd creates virtual
networks that consist of arbitrary routing topologies
with configurable link characteristics such as latency
and packet loss. When network mapping tools like
traceroute are used to probe the virtual network, they
discover only the topologies simulated by Honeyd.
Our experimental evaluation of Honeyd verifies that
fingerprinting tools are deceived by the simulated systems
and that our virtual network topologies seem realistic
to network mapping tools.
To demonstrate the power of the Honeyd framework,
we show how it can be used in many areas of system
security. For example, Honeyd can help with detecting
and disabling worms, distracting adversaries, or preventing
the spread of spam email.
The rest of this paper is organized as follows. Section
2 presents background information on honeypots.
In Section 3, we discuss the design and implementation
of Honeyd. Section 4 presents an evaluation of the
Honeyd framework in which we verify that fingerprinting
and network mapping tools are fooled into reporting the
specified system configurations. We describe how Honeyd
can help to improve system security in Section 5
and present related work in Section 6. We summarize
and conclude in Section 7.
2 Honeypots
This section presents background information on
honeypots and our terminology. We provide motivation
for their use by comparing honeypots to network
intrusion detection systems (NIDS) [17]. The amount
of useful information provided by NIDS is decreasing
in the face of ever more sophisticated evasion techniques
[19, 26] and an increasing number of protocols
that employ encryption to protect network traffic from
eavesdroppers. NIDS also suffer from high false positive
rates that decrease their usefulness even further.
Honeypots can help with some of these problems.
A honeypot is a closely monitored computing resource
that we intend to be probed, attacked, or compromised.
The value of a honeypot is determined by
the information that we can obtain from it. Monitoring
the data that enters and leaves a honeypot lets
us gather information that is not available to NIDS.
For example, we can log the key strokes of an interactive
session even if encryption is used to protect the
network traffic. To detect malicious behavior, NIDS
require signatures of known attacks and often fail to
detect compromises that were unknown at the time they
were deployed. On the other hand, honeypots can detect
vulnerabilities that are not yet understood. For
example, we can detect compromise by observing network
traffic leaving the honeypot even if the means of
the exploit has never been seen before.
Because a honeypot has no production value, any attempt
to contact it is suspicious. Consequently, forensic
analysis of data collected from honeypots is less
likely to lead to false positives than data collected by
NIDS.
Honeypots can run any operating system and any
number of services. The configured services determine
the vectors available to an adversary for compromising
or probing the system. A high-interaction honeypot
simulates all aspects of an operating system. A
low-interaction honeypot simulates only some parts,
for example the network stack [22]. A high-interaction
honeypot can be compromised completely, allowing an
adversary to gain full access to the system and use it
to launch further network attacks. In contrast, low-interaction
honeypots simulate only services that cannot
be exploited to get complete access to the honeypot.
Low-interaction honeypots are more limited,
but they are useful to gather information at a higher
level, e.g., learn about network probes or worm activity.
They can also be used to analyze spammers or for
active countermeasures against worms; see Section 5.
We also differentiate between physical and virtual
honeypots. A physical honeypot is a real machine on
the network with its own IP address. A virtual honeypot
is simulated by another machine that responds to
network traffic sent to the virtual honeypot.
When gathering information about network attacks
or probes, the number of deployed honeypots influences
the amount and accuracy of the collected data. A
good example is measuring the activity of HTTP-based
worms [21]. We can identify these worms only after
they complete a TCP handshake and send their payload.
However, most of their connection requests will
go unanswered because they contact randomly chosen
IP addresses. A honeypot configured to function as
a web server can capture the worm payload. The
more honeypots we deploy, the more likely it is that one
of them is contacted by a worm.
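The relationship between the number of honeypots and the chance of being contacted can be made concrete with a small calculation. The sketch below assumes a worm that picks targets uniformly at random from the whole IPv4 address space; that scanning model, the function name, and its parameters are illustrative assumptions rather than anything measured in this paper.

```python
def contact_probability(honeypots: int, probes: int, space: int = 2**32) -> float:
    """Chance that at least one of `probes` random targets is a honeypot.

    Assumes each probe picks an address uniformly at random from `space`
    addresses, independently of all other probes (an idealized worm).
    """
    if not 0 <= honeypots <= space:
        raise ValueError("honeypots must lie between 0 and the address space size")
    miss = 1.0 - honeypots / space   # one probe misses every honeypot
    return 1.0 - miss ** probes      # at least one probe hits a honeypot


# Covering a /16 instead of a /24 raises the chance of seeing the worm.
small = contact_probability(256, 1_000_000)
large = contact_probability(65_536, 1_000_000)
```

Under this model the probability grows with both the number of instrumented addresses and the number of probes, which is the motivation for covering large address ranges.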
Physical honeypots are often high-interaction and allow
the system to be compromised completely, so they
are expensive to install and maintain. For large address
spaces, it is impractical or impossible to deploy
a physical honeypot for each IP address. In that case,
we need to deploy virtual honeypots.
3 Design and Implementation
In this section, we present Honeyd, a lightweight
framework for creating virtual honeypots. The framework
allows us to instrument thousands of IP addresses
with virtual machines and corresponding network services.
We start by discussing design considerations,
then describe Honeyd’s architecture and implementation.
We limit adversaries to interacting with our honeypots
only at the network level. Instead of simulating
every aspect of an operating system, we choose to simulate
only its network stack. The main drawback of
this approach is that an adversary never gains access
to a complete system even if he compromises a simulated
service. On the other hand, we are still able to
capture connection and compromise attempts, and
we can mitigate this drawback by combining
Honeyd with a virtual machine like VMware [25], as
discussed in the related work section. Because it simulates
only the network stack, Honeyd is a low-interaction virtual honeypot that
simulates TCP and UDP services. It also understands
and responds correctly to ICMP messages.
Honeyd must be able to handle virtual honeypots
on multiple IP addresses simultaneously, in order to
populate the network with numerous virtual honeypots
simulating different operating systems and services. To
increase the realism of our simulation, the framework
must be able to simulate arbitrary network topologies.
To simulate address spaces that are topologically dispersed
and for load sharing, the framework also needs
to support network tunneling.
Figure 1 shows a conceptual overview of the framework’s
operation. A central machine intercepts network
traffic sent to the IP addresses of configured honeypots
and simulates their responses. Before we describe
Honeyd’s architecture, we explain how network
packets for virtual honeypots reach the Honeyd host.
3.1 Receiving Network Data
Honeyd is designed to reply to network packets
whose destination IP address belongs to one of the simulated
honeypots. For Honeyd to receive the correct
packets, the network needs to be configured appropriately.
There are several ways to do this, e.g., we can
create special routes for the virtual IP addresses that
point to the Honeyd host, or we can use Proxy ARP [3],
or we can use network tunnels.
Figure 1: Honeyd receives traffic for its virtual honeypots
via a router or Proxy ARP. For each honeypot, Honeyd
can simulate the network stack behavior of a different
operating system.
Let A be the IP address of our router and B the IP
address of the Honeyd host. In the simplest case, the
IP addresses of virtual honeypots lie within our local
network. We denote them V1, ..., Vn. When an adversary
sends a packet from the Internet to honeypot Vi,
router A receives and attempts to forward the packet.
The router queries its routing table to find the forwarding
address for Vi. There are three possible outcomes:
the router drops the packet because there is no route
to Vi, router A forwards the packet to another router,
or Vi lies in the local network range of the router and thus
is directly reachable by A.
We use the latter two cases to direct traffic for Vi to
B. The easiest way is to configure routing entries for Vi
with 1 ≤ i ≤ n that point to B. In that case, the router
forwards packets for our virtual honeypots directly to
the Honeyd host. If no special route has been configured,
the router ARPs to determine the MAC address
of the virtual honeypot. As there is no corresponding
physical machine, the ARP requests go unanswered
and the router drops the packet after a few retries. We
configure the Honeyd host to reply to ARP requests for
Vi with its own MAC address. This is called Proxy
ARP and allows the router to send packets for Vi to
B’s MAC address.
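The router's three possible outcomes, and the way a host route or Proxy ARP changes them, can be sketched as a small decision function. This is a simplified model rather than real router behavior: the function name, the route list, and the dictionary standing in for ARP replies are all hypothetical.

```python
import ipaddress


def forward_decision(dst, routes, local_net, arp_replies):
    """Return what a router does with a packet addressed to `dst`.

    routes:      list of (network, next_hop) pairs (hypothetical routing table)
    local_net:   the network directly attached to the router
    arp_replies: maps an IP string to the MAC that answers ARP for it
    """
    addr = ipaddress.ip_address(dst)
    # Longest-prefix match over the configured routes.
    matches = [(net, hop) for net, hop in routes if addr in net]
    if matches:
        _, hop = max(matches, key=lambda m: m[0].prefixlen)
        return ("forward", hop)
    if addr in local_net:
        mac = arp_replies.get(dst)
        # Without an ARP reply the router gives up after a few retries.
        return ("deliver", mac) if mac else ("drop", "arp timeout")
    return ("drop", "no route")


LOCAL = ipaddress.ip_network("10.0.0.0/24")
HONEYD_IP = "10.0.0.2"              # address B in the text
HONEYD_MAC = "00:11:22:33:44:55"    # made-up MAC of the Honeyd host

# Option 1: a host route for virtual honeypot Vi that points to B.
host_route = [(ipaddress.ip_network("10.0.0.50/32"), HONEYD_IP)]
# Option 2: Proxy ARP, where B answers ARP requests for Vi with its own MAC.
proxy_arp = {"10.0.0.51": HONEYD_MAC}
```

With neither option in place, a packet for a virtual address in the local range ends in the "arp timeout" branch, which is the case Proxy ARP is meant to avoid.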
In more complex environments, it is possible to tunnel
network address space to a Honeyd host. We use
the generic routing encapsulation (GRE) [9, 10] tunneling
protocol described in detail in Section 3.4.
Figure 2: This diagram gives an overview of Honeyd’s
architecture. Incoming packets are dispatched to the correct
protocol handler. For TCP and UDP, the configured
services receive new data and send responses if necessary.
All outgoing packets are modified by the personality
engine to mimic the behavior of the configured network
stack. The routing component is optional and used only
when Honeyd simulates network topologies.
3.2 Architecture
Honeyd’s architecture consists of several components:
a configuration database, a central packet dispatcher,
protocol handlers, a personality engine, and
an optional routing component; see Figure 2.
Incoming packets are processed by the central packet
dispatcher. It first checks the length of an IP packet
and verifies the packet’s checksum. The framework is
aware of the three major Internet protocols: ICMP,
TCP and UDP. Packets for other protocols are logged
and silently discarded.
Before it can process a packet, the dispatcher must
query the configuration database to find a honeypot
configuration that corresponds to the destination IP
address. If no specific configuration exists, a default
template is used. Given a configuration, the packet and
corresponding configuration are handed to the protocol-specific
handler.
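The dispatcher steps described above (length check, checksum verification, protocol selection, and configuration lookup with a default template) can be sketched as follows. This is a minimal illustration for IPv4 headers, not Honeyd's code; `dispatch`, `make_header`, and the dictionary used as a configuration database are hypothetical names.

```python
import struct

PROTOCOLS = {1: "icmp", 6: "tcp", 17: "udp"}


def ip_checksum(data: bytes) -> int:
    """Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def dispatch(packet: bytes, configs: dict, default: str = "default"):
    """Return (protocol, configuration) or None if the packet is discarded."""
    if len(packet) < 20:
        return None
    header_len = (packet[0] & 0x0F) * 4
    total_len = struct.unpack("!H", packet[2:4])[0]
    if header_len < 20 or total_len < header_len or len(packet) < total_len:
        return None
    # A valid header, checksum field included, sums to zero.
    if ip_checksum(packet[:header_len]) != 0:
        return None
    protocol = PROTOCOLS.get(packet[9])
    if protocol is None:
        return None  # other protocols would be logged and discarded
    destination = ".".join(str(octet) for octet in packet[16:20])
    return protocol, configs.get(destination, default)


def make_header(protocol: int, src, dst) -> bytes:
    """Build a bare 20-byte IPv4 header with a correct checksum (for testing)."""
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64,
                         protocol, 0, bytes(src), bytes(dst))
    return header[:10] + struct.pack("!H", ip_checksum(header)) + header[12:]
```

A packet for an address without its own entry falls through to the default template, which mirrors the lookup rule in the text.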
The ICMP protocol handler supports most ICMP
requests. By default, all honeypot configurations respond
to echo requests and process destination unreachable
messages. The handling of other requests
depends on the configured personalities as described
in Section 3.3.
For TCP and UDP, the framework can establish connections
to arbitrary services. Services are external
applications that receive data on stdin and send their
output to stdout. The behavior of a service depends entirely
on the external application. When a packet
is received, the framework checks whether it
is part of an established connection. In that case, any
new data is sent to the already started service application.
If the packet contains a connection request, a
new process is created to run the appropriate service.
Instead of creating a new process for each connection,
the framework supports subsystems. A subsystem is
an application that runs in the name space of the virtual
honeypot. The subsystem-specific application is
started when the corresponding virtual honeypot is instantiated.
A subsystem can bind to ports, accept connections,
and initiate network traffic.
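The service contract (an external program that reads the connection's data on stdin and writes its reply to stdout) can be illustrated with a short sketch. It handles a single request per process and ignores established connections and subsystems; the service table, the banner script, and the function names are hypothetical and only stand in for whatever applications an operator configures.

```python
import subprocess
import sys


def run_service(argv, request: bytes, timeout: float = 5.0) -> bytes:
    """Start one service process, feed it the request on stdin,
    and return whatever it wrote to stdout."""
    result = subprocess.run(argv, input=request, stdout=subprocess.PIPE,
                            timeout=timeout, check=False)
    return result.stdout


# Hypothetical service: a tiny script that answers any request with a banner.
BANNER_SERVICE = [
    sys.executable, "-c",
    "import sys; sys.stdin.read(); sys.stdout.write('220 honeypot ready')",
]

# Hypothetical per-honeypot service table: destination port -> command line.
SERVICES = {25: BANNER_SERVICE}


def handle_connection_request(dst_port: int, request: bytes):
    """Run the service configured for `dst_port`, or return None if the port is closed."""
    argv = SERVICES.get(dst_port)
    if argv is None:
        return None
    return run_service(argv, request)
```

Because the framework only relays bytes, the behavior an adversary sees is decided entirely by the external program.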
Honeyd contains a simplified TCP state machine.
The three-way handshake for connection establishment
and connection teardown via FIN or RST are fully supported,
but receiver and congestion window management
is not fully implemented.
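A simplified state machine of the kind described here might look like the sketch below. It covers only the handshake and teardown paths named in the text, from the point of view of the simulated server; the state and event names follow common TCP terminology and are assumptions, not Honeyd's actual identifiers.

```python
# (current state, event) -> next state, seen from the simulated server.
TRANSITIONS = {
    ("LISTEN", "SYN"): "SYN_RECEIVED",       # reply with SYN+ACK
    ("SYN_RECEIVED", "ACK"): "ESTABLISHED",  # three-way handshake complete
    ("ESTABLISHED", "FIN"): "CLOSE_WAIT",    # peer started the teardown
    ("CLOSE_WAIT", "CLOSE"): "LAST_ACK",     # local side sent its own FIN
    ("LAST_ACK", "ACK"): "CLOSED",
}


def step(state: str, event: str) -> str:
    """Advance the connection by one event; unknown events leave the state unchanged."""
    if event == "RST":
        return "CLOSED"  # a reset tears the connection down from any state
    return TRANSITIONS.get((state, event), state)


def run(events, state: str = "LISTEN") -> str:
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = step(state, event)
    return state
```

Window and congestion handling are deliberately absent, matching the statement that those parts are not fully implemented.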
UDP datagrams are passed directly to the application.
When the framework receives a UDP packet for
a closed port, it sends an ICMP port unreachable message
unless this is forbidden by the configured personality.
In sending ICMP port unreachable messages, the
framework allows network mapping tools like traceroute
to discover the simulated network topology.
In addition to establishing a connection to a local
service, the framework also supports redirection of
connections. The redirection may be static or it can
depend on the connection quadruple (source address,
source port, destination address and destination port).
Redirection lets us forward a connection request for
a service on a virtual honeypot to a service running
on a real server. For example, we can redirect DNS
requests to a proper name server. Or we can reflect
connections back to an adversary, e.g., just for fun we
might redirect an SSH connection back to the originating
host and cause the adversary to attack her own
SSH server. Evil laugh.
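Static and quadruple-dependent redirection can both be expressed as a target string that is filled in from the connection quadruple. The sketch below uses Python's `string.Template` for the substitution; the placeholder names (`ipsrc`, `sport`, `ipdst`, `dport`) and the function name are assumptions made for this illustration.

```python
from string import Template


def resolve_redirect(target: str, quadruple):
    """Turn a redirect target such as '$ipsrc:22' into a (host, port) pair.

    quadruple is (source address, source port, destination address,
    destination port), as in the text.
    """
    src_ip, src_port, dst_ip, dst_port = quadruple
    filled = Template(target).substitute(
        ipsrc=src_ip, sport=src_port, ipdst=dst_ip, dport=dst_port)
    host, _, port = filled.rpartition(":")
    return host, int(port)


connection = ("198.51.100.7", 40000, "10.0.0.5", 22)

# Static redirection: always forward to the same real server.
dns_server = resolve_redirect("192.0.2.53:53", connection)
# Dynamic redirection: reflect the SSH connection back to its originator.
reflected = resolve_redirect("$ipsrc:22", connection)
```

A target without placeholders gives static redirection, while a target that uses the source address reflects the connection back to the adversary.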
Before a packet is sent to the network, it is processed
by the personality engine. The personality engine adjusts
the packet’s content so that it appears to originate
from the network stack of the configured operating system.
3.3 Personality Engine
Adversaries commonly run fingerprinting tools like
Xprobe [1] or Nmap [7] to gather information about a
target system. It is important that honeypots do not
stand out when fingerprinted. To make them appear
real to a probe, Honeyd simulates the network stack
behavior of a given operating system. We call this the
personality of a virtual honeypot. Different personalities
can be assigned to different virtual honeypots. The
personality engine makes a honeypot’s network stack
behave as specified by the personality by introducing
changes into the protocol headers of every outgoing
packet so that they match the characteristics of the
configured operating system.
Fingerprint IRIX 6.5.15m on SGI O2
TSeq(Class=TD%gcd=<104%SI=<1AE%IPID=I%TS=2HZ)
T1(DF=N%W=EF2A%ACK=S++%Flags=AS%Ops=MNWNNTNNM)
T2(Resp=Y%DF=N%W=0%ACK=S%Flags=AR%Ops=)
T3(Resp=Y%DF=N%W=EF2A%ACK=O%Flags=A%Ops=NNT)
T4(DF=N%W=0%ACK=O%Flags=R%Ops=)
T5(DF=N%W=0%ACK=S++%Flags=AR%Ops=)
T6(DF=N%W=0%ACK=O%Flags=R%Ops=)
T7(DF=N%W=0%ACK=S%Flags=AR%Ops=)
PU(Resp=N)
Figure 3: An example of an Nmap fingerprint that specifies
the network stack behavior of a system running IRIX.
The framework uses Nmap’s fingerprint database as
its reference for a personality’s TCP and UDP behavior;
Xprobe’s fingerprint database is used as reference
for a personality’s ICMP behavior.
Next, we explain how we use the information provided
by Nmap’s fingerprints to change the characteristics
of a honeypot’s network stack.
Each Nmap fingerprint has a format similar to the
example shown in Figure 3. We use the string after
the Fingerprint token as the personality name. The
lines after the name describe the results for nine different
tests. The first test is the most comprehensive. It
determines how the network stack of the remote operating
system creates the initial sequence number (ISN)
for TCP SYN segments. Nmap indicates the difficulty
of predicting ISNs in the Class field. Predictable ISNs
pose a security problem because they allow an adversary
to spoof connections [2]. The gcd and SI fields
provide more detailed information about the ISN distribution.
The first test also determines how IP identification
numbers and TCP timestamps are generated.
The next seven tests determine the stack’s behavior
for packets that arrive on open and closed TCP ports.
The last test analyzes the ICMP response packet to a
closed UDP port.
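To make the fingerprint format concrete, the sketch below parses the IRIX example from Figure 3 into a personality name and a dictionary of test results. The parser is a minimal illustration written for this one layout (a Fingerprint line followed by Name(key=value%key=value) lines); it is not Nmap's or Honeyd's parser, and `parse_fingerprint` is a hypothetical name.

```python
IRIX_FINGERPRINT = """\
Fingerprint IRIX 6.5.15m on SGI O2
TSeq(Class=TD%gcd=<104%SI=<1AE%IPID=I%TS=2HZ)
T1(DF=N%W=EF2A%ACK=S++%Flags=AS%Ops=MNWNNTNNM)
T2(Resp=Y%DF=N%W=0%ACK=S%Flags=AR%Ops=)
T3(Resp=Y%DF=N%W=EF2A%ACK=O%Flags=A%Ops=NNT)
T4(DF=N%W=0%ACK=O%Flags=R%Ops=)
T5(DF=N%W=0%ACK=S++%Flags=AR%Ops=)
T6(DF=N%W=0%ACK=O%Flags=R%Ops=)
T7(DF=N%W=0%ACK=S%Flags=AR%Ops=)
PU(Resp=N)
"""


def parse_fingerprint(text: str):
    """Return (personality name, {test name: {field: value}})."""
    name = None
    tests = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith("Fingerprint "):
            name = line[len("Fingerprint "):]
        elif "(" in line and line.endswith(")"):
            test, body = line[:-1].split("(", 1)
            fields = {}
            for item in body.split("%"):   # fields are separated by '%'
                if item:
                    key, _, value = item.partition("=")
                    fields[key] = value
            tests[test] = fields
    return name, tests


name, tests = parse_fingerprint(IRIX_FINGERPRINT)
```

The nine entries in `tests` correspond to the nine tests described above: TSeq for sequence numbers, T1 to T7 for TCP probes, and PU for the closed UDP port.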
The framework keeps state for each honeypot. The
state includes information about ISN generation, the
boot time of the honeypot and the current IP packet
identification number. Keeping state is necessary so
that we can generate subsequent ISNs that follow the
distribution specified by the fingerprint.
Figure 4: The diagram shows the structure of the TCP
header. Honeyd changes options and other parameters to
match the behavior of network stacks.
Nmap’s fingerprinting is mostly concerned with an
operating system’s TCP implementation. TCP is a
stateful, connection-oriented protocol that provides error
recovery and congestion control [18]. TCP also supports
additional options, not all of which are implemented
by all systems. The size of the advertised receiver window
varies between implementations and is used by
Nmap as part of the fingerprint.
When the framework sends a packet for a newly established
TCP connection, it uses the Nmap fingerprint
to set the initial window size. After a connection has
been established, the framework adjusts the window
size according to the amount of buffered data.
If TCP options present in the fingerprint have been
negotiated during connection establishment, then Honeyd
inserts them into the response packet. The framework
uses the fingerprint to determine the frequency
with which TCP timestamps are updated. For most
operating systems, the update frequency is 2 Hz.
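As a rough illustration of how two fingerprint fields could drive these header values, the sketch below reads the initial window from a W field and derives a TCP timestamp from the honeypot's simulated boot time and the TS frequency. The function names are hypothetical, and reading W as a hexadecimal number and TS as a value such as "2HZ" is an assumption based on the example in Figure 3.

```python
def initial_window(t1_fields: dict) -> int:
    """Initial advertised window, read from the hexadecimal W field."""
    return int(t1_fields["W"], 16)


def timestamp_hz(ts_field: str) -> float:
    """Convert a TS field such as '2HZ' into an update frequency in Hz."""
    if not ts_field.upper().endswith("HZ"):
        raise ValueError("unsupported TS value: %r" % ts_field)
    return float(ts_field[:-2])


def tcp_timestamp(boot_time: float, now: float, hz: float) -> int:
    """Timestamp counter that ticks `hz` times per second since boot."""
    return int((now - boot_time) * hz) & 0xFFFFFFFF


window = initial_window({"W": "EF2A"})
ticks = tcp_timestamp(boot_time=1000.0, now=1010.5, hz=timestamp_hz("2HZ"))
```

Deriving the counter from a stored boot time keeps the timestamps of one honeypot consistent across connections.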
Generating the correct distribution of initial sequence
numbers is tricky. Nmap obtains six ISN samples
and analyzes their consecutive differences. Nmap
recognizes several ISN generation types: constant differences,
differences that are multiples of a constant,
completely random differences, time-dependent increments, and
random increments. To differentiate between the latter
two cases, Nmap calculates the greatest common
divisor (gcd) and standard deviation for the collected
differences.
The framework keeps track of the last ISN that was
generated by each honeypot and its generation time.
For new TCP connection requests, Honeyd uses a formula
that approximates the distribution described by
the fingerprint’s gcd and standard deviation. In this
way, the generated ISNs match the generation class
that Nmap expects for the particular operating system.
Figure 5: The diagram shows the structure of an ICMP
port unreachable message. Honeyd introduces errors into
the quoted IP header to match the behavior of network
stacks.
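The paper does not give the formula, so the following is only one possible way to produce ISNs whose consecutive differences are multiples of a given gcd with a spread governed by a standard deviation. The class name, its parameters, and the use of a Gaussian draw are all assumptions made for illustration; this is not Honeyd's actual computation.

```python
import random


class IsnGenerator:
    """Per-honeypot ISN state: remembers the last ISN and derives the next one."""

    def __init__(self, gcd: int, stddev: float, start: int = 0, seed=None):
        if gcd < 1:
            raise ValueError("gcd must be at least 1")
        self.gcd = gcd
        self.stddev = stddev
        self.last = start & 0xFFFFFFFF
        self.rng = random.Random(seed)

    def next_isn(self) -> int:
        # Draw a spread around zero, then round it to a positive multiple
        # of gcd so that every difference is divisible by gcd.
        spread = abs(self.rng.gauss(0.0, self.stddev))
        multiples = 1 + int(spread // self.gcd)
        self.last = (self.last + multiples * self.gcd) & 0xFFFFFFFF
        return self.last


generator = IsnGenerator(gcd=64, stddev=5000.0, start=1000, seed=1)
samples = [generator.next_isn() for _ in range(6)]
differences = [b - a for a, b in zip(samples, samples[1:])]
```

Six samples are drawn here because that is the number Nmap collects; a prober computing the gcd of the differences would find a multiple of the configured value.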
For the IP header, Honeyd adjusts the generation
of the identification number. It can either be zero,
increment by one, or random.
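The three identification-number behaviors can be captured in a few lines. The mode names and the function below are hypothetical labels for the cases listed above, not fields taken from a fingerprint database.

```python
import random


def next_ip_id(mode: str, last: int, rng=random) -> int:
    """Return the next 16-bit IP identification number for a honeypot."""
    if mode == "zero":
        return 0
    if mode == "incremental":
        return (last + 1) & 0xFFFF   # wraps around at 16 bits
    if mode == "random":
        return rng.randrange(0x10000)
    raise ValueError("unknown IP ID mode: %r" % mode)
```

The `last` argument is the per-honeypot state mentioned earlier; only the incremental mode depends on it.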
For ICMP packets, the personality engine uses the
PU test entry to determine how the quoted IP header
should be modified, using the associated Xprobe fingerprint
for further information. Some operating systems
modify the incoming packet by changing fields from
network to host order and as a result quote the IP and
UDP header incorrectly. Honeyd introduces these errors
if necessary. Figure 5 shows an example of an
ICMP destination unreachable message. The framework
also supports the generation of other ICMP messages,
not described here due to space considerations.
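The byte-order error described above can be reproduced in a short sketch that builds an ICMP port unreachable message (type 3, code 3) and optionally swaps the two bytes of the quoted IP total-length field. The function names and the `swap_total_length` flag are hypothetical; which fields a given operating system misquotes would come from its fingerprint, which this sketch does not consult.

```python
import struct


def inet_checksum(data: bytes) -> int:
    """Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def port_unreachable(ip_header: bytes, udp_header: bytes,
                     swap_total_length: bool = False) -> bytes:
    """Build an ICMP destination unreachable (port unreachable) message
    that quotes the offending IP header and the first 8 bytes after it."""
    quoted = ip_header
    if swap_total_length:
        # Mimic a stack that quotes the total length in host byte order.
        quoted = quoted[:2] + quoted[2:4][::-1] + quoted[4:]
    message = struct.pack("!BBHI", 3, 3, 0, 0) + quoted + udp_header[:8]
    checksum = inet_checksum(message)
    return message[:2] + struct.pack("!H", checksum) + message[4:]


# A 28-byte UDP datagram (20-byte IP header plus 8-byte UDP header).
ip_hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 1, 0, 64, 17, 0,
                     bytes([198, 51, 100, 7]), bytes([10, 0, 0, 5]))
udp_hdr = struct.pack("!HHHH", 40000, 33434, 8, 0)

faithful = port_unreachable(ip_hdr, udp_hdr)
buggy = port_unreachable(ip_hdr, udp_hdr, swap_total_length=True)
```

The ICMP checksum is computed after the quoted header is altered, so the message stays valid even though the quote is deliberately wrong.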