03-08-2012, 02:27 PM
Code-Red: a case study on the spread and victims of an Internet worm
Abstract
On July 19, 2001, more than 359,000 computers
connected to the Internet were infected with the Code-
Red (CRv2) worm in less than 14 hours. The cost of this
epidemic, including subsequent strains of Code-Red, is estimated
to be in excess of $2.6 billion. Despite the global
damage caused by this attack, there have been few serious
attempts to characterize the spread of the worm, partly
due to the challenge of collecting global information about
worms. Using a technique that enables global detection of
worm spread, we collected and analyzed data over a period
of 45 days beginning July 2nd, 2001 to determine the characteristics
of the spread of Code-Red throughout the Internet.
In this paper, we describe the methodology we use to trace
the spread of Code-Red, and then describe the results of our
trace analyses. We first detail the spread of the Code-Red
and CodeRedII worms in terms of infection and deactivation
rates. Even without being optimized for spread of infection,
Code-Red infection rates peaked at over 2,000 hosts per
minute. We then examine the properties of the infected host
population, including geographic location, weekly and diurnal
time effects, top-level domains, and ISPs. We demonstrate
that the worm was an international event and that infection activity
exhibited time-of-day effects, and we found that, although
most attention focused on large corporations, the Code-Red
worm primarily preyed upon home and small business users.
We also qualified the effects of DHCP on measurements of
infected hosts and determined that IP addresses are not an
accurate measure of the spread of a worm on timescales
longer than 24 hours. Finally, the experience of the Code-
Red worm demonstrates that wide-spread vulnerabilities in
Internet hosts can be exploited quickly and dramatically,
and that techniques other than host patching are required
to mitigate Internet worms.
Keywords—Code-Red, Code-RedI, CodeRedI, CodeRedII,
worm, security, backscatter, virus, epidemiology
CAIDA, San Diego Supercomputer Center, University of California,
San Diego. E-mail: {cshannon,dmoore,kc}[at]caida.org.
Support for this work is provided by DARPA NMS Grant N66001-
01-1-8909, NSF grant NCR-9711092, Cisco Systems URB Grant, and
CAIDA members.
I. INTRODUCTION
At 18:00 on November 2, 1988, Robert T. Morris released
a 99-line program onto the Internet. At 00:34 on
November 3, 1988, Andy Sudduth of Harvard University
posted the following message: “There may be a virus loose
on the Internet.” Indeed, Sun and VAX machines across
the country were screeching to a halt as invisible tasks utilized
all available resources [1] [2].
No virus brought large computers across the country to
a standstill – the culprit was actually the first malicious
worm. Unlike viruses and trojans, which rely on human
intervention to spread, worms are self-replicating software
designed to spread throughout a network on their own. Although
the Morris worm was the first malicious worm to
wreak widespread havoc, earlier worms were actually designed
to maximize utilization of networked computation
resources. In 1982 at Xerox’s Palo Alto Research Center,
John Shoch and Jon Hupp wrote five worm programs
that performed such benign tasks as posting announcements
[3]. However, research into using worm programs
as tools was abandoned after it was determined that the
consequences of a worm malfunction could be dire.
In the years between the Morris worm in November
1988 and June 2001, several other worms achieved limited
spread through host populations. The WANK (Worms
Against Nuclear Killers) worm of October 1989 attacked
SPAN VAX/VMS systems via DECnet protocols [4]. The
Ramen worm, which first spread in January of 2001, targeted the
wu-ftp daemon on RedHat Linux 6.2 and 7.0 systems [5].
Finally, the Lion Worm targeted the TSIG vulnerability in
BIND in March of 2001 [6].
While all of these worms caused some damage, none
approached the $2.6 billion cost of recovering from the
Code-Red and CodeRedII worms [7]. We can no longer afford
to remain ignorant of the spread and effects of worms
as information technology plays a critical role in our global
economy.
II. BACKGROUND
On June 18, 2001, eEye released information about
a buffer-overflow vulnerability in Microsoft’s IIS web
servers [8]. Microsoft released a patch for the vulnerability
eight days later, on June 26, 2001 [9]. Then on July
12, 2001, the Code-RedI worm began to exploit the aforementioned
buffer-overflow vulnerability in Microsoft’s IIS
web servers.
Upon infecting a machine, the worm checks to see if the
date (as kept by the system clock) is between the first and
the nineteenth of the month. If so, the worm generates a
random list of IP addresses and probes each machine on
the list in an attempt to infect as many computers as possible.
However, this first version of the worm uses a static
seed in its random number generator and thus generates
identical lists of IP addresses on each infected machine.
The first version of the worm spread slowly, because each
infected machine began to spread the worm by probing
machines that were either already infected or impregnable.
On the 20th of every month, the worm is programmed to
stop infecting other machines and proceed to its next attack
phase in which it launches a Denial-of-Service attack
against www1.whitehouse.gov from the 20th to the
28th of each month. The worm is dormant on days of the
month following the 28th.
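As an illustration, the date-driven schedule described above can be written as a small lookup. This is a minimal Python sketch of the schedule only; the function name and the phase labels are hypothetical and are not taken from the worm's code.

```python
def code_red_phase(day_of_month: int) -> str:
    """Return the Code-RedI phase for a day of the month (1-31).

    The phase labels are illustrative names, not identifiers from the worm.
    """
    if not 1 <= day_of_month <= 31:
        raise ValueError("day_of_month must be between 1 and 31")
    if day_of_month <= 19:
        return "spread"    # days 1-19: probe and infect other hosts
    if day_of_month <= 28:
        return "dos"       # days 20-28: Denial-of-Service phase
    return "dormant"       # days 29-31: no activity
```

Because the decision depends only on the system clock of the infected machine, a host with a wrong clock would follow the schedule of its own date rather than the real one.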
On July 13th, Ryan Permeh and Marc Maiffret at eEye
Digital Security received logs of attacks by the worm and
worked through the night to disassemble and analyze the
worm. They christened the worm “Code-Red” both because
the highly caffeinated “Code Red” Mountain Dew
beverage fueled their efforts to understand the workings
of the worm and because the worm defaces some web
pages with the phrase “Hacked by Chinese”. There is no
evidence either supporting or refuting the involvement of
Chinese hackers with the Code-RedI worm. The first version
of the Code-Red worm (Code-RedI v1 [footnote 1]) caused little
damage. Although the worm’s attempts to spread itself
consumed resources on infected machines and local area
networks, it had little impact on global resources.
The Code-RedI v1 worm is memory resident, so an infected
machine can be disinfected by simply rebooting it.
However, the machine is still vulnerable to repeat infection.
Any machines infected by Code-RedI v1 and subsequently
rebooted were likely to be reinfected, because
each newly infected machine probes the same list of IP
addresses in the same order.
Footnote 1: Although the initial Code-Red worm did not carry a suffix denoting
its temporal position, we have added the suffix “I” in the interest of
clarity, in the same manner as The Great War later came to be known
as World War I.
At approximately 10:00 UTC in the morning of July
19th, 2001, we observed a change in the behavior of the
worm as infected computers began to probe new hosts.
At this point, a random-seed variant of the Code-RedI v1
worm began to infect hosts running unpatched versions of
Microsoft’s IIS web server. The worm still spreads by
probing random IP addresses and infecting all hosts vulnerable
to the IIS exploit. Unlike Code-RedI v1, Code-
RedI v2 uses a random seed in its pseudo-random number
generator, so each infected computer tries to infect a different
list of randomly generated IP addresses at an observed
rate of roughly 11 probes per second (pps). This seemingly
minor change had a major impact: more than 359,000 machines
were infected with Code-RedI v2 in just fourteen
hours [10][11].
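The effect of the seed can be seen in a small experiment. The sketch below uses Python's standard random module as a stand-in generator; it is not the worm's pseudo-random number generator, and the function name and seed values are arbitrary assumptions chosen for illustration.

```python
import random

def probe_list(seed, count=5):
    """Generate `count` pseudo-random IPv4 addresses from a seeded generator."""
    rng = random.Random(seed)
    return [
        ".".join(str(rng.randrange(256)) for _ in range(4))
        for _ in range(count)
    ]

# Static seed (as in v1): every host derives the same target list.
host_a = probe_list(seed=1234)
host_b = probe_list(seed=1234)

# Per-host seed (as in v2): the target lists diverge.
host_c = probe_list(seed=1)
host_d = probe_list(seed=2)

print(host_a == host_b)
print(host_c == host_d)
```

With a shared seed, every new infection repeats work already done by earlier ones; with distinct seeds, each new infection explores a different part of the address space.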
Because Code-RedI v2 is identical to Code-RedI v1 in
all respects except the seed for its pseudo-random number
generator, the only direct damage to the infected host
is the “Hacked by Chinese” message added to top level
web pages on some hosts. However, Code-RedI v2 had a
greater impact on global infrastructure due to the sheer volume
of hosts infected and probes sent to infect new hosts.
Code-RedI v2 also wreaked havoc on some additional
devices with web interfaces, such as routers, switches,
DSL modems, and printers [12]. Although these devices
were not susceptible to infection by the worm, they either
crashed or rebooted when an infected machine attempted
to send them the unusual HTTP request containing a copy of
the worm.
Like Code-RedI v1, Code-RedI v2 can be removed from
a computer simply by rebooting it. However, rebooting the
machine does not prevent reinfection once the machine is
online again. On July 19th, the number of machines attempting
to infect new hosts was so high that many machines
were infected while the patch for the vulnerability
was being applied.
On August 4, 2001, an entirely new worm, CodeRedII,
began to exploit the buffer-overflow vulnerability in Microsoft’s
IIS web servers [13] [14]. Although the new
worm is completely unrelated to the original Code-RedI
worm, the source code of the worm contained the string
“CodeRedII” which became the name of the new worm.
Ryan Permeh and Marc Maiffret analyzed CodeRedII
to determine its attack mechanism. When the worm infects
a new host, it first determines if the system has already
been infected. If not, the worm initiates its propagation
mechanism, sets up a “backdoor” into the infected machine,
becomes dormant for a day, and then reboots the
machine. Unlike Code-RedI, CodeRedII is not memory
resident, so rebooting an infected machine does not eliminate
CodeRedII.
Although initial intuition might lead one to believe that this
twenty-four hour delay will retard the spread of the worm
so severely that it will never compromise a large number
of machines, this is not the case. The delay adds a layer
of subterfuge to the worm, since perusal of logs showing
connections to the machine around the time that the machine
begins to demonstrate symptoms of the infection (i.e.
when it starts to actively spread the worm) will not yield
any unusual activity.
After rebooting the machine, the CodeRedII worm begins
to spread. If the host infected with CodeRedII has
Chinese (Taiwanese) or Chinese (PRC) as the system language,
it uses 600 threads to probe other machines. On
all other machines it uses 300 threads. CodeRedII uses
a more complex method of selecting hosts to probe than
Code-RedI. CodeRedII generates a random IP address and
then applies a mask to produce the IP address to probe.
The length of the mask determines the similarity between
the IP address of the infected machine and the probed machine.
CodeRedII probes a completely random IP address
1/8th of the time. Half of the time, CodeRedII probes a
machine in the same /8 (so if the infected machine had
the IP address 10.9.8.7, the IP address probed would start
with 10.), while 3/8ths of the time, it probes a machine
on the same /16 (so the IP address probed would begin
with 10.9.). Like Code-RedI, CodeRedII avoids probing
IP addresses in the 224.0.0.0/8 (multicast) and 127.0.0.0/8
(loopback) address spaces. The bias toward the local /16
and /8 networks means that an infected machine may be
more likely to probe a susceptible machine, based on the
supposition that machines on a single network are more
likely to be running the same software as machines on unrelated
IP subnets.
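To make the locality bias concrete, the selection probabilities above can be simulated. The following Python sketch models only the 1/8, 4/8, 3/8 split and the two excluded /8 blocks; it is not derived from the worm's code, and the function name, the use of Python's random module, and the retry-on-exclusion behavior are assumptions.

```python
import random

def pick_target(own_ip, rng):
    """Pick one probe address using the 1/8, 4/8, 3/8 split described in the text."""
    own = [int(part) for part in own_ip.split(".")]
    while True:
        candidate = [rng.randrange(256) for _ in range(4)]
        roll = rng.randrange(8)
        if roll < 4:
            candidate[0] = own[0]       # 4/8: same /8 as the infected host
        elif roll < 7:
            candidate[:2] = own[:2]     # 3/8: same /16 as the infected host
        # roll == 7 (1/8): leave the address completely random
        if candidate[0] in (127, 224):  # skip loopback and 224.0.0.0/8
            continue
        return ".".join(map(str, candidate))

rng = random.Random(0)
targets = [pick_target("10.9.8.7", rng) for _ in range(8000)]
share_same_8 = sum(t.startswith("10.") for t in targets) / len(targets)
share_same_16 = sum(t.startswith("10.9.") for t in targets) / len(targets)
print(share_same_8, share_same_16)
```

In this model, roughly seven of every eight probes stay inside the infected host's own /8, which is why a single infection inside a network can quickly lead to many probes against its neighbors.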
The CodeRedII worm is much more dangerous than
Code-RedI because CodeRedII installs a mechanism for
remote, administrator-level access to the infected machine.
Unlike Code-RedI, CodeRedII neither defaces web pages
on infected machines nor launches a Denial-of-Service attack.
However, the backdoor installed on the machine allows
any code to be executed, so the machines could be
used as “zombies” for future attacks (Denial-of-Service or
otherwise).
III. METHODOLOGY
In this section, we detail our trace collection methodology,
explain how we validated that the traffic we traced is from the
spread of the worms, and describe our approaches for characterizing
the types of hosts infected and their geographic
locations.
Our analysis of the Code-RedI worm covers the spread
of the worm between July 4, 2001 and August 25, 2001.
Before Code-RedI began to spread, we were collecting
data in the form of a packet header trace of hosts sending
unsolicited TCP SYN packets into our /8 network. When
the worm began to spread extensively on the morning of
July 19, we noticed the sudden influx of probes into our
network and began our monitoring efforts in earnest.
The data used for this study were collected from two locations:
a /8 network and two /16 networks. Two types
of data from the /8 network are used to maximize coverage
of the expansion of the worm. Between midnight
and 16:30 UTC on July 19, a passive network monitor
recorded headers of all packets destined for the /8 research
network. After 16:30 UTC, a filter installed on a campus
router to reduce congestion caused by the worm blocked
all external traffic to this network. Because this filter was
put into place upstream of the monitor, we were unable
to capture IP packet headers after 16:30 UTC. However, a
backup data set consisting of sampled netflow [15] output
from the filtering router was available for the /8 throughout
the 24 hour period. The data from the /16 networks
were collected with Bro between 10:00 UTC on July 19
and 7:00 on July 20 [16]. We merged these three sources
into a single dataset. Hosts were considered to be infected
if they sent at least two TCP SYN packets on port 80 to
nonexistent hosts on these networks during this time period.
The requirement of two packets helps to eliminate
random source denial-of-service attacks from the Code-
Red data.
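The infection criterion above amounts to a per-source count with a threshold. The following Python sketch shows one way to express it; the packet tuple layout, the function name, and the `live_hosts` set are hypothetical and do not describe the actual trace format or tooling used in the study.

```python
from collections import Counter

def infected_hosts(packets, live_hosts, min_syns=2):
    """Return sources that sent at least `min_syns` TCP SYNs to port 80
    at monitored addresses where no host exists.

    packets: iterable of (src_ip, dst_ip, dst_port, is_syn) tuples
             (an assumed, simplified record format).
    live_hosts: set of monitored addresses that are actually in use.
    """
    counts = Counter(
        src
        for src, dst, port, is_syn in packets
        if is_syn and port == 80 and dst not in live_hosts
    )
    return {src for src, n in counts.items() if n >= min_syns}

sample = [
    ("198.51.100.7", "10.0.0.1", 80, True),
    ("198.51.100.7", "10.0.0.2", 80, True),
    ("203.0.113.9", "10.0.0.3", 80, True),   # single packet: not counted
    ("192.0.2.4", "10.0.0.4", 25, True),     # wrong port
    ("192.0.2.4", "10.0.0.5", 25, True),
]
print(infected_hosts(sample, live_hosts=set()))
```

A source that appears only once is dropped, which is what filters out spoofed, randomly sourced packets that rarely repeat the same address.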
Early on July 20, the filter was removed and we resumed
packet header data collection. Although we collected data
through October, we include data through August 25, 2001
in this study. No significant changes were observed in
Code-RedI or CodeRedII activity between August 2001
and the pre-programmed shutdown of CodeRedII on October
1, 2001.
[Figure 1 plot: unique hosts per 2 hour bucket (y-axis, 0 to 1800) against time in UTC (x-axis, 07/05 to 07/19); series: all port 80 probes and CRv1 candidates.]
Fig. 1. Background level of unsolicited SYN probes and the
beginning of the spread of the Code-RedI worm.
A constant background level of unsolicited TCP SYN
packets, most likely port scans seeking to identify vulnerable
machines, targets the IPv4 address space. In our /8, this
rate fluctuates between 100 and 600 hosts per two hour
period, with diurnal and weekly variations. On July 12,
the static-seed version of the Code-RedI worm began to
spread. We noticed that the hosts that appeared clearly infected
with Code-RedI v1 probed the same set of 23 IP
addresses within our /8 research network. In Figure 1, we
used the criterion of probing these 23 addresses to separate
the Code-RedI v1 probes from the background port scans.
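Separating worm probes from background scans in this way is a set-membership test on destination addresses. The Python sketch below is one possible reading of that criterion; the `min_hits` threshold, the function name, and the record layout are assumptions, since the text does not state how many of the 23 addresses a source had to probe.

```python
def crv1_candidates(probes, signature, min_hits=1):
    """Return sources that probed at least `min_hits` distinct signature addresses.

    probes: iterable of (src_ip, dst_ip) pairs (assumed record layout).
    signature: set of destination addresses known to be on the worm's list.
    """
    hits = {}
    for src, dst in probes:
        if dst in signature:
            hits.setdefault(src, set()).add(dst)
    return {src for src, seen in hits.items() if len(seen) >= min_hits}

# Placeholder addresses; the real 23 addresses are not given in the text.
signature = {"10.1.1.1", "10.2.2.2", "10.3.3.3"}
probes = [
    ("198.51.100.7", "10.1.1.1"),
    ("198.51.100.7", "10.2.2.2"),
    ("203.0.113.9", "10.9.9.9"),   # background scan, not on the list
]
print(crv1_candidates(probes, signature, min_hits=2))
```

Raising `min_hits` trades recall for precision: a background scanner may hit one signature address by chance, but is unlikely to hit several.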
To confirm that the 23 addresses were actually among
those probed by the worm, we reverse engineered the exploit
to extract the IP addresses probed by its static-seed
pseudo-random number generator. We obtained a disassembled
version of the worm from eEye [17] and identified
the code responsible for spreading the worm. The worm
creates one hundred threads, each with its own static seed
and thus its own distinct, although not disjoint, set of IP
addresses probed sequentially.
We examined the PRNG (Pseudo-Random Number
Generator) code used to generate the target sequences and
wrote a C implementation to generate the first one thousand
IP addresses probed by each thread (approximately
the first one million IP addresses). We extracted the IP addresses
that fell in our /8 and found the same 23 address
sequence we predicted from our packet trace data. The
23 addresses that fall in our research network actually occur
very early in the generated sequences [footnote 2]. A machine
newly infected with Code-RedI v1 probes our /8 network
23 times in the first fifteen minutes of propagation.
Once we had identified the IP addresses initially probed
by the worm, we compared this sequence to the hosts
we observed probing the 23 target addresses in our research
network. We discovered that the first three hosts
that probed our /8 research network were not contained in
the IP address sequence probed by any thread. We believe
that the individual (or individuals) responsible for
the Code-RedI worm compromised these machines and
seeded them with the worm to initiate the epidemic. The
first two machines both appear to be located in the United
States, one in Cambridge, Massachusetts and the other in
Atlanta, Georgia. The third address appears to be in the
city of Foshan in China’s GuangDong province. However,
there remains no evidence linking Chinese hackers to the
development or deployment of the Code-RedI worm.
We classify infected hosts using the DNS name of each
host and a hand-tuned set of regular expression matches [footnote 3]
(e.g. DNS names with “dialup” represent modems, “dsl”
or “home.com” identifies broadband, etc.) into the following
categories: mail servers, name servers, web servers,
IRC servers, firewalls, dial-up, broadband, other (unclassifiable)
hosts, and hosts with no hostname. The prevalence
of each type of host is discussed in Section IV-B.4.
Footnote 2: IP addresses in the monitored class A network occurred early in each
of the 100 threads started on Code-RedI v1 infected machines. Probe
sequence numbers within their threads included: 8, 12, 14, 20, 22, 25,
26, 29, 32, 34, 36, 40, 41, 41, 43, 43, 44, 45, 45, 51, 56, 57, 59. Thus
we are able to detect the compromise of a new host almost instantly
as we receive many probes from the host in the first minute following
infection.
Footnote 3: The regular expressions are available at
http://www.caida.org/tools/measurement/misc/HostClassify
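A classifier of this kind can be sketched as an ordered list of patterns. In the Python sketch below, only the “dialup”, “dsl”, and “home.com” rules come from the text; every other pattern, the rule order, and the function name are illustrative assumptions and do not reproduce the hand-tuned expressions used in the study.

```python
import re

# Ordered rules: the first matching pattern decides the category.
RULES = [
    ("dial-up", re.compile(r"dialup", re.IGNORECASE)),
    ("broadband", re.compile(r"dsl|home\.com$", re.IGNORECASE)),
    # The patterns below are assumed examples, not taken from the text.
    ("mail server", re.compile(r"^(mail|smtp)\d*\.", re.IGNORECASE)),
    ("name server", re.compile(r"^(ns|dns)\d*\.", re.IGNORECASE)),
    ("web server", re.compile(r"^www\d*\.", re.IGNORECASE)),
]

def classify_host(hostname):
    """Map a DNS name to a coarse host category."""
    if not hostname:
        return "no hostname"
    for label, pattern in RULES:
        if pattern.search(hostname):
            return label
    return "other"

for name in ["dialup-12.example.net", "adsl-1-2-3.example.net", None]:
    print(name, classify_host(name))
```

Rule order matters in such a scheme: a name that matches several patterns is assigned to the first category listed, so more specific patterns belong earlier.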
We also used Ixia’s IxMapping [18] service to determine
the latitude, longitude, and country of each IP address infected
with the worm. IxMapping uses public data sources
such as WHOIS and DNS, as well as specialized measurement
to geographically place IP addresses. We identified a
rough approximation of the timezone of each infected host
based on this longitude.
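One simple way to approximate a timezone from longitude is to divide by 15 degrees per hour. The text does not state the exact rule used, so the Python sketch below is an assumption, as are the function name and the example coordinates; it also ignores political timezone boundaries and daylight saving time.

```python
def approx_utc_offset(longitude_deg):
    """Approximate the UTC offset in whole hours from longitude in degrees.

    East longitudes are positive, west longitudes negative.
    """
    if not -180.0 <= longitude_deg <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    return round(longitude_deg / 15.0)

print(approx_utc_offset(-117.16))  # a longitude near San Diego
print(approx_utc_offset(113.12))   # a longitude near Foshan
```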