ANALYSIS AND DETECTION OF METAMORPHIC
COMPUTER VIRUSES



Abstract


Computer virus writers commonly use metamorphic techniques to produce viruses that
change their internal structure on each infection. It is generally believed that these
metamorphic viruses are extremely difficult to detect. Metamorphic virus generating kits
are readily available, so that little knowledge or skill is required to create these
potentially devastating viruses.
In this project, we first analyze four virus creation kits to determine the degree of
metamorphism provided by each. We are able to precisely quantify the degree of
metamorphism produced by these virus generators. While the best generator, the Next
Generation Virus Creation Kit (NGVCK), produces virus variants that differ greatly from
one another, the other three generators we examined are much less effective.
We then show that three popular commercial virus scanners cannot detect any of the
NGVCK viruses in our test set. We proceed to develop an effective metamorphic virus
detection technique based on hidden Markov models (HMM). With this HMM detector,
we are able to classify a given program as belonging to a particular virus family or not.
Using this approach, we can detect all metamorphic viruses in our test set with extremely
high accuracy. We also present a simpler detection method that detects metamorphic
viruses with high accuracy.
Our results show that the best available metamorphic generator is effective at morphing
viral code and that the resulting morphed viruses are not detectable using popular
commercial virus scanning software. Surprisingly, these viruses differ enough from
non-viral code that they are detectable using a similarity technique that we present in
this paper. It remains an interesting open question whether metamorphic viral code can be
constructed which is undetectable using our techniques.



INTRODUCTION


“A computer virus is a program that recursively and explicitly copies a possibly evolved
version of itself” [19]. A virus copies itself to a host file or system area. Once it gets
control, it multiplies itself to form newer generations. A virus may carry out damaging
activities on the host machine such as corrupting or erasing files, overwriting the whole
hard disk, or crashing the computer. Some viruses may print text on the screen or simply
do nothing. These viruses remain harmless but keep reproducing themselves. In any case,
viruses are undesirable for computer users.
Over the past two decades, the number of viruses has been increasing rapidly. We have
seen several attacks that caused great disruption to the Internet and brought huge damage
to organizations and individuals. For example, in 1999, the infamous Melissa virus
infected thousands of computers and caused damage close to $80 million, while the Code
Red worm outbreak in 2001 affected systems running Windows NT and Windows 2000
server and caused damage in excess of $2 billion [23]. Computer virus attacks will
continue to pose a serious security threat to every computer user.
To simplify the virus creation process, virus writers have made virus construction kits
readily available on the Internet [22]. This allows people who do not have any expertise
in assembly coding to generate their own viruses. Virus writers also recognize that for
their viruses to have a chance to escape detection, the viruses created must look different
from one another so that a virus signature cannot be easily extracted. Some kits come
equipped with the ability to generate automatically morphed variants from a single
configuration file. Precisely how effective are these code morphing generators? How
different do the morphed variants look? We generated variants of metamorphic viruses
using some of these tools and measured the similarity between the morphed variants.
Detecting metamorphic viruses is challenging. The problem with simple signature-based
scanning is that even small changes in the viral code may cause a scanner to fail.



Encrypted Viruses

The simplest way to change the appearance of a virus is to use encryption. An encrypted
virus consists of a small decrypting module (a decryptor) and an encrypted virus body. If
a different encryption key is used for each infection, the encrypted virus body will look
different. Typically, the encryption method is rather simple, such as XORing the key with
each byte of the virus body. Simple XOR is very practical because XORing the encrypted
code with the key again yields the original code, so a virus can use the same
routine for both encryption and decryption.
With encryption, the decryptor remains constant from generation to generation. As a
result, detection is possible based on the code pattern of the decryptor. A scanner that
cannot decrypt or detect the virus body directly can recognize the decryptor in most
cases.
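
The symmetry of XOR described above can be illustrated with a few lines of Python.
This is only a minimal sketch on harmless sample bytes. The function name xor_bytes
and the single-byte keys are assumptions made for illustration and are not taken
from any particular virus.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key (0-255)."""
    if not 0 <= key <= 255:
        raise ValueError("key must fit in one byte")
    return bytes(b ^ key for b in data)


body = b"harmless sample body"

# Different keys give different-looking encrypted bodies ...
enc_a = xor_bytes(body, 0x5A)
enc_b = xor_bytes(body, 0xC3)

# ... but running the same routine again with the same key restores the original.
dec_a = xor_bytes(enc_a, 0x5A)
dec_b = xor_bytes(enc_b, 0xC3)
```

Note that the routine itself is identical for every key. That constant piece is
what a scanner can still match, even though enc_a and enc_b share no obvious pattern.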


First Generation Scanners


The simplest approach to virus detection is string scanning. First generation scanners
look for “virus signatures” which are sequences of bytes (strings) extracted from viruses
in files or in memory. A good signature for a virus consists of sequences of text strings or
byte codes found commonly in the virus but infrequently in benign programs. Usually, a
human expert converts the virus binary code into assembly code, looks for sections that
signify viral activities and picks the corresponding bytes in the machine code to be the
virus signature. More efficient methods use statistical techniques to extract good
signatures automatically [8].
Virus signatures are organized into databases. To identify virus infection, virus scanners
check specific areas in files or system areas and match them against known signatures in
databases. Some simple scanners also support wildcard search strings, such as “?02 33C9
8BD1 419C” where the wildcard is indicated by ‘?’. Wildcard strings allow skipped bytes
and regular expressions and can sometimes be used to detect encrypted or even
polymorphic viruses [19]. Using a search string from the common code areas of all
known variants of a virus to scan for the virus family is known as generic detection [19].
A generic string typically contains wildcards.
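
Wildcard matching of this kind might be sketched as follows. The signature syntax
used here is an assumption chosen for the example: bytes are written as hex pairs
and "??" matches any single byte. Real scanners define their own wildcard notation,
and the names parse_signature and find_signature are hypothetical.

```python
from typing import List, Optional


def parse_signature(sig: str) -> List[Optional[int]]:
    """Turn 'B8 ?? 33C9' into [0xB8, None, 0x33, 0xC9]; None means any byte."""
    hex_text = "".join(sig.split())
    if len(hex_text) % 2 != 0:
        raise ValueError("signature must contain whole bytes")
    pattern: List[Optional[int]] = []
    for i in range(0, len(hex_text), 2):
        pair = hex_text[i:i + 2]
        pattern.append(None if pair == "??" else int(pair, 16))
    return pattern


def find_signature(data: bytes, pattern: List[Optional[int]]) -> int:
    """Return the first offset where pattern matches data, or -1 if none."""
    n = len(pattern)
    for start in range(len(data) - n + 1):
        if all(p is None or data[start + k] == p for k, p in enumerate(pattern)):
            return start
    return -1


# Sample buffer made up for the example; it is not real viral code.
pattern = parse_signature("?? 02 33C9 8BD1 419C")
sample = bytes.fromhex("9090AB0233C98BD1419C90")
offset = find_signature(sample, pattern)  # matches at offset 2
```

A linear scan like this is enough to show the idea. A production scanner would
typically search many signatures at once with a faster multi-pattern algorithm.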




6. CONCLUSION

Virus writers and anti-virus researchers generally agree that metamorphism is the way to
generate undetectable viruses. Several virus writers have released virus creation kits and
claimed that they possess the ability to automatically produce morphed virus variants that
look substantially different from one another.
To see how effective these code morphing engines are, and how much difference exists
between variants of a given virus, we measured the similarity between virus variants
generated by four virus generators downloaded from the Internet. Our results show that
the effectiveness of these generators varies widely. While the best generator, Next
Generation Virus Creation Kit (NGVCK), is able to create viruses that are only a few
percent similar, the other generators produce viruses that are over 60% similar, on
average. In addition, our similarity graphs show that some of these variant pairs have
long segments of identical assembly opcodes at identical positions of the virus files.
Given that random utility files have a similarity of about 35%, we see that some
of the virus creation kits do not effectively morph the viral code.
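
To make the notion of a pairwise similarity score concrete, the sketch below compares
two opcode sequences. It is a stand-in, not the similarity measure used in this
report, whose details are not given in this excerpt: it simply computes the Jaccard
overlap of opcode trigrams. The function names and the toy opcode lists are
assumptions for illustration.

```python
from typing import List, Set, Tuple


def opcode_ngrams(opcodes: List[str], n: int = 3) -> Set[Tuple[str, ...]]:
    """Collect the distinct runs of n consecutive opcodes."""
    return {tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)}


def similarity(a: List[str], b: List[str], n: int = 3) -> float:
    """Jaccard overlap of opcode n-grams: 0.0 (disjoint) to 1.0 (identical sets)."""
    ga, gb = opcode_ngrams(a, n), opcode_ngrams(b, n)
    if not ga and not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


# Toy opcode sequences, invented for the example.
x = ["push", "mov", "sub", "call", "add", "pop", "ret"]
y = ["push", "mov", "sub", "call", "xor", "pop", "ret"]
z = ["nop", "jmp", "nop", "jmp", "inc", "dec", "ret"]

score_xy = similarity(x, y)  # one opcode changed: partial overlap
score_xz = similarity(x, z)  # unrelated sequence: no shared trigrams
```

A single substituted opcode already removes three of the five trigrams in x, which
gives a feel for how quickly code morphing can drive such a score down.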
Not only do NGVCK viruses show low similarity among themselves, they show even
lower similarities when compared to viruses generated by other generators (from 0 to
5.5%). When compared to normal random files, the similarity scores are almost always
zero, with only a few exceptions. We conclude that NGVCK viruses have the highest
degree of metamorphism among the four virus families we tested. In addition, NGVCK
viruses are very different from normal programs and viruses in other families.
To detect metamorphic virus variants, we experimented with hidden Markov models
(HMMs) to capture the statistical properties of viruses in the same family. We generated
200 NGVCK viruses, trained 25 models and used the trained models to classify both
viruses and random non-viral programs. Of the 25 models, 23 were able to identify all the
normal programs by their scores alone. This means we can easily distinguish an NGVCK
virus from a normal program.
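
Scoring a program against a trained HMM can be sketched with the scaled forward
algorithm. The code below assumes that the score is the log-likelihood of the opcode
sequence divided by its length, which this excerpt does not state explicitly. The
two-state model, its probabilities, and the three-symbol alphabet are invented for
illustration and are not trained on any virus.

```python
import math
from typing import List, Sequence


def log_likelihood_per_symbol(
    obs: Sequence[int],
    pi: List[float],
    A: List[List[float]],
    B: List[List[float]],
) -> float:
    """Scaled forward algorithm; returns log P(obs | model) / len(obs)."""
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n_states = len(pi)
    # Initialisation: alpha_0(i) = pi_i * b_i(o_0)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        # Rescale to avoid underflow; the log of the scale factors sums to log P.
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob / len(obs)


# Toy model: symbols 0 and 1 are common in both states, symbol 2 is rare.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.70, 0.25, 0.05], [0.30, 0.65, 0.05]]

score_typical = log_likelihood_per_symbol([0, 0, 1, 0, 1, 1, 0, 0], pi, A, B)
score_atypical = log_likelihood_per_symbol([2, 2, 2, 2, 2, 2, 2, 2], pi, A, B)
```

Dividing by the sequence length lets programs of different sizes be compared against
a single threshold. Sequences that resemble the training data score higher.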
The models also distinguished between VCL32 (Virus Creation Lab for Win32) viruses
and other viruses not belonging to the NGVCK family. They assigned higher scores to
VCL32 viruses, which were the only viruses we tested that have some similarities to the
NGVCK family. Even so, seven of our models were able to perfectly distinguish the
NGVCK viruses from the VCL32 viruses by scores. The other models produced different
numbers of false positives and false negatives, depending on the threshold used in the
classifying process. Using -4.5 as the threshold, 17 of the models achieved a 100%
detection rate, with a false positive rate ranging from 0% to 7.7%.
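
The way a threshold turns scores into a detection rate and a false positive rate can
be sketched as follows. The rule assumed here is that a program is assigned to the
family when its score is above the threshold, consistent with higher scores
indicating greater similarity to the family. The scores listed are made up for the
example and are not results from this report.

```python
from typing import Sequence, Tuple


def rates(
    family_scores: Sequence[float],
    other_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (detection rate, false positive rate) for a score threshold."""
    detected = sum(s > threshold for s in family_scores)
    false_pos = sum(s > threshold for s in other_scores)
    return detected / len(family_scores), false_pos / len(other_scores)


# Hypothetical scores: family members first, then programs outside the family.
family_scores = [-2.1, -2.8, -3.3, -4.0]
other_scores = [-4.2, -6.5, -9.0, -12.4, -20.1]

detection_rate, false_positive_rate = rates(family_scores, other_scores, -4.5)
```

Raising the threshold lowers the false positive rate at the cost of possibly missing
family members, so the two rates have to be traded off against each other.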
If the variants of a metamorphic virus are sufficiently different that signature-based
scanning cannot detect a newly morphed variant, the HMM approach provides a feasible
solution. As with any statistical detection method, false predictions are possible. In our
tests, false positives were all due to viruses from a different family than those in the
training set, rather than normal non-viral programs. Therefore, we can view these false
positives in a positive light, since the HMM detects additional viruses which have
statistical properties similar to the viruses that the HMMs represent.
The number of states N of a model does not seem to have much impact on the
performance of the HMM. We saw only small differences in the performance measures
for models with N from 3 to 6. Since the time to train a model and the time to score a
program increases with the number of states N, we may want to use a smaller N if time is
crucial to the detection process. The trained models grouped the observed opcodes under
the hidden states according to the probabilities that they were seen. This should help us
infer features of the NGVCK viruses.
The fact that NGVCK viruses have assembly code structures that are different from
normal programs and other viruses makes them distinguishable by our straightforward
similarity index alone. Our two tests that used similarity indices to classify 105 programs
were both 100% accurate. This result illustrates that even though the NGVCK viruses
show a high degree of metamorphism, it is still relatively easy to detect them since they
are “too different” from normal programs. The similarity index approach is remarkably
effective when the virus code structure is significantly different from normal non-viral
code.
We scanned the viruses from the four families with three virus scanners. Viruses in the
three families other than the NGVCK were detected by the three scanners. All NGVCK
viruses escaped detection by these signature-based scanners. While the NGVCK viruses
were not detected by the scanners we tested, we have shown that both the similarity index
approach and the HMM approach are very effective in dealing with these viruses.
For viruses to avoid detection, they not only need a high degree of metamorphism, but
also a degree of similarity to normal programs. None of the virus construction kits we
tested satisfy both of these requirements. Three of the four virus generators fall short on
metamorphism, while the one generator that is highly metamorphic lacks sufficient
similarity to non-viral code. As a result, all these viruses are relatively easy to detect. An
interesting open question is whether it is possible to satisfy both metamorphic and
similarity conditions and thereby create a truly undetectable virus.