13-05-2013, 02:11 PM
On Protection by Layout Randomization
ABSTRACT
Layout randomization is a powerful, popular technique for software protection. We present it and
study it in programming-language terms. More specifically, we consider layout randomization as
part of an implementation for a high-level programming language; the implementation translates
this language to a lower-level language in which memory addresses are numbers. We analyze
this implementation by relating low-level attacks against the implementation to contexts in the
high-level programming language, and by establishing full abstraction results.
INTRODUCTION
Several techniques for protection are based on randomization (e.g., [Druschel and
Peterson 1992; Yarvin et al. 1993; Kc et al. 2003; Forrest et al. 1997; Bhatkar et al.
2005; Bhatkar et al. 2003; Barrantes et al. 2005; Berger and Zorn 2006; Novark et al.
2008; Erlingsson 2007; Novark and Berger 2010]). The randomization may concern
the layout of data and code within an address space, data representations, or the
underlying instruction set. In all cases, the randomization introduces artificial
diversity that can serve for impeding attacks. In particular, layout randomization
can thwart attacks that rely on knowledge of the location of particular data and
functions (such as system libraries). In addition, randomization can obfuscate
program logic, against reverse engineering.
DISCUSSION OF RESULTS
Layout randomization can be applied in a variety of systems contexts. In some (in
particular, in kernel mode), accesses to unmapped memory addresses may be fatal
violations that result in immediate termination. In others (often in user mode),
erroneous accesses may take place repeatedly without causing execution to abort;
a program that performs an erroneous access may recognize that it has done so.
This distinction leads to two models for what happens when an attacker accesses
an unused address in data memory (rather than an address that houses a private
location). In one model, such accesses are fatal violations; in the other, such accesses
are not fatal and can be detected.
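
The difference between the two models can be illustrated with a small sketch. It is not the paper's formal semantics: the names Memory, FatalAccess and probe_all are hypothetical, public locations are left out for brevity, and the seed parameter exists only to make runs repeatable.

```python
import random


class FatalAccess(Exception):
    """Raised in the 'fatal' model when an unused address is accessed."""


class Memory:
    """Toy data memory with `size` addresses (hypothetical, for illustration)."""

    def __init__(self, size, private_locs, fatal, seed=None):
        rng = random.Random(seed)
        # Place each private location at a distinct, randomly chosen address.
        addrs = rng.sample(range(size), len(private_locs))
        self.layout = dict(zip(private_locs, addrs))
        self.cells = {a: 0 for a in addrs}
        self.fatal = fatal

    def read(self, addr):
        if addr in self.cells:
            return ("ok", self.cells[addr])
        if self.fatal:
            # Model 1: the access is a fatal violation and execution stops.
            raise FatalAccess(addr)
        # Model 2: the access fails, but the program can observe the failure.
        return ("error", None)


def probe_all(mem, size):
    """Attacker that scans every address and keeps the ones that are in use."""
    return [a for a in range(size) if mem.read(a)[0] == "ok"]
```

In the non-fatal model, probe_all recovers the address of every private location by exhaustive search, since each miss is merely detected. In the fatal model, the same scan is cut short by the first miss, so the attacker effectively gets a single guess.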
In both cases, our main results concern translations between the high-level language
with locations and a lower-level language with natural-number addresses. (In
contrast, one could study layout randomization by focusing exclusively on low-level
behavior, as in [Berger and Zorn 2006, Section 6].) In the high-level language, there
is a distinct type of locations loc and, assuming that the expression M has this
type, one can write expressions like !_loc M and M :=_loc M' for reading from and
storing into a location. In the low-level language, on the other hand, if M has type
nat then one can write !_nat M and M :=_nat M' for reading from and writing to a
natural-number address, which may be obtained as the result of arbitrary numerical
computations in M.
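
The contrast between the two kinds of access can be sketched in code. This is an illustration under stated assumptions, not the paper's calculus: Loc, high_read, high_write, low_read, low_write and random_layout are hypothetical names, the high-level store is a dictionary keyed by opaque location objects, and the low-level memory is a plain list indexed by natural numbers.

```python
import random


class Loc:
    """Abstract location: an opaque name on which no arithmetic is defined."""

    def __init__(self, name):
        self.name = name


def high_read(store, loc):
    # High-level read: the argument must have type loc.
    if not isinstance(loc, Loc):
        raise TypeError("expected a location")
    return store[loc]


def high_write(store, loc, value):
    # High-level write: again restricted to locations.
    if not isinstance(loc, Loc):
        raise TypeError("expected a location")
    store[loc] = value


def low_read(memory, addr):
    # Low-level read: any natural number may be used as an address.
    return memory[addr]


def low_write(memory, addr, value):
    # Low-level write to a natural-number address.
    memory[addr] = value


def random_layout(locs, size, seed=None):
    """Map each location to a distinct random address in range(size)."""
    rng = random.Random(seed)
    return dict(zip(locs, rng.sample(range(size), len(locs))))


# Translate a one-location high-level store into low-level memory.
secret = Loc("secret")
store = {secret: 42}
layout = random_layout([secret], size=8, seed=1)
memory = [0] * 8
low_write(memory, layout[secret], high_read(store, secret))
```

At the high level, a context that is not handed the secret location has no way to name it. At the low level, an expression such as low_read(memory, 2 + 3) is well typed, so whether it hits the secret depends only on the layout that was chosen.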
TECHNICAL PRELIMINARIES
This section presents basic technical material on which both Sections 4 and 5 rely.
It describes both high- and low-level memory models and the common components
of the languages considered in this paper.
3.1 Memory models
We begin with a discussion of our memory models. We need two: an abstract one,
for the high-level language, and a more concrete one, for the low-level language.
For the abstract model we assume a finite set Loc of locations, ranged over by l,
and further assumed to be the disjoint union of two sets, PubLoc and PriLoc, of
public and private locations. Stores, ranged over by s, are maps s : Loc