28-07-2012, 10:49 AM
Basics of Compiler Design
Basics of Compiler Design.pdf (Size: 1.62 MB / Downloads: 50)
Introduction
What is a compiler?
In order to reduce the complexity of designing and building computers, nearly all
of these are made to execute relatively simple commands (but do so very quickly).
A program for a computer must be built by combining these very simple commands
into a program in what is called machine language. Since this is a tedious and errorprone
process most programming is, instead, done using a high-level programming
language. This language can be very different from the machine language that the
computer can execute, so some means of bridging the gap is required. This is where
the compiler comes in.
A compiler translates (or compiles) a program written in a high-level programming
language that is suitable for human programmers into the low-level machine
language that is required by computers. During this process, the compiler will also
attempt to spot and report obvious programmer mistakes.
The phases of a compiler
Since writing a compiler is a nontrivial task, it is a good idea to structure the work.
A typical way of doing this is to split the compilation into several phases with
well-defined interfaces. Conceptually, these phases operate in sequence (though in
practice, they are often interleaved), each phase (except the first) taking the output
from the previous phase as its input. It is common to let each phase be handled by a
separate module. Some of these modules are written by hand, while others may be
generated from specifications. Often, some of the modules can be shared between
several compilers.
A common division into phases is described below. In some compilers, the
ordering of phases may differ slightly, some phases may be combined or split into
several phases or some extra phases may be inserted between those mentioned below.
Interpreters
An interpreter is another way of implementing a programming language. Interpretation
shares many aspects with compiling. Lexing, parsing and type-checking are
in an interpreter done just as in a compiler. But instead of generating code from
the syntax tree, the syntax tree is processed directly to evaluate expressions and
execute statements, and so on. An interpreter may need to process the same piece
of the syntax tree (for example, the body of a loop) many times and, hence, interpretation
is typically slower than executing a compiled program. But writing an
interpreter is often simpler than writing a compiler and the interpreter is easier to
move to a different machine (see chapter 13), so for applications where speed is not
of essence, interpreters are often used.
Compilation and interpretation may be combined to implement a programming
language: The compiler may produce intermediate-level code which is then interpreted
rather than compiled to machine code. In some systems, there may even be
parts of a program that are compiled to machine code, some parts that are compiled
to intermediate code, which is interpreted at runtime while other parts may be kept
as a syntax tree and interpreted directly. Each choice is a compromise between
speed and space: Compiled code tends to be bigger than intermediate code, which
tend to be bigger than syntax, but each step of translation improves running speed.
Using an interpreter is also useful during program development.
The structure of this book
The first part of the book describes the methods and tools required to read program
text and convert it into a form suitable for computer manipulation. This process
is made in two stages: A lexical analysis stage that basically divides the input text
into a list of “words”. This is followed by a syntax analysis (or parsing) stage
that analyses the way these words form structures and converts the text into a data
structure that reflects the textual structure. Lexical analysis is covered in chapter 2
and syntactical analysis in chapter 3.
The second part of the book (chapters 4 – 10) covers the middle part and backend
of interpreters and compilers. Chapter 4 covers how definitions and uses of
names (identifiers) are connected through symbol tables. Chapter 5 shows how you
can implement a simple programming language by writing an interpreter and notes
that this gives a considerable overhead that can be reduced by doing more things before
executing the program, which leads to the following chapters about static type
checking (chapter 6) and compilation (chapters 7 – 10. In chapter 7, it is shown
how expressions and statements can be compiled into an intermediate language,
a language that is close to machine language but hides machine-specific details.
In chapter 8, it is discussed how the intermediate language can be converted into
“real” machine code. Doing this well requires that the registers in the processor
are used to store the values of variables, which is achieved by a register allocation
process, as described in chapter 9.