15-05-2014, 10:27 AM
Using C as a Compiler Target Language for Native Code Generation in Persistent Systems
Abstract
Persistent programming languages exhibit several requirements
that affect the generation of native code, namely: garbage
collection; arbitrary persistence of code, data and processes;
dynamic binding; and the introduction of new code into a running
system. The problems of garbage collection are not unique to
persistent systems and are well understood: both code and data
may move during a computation if a compacting collector is
employed. However, the problems of garbage collection are
exacerbated in persistent systems which must support garbage
collection of both RAM resident and disk resident data. Some
persistent systems support a single integrated environment in
which compiled code and data are manipulated in a uniform
manner, necessitating that compiled code be stored in the object
store. Furthermore, some systems assume that the entire state of a
running program is resident in a persistent store; in these systems
it may be necessary to preserve the state of a program at an
arbitrary point in its execution and resume it later. Persistent
systems must support some dynamic binding in order to
accommodate change. Thus code must be capable of binding to
arbitrary data at a variety of times. This introduces the additional
complexity that code must be able to call code contained in the
persistent store produced by another compilation. In this paper
native code generation techniques using C as a target language for
persistent languages are presented.
Introduction
When orthogonal persistence is introduced to a programming
language, several requirements emerge which affect code generation:
a. data in the system may persist,
b. code in the system may persist, and
c. the dynamic state of the system may persist.
Since all data may potentially persist, it must be held in a suitable
form. Typically, a persistent object store will support one or more
object formats onto which all data must be mapped. For example,
objects must be self-describing to support automatic garbage
collection and persistent object management. In particular, it must be
possible to discover the location of all inter-object pointers contained
in an arbitrary object. As a consequence, the code generation
techniques employed must ensure that the objects constructed by the
code conform to the appropriate object formats.
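
To illustrate what "self-describing" can mean in practice, here is a minimal sketch in C. The layout is an assumption made for illustration, not the format used by any particular object store: the names Object, object_new and count_live_pointers are hypothetical, and the sketch assumes that all pointer fields are placed before the scalar fields so that a header count is enough to locate every inter-object pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object format: a header followed by the fields, with all
   pointer fields stored first so they can be located from the header. */
typedef struct Object {
    size_t n_pointers;   /* number of leading pointer fields */
    size_t n_scalars;    /* number of trailing non-pointer fields */
    union Field {
        struct Object *ptr;
        long scalar;
    } fields[];          /* pointers first, then scalars */
} Object;

/* Allocate an object with every pointer set to NULL and every scalar to 0. */
Object *object_new(size_t n_pointers, size_t n_scalars) {
    size_t n = n_pointers + n_scalars;
    Object *o = malloc(sizeof(Object) + n * sizeof(union Field));
    assert(o != NULL);
    o->n_pointers = n_pointers;
    o->n_scalars = n_scalars;
    for (size_t i = 0; i < n_pointers; i++)
        o->fields[i].ptr = NULL;
    for (size_t i = n_pointers; i < n; i++)
        o->fields[i].scalar = 0;
    return o;
}

/* A collector or store manager can find every pointer without knowing
   the object's source-level type; here we simply count non-NULL ones. */
size_t count_live_pointers(const Object *o) {
    size_t live = 0;
    for (size_t i = 0; i < o->n_pointers; i++)
        if (o->fields[i].ptr != NULL)
            live++;
    return live;
}
```

Generated code that builds its data through a constructor such as object_new never produces an object whose pointers are hidden from the store, which is the conformance property the paragraph above asks for.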
In languages that support first class functions and procedures, a
further consequence of persistence is that these values may also
persist. This implies that executable code must be mapped onto
persistent objects. This requirement would defeat most traditional
code generation techniques since the traditional link phase links
together all the procedural values contained in a single compilation
unit using relative addresses. If all code resides in relocatable
persistent objects then the compiler/linker cannot determine the
relative positions of code segments at run-time. Furthermore,
facilities such as garbage collection and persistent object management
may result in code segments moving during execution.
Persistent systems support potentially long-lived applications whose
functionality may evolve over time. To accommodate this, many
persistent systems provide facilities to dynamically generate new
source code which is compiled and linked into the running system.
This facility may be provided by making the compiler a persistent
procedure [6, 7].
Choosing A Compiler Target Language
Perhaps the most obvious method of code generation is to generate
native code directly. This has the advantage that the writer of the
code generator has complete control over:
• the mapping of code onto objects,
• linkage to the run-time support, and
• the location of pointers in data structures and registers.
However, generating native code directly is extremely costly since the
compiler produced is architecture-dependent. An alternative to
generating native code directly is to utilise existing code generation
tools. Some advantages of this approach include:
• reuse of existing code generation technology,
• availability of sophisticated optimisers, and
• abstraction over architecture-specific features by the existing compilers.
The ability to reuse existing code generation technology is a
significant advantage. For example, even low level tools such as
assemblers include optimisers which relieve the compiler of the
complexities of generating and backpatching instruction sequences.
Higher level tools, such as compilers, incorporate more sophisticated
optimisers which have been the subject of considerable research and
development effort. Thus, this approach is a potentially cost-effective
method of generating high quality code.
Seven Tricks for Compiling Persistent Languages
In this section, seven tricks are described which may be employed to
efficiently solve the problems described in Section 1. They are:
1. the introduction of native code into a running system,
2. the ability to call other native code,
3. linking to persistent data and environment support code,
4. linking to the static environment,
5. reducing memory accesses,
6. the ability to run programs that cope with garbage collection
and snapshot, and
7. reducing memory allocation overhead by allocating frames
lazily.
The Introduction of Native Code Into a Running System
In order to support both integrated programming environments and
run-time reflection, the Napier88 system contains a compiler that is
callable at run-time. Various compiler interfaces exist and are
described elsewhere [5]. All the interfaces take a description of the
source text and environment information as parameters and produce
an executable procedure injected into an infinite union type.
This functionality requires that the code generation technology be
capable of supporting the dynamic generation of native code and its
introduction into the persistent system.
The Ability to Call Other Native Code
Since Napier88 supports first class procedures, a piece of native code
must be able to call arbitrary compiled Napier88 procedures. These
procedures may either be in the static environment of the caller,
extracted from a data structure in the store, or passed as a parameter.
When C is used as a target language, procedure call conventions may
be based on jumps (gotos) or C function calls.
A major reason for using C as an intermediate form is to obtain
access to the considerable optimisation technology already in
existence. This optimisation technology is given more scope when
Napier88 procedures are encoded as C functions with all invocation
performed using C function calls. This presents the C compiler with
independent compilation units over which its optimisers may operate.
However, due to the presence of first class procedures, many global
optimisations such as in-line expansions are not possible.
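
The C-function-call convention can be sketched as follows. This is a simplified illustration under stated assumptions, not the Napier88 implementation: the names Closure, Code, call_procedure, apply_twice and make_adder are hypothetical, and a raw C function pointer stands in for a reference to a code object held in the persistent store.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Closure Closure;
typedef long (*Code)(Closure *self, long arg);

/* Hypothetical procedure value: an entry point plus captured environment. */
struct Closure {
    Code code;     /* entry point of the compiled procedure */
    long env[1];   /* captured static environment (one slot for brevity) */
};

/* Compiled body of a procedure that adds a captured value to its argument. */
static long add_captured(Closure *self, long arg) {
    return self->env[0] + arg;
}

/* Every invocation goes through the procedure value, so the callee may
   come from the static environment, the store, or a parameter. */
long call_procedure(Closure *proc, long arg) {
    assert(proc != NULL && proc->code != NULL);
    return proc->code(proc, arg);
}

/* A procedure that receives another procedure as a parameter. */
long apply_twice(Closure *proc, long arg) {
    return call_procedure(proc, call_procedure(proc, arg));
}

/* Build a procedure value that captures n. */
Closure make_adder(long n) {
    Closure c = { add_captured, { n } };
    return c;
}
```

Because the target of call_procedure is only known at run time, the C compiler cannot expand the callee in-line, which matches the limitation on global optimisation noted above; it can still optimise freely within each function body.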
Reducing Memory Accesses
As described in Section 3.4, in order to support block structure, all
Napier88 variables may be placed in an activation record contained in
a heap object. In the interpreted system, in order to perform a
computation such as “a := a + b”, the PAM pushes the values of a and
b onto the stack, incrementing a stack pointer each time. Then the plus
instruction pops both values, adds them and pushes the result. Finally,
the result is removed from the stack, the stack pointer decremented
and the result written to its destination. Such computation is
expensive because it dynamically maintains stack pointers and
operates on memory rather than in registers. This expense can be
reduced in three complementary ways: the elimination of stack
pointers, the transformation of source expressions into C expressions
and the use of local C variables.
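
The difference can be made concrete with a small sketch of "a := a + b". Both functions below are illustrative assumptions rather than code produced by the actual system: Frame is a hypothetical activation record, add_interpreted mimics the stack discipline described for the PAM, and add_compiled shows the same assignment as a C expression over local variables.

```c
#include <assert.h>

/* Hypothetical activation record holding the two variables. */
typedef struct Frame {
    long a;
    long b;
} Frame;

/* Interpreter style: an explicit stack and stack pointer in memory. */
void add_interpreted(Frame *f) {
    long stack[2];
    long *sp = stack;
    *sp++ = f->a;        /* push a */
    *sp++ = f->b;        /* push b */
    long rhs = *--sp;    /* plus: pop both operands ... */
    long lhs = *--sp;
    *sp++ = lhs + rhs;   /* ... and push the sum */
    f->a = *--sp;        /* pop the result and write it to a */
    assert(sp == stack); /* the stack is balanced again */
}

/* Compiled style: no stack pointer, a plain C expression, and local C
   variables that the C compiler is free to keep in registers. */
void add_compiled(Frame *f) {
    long a = f->a;
    long b = f->b;
    a = a + b;
    f->a = a;
}
```

Both functions leave the same value in the frame; the second gives the C compiler an ordinary expression to optimise instead of a sequence of memory operations on a simulated stack.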
Conclusions
This paper presents techniques for generating native code for
persistent programming languages. C is used as a compiler target
language resulting in a portable and efficient code generation
technique whose performance approaches that of equivalent C
programs. The full functionality of a strongly typed persistent object
store is freely available without the undesirable aspects of
programming in C. The code generation techniques presented permit:
• the co-existence of interpreted and native code,
• code to be mapped onto relocatable persistent objects,
• code to be linked to the necessary run-time support and other generated
code,
• the use of compacting garbage collectors,