25-05-2012, 04:37 PM
A Java Virtual Machine Architecture for Very Small Devices
java virtual mechine.pdf (Size: 80 KB / Downloads: 178)
INTRODUCTION
The work described here is a continuation in spirit of the Spotless
project [7], begun in 1998. The goal then was to build a small but
complete Java virtual machine, with the Palm Pilot the target
platform. The resulting artifact turned into the KVM, which
became the basis of the CLDC standard [10].
The main goal of the present effort was once again to build a
small Java virtual machine, but one that is smaller and more
mature (CLDC compliant, with verification and exact garbage
collection). The target platform is the next generation of smart
cards, which have 32-bit processors, but may have no more than 8
KB of RAM, 32 KB of non-volatile memory (NVM, typically
EEPROM), and 160 KB of ROM. Such a VM would obviously
also be suitable for other embedded devices with similar
characteristics.
A secondary goal was to write the system as much as possible in
the Java language, for both portability and ease of debugging. The
system is therefore called Squawk, in homage to the Squeak
Smalltalk system [12].
BACKGROUND
The size of Java classfiles has long been recognized as an issue,
especially for embedded devices. A number of efforts have been
made to reduce the size, both as a transmission format and as an
execution format. Pugh [4] developed techniques for compressing
classfile components for transmission, and achieved sizes ranging
from 17% to 49% of the comparable JAR files. Rayside et al. [5],
in contrast, focused on execution format, specifically reducing
constant pool size and code size. Reductions in JAR file size of
roughly 50% were obtained through optimizations of the constant
pool, and smaller improvements were realized through code
optimizations. Clausen et al. [2] developed a compressed
bytecode scheme using macros, achieving space savings of around
30%, but at a runtime cost of up to 30%. Tip et al. [8] developed
techniques for extracting only the components necessary for a
particular application from a set of classfiles, with the resulting
archives reduced to 37.5% of their original size.
A primary goal of the above techniques was to perturb the Java
execution engine as little as possible. In contrast, the Java Card
paradigm introduces a muscular transformer between classfiles
and interpreter, which enables a different, space-optimized
interpreter.
Efforts have been made to implement Java in the Java language:
The JavaInJava system [6] was an early effort that encountered
performance issues and ran roughly 3 orders of magnitude slower
than a native implementation. The Jalapeno/Jikes [1][15] system
has extensively explored the optimization technology needed to
produce a high-performance system. The Joeq system [16], an
open source effort, is also focusing on performance. Both Jikes
and Joeq are relatively large.
DESIGN
The project’s main goal of building a CLDC-compliant system on
a small device led to a few straightforward consequent goals,
namely minimizing the size of transmitted classfiles, the RAM
needed for classfile loading, the size of loaded classfiles, the
RAM required during execution, and the size of the interpreter
and memory system. In addition, the system had to effectively
deal with NVM.
These goals were in turn realized through a few clear design
choices.
• Standard CLDC classfiles are too large and complex to
process on a device with 8K of RAM. Employing a Java
Card™ technique [11], they are instead preprocessed offdevice
and packaged into a smaller representation, called a
suite, which then can be verified and loaded on-device in 8
KB.
• Certain types of code make verification and garbage
collection substantially more complicated. The classfile
preprocessor, called the translator, identifies these problems
and produces simple, semantically equivalent suite code.
• The standard bytecode set is not as space efficient as it could
be. The translator transforms the bytecodes into a more
compact form.
Object Construction
The standard protocol for constructing a Java object considerably
complicates verification: the result of the new bytecode is an
uninitialized object that must be regarded as a different type. This
in turn entails creating a separate type for each new bytecode
executed in each method. The overhead of maintaining these extra
types is simply too high for Squawk, given the small amount of
RAM available.
Squawk therefore handles object construction differently. Figure 1
shows a simple Java expression. When compiled with javac this
expression becomes the bytecode sequence shown in Figure 2.
This shows the new bytecode being used to create a prototype
object. A copy of this object is made on the stack and the object
constructor is called with this parameter. When the constructor
returns the copy of the uninitialized object (which is now
initialized) is stored into a local variable.
Bytecode Resolution
The standard bytecodes reference class members (fields and
methods) symbolically via a name string rather than with an index
or pointer. These symbolic references must be resolved sooner or
later through some lookup process. In many Java interpreters this
is done as the bytecodes are executed, and a common optimization
is to then patch the bytecode stream with special resolved
bytecodes that are often termed quick or fast bytecodes. This
optimization is not suitable for systems that execute bytecodes in
read-only or slow-to-write memory. Instead, in the Squawk
system the bytecodes are resolved, as much as possible, when they
are written into the memory of the target device during loading.
At this time a field access can be resolved to an absolute offset
within an object, and a method invocation can be resolved to a
offset within a vtable. As a result the Squawk system does not
need the symbolic constant pool found in standard Java systems,
which saves a great deal of space.
Thread scheduling
Thread scheduling is done in the class java.lang.Thread. A
very small native method called from this class causes the virtual
machine to stop running one thread and start running another one.
Thus the thread scheduling code is factored out of the core VM,
which makes it easy to change the scheduling algorithm. Thread
preemption is achieved by decrementing a counter every time the
interpreter executes a branch. When the counter reaches zero the
interpreter automatically invokes the method
Thread.yield(), which allows another thread to execute.