22-12-2012, 06:00 PM
C to Java: Converting Pointers into References
C to Java.pdf (Size: 156.05 KB / Downloads: 22)
Abstract
We consider the problem of converting C pointers to the less
exible concept of references.
Our main application is converting scientic applications from C to Java. We provide a general
method to model essentially all features of pointers using references. The model is easily implemented
in Java. We give optimizations that map key facilities like arrays and structures onto
the obvious Java equivalents, arrays and objects. These improvements make the conversion \optimal"
for all typed pointers. For untyped pointers, we can still fall back on the general model,
hence providing general automatic conversion from C to Java code, whose eciency improves
with the quality of the C code.
Introduction
The C language is well known for its overly
exible pointers. A pointer may represent the address
of an individual value, an array, a structure, or a position in an array or structure. Pointers can be
cast into other pointer types or integers, split into pieces and later rejoined, and can take part in
arithmetic operations. While this gives the programmer immense
exibility, most of these abilities
are considered unnecessary, and when used are dangerous and lead to code that is dicult to debug.
For this reason, many modern languages such as Java support a much more restricted model
called references. The programmer has limited access to the references themselves; typically, one
can only assign to a reference andtest whether two references point to the same object. Other
uses of a reference are implicitly about the object it refers to; usually, the programmer does not
explicitly dereference or take the address of an object. This leads to signicantly safer and simpler
code.
Since C is popular, there is a lot of code written in it. Many want to extend such programs
and exploit features of more modern programming languages. For example, one may want to port
an application to Java for demonstration on the World Wide Web. Since C and Java are similar in
syntax, the main diculty in converting between the two is the conversion between major concepts,
in particular pointers and references.
Related Work
Much work has been done on converting Java to C for optimization purposes; see for example
Harissa [3], j2c [4], JCC [5], and Toba [6]. However, little has been done on the reverse direction.
Of particular relevance are c2j [7] and c2j++ [8]. Both are tools for translating C++ to Java.
c2j++ is based on c2j; the main dierence is that c2j++ is written in Java, whereas c2j is written
in C++.
Unfortunately, the tools are limited. They essentially ignore the problem of pointers, which is the
main diculty of C-to-Java conversion. For example, they simply remove all pointer dereferences,
that is, remove asterisks and convert \->" to \.". This is often correct when dealing with objects
and pointers to objects, but (for example) never works for arrays. For scientic applications, arrays
are perhaps the most important and common data structure, so their conversion is important.
Indeed, we pay much attention to their ecient representation in Java in this paper.
Another related project is a recent Fortran-to-Java converter f2j [9]. It is of interest because it
uses an intermediate representation of C, that is, it converts Fortran code into C, and converts the
result to Java. However, this was done purely for convenience (since a Fortran-to-C converter was
already available), and does not support general C to Java conversion; in particular, pointers are
completely avoided. In comparison to our work, it also implements facilities for call-by-reference
and multi-dimensional arrays, but the methods they use do not work in general (for both Fortran
and C).
A couple of minor results in this paper have been partially achieved before. Section 4.1 discusses
possible methods for passing basic values such as integers to functions by reference in Java. We
know of one system supporting this, EPP [10]. EPP is a framework for Java preprocessors and one
example, called ref [11], adds an operator \&" for passing values by reference. It implements this
using one-element arrays, but in such a way that does not allow taking general addresses of basic
values, unlike our methods in Section 4.1.
Block Model
This section denes what we call the block model of pointers. It allows us to abstract essentially
all features of C pointers into a simple theoretical framework. The few limitations are discussed in
Section 3.4.
At any time in the model, the memory consists of a collection of independent \top-level" blocks.
Blocks can be dynamically created. Each block consists of a basic C value (character, integer,
or
oating-point value of some size), a pointer, or a structure containing several sub-blocks. A
particular kind of structure of interest is the array, where each sub-block has the same type.
Pointers can either be the special value null or refer to a (not necessarily top-level) block.
Let us introduce some notation for this model. We write p to denote the block pointed to by
p. Note that p is also a block, since pointers are stored in blocks, and we associate blocks with
their values. Hence, we can construct pointers to pointers, and so on. We write b = (b1; : : : ; bn)
when b is a structure block with sub-blocks b1; : : : ; bn.
Stack
The block model only models heaps. What about variables allocated on the stack? One possibility
is to model each stack element as a heap allocation. In fact, while this adds extra overhead, it is
the most ecient general method. This is because, while Java has a stack, it does not support
taking the address of anything on the stack [13]. Objects cannot be allocated on the stack (unlike
C++ for example); only references to objects can be allocated on the stack. One cannot pass basic
values by reference or otherwise nd their addresses. Hence, all objects whose addresses are used at
some point, which is all that our model is supposed to cover, must indeed be on the heap in Java.
That is, if we encounter a stack object in the C source whose address is used, we must convert it
into a heap allocation if we plan on converting to Java.
Implementation in Java
The block model is easy to implement in Java. We make an abstract class Block, which includes
an entry parent of type StructB. StructB is a nal class extending Block, consisting of an array
of children. There are several other nal classes extending Block. For example, IntB represents a
block consisting of a single integer val, and PointerB represents a block pointing to some other
block target. One can then dene how to compare two pointers, increment a pointer, and in
general do arithmetic on pointers, as we have described. An example of the conversion process is
given in Figure 1.
Extensions
The block model illustrates that we can automatically convert essentially any C code into Java.
It supports almost all \safe" C code, modelling basic C types, pointers, and structures. Pointers
are untyped, so they can be cast in the code as desired (including support for the void * type).
However, the block model does not support union types or casting between integers and pointers.
In general, such features represent extremely unwieldy type mangling that is fortunately uncommon
in \well-written" code. Hence, the lack of such support is usually of limited importance.
However, there are several possible modications to the block model that support more C
features. For example, the most common form of union types, where only one entry is valid at
any time, can be represented by an unknown block with one entry. Before accessing it, we check
whether the entry is of the correct type, and if not replace it with a newly created block.
Call-By-Reference
Passing values by reference is perhaps the most fundamental use of C pointers. In Java, values are
passed by reference if and only if they are objects or arrays. The rst question we must ask, then,
is how to pass basic variables (that is, variables of basic type such as integer) by reference in Java.
Let us call a variable addressable if its address is taken at some point in the C source. Otherwise,
we do not need to worry about the variable.
The most obvious method is to replace all addressable basic variables with objects that simply
contain an entry of that type (Figure 3). When such a variable is declared, it must be created using
the new operator. If we wish to use or modify the value of the variable, we use or modify the entry
in the corresponding object. The address of the variable is simply the corresponding object.
A similar method is to replace addressable basic variables with one-element arrays (Figure 4).
Again, their declarations must now include creations with the new operator. Accessing the variable
corresponds to accessing the rst element of the array, and the address of the variable is the array
itself.