01-10-2012, 12:08 PM
Comprehensive Path-sensitive Data-flow Analysis
Comprehensive Path-sensitive.pdf (Size: 4.35 MB / Downloads: 29)
Abstract
Data-flow analysis is an integral part of any aggressive optimizing compiler.
We propose a framework for improving the precision of data-flow analysis in
the presence of complex control-flow. We initially perform data-flow analysis
to determine those control-flow merges which cause the loss in data-flow
analysis precision. The control-flow graph of the program is then restructured
such that performing data-flow analysis on the resulting restructured
graph gives more precise results. The proposed framework is both simple, involving
the familiar notion of product automata, and also general, since it is
applicable to any forward or backward data-flow analysis. Apart from proving
that our restructuring process is correct, we also show that restructuring
is effective in that it necessarily leads to more optimization opportunities.
Furthermore, the framework handles the trade-off between the increase in
data-flow precision and the code size increase inherent in the restructuring.
We show that determining an optimal restructuring is NP-hard, and propose
and evaluate a greedy heuristic.
Introduction
It is becoming increasingly difficult to get performance improvement using
compiler optimizations, as epitomized by Proebsting’s Law [18]. But developers
still want that extra 5-15% improvement in the running times of their
applications, and compiler optimizations are a safer alternative to manual
optimizations carried out by developers which might introduce errors [19].
Data-flow analysis [1] is an integral part of any aggressive optimizing
compiler. Information gathered using data-flow analysis is used in code optimizations
such as constant propagation, dead-code elimination, common
sub-expression elimination, to name a few. Data-flow analysis uses a finite
abstraction of the program states and involves computing a fixed-point solution
for a set of equations obtained from the control-flow graph of the program.
The results of this analysis are used to guide compiler optimizations.
Thus, the analysis should be safe in order for the resulting optimizations to
be safe.
Data-flow Analysis
Data-flow analysis [1] is a static analysis technique used to glean information
about the program as a whole and it’s possible behaviors. It forms the core of
any aggressive compiler optimization framework. It is also used extensively
in program understanding and verification tools.
Any data-flow anlysis should be safe i.e. it should be conservative and
over-approximate the runtime behaviour of the program. This ensures that
the code transformations and optimization carried out using the analysis
results do not change the semantic meaning of the program. A typical overapproximation
which is carried out is that all control-flow paths in the CFG
are assumed to be possible, even if some are infeasible. An analysis which
does not consider a control-flow path which could be taken at runtime is
deemed unsafe.
Constant Propagation
In the Constant Propagation problem we are interested in knowing whether
a variable x always has a constant value at a program point. The use of the
variable at that point can then be replaced by that constant value. Dead-code
elimination, common sub-expression elimination, redundancy elimination are
some of the other analysis which benefit from precise constant propagation
analysis.
Specifically, in the Constant Propagation data-flow analysis framework C,
a variable x can have any one of the three data-flow facts associated with it:
(1) >, which means that nothing has been asserted about x, (2) ci, which
means that x has a constant value, (3) ?, which means that x has been
determined to not have a constant value.
L can be thought of as a vector of data-flow values, one component for
each variable in the program. Given ` 2 L and a variable x, we will use
the notation `[x] to denote the data-flow value associated with x in `. For
example, S(u)[x] represents the data-flow value of variable x at node u 2 V
given a solution S to the Constant Propagation problem C.