24-11-2012, 12:25 PM
Lossless Compression of High-volume Numerical Data from Simulations
1Lossless Compression.pdf (Size: 221.26 KB / Downloads: 81)
Abstract
Applications in scientific computing operate with high-volume numerical data and
the occupied space should be reduced. Traditional compression algorithms cannot
provide sufficient compression ratio for such kind of data. We propose a lossless algorithm
of delta-compression (a variant of predictive coding) that packs the higher-order
differences between adjacent data elements. The algorithm takes into account varying
domain (typically, time) steps. The algorithm is simple, it has high performance and
delivers high compression ratio for smoothly changing data. Both lossless and lossy
variants of the algorithm can be used. The algorithm has been successfully applied
to the output from a simulation application that uses a solver of ordinary differential
equations.
Introduction
Applications in scientific computing often operate with large volumes of output and input
data. Despite the huge capacity of the modern disk devices, the space required for data storage
is often larger than the hardware allows. Data transfer over communication networks
is another bottleneck in scientific computing. Text compression tools cannot compress
binary numeric data. Image compression algorithms are not intended for such data. Signal
compression algorithms don’t work directly with numerical data of time series, when time
steps are varying. Also, they are typically designed for measured values, not for simulation
results and therefore not intended for lossless compression. Therefore some new data
compression algorithms must be designed.
Fixed step delta-compression
In this section we consider the simplest variant of delta-compression algorithm. Our algorithm
computes the first (and higher-order) differences using 64-bit integer arithmetic; after
that meaningless 0-s or 1-s are truncated in the result.
Using fixed step extrapolation
The algorithm of differences of ordermcan be formulated differently in terms of extrapolation
of order (m−1). When this reformulation is done, just slightly different8 computations
take place, and these can be seen from different point of view. We introduce the extrapolation
technique here (instead of differences) in order to proceed later to varying step
extrapolation algorithm in Section 4.
Varying step extrapolation algorithm
The previous algorithms assumed that the sequence a can be approximated by polynomials
with fixed steps between extrapolation points. However in practice, simulations use adaptive
ODE solvers and produce state variable values for varying, non-equidistant time steps.
Therefore we should consider smooth functions with values taken with varying steps, and
adapt the compression algorithm for such application data.
Application to simulation results
The compression algorithms were applied to the output from a numerical solver of ordinary
differential equations, which serves as a component in our software for dynamic simulation
of bearing [1]. The program produces some quantities for every time step and writes them
to the output file for analysis and further simulation. Every quantity (position, force etc.)
changes very slowly from one step to another. Extreme accuracy and lossless compression
is necessary, since a relative error of order 10−10 can substantially change the simulation
results.
To chose a particular algorithm and its order the compression routines estimate achievable
compression ratio by subsampling, trying different algorithms and choose the best
one.
Conclusion
A lossless algorithm for floating-point data compression has been developed. It has similarities
to image compression, since it works on bit level. It resembles wavelet compression
since it uses floating-point computations for compression and decompression. The algorithm
works best if the data are values of a function in some points, and this function is
close to a polynomial.
The algorithm uses subtraction of one 64-bit integer representation of floating-point
value from another (Int(aj)−Int('m(tj))). If the difference would be computed between
floating-point representations (aj − 'm(tj)) there would be no win in data storage.
The algorithm is implemented as a C++ class and linked to an industrial-level application.
The measurements show high compression ratio (in comparison with traditional tools)
as well as high speed[5]. In the future we are going to test and measure the algorithm with
the data samples from other applications.