02-01-2013, 09:08 AM
CPPC: Correctable Parity Protected Cache
1CPPC.pdf (Size: 1.45 MB / Downloads: 77)
ABSTRACT
Due to shrinking feature sizes processors are becoming more
vulnerable to soft errors. Write-back caches are particularly
vulnerable since they hold dirty data that do not exist in other
memory levels. While conventional error correcting codes can
protect write-back caches, it has been shown that they are
expensive in terms of area and power. This paper proposes a new
reliable write-back cache called Correctable Parity Protected
Cache (CPPC) which adds error correction capability to a parityprotected
cache. For this purpose, CPPC augments a write-back
parity-protected cache with two registers: the first register stores
the XOR of all data written to the cache and the second register
stores the XOR of all dirty data that are removed from the cache.
CPPC relies on parity to detect a fault and then on the two XOR
registers to correct faults. By a novel combination of byte shifting
and parity interleaving CPPC corrects both single and spatial
multi-bit faults to provide a high degree of reliability. We
compare CPPC with one-dimensional parity, SECDED (Single
Error Correction Double Error Detection) and two-dimensional
parity-protected caches. Our simulation results show that CPPC
provides a high level of reliability while its overheads are less
than the overheads of SECDED and two-dimensional parity.
INTRODUCTION
Soft errors are a growing concern for microprocessor reliability.
Since about 60% of the on-chip area is occupied by caches in
today’s microprocessors, caches have a considerable impact on
microprocessor reliability. Most caches today are write-back
caches. They contain dirty data which have no back-up copies in
other memory levels, and therefore are highly vulnerable to soft
errors. To protect against soft errors in write-back caches, two
protection codes are common.
RELATED WORK
Protecting caches with two-dimensional parity was proposed in
[12]. In this approach the horizontal parity (along a row) detects
errors and the vertical parity (along a column) corrects them. In
this scheme, spatial multi-bit errors are corrected by interleaving
data array rows among vertical parity rows, so that adjacent data
array rows are protected by different vertical parity rows. Since
the vertical parity of the cache changes on every Store, the new
vertical parity must be computed by reading the old data from the
cache and XORing it with the old vertical parity. This operation is
called “read-before-write” and is done on every Store. Twodimensional
parity (or row-diagonal parity) is also used in RAID
6 (Redundant Array of Inexpensive Disks). However, unlike in
RAIDs, the two-dimensional parity in caches requires that the
read-before-write operation is executed for every cache miss in
addition to all Stores. Furthermore, in the case of a miss, an entire
cache line must be read. The energy cost of these read-beforewrite
operations is therefore high.
To mitigate the area overhead of error protection codes in a last
level cache, the codes could be saved in main memory, as was
proposed in [23]. This scheme facilitates the implementation of
very strong protection codes at the cost of more memory accesses.
In ICR (In Cache Replication) [24] cache lines that have not been
accessed for a long time are allocated to replicas of dirty blocks.
ICR essentially trades off reduced effective cache size for better
reliability. Thus the miss rate of the cache may be higher or,
alternatively, dirty blocks may be left unprotected. In [25] a
variant of ICR in which a small cache keeps a redundant copy of
dirty blocks is proposed, but this scheme is not area-efficient for
large caches. Several papers [2, 15] advocate early write-backs in
order to increase the reliability of write-back caches. These
schemes reduce the total number of dirty blocks and therefore the
vulnerability of L1 caches to soft errors. Their energy
consumption is high, especially when data locality is low and the
number of write-backs is large. In [20] a small cache for saving
check bits or replicas is proposed. In this scheme, a large amount
of the dirty data remains unprotected if data locality is low.
3. BASIC CPPC
In this section, we first describe the architecture of the basic
CPPC. As mentioned earlier, a CPPC can be a cache at any level
of the cache hierarchy. However, to keep the description simple,
we first describe the necessary hardware modifications to an L1
cache and the operation of a basic L1 CPPC. Then, later, in
Section 3.5, we describe the modifications necessary for an L2
CPPC.
Normal L1 CPPC operation
During normal CPPC operation (no fault is detected) every Load
checks the parity bit(s) associated with the word. On every Store,
the new data is XORed into R1 in parallel with writing into the
cache. When a Store updates an already dirty word, it is necessary
to first read the old word before storing the new word. To
implement this read-before-write, the cache operation on a Store
is slightly different from a regular cache. On every Store, the perword
dirty bit is first checked in conjunction with the tag
comparison. If the dirty bit is set the old data is read and XORed
with R2. If the dirty bit is not set, the Store proceeds normally.
Note that even in a regular cache every Store must first check the
tags to confirm a hit before writing data. On a byte Store, the new
byte is XORed with the corresponding byte of R1 based on the
address bits and the old byte is XORed with the corresponding
byte of R2, if the word in the cache is dirty. Figure 2 shows the
write operation in a basic CPPC.
Illustrative example
Figure 4 illustrates why the basic CPPC with interleaved parity
fails to correct vertical spatial MBEs. This example starts with the
same assumptions and initial conditions as the example of Figure
3. Consider the case that a single particle strike flips two bits that
are vertically adjacent after the processor has executed two Stores.
In this example, the particle strike flips the MSB of each of the
two words, Word0 and Word1. When the processor reads Word0
the basic CPPC fault recovery mechanism XORs Word1 with R1
and R2. However, since Word1’s MSB is also faulty the XOR
operation produces a 1, instead of the correct value, 0. Hence, the
equation for error correction shown in the figure fails and the
basic CPPC cannot correct the vertical 2-bit fault.
To solve this problem, in Figure 5, we rotate Word1 by one byte
to the left before XORing it with R1. Note that the data stored in
the cache is NOT rotated. Therefore, bit j of R1 is derived from
the XOR of bit j of Word0 and bit (j+8) mod 16 of Word1. With
this approach a vertical fault in bit 0 of the two adjacent words
can be corrected. The fault recovery equations are modified as
shown in the figure to account for the byte rotation. As the
equations of Figure 5 show faulty bits can be corrected using byte
shifting.
Spatial multi-bit error coverage
With registers R1 and R2 in the CPPC of Figure 6, all spatial
multi-bit faults are locatable by the fault locator, except for some
special cases. For instance, in the case of an 8*8 fault, all parity
bits of all 8 words detect errors and all bits of R3 are set. Hence,
there is no way to figure out which 8 bits of different words are
faulty. Another situation is when a spatial multi-bit fault does not
affect adjacent rows and only affects rows distant by 4 rows from
each other. For example, a fault which flips bits from byte 0 of a
class 0 word and from byte 0 of a class 4 word cannot be located.
This is because the content of R3 is identical in both cases and the
fault locator cannot figure out whether the fault has occurred in
byte 0 or in byte 4 of the two faulty words. These errors remain
DUEs and the probability of such events is very small.
CONCLUSIONS AND FUTURE WORK
CPPC is a new reliable write-back cache which increases the
granularity of error correction from words or blocks to larger
granularities up to the entire cache in order to save resources such
as area and energy. In CPPC, the granularity of protection can be
adapted to the desirable amount of reliability. Moreover, CPPC
decouples the protection of clean data from dirty data in order to
assign fewer resources to the clean data in the cache.
To tolerate spatial multi-bit faults CPPC does not require physical
bit interleaving which greatly increases cache energy
consumption. The byte-shifting scheme proposed in this paper is
added to the basic CPPC to tolerate spatial multi-bit errors.
Instead of the byte-shifting technique, CPPCs can also be built
with more pairs of registers to correct all spatial multi-bit errors
with negligible overheads.
CPPC is quite efficient in L1 caches and its efficiency is even
better in L2 caches. Based on our experimental results, CPPC
adds only 7% energy overhead and practically no other overheads
to an L2 parity-protected cache while it provides both single-bit
and multi-bit error correction for dirty data.