03-09-2016, 09:56 AM
Abstract: This paper proposes a method that integrates DFT and ATPG to improve pattern compression by pulsing interactive clocks (PIC) simultaneously. The proposed algorithm accurately masks unreliable cross clock domain transitions for any clock skew. In addition, it identifies the flops that need hold paths inserted and, combined with ATPG, reduces the pattern count by up to 39% without compromising test quality.
I. INTRODUCTION
Modern IC designs are often partitioned into multiple asynchronous clock domains to manage design complexity while simplifying timing closure and performance optimization. In functional mode, such asynchronous domains operate independently, and cross domain synchronizers are inserted where needed to transfer data or signals across clock domains. ATPG must handle such a clocking scheme properly during structural test to avoid capturing unpredictable transitions across asynchronous clock domains, which would result in simulation mismatches. On the other hand, pulsing one clock per pattern creates too many patterns to fit in the tester. There are two strategies for ATPG to utilize multiple clocks and reduce the pattern count. The first strategy is to concatenate different clocks in different capture cycles, referred to as the multiple clock compression (MCC) scheme. MCC requires sequential ATPG if such a clock sequence is targeted directly, which can suffer from longer run time and/or more ATPG aborted faults. In [3], the authors show that, with efficient analysis, most faults can be targeted combinationally and concatenated to form MCC patterns.
The second strategy is to pulse multiple clocks in the same cycle. A linear-complexity structural analysis (e.g. [3]) is usually conducted for this strategy to analyze the interaction between each clock pair. One common ATPG approach, referred to as pulsing compatible clocks (PCC), only allows clocks that do not interact with each other to be pulsed together, so there is no risk of creating unpredictable transitions across any clock domains. The PCC scheme prohibits two clocks from being pulsed simultaneously even if only a single cross domain flop exists between the clock pair. The other ATPG approach, referred to as the pulsing interactive clocks (PIC) scheme, allows interactive clocks to be pulsed in the same cycle. Test volume reduction using PIC during ATPG was proposed in [1], which uses a 2-phase ATPG approach to gain the pattern compression benefit of PIC without compromising the test coverage. ATPG is allowed to pulse interactive clocks during phase-1 to minimize the pattern count, and only pulses compatible clocks in phase-2 to pick up the remaining faults. For some designs, such a PIC scheme does not obtain any pattern compression benefit because of the masking of unpredictable cross domain transitions. During phase-1 ATPG, one variable is computed to measure the ratio of cross domain flops, and simultaneous pulsing of an interactive clock pair is restricted if this ratio exceeds a pre-defined threshold. A second variable is introduced to measure the portion of the faults that are masked due to unpredictable cross domain transitions; phase-1 is terminated when this portion is higher than a pre-defined threshold, and phase-2 ATPG picks up the remaining faults using the PCC scheme.
There are two main challenges for this approach. The first challenge is the precise masking of the cross domain transitions when interactive clocks are pulsed simultaneously. Since a modern design may contain clocks with different frequencies, ATPG may not assume any bounded clock skew or clock event order among different clock domains. In addition, any potential glitches on the cross domain paths should also be masked to prevent simulation mismatches. On the other hand, over-masking when cross domain paths are stable should be prevented, as it leads to either test quality loss or extra pattern overhead.
The second challenge is the selection of the best thresholds for these two variables to obtain the optimal pattern reduction. Because of the second challenge, the PIC method may not always obtain a better result than the PCC scheme, where simultaneous interactive clock pulses are avoided. In this paper, we present a methodology to overcome these challenges so that better pattern compression can be obtained consistently. The proposed method analyzes the cross domain transitions using ATPG to identify the cross domain source (CDS) flops that contribute to the majority of the fault masking, and adds extra DFT logic to hold the values of those flops. The solution eliminates the threshold selection for the two variables, as the pattern-critical cross domain paths are stable after the DFT modification. The experimental results indicate that properly inserting DFT logic to constrain some of the CDS flops obtains most of the pattern compression benefit. Another benefit of the proposed method is that it identifies a realistic lower bound of the pattern count for each clock domain, the percentage of pattern count reduction that can be achieved by the PIC scheme, and the hardware overhead. This information helps DFT engineers or designers determine whether the PIC scheme can be applied before the proposed DFT modification.
The paper is organized as follows. Section II briefly describes multiple clock domain analysis, the benefits and challenges of pulsing multiple clock domains during ATPG, and the related techniques used in the paper. The proposed algorithm and analysis are explained in Section III. The experimental results are shown in Section IV, followed by the conclusion in Section V.
II. PRELIMINARIES
This section briefly introduces previously published work and techniques related to the paper. First, the multiple clock domain analysis and the estimation of the pattern count lower bound when applying the PIC scheme during ATPG are described. Second, the hardware solution used in the paper to handle interactive clock domains is summarized. Finally, the masking of cross clock domain transitions based on a per-pattern false path simulation method is explained.
A. Clock Domain Analysis
First, the terminology used in the paper is given. A clock domain N(cki) is the set of pins and gates that are launched and captured by the same clock source cki. A cross clock domain N(cki, ckj) is the set of pins and gates that are launched by clock cki and captured by ckj, where i≠j. The set of cross domain source (CDS) flops, denoted as FCDS(cki, ckj), is the set of flops in N(cki, ckj) that are launched by cki. Similarly, the set of cross domain destination (CDD) flops, denoted as FCDD(cki, ckj), is the set of flops in N(cki, ckj) that are captured by ckj. For simplicity, FCDS and FCDD refer to the lists of CDS and CDD flops in the design. A clock pair (cki, ckj) is compatible if FCDS(cki, ckj) and FCDS(ckj, cki) are empty (which implies that FCDD(cki, ckj) and FCDD(ckj, cki) are also empty). Otherwise, cki and ckj are interactive. Given a set of clocks in a design, the clock domains and compatible clock pairs can be derived by a structural trace of the clock domains in time linear in the flat gate count of the design (e.g. [3]).
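The compatibility definition above can be illustrated with a minimal sketch. It assumes the FCDS sets have already been extracted by some structural trace and are available as a dictionary; the dictionary layout, the function names and the flop name are hypothetical and not taken from the paper, and the sketch does not implement the linear-time trace itself.

```python
from itertools import combinations

# Hypothetical input: FCDS keyed by (launch clock, capture clock).
# The flop name is made up for illustration only.
fcds = {
    ("ck1", "ck2"): {"u_core/sync_reg0"},
    ("ck2", "ck1"): set(),
}

def is_compatible(cki, ckj, fcds):
    """A clock pair is compatible iff FCDS(cki, ckj) and FCDS(ckj, cki) are both empty."""
    return not fcds.get((cki, ckj)) and not fcds.get((ckj, cki))

def classify_pairs(clocks, fcds):
    """Split all unordered clock pairs into compatible and interactive pairs."""
    compatible, interactive = [], []
    for cki, ckj in combinations(sorted(clocks), 2):
        if is_compatible(cki, ckj, fcds):
            compatible.append((cki, ckj))
        else:
            interactive.append((cki, ckj))
    return compatible, interactive

clocks = ["ck1", "ck2", "ck3"]
compatible, interactive = classify_pairs(clocks, fcds)
print(compatible)   # [('ck1', 'ck3'), ('ck2', 'ck3')]
print(interactive)  # [('ck1', 'ck2')]
```

Under a PCC scheme only the pairs in `compatible` could be pulsed in the same capture cycle, while a PIC scheme would also consider the pairs in `interactive`.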
One example circuit, D1, is used to illustrate a real design with multiple clock domains and the potential pattern count reduction when multiple clocks are allowed to be pulsed simultaneously. The design contains 11 clock domains, ck1 to ck11, whose relevant properties are shown in Table 1. The second column (|F(cki)|) records the number of faults in each clock domain. The third (|FCDS|) and fourth (|FCDD|) columns show the number of CDS and CDD flops in each clock domain. We perform ATPG assuming that all clock domains are synchronous (i.e. skew balanced), so that any clocks may be pulsed simultaneously. This creates 2,446 patterns, which is the pattern count lower bound that we would like to achieve. These patterns are grouped by capture clock pulse; the pattern count of each group is given in the fifth column (|P(cki)|), and its percentage of the total pattern count is shown in the last column (%|P|). Each row summarizes the data of an individual clock domain, and the last row (Total) summarizes the entire circuit. Note that each pattern is allowed to pulse multiple clocks, as decided by ATPG, to achieve the best compression. For example, many patterns (> 90%) pulse ck5 and ck8 together. Since some of the clock domains overlap, the design total |F(cki)| is not the sum of the rows (ck1 to ck11). The same holds for |FCDS|, |FCDD|, |P(cki)| and %|P|.
Although these test patterns cannot be applied directly on the tester because clock skew is not considered, they give an accurate estimate of the patterns required for each clock domain when there is no masking on the cross domain paths. It can be seen that three clock domains, ck5, ck8 and ck9, are the main contributors to the pattern count for design D1.
Table 1 Clock Domain Properties of case D1
Clock   |F(cki)|   |FCDS|   |FCDD|   |P(cki)|   %|P|
ck1     5K         23       3        67         2.7 %
ck2     3K         54       45       71         2.9 %
ck3     120K       34       46       60         2.5 %
ck4     3K         46       16       70         2.9 %
ck5     2,577K     2,154    20,211   2,220      90.8 %
ck6     9K         35       101      65         2.7 %
ck7     7K         32       29       70         2.9 %
ck8     405K       3,495    1,717    2,298      93.9 %
ck9     1,642K     5,309    13,433   1,622      66.3 %
ck10    6K         55       19       169        6.9 %
ck11    124K       67       2,245    267        10.9 %
Total   4,378K     8,125    41,480   2,446      100 %
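As a quick check on how the last column of Table 1 relates to the others, the following sketch recomputes %|P| from the |P(cki)| column and the total of 2,446 patterns. The pattern counts are copied from the table; the variable names are illustrative only.

```python
# |P(cki)| values copied from Table 1; 2,446 is the total pattern count.
TOTAL_PATTERNS = 2446
patterns = {
    "ck1": 67, "ck2": 71, "ck3": 60, "ck4": 70, "ck5": 2220, "ck6": 65,
    "ck7": 70, "ck8": 2298, "ck9": 1622, "ck10": 169, "ck11": 267,
}

# %|P| is the share of all patterns in which a given clock is pulsed.
share = {ck: round(100 * n / TOTAL_PATTERNS, 1) for ck, n in patterns.items()}

# A pattern may pulse several clocks, so the rows overlap and their sum
# exceeds the total number of patterns.
row_sum = sum(patterns.values())

# The three domains that appear in the most patterns.
top3 = sorted(patterns, key=patterns.get, reverse=True)[:3]

print(share["ck5"], share["ck8"], share["ck9"])  # 90.8 93.9 66.3
print(row_sum)                                   # 6979
print(top3)                                      # ['ck8', 'ck5', 'ck9']
```

The row sum of 6,979 against a total of 2,446 patterns makes the overlap concrete: on average each pattern is counted for nearly three clock domains.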
The clock domain analysis result can be represented by a data transfer graph, as shown in Fig. 1, for better visualization. Each circle represents one clock domain, and each bidirectional line represents a pair of interactive clocks. The number on a line indicates the number of cross domain flops (the sum of the CDS and CDD flop counts), and the number in each circle indicates the number of test patterns needed to detect all faults in the corresponding clock domain. For example, clock domain ck5 contains 2,220 patterns and is interactive with clock domains ck3, ck4, ck8 and ck9, with 18, 226, 984, and 21,081 cross domain flops respectively.
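One possible in-memory representation of such a data transfer graph is sketched below. It encodes only the ck5 edges quoted in the text, since the rest of Fig. 1 is not reproduced here; the data layout and the helper name `neighbors` are assumptions made for illustration.

```python
# Node weights: test patterns per clock domain (only ck5 is quoted in the text).
pattern_count = {"ck5": 2220}

# Undirected edges keyed by the unordered clock pair; the weight is the
# number of cross domain flops (CDS + CDD) between the two domains.
cross_domain_flops = {
    frozenset(("ck5", "ck3")): 18,
    frozenset(("ck5", "ck4")): 226,
    frozenset(("ck5", "ck8")): 984,
    frozenset(("ck5", "ck9")): 21081,
}

def neighbors(clock, edges):
    """Return {interactive clock: cross domain flop count} for one clock."""
    result = {}
    for pair, flops in edges.items():
        if clock in pair:
            (other,) = pair - {clock}  # the second clock of the pair
            result[other] = flops
    return result

ck5_neighbors = neighbors("ck5", cross_domain_flops)
print(sorted(ck5_neighbors.items(), key=lambda kv: kv[1]))
# [('ck3', 18), ('ck4', 226), ('ck8', 984), ('ck9', 21081)]
print(sum(ck5_neighbors.values()))  # 22309
```

Keying edges by `frozenset` keeps each line of the graph bidirectional, matching the figure, so looking up (ck5, ck9) and (ck9, ck5) gives the same edge.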