

# An Extensive Survey on Efficient Coding Schemes for Fault-Tolerant Parallel Filters

# Minku Kumar<sup>1</sup> & Prof. Ashish Raghuwanshi<sup>2</sup>

<sup>1</sup>M-Tech Research Scholar, <sup>2</sup>Research Guide Department of Electronics & Communication Engineering,

IES, Bhopal

Abstract: The trend in processor technology is toward multicore designs, enabled by miniaturization that packs many processors into a given chip area. This miniaturization has led to a significant increase in the occurrence of soft errors, where a single bit flip affects the output of the computing system. This in turn affects the performance of the application running on the system. CMOS technology scaling is bringing new challenges to designers in the form of new failure modes. The challenges include long-term reliability failures and random failures induced by particle strikes. Studies have shown that soft errors will increasingly be the largest contributor to device reliability failures. Due to these reliability concerns, the adoption of soft error mitigation techniques is on the rise. As these techniques are increasingly adopted, the area and performance overhead incurred in their implementation also becomes pertinent. This research work addresses the problem of providing low-cost soft error mitigation.

Index Terms—coding, parallel filters, soft errors.

#### I. INTRODUCTION

Technology scaling has enabled us to keep pace with the power, performance, area and functionality requirements of electronic circuits. Along with these advantages, it has also brought challenges such as increased leakage current and reliability failures [2]. Reliability failures include systematic failures due to the aging of silicon structures caused by Negative Bias Temperature Instability (NBTI) and Hot Carrier Injection (HCI) [3], and random failures due to atmospheric particle strikes, called soft errors [4]. The contribution of particle-strike-induced failures to the overall device failure rate is more than ten times that of hard reliability failures [5]. This highlights the importance of soft error mitigation in safety-critical systems.

Initially, soft errors were a concern only for safety-critical applications. With the scaling of technology, however, soft errors are becoming pertinent even for electronic devices in the consumer market. Due to the prohibitive cost of the design, manufacturing and other collateral required for integrated circuits, devices designed with a strong emphasis on one market segment will very often find use in another, e.g. products reused across catalog (consumer) and automotive markets. Hence the reliability requirements pose the dual challenge of meeting the strict reliability goals required for one market while adhering to the affordable costs driving the other. This makes it increasingly important to reduce the overall implementation cost of the methodologies incorporated into circuits for soft error mitigation, which implies less area overhead, less impact on performance and ease of implementation. In addition, from a reliability perspective there are also requirements on the error detection latency, the time required to roll back to a known good state, and so on.

Since their inception, control systems have been an enabling technology. Control systems were introduced during the industrial revolution with devices like the James Watt flyball governor [2]. Over the past 40 years, developments in analog and digital electronics have resulted in dramatic increases in the computational power of microcomputers and microcontrollers. These developments made it possible to implement advanced control techniques, which in turn enabled the successful development of high-performance applications such as:

• Guidance and control systems for aerospace vehicles such as commercial aircraft, guided missiles, advanced fighter aircraft, launch vehicles and satellites. These control systems provide stability and tracking in the presence of large environmental and system uncertainties.

• Control systems in the manufacturing industries, from automotive to integrated circuits, which are associated with computer-controlled machines and provide the precise positioning and assembly required for high-quality, high-yield fabrication of components and products.

• Industrial process control systems, particularly in the hydrocarbon and chemical processing industries, which maintain high product quality by monitoring thousands of sensor signals and making corresponding adjustments to hundreds of valves, heaters, pumps and other actuators.

• Control of communication systems such as the telephone system, cell phones, and the Internet, which is especially pervasive. These control systems regulate the signal power levels in transmitters and repeaters, manage packet buffers in network routing equipment and provide adaptive noise cancellation to respond to varying transmission line characteristics.

Control systems have reached a high level of theoretical development and there exists a myriad of applications. However, the development of new sensors and actuators for old and new applications continues. Therefore, the demand for new theoretical concepts and approaches to handle increasingly complex applications remains high.

## Challenges of Technology Scaling

Continued evolution of technology in the semiconductor domain has led to smaller area, higher operating frequencies and lower voltage levels. This has helped in integrating more and more complex functions into a single System-on-Chip (SoC). Alongside these benefits, technology scaling has brought a larger set of design challenges in addressing the permanent and transient failures that come with it. Permanent failures can be classified as either extrinsic or intrinsic. Extrinsic faults, caused by manufacturing defects, result in the early failure of devices known as infant mortality. Intrinsic faults arise from degradation phenomena that result in the wear-out of silicon chips.

The "bathtub curve" shown in Figure 1.1 depicts the lifetime of a device. During the initial part of device operation, extrinsic faults due to defects induced during the manufacturing process lead to a high failure rate. A burn-in process is used to screen these out. The next phase is the flat portion of the curve, which indicates the useful lifetime of the device. During this stage, failures are due to radiation-induced transient faults. Finally, near the end of a chip's lifetime, wear-out mechanisms cause an increase in the failure rate.



Figure 1.1 The bathtub curve

The dashed line indicates the variation in failure rates with technology scaling. With technology scaling, infant mortality is becoming more prevalent. This makes it mandatory for devices in new technology nodes to include a burn-in step in the production test flow to screen out extrinsic reliability failures. Radiation-induced failures are also on the rise. Effectively, the operational lifespan of the device is reduced. Permanent faults and time-induced faults must be detected before the device is shipped. In contrast, radiation-induced transient faults must be handled in the field by appropriate error detection and correction circuitry.

## Wear-out Failure Modes in Future Technologies

Operating-life failures in semiconductor devices are caused by electromigration, Hot Carrier Injection (HCI) and Negative Bias Temperature Instability (NBTI). These mechanisms can affect the transistors or the interconnect wires in the device. Such faults first give rise to intermittent delay faults and later result in permanent failures.

#### Soft Error Trends

Early research on soft errors concentrated mostly on memories and, later, on other sequential state-holding elements. The soft error contribution of combinational logic has been overlooked, and little research has focused on it.

An earlier study examined the soft error rate (SER) contribution of different logic elements to the overall failure-in-time (FIT) rate of the circuit. Figure 1.2 shows the observations from this study. The contribution of combinational logic to the overall FIT rate is 11%; it is increasing and can no longer be ignored. Shivakumar et al. [12] also carried out a detailed study of the SER trends in memories, sequential elements and combinational logic.



Figure 1.2 Soft error contributions of memory and logic

## II. LITERATURE REVIEW

Z. Gao, P. Reviriego, Z. Xu, X. Su, J. Wang and J. A. Maestro, [1] As the complexity of communications and signal processing systems increases, so does the number of blocks or elements that they have. In many cases, some of those elements operate in parallel, performing the same processing on different signals. A typical example of those elements is digital filters. The increase in complexity also poses reliability challenges and creates the need for fault-tolerant implementations. A scheme based on error correction coding has been recently proposed to protect parallel filters. In that scheme, each filter is treated as a bit, and redundant filters that act as parity check bits are introduced to detect and correct errors. In this brief, the idea of applying coding techniques to protect parallel filters is addressed in a more general way. In particular, it is shown that the fact that filter inputs and outputs are not bits but numbers enables a more efficient protection. This reduces the protection overhead and makes the number of redundant filters independent of the number of parallel filters. The proposed scheme is first described and then illustrated with two case studies. Finally, both the effectiveness in protecting against errors and the cost are evaluated for a field-programmable gate array implementation.
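The earlier ECC-based idea summarized above, in which each filter plays the role of a bit and redundant filters act as parity checks, can be illustrated with a minimal sketch. It assumes four parallel FIR filters with identical integer coefficients, a Hamming(7,4)-style parity layout, and at most one faulty output sample at a time. The names `fir`, `PARITY_ROWS` and `protected_filters` are hypothetical and are not taken from the cited work.

```python
def fir(coeffs, samples):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * samples[n - k]."""
    return [
        sum(c * samples[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(len(samples))
    ]


# Assumed Hamming(7,4)-style layout: the inputs summed into each redundant filter.
PARITY_ROWS = [(0, 1, 2), (0, 1, 3), (0, 2, 3)]


def protected_filters(coeffs, inputs, fault=None):
    """Filter four streams, then detect and correct one faulty output sample.

    fault = (filter_index, sample_index, error) injects an error for testing.
    """
    length = len(inputs[0])
    outputs = [fir(coeffs, x) for x in inputs]
    if fault is not None:
        f, n, error = fault
        outputs[f][n] += error

    # Each redundant filter processes the sum of the inputs in its row.
    redundant = [
        fir(coeffs, [sum(inputs[i][n] for i in row) for n in range(length)])
        for row in PARITY_ROWS
    ]

    for n in range(length):
        # By linearity, a redundant output equals the sum of its row's outputs.
        syndrome = tuple(
            redundant[r][n] != sum(outputs[i][n] for i in row)
            for r, row in enumerate(PARITY_ROWS)
        )
        for f in range(len(inputs)):
            # The set of failing checks identifies the faulty original filter.
            if any(syndrome) and syndrome == tuple(f in row for row in PARITY_ROWS):
                r = syndrome.index(True)  # first failing check; it covers filter f
                row = PARITY_ROWS[r]
                outputs[f][n] = redundant[r][n] - sum(
                    outputs[i][n] for i in row if i != f
                )
    return outputs
```

Integer arithmetic keeps the comparisons exact; a fixed-point or floating-point implementation would need a tolerance in the checks. A syndrome with a single failing check points at a redundant filter, in which case the original outputs are left unchanged.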

M. Nicolaidis, [2] In nanometric technologies, circuits are increasingly sensitive to various kinds of perturbations. Soft errors, a concern for space applications in the past, have become a reliability issue at ground level. Alpha particles and atmospheric neutrons induce single-event upsets (SEU), affecting memory cells, latches, and flip-flops, and single-event transients (SET), initiated in the combinational logic and captured by the latches and flip-flops associated with the outputs of this logic. To face this challenge, a designer must have at their disposal a variety of soft error mitigation schemes adapted to various circuit structures, design architectures, and design constraints. In this paper, the author describes various SEU and SET mitigation schemes that could help the designer meet these goals.

A. L. N. Reddy and P. Banerjee, [3] The increasing demands for high-performance signal processing, along with the availability of inexpensive high-performance processors, have resulted in numerous proposals for special-purpose array processors for signal processing applications. A functional-level concurrent error-detection scheme is presented for such VLSI signal processing architectures as those proposed for the FFT and QR factorization. Some basic properties involved in such computations are used to check the correctness of the computed output values. This fault-detection scheme is shown to be applicable to a class of problems rather than a particular problem, unlike the earlier algorithm-based error-detection techniques. The effects of roundoff/truncation errors due to finite-precision arithmetic are evaluated. It is shown that the error coverage is high with large word sizes.
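As an illustration of checking a computation through one of its basic properties, the sketch below verifies a discrete Fourier transform with Parseval's relation. This property is chosen here as an assumption for the example and is not necessarily the check used in [3]. The function names and the tolerance `rel_tol`, which absorbs roundoff from finite-precision arithmetic, are likewise hypothetical.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform, standing in for an FFT unit."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def parseval_check(x, spectrum, rel_tol=1e-9):
    """Return True when the time- and frequency-domain energies agree.

    Parseval's relation: sum |x[t]|^2 == (1/N) * sum |X[k]|^2.
    rel_tol allows for roundoff error in floating-point arithmetic.
    """
    n = len(x)
    time_energy = sum(abs(v) ** 2 for v in x)
    freq_energy = sum(abs(v) ** 2 for v in spectrum) / n
    return abs(time_energy - freq_energy) <= rel_tol * max(time_energy, 1.0)
```

An energy check of this kind cannot see every fault (an error that changes only the phase of an output leaves the energy unchanged), and a tolerance that is too loose hides small errors while one that is too tight raises false alarms. That trade-off is one way to read the dependence of error coverage on word size mentioned above.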

S. Pontarelli, G. C. Cardarilli, M. Re and A. Salsano, [4] In this paper, the design of a finite impulse response (FIR) filter with fault-tolerant capabilities based on the residue number system (RNS) is analyzed. Unlike other approaches that use RNS, the filter implementation is fault tolerant not only with respect to a fault inside the RNS moduli, but also with respect to a fault in the reverse converter. An architecture allowing fault masking in the overall RNS FIR filter is presented. It avoids the use of trivial triple modular redundancy (TMR) to protect the blocks that perform the final stages of the RNS-based FIR computation.
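To make the RNS terminology concrete, the following sketch runs a small FIR filter independently in each modulus channel and reconstructs the result with the Chinese remainder theorem. It uses the textbook redundant-residue idea for detecting a fault in one channel and is not the architecture of [4]; in particular, it does not protect the reverse converter itself. The moduli, the restriction to non-negative values and all names are assumptions made for the example.

```python
from math import prod

INFO_MODULI = (7, 11, 13)        # assumed information moduli
REDUNDANT_MODULUS = 17           # assumed redundant modulus, larger than the others
MODULI = INFO_MODULI + (REDUNDANT_MODULUS,)
LEGIT_RANGE = prod(INFO_MODULI)  # valid results lie in [0, 1001)


def rns_fir(coeffs, samples):
    """Compute the FIR output separately in every modulus channel."""
    result = []
    for n in range(len(samples)):
        residues = []
        for m in MODULI:
            acc = 0
            for k, c in enumerate(coeffs):
                if n - k >= 0:
                    acc = (acc + (c % m) * (samples[n - k] % m)) % m
            residues.append(acc)
        result.append(tuple(residues))
    return result


def crt(residues, moduli):
    """Reverse conversion: Chinese remainder theorem for coprime moduli."""
    total = prod(moduli)
    value = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        value += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return value % total


def decode(residues):
    """Return (value, ok); ok is False when a channel fault is detected."""
    value = crt(residues, MODULI)
    return value, value < LEGIT_RANGE
```

A fault in any single channel moves the reconstructed value outside the legitimate range, provided the true output is non-negative and smaller than `LEGIT_RANGE`, so `decode` flags it.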

Byonghyo Shim and N. R. Shanbhag, [5] In this paper, the authors present energy-efficient soft error-tolerant techniques for digital signal processing (DSP) systems. The proposed technique, referred to as algorithmic soft error-tolerance (ASET), employs low-complexity estimators of a main DSP block to achieve reliable operation in the presence of soft errors. Three distinct ASET techniques (spatial, temporal and spatio-temporal) are presented. For frequency-selective finite-impulse response (FIR) filtering, it is shown that the proposed techniques provide robustness in the presence of soft error rates of up to P<sub>er</sub> = 10<sup>-2</sup> and P<sub>er</sub> = 10<sup>-3</sup> in a single-event upset scenario. The power dissipation of the proposed techniques ranges from 1.1× to 1.7× (spatial ASET) and 1.05× to 1.17× (spatio-temporal and temporal ASET) when the desired signal-to-noise ratio SNR<sub>des</sub> = 25 dB. In comparison, the power dissipation of the commonly employed triple modular redundancy technique is 2.9×.

Z. Gao, W. Yang, X. Chen, M. Zhao and J. Wang, [6] Relative to the Triple Modular Redundancy (TMR) scheme, fault-tolerant DSP designs based on arithmetic residue codes consume far fewer resources. However, the price of the low resource consumption is the fault-missing problem. The basic tradeoff is that a smaller modulus used for fault checking consumes fewer resources, but the fault missing rate is higher. The relationship between the value of the modulus and the fault missing rate is analyzed theoretically in this paper for fault-tolerant FIR filter design, and the results are verified by fault injection on an FPGA implementation.
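The tradeoff between modulus size and fault missing rate can be seen in a small sketch of a residue check on one integer FIR output. A fault escapes detection exactly when the error it adds is a multiple of the modulus. The uniform error distribution used in `missing_rate` is an assumption made for illustration and is not the fault model analyzed in [6]; all function names are hypothetical.

```python
def fir_output(coeffs, window):
    """One FIR output sample: inner product of coefficients and input window."""
    return sum(c * x for c, x in zip(coeffs, window))


def residue_check(coeffs, window, observed, modulus):
    """Return True when the observed output passes the mod-m check.

    The checker repeats the computation on residues only, which needs
    far narrower arithmetic than the main filter.
    """
    expected = sum((c % modulus) * (x % modulus) for c, x in zip(coeffs, window))
    return observed % modulus == expected % modulus


def missing_rate(modulus, max_error):
    """Fraction of nonzero errors in [-max_error, max_error] that go undetected,
    assuming (for illustration only) that every error value is equally likely."""
    errors = [e for e in range(-max_error, max_error + 1) if e != 0]
    missed = sum(1 for e in errors if e % modulus == 0)
    return missed / len(errors)
```

Under this simplified model the missing rate is roughly the reciprocal of the modulus, so a modulus of 3 misses about a third of the errors while a modulus of 7 misses about a seventh, at the cost of a wider checker.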

## III. PROBLEM IDENTIFICATION

A scheme for fault-tolerant parallel filters has been presented in previous research work. The proposed scheme exploits the linearity of filters to implement an error correction mechanism. In particular, two redundant filters whose inputs are linear combinations of the original filter inputs are used to detect and locate errors. The coding of those linear combinations was formulated as a general problem, and it was then shown how it can be implemented efficiently. The practical implementation was illustrated with two case studies that were evaluated for an FPGA implementation and compared with a previously proposed technique. That technique relies on the use of error correction codes (ECCs) such that each filter is treated as a bit in the ECC.
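The two-redundant-filter idea can be sketched as follows. One redundant filter is fed the plain sum of the inputs and the other a weighted sum; by linearity, the ratio of the two check differences then identifies the faulty filter. The sketch assumes identical integer-coefficient FIR filters, at most one faulty output per sample, and distinct nonzero integer weights. The weights `(1, 2, 3, 4, 5)` and all function names are hypothetical choices for the example, not the coding proposed in [1].

```python
def fir(coeffs, samples):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * samples[n - k]."""
    return [
        sum(c * samples[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(len(samples))
    ]


def protect(coeffs, inputs, weights, fault=None):
    """Return (corrected outputs, list of located (filter, sample) faults).

    fault = (filter_index, sample_index, error) injects an error for testing.
    """
    length = len(inputs[0])
    outputs = [fir(coeffs, x) for x in inputs]
    if fault is not None:
        f, n, error = fault
        outputs[f][n] += error

    # Redundant filter 1: plain sum of inputs. Redundant filter 2: weighted sum.
    z1 = fir(coeffs, [sum(x[n] for x in inputs) for n in range(length)])
    z2 = fir(coeffs, [sum(w * x[n] for w, x in zip(weights, inputs))
                      for n in range(length)])

    located = []
    for n in range(length):
        s1 = z1[n] - sum(y[n] for y in outputs)
        s2 = z2[n] - sum(w * y[n] for w, y in zip(weights, outputs))
        # An error e in filter j gives s1 = -e and s2 = -weights[j] * e.
        if s1 != 0 and s2 != 0:
            for j, w in enumerate(weights):
                if s2 == w * s1:
                    outputs[j][n] += s1  # adding s1 = -e removes the error
                    located.append((j, n))
                    break
        # Exactly one nonzero difference points at a redundant filter instead;
        # the original outputs are then left untouched.
    return outputs, located
```

Note that the number of redundant filters stays at two regardless of how many filters are protected. Exact integer comparisons are used here; a finite-precision implementation would need tolerances on the checks.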



## IV. CONCLUSION

In modern VLSI systems, single-event upsets (SEUs) are a major reliability concern. These upsets originate from two primary sources: cosmic ray particles occurring in the space environment, and alpha particles emitted by the radioactive decay of uranium and thorium impurities located within the chip itself, such as in the silicon die, interconnects, and ceramic packaging. Soft errors due to SEUs have been a known problem affecting semiconductor memories for quite some time.

#### REFERENCES

[1] Z. Gao, P. Reviriego, Z. Xu, X. Su, J. Wang and J. A. Maestro, "Efficient Coding Schemes for Fault-Tolerant Parallel Filters," in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62, no. 7, pp. 666-670, July 2015.

[2] M. Nicolaidis, "Design for soft error mitigation," in IEEE Transactions on Device and Materials Reliability, vol. 5, no. 3, pp. 405-418, Sept. 2005.

[3] A. L. N. Reddy and P. Banerjee, "Algorithm-based fault detection for signal processing applications," in IEEE Transactions on Computers, vol. 39, no. 10, pp. 1304-1308, Oct 1990.

[4] S. Pontarelli, G. C. Cardarilli, M. Re and A. Salsano, "Totally Fault Tolerant RNS Based FIR Filters," 2008 14th IEEE International On-Line Testing Symposium, Rhodes, 2008, pp. 192-194.

[5] Byonghyo Shim and N. R. Shanbhag, "Energy-efficient soft error-tolerant digital signal processing," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 4, pp. 336-348, April 2006.

[6] Z. Gao, W. Yang, X. Chen, M. Zhao and J. Wang, "Fault missing rate analysis of the arithmetic residue codes based fault-tolerant FIR filter design," 2012 IEEE 18th International On-Line Testing Symposium (IOLTS), Sitges, 2012, pp. 130-133.

[7] P. P. Vaidyanathan, Multirate Systems and Filter Banks, Englewood Cliffs, N.J., USA: Prentice Hall, 1993.

[8] A. Sibille, C. Oestges and A. Zanella, MIMO: From Theory to Implementation, New York, NY, USA: Academic, 2010.

[9] N. Kanekawa, E. H. Ibe, T. Suga and Y. Uematsu, Dependability in Electronic Systems: Mitigation of Hardware Failures, Soft Errors, and Electr.

[10] C. L. Chen and M. Y. Hsiao, "Error-correcting codes for semiconductor memory applications: A state-of-the-art review," IBM J. Res. Develop., vol. 28, no. 2, pp. 124–134, Mar. 1984.

[11] P. Reviriego, C. J. Bleakley, and J. A. Maestro, "Structural DMR: A technique for implementation of soft-error-tolerant FIR filters," IEEE Trans. Circuits Syst. II: Exp. Briefs, vol. 58, no. 8, pp. 512–516, Aug. 2011.

[12] P. Reviriego, S. Pontarelli, C. Bleakley and J. A. Maestro, "Area efficient concurrent error detection and correction for parallel filters," IET Electron. Lett., vol. 48, no. 20, pp. 1258–1260, Sep. 2012.

[13] Z. Gao et al., "Fault tolerant parallel filters based on error correction codes," IEEE Trans. Very Large Scale Integr. Syst., vol. 23, no. 2, pp. 384–387, Feb. 2015.

[14] R. W. Hamming, "Error correcting and error detecting codes," Bell Sys. Tech. J., vol. 29, pp. 147–160, Apr. 1950.

[15] A. Chatterjee and M. A. d'Abreu, "The design of fault-tolerant linear digital state variable systems: Theory and techniques," IEEE Trans. Comput., vol. 42, no. 7, pp. 794–808, Jul. 1993.

[16] C. N. Hadjicostis, "Coding Approaches to Fault Tolerance in Dynamic Systems," Ph.D. dissertation, MIT Press, Cambridge, MA, USA, 1999.