

# Positive Feedback Adiabatic Logic for Power-Gating in Asynchronous Circuits

Rasneen Kombathiyil<sup>1</sup>, Prithviraj A<sup>2</sup>

<sup>1</sup>PG Scholar, <sup>2</sup>Assistant Professor

Abstract - Power gating shuts down selected power domains in a chip while leaving others active, which can significantly reduce leakage power. This project presents asynchronous fine-grain power-gated logic (AFPL). An AFPL circuit consists of efficient charge recovery logic (ECRL) gates and a handshake controller. The ECRL gates receive power and become active only when performing useful computations, so their leakage power dissipation is negligible. With the partial charge reuse (PCR) mechanism incorporated into the AFPL circuit, part of the charge on the output nodes of an ECRL gate entering the discharge phase is reused to charge the output nodes of another ECRL gate that is about to evaluate, which reduces energy dissipation and hence power consumption. The C\*-element in AFPL-PCR allows an ECRL gate to discharge early, as soon as its outputs are no longer required, without waiting for the next empty token to arrive at the stage. Hence, AFPL-PCR has lower delay and smaller area than AFPL without PCR. In the proposed method, positive feedback adiabatic logic (PFAL) gates are used instead of ECRL gates to reduce power dissipation, and a carry select adder (CSA) is implemented instead of a Kogge-Stone adder (KSA) to increase speed.

Keywords - Asynchronous fine-grain power-gated logic (AFPL), Efficient charge recovery logic (ECRL), Partial charge reuse (PCR), Positive feedback adiabatic logic (PFAL), Carry select adder (CSA), Kogge-Stone adder (KSA).

## 1. INTRODUCTION

In line with Moore's law, chip density keeps increasing, and power consumption has become a major problem for contemporary systems. Power dissipation has two components: static and dynamic [1]. Dynamic power dissipation is due to the charging and discharging of load capacitance. Static power dissipation is due to the leakage currents that flow when no signals are changing their values. The main sources of leakage current include subthreshold leakage, gate leakage, gate-induced drain leakage, and junction leakage [2]. Leakage power dissipation is becoming a significant contributor to the total power dissipation as threshold voltage, channel length, and gate oxide thickness continue to shrink. There are many techniques for reducing leakage power dissipation in CMOS circuits, including transistor stacking [3], reverse body biasing, dual-threshold CMOS, and power gating. Among these, power gating is one of the most effective. Power gating shuts down selected power domains in a chip while leaving others active, which can significantly reduce leakage power. It works by inserting sleep transistors (power gating transistors) between the power supply rails and the transistor stacks, which increases the effective resistance of the leakage paths. In active mode, the power gating transistors are on, and the resistance between the power supply and ground is low. In sleep mode, the power gating transistors are off, and the resistance between the power supply and ground is high, so leakage power dissipation decreases.

## 2. SYSTEM MODEL

In asynchronous circuits, local handshaking is used to transfer data between neighboring modules, so the modules are active only when performing useful work. Asynchronous circuits therefore switch only when active. An inactive asynchronous circuit has no dynamic dissipation but still has leakage dissipation [4]. For this reason, asynchronous circuits are power-gated at the gate level of granularity.

In previous work [5], each combinational block in a conventional asynchronous four-phase bundled-data pipeline is equipped with both a header and a footer sleep transistor. When the latch controller in a pipeline stage detects valid input data, it captures the data in the data latch and turns on the sleep transistors of the associated combinational block. When the output data have been received by the next pipeline stage, an acknowledge signal is sent back to this stage, and the latch controller can turn off the sleep transistors of the associated combinational block. This scheme has two disadvantages. First, the hardware overhead is large: each combinational block requires a standalone data latch, a complex latch controller comprising six logic gates and two C-elements, two sleep transistors, and an inverter chain for delay matching. Second, only the combinational blocks are power-gated, while the other hardware, including data latches, latch controllers, and inverter chains, still suffers leakage dissipation.

To avoid the problems related to clock generation and synchronous clock routing, Asynchronous Adiabatic Logic (AAL) was proposed as a way to combine the benefits of adiabatic logic circuits with those of asynchronous logic systems [6]. The overall system consists of two main blocks, namely the logical block and the Control and Regeneration (C&R) block. In an AAL system, as opposed to conventional synchronous adiabatic circuits, each adiabatic logic unit is not driven by an external clock phase. Instead, each block is controlled and powered by the control signal that the C&R block generates from the logical output of the previous stage, which is also the input to the current logical stage. The figure shows that the data-out signal of logical block 1 not only goes into logical block 2 as data input but is also used by C&R block 1 to generate a control signal for logical block 2.

The proposed AAL uses adiabatic logic gates to implement the logic part. Any differential adiabatic logic circuit may be used. The C&R block is the key to the concept of AAL, and the type of logic used in AAL mainly affects the architecture of this block.

## 3. PREVIOUS WORK

In this section, the AFPL architecture is presented. AFPL can be combined with the PCR mechanism. When AFPL incorporates the PCR mechanism, it is denoted by AFPL-PCR; otherwise, it is denoted by AFPL w/o PCR. The figure below shows the structure of the AFPL pipelines. In AFPL w/o PCR, a stage, denoted by Si, consists of an efficient charge recovery logic (ECRL) [7] gate Gi, which implements the logic function of the stage, and a handshake controller HCi, which handles handshaking with the neighboring stages and provides power to the ECRL gate Gi. The pipeline consists of successive stages Si, Si+1, and so on. In AFPL-PCR, a pipeline stage Si+1 has an additional unit, called the PCR unit, which controls charge reuse between pipeline stages Si (for example, the first stage) and Si+2 (for example, the third stage).

An asynchronous system is made up of many autonomous modules, each of which communicates with its neighboring modules when it needs to exchange information. The asynchronous system has no global clock for synchronizing these autonomous modules; instead, it uses local handshake signals, request and acknowledge. The handshake protocol used in the AFPL pipeline is the four-phase dual-rail protocol, in which the request signal is encoded into the data signals (i.e., there is no separate request signal). Here, n pairs of wires are required to encode n-bit data. For example, one bit of information d is encoded with a pair of wires d.t and d.f: (d.t, d.f) = (1, 0) for logic 1, (d.t, d.f) = (0, 1) for logic 0, and (d.t, d.f) = (0, 0) for the empty token. In the four-phase dual-rail protocol, transferring data from the sender to the receiver takes four actions:

- 1) The sender issues a valid codeword on the communication channel;
- 2) The receiver acquires the valid codeword from the communication channel and then asserts the acknowledge signal;
- 3) The sender responds by issuing an empty codeword (i.e., taking all data wires LOW) to indicate that the data on the communication channel are no longer valid; and
- 4) The receiver deasserts the acknowledge signal to complete the handshake.

The data stream in the AFPL pipeline is therefore a sequence of alternating valid tokens and empty tokens, so there is always a valid token between two consecutive empty tokens in the data stream.
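The dual-rail encoding and the four handshake steps above can be sketched in software as follows. This is only a behavioral illustration, not a circuit model; the function names (`encode_bit`, `four_phase_transfer`, and so on) and the least-significant-bit-first ordering are assumptions made for this example.

```python
EMPTY = (0, 0)  # empty token on one dual-rail pair (d.t, d.f)

def encode_bit(bit):
    """Encode one data bit as a dual-rail pair (d.t, d.f)."""
    return (1, 0) if bit else (0, 1)

def decode_pair(pair):
    """Return 1 or 0 for a valid pair, None for the empty token."""
    if pair == (1, 0):
        return 1
    if pair == (0, 1):
        return 0
    if pair == EMPTY:
        return None
    raise ValueError("(1, 1) is not a legal dual-rail codeword")

def encode_word(value, n):
    """Encode an n-bit integer as n dual-rail pairs (LSB first, by assumption)."""
    return [encode_bit((value >> i) & 1) for i in range(n)]

def is_valid(word):
    """A codeword is valid when every pair carries a data value."""
    return all(decode_pair(p) is not None for p in word)

def is_empty(word):
    """A codeword is empty when every pair is (0, 0)."""
    return all(p == EMPTY for p in word)

def four_phase_transfer(value, n):
    """Walk through the four handshake actions; return the data and a trace."""
    trace = []
    ack = 0
    channel = encode_word(value, n)      # 1) sender issues a valid codeword
    trace.append((list(channel), ack))
    assert is_valid(channel)
    received = sum(decode_pair(p) << i for i, p in enumerate(channel))
    ack = 1                              # 2) receiver acquires data, asserts ack
    trace.append((list(channel), ack))
    channel = [EMPTY] * n                # 3) sender issues an empty codeword
    trace.append((list(channel), ack))
    ack = 0                              # 4) receiver deasserts ack
    trace.append((list(channel), ack))
    return received, trace
```

Each call to `four_phase_transfer` leaves the channel empty with the acknowledge signal low, which is the state required before the next valid token can be sent.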



Fig.1. AFPL w/o PCR pipeline

In the AFPL-PCR pipeline, the arrival of a valid token at the third stage Si+2 forces the first stage Si to discharge and turns on the switch in PCRi+1. Part of the charge on the output nodes of the first gate Gi is reused by the third gate Gi+2 to charge its output nodes, which reduces energy dissipation. Because the C\*-element in AFPL-PCR cuts the power supply VDD off from Vpi as soon as the acknowledgement arrives, most of the charge flowing through the PCR unit comes from Vpi rather than from the power supply. In this paper, ECRL was chosen to implement the function blocks of AFPL. The figure below shows the structure of an ECRL inverter.

The power consumption of electronic devices can be reduced by adopting different design styles. Adiabatic logic is an attractive solution for such low-power applications. With adiabatic techniques, some of the energy stored in the load capacitance can be recycled instead of being dissipated as heat, and the energy dissipation in the PMOS network can be minimized.



Adiabatic logic is commonly used to reduce the energy lost during the charging and discharging of circuit nodes. Adiabatic logic families are also known as "energy recovery" or "charge recovery" logic families. As the name suggests, the energy stored during charging is recycled back to the power supply instead of being dissipated on its way to ground. This reduces the overall power dissipation and hence the power consumption. The main reason for the reduction is that adiabatic logic uses an AC power supply instead of a constant DC supply.
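The advantage of a slowly varying supply can be made concrete with the standard first-order charging model, which is textbook material rather than a result of this paper. The symbols are assumptions for this illustration: $C$ is the load capacitance, $R$ the resistance of the charging path, $V_{DD}$ the full supply swing, and $T$ the ramp time of the power clock.

$$
E_{\text{CMOS}} = \frac{1}{2} C V_{DD}^{2},
\qquad
E_{\text{adiabatic}} \approx \frac{RC}{T}\, C V_{DD}^{2} \quad (T \gg RC)
$$

Under this model, the energy dissipated per charging event in conventional CMOS is fixed by $C$ and $V_{DD}$, whereas in the adiabatic case it shrinks as the ramp time $T$ grows or as the path resistance $R$ falls.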

ECRL structures are available for different logic functions, such as AND, OR, NAND, NOR, and the inverter.

Hence, adiabatic logic families are better suited than conventional CMOS logic to low-power applications over a wide range of parameter variations.

In the existing method, a Kogge-Stone adder is implemented using ECRL logic gates, both with and without the PCR mechanism. Power dissipation is lower in AFPL with the PCR mechanism than in AFPL without it.

## 4. PROPOSED METHODOLOGY

In the proposed method, a carry select adder is implemented using ECRL and PFAL logic gates [8], both with and without the PCR mechanism. Delay and power dissipation are therefore expected to be lower than in the existing method.

Positive Feedback Adiabatic Logic (PFAL) structures are available for different logic functions, such as AND, OR, and the inverter. Here, two NMOS functional blocks realize the logic functions, and this logic family generates both positive and negative outputs. The major difference with respect to ECRL is that the core is made of two PMOSFETs and two NMOSFETs, rather than only two PMOSFETs as in ECRL. In PFAL, the functional blocks are in parallel with the PMOSFETs, so the equivalent resistance is smaller when the capacitance needs to be charged, and less energy is dissipated than in ECRL logic gates. During the recovery phase, the load capacitance gives energy back to the power supply, and the supplied energy decreases. This partial energy recovery circuit structure is called Positive Feedback Adiabatic Logic (PFAL).

The Carry Select Adder (CSA) is one of the fastest adders and is used in many data-processing processors to perform fast arithmetic functions. It is used in many computational systems to alleviate the problem of carry propagation delay by independently generating multiple carries and then selecting a carry to generate the sum. Generally, Ripple Carry Adders (RCAs) generate the partial sum and carry for both possible carry inputs, 0 and 1, and multiplexers (mux) then select the final sum and carry. By using the CSA, the delay can therefore be reduced relative to the existing method. The proposed block diagram is shown below:
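The carry-select principle described above can be illustrated with a small bit-level model. This is a functional sketch only, not the ECRL/PFAL gate-level design of this work; the block size of 4 bits and all function names are assumptions chosen for the example.

```python
def to_bits(value, n):
    """Return the n least significant bits of value, LSB first."""
    return [(value >> i) & 1 for i in range(n)]

def from_bits(bits):
    """Convert an LSB-first bit list back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))

def ripple_carry_add(a_bits, b_bits, carry_in):
    """Ripple carry adder over equal-length LSB-first bit lists."""
    sum_bits = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry

def carry_select_add(a, b, n=8, block=4, carry_in=0):
    """Add two n-bit numbers block by block using carry selection."""
    a_bits, b_bits = to_bits(a, n), to_bits(b, n)
    result = []
    carry = carry_in
    for start in range(0, n, block):
        a_blk = a_bits[start:start + block]
        b_blk = b_bits[start:start + block]
        # Both candidate results are computed independently of the real carry.
        sum0, cout0 = ripple_carry_add(a_blk, b_blk, 0)
        sum1, cout1 = ripple_carry_add(a_blk, b_blk, 1)
        # The multiplexer picks one candidate once the incoming carry is known.
        if carry:
            result.extend(sum1)
            carry = cout1
        else:
            result.extend(sum0)
            carry = cout0
    return from_bits(result), carry
```

In hardware, the two ripple carry adders of a block operate in parallel, so only the multiplexer delay is added per block once the incoming carry arrives; the sequential loop here models the function, not the timing.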



Fig.3. Proposed block diagram

These are the following steps in the proposed work:

- 1) Split the adders (KS or CSA) into stages depending on the bit length
- 2) Construct each gate in the adder stages using ECRL or PFAL logic.
- 3) Design Handshake Controller using Completion detector and C-element
- 4) Check for present stage input and acknowledgement from the next stage
- 5) Enable the corresponding stage based on HC output
- 6) Continue this process for all stages to get adder output.
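Steps 3 to 5 above can be sketched behaviorally as follows. This is a simplified illustration under stated assumptions, not the handshake controller circuit of AFPL: the class and function names are hypothetical, and the enable rule (a C-element combining "input token valid" with "no acknowledgement pending from the next stage") is an assumed, commonly used arrangement.

```python
class CElement:
    """Muller C-element: the output follows the inputs when they all agree
    and holds its previous value otherwise."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, *inputs):
        if all(inputs):
            self.out = 1
        elif not any(inputs):
            self.out = 0
        return self.out

def completion_detect(word):
    """Dual-rail completion detector.

    Returns 1 if every (t, f) pair is valid, 0 if every pair is empty,
    and None while the token is only partially present.
    """
    flags = [t | f for (t, f) in word]
    if all(flags):
        return 1
    if not any(flags):
        return 0
    return None

class HandshakeController:
    """Hypothetical stage controller: enables the stage when a complete
    valid token is present and the next stage is not acknowledging."""

    def __init__(self):
        self.c = CElement()

    def step(self, input_word, ack_next):
        done = completion_detect(input_word)
        if done is None:
            return self.c.out          # partial token: hold current state
        return self.c.update(done, 1 - ack_next)
```

With this rule, the stage is enabled when a valid token arrives, stays enabled until the input returns to empty after the next stage has acknowledged, and then stays disabled until the next valid token.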

In total, four combinations are designed, each with and without the PCR mechanism:

- 1) Kogge Stone (KS) adder with ECRL logic
- 2) Kogge Stone (KS) adder with PFAL logic
- 3) Carry Select (CS) adder with ECRL logic
- 4) Carry Select (CS) adder with PFAL logic

## 5. SIMULATION/EXPERIMENTAL RESULTS

The AFPL architecture is synthesized for the Spartan 2E starter board, which serves as the evaluation and development board. The family is Spartan 2E, the device is XC2S600E, the package is FG676, and the speed grade is -6. The top-level source type is HDL, the synthesis tool is XST (VHDL/Verilog), and the simulator is ISE Simulator (VHDL/Verilog). The power analysis is done using XPower. The power analysis of AFPL with and without PCR is obtained as follows:

The actual parameter values and the HTML power report obtained from XPower are shown here for the first combination, with and without PCR. The actual parameter values of AFPL w/o PCR are given below. The voltage is shown in V, the current in mA, and the power in mW. The startup current is 500 mA, Vccint is 1.8 V, and the total power is 818.55 mW.

Table-1: Actual parameter values of AFPL w/o PCR

|                 | Voltage<br>(V) | Current<br>(mA) | Power<br>(mW) |
|-----------------|----------------|-----------------|---------------|
| Vccint          | 1.8            |                 |               |
| Dynamic         |                | 436.08          | 784.95        |
| Quiescent       |                | 15.00           | 27.00         |
| Vcco33          | 3.3            |                 |               |
| Dynamic         |                | 0.00            | 0.00          |
| Quiescent       |                | 2.00            | 6.60          |
| Total Power     |                |                 | 818.55        |
| Startup Current |                | 500.00          |               |
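The power figures in Table 1 follow from P = V × I on each supply rail. The short check below recomputes them from the tabulated voltages and currents; the dictionary layout and function names are assumptions made for this sketch, and small differences in the last digit can arise from rounding in the report.

```python
# Rail voltages (V) and currents (mA) copied from Table 1.
RAILS = {
    "Vccint": {"voltage": 1.8, "dynamic_mA": 436.08, "quiescent_mA": 15.00},
    "Vcco33": {"voltage": 3.3, "dynamic_mA": 0.00, "quiescent_mA": 2.00},
}

def rail_power_mw(rail):
    """Return dynamic and quiescent power in mW for one rail (P = V * I)."""
    v = rail["voltage"]
    return {
        "dynamic": v * rail["dynamic_mA"],
        "quiescent": v * rail["quiescent_mA"],
    }

def total_power_mw(rails):
    """Sum dynamic and quiescent power over all rails, in mW."""
    return sum(sum(rail_power_mw(r).values()) for r in rails.values())

if __name__ == "__main__":
    for name, rail in RAILS.items():
        print(name, rail_power_mw(rail))
    print("Total (mW):", total_power_mw(RAILS))
```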

The HTML power report of AFPL w/o PCR is shown in Table 2. The current is given in mA and the power in mW. The current drawn by the inputs is 7 mA, and the corresponding power is 13 mW. Vccint is 1.8 V, and the total estimated power consumption is 819 mW.

Table-2: HTML power report of AFPL w/o PCR

| Power summary                     | I (mA) | P (mW) |
|-----------------------------------|--------|--------|
| Total estimated power consumption |        | 819    |
| Vccint 1.80V:                     | 451    | 812    |
| Vcco33 3.30V:                     | 2      | 7      |
| Inputs:                           | 7      | 13     |
| Logic:                            | 349    | 629    |
| Outputs:                          |        |        |
| Vcco33                            | 0      | 0      |
| Signals:                          | 80     | 143    |
| Quiescent Vccint 1.80V:           | 15     | 27     |
| Quiescent Vcco33 3.30V:           | 2      | 7      |

In the first combination, i.e., the Kogge Stone (KS) adder with ECRL logic, the power consumption is 819 mW without the PCR mechanism and 711 mW with it. The reduction is attributed to the PCR mechanism used with power gating. In all other combinations, the power consumption is also reduced by incorporating the PCR mechanism, as shown in the table below.

Table-3: Power consumption comparison

| ADDER WITH<br>LOGIC         | Power<br>consumption<br>without PCR | Power<br>consumption with<br>PCR |
|-----------------------------|-------------------------------------|----------------------------------|
| KS adder with<br>ECRL logic | 819 mW                              | 711 mW                           |
| KS adder with<br>PFAL logic | 794 mW                              | 525 mW                           |
| CS adder with<br>ECRL logic | 604 mW                              | 136 mW                           |
| CS adder with<br>PFAL logic | 103 mW                              | 76 mW                            |
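The relative saving contributed by PCR in each combination can be derived directly from Table 3. The snippet below is a small sketch of that arithmetic; the data values are copied from the table, while the names `POWER_MW` and `reduction_percent` are introduced here only for illustration.

```python
# (power without PCR, power with PCR) in mW, copied from Table 3.
POWER_MW = {
    "KS adder with ECRL logic": (819, 711),
    "KS adder with PFAL logic": (794, 525),
    "CS adder with ECRL logic": (604, 136),
    "CS adder with PFAL logic": (103, 76),
}

def reduction_percent(without_pcr, with_pcr):
    """Percentage reduction obtained by adding PCR."""
    return 100.0 * (without_pcr - with_pcr) / without_pcr

def lowest_power_design(table):
    """Name of the combination with the lowest power when PCR is used."""
    return min(table, key=lambda name: table[name][1])

if __name__ == "__main__":
    for name, (without_pcr, with_pcr) in POWER_MW.items():
        saving = reduction_percent(without_pcr, with_pcr)
        print(f"{name}: {saving:.1f}% lower with PCR")
    print("Lowest power with PCR:", lowest_power_design(POWER_MW))
```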

The C\*-element in AFPL-PCR allows an ECRL gate to discharge early, as soon as its outputs are no longer required, without waiting for the next empty token to arrive at the stage. For example, with the PCR mechanism, the third stage is activated as soon as the data arrive at the first stage. Without PCR, the third stage is activated based on the arrival of its own input and the acknowledgement from the fifth stage. This is why the delay is lower in AFPL-PCR than in AFPL without PCR. In all combinations, the delay is reduced by incorporating the PCR mechanism, as shown in the table below.



Gate count calculation in the Xilinx tools is based on look-up tables (LUTs). Depending on placement and routing, three gates may, for example, share the same LUT or occupy three different LUTs. This explains the variation in area. The gate count is reduced by incorporating the PCR mechanism, as shown in the table below.

Table-5: Gate count comparison

| ADDER WITH<br>LOGIC         | GATE COUNT<br>WITHOUT PCR | GATE COUNT<br>WITH PCR |
|-----------------------------|---------------------------|------------------------|
| KS adder with<br>ECRL logic | 6597                      | 4600                   |
| KS adder with<br>PFAL logic | 6540                      | 4107                   |
| CS adder with<br>ECRL logic | 5027                      | 4021                   |
| CS adder with<br>PFAL logic | 3039                      | 750                    |

Simulations have been done using ModelSim. An 8-bit Kogge Stone adder and an 8-bit carry select adder are implemented using ECRL and PFAL logic gates. The output waveforms of the Kogge Stone adder and the CS adder are given below.

[ModelSim waveform screenshot. Signals shown: /ecrl_ks/ra, /ecrl_ks/vb, /ecrl_ks/vc, /ecrl_ks/rc, /ecrl_ks/a, /ecrl_ks/vs, /ecrl_ks/vouts1, /ecrl_ks/routs1, /ecrl_ks/vps1, /ecrl_ks/vouts2, /ecrl_ks/routs2]
Fig.7. Output waveform of Kogge Stone adder

[ModelSim waveform screenshot. Signals shown: /csa_8/va, /csa_8/vb, /csa_8/vc, /csa_8/a, /csa_8/rs, /csa_8/vouts1, /csa_8/vps1, /csa_8/routs3, /csa_8/vps3, /csa_8/routs4, /csa_8/vps4]
Fig.8. Output waveform of carry select adder


## 6. CONCLUSION

Adiabatic logic families are used for low-power applications because of their lower power consumption compared with conventional CMOS logic families. Here, the Kogge Stone adder and the carry select adder are designed using ECRL and PFAL logic gates together with a power-gating scheme that uses the PCR mechanism. Implementing the carry select adder with the PFAL logic family and the AFPL-PCR mechanism can reduce static power dissipation and increase speed. PFAL gates consume less power than ECRL gates because of their lower equivalent resistance during charging. The CSA is used in this project, instead of the Kogge Stone adder used in the existing method, in order to reduce delay. Simulations have been done using ModelSim, the AFPL architecture is synthesized for the Spartan 2E starter board, and the power analysis is done using XPower. The results show that the total power consumption can be decreased to a great extent by using the PCR mechanism, and that area and delay are also reduced. The proposed design is therefore an efficient method for achieving low power consumption and high speed.

## 7. FUTURE SCOPE

As future work, other types of adders and multipliers can be implemented using ECRL and PFAL logic gates, with the PCR mechanism incorporated as a power-gating technique.

## REFERENCES

- [1] M.-C. Chang and W.-H. Chang, "Asynchronous fine-grain power-gated logic," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, Jun. 2013.
- [2] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits," Proc. IEEE, vol. 91, no. 2, pp. 305–327, Feb. 2003.
- [3] S. Narendra, S. Borkar, V. De, D. Antoniadis, and A. Chandrakasan, "Scaling of stack effect and its application for leakage reduction," in Proc. Int. Symp. Low Power Electron. Design, pp. 195–200, 2001.
- [4] C. Ortega, J. Tse, and R. Manohar, "Static power reduction techniques for asynchronous circuits," in Proc. IEEE Symp. Asynchronous Circuits Syst., May 2010.
- [5] T. Lin, K.-S. Chong, B.-H. Gwee, and J. S. Chang, "Fine-grained power gating for leakage and short-circuit power reduction by using asynchronous-logic," in Proc. IEEE Int. Symp. Circuits Syst., pp. 3162–3165, May 2009.
- [6] M. Arsalan and M. Shams, "Asynchronous adiabatic logic," in Proc. IEEE Int. Symp. Circuits Syst., pp. 3720–3723, May 2007.
- [7] Y. Moon and D. K. Jeong, "An efficient charge recovery logic circuit," IEEE J. Solid-State Circuits, vol. 31, no. 4, pp. 514–522, Apr. 1996.
- [8] J. Fischer, E. Amirante, A. Bargagli-Stoffi, and D. Schmitt-Landsiedel, "Improving the positive feedback adiabatic logic family," Advances in Radio Science, pp. 221–225, 2004.

## AUTHORS' PROFILE

**Rasneen Kombathiyil** has received her B.Tech degree in Electronics and Communication Engineering from M.E.S College of Engineering, Kuttippuram in the year 2011. At present she is pursuing M.Tech in VLSI Design at Nehru College of Engineering and Research Centre, Thrissur. Her areas of interest include PCB board design, Communication and VLSI Design.

**Prithviraj A** has received his B.Tech degree in Electronics and Communication Engineering in the year 2009 from Amrita Vishwa Vidya Peetham and M.E. Degree in VLSI Design in the year 2011 from Amrita Vishwa Vidya Peetham. At present he is working as an Assistant Professor in the Department of Electronics and Communication Engineering at Nehru College of Engineering and Research Centre, Thrissur. His areas of interest are VLSI Design and communication systems.

ISSN: 2395-2946
