### Precomputation-based Guarding for Dynamic and Leakage Power Reduction

Afshin Abdollahi and Massoud Pedram, University of Southern California; Farzan Fallah and Indradeep Ghosh, Fujitsu Labs of America

The 21<sup>st</sup> International Conference on Computer Design, October 13-15, 2003, San Jose, California



# Introduction

- Sources of Power Consumption
  - Switching Power
  - Leakage Currents
  - Short Circuit Currents
- Power Reduction Techniques
  - Clock Gating (for Switching Power)
  - Power Supply Gating (for Leakage Current)











### **Our Approach**

- Combines precomputation and ground gating
- Reduces both switching and leakage power
- Solves the input sharing problem
- Can also disable the registers when the input sharing problem does not exist



# Adders

- Partition the adder into a most significant part (MSP) and a least significant part (LSP).
- The goal is to disable the MSP whenever it is not needed.
- If the sign-extension bits of both operands cover the entire MSP range, the MSP is disabled.

|     | MSP       | LSP       |
|-----|-----------|-----------|
| A   | 1111,1111 | 1101,0011 |
| B   | 0000,0000 | 0011,0110 |
| Sum | 0000,0000 | 0000,1001 |
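The 16-bit example above (A = -45, B = 54, Sum = 9) can be traced in a few lines of Python. This is only a behavioral sketch, not the circuit from the talk: the function name `guarded_add`, the 8-bit split, and in particular the way the MSP result is filled in while the MSP is disabled (replicating a single sign bit derived from the LSP carry) are assumptions made for illustration.

```python
def guarded_add(a, b, width=16, lsp_bits=8):
    """Two's-complement add that skips the MSP when both operands are
    sign extensions of their LSP. Returns (sum, msp_disabled)."""
    mask = (1 << width) - 1
    lsp_mask = (1 << lsp_bits) - 1
    msp_mask = (1 << (width - lsp_bits)) - 1
    a &= mask
    b &= mask

    def is_sign_extended(x):
        # All MSP bits plus the top LSP bit must be identical.
        top = x >> (lsp_bits - 1)
        return top == 0 or top == (1 << (width - lsp_bits + 1)) - 1

    # Precomputation: decide from the inputs alone whether the MSP is needed.
    msp_disabled = is_sign_extended(a) and is_sign_extended(b)

    # The LSP always operates.
    lsp_sum = (a & lsp_mask) + (b & lsp_mask)
    carry = lsp_sum >> lsp_bits
    lsp_result = lsp_sum & lsp_mask

    if msp_disabled:
        # Assumed bypass: every MSP output bit equals the sign of the
        # (lsp_bits + 1)-bit sum, i.e. a_sign XOR b_sign XOR carry.
        sign = ((a >> (lsp_bits - 1)) ^ (b >> (lsp_bits - 1)) ^ carry) & 1
        msp_result = msp_mask if sign else 0
    else:
        msp_result = ((a >> lsp_bits) + (b >> lsp_bits) + carry) & msp_mask

    return (msp_result << lsp_bits) | lsp_result, msp_disabled


total, disabled = guarded_add(0b1111111111010011, 0b0000000000110110)
print(format(total, "016b"), disabled)
```

For the operands in the table, the upper eight bits of A are all ones and those of B are all zeros, so the check succeeds and only the 8-bit LSP addition is performed.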



















# Results

- 32-bit functional units
- Process
  - 70nm technology
  - Supply voltage = 0.9V
  - NMOS transistors' threshold voltage = 0.2V
  - PMOS transistors' threshold voltage = -0.22V
  - Sleep transistors' threshold voltage = 0.5V
- Simulation Tools
  - PowerMill to estimate power at the transistor level
  - SPICE to estimate the delay of circuits
- Test Bench
  - Trace of 1000 vectors from the ALU in the data path of a processor executing a JPEG decoder program
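How much a guarded adder saves depends on how often the trace lets the MSP stay off. The sketch below estimates that fraction for a list of operand pairs at several partition points. The helper names and the four-entry trace are hypothetical stand-ins; the actual JPEG decoder trace is not reproduced here, and the printed rates say nothing about the measured results.

```python
def msp_can_be_disabled(a, b, width=32, lsp_bits=16):
    """True if both operands are sign extensions of their low lsp_bits bits."""
    mask = (1 << width) - 1
    all_ones = (1 << (width - lsp_bits + 1)) - 1

    def is_sign_extended(x):
        top = (x & mask) >> (lsp_bits - 1)
        return top == 0 or top == all_ones

    return is_sign_extended(a) and is_sign_extended(b)


def guard_rate(trace, width=32, lsp_bits=16):
    """Fraction of operand pairs for which the MSP could be disabled."""
    hits = sum(1 for a, b in trace if msp_can_be_disabled(a, b, width, lsp_bits))
    return hits / len(trace)


# Hypothetical operand pairs, for illustration only.
trace = [(5, 7), (-3, 100), (70000, 1), (-70000, -2)]

for lsp_bits in (4, 16, 20):
    print(lsp_bits, guard_rate(trace, width=32, lsp_bits=lsp_bits))
```

A wider LSP lets more vectors qualify but leaves a smaller MSP to switch off, which is the trade-off behind comparing several bit-widths.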



# Results (Adder)

| Method                   | Regs | Adder | Total Saving |
|--------------------------|------|-------|--------------|
| 12-bit signal-gating     | 32%  | 25%   | 26%          |
| 12-bit operand-isolation | 0%   | 21%   | 17%          |
| 12-bit guarding          | 44%  | 25%   | 38%          |
| 14-bit signal-gating     | 34%  | 35%   | 27%          |
| 14-bit operand-isolation | 0%   | 28%   | 15%          |
| 14-bit guarding          | 50%  | 28%   | 41%          |
| 16-bit signal-gating     | 35%  | 45%   | 27%          |
| 16-bit operand-isolation | 0%   | 35%   | 15%          |
| 16-bit guarding          | 55%  | 52%   | 48%          |
| 18-bit signal-gating     | 8%   | 49%   | 1%           |
| 18-bit operand-isolation | 0%   | 36%   | 2%           |
| 18-bit guarding          | 12%  | 23%   | 3%           |

# Results (Multiplier)

| Method                                                          | Regs | Multiplier | Total Saving |
|-----------------------------------------------------------------|------|------------|--------------|
| 22_bit_input1 Gating<br>16_bit_input2 Gating                    | 41%  | 58%        | 49%          |
| 22_bit_input1 Guarding<br>16_bit_input2 Guarding                | 43%  | 60%        | 52%          |
| 22_bit_input1 Guarding<br>input2_dynamic                        | 49%  | 67%        | 56%          |
| 22&16_bit_input1 Hybrid Guarding<br>input2_dynamic Guarding     | 51%  | 71%        | 60%          |

| Circuit    | Guarding Method                                                | Delay Overhead | Area Overhead |
|------------|----------------------------------------------------------------|----------------|---------------|
| Comparator | 10-bit Guarding                                                | 25%            | 30%           |
| Adder      | 18-bit Guarding<br>(reduced switching activity)                | 15%            | 10%           |
| Multiplier | 22&16_bit_input1 hybrid<br>guarding<br>input2_dynamic guarding | 9%             | 6%            |

### Results (FR500 VLIW processor)

- Technology: 0.18um
- Supply voltage: 1.8V
- Used an instruction set simulator to generate a trace.
- Used the precomputation-based guarding method for the ADD/SUB module.
- Used full guarding for other modules to disconnect them from the ground when they were not used to execute any instruction.

### Results (FR500 VLIW processor, continued)

- Proposed guarding approach:
  - Power reduction = 81%
  - Area overhead = 9%
  - Delay overhead = 12%
- Operand isolation:
  - Power reduction = 58%
  - Area overhead = 11%
- Precomputation combined with operand isolation:
  - Power reduction = 61%
  - Area overhead = 14%

# **Conclusions and Future Work**

#### Conclusions

- Combines precomputation with power supply gating
- Reduces both switching and leakage power
- Solves the input sharing problem, or disables the registers when the input sharing problem does not exist
- Achieves at least 20% higher power savings compared to existing methods

#### Future work

- Developing a hierarchical precomputation architecture
- Developing an algorithm for near-optimal dynamic guarding of any circuit