Massoud Pedram

Stephen and Etta Varra Professor

Department of EE-Systems

University of Southern California



1999-2013 Projects

 

 

USC SPORT: System Power Optimization and Regulation Technologies

Project URL: SPORT Lab

We investigate power estimation and low-power design of CMOS VLSI circuits and systems at all abstraction levels. Our emphasis is on developing mathematically rigorous analysis and optimization algorithms and power-aware design methodologies for solving various problems of practical interest and import. Our most recent work has focused on energy-efficient enterprise computing, reliability-power efficiency tradeoffs in VLSI circuits and systems, design of hybrid energy storage systems, dynamic power/thermal management in chip multiprocessors, core-level voltage and frequency scaling, low-power displays, ASIC design with power gating and multiple voltage islands, and current-source-based modeling of power and timing in VLSI circuits. More details about various ongoing projects are included below.

 

Architecture Development and Design Optimization of Hybrid Electrical Energy Storage Systems

There are examples of actual deployment of grid-scale electrical energy storage (EES) systems to mitigate the gap between supply and demand. In addition, most stand-alone renewable energy sources, such as solar energy, wind power, and hydropower, require an EES system. However, current EES systems are mainly homogeneous, that is, they consist of a single type of EES element, and therefore suffer from a fundamental shortcoming that will plague every homogeneous EES: key metrics (normalized with respect to capacity) of any homogeneous EES cannot be better than those of its individual storage elements. Consequently, a homogeneous approach is not viable for any system where none of the existing types of EES elements can fulfill all the required performance metrics, such as power density, energy density, cost per unit capacity, weight per unit capacity, round-trip efficiency, cycle life, and environmental effects. This limitation is preventing the wide adoption of a range of socially and economically useful technologies, such as grid-scale EES and electric vehicles (EVs), and is causing significant inefficiencies for many others. Hence, elimination of this limitation of homogeneous EES systems is the primary motivation for our research.

Our approach for improving the performance of EES systems is to exploit different types of EES elements, where each type has its unique strengths and weaknesses, to design hybrid EES system architectures and control policies that dramatically improve the key performance characteristics of the storage system. This approach exploits fundamental properties that give a hybrid electrical energy storage (HEES) system the potential to achieve a combination of performance metrics that is superior to that of any of its individual EES components. In fact, in some cases, it is possible for a HEES system to attain values of individual metrics that are close to their respective best values across its constituent EES elements. For example, a HEES system may achieve the power density of its constituent EES component that has the highest power density (which is likely to have the highest cost) and, at the same time, achieve a cost close to that of its cheapest component (which is likely to have low power density). Simply speaking, we pursue HEES since it holds the promise of providing us with the best of all worlds. Such dramatic improvements can be provided only by a HEES system that is well-designed and well-controlled. Hence, development of design and control techniques for HEES is our goal.

Tutorial given at the 2011 International Symposium on Quality Electronic Design, Santa Clara, CA -- Hybrid Electrical Energy Storage Systems

As of today, no single type of electrical energy storage (EES) element fulfills high energy density, high power delivery capacity, low cost per unit of storage, long cycle life, low leakage, and so on, at the same time. Following a review of conventional EES, we introduce a HEES (hybrid EES) system comprising heterogeneous EES elements based on the concepts of computer memory hierarchy. We introduce HEES design considerations aiming at optimal charge management for various cost metrics.

 

Design, Optimization, and Mapping of Quantum Algorithms to Quantum Circuit Fabrics

Quantum information processing has captivated atomic and optical physicists as well as theoretical computer scientists by promising a model of computation that can improve the complexity class of several challenging problems. To perform efficient quantum computation, one needs an efficient set of computer-aided design tools in addition to a favorable complexity class and the ability to control quantum mechanical systems with high fidelity and long coherence times. This is comparable to the classical domain, where a Turing machine, a high clock speed, and error-free switching were not sufficient to design fast modern computers. Quantum circuit and layout design with algorithmic techniques and CAD tools are the focus of our research. We conduct research that spans the areas of computer programming, data structures and algorithms, and optimization while maintaining a strong relevance to quantum computing. For quantum circuit design, our research has resulted in a systematic synthesis framework with favorable results for some families of functions, including modular exponentiation [6], quantum adders and multiplexers [1], and complex Toffoli gates [2]. In quantum layout design, we proposed several techniques to design quantum fabrics that use either the MOVE [7] or the SWAP [3,5] operation to change the location of quantum information, or that approximate the communication overhead [4].
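To make the cost of SWAP-based communication concrete, the following is a minimal sketch, assuming qubits are placed on a line (a linear nearest neighbor arrangement) where a two-qubit gate can act only on adjacent positions. The function names and the naive routing policy (always move the first operand toward the second and never move it back) are illustrative assumptions, not the techniques of the papers listed below.

```python
def swaps_for_gate(pos_a: int, pos_b: int) -> int:
    """SWAPs needed to make two qubits on a line adjacent."""
    return max(abs(pos_a - pos_b) - 1, 0)


def naive_swap_overhead(num_qubits, gates):
    """Count the SWAPs inserted for a list of two-qubit gates.

    placement[q] is the current line position of logical qubit q.
    For each gate (a, b), qubit a is swapped toward b one position at a
    time until the two are adjacent; displaced qubits shift by one.
    """
    placement = list(range(num_qubits))  # logical qubit -> line position
    total = 0
    for a, b in gates:
        while abs(placement[a] - placement[b]) > 1:
            step = 1 if placement[b] > placement[a] else -1
            target = placement[a] + step
            # Logical qubit currently sitting at the target position.
            other = placement.index(target)
            placement[a], placement[other] = placement[other], placement[a]
            total += 1
    return total
```

Because every SWAP changes the placement seen by later gates, the total overhead depends on the gate order and the initial placement, which is what makes the layout problem an optimization problem rather than a simple sum of distances.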

For more information, see the following papers.

  • Afshin Abdollahi, Mehdi Saeedi, and Massoud Pedram, "Reversible Logic Synthesis by Quantum Rotation Gates," Quantum Information and Computation, Vol. 13, No. 9-10, pp. 0771-0792, 2013 (arXiv:1302.5382).

  • Mehdi Saeedi and Massoud Pedram, "Linear-Depth Quantum Circuits for n-qubit Toffoli gates with no Ancilla" (arXiv:1303.3557), 2013.

  • Mehdi Saeedi, Alireza Shafaei, and Massoud Pedram, "Constant-Factor Optimization of Quantum Adders on 2D Quantum Architectures," to appear in 5th Conference on Reversible Computation (RC), 2013 (arXiv:1304.0432).

  • Mohammad Javad Dousti and Massoud Pedram, "LEQA: Latency Estimation for a Quantum Algorithm Mapped to a Quantum Circuit Fabric," to appear in Proc. of the 50th Design Automation Conf. (DAC), Jun. 2013.

  • Alireza Shafaei, Mehdi Saeedi, and Massoud Pedram, "Optimization of Quantum Circuits for Interaction Distance in Linear Nearest Neighbor Architectures," to appear in Proc. of the 50th Design Automation Conf. (DAC), Jun. 2013.

  • Alireza Shafaei, Mehdi Saeedi, and Massoud Pedram, "Reversible Logic Synthesis of k-Input, m-Output Lookup Tables," Design Automation and Test in Europe (DATE), Mar. 2013.

  • Mohammad Javad Dousti and Massoud Pedram, "Minimizing the Latency of Quantum Circuits during Mapping to the Ion-Trap Circuit Fabric," Design Automation and Test in Europe (DATE), Mar. 2012.

  •  

    Designing Reliable and Power-Efficient VLSI Circuits and Systems

    Digital information management is the key enabler for the unparalleled rise in productivity and efficiency gains experienced by the world economies. Computing and information processing systems are important elements of the world's digital infrastructure, providing ever-present and ever-increasing general-purpose and data-driven processing and storage capabilities for both wired and mobile users. As such, they are also significant drivers of economic growth and social change. However, continued expansion of computing and information processing systems is now hindered by their unsustainable and rising power needs, with associated electrical energy costs and peak power draw requirements. Moreover, governments, people, and corporations are becoming increasingly concerned about the environmental impact of these systems, i.e., their carbon footprint. Separately from all this, with the increasing levels of variability in the characteristics of nanoscale CMOS devices and on-chip interconnects and continued uncertainty in the operating conditions of VLSI circuits, achieving power efficiency and high performance in computing and information processing systems under process, voltage, and temperature variations as well as interconnect wear-out and device aging has become a daunting, yet vital, task.

    Keynote speech given at the 2011 International Symp. on Physical Design, Santa Barbara, CA -- Robust Design of Power-Efficient VLSI Circuits

    It is against the backdrop of rising power demands and energy costs as well as increased device- and circuit-level variability and aging effects that I present a number of best practices and methods for improving the power-performance efficiency of VLSI circuits and systems. The reviewed techniques range from dynamic power management to design of power-aware circuits, and from power/clock gating to leakage power minimization. A key issue to be addressed is how to deal with process and environment-induced variability of circuit parameters through statistical modeling and robust optimization and how to manage uncertainty about the workload and input data characteristics through observations and closed feedback loop control.

     

    Green Computing: Reducing Energy Cost and Carbon Footprint of Information Processing Systems

    Our research aims to develop technical approaches for improving energy efficiency in enterprise computing systems and data centers, ranging from server-level power/thermal management to energy balancing and HVAC control in the data center to application software with built-in power tuning levers. This is a critically important topic with many different beneficiaries and players and excellent opportunities for research and development.

    Keynote speech given at the 2010 International Workshop on IT and Future Society, Jeju Island, South Korea -- Energy Efficient Enterprise Computing and Green Datacenters

    Datacenters provide the supporting infrastructure for a wide range of economic activities based on digital information. As such, they are extremely important drivers of economic growth. They are also at the center of societal changes enabling new media for cyber-social interactions. However, the continued growth of datacenters is now hindered by their unsustainable and rising energy needs. Apart from datacenter energy consumption and associated costs, corporations and governments are also concerned about the environmental impact of datacenters, in terms of their CO2 footprint. In my talk I will describe a number of techniques for improving the energy efficiency of enterprise computing platforms and datacenters ranging from task scheduling and server consolidation to combined power and cooling optimizations and adaptive control algorithms built on a variety of mathematical optimization frameworks.

    Lecture given at the 2009 SIGDA Design Automation Summer School -- Energy-Efficient Computing

    The increasing demand for higher processing power and storage capacity, along with the shift to high-density computing, is driving the energy expenses of data centers through the roof. A data center is a facility used to house computer systems and associated components, such as telecommunications and storage systems. Data centers sit at the center of the ICT ecosystem. Indeed, by the end of 2009, energy costs will emerge as the second-highest operating cost (behind labor) in 70% of data center facilities worldwide. According to an Environmental Protection Agency report, data centers in the US alone consumed about 61 billion kilowatt-hours in 2006 for a total electricity cost of about $4.5 billion. If current trends continue, this demand would double by 2012. In an energy-constrained world, this level of consumption is unsustainable and comes at increasingly unacceptable social and environmental costs. Data center energy efficiency has thus become a public policy concern, and it is imperative that data centers implement efficient methods to minimize their energy use.
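As a quick check on the EPA figures quoted above, the two numbers together imply an average electricity price. The projected cost on the right additionally assumes, purely for illustration, that this price stays constant while consumption doubles.

```latex
\[
\frac{\$4.5 \times 10^{9}}{61 \times 10^{9}\ \text{kWh}} \approx \$0.074/\text{kWh},
\qquad
2 \times 61 \times 10^{9}\ \text{kWh} \times \$0.074/\text{kWh} \approx \$9 \times 10^{9}.
\]
```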

     

    Stochastic Approaches for Dynamic Thermal Management in High Performance Microprocessor Chips

    Sponsor: National Science Foundation - Computer Systems Research

    Project Summary: Peak power dissipation and the resulting temperature rise have become the dominant limiting factors to processor performance and a significant component of its design cost. Expensive packaging and heat removal solutions are needed to achieve acceptable substrate and interconnect temperatures in high-performance microprocessors. Current thermal solutions are designed to limit the peak processor power dissipation to ensure its reliable operation under worst-case scenarios. However, the peak power and ensuing peak temperature are hardly ever observed. Dynamic thermal management (DTM) has been proposed as a class of micro-architectural solutions and software strategies to achieve the highest processor performance under a peak temperature limit. When the chip approaches its thermal limit, a DTM controller initiates hardware reconfiguration, slow-down, or shutdown to lower the chip temperature. Possible response mechanisms include micro-architectural adaptations (e.g., fetch toggling, register file resizing, and issue width reduction) and/or on-the-fly performance adjustment (e.g., dynamic voltage and frequency scaling and functional unit shut-down). The proposed research aims to develop a new DTM solution that takes a global, predictive approach based on constructing and utilizing a continuous-time Markovian decision process model of the microprocessor chip and the application programs. The offline algorithms developed within this framework are provably optimal, whereas the online versions of these algorithms are easily deployable and highly flexible. The project thus produces temperature-aware policies and techniques for ensuring that microprocessor chips operate within the allowed temperature zone while delivering the maximum possible performance without being over-designed.

    A Stochastic Local Hot Spot Alerting Technique -- In an ASPDAC-08 conference paper, we addressed the questions of how and when to identify and issue a hot spot alert in a microprocessor. These are important questions since temperature reports by thermal sensors may be erroneous, noisy, or arrive too late to enable effective application of thermal management mechanisms to avoid chip failure. More precisely, we presented a stochastic technique for identifying and reporting local hot spots under probabilistic conditions induced by uncertainty in the chip junction temperature and the system power state. In particular, we introduced a stochastic framework for estimating the chip temperature and the power state of the system based on a combination of Kalman Filtering (KF) and a Markovian Decision Process (MDP) model. Experimental results demonstrated the effectiveness of the framework and showed that the proposed technique alerts about thermal threats accurately and in a timely fashion in spite of noisy or sometimes erroneous readings by the temperature sensor.
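The filtering half of this idea can be sketched with a scalar Kalman filter that smooths noisy sensor readings and raises an alert only when the estimated probability of exceeding a temperature limit is high. This is a minimal sketch under stated assumptions: the random-walk temperature model, the noise variances `q` and `r`, the initial values, and the 0.9 alert threshold are all hypothetical, and the MDP-based power-state estimation from the paper is not modeled.

```python
import math


def kalman_step(x_est, p_est, z, q, r):
    """One predict/update step of a scalar Kalman filter.

    Assumed model (illustrative): temperature follows a random walk with
    process-noise variance q; the sensor adds noise of variance r.
    """
    # Predict: the random walk keeps the mean and inflates the variance.
    x_pred = x_est
    p_pred = p_est + q
    # Update: blend prediction and measurement using the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


def hot_spot_probability(x_est, p_est, limit):
    """P(temperature > limit) under a Gaussian estimate N(x_est, p_est)."""
    zscore = (limit - x_est) / math.sqrt(p_est)
    return 0.5 * (1.0 - math.erf(zscore / math.sqrt(2.0)))


def monitor(readings, limit, alert_prob=0.9, x0=60.0, p0=25.0, q=0.5, r=4.0):
    """Return the indices of the readings at which an alert is issued."""
    x_est, p_est = x0, p0
    alerts = []
    for k, z in enumerate(readings):
        x_est, p_est = kalman_step(x_est, p_est, z, q, r)
        if hot_spot_probability(x_est, p_est, limit) >= alert_prob:
            alerts.append(k)
    return alerts
```

With these settings a single outlier reading does not trigger an alert, whereas a sustained rise does after a few samples, which is the trade-off between false alarms and alert latency that the paragraph above describes.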

    Continuous Frequency Adjustment Technique Based on Dynamic Workload Prediction -- In a VLSI Design-08 conference paper, we presented a technique for continuous frequency adjustment (CFA) which enables one to adjust the frequency values of various functional blocks in the system at a very fine granularity so as to minimize energy while meeting a performance constraint. A key feature of the proposed technique is that the workload characteristics for functional blocks are effectively captured at runtime to generate a frequency value that is continuously adjusted, thereby eliminating the delay and energy penalties incurred by transitions between power-saving modes. The workload prediction is accomplished by solving an initial value problem (IVP). Applying CFA to a real-time system in 65nm CMOS technology, we demonstrated the effectiveness of the proposed technique by reporting 13.6% energy saving under a performance constraint.

    A Unified Framework for System-level Design: Modeling and Performance Optimization of Scalable Networking System -- In an ISQED-07 conference paper, we presented a new unified modeling framework, called the extended queuing Petri net (EQPN), which combines extended stochastic Petri net and G/M/1 queuing models, to enable the design of reliable systems at design time while improving the accuracy and robustness of power and temperature optimization for high-speed scalable networking systems. The EQPN model is employed to represent the performance behaviors and to minimize power consumption of the system under performance constraints through mathematical programming formulations. Being able to model the system with the EQPN enables users to arrive at a reliable and optimized system at the beginning of the design cycle. The proposed system model was compared with existing stochastic models using real simulation data.

    Minimizing Power Dissipation during Write Operation to Register Files -- In an ISLPED-07 conference paper, we introduced a power reduction mechanism for the write operation in register files (RegFiles), which adds a conditional charge-sharing structure to the pair of complementary bit-lines in each column of the RegFile. Because the read and write ports for the RegFile are separately implemented, it is possible to avoid pre-charging the bit-line pair for consecutive writes. More precisely, when writing the same values to some cells in the same column of the RegFile, it is possible to eliminate energy consumption due to pre-charging of the bit-line pair. At the same time, when writing opposite values to some cells in the same column of the RegFile, it is possible to reduce the energy consumed in charging the bit-line pair thanks to charge-sharing. Motivated by these observations, we modified the bit-line structure of the write ports in the RegFile, removing the per-cycle bit-line pre-charging and employing conditional data-dependent charge-sharing. Experimental results on a set of SPEC2000INT / MediaBench benchmarks showed an average of 61.5% power savings with 5.1% area overhead and 16.2% increase in write access delay. Lower power dissipation also resulted in lower substrate temperature in the RegFile.

    Active Bank Switching for Temperature Control of the Register File in a Microprocessor -- In a GLS-VLSI-07 paper, we described an effective thermal management scheme, called active bank switching, for temperature control in the register file of a microprocessor. The idea is to divide the physical register file into two equal-sized banks, and to alternate between the two banks when allocating new registers to the instruction operands. Experimental results show that this periodic active bank switching scheme achieves 3.4℃ of steady-state temperature reduction, with a mere 0.75% average performance penalty.
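The allocation policy can be illustrated with a toy allocator. This is only a sketch under assumptions that are not taken from the paper: the register numbering, the switching `period` measured in cycles, the first-in-first-out free lists, and the fallback to the idle bank when the active one is exhausted are all hypothetical.

```python
class BankSwitchingAllocator:
    """Toy physical-register allocator that alternates between two banks.

    Registers 0..half-1 form bank 0 and half..2*half-1 form bank 1.
    The active bank flips every `period` cycles, so each bank gets
    idle intervals in which it can cool down.
    """

    def __init__(self, num_regs, period):
        self.half = num_regs // 2
        self.free = [
            list(range(self.half)),
            list(range(self.half, 2 * self.half)),
        ]
        self.period = period

    def active_bank(self, cycle):
        return (cycle // self.period) % 2

    def allocate(self, cycle):
        bank = self.active_bank(cycle)
        # Fall back to the idle bank only when the active one is exhausted.
        if not self.free[bank]:
            bank = 1 - bank
        if not self.free[bank]:
            raise RuntimeError("no free physical register")
        return self.free[bank].pop(0)

    def release(self, reg):
        self.free[0 if reg < self.half else 1].append(reg)
```

For example, with eight registers and a period of 100 cycles, an allocation at cycle 0 comes from bank 0 and an allocation at cycle 150 comes from bank 1.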

    Dynamic Thermal Management for MPEG-2 Decoding -- In an ISLPED-06 paper, we presented an effective dynamic thermal management (DTM) scheme for MPEG-2 decoding by allowing some degree of spatiotemporal quality degradation. Given a target MPEG-2 decoding time, we dynamically select either an intra-frame spatial degradation or an inter-frame temporal degradation strategy in order to make sure that the microprocessor chip will continue to stay in a thermally safe state of operation, albeit with a certain amount of image/video quality loss. For our experiments, we used the MPEG-2 decoder program of MediaBench and modified and combined Wattch and HotSpot for the power and thermal simulations and measurements, respectively. Our experimental results demonstrated that we can achieve a thermally safe state with a spatial quality degradation of 0.12 RMSE and a frame drop rate of 12.5% on average.

    Stochastic Dynamic Thermal Management: A Markovian Decision-based Approach -- In an ICCD-06 paper, we introduced a stochastic DTM technique for high-performance VLSI systems with special attention to the uncertainty in temperature observation. More specifically, we presented a stochastic thermal management framework to improve the accuracy of decision making in DTM, which performs dynamic voltage and frequency scaling to minimize total power dissipation and on-chip temperature. Multi-objective optimization with the aid of a mathematical programming solver was used to reduce operating temperature. Experimental results with a 32-bit embedded RISC processor demonstrated the effectiveness of the technique and showed that the proposed algorithm ensures thermal safety under performance constraints.

     

    System-Wide Dynamic Voltage Scaling and Power Management in Battery-Powered Embedded Systems

    Sponsor: National Science Foundation - Computer Systems Research

    Project Summary: One of the key problems confronting computer system designers is the management and conservation of energy sources. This challenge is evident in a number of ways. The goal may be to extend the battery lifetime in a computer system comprising a processor and a number of memory modules, I/O cores, and bridges. This is especially important in light of the fact that power consumption in a typical portable electronic system is increasing rapidly whereas the gravimetric energy density of its battery source is improving at a much slower pace. Other goals may be to limit the cooling requirements of a computer system or to reduce the financial burden of operating a large computing facility. The objective of this research is to develop system-wide power optimization algorithms and techniques that eliminate waste or overhead and allow energy-efficient use of the various memory and I/O devices while meeting an overall performance requirement. More precisely, this project tackles two related problems: dynamic voltage and frequency scaling targeting the minimization of the total system energy dissipation, and global power management in a system comprising modules that are potentially managed by their own local power management policies, yet must closely interact with one another in order to yield maximum system-wide energy efficiency. The broader impacts of this project include the development of energy-aware computer systems as the key for cost-effective realization of a large number of high-performance applications running on battery-powered portable platforms, and the education and training of young researchers and engineers to be able to address complex and intertwined energy efficiency/performance challenges that arise in the context of designing next-generation information technology products and services.

    Flow-Through-Queue based Power Management for Gigabit Ethernet Controller -- Computer networking is beginning to support multi-gigabit data transfer rates. In an ASPDAC-07 paper, we presented an energy-efficient packet interface architecture and a power management technique for gigabit Ethernet controllers, where low latency and high bandwidth are achieved to meet the pressing demands of extremely high frame-rate data. More specifically, we presented a predictive-flow-queue (PFQ) based packet interface architecture to adjust the operating frequencies of various functional blocks in the system at a fine granularity so as to minimize the total system energy dissipation while meeting the performance constraints. A key feature of the proposed architecture is that runtime workload prediction of the network traffic is implemented so as to generate an operating frequency value that is continually adjusted, thereby eliminating the delay and energy penalties incurred by transitions between power-saving modes. Furthermore, a modeling approach based on Markov processes and queuing models is employed, which allows one to apply mathematical programming formulations for energy optimization. Experimental results with a designed 65nm gigabit Ethernet controller show that the proposed energy-efficient architecture and power management technique can achieve system-wide energy savings under tighter performance constraints.

    Dynamic Voltage and Frequency Management Based on Variable Update Intervals or Frequency Setting -- In an ICCAD-06 paper, we developed an efficient adaptive method to perform dynamic voltage and frequency management (DVFM) for minimizing the energy consumption of microprocessor chips. Instead of using a fixed update interval, our DVFM system makes use of adaptive update intervals for optimal frequency and voltage scheduling. The optimization enables the system to rapidly track the workload changes so as to meet soft real-time deadlines. The method, which is based on introducing the concept of an effective deadline, utilizes the correlation between consecutive values of the workload. Since in real situations the frequency and voltage update rates are dynamically set based on variable update interval lengths, voltage fluctuations on the power network are also minimized. The technique, which may be implemented by simple hardware and is completely transparent to the application, leads to power savings of up to 60% for highly correlated workloads compared to DVFM systems based on fixed update intervals.

    Power-Aware Scheduling and Voltage Setting for Tasks Running on a Hard Real-Time System -- In an ASPDAC-06 paper, we presented a solution to the problem of minimizing energy consumption of a computer system performing periodic hard real-time tasks with precedence constraints. In the proposed approach, dynamic power management and voltage scaling techniques are combined to reduce the energy consumption of the CPU and devices. The optimization problem is initially formulated as an integer programming problem. Next, a three-phase heuristic solution, which integrates power management, task scheduling and task voltage assignment, is provided. Experimental results show that the proposed approach outperforms existing methods by an average of 18% in terms of the system-wide energy savings.

    Hierarchical Power Management with Application to Scheduling -- In an ISLPED-05 paper, we presented a hierarchical power management (HPM) architecture which aims to facilitate power-awareness in an energy-managed computer (EMC) system with multiple self-power-managed components. The proposed architecture divides the PM function into two layers: system-level and component-level. Although the system-level PM has detailed information about the global state of the EMC and its various computational and memory resources, it cannot directly control the power management policies of the constituent components, which are typically designed and manufactured by different IC vendors. In particular, the system-level PM resorts to adaptive service request flow regulation and online application scheduling to force the component-level PMs to function in such a way that would minimize the total system energy dissipation while meeting an overall performance target. Preliminary experimental results show that HPM achieves a 25% reduction in the total system energy compared to the "best" component-level PM policies.

    Dynamic Voltage and Frequency Scaling for Energy-Efficient System Design -- This talk, which was given at NSTU, Taiwan in 2005, summarizes the results of our research in the area of dynamic voltage and frequency scaling (DVFS). More precisely, the first part of this talk describes an intra-process DVFS technique targeted toward non-real-time applications running on an embedded system platform. The key idea is to make use of runtime information about the external memory access statistics in order to perform CPU voltage and frequency scaling with the goal of minimizing the energy consumption while translucently controlling the performance penalty. The proposed DVFS technique relies on dynamically-constructed regression models that allow the CPU to calculate the expected workload and slack time for the next time slot, and thus adjust its voltage and frequency in order to save energy while meeting soft timing constraints. This is in turn achieved by estimating and exploiting the ratio of the total off-chip access time to the total on-chip computation time. The proposed technique has been implemented on an XScale-based embedded system platform and actual energy savings have been calculated by current measurements in hardware. The second part of this talk describes a DVFS technique that minimizes the total system energy consumption for performing a task while satisfying a given execution time constraint. We first show that in order to guarantee minimum energy for task execution by using DVFS it is essential to divide the system power into fixed, idle, and active power components. Next, we present a new DVFS technique, which considers not only active power, but also idle and fixed power components of the system. This is in sharp contrast to previous DVFS techniques, which only consider the active power component. The fixed plus idle components of the system power are measured by monitoring the system power when it is idle. The active component of the system power is estimated at run time by a technique known as workload decomposition, whereby the workload of a task is decomposed into on-chip and off-chip components based on statistics reported by a performance monitoring unit (PMU). We have implemented the proposed DVFS technique on the BitsyX platform, an Intel PXA255-based platform manufactured by ADS Inc., and performed detailed energy measurements.
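Why the frequency-independent power components matter can be seen in a small sketch. The model below is an illustrative assumption, not the one from the talk: execution time is on-chip cycles divided by frequency plus a frequency-independent off-chip time; the task is charged a baseline power `p_base_w` (standing in for fixed plus idle power) for as long as it runs; and active power grows as `k_active * f**3`, a common first-order approximation when voltage scales with frequency. All names and numbers are hypothetical.

```python
def exec_time(f_hz, on_chip_cycles, off_chip_time_s):
    """Task execution time: only the on-chip part scales with frequency."""
    return on_chip_cycles / f_hz + off_chip_time_s


def task_energy(f_hz, on_chip_cycles, off_chip_time_s, p_base_w, k_active):
    """Energy charged to the task under the assumed power model."""
    t = exec_time(f_hz, on_chip_cycles, off_chip_time_s)
    p_active = k_active * f_hz ** 3
    return (p_base_w + p_active) * t


def best_frequency(freqs_hz, deadline_s, on_chip_cycles, off_chip_time_s,
                   p_base_w, k_active):
    """Lowest-energy frequency among those that meet the deadline."""
    feasible = [
        f for f in freqs_hz
        if exec_time(f, on_chip_cycles, off_chip_time_s) <= deadline_s
    ]
    if not feasible:
        raise ValueError("no frequency meets the deadline")
    return min(
        feasible,
        key=lambda f: task_energy(f, on_chip_cycles, off_chip_time_s,
                                  p_base_w, k_active),
    )
```

With active power alone (`p_base_w = 0`), the slowest feasible frequency always wins in this model. Once a baseline power is charged for the duration of the task, running slower also means paying that baseline for longer, so the minimum-energy frequency can move to an intermediate setting.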

     

    Hardware/Software Support and Algorithms for Dynamic Backlight Scaling in TFT LCDs

    Sponsor: National Science Foundation - Computing Processes and Artifacts

    Project Summary: Display components have become a key focus of efforts for maximization of the battery lifetime in a wide range of portable, display-equipped, microelectronic systems and products. A particularly effective technique in reducing the power consumption of all kinds of displays is the dynamic backlight scaling technique, where the intensity of the backlight lamp and the LCD transmittance function are changed concurrently and in proportion so that the same visual perception is created in the human eyes at much lower levels of power consumption. This research therefore aims to develop spatiotemporal and/or color-aware backlight scaling techniques for pixel transformation of the displayed still images or video streams so as to maximize the energy saving in a target platform. The new techniques, which take advantage of the human visual system characteristics to minimize distortion between the original and backlight-scaled images/videos, will be implemented and demonstrated on the Apollo Testbed II hardware platform. The broader impact of the research is to significantly reduce the power consumption of typical handheld devices, increasing their discharge-cycle lifetime and thereby enabling more widespread and convenient use of such devices. The backlight dimming technology can also be applied in AC-powered systems where the key concern is the energy cost to the individual user as well as to society at large. This technology has the potential to reduce the typical energy bill of a desktop computer by 30% or so (when the system is being used). This research, if successful, will expedite the introduction of advanced display technologies (such as LED-based backlighting for LCDs, or organic LED-based displays) since it will reduce their power cost without sacrificing quality.
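The core compensation step of backlight scaling can be sketched as follows. This is a minimal sketch assuming a linear display model in which perceived luminance is proportional to the backlight factor times the pixel value (gamma and the human visual system models mentioned above are ignored); the function names and the clipping-based distortion measure are hypothetical.

```python
def scale_backlight(pixels, b, max_level=255):
    """Dim the backlight to fraction b and brighten pixels to compensate.

    Pixels that would need a value above max_level are clipped, which is
    the source of distortion in this simplified model.
    """
    if not 0.0 < b <= 1.0:
        raise ValueError("backlight factor must be in (0, 1]")
    return [min(max_level, round(p / b)) for p in pixels]


def clipped_fraction(pixels, b, max_level=255):
    """Fraction of pixels whose compensated value saturates."""
    return sum(1 for p in pixels if round(p / b) > max_level) / len(pixels)


def choose_backlight(pixels, max_clipped=0.0, max_level=255):
    """Smallest backlight factor that clips at most max_clipped of the pixels."""
    ordered = sorted(pixels)
    keep = max(1, len(ordered) - int(max_clipped * len(ordered)))
    brightest_kept = ordered[keep - 1]
    return max(brightest_kept, 1) / max_level
```

For a dark image whose brightest pixel is 200 out of 255, `choose_backlight` returns a factor of about 0.78 with no clipping; tolerating a small fraction of clipped pixels allows deeper dimming at the cost of some distortion.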

    LCD (Liquid Crystal Display) TVs are becoming the main stream in FPD (Flat Panel Display) market. In spite of their superb performances (e.g. vivid image representation and high native resolution) compared to other types of TVs such as PDP (Plasma Display Panel), LCDs suffer from a number of well-known shortcomings such as motion blur artifact, low contrast ratio, and low brightness. Furthermore, backlighting for the modern LCD panels is typically done with the aid of a 2-D array of individually luminance-controlled white LED's, each of whom serves as the backlight for a fixed-size region on the LCD panel. We are currently investigating dimming and scanning of the 2-D LED array with the aid of appropriately time-shifted and duty cycle adjusted Pulse Width Modulation (PWM) signals. The goal is both to minimize the total power dissipation of the LED array drivers while improving the static contrast ratio and eliminating the motion blur artifact in LCD TVs. More precisely, we are developing a 2-D PWM-driven backlight dimming technique which simultaneously dims certain regions of the LCD screen and sets the pixel values by applying an optimal pixel value transformation function. In addition, we are investigating a 2-D backlight scanning technique which determines a new duty cycle for the PWM signal for each white LED driver so as to preserve the original backlight intensity for the LED while ensuring that the LED can be completely turned off for a period of time during each frame. This off time, which is about 8ms in the target display system, greatly reduces the motion blur. At the same time, if the pixel value updates due to refresh operation take place during this off time, the viewer will only see the changed pixel values corresponding to the new frame and will not be subjected to effects arising from pixel value transitions while the pixels are being exposed to back light. 
Both of the proposed ideas are being implemented in a Xilinx FPGA (Spartan 3E) and tested on a Samsung 40-inch LCD TV.
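
    As a rough illustration of the duty cycle adjustment used in backlight scanning, the sketch below computes the duty cycle needed inside the remaining on-window so that the frame-average intensity is unchanged. It assumes that average intensity is proportional to the PWM duty cycle at a fixed peak LED current; the function name and the 16 ms frame period in the usage lines are assumptions made for illustration, not parameters of the target display system.

```python
def scanning_duty_cycle(duty, frame_ms, off_ms):
    """Duty cycle inside the on-window that preserves the frame-average intensity.

    duty:     original duty cycle measured over the whole frame (0..1)
    frame_ms: frame period in milliseconds
    off_ms:   time per frame during which the LED must be completely off

    Returns (new_duty, achievable). new_duty is capped at 1.0; achievable is
    False when the on-window is too short to keep the original average.
    """
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be in [0, 1]")
    if not 0.0 <= off_ms < frame_ms:
        raise ValueError("off_ms must be in [0, frame_ms)")
    on_fraction = (frame_ms - off_ms) / frame_ms
    needed = duty / on_fraction
    return min(needed, 1.0), needed <= 1.0


if __name__ == "__main__":
    print(scanning_duty_cycle(0.4, 16.0, 8.0))
    print(scanning_duty_cycle(0.6, 16.0, 8.0))
```

    When the required duty cycle exceeds 1.0, the original intensity cannot be preserved by duty cycle adjustment alone under this simple model.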

    B2Sim: A Fast Micro-Architecture Simulator Based on Basic Block Characterization -- State-of-the-art architectural simulators support cycle-accurate pipeline execution of application programs. However, it can take days or even weeks to complete the simulation of even a moderate-size program. During the execution of a program, program behavior does not change randomly but changes over time in a predictable/periodic manner. This behavior provides the opportunity to limit the use of a pipeline simulator. More precisely, in a CODED-06 paper, we presented a hybrid simulation engine, named B2Sim for (cycle-characterized) Basic Block based Simulator, where a fast cache simulator (e.g., sim-cache) and a slow pipeline simulator (e.g., sim-outorder) are employed together. B2Sim reduces the runtime of architectural simulation engines by making use of the instruction behavior within executed basic blocks. We integrated B2Sim into SimpleScalar and achieved an average speedup of 3.3 times on the SPEC2000 benchmark and Media-bench programs compared to a conventional pipeline simulator while maintaining the accuracy of the simulation results with less than 1% CPI error on average.
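
    The basic idea of limiting the use of the slow pipeline simulator can be sketched as follows. This is a simplified illustration and not the B2Sim implementation: it assumes each basic block is characterized by its first few detailed simulations and that later executions reuse the average cycle count. The class name, the warmup parameter, and the slow_simulate callback are hypothetical, and the cache simulation side of the hybrid engine is omitted.

```python
from collections import defaultdict


class HybridSimulator:
    """Characterize each basic block with a slow simulator, then reuse the result."""

    def __init__(self, slow_simulate, warmup=2):
        self.slow_simulate = slow_simulate   # callback: block id -> cycle count
        self.warmup = warmup                 # detailed runs per block before reuse
        self.samples = defaultdict(list)     # block id -> observed cycle counts
        self.slow_calls = 0

    def run(self, trace):
        """Return the estimated total cycle count for a trace of block ids."""
        total = 0.0
        for block_id in trace:
            seen = self.samples[block_id]
            if len(seen) < self.warmup:
                cycles = self.slow_simulate(block_id)  # detailed simulation
                self.slow_calls += 1
                seen.append(cycles)
            else:
                cycles = sum(seen) / len(seen)         # reuse characterization
            total += cycles
        return total


if __name__ == "__main__":
    costs = {"A": 10, "B": 4}  # made-up per-block cycle counts
    sim = HybridSimulator(lambda block: costs[block], warmup=2)
    print(sim.run(["A", "B"] * 50), sim.slow_calls)
```

    In this toy trace of 100 block executions, only the first two executions of each block go through the detailed path; the remaining executions are served from the stored characterization.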

    Backlight Dimming in Power-Aware Mobile Displays -- In a DAC-06 paper, we introduced a temporally-aware backlight scaling technique for video streams. The goal is to maximize energy saving in the display system by means of dynamic backlight dimming subject to a video distortion tolerance. The video distortion comprises (1) an intra-frame (spatial) distortion component due to frame-sensitive backlight scaling and transmittance function tuning and (2) an inter-frame (temporal) distortion component due to large-step backlight dimming across frames modulated by the psychophysical characteristics of the human visual system. The proposed backlight scaling technique is capable of efficiently computing the flickering effect online and subsequently using a measure of the temporal distortion to appropriately adjust the slack on the intra-frame spatial distortion, thereby achieving a good balance between the two sources of distortion while maximizing the backlight dimming-driven energy saving in the display system and meeting an overall video quality figure of merit.
    The proposed dynamic backlight scaling approach is amenable to highly efficient hardware realization and has been implemented on the Apollo Testbed II. Actual current measurements demonstrate the effectiveness of the proposed technique compared to the previous backlight dimming techniques, which have ignored the temporal distortion effect.

    DTM: Dynamic Tone Mapping for Backlight Scaling -- In a DAC-05 paper, we presented an approach for pixel transformation of the displayed image to increase the potential energy saving of the backlight scaling method. The proposed approach takes advantage of human visual system (HVS) characteristics and tries to minimize distortion between the perceived brightness values of the individual pixels in the original image and those of the backlight-scaled image. This is in contrast to previous backlight scaling approaches which simply match the luminance values of the individual pixels in the original and backlight-scaled images. Moreover, the proposed dynamic backlight scaling approach, which is based on tone mapping, is amenable to highly efficient hardware realization because it does not need information about the histogram of the displayed image. Experimental results show that the dynamic tone mapping for backlight scaling method results in about 35% power saving with an effective distortion rate of 5% and 55% power saving for a 20% distortion rate.

    HEBS: Histogram Equalization for Backlight Scaling -- In a DATE-05 paper, we presented a method for finding a pixel transformation function that minimizes the backlight intensity while maintaining a pre-specified image distortion level for a liquid crystal display. This is achieved by first finding a pixel transformation function, which maps the original image histogram to a new histogram with lower dynamic range. Next, the contrast of the transformed image is enhanced so as to compensate for the brightness loss that arises from backlight dimming. The proposed approach relies on an accurate definition of the image distortion, which accounts for both the pixel value differences and a model of the human visual system and is amenable to highly efficient hardware realization. Experimental results show that histogram equalization for backlight scaling results in about 45% power saving with an effective distortion rate of 5% and 65% power saving for a 20% distortion rate. These power savings are higher than those of previously reported dynamic backlight scaling approaches.

     

    Design Techniques and Tools to Enable and Enhance Coarse-Grain Power Gating in ASIC Designs

    Sponsor: National Science Foundation - Computing Processes and Artifacts

    Project Summary: The semiconductor industry's $261 B in 2006 revenue does not accurately reflect its crucial role in enabling a $47 T ($61 T on a PPP basis) world economy to thrive and grow. This industry underpins the systems and technologies on which the people and governments of the world rely for future prosperity. This industry is currently facing some extraordinary challenges, including variability of nano devices as well as excessive power dissipation in circuits and systems. In order for the industry to continue to expand and prosper, it is critical to address these challenges head on. The proposed research takes on one of these two fundamental challenges, i.e., the "power crisis". More precisely, this project focuses on coarse-grain power gating in ASIC designs, which switches entire blocks/rows of standard cells. This choice is due to lower cost and greater leakage savings of coarse-grain power gating compared to its fine-grain counterpart, which inserts the header or footer in each standard cell in the ASIC design library. The project results are expected to include the following: (i) Distributed sleep transistor placement and sizing; (ii) Sleep signal scheduling to minimize the peak current demand on wakeup; (iii) Mode transition energy minimization to enable more frequent mode transitions; (iv) Local sleep signal generation for autonomous power gating; and (v) Power gating to enable multiple power modes. This project aims to address each of these tasks by developing algorithmic or mathematical programming solutions for each step and by developing a design flow and prototype software tools that enable widespread adoption of this very interesting and important technology in ASIC design.

    Coarse-Grain MTCMOS Sleep Transistor Sizing Using Delay Budgeting -- Current state-of-the-art sleep transistor sizing algorithms minimize the total sleep transistor width subject to a maximum IR voltage drop on the virtual node of each MTCMOS switch cell. In these approaches, the DC noise constraint for the virtual node of a switch cell is somehow related to the tolerable delay increase in the circuit. Using a single maximum IR voltage drop value on all virtual nodes is over constraining the problem. Instead, we would like to set the DC noise constraint for the virtual node of each MTCMOS switch based on the minimum tolerable delay increase (i.e., the positive timing slack) for any logic cell in the corresponding module. The voltage drop allocation on the virtual nodes of the MTCMOS switches should thus be closely related to the timing slack allocation to individual cells in the circuit. In a DATE-08 paper, we introduced a new approach for minimizing the total sleep transistor width for a coarse-grain MTCMOS circuit assuming a given standard cell and sleep transistor placement. Our algorithm takes a maximum allowed circuit slowdown factor and produces the sizes of various sleep transistors in the standard cell layout while considering the DC parasitics of the virtual ground net. We showed that the problem can be formulated as a sizing with delay budgeting problem and solved efficiently using a heuristic sizing algorithm which implicitly performs maximum current calculation through sleep transistors while accounting for different current flow paths in the virtual ground net through adjacent sleep transistors. This technique uses at least 40% less total sleep transistor width compared to other approaches.

    Sizing and Placement of Charge Recycling Transistors in MTCMOS Circuits -- In an ICCAD-07 paper, we showed that the sizing and placement problems of charge-recycling transistors in charge-recycling multi-threshold CMOS (CR-MTCMOS) can be formulated as a linear programming problem, and hence, can be efficiently solved using standard mathematical programming packages. The proposed sizing and placement techniques allow us to employ the CR-MTCMOS solution in large row-based standard cell layouts while achieving nearly the full potential of this power-gating architecture, i.e., we achieve 44% saving in switching energy due to the mode transition in CR-MTCMOS compared to standard MTCMOS.

    Charge Recycling in MTCMOS Circuits: Concept and Analysis -- Design of a suitable power gating (e.g., multi-threshold CMOS or super cutoff CMOS) structure is an important and challenging task in sub-90nm VLSI circuits where leakage currents are significant. In designs where the mode transitions are frequent, a significant amount of energy is consumed to turn on or off the power gating structure. It is thus desirable to develop a power gating solution that minimizes the energy consumed during mode transitions. In a DAC-06 paper and an IEEE SSCS DLP talk in October 2006, we described such a solution by recycling charge between the virtual power and ground rails immediately after entering the sleep mode and just before wakeup. The proposed method can save up to 43% of the dynamic energy wasted during mode transition while maintaining the wake up time of the original circuit. It also reduces the peak negative voltage value and the settling time of the ground bounce.

     

    Statistical Static Timing Analysis and Circuit Optimization: A Current Source Model-Based Approach

    Sponsor: Semiconductor Research Corp.

    Project Summary: The down scaling of layout geometries to 45nm and below has resulted in a significant increase in the packing density and the operational frequency of VLSI circuits. The conventional static timing analysis (STA) techniques model signal transitions as saturated ramps with known arrival and transition times and propagate these timing parameters from the circuit primary inputs to the primary outputs. However, different waveforms with identical arrival time and slew (transition) time applied to the input of a logic cell or an interconnect line can result in very different propagation delays through the component depending on the exact form of the applied signal waveform. In addition, as we move towards the 45nm and lower minimum feature sizes for the devices, process variations are becoming an ever increasing concern for the design of high performance integrated circuits. The process variations can cause excessive uncertainty in timing calculation, which in turn calls for sophisticated analysis techniques to reduce the uncertainty.

    Recent Results of the Current Source Model-Based Approach for Timing Analysis -- Our work focuses on the development of an accurate current source model of a CMOS logic cell with extensions to handle multiple input switching and statistical parameter variability. The work also includes development of efficient methods to generate the CSMs of logic cells, which are typically present in a standard cell library. The work addresses integration of CSMs of logic cells with a waveform propagation engine in order to produce a highly efficient and robust CSM-based static timing analyzer.

     

    Optimal Design of Power Delivery Network for System on Chip

    Partial support from the National Science Foundation

    Project Summary: Utilizing multiple voltage domains (also known as voltage island) is one of the most effective techniques to minimize the overall power dissipation - both dynamic and leakage - while meeting a performance constraint. In a system designed with multiple voltage domains, the power delivery network (PDN) is responsible for delivering power with appropriate voltage levels to different functional blocks (FB's) on the chip. Voltage regulator modules (VRM's) which are in charge of voltage conversion and regulation are inevitable components in this network. The selection of appropriate VRM's plays a critical role in the power efficiency of the PDN.

    Design of an Efficient Power Delivery Network in an SoC to Enable Dynamic Power Management -- In an ISLPED-07 paper, we introduced a new technique to design the power delivery network for an SoC design to support dynamic voltage scaling. In this technique the power delivery network is composed of two layers. In the first layer, DC-DC converters with fixed output voltages are used to generate all voltage levels that are needed by different loads in the SoC design. In the second layer of the power delivery network, a power switch network is used to dynamically connect the power supply terminals of each load to the appropriate DC-DC converter output in the first layer. Experimental results demonstrate the efficacy of this technique.

    Optimal Selection of Voltage Regulator Modules in a Power Delivery Network -- Typically a star configuration of the VRM's, where only one VRM resides between the power supply and each FB, is used to deliver currents with appropriate voltage levels to different loads in the circuit. In a DAC-07 paper, we showed that using a tree topology of suitably chosen VRM's between the power source and FB's yields higher power efficiency in the PDN. We formulated the problem of selecting the best set of VRM's in a tree topology as a dynamic program and solved it efficiently.

     

    Power Efficient SRAM Cell and Array Design

    Partial support from the National Science Foundation

    Project Summary: In many modern microprocessors, caches occupy a large portion of the die. For example, in Intel's Itanium 2 Montecito processor, more than 80% of the die is dedicated to caches. Since the leakage power dissipation is roughly proportional to the area of a circuit, the leakage power of caches is one of the major sources of power consumption in high performance microprocessors. Our research on SRAM design focuses on leakage reduction in such memory structures and on judicious use of multiple Vth and multiple tox transistors in a large SRAM array and power-ground-gated, data-retentive SRAM cells.

    Low-Leakage SRAM Design in Deep Submicron Technologies -- This January-2008 presentation has two parts. In the first part, a method based on dual-Vt and dual-Tox assignment is presented to reduce the total leakage power dissipation of SRAMs while maintaining their performance. The proposed method is based on the observation that read and write delays of a memory cell in an SRAM block depend on the physical distance of the cell from the sense amplifier and the decoder. Thus, the idea is to deploy different configurations of six-transistor SRAM cells corresponding to different threshold voltage and oxide thickness assignments for the transistors. Unlike other techniques for low-leakage SRAM design, the proposed technique incurs neither area nor delay overhead. In addition, it results in a minor change in the SRAM design flow. The leakage saving achieved by using this technique is a function of the values of the high threshold voltage and the oxide thickness, as well as the number of rows and columns in the cell array. Simulation results with a 65nm process demonstrate that this technique can reduce the total leakage power dissipation of a 64 x 512 SRAM array by 33% and that of a 32 x 512 SRAM array by 40%. In the second part, a gated-supply, gated-ground data retention technique for CMOS SRAM cells to enable design of robust and ultra low-power caches in very deep submicron CMOS technologies is presented. We show that, given a fixed value of the voltage difference on the power rails of the SRAM cell during the standby mode, the proposed power-ground-gating (PG-gating) solution achieves significantly higher leakage power savings compared to either power supply (P) gating or ground (G) gating techniques while improving the static noise margin and soft error rate. In particular, it is shown that optimum ground and supply voltage levels exist for which the SRAM cell leakage is minimized subject to a hold static noise margin constraint. 
When the PG-gated cell is not accessed for read/write operations, it is biased to the optimum values of ground and supply voltages, resulting in minimum leakage power consumption. Simulation results demonstrate that the PG-gating technique has a higher hold and read static noise margin, lower soft error rate, and also higher leakage saving compared to single P or G gating techniques at the expense of an increase in the area overhead. Moreover, the PG-gated cell exhibits less leakage variability under process and temperature variations compared to single P or G gating techniques. Moreover, its hold static noise margin is more robust to process variations. For a 64Kb SRAM array designed in 130nm CMOS technology with Vdd=1.3V and a 180mV hold static noise margin, the leakage power of PG-gated design is 60% lower than that of a low power G-gated design.

    ===================================================================


     

    Minimizing Leakage Power in CMOS Designs

    Support from miscellaneous sources

    Project Summary: In many new designs, the leakage component of power consumption is comparable to the dynamic component. Many reports indicate that, in sub-65 nm CMOS technology node, 40% or even higher percentage of the total power consumption is due to the leakage of transistors and this percentage will increase with technology scaling unless effective techniques are used to bring leakage under control. This research focuses on minimizing leakage in CMOS VLSI circuits.

    Minimizing Leakage Power in CMOS: Technology and Design Issues -- This tutorial given at EPFL in July 2008 focuses on circuit techniques and design methods to accomplish this goal. The first part of the presentation provides an overview of basic physics and technology and scaling trends that have resulted in the significant increase in sub-threshold and gate leakage currents. This part also provides an in-depth description of multiple-Vdd, multiple-Vt, and multiple-Tox techniques for leakage minimization in light of process variations and substrate temperature changes. The second part of this presentation describes a number of design optimization techniques for controlling leakage current, including state assignment, technology mapping, and precomputation-based signal guarding. It also presents runtime mechanisms for leakage control including body bias control, transition to minimum leakage state, and power gating.

    Circuit and Design Automation Techniques for Leakage Minimization of CMOS VLSI Circuits -- This tutorial given at Samsung Research in October 2006 focuses on circuit techniques and design methods to accomplish leakage minimization in CMOS VLSI circuits. The first part of the presentation provides an overview of basic physics and technology and scaling trends that have resulted in the significant increase in sub-threshold and gate leakage currents. This part also provides an in-depth description of multiple-Vdd, multiple-Vt, and multiple-Tox techniques for leakage minimization in light of process variations and substrate temperature changes, and addresses the use of high permittivity gate dielectric, metal gate, novel device structures and circuit based techniques for controlling the gate tunneling current. The second part of this presentation describes a number of design optimization techniques for controlling leakage current, including state assignment, technology mapping, and precomputation-based signal guarding. It also presents runtime mechanisms for leakage control including body bias control, transition to minimum leakage state, and power gating.

     

    Battery Aware Hierarchical Wireless Sensor Network for Distributed Data Collection

    Project Summary: Wireless sensor networks (WSN) have gained considerable attention in applications where spatially distributed events are to be monitored. Recent technological advances have led to the emergence of small battery-powered sensors with considerable processing and communication capabilities. We consider a distributed, hierarchical wireless sensor network of energy-constrained nodes. Each node in this network has limited computation and storage resources, wireless communication capability, and a limited energy source in the form of a battery. This network of autonomous nodes performs collaborative problem solving, such as providing situational and tactical awareness to the first responders in an emergency situation, carrying out automatic intrusion detection/deterrence, or object recognition and tracking. The problem of interest is maximizing the network lifetime while providing a minimum quality of service requirement subject to some performance constraints (e.g., the response time). Energy is considered as a key network resource that must be allocated and dispensed properly to maximize the network lifetime. We analyze network and wireless link properties and develop protocols that compensate/account for effects of extreme variations in wireless link dependability, many-to-one nature of the communication in a mixed multi-tier WSN, local high-contention nodes in the network, and relatively high cost of maintenance. This research addresses battery awareness of a monitoring sensor network as an intrinsic aspect of the distributed data collection task. This project will produce battery-aware algorithms and techniques for wireless sensor network design and deployment as the key enabler for cost-effective realization of many applications. 
The broader impact of this project will be to assist in the critical ongoing efforts to deploy networks of energy-constrained sensors and distribution/collection nodes for environmental, medical and security applications.

    Lifetime-Aware Hierarchical Wireless Sensor Network Architecture with Mobile Overlays -- With power efficiency and lifetime awareness becoming critical design concerns, we focus on energy-aware design of different layers of the WSN protocol stack. In a RAW-07 conference paper, we presented and analyzed a hierarchical wireless sensor network with mobile overlays, along with a mobility-aware multi-hop routing scheme, in order to optimize the network lifetime, delay, and local storage size. Furthermore, we show how certain physical layer attributes may affect the overall network lifetime. More specifically, we have investigated how certain adaptive modulation schemes may affect overall energy balancing in the network and hence its lifetime. Finally, we investigate new lifetime models which can be used to obtain more practical design criteria for energy-aware system design.

     

    Controlling Uncertainty and Handling Variability in System-Level Dynamic Power Management

    Project Summary: Variability represents diversity or heterogeneity in a well-characterized population. Fundamentally a property of Nature, variability is usually not reducible through further measurement or study. For example, different dies have different leakage power dissipations, no matter how carefully we measure them. Uncertainty represents partial ignorance or lack of perfect information about poorly-characterized phenomena or models. Fundamentally a property of the observer, uncertainty is usually reducible through further measurement or study. For example, even though an observer may not know the leakage power dissipation of every die coming out of a manufacturing plant, he or she can surely take more samples to gain additional (albeit still imperfect) information about the leakage power distribution. With the increasing levels of variability in the characteristics of nanoscale CMOS devices and VLSI interconnects and continued uncertainty in the operating conditions of VLSI circuits, achieving power efficiency and high performance in electronic systems under process, voltage, and temperature variations as well as current stress, device aging, and interconnect wear-out phenomena has become a daunting, yet vital, task. This research tackles the problem of system-level dynamic power management (DPM) in systems which are manufactured in nanoscale CMOS technologies and are operated under widely varying conditions over the lifetime of the system. Such systems are greatly affected by increasing levels of process variations typically materializing as random or systematic sources of variability in device and interconnect characteristics, and widely varying workloads and temperature fluctuations usually appearing as sources of uncertainty. At the system level this variability and uncertainty is beginning to undermine the effectiveness of traditional DPM approaches. 
It is thus critically important that we develop the mathematical basis and practical applications of a variability-aware, uncertainty-reducing DPM approach with the following unique features and capabilities.

    Improving the Efficiency of Power Management Techniques by Using Bayesian Classification -- In an ISQED-08 paper, we presented a supervised learning based dynamic power management (DPM) framework for a multicore processor, where a power manager (PM) learns to predict the system performance state from some readily available input features (such as the state of service queue occupancy and the task arrival rate) and then uses this predicted state to look up the optimal power management action from a pre-computed policy lookup table. The motivation for utilizing supervised learning in the form of a Bayesian classifier is to reduce overhead of the PM which has to recurrently determine and issue voltage-frequency setting commands to each processor core in the system. Experimental results reveal that the proposed Bayesian classification based DPM technique ensures system-wide energy savings under rapidly and widely varying workloads.
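
    The predict-then-look-up structure described above can be sketched with a small naive Bayes classifier over discrete features. This is only an illustration under simplifying assumptions (discretized features, conditional independence, Laplace smoothing); the class name, the feature encodings, the state labels, and the entries of the policy table are hypothetical and do not come from the paper.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesPowerManager:
    """Predict the system state from discrete features, then look up an action."""

    def __init__(self, policy_table):
        self.policy_table = policy_table            # state -> pre-computed action
        self.state_counts = Counter()               # state -> number of samples
        self.feature_counts = defaultdict(Counter)  # (state, index) -> value counts
        self.feature_values = defaultdict(set)      # index -> values seen

    def train(self, samples):
        """samples: iterable of (features tuple, state label) pairs."""
        for features, state in samples:
            self.state_counts[state] += 1
            for i, value in enumerate(features):
                self.feature_counts[(state, i)][value] += 1
                self.feature_values[i].add(value)

    def predict_state(self, features):
        total = sum(self.state_counts.values())
        best_state, best_score = None, -math.inf
        for state, n in self.state_counts.items():
            score = math.log(n / total)             # log prior
            for i, value in enumerate(features):
                count = self.feature_counts[(state, i)][value]
                # Laplace-smoothed log likelihood of this feature value
                score += math.log((count + 1) / (n + len(self.feature_values[i])))
            if score > best_score:
                best_state, best_score = state, score
        return best_state

    def action(self, features):
        return self.policy_table[self.predict_state(features)]


if __name__ == "__main__":
    # Features: (queue occupancy, task arrival rate), both discretized.
    training = ([(("low", "slow"), "idle")] * 3
                + [(("high", "fast"), "busy")] * 3
                + [(("high", "slow"), "busy")])
    pm = NaiveBayesPowerManager({"idle": "low_vf", "busy": "high_vf"})
    pm.train(training)
    print(pm.action(("low", "slow")), pm.action(("high", "fast")))
```

    At run time the power manager only evaluates the classifier and reads the table, which is the kind of overhead reduction that motivates the supervised learning formulation.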

    Resilient Dynamic Power Management under Uncertainty -- In a DATE-08 paper, we presented a stochastic framework to improve the accuracy of decision making during dynamic power management, while considering manufacturing process and/or design induced uncertainties. More precisely, the uncertainties are captured by a partially observable semi-Markov decision process and the policy optimization problem is formulated as a mathematical program based on this model. Experimental results with a RISC processor in 65nm technology demonstrate the effectiveness of the technique and show that the proposed uncertainty-aware power management technique ensures system-wide energy savings under statistical circuit parameter variations.

     

    Design Methodologies and Techniques for Optimizing Power Consumption and Performance in Pipeline Circuits

    Project Summary: Excessive power dissipation and resulting temperature rise have become one of the key limiting factors to processor performance and a significant component of its cost. In modern microprocessors, expensive packaging and heat removal solutions are required to achieve acceptable substrate and interconnect temperatures. Due to their high utilization, pipeline circuits of a high-performance microprocessor are major contributors to the overall power consumption of the processor, and consequently, one of the main sources of heat generation on the chip. Our research is expected to propose techniques to minimize power consumption in pipeline circuits at different design levels and, at the same time, produce guidelines and tools for optimizing their power dissipation.

    A Mathematical Solution to Power Optimal Pipeline Design by Utilizing Soft Edge Flip Flops -- In an ISLPED-08 paper, we presented a technique to address the problem of reducing the power consumption in a synchronous linear pipeline, based on the idea of utilizing soft-edge flip-flops (SEFF) for time borrowing and voltage scaling in the pipeline stages. We described a unified methodology for optimally selecting the supply voltage level of a linear pipeline and optimizing the transparency window of the SEFF so as to achieve the minimum power consumption subject to a total computation time constraint. We formulated the problem as a quadratic program that can be solved optimally in polynomial time. Our experimental results demonstrated that this technique is quite effective in reducing the power consumption of a pipeline circuit under a performance constraint. Next, we will improve the pipeline stages by using optimally designed flip-flops. Also, we will consider the effect of higher order constraints such as the interdependency between the setup and hold time, and then generalize the problem to the non-linear pipelines with multi-stage feed forward and feedback paths.

     

    Performance and Reliability Analysis and Optimization in Sub-45nm CMOS Circuits

    Project Summary: With CMOS technology in the nanometer regime, reliability is becoming a major design concern. It seems that, in the future, designers will need to make power-performance-reliability tradeoffs at all levels of VLSI circuit and system design. In this area our current research focuses on building accurate, fast and easy to use fault and reliability device models and incorporating these models into CAD tools. Because of reliability concerns, physical scaling of CMOS has already slowed. Many nanotechnologies are emerging that are an order of magnitude smaller than CMOS, but all these technologies are far below CMOS in terms of reliability. Our current research also focuses on discovering new hybrid architectures that promise VLSI scaling at the system level in future technologies.

    Probabilistic Error Propagation in a Logic Circuit Using the Boolean Difference Calculus -- A gate-level probabilistic error propagation model is presented which takes as input the Boolean function of the gate, the signal probabilities (i.e., the probability of each signal being "1") and error probabilities at the gate inputs, and the gate error probability, and generates the error probability at the output of the gate. The presented model uses the Boolean difference calculus and can be efficiently applied to the problem of calculating the error probability at the primary outputs of a multi-level Boolean circuit with a time complexity which is linear in the number of gates in the circuit. This is done by starting from the primary inputs and moving toward the primary outputs by using a post-order (reverse DFS) traversal. Experimental results demonstrate the accuracy and efficiency of the proposed approach compared to the other known methods for error calculation in VLSI circuits.
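
    To make the inputs and output of such a gate-level model concrete, the sketch below computes the output error probability of a single gate by brute-force enumeration. It is not the closed-form Boolean difference formulation of the presented model; it assumes statistically independent inputs and independent input errors, and the function name and argument order are hypothetical. An output error occurs either when an input error propagates through a fault-free gate or when the gate itself fails without a propagated error.

```python
from itertools import product


def output_error_probability(gate, signal_probs, input_errors, gate_error):
    """Error probability at a gate output, by enumeration (independent inputs).

    gate:         function of n bits returning 0 or 1
    signal_probs: probability that each error-free input is 1
    input_errors: probability that each input is flipped
    gate_error:   probability that the gate itself flips its output
    """
    n = len(signal_probs)
    propagated = 0.0
    for x in product((0, 1), repeat=n):            # error-free input vector
        p_x = 1.0
        for bit, p in zip(x, signal_probs):
            p_x *= p if bit else 1.0 - p
        for flips in product((0, 1), repeat=n):    # which inputs are in error
            p_f = 1.0
            for flip, e in zip(flips, input_errors):
                p_f *= e if flip else 1.0 - e
            faulty = tuple(b ^ f for b, f in zip(x, flips))
            if gate(*x) != gate(*faulty):
                propagated += p_x * p_f
    # Output is wrong if exactly one of (propagated error, gate error) occurs.
    return propagated * (1.0 - gate_error) + (1.0 - propagated) * gate_error


if __name__ == "__main__":
    and2 = lambda a, b: a & b
    print(output_error_probability(and2, (0.5, 0.5), (0.1, 0.0), 0.0))
```

    For the two-input AND gate in the usage line, an error on the first input reaches the output only when the second input is 1, which is the sensitivity condition that the Boolean difference of the gate function with respect to the first input expresses.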

    Apollo Testbed

    We research three major areas in low power design of VLSI circuits and systems: software and system level power prediction and optimization, architectural/behavioral power estimation and optimization, and system-level dynamic power management.

    We investigate the problem of simultaneous scheduling and mapping of the computational and communication processes in a generalized task flow graph to HW/SW resources on a VLSI chip so as to minimize the energy dissipation while satisfying a given deadline and/or throughput constraint. As part of this research we examine the problem of modeling energy-latency characteristics of a given application program (for example, specified in a standard programming language such as C/C++) which is to be mapped to custom hardware and/or run on an embedded processor. We develop efficient, yet accurate, estimators at this high level of design abstraction without having to do detailed compilation of the application program into the hardware and/or software components. This capability is in turn essential in achieving effective power-aware hardware/software co-design. At the same time we develop optimization techniques for a power-conscious compiler targeting the StrongARM microprocessor. We research a number of problems related to power analysis and optimization at the behavioral/architectural level. In particular, we address early power estimation for combinational and sequential logic blocks. Examples include power estimation of a finite state machine circuit prior to state encoding, or of a combinational logic circuit before logic synthesis and mapping. We also develop power characterization of Intellectual Property (IP) cores at the architectural level and develop an automatic clock-gating tool for HDL descriptions. We consider dynamic power management techniques, which exploit the idleness of system components, and study the problem of determining optimal management policies for a variety of system models. In particular, we focus on operating system (OS) directed control policies and seek to develop realistic models of the hardware and software components and the system environment.

    The key research results include development of prototype software programs that perform power prediction of C/C++ and HDL descriptions of complex applications and systems, provide system-level component modeling and characterization for power, and optimize the application software (C/C++ or HDL) and the system software (OS) to achieve low power dissipation.

     

    Analysis and Design Techniques for Battery-Powered Digital CMOS Circuits

    In the past, the major concerns of the VLSI designer were area, speed, and cost; power was typically a secondary consideration. In recent years, however, this has begun to change and, increasingly, power is being given comparable weight to other design considerations. Several factors have contributed to this trend, including the remarkable success and growth of the class of battery-powered personal computing devices and wireless communications systems that demand high-speed computation and complex functionality with low power consumption. In these applications, extending the battery service life is a critical design concern. There is also significant pressure on producers of high-end products to reduce their power consumption. The main driving factors for lower power dissipation in these products are the cost associated with packaging and cooling as well as the circuit reliability.

    Our research focuses on the problem of maximizing the battery service life in battery-powered CMOS circuits. In particular, we recently proposed an integrated model of the VLSI hardware and the battery sub-system that powers it. We showed that, under this model and for a fixed operating voltage, the battery efficiency (or utilization factor) decreases as the average discharge current from the battery increases. The implication is that battery life depends super-linearly on the average discharge current: it falls faster than in inverse proportion as the current rises. Furthermore, even if the average discharge current remains the same, different discharge current profiles (distributions) may result in very different battery lifetimes. The maximum battery life is achieved when the variance of the discharge current distribution is minimized. Finally, we demonstrated that accounting for the dependence of battery capacity on the average discharge current changes the shape of the energy-delay trade-off curve and hence the value of the operating voltage that results in the optimum energy-delay product for the target circuit. Consequently, we proposed a more accurate metric (i.e., the battery discharge rate times delay product as opposed to the energy-delay product) for comparing various low power optimization methodologies and techniques targeted toward battery-powered electronics. Analytical derivations as well as simulation results demonstrate the importance of correct modeling of the battery-hardware system as a whole.
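
    A rough numerical illustration of the two claims above (rate-dependent capacity and the benefit of a low-variance current profile) is sketched below. It does not reproduce the integrated battery-hardware model of this project. Instead it assumes a Peukert-style drain rate proportional to I^k with k > 1, a common empirical stand-in, and the exponent, capacity, and current profiles are all hypothetical.

```python
def mean_current(profile):
    """Average current of a periodic profile given as (amps, time_fraction) pairs."""
    total = sum(fraction for _, fraction in profile)
    return sum(fraction * current for current, fraction in profile) / total


def lifetime_hours(profile, peukert_capacity, k):
    """Battery lifetime under a periodic profile, assuming charge drains at a rate of I**k."""
    total = sum(fraction for _, fraction in profile)
    drain_rate = sum(fraction * current ** k for current, fraction in profile) / total
    return peukert_capacity / drain_rate


K = 1.3         # hypothetical Peukert exponent; k > 1 makes capacity rate-dependent
CAPACITY = 2.0  # hypothetical capacity delivered at a 1 A discharge

constant = [(0.5, 1.0)]            # steady 0.5 A
bursty = [(0.9, 0.5), (0.1, 0.5)]  # same 0.5 A average, higher variance

for name, profile in (("constant", constant), ("bursty", bursty)):
    print(name, mean_current(profile), lifetime_hours(profile, CAPACITY, K))
```

    Because I^k is convex for k > 1, any profile with the same mean but nonzero variance drains the battery faster than the constant profile, which matches the minimum-variance result stated above.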

    Our research has far-reaching implications for the design of battery-powered electronics by shifting the focus from power and energy minimization to battery service life maximization. It also brings up a number of new and exciting research problems, including, but not limited to, static and dynamic voltage scaling rules to maximize the battery service life subject to performance constraints, optimal choice of battery cells for a given VLSI circuit, circuit and architectural design of the VLSI system hardware to match the output characteristics of the battery cells that power it, use of multiple battery cells and dynamic power management schemes to maximize the service life of the battery subsystem, and even integrated on-chip battery-hardware design (micro-batteries for micro-electronics).

    Portable electronic devices tend to be much more complex than a single VLSI chip; they contain many components, ranging from digital and analog to electro-mechanical and electro-chemical. Hence reducing power consumption only in the digital VLSI circuits is insufficient. System designers have started to respond to the requirement of power-constrained system designs by a combination of technological advances and architectural improvements. Dynamic power management, which refers to selective shut-off or slow-down of system components that are idle or underutilized, has proven to be a particularly effective technique. Incorporating an effective dynamic power management scheme in the design of an already-complex system is a difficult process that may require many design iterations and careful debugging and validation. The goal of a dynamic power management policy is to reduce the power consumption of an electronic system by putting system components into different states, each representing a certain performance and power consumption level. The policy determines the type and timing of these transitions based on the system history, workload and performance constraints.

    Our research focuses on the development of an abstract stochastic model of a power-managed electronic system and formulating the problem of system-level power management as a stochastic optimization problem based on the theories of continuous-time Markov decision processes and stochastic networks. This problem will be solved exactly and efficiently using a "policy iteration" approach. Extensions to more complex systems, non-stationary system behavior and non-Markovian decision making will be considered.
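
    The policy iteration idea can be sketched on a toy power-managed device. Note the simplification: the research above uses continuous-time Markov decision processes, whereas this sketch uses a discrete-time, discounted MDP with four hypothetical states and two commands. All transition probabilities and costs (power plus a delay penalty) are invented for illustration.

```python
# States: 0 = on/idle, 1 = on/busy, 2 = sleep/idle, 3 = sleep/request pending.
# Actions: 0 = stay on or wake up, 1 = go to sleep or stay asleep.
P = [  # P[action][state] = next-state probabilities (hypothetical numbers)
    [[0.9, 0.1, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0],
     [0.9, 0.1, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.9, 0.1], [0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 0.9, 0.1], [0.0, 0.0, 0.0, 1.0]],
]
COST = [  # COST[action][state]: assumed power cost plus delay penalty per step
    [1.0, 1.0, 3.0, 3.0],
    [0.5, 5.0, 0.1, 4.1],
]


def evaluate_policy(policy, P, cost, gamma, tol=1e-12):
    """Iteratively compute the discounted cost-to-go of a fixed policy."""
    n = len(policy)
    V = [0.0] * n
    while True:
        new_V = [
            cost[policy[s]][s]
            + gamma * sum(P[policy[s]][s][t] * V[t] for t in range(n))
            for s in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if delta < tol:
            return V


def q_value(s, a, V, P, cost, gamma):
    """Cost of taking action a in state s, then following the evaluated policy."""
    return cost[a][s] + gamma * sum(p * v for p, v in zip(P[a][s], V))


def policy_iteration(P, cost, gamma=0.95):
    """Alternate policy evaluation and greedy improvement until the policy is stable."""
    n_actions, n_states = len(P), len(P[0])
    policy = [0] * n_states
    while True:
        V = evaluate_policy(policy, P, cost, gamma)
        changed = False
        for s in range(n_states):
            best_a = min(range(n_actions),
                         key=lambda a: q_value(s, a, V, P, cost, gamma))
            current = q_value(s, policy[s], V, P, cost, gamma)
            if q_value(s, best_a, V, P, cost, gamma) < current - 1e-9:
                policy[s] = best_a
                changed = True
        if not changed:
            return policy, V


policy, values = policy_iteration(P, COST)
print(policy, values)
```

    Each improvement step can only lower the expected cost, and there are finitely many policies, so the loop terminates with a policy that no single-state change can improve.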

     

    Design Methodologies and Techniques for Temperature-dependent Reliability, Performance and Signal Integrity Analysis and Optimization of VLSI Interconnects

    Due to the ever-increasing failure rates in deep-submicron (DSM) interconnects, interconnect reliability has become a critical design concern in today's VLSI circuits. However, interconnect reliability and performance (i.e., speed) are tightly coupled and any approach to improve one metric has to consider its effect on the other. Temperature plays a very important role in determining both circuit reliability and performance. The proposed research focuses on detailed yet efficient characterization and quantification of electromigration (EM) and thermomigration (TM) induced failures in VLSI interconnect as well as design automation techniques to combat and control these failures. These techniques will work in a two-dimensional tradeoff space of performance and reliability (PR-space). The proposed research is expected to advance our understanding of EM and especially TM-induced failures in integrated circuits (ICs) and, at the same time, produce guidelines, algorithms, and tools for achieving a non-dominated operating point in the PR-space.

    Our work also focuses on the analysis and modeling of non-uniform chip temperature profiles and the study of their effects on different aspects of signal integrity in very high performance VLSI interconnects. First, we will develop computationally efficient methods to calculate the thermal profile of VLSI interconnect lines. A temperature-dependent distributed RC interconnect delay model will be developed next. The model can be applied to a wide variety of interconnect layouts and temperature distributions to quantify the impact of these thermal non-uniformities on signal integrity issues. Using this model, we will show that global nets (including clock and power/ground distribution networks as well as long buses and set/reset lines) are the nets that are the most vulnerable to the thermal non-uniformities in the substrate. We will therefore develop computer-aided design techniques for constructing a thermally-driven zero skew clock routing tree, a power/ground distribution network, optimal buffer insertion in long interconnect lines, and, more generally, chip-level dynamic thermal management policies.
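
    One way to picture a temperature-dependent distributed RC delay model is the sketch below. It assumes an Elmore delay formulation, a line resistance that grows linearly with temperature, and a uniform capacitance per unit length. The geometry, electrical values, and temperature profiles are hypothetical, and the function is not the model developed in this project.

```python
def elmore_delay(temp_profile, length, r0, c0, beta, t_ref,
                 r_driver, c_load, segments=1000):
    """Elmore delay of a distributed RC line with temperature-dependent resistance.

    temp_profile(x) returns the line temperature at distance x from the driver.
    Resistance per unit length is assumed to be r0 * (1 + beta * (T(x) - t_ref)).
    """
    dx = length / segments
    delay = r_driver * (c0 * length + c_load)  # driver charges the whole line and load
    for i in range(segments):
        x = (i + 0.5) * dx                     # midpoint of the segment
        r_x = r0 * (1.0 + beta * (temp_profile(x) - t_ref))
        downstream_c = c0 * (length - x) + c_load
        delay += r_x * downstream_c * dx
    return delay


# Hypothetical 2 mm global wire; all values are illustrative assumptions.
LENGTH = 2e-3   # m
R0 = 1e5        # ohm/m at the reference temperature
C0 = 2e-10      # F/m
BETA = 0.004    # 1/degC, assumed temperature coefficient of resistance
T_REF = 25.0    # degC
R_DRIVER = 100.0
C_LOAD = 1e-14


def uniform(x):
    return T_REF


def hot_near_driver(x):
    return 125.0 - 100.0 * x / LENGTH


def hot_near_load(x):
    return 25.0 + 100.0 * x / LENGTH


for profile in (uniform, hot_near_driver, hot_near_load):
    delay = elmore_delay(profile, LENGTH, R0, C0, BETA, T_REF, R_DRIVER, C_LOAD)
    print(profile.__name__, delay)
```

    The two gradient profiles have the same average temperature, yet the delays differ, because resistance near the driver sees more downstream capacitance. This is why the shape of the thermal profile matters and not only its mean.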

     

    Power-Aware Memory Bus Encoding

    This research develops encoding techniques to minimize the switching activity on a time-multiplexed Dynamic RAM (DRAM) address bus. The DRAM switching activity can be classified either as external (between two consecutive addresses) or internal (between the row and column addresses of the same address). For external switching activity in a sequential access pattern, we will develop an optimal encoding, the PYRAMID code. Extensions of the basic code address different types of DRAM devices and bus architectures, and explore static vs. dynamic coding schemes. To minimize internal switching activity, we propose scattered paging and redundant coding techniques for both random and sequential access patterns. The proposed codes are expected to reduce power dissipation on the memory bus by a factor of two or more.
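
    The distinction between internal and external switching can be made concrete with a small counting routine. The sketch assumes that each address is sent as its upper half (row) followed by its lower half (column) on a bus that is half the address width. It measures the activity of an unencoded bus and does not implement the PYRAMID code.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def multiplexed_switching(addresses, bus_width):
    """Count (internal, external) bit transitions on a time-multiplexed address bus.

    Assumes each address is sent as its row half (upper bits) and then its
    column half (lower bits), each bus_width bits wide.
    """
    mask = (1 << bus_width) - 1
    internal = 0
    external = 0
    prev_col = None
    for addr in addresses:
        row = (addr >> bus_width) & mask
        col = addr & mask
        if prev_col is not None:
            external += hamming(prev_col, row)  # column of previous address -> row of this one
        internal += hamming(row, col)           # row -> column of the same address
        prev_col = col
    return internal, external


print(multiplexed_switching([0x0F, 0x10], bus_width=4))  # (5, 3)
```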

    We also develop encoding techniques for minimizing the switched capacitance on a non-multiplexed address bus between the processor and static memory. More precisely, we have developed the ALBOZ code, which is constructed from transition signaling and limited-weight codes. With enhancements that make it adaptive and irredundant, it achieves up to 87% reduction in the bus switching activity at the expense of a small area overhead for realizing the encoder/decoder circuitry. Furthermore, building on T0 and Offset-Xor encoding techniques, we have developed three irredundant bus-encoding techniques that decrease switching activity on the memory address bus by up to 83% without the need for redundant bus lines. The power dissipation of encoder and decoder circuitry has also been calculated and shown to be small in comparison with the power savings on the memory address bus itself.
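
    For orientation, the sketch below shows the classic T0 code that the irredundant techniques above build on. This is the baseline redundant scheme, which uses one extra INC line, and not any of the three irredundant codes or ALBOZ. The stride of 1 and the sample address stream are assumptions.

```python
def t0_encode(addresses, stride=1):
    """Encode an address stream with the T0 code; returns (bus_value, inc_bit) pairs."""
    encoded = []
    prev_addr = None
    bus = 0
    for addr in addresses:
        if prev_addr is not None and addr == prev_addr + stride:
            inc = 1      # in-sequence address: freeze the bus lines, assert INC
        else:
            inc = 0
            bus = addr   # out-of-sequence address: send it unencoded
        encoded.append((bus, inc))
        prev_addr = addr
    return encoded


def t0_decode(encoded, stride=1):
    """Recover the original address stream from (bus_value, inc_bit) pairs."""
    decoded = []
    prev_addr = None
    for bus, inc in encoded:
        addr = prev_addr + stride if inc else bus
        decoded.append(addr)
        prev_addr = addr
    return decoded


def transitions(words):
    """Total number of bit flips between consecutive words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


stream = list(range(16, 32)) + [100, 101, 102]   # mostly sequential, one jump
encoded = t0_encode(stream)
raw_flips = transitions(stream)
# Treat the INC line as one more bus wire when counting flips.
coded_flips = transitions([(bus << 1) | inc for bus, inc in encoded])
print(raw_flips, coded_flips)
```

    On a purely sequential stream the address lines never toggle after the first transfer, which is the property the irredundant variants aim to keep without paying for the extra line.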

     

    Apollo: Adaptive Power Optimization and Control for the Land Warrior

    Project URL: Apollo Testbed

    The Apollo project aims at significantly reducing power dissipation of next-generation mobile DoD computing and communication systems by means of operating system-directed power management, power-aware software compilation, and system-level synthesis and optimization of the integrated hardware/software platform subject to performance and quality-of-service constraints.

    We consider dynamic power management techniques and study the problem of determining optimal management policies for a variety of system models. In particular, we focus on operating system (OS) directed control policies and seek to develop realistic models of the hardware and software components and the system environment in the Land Warrior System (LWS). We characterize power consumption of common arithmetic logic and memory blocks and develop instruction-level power macro-models for the StrongARM microprocessor and TI's digital signal processor 320c-5410 in addition to the major subsystems in the (next-generation) LWS.

    We develop power-conscious architectural organization and optimization techniques targeting a StrongARM-based hardware platform that we are constructing based on Intel's Assabet and Neponset boards plus a number of external devices. This platform is called the Apollo Testbed (AT). We also develop system and application software for the AT. This task will include development of the ARMLinux drivers for all external devices, the "map" application, and the utility software needed for the AT usage scenario that is provided to us by the IPM team of the Army CECOM.

    We develop encoding techniques to minimize the switching activity on a time-multiplexed Dynamic RAM (DRAM) address bus. We develop redundant (i.e., with INVERT bit) memory bus encoding techniques that reduce the switching activity on the bus between the FLASH memory and the processor. The proposed codes are expected to reduce power dissipation on the memory bus by a factor of two or more. We develop algorithms and techniques for power optimization of the FLASH and main memory hierarchy in the AT. More precisely, we explore the use of different data representations for the images stored in the map database so as to reduce power-consuming accesses to the FLASH memory (which acts as the secondary storage in the AT) at the expense of more intensive computations on the SA 1110. We study and analyze the impact of various architectural optimization techniques on the power savings of the AT. Such techniques include power optimization and control for the LCD, the camcorder, and the network (wireless LAN) interface card.
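
    A redundant code with an INVERT bit can be sketched with the classic bus-invert scheme, shown below. This is an assumption about the general style of encoding meant above and not the specific codes developed for the AT. The 8-bit width and the data words are hypothetical.

```python
def bus_invert_encode(words, width):
    """Bus-invert coding: send the complement whenever that flips fewer lines."""
    mask = (1 << width) - 1
    bus, inv = 0, 0  # assumed initial state: all lines low
    encoded = []
    for data in words:
        # Flips needed to send the data as-is, counting the INVERT line returning to 0.
        distance = bin((bus ^ data) & mask).count("1") + inv
        if 2 * distance > width:
            bus, inv = ~data & mask, 1
        else:
            bus, inv = data & mask, 0
        encoded.append((bus, inv))
    return encoded


def bus_invert_decode(encoded, width):
    """Undo the inversion wherever the INVERT bit is set."""
    mask = (1 << width) - 1
    return [(~bus & mask) if inv else bus for bus, inv in encoded]


def line_transitions(encoded):
    """Bit flips per transfer, INVERT line included, starting from an all-zero bus."""
    flips = []
    prev_bus, prev_inv = 0, 0
    for bus, inv in encoded:
        flips.append(bin(prev_bus ^ bus).count("1") + (prev_inv ^ inv))
        prev_bus, prev_inv = bus, inv
    return flips


WIDTH = 8
data = [0x00, 0xFF, 0x0F, 0xF0, 0xAA, 0x55, 0x01, 0xFE]
coded = bus_invert_encode(data, WIDTH)
print(line_transitions(coded))
```

    For an even bus width, no transfer toggles more than half of the data lines' worth of wires, at the cost of one extra line.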

    This work is done in collaboration with Prof. Niraj Jha of Princeton University. Dr. Jha will tackle both periodic and aperiodic task graphs, automatically generate and transform task graphs from the system specification, estimate system power and synthesize low-power system architectures. The system synthesis tools that will be developed include all supporting databases and simulation engines. The tools will synthesize a given system specification written in C or Hardware Description Language (HDL) into a low-power system architecture. He will analyze, model and optimize the power consumed by a real-time operating system (RTOS). He will develop behavioral synthesis tools for low-power application-specific integrated circuits (ASICs). The work will be implemented on top of Princeton University's synthesis system called IMPACT. Additional research topics are common-case computation, leakage power optimization and run-time adaptation in behavioral synthesis for low power.

     

    Low-Power Fanout Optimization

    Low-Power Fanout Optimization Using MTCMOS and Multi-Vt Techniques

    Although much research has addressed the fanout optimization problem in VLSI circuits, there is little work on low-power fanout optimization. More specifically, since both capacitive and leakage power dissipation of a fanout chain are proportional to its area, it has been widely accepted that power minimization of the fanout tree is equivalent to its area optimization. We have shown that, due to short-circuit power dissipation, minimizing area does not necessarily minimize power dissipation. In particular, an area-optimized fanout tree may dissipate excessive short-circuit power. We formulate the problem of minimizing the power dissipation of a fanout chain and show how to build a fanout tree out of these power-optimized chains. Additionally, to suppress the leakage power dissipation in a fanout tree, we use multiple channel length (multi-LGate) and multi-Vt techniques. In the presence of multi-LGate and multi-Vt options, we accurately model the delay and power dissipation of inverters as posynomials; therefore, our proposed problem formulation results in a convex mathematical program consisting of a posynomial objective function with posynomial inequality constraints, which can be solved efficiently.
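
    Why a posynomial formulation can be solved efficiently is easier to see with a small numerical check. The sketch below uses a toy delay-like posynomial for a chain with two sizing variables driving a load 64 times the input capacitance. The expression and the load are illustrative assumptions and not the delay or power model of this work. After the change of variables x_i = exp(y_i), the logarithm of a posynomial is convex in y, so a local minimum is a global one.

```python
import math


def posynomial(x, terms):
    """Evaluate sum_k c_k * prod_i x_i**a_ik; terms is a list of (c_k, exponents)."""
    return sum(
        c * math.prod(xi ** a for xi, a in zip(x, exps))
        for c, exps in terms
    )


def log_posynomial(y, terms):
    """The same function after substituting x_i = exp(y_i), on a log scale."""
    return math.log(posynomial([math.exp(yi) for yi in y], terms))


# Toy chain delay: x1/1 + x2/x1 + 64/x2 (hypothetical sizes x1, x2 and load 64).
TERMS = [(1, (1, 0)), (1, (-1, 1)), (64, (0, -1))]

# Midpoint convexity check in log space for one pair of points.
y_a, y_b = (0.0, 0.0), (2.0, 3.0)
y_mid = [(a + b) / 2 for a, b in zip(y_a, y_b)]
lhs = log_posynomial(y_mid, TERMS)
rhs = (log_posynomial(y_a, TERMS) + log_posynomial(y_b, TERMS)) / 2
print(lhs <= rhs)

# Geometric sizing (1, 4, 16, 64) balances the three terms of the toy delay.
print(posynomial((4, 16), TERMS))  # 12.0
```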