Sponsor: Intelligence Advanced Research Projects Activity (IARPA).
We developed a row-based design methodology covering the cell placement, clock tree synthesis, and routing steps for large SFQ circuits. The proposed placement tool starts by running a conventional placer, which places fixed-height cells on logic rows. Next, in each row, cells with the same logic level are grouped together, and the cell groups are then sorted by logic level. Furthermore, instead of a costly H-tree network, a clock row is dedicated above each logic row; it uses a combination of splitters/JTLs and, if cells are not sorted in logic-level order, PTLs to distribute the clock pulse to all cells in the corresponding row. The effectiveness of the proposed approach is evaluated on a 32-bit Kogge-Stone adder.
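The grouping and sorting step described above can be sketched as follows. This is a minimal illustration, not the actual placement tool: the function name `sort_rows_by_logic_level` and the `(cell_name, logic_level, x)` tuple layout are assumptions made for the example.

```python
from collections import defaultdict


def sort_rows_by_logic_level(placement):
    """Reorder the cells of each logic row by logic level.

    placement: dict mapping a row index to a list of
    (cell_name, logic_level, x) tuples, as a conventional placer might
    emit them (hypothetical format).
    Returns a dict mapping each row index to a list of cell names in which
    cells of equal logic level are adjacent and the groups appear in
    ascending logic-level order.
    """
    result = {}
    for row, cells in placement.items():
        # Group the cells of this row by logic level.
        groups = defaultdict(list)
        for name, level, x in cells:
            groups[level].append((x, name))
        ordered = []
        for level in sorted(groups):
            # Within a group, keep the placer's left-to-right order.
            ordered.extend(name for _, name in sorted(groups[level]))
        result[row] = ordered
    return result


rows = {0: [("a", 2, 0.0), ("b", 1, 1.0), ("c", 2, 2.0), ("d", 1, 3.0)]}
print(sort_rows_by_logic_level(rows))
```

Keeping the placer's relative order inside each group is one possible design choice; it limits how far cells move from their initial positions.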
Sponsors: DARPA PERFECT program, and National Science Foundation (the Software and Hardware Foundations).
As the geometric dimensions of transistors scale down, FinFET devices have been shown to better address the challenges facing conventional planar CMOS devices. Due to manufacturing limitations, deeply-scaled FinFET devices below the 10nm feature size have not been manufactured. Nevertheless, it is crucial to investigate the performance of such devices at smaller feature sizes in order to guide further studies on novel process techniques and circuit structures. In our work, 7nm-gate-length FinFET structure models are built and simulated using the Synopsys TCAD tool suite. We generate the predicted performance of the FinFET devices with different design parameters, supply voltages, and die temperatures, from which SPICE-compatible compact models are extracted and used further in circuit-level design and optimization.
A standard cell library containing timing and power information at different input and output conditions, i.e., input slew rates and output load capacitances, is required to enable logic synthesis and timing and power analysis with the most advanced FinFET device technology. We generate 7nm FinFET device models using the Synopsys TCAD simulator and characterize standard cells through HSPICE simulations. Multiple supply voltages ranging from the near-threshold to the super-threshold regime are supported in our 7nm FinFET technology nodes, allowing both high performance and low power usage. In addition, devices with multiple threshold voltages are supported to enable multi-threshold technology. Synthesis results demonstrate that 7nm FinFET technology can achieve a 15X circuit speed improvement and a 350X energy consumption reduction compared with 45nm CMOS technology. 7nm FinFET standard cell libraries are available here.
FinFET has been proposed as an alternative for bulk CMOS in current and future technology nodes due to more effective channel control, reduced random dopant fluctuation, high ON/OFF current ratio, lower energy consumption, etc. Key characteristics of FinFET operating in the sub/near-threshold region are very different from those in the strong-inversion region. Therefore, FinFET sizing again becomes the focus of attention.
The manufacturing of modern semiconductor devices involves a complex set of nanoscale fabrication processes that are energy and resource intensive. It is important to understand and reduce the environmental impacts of the manufacturing and usage of semiconductor circuits. We presented the first life-cycle energy and inventory analysis of FinFET integrated circuits and a comparative analysis with CMOS technology. A gate-to-gate inventory analysis is provided, accounting for the manufacturing, assembly, and use phases. The functional unit used in this work is a (FinFET or CMOS) processor with the same functionality and performance level. Two types of applications are considered: high-performance servers and low-power mobile devices. The following conclusions are drawn: (i) FinFET circuits achieve lower use-phase energy consumption than their CMOS counterparts, and (ii) FinFET circuits can require less manufacturing and assembly energy because the effect of smaller size outweighs that of the more complex manufacturing process.
The aggressive down-scaling of transistors to the sub-10nm regime exacerbates short channel effects as well as device mismatches. Under such circumstances, conventional 6T SRAM cells made of bulk CMOS devices suffer from poor read and write stabilities. Accordingly, in order to improve the cell stability, at the device level, planar CMOS transistors are replaced with FinFET devices, and, at the circuit level, more robust SRAM structures such as the 8T SRAM cell are adopted. Our research thus focuses on the design of high yield (i.e., robust against process variations) and energy efficient FinFET-based cache memories. For this purpose, we use a cross-layer design and optimization framework spanning device, circuit, and architecture levels.
Architectural Analysis of Caches in Deeply-scaled Technologies -- Future memory systems in deeply-scaled technologies (i.e., sub-10nm) necessitate FinFET support and more sophisticated SRAM cell structures. Accordingly, characteristics of SRAM cells need to be analyzed in order to find a desirable SRAM cell that simultaneously achieves high stability and low leakage power. Furthermore, evaluating such memory systems at the architecture-level requires modifications to the existing memory models and analysis tools. Hence, we developed P-CACTI which enhances CACTI by adding the following features:
Characteristics of FinFETs operating in the near/sub-threshold regime make it difficult to verify the timing of a circuit using conventional statistical static timing analysis (SSTA) techniques. Our work focuses on extending the current source model (CSM) approach to handle VLSI circuits comprised of FinFET devices with independent gate control operating in the near/sub-threshold voltage regime and subject to process variations. In particular, we combine non-linear analytical models and low-dimensional CSM lookup tables to simultaneously achieve high modeling accuracy and time/space efficiency. The proposed model can be used to perform SSTA based on the distribution of process variation parameters.
Semi-Analytical CSM for FinFET Devices with Independent Gate Control
We develop a semi-analytical approach for FinFET CSM operating in the sub/near-threshold regime, accounting for the unique feature of independent gate control as well as process variations. The proposed technique determines all the component values in the equivalent circuit model given the applied voltages on the front-gate-controlled and back-gate-controlled fins, the output voltage, and the process variation parameters for N-type and P-type FETs. Only 2-D lookup tables (LUTs) are needed in our semi-analytical method, which reduces the storage space requirement.
Another advantage of CSM over traditional STA models is that it can accurately capture the multiple-input switching (MIS) effect. The conventional multiple-input switching current source model (MCSM) is not scalable, since it requires high-dimensional lookup tables to account for all input, output, and internal node voltages (e.g., for a 3-input NAND gate, a 6-D lookup table is needed). We propose to model the current through each transistor in an m-input logic gate by building 2-D lookup tables keyed by the Vgs and Vds of the transistor in question. Having looked up the current through all transistors in the design, we can then calculate the new output and internal node voltages.
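A per-transistor 2-D lookup keyed by (Vgs, Vds) can be sketched as below. This is only an illustration under stated assumptions: the class `CurrentLUT`, the use of bilinear interpolation between grid points, and the forward-Euler voltage update in `step_node_voltage` are choices made for the example and are not taken from the original work.

```python
from bisect import bisect_right


class CurrentLUT:
    """2-D table of transistor current indexed by Vgs and Vds (hypothetical)."""

    def __init__(self, vgs_axis, vds_axis, table):
        # table[i][j] holds the current at (vgs_axis[i], vds_axis[j]).
        self.vgs_axis = vgs_axis
        self.vds_axis = vds_axis
        self.table = table

    @staticmethod
    def _bracket(axis, value):
        # Find the grid cell containing value, clamping at the table edges.
        i = min(max(bisect_right(axis, value) - 1, 0), len(axis) - 2)
        t = (value - axis[i]) / (axis[i + 1] - axis[i])
        return i, min(max(t, 0.0), 1.0)

    def current(self, vgs, vds):
        # Bilinear interpolation between the four surrounding grid points.
        i, tx = self._bracket(self.vgs_axis, vgs)
        j, ty = self._bracket(self.vds_axis, vds)
        tab = self.table
        low = tab[i][j] * (1 - ty) + tab[i][j + 1] * ty
        high = tab[i + 1][j] * (1 - ty) + tab[i + 1][j + 1] * ty
        return low * (1 - tx) + high * tx


def step_node_voltage(v_node, i_in, i_out, capacitance, dt):
    """Forward-Euler update of a node voltage from C * dV/dt = i_in - i_out."""
    return v_node + (i_in - i_out) * dt / capacitance


# Toy table: I = Vgs * Vds on a 3x3 grid (illustrative data only).
axis = [0.0, 0.5, 1.0]
lut = CurrentLUT(axis, axis, [[g * d for d in axis] for g in axis])
print(lut.current(0.25, 0.75))
```

With one such table per transistor, storage grows linearly with the number of transistors in the gate rather than exponentially with the number of node voltages.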
Sponsor: National Science Foundation
Conventional electrical energy storage (EES) systems consist of only a single type of EES element. Unfortunately, no available EES element can fulfill all the desired performance metrics of an ideal storage means, e.g., high power/energy density, low cost/weight per unit capacity, high round-trip efficiency, and long cycle life. An obvious shortcoming of a homogeneous EES system is that the key figures of merit (normalized with respect to capacity) of the system cannot be any better than those of its constituent EES element.
A hybrid EES (HEES) system is comprised of different types of EES elements (e.g., batteries and supercapacitors), where each type has its unique strengths and weaknesses. The HEES system can exploit the strength of each type of EES element and achieve a combination of performance metrics that is superior to that of any of its individual EES components.
Based on the properties of the HEES system and the characteristics of the power sources (or load devices), we developed charge management policies (such as charge allocation, charge replacement, charge migration, bank re-configuration, and SoH management) to operate the HEES system properly and achieve near-optimal performance. The charge allocation problem is to maximize the charge allocation efficiency, defined as the ratio of the energy received by the EES banks to the total energy provided by the power sources over a given time period, by properly distributing the incoming power to selected destination banks. The charge replacement problem in the HEES system is to adaptively select the EES banks and determine the discharging currents, from zero to a maximum limit, and the voltage level settings on a charge transfer interconnect (CTI) so that the given load demand is met and the charge replacement efficiency is maximized. While charge allocation and replacement deal with energy exchange with the external power supply and load demand, charge migration is an internal energy transfer from one EES bank to another.
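The charge allocation efficiency defined above is a simple energy ratio. A minimal sketch, assuming uniformly sampled power traces (the function name, argument layout, and sample values are hypothetical):

```python
def charge_allocation_efficiency(source_power_w, bank_power_w, dt_s):
    """Ratio of energy received by the EES banks to energy supplied.

    source_power_w: power samples delivered by the power sources (W).
    bank_power_w: one list of samples per EES bank, giving the power
        actually stored in that bank (W).
    dt_s: sampling interval in seconds (assumed uniform).
    """
    energy_supplied = sum(source_power_w) * dt_s
    energy_received = sum(sum(samples) for samples in bank_power_w) * dt_s
    if energy_supplied <= 0:
        raise ValueError("the sources must supply a positive amount of energy")
    return energy_received / energy_supplied


# Illustrative numbers only: two banks charged from one 10 W source.
print(charge_allocation_efficiency([10, 10, 10, 10],
                                   [[4, 4, 4, 4], [5, 5, 4, 4]], 1.0))
```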
The lifetime of EES elements is one of the most important metrics that designers of an EES system should consider. The EES system lifetime is usually described using the state of health (SoH), which is defined as the ratio of the full charge capacity of a cycle-aged EES element to its designed capacity. The SoH-aware charge management problem in HEES systems is to find the charging/discharging current profiles for all EES banks and the CTI voltage, aiming to improve both the cycle life of the EES arrays (mainly battery arrays) and the overall cycle efficiency of the entire system.
A HEES prototype has been built based on our proposed HEES system architecture. The hardware part of the HEES prototype is comprised of three types of modules: the EES bank module, the CTI module, and the converter module. For the software part, the user interface (UI) is designed using LabVIEW, while the control policies are implemented using the MathScript module of LabVIEW.
User interface and control unit: The user interface (UI) is designed using LabVIEW. The LabVIEW UI monitors the runtime status of the HEES prototype, including the CTI voltage and the voltage and input/output current of each EES bank, and calculates the instantaneous charging or discharging efficiency using this information.
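Written as a formula, the SoH definition above reads as follows. The symbols are chosen here for illustration and do not come from the original work: C_full^aged denotes the full charge capacity of the cycle-aged element and C_design its designed capacity.

```latex
\mathrm{SoH} = \frac{C_{\mathrm{full}}^{\mathrm{aged}}}{C_{\mathrm{design}}}
```

For example, under this definition an element designed for 2.0 Ah that can now hold only 1.6 Ah at full charge has an SoH of 0.8.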
The deployment of residential HEES systems has the potential to alleviate the mismatch between electric energy generation and consumption. However, their wide adoption in residential units is hindered by the lack of a convincing and complete analysis of their economic feasibility.
In this project, we provide a complete cost-aware design and control flow for residential HEES systems. Specifically, we propose a two-step design and control method: first deriving daily management policies with energy buffering strategies, and then solving the global problem of HEES specification based on the daily management results. We take into consideration real-life factors such as the battery's capacity degradation, the unit capital cost of EES elements, and the maintenance and replacement cost of the HEES system. Simulation results show that this system achieves on average 11.10% more profit than the non-buffering HEES system.
In addition, we present a design flow for HEES systems in electric vehicles (EVs). Unlike a residential HEES system, the EV HEES system is highly restricted by the weight requirement, since larger weight results in higher traction energy consumption. We propose a Li-ion battery and supercapacitor hybrid system for EVs to reduce the daily cost and achieve high efficiency.
SIMES is a simulation platform targeted at fast and accurate simulation for HEES systems. SIMES models various elements in a HEES system, including different types of energy storage systems, power conversion circuitry, charge transfer interconnects, etc. Most of the models are calibrated based on measurement data of actual hardware performed in our lab.
SIMES consists of three modules: Parser, Simulator, and Visualizer. Parser parses input data in the form of an XML file and constructs the HEES system model. Simulator simulates the operation of the constructed system. Visualizer is a graphical user interface that can visualize both the HEES system configuration and the simulation output.
SIMES enables end users to freely explore the design space of HEES systems and to test custom power management policies. In addition, SIMES provides an easy-to-use interface that allows experienced users to implement their own component models as extensions.
Sponsors: Defense Advanced Research Projects Agency, National Science Foundation, and the Semiconductor Research Corporation.
Project Summary: While power modeling and dynamic power management in various VLSI platforms have been heavily investigated, one critical factor has often been overlooked: the power conversion efficiency of the power delivery network (PDN) in the platforms. The PDN provides power from a power source (e.g., a battery) to all modules in a platform. In reality, voltage regulators (VRs), which play a pivotal role in the PDN, inevitably dissipate power, and the power dissipation of all VRs can add up to a considerable power loss. This is mainly because the efficiency of a VR can drop dramatically under adverse load conditions (i.e., out-of-range output current levels). This project aims to improve the power conversion efficiency of the PDN. We have proposed optimization methods that minimize the power loss of the PDN, thereby maximizing the system-wide power savings.
In a series of papers published in the proceedings of ISLPED-12 and in IEEE T. CAD-14, we introduced optimization methods to improve the power conversion efficiency of the PDNs in mobile (smartphone) platforms. Starting from detailed models of the VR designs, two optimization methods have been presented: (i) static switch sizing to maximize the efficiency of a VR under statistical loading profiles, and (ii) dynamic switch modulation to achieve high efficiency enhancement under dynamically varying load conditions. To verify the efficacy of the optimization methods in actual smartphone platforms, we also presented a characterization procedure for the PDN. The procedure is as follows: (i) group the modules in the smartphone platform together and use profiling to estimate their average and peak power consumption levels, and (ii) build an equivalent VR model for the power delivery path from the battery source to each group of modules and use linear regression to estimate the conversion efficiency of the corresponding equivalent converter. Experimental results demonstrated the efficacy of the proposed methods.
In our DATE-14 paper, we focused on the dynamic control of the multiple VRs in chip multicore processor (CMP) platforms that support per-core DVFS. Starting with a proposed platform with a reconfigurable VR-to-core power distribution network (PDN), two optimization methods have been presented to maximize the system-wide energy savings: (i) reactive VR consolidation, which reconfigures the PDN to maximize the power conversion efficiency of the VRs under the pre-determined DVFS levels for the cores, and (ii) proactive VR consolidation, which determines new DVFS levels to maximize the total energy savings without any performance degradation. Results from detailed simulations based on realistic experimental setups demonstrated significant VR energy loss reduction and total energy savings.
Sponsor: National Science Foundation
Photovoltaic (PV) systems have been widely deployed in electronic and electrical systems of various scales, such as embedded systems, hybrid electric vehicles, home appliances, satellites, and power plants. Due to the intermittent nature of solar energy, power management techniques are imperative for maximizing the output power of a PV system.
The PV panel exhibits a nonlinear output current-voltage (I-V) relationship, and under a given solar irradiance level there is an operating point (V, I) where the PV panel output power is maximized. We propose the maximum power transfer tracking technique, which considers the converter efficiency variation to maximize the output power of the whole PV system.
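The difference between maximizing panel power and maximizing delivered power can be illustrated on a sampled I-V curve. This is a sketch only: the sample points, the step-shaped efficiency function, and both function names are invented for the example, and a real tracker would search the operating point online rather than scan a table.

```python
def max_power_point(iv_curve):
    """Operating point (V, I) that maximizes the panel output power V * I."""
    return max(iv_curve, key=lambda p: p[0] * p[1])


def max_power_transfer_point(iv_curve, efficiency):
    """Operating point that maximizes the power delivered after conversion.

    efficiency(v, i) is the converter efficiency (0..1) at that operating
    point; here it is a caller-supplied, hypothetical model.
    """
    return max(iv_curve, key=lambda p: efficiency(p[0], p[1]) * p[0] * p[1])


# Illustrative I-V samples (volts, amperes) for one irradiance level.
iv = [(0.0, 5.0), (10.0, 4.8), (15.0, 4.0), (18.0, 2.0), (20.0, 0.0)]


def toy_efficiency(v, i):
    # Made-up converter model that is less efficient at higher input voltage.
    return 0.9 if v <= 10.0 else 0.6


print(max_power_point(iv))                           # panel-side optimum
print(max_power_transfer_point(iv, toy_efficiency))  # system-side optimum
```

In this toy case the two optima differ, which is the situation in which accounting for converter efficiency changes the chosen operating point.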
The solar irradiance levels on the PV cells in a PV panel may differ from each other, a phenomenon known as the partial shading effect. The partial shading effect significantly degrades the output power of a PV panel. For example, if one fourth of the PV panel is completely shaded, the PV panel will suffer a power loss of nearly 50%. To improve the output power of a PV panel under partial shading, we propose a reconfigurable PV panel structure and a dynamic programming algorithm that reconfigures the PV panel dynamically under partial shading. Simulation results demonstrate that our method can improve the PV system output power by up to 2.31X. We have also built reconfigurable PV prototypes and demonstrated the effectiveness of the PV panel reconfiguration technique.
A PV system may suffer from PV cell faults, which are caused by contact failure, wire corrosion, hail impact, moisture, etc. Defective PV cells in a PV panel can lead to lower output power and a shorter lifespan for the PV system. Unfortunately, manual fault detection and elimination are expensive, and almost impossible for remote PV systems (e.g., PV systems in orbital or deep space missions). We present design principles and runtime control algorithms for a fault-tolerant PV panel, which can detect and bypass PV cell faults in situ without manual intervention.
In addition to regenerative-braking-based battery charging, a PV system mounted on an HEV/EV can collect energy to charge the vehicle batteries whenever there is solar irradiance. To make full use of the vehicle surface area, PV cells will be mounted on the hood, rooftop, door panels, etc. However, due to the uneven distribution of solar irradiance and temperature across different vehicle surface areas, the actual output power of the vehicle PV system may be reduced. We propose a dynamic PV panel reconfiguration algorithm, which updates the PV panel configuration according to changes in the irradiance and temperature distributions on the PV panel to enhance the output power. Furthermore, we investigate the customization of the PV panel installation on HEVs/EVs and implement a high-speed, high-voltage PV reconfiguration switch network with IGBTs and a controller. We derive the optimal reconfiguration period based on the driving profiles, taking into account the on/off delay of the IGBTs, the computation overhead, and the energy overhead.
Thermal design and management of smartphones faces a new challenge: the skin temperature constraint. This constraint refers to the fact that the temperature at the device skin must not exceed a certain upper threshold. A prior study showed that most people experience a sensation of heat pain when they touch an object hotter than 45 °C. Ideally, distributing the heat uniformly over the device skin results in the most effective heat dissipation. In practice, however, the majority of the heat flows in the vertical direction from the AP die, and thus high-temperature hot spots are formed on the device skin.
To address this design challenge, we designed Therminator, a component-level thermal simulator based on compact thermal modeling that targets small form-factor mobile devices (such as smartphones). It produces temperature maps for all components, including the AP, battery, display, and other key device components, as well as the skin of the device itself, with high accuracy and fast runtime. Therminator results have been validated against thermocouple measurements on multiple devices and against simulation results generated by Autodesk Simulation CFD. In addition, Therminator can handle different device specifications and component usage information, which allows a user to explore the impact of different thermal designs and thermal management policies. New devices can be described simply through an input file (in XML format). Finally, Therminator implements a parallel processing feature that allows users to use a GPU to reduce the runtime by more than two orders of magnitude for high-resolution temperature maps.
Therminator takes two input files provided by users. The specs.xml file describes the smartphone design, including components of interest and their geometric dimensions (length, width, and thickness) and relative positions. Therminator has a built-in library storing properties of common materials (i.e., thermal conductivity, density, and specific heat) that are used to manufacture smartphones. In addition, users can override these properties or specify new materials through the specs.xml file. The power.trace file provides the usage information (power consumption) of those components that consume power and generate heat, e.g., ICs, battery, and display. The power.trace can be obtained through real measurements or other power estimation tools/methods. power.trace is a separate file so that one can easily interface a performance-power simulator with Therminator.
The cloud computing paradigm is quickly gaining popularity because of its advantages in on-demand self-service, ubiquitous network access, location-independent resource pooling, and transference of risk. The ever-increasing demand for cloud computing services is driving the expense of data centers through the roof. In order to control the expense of a data center while satisfying the clients' requests specified in the service level agreements (SLAs), one must find the appropriate design and management policy. We have proposed joint optimization of request dispatch, resource allocation, dynamic voltage and frequency scaling (DVFS), and geographical load balancing among different data centers in a cloud computing system, in order to enhance the cloud computing system's net profit, which is the revenue it receives from processing service requests minus the overall energy cost.
The cloud computing paradigm is quickly gaining popularity because of its advantages in on-demand self-service, ubiquitous network access, location-independent resource pooling, and transference of risk. The ever-increasing demand for cloud computing services is driving the expense of data centers through the roof. In order to control the expense of a data center while satisfying the clients' requests, one must find the appropriate design and management policy. Aware of the interdependency between placement and capacity provisioning when designing a data center and resource allocation when operating it, we propose a generalized concurrent placement, capacity provisioning, and request flow control optimization framework for a distributed cloud infrastructure. Given the trend toward dynamic utility pricing, we also use energy storage devices such as battery cells to further lower the utility cost of a data center.
Because of the widening gap between the rapidly increasing power demand of mobile devices (e.g., smartphones and tablet PCs) and the limited growth of the volumetric/gravimetric energy density of rechargeable batteries, battery lifetime has become a major concern in the design of these mobile devices. Apart from well-known techniques such as DVFS, which can balance the processing power and the power consumption of some components in the mobile device, computation offloading, a technique that transfers some local tasks to a server in the cloud, can also be used to extend the battery life of a mobile device. Our work proposes an optimization framework for a mobile device based on a semi-Markov decision process model to determine the DVFS policy, the proportion of tasks to be offloaded to the cloud, and the transmission bit rate used for offloading.
Predicting the workload profile of cloud service clients can help optimize the operational cost and improve the quality of service (QoS) for the cloud infrastructure provider. However, because of the complex dynamics of user behavior, it is challenging to generate workload prediction results with high accuracy. By analyzing the cluster dataset released by Google, we identify the multi-fractal behavior of the workload profile, based on which we propose a prediction algorithm using fractional-order derivatives, and we find that the alpha-stable distribution can be used to fit the distribution of a set of characteristics of the workload.
To cope with the variations and uncertainties that emanate from hardware and application characteristics, dynamic power management (DPM) frameworks must be able to learn about the system inputs and environment and adjust the power management policy on the fly. We present an online adaptive DPM technique based on model-free reinforcement learning (RL), which is commonly used to control stochastic dynamical systems. In particular, we employ temporal difference learning for a semi-Markov decision process (SMDP) as the model-free RL method. In addition, a novel workload predictor based on an online Bayes classifier is presented to provide effective estimates of the workload states for the RL algorithm. In this DPM framework, power and latency tradeoffs can be precisely controlled through a user-defined parameter. Experiments show that the average power saving (without any increase in latency) is up to 16.7% compared to a reference expert-based approach. Alternatively, the per-request latency reduction without any increase in power consumption is up to 28.6% compared to the expert-based approach.
We have further extended the RL-based DPM framework to (i) a hierarchical DPM framework that jointly performs component-level DPM with a CPU scheduler, and (ii) DPM of a power-managed system with a battery-based power supply or a hybrid (battery + supercapacitor) power supply.
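A temporal-difference update for an SMDP can be sketched as below. This uses a common textbook form of continuous-time Q-learning, in which rewards accrue at a constant rate over the sojourn time tau and are discounted at rate beta; it is not necessarily the exact update used in the work described above. The class name, the state and action labels, and the `reward_rate` weighting of power against latency are all assumptions made for the example.

```python
import math
from collections import defaultdict


class SMDPQLearner:
    """Tabular Q-learning for a semi-Markov decision process (sketch)."""

    def __init__(self, actions, alpha=0.1, beta=0.5):
        self.q = defaultdict(float)   # Q-values, default 0.0
        self.actions = list(actions)
        self.alpha = alpha            # learning rate
        self.beta = beta              # continuous-time discount rate

    def best_action(self, state):
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward_rate, tau, next_state):
        # Discount over a sojourn of length tau, with reward accrued at a
        # constant rate during that sojourn.
        discount = math.exp(-self.beta * tau)
        accumulated = (1.0 - discount) / self.beta * reward_rate
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = accumulated + discount * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def reward_rate(power_w, latency_s, weight):
    """Hypothetical cost: a user-chosen weight trades power against latency."""
    return -(power_w + weight * latency_s)


learner = SMDPQLearner(["sleep", "active"], alpha=0.5, beta=1.0)
learner.update("idle", "sleep", reward_rate(1.0, 0.5, 2.0), 1.0, "idle")
print(learner.best_action("idle"))
```

Because the update needs only observed sojourn times and rewards, no transition model of the workload has to be known in advance, which is what makes the approach model-free.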
Dynamic voltage and frequency scaling (DVFS) has been studied for well over a decade. State-of-the-art DVFS technologies and architectures are advanced enough that they are employed in most commercial systems today. Nevertheless, existing DVFS transition overhead models suffer from significant inaccuracies, for example, by failing to account correctly for the effect of DC-DC converters, frequency synthesizers, and voltage and frequency change policies on the energy losses incurred during mode transitions. Incorrect and/or inaccurate DVFS transition overhead models prevent one from determining the precise break-even time and thus forfeit some of the energy saving that is ideally achievable. Through detailed analysis of modern DVFS setups and the voltage and frequency change policies provided by commercial vendors, we introduce accurate DVFS transition overhead models for both energy consumption and delay. In particular, we identify new contributors to the DVFS transition overhead, including the underclocking-related losses in a DVFS-enabled microprocessor, additional inductor IR losses, and power losses due to discontinuous-mode DC-DC conversion. We report the transition overheads for three representative processors: Intel Core2Duo E6850, ARM Cortex-A8, and TI MSP430. Finally, we present a compact, yet accurate, DVFS transition overhead macro model for use by high-level DVFS schedulers.
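To see why the overhead model matters, consider how a scheduler might compute a break-even time from it. The sketch below uses a simplified model that is an assumption of this example, not the macro model of the work above: a mode switch costs a lumped transition energy and delay, and power is constant in each mode.

```python
def break_even_time(p_high_w, p_low_w, e_transition_j, t_transition_s):
    """Shortest interval for which switching to the low-power mode pays off.

    Simplified model (assumed for illustration):
      stay:   energy = p_high_w * T
      switch: energy = e_transition_j + p_low_w * (T - t_transition_s)
    Setting the two equal and solving for T gives the break-even time,
    which can never be shorter than the transition itself.
    """
    if p_high_w <= p_low_w:
        raise ValueError("the low-power mode must consume less power")
    t_equal = (e_transition_j - p_low_w * t_transition_s) / (p_high_w - p_low_w)
    return max(t_transition_s, t_equal)


# Illustrative numbers only: 2 W vs 0.5 W, with a 3 mJ, 1 ms transition.
print(break_even_time(2.0, 0.5, 3e-3, 1e-3))
```

Underestimating the transition energy shortens the computed break-even time, so a scheduler would switch modes in intervals where doing so actually costs energy.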
Sponsors: National Science Foundation (the Software and Hardware Foundations).
With tens to possibly hundreds of cores integrated in current and future multiprocessor systems-on-chips (MPSoCs) and chip-multiprocessors (CMPs), multiple applications usually run concurrently on the system. However, existing mapping methods for reducing overall packet latency cannot meet the requirement of balanced on-chip latency when multiple applications are present. We address the looming issue of balancing minimized on-chip packet latency with performance-awareness in the multi-application mapping of CMPs.
The approach of adding express channels to tile-based NoCs has gained increasing attention. However, this approach also greatly changes the packet delay estimation and the traffic behavior of the network, neither of which has yet been exploited in existing mapping algorithms. Therefore, we explore the opportunities in optimizing application mapping for express channel-based on-chip networks.
Compared with traditional bus structures, the relatively complex NoCs with routers and links can draw a substantial percentage of chip power. An effective approach to reducing NoC power consumption is to apply power gating techniques. We explore different power gating schemes for on-chip networks to achieve low energy consumption as well as small latency penalties.