The green shaded area focuses on how to get power-efficient RTL IP in a three-step process.
1. Setting Low Power Objectives
It really takes a team to get the lowest power for an ARM IP, and each member plays a slightly different role.
- ISA architects - keep the ARM architecture power-efficient
- System architects - decide which system and IP power management approach to take
- Technical lead - specifies and manages power targets with RTL designers
- RTL designers - code the power reduction scheme
- Implementation designers - measure and analyze the power, and collaborate with RTL designers
Objectives are met by first setting a top-down power budget, then looking at each power component, having team members compete for the lowest power, keeping area and toggles minimized, focusing on energy efficiency, and knowing how the RTL code gets synthesized into process-specific gates.
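To make the top-down budget idea concrete, here is a minimal Python sketch. It is not taken from the white paper: the block names, weights and milliwatt figures are invented for illustration, and a real budget would come from the technical lead and system architects.

```python
def allocate_budget(total_mw, weights):
    """Split a top-level power budget across blocks in proportion to their weights."""
    total_weight = sum(weights.values())
    return {block: total_mw * w / total_weight for block, w in weights.items()}


def over_budget(budget_mw, measured_mw):
    """Return the blocks whose measured power exceeds their budget, with the overshoot in mW."""
    return {
        block: measured_mw[block] - limit
        for block, limit in budget_mw.items()
        if measured_mw.get(block, 0.0) > limit
    }


# Hypothetical blocks and numbers, for illustration only.
budget = allocate_budget(100.0, {"shader": 5, "texture": 3, "l2_cache": 2})
measured = {"shader": 48.0, "texture": 33.5, "l2_cache": 19.0}
violations = over_budget(budget, measured)
print(budget)
print(violations)
```

Running a check like this after each power measurement shows which blocks have drifted past their share of the budget.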
2. Using a Low-power Design Flow
The low-power development cycle has four major parts starting at requirements and ending with measurements against objectives.
Engineers at ARM use an EDA tool from Mentor called PowerPro to help in three tasks:
- Analysis of RTL and gate-level power
- Exploring RTL power
- Reducing RTL power
Here's the low-power IP design flow showing where the PowerPro tool comes into play for analysis, exploring and reducing power:
RTL Power Analysis
How can you quickly analyze power without a gate-level netlist? The PowerPro tool does a pseudo-synthesis step to create a gate-level prototype, which can take just an hour for a CPU or GPU design.
Further improving the accuracy of the gate-level prototype requires an estimate of the physical interconnect in SPEF (Standard Parasitic Exchange Format). That step enables PowerPro to generate power numbers within 15% of actual gate-level results.
3. Optimization Techniques
The reports and power optimization suggestions from the PowerPro tool help the engineers make trade-off decisions on achieving the lowest power numbers. One recommendation is to use combinational clock gating for most flops, and the tool then shows you the efficiency of the clock gating being used. In the design efficiency report you get to see the total number of flops, the percentage of flops that are gated, and the efficiency of the gating.
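As a rough illustration of the three numbers in that report, here is a small Python sketch. The definitions are my assumptions, not PowerPro's: "percent gated" is the share of flops that have a clock gate, and "gating efficiency" is the share of all flop clock cycles that were suppressed. The `Flop` record and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Flop:
    name: str
    gated: bool               # True if the flop sits behind a clock gate
    clock_active_cycles: int  # cycles in which the flop's clock pin actually toggled


def gating_report(flops, total_cycles):
    """Summarize clock gating under the assumed definitions given in the text above."""
    if not flops or total_cycles <= 0:
        raise ValueError("need at least one flop and a positive cycle count")
    total = len(flops)
    gated = sum(1 for f in flops if f.gated)
    suppressed = sum(total_cycles - f.clock_active_cycles for f in flops)
    return {
        "total_flops": total,
        "percent_gated": 100.0 * gated / total,
        "gating_efficiency": 100.0 * suppressed / (total * total_cycles),
    }


# Hypothetical activity over a 1000-cycle simulation window.
flops = [
    Flop("a", True, 250),
    Flop("b", True, 500),
    Flop("c", False, 1000),
    Flop("d", True, 250),
]
report = gating_report(flops, 1000)
print(report)
```

The point of separating the two percentages is that a design can have most flops gated and still waste clock power if the enables are rarely low.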
Any combinational redundancies in your design are reported so that you may take some design action:
- Redundant mux toggle
- Redundant memory data/address toggle
- Clock toggle-data stable
Redundant Mux Activity
Clocks can be shut off for a given time period by using an Enable signal on flops. Consider the following case:
Inside PowerPro, a calculation checks that adding extra logic to control the power still results in lower power than making no change. Sequential redundancies are identified and recommendations are made for:
- Sequential clock gating
- Sequential data gating
- Redundant reset removal
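The break-even calculation mentioned before the list above can be pictured with a simple sketch. This is my own simplified model in Python, not the tool's algorithm: it assumes the saving is the clock power of the gated flops multiplied by the fraction of time they are idle, and that the added gating logic has a fixed power cost. All parameter names and values are hypothetical.

```python
def net_saving_uw(flops_gated, clk_power_per_flop_uw, idle_fraction, overhead_uw):
    """Estimated power saved by a gating change minus the power of the added logic (uW)."""
    saved = flops_gated * clk_power_per_flop_uw * idle_fraction
    return saved - overhead_uw


def worth_gating(flops_gated, clk_power_per_flop_uw, idle_fraction, overhead_uw):
    """Accept the change only if it lowers total power."""
    return net_saving_uw(flops_gated, clk_power_per_flop_uw, idle_fraction, overhead_uw) > 0


# A 32-bit register that is idle 75% of the time: the gate pays for itself.
wide = net_saving_uw(32, 0.5, 0.75, overhead_uw=2.0)

# A 2-bit register that is idle 25% of the time: the gate costs more than it saves.
narrow = net_saving_uw(2, 0.5, 0.25, overhead_uw=2.0)

print(wide, worth_gating(32, 0.5, 0.75, 2.0))
print(narrow, worth_gating(2, 0.5, 0.25, 2.0))
```

Even this toy model shows why gating a handful of busy flops can raise power instead of lowering it.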
Gate-Level Power Analysis
For a sign-off level of accuracy you want to know the gate-level power numbers for each of your blocks. In the ARM flow the gate-level simulation uses the Standard Delay Format (SDF) for the most accurate power results. You can even see the rate of change of current (di/dt) to get some early insight into power grid analysis.
So how does the early power number at RTL correlate to the final gate-level power? You can expect the early RTL power numbers to be within 15% of the gate-level power numbers, while getting feedback in minutes to hours instead of several days, a nice trade-off.
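If you want to track that correlation on your own blocks, a check along these lines would do. It is a minimal Python sketch: the 15% tolerance comes from the article, while the block names and power figures are made up for the example.

```python
def relative_error(rtl_mw, gate_mw):
    """Relative difference of the RTL estimate from the gate-level reference."""
    if gate_mw <= 0:
        raise ValueError("gate-level power must be positive")
    return abs(rtl_mw - gate_mw) / gate_mw


def correlation_failures(rtl, gate, tolerance=0.15):
    """Return the blocks whose RTL estimate is outside the tolerance, with their error."""
    failures = {}
    for block, gate_mw in gate.items():
        err = relative_error(rtl[block], gate_mw)
        if err > tolerance:
            failures[block] = err
    return failures


# Hypothetical per-block power numbers in mW.
rtl_power = {"shader": 11.0, "texture": 8.0}
gate_power = {"shader": 10.0, "texture": 10.0}
failures = correlation_failures(rtl_power, gate_power)
print(failures)
```

Blocks that show up in the failure list are the ones where the RTL estimate should not yet be trusted for trade-off decisions.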
ARM engineers did power analysis, exploration, power scrubs and optimizations over the course of several weeks on various blocks of a recent GPU project, and this shows their progress in power reduction for a specific type of test:
I asked the white paper authors how widely the PowerPro tool is used at ARM. "ARM uses Mentor Graphics PowerPro in the design process for all classes of ARM IP such as: CPUs, GPUs, interconnect sub-systems, and display cores to meet power goals," said authors Stephane Forey and Jinson Koppanalil from ARM and Saurabh Kumar Shrimal and Richard Langridge from Mentor Graphics.
There is now sufficient automation available for power analysis, exploration and optimization at the RTL level, and it is helping leading-edge SoC companies like ARM get the most out of their architecture. Your team can now consider doing daily RTL power analysis at block and unit levels to get a quick idea of your power trends. Reports from the automated tools give designers the info needed to make power trade-offs quite early in the design process.
Read the full 16-page white paper here.