<u>SoCs for Portable Video Applications:</u> <u>Architecture level Considerations</u>

> Mahesh Mehendale <u>m-mehendale@ti.com</u>

> > EDP 2007



### Agenda

- Video processing requirements of portable entertainment applications
- Characterizing "variability" in digital video processing
- Low power design techniques and their applicability in the context of Digital Video Subsystem and the SoC
- EDA challenges and opportunities



### Video Processing Chain





### Personal Video Entertainment

- Portable Video Recorder
- Portable TV (DVB-T, DVB-H)
- Portable Media Player
- Digital Camcorder
- Portable Navigation
- Video phone
- Web terminal



## <u>Portable Media Player –</u> <u>video interfaces</u>



🔱 Texas Instruments

# Customer care-abouts

- Multi-standard, multi-format video processing




# Video formats




### **DV Engine Solution Space**




# H.264 Encoder



# H.264 Decoder




# Driving Area Efficiency

- Leverage Decoder ↔ Encoder functionality overlap
- Programmable HWAs for similar compute functions but with different parameters (such as number of taps), and/or different coefficients
  - DCT/IDCT, 8x8 vs 4x4
  - Quantization, scaling
  - Variable length coding
  - Interpolation (half pixel, quarter pixel)
  - Filtering
- Hardware-Software partition to meet the desired performance and programmability requirements with minimal area

### Data driven variability in Decoding

| Decoder pipeline stage | Source of variability |
|------------------------|-----------------------|
| Fetch input bit stream | Data rate, system traffic |
| Entropy decoder and reorder | Data per frame |
| Inverse quantization, IDCT; fetch reference frame data for MC | #MVs, I/B/P frames, system traffic |
| Motion compensation and add residues | #MVs, MV resolution, I/B/P frames |
| Deblocking filtering | Boundary strength |



### Low Power Design – across all levels



Software Power Management Framework

#### SOC Design

DVFS, AVS, DPS, SLM, Multiple Domains (Voltage/Power/Clock)

#### Silicon IP

Dual-VT Cells, Retention SRAM/Logic, PM Cell Library, Process/Temperature Sensor, Low-Leakage Processes



# Component level support

Silicon IP

| Technology | Description |
|------------|-------------|
| Retention SRAM and logic | SRAM and logic retention cells support dynamic power switching without state loss, lowering voltage and reducing leakage. |
| Dual-threshold voltages | Higher threshold for lower leakage and lower threshold for higher performance. |
| Power management cell library | Switching, isolation and level shifters support multiple domains in SOC implementations. |
| Process and temperature sensor | Adapts voltage dynamically in response to silicon process and temperature variations. |
| Design flow support | Complete, nonintrusive support for easily integrating SmartReflex technologies. |



# <u>SoC level Power</u> <u>Management Strategies</u>

SOC architectural and design technologies

| Technology | Description |
|------------|-------------|
| Adaptive Voltage Scaling (AVS) | Maintains high performance while minimizing voltage based on silicon process and temperature. |
| Dynamic Power Switching (DPS) | Dynamically switches between power modes based on system activity to reduce leakage power. |
| Dynamic Voltage and Frequency Scaling (DVFS) | Dynamically adjusts voltage and frequency to adapt to the performance required. |
| Multiple Domains (Voltage/Power/Clock) | Enables distinct physical domains for granular power/performance management by software. |
| Static Leakage Management (SLM) | Maintains lowest static power mode compatible with required system responsiveness to reduce leakage power. |



### Power optimal MHz-Vcc Operating Point

- Lower Vcc helps both dynamic and leakage power
- Lowering Vcc while keeping MHz the same can result in an area increase, impacting cost and negating any power gain
- At architecture level, MHz/Vcc for a given technology drives the degree of parallelism and pipelining
- The choice of target format for power optimization impacts area efficiency – for example, an implementation which gives lowest power for 720P resolution is likely to be different (higher area) than the implementation which gives lowest power for D1
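
The trade-off in the bullets above can be illustrated with the textbook first-order model for switching power, P = a·C·V²·f. The sketch below is only an illustration under that model: the capacitance, voltages, frequencies, and the 15% overhead for the parallel datapath are made-up numbers, not figures from this talk.

```python
def dynamic_power(c_eff_farads, vcc_volts, freq_hz, activity=1.0):
    """First-order switching power: activity * C * V^2 * f."""
    return activity * c_eff_farads * vcc_volts ** 2 * freq_hz


# Hypothetical baseline: a single datapath at 1.2 V and 200 MHz.
C_SINGLE = 1e-9
baseline = dynamic_power(C_SINGLE, 1.2, 200e6)

# Hypothetical alternative with the same throughput: two parallel
# datapaths at 100 MHz each, which (by assumption) lets Vcc drop to 0.9 V.
# Switched capacitance, a stand-in for area, doubles, plus an assumed
# 15% overhead for muxing and routing.
AREA_RATIO = 2 * 1.15
parallel = dynamic_power(C_SINGLE * AREA_RATIO, 0.9, 100e6)

print(f"baseline: {baseline * 1e3:.1f} mW")
print(f"parallel: {parallel * 1e3:.1f} mW")
print(f"power ratio: {parallel / baseline:.2f}, area ratio: {AREA_RATIO:.2f}")
```

Under these assumed numbers the parallel version draws less dynamic power but costs more than twice the area, which is the cost-versus-power tension the slide points at.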

### Power Reduction @ Application/Video stream level

- If the function is decode, turn off (clock gate/power down) encode functionality (e.g. Motion Estimation)
- For the standard and the profile to be processed, turn off hardware supporting all other standards and profiles (e.g. if MPEG4, turn off CABAC engine in entropy decoder)
- Dynamic frequency and voltage scaling: set the DV engine frequency and voltage operating points depending on the resolution being supported. D1 at 30fps requires ~2.66 times less compute than 720P at 30fps.
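
The resolution ratio quoted above follows from pixel rates if compute is assumed to scale linearly with pixels per second. The sketch below checks that arithmetic, taking D1 as NTSC 720x480. The `pick_frequency` helper is a hypothetical linear policy for illustration only; a real engine would choose from a discrete table of characterized operating points.

```python
# Frame sizes in pixels. D1 is taken as NTSC (720x480); PAL D1 is 720x576.
RESOLUTIONS = {
    "CIF": (352, 288),
    "D1": (720, 480),
    "720P": (1280, 720),
}


def pixel_rate(name, fps=30):
    """Pixels per second for a named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height * fps


def relative_compute(target, reference="720P", fps=30):
    """Compute load of `target` relative to `reference`, assuming
    load scales linearly with pixel rate."""
    return pixel_rate(target, fps) / pixel_rate(reference, fps)


def pick_frequency(target, f_max_hz, reference="720P"):
    """Hypothetical DVFS policy: scale the engine clock linearly
    with the pixel rate of the stream being processed."""
    return f_max_hz * relative_compute(target, reference)


ratio_720p_over_d1 = pixel_rate("720P") / pixel_rate("D1")
print(f"720P / D1 pixel-rate ratio: {ratio_720p_over_d1:.3f}")
```

The ratio works out to 8/3, consistent with the "~2.66 times" figure on the slide.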

### Power Reduction @ Frame level

- Turn off unused hardware depending on I vs P vs B frame
- Turn off unused hardware depending on Interlaced vs Progressive content



## Power Reduction @ MB Level

- Turn off individual hardware accelerators as soon as the computation for the current Macro-block is done (due to variability, the pipeline cannot be fully balanced)
- During motion compensation, the compute requirements vary depending on 1 motion vector vs 4 motion vectors per macro-block. They also vary with motion vector resolution: full pixel vs half pixel vs quarter pixel.
- Turn off the deblocking filter if boundary strength is 0 or if there is a significant change (gradient) across the block boundary in the original image.
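
The deblocking condition in the last bullet can be sketched as a per-edge gating decision. The check below follows the general shape of the H.264 edge-filtering test (boundary strength plus gradient thresholds). In the standard the `alpha` and `beta` thresholds are looked up from QP-indexed tables; here they are passed in as plain parameters, and the function name is hypothetical.

```python
def edge_needs_filtering(bs, p1, p0, q0, q1, alpha, beta):
    """Return True if the deblocking hardware must process this edge.

    bs            boundary strength (0 means: never filter)
    p1, p0        the two samples on one side of the edge, p0 nearest
    q0, q1        the two samples on the other side, q0 nearest
    alpha, beta   gradient thresholds (assumed given by the caller)
    """
    if bs == 0:
        return False
    # A large step across or next to the edge is treated as real image
    # content rather than a blocking artifact, so it is left unfiltered.
    return (abs(p0 - q0) < alpha
            and abs(p1 - p0) < beta
            and abs(q1 - q0) < beta)


# Hypothetical sample values for illustration.
print(edge_needs_filtering(0, 100, 102, 104, 105, alpha=20, beta=6))
print(edge_needs_filtering(2, 100, 102, 104, 105, alpha=20, beta=6))
print(edge_needs_filtering(2, 100, 102, 180, 181, alpha=20, beta=6))
```

When this returns False for every edge of a macro-block, the filter block has nothing to do for that macro-block and is a candidate for clock gating.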



## <u>Dynamic vs Leakage Power</u> <u>Scaling with Resolution</u>





# Power Reduction for CIF

- Compute requirements are significantly lower, but voltage scaling is limited by Vcc-min.
- Running the engine at lower frequency without lowering the voltage does not help save energy
- Multiple approaches:
  - Completely switching off the engine and switching it back on has significant cycle overhead: it does not help at macro-block level and gives marginal gain at frame level, but done over a group of frames it can give power reduction
  - Power down the engine but save the state using retention flops and putting memories in retention mode, at the cost of area overhead
  - Design the engine as a "bit slice" and switch off one half while processing CIF; this has software implications.
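
Why lower frequency at fixed voltage does not save energy can be seen from a first-order model: dynamic energy per frame is C·V²·cycles, independent of clock rate, while leakage energy depends on how long the engine stays powered. The sketch below is a minimal model under those assumptions. All constants are made up, and the power-gating case deliberately ignores the switch-off/switch-on overhead that the bullets above flag as significant.

```python
def energy_per_frame(cycles, freq_hz, vcc, period_s,
                     power_gate_when_idle=False,
                     c_eff=1e-9, i_leak=0.02):
    """First-order energy (joules) to process one frame.

    cycles    engine cycles needed for the frame
    period_s  frame period; the engine stays powered for all of it
              unless it is power gated once the work is done
    c_eff, i_leak  hypothetical switched capacitance and leakage current
    """
    active_s = cycles / freq_hz
    dynamic = c_eff * vcc ** 2 * cycles  # does not depend on frequency
    on_time = active_s if power_gate_when_idle else period_s
    leakage = vcc * i_leak * on_time
    return dynamic + leakage


CYCLES = 1_000_000   # hypothetical CIF frame workload
PERIOD = 1 / 30      # 30 fps
VCC_MIN = 0.9        # assumed voltage floor

slow = energy_per_frame(CYCLES, 30e6, VCC_MIN, PERIOD)
fast_idle = energy_per_frame(CYCLES, 200e6, VCC_MIN, PERIOD)
fast_gated = energy_per_frame(CYCLES, 200e6, VCC_MIN, PERIOD,
                              power_gate_when_idle=True)

print(f"slow clock, always on : {slow * 1e3:.3f} mJ")
print(f"fast clock, idle      : {fast_idle * 1e3:.3f} mJ")
print(f"fast clock, power gated: {fast_gated * 1e3:.3f} mJ")
```

In this model the slow and fast-but-idle cases cost the same energy per frame; only cutting the powered-on time (or the voltage) changes the total, which is what motivates the approaches listed above.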



## DVFS – applicability at SoC level?

- Audio does not scale with resolution
- Any system function which demands real-time response in a narrow time window
- Modules in the video output processing chain which are tied to the resolution of the display device as against resolution of the video being processed

This implies multiple voltage domains, which can have system-level cost implications from the PMU standpoint.



# Managing data bandwidth

- Increasing resolution implies scaling the IO bandwidth accordingly, but this may not be feasible or practical: DDR speed limitations, SDRAM limitations, power and area impact, etc.
- Need architecture level solution to address this bottleneck
  - On-chip buffers
  - On the fly computation
  - Improving efficiency of 2D transfers
  - SDRAM data organization
  - Algorithmic solutions?
- At lower resolution, can minimize SDRAM power by powering down unused banks
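
To get a feel for how frame-buffer traffic grows with resolution, the sketch below estimates raw SDRAM traffic for 8-bit YUV 4:2:0 frames (1.5 bytes per pixel). The `frames_touched` parameter is a hypothetical simplification, for example one reference-frame read plus one reconstructed-frame write per decoded frame. Real traffic is typically higher because of motion-compensation overfetch, display reads, and bitstream access.

```python
def frame_bytes(width, height, bytes_per_pixel=1.5):
    """Size of one 8-bit YUV 4:2:0 frame in bytes."""
    return int(width * height * bytes_per_pixel)


def raw_frame_traffic(width, height, fps=30, frames_touched=2):
    """Bytes per second moved to/from SDRAM, assuming each decoded
    frame causes `frames_touched` whole-frame transfers."""
    return frame_bytes(width, height) * fps * frames_touched


for name, (w, h) in {"CIF": (352, 288),
                     "D1": (720, 480),
                     "720P": (1280, 720)}.items():
    mb_per_s = raw_frame_traffic(w, h) / 1e6
    print(f"{name:>4}: {frame_bytes(w, h):>9} bytes/frame, "
          f"{mb_per_s:6.1f} MB/s")
```

The same per-frame size also bounds how much SDRAM a given resolution actually needs, which is the basis for powering down unused banks at lower resolutions.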



### EDA Challenges and Opportunities

- System level power estimation/modelling
- Power management synthesis and verification
- Physical design challenges
  - Automated clock gating
  - Physical design aware low power synthesis
  - Multi-Vt optimization
  - Timing closure at multiple corners (with DVFS need to sweep Vmin and Vmax range)
- Building a configurable IP generator supporting both run-time as well as compile-time scalability (e.g. building an MPEG4 Decode only engine optimized for power and area, with no software change)



### Summary

- Portable video entertainment market needs a multi-format, multi-standard digital video engine with HD capability at low cost and low power
- Highlighted the "variability" in the digital video processing needs, including data driven variability
- Discussed the entire spectrum of power management techniques and their applicability to the power minimization of the DV engine
- Highlighted a few system level considerations and their architectural implications
- Finally, presented EDA challenges and opportunities



# THANK YOU

