

# Validation Strategies with pre-silicon platforms

**Shantanu Ganguly** 

Synopsys Inc., April 10, 2014

### Agenda

- Market Trends
- Emulation HW Considerations
- Emulation Scenarios
- Debug



#### **Mobile & Internet-of-Things Driving Growth**

Convergence  $\rightarrow$  SoC complexity  $\rightarrow$  New verification challenges






**Synopsys Confidential** 

## SoC Multi Core Architecture Trends

#### Massive feature integration

- Driven largely by Moore's Law (supply) and convergence (demand)
- **Distributed architectures**
  - Higher scalability (and independence)
  - Sharing memory
- Multiple processors (Multicore)
  - CPU
  - DSP
  - Special purpose (MPEG, GFX, ...)
  - Always on controller
- Distributed DMA
  - Removes centralized DMA bottleneck
- Increasing software complexity
- Re-use with multiple platform SoCs
  - Broader end use market coverage per SoC with software programmability

Figure: IP Cores Per SoC (average number of differing IP cores, 2008 to 2012). Source: Semico Research, Aug 2010.

Figure: Design Starts by Process Node (22nm, 32nm, 45nm, 65nm, 90nm, 130nm, 180nm).

From NIT Alumni Meet Keynote Speech: Jim Hogan

#### SoC Design Complexity & Cost – Out of Control

Source: I.B.S. Inc.

- Increasing complexity means increased risk
  - At 32nm, a typical design has ~50% chance to meet all objectives
  - At 22nm, that number drops to ~30%

- "Designer productivity must improve to match chip complexity"
  - The later a problem is detected, the more impact it will have on design schedules






Source: Gartner

SYNOPSYS<sup>®</sup> Accelerating Innovation

### Need 'Shift-Left' for Faster Time-to-Market

Earlier HW verification, earlier SW bring-up



- Smart Verification Strategy: static and formal
- Intelligent Verification Methodology: integrated, automated flows
- Earlier HW-SW Bring-up: faster emulation
- Verification Continuum: seamless flow


### Agenda

- Market Trends
- Emulation HW Considerations
- Emulation Scenarios
- Debug



### **Three Ways to Emulate 256 Million Gates**



#### **Custom Emulation Chip Advantages**

|                       | Custom FPGA                                                                                        | Custom Processor                                                                                   | Commercial FPGA                                                                                       |
|-----------------------|----------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------|
| Compile<br>Time       | Faster due to<br>emulation-specific<br>interconnect<br>architecture.                               | Faster due to processor-type architecture.                                                         | Slower due to chip<br>place & route. Easily<br>run in parallel on a<br>small server farm.             |
| Debug                 | Built-in:<br>Change probes<br>without recompile.                                                   | Built-in:<br>Change probes<br>without recompile.                                                   | Mix of built-in<br>(readback) and FPGA<br>resource. Some<br>probe changes<br>require recompile.       |
| Emulation<br>Compiler | Single-vendor:<br>Fully integrated, all<br>vendor-developed<br>at vendor cost.<br>Small user base. | Single-vendor:<br>Fully integrated, all<br>vendor-developed at<br>vendor cost. Small<br>user base. | Chip-level half by<br>FPGA vendor's large<br>team at no cost.<br>Large user base good<br>for quality. |

### Commercial FPGA-Based Solution Superior for Emulation

- Highest capacity per chip
  - ZeBu Server-3 module emulates 60M gates in 9 emulation chips.
    - Palladium XP needs 54 chips, Veloce 2 needs 75 chips.
  - Components fit better, fewer design nets get cut.
- Means less interconnect HW
  - Highest performance: 2 to 5 MHz
  - Low power, small size, reliable
- Latest process every two years
  - From Xilinx, no development cost

$\Rightarrow$ Lowest TCO

Great things come in small packages

- Fastest, coolest, smallest, cheapest, most reliable logic emulation.
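The chip counts quoted above can be turned into a per-chip capacity comparison. The sketch below assumes all three chip counts refer to the same 60M-gate design; the variable and function names are illustrative only.

```python
# Chip counts quoted on the slide, assumed to be for the same 60M-gate design.
DESIGN_GATES = 60_000_000
chips_needed = {
    "ZeBu Server-3": 9,
    "Palladium XP": 54,
    "Veloce 2": 75,
}


def gates_per_chip(design_gates: int, chips: int) -> float:
    """Average number of design gates mapped onto one emulation chip."""
    return design_gates / chips


capacity = {name: gates_per_chip(DESIGN_GATES, n) for name, n in chips_needed.items()}
baseline = capacity["ZeBu Server-3"]

for name, gates in capacity.items():
    # Ratio > 1 means the ZeBu chip holds that many times more gates.
    print(f"{name}: {gates / 1e6:.2f}M gates/chip, ZeBu ratio {baseline / gates:.1f}x")
```

Under that assumption the ratios come out at roughly 6x and 8x, which is in line with the "6-8X larger capacity" figure used elsewhere in the deck.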



#### ZeBu Server-3 Hardware Performance Advantage

- 6-8X larger capacity emulation chips
  - Fewer nets must cross chip-to-chip
  - More nets stay on-chip, where they are fast
- High bandwidth communication between emulation chips, modules, units, host
  - Each chip has 600 Gbps bandwidth to other chips
  - 33 Gbps bandwidth between modules
  - 640 Gbps bandwidth between units
  - 4 Gbps host communication bandwidth
- FPGA architecture advantages
  - Specialized HW for arithmetic operations in FPGA.
  - Wire-to-wire and gate-to-gate mapping, no modeling abstraction

| Application             | Design Size (million gates) | Performance |
|-------------------------|-----------------------------|-------------|
| GPU Quad Cluster        | 60 MG                       | 5.0 MHz     |
| GPU Dual Cluster        | 40 MG                       | 4.95 MHz    |
| GPU Single Cluster      | 33 MG                       | 4.75 MHz    |
| Customized GPU          | 50 MG                       | 3.75 MHz    |
| Communication Processor | 60 MG                       | 2.9 MHz     |
| Consumer SoC            | 100 MG                      | 2.8 MHz     |
| Broadband SoC           | 80 MG                       | 2.5 MHz     |
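To put the clock rates in the table into wall-clock terms, the following sketch estimates how long a one-billion-cycle run would take at each reported frequency. It assumes the emulator sustains the quoted rate for the whole run and ignores host and testbench overhead, so the numbers are lower bounds rather than measurements.

```python
# Emulation clock rates (MHz) as listed in the table above.
reported_mhz = {
    "GPU Quad Cluster": 5.0,
    "GPU Dual Cluster": 4.95,
    "GPU Single Cluster": 4.75,
    "Customized GPU": 3.75,
    "Communication Processor": 2.9,
    "Consumer SoC": 2.8,
    "Broadband SoC": 2.5,
}

CYCLES = 1_000_000_000  # illustrative workload: one billion design clock cycles


def seconds_for_cycles(cycles: int, freq_mhz: float) -> float:
    """Idealized run time: cycles divided by clock frequency in Hz."""
    return cycles / (freq_mhz * 1e6)


for app, mhz in reported_mhz.items():
    minutes = seconds_for_cycles(CYCLES, mhz) / 60
    print(f"{app}: {minutes:.1f} min for {CYCLES:,} cycles")
```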


### ZeBu Server-3 Throughput Advantage

- Highest raw performance hardware
- Multi-threaded runtime
- Truly concurrent communication message port
  - No blocking message transfer
- Dedicated high speed HW resources for implementation of transactors and communication ports.
- Each transactor can be modeled as separate process for maximum parallelism.
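The idea of one independent worker per transactor, each with its own non-blocking message port, can be sketched in plain Python. This is not the ZeBu API: the transactor names, the payload, and the doubling "work" are all hypothetical, and Python threads only illustrate the structure, not true hardware-level parallelism.

```python
import queue
import threading


def transactor(name, inbox, outbox):
    """Hypothetical transactor: drains its own message port independently."""
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: shut this transactor down
            break
        outbox.put((name, msg * 2))  # stand-in for driving the DUT interface


names = ["uart", "ethernet", "usb"]  # illustrative transactor names
inboxes = {n: queue.Queue() for n in names}  # one unbounded port per transactor
outbox = queue.Queue()

workers = [
    threading.Thread(target=transactor, args=(n, inboxes[n], outbox))
    for n in names
]
for w in workers:
    w.start()

# The testbench posts messages without waiting for any transactor to finish:
# put() on an unbounded queue returns immediately.
for value in range(3):
    for n in names:
        inboxes[n].put(value)

for n in names:
    inboxes[n].put(None)
for w in workers:
    w.join()

results = []
while not outbox.empty():
    results.append(outbox.get())
results.sort()
print(results)
```

Because each transactor owns its port, a slow transactor delays only its own traffic and never the sender or its peers.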





### Agenda

- Market Trends
- Emulation HW Considerations
- Emulation Scenarios
- Debug



### **Verification: Architecture to Silicon**

Figure axis label: Accuracy

#### Architecture Exploration

Does the architecture meet performance and power requirements?

- Processor & GPU selection
- Memory System Dimensioning
- Interconnect Configuration
- Cache Coherency
- Global Interrupts
- Power Management

**Design Technology:**

- Traffic Models
- Transaction Level Models
- Performance Visualisation
- Hybrid-Simulation

#### HW / SW Integration

Prototype SW before first silicon

- OS Integration
- Driver Development
- Virtualisation
- Performance Optimisation

**Tools:**

- Debugger
- Transaction Level Models
- SystemC
- Virtual Prototypes
- FPGA Prototypes
- Performance Visualisation

#### **Functional Verification**

Does the design function correctly and meet performance and power requirements?

- Protocol Compliance
- Interconnect BW & Latency
- Cache Coherency
- Data integrity
- Power & Clock Domains

**Design Technology:**

- Traffic Models
- Verification IP
- System Monitor
- Performance Visualisation

#### **System Validation**

Validate that the design functions as specified.

- Graphics
- Video
- Audio
- Browser
- Application
- Gesture recognition
- Phone functions
- Camera functions

**Design Technology:**

- FPGA Prototypes
- HW Accelerated Simulation
- Hybrid-Simulation
- Verification IP
- Transaction Level Models
- IO Traffic

Figure axis label: Turn Around Time



#### **Common HW-Assisted Verification Modes**




### **Summary of Verification Modes**

| Mode                                    | Advantages                                                                                               | Challenges                                                                |
|-----------------------------------------|----------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------|
| Simulation                              | <ul><li>Everyone has one</li><li>Perfect for 90% of designs</li></ul>                                    | Never fast enough                                                         |
| In-Circuit<br>Emulation (ICE)           | <ul><li>Highest performance</li><li>Physical connections</li></ul>                                       | Rate adapter availability                                                 |
| Embedded<br>Testbench                   | <ul><li>Very fast</li><li>No physical connections</li></ul>                                              | <ul><li>Needs synthesizable TB</li><li>Limited to Software only</li></ul> |
| Co-Simulation<br>(Signal-Level)         | <ul><li>Simple to use</li><li>Leverages existing TB</li><li>Good for DUT bring-up</li></ul>              | Little performance gain                                                   |
| Transaction-Based<br>Verification (TBV) | <ul><li>Highest performance</li><li>Works with virtual devices</li><li>No rate adapters needed</li></ul> | Availability of transactors                                               |


## **One Emulator, Many Applications**

Advanced verification use modes with ZeBu Server-3

- Transaction-based Verification: system-level SoC verification
- Hybrid Emulation: architecture optimization & early software development
- Simulation Acceleration: up to 100x simulation performance
- Synthesizable Testbench: higher performance
- In-circuit Emulation: real-world connections
- Power-aware Emulation: UPF support and SAIF output



### **Environment for In-Circuit Emulation**




#### **Transaction-based Verification Environment**



© 2014 Synopsys. All rights reserved.

#### ZeBu Power Analysis Generates Power Profiles

- Ideal for block and system-level analysis with hardware and software
- Captures every transition within each clock period
  - User-programmable: blocks, registers, buses, entire SoC
- Includes cumulative total spanning user-defined clock numbers
  - Large span for highest performance
  - Short span for highest accuracy
- Works seamlessly with PrimeTime
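The span trade-off described above can be illustrated with a small sketch that accumulates per-cycle transition counts over a user-chosen number of clocks. The toggle counts are made-up data and the function name is hypothetical; this does not reflect ZeBu's actual output format.

```python
def windowed_activity(toggles_per_cycle, span):
    """Sum transition counts over consecutive windows of `span` clock cycles."""
    return [
        sum(toggles_per_cycle[i:i + span])
        for i in range(0, len(toggles_per_cycle), span)
    ]


# Made-up per-cycle toggle counts with one short burst of activity in cycle 2.
toggles = [2, 3, 40, 2, 1, 2, 3, 2]

short = windowed_activity(toggles, 2)  # short span: the burst stays visible
large = windowed_activity(toggles, 8)  # large span: one total, less data to move

print("span=2:", short)
print("span=8:", large)
print("peak per-cycle average, span=2:", max(short) / 2)
print("peak per-cycle average, span=8:", max(large) / 8)
```

With the short span the burst shows up as a clear peak, while the large span averages it away in exchange for far fewer samples to transfer and store.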




### **ZeBu Hybrid Emulation**

Architecture optimization and early software development



- RTL runs at high speed in ZeBu while processor model or other components run in virtual prototype
- Reduces need to have high level models for all components


### Agenda

- Market Trends
- Emulation HW Considerations
- Emulation Scenarios
- Debug



### Why Is SoC Debug so Complex?





### Verdi<sup>3</sup>'s Unique Technology






### ZeBu Simulator-Like Debug with Verdi<sup>3</sup>





- Full visibility (RTL & gate level)
  - All registers, nodes, memories
  - Run-time and post-run debug
  - No recompiles required
- Open standard support
  - FSDB, VCD, etc.
- Transaction level debug
- iCSA integration for fast time to waveform



Figure: screenshot of the ZeBu runtime control panel with Run Control, LA Control, and Memory Control panes, showing per-clock cycle counters (driverClk, CLK, MEMCLK, UARTCLK, and others) and waveform dump settings.




### **Complex Debug Scenarios**




### ZeBu Post Run Debug

Billion-Cycle Full Visibility, Optimized for HW/SW Co-Verification



- Testbench captures DUT state periodically
- DUT inputs captured on every clock
- Data stored on Host PC disk drive
- To debug, user selects a restore point... and loads it into ZeBu to generate waveforms
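The capture-and-replay scheme in the list above can be sketched with a toy deterministic design. The `step` function, the checkpoint interval, and the input sequence are all hypothetical stand-ins; the point is only that periodic state snapshots plus every-cycle inputs are enough to regenerate any window of the run.

```python
CHECKPOINT_INTERVAL = 4  # illustrative: snapshot the state every 4 cycles


def step(state: int, data_in: int) -> int:
    """Toy deterministic DUT: rotate a 16-bit register and XOR in the input."""
    rotated = ((state << 1) | (state >> 15)) & 0xFFFF
    return rotated ^ data_in


def run_and_record(inputs, interval=CHECKPOINT_INTERVAL):
    """Run the DUT, saving periodic state snapshots (inputs are kept by caller)."""
    state = 0
    checkpoints = {}  # cycle number -> state just before that cycle executes
    trace = []        # full per-cycle state, kept here only to check the replay
    for cycle, data_in in enumerate(inputs):
        if cycle % interval == 0:
            checkpoints[cycle] = state
        state = step(state, data_in)
        trace.append(state)
    return checkpoints, trace


def replay(checkpoints, inputs, start, end, interval=CHECKPOINT_INTERVAL):
    """Rebuild states for cycles start..end from the nearest earlier snapshot."""
    restore = (start // interval) * interval
    state = checkpoints[restore]
    window = []
    for cycle in range(restore, end + 1):
        state = step(state, inputs[cycle])
        if cycle >= start:
            window.append(state)
    return window


inputs = [(7 * i + 3) & 0xFF for i in range(20)]  # recorded on every clock
checkpoints, trace = run_and_record(inputs)
window = replay(checkpoints, inputs, 9, 13)
print("replayed window matches original run:", window == trace[9:14])
```

Only the snapshots and the input stream need to be stored during the run; full-visibility waveforms are produced later, and only for the window under investigation.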




### **HW/SW Debug Overview**

Embedded Processor Debug with Synchronized RTL, C, Assembly

- Enables co-debug between RTL and SW
- HW and SW debug synchronized in time
- View C/Assembly source, C variables, stack, memory
- Debugs multiple cores simultaneously
- Supports all popular cores
- Easy to support additional cores or custom cores
- Custom Core support without exposing CPU internals



### **HW/SW Debug Use Models**

#### Verification Environment with C-Tests

- Part of SoC verification schedule
- Hardware debug with C-tests/stimulus
- C-tests may have minimal OS or boot code
- Requires concurrent software and hardware debug

Use HW/SW Debug for this task

#### **Driver Development**

- Software (driver) development
- Fast speed is required (>1MHz)
- Approximate hardware is acceptable

Use Virtualizer for this task

#### Prepare for First Silicon Bring-Up

- Debug synthetic tests that mimic specific use scenarios
- Tests run on a bare-metal OS
- Develop and debug tests on a pre-silicon model
- Get ready for silicon bring up

Use HW/SW Debug for this task

#### First Silicon Debug

- Observed failure running test on first silicon
- Debug the failure and isolate a design bug
- Create/Validate software/firmware workaround

Use HW/SW Debug for this task


### The End




