

# The parallelism mirage

Patrick Groeneveld, Chief Technologist, Magma Design Automation. EDP, Monterey, April 2009

### **Core challenge: Two-fold Complexity Increase**

- System complexity: dealing with the sheer size of the system
  - 1 billion transistors, 100M+ gates
- Silicon complexity: dealing with the physics of manufacturing technology
  - Electrical parasitics
  - Leakage & dynamic power
  - Process variability & manufacturability



### **ITRS Roadmap: Design cost and Design Automation**



April 13, 2009 - EDP 2009- 3

# Time-warp back to the previous century



### Design community 1997: "**Mr. EDA:** *tear down this wall!*"



# Magma in August 1997

### Terra Bella Avenue, Mountain View, CA






# The Magma revolution

### Replaced wall by a 'smooth' design flow

Gradual transition in a single executable

### Avoided and simplified timing closure iteration.

It's the Flow, stupid!



## **ITRS 2007: recommendations**



International Technology Roadmap for Semiconductors

"Ideally, the time-wasting **iteration between logic and layout synthesis** in today's design methodologies could be eliminated by fusing these stages to simultaneously optimize the logical structure as well as layout of a circuit." ITRS Design Roadmap 2007

"To avoid excessive guardbanding due to poor estimates, logical design and eventually system-level design must become more closely linked with physical design."



### Synthesis is from Mars, Analysis is from Venus

- Sign-off tools:
  - Verification, extraction, STA, SPICE, DRC, LVS
  - Highly accurate
  - Big and slow
  - Is the 'whiner'
- Implementation tools:
  - RTL synthesis, placement, routing, optimization, humans
  - Poor accuracy
  - Lean, mean
  - Is the 'hacker'




### **CUDA & EDA: What's wrong with this picture??**




# How IC design really works...



### Data model greases the wheels



### **Design is a trade-off**



### The nature of the IC design 'beast'

### Fact:

Pushing all objectives costs:

- Human design effort and
- Run time





# **Building a design flow for Multiple objectives**



### The truth about RTL2GDS2 Design Automation



- Synthesis algorithms do only *one* thing well
  - Cannot handle multiple objectives
  - System is easily over-constrained
- Algorithms must use *inaccurate models* of the physical reality
- Algorithmic steps do things that could cause problems at later steps



### The ABC of a well-engineered design flow

- **A: Avoid** Predict problems, avoid many 'by construction'
- **B: Build** Synthesize using an algorithm
- **C: Correct** Fix each objective by incremental modifications (ECOs)

Implementation is from Mars!





### Many ABC's

### Timing closure:

- Pre-Buffering, logic optimization
- Mapping, placement
- Gate level optimization

### **Routing closure:**

- Congestion-driven placement
- Global, track & detailed routers
- Incrementally fix DRCs

### Variability robustness:

- add margin, robust clock trees
- Gate-level optimization
- Fix setup and hold violations for each corner

### **Crosstalk:**

- Oversize weak drivers, shielding
- X-talk avoiding global and track routing
- ECO-level x-talk fixing




# Trading off Avoidance and Correction





### Effectively use parallel hardware.

### Intelligent avoidance

- Early in flow
- Center process corners

### Reduce cycles

- Converge faster
- SITL



# Amdahl's law: Why parallelization gain tapers off

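[Figure: speedup versus number of processors under Amdahl's law. Surviving plot labels: 2.5x, 2.8x, 90%, 95%, 10x, 20x. By Amdahl's law, a 90% parallelizable fraction caps speedup at 10x and 95% caps it at 20x.]
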
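As a rough illustration of why the gain tapers off, Amdahl's law can be evaluated directly. The sketch below assumes the standard form of the law, speedup = 1 / ((1 - p) + p / n), where `p` is the parallelizable fraction of the run time and `n` is the processor count. The function names and the sample processor counts are illustrative and do not come from the slides.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's law for a fixed workload."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the processor count grows without limit."""
    if parallel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    # Illustrative processor counts; only the 90% and 95% fractions
    # appear on the slide.
    for p in (0.90, 0.95):
        row = ", ".join(
            f"n={n}: {amdahl_speedup(p, n):.1f}x" for n in (2, 4, 8, 16, 64)
        )
        print(f"p={p:.0%}  {row}  limit: {amdahl_limit(p):.0f}x")
```

Even with 90% of the work parallelizable, 16 processors yield only about 6.4x under this model, and no number of processors gets past 10x.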

### Parallelizing a single step in the flow




### Parallelizing the flow: Can we break the barrier?



## **Parallelism overhead**

- Successful parallelization requires very low overhead
- What causes overhead?
  - 1: Interactions between threads
    - Dependencies, locks, unequal load distribution
  - 2: Resource bottlenecks
    - I/O bandwidth to memory or disk
  - 3: Partitioning and re-assembly
    - After threads are done
    - Cost of fixing border problems
- So the key is to define design tasks that are 100% independent
  - That is why most analysis tools are easier to parallelize.
  - That is why routing parallelizes better than optimization
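To make the effect of these overheads concrete, here is a toy cost model. It is a sketch under stated assumptions, not a description of any real tool: each partition runs on its own processor, the slowest partition sets the length of the parallel phase (unequal load), and partitioning and re-assembly are modeled as fixed serial costs. Resource bottlenecks such as memory bandwidth are not modeled. All names and numbers are hypothetical.

```python
def parallel_runtime(partition_work, partition_cost=0.0, assembly_cost=0.0):
    """Wall-clock time when each partition runs on its own processor.

    The slowest partition determines the parallel phase (unequal load),
    while partitioning and re-assembly are modeled as serial costs.
    """
    if not partition_work:
        raise ValueError("need at least one partition")
    return partition_cost + max(partition_work) + assembly_cost


def speedup(partition_work, partition_cost=0.0, assembly_cost=0.0):
    """Speedup relative to running all the work on one processor."""
    serial_time = sum(partition_work)
    return serial_time / parallel_runtime(
        partition_work, partition_cost, assembly_cost
    )


if __name__ == "__main__":
    balanced = [25.0, 25.0, 25.0, 25.0]   # hypothetical work units
    skewed = [40.0, 20.0, 20.0, 20.0]
    print(speedup(balanced))                                       # 4.0
    print(speedup(skewed))                                         # 2.5
    print(speedup(skewed, partition_cost=5.0, assembly_cost=5.0))  # 2.0
```

In this model, four processors deliver 4x only when the load is perfectly balanced and overhead is zero; a modest imbalance plus partition and assembly costs halves the gain.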



Find independent partitions



# Partitioning is *Evil*

## Why is it evil?

- Overall quality suffers
  - Cannot optimize across boundaries
- Partitioning problem is not easy.
- Good partitions take (non-parallelizable) effort!
  - Algorithmic
  - Need to duplicate data



Partitioning: A necessary evil for the sake of parallelism?

## **Repeatability: parallelism's silent killer**

4 processors,
16 jobs to do.
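The slide's figure is not recoverable, so the following is only one plausible way to see the tension between repeatability and efficiency for 4 processors and 16 jobs. It is a deliberately simplified sketch: the job durations are made up, `dynamic_schedule` hands each job to whichever processor frees up first, and `static_schedule` pins job `i` to processor `i % processors` so that the outcome never depends on timing. Both function names are hypothetical.

```python
import heapq


def dynamic_schedule(durations, processors):
    """Greedy scheduling: the next job goes to the processor that frees up first.

    Returns (makespan, completion_order). The completion order depends on
    the durations, so run-to-run timing noise can change it.
    """
    free_at = [(0.0, p) for p in range(processors)]  # (time free, processor id)
    heapq.heapify(free_at)
    finish = []
    for job, duration in enumerate(durations):
        start, proc = heapq.heappop(free_at)
        end = start + duration
        finish.append((end, job))
        heapq.heappush(free_at, (end, proc))
    makespan = max(end for end, _ in finish)
    order = [job for _, job in sorted(finish)]
    return makespan, order


def static_schedule(durations, processors):
    """Fixed assignment: job i always runs on processor i % processors.

    Results can be merged in job-index order, so the outcome is repeatable,
    but a processor stuck with the long jobs delays the whole batch.
    """
    busy = [0.0] * processors
    for job, duration in enumerate(durations):
        busy[job % processors] += duration
    return max(busy)


if __name__ == "__main__":
    # 16 hypothetical jobs: every fourth job is four times longer.
    durations = [4.0, 1.0, 1.0, 1.0] * 4
    makespan, order = dynamic_schedule(durations, 4)
    print("dynamic makespan:", makespan, "completion order:", order)
    print("static makespan:", static_schedule(durations, 4))

    # A small timing change on one job reorders the dynamic completions.
    noisy = list(durations)
    noisy[1] = 1.5
    print("order with noise:", dynamic_schedule(noisy, 4)[1])
```

With these made-up durations the greedy schedule finishes at time 9 while the fixed assignment needs 16, but the greedy completion order shifts as soon as one job runs slightly longer. If results are applied in completion order, the final design can differ from run to run; forcing a fixed order restores repeatability at the price of idle processors.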






# Hitting each parallelization sweetspot

- Hierarchical internal representation
- Each tool needs a different view on this hierarchy:
  - Logical synthesis (logical hierarchy)
  - Floorplan synthesis (modified hierarchy)
  - Coarse placer (flat with clusters)
  - Voltage island generation (floorplan objects)
  - Timer (tiles at flop boundaries)
  - Parasitic extraction (net + region based)
  - Global router (10 x 10 tiles)
  - Track router tiles (columns)
  - Detailed router tiles (50 x 50)
  - DRC checking (net-based, region based)
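The tile-based views in this list amount to binning design objects by location, with a different grid per tool. The sketch below assumes a rectangular die and point-like objects such as pin locations; the function names, the die size, and the sample points are hypothetical. The slide does not say whether "10 x 10" and "50 x 50" are tile counts or tile sizes, so the sketch simply treats the grid dimensions as parameters.

```python
from collections import defaultdict


def tile_index(x, y, die_width, die_height, cols, rows):
    """Map a point on the die to the (column, row) of the tile containing it."""
    if not (0 <= x <= die_width and 0 <= y <= die_height):
        raise ValueError("point lies outside the die")
    # Points on the top or right edge are clamped into the last tile.
    col = min(int(x * cols / die_width), cols - 1)
    row = min(int(y * rows / die_height), rows - 1)
    return col, row


def bin_points(points, die_width, die_height, cols, rows):
    """Group named points by tile so each tile can be handed to a worker."""
    tiles = defaultdict(list)
    for name, (x, y) in points.items():
        tiles[tile_index(x, y, die_width, die_height, cols, rows)].append(name)
    return dict(tiles)


if __name__ == "__main__":
    # Hypothetical pin locations on a 1000 x 1000 die.
    pins = {"a": (5, 5), "b": (995, 995), "c": (1000, 0), "d": (450, 120)}
    print(bin_points(pins, 1000, 1000, 10, 10))  # coarse grid
    print(bin_points(pins, 1000, 1000, 50, 50))  # finer grid
```

The same data lands in different partitions depending on the grid, which is the point of the list above: each step needs its own view of the design, and the data model has to serve all of them.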





# **Finding partitions**

### To hit sweetspot, we need to partition in different ways throughout the flow



### Using fine-grain partitioning in IC synthesis



### Assemble

- Hard to keep partitions independent (logically and physically)



## **Coarse-grain partitioning: 'Hydra'**



Partition/budget

Build each block in parallel




### Multiprocessing secrets "they" don't want you to know about

- Code is hard to write
- Code is hard to debug
- Adds significant partitioning and assembly overhead
- Narrow sweetspot in EDA analysis tools:
  - DRC, SPICE, perhaps STA
- Synthesis algorithms are tough due to dependencies
  - Repeatability costs efficiency
- Amdahl's law still holds
  - Realistic gain maxes out at 4x
- Using parallelism costs Quality of Result
- Parallel EDA startups were spectacular failures
  - Monterey, Athena, Liga





### How to really speed up: fewer design cycles



### Summary: it's the flow, not the algorithm!

- A few EDA analysis tools may parallelize OK:
  - SPICE, DRC
- Synthesis tool flows parallelize poorly
  - Nature of the algorithms and flows, data size
  - Customers not willing to pay quality hit
- Overall flow speedup saturates at 3x-4x

# We'll figure it out somehow

- Parallelism is only a part of the solution...

