# Machine Learning for Next Generation EDA

#### Paul Franzon

Cirrus Logic Distinguished Professor, Director of Graduate Programs, Department of Electrical and Computer Engineering, NC State University

919.515.7351, paulf@ncsu.edu

# Outline

### Introduction

### Applying Machine Learning to EDA

- IP Reuse
- Physical Design
- Replacing Design Rules

### Machine Learning Acceleration

### Conclusions

# **Machine Learning**

Learning from Data



#### NC STATE UNIVERSITY

# Types of Learning

### Off-line learning

- Optimize a neural network against a fixed training set using off-line optimization
- Usually labeled data

#### Incremental Learning

- Modify learning using inline data
- Usually labeled data

### On-line (In-line) learning

- Learn entirely using data in the field
- Alternate learning and inference cycles
- Sometimes unlabeled data





# **Questions being addressed @ NCSU**

- What does machine learning mean to hardware designers?
- **A.** Building computation engines that specialize in machine learning.
- **B.** Applying machine learning to EDA.

# **Surrogate Modeling**

"Train" a global model that is fast to evaluate from multiple evaluations of a detailed model that is slow to evaluate




### Surrogate Modeling

**Basic idea:** Accurately approximate the black-box design with a *limited* number of samples.



### **Advantages:**

- Modeling accuracy and efficiency
- Fast to execute mathematical expressions vs. systematic simulation
- Various modeling techniques are available: Kriging, radial basis functions, neural networks, etc.
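As a rough illustration of the idea, the sketch below fits a Gaussian radial-basis-function surrogate to a handful of samples of a stand-in "slow" model. The function `slow_model`, the kernel width `eps`, and the sample locations are assumptions made for this example and are not taken from the slides.

```python
import math

def slow_model(x):
    # Hypothetical stand-in for an expensive simulation (e.g. one SPICE run).
    return math.sin(3.0 * x) + 0.5 * x

def gaussian_rbf(r, eps):
    return math.exp(-(eps * r) ** 2)

def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w

def fit_rbf(xs, ys, eps=4.0):
    """Return a cheap-to-evaluate surrogate that interpolates (xs, ys)."""
    phi = [[gaussian_rbf(abs(xi - xj), eps) for xj in xs] for xi in xs]
    weights = solve(phi, ys)

    def surrogate(x):
        return sum(w * gaussian_rbf(abs(x - xj), eps)
                   for w, xj in zip(weights, xs))

    return surrogate

xs = [i * 0.25 for i in range(9)]      # 9 sample points on [0, 2]
ys = [slow_model(x) for x in xs]       # the only calls to the slow model
surrogate = fit_rbf(xs, ys)
```

After fitting, `surrogate(x)` is a closed-form expression that reproduces the sampled values at the training points. An optimizer can then call it many times without invoking the slow model again.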

# Outline

### Introduction

### Applying Machine Learning to EDA

- IP Reuse
- Physical Design
- Improving DFM design closure\*
- Accurate modeling for high speed IO\*

### Conclusions

### \*Won't be presented today





### Applying Machine Learning to Electronic Design

Principal Investigators:

- Elyse Rosenbaum, Illinois (Center Director)
- Paul Franzon, NCSU (Site Director)
- Madhavan Swaminathan, Georgia Tech (Site Director)







# Vision

### This is NOT our vision



# Vision

To enable fast, accurate design and verification of microelectronic circuits and systems by creating and applying machine learning algorithms to derive models used for electronic design automation

- These models can also be used to obscure IP



# **ML in EDA Progression**

1<sup>st</sup> Generation: Big data models for improving design productivity through machine learning

2<sup>nd</sup> Generation: "Little data" models for improving design productivity through machine intelligence

3<sup>rd</sup> Generation: Models and methods to flatten the design and verification hierarchy







# Center for Advanced Electronics Through Machine Learning

Joint NSF/industry funded center

Industry cost: \$50,000 per year

Benefits: Rights to all IP; Early access to students; Mentor/guide/select projects






# INTELLECTUAL PROPERTY REUSE THROUGH MACHINE LEARNING

Weiyi Qi, Bowen Li, Yang Yi, Brian Floyd, Paul Franzon

North Carolina State University

### CENTER FOR ADVANCED ELECTRONICS THROUGH MACHINE LEARNING



# **Problem Statement**

• Port analog and custom digital IP from one technology to another, e.g.





# **Solution Alternatives**

| Approach                              | Pros                                                          | Cons                                                                  |
|---------------------------------------|---------------------------------------------------------------|-----------------------------------------------------------------------|
| Optimization using<br>Spice Model     | Direct, accurate                                              | Very long latency                                                     |
| Optimization using design equations   | Quick                                                         | Inaccurate                                                            |
| Expert design<br>system               | Works well                                                    | Requires expertise<br>to be captured for<br>each individual<br>design |
| Optimization using<br>Surrogate Model | Quick, accurate                                               | Requires SM to be fitted offline                                      |
| Bayesian<br>optimization              | Accurate, fewest<br>overall simulations,<br>Can start with SM | Longer latency than using SM only                                     |



# **Bayesian Optimization**

• We propose to use a Bayesian optimization technique for efficient design optimization:

Let *f* denote the statistical model and *D* the samples; we have:

$$P(f|\mathbf{D}) = \frac{P(\mathbf{D}|f)P(f)}{P(\mathbf{D})} \propto \underbrace{P(\mathbf{D}|f)}_{\text{Likelihood}} \, \underbrace{P(f)}_{\text{Prior model}}$$

• Bayesian optimization flow:



#### Two key components:

- (1) Statistical surrogate model:
  - Gaussian Process (GP) models or Student-T Process (TP) models
  - Fit existing data and predict performance expectation and uncertainty; prior models are updated with newly acquired samples to form posterior models
- (2) Acquisition function:
  - Determines the next best sample to simulate



### Bayesian Optimization: Picking Next Point to Simulate

Probability of Improvement (PI) calculates how probable it is that simulating a new point x will improve on the best value of f(x) found so far
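A minimal sketch of the PI calculation follows. It assumes a minimization problem and a Gaussian prediction with mean `mu` and standard deviation `sigma` at each candidate point. The candidate names, the predicted values, and the optional margin `xi` are hypothetical and chosen only for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(mu, sigma, f_best, xi=0.0):
    """P[f(x) < f_best - xi] when the surrogate predicts N(mu, sigma^2).

    Assumes minimization; xi is an optional improvement margin.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is either certain or impossible.
        return 1.0 if mu < f_best - xi else 0.0
    return normal_cdf((f_best - xi - mu) / sigma)

# Hypothetical surrogate predictions (mean, std) at three candidate points.
candidates = {
    "x_a": (1.20, 0.05),   # predicted worse than the incumbent, low uncertainty
    "x_b": (1.00, 0.30),   # slightly worse on average, high uncertainty
    "x_c": (0.90, 0.01),   # predicted better, low uncertainty
}
f_best = 0.95  # best objective value simulated so far (assumed)

scores = {name: probability_of_improvement(mu, sigma, f_best)
          for name, (mu, sigma) in candidates.items()}
next_point = max(scores, key=scores.get)  # candidate to simulate next
```

Here `x_c` is selected. `x_b` scores far higher than `x_a` even though both are predicted to be worse than the incumbent, because its larger uncertainty leaves more room for improvement.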



Plot from: Forrester, Alexander, Andras Sobester, and Andy Keane. *Engineering design via surrogate modelling: a practical guide*. John Wiley & Sons, 2008.



# **Circuit Blocks to be Studied**

• 77-GHz vehicular radar blocks:



![](_page_19_Picture_0.jpeg)

# **Preliminary Results:** Porting 77-GHz Balun from 8XP to 9HP

- 77-GHz PA uses balun at output. Conceptually simple, but surprisingly complicated to optimize through ML.
- Bayesian optimization subroutine first applied in existing technology (8XP).
- Then we reuse the balun in the 9HP process with the same approach; the final optimized design will be used for tape-out

![](_page_19_Figure_5.jpeg)

![](_page_20_Picture_0.jpeg)

# Step 1: Defining Range and Requirements for Balun

An LC balun is a commonly used passive balun in microwave ICs that converts a signal into a pair of out-of-phase signals, or vice versa, while suppressing the common mode on the balanced port output.

| Design Parameter | Range        |
|------------------|--------------|
| l0_l             | [30u, 1000u] |
| l0_w             | [2u, 100u]   |
| l0_s             | [3u, 20u]    |
| l1_l             | [30u, 1000u] |
| l1_w             | [2u, 100u]   |
| l1_s             | [3u, 20u]    |
| C0               | [20f, 200f]  |
| C1               | [20f, 200f]  |

![](_page_20_Figure_4.jpeg)

| S-Parameters                          | Requirements       | S-Parameters                                | Requirements |
|---------------------------------------|--------------------|---------------------------------------------|--------------|
| $S_{33}$                              | N.A. (< -10 pref.) | $\text{dB Loss}(S_{23}, S_{13})$            | > -5         |
| $\lvert S_{22} - S_{11} \rvert$       | < 0.1              | $\lvert S_{23} - S_{13} \rvert$             | < 0.1        |
| $\lvert \phi(S_{22} - S_{11}) \rvert$ | < 15               | $\lvert \phi(S_{23} - S_{13}) - 180 \rvert$ | < 15         |

![](_page_21_Picture_0.jpeg)

# Step 2a: Design Analysis: Input Parameter Screening

- Design parameter <u>screening</u>
  - Not all design parameters are of equal importance
  - Large number of parameters will induce the curse of dimensionality
- The modified Morris screening algorithm (Campolongo, 2007) uses a one-factor-at-a-time (OFAT) sampling scheme whose cost grows *linearly* with the number of design parameters, making it suitable for complex design analysis.
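The screening idea can be sketched as follows. This is a simplified version of elementary-effects screening on the unit hypercube, not the exact implementation used in this work. It steps each parameter once per trajectory and ranks parameters by the mean absolute elementary effect (`mu_star`). The function `toy_design` and all numeric settings are hypothetical.

```python
import random

def morris_mu_star(f, k, r=10, levels=4, seed=0):
    """Estimate mu* (mean |elementary effect|) for each of k inputs in [0, 1].

    Uses r one-factor-at-a-time trajectories, so the cost is r * (k + 1)
    evaluations of f: linear in the number of parameters.
    """
    rng = random.Random(seed)
    delta = levels / (2.0 * (levels - 1))
    # Base points are restricted so that x + delta stays inside [0, 1].
    grid = [i / (levels - 1) for i in range(levels)
            if i / (levels - 1) + delta <= 1.0 + 1e-12]
    abs_effects = [[] for _ in range(k)]
    n_evals = 0
    for _ in range(r):
        x = [rng.choice(grid) for _ in range(k)]
        y = f(x)
        n_evals += 1
        order = list(range(k))
        rng.shuffle(order)           # perturb the factors in random order
        for i in order:
            x_next = x[:]
            x_next[i] += delta       # change exactly one factor
            y_next = f(x_next)
            n_evals += 1
            abs_effects[i].append(abs(y_next - y) / delta)
            x, y = x_next, y_next
    mu_star = [sum(effects) / r for effects in abs_effects]
    return mu_star, n_evals

def toy_design(x):
    # Hypothetical response: x[0] dominates, x[1] matters a little, x[2] not at all.
    return 10.0 * x[0] + x[1] ** 2 + 0.0 * x[2]

mu_star, n_evals = morris_mu_star(toy_design, k=3)
ranking = sorted(range(3), key=lambda i: mu_star[i], reverse=True)
```

Parameters with a `mu_star` near zero (here `x[2]`) are candidates for fixing at a nominal value, which reduces the dimensionality of the subsequent optimization.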

![](_page_21_Figure_6.jpeg)

![](_page_22_Picture_0.jpeg)

# **Step 2b: Design Space Exploration**

- Design objective analysis:
  - Design objective analysis prevents over- or underestimated design goals
  - Designer can also learn about the design tradeoffs, either graphically or numerically, by examining the correlation coefficient table.
  - Designer can also find the upper- and lower-bounds for each design objective, and map them to [0,1] for multi-objective optimization
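One possible way to carry out the [0,1] mapping and combine the objectives is sketched below. The slides do not specify the scalarization, so a weighted sum is assumed here. The objective names, bounds, and weights are hypothetical, and every objective is treated as "lower is better".

```python
def normalize(value, lower, upper):
    """Map an objective value to [0, 1] using its lower and upper bounds."""
    if upper == lower:
        return 0.0
    t = (value - lower) / (upper - lower)
    return min(1.0, max(0.0, t))  # clip values that fall outside the bounds

def scalarize(objectives, bounds, weights):
    """Weighted sum of normalized objectives (assumed scalarization)."""
    total_weight = sum(weights[name] for name in objectives)
    weighted = sum(weights[name] * normalize(objectives[name], *bounds[name])
                   for name in objectives)
    return weighted / total_weight

# Hypothetical bounds, as might be found during design space exploration.
bounds = {
    "mag_imbalance": (0.0, 0.5),
    "phase_error_deg": (0.0, 30.0),
}
weights = {"mag_imbalance": 1.0, "phase_error_deg": 1.0}

design = {"mag_imbalance": 0.06, "phase_error_deg": 4.5}
score = scalarize(design, bounds, weights)  # single number in [0, 1]
```

Normalizing first keeps an objective with a large numeric range (degrees of phase error) from swamping one with a small range (magnitude imbalance) in the combined score.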

![](_page_22_Figure_6.jpeg)

![](_page_23_Picture_0.jpeg)

# Step 3: Optimization in Existing Technology

Three approaches for optimizing the balun design are compared:

- (1) Genetic programming: A representative of the evolutionary programming algorithms that are widely used for analog design synthesis/reuse
- (2) Bayesian optimization: Uses a Gaussian process surrogate model
- (3) Bayesian optimization: Uses a Student-T process surrogate model

![](_page_23_Figure_6.jpeg)

![](_page_24_Picture_0.jpeg)

# **Step 3: Balun Optimization Result in Existing Technology**

![](_page_24_Figure_2.jpeg)

| Metric                                      | Target | Human Result | ML Result |
|---------------------------------------------|--------|--------------|-----------|
| $S_{33}$                                    | < -10  | -7.7         | -10.5     |
| $\lvert S_{22} - S_{11} \rvert$             | < 0.1  | 0.22         | 0.06      |
| $\lvert \phi(S_{22} - S_{11}) \rvert$       | < 15   | 25.5         | 9         |
| $\lvert S_{23} - S_{13} \rvert$             | < 0.1  | 0.31         | 0.09      |
| $\lvert \phi(S_{23} - S_{13}) - 180 \rvert$ | < 15   | 28.7         | 4.5       |
| $\text{dB Loss}(S_{23}, S_{13})$            | > -5   | -9.4         | -4.8      |

Traces shown in the plot: S11, S22, S13, S33.

# Step 4: Porting to New Technology (9HP)

Now we migrate the passive LC balun design to the IBM 9HP node with **three key components kept consistent**, which makes IP migration a *push-button* process:

- (1) Design IP topology
- (2) Algorithm settings:
  - Surrogate model type & acquisition function
- (3) Design objective functions:
  - Objective scalarization weights

![](_page_25_Figure_7.jpeg)

![](_page_25_Figure_8.jpeg)

| Metric                                      | Target             | Result in 8XP | Result in 9HP |
|---------------------------------------------|--------------------|---------------|---------------|
| $S_{33}$                                    | N.A. (< -10 pref.) | -10.5         | -9.7          |
| $\lvert S_{22} - S_{11} \rvert$             | < 0.1              | 0.06          | 0.01          |
| $\lvert \phi(S_{22} - S_{11}) \rvert$       | < 15               | 9             | 1.5           |
| $\lvert S_{23} - S_{13} \rvert$             | < 0.1              | 0.09          | 0.04          |
| $\lvert \phi(S_{23} - S_{13}) - 180 \rvert$ | < 15               | 4.5           | 3.4           |
| $\text{dB Loss}(S_{23}, S_{13})$            | > -5               | -4.8          | -2.4          |

![](_page_26_Picture_0.jpeg)

# MACHINE LEARNING IN PHYSICAL DESIGN

Bowen Li, Weiyi Qi, Billy Huggins, W. Rhett Davis, Paul Franzon ECE Department North Carolina State University

![](_page_26_Picture_3.jpeg)

![](_page_27_Picture_0.jpeg)

### **Physical Design**

![](_page_27_Figure_2.jpeg)

Source: Wikimedia Commons


#### **Problem Statement:**

How to set up control knobs to achieve specific desired outcomes

| Input Knob         | Meaning                           | Output          | Units     |
|--------------------|-----------------------------------|-----------------|-----------|
| Clock Target       | Clock frequency                   | Power           | W         |
| Num_layer          | Number of routing layers          | Area            | mm²       |
| Init_density_ratio | % cell area                       | Setup slack     | ps        |
| skew               | Clock skew                        | Hold slack      | ps        |
| Sink_max_tran      | Clock tree leaf transition time   | Congestion      | % density |
| Buf_max_tran       | Clock tree buffer transition time | DRC error count | count     |

![](_page_29_Picture_0.jpeg)

### **Initial Experiment**

#### **Cortex SOC**:

- Gate count: 18k gates
- Net count: 18k nets
- Target clock: 10 ns

#### Design Goal:

Minimize area while meeting timing and being DRC clean.

![](_page_29_Figure_6.jpeg)

#### **Technology:** NCSU 45 PDK

![](_page_30_Picture_0.jpeg)

### **Building a Surrogate Model**

### Model building:

- Each routing run takes 40 minutes
- Total of ~50 runs needed to complete model
- Total time: Overnight
- Kriging Model

### Models fitted:

- Congestion
- Setup slack
- Hold slack

![](_page_30_Figure_11.jpeg)

# **Physical design results**

#### Design Iterations after model lookup

| Iter. | CLKper | Den. | Layer | Max<br>Skew | Sink<br>Max<br>Tran | Cong.           | Viol | Hold<br>slack | Setup<br>Slack | Comments                                            |
|-------|--------|------|-------|-------------|---------------------|-----------------|------|---------------|----------------|-----------------------------------------------------|
| 1     | 10     | 0.6  | 8     | 300         | 400                 | 0.28H<br>/1.51V | 105  | -61.3ps       | 6.46ns         | Over-congested;<br>Hold time violated               |
| 2     | 10     | 0.5  | 8     | 300         | 400                 | 0.03H<br>/0.39V | 6    | -48.7ps       | 6.55ns         | Over-congested;<br>Hold time violated               |
| 3     | 10     | 0.45 | 8     | 300         | 400                 | 0.02H<br>/0.11V | 0    | 2.4ps         | 6.48ns         | No DRC errors; hold<br>fixed; hold margin is<br>low |
| 4     | 10     | 0.45 | 8     | 200         | 300                 | 0.02H<br>/0.17V | 0    | 10.5ps        | 6.37ns         | Final Design                                        |

![](_page_31_Figure_4.jpeg)

- Surrogate model provides guidance for design & optimization
- Able to achieve an optimal design with 4 iterations
  - Human designer took 20 iterations

| Item                | Value                       |
|---------------------|-----------------------------|
| # of Standard Cells | 39990                       |
| Core Area (µm²)     | 98109.284 (313.224*313.224) |
| Chip Area (µm²)     | 54363.008 (233.158*233.158) |
| Cell Density        | 55.4 %                      |

![](_page_32_Picture_0.jpeg)

### **Building a more accurate model**

#### **1.** Data Selection

![](_page_32_Figure_3.jpeg)

![](_page_32_Figure_4.jpeg)

![](_page_33_Picture_0.jpeg)

### Surrogate Modeling for GR in Physical Design

Surrogate Model Builders

- Artificial Neural Networks (ann)
- Kriging
- Radial Basis Function (rbf)
- Kriging genetic

![](_page_33_Figure_6.jpeg)

![](_page_34_Picture_0.jpeg)

# 3. Surrogate Modeling for GR in Physical Design

Model Accuracy: Root Relative Squared Error

![](_page_34_Figure_3.jpeg)

The Root Relative Squared Error (RRSE) measures the model's squared error relative to that of a simple predictor (the average of the observed values).

$$RRSE = \sqrt{RSE} = \sqrt{\frac{\sum_i \left( f\left( \vec{x}_i \right) - y_i \right)^2}{\sum_i \left( \bar{y} - y_i \right)^2}}$$

- RRSE is close to 0 $\rightarrow$ model is much better than a simple predictor
- RRSE is close to or larger than 1 $\rightarrow$ model is worse than a simple predictor

RRSE < 0.5 is the target.
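The RRSE definition above can be computed directly. The snippet below is a small sketch with made-up numbers; `predicted` and `actual` are hypothetical values, not data from this study.

```python
import math

def rrse(predicted, actual):
    """Root Relative Squared Error of predictions against observed values."""
    mean_actual = sum(actual) / len(actual)
    model_sq_err = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    baseline_sq_err = sum((mean_actual - a) ** 2 for a in actual)
    return math.sqrt(model_sq_err / baseline_sq_err)

actual = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical observed outputs
predicted = [1.1, 1.9, 3.2, 3.9, 5.1]     # hypothetical surrogate predictions

model_score = rrse(predicted, actual)       # well below the 0.5 target
baseline_score = rrse([3.0] * 5, actual)    # always predicting the mean
```

Predicting the mean for every sample gives an RRSE of exactly 1, which is why values near or above 1 indicate a model that adds nothing over the simple predictor.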

### **3. Surrogate Modeling for GR in Physical Design**

Model Performances

|                | area_trial | TNS   | violating_path | WNS   | x_neg_1_4 | x_neg_5_8 | x_pos_0_10 | x_pos_11_20 | hold_slack_trial | power_trial |
|----------------|------------|-------|----------------|-------|-----------|-----------|------------|-------------|------------------|-------------|
| anngenetic     | 0.000      | 0.242 | 0.383          | 0.076 | 0.496     | 0.295     | 0.136      | 0.139       | 1.001            | 0.922       |
| ann            | 0.003      | 0.270 | 0.364          | 0.079 | 0.517     | 0.302     | 0.193      | 0.227       | 1.000            | 0.939       |
| annfixed       | 0.003      | 0.267 | 0.411          | 0.090 | 0.536     | 0.302     | 0.201      | 0.181       | 1.002            | 0.948       |
| rational       | 0.004      | 0.262 | 0.394          | 0.072 | 0.542     | 0.428     | 0.362      | 0.371       | 1.016            | 0.961       |
| gpmlgenetic    | 0.000      | 0.310 | 0.392          | 0.090 | 0.521     | 0.439     | 0.366      | 0.376       | 1.000            | 0.929       |
| kriginggenetic | 0.001      | 0.311 | 0.459          | 0.096 | 0.591     | 0.382     | 0.269      | 0.275       | 1.101            | 0.942       |
| lssvmgenetic   | 0.001      | 0.309 | 0.403          | 0.088 | 0.522     | 0.437     | 0.377      | 0.378       | 1.000            | 0.930       |
| elm            | 0.000      | 0.322 | 0.400          | 0.093 | 0.523     | 0.448     | 0.373      | 0.377       | 1.000            | 0.936       |
| kriging        | 0.000      | 0.338 | 0.461          | 0.176 | 0.612     | 0.405     | 0.297      | 0.307       | 1.003            | 1.003       |
| gpmldirect     | 0.004      | 0.326 | 0.429          | 0.099 | 0.547     | 0.449     | 0.410      | 0.421       | 1.000            | 0.930       |
| rbf            | 0.038      | 0.315 | 0.416          | 0.108 | 0.552     | 0.492     | 0.403      | 0.421       | 1.020            | 0.954       |
| rbfgenetic     | 0.012      | 0.322 | 0.418          | 0.098 | 0.542     | 0.473     | 0.441      | 0.466       | 1.000            | 0.966       |
| krigingpso     | 0.004      | 0.334 | 0.509          | 0.128 | 0.627     | 0.600     | 0.433      | 0.407       | 1.017            | 0.953       |
| krigingoptim   | 0.028      | 0.391 | 0.511          | 0.126 | 0.812     | 0.651     | 0.340      | 0.688       | 1.040            | 1.130       |
| krigingnsga    | 0.064      | 0.635 | 0.683          | 0.133 | 0.637     | 0.650     | 0.622      | 0.650       | 1.060            | 0.973       |
| ipol           | 0.079      | 0.369 | 0.798          | 0.098 | 0.812     | 0.791     | 0.808      | 0.829       | 1.064            | 1.397       |


![](_page_36_Picture_0.jpeg)

### **3. Surrogate Modeling for GR in Physical Design**

GR Modeling Conclusion

![](_page_36_Figure_2.jpeg)

#### **Global Results can be predicted correctly:**

• Area, Total Negative Slack, number of violating paths, Worst Negative Slack, four groups of remaining tracks

#### **Best Model Builder:**

• Anngenetic

![](_page_37_Picture_0.jpeg)

### **4. Machine Learning for DR in Physical Design**

DR Modeling Conclusion

![](_page_37_Figure_2.jpeg)

- Linear regression model for power and area
- Neural Networks model for hold slack
- Decision Tree models for hold slack and the number of DRC violations

![](_page_38_Picture_0.jpeg)

# Conclusions

- IP Reuse
  - Can result in more optimal analog designs than a human designer
  - Can automate analog IP transfer between nodes
  - AND provide models for mixed signal verification
- Physical Design
  - Correct model choice permits the problem to be modeled

# Acknowledgements

- Funding for cortical accelerators:
  - Google, DARPA
- Students: Joshua Schabel, Lee Baker, Sumon Dey, Weifu Li
- Funding for CAEML:
  - NSF + 11 member companies
  - DARPA funded some of the background work (HEALICS)
- NCSU Faculty: Brian Floyd, Rhett Davis
- NCSU Students: Bowen Li, Weiyi Qi, Yi Wang, Billy Hutchins, Tsing Zhu