Challenges and Opportunities on Thermal Modeling and Simulation for Advanced 3DIC System

Norman Chang, Ansys Fellow, IEEE Fellow

Chief Technologist, Electronics, Semiconductor, and Optics BU

EDPS

Oct., 2023



### Electronics, Semi and Optics – Thermal/Reliability as Common Challenge



Package Substrate for Silicon Interposer

**FinFET and 3D-IC** 



Chip-Package-Board





HPC and Cloud



Automotive, Aerospace and Industrial


#### Power density is the villain! Look at the trend



### Ref: Too Hot to Test, 2021



Multiphysics Solutions for SI/PI/TI/Reliability (Electromagnetics, Optics, Thermal) x (Die, 3D-IC, Package, Board)

### Thermal/Reliability challenges for Heterogeneous Integration 3DIC

- Early-stage and layout-level on-chip and package/system thermal/stress analysis
- ESD/EMI/EMC/Rad hard of chip-package-system and on-chip wearout
- High-freq power and signal integrity including TSVs/Interposer
- Co-optimization of package/chip thermal/stress mitigation with ML

### Aggressive System Scaling with 3DIC Requires Thermal/Stress Analysis Solution



Doug Yu, TSMC, ECTC keynote speaker, 2020, focusing on PPAT (Power/Performance/Area/Temperature); Hot Chips 2021



1.2 Trillion Transistors

46,225 mm<sup>2</sup> Silicon



815 mm<sup>2</sup> Silicon

©2023 ANSYS, Inc.

## Challenge for Accurate Multi-Die Thermal Analysis





Local Hotspot

**Global Hotspot** 

Vertical die-to-die thermal crosstalk can cause an additional ~20°C to ~30°C hotspot temperature difference
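Because heat conduction in a package is, to first order, linear, vertical crosstalk can be pictured as the off-diagonal terms of a die-to-die thermal resistance matrix. The sketch below is a minimal illustration under that linearity assumption; the matrix `R_TH`, the powers, and the function name `junction_temps` are hypothetical, not values from this slide.

```python
import numpy as np

# Hypothetical self/mutual thermal resistances (K/W) for two stacked dies.
# Diagonal: self-heating; off-diagonal: die-to-die (vertical) coupling.
R_TH = np.array([[0.20, 0.12],
                 [0.12, 0.25]])

def junction_temps(power_w, t_amb=25.0, r_th=R_TH):
    """Superpose self-heating and crosstalk: T = T_amb + R @ P."""
    power_w = np.asarray(power_w, dtype=float)
    self_rise = np.diag(r_th) * power_w           # each die heating itself
    crosstalk_rise = r_th @ power_w - self_rise   # heat arriving from the other die
    return t_amb + self_rise + crosstalk_rise, crosstalk_rise

temps, xtalk = junction_temps([100.0, 60.0])      # hypothetical die powers (W)
print(temps, xtalk)
```

Analyzing each die in isolation drops the off-diagonal terms, so in this toy case it would underpredict die_0 by 7.2 K and die_1 by 12 K.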



# High-Capacity Static Thermal Flow Much Needed for Large 3DIC

### Driving applications: HPC / AI / 5G

- ✓ Hierarchical CTM stitching technique assembles the thermal model to handle heterogeneous 3D-IC systems
- Intelligent adaptive meshing can complete the hierarchical thermal simulation in hours; innovation continues on fast and accurate hierarchical thermal simulation
- ✓ 3D-IC junction T<sub>max</sub> optimization with the heat transfer coefficient (HTC) applied on the package surface and heat spreader components included





### 3D-IC system with CoWoS package

### Thermal result for large 3DIC

"Invited Paper: Solving Fine-Grained Static 3DIC Thermal with ML Thermal Solver Enhanced with Decay Curve Characterization", H. He, N. Chang, et al., ICCAD, 2023



### System-Aware Thermal Solution of 3D-IC



#### Driving applications: mobile / networking

| Level      | Item      | Size / count       | RedHawk-SC Electrothermal (ET) |
|------------|-----------|--------------------|--------------------------------|
| Backend    | die_0     | ~25.4 mm × 14.4 mm | Detailed CTM                   |
|            | die_1     | ~25.4 mm × 14.4 mm | Detailed CTM                   |
|            | InFO      | ~26.4 mm × 31.3 mm | Detailed CTM                   |
|            | C4 bumps  | ~0.4M              |                                |
| Pkg/System | PKG       | Icepak model       | Boundary condition             |
|            | PCB       | Icepak model       | Boundary condition             |
|            | Heat sink | Icepak model       | Boundary condition             |



- 1), 2) Structure diagrams of the 3DIC stack-up components
- 3) Tile-based metal density distribution in InFO CTM
- 4) Tile-based temperature-dependent power distribution in logic die CTM
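Tile-based CTMs like the InFO one above reduce detailed routing to per-tile properties. A common first-order way to turn a metal-density tile into an effective conductivity is the rule of mixtures: parallel (arithmetic) averaging in-plane and series (harmonic) averaging through-plane. This is a hedged sketch of that generic homogenization, not how RedHawk-SC ET builds its CTM; the dielectric conductivity, the density map, and the name `tile_conductivity` are assumptions.

```python
import numpy as np

K_CU = 400.0    # W/(m*K), copper (approximate)
K_DIEL = 0.3    # W/(m*K), hypothetical polymer dielectric value

def tile_conductivity(metal_density, k_metal=K_CU, k_diel=K_DIEL):
    """First-order rule-of-mixtures homogenization per tile.

    In-plane: metal and dielectric conduct in parallel (arithmetic mean).
    Through-plane: treated as conducting in series (harmonic mean).
    These are the two simple bounds, not an exact extraction.
    """
    rho = np.clip(np.asarray(metal_density, dtype=float), 0.0, 1.0)
    k_inplane = rho * k_metal + (1.0 - rho) * k_diel
    k_through = 1.0 / (rho / k_metal + (1.0 - rho) / k_diel)
    return k_inplane, k_through

# Hypothetical 2x3 metal-density map for one RDL layer
density_map = np.array([[0.1, 0.5, 0.9],
                        [0.3, 0.5, 0.7]])
k_xy, k_z = tile_conductivity(density_map)
print(k_xy)
print(k_z)
```

The large gap between the two bounds at the same density is why per-tile anisotropy, not just a single averaged conductivity, matters in a stitched thermal model.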





RHSC-ET Thermal Results with Icepak Boundary Condition

"Comprehensive Thermal Solution in Advanced Large Scale 3DIC Design", DesignTrack, DAC 2023




### Early design exploration: Thermal Sensitivity Analysis

**optiSLang sensitivity results** 



#### Sensitivity analysis plots interpretation

- <u>CoP (Coefficient of Prognosis) matrix</u>
  - $y_1$ has a larger effect on $T_{max}$ than $x_1$
- <u>Response surface 3D plot</u>
  - Shows the approximating surface of the objective function in terms of the selected design variables
- <u>Residual plot</u>
  - Displays $T_{max}$ bounds for the given limits of $x_1$ and $y_1$
  - Compares predicted and actual $T_{max}$ to assess the quality of the prediction
- <u>Parallel coordinates plot with clustering</u>
  - Plots all design variables and the objective function, clustered by k-means
  - The significance of $y_1$ on $T_{max}$ shown by the CoP matrix can also be observed here
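The CoP reading can be reproduced in spirit with a small script: fit a quadratic response surface, score it by cross-validated prediction, and attribute variation to each input by dropping it. This is a minimal sketch loosely following the Coefficient of Prognosis idea (CoP = 1 - SS<sub>E</sub> of cross-validated predictions / SS<sub>T</sub>); it is not optiSLang's implementation, and the synthetic T<sub>max</sub> data, fold count, and function names are hypothetical.

```python
import numpy as np

def quad_basis(X, cols):
    """Full quadratic polynomial basis in the selected input columns."""
    feats = [np.ones(len(X))]
    for i in cols:
        feats += [X[:, i], X[:, i] ** 2]
    for a in range(len(cols)):
        for b in range(a + 1, len(cols)):
            feats.append(X[:, cols[a]] * X[:, cols[b]])
    return np.column_stack(feats)

def cop(X, y, cols, folds=3, seed=0):
    """CoP-style score: 1 - SS_E(cross-validated prediction) / SS_T."""
    idx = np.random.default_rng(seed).permutation(len(y))
    pred = np.empty_like(y)
    for hold in np.array_split(idx, folds):
        train = np.setdiff1d(idx, hold)
        coef, *_ = np.linalg.lstsq(quad_basis(X[train], cols), y[train], rcond=None)
        pred[hold] = quad_basis(X[hold], cols) @ coef
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

# Hypothetical data: 18 power-tile placements (x1, y1 normalized to [0, 1])
rng = np.random.default_rng(1)
X = rng.random((18, 2))
t_max = 85.0 + 1.0 * X[:, 0] + 10.0 * X[:, 1] + rng.normal(0.0, 0.1, 18)

total = cop(X, t_max, [0, 1])
cop_x1 = total - cop(X, t_max, [1])   # variation lost when x1 is dropped
cop_y1 = total - cop(X, t_max, [0])   # variation lost when y1 is dropped
print(f"CoP total={total:.3f}  x1={cop_x1:.3f}  y1={cop_y1:.3f}")
```

With this synthetic data, the drop-one score for $y_1$ comes out far larger than for $x_1$, mirroring how the CoP matrix is read.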

#### Design variables: x and y position of the power tile

Workflow details

Number of scenarios = 18

Total runtime = 51 minutes



# Thermal-Induced Stress Simulation Methodology

- Mechanical stress caused by a change in a material's temperature
  - Thermal expansion during assembly or in operation
  - Temperature cycling in operation
  - Thermal impacts on strain/stress



Coefficient of thermal expansion (CTE) mismatch between two materials causes warpage and displacement

Step-temperature impact on displacement



Von Mises Stress
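As a back-of-envelope companion to the CTE-mismatch and von Mises plots, the elastic stress in a thin film bonded to a much thicker substrate is $\sigma = \frac{E}{1-\nu}(\alpha_{sub}-\alpha_{film})\Delta T$, and for that equal-biaxial state the von Mises stress equals $\sigma$. The sketch assumes linear elasticity and approximate handbook values for Cu on Si; it ignores plasticity (real Cu films may yield below this estimate) and does not model warpage, which needs plate theory or FEA. Function names are illustrative.

```python
import math

def thin_film_mismatch_stress(E, nu, alpha_film, alpha_sub, delta_T):
    """Equal-biaxial stress (Pa) in a thin film on a much thicker substrate.

    Elastic only; positive = tensile. The substrate imposes its thermal strain,
    so the film's mismatch strain is (alpha_sub - alpha_film) * delta_T.
    """
    return E / (1.0 - nu) * (alpha_sub - alpha_film) * delta_T

def von_mises(s1, s2, s3):
    """Von Mises equivalent stress from the three principal stresses."""
    return math.sqrt(((s1 - s2) ** 2 + (s2 - s3) ** 2 + (s3 - s1) ** 2) / 2.0)

# Illustrative: Cu film on Si cooled by 100 K (approximate material properties)
sigma = thin_film_mismatch_stress(E=120e9, nu=0.34,
                                  alpha_film=17e-6, alpha_sub=2.6e-6,
                                  delta_T=-100.0)
print(f"biaxial stress = {sigma / 1e6:.0f} MPa, "
      f"von Mises = {von_mises(sigma, sigma, 0.0) / 1e6:.0f} MPa")
```

Cooling leaves the higher-CTE Cu film in tension because the Si substrate prevents it from contracting as much as it would freely.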




# Detailed On-Chip Sensor Based Thermal Throttling Simulation



64 sensor locations highlighted on the Vega 20 GPU of the Radeon VII card (picture from AMD)



Vega 20: Under The Hood - The AMD Radeon VII Review: An Unexpected Shot At The High-End (anandtech.com)

- The trend is toward more and more on-chip thermal sensors for DVFS control
- Optimization of on-chip thermal sensor locations is much needed and can be achieved through architecture-level and detailed layout-level thermal simulation
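Sensor placement can be posed as a covering problem over simulated thermal maps: choose sensors so the hottest sensor reading stays close to the true hotspot in every power scenario. The greedy heuristic below is one simple way to do that, offered as a sketch; the scenario maps and the name `greedy_sensor_placement` are hypothetical, and this is not presented as the method used in the referenced work.

```python
import numpy as np

def greedy_sensor_placement(maps, k):
    """Pick k sensor cells that minimize the worst-case hotspot miss.

    maps: array (scenarios, cells) of simulated temperatures.
    Miss for one scenario = true T_max - hottest sensor reading.
    Returns (chosen cell indices, worst-case miss in K).
    """
    maps = np.asarray(maps, dtype=float)
    true_max = maps.max(axis=1)
    sensed = np.full(maps.shape[0], -np.inf)          # best reading so far per scenario
    chosen = []
    for _ in range(k):
        candidate = np.maximum(sensed[:, None], maps)  # reading if each cell were added
        worst_miss = (true_max[:, None] - candidate).max(axis=0)
        best = int(np.argmin(worst_miss))              # ties -> lowest cell index
        chosen.append(best)
        sensed = candidate[:, best]
    return chosen, float((true_max - sensed).max())

# Hypothetical: 3 power scenarios x 5 candidate sensor cells (flattened layout tiles)
maps = np.array([[1.0, 5.0, 2.0, 2.0, 1.0],
                 [1.0, 2.0, 2.0, 6.0, 1.0],
                 [4.0, 1.0, 1.0, 1.0, 1.0]])
for k in (1, 3):
    print(k, greedy_sensor_placement(maps, k))
```

Greedy selection is not guaranteed optimal, but it scales to many candidate cells and scenarios, which is where architecture-level and layout-level thermal maps come in.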



## Chip, Package, System Aware Thermal Throttling Simulation

simulating thermal throttling in system
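A minimal chip-level throttling loop couples a thermal model to a sensor-driven DVFS governor. The sketch assumes a single lumped RC junction model read directly by the sensor, power proportional to frequency, and a trip/release hysteresis governor; all parameter values and the function name are hypothetical, and a real flow would use the chip-package-system thermal model instead of one RC pair.

```python
import math

def simulate_throttling(levels_ghz, watts_per_ghz, r_th, c_th, t_amb,
                        t_trip, t_release, dt, steps):
    """Lumped RC junction model with a sensor-driven DVFS governor.

    Returns a list of (sensor temperature, frequency chosen for the next step).
    """
    alpha = math.exp(-dt / (r_th * c_th))
    level = len(levels_ghz) - 1                 # start at the highest DVFS level
    temp = t_amb
    trace = []
    for _ in range(steps):
        t_ss = t_amb + watts_per_ghz * levels_ghz[level] * r_th
        temp = t_ss + (temp - t_ss) * alpha     # exact for constant power over dt
        if temp > t_trip and level > 0:
            level -= 1                          # throttle one level down
        elif temp < t_release and level < len(levels_ghz) - 1:
            level += 1                          # recover one level up
        trace.append((temp, levels_ghz[level]))
    return trace

trace = simulate_throttling(levels_ghz=[1.0, 1.5, 2.0, 2.5], watts_per_ghz=40.0,
                            r_th=0.5, c_th=2.0, t_amb=50.0,
                            t_trip=85.0, t_release=82.0, dt=0.01, steps=2000)
print(f"max sensor temperature: {max(t for t, _ in trace):.2f} C")
```

Without throttling, the top level would settle at 100°C in this toy model; the governor keeps the sensor temperature near the 85°C trip point at the cost of lower average frequency.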




### Fast Static/Transient Thermal Analysis Needed for 3DIC Multiphysics



Performance and reliability degradation

- Aging, EM, IR drops, stress, switching speed, etc.
- Fine-grained thermal analysis of large 3DIC designs is not possible with purely traditional FEA/CFD-based approaches
- Long sequences of transient power need to be simulated to accurately predict how thermal hotspots change with time
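One classical way to simulate long transient power sequences cheaply is to characterize the thermal response once, here as a Foster RC network fitted to a step response, and then advance each RC stage with an exact exponential update. This is a generic sketch of that compact-model idea, not the ML or ROM methods cited in this deck; the stage values `R`, `TAU`, the power trace, and the function name are hypothetical.

```python
import numpy as np

def foster_transient(power_w, dt, r_k_per_w, tau_s):
    """Temperature rise (K) for a piecewise-constant power trace (W).

    Each Foster RC stage i relaxes toward P * R_i with time constant tau_i.
    For power held constant over a step of length dt the exponential update
    is exact, so cost grows only linearly with the trace length.
    """
    r = np.asarray(r_k_per_w, dtype=float)
    decay = np.exp(-dt / np.asarray(tau_s, dtype=float))
    stage = np.zeros_like(r)                   # rise across each RC stage
    rise = np.empty(len(power_w))
    for n, p in enumerate(power_w):
        stage = stage * decay + p * r * (1.0 - decay)
        rise[n] = stage.sum()
    return rise

# Hypothetical 3-stage network (die, package, heat sink) and a bursty power trace
R = [0.05, 0.15, 0.30]          # K/W
TAU = [1e-3, 5e-2, 2.0]         # s
dt = 1e-3
power = np.r_[np.full(500, 100.0), np.full(1500, 20.0)]   # 0.5 s burst, then idle
rise = foster_transient(power, dt, R, TAU)
print(f"peak rise {rise.max():.1f} K at t = {(rise.argmax() + 1) * dt:.3f} s")
```

A single lumped network only captures one temperature node; tracking how hotspots move across a 3DIC needs the spatially resolved approaches discussed in this deck.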

Figure: transient thermal maps over time, from t = 0 and t = t1 to t = tn

Architecture-level fast static/transient thermal analysis is required for various optimizations (e.g., power, DvD, thermal, stress, test, and sensor placement).

"Emerging Challenges on Thermal Modeling and Simulation for Advanced 3DIC Systems", N. Chang, Keynote, REPP, 2022

# Emerging Challenges and Opportunities on Thermal Modeling and Simulation for Advanced 3DIC System





Ref: Too Hot to Test workshop, Intel, 2021, https://youtu.be/0gPSbZqbXUg

Scan test

- ✓ Shift-in: many chains at 100s of MHz with high Cdyn (about 3-10X that of real-world applications), resulting in high total power
- Capture at speed: runs at GHz speeds for several cycles, with high power density and power, severe Vdroop, and high Tj-rise at different locations
  - Tj-rise, Fmax, Vmin, Vdroop, Power
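The scan-shift point follows from the switching-power relation $P = C_{dyn} V^2 f$: a large C<sub>dyn</sub> multiplier can outweigh a lower shift clock. The sketch assumes a 2 GHz functional clock, a 400 MHz shift clock, and equal supply voltage; those values and the name `dynamic_power` are hypothetical, and only the ratios matter.

```python
def dynamic_power(c_dyn_f, vdd_v, freq_hz):
    """Switching power P = C_dyn * V^2 * f (C_dyn already includes activity)."""
    return c_dyn_f * vdd_v ** 2 * freq_hz

# Hypothetical baseline: functional mode at 2 GHz with normalized C_dyn = 1 nF
functional = dynamic_power(1e-9, 0.8, 2e9)
for ratio in (3, 10):                                   # "3-10X" C_dyn during shift
    shift = dynamic_power(ratio * 1e-9, 0.8, 400e6)     # shift clock assumed 400 MHz
    print(f"{ratio}X C_dyn at 400 MHz -> {shift / functional:.1f}x functional power")
```

Under these assumptions, shift power ranges from 0.6x to 2.0x functional power across the 3-10X C<sub>dyn</sub> range.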

Functional test

- Cache load / Structured Based Functional Test, system ported test
- Shmoo plot of Fmax, Vmin, Tj-rise, Vdroop, Power
- Thousands of test patterns, each 0.5-1 ms long, generating high power density, Tj-rise, and Vdroop
- ✓ Tj-rise and Vdroop are also correlated, because leakage power depends exponentially on junction temperature
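The leakage-temperature coupling can be illustrated as a fixed point: leakage raises temperature, and temperature raises leakage exponentially. A minimal sketch, assuming a single thermal resistance and an exponential leakage model with hypothetical parameters and function name; the same rising leakage current is what deepens Vdroop, which this thermal-only sketch does not model.

```python
import math

def junction_temp_with_leakage(p_dyn, r_th, t_amb, p_leak_ref, t_ref, k,
                               max_iter=500, tol=1e-6, t_limit=1000.0):
    """Fixed-point solve of T = T_amb + R_th * (P_dyn + P_leak(T)),
    with P_leak(T) = P_leak_ref * exp(k * (T - T_ref)).

    Returns (T, converged). Failure to converge means no stable operating
    point was found, i.e. thermal runaway in this simple model.
    """
    temp = t_amb
    for _ in range(max_iter):
        p_leak = p_leak_ref * math.exp(k * (temp - t_ref))
        new_temp = t_amb + r_th * (p_dyn + p_leak)
        if abs(new_temp - temp) < tol:
            return new_temp, True
        temp = new_temp
        if temp > t_limit:          # stop before exp() overflows
            break
    return temp, False

# Hypothetical: 100 W dynamic, 10 W leakage at 25 C, leakage e-folds every 50 K
print(junction_temp_with_leakage(100.0, 0.3, 45.0, 10.0, 25.0, 0.02))  # stable
print(junction_temp_with_leakage(100.0, 1.0, 45.0, 10.0, 25.0, 0.02))  # runaway
```

The only difference between the two calls is the thermal resistance, which is why localized hotspots with poor heat paths are the first places where leakage feedback shows up.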

Challenges and opportunities on thermal modeling and simulation for advanced 3DIC systems:

- Performing fine-grained static and transient thermal analysis on large 3DIC designs is required and demands adaptive meshing or machine-learning technology to overcome the limitations of traditional CFD/FEA-based solvers.
- Architecture-level thermal and thermal-induced stress analyses are required because of horizontal and vertical cross-die thermal coupling under transient power profiles among the chiplets in a 3DIC.
- Heterogeneous Integration 3DICs may consist of analog/mixed-signal and digital designs that have very different thermal and stress requirements, which need to be co-optimized among the chiplets and package in the 3DIC.
- For Silicon Photonics 3DICs, accurate thermal gradient analysis is required to co-optimize the 3DIC package and the thermal heaters required by the PIC design.
- Testing large 3DICs containing CPUs, GPUs, etc. is a major challenge because multiple localized thermal hotspots and dynamic voltage drop affect yield. Test techniques should be co-optimized with the localized thermal hotspots and Vdroop of the 3DIC.
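For the silicon photonics item, the sensitivity that drives heater co-optimization is the thermo-optic shift of a microring resonance, $\Delta\lambda \approx \lambda \, (dn_{eff}/dT)\, \Delta T / n_g$. The sketch assumes $dn_{eff}/dT$ equals silicon's bulk thermo-optic coefficient (about $1.8\times10^{-4}$/K, which somewhat overstates the effective-index value) and a hypothetical group index of 4.2; the function name is illustrative.

```python
def ring_resonance_shift_nm(wavelength_nm, dn_eff_dT, group_index, delta_T):
    """First-order thermo-optic resonance shift of a microring:
    d(lambda) = lambda * (dn_eff/dT) * dT / n_g."""
    return wavelength_nm * dn_eff_dT * delta_T / group_index

# Assumptions: dn_eff/dT ~ silicon bulk thermo-optic coefficient (1.8e-4 /K),
# group index 4.2 (hypothetical, typical order for a silicon strip waveguide)
per_kelvin = ring_resonance_shift_nm(1550.0, 1.8e-4, 4.2, 1.0)
gradient_shift = ring_resonance_shift_nm(1550.0, 1.8e-4, 4.2, 5.0)  # 5 K across a ring bank
print(f"{per_kelvin:.3f} nm/K, {gradient_shift:.2f} nm for a 5 K gradient")
```

At roughly 0.07 nm/K, even a few kelvin of gradient across a ring bank shifts resonances by a sizeable fraction of a nanometer, which the heaters must then compensate.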



Grand Challenges in Thermal/Reliability Simulation for 3DIC

Anticipate Physical Integrity Challenges





### Possible Machine-learning based Static Thermal Solver with Distributed HTC

- Developed a novel Machine-Learning based Thermal solver to accurately predict chip temperatures for arbitrary power maps and distributed HTC patterns.
- The ML-Solver is inspired by key ideas of traditional Ansys solvers. It iteratively solves for temperature on discrete subdomains given the power map, HTC, and initial temperature. Flux conservation in each iteration is established using pre-trained ML models.
- The ML-Solver is about 100x faster than current solvers and accurately predicts high-fidelity temperature maps on the chip.
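The subdomain iteration described above can be sketched with a classical solver standing in for the pre-trained ML models: each outer iteration solves every subdomain with its neighbours' temperatures frozen, using the power map, a distributed HTC map, and the current temperature as inputs. In this minimal sketch the local solve is plain Jacobi relaxation on a 2D finite-volume grid with adiabatic chip edges; the grid, HTC values, and function names are hypothetical, and it makes no claim about the speed-up quoted above.

```python
import numpy as np

def jacobi_sweep(T, p, h, k, t_amb):
    """One Jacobi sweep of the cell balance k*sum(T_nb - T) + h*(T_amb - T) + p = 0.

    Chip edges are adiabatic, so missing neighbours simply drop out.
    """
    nb_sum = np.zeros_like(T)
    nb_cnt = np.zeros_like(T)
    nb_sum[1:, :] += T[:-1, :]
    nb_cnt[1:, :] += 1
    nb_sum[:-1, :] += T[1:, :]
    nb_cnt[:-1, :] += 1
    nb_sum[:, 1:] += T[:, :-1]
    nb_cnt[:, 1:] += 1
    nb_sum[:, :-1] += T[:, 1:]
    nb_cnt[:, :-1] += 1
    return (k * nb_sum + h * t_amb + p) / (k * nb_cnt + h)

def local_solve(T, p, h, k, t_amb, block, sweeps):
    """Relax one subdomain with its surrounding cells frozen.

    This is the step a pre-trained ML model would replace; here it is plain
    Jacobi relaxation so the sketch stays self-contained.
    """
    T = T.copy()
    for _ in range(sweeps):
        T[block] = jacobi_sweep(T, p, h, k, t_amb)[block]
    return T[block]

def subdomain_thermal_solve(p, h, k=1.0, t_amb=25.0, size=8, sweeps=10,
                            max_outer=1000, tol=1e-9):
    """Outer iteration: solve every subdomain from the previous global field."""
    n, m = p.shape
    T = np.full((n, m), float(t_amb))
    for it in range(1, max_outer + 1):
        T_next = T.copy()
        for r0 in range(0, n, size):
            for c0 in range(0, m, size):
                block = (slice(r0, r0 + size), slice(c0, c0 + size))
                T_next[block] = local_solve(T, p, h, k, t_amb, block, sweeps)
        if np.max(np.abs(T_next - T)) < tol:
            return T_next, it
        T = T_next
    return T, max_outer

# Hypothetical 16x16 chip: one hotspot, HTC twice as high on the left half
power = np.zeros((16, 16))
power[3:5, 10:12] = 0.5
htc = np.full((16, 16), 0.2)
htc[:, :8] = 0.4
T, iters = subdomain_thermal_solve(power, htc)
print(iters, T.max())
```

Summing the cell balances shows that, at convergence, total convected heat $\sum h\,(T - T_{amb})$ must equal total power, which gives a simple flux-conservation check on any replacement for `local_solve`.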





Ranade, R., He, H., Pathak, J., Kumar, A., Wen, J. & Chang, N. (2022). A Thermal Machine Learning Solver for Chip Simulations. *4th ACM/IEEE Workshop on Machine Learning for CAD* 



### Thermal Multi-physics Solving Augmented by Distributed ML Framework



SeaScape Distributed Computational Platform



1. "Invited Paper: Solving Fine-Grained Static 3DIC Thermal with ML Thermal Solver Enhanced with Decay Curve Characterization", H. He, N. Chang, et al., ICCAD, 2023
2. "High-Speed, Low-Storage Power and Thermal Predictions for ATPG Test Patterns", Z. Liang, N. Chang, et al., ITC, 2023
3. "A Composable Machine-Learning Approach for Steady-State Simulations on High-resolution Grids", R. Ranade, et al., NeurIPS, 2022
4. "A Thermal Machine Learning Solver for Chip Simulation", R. Ranade, H. He, J. Pathak, N. Chang, A. Kumar, J. Wen, IEEE MLCAD, 2022
5. "ML-based Fast On-chip Transient Thermal Simulation for Heterogeneous 2.5D/3D IC Designs", N. Chang, A. Kumar, J. Wen, H. He, S. Pan, D. Geb, W. Xia, S. Asgari, M. Abarham, Q. Li, Y. Li, Z. Feng, IEEE VLSI-DAT, 2022
6. "On-chip Transient Hot Spot Detection with a Multiscale ROM in 3DIC Designs", D. Geb, S. Asgari, A. Kumar, J. Wen, N. Chang, S. Pan, M. Abarham, H. He, V. Gandhi, IEEE ECTC, 2022
7. "Security Integrity Analytics by Thermal Side-Channel Simulation: an ML-Augmented Auto-POI Approach", J. Wen, H. Chen, M. Abarham, H. He, S. Pan, L. Lin, W. Li, G. Ni, A. Kumar, D. Geb, S. Asgari, N. Chang, T. Lou, R. Jang, DesignCon, 2022
8. "A composable autoencoder-based iterative algorithm for accelerating numerical simulations", R. Ranade, C. Hill, H. He, A. Maleki, N. Chang, J. Pathak, arXiv:2110.03780, 2021
9. "ML-augmented Methodology for Fast Thermal Side-channel Emission Analysis", N. Chang, D. Zhu, L. Lin, D. Selvakumaran, J. Wen, S. Pan, W. Xia, H. Chen, C. Chow, G. Chen, IEEE ASP-DAC, 2021
10. "Model-based Digital Twin for Anomaly Detection of On-chip Transient Thermal Response", A. Kumar, N. Chang, E. Yang, W. Chuang, J. Wen, S. Pan, W. Xia, D. Geb, M. Shih, Y. Li, H. He, S. Asgari, M. Abraham, S. Cho, R. Jang, DesignCon, 2021
11. "An unsupervised learning approach to solving heat equations on chip based on auto encoder and image gradient", H. He, J. Pathak, arXiv:2007.09684, 2020
12. "DNN-based Fast Static On-chip Thermal Solver", J. Wen, S. Pan, N. Chang, W. Chuang, W. Xia, D. Zhu, A. Kumar, E. Yang, K. Srinivasan, Y. Li, IEEE SEMI-THERM, 2020



# 3D-IC Thermal Integrity



Power Distribution in 3D



**Heat Dissipation** 



Simulation is driving Implementation

Machine Learning is enabling Simulation-based optimization



# ML-based Power & Thermal Design Space Exploration in System Technology Co-optimization (STCO)



Need for "Thermal aware" Architecture Validation with the Help of Machine Learning



# ML-based Adaptive Metamodel of Optimal Prognosis (AMOP) Framework





### Optimization of Mobile Pkg Material Calibration for Thermal/Stress Integrity

#### As-is process/Challenges

- Sensitivity analysis of thermal material properties of mobile AP
- Fast and Accurate equivalent virtual thermal testing model  $\rightarrow$  Simple Model
- Trial-and-error approach for fine-tuning material properties → Expensive!
- Too many trials (1000+) need to be performed for 10+ parameters
- Challenges:
  - Significant manual effort for 1000+ trials
  - Accurate simple model for transient thermal analysis
  - Reducing dependency on package type

#### Ansys Value Stream

- Robust workflow integration and optimization with optiSLang-AEDT Icepak
- Reduced input boundary conditions and material properties (h, k, c<sub>p</sub>, and density)
- Sensitivity analysis with thermal material parameter of components.

#### Outcome

- Extract optimized equivalent properties of Simple model that is well matched with reference data
- Automatic DOE reduction to shorten the overall optimization time
- Reduced optimization time and increased accuracy
  - 2-4 weeks  $\rightarrow$  4-5 days
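Reducing a 1000+-trial campaign usually starts from a space-filling design of experiments. Latin hypercube sampling is one common choice; the sketch below is a basic version of it, shown only as an illustration. optiSLang's own sampling and DOE-reduction methods are not reproduced here, and the parameter count, ranges, and function name are hypothetical.

```python
import numpy as np

def latin_hypercube(n_samples, bounds, seed=None):
    """Basic Latin hypercube design.

    Each parameter range is cut into n_samples equal strata; every stratum
    is sampled exactly once, with strata shuffled independently per parameter.
    """
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)       # shape (n_params, 2): (low, high)
    n_params = len(bounds)
    # one random point inside each stratum, for every parameter
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_params))) / n_samples
    for j in range(n_params):
        u[:, j] = rng.permutation(u[:, j])         # decouple strata across parameters
    return bounds[:, 0] + u * (bounds[:, 1] - bounds[:, 0])

# Hypothetical: 12 equivalent-material parameters (e.g. h, k, c_p, density of
# several components), each scaled to [0.5, 2.0] of its nominal value
design = latin_hypercube(50, [[0.5, 2.0]] * 12, seed=0)
print(design.shape)
```

Each of the 50 runs then feeds the thermal model once, and a metamodel fitted to the results replaces most of the brute-force trials.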

"Thermal Model Simplification of Mobile Device with Adaptive Metamodel of Optimal Prognosis (AMOP)", V. Krishna, et al., iTherm, 2022





- Thermal plays a key role in 3DIC multiphysics interactions and demands fast early architecture-tradeoff and layout-level sign-off solutions
- Customized ML technology enables innovative applications in speeding up multiphysics simulation and fast in-process co-optimization
- optiSLang can work with Ansys Multiphysics tools and other companies' tools for co-optimization of design workflow including thermal-centric co-optimization scenario



# Acknowledgement

- Thanks to Akhilesh, David, Mehdi, Saeed, Haiyang, Rishi, Wenbo, Prakash, Chris, Lang, Jerome, Preeti, Hua, Zakir, Chunta, Jibin, Mallik, Jay, Tianhao, Prith, and Ying for many discussions of fast thermal simulations and GPT applications
- Thanks to Zhe-Jia, Yu-Chong, Tsao-Her, Gary, David, Piin-Chen, Owen, Roger, and James of NTU for ChatGPT/thermal/DvD ML discussion



