# **Telairity Semiconductor**



#### Introduction

- Today most ASIC designs are very complex
  - 500k to 20M gates
- Complexity has caused specialization
  - The "Tall Thin Engineer" is gone!
- Logic designers (RTL) are not involved in physical design
  - RTL design is more like "C" programming
- Goal: make RTL design as independent of physical design as possible
  - Tools will figure it out!
- Timing closure and signal integrity (SI) are the major ASIC problems
- What does history tell us?

- 1960 to 1980 system history (very small designs)
  - Partitioning and backplane design were a critical part of the design cycle
  - Wiring delays were a major issue
  - Reducing crosstalk and reflections was a major part of the design effort

- 1980 to 1990 CMOS chip history (small designs)
  - ASIC chips were primarily made with Gate Arrays (GA)
    - Chips were slow because of conservative design and large transistors
    - Wiring was not a big issue
    - RTL design had been used since the 1970s
    - Logic synthesis became popular
      - Productivity improved 2 to 3x
  - Standard Cells for ASSP designs
    - Standard Cells were faster than GA
    - Wiring not an issue, since the comparison was with gate-array ASICs
    - Floor planning used

- 1990 to 2000 CMOS chip history (large designs)
  - Transistor density increased
  - Standard Cells became popular
    - Standard Cells still looked good compared to GA
    - Standard Cells dominant in the US & Europe
  - At 0.35um & 0.25um, wires not a big issue
  - Synthesis still produced good results
    - No floor planning

- 2000 to 2010 deep-submicron (DSM) forecast (very large designs)
  - Gate density continues to increase
  - At 0.18um, SI issues start showing up
  - At 0.13um, SI issues reported to be a serious problem
  - Pre-synthesis floor planning used widely for high-speed designs
- Crosstalk and other SI issues are reported to be getting worse

## 500um Fixed-Length Wires



#### 500um Wires Scaled



*Figure: 500um wires scaled (total capacitance, M3 resistance, M3 RC delay)*

#### **Process Scaling**



## Typical Wireload Model, FO=4

| Gates (K) | Capacitance (fF) | Wire Length (um) |
|-----------|------------------|------------------|
| 10        | 5.2              | 26*              |
| 20        | 13.1             | 66               |
| 40        | 17.0             | 85               |
| 80        | 19.0             | 95               |
| 160       | 20.0             | 100              |

\* Wires in a 3000-gate block vary from 5um to 200um
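The capacitance and length columns are consistent with a constant wire capacitance of roughly 0.2 fF/um (for example, 20.0 fF over 100um). A minimal sketch, assuming that constant (inferred from the ratio of the two columns, not stated on the slide), that recovers the length column from the capacitance column:

```python
# FO=4 wireload rows from the table: (gates in K, capacitance in fF, wire length in um)
WIRELOAD_FO4 = [
    (10, 5.2, 26),
    (20, 13.1, 66),
    (40, 17.0, 85),
    (80, 19.0, 95),
    (160, 20.0, 100),
]

# Assumption: ~0.2 fF per um of wire, inferred from capacitance / length in the table.
C_PER_UM_FF = 0.2


def wire_length_um(cap_ff: float, c_per_um_ff: float = C_PER_UM_FF) -> float:
    """Convert a wireload capacitance (fF) into an equivalent wire length (um)."""
    return cap_ff / c_per_um_ff


for gates_k, cap_ff, length_um in WIRELOAD_FO4:
    est = wire_length_um(cap_ff)
    # Each estimate lands within 1um of the tabulated length.
    print(f"{gates_k:>4}K gates: {cap_ff:5.1f} fF -> {est:6.1f} um (table: {length_um} um)")
```

The largest mismatch is the 20K-gate row (65.5um estimated vs. 66um listed), so the table reads as a single capacitance-per-length constant applied to a growing average wire length.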

#### Crosstalk Delay



### Components of Delay



*Figure: delay components in ps (driver & load delay, wire delay, SI delay)*

Note:

1. Intrinsic delay = 2 inverters
2. Wire delay = driver resistance × (wire C + load C)
3. All inverters are 1x
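Note 2's wire-delay definition is a single RC product. A minimal sketch of that arithmetic; the driver resistance and load capacitance below are hypothetical placeholders, not values from the slide, and the wire capacitance reuses the 160K-gate wireload entry (20.0 fF):

```python
def wire_delay_ps(r_driver_ohm: float, c_wire_ff: float, c_load_ff: float) -> float:
    """Wire delay per note 2: driver resistance * (wire C + load C).

    ohm * fF = 1e-15 s = 1e-3 ps, hence the division by 1000.
    """
    return r_driver_ohm * (c_wire_ff + c_load_ff) / 1000.0


# Hypothetical 1x-inverter values (placeholders, not from the slide).
R_DRIVER_OHM = 10_000.0  # assumed driver output resistance
C_LOAD_FF = 2.0          # assumed total load capacitance

# Wire capacitance from the 160K-gate row of the FO=4 wireload table.
C_WIRE_FF = 20.0

print(f"{wire_delay_ps(R_DRIVER_OHM, C_WIRE_FF, C_LOAD_FF):.0f} ps")  # 220 ps
```

With these assumed numbers the wire term dominates: 20 of the 22 fF being driven is wire, which is the point the delay breakdown makes.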

## Solution to Improve Performance

- Pre-engineered hard IP building blocks in which
  - Wiring, floor planning, and proper gate loading are the focus
- IP building blocks ("groups") of ~1000 gates
  - Optimized for 95%+ reuse

- High performance
  - Clock rate close to the theoretical maximum
  - 400MHz with 15 stages of logic in 0.18um
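The 400MHz, 15-stage figure translates into a per-stage delay budget: one 2500 ps cycle split over 15 stages leaves roughly 167 ps per stage. A minimal sketch of that arithmetic, assuming an even split; the `overhead_ps` parameter for flip-flop timing is an illustrative addition, not a figure from the slide:

```python
def per_stage_budget_ps(clock_mhz: float, stages: int, overhead_ps: float = 0.0) -> float:
    """Split one clock period evenly across `stages` logic stages.

    `overhead_ps` is a hypothetical allowance for flip-flop setup and
    clock-to-Q time; it defaults to 0, i.e. the whole cycle goes to logic.
    """
    period_ps = 1e6 / clock_mhz  # period in ps = 1e6 / f(MHz)
    return (period_ps - overhead_ps) / stages


print(f"{per_stage_budget_ps(400, 15):.1f} ps per stage")  # 166.7 ps per stage
```

Any nonzero `overhead_ps` only shrinks the budget further, so a per-stage wire delay in the hundreds of ps would by itself consume the cycle.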

# Wires Engineered



#### **Telairity**

- Groups for building blocks
- Pre-engineered M1, M2, & M3
- No routing through M1, M2, M3
- Crosstalk in group wires managed
- M4, M5, and above for global routing
- Global crosstalk eliminated

#### Typical wiring

- Gates for building blocks
- Local & global wires mixed on any of the metal layers
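The Telairity wiring split (M1 to M3 inside groups, M4 and above for global routing) can be expressed as a simple layer-assignment check. A minimal sketch; the `Net` record, `layer_violations` function, and net names are hypothetical illustrations, not part of any real tool, and the check looks only at routing layers (vias down to group pins are ignored as a simplification):

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed layer split, following the slide: M1..M3 are pre-engineered group
# wiring; everything from M4 upward is reserved for global routing.
GROUP_LAYERS = {1, 2, 3}


@dataclass
class Net:
    """Hypothetical net record: name, scope, and metal layers it is routed on."""
    name: str
    is_global: bool          # True if the net connects different groups
    layers: tuple[int, ...]  # metal layer numbers used by the route


def layer_violations(nets: list[Net]) -> list[str]:
    """Return names of nets that break the group/global layer split."""
    bad = []
    for net in nets:
        if net.is_global:
            # Global nets must not be routed through the group layers M1..M3.
            if any(layer in GROUP_LAYERS for layer in net.layers):
                bad.append(net.name)
        else:
            # Group-internal nets must stay on M1..M3.
            if not all(layer in GROUP_LAYERS for layer in net.layers):
                bad.append(net.name)
    return bad


# Example with hypothetical net names.
nets = [
    Net("alu_sum3", is_global=False, layers=(1, 2)),
    Net("bus_a7", is_global=True, layers=(4, 5)),
    Net("bus_b2", is_global=True, layers=(3, 4)),
]
print(layer_violations(nets))  # ['bus_b2']
```

In the "typical wiring" case, where local and global wires share every layer, no such split exists to check, which is why crosstalk between local and global nets cannot be ruled out up front.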



# **Group Categories**

- Combinatorial Logic
  - PLA-type structures for control logic
  - Arithmetic/logical: add, subtract, shift
- Sequential Logic
  - Finite state machines
  - Sequencers
  - Address generators
- Memories
  - High-density SRAM for cache memories
  - Multi-ported SRAM for register files, queues, stacks
  - High-density ROM for state machines
- "Trailers": special small groups that attach to standard groups
  - Repeaters, clock buffers, multiplexers
  - Flip-flops for pipelining

## **Group Examples**

| Group Name     | Average Wire<br>Length (um) | Number of Gates |
|----------------|-----------------------------|-----------------|
| Buf_32x20_r2w1 | 111                         | 2265            |
| Iseq_8         | 42                          | 2473            |
| Ad2_32         | 25                          | 873             |
| Adsb3_80       | 29                          | 3029            |
| Perm_80        | 121                         | 654             |
| Bth_mux_32     | 39                          | 2719            |

#### Note:

1. Average wire length varies from 25um to 121um
2. Perm\_80 differs from the wireload table by ~5x!
3. Ad2\_32 critical-path wires average 43um, ranging from 5um to 227um
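Note 2's comparison can be reproduced by dividing each group's average wire length by a wireload-model length. A minimal sketch, assuming the comparison is against the 10K-gate row (26um), the smallest entry of the FO=4 wireload table and the closest one to these sub-10K-gate groups:

```python
# Group examples from the table: name -> (average wire length in um, gate count)
GROUPS = {
    "Buf_32x20_r2w1": (111, 2265),
    "Iseq_8": (42, 2473),
    "Ad2_32": (25, 873),
    "Adsb3_80": (29, 3029),
    "Perm_80": (121, 654),
    "Bth_mux_32": (39, 2719),
}

# Assumption: compare against the 10K-gate row (26um) of the FO=4 wireload table.
WIRELOAD_10K_UM = 26


def ratio_to_wireload(avg_um: float, model_um: float = WIRELOAD_10K_UM) -> float:
    """How many times longer (or shorter) a group's wires are than the model predicts."""
    return avg_um / model_um


for name, (avg_um, gates) in GROUPS.items():
    print(f"{name:<15} {gates:>5} gates {avg_um:>4} um  {ratio_to_wireload(avg_um):.2f}x")
```

Perm\_80 comes out at about 4.65x (the "~5x" in note 2), while Ad2\_32 sits slightly below the model at 0.96x. The ratios also show that average wire length does not track gate count: Perm\_80 has the fewest gates but the longest wires.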

# Demonstration Chip

- Test chip fabricated
  - 0.18µm technology from UMC
  - Implemented dedicated DSP algorithms
    - SISD FIR filter
    - SIMD FIR filter
    - SIMD FFT
  - 450,000 gates
- Results
  - Performance ~400MHz
    - Worst-case temperature & voltage, typical process
    - Typically 2x faster
  - 23 group types plus 34 trailer types for the entire chip

## Wire Lengths for the FFT

| Metal layers | Average wire length (um) | Median wire length (um) |
|-------|--------------------------|-------------------------|
| M2/M3 | 51                       | 15                      |
| M4/M5 | 284                      | 125                     |

#### Metal Utilization for the FFT

| Percent Routing Utilization |
|-----------------------------|
| 55.0                        |
| 23.6                        |
| 43.4                        |
| 6.4                         |
| 20.1                        |

#### **Conclusion**

- Technology has been scaling
  - No new crosstalk problem
- Wiring is longer because
  - Designs are larger
  - There is no focus on wiring
- Longer wires cause chips to run slower than expected
- New approach described
  - Hard IP building blocks optimized for speed
  - Performance 2x that of a typical synthesized ASIC chip