

### SoC methodology: the common view



### SoC methodology: a partial bright spot?



Nice vision, but ...

bluespec


### SoC methodology: a solution



BSV = Bluespec SystemVerilog, based on composable, parallel atomic transactions



# A flavor of the power of composable atomic transactions



#### **Control issues**

Example: A 2x2 switch with stats

- Packets arrive on two input FIFOs and must be switched to two output FIFOs
- Certain "interesting packets" must be counted
- Must have max throughput (no idle cycles)



Dynamic, data-dependent access to shared resources!


#### The meat of the BSV code

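```
module mkSmallSwitch (IfcSmallSwitch);

   ...

   // The for-loop generates two rules, one per input FIFO in[j]
   for (Integer j = 0; j < 2; j = j + 1)
      rule move_packet;
         let x = in[j].first(); in[j].deq();
         // Route on bit 0 of the packet: 0 -> out[0], 1 -> out[1]
         if (x[0] == 0)
            out[0].enq (x);
         else
            out[1].enq (x);
         // Both rules may target the same output FIFO and both update the
         // shared counter c; the compiler generates the arbitration logic
         if (interesting(x)) c <= c + 1;
      endrule

   ...

endmodule: mkSmallSwitch
```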





## Real designs have even more complicated control




## Atomic transactions are the best known tool to tame complex concurrency

**For decades:** in Operating Systems, Databases, Distributed Systems

**Recently:** for software for multi-core/multi-threaded architectures

"I think we ultimately will see *atomic transactions in most, if not all, languages*. That's a bit of a guess, but I think it's a good bet."

Burton Smith, Technical Fellow, Parallel Computing



**Very recently:** HW support for Transactional Memory in processors






BSV's atomic transactions compose across module boundaries using atomic interface methods



### **Atomic Interface Methods**

RTL



**BSV** 



A small sample of the *informal*, written interface specification:

An error occurs if a push is attempted while the FIFO is full.

Thus, there is no conflict in a simultaneous push and pop when the FIFO is full. A simultaneous push and pop cannot occur when the FIFO is empty, since there is no pop data to prefetch. However, push data is captured in the FIFO.

A pop operation occurs when `pop_req_n` is asserted (LOW), as long as the FIFO is not empty. Asserting `pop_req_n` causes the internal read pointer to be incremented on the next rising edge of `clk`. Thus, the RAM read data must be captured on the `clk` following the assertion of `pop_req_n`.

All BSV interfaces are transactional

```
interface FIFO #(type t);
   method Action enq (t x);   // add an element to the tail
   method t      first ();    // read the head element
   method Action deq ();      // remove the head element
endinterface
```
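To show where the control logic lives, here is a minimal sketch (not Bluespec's library implementation) of a module providing this interface, assuming a one-element buffer. The module name `mkOneElemFIFO` is hypothetical; the interface is redeclared so the block stands alone.

```bsv
// Hypothetical sketch: a one-element FIFO implementing FIFO#(t).
interface FIFO #(type t);
   method Action enq (t x);
   method t      first ();
   method Action deq ();
endinterface

module mkOneElemFIFO (FIFO#(t)) provisos (Bits#(t, tSz));
   Reg#(t)    data <- mkRegU;         // the stored element
   Reg#(Bool) full <- mkReg(False);   // occupancy flag

   // Each "if (...)" is an implicit condition (guard). A rule that calls
   // a method can fire only when that method's guard is true, so callers
   // never write their own full/empty handshake logic.
   method Action enq (t x) if (!full);
      data <= x;
      full <= True;
   endmethod

   method t first () if (full);
      return data;
   endmethod

   method Action deq () if (full);
      full <= False;
   endmethod
endmodule
```

Because `enq` and `deq` both write `full`, the compiler treats them as conflicting, so this variant never performs a simultaneous enq and deq. That scheduling property is derived from the method bodies, not from a prose datasheet.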


Atomicity of interface methods encapsulates the complex control logic necessary for correct module composition (and, by implication, IP reuse)



Bluespec generates correct control logic to interface properly at every module instantiation



Logic around every instantiation is at risk & every corner case for every aspect of every instantiation's interface must be exercised!




Logic around every instantiation is correct by construction (control logic arising out of atomicity semantics)

Let's swap in a different FIFO with the same interface ports, BUT ...

The new FIFO allows simultaneous enq/deq when EMPTY instead of when FULL (→ change to external control logic)

### **RTL**



The control logic around every instantiation must change & be retested!

### **BSV**



The surrounding control logic is automatically resynthesized from atomic semantics.
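A minimal sketch of the BSV side of this comparison, assuming the standard `FIFO` and `SpecialFIFOs` library packages. The rules that use the FIFO contain no full/empty logic, so swapping `mkPipelineFIFO` (concurrent enq/deq when full) for `mkBypassFIFO` (concurrent enq/deq when empty) changes only the instantiation line. The module name `mkPassThrough` is hypothetical.

```bsv
import FIFO::*;
import SpecialFIFOs::*;

(* synthesize *)
module mkPassThrough (Empty);
   // Only this line changes between FIFO variants:
   //   mkPipelineFIFO: simultaneous enq/deq allowed when FULL
   //   mkBypassFIFO:   simultaneous enq/deq allowed when EMPTY
   FIFO#(UInt#(8)) q <- mkPipelineFIFO;
   Reg#(UInt#(8))  n <- mkReg(0);

   // No hand-written handshaking: produce fires only when q.enq is ready,
   // consume only when q.first and q.deq are ready.
   rule produce;
      q.enq(n);
      n <= n + 1;
   endrule

   rule consume;
      $display("consumed %0d", q.first);
      q.deq;
   endrule
endmodule
```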


## Disciplined composition of modules into subsystems and systems



Modules contain rules, which use methods provided by submodules in their interfaces. Methods, too, can use other methods.

Rules, which compose across the system, are guaranteed atomic.
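A small sketch of this layering, assuming the standard `FIFO` package; the `Doubler` interface and `mkDoubler` module are hypothetical. The rule uses methods of two submodules, and the interface methods are themselves built from submodule methods, so their readiness conditions propagate outward automatically.

```bsv
import FIFO::*;

interface Doubler;
   method Action put (UInt#(16) x);
   method ActionValue#(UInt#(16)) get ();
endinterface

module mkDoubler (Doubler);
   FIFO#(UInt#(16)) inQ  <- mkFIFO;
   FIFO#(UInt#(16)) outQ <- mkFIFO;

   // A rule using methods of two submodules. It fires atomically,
   // and only when inQ is non-empty AND outQ has space.
   rule compute;
      outQ.enq (inQ.first * 2);
      inQ.deq;
   endrule

   // Methods built from submodule methods: put inherits inQ.enq's
   // implicit condition (not full); get inherits outQ's (not empty).
   method Action put (UInt#(16) x);
      inQ.enq (x);
   endmethod

   method ActionValue#(UInt#(16)) get ();
      outQ.deq;
      return outQ.first;
   endmethod
endmodule
```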




BSV's extreme parameterization enables a single source for a family of microarchitectures
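As a small illustration of type-level parameterization, here is a sketch of a delay line whose depth is a numeric type parameter, assuming the standard `Vector` package. The `Delay` interface and `mkDelay` module are hypothetical; one source yields a shift register of any depth `n >= 1`.

```bsv
import Vector::*;

// Interface parameterized by a depth (numeric type) and a data type.
interface Delay #(numeric type n, type t);
   method Action put (t x);
   method t      get ();
endinterface

// n registers are generated at elaboration time; Add#(1, m, n) requires n >= 1.
module mkDelay (Delay#(n, t)) provisos (Bits#(t, tSz), Add#(1, m, n));
   Vector#(n, Reg#(t)) stages <- replicateM(mkRegU);

   // Shift by one position on each put. All register writes in an action
   // take effect together, so stages[i] receives the OLD stages[i - 1].
   method Action put (t x);
      stages[0] <= x;
      for (Integer i = 1; i < valueOf(n); i = i + 1)
         stages[i] <= stages[i - 1];
   endmethod

   // The element written n puts ago (undefined until n puts have occurred,
   // since mkRegU has no reset value).
   method t get ();
      return stages[valueOf(n) - 1];
   endmethod
endmodule
```

The same source then instantiates at different sizes, for example `Delay#(4, Bit#(8)) d4 <- mkDelay;` and `Delay#(16, Bit#(32)) d16 <- mkDelay;`.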


## Example: a butterfly switch (crossbar)



## Butterfly switch: module (< 60 lines, fully synthesizable)



## Example: IFFT (in 802.11a and other apps)

Microarchitectures: from single combinational function ...




### ... to a superfolded circular pipeline: Just one Bfly-4 node!
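The idea behind this range of microarchitectures can be sketched in miniature: the same stage function is either unrolled combinationally or reused once per cycle by a folded module. This is not the MEMOCODE 2006 IFFT code; `stageF` is a hypothetical placeholder (not a real Bfly-4 computation), and `Data`, `NumStages`, `computeComb`, and `mkFolded` are illustrative names.

```bsv
import FIFO::*;

typedef Bit#(64) Data;        // hypothetical: one data frame, flattened
typedef 3        NumStages;   // hypothetical number of stages

// Hypothetical placeholder for one stage (NOT a real Bfly-4 computation).
function Data stageF (UInt#(2) stage, Data x);
   return (x << 1) ^ zeroExtend(pack(stage));
endfunction

// Microarchitecture 1: every stage unrolled into one combinational function.
function Data computeComb (Data x);
   Data v = x;
   for (Integer s = 0; s < valueOf(NumStages); s = s + 1)
      v = stageF(fromInteger(s), v);
   return v;
endfunction

interface Folded;
   method Action put (Data x);
   method ActionValue#(Data) get ();
endinterface

// Microarchitecture 2: a single stageF instance reused once per cycle.
module mkFolded (Folded);
   Reg#(Data)     acc   <- mkRegU;
   Reg#(UInt#(2)) stage <- mkReg(0);
   Reg#(Bool)     busy  <- mkReg(False);
   FIFO#(Data)    outQ  <- mkFIFO;

   rule step (busy);
      let v = stageF(stage, acc);
      if (stage == fromInteger(valueOf(NumStages) - 1)) begin
         outQ.enq(v);          // last stage: emit the result
         busy <= False;
      end
      else begin
         acc   <= v;           // feed the partial result back
         stage <= stage + 1;
      end
   endrule

   method Action put (Data x) if (!busy);
      acc   <= x;
      stage <= 0;
      busy  <= True;
   endmethod

   method ActionValue#(Data) get ();
      outQ.deq;
      return outQ.first;
   endmethod
endmodule
```

Both versions apply `stageF` for stages 0, 1, 2 in order, so they compute the same result; the folded one trades latency (one cycle per stage) for a single copy of the stage hardware.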



## Because of such parameterization, encapsulation, and reuse,

- All these designs were done in less than one day, with a single parameterized source!
- Very quick exploration of area and power tradeoffs
- Transparent, predictable, controllable microarchitecture, despite high-level spec

| Microarchitecture          |
|----------------------------|
| Combinational              |
| Pipelined                  |
| Folded (16 Bfly-4s)        |
| Super Folded (8 Bfly-4s)   |
| Super Folded (4 Bfly-4s)   |
| Super Folded (2 Bfly-4s)   |
| Super Folded (1 Bfly-4)    |

Nirav Dave, Mike Pellauer, Steve Gerding, Arvind, MEMOCODE 2006


## The underlying rule-based atomicity semantics are crucial!

- Each microarchitecture variation changes the resource sharing
  - Synthesis based on atomicity allows control logic to track these variations automatically
  - I.e., *Control Adaptive* parameterization
- Latency-insensitive methodology (elastic pipelines, GALS: globally asynchronous, locally synchronous) allows robust plug-and-play of microarchitecture choices
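A sketch of the latency-insensitive style, assuming the standard `FIFO`, `GetPut`, and `Connectable` packages. Stages talk only through FIFO-backed `Get`/`Put` interfaces, so a stage with extra internal latency plugs in without touching its neighbours. The `Stage` interface and the `mkIncFast`, `mkIncSlow`, and `mkTop` modules are hypothetical.

```bsv
import FIFO::*;
import GetPut::*;
import Connectable::*;

// Every stage exposes the same latency-insensitive (FIFO-backed) interface.
interface Stage;
   interface Put#(UInt#(16)) sink;
   interface Get#(UInt#(16)) source;
endinterface

// Hypothetical variant A: adds 1 with one internal hop.
module mkIncFast (Stage);
   FIFO#(UInt#(16)) inQ  <- mkFIFO;
   FIFO#(UInt#(16)) outQ <- mkFIFO;

   rule inc;
      outQ.enq(inQ.first + 1);
      inQ.deq;
   endrule

   interface sink   = toPut(inQ);
   interface source = toGet(outQ);
endmodule

// Hypothetical variant B: same function, one extra pipeline hop.
module mkIncSlow (Stage);
   FIFO#(UInt#(16)) inQ  <- mkFIFO;
   FIFO#(UInt#(16)) midQ <- mkFIFO;
   FIFO#(UInt#(16)) outQ <- mkFIFO;

   rule hop1;
      midQ.enq(inQ.first);
      inQ.deq;
   endrule

   rule hop2;
      outQ.enq(midQ.first + 1);
      midQ.deq;
   endrule

   interface sink   = toPut(inQ);
   interface source = toGet(outQ);
endmodule

(* synthesize *)
module mkTop (Empty);
   Stage s1 <- mkIncFast;
   Stage s2 <- mkIncSlow;        // swap in mkIncFast: nothing else changes
   Reg#(UInt#(16)) n <- mkReg(0);

   // Connect s1's output to s2's input (Get/Put pairs are Connectable).
   mkConnection(s1.source, s2.sink);

   rule feed;
      s1.sink.put(n);
      n <= n + 1;
   endrule

   rule drain;
      let y <- s2.source.get();
      $display("%0d", y);        // each input n emerges as n + 2
   endrule
endmodule
```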



### Example: H.264 decoder

- Complete decoder, not just kernel blocks
- Range of implementations from a common source
  - from QCIF: 176x144 @ 15 frames/sec
  - to 1080p: 1920x1080 @ 60 frames/sec
- Synthesized at 180 nm
- < 10K lines of BSV source code
  - Original reference code: > 80K lines of C
  - H.264 slice of FFmpeg: 20K lines of C (unsynthesizable)

This BSV code is open sourced: http://csg.csail.mit.edu/oshd


## BSV: well placed for formal verification



## Rules: formality and refinement

- Many formal specification languages use the same computation model, because it is parallel, and because atomicity enables reasoning about correctness with invariants
  - UNITY (Chandy & Misra), TLA+ (Lamport), Event-B (Abrial), ...
- The Rules computation model can be used from high levels of abstraction (executable specs) to lower levels (implementations)
  - Vast literature on provably correct refinement
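A tiny sketch of invariant-style reasoning with atomic rules; the module `mkTransfer` is hypothetical. Because `transfer` updates both registers in one atomic step, the invariant `a + b == 10` holds in every state between rule firings, and each rule can be checked against it in isolation. The `check` rule is only a simulation-time check, not a formal proof.

```bsv
(* synthesize *)
module mkTransfer (Empty);
   Reg#(UInt#(8)) a <- mkReg(10);
   Reg#(UInt#(8)) b <- mkReg(0);

   // Atomic transaction: a and b change together or not at all.
   // Preserves the invariant a + b == 10.
   rule transfer (a > 0);
      a <= a - 1;
      b <= b + 1;
   endrule

   // Observes only states between rule firings; never prints here.
   rule check;
      if (a + b != 10) $display("invariant violated: a=%0d b=%0d", a, b);
   endrule

   // After ten transfers, a == 0 and b == 10.
   rule done (a == 0);
      $display("done: a=%0d b=%0d", a, b);
      $finish(0);
   endrule
endmodule
```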


## Implications of BSV's full synthesizability

- Even high-level models can be executed on FPGAs
  - Early exploration
  - Early SW development
- Verification testbenches can be executed on FPGAs
  - Much faster than Verilog simulation for verification

Bluespec provides 'push button' infrastructure to map components with TLM interfaces to FPGAs


### Example: AXI Virtual Platform Demo







### **AXI Demo Execution Speeds**

| Metric                  | Value                      |
|-------------------------|----------------------------|
| Bluespec lines of code  | 2,000 (including comments) |
| ASIC gates              | 125K                       |
| Virtex-4 FX100 slices   | 4,723 (10% utilization)    |



22 day Verilog sim  $\rightarrow$  1 day Bluesim  $\rightarrow$  53 sec Emulation
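Converting the three reported run times to a common unit gives the speedup at each step (plain arithmetic on the figures above):

$$
\begin{aligned}
22\ \text{days} &= 22 \times 86{,}400\ \text{s} = 1{,}900{,}800\ \text{s} \\
\text{Verilog sim} \rightarrow \text{Bluesim}: &\quad \tfrac{22\ \text{days}}{1\ \text{day}} = 22\times \\
\text{Bluesim} \rightarrow \text{emulation}: &\quad \tfrac{86{,}400\ \text{s}}{53\ \text{s}} \approx 1{,}630\times \\
\text{Verilog sim} \rightarrow \text{emulation}: &\quad \tfrac{1{,}900{,}800\ \text{s}}{53\ \text{s}} \approx 35{,}864\times
\end{aligned}
$$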


### Some customer use cases:

- IP creation
- Modeling
- Architecture Exploration
- Verification


## ASIC IP creation at three major semiconductor companies

- High-performance video data mover for video subsystem (1)
- System DMA for wireless handset platform (1)
- Image DMA (2)
- LCD controller (2)
- Turbo Viterbi (2)

Why BSV?

- Complex concurrency
- Parameterization

(1) Seen silicon; derivatives in progress. (2) In progress.


## Modeling at a major IP company

- Cycle-accurate model of a production LPDDR memory controller
  - Will be shipped to each customer of the IP company

- Very quick model creation (< 2 man-months)
- Potential to replace IP creation as well

Why BSV?

- Fast sim (Bluesim)
- Potential for much faster sim (FPGA)
- Parameterization


## Architecture exploration of processor microarchitectures on FPGAs (@ a major microprocessor company)

- Background: existing microarchitect's workbench for exploring alternative microarchitectures for future processors
  - Written in C++, developed over a decade
  - Highly parameterized and configurable, to facilitate experimentation on alternatives
  - Heavily used, but running out of gas for simulation speed (multicore, multithreading)
- New: rewrite in BSV to synthesize and run on FPGAs
  - Expected performance advantage > 1000x
- Status: demonstrated for a 5-stage pipeline model and a pipelined out-of-order model; development continues, to replace the C++ version

Why BSV?

- Only tool with the expressive power needed for these models that can also be synthesized to FPGA, for speed

## Architecture exploration of processor microarchitectures on FPGAs (@ a major microprocessor/systems company)

- Goal: flexible platform for fast exploration of microarchitectures for multithreaded + multicore CPU systems
- Being implemented in BSV in order to synthesize and run on FPGAs
  - Status: executing a significant prefix of the Linux boot sequence on an FPGA platform, within 6 months of the start of the project

Why BSV?

- Very quick creation of a high-level, architecturally accurate model
- Synthesizable to FPGA, for speed


## Verification @ Qualcomm

- BSV for complex transactors on EVE platform
  - TLM interfaces
  - Functions: data transformation, clock management, timestamp management, statistics management
  - and, ... moving testbench functionality to EVE side



Why BSV?

- Quick creation of complex test infrastructure
- Synthesizable to FPGA, for speed


### It's general purpose and practical

Typical IPs in an SoC; IPs done in BSV (with good QoR)



## Summary: BSV → a high-level, disciplined approach to SoC/IP design







## End




