

# System-wide visibility in post-silicon to drive meaningful analytics

EDPS Symposium September 2017



#### Agenda

- Some obvious statements
- Some problems with existing approaches
- Key Requirements
- Some examples of Performance analysis and Debug
- Use cases
- Summary

# Some obvious statements



- SoCs have become increasingly complicated and they are not going to get simpler.
  - Contain several processors, from different vendors
    - Verified in isolation and come with a test suite
  - Contain 100s of SIP
    - Each verified in isolation
  - Contain complex interconnects
    - Verified for certain, identified conditions
  - Software created by large disparate teams.
    - If lucky, modules and subsystems are verified for certain, identified conditions.
  - All this has to successfully work together
- Understanding real world system behaviour is just plain HARD!

# Some Problems with existing approaches



- Processor-centric, not system-centric
  - Processors are a very small part of the overall system
- Hard to get a handle on bus behaviour, memory controllers, let alone interactions between blocks etc.
- Where they include analytics, it is lip service: very little smarts
  - Knock-on effect: a fast data pipe off-chip is needed
- Intrusive
- Ad hoc
- Developing, but still essentially signal-based.
  - Hard to close timing
- In-field monitoring is not easy

### **Key requirements**



- A System-centric vendor-neutral debug and monitoring infrastructure
  - One that enables access to different proprietary debug schemes used today by various cores
  - Allows for monitors into interconnects, interfaces and custom logic
  - These need to be run-time configurable
    - Re-use the hardware to provide visibility for different scenarios.
    - Run-time configuration of cross-triggering
    - Support 10s if not 100s of cross-triggering events
  - These can be interrogated after a problem to determine actual status
  - Need to be power aware
  - Security built-in
  - Can be used during the whole development flow and more importantly in the field



21 September 2017

UL-001538-PT



#### Some examples of Performance analysis and debug



#### **Example of UltraSoC Enabled SoC**




#### **Example 3 : Deadlock Detection**



- There are many different types of deadlock, but consider this one as an example
  - CPU (master) asserts arvalid and issues a read address to the Slave
  - Slave asserts rvalid and outputs read data but never sees rready asserted
- Configure bus monitor trace to trigger when transaction duration exceeds threshold (programmable up to 16k cycles)
  - Trace not output until triggered.
  - When triggered by deadlocked transaction, trace will output most recent transactions up to and including the deadlocked transaction
  - Trace identifies transaction ID and address, identifying both master and slave of deadlocked transaction
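The trigger behaviour described above can be sketched as a small software model. This is only an illustration of the idea, not the hardware implementation: the class `DeadlockMonitor`, its methods and the `history` depth are hypothetical names and parameters chosen for this sketch. The only value taken from the slide is the 16k-cycle upper limit on the threshold.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional

# The slide states the threshold is programmable up to 16k cycles.
MAX_THRESHOLD_CYCLES = 16 * 1024


@dataclass(frozen=True)
class Transaction:
    txn_id: int
    address: int
    start_cycle: int


class DeadlockMonitor:
    """Hypothetical software model of a bus monitor with a duration trigger."""

    def __init__(self, threshold_cycles: int, history: int = 8) -> None:
        if not 0 < threshold_cycles <= MAX_THRESHOLD_CYCLES:
            raise ValueError("threshold must be in 1..16384 cycles")
        self.threshold_cycles = threshold_cycles
        # Circular buffer: old entries fall out, nothing is output until triggered.
        self.recent: Deque[Transaction] = deque(maxlen=history)
        self.outstanding: Dict[int, Transaction] = {}

    def start(self, txn_id: int, address: int, cycle: int) -> None:
        """Record the start of a transaction (e.g. the read address handshake)."""
        txn = Transaction(txn_id, address, cycle)
        self.outstanding[txn_id] = txn
        self.recent.append(txn)

    def complete(self, txn_id: int) -> None:
        """Record that the transaction finished normally."""
        self.outstanding.pop(txn_id, None)

    def check(self, cycle: int) -> Optional[List[Transaction]]:
        """Return the trace if any outstanding transaction exceeds the threshold."""
        for stuck in self.outstanding.values():
            if cycle - stuck.start_cycle > self.threshold_cycles:
                # Most recent transactions up to and including the stuck one.
                earlier = [t for t in self.recent if t.start_cycle < stuck.start_cycle]
                return earlier + [stuck]
        return None
```

In this model the last entry of the returned trace is the deadlocked transaction, so its ID and address identify the master and slave involved.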



#### **Example 4 : Data Corruption Detection**



To detect multiple initiators writing to the same memory location (or range), MemAddress, we can configure our Bus Monitor to do something like:

```
if <Address> == MemAddress && <RW> == Write
then
    if Count > 1
    then
        CaptureTrace()
        SendEventMessage()
    else
        IncrementCount()
    fi
fi
```

Where:

- <> are AXI bus fields being observed by the bus monitor.
- CaptureTrace() puts the transaction into the trace buffer
- SendEventMessage() is an instruction to the monitor to send an event out on our message bus
- IncrementCount() increments the counter by 1
- NB: This is pseudo-code; the actual filtering is done in hardware, not software
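For readers who want to experiment with the filter logic, the pseudo-code above can be mirrored in a short software model. This is a sketch only: the real filter runs in hardware, and the class `WriteWatch`, its fields and the `initiator_id` argument are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class WriteWatch:
    """Hypothetical software model of the bus-monitor filter in the pseudo-code."""

    mem_address: int
    count: int = 0
    events_sent: int = 0
    trace: List[Tuple[int, int]] = field(default_factory=list)

    def observe(self, initiator_id: int, address: int, is_write: bool) -> None:
        # Only writes to the watched address are of interest.
        if address == self.mem_address and is_write:
            if self.count > 1:
                self.trace.append((initiator_id, address))  # CaptureTrace()
                self.events_sent += 1                       # SendEventMessage()
            else:
                self.count += 1                             # IncrementCount()


watch = WriteWatch(mem_address=0x8000)
for initiator in (1, 2, 3, 4):
    watch.observe(initiator, 0x8000, is_write=True)
watch.observe(5, 0x8000, is_write=False)  # a read: ignored
watch.observe(6, 0x9000, is_write=True)   # a different address: ignored
```

As written, the condition `Count > 1` means the first two matching writes only increment the counter; capture starts with the third.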



# **Cross Triggering – Example 1**



#### **Runtime Configuration**

- Bus Monitor A outputs Event on DMA access
- Set the period of the Status Monitor's Interval Timer
- Configure the Status Monitor to observe the following sequence:



- Output trigger from SM when entering the STALL state
- Configure Trace Receiver(s) to enable tracing on receipt of trigger



Example ARM Subsystem


# **Cross Triggering – Example 1 (cont)**







## **Example of Instrumented SoC**



- The SI provides independent memory-mapped channels (mailboxes)
- Software and hardware can post writes to these channels, which can be used to understand system-wide behaviour
  - The data is timestamped
    - Or no data if only timestamp needed.
- The channels can be filtered
- Each channel can be enabled to provide events which can be used for cross-triggering
- The Virtual Console provides bidirectional channels
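The channel behaviour listed above (timestamped posts, optional data, filtering, per-channel events) can be sketched in software as follows. This is a minimal model under stated assumptions, not the SI register interface: `InstrumentationModel`, `post`, `enabled` and `event_channels` are hypothetical names, and it is an assumption of this sketch that a filtered-out channel produces neither a message nor an event.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass(frozen=True)
class Message:
    timestamp: int
    channel: int
    data: Optional[int]  # None when only the timestamp is needed


class InstrumentationModel:
    """Hypothetical software model of timestamped mailbox channels."""

    def __init__(self, num_channels: int, clock: Callable[[], int]) -> None:
        self.num_channels = num_channels
        self.clock = clock
        self.enabled: Set[int] = set(range(num_channels))  # channel filter
        self.event_channels: Set[int] = set()              # channels that raise events
        self.log: List[Message] = []
        self.events: List[int] = []

    def post(self, channel: int, data: Optional[int] = None) -> None:
        """Model a write to a channel by software or hardware."""
        if not 0 <= channel < self.num_channels:
            raise IndexError("no such channel")
        if channel not in self.enabled:
            return  # filtered out (assumption: no message and no event)
        self.log.append(Message(self.clock(), channel, data))
        if channel in self.event_channels:
            self.events.append(channel)  # could feed cross-triggering


ticks = itertools.count(100)  # stand-in for a hardware timestamp counter
si = InstrumentationModel(num_channels=4, clock=lambda: next(ticks))
si.event_channels.add(2)
si.enabled.discard(1)
si.post(0, 0xAB)  # data plus timestamp
si.post(1, 0xCD)  # dropped by the filter
si.post(2)        # timestamp only, also raises an event
```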

# **Simple SI visualization**







## **Integration with external tools**



#### **Key Features**



| Feature | Benefit |
|---------|---------|
| Non-intrusive | Monitoring does not impact system performance. Instrumentation (light intrusion) seamlessly incorporated. |
| "Smart" monitors | Detect items of interest in hardware, at wirespeed. Massively reduce trace bandwidth & memory. Home in on problems efficiently. |
| Protocol-aware bus monitors (AXI, ACE, ACE-lite, OCP, OCP 2.0, CHI etc.) | Identify specific transactions; easily spot problems |
| Full support for all standard processors (ARM, RISC-V, MIPS, Xtensa, CEVA, etc.) | Easily support heterogeneous architectures; "mix & match" across vendors; fix hardware, software or HW+SW integration problems |
| Message-based protocol | Easy to place & route; extensible & versatile; allows local processor for "autonomous" control in the field |
| Powerful status monitor | Configurable smart logic analyzer for custom logic |
| Secure | Powerful security architecture |
| Bare Metal Security | Provides for observation of target system in order to raise 'alarm' |



#### **Use Cases**



#### **Classic Debug**

- In this case the SoC may be on a prototype board or in the final product form.
- This allows for device validation and bring-up.
- Typically done with board attached to work station.
- CPU breakpoints, starting, stopping of software executing on the SoC.
- More and more of the system will be integrated (brought up) and exploration of the whole SoC, under realistic conditions, takes place.

*Screenshot: debug IDE for a multicore (ARM and Xtensa) target, showing the project explorer, source view, monitor counters and virtual console.*



# In field debugging and analysis

- In this case the SoC is in the final form and issues such as integration of the software can be debugged.
- The performance of the system can be analysed.
- The software being used could be the IDE as shown previously or specific views of key flows of data through the system.
  - These could be traffic to the memory controller
  - DMA completion times
  - Depth of FIFOs in RF interface
  - Performance of processing engines within the SoC
  - Cache behaviour
  - Etc.
- This can be used to help diagnose why a product has 'hung up' in the field. During operation the device has been continuously capturing trace in circular buffers in the monitors. This effectively gives a system-wide core dump.
  - Trace data is extracted from the device, then analysed and replayed to give the last N transactions before the failure occurred.
  - The device does not need to be attached, as the trace could have been extracted in the field and shipped to the manufacturer.






## In field analysis

- The areas of interest can be extracted from the system-wide core dump and specific views created, which can be analysed by domain-specific engineers
  - These could be memory controller designers, RF interface designers etc.
- Traces extracted from the field can be used for the next generation architecture of the SoC





## **Corporate and IoT use – Performance and Security**

- Monitoring of server farms
  - An example would be observing utilisation of the individual servers and the resources such as memory and disks
  - DDoS can be reported back to root/base.
  - Security and safety can be monitored in a similar manner
  - Updates would be maintained by the root/base.
  - Any breaches of security can be reported back to base.





#### Standalone and unconnected use

- In this case there is a self-contained Analytics Subsystem.
  - Any communication, if required, is done over the air.
  - Many systems will not even have a wireless connection
- Detect unauthorized access
  - E.g. processors reading from the key store
  - E.g. an attempt to read decrypted boot code
- Update audit & verification
- Scan internal/external regions
- Detect frequent access / DoS
- Ensure system operates in the 'bounds of safety'.
  - If any divergence, invoke fail safe state
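As an illustration of the unauthorized-access and fail-safe items above, the following sketch models a monitor that watches reads to protected regions. Everything here is hypothetical: the names `ProtectedRegion` and `AccessGuard`, the region layout, and the choice to treat any unauthorized read as a divergence that invokes the fail-safe state are assumptions made for this example, not a description of the product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Iterable, List


class State(Enum):
    NORMAL = "normal"
    FAIL_SAFE = "fail_safe"


@dataclass(frozen=True)
class ProtectedRegion:
    name: str
    base: int
    size: int
    allowed_initiators: FrozenSet[int]

    def contains(self, address: int) -> bool:
        return self.base <= address < self.base + self.size


class AccessGuard:
    """Hypothetical model of a self-contained monitor raising an alarm."""

    def __init__(self, regions: Iterable[ProtectedRegion]) -> None:
        self.regions = list(regions)
        self.state = State.NORMAL
        self.alarms: List[str] = []

    def observe_read(self, initiator: int, address: int) -> bool:
        """Return True if the read is permitted, otherwise raise an alarm."""
        for region in self.regions:
            if region.contains(address) and initiator not in region.allowed_initiators:
                self.alarms.append(
                    f"initiator {initiator} read {region.name} at {address:#x}"
                )
                # Assumption: any unauthorized read counts as a divergence.
                self.state = State.FAIL_SAFE
                return False
        return True


# Example layout (made up): only initiator 0 may read the key store.
key_store = ProtectedRegion("key_store", base=0x1000, size=0x100,
                            allowed_initiators=frozenset({0}))
guard = AccessGuard([key_store])
```

Because the check needs only the initiator and address of each access, a monitor of this kind can run with no off-chip connection.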



#### **Summary**



- UltraSoC provides a complete advanced universal on-chip analytic and debug platform
  - Full visibility of whole SoC
  - Non-intrusive
  - Independent provider enabling free-selection of IP
  - Multi-vendor and multi-processor in one environment
  - USB connectivity for faster debug or I/O constrained devices
  - Advanced analytics: forensics, optimization, dynamic, power saving
  - Bare metal security
  - Low-power and power-down; power domains & clock domains
  - Full support for large heterogeneous SoC
  - Fully message-based communication
  - Data-flow management and security
  - Silicon proven