The Variability Expeditions: Variability-Aware Software for Efficient Computing With Nanoscale Devices.

### Rajesh K. Gupta

Nikil Dutt, UCI; Punit Gupta, UCLA; Mani Srivastava, UCLA; Lucas Wanner, UCLA; Steve Swanson, UCSD; Lara Dolecek, UCLA; Subhasish Mitra, Stanford; YY Zhou, UCSD; Tajana Rosing, UCSD; Alex Nicolau, UCI; Ranjit Jhala, UCSD; Sorin Lerner, UCSD; Rakesh Kumar, UIUC; Dennis Sylvester, UMich



### To a software designer, all chips look alike



To a hardware engineer, a chip is delivered as per contract in a data-sheet.









[Figure: excerpt from a TI datasheet (www.ti.com), "1 AM1705 ARM", section 1.1 Features; the feature list is cut off in the slide.]

#### Electrical Characteristics

0.4 V during power down or there is an undesired high current in the ESD protection diodes. There are no requirements for the fall times of the power supplies.

The recommended power down sequence is:

- 1. Drop IV<sub>DD</sub>/PLLV<sub>DD</sub> to 0 V.
- 2. Drop EV<sub>DD</sub>/SDV<sub>DD</sub> supplies.

#### 5.5 Current Consumption

All of the below current consumption data is lab data measured on a single device using an evaluation board. Table 8 shows the typical current consumption in low-power modes at various  $f_{sys/2}$  frequencies. Current measurements are taken after executing a STOP instruction.

#### Table 8. Current Consumption in Low-Power Mode<sup>1,2</sup>

| Mode | Voltage (V) | Typical<sup>3</sup> (mA), 44 MHz | 56 MHz | 64 MHz | 72 MHz | 83.33 MHz | Peak<sup>4</sup> (mA), 83.33 MHz |
|------|-------------|----------------------------------|--------|--------|--------|-----------|----------------------------------|
| Stop Mode 3 (Stop 11)<sup>5</sup> | 3.3 | 1.33 | | | | | |
| | 2.5 | 15.19 | | | | | |
| | 1.5 | 0.519 | | | | | |
| Stop Mode 2 (Stop 10)<sup>5</sup> | 3.3 | 1.93 | | | | | |
| | 2.5 | 15.19 | | | | | |
| | 1.5 | 1.25 | | | | | |
| Stop Mode 1 (Stop 01)<sup>5</sup> | 3.3 | 1.83 | | | | | |
| | 2.5 | 15.23 | | | | | |
| | 1.5 | 8.24 | 10.22 | 9.55 | 10.61 | 12.1 | 12.1 |
| Stop Mode 0 (Stop 00)<sup>5</sup> | 3.3 | 2.23 | 2.33 | 2.41 | 2.5 | 2.61 | 2.61 |
| | 2.5 | 16.2 | 16.47 | 16.62 | 16.91 | 17.24 | 17.24 |
| | 1.5 | 8.32 | 10.32 | 9.66 | 10.73 | 12.25 | 12.25 |
| Wait/Doze | 3.3 | 2.23 | 2.33 | 2.41 | 2.5 | 2.6 | 4.07 |
| | 2.5 | 16.2 | 16.48 | 16.62 | 16.91 | 17.24 | 18.77 |
| | 1.5 | 11.53 | 14.36 | 14.29 | 15.92 | 18.21 | 35.45 |

Rows with a single entry show one value spanning the frequency columns in the original table.

#### Electrical Characteristics

#### 5.8.1 SDR SDRAM AC Timing Characteristics

The following timing numbers indicate when data will be latched or driven onto the external bus, relative to the memory bus clock, when operating in SDR mode on write cycles and relative to SD\_DQS on read cycles. The SDRAM controller is a DDR controller with an SDR mode. Because it is designed to support DDR, a DQS pulse must remain supplied to the device for each data beat of an SDR read. The ColdFire processor accomplishes this by asserting a signal called SD\_SDR\_DQS during read cycles. Take care during board design to adhere to the following guidelines and specs with regard to the SD\_SDR\_DQS signal and its usage.

#### Table 12. SDR Timing Specifications

| Num  | Characteristic | Symbol | Min | Max | Unit | Notes |
|------|----------------|--------|-----|-----|------|-------|
|      | Frequency of Operation | | 60 | 83.33 | MHz | 1 |
| SD1  | Clock Period (t<sub>CK</sub>) | t<sub>SDCK</sub> | 12 | 16.67 | ns | 2 |
| SD3  | Pulse Width High (t<sub>CKH</sub>) | t<sub>SDCKH</sub> | 0.45 | 0.55 | SD_CLK | 3 |
| SD4  | Pulse Width Low (t<sub>CKL</sub>) | t<sub>SDCKL</sub> | 0.45 | 0.55 | SD_CLK | 3 |
| SD5  | Address, SD_CKE, SD_CAS, SD_RAS, SD_WE,<br>SD_BA, SD_CS[1:0] - Output Valid (t<sub>CMV</sub>) | t<sub>SDCHACV</sub> | - | 0.5 × SD_CLK + 1.0 | ns | |
| SD6  | Address, SD_CKE, SD_CAS, SD_RAS, SD_WE,<br>SD_BA, SD_CS[1:0] - Output Hold (t<sub>CMH</sub>) | t<sub>SDCHACI</sub> | 2.0 | | ns | |
| SD7  | SD_SDR_DQS Output Valid (t<sub>DQSOV</sub>) | t<sub>DQSOV</sub> | - | Self timed | ns | 4 |
| SD8  | SD_DQS[3:2] input setup relative to SD_CLK (t<sub>DQSIS</sub>) | t<sub>DQVSDCH</sub> | 0.25 × SD_CLK | 0.40 × SD_CLK | ns | 5 |
| SD9  | SD_DQS[3:2] input hold relative to SD_CLK (t<sub>DQSIH</sub>) | t<sub>DQISDCH</sub> | Does not apply. 0.5 SD_CLK fixed width. | | | 6 |
| SD10 | Data (D[31:0]) Input Setup relative to SD_CLK<br>(reference only) (t<sub>DIS</sub>) | t<sub>DVSDCH</sub> | 0.25 × SD_CLK | | ns | 7 |
| SD11 | Data Input Hold relative to SD_CLK (reference only) (t<sub>DIH</sub>) | t<sub>DISDCH</sub> | 1.0 | - | ns | 7 |
| SD12 | Data (D[31:0]) and Data Mask (SD_DQM[3:0]) Output<br>Valid (t<sub>DV</sub>) | t<sub>SDCHDMV</sub> | - | 0.75 × SD_CLK + 0.5 | ns | |













### New Hardware-Software Interface



[Figure: horizontal axis labeled "Time or part"]

Builds upon 50 years of rich research in fault tolerance.

### UnO Computing Machines Seek Opportunities Based on Sensing Results



Metadata Mechanisms: Reflection, Introspection



### **Building Machines that Move from Crash & Recover to Sense & Adapt**





# Example: Procedure Hopping in Clustered CPU, Each core with its voltage domain



- A core increases voltage if monitored delay is high
- A procedure hops from one core to another if its voltage variation is high
- Less than 1% cycle overhead in EEMBC.

 $V_{DD} = 0.81V$ 



| f<sub>0</sub>  | f<sub>1</sub>  | f<sub>2</sub>  | f<sub>3</sub>  |
|-----------------|-----------------|-----------------|-----------------|
| 862             | 909             | 870             | 847             |
| f<sub>4</sub>  | f<sub>5</sub>  | f<sub>6</sub>  | f<sub>7</sub>  |
| 826             | 855             | 877             | 893             |
| f<sub>8</sub>  | f<sub>9</sub>  | f<sub>10</sub> | f<sub>11</sub> |
| 820             | 826             | 909             | 847             |
| f<sub>12</sub> | f<sub>13</sub> | f<sub>14</sub> | f<sub>15</sub> |
| 901             | 917             | 847             | 901             |

 $V_{DD} = 0.99V$ 

| f <sub>0</sub>  | f <sub>1</sub>  | f <sub>2</sub>  | f <sub>3</sub>        |
|-----------------|-----------------|-----------------|-----------------------|
| 1408            | 1389            | 1408            | 1370                  |
| f <sub>4</sub>  | f <sub>5</sub>  | f <sub>6</sub>  | <i>f</i> <sub>7</sub> |
| 1370            | 1408            | 1408            | 1408                  |
| f <sub>8</sub>  | f <sub>9</sub>  | f <sub>10</sub> | f <sub>11</sub>       |
| 1370            | 1370            | 1389            | 1370                  |
| f <sub>12</sub> | f <sub>13</sub> | f <sub>14</sub> | f <sub>15</sub>       |
| 1408            | 1408            | 1389            | 1389                  |

### VA-V<sub>DD</sub>-Hopping=(0.81V, 0.99V)

| f<sub>0</sub>  | f<sub>1</sub>  | f<sub>2</sub>  | f<sub>3</sub>  |
|-----------------|-----------------|-----------------|-----------------|
| 862             | 909             | 870             | 847             |
| f<sub>4</sub>  | f<sub>5</sub>  | f<sub>6</sub>  | f<sub>7</sub>  |
| 1370            | 855             | 877             | 893             |
| f<sub>8</sub>  | f<sub>9</sub>  | f<sub>10</sub> | f<sub>11</sub> |
| 1370            | 1370            | 909             | 847             |
| f<sub>12</sub> | f<sub>13</sub> | f<sub>14</sub> | f<sub>15</sub> |
| 901             | 917             | 847             | 901             |
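The per-core frequencies above suggest a simple selection rule: keep a core at 0.81 V when it is fast enough there, and raise it to 0.99 V otherwise. The sketch below is one way to express that rule, not the authors' implementation. The target of 830 is a hypothetical threshold (the slide does not state one), chosen only because it separates the three cores that differ between the tables; frequencies are in the same unstated unit as the tables.

```python
# Per-core frequencies copied from the slide's tables (f0..f15).
FREQ_AT_081V = [862, 909, 870, 847, 826, 855, 877, 893,
                820, 826, 909, 847, 901, 917, 847, 901]
FREQ_AT_099V = [1408, 1389, 1408, 1370, 1370, 1408, 1408, 1408,
                1370, 1370, 1389, 1370, 1408, 1408, 1389, 1389]

# Hypothetical target: the slide does not state the threshold that was used.
TARGET_FREQ = 830


def choose_voltages(low, high, target, v_low=0.81, v_high=0.99):
    """Return a (voltage, frequency) pair per core, preferring the lower voltage."""
    plan = []
    for f_low, f_high in zip(low, high):
        if f_low >= target:
            plan.append((v_low, f_low))    # fast enough at the low supply
        else:
            plan.append((v_high, f_high))  # slow core: raise its supply
    return plan


plan = choose_voltages(FREQ_AT_081V, FREQ_AT_099V, TARGET_FREQ)
boosted = [core for core, (v, _) in enumerate(plan) if v == 0.99]
print(boosted)  # indices of the cores that need the higher supply
```

With this assumed threshold, the resulting frequencies match the VA-V<sub>DD</sub>-Hopping table: only the cores that are slowest at 0.81 V run at 0.99 V.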


### HW/SW Collaborative Architecture to Support Intra-cluster Procedure Hopping



- The code is easily accessible via the shared-L1 I\$.
- The data and parameters are passed through the shared stack in TCDM.
- A procedure hopping information table (PHIT) keeps the status for a migrated procedure.

#### **ViPZonE: Exploiting Memory Power Variability**

- **Application Layer**: source code annotations
- **Upper OS Layer**: special GLIBC library, kernel system calls
- **Lower OS Layer**: DIMM power variability-aware zoning and allocation
- **Memory Controller**
- **Hardware**: power profiles for DIMM 1, DIMM 2, through DIMM n

App developers can optimize dynamic allocations for reduced power.

[Figure: layer diagram spanning Applications, Runtime, and Microarchitecture and Compilers, annotated with Power, Performance, Errors, and Aging]

Linux + Glibc implementation
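As a rough illustration of variability-aware zoning, the sketch below routes allocations that an application has marked as low-power to the DIMM with the lowest profiled power that still has room. Everything here is hypothetical: the `Dimm` class, the `allocate` function, and the power values are made up for illustration and are not the ViPZonE API or measured data.

```python
from dataclasses import dataclass


@dataclass
class Dimm:
    name: str
    power_mw: float   # illustrative profile value, not measured data
    free_pages: int


def allocate(dimms, pages, low_power=True):
    """Pick a DIMM zone for a request; return its name, or None if nothing fits."""
    candidates = [d for d in dimms if d.free_pages >= pages]
    if not candidates:
        return None
    if low_power:
        # Annotated requests go to the most power-efficient zone with room.
        chosen = min(candidates, key=lambda d: d.power_mw)
    else:
        # Unannotated requests fall back to plain first-fit.
        chosen = candidates[0]
    chosen.free_pages -= pages
    return chosen.name


dimms = [Dimm("DIMM 1", 950.0, 4), Dimm("DIMM 2", 820.0, 2), Dimm("DIMM 3", 1010.0, 8)]
first = allocate(dimms, 2)                    # lowest-power zone has room
second = allocate(dimms, 2)                   # that zone is now full
third = allocate(dimms, 2, low_power=False)   # first-fit, ignores the profile
```

The point of the sketch is only the split of responsibilities: the application supplies the hint, and the lower layer consults the per-DIMM profile.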





provides energylevel MidFid<2>;
provides energylevel HiFid<3>; }

{ On\_event SysinfoChanged
call SysinfoRead;
if Error > Delta
call Timer(DownSample); }
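The fragment above reads as an event handler: when system information changes, read it, and downsample if the reported error exceeds a threshold. A minimal sketch of that control flow follows. The class name, the `error` field, and the reading of `Timer(DownSample)` as "lengthen the sampling period" are all assumptions, since the slide shows only part of the program.

```python
class AdaptiveTask:
    """Toy task that samples less often when the reported error grows."""

    def __init__(self, delta, period_ms=10, downsample_factor=2):
        self.delta = delta
        self.period_ms = period_ms
        self.downsample_factor = downsample_factor

    def sysinfo_read(self, sysinfo):
        # Stand-in for SysinfoRead; the "error" field name is an assumption.
        return sysinfo["error"]

    def on_sysinfo_changed(self, sysinfo):
        # Mirrors: On_event SysinfoChanged -> SysinfoRead -> Timer(DownSample)
        error = self.sysinfo_read(sysinfo)
        if error > self.delta:
            self.period_ms *= self.downsample_factor  # sample less often
        return self.period_ms


task = AdaptiveTask(delta=0.05)
unchanged = task.on_sysinfo_changed({"error": 0.01})  # below Delta: no change
slowed = task.on_sysinfo_changed({"error": 0.20})     # above Delta: downsample
```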





# GRAND CHALLENGE, QUESTIONS AND RESEARCH PROGRESS

**RESEARCH AND ITS ORGANIZATION** 

### **Expedition Grand Challenge & Questions**




### "Can microelectronic variability be controlled and utilized in building better computer systems?"

#### I.D. Overview of Expedition's Plan

Our Expedition plan has three goals: (a) to address the fundamental technical challenges in the realization of the UnO computing machines; (b) to create experimental systems at different scales to evaluate the idea in real-life application contexts; and, (c) to leverage the educational and other broader impact opportunities offered by such a rethinking of traditional computing machines.

In pursuit of these goals, our objectives include addressing the following interlinked questions:

### What are the most effective ways to detect variability?

sensors embedded in the circuit and software instrumentation, which poses the challenge of minimizing area, time, and energy costs.

### What are software-visible manifestations?

the trade-off between quality and overhead of information exchanged from hardware to software (termed "hardware signatures").

### What are software mechanisms to exploit variability?

explicitly provide alternative algorithms optimized for different hardware manifestations but which share as much code as possible to improve code density, debuggability, etc. Alternatively, compilers may automatically generate different code configurations, perhaps even dynamically at run time without algorithm intervention. In either case, some level of run-time assist from the OS will be needed.

### How can designers and tools leverage adaptation?

about the application behavior (such as the quality metrics and the reaction to variable performance and error rate) to be passed down to the design flow, as well as effective design automation algorithms for incorporating this information as soft constraints during synthesis, placement, routing etc. This operation may need to be done at run-time in the case of hardware platforms that expose circuit-level "knobs" such as sleep modes, voltage scaling, and frequency scaling, or are implemented on in-field reconfigurable devices, e.g., soft processor cores on FPGAs.

### How do we verify and test hw-sw interfaces?

One might allow under-verification of hardware by ensuring the correctness of the overall behavior of an opportunistic application and its associated software stack rather than that of the hardware alone.

### Three Goals:

- a. Address fundamental technical challenges (understand the problem)
- b. Create experimental systems (proof-of-concept prototypes)
- c. Leverage educational and broader impact opportunities (ensure training for future talent)

### **Research Organization**



- Four thrust areas
  - 1. Measurement and Modeling
  - 2. Design Tools and Testing Methodologies
  - 3. Microarchitecture and Compilers
  - 4. Runtime Support
- Two cross-cutting thrusts
  - 5. Applications and Testbeds
  - 6. Outreach and Education

# Thrusts traverse institutions on testbed vehicles seeding various projects



| Group A: Signature<br>Detection and<br>Generation                                                      | Group B: Variability<br>Mitigation<br>Measures                   | Group C:<br>Opportunistic<br>Software and<br>Abstractions  |
|--------------------------------------------------------------------------------------------------------|------------------------------------------------------------------|------------------------------------------------------------|
| Characterizing variability in power<br>consumption for modern computing<br>platforms, and implications | Mitigating variability in solid-state storage devices            | Effective error resilience                                 |
| Runtime support and software adaptation for variable hardware                                          | Hardware solutions to better understand and exploit variability  | Negative bias temperature instability and electromigration |
| Probabilistic analysis of faulty<br>hardware                                                           | VarEmu emulation-based testbed for<br>variability-aware software | Memory-variability aware runtime<br>systems                |
| Understanding and exploiting variability in flash memory devices                                       | Variability-aware opportunistic<br>system software stack         | Design-dependent ring oscillator and software testbed      |
| FPGA-based variability simulator                                                                       | Application robustification for stochastic processors            | Executing programs under relaxed semantics                 |

### Two years of building an Expedition



- Kickoff, review, tape-outs and builds-ins
  - 82 peer-reviewed publications, 21% collaborative
  - 54 events/releases on variability.org/news
  - 64 presentations on variability.org/presentations
- A collaborative community
  - 15 faculty, 25 GSRs, 1 postdoc, 10+ UG, 300 K-8-12



### **Timeline in Progress**





### Research: From Measurements to Signatures

- Year 1 was mostly focused on characterization of variability (IC designer centric)
  - What is the extent of variation and can it be sensed? Can it be used in the HW/SW stack?
- Year 2 focused on proof-of-concept methods to use variability information (Programmer centric)
  - From observation to systematic control.
  - Can we construct **useful signatures** that can enable systematic observability (and controllability) of variation?
- Year 3 sees the two streams coming together: expanding collaborations across teams, emerging testbeds & tools.

### Important Takeaways




To ensure effective use by software, we need accurate characterization (of performance, power).



- 1. Variability imposes a limit on how accurate the models can get
  - Mean error ~20% + 12% due to variability for 34% overall error in Nehalem 45nm CPUs
  - 15-20% variation across 22 DIMMs
  - 20-24% read, 40-67% write variation in Flash
  - Rooted in inherent non-observability of power states.

# Important Takeaways (continued)



- 2. Instrumentation and sensing are necessary to ensure 'high-level' observability of variation
  - "High enough for semantic value." Averages may not be sufficient.
- 3. Sensing for delay, power, aging and degradation is feasible and indeed necessary
  - Important difference between failure prediction and error detection. Notion of static & dynamic variability management.
- 4. Variability can be leveraged in software
  - Media applications, duty cycle, security-sensitive applications. Notion of 'tunable error' and its observability criteria.
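To make the duty-cycle case concrete, the sketch below works through the standard average-power relation $d \cdot P_{active} + (1-d) \cdot P_{sleep} \le P_{budget}$ for two instances of the same part that differ only in sleep power. The function name and all power values are illustrative assumptions, not measurements from the Expedition.

```python
def max_duty_cycle(budget_mw, active_mw, sleep_mw):
    """Largest active fraction d with d*active + (1-d)*sleep <= budget."""
    if active_mw <= sleep_mw:
        raise ValueError("active power must exceed sleep power")
    d = (budget_mw - sleep_mw) / (active_mw - sleep_mw)
    return min(1.0, max(0.0, d))  # clamp to a valid fraction


# Two hypothetical instances of one part: same active power,
# different sleep power (illustrative values only).
leaky = max_duty_cycle(budget_mw=1.0, active_mw=10.0, sleep_mw=0.5)
tight = max_duty_cycle(budget_mw=1.0, active_mw=10.0, sleep_mw=0.1)
print(f"leaky part: {leaky:.3f}, low-leakage part: {tight:.3f}")
```

Under the same budget, the low-leakage instance can stay active for a larger fraction of the time, which is the kind of per-instance headroom a variability-aware scheduler could use.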

### Important Takeaways (continued)

At the end of two years, we have a complete end-to-end initial platform with a sensing chip, board-level feedback, an OS supporting duty-cycled tasks driven by variability, and an API for such machines.

### **Expedition Experimental Platforms & Artifacts**



- Interesting and unique challenges in building research testbeds that drive our explorations
  - Mock-ups don't go far since variability is at the heart of microelectronic scaling. Need platforms that capture *scaling* and *integration* aspects.
- Testbeds to observe (Molecule, GreenLight, Ming), control (Oven, ERSA)







# Red Cooper Testbed: in-situ visibility



- Customized chip with processor + speed/leakage sensors available since April 2011
- Testbed board to finish the sensor feedback loop on board



# Ferrari Chip: Closing Loop On-Chip



- On-Chip Sensors
  - Memory-mapped I/O and control
  - Leakage sensors, DDROs, temperature sensors, reliability sensors
- Better support for OS and software.



Available April 2013







## **From Control to Software Abstractions**



### Going forward

- Leon3 (Sparc) sensorized chip tapeout
- Software abstractions: PL and Runtime
  - A formal/consistent way of exposing hardware signatures
  - A full Linux software stack working
- Verification methods
  - Performance & power invariants at RT-level in the presence of variability (with TI) using probabilistic model checking
    - Similar to property checking against Monte Carlo simulations
  - Automatic generation of invariants and assertion synthesis.
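The Monte Carlo analogy in the list above can be sketched as follows: draw many samples of a varying quantity and report the fraction for which an invariant holds. This shows only the sampling-based check, not probabilistic model checking itself. The Gaussian delay model and all numbers are hypothetical.

```python
import random


def estimate_invariant_probability(invariant, sample, trials=10_000, seed=0):
    """Fraction of Monte Carlo samples on which the invariant holds."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    holds = sum(1 for _ in range(trials) if invariant(sample(rng)))
    return holds / trials


# Hypothetical variability model: path delay is Gaussian around 0.9 ns
# with a 5% standard deviation.
def sample_delay_ns(rng):
    return rng.gauss(0.9, 0.045)


# Performance invariant: the path must settle within one clock period.
def meets_timing(delay_ns, clock_period_ns=1.0):
    return delay_ns <= clock_period_ns


p = estimate_invariant_probability(meets_timing, sample_delay_ns)
print(f"estimated P(invariant holds) = {p:.3f}")
```

A power invariant would follow the same shape, with a different sampler and predicate.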

# Reaching out and building a community

- Building our teams across six sites
- Building our mentors and champions
- Creating early adopters
- Inspiring talent

# **Emerging Synergies**



|            | UCSD     | UCLA    | UCI     | UIUC    | UM    | Stanford |
|------------|----------|---------|---------|---------|-------|----------|
| Red Cooper | X        | X       |         |         | X     |          |
| Molecule   |          | X       | X       |         |       |          |
| ViPZonE    |          | X       | X       |         |       |          |
| VarEMU     | X        | X       |         |         | X     |          |
| Ferrari    | X        | X       |         |         | X     |          |
| ERSA/LLVM  | X        | X       |         | X       |       | X        |
|            | Software | Systems | LL Code | LL Code | Chips | Sensors  |

- Examples of collaborative discovery
  - Lara Dolecek working with Steve Swanson & Mitra
  - Dennis Sylvester at the center of chip/platform characterization
  - Nik Dutt, Alex Nicolau and Rakesh Kumar on code scheduling
  - Rakesh Kumar, Sorin Lerner, Ranjit Jhala on code analysis and programming language support for variability.



# Thank You!

