

## Digital Computing Beyond Moore's Law

#### **Supercomputing Frontiers, Singapore**

### March 16, 2017

### John Shalf

#### Lawrence Berkeley National Laboratory



### Post-Exascale Landscape: MIND THE GAP!



#### **Moore's Law**

Lithography scaling: 2x increased density and 2x lower power, every 2 years!

#### Now – 2025

Moore's Law continues through ~5 nm, beyond which diminishing returns are expected. End of Moore's Law: 2025-2030?
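The "2x every 2 years" cadence compounds quickly. A minimal Python sketch, assuming an idealized clean doubling (the function name and the `doubling_period_years` parameter are illustrative, not from the talk), shows how large the cumulative multiplier becomes:

```python
def density_multiplier(years: float, doubling_period_years: float = 2.0) -> float:
    """Relative transistor density after `years` of ideal Moore's Law scaling.

    Assumes density doubles exactly once every `doubling_period_years`
    (an idealization of the historical trend, not a measured curve).
    """
    if doubling_period_years <= 0:
        raise ValueError("doubling_period_years must be positive")
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Print the cumulative multiplier for a few horizons.
    for years in (2, 10, 20):
        print(f"{years:>2} years -> {density_multiplier(years):,.0f}x density")
```

Under this idealization, ten years of scaling is a 32x density gain, which is why losing the trend matters so much.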

## Post Moore Scaling

New materials and devices introduced to enable continued scaling of digital electronics performance and efficiency.

*(Timeline figure labels: 2016; 2016-2025; 2025+)*





# *"I predict Moore's Law will never end. That way I will only be wrong once!"*

Alan Kay: Communications of the ACM 1989





## SO WHAT IF MOORE'S LAW ENDS? WHY SHOULD I CARE?

Nothing lasts forever... Especially an exponential trend!

## IT Challenge for Future Electricity Supply

Global semiconductor market size: ~\$5 trillion by 2030



www.alliancetrustinvestments.com/sri-hub/posts/Energy-efficient-data-centres

www.iea.org/publications/freepublications/publication/gigawatts2009.pdf



Moore's Law is an economic theory. *There are ways to continue scaling of digital technology after the end of classical lithographic scaling* 

(e.g., at the end of Dennard scaling in ~2004, exponential clock-frequency scaling stopped and the industry moved to exponentially increasing parallelism)
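The Dennard-scaling point can be made concrete with the textbook first-order model: dynamic power per transistor scales as $C V^2 f$, and transistors per unit area scale as $k^2$ when features shrink by a factor $k$. The sketch below (function and flag names are hypothetical; leakage is ignored) compares classic Dennard scaling with what happens once supply voltage stops scaling:

```python
def relative_power_density(k: float, scale_voltage: bool, scale_frequency: bool) -> float:
    """Power density after shrinking linear feature size by a factor k,
    relative to the unscaled design, using P_transistor ~ C * V**2 * f.

    Classic Dennard scaling: capacitance C ~ 1/k, voltage V ~ 1/k, frequency f ~ k.
    Leakage and wire effects are ignored (first-order model only).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    capacitance = 1.0 / k
    voltage = 1.0 / k if scale_voltage else 1.0
    frequency = k if scale_frequency else 1.0
    power_per_transistor = capacitance * voltage**2 * frequency
    transistors_per_area = k**2
    return power_per_transistor * transistors_per_area


if __name__ == "__main__":
    k = 2.0  # illustrative shrink factor
    print("Dennard (V and f scale):", relative_power_density(k, True, True))
    print("V stuck, f still scales:", relative_power_density(k, False, True))
    print("V stuck, f frozen:      ", relative_power_density(k, False, False))
```

With $k = 2$ the model gives 1.0, 4.0 and 2.0 respectively: freezing the clock and spending the extra transistors on parallelism keeps growth in check, and even then power density still rises unless some logic is left idle.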

# Post-Lithographic Scaling Options


BERKELEY LAB

There are other ways to continue Moore's Scaling!











#### **IDA/iARPA Study 2014**

- Invest in extending the reach of computing to new areas where digital is not efficient, by studying quantum and neuromorphic computing
- But don't forget that you still need digital (it offers a kind of computation that is not well replicated by the alternatives)

## See Our Article in the December 2016 Issue of IEEE Computer!



#### COVER FEATURE REBOOTING COMPUTING



John M. Shalf, Lawrence Berkeley National Laboratory; Robert Leland, Sandia National Laboratories

**TABLE 1.** Summary of technology options for extending digital electronics.

| Improvement Class | Technology | Timescale | Complexity | Risk | Opportunity |
|---|---|---|---|---|---|
| Architecture and software advances | Advanced energy management | Near-Term | Medium | Low | Low |
| | Advanced circuit design | Near-Term | High | Low | Medium |
| | System-on-chip specialization | Near-Term | Low | Low | Medium |
| | Logic specialization/dark silicon | Mid-Term | High | High | High |
| | Near threshold voltage (NTV) operation | Near-Term | Medium | High | High |
| 3D integration and packaging | Chip stacking in 3D using through-silicon vias (TSVs) | Near-Term | Medium | Low | Medium |
| | Metal layers | Mid-Term | Medium | Medium | Medium |
| | Active layers (epitaxial or other) | Mid-Term | High | Medium | High |
| Resistance reduction | Superconductors | Far-Term | High | Medium | High |
| | Crystalline metals | Far-Term | Unknown | Low | Medium |
| Millivolt switches (a better transistor) | Tunnel field-effect transistors (TFETs) | Mid-Term | Medium | Medium | High |
| | Heterogeneous semiconductors/strained silicon | Mid-Term | Medium | Medium | Medium |
| | Carbon nanotubes and graphene | Far-Term | High | High | High |
| | Piezo-electric transistors (PFETs) | Far-Term | High | High | High |
| Beyond transistors (new logic paradigms) | Spintronics | Far-Term | Medium | High | High |
| | Topological insulators | Far-Term | Medium | High | High |
| | Nanophotonics | Near/Far-Term | Medium | Medium | High |
| | Biological and chemical computing | Far-Term | High | High | High |

# Accelerated Development & Optimization for New Logic and Memory Devices



# **Long Term: New Materials**



New architectures and packaging



## We might already be too late

Historically, it takes 10 years to go from lab to fab...

But let's talk about it anyway.

# Borkar-Shalf Criteria for New Device Technology



1. Gain
2. Signal to noise
3. Scalability
4. Manufacturability





# Systems, Packaging and Architecture


## Now and Intermediate term: 3D Stacking and Advanced Packaging



## **R&D in Manufacturing at the 2-nm Node**: CXRO EUV Test Facility at LBNL







### **Choose to Scale Something Else**

(The future of Moore's Law might not be about logic density...)





### Stanford N3XT

## Increase Logic Density and Efficiency using Specialization





First 10 years

## **Current Architectures are Wasteful**

(how far can we push architecture scaling using specialization?)





### Need to Accelerate Pace of Discovery for Advanced Architectures





## **Open Hardware for Flexible SoCs** (Synthesis & Simulation)



#### **Chisel**

DSL for rapid prototyping of circuits, systems, and architecture-simulator components



Back-end to synthesize hardware with different devices or new logic families

#### **RISC-V**

Open-source, extensible ISA and cores





Re-implement the processor with different devices, or extend it with accelerators

#### **OpenSOC**

Open-source fabric to integrate accelerators and logic into an SoC



Platform for experimentation with specialization to extend Moore's Law
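One concrete way to extend a RISC-V core with accelerators is through the opcode space that the RISC-V base ISA reserves for custom extensions (custom-0 is opcode `0b0001011`). The sketch below packs a standard R-type instruction word; the accelerator operation and its `funct7` value are hypothetical placeholders, not part of any published extension:

```python
OPCODE_OP = 0b0110011        # standard register-register ALU ops (ADD, SUB, ...)
OPCODE_CUSTOM_0 = 0b0001011  # reserved by the RISC-V spec for custom extensions


def encode_r_type(opcode: int, rd: int, funct3: int, rs1: int, rs2: int, funct7: int) -> int:
    """Pack the fields of a 32-bit RISC-V R-type instruction.

    Layout: funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
    """
    widths = {"opcode": (opcode, 7), "rd": (rd, 5), "funct3": (funct3, 3),
              "rs1": (rs1, 5), "rs2": (rs2, 5), "funct7": (funct7, 7)}
    for name, (value, bits) in widths.items():
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


if __name__ == "__main__":
    # add x1, x2, x3 (a standard instruction, for comparison)
    print(hex(encode_r_type(OPCODE_OP, rd=1, funct3=0, rs1=2, rs2=3, funct7=0)))
    # Hypothetical accelerator op in custom-0 space: rd = accel(rs1, rs2), selected by funct7=1
    print(hex(encode_r_type(OPCODE_CUSTOM_0, rd=1, funct3=0, rs1=2, rs2=3, funct7=1)))
```

The first word matches the standard encoding of `add x1, x2, x3` (`0x003100b3`); the accelerator word differs only in opcode and `funct7`, which is what lets a core decode it and hand it to an attached unit.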

### **Combining Compact Device Models with Hardware Architectural Simulators**



George Michelogiannakis Dilip Vasudevan







 Using electron-phonon coupling to calculate the heat generation and dissipation at atomic scale







The electron (left) and hole (right) localizations in a bulk CH<sub>3</sub>NH<sub>3</sub>PbI<sub>3</sub> material. The small dots are atoms.

## **Beyond Moore Modeling Workflow**





End-to-End Post-Moore Design Space Exploration Tool Flow

## Incorporating Emerging Device Models into Architecture Simulation



- Next steps:
  - NCFETs, CNFETs
  - More complicated logic blocks
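The idea of feeding device models into architecture simulation can be sketched as a two-level cost model: a compact device model is summarized into per-gate energy and delay, which an architecture-level model then multiplies by gate counts, activity, and logic depth. Everything below (the hypothetical `DeviceParams` fields, the first-order $C V^2$ energy and $C V / I_{on}$ delay, and the parameter values) is a simplifying assumption for illustration, not the LBNL tool flow:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DeviceParams:
    """Hypothetical summary of a compact device model at one operating point."""
    name: str
    vdd: float      # supply voltage [V]
    c_gate: float   # switched capacitance per gate [F]
    i_on: float     # on-current at this vdd [A]; a real model would derive it from vdd


def gate_energy(d: DeviceParams) -> float:
    """Energy drawn from the supply per full charge/discharge cycle: C * V^2 [J]."""
    return d.c_gate * d.vdd ** 2


def gate_delay(d: DeviceParams) -> float:
    """First-order switching delay: time to move charge C * V with current I_on [s]."""
    return d.c_gate * d.vdd / d.i_on


def block_cost(d: DeviceParams, gate_count: int, activity: float,
               logic_depth: int) -> Tuple[float, float]:
    """Architecture-level estimate for one operation of a logic block.

    activity: fraction of gates that switch per operation (0..1)
    logic_depth: number of gates on the critical path
    Returns (energy per operation [J], latency [s]).
    """
    energy = activity * gate_count * gate_energy(d)
    latency = logic_depth * gate_delay(d)
    return energy, latency


if __name__ == "__main__":
    # Illustrative parameters only; not measured CMOS or TFET data.
    for dev in (DeviceParams("device A", 1.0, 1e-15, 1e-4),
                DeviceParams("device B", 0.5, 1e-15, 2e-5)):
        e, t = block_cost(dev, gate_count=1000, activity=0.1, logic_depth=20)
        print(f"{dev.name}: {e:.2e} J/op, {t:.2e} s latency")
```

Keeping the device layer behind two numbers per gate is what lets the same architectural model be re-run as new device models (TFETs, NCFETs, CNFETs) become available.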



TFET SPICE Simulation: Inverter (George Michelogiannakis, Dilip Vasudevan)



TFET SPICE Simulation: Adder
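Part of the appeal of TFETs (listed under "millivolt switches" in Table 1) is that band-to-band tunneling is not bound by the thermionic limit on subthreshold swing, $(kT/q)\ln 10 \approx 60$ mV/decade at room temperature, that applies to conventional MOSFETs. A minimal sketch of why that matters at low supply voltage; the 30 mV/decade figure is an illustrative assumption, not a simulated or measured TFET value:

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23        # exact SI value
ELEMENTARY_CHARGE_C = 1.602176634e-19   # exact SI value


def thermal_swing_limit_mv(temperature_k: float = 300.0) -> float:
    """Minimum subthreshold swing of a thermionic switch, (kT/q) * ln(10), in mV/decade."""
    return 1e3 * BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C * math.log(10)


def ideal_on_off_ratio(voltage_swing_v: float, swing_mv_per_decade: float) -> float:
    """I_on / I_off if the entire gate-voltage swing is spent in the subthreshold
    region at a constant swing (an idealization; real devices saturate)."""
    return 10.0 ** (voltage_swing_v * 1e3 / swing_mv_per_decade)


if __name__ == "__main__":
    limit = thermal_swing_limit_mv()
    print(f"thermionic limit at 300 K: {limit:.1f} mV/decade")
    for label, swing in (("MOSFET at the limit", limit),
                         ("illustrative steep-slope switch", 30.0)):
        print(f"{label}: on/off over 0.3 V ~ {ideal_on_off_ratio(0.3, swing):.1e}")
```

At a 0.3 V swing, a switch pinned at ~60 mV/decade gets only about five decades of on/off ratio, while halving the swing doubles the number of decades. That is the sense in which a steeper switch makes very low-voltage operation plausible.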



Design Architectures Around Design Patterns


| 7 Giants of Data (NRC) | 7 Motifs of Simulation |
|------------------------|------------------------|
| Basic statistics       | Monte Carlo methods    |
| Generalized N-Body     | Particle methods       |
| Graph theory           | Unstructured meshes    |
| Linear algebra         | Dense Linear Algebra   |
| Optimizations          | Sparse Linear Algebra  |
| Integrations           | Spectral methods       |
| Alignment              | Structured Meshes      |

Identify common computational patterns

## Organizing principles for Non-Von "Spatial Computing"



- Data movement will remain a challenge even with exotic materials, and especially with CMOS
- Copper is about as good a conductor as you can expect at room temperature
- With even lower-power switches, the challenge skews even more toward data movement (this calls for a spatial computing approach)
- Push toward more parallelism (finer tessellation of the memory structures): strong scaling

Strong scaling extrapolates to the *limit case* with no separation of memory and compute (e.g., one PDE cell per processing element)
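The strong-scaling limit can be quantified with a surface-to-volume count. The sketch below assumes a cubic $n^3$ grid split into equal cubic blocks, one block per processing element, a 7-point stencil, and periodic boundaries so every block has six neighbors (all assumptions for illustration). As blocks shrink toward one cell, the values each element must fetch from neighbors come to dominate the values it owns:

```python
from typing import Tuple


def block_stats(n: int, blocks_per_dim: int) -> Tuple[int, int]:
    """Split an n x n x n grid into blocks_per_dim**3 cubic blocks.

    Returns (cells owned per block, halo cells per block), where the halo is the
    one-cell-thick layer on each of the six faces needed by a 7-point stencil.
    """
    if n <= 0 or blocks_per_dim <= 0 or n % blocks_per_dim:
        raise ValueError("n must be a positive multiple of blocks_per_dim")
    edge = n // blocks_per_dim
    owned = edge ** 3
    halo = 6 * edge ** 2
    return owned, halo


if __name__ == "__main__":
    n = 64
    for blocks_per_dim in (1, 4, 16, 64):
        owned, halo = block_stats(n, blocks_per_dim)
        print(f"{blocks_per_dim**3:>7} PEs: {owned:>7} cells/PE, "
              f"{halo:>6} halo cells/PE, halo/owned = {halo / owned:.3f}")
```

At the limit case (64^3 elements, one cell each), every update needs six remote values for one local one, which is why data movement, rather than arithmetic, sets the cost there.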



# **Concept: Solid State Virtual Fluid**

Extreme (spatial) Specialization + New Devices + New programming models




#### Programming Model Challenges for Non-Von Neumann & Specialized Architectures



Modern languages (including many classes of DSLs) were designed with instruction processors in mind



### A Framework for Accelerated Technology Development Beyond Moore's Law

#### Drive Focus and Impact via a Multiscale **CoDesign Framework**

















# **Reducing Solution Space**





# Conclusion



- The end of lithography scaling as we know it is coming within a decade (*about when Exascale is done*)
- Neuromorphic and quantum computing do not address this challenge
  - They expand computing to exciting new areas!
  - But do not replace Digital logic where
  - And *all* are affected by the lithography challenge!
- But it need not mean the end of Moore's Law
  - We believe in *More Moore!*
  - But it will require *innovation*!
- Requires a LOT of lead time, so we must start today!



#### **Extra**



## Scientific Computing on Non-Von Neumann Digital Electronics







### PIM Is NOT Non-Von Neumann: It's Just Better Packaging







## These ARE Non-Von Neumann



### Cost of Data Movement Increasing Relative to Ops








# **Spatial Computing**

#### PDE on a Block Structured Grid Extrapolated to Non-Von Neumann























PDEcell / PICcell: an ultra-simple compute engine (50k gates) that calculates finite-difference updates and particle forces from neighbors. Microinstructions specify the PDE equation, stencil, and PIC operators. *Novel features:* variable-length streaming integer arithmetic and a novel PIC particle virtualization scheme.
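The slide does not spell out how the "variable-length streaming integer arithmetic" works. One common technique that fits the description is bit-serial arithmetic, where operands stream through a tiny adder one bit per step, least significant bit first, so the same hardware handles integers of any length. A minimal sketch of an unsigned bit-serial adder under that assumption (all names are hypothetical):

```python
from itertools import zip_longest
from typing import Iterable, Iterator, List


def bit_serial_add(a_bits: Iterable[int], b_bits: Iterable[int]) -> Iterator[int]:
    """Add two unsigned integers streamed LSB-first, one bit per step.

    Only a single carry bit of state is kept, mirroring a one-bit full adder
    with a carry flip-flop; operands may have different lengths.
    """
    carry = 0
    for a, b in zip_longest(a_bits, b_bits, fillvalue=0):
        yield a ^ b ^ carry                    # sum bit of a full adder
        carry = (a & b) | (carry & (a ^ b))    # carry out of a full adder
    if carry:
        yield carry                            # final carry extends the result by one bit


def to_bits(value: int) -> List[int]:
    """Unsigned integer -> list of bits, least significant first."""
    if value < 0:
        raise ValueError("unsigned values only")
    bits = []
    while value:
        bits.append(value & 1)
        value >>= 1
    return bits or [0]


def from_bits(bits: Iterable[int]) -> int:
    """Bits (LSB first) -> unsigned integer."""
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    print(from_bits(bit_serial_add(to_bits(13), to_bits(29))))
    print(from_bits(bit_serial_add(to_bits(2**100), to_bits(2**100 - 1))) == 2**101 - 1)
```

The cost is latency proportional to operand length, traded for a very small adder, which fits a gate budget on the order of the 50k gates quoted for the PDEcell.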








Scalar waves in 3D are solutions of the hyperbolic wave equation:

$$-\phi_{,tt} + \phi_{,xx} + \phi_{,yy} + \phi_{,zz} = 0$$

**Initial value problem**: given data for $\phi$ and its first time derivative at the initial time, the wave equation says how it evolves with time.

### **Discretized Representation**



Solve numerically by discretizing on a grid, using explicit *finite differencing* (centered, second order):

$$
\begin{aligned}
\phi^{n+1}_{i,j,k} = {} & 2\phi^{n}_{i,j,k} - \phi^{n-1}_{i,j,k} \\
&+ \frac{\Delta t^{2}}{\Delta x^{2}}\left(\phi^{n}_{i+1,j,k} - 2\phi^{n}_{i,j,k} + \phi^{n}_{i-1,j,k}\right) \\
&+ \frac{\Delta t^{2}}{\Delta y^{2}}\left(\phi^{n}_{i,j+1,k} - 2\phi^{n}_{i,j,k} + \phi^{n}_{i,j-1,k}\right) \\
&+ \frac{\Delta t^{2}}{\Delta z^{2}}\left(\phi^{n}_{i,j,k+1} - 2\phi^{n}_{i,j,k} + \phi^{n}_{i,j,k-1}\right)
\end{aligned}
$$
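The discretized update translates directly into a loop nest. A minimal pure-Python sketch of one time step, assuming zero (Dirichlet) values on the outer boundary and nested lists for the grid (both choices are ours, for illustration); for stability with uniform spacing $h$ this explicit scheme needs $\Delta t \le h/\sqrt{3}$:

```python
from typing import List

Grid = List[List[List[float]]]


def zeros(nx: int, ny: int, nz: int) -> Grid:
    """Allocate an nx x ny x nz grid of zeros."""
    return [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]


def wave_step(phi_prev: Grid, phi: Grid, dt: float, dx: float, dy: float, dz: float) -> Grid:
    """Return phi at time level n+1 from levels n (phi) and n-1 (phi_prev).

    Explicit, centered, second-order finite differences for
    -phi_tt + phi_xx + phi_yy + phi_zz = 0. Boundary cells are held at zero.
    """
    nx, ny, nz = len(phi), len(phi[0]), len(phi[0][0])
    cx, cy, cz = (dt / dx) ** 2, (dt / dy) ** 2, (dt / dz) ** 2
    phi_next = zeros(nx, ny, nz)
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                c = phi[i][j][k]
                phi_next[i][j][k] = (
                    2.0 * c - phi_prev[i][j][k]
                    + cx * (phi[i + 1][j][k] - 2.0 * c + phi[i - 1][j][k])
                    + cy * (phi[i][j + 1][k] - 2.0 * c + phi[i][j - 1][k])
                    + cz * (phi[i][j][k + 1] - 2.0 * c + phi[i][j][k - 1])
                )
    return phi_next


if __name__ == "__main__":
    # Single impulse at the center of a 5x5x5 grid; the previous level is all zeros.
    phi_prev, phi = zeros(5, 5, 5), zeros(5, 5, 5)
    phi[2][2][2] = 1.0
    nxt = wave_step(phi_prev, phi, dt=0.5, dx=1.0, dy=1.0, dz=1.0)
    print(nxt[2][2][2], nxt[3][2][2])
```

With $\Delta t = 0.5$ and unit spacing, the center drops to 0.5 and each face neighbor rises to 0.25 after one step.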

# **Decomposing Into PDE Weights**
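Read as weights, the explicit wave-equation update is a fixed table mapping (time level, neighbor offset) to a coefficient, which is the kind of information a per-cell engine could hold instead of a general instruction stream. The encoding below is a hypothetical illustration, not the PDEcell microinstruction format:

```python
from typing import Callable, Dict, Tuple

# (time offset, di, dj, dk): time offset 0 is level n, -1 is level n-1
Offset = Tuple[int, int, int, int]


def wave_stencil_weights(dt: float, dx: float, dy: float, dz: float) -> Dict[Offset, float]:
    """Coefficients of the explicit second-order update for -phi_tt + lap(phi) = 0."""
    cx, cy, cz = (dt / dx) ** 2, (dt / dy) ** 2, (dt / dz) ** 2
    return {
        (0, 0, 0, 0): 2.0 - 2.0 * (cx + cy + cz),  # phi^n at the cell itself
        (-1, 0, 0, 0): -1.0,                       # phi^{n-1} at the cell itself
        (0, 1, 0, 0): cx, (0, -1, 0, 0): cx,       # x neighbors at level n
        (0, 0, 1, 0): cy, (0, 0, -1, 0): cy,       # y neighbors at level n
        (0, 0, 0, 1): cz, (0, 0, 0, -1): cz,       # z neighbors at level n
    }


def apply_stencil(weights: Dict[Offset, float],
                  sample: Callable[[int, int, int, int], float]) -> float:
    """Weighted sum over the table; sample(t, di, dj, dk) returns the value at that offset."""
    return sum(w * sample(*offset) for offset, w in weights.items())


if __name__ == "__main__":
    w = wave_stencil_weights(dt=0.5, dx=1.0, dy=1.0, dz=1.0)
    for offset, coeff in sorted(w.items()):
        print(offset, coeff)
    # A constant field must stay constant: the weights sum to 1.
    print(apply_stencil(w, lambda t, di, dj, dk: 3.0))
```

Because the weights sum to one, a constant field is preserved, which is a quick consistency check on any hand-built table.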









### **PDE Domain Specific Representation**







