A Fine-Grain Architecture for Extreme Performance beyond Moore’s Law

Maciej Brodowicz
Advanced Parallel Computing Engineer
Indiana University, USA

Abstract:

The end of Dennard scaling, the approaching end of Moore’s Law with nano-scale technologies, the flat-lining of clock rates constrained by power considerations, and the exhaustion of ILP all undermine and diminish the performance advantage provided by incremental extensions of conventional practices based on von Neumann architecture principles. The tradeoff priorities dictated by the legacy of the von Neumann model force structures that optimize for FPU utilization. Although logic was once the dominant precious resource, it is now the least critical factor; memory capacity and bandwidth matter most. Data movement bandwidth and latency are also crucial. It is the premise of the work reported here that future performance gains, as well as reduced costs in time, energy, and space, will be achieved through non-von Neumann architectures that respond to the emerging tradeoffs. Addressing the key operational challenges of starvation, latency, overhead, and contention compels the development of a non-von Neumann parallel architecture for post-Moore’s Law extreme-scale computing. Among the models considered in the past are dataflow, systolic, vector pipelining, and cellular automata. The Simultac Fonton architecture to be presented combines these execution models within an overall cellular structure, the computing cells of which (Fontons) incorporate a set of local rules enabling a global emergent behavior of general-purpose parallel computing. The Simultac system architecture merges distributed structures of mesh interconnect, associative memory arrays, and N-dimensional ALU pipelines to maximize local communication and memory bandwidth, minimize local memory latency, and treat ALUs as high-availability (rather than high-utilization) resources in support of memory and communication. Global operation for application computation is guided by the abstract ParalleX execution model, which governs dynamic adaptive resource management and task scheduling.
This invited talk will describe the Simultac Fonton non-von Neumann architecture and present results from recent design studies and simulations that demonstrate the opportunities for future performance gain in the Neo-Digital Age.

Bio-Data:

Maciej Brodowicz is an assistant scientist at the Center for Research in Extreme Scale Technologies and a research faculty member in the Intelligent Systems Engineering department at Indiana University. He received an M.S. degree in Electrical Engineering from Warsaw University of Technology and a Ph.D. in Computer Science from the University of Houston. While at the California Institute of Technology, he developed parallel I/O support for ASCI applications, evaluated and optimized the performance of large-scale parallel simulations, and investigated upcoming and alternative high-performance architectures. At Louisiana State University, he participated in the specification of the ParalleX execution model and the development of its implementation, the HPX runtime system. His current research interests include computer architecture, runtime systems, execution models, and scalable I/O.