SUPERCOMPUTING FRONTIERS 2017
MARCH 13 – 16, 2017
Matrix@Biopolis, Singapore
SCF17 Tutorials offer attendees a variety of short courses on key topics and technologies relevant to high performance computing, programming & novel architectures. These tutorials also provide the opportunity to interact with recognized leaders in the field and to learn about the latest technology trends, theory, and practical techniques.
Our tutorials are open to all conference attendees except those on 1-Day passes, but registration for the tutorials is required. For those of you who are interested only in the tutorials and not the conference from 14 – 16 March 2017, we have introduced a special tutorials-only fee of SG$150. Please check the details on our registration page.
The rooms assigned for the tutorials will be announced on the morning of March 13. Please proceed to Level 4 of the Matrix to sign in for your selected tutorials and to find out where your tutorial will be held.
Our registration desk will be open from 8am onwards.
Please note that pre-registration is essential to guarantee your seat for the tutorials. If you have not registered for your tutorials, entry will only be granted if there is still available space.
Time: 9.00am – 5.00pm
Venue: Level 4, Matrix Building, Biopolis
Breaks:
Morning Tea Break – 10.30am – 11.00am
Lunch Break – 1.00pm – 2.00pm
Afternoon Tea Break – 3.30pm – 4.00pm
Presenter(s):
Nicolas Walker, Senior Solutions Architect, NVIDIA SEA
Jeff Adie, HPC / Deep Learning Solutions Architect, NVIDIA APJ
Aik Beng Ng, Deep Learning Solutions Architect, NVIDIA APJ
You can verify your system is supported by going to this test site and verifying the “WebSockets (Port 80)” section has all green check marks.
Abstract:
Learn the latest techniques on how to design, train, and deploy neural network-powered machine learning in your applications. You’ll explore widely used open-source frameworks and NVIDIA’s latest GPU-accelerated deep learning platforms. You will learn:
Lab 1:
Getting Started with Deep Learning
Learn how to leverage deep neural networks (DNN) within the deep learning workflow to solve a real-world image classification problem using NVIDIA DIGITS. You’ll walk through the process of data preparation, model definition, model training and troubleshooting, validation testing and strategies for improving model performance using GPUs. On completion of this lab, you will be able to use NVIDIA DIGITS to train a DNN on your own image classification application.
Pre-Requisite:
Lab 3:
Neural Network Deployment
Abstract: Once a deep neural network (DNN) has been trained using GPU acceleration, it needs to be deployed into production. The step after training is called inference, as it uses a trained DNN to make predictions from new data.
In this lab we will show different approaches to deploying a trained DNN for inference. The first approach is to use the inference functionality directly within a deep learning framework, in this case DIGITS and Caffe. The second approach is to integrate inference within a custom application by using a deep learning framework API, again using Caffe but this time through its Python API. The final approach is to use NVIDIA TensorRT™, which will automatically create an optimized inference runtime from a trained Caffe model and network description file. You will learn about the role of batch size in inference performance, as well as various optimizations that can be made in the inference process. You’ll also explore inference for a variety of different DNN architectures trained in other DLI labs.
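The effect of batch size is easiest to see in a framework-free sketch. The function below is not the Caffe or TensorRT API, just an illustrative stand-in: it runs one trained fully connected layer over an entire batch in a single call, which is the pattern that lets a deployed runtime amortize per-invocation overhead and keep the processor's vector units fed:

```c
#include <stddef.h>

/* Apply one trained fully connected layer to a whole batch at once.
 * Batching amortizes fixed per-call costs over many samples, which is
 * why batch size matters so much for inference throughput. */
void dense_forward(const float *W,        /* out_dim x in_dim, row-major */
                   const float *bias,     /* out_dim */
                   const float *batch_in, /* batch x in_dim */
                   float *batch_out,      /* batch x out_dim */
                   size_t batch, size_t in_dim, size_t out_dim)
{
    for (size_t n = 0; n < batch; n++)
        for (size_t o = 0; o < out_dim; o++) {
            float acc = bias[o];
            for (size_t i = 0; i < in_dim; i++)
                acc += W[o * in_dim + i] * batch_in[n * in_dim + i];
            batch_out[n * out_dim + o] = acc;
        }
}
```

A runtime such as TensorRT additionally fuses layers and picks kernels tuned for the chosen batch size; this sketch only shows the batching idea itself.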
Pre-Requisite:
Time: 9.00am – 5.00pm
Venue: Level 4, Matrix Building, Biopolis
Breaks:
Morning Tea Break – 10.30am – 11.00am
Lunch Break – 1.00pm – 2.00pm
Afternoon Tea Break – 3.30pm – 4.00pm
Presenter(s):
Rama Kishan Malladi, Application Engineer, Intel Software Service Group
Jyotsna Khemka, Technical Consulting Engineer, Intel Software Service Group
Abstract:
Slot 1: 9:00 – 10:30
Intel® Architecture for Software Developers
Learn about the parallel architecture, technical advances, and features of the latest and future Intel processors, especially the Xeon and Xeon Phi families (most recently Knights Landing). Includes a brief introduction to the latest Intel® architecture and an overview of the Intel® Xeon Phi™ coprocessor architecture.
Slot 2: 11:00 – 1:00
Vectorization for Intel® C++ & Fortran Compiler
• Introduction to SIMD for Intel® Architecture
• Vector Code Generation
• Compiler & Vectorization
• Validating Vectorization Success
• Reasons for Vectorization Failures
• Vectorization of Special Program Constructs & Loops
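As a concrete example of the kind of loop this session deals with, here is a SAXPY kernel written so that an auto-vectorizer can prove it safe to vectorize: unit-stride accesses, a countable trip count, no cross-iteration dependences, and `restrict` to rule out aliasing between the arrays. Whether the Intel compiler actually vectorized it can then be checked with its optimization reports (e.g. `-qopt-report`):

```c
#include <stddef.h>

/* SAXPY: y = a*x + y.  The 'restrict' qualifiers promise the compiler
 * that x and y do not overlap, removing the aliasing hazard that is a
 * common reason vectorization fails. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Without `restrict` (or an equivalent pragma), the compiler must assume `y` may overlap `x` and will often generate a runtime overlap check or fall back to scalar code.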
Slot 3: 12:00 – 1:00
Understanding vectorization and how it impacts performance: Vectorization Advisor
• Features
• Workflow
• Understanding the results
• Demo
Slot 4: 2:00 – 3:30
Intel® Advisor Roofline
A Roofline chart is a visual representation of application performance in relation to hardware limitations, including memory bandwidth and computational peaks.
The Roofline provides insight into:
• Where your performance bottlenecks are
• How much performance is left on the table because of them
• Which bottlenecks are possible to address, and which ones are worth addressing
• Why these bottlenecks are most likely occurring
• What your next steps should be
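The roofline bound itself is a single min(): attainable performance is capped either by the machine's compute peak or by what the memory system can deliver at the kernel's arithmetic intensity (FLOPs per byte moved). A minimal sketch, with all hardware numbers hypothetical:

```c
/* Roofline model: attainable GFLOP/s is the lesser of the compute peak
 * and bandwidth x arithmetic intensity.  Kernels below the "ridge
 * point" are memory-bound; above it, compute-bound. */
double roofline_gflops(double peak_gflops, double bandwidth_gbs,
                       double arithmetic_intensity /* FLOPs per byte */)
{
    double memory_bound = bandwidth_gbs * arithmetic_intensity;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}
```

For example, a kernel doing 0.25 FLOPs per byte on a (hypothetical) 100 GB/s, 1000 GFLOP/s machine is capped at 25 GFLOP/s by memory, no matter how well its arithmetic is optimized; that gap is exactly what the Advisor Roofline chart makes visible.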
New Era for OpenMP*: Beyond Traditional Shared Memory Parallel Programming
OpenMP* has been the de facto standard for parallel programming on shared memory systems. OpenMP 4.5 added major enhancements to the OpenMP API to keep pace with the latest technological trends in HPC and beyond. The new features in OpenMP 4.5 significantly improve the expressiveness of OpenMP on modern architectures and increase its applicability to complex application codes. This presentation describes the evolution of OpenMP and provides an overview of the major new elements, including support for accelerators/coprocessors/GPUs, SIMD extensions, doacross loops, taskloop, task priority, and thread affinity.
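Two of the features listed above can be sketched directly in C. The pragmas below are standard OpenMP (the `simd` directive and the 4.5 `taskloop` construct); compiled without OpenMP support they are simply ignored, so the functions also run correctly serially:

```c
#include <stddef.h>

/* 'omp simd' asks the compiler to execute the loop with SIMD lanes. */
void scale_simd(float *a, size_t n, float s)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        a[i] *= s;
}

/* 'omp taskloop' (new in OpenMP 4.5) carves the loop's iterations into
 * tasks; one thread creates the taskloop inside a parallel region and
 * the whole team executes the generated tasks. */
void add_taskloop(const long *a, const long *b, long *c, size_t n)
{
    #pragma omp parallel
    #pragma omp single
    #pragma omp taskloop
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Build with an OpenMP-enabled compiler (e.g. `icc -qopenmp` or `gcc -fopenmp`) to get the parallel behavior; without those flags the code degrades gracefully to serial execution.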
Slot 5: 4:00 – 5:00
Intel’s Machine Learning
Intel is committed to AI and is making major investments across technology, training, resources and R&D to advance AI for business and society. In this session, we cover the Intel portfolio – software and hardware solutions for machine learning in the data center, from general purpose (Xeon, Xeon Phi) to targeted silicon (FPGA and Nervana technology). At the edge, Intel also offers a portfolio of processors (Core, Atom, Joule, etc.) that utilize common intelligent APIs for distributed and collaborative intelligence.
Time: 9.00am – 5.00pm
Venue: Level 4, Matrix Building, Biopolis
Breaks:
Morning Tea Break – 10.30am – 11.00am
Lunch Break – 1.00pm – 2.00pm
Afternoon Tea Break – 3.30pm – 4.00pm
Presenter(s):
Patrick Wohlschlegel, Senior Engineering Manager, Enterprise & HPC Tools, ARM
The tutorials will increase your understanding of a range of time-saving techniques associated with these specialist tools for debugging, application analysis and code optimization, covering:
– General development best practices: cut down the time spent resolving one-off problems arising during code development
– Proactive optimization of HPC applications: lessons from the “Allinea Performance Roadmap”
– Failsafe automation: testing as part of your daily workflow to eradicate bugs and performance problems.
Time: 9.00am – 1.00pm
Venue: Level 4, Matrix Building, Biopolis
Breaks:
Morning Tea Break – 10.30am – 11.00am
Lunch Break – 1.00pm – 2.00pm
Presenter(s):
Jeremy Kerr, Open POWER Architect, IBM Australia
Abhisek Chatterjee, Program Director – OpenSource Technology Development & OpenPower Enablement, IBM Australia
Abstract:
The OpenPOWER project delivers the high-performance and highly parallel POWER processor architecture on an entirely open source software stack – including workload applications, the operating system, and even low-level system firmware.
This technically focused tutorial describes the evolution of the OpenPOWER hardware and software architecture, and our leading design principles.
We will explain the initialisation and management processes of OpenPOWER platforms, particularly in the context of high performance computing environments.
As OpenPOWER is specifically designed for Linux, we are able to make targeted optimisations for the Linux operating system and infrastructure.
We’ll cover the key details of the operating system implementation, and how it integrates with the platform firmware. In addition to implementing the base platform, we have been porting and optimising common HPC workloads and infrastructure to OpenPOWER.
The tutorial will provide detail on those efforts, and explain some of the deployments of those so far.
For those working with their own code, we’ll discuss some key guidelines and hints for running workloads efficiently on OpenPOWER, and the compilers and development tools that are available.
Time: 9.00am – 1.00pm
Venue: Matrix Building, Biopolis
Breaks:
Morning Tea Break – 10.30am – 11.00am
Lunch Break – 1.00pm – 2.00pm
Presenter(s):
Hui Liang, Lead Data Scientist, APJ Innovation Center, Hewlett Packard Enterprise, Singapore
Wei Ann Lim, Data Scientist, APJ Innovation Center, Hewlett Packard Enterprise, Singapore
Track 1: Machine Learning
Introduction to Machine Learning
Linear Regression
Logistic Regression
Clustering
Track 2: Text Analytics
Introduction to Text Analysis
Applications
Specific Techniques for Text Analysis
Track 3: HPC + Data Analytics
Introduction to Deep Learning
Application of Deep Learning in Natural Language Processing