With the amount of data created daily steadily increasing, data analytics has become one of the fastest-growing fields of computer science. However, many engineers and researchers are confronted with hundreds of gigabytes of data that are often unstructured and complex, which makes extracting information from them difficult. In this bundle, you can learn how to prepare data and get a general idea of its characteristics in order to build meaningful models from it. Additionally, some techniques for making this process scalable on HPC architectures are shown. For this purpose, the open-source programming languages R and Python are used.

- Understanding the theoretical background of exploratory data analysis and modelling
- Scaling data analysis for Big Data in R and Python
- Creating basic Machine and Deep Learning models in R and Python
- Deciding whether to use Machine or Deep Learning methods
- Building data processing pipelines for Machine or Deep Learning tasks
- Knowing how to set up and run data analysis in parallel on an HPC cluster with R and Python
- Parallelization of Machine and Deep Learning tasks to use multiple compute nodes and/or multiple accelerators (GPUs)
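The last two outcomes concern running analyses in parallel. As a minimal single-node sketch (our own illustration, not course material), Python's standard library can already express the split, analyse, and combine pattern; on an actual HPC cluster the same pattern is typically driven by tools such as Dask or mpi4py:

```python
# Minimal sketch: single-node parallel data analysis with Python's
# standard library. On a real HPC cluster, frameworks such as Dask or
# mpi4py would distribute the work across nodes; the chunking pattern
# stays the same.
from concurrent.futures import ProcessPoolExecutor

def summarize(chunk):
    """Compute partial statistics (count, sum, sum of squares) for one chunk."""
    n = len(chunk)
    s = sum(chunk)
    sq = sum(x * x for x in chunk)
    return n, s, sq

def parallel_mean_var(data, n_workers=4):
    """Split data into chunks, analyse them in parallel, combine the results."""
    size = max(1, len(data) // n_workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(summarize, chunks))
    n = sum(p[0] for p in parts)
    s = sum(p[1] for p in parts)
    sq = sum(p[2] for p in parts)
    mean = s / n
    var = sq / n - mean * mean
    return mean, var

if __name__ == "__main__":
    data = list(range(1, 101))           # toy "dataset"
    mean, var = parallel_mean_var(data)
    print(mean)                          # 50.5
```

The key point is that only small partial summaries travel between workers, not the raw data, which is what makes the pattern scale.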

Mathematical models describing real-world engineering problems (flow around turbine blades, wind turbulence around a vehicle or boat, turbine engine performance, weather prediction, and so on) are quite large due to the many unknowns involved. On top of that, they are time-dependent: the solution changes at every time step, every second or even more often. Solving such problems is not a simple task, and HPC technologies and infrastructures make it possible to perform large simulations and to reduce their runtime.

- be familiar with the theoretical background of Computational Fluid Mechanics
- be familiar with the most common discretization techniques of the Navier-Stokes equations (Finite Volume, Finite Element)
- be familiar with meshing concepts and possibilities within CFD packages

- Introduction to High-Performance Computing;
- Introduction to Continuum Fluid Mechanics;
- Introduction to Turbulence Modeling;
- Introduction to the Finite Volume method;
- Finite Volume Discretization Techniques of Differential Operators;
- Hands-on with OpenFOAM;
- Meshing 2D and convergence study;
- FEM and FVM: From von Kármán to Magnus;
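To give a flavour of the Finite Volume topics listed above, here is a minimal sketch (our own toy example, not taken from the hands-on material) of a conservative finite-volume update for 1D diffusion:

```python
# Minimal sketch: finite-volume discretization of 1D diffusion
# du/dt = d/dx(k du/dx) on a uniform grid. Each cell stores an average
# value; fluxes are evaluated at cell faces, so the scheme is
# conservative by construction.

def diffusion_step(u, k, dx, dt):
    """Advance the cell averages u by one explicit time step."""
    n = len(u)
    # Flux at each interior face: F = -k * (u_right - u_left) / dx
    flux = [-k * (u[i + 1] - u[i]) / dx for i in range(n - 1)]
    new = u[:]
    for i in range(n):
        left = flux[i - 1] if i > 0 else 0.0       # zero-flux (insulated) walls
        right = flux[i] if i < n - 1 else 0.0
        new[i] = u[i] - dt / dx * (right - left)
    return new

u = [0.0] * 10
u[5] = 1.0                     # initial "hot" cell
for _ in range(200):
    u = diffusion_step(u, k=1.0, dx=0.1, dt=0.002)
# With zero-flux boundaries the total amount sum(u) * dx is conserved,
# because every interior flux is added to one cell and subtracted from
# its neighbour.
```

Note the stability constraint of the explicit scheme: the diffusion number `k * dt / dx**2` (0.2 here) must stay below 0.5.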

One of the key advantages of using HPC is the possibility to parallelise a problem. In terms of product analysis, problem parallelisation means efficiently dividing a large problem into multiple smaller ones and analysing each of them separately, thus raising the level of detail and reducing the necessary calculation times. The three dominating programming models employed on today's modern HPC hardware are presented. On clusters and distributed-memory architectures, parallel programming with the Message Passing Interface (MPI) is the dominant programming model, whereas OpenMP can be used on shared memory (i.e., on one CPU or on the CPUs of one node of a cluster) and CUDA helps to exploit the capabilities of GPUs.

- Understand the main parallelisation principles
- Take advantage of shared and distributed memory systems as well as accelerators
- Write parallel programs using MPI, OpenMP and CUDA
- Parallelise serial programs by means of MPI, OpenMP and CUDA
- Combine MPI with OpenMP or MPI with CUDA
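As a taste of the decomposition idea behind these outcomes, the classic exercise computes π by numerical integration: each MPI rank integrates its own slice of the interval and the partial sums are combined with a reduction. The sketch below (our own, with hypothetical names such as `partial_pi`) mimics that decomposition with Python's standard-library multiprocessing on a single node:

```python
# Sketch of the classic "compute pi by numerical integration" exercise.
# In MPI each rank integrates its own slice of [0, 1] and the partial
# sums are combined with MPI_Reduce; here the same decomposition is
# shown with the standard-library multiprocessing module as a
# single-node, shared-memory analogue.
from multiprocessing import Pool

N = 100_000  # total number of integration intervals

def partial_pi(rank_and_size):
    """Integrate 4/(1+x^2) over the slice of [0,1] owned by this 'rank'."""
    rank, size = rank_and_size
    h = 1.0 / N
    s = 0.0
    for i in range(rank, N, size):   # round-robin distribution, as in MPI demos
        x = h * (i + 0.5)            # midpoint of interval i
        s += 4.0 / (1.0 + x * x)
    return h * s

if __name__ == "__main__":
    size = 4                         # plays the role of the MPI communicator size
    with Pool(size) as pool:
        parts = pool.map(partial_pi, [(r, size) for r in range(size)])
    print(sum(parts))                # close to 3.14159...
```

The per-rank function is identical to what each MPI process would run; only the launch and the reduction differ between the programming models.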

The need to use FEM software for product analysis suitable for today's competitive market is present at all levels. The gathered material presents tools for using HPC that make it possible to address problems of higher complexity and to analyse them in more detail. Since most higher-education FEM courses are taught using only one software package, typically a commercial one, becoming familiar with open-source software and different approaches is beneficial for broadening one's perspective.

- Be familiar with the workflow of using HPC
- Understand the theoretical backgrounds behind FEM analyses
- Understand the discretization of a problem in order to transfer from a real-life case to a numerical one
- Understand the parallelization principles when using HPC to solve problems
- Be familiar with several different approaches for conducting numerical analyses
- Have an overview of several different software packages for numerical analyses

- Introduction to the Finite Element Method (FEM);
- Implementation of FEM on HPC;
- HPC parallelization methods with emphasis on MPI;
- Linear vs nonlinear problems;
- GPUs for HPC;
- ESPRESO – Highly parallel finite element package for engineering simulations;
- Comparison of FEM with FVM applied to multiphysics cases;
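To illustrate what "Implementation of FEM" means at its smallest scale, here is a toy sketch (our own, unrelated to the ESPRESO package) of linear finite elements for a 1D Poisson problem, showing assembly of the tridiagonal stiffness matrix and a direct tridiagonal solve:

```python
# Toy example: linear finite elements for the 1D Poisson problem
# -u'' = 1 on (0, 1) with u(0) = u(1) = 0. Assembly of linear "hat"
# basis functions gives a tridiagonal stiffness matrix, solved here
# with the Thomas algorithm.

def fem_poisson_1d(n_elems):
    """Return the interior nodal values of the FEM solution."""
    h = 1.0 / n_elems
    n = n_elems - 1                    # number of interior nodes
    # Tridiagonal stiffness matrix from linear hat basis functions
    diag = [2.0 / h] * n
    off = [-1.0 / h] * (n - 1)
    rhs = [h * 1.0] * n                # load vector for f(x) = 1
    # Thomas algorithm: forward elimination ...
    for i in range(1, n):
        m = off[i - 1] / diag[i - 1]
        diag[i] -= m * off[i - 1]
        rhs[i] -= m * rhs[i - 1]
    # ... and back substitution
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - off[i] * u[i + 1]) / diag[i]
    return u

u = fem_poisson_1d(8)
# In 1D, this FEM solution reproduces the exact solution
# u(x) = x(1 - x)/2 at the nodes; u[3] is the node x = 0.5.
print(round(u[3], 6))   # 0.125
```

A parallel FEM code follows the same assemble-then-solve structure; the HPC part lies in distributing the mesh and replacing the direct solve with a parallel solver.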

Parallel programming describes breaking a larger problem down into smaller steps. Instructions are distributed to multiple processors, which execute the necessary calculations simultaneously – hence the name.

- Knowing how to set up and run data analysis in parallel on an HPC cluster with R and Python
- Write parallel programs using MPI, OpenMP and CUDA
- Parallelize serial programs by means of MPI, OpenMP and CUDA
- Combine MPI with OpenMP or MPI with CUDA
- Understand the discretization of a problem in order to transfer from a real-life case to a numerical one
- Parallelization of Machine and Deep Learning tasks to use multiple compute nodes and/or multiple accelerators (GPUs)

HPC helps engineers, data scientists, designers, and other researchers solve large, complex problems in far less time and at lower cost than traditional computing. A primary benefit is reduced physical testing: HPC can be used to create simulations, eliminating the need for many physical tests.

- Be familiar with the workflow of using HPC
- Understand the main parallelisation principles
- Take advantage of shared and distributed memory systems as well as accelerators
- Understand the parallelization principles when using HPC to solve problems
- Understanding the theoretical background of exploratory data analysis and modelling

Establish a generic modelling workflow.

Learn to use the recipes package for reproducible data preprocessing definitions.

Define model specifications with parsnip.

Evaluate model performance using yardstick.

Learn how to handle multiple models in a tidy format with workflowsets.


Interactive Modelling with R [pdf]

- What is R and when to consider using it?
- Basic data types
- Programming styles in R
- Very short introduction to the tidyverse

Introduction to R [pdf]

- Introduction
- New Generation Silicon
- Neural Networks
- Optimizations for Inference
- Summary

- Basic principles of Deep Learning
- TensorFlow/Keras API
- TensorBoard for visualization
- Data processing pipeline (Extract, Transform, Load)
- Support for multi-node/GPU (HPC clusters)
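The Extract, Transform, Load item above can be illustrated with a stdlib-only sketch (our own; the course itself uses TensorFlow/Keras tooling such as tf.data for this) of a streaming pipeline built from generators:

```python
# Conceptual sketch of the Extract-Transform-Load pattern. Frameworks
# such as tf.data implement the same stages (plus batching, shuffling
# and prefetching) efficiently; this stdlib-only version just shows the
# structure of a streaming pipeline built from generators.

def extract(source):
    """Extract: stream raw records from a source (here, an in-memory list)."""
    yield from source

def transform(records):
    """Transform: normalize each 8-bit value to the range [0, 1]."""
    for r in records:
        yield r / 255.0

def load(records, batch_size):
    """Load: group records into batches ready for training."""
    batch = []
    for r in records:
        batch.append(r)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch          # final, possibly smaller batch

raw = [0, 51, 102, 153, 204, 255]
batches = list(load(transform(extract(raw)), batch_size=4))
print(len(batches))          # 2 batches: 4 records + 2 records
```

Because every stage is a generator, records flow through one at a time and the full dataset never has to fit in memory, which is exactly the property the real frameworks exploit.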