This course on High-Performance Data Analytics (HPDA) covers data cleaning, exploratory data analysis, modelling with machine and deep learning/AI, and scaling code up to High-Performance Computing (HPC) clusters. It draws on the expertise of HPC specialists from four European countries. The course will take place from June 26 to 30, 2023, at IT4Innovations National Supercomputing Center, VSB – Technical University of Ostrava, Czech Republic. Participants are expected to arrive in Ostrava on Sunday, 25th June 2023.
- Registration is open until 14/05/2023.
- Participants will be selected on 22/05/2023.
- From among the applicants, 10 participants per partner institution (CINECA consortium/Italy, University of Ljubljana/Slovenia, TU Wien/Austria and IT4Innovations, VSB-TUO/Czech Republic) will be selected. All travel and accommodation costs will be fully covered for the selected participants. Therefore, please inform us immediately if your plans change and you would like to withdraw your application.
- If you are ineligible for funding but willing to participate at your own expense, you are welcome to join us.
Description: With ever more data created daily, data analytics is one of the fastest-developing fields of computer science. However, many engineers and researchers are confronted with hundreds of gigabytes of data that are often unstructured and complex, which makes extracting information from them difficult. In this course, participants will learn how to prepare data and get a general idea of its characteristics in order to build meaningful models from it. Additionally, some techniques for making this process scalable on HPC architectures will be shown. For this purpose, we will use the open-source programming languages R and Python.
Target audience: The program is intended for students and others interested in High-Performance Data Analytics and Machine and Deep Learning who would like to expand their knowledge of exploratory data analysis and modelling for real-life problems that require HPC resources. The number of funded participants per partner institution (CINECA consortium/Italy, University of Ljubljana/Slovenia, TU Wien/Austria and IT4Innovations, VSB-TUO/Czech Republic) is limited to 10.
Prerequisite knowledge: Participants should be familiar with the basics of statistics and data analysis. Furthermore, they should be able to work with Linux and have basic knowledge of programming. Basic knowledge of Python and R would be an advantage, but it is not required. No specific experience with supercomputing systems is necessary.
Workflow: The course will take place as an in-person event, using SSH or VNC remote connections to HPC clusters hosted at IT4Innovations, VSB-TUO. Participants are expected to bring their own laptops to the event. Several software tools for working with Big Data and Machine and Deep Learning will be demonstrated.
Skills to be gained: By the end of the course, participants will have gained competencies in the following:
- Using a Linux-based HPC environment
- Understanding the theoretical background of exploratory data analysis and modelling
- Scaling data analysis to Big Data in R and Python
- Creating basic Machine and Deep Learning models in R and Python
- Deciding whether to use Machine or Deep Learning methods
- Building data processing pipelines for Machine or Deep Learning tasks
- Setting up and running data analysis in parallel on an HPC cluster with R and Python
- Parallelizing Machine and Deep Learning tasks across multiple compute nodes and/or multiple accelerators (GPUs)