This collection of material on High-Performance Data Analytics (HPDA) covers data cleaning, exploratory data analysis, modelling with machine learning and deep learning (AI), and scaling code up to run on High-Performance Computing (HPC) clusters.
As the amount of data generated every day keeps growing, data analytics has become one of the fastest-evolving fields of computer science. However, many engineers and researchers face hundreds of gigabytes of data that are often unstructured and complex, and this size and complexity make it difficult to extract useful information. In this bundle, you will learn how to prepare data and get an overview of its characteristics so that you can build meaningful models from it. Some techniques for making this process scalable on HPC architectures are also shown. The open-source programming languages R and Python are used throughout.
Skills to be gained:
- Understanding the theoretical background of exploratory data analysis and modelling
- Scaling data analysis to Big Data in R and Python
- Creating basic Machine and Deep Learning models in R and Python
- Deciding whether to use Machine or Deep Learning methods
- Building data processing pipelines for Machine or Deep Learning tasks
- Setting up and running data analysis in parallel on an HPC cluster with R and Python
- Parallelizing Machine and Deep Learning tasks across multiple compute nodes and/or multiple accelerators (GPUs)