# HLRA Lecture Notes

## High-Performance Computer Architectures Practical Course

### Introduction

The course introduces software development for high-performance computing (HPC) systems, using applications taken from research routines. The technologies used include C++, Unix-based systems, and parallel-programming concepts such as multi-threading and vectorization. Tutors are available to answer questions about the examples, and the course content may evolve based on student input and suggestions.

### Key Topics Covered

The course covers the following topics: an introduction to the Unix shell and C++, neural networks, parallelization methods, the Vector Class Library (Vc), OpenMP, Intel’s Threading Building Blocks (TBB), and the Open Computing Language (OpenCL).

## Chapter 1: Introduction - Unix Shell and C++

In this chapter, the Unix shell is defined as a command-line interface through which the user interacts with the operating system. Compared to graphical user interfaces (GUIs), it handles complex, repetitive, and large-scale tasks more efficiently. Unix was developed in 1969, and its foundational concepts live on in modern operating systems such as Linux and macOS, which are multi-tasking and multi-user systems. The course uses Linux as its preferred OS because it is open source and can be adapted to the specific needs of HPC.

### Terminal and Shell Commands

The terminal is used to compile and execute C++ applications. Key commands include `pwd` (print the current working directory), `cd` (change directory), and `ls` (list directory contents). A typical workflow uses these commands to navigate through directories.

### C++ Basics

C++ is valued in scientific computing for its efficiency and its low-level access to hardware. Bjarne Stroustrup began developing it in 1979 as an extension of C with object-oriented features ("C with Classes"); it was renamed C++ in 1983 and first released commercially in 1985.
Since then, C++ has been extended periodically with features such as templates, smart pointers, and lambda expressions.

### Learning Resources for C++

Recommended online resources for learning C++ include [cppreference.com](https://cppreference.com/) as a comprehensive reference and the [CppCoreGuidelines](https://isocpp.github.io/CppCoreGuidelines/) for best practices.

### Foundations of Programming

A computer executes programs: sequences of instructions, designed to perform specific tasks, that the machine interprets. Understanding binary logic is important because it underpins how computers operate, while high-level languages simplify programming by abstracting away from it.

### Compiling C++ Source Code

Compilation translates high-level source code into an executable binary; the recommended compilers are g++ and clang++. A minimal C++ program consists of the following code:

```cpp
int main() { return 0; }
```

The `main` function is the entry point where execution starts. A "Hello World!" program looks as follows:

```cpp
#include <iostream>  // provides std::cout and std::endl

int main() {
    std::cout << "Hello world!" << std::endl;
    return 0;
}
```

### Primitive Data Types in C++

C++ provides several primitive data types. Their exact sizes are partly implementation-defined; on common 64-bit platforms they are: bool (1 byte, although it only holds `true` or `false`), char (8 bits), int (32 bits), float (32 bits), and double (64 bits).

### Types and Casting

C++ is statically typed: the type of every expression must be known at compile time. Explicit conversions are written with cast operators such as `static_cast` and `dynamic_cast` (the language also provides `const_cast` and `reinterpret_cast`).

### Pointers in C++

A pointer is a variable that stores the memory address of another variable. The address is obtained with the `&` operator, and the pointed-to value is accessed by dereferencing the pointer with the `*` operator. References provide aliases for existing variables; passing large objects by reference instead of by value improves performance by avoiding unnecessary copies.

### Const Correctness

Using `const` for references and parameters helps prevent unintended modifications.

### Control Flow

Control structures such as if statements, loops, jumps, and exception handling are essential for managing the flow of a program.
### Stack vs. Heap Allocation

Memory is divided between the stack, which is managed automatically, and the heap, which must be managed manually with the `new` and `delete` operators (or, in modern C++, through smart pointers such as `std::unique_ptr`).

### Classes and Structs

Classes and structs both define new types; members of a class are private by default, while members of a struct are public by default. C++ templates allow developers to write type-agnostic functions and classes, which increases flexibility and reusability.

### Multi-File Projects and Build Tools

CMake is a widely used tool for building and managing multi-file C++ projects.

## Chapter 2: Neural Networks

Neural networks, or artificial neural networks (ANNs), are loosely modeled on the human brain and learn patterns from data, which makes them well suited to tasks such as image recognition and language processing. A widely used ANN architecture is the multilayer perceptron (MLP), which consists of an input layer, one or more hidden layers, and an output layer of interconnected nodes. Training minimizes the discrepancy between predicted and actual outputs by adjusting the weights. Forward propagation passes the inputs through the layers, transforming them with the weights and activation functions, and the loss function measures the resulting error, which guides the training process. Backpropagation efficiently computes the gradient of the loss with respect to every weight, and gradient descent then updates each weight by a step against its gradient, scaled by the learning rate.

### Important Key Words and Parameters

Essential concepts include epochs, batch size, learning rate, underfitting versus overfitting, and various optimization techniques.

## Chapter 3: Parallelization Methods

Moore’s Law is the observation that the number of transistors on an integrated circuit doubles roughly every 1.5 to 2 years. Because single-core clock rates have largely stagnated, these additional transistors increasingly go into more CPU cores, so applications must be written for multi-core execution to benefit from newer hardware.
Flynn’s taxonomy classifies computer architectures by their instruction and data streams into four categories: SISD (single instruction, single data), SIMD (single instruction, multiple data), MISD (multiple instruction, single data), and MIMD (multiple instruction, multiple data). Parallelization splits a task into smaller sub-tasks that are executed simultaneously on multiple processors; in addition, SIMD instructions in C++ speed up computation-heavy applications by processing several data elements with a single instruction.

## Chapter 4: Vector Class Library Vc

The Vc library simplifies SIMD optimization: it improves performance while keeping code portable across architectures. A standardized SIMD type based on Vc’s design (`std::simd`) is anticipated for inclusion in the C++26 standard. The Vc library itself, developed under the VcDevel project, offers SIMD functionality beyond the standard options.

## Chapter 5: OpenMP

OpenMP is a portable API for parallel programming on shared-memory architectures. It requires a compiler that supports its directives (for example, GCC and Clang with the `-fopenmp` flag). The key directives create parallel regions, distribute loop iterations among threads, and protect critical sections. Efficient load balancing is crucial for the performance of parallel code.

## Chapter 6: Intel’s Threading Building Blocks (TBB)

TBB is a C++ library that simplifies parallel programming by providing higher-level abstractions than raw threads. It helps developers write high-performance code that makes effective use of multi-core processors and reduces bottlenecks in parallel applications. Advanced features such as flow graphs provide a framework for expressing parallel tasks and the dependencies between them.

## Chapter 7: Open Computing Language (OpenCL)

OpenCL is a standard for cross-platform parallel programming on diverse computing devices such as CPUs, GPUs, and FPGAs. Its programming model rests on the platform, execution, and memory models, which together describe how OpenCL programs operate. The chapter lists the key API functions for device management and memory operations.
