In-Depth Notes on Parallel Computing and Architecture
Architecture and Communication
Classification of Parallel Architectures
- Flynn's Classification: Categorization of parallel architectures based on instruction and data streams.
- Category Types:
- SISD (Single Instruction Single Data): 1 instruction stream, 1 data stream (conventional single-processor sequential machines).
- SIMD (Single Instruction Multiple Data): 1 instruction stream, multiple data streams (vector computers, associative processors).
- MISD (Multiple Instruction Single Data): multiple instruction streams, 1 data stream (often identified with pipelined processors; the class has few practical examples).
- MIMD (Multiple Instruction Multiple Data): multiple instruction and data streams (loosely coupled/decoupled MIMD, shared-bus MIMD).
Non-Von Neumann Architectures
Dataflow Architecture
- Definition: Neither a Von Neumann nor a Harvard architecture; well suited to repetitive tasks that apply the same algorithm to a stream of data.
- Characteristics: Has no program counter; an instruction executes as soon as its operands are available (data-driven execution).
- Usage: Common in simulations where data flows cyclically through a process.
- Implementation:
- Program written in a functional language.
- Translated into a dataflow graph where nodes are instructions and edges represent data dependencies.
- Executes the graph on hardware dedicated to dataflow processing.
Reduction Computer
- Definition: A machine that discovers exploitable parallelism autonomously.
- Functionality: Works on expressions, finds subexpressions that are ready to compute, and repeatedly reduces them using a functional representation until only the result remains.
- Example:
- Expression: (x*y)+(x-y)
- Transformation and Reduction Steps:
- Transform the expression to prefix notation: (+ (* x y) (- x y)).
- Substitute x=3 and y=2; the subexpressions (* 3 2) and (- 3 2) are both reducible and can be reduced in parallel, giving (+ 6 1), which reduces to 7.
Tree Architecture
- Structure: Processing nodes perform the calculations and are linked together as a binary tree.
- Calculation Phases:
- Partition the expression's nodes into reducible segments and distribute them to the processors.
- Execute the reductions simultaneously across the processors.
- Shift partial results up the tree and manage the final result.
Von Neumann Architectures
VLIW (Very Long Instruction Word)
- Nature: SISD-style architecture with a single program counter; the very long instruction words are fetched and issued sequentially.
- Characteristics:
- Each instruction word is sizeable and bundles operations for multiple functional units, which execute in parallel.
- The compiler schedules the operations into the words statically, which enables high performance but handles conditional jumps poorly (branches limit what can be bundled).
MISD (Multiple Instruction Single Data)
- Functionality: A single data set is processed by multiple instruction streams (e.g., the stages of a pipelined processor).
SIMD (Single Instruction Multiple Data)
- Characteristics: Suited to vector processing; a single control unit drives multiple processing elements.
- Memory Access Variants:
- NUMA (Non-Uniform Memory Access): Each processor has a local memory and slower access to global memory.
- UMA (Uniform Memory Access): No distinction between local and global memory accesses, creating uniform latencies.
Communication in Parallel Computing
PVM (Parallel Virtual Machine)
- Definition: A software layer that makes a dynamically changing set of workstations appear as a single parallel virtual machine, managing process groups across them.
- Features:
- Portability and heterogeneity: the participating machines may have different processor architectures.
- Dynamic process management and fault recovery.
MPI (Message Passing Interface)
- Definition: A standard for developing parallel applications across distributed systems.
- Communication Types:
- Point-to-point interactions (send/receive).
- Collective communication operations (broadcast/reduce).
- Initialization and Finalization:
  MPI_Init(&argc, &argv);   /* must precede all other MPI calls */
  MPI_Finalize();           /* no MPI calls are allowed afterwards */
Interaction and Synchronization
Synchronization
- Purpose: Ensures correct order and timing of event executions between processes.
- Mechanisms:
- Message passing and shared memory access.
- Notable Algorithms:
- Semaphore-based solutions for resource management, preventing race conditions (and, when used correctly, deadlocks).
Mutual Exclusion
- Problem: Allowing only one process to enter the critical section at a time.
- Solutions:
- Dekker's Algorithm: The first known correct software solution for two processes; guarantees mutual exclusion without starvation.
- Peterson's Algorithm: A simpler two-process solution; guarantees mutual exclusion, freedom from deadlock, and bounded waiting.
Process Interaction in Parallel Systems
- Processes communicate via two main techniques: message passing and shared memory.
- Distributed systems often favor message passing for inter-process communications.
Parallel Algorithm Analysis
- Parameters:
- The number of processors p(n) and the required processes are specified.
- Time complexity t(n), cost c(n) = p(n) * t(n), and efficiency (the ratio of the best sequential time to the cost) must be evaluated; an algorithm is cost-optimal when c(n) matches the best sequential running time.
Sorting Algorithms
- Discusses numerous sorting algorithms, such as:
- Enumeration Sort: Very fast in parallel (each element's rank can be computed independently), but its total operation count makes it non-optimal compared with sequential sorts.
- Odd-Even Transposition Sort: Efficient for parallel processing; sorts n elements in O(n) parallel phases with n processors.
- Merge Sort Variants: Explored as optimal with respect to their operation volumes (cost-optimal).
Proofs and Complexity Results
- Each algorithm must have its operational complexity validated (both time and cost).
- Examination of algorithmic efficacy across different systems is essential for thorough understanding.
Tree Structures and Traversals
- Explores the properties of tree structures for algorithmic implementations, including traversals such as the Euler tour and preorder.
- Focuses on efficient handling of node weights and node management in complex data structures for effective tree processing.