Parallel Final Notes

Session 16: Parallel Computer Architecture

Parallel Computer Architecture → The method of organizing all the resources to maximize performance and programmability within the limits given by technology and cost at any point in time

Uniform Memory Access → Model where all the processors share the physical memory uniformly

  • They each have equal access time to all memory
  • Symmetric Multiprocessor → When all the processors have equal access to all the peripheral devices in a system
  • Asymmetric Multiprocessor → When only one or a few processors can access the peripheral devices in a system
  • Peripheral Device → Can transfer data to or from memory without involving the processor
    • Ex) printer, mouse, scanner, keyboard

Non-Uniform Memory Access → Model where memory access time varies with the location of the memory itself

  • Local Memory → When memory is physically distributed among all the processors
  • Each processor has its own local memory, which it can access faster than non-local memory

Distributed Memory Multicomputer System → Consists of multiple computers (nodes) that are interconnected by a message passing network

  • Each node acts autonomously, possessing a processor, local memory, and I/O devices
    • All local memories are private and only accessible to the local processors

Pipelining → A technique that divides a task into smaller subtasks and assigns them to different processing units that can work concurrently

  • Improves performance and efficiency
  • Steps:
  1. Fetch → Fetch the next instruction to be executed from memory
  2. Decode → Decode the instructions
  3. Execute → Perform the actual operation given by the instructions
  4. Write Back → Write the result to the target register
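
The steps above can be visualized with a minimal Python sketch (hypothetical instruction names; no hazards or stalls assumed) showing several instructions occupying different stages in the same cycle:

```python
# A minimal sketch of a 4-stage instruction pipeline (assumes no hazards or stalls).
STAGES = ["Fetch", "Decode", "Execute", "WriteBack"]
instructions = ["I1", "I2", "I3", "I4"]

# In cycle c, instruction i (0-based) occupies stage (c - i) if that index is valid.
total_cycles = len(instructions) + len(STAGES) - 1
for cycle in range(total_cycles):
    active = []
    for i, instr in enumerate(instructions):
        stage = cycle - i
        if 0 <= stage < len(STAGES):
            active.append(f"{instr}:{STAGES[stage]}")
    print(f"cycle {cycle + 1}: " + ", ".join(active))
# 4 instructions finish in 7 cycles instead of 16 sequential stage-slots.
```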

Parallelism by Multiple Functional Units:

  • The number of functional units that can efficiently be utilized is restricted by data dependencies between neighboring instructions
  • Superscalar Processors → Dependencies are determined dynamically at runtime by the hardware, and decoded instructions are dispatched to instruction units using dynamic scheduling

Parallelism at Process or Thread Level → System where each core of a multicore processor must obtain a separate flow of control

  • Each core accesses the same memory, and the cores may share caches
    • Memory accesses must be coordinated

Parallelism in Hardware:

  1. Data Parallelism → Increases the amount of data operated on at the same time
  2. Processor Parallelism → Increases the amount of processors
  3. Memory System Parallelism → Increases the number of memory units and increases communication bandwidth
  4. Communication Parallelism → Increases the amount of interconnections between elements and increases communication bandwidth

Dataflow Architectures → Architecture that processes data based on the availability and dependencies of the data rather than the sequential order of instructions

  • Hard to build correctly

Coherence → Implies that writes to a location become visible to all processors in the same order

  • Implemented by a hardware protocol based on a memory consistency model

Sequential Consistency → Ensures that the order in which operations are executed by different processes in the system appears to be consistent with a global order of execution, and that this order is also consistent with the order of operations on each individual process

  • Conditions:
  1. Each process issues memory operations in program order
  2. The issuing process waits for the write to be complete before issuing the next operation
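
A small Python sketch of the idea: enumerating every interleaving of two hypothetical two-operation programs that respects program order shows which outcomes sequential consistency allows (the outcome where both reads return 0 never occurs):

```python
from itertools import combinations

# Two processes, each issuing its operations in program order (condition 1).
P1 = [("write", "x", 1), ("read", "y", "r1")]
P2 = [("write", "y", 1), ("read", "x", "r2")]

def interleavings(a, b):
    """All merges of a and b that preserve each list's internal order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        merged, ai, bi = [], 0, 0
        for i in range(n):
            if i in positions:
                merged.append(a[ai]); ai += 1
            else:
                merged.append(b[bi]); bi += 1
        yield merged

outcomes = set()
for schedule in interleavings(P1, P2):
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, loc, arg in schedule:
        if op == "write":
            mem[loc] = arg
        else:
            regs[arg] = mem[loc]
    outcomes.add((regs["r1"], regs["r2"]))

print(outcomes)  # (0, 0) never appears: at least one write precedes both reads
```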

ACID Transactions:

  1. Atomicity → Entire transaction takes place at once
  2. Consistency → Database must be consistent before and after the transaction
  3. Isolation → Multiple transactions occur independently without interference
  4. Durability → Changes made by a successful transaction persist even if there is a system failure
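
A minimal sketch of atomicity using Python's built-in sqlite3 module (hypothetical accounts table; either both updates commit or neither does):

```python
import sqlite3

# In-memory database; transfer between two hypothetical accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # one transaction: both updates commit together or not at all
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
        # raise RuntimeError("simulated failure")  # uncomment: both updates roll back
except RuntimeError:
    pass

print(dict(conn.execute("SELECT name, balance FROM accounts")))  # {'alice': 70, 'bob': 30}
```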

Session 17: Distributed Memory Systems

Distributed Memory Systems → A form of memory architecture where physically separated memory can be addressed as a single shared address space

  • The same physical address on two processors refers to the same location in memory
  • Page Based Approach → Uses virtual memory to map pages of shared data to the local memory of each processor
  • Shared Variable Approach → Uses routines to access shared variables that are distributed across the processors
  • Object Based Approach → Uses objects as the units of data distribution and access with each object having a set of methods that can be performed on the processors
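
As a loose single-machine analogy for the shared-variable approach above (real distributed shared memory spans separate machines), the Python sketch below accesses a shared counter only through library-provided routines and explicit coordination; the names are illustrative:

```python
from multiprocessing import Process, Value, Lock

# Worker processes access a shared counter through library routines rather than
# plain local memory, loosely mirroring the shared-variable approach.
def worker(counter, lock, n):
    for _ in range(n):
        with lock:                 # coordination a DSM runtime would otherwise provide
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)        # shared integer, accessible from all processes
    lock = Lock()
    procs = [Process(target=worker, args=(counter, lock, 1000)) for _ in range(4)]
    for p in procs: p.start()
    for p in procs: p.join()
    print(counter.value)           # 4000
```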

Session 18: Interconnection Network Design

Components of Interconnection Networks:

  1. Links → A cable of one or more optical fibers or electrical wires that carries analog signals from one end to the other, from which the original digital information is recovered
  2. Switches → Composed of a set of I/O ports, an internal crossbar connecting inputs to outputs, internal buffers, and control logic that effects the I/O connection at each point in time
  3. Network Interfaces → Format packets and construct the routing and control information
  • May perform end-to-end error checking and flow control

Topology → The pattern to connect the individual switches to other elements like processors, memories, and other switches

Types of Networks:

  1. Direct Connection Networks → Fixed point-to-point connections between neighboring nodes
  • Fixed topology
  • Rings, meshes, cubes
  2. Indirect Connection Networks → Communication topology can be changed dynamically based on the application demands
  • Bus networks, multistage networks, crossbar switches

Routing → Determines which of the possible paths from source to destination are used as routes and how the route followed by each particular packet is determined

Dimension Order Routing → Limits the set of legal paths so that there is exactly one route from each source to each destination

Deterministic Routing → The route taken by a message is determined exclusively by its source and destination and not by other traffic in the network

Minimal Routing Algorithm → Selects the shortest route toward the destination of the message
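
A minimal sketch of dimension-order (XY) routing on a 2D mesh, assuming nodes are addressed by (x, y) coordinates: the x dimension is corrected first, then the y dimension, yielding exactly one minimal, deterministic route per source/destination pair:

```python
# Dimension-order (XY) routing on a 2D mesh.
def xy_route(src, dst):
    """Return the list of mesh nodes visited from src to dst (inclusive)."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:                      # correct the x dimension first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                      # then the y dimension
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 3)))
# [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)] -- deterministic and minimal
```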

Domain Name System (DNS) → Translates domain names (e.g., nytimes.com) to IP addresses so browsers can load internet resources

IP Address → A unique identifier for a device connected to the internet; it allows browsers and other clients to reach that device

DNS Recursor → A server designed to receive queries from client machines through web browsers

  • Also responsible for making additional requests to satisfy the query

Root Nameserver → First step in translating human-readable host names into IP addresses

Top Level Domain (TLD) Nameserver → The next step in the search for a specific IP address; it hosts the last portion of a hostname (for example.com, the TLD server is the com server)

Authoritative Nameserver → If it has access to the requested record, it will return the IP address for the requested hostname to the DNS recursor
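
The whole chain above is hidden behind the operating system's resolver; a minimal Python sketch (requires network access, and example.com is just an illustrative hostname):

```python
import socket

# The OS resolver (backed by the recursor / root / TLD / authoritative chain)
# turns a hostname into IP addresses.
infos = socket.getaddrinfo("example.com", 443, proto=socket.IPPROTO_TCP)
for family, _, _, _, sockaddr in infos:
    print(family.name, sockaddr[0])   # prints the address family and IP of each result
```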

Transmission Control Protocol (TCP) → Provides reliable, ordered, and error-checked delivery of a stream of bytes between applications running on hosts communicating via an IP network

User Datagram Protocol (UDP) → Communications protocol that is used to establish low latency and loss tolerating connections between applications on the internet

  • Speeds up transmission by enabling data transfer before the receiving party provides an agreement (no handshake)
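
A minimal Python socket sketch contrasting the two (the addresses are placeholders and the connection calls are commented out so it runs offline):

```python
import socket

# TCP: connection-oriented byte stream (handshake, ordering, retransmission).
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# tcp.connect(("example.com", 80))        # handshake happens here
# tcp.sendall(b"GET / HTTP/1.0\r\n\r\n")  # bytes arrive in order or not at all

# UDP: connectionless datagrams (no handshake, no delivery guarantee).
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# udp.sendto(b"ping", ("203.0.113.1", 9999))  # fire-and-forget datagram

tcp.close(); udp.close()
```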

Open Systems Interconnection (OSI) Model → Conceptual model that enables diverse communication systems to communicate using standard protocols

Session 19: Distributed Systems

Distributed System → A collection of interconnected computers that work together to achieve a common goal

  • Designed to process and store data, perform computations, and provide services across multiple machines
  • Offer solutions to challenges of scalability, fault tolerance, and performance

Decentralization → Reduces single points of failure and enhances fault tolerance

Heterogeneity in Distributed Systems:

  • Distributed systems often consist of diverse hardware and software components
    • Components vary in processing power, operating systems, and programming languages
  • Can lead to compatibility and communication issues

Concurrency of Distributed Systems → Distributed systems leverage concurrency and parallelism to improve performance and throughput

Redundancy → Multiple copies of data or services are maintained to ensure availability in case of failures

Fault Tolerance → Mechanisms that include replication and data recovery techniques

Challenges and Considerations in Distributed Systems:

  1. Communication Challenges
  • Network latency, packet loss, and the need for reliable communication protocols
  2. Consistency and Data Synchronization
  • Distributed systems often employ techniques like distributed transactions, quorum systems, and consensus algorithms to maintain data integrity
  3. Security and Authentication
  • Security is critical in distributed systems, requiring authentication, authorization, and encryption components
  4. Scalability and Load Balancing
  • Load balancing is the distribution of work evenly across nodes to ensure optimal utilization (a minimal round-robin sketch follows this list)
    • Complex in large scale systems
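
A minimal round-robin load-balancing sketch (backend names are placeholders):

```python
from itertools import cycle

# Requests are handed to backend nodes in rotation.
backends = ["node-a", "node-b", "node-c"]
next_backend = cycle(backends)

def dispatch(request_id):
    target = next(next_backend)
    print(f"request {request_id} -> {target}")

for rid in range(7):
    dispatch(rid)
# request 0 -> node-a, 1 -> node-b, 2 -> node-c, 3 -> node-a, ...
```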

Client Server Architecture → Clients request services or resources from central servers, which handle data processing and storage

Peer to Peer Architecture → Allows distributed nodes (peers) to act as both clients and servers where peers share resources directly without a central server

Microservices Architecture → Where an application is composed of small, independent services that focus on a specific function and communicate through APIs

  • Common in cloud based applications

Distributed Storage Systems → NoSQL databases such as Cassandra and MongoDB, and the Hadoop Distributed File System (HDFS) for big data, all use distributed storage

Advantages of Distributed Systems:

  1. Provide horizontal scalability to handle increased workloads
  2. Using parallelism enhances performance
  3. Withstand system failures
  4. Redundancy provides fault tolerance
  5. High availability through load balancing
  6. Enable applications to serve users worldwide

Content Delivery Network (CDN) → Uses geographical distribution to reduce latency and improve user experience

Session 22: Big Data, Hadoop, Spark, & The Cloud

Big Data → Data that contains greater variety arriving in increasing volumes with ever higher velocity

  • 3 Vs → Variety, volume, velocity

Velocity:

  • 90% of the world’s data has been created in the last two years

Variety:

  • 80% of data is unstructured
  • 20% is structured

Advanced Analytic Capabilities:

  1. data/text mining
  2. machine learning
  3. pattern matching
  4. forecasting
  5. semantic analysis
  6. sentiment analysis
  7. network and cluster analysis
  8. multivariate statistics
  9. neural networks

Hadoop → A technology that gives companies the ability to store and process huge amounts of data

  • Distributed File System
  • How it works:
  1. Every file in the system is split into several blocks
  2. Each block is stored in different data nodes
  3. Each block is replicated 3 times (by default) to recover in case of data loss
  • Blocks help store large files and provide fault tolerance
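
A back-of-the-envelope sketch of the block math (assuming the common 128 MB default block size and the default replication factor of 3; the file size is hypothetical):

```python
import math

# How a file is split into HDFS blocks and replicated across the cluster.
BLOCK_SIZE_MB = 128     # assumed default block size
REPLICATION = 3         # default replication factor

file_size_mb = 1000     # hypothetical 1 GB file
blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
stored_copies = blocks * REPLICATION

print(f"{blocks} blocks, {stored_copies} block replicas across the cluster")
# 8 blocks, 24 block replicas
```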

Apache Spark → A multi-language engine for executing data engineering, data science, and machine learning on single node machines or clusters

  • An evolution of the MapReduce model
  • Quickly performs processing tasks across multiple computers
  • Components:
  1. Driver → Converts the user’s code into multiple tasks that can be distributed across worker nodes
  2. Executors → Run on the worker nodes and execute the tasks assigned to them
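
A minimal PySpark word-count sketch (assumes the pyspark package is installed; the driver builds the job and Spark turns it into tasks executed on the workers):

```python
from pyspark.sql import SparkSession

# The driver program: builds the computation below; executors run the tasks.
spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.parallelize(["big data tools", "spark and hadoop", "spark clusters"])
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

print(counts.collect())   # e.g. [('spark', 2), ('big', 1), ...]
spark.stop()
```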

Session 24: Design Patterns

Design Pattern → A reusable solution to a commonly occurring problem within a given context in software design

  • A template/blueprint that can be applied to solve a particular design problem in a way that is effective and efficient

Creational Patterns → Deal with the process of object creation

Structural Patterns → Focus on the composition of classes or objects

Behavioral Patterns → Define the ways in which objects interact and communicate with each other
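
Minimal Python sketches of one illustrative pattern per category (Singleton, Adapter, Observer); the class names are hypothetical:

```python
# Creational: Singleton -- a class with exactly one shared instance.
class Config:
    _instance = None
    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

# Structural: Adapter -- wrap an incompatible interface behind the expected one.
class LegacyPrinter:
    def print_text(self, text): print(text)

class PrinterAdapter:
    def __init__(self, legacy): self._legacy = legacy
    def write(self, text): self._legacy.print_text(text)   # expected interface

# Behavioral: Observer -- subscribers are notified when the subject changes.
class Subject:
    def __init__(self): self._observers = []
    def subscribe(self, fn): self._observers.append(fn)
    def notify(self, event):
        for fn in self._observers: fn(event)

assert Config() is Config()                 # same instance every time
PrinterAdapter(LegacyPrinter()).write("hi") # adapted call
s = Subject(); s.subscribe(print); s.notify("changed")
```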

Session 25: P Problems & NP Problems

Algorithmic Complexity → Refers to the efficiency of algorithms in terms of time and space requirements

  • Assesses how the performance of an algorithm scales with an increase in input size

Efficient Algorithms → Crucial for solving real world problems within reasonable time frames and resource constraints

Complex Problems → Often require exponential time to solve, making them impractical for large datasets

  • Balancing accuracy and efficiency becomes a significant challenge

Polynomial (P) Problems → Problems that are solvable in polynomial time

  • They have a predictable execution time related to the input size
  • Ex) Quick sort
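
A short quick sort sketch for reference (average-case O(n log n)):

```python
# Quick sort: partition around a pivot, recurse on the two sides.
def quick_sort(items):
    if len(items) <= 1:
        return items
    pivot = items[len(items) // 2]
    left = [x for x in items if x < pivot]
    middle = [x for x in items if x == pivot]
    right = [x for x in items if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)

print(quick_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```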

Non Deterministic Polynomial (NP) Problems → Problems whose candidate solutions can be verified in polynomial time

  • While a solution may not be found quickly, a proposed solution’s validity can be checked in polynomial time
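
A sketch of the verification idea, using subset sum as an illustrative example: finding a subset that sums to the target may be hard, but checking a proposed subset (the certificate) is fast:

```python
# Verify a proposed subset-sum solution in polynomial time.
def verify_subset_sum(numbers, target, certificate):
    """certificate is a claimed subset of `numbers` summing to `target`."""
    remaining = list(numbers)
    for x in certificate:
        if x not in remaining:         # must actually come from the input
            return False
        remaining.remove(x)
    return sum(certificate) == target  # quick check, no search required

print(verify_subset_sum([3, 34, 4, 12, 5, 2], 9, [4, 5]))  # True
```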

Decision Problems → Involve determining a binary outcome based on the input

  • Ex) Is there a path between two nodes?

Optimization Problems → Seek the best solution from a set of feasible solutions

  • Ex) Finding the shortest path between two nodes

Reducing Execution Time → Parallel algorithms can significantly reduce overall computation time for certain problems

Complex Problem Considerations:

  1. Scalability → Algorithm must ensure efficiency as the problem size and available resources increase
  2. Synchronization → Algorithm must be able to coordinate parallel tasks without causing bottlenecks

P-Complete Class → A class of problems in P that are as hard as the hardest problems in P (every problem in P reduces to them)

  • An efficient parallel (NC) algorithm for one P-Complete problem would imply one for all problems in P

NP-Complete Class → A class of problems in NP to which every problem in NP can be reduced in polynomial time

  • If a polynomial time algorithm exists for one NP complete problem, it implies a polynomial time algorithm for all problems in NP

P-Complete Significance → If any P-Complete problem has an efficient parallel (NC) solution, it implies P equals NC

NP-Complete Significance → If any NP-Complete problem has a polynomial time solution, P equals NP

Nick’s Class (NC) → Represents problems efficiently solvable in parallel, emphasizing low depth circuits

P-Completeness in Parallel Computing → A class of problems addressing the complexity of parallel algorithms

Parallelism in P → Many problems in P can be efficiently solved in parallel, though P-Complete problems are believed to be inherently sequential

Parallelism in NP → More complex

  • Challenges:
  1. NP problems are more complex as they lack known polynomial time solutions
  2. Efficient parallel algorithms involve communication between processors
  • Minimizing communication overhead while maintaining efficiency is difficult
  3. Synchronizing parallel tasks so that they do not interfere with each other is difficult
  • Approaches:
  1. Divide and Conquer → Parallel computing divides complex problems into smaller, manageable tasks, processed concurrently (see the sketch after this list)
  2. Heuristic Approaches → Develop parallel algorithms that use heuristic methods to find approximate solutions quickly
  • Sacrifice optimality for speed
  3. Parallelization of Special Cases → Tailoring algorithms to exploit the structure of specific instances can enhance parallel performance
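
A minimal divide-and-conquer sketch using Python's multiprocessing module: the input is split into chunks, the partial sums are computed concurrently, and the results are combined:

```python
from multiprocessing import Pool

# Split the input into chunks, sum them in parallel worker processes, combine.
def chunk_sum(chunk):
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_chunks = 4
    size = len(data) // n_chunks
    chunks = [data[i * size:(i + 1) * size] for i in range(n_chunks)]

    with Pool(processes=n_chunks) as pool:
        partials = pool.map(chunk_sum, chunks)   # chunks processed concurrently

    print(sum(partials))   # same result as sum(data), computed in parallel
```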

Quantum Computing → A type of computing that uses quantum mechanical phenomena (superposition and entanglement) to perform operations on data

  • Superposition → Means that a quantum bit can exist in a combination of two states at the same time
  • Entanglement → Means that two or more bits can share a quantum state and influence each other even when they are far apart
  • Superposition and entanglement allow quantum computers to perform parallel computations to solve complex problems faster

Quantum Gates → Basic building blocks of quantum circuits that perform operations on qubits (quantum bits) to manipulate their state

Interference → Occurs when two quantum states interfere such that their amplitudes add or subtract depending on their relative phases; it is used to cancel out unwanted outcomes and amplify desired outcomes
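
A small numerical sketch of superposition and interference on a single qubit (assumes numpy is installed): one Hadamard gate creates an equal superposition, and applying it again makes the amplitudes interfere so the |1⟩ component cancels:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)                       # |0>
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard gate

superposed = H @ ket0          # equal superposition of |0> and |1>
print(np.abs(superposed) ** 2) # [0.5 0.5] -- measuring gives either outcome

back = H @ superposed          # applying H again: amplitudes interfere
print(np.round(np.abs(back) ** 2, 10))  # [1. 0.] -- the |1> amplitude cancels out
```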

Session 26: YARN

MapReduce → A programming model or pattern within the Hadoop framework that is used to access big data in the Hadoop Distributed File System (HDFS)

  • Facilitates concurrent processing by splitting the data, processing the pieces in parallel on Hadoop commodity servers, and aggregating the results
  • Processing is executed on the server where the data resides to expedite processing
  • Data access and storage is disk based
  • Process:
  1. Map → Split data into smaller blocks and assign them to mappers for processing
  2. Combine → (Optional) A local reducer that runs individually on each mapper server to pre-aggregate map output
  3. Partition → Translates the key value pairs from the mappers into another set of key value pairs to feed into the reducer
  • Decides which reducer receives each key and how the data is presented to it
  4. Reduce → Map output values that share the same key are sent to a single reducer, which aggregates them
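
A single-machine Python simulation of the phases above (no Hadoop required; the input lines are hypothetical):

```python
from collections import defaultdict

# map -> combine -> partition -> reduce, simulated on one machine.
lines = ["big data big clusters", "data nodes and data blocks"]
NUM_REDUCERS = 2

def map_phase(line):                       # 1. Map: emit (word, 1) pairs
    return [(word, 1) for word in line.split()]

def combine(pairs):                        # 2. Combine: local pre-aggregation
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def partition(word):                       # 3. Partition: pick a reducer per key
    return hash(word) % NUM_REDUCERS

reducer_inputs = [defaultdict(list) for _ in range(NUM_REDUCERS)]
for line in lines:                         # "mappers" run once per input split
    for word, count in combine(map_phase(line)):
        reducer_inputs[partition(word)][word].append(count)

result = {}                                # 4. Reduce: sum the counts per word
for shard in reducer_inputs:
    for word, counts in shard.items():
        result[word] = sum(counts)

print(result)   # {'big': 2, 'data': 3, ...}
```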

YARN → Goal is to split up the functionalities of resource management and job scheduling/monitoring into separate daemons

  • It implements a global Resource Manager (RM) and a per-application Application Master (AM)

YARN Components:

  1. Container → Holds physical resources on a single node, such as disk, CPU cores, or RAM
  • Container Launch Context (CLC) → Contains records of dependencies, security tokens, and environment variables
  2. Application Master → Requests containers from the Resource Manager and presents the CLC to the Node Manager to launch them
  3. Node Manager → Takes care of individual nodes in the Hadoop cluster and manages the containers on each node
  • It registers with the Resource Manager and sends each node’s health status (stating whether the node has finished working with the resource)
  4. Resource Manager → The master daemon of YARN; assigns resources
  5. Scheduler → Responsible for allocating resources to the various applications, subject to familiar constraints such as capacities and queues
  6. Applications Manager → Responsible for accepting job submissions, negotiating the first container for executing the application-specific ApplicationMaster, and providing the service for restarting the ApplicationMaster container on failure

YARN Features:

  1. Multi-tenancy → Allows multiple data processing engines to run on the same cluster
  2. Cluster performance → Optimized utilization of clusters
  3. Compatibility → Compatible with applications built for the first version of Hadoop
  4. Scalability → The scheduler supports clusters with thousands of nodes