Session 16: Parallel Computer Architecture
Parallel Computer Architecture → The method of organizing all the resources to maximize performance and programmability within the limits given by technology and cost at any point in time
Uniform Memory Access (UMA) → Model where all the processors share the physical memory uniformly
- They each have equal access time to all memory
- Symmetric Multiprocessor → When all the processors have equal access to all the peripheral devices in a system
- Asymmetric Multiprocessor → When only one or a few processors can access the peripheral devices in a system
- Peripheral Device → Can transfer data to or from memory without involving the processor
- Ex) printer, mouse, scanner, keyboard
Non-Uniform Memory Access (NUMA) → Model where memory access time varies with the location of the memory itself
- Local Memory → When memory is physically distributed among all the processors
- Each processor has its own local memory which it can access faster than non local memory
Distributed Memory Multicomputer System → Consists of multiple computers (nodes) that are interconnected by a message passing network
- Each node acts autonomously, possessing a processor, local memory, and I/O devices
- All local memories are private and only accessible to the local processors
Pipelining → A technique that divides a task into smaller subtasks and assigns them to different stages that can work concurrently
- Improves performance and efficiency
- Steps:
- Fetch → Fetch the next instruction to be executed from memory
- Decode → Decode the instructions
- Execute → Perform the actual operation given by the instructions
- Write Back → Write the result to the target register
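A minimal simulation sketch of the overlap (the instruction names are made up; it assumes one instruction enters per cycle with no hazards or stalls):

```python
# Each instruction advances one stage per cycle, so up to four
# instructions are in flight at once.
STAGES = ["Fetch", "Decode", "Execute", "Write Back"]

def simulate_pipeline(instructions):
    """Print which instruction occupies each stage on every cycle."""
    n, depth = len(instructions), len(STAGES)
    for cycle in range(n + depth - 1):
        occupancy = []
        for stage_idx, stage in enumerate(STAGES):
            instr_idx = cycle - stage_idx
            if 0 <= instr_idx < n:
                occupancy.append(f"{stage}: {instructions[instr_idx]}")
        print(f"cycle {cycle + 1}: " + " | ".join(occupancy))

simulate_pipeline(["ADD", "SUB", "MUL", "LOAD", "STORE"])
```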
Parallelism by Multiple Functional Units:
- The number of functional units that can be efficiently utilized is restricted by data dependencies between neighboring instructions
- Superscalar Processors → Dependencies are determined dynamically at runtime by the hardware, and decoded instructions are dispatched to instruction units using dynamic scheduling
Parallelism at Process or Thread Level → System where each core of a multicore processor must obtain a separate flow of control
- Each core accesses the same memory and shares caches
- Memory accesses must be coordinated
Parallelism in Hardware:
- Data Parallelism → Increases the amount of data operated on at the same time
- Processor Parallelism → Increases the number of processors
- Memory System Parallelism → Increases the number of memory units and increases memory bandwidth
- Communication Parallelism → Increases the amount of interconnections between elements and increases communication bandwidth
Dataflow Architectures → Architecture that processes data based on the availability and dependencies of the data rather than the sequential order of instructions
Coherence → Implies that writes to a location become visible to all processors in the same order
- Implemented via a hardware protocol based on the memory consistency model
Sequential Consistency → Ensures that the order in which operations are executed by different processes in the system appears to be consistent with a global order of execution, and that this order is also consistent with the order of operations on each individual process
- Each process issues memory operations in program order
- The issuing process waits for the write to be complete before issuing the next operation
ACID Transactions:
- Atomicity → Entire transaction takes place at once
- Consistency → Database must be consistent before and after the transaction
- Isolation → Multiple transactions occur independently without interference
- Durability → Changes of a successful transaction occurs even if there is a system failure
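A small sketch of atomicity using Python's built-in sqlite3 (in-memory, so durability is not shown; the accounts table is made up): both transfers commit together or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

try:
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
    conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'bob'")
    conn.commit()          # atomic: both updates take effect together
except sqlite3.Error:
    conn.rollback()        # failure: neither update takes effect

print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'alice': 50, 'bob': 50}
```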
Session 17: Distributed Memory Systems
Distributed Shared Memory (DSM) → A form of memory architecture where physically separated memory can be addressed as a single shared address space
- The same physical address on two processors refers to the same location in memory
- Page Based Approach → Uses virtual memory to map pages of shared data to the local memory of each processor
- Shared Variable Approach → Uses routines to access shared variables that are distributed across the processors (loosely sketched after this list)
- Object Based Approach → Uses objects as the units of data distribution and access with each object having a set of methods that can be performed on the processors
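Not real DSM, but a loose single-machine analogy for the shared variable approach using Python's multiprocessing: workers reach a shared variable only through library-provided objects, and accesses are coordinated.

```python
from multiprocessing import Process, Value, Lock

def worker(counter, lock):
    for _ in range(1000):
        with lock:               # coordinated access, as DSM runtimes require
            counter.value += 1

if __name__ == "__main__":
    counter, lock = Value("i", 0), Lock()
    procs = [Process(target=worker, args=(counter, lock)) for _ in range(4)]
    for p in procs: p.start()
    for p in procs: p.join()
    print(counter.value)         # 4000
```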
Session 18: Interconnection Network Design
Components of Interconnection Networks:
- Links → A cable of one or more optical fibers or electrical wires that transmits analog signals from one end to the other, from which the original digital information is recovered
- Switches → Composed of a set of I/O ports, an internal crossbar connecting inputs to outputs, internal buffers, and control logic to effect the I/O connection at each point in time
- Network Interfaces → Format packets and construct the routing and control information
- May check end to end error and flow control
Topology → The pattern to connect the individual switches to other elements like processors, memories, and other switches
Types of Networks:
- Direct Connection Networks → Fixed point to point connections between neighboring nodes
- Fixed topology
- Rings, meshes, cubes
- Indirect Connection Networks → Communication topology can be changed dynamically based on application demands
- Bus networks, multistage networks, crossbar switches
Routing → Determines which of the possible paths from source to destination are used as routes and how the route followed by each particular packet is determined
Dimension Order Routing → Limits the set of legal paths so that there is exactly one route from each source to each destination
Deterministic Routing → The route taken by a message is determined exclusively by its source and destination and not by other traffic in the network
Minimal Routing Algorithm → Selects the shortest route toward the destination of the message
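A minimal sketch of dimension order (XY) routing on a 2D mesh, which is also deterministic and minimal: correct the X coordinate first, then Y, yielding exactly one legal route per source/destination pair (the coordinates below are made up).

```python
def xy_route(src, dst):
    """Return the single XY path from src to dst on a 2D mesh."""
    (x, y), path = src, [src]
    while x != dst[0]:                 # route along X first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then along Y
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 3)))
# [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]
```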
Domain Name System (DNS) → Translates domain names (e.g., nytimes.com) to IP addresses so browsers can load internet resources
IP Address → A unique ID for a device connected to the internet, allowing browsers and other machines to reach it
DNS Recursor → A server designed to receive queries from client machines through web browsers
- Also responsible for making additional requests to satisfy the query
Root Nameserver → First step in translating human-readable host names into IP addresses
Top Level Domain (TLD) Nameserver → The next step in the search for a specific IP address; it hosts the last portion of a hostname (the TLD server for example.com is "com")
Authoritative Nameserver → If it has access to the requested record, it will return the IP address for the requested hostname to the DNS recursor
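The whole chain above is hidden behind one call on the client side; a sketch using Python's standard socket module (requires network access, and the printed address will vary):

```python
import socket

# The stub resolver asks its configured recursor, which walks
# root -> TLD -> authoritative nameservers on our behalf.
for info in socket.getaddrinfo("example.com", 80, proto=socket.IPPROTO_TCP):
    family, _, _, _, sockaddr = info
    print(family.name, sockaddr[0])   # e.g. AF_INET 93.184.216.34
```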
Transmission Control Protocol (TCP) → Provides reliable, ordered, and error-checked delivery of a stream of bytes between applications running on hosts communicating via an IP network
User Datagram Protocol (UDP) → Communications protocol that is used to establish low latency and loss tolerating connections between applications on the internet
- Speeds up transmission by enabling data transfer before the receiving party provides any agreement (no handshake)
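A minimal loopback sketch of UDP's connectionless style with Python's socket module: the sender transmits immediately, with no handshake and no delivery guarantee.

```python
import socket

recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv.bind(("127.0.0.1", 0))           # OS picks a free port
addr = recv.getsockname()

send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send.sendto(b"hello", addr)           # no connection established first

data, source = recv.recvfrom(1024)
print(data, source)
```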
Open Systems Interconnection (OSI) Model → Conceptual model that enables diverse communication systems to communicate using standard protocols
Session 19: Distributed Systems
Distributed System → A collection of interconnected computers that work together to achieve a common goal
- Designed to process and store data, perform computations, and provide services across multiple machines
- Offer solutions to challenges of scalability, fault tolerance, and performance
Decentralization → Reduces single points of failure and enhances fault tolerance
Heterogeneity in Distributed Systems:
- Distributed systems often consist of diverse hardware and software components
- Components vary in processing power, operating systems, and programming languages
- Can lead to compatibility and communication issues
Concurrency of Distributed Systems → Distributed systems leverage concurrency and parallelism to improve performance and throughput
Redundancy → Multiple copies of data or services are maintained to ensure availability in case of failures
Fault Tolerance → Mechanisms that include replication and data recovery techniques
Challenges and Considerations in Distributed Systems:
- Communication Challenges
- Network latency, packet loss, and need for reliable communication protocols
- Consistency and Data Synchronization
- Distributed systems often employ techniques like distributed transactions, quorum systems, and consensus algorithms to maintain data integrity (see the quorum sketch after this list)
- Security and Authentication
- Security is critical in distributed systems and requires authentication, authorization, and encryption components
- Scalability and Load Balancing
- Load balancing is the distribution of work evenly across nodes to ensure optimal utilization
- Complex if it is a large scale system
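A toy sketch of the quorum idea referenced above (the N, R, W values are made up): choosing R + W > N forces every read quorum to overlap every write quorum, so reads see the latest committed value.

```python
N, W, R = 5, 3, 3                     # R + W > N guarantees overlap
replicas = [{"version": 0, "value": None} for _ in range(N)]

def write(value, version):
    for rep in replicas[:W]:          # any W replicas acknowledge the write
        rep.update(version=version, value=value)

def read():
    acked = replicas[-R:]             # any R replicas answer the read
    return max(acked, key=lambda rep: rep["version"])["value"]

write("v1", version=1)
print(read())   # "v1": the read quorum overlaps the write quorum
```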
Client Server Architecture → Clients request services or resources from central servers and central servers handle data processing and storage
Peer to Peer Architecture → Allows distributed nodes (peers) to act as both clients and servers where peers share resources directly without a central server
Microservices Architecture → Where an application is composed of small, independent services that focus on a specific function and communicate through APIs
- Common in cloud based applications
Distributed Storage Systems → NoSQL databases (e.g., Cassandra, MongoDB) and the Hadoop Distributed File System (HDFS) for big data all use distributed storage
Advantages of Distributed Systems:
- Provide horizontal scalability to handle increased workloads
- Using parallelism enhances performance
- Withstand system failures
- Redundancy provides fault tolerance
- High availability through load balancing
- Enable applications to serve users worldwide
Content Delivery Network (CDN) → Uses geographical distribution to reduce latency and improve user experience
Session 22: Big Data, Hadoop, Spark, & The Cloud
Big Data → Data that contains greater variety arriving in increasing volumes with ever higher velocity
- 3 Vs → Variety, volume, velocity
Velocity:
- 90% of the world's data has been created in the last two years
Variety:
- 80% of data is unstructured
- 20% is structured
Advanced Analytic Capabilities:
- data/text mining
- machine learning
- pattern matching
- forecasting
- semantic analysis
- sentiment analysis
- network and cluster analysis
- multivariate statistics
- neural networks
Hadoop → A technology that gives companies the ability to store and process huge amounts of data
- Distributed File System
- How it works:
- Every file in the system is split into several blocks
- Each block is stored in different data nodes
- Each block is replicated 3 times (by default) to recover in case of data loss
- Blocks help store large files and provide fault tolerance
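A toy sketch of the block splitting and 3× replication described above (the tiny block size and node names are made up; real HDFS uses 128 MB blocks and rack-aware placement):

```python
BLOCK_SIZE = 4          # bytes, tiny for illustration
REPLICATION = 3         # HDFS default
DATA_NODES = ["node1", "node2", "node3", "node4", "node5"]

def place_blocks(data):
    """Split data into blocks and assign each block to 3 data nodes."""
    placement = {}
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    for idx, block in enumerate(blocks):
        # round-robin stand-in for HDFS's real placement policy
        nodes = [DATA_NODES[(idx + r) % len(DATA_NODES)]
                 for r in range(REPLICATION)]
        placement[f"block{idx}"] = (block, nodes)
    return placement

for name, (block, nodes) in place_blocks(b"hello distributed world").items():
    print(name, block, "->", nodes)
```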
Apache Spark → A multi-language engine for executing data engineering, data science, and machine learning on single node machines or clusters
- An evolution of the earlier MapReduce engine
- Quickly performs processing tasks across multiple computers
- Components:
- Driver → Converts the user’s code into multiple tasks that can be distributed across worker nodes
- Executors → Run on the worker nodes and execute the tasks assigned to them
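A hedged sketch assuming a local pyspark installation: the driver code below defines the job, and Spark turns the map/reduce calls into tasks that executors run.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sketch").master("local[*]").getOrCreate()
sc = spark.sparkContext

# A distributed sum of squares: parallelize() partitions the data,
# map/reduce run as tasks on the executors, and only the final
# result returns to the driver.
total = sc.parallelize(range(1, 1001)) \
          .map(lambda x: x * x) \
          .reduce(lambda a, b: a + b)
print(total)
spark.stop()
```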
Session 24: Design Patterns
Design Pattern → A reusable solution to a commonly occurring problem within a given context in software design
- A template/blueprint that can be applied to solve a particular design problem in a way that is effective and efficient
Creational Patterns → Deal with the process of object creation
Structural Patterns → Focus on the composition of classes or objects
Behavioral Patterns → Define the ways in which objects interact and communicate with each other
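As a taste of one behavioral pattern, a minimal Observer sketch (the class names are made up): a subject notifies its registered observers when an event occurs.

```python
class Subject:
    def __init__(self):
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def notify(self, event):
        # behavioral pattern: interaction is defined by this protocol,
        # not by the concrete observer classes
        for observer in self._observers:
            observer.update(event)

class Logger:
    def update(self, event):
        print(f"logged: {event}")

feed = Subject()
feed.attach(Logger())
feed.notify("new message")   # logged: new message
```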
Session 25: P Problems & NP Problems
Algorithmic Complexity → Refers to the efficiency of algorithms in terms of time and space requirements
- Assesses how the performance of an algorithm scales with an increase in input size
Efficient Algorithms → Crucial for solving real world problems within reasonable time frames and resource constraints
Complex Problems → Often require exponential time to solve, making them impractical for large datasets
- Balancing accuracy and efficiency becomes a significant challenge
Polynomial (P) Problems → Problems that are solvable in polynomial time
- They have a predictable execution time related to the input size
- Ex) Quick sort
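A short quick sort sketch for reference; this copying version trades memory for clarity and runs in expected O(n log n) time, i.e., polynomial in the input size.

```python
def quicksort(items):
    """Sort by partitioning around a pivot and recursing on each side."""
    if len(items) <= 1:
        return items
    pivot = items[len(items) // 2]
    left = [x for x in items if x < pivot]
    mid = [x for x in items if x == pivot]
    right = [x for x in items if x > pivot]
    return quicksort(left) + mid + quicksort(right)

print(quicksort([5, 3, 8, 1, 9, 2]))   # [1, 2, 3, 5, 8, 9]
```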
Non-Deterministic Polynomial (NP) Problems → Problems whose solutions can be verified in polynomial time
- While the solution’s correctness may not be found quickly, its validity can be checked in polynomial time
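A sketch using subset sum, a classic NP problem (the numbers are made up): finding a qualifying subset may require exponential search, but verifying a proposed certificate is linear.

```python
def verify_subset_sum(numbers, target, certificate):
    """certificate: indices claimed to sum to target. Runs in O(n)."""
    return (len(set(certificate)) == len(certificate)
            and all(0 <= i < len(numbers) for i in certificate)
            and sum(numbers[i] for i in certificate) == target)

numbers = [3, 34, 4, 12, 5, 2]
print(verify_subset_sum(numbers, 9, [2, 4]))   # True: 4 + 5 == 9
print(verify_subset_sum(numbers, 9, [0, 1]))   # False: 3 + 34 != 9
```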
Decision Problems → Involve determining a binary outcome based on the input
- Ex) Is there a path between two nodes?
Optimization Problems → Seek the best solution from a set of feasible solutions
- Ex) Finding the shortest path between two nodes
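One BFS sketch answers both phrasings for an unweighted graph (the example graph is made up): returning a path settles the decision problem, and BFS order makes that path a shortest one.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Return a shortest path from start to goal, or None if none exists."""
    parents, frontier = {start: None}, deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:                       # reconstruct path to start
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                frontier.append(nxt)
    return None                                # decision answer: no path

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(shortest_path(graph, "A", "D"))          # ['A', 'B', 'D']
print(shortest_path(graph, "D", "A") is None)  # True
```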
Reducing Execution Time → Parallel algorithms can significantly reduce overall computation time for certain problems
Complex Problem Considerations:
- Scalability → Algorithm must ensure efficiency as the problem size and available resources increase
- Synchronization → Algorithm must be able to coordinate parallel tasks without causing bottlenecks
P-Complete Class → A class of problems in P that are as hard as the hardest problems in P
- Every problem in P can be reduced to a P-Complete problem (under logspace/NC reductions)
NP-Complete Class → A class of problems in NP to which every problem in NP can be reduced in polynomial time
- If a polynomial time algorithm exists for one NP complete problem, it implies a polynomial time algorithm for all problems in NP
P-Complete Significance → If any P-Complete problem has an efficient parallel (NC) solution, it implies P equals NC
NP-Complete Significance → If any NP-Complete problem has a polynomial time solution, P equals NP
Nick’s Class (NC) → Represents problems efficiently solvable in parallel, emphasizing low depth circuits
P-Completeness in Parallel Computing → A class of problems addressing the complexity of parallel algorithms
Parallelism in P → Many problems in P can be efficiently solved in parallel (those in NC)
Parallelism in NP → More complex
- NP problems are more complex as they lack known polynomial time solutions
- Efficient parallel algorithms involve communication between processors
- Minimizing communication overhead while maintaining efficiency is difficult
- Synchronizing parallel tasks so that they do not interfere with each other is difficult
- Divide and Conquer → Parallel computing divides complex problems into smaller, manageable tasks, processed concurrently (see the sketch after this list)
- Heuristic Approaches → Develop parallel algorithms that use heuristic methods to find approximate solutions quickly
- Sacrifice optimality for speed
- Parallelization of Special Cases → Tailoring algorithms to exploit the structure of specific instances can enhance parallel performance
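A minimal divide and conquer sketch with Python's multiprocessing (the chunk count is arbitrary): split the input, sum the chunks concurrently, then combine the partial results.

```python
from multiprocessing import Pool

def chunk_sum(chunk):
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]        # divide into 4 tasks
    with Pool(processes=4) as pool:
        partials = pool.map(chunk_sum, chunks)     # conquer in parallel
    print(sum(partials))                           # combine: 499999500000
```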
Quantum Computing → A type of computing that uses quantum mechanical phenomena (superposition and entanglement) to perform operations on data
- Superposition → Means that a quantum bit can exist in a combination of two states at the same time
- Entanglement → Means that two or more bits can share a quantum state and influence each other even when they are far apart
- Superposition and entanglement allow quantum computers to perform parallel computations to solve complex problems faster
Quantum Gates → Basic building blocks of quantum circuits that perform operations on qubits (quantum bits) to manipulate their state
Interference → Occurs when two quantum states interfere in such a way that their amplitudes add or subtract depending on their relative phases; it is used to cancel out unwanted outcomes and amplify desired ones
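A state-vector sketch with numpy (a classical simulation, not real quantum hardware): one Hadamard gate creates superposition, and a second makes the two paths interfere so the |1> amplitude cancels.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate
ket0 = np.array([1.0, 0.0])                    # qubit state |0>

superposed = H @ ket0
print(superposed)                 # [0.707 0.707]: 50/50 measurement odds

interfered = H @ superposed
print(np.round(interfered, 10))   # [1. 0.]: the |1> amplitude cancelled
```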
Session 26: YARN
MapReduce → A programming model or pattern within the Hadoop framework that is used to access big data in the Hadoop Distributed File System (HDFS)
- Facilitates concurrent processing by splitting the data, processing it in parallel on Hadoop commodity servers, and aggregating the results
- Processing is executed on the server where the data resides to expedite processing
- Data access and storage is disk based
- Process:
- Map → Split data into smaller blocks and assign them to mappers for processing
- Reduce → Map output values that have the same key are assigned to a single reducer
- Combine → (Optional) A reducer that runs individually on each mapper server
- Partition → Translates key value pairs from mappers to another set of key value pairs to feed into the reducer
- Decides how the data has to be presented
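A pure-Python word count sketch of the phases above (the documents are made up); real Hadoop runs the mappers and reducers on separate servers next to the data.

```python
from collections import defaultdict

docs = ["big data big ideas", "big data tools"]

# Map: each input split becomes (key, value) pairs
mapped = [(word, 1) for doc in docs for word in doc.split()]

# Partition/shuffle: group pairs with the same key for a single reducer
partitions = defaultdict(list)
for word, count in mapped:
    partitions[word].append(count)

# Reduce: aggregate each key's values
print({word: sum(counts) for word, counts in partitions.items()})
# {'big': 3, 'data': 2, 'ideas': 1, 'tools': 1}
```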
YARN → Goal is to split up the functionalities of resource management and job scheduling/monitoring into separate daemons
- It implements a global Resource Manager (RM) and a per-application Application Master (AM)
YARN Components:
- Container → Holds physical resources on a single node, such as disk, CPU cores, or RAM
- Container Launch Context (CLC) → Contains records of dependencies, security tokens, environment variables
- Application Master → Requests containers from the Resource Manager and presents the CLC to the Node Manager to launch them
- Node Manager → Takes care of individual nodes in the Hadoop cluster and manages containers related to each node
- It is registered to the Resource Manager and sends each node’s health status (stating if the node process has finished working with the resource)
- Resource Manager → The master daemon of YARN and assigns resources
- Scheduler → Responsible for allocating resources to the various applications subject to familiar constraints of capacities, queues, etc
- Applications Manager → Responsible for accepting job submissions, negotiating the first container for executing the application-specific ApplicationMaster, and providing the service for restarting the ApplicationMaster container on failure
YARN Features:
- Multi-tenancy → Allows access to multiple data processing engines
- Cluster performance → Optimized utilization of clusters
- Compatibility → With the first version of Hadoop
- Scalability → The scheduler supports thousands of nodes and clusters