Distributed system
Collection of mostly autonomous processors collaborating over a communication network (not connected by a common clock)
Features of a distributed system
Geographical separation, no shared memory, autonomy and heterogeneity
Circuit switching
Circuit switches set up a dedicated wired path between two devices for the duration of the communication (guaranteed connection, but inefficient use of capacity)
Packet switching
Messages are broken up into packets, each sent via the quickest path across the network to its destination (reliable, high utilisation, requires routing tables)
Client-server model
Central server connects to clients, who submit requests to the server. Servers may query other servers; simpler for the client
Peer-to-peer network
Clients connect to each other without need for a central server. Clients share information.
TCP/IP layers
Application, Transport, Internet, Link
Application layer
Used by the applications the user is interacting with, defines protocols to user services, error handling and recovery
Transport layer
Breaks messages down into packets, handshake/checksum, designates how packets will be sent
Internet layer
Routes packets, protocols include deciding the next hop, handling errors in transmission and reassembling packets at destination
Link layer
Network architecture that is physically connected, protocols include converting data to signals to pass along wires
UDP
Minimalist, connectionless communication protocol (doesn't set up a connection between clients and uses 8 bytes for the header). Lightweight and easy to set up, but can't confirm packet delivery
UDP header
Source port the packet originates from (0 if not used), Destination port, Length of packet + header in bytes (min. of 8; max. payload of 65507 over IPv4, since the IP header is 20 bytes long), Checksum: 16-bit one's-complement sum of all 16-bit chunks of data
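The checksum step above can be sketched in Python. This is a minimal illustration of one's-complement summing over 16-bit words only; a real UDP checksum also covers a pseudo-header (source/destination IP, protocol, length), which is omitted here for brevity.

```python
def ones_complement_sum(chunks):
    """Sum 16-bit chunks with end-around carry (one's-complement addition)."""
    total = 0
    for chunk in chunks:
        total += chunk
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total

def udp_checksum(data: bytes) -> int:
    """Complement of the one's-complement sum of all 16-bit words of data."""
    if len(data) % 2:                 # pad odd-length data with a zero byte
        data += b"\x00"
    words = [int.from_bytes(data[i:i + 2], "big")
             for i in range(0, len(data), 2)]
    return ~ones_complement_sum(words) & 0xFFFF
```

A receiver can verify integrity by summing the data words together with the checksum: the result should be 0xFFFF.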
UDP packet creation
Convert message to binary, add header details, calculate checksum using pseudo header, pass packet to internet layer
TCP
Provides abstraction over the network to detect missing packets and request re-sending.
TCP header
UDP Header + Sequence number, Acknowledgement number, data offset (length of header & additional info), Flags, Window size (how much data receiver wants), Urgent pointer (with URG flag specifies that data is urgent until SEQ + Pointer), Reserved
TCP header flags
SYN (synchronise sequence numbers in initial communication), ACK (whether acknowledgement number is relevant), URG (whether urgent pointer is relevant), FIN (indicates final packet)
TCP handshake
Client sends SYN (seq=x), Server sends SYN-ACK (seq=y, ack=x+1), Client sends ACK (seq=x+1, ack=y+1)
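The three-way handshake can be traced as a short sketch. The tuples below are a hypothetical representation (sender, flags, seq, ack), not real packet structures; the point is the seq/ack arithmetic.

```python
import random

def three_way_handshake():
    """Trace SYN / SYN-ACK / ACK as (sender, flags, seq, ack) tuples."""
    x = random.randrange(1 << 16)   # client's initial sequence number
    y = random.randrange(1 << 16)   # server's initial sequence number
    return [
        ("client", "SYN",     x,     None),   # client opens the connection
        ("server", "SYN-ACK", y,     x + 1),  # server acknowledges x, offers y
        ("client", "ACK",     x + 1, y + 1),  # client acknowledges y
    ]
```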
TCP termination
Client sends FIN, Server acknowledges and sends own FIN, Client acknowledges FIN, waits some period of time then closes connection
Message passing paradigm
Clients and servers are programmed to know the meaning of messages: a client sends a message and relies on the server to understand what it wants, and the server replies based on context
Request reply protocol
Client sends input to server, server gets request and selects operation to perform based on that input, then sends a response back to the client.
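The request-reply pattern above can be sketched as a dispatch function. The operation names and handler signatures here are made up for illustration; a real server would receive requests over the network rather than as Python dicts.

```python
def make_server(handlers):
    """Return a server function that picks an operation based on the request."""
    def serve(request):
        op = request.get("op")
        if op not in handlers:
            return {"error": f"unknown operation: {op}"}  # reply even on failure
        return {"result": handlers[op](*request.get("args", []))}
    return serve

# Hypothetical operations this sketch server supports.
server = make_server({"add": lambda a, b: a + b, "upper": str.upper})
```

A client would then call e.g. `server({"op": "add", "args": [2, 3]})` and read the result out of the reply.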
RPC calls
Used to avoid handling message passing as programmers. We treat execution on other machines as just a function call
Indirect communication
Communicating via an intermediary with no direct coupling between the sender and receivers. Allows for space and time uncoupling
Space uncoupling
Only the broker needs to know if we upgrade/replace the sender or modify a receiver
Time uncoupling
Senders and receivers are truly asynchronous, one can be offline and the system will still work
Broker network
Single conceptual identity of multiple machines updating each other and the sender/receivers. Fixes single point of failure
Publish-subscribe model
System where one group can publish information on specific topics and another group can receive information on topics they are interested in (many-to-many)
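A minimal in-process sketch of a topic-based broker, assuming callbacks as subscribers; a real broker would deliver over the network and queue messages for offline receivers (time uncoupling), which this sketch does not do.

```python
from collections import defaultdict

class Broker:
    """Minimal topic-based publish-subscribe broker (in-process sketch)."""
    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Many-to-many fan-out: every subscriber on the topic gets the message
        for callback in self.subscribers[topic]:
            callback(message)
```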
Memory managers
One per process, sends messages about local updates to the system and retrieves addressed memory that isn’t cached locally
Distributed memory
Virtual memory can be split across devices which are running processes. Processes still have their own local memory which acts like cache
Disadvantages of distributed memory
Message passing problem still exists, can cause slowdown or deadlock, cache misses are unpredictable and expensive. Useful for parallel applications or those with non-overlapping lifetimes
Tuple spaces
Distributed shared memory that doesn't expose raw memory addresses. Tuples are sequences of elements; processes can write, read and take from the tuple space. Read and take are blocking, so they can be used to synchronise
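A single-machine sketch of the write/read/take operations, using a condition variable so that read and take block until a matching tuple exists. The wildcard-matching convention (None matches any field) is an assumption of this sketch, loosely modelled on Linda-style tuple spaces.

```python
import threading

class TupleSpace:
    """write/read/take; read and take block until a matching tuple appears."""
    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def write(self, tup):
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()    # wake any blocked readers/takers

    def _find(self, pattern):
        # None in the pattern acts as a wildcard field
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                    p is None or p == t for p, t in zip(pattern, tup)):
                return tup
        return None

    def read(self, pattern):
        with self._cond:
            while (tup := self._find(pattern)) is None:
                self._cond.wait()      # block until a matching tuple is written
            return tup

    def take(self, pattern):
        with self._cond:
            while (tup := self._find(pattern)) is None:
                self._cond.wait()
            self._tuples.remove(tup)   # take removes the tuple; read does not
            return tup
```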
Global state
The set of local states of each individual process involved in the system, plus the state of the communication channels
Local state
Process (the current state of all local memory and history of actions) and Communication channel (all messages currently in transit)
Consistent global state
Conservation of messages (every message sent must either be in transit or received), No effects without causes (if a message is not recorded as sent, it should not be seen in transit or received)
Cuts
A way of representing a global state. A cut is consistent if it follows the rules for a consistent global state
Network time protocol
Designed to synchronise clocks over multiple machines. Trustworthy servers fix drift from local clocks.
Logical time system
Local time (process keeps track to update its own events locally), Global time (process uses this to keep timestamps consistent with other parts of the global system)
Logical clock
Keeps track of the ordering of events C(Mi) returns a value which is the timestamp of the message Mi within the system
Consistent Logical time
For two events e1 → e2 => C(e1) < C(e2)
Strongly consistent logical time
For two events e1 → e2 <=> C(e1) < C(e2)
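The consistency condition above is what a Lamport clock provides. A minimal sketch: increment on every local event and send, and on receive take the maximum of the local time and the message timestamp before incrementing.

```python
class LamportClock:
    """Consistent logical clock: e1 -> e2 implies C(e1) < C(e2)."""
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        self.time += 1
        return self.time            # timestamp carried on the message

    def receive(self, msg_time):
        # Jump past the sender's timestamp so causality is preserved
        self.time = max(self.time, msg_time) + 1
        return self.time
```

Note this gives consistency but not strong consistency: C(e1) < C(e2) does not imply e1 → e2.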
total ordering
Using the process number to break ties between events with identical logical timestamps, so every pair of events is ordered.
vector equality
All elements the same as their counterpart
vector partially ordered
all elements less than or equal to their counterpart
vector fully ordered
all elements less than or equal to their counterparts, with at least one strictly less
vector parallel
not fully ordered in either case and are not equivalent
vector causality
one event causally precedes another if its vector clock is fully ordered before (less than) the other's
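The vector relations above can be sketched as comparison functions over equal-length vectors (assumed: one entry per process):

```python
def vec_equal(u, v):
    """Equality: all elements the same as their counterparts."""
    return all(a == b for a, b in zip(u, v))

def vec_le(u, v):
    """Partially ordered: every element <= its counterpart."""
    return all(a <= b for a, b in zip(u, v))

def vec_lt(u, v):
    """Fully ordered: u <= v with at least one element strictly less.
    This is the causality test: u's event happened-before v's."""
    return vec_le(u, v) and any(a < b for a, b in zip(u, v))

def vec_parallel(u, v):
    """Parallel (concurrent): not fully ordered either way, not equal."""
    return not vec_lt(u, v) and not vec_lt(v, u) and not vec_equal(u, v)
```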
fully synchronous network
Latency is no larger than a known upper bound, which defines the time within which message delivery is guaranteed
Asynchronous network
Messages and nodes may stop working for no reason (no timing guarantees)
partially synchronous network
mostly synchronous but for some time may be asynchronous
synchroniser
checks that there are no messages lost/awaiting delivery and ensures nodes are not lagging behind.
process safety
synchroniser checks each round before progressing to the next round
simple synchroniser
Each process sends one message to every neighbour (if multiple need to be sent they are combined; if no message needs to be sent, a dummy message is). Once a process has received a message from each of its n-1 neighbours, it continues.
alpha synchroniser
A node proceeds once it learns that its neighbours are safe (doesn’t wait for entire network). T = O(1), C = O(n²) (each process sends n-1 messages)
Beta synchroniser
Create a minimum spanning tree between all nodes, with the root node as the leader. Leaf nodes message their parent when they are safe. Once a parent is safe and has received messages from all its children, it will message its parent. Once the root is safe and has received messages from all its children, it will message nodes to continue. T = O(n), C = O(n) (2n messages sent)
gamma synchroniser
Split network into sub networks and create minimum spanning trees for each. Run the beta synchroniser on each sub network. Connect the leader of each tree and run the alpha synchroniser. T = O(h) (height of the highest tree), C = O(n) (each node sends 4 messages)
Election requirements
Guarantee at most one leader is selected; multiple elections may run simultaneously; an election may be started by any node.
Simple election
Initiator tells all other processes to begin election, all processes share an identifier with each other. Node that never receives a higher identifier than itself is the leader. T = O(1), C = O(n²)
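The simple election above reduces to every process comparing its identifier against all the others. A sketch of the decision rule (assuming unique identifiers, and modelling the n² message exchange as each process seeing every identifier):

```python
def simple_election(identifiers):
    """All processes broadcast identifiers; the process that never receives
    a higher identifier than its own becomes leader. Identifiers are
    assumed unique, so exactly one process qualifies."""
    leaders = [pid for pid in identifiers
               if all(other <= pid for other in identifiers)]
    assert len(leaders) == 1   # election requirement: at most one leader
    return leaders[0]
```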
Logical topology
Application specific. Nodes are hosts participating in the algorithm and edges represent logical links
Neighbourhoods
To reduce overheads of fully connected networks, we connect nodes to a small subset. Nodes could be distant in the topology so messages may need multiple hops
Logical overlay
Logical topologies can be brought together as part of a hierarchy. (Each node represents its own topology)
Failure model
No Failure, Fail Stop, Crash, Send/Receive Omission, Byzantine Failure
No failure
Assumes there is no failure of any kind
Fail Stop
A process/node may suddenly fail, but as part of stopping execution it will tell other nodes
Crash
A process/node may fail by stopping but won’t let other nodes know
Send/Receive omission
Process may fail by not sending/receiving some messages, can cause other processes not to receive
Byzantine failure
Process could begin to act arbitrarily (could be malicious)
Connection steps
Listen, Accept, Dial, Write, Close
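The five steps above map directly onto the socket API. A minimal sketch with Python's `socket` module, running the server's Accept in a background thread so one script can show both sides (port 0 asks the OS for a free port; this is a local loopback demo, not production code):

```python
import socket
import threading

received = []

def serve(srv):
    conn, _ = srv.accept()                 # Accept: wait for an incoming dial
    data = b""
    while chunk := conn.recv(1024):        # read until the client closes
        data += chunk
    received.append(data)
    conn.close()                           # Close the per-connection socket
    srv.close()

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))                 # port 0: let the OS pick a port
srv.listen(1)                              # Listen
t = threading.Thread(target=serve, args=(srv,))
t.start()

cli = socket.create_connection(srv.getsockname())  # Dial
cli.sendall(b"hello")                              # Write
cli.close()                                        # Close
t.join()
```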
Byzantine Generals
A source communicates an initial value and all other processes must reach agreement with each other: Agreement (all non-faulty processes agree on the same value), Validity (if the source is non-faulty, then agree with the source's value), Termination (each non-faulty process must eventually decide)
Byzantine Generals unsolvable
If there are m faulty processes and 3m or fewer total processes, the problem is unsolvable.
Synchronised naive solution to byzantine generals
If we have 3m + 1 total processes: after receiving messages from the leader, the other nodes confer and settle on the majority result. Consensus is achieved regardless of whether the leader is faulty.
Synchroniser General solution to byzantine generals
The algorithm is repeated over m rounds recursively. In the base case (m = 0), the leader sends their value to every general and every general uses the value they receive. For each round with m > 0, the leader sends their value to every general, then each general individually acts as the leader and performs Algorithm(m-1). Each general takes the majority of all messages it received. C = O(n^m) messages
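The recursive structure above (Lamport's Oral Messages algorithm, OM(m)) can be sketched as follows. The traitor behaviour here (always flipping the relayed bit) is just one assumed model; real Byzantine processes may act arbitrarily.

```python
from collections import Counter

def majority(values):
    """Most common value among the received votes."""
    return Counter(values).most_common(1)[0][0]

def om(m, commander, value, lieutenants, traitors):
    """OM(m): returns {lieutenant: decided value}. Traitors are modelled as
    flipping the bit they relay (one possible arbitrary behaviour)."""
    def sent(sender, v):
        return (1 - v) if sender in traitors else v

    # What each lieutenant hears directly from this round's commander
    direct = {lt: sent(commander, value) for lt in lieutenants}
    if m == 0:
        return direct

    # m > 0: each lieutenant re-broadcasts as commander of OM(m-1)
    votes = {lt: [direct[lt]] for lt in lieutenants}
    for i in lieutenants:
        others = [o for o in lieutenants if o != i]
        sub = om(m - 1, i, direct[i], others, traitors)
        for j in others:
            votes[j].append(sub[j])

    # Each lieutenant decides on the majority of everything it received
    return {lt: majority(votes[lt]) for lt in lieutenants}
```

With n = 4 (commander plus three lieutenants) and one traitorous lieutenant, OM(1) lets the loyal lieutenants agree on the commander's value, matching the 3m + 1 bound.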