Parallel Processing Notes

Parallel Processing

  • A technique for simultaneous data processing to increase computational speed.

  • Achieves faster execution time through concurrent data processing.

  • May involve multiple ALUs executing instructions simultaneously.

  • Increases the amount of processing accomplished in a given time interval.

    • Disadvantage: Increased hardware and system cost.

Levels of Complexity

  • Lowest Level: Distinction between parallel and serial operations via register types.
  • Shift registers: Serial operation.
  • Parallel load registers: Simultaneous bit operation.
  • Parallel processing: Data distribution among multiple functional units.

Classification

  • Classified by:
    • Internal processor organization.
    • Interconnection structure between processors.
    • Information flow through the system.
  • Instruction stream: Sequence of instructions read from memory.
  • Data stream: Operations performed on data in processors.
  • Parallel processing may occur in the instruction stream, the data stream, or both.

Computer Organizations

  • Four major groups:
    • SISD (Single Instruction Stream, Single Data Stream): Single computer with one control, processor, and memory unit.
    • SIMD (Single Instruction Stream, Multiple Data Stream): Multiple processing units under a common control unit, operating on different data.
    • MISD (Multiple Instruction Stream, Single Data Stream): Rarely used.
    • MIMD (Multiple Instruction Stream, Multiple Data Stream): Computer system processing several programs at once (e.g., Multi-processors).

Pipelining

  • Technique: Decomposing a sequential process into sub-operations.
  • Binary information flows through a collection of processing segments.
  • Each segment performs partial processing; results are transferred to the next segment.
  • Final result: Data passes through all segments.

Pipeline Structure

  • Each segment: Input register followed by a combinational circuit.
  • Register: Holds data.
  • Combinational circuit: Performs sub-operation.
  • Output of a segment: Input to the next segment's register.
  • Clock: A common clock pulse is applied to all registers after enough time has elapsed for each segment to complete its sub-operation.
  • Example Operation: Ai * Bi + Ci for i = 1, 2, 3, ..., 7
  • Each sub-operation implemented in a pipeline segment with registers and combinational circuits.
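The example above can be sketched as a software simulation of the segment registers (the register names R1–R4 follow the usual textbook convention and are an assumption, not from these notes):

```python
def pipeline_multiply_add(A, B, C):
    """Simulate a three-segment pipeline for Ai * Bi + Ci.
    Segment 1: R1, R2 latch Ai and Bi.
    Segment 2: R3 latches the product, R4 latches Ci.
    Segment 3: the adder output R3 + R4 is collected each clock.
    Register names are illustrative; the update order below mimics
    edge-triggered registers by computing next values before assigning."""
    n = len(A)
    R1 = R2 = R3 = R4 = None
    out = []
    for clock in range(n + 2):                    # n loads + 2 clocks to drain
        next_R3 = R1 * R2 if R1 is not None else None
        next_R4 = C[clock - 1] if 1 <= clock <= n else None
        if R3 is not None:
            out.append(R3 + R4)                   # segment 3: the adder
        R3, R4 = next_R3, next_R4
        if clock < n:
            R1, R2 = A[clock], B[clock]           # segment 1: input registers
        else:
            R1 = R2 = None
    return out
```

Note that after the pipeline fills, one result emerges per clock, which is where the speedup comes from.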

Arithmetic Pipeline

  • Found in high-speed computers, often for floating-point operations and fixed-point multiplication.

  • Example: X = A x 2^a, Y = B x 2^b

    • A & B: Mantissas.
    • a & b: Exponents.
  • Floating-point addition & subtraction in four segments:

    • Compare exponents.
    • Align mantissas.
    • Add or subtract mantissas.
    • Normalize the result.
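The four segments can be sketched sequentially in software (a decimal base is used here purely for readability; the representation and mantissa range are assumptions):

```python
def float_add(A, a, B, b):
    """Sketch of the four-segment floating-point adder computing
    A*10**a + B*10**b, returning a normalized (mantissa, exponent)."""
    # Segment 1: compare exponents; the larger becomes the result exponent
    e = max(a, b)
    # Segment 2: align mantissas by shifting the smaller-exponent operand
    A, B = A * 10.0 ** (a - e), B * 10.0 ** (b - e)
    # Segment 3: add (or subtract) the mantissas
    M = A + B
    # Segment 4: normalize so that 0.1 <= |M| < 1, adjusting the exponent
    while abs(M) >= 1.0:
        M, e = M / 10.0, e + 1
    while 0 < abs(M) < 0.1:
        M, e = M * 10.0, e - 1
    return M, e
```

For example, 0.9504 x 10^3 + 0.8200 x 10^2 aligns to 0.9504 + 0.0820, sums to 1.0324, and normalizes to 0.10324 x 10^4.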

RISC Pipeline

  • Efficient instruction pipeline use.
  • Data transfer instructions limited to load/store, using register indirect addressing.
  • Typically 3 or 4 pipeline stages.
  • Separate buses/memories for instructions and data to prevent conflicts; often cache memories are used.
  • Achieves single-cycle instruction execution.
  • Advantage over CISC: each RISC pipeline segment completes in a single clock cycle, whereas CISC instructions typically require multiple clock cycles per stage.
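Assuming an ideal four-stage pipeline with no conflicts (the stage names F/D/E/W below are illustrative, not from these notes), the instruction overlap can be sketched as:

```python
def total_clocks(n_instructions, n_stages=4):
    """Clock cycles to complete n instructions on an ideal k-stage pipeline:
    k cycles for the first instruction, then one more per extra instruction."""
    return n_stages + (n_instructions - 1)

def schedule(n_instructions, stages=("F", "D", "E", "W")):
    """Return {clock: [(instruction, stage), ...]} showing the overlap:
    instruction i enters stage s at clock i + s."""
    chart = {}
    for i in range(1, n_instructions + 1):
        for s, name in enumerate(stages):
            chart.setdefault(i + s, []).append((i, name))
    return chart
```

With 6 instructions on 4 stages this gives 9 clocks instead of 24 for strictly serial execution.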

Vector Processing

  • Used in specialized applications requiring high performance.
  • Applications: weather forecasting, petroleum exploration, seismic data analysis, medical diagnosis, AI, and image processing.

Vector Operations

  • Vector: An ordered, one-dimensional array of data items.
  • Representations: Row vector (V = [V1, V2, ..., Vn]) or column vector.
  • Element Vi of vector V is written V(I), where index I refers to a memory address or register holding the element.
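As a sketch, the element-by-element work that a single vector instruction would perform in one step (e.g. a hypothetical C(1:n) = A(1:n) + B(1:n) instruction) corresponds to this scalar loop:

```python
def vector_add(A, B):
    """Scalar loop for C(1:n) = A(1:n) + B(1:n); a vector processor
    performs the same n additions with a single vector instruction."""
    return [A[i] + B[i] for i in range(len(A))]
```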

Matrix Multiplication

  • Computationally intensive operation in vector processors.
  • Multiplication of two n x n matrices: n^2 inner products, i.e. n^3 multiply-add operations.
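A naive triple loop makes the n^3 count concrete (a minimal sketch, not a vector-processor implementation):

```python
def matmul_count(X, Y):
    """Multiply two n x n matrices and count multiply-add operations:
    n^2 inner products, each requiring n multiply-adds, giving n^3 total."""
    n = len(X)
    ops = 0
    Z = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):              # n^2 inner products...
            for k in range(n):          # ...each with n multiply-adds
                Z[i][j] += X[i][k] * Y[k][j]
                ops += 1
    return Z, ops
```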

Memory Interleaving

  • Modular memory is useful in systems with pipeline and vector processing.
  • Vector processor with n-way interleaved memory can fetch operands from n modules.
  • CPU with instruction pipeline benefits from multiple memory modules.
  • Pipeline and vector processors often require simultaneous memory access from multiple sources.
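A minimal sketch of n-way low-order interleaving (module selected by the low-order address bits, an assumption about the mapping; high-order interleaving also exists):

```python
def locate(address, n_modules=4):
    """Map a memory address to (module, offset within module).
    With low-order interleaving, consecutive addresses fall in
    consecutive modules, so n consecutive operands can be
    fetched from the n modules in parallel."""
    return address % n_modules, address // n_modules
```

Addresses 0..3 land in four distinct modules, which is what lets a vector fetch proceed without module conflicts.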

Array Processing

  • Performs computations on large data arrays.
  • Types:
    • Attached array processor: Auxiliary processor attached to a general-purpose computer.
    • SIMD array processor: Single Instruction Multiple Data organization.

Computer Arithmetic

  • Negative fixed-point binary number representations: signed-magnitude, signed-1’s complement, or signed-2’s complement.
  • Floating-point operations: Typically use signed-magnitude for the mantissa.

Addition & Subtraction with Signed-Magnitude Data

  • When adding or subtracting signed numbers, consider different conditions based on signs and operations.
  • Hardware implementation requires registers, parallel adder, and comparator circuits.
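The sign/operation cases can be sketched as decision rules, with Python integers standing in for the registers, parallel adder, and comparator (register-level details such as end-carry handling are omitted):

```python
def sm_add(sa, A, sb, B, subtract=False):
    """Signed-magnitude addition/subtraction sketch.
    sa, sb are sign bits (0 = plus, 1 = minus); A, B are magnitudes.
    Subtraction is performed by complementing the sign of B."""
    if subtract:
        sb ^= 1
    if sa == sb:                      # same signs: add the magnitudes
        return sa, A + B
    # different signs: the comparator decides which magnitude is larger,
    # and the result takes the sign of the larger operand
    if A >= B:
        return sa, A - B
    return sb, B - A
```

For example, (+5) + (-3) adds operands of different signs, so the magnitudes are compared and subtracted, giving +2.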