Computer Abstractions & Technology + Machine Learning Practice Flashcards
Chapter 1 — Computer Abstractions and Technology
1.1 Introduction & Classes of Computers
- Pervasiveness of Computing: Computing is embedded in automobiles, mobile phones, the genome project, the Web, and search engines. Progress is driven by technology improvements and domain-specific accelerators.
- Personal Computers (PCs): General-purpose systems designed to run a variety of software. Their design is subject to a cost/performance tradeoff.
- Server Computers: Network-based systems that emphasize high capacity, performance, and reliability. They range in size from small servers to building-sized facilities.
- Supercomputers: A specialized type of server designed for high-end scientific and engineering calculations. They possess the highest capability but represent a tiny fraction of the market.
- Embedded Computers: Computers hidden inside other systems (e.g., in cars or appliances), characterized by stringent power, performance, and cost constraints.
- The PostPC Era:
- Personal Mobile Devices (PMDs): Battery-operated, internet-connected devices costing hundreds of dollars (e.g., smartphones, tablets, e-glasses).
- Cloud Computing: Runs on Warehouse-Scale Computers (WSC). It delivers Software as a Service (SaaS), where applications are split between the PMD and the cloud (e.g., Amazon, Google).
1.2 Seven Great Ideas in Computer Architecture
- Use abstraction to simplify design: Hiding details to make the system easier to understand.
- Make the common case fast: Improving performance where it matters most.
- Performance via parallelism: Executing multiple tasks simultaneously.
- Performance via pipelining: Overlapping the execution of instructions.
- Performance via prediction: Guessing the outcome of a decision to avoid delays.
- Hierarchy of memories: Using different levels of memory to balance speed, size, and cost.
- Dependability via redundancy: Including extra components to protect against failure.
1.3 Below Your Program
- Software Layers:
- Application Software: Written in high-level languages (HLL).
- System Software: Includes the Compiler (translates HLL into machine code) and the Operating System (manages I/O, memory, storage, task scheduling, and resource sharing).
- Hardware: Includes the processor, memory, and I/O controllers.
- Levels of Program Code:
- High-level language: Provides abstraction near the problem; offers productivity and portability.
- Assembly language: The textual form of instructions.
- Hardware representation: Binary digits (bits) encoding instructions and data.
1.4 Under the Covers
- Five Classic Components: All computers share input, output, memory, datapath, and control.
- The Processor: Composed of the Datapath (performs operations on data) and Control (sequences the datapath and memory).
- Cache Memory: Small, fast SRAM used for immediate data access.
- Displays and Interfaces:
- Touchscreens: Resistive vs. capacitive; capacitive is standard for tablets/phones as it allows multi-touch.
- LCD: Made of pixels; mirrors the contents of the frame buffer memory.
- Abstraction Concepts:
- Instruction Set Architecture (ISA): The hardware/software interface.
- Application Binary Interface (ABI): The combination of the ISA and the system-software interface.
- Memory Types:
- Volatile Main Memory: Loses contents when power is removed.
- Non-volatile Secondary Memory: Magnetic disks, flash memory, and optical disks (CD/DVD).
- Networks: Provide communication and resource sharing. Types include LAN (Ethernet), WAN (the Internet), and wireless (WiFi, Bluetooth).
1.5 Building Processors and Memory
- Technology Progress:
- 1951: Vacuum tube (Relative performance/cost: ).
- 1965: Transistor (Relative performance/cost: ).
- 1975: Integrated circuit (IC) (Relative performance/cost: ).
- 1995: Very-large-scale IC (VLSI) (Relative performance/cost: ).
- 2013: Ultra-large-scale IC (Relative performance/cost: ).
- Manufacturing: Silicon is a semiconductor. Yield is the proportion of working dies per wafer. IC cost relates non-linearly to die area and defect rate.
1.6 Performance Metrics
- Definitions:
- Response time (Latency): Time taken for a single task.
- Throughput: Total work done per unit time.
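- Relative Performance Formula: $\text{Performance} = 1 / \text{Execution time}$, so $\text{Performance}_X / \text{Performance}_Y = \text{Execution time}_Y / \text{Execution time}_X = n$ means X is $n$ times faster than Y.
- Worked Example 1: If Machine A takes time $T_A$ and Machine B takes a longer time $T_B$ on the same program, then $\text{Performance}_A / \text{Performance}_B = T_B / T_A > 1$. Machine A is faster.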
- Measuring Time:
- Elapsed (wall-clock) time: Total response time including I/O and OS overhead.
- CPU time: Time spent purely on the job; divided into user CPU time and system CPU time.
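- CPU Clocking Equations: $\text{CPU time} = \text{CPU clock cycles} \times \text{Clock cycle time} = \text{CPU clock cycles} / \text{Clock rate}$.
- Worked Example 2: Computer A has a known clock rate and CPU time. Computer B has a target CPU time but needs a multiple $k$ of the clock cycles of A.
- $\text{Clock cycles}_A = \text{CPU time}_A \times \text{Clock rate}_A$.
- $\text{Clock cycles}_B = k \times \text{Clock cycles}_A$.
- $\text{Clock rate}_B = \text{Clock cycles}_B / \text{CPU time}_B$.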
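- Instruction Count and CPI: $\text{Clock cycles} = \text{Instruction count} \times \text{CPI}$, so $\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$.
- CPI (Cycles Per Instruction): The average cycles used per instruction.
- Weighted CPI: $\text{CPI} = \sum_i \text{CPI}_i \times (\text{Instruction count}_i / \text{Instruction count})$.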
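- Worked Example 3: Computer A (clock cycle time $T_A$, $\text{CPI}_A$) vs. Computer B (clock cycle time $T_B$, $\text{CPI}_B$), running the same instruction count $I$.
- $\text{CPU time}_A = I \times \text{CPI}_A \times T_A$.
- $\text{CPU time}_B = I \times \text{CPI}_B \times T_B$.
- A is faster, because $\text{CPI}_A \times T_A < \text{CPI}_B \times T_B$ for the given values.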
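The performance equations in this section can be sketched in a few lines of Python. The function names and all sample numbers below are illustrative assumptions, not values from the worked examples.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def weighted_cpi(mix: dict) -> float:
    """Average CPI from {instruction class: (cpi, fraction of instructions)}."""
    return sum(cpi * fraction for cpi, fraction in mix.values())


def relative_performance(time_x: float, time_y: float) -> float:
    """How many times faster X is than Y (performance = 1 / execution time)."""
    return time_y / time_x


# Hypothetical machines running the same 1e9-instruction program.
t_fast = cpu_time(1e9, cpi=2.0, clock_rate_hz=4e9)   # 0.5 s
t_slow = cpu_time(1e9, cpi=1.5, clock_rate_hz=2e9)   # 0.75 s
ratio = relative_performance(t_fast, t_slow)         # 1.5

# Hypothetical instruction mix: class -> (CPI, share of instructions).
mix_cpi = weighted_cpi({"alu": (1, 0.5), "load": (2, 0.3), "branch": (3, 0.2)})
```

Note that the machine with the lower CPI is not automatically the faster one; the product of CPI and clock cycle time is what decides.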
1.7 The Power Wall
- Dynamic Power Formula: $\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$.
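- Worked Example 5: A new CPU has a fraction $a$ of the old capacitive load, with voltage scaled by a factor $b$ and frequency scaled by a factor $c$.
- $P_{new} / P_{old} = a \times b^2 \times c$. Because voltage enters squared, a voltage reduction cuts power more than an equal frequency reduction.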
- The "power wall" stopped single-core frequency scaling because voltage cannot be lowered further and heat cannot be removed efficiently.
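A small sketch of the dynamic power relation, Power = Capacitive load × Voltage² × Frequency, may help show why voltage scaling mattered so much. The scale factors used here are hypothetical and are not the figures from the worked example.

```python
def dynamic_power(cap_load: float, voltage: float, frequency: float) -> float:
    """Dynamic power = capacitive load x voltage^2 x frequency."""
    return cap_load * voltage ** 2 * frequency


def power_ratio(cap_scale: float, volt_scale: float, freq_scale: float) -> float:
    """New power divided by old power when each quantity is scaled by a factor."""
    return cap_scale * volt_scale ** 2 * freq_scale


# Hypothetical case: same capacitive load, voltage and frequency both cut by 10%.
both_cut = power_ratio(1.0, 0.9, 0.9)        # about 0.729
# Cutting only frequency by 10% saves far less.
freq_only = power_ratio(1.0, 1.0, 0.9)       # 0.9
```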
1.8 Multiprocessors
- The response to the power wall is multicore microprocessors, placing multiple processors on one chip. This requires explicitly parallel programming, unlike Instruction-Level Parallelism (ILP) which is handled by hardware.
1.9 Benchmarks and Laws
- SPEC (Standard Performance Evaluation Corporation): Uses benchmark suites such as CINT (integer) and CFP (floating-point). Also includes SPECpower, which reports server throughput against power consumption (Watts).
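- Amdahl's Law: Overall speed-up is limited by the part that cannot be improved: $T_{improved} = T_{affected} / \text{improvement factor} + T_{unaffected}$.
- Worked Example 6: Multiply takes a fraction $f$ of a program's time $T$. To get an overall speed-up of $S$ by making multiply $n$ times faster:
- $T / S = f T / n + (1 - f) T$. When $S \ge 1 / (1 - f)$, no finite $n$ works.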
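- MIPS (Millions of Instructions Per Second): $\text{MIPS} = \text{Instruction count} / (\text{Execution time} \times 10^6) = \text{Clock rate} / (\text{CPI} \times 10^6)$.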
- Pitfall: MIPS ignores ISA differences and instruction complexity; CPI varies between programs.
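Amdahl's Law is easy to explore numerically. The sketch below assumes the affected part takes a fraction f of the original time; the helper names and the sample fractions are hypothetical.

```python
def overall_speedup(fraction_affected: float, factor: float) -> float:
    """Overall speed-up when a fraction of the run time is made `factor` times faster."""
    return 1.0 / ((1.0 - fraction_affected) + fraction_affected / factor)


def max_speedup(fraction_affected: float) -> float:
    """Upper bound on speed-up as the improvement factor grows without limit."""
    return 1.0 / (1.0 - fraction_affected)


def required_factor(fraction_affected: float, target_speedup: float):
    """Improvement factor needed for a target speed-up, or None if unreachable."""
    remaining = 1.0 / target_speedup - (1.0 - fraction_affected)
    if remaining <= 0:
        return None  # the unaffected part alone already takes too long
    return fraction_affected / remaining


# Hypothetical program where the improvable part is half of the run time.
print(overall_speedup(0.5, 4))      # about 1.6
print(required_factor(0.5, 1.6))    # about 4
print(required_factor(0.5, 2.0))    # None: the bound 1 / (1 - 0.5) is never reached
```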
Chapter 2 — Python for Data Science & Machine Learning
- Python Characteristics: Simple, versatile, and the most widely used language for data science.
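- Operators:
- Arithmetic: +, -, *, /, // (floor division), % (modulo), ** (power).
- Logical: and, or, not.
- Data Types: Common built-ins include int, float, str, bool, list, tuple, dict, and set.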
- Core Libraries:
- NumPy: Numerical arrays and statistics.
- pandas: Data loading (e.g., pd.read_csv()) and table manipulation.
- matplotlib: Plotting and visualization (scatter, bar, plot).
- scikit-learn (sklearn): Machine-learning models.
Chapter 4 — Regression
4.1 Supervised Learning and Types
- Supervised Learning: Learning from data where correct labels/outcomes are provided.
- Classification: Outcome variable is discrete/categorical.
- Regression: Outcome variable is continuous.
4.3 Linear Regression
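- Equation: $y = mx + c$ (where $c$ is the intercept and $m$ is the slope).
- Fitting: Uses Ordinary Least Squares (OLS) to minimize squared error.
- Mean Squared Error (MSE): $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
- Root Mean Squared Error (RMSE): $\text{RMSE} = \sqrt{\text{MSE}}$. Interpreted in the same units as $y$.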
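For a single feature, the OLS slope and intercept have a closed form, which the following sketch computes in plain Python along with MSE and RMSE. The function names and the tiny dataset are assumptions made for illustration.

```python
import math


def fit_ols(xs, ys):
    """Return (m, c) minimizing the squared error of y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


def mse(ys, preds):
    """Mean of the squared differences between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def rmse(ys, preds):
    """Square root of the MSE, in the same units as y."""
    return math.sqrt(mse(ys, preds))


# Hypothetical data lying exactly on y = 2x + 1.
xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
m, c = fit_ols(xs, ys)
preds = [m * x + c for x in xs]
print(m, c, mse(ys, preds))
```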
4.4 Multiple Linear Regression
- Generalizes a line to a hyperplane: $y = c + m_1 x_1 + m_2 x_2 + \dots + m_p x_p$.
4.5 Regularization (Ridge and Lasso)
- Used to prevent overfitting by penalizing large coefficients.
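- Ridge (L2 Penalty): minimize $\sum_i (y_i - \hat{y}_i)^2 + \lambda \sum_j m_j^2$, where the $m_j$ are the model coefficients and $\lambda \ge 0$ sets the penalty strength (the alpha argument in scikit-learn).
- Lasso (L1 Penalty): minimize $\sum_i (y_i - \hat{y}_i)^2 + \lambda \sum_j |m_j|$, with the same symbols.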
- Lasso can drive coefficients to zero, performing automatic feature selection.
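To see why Lasso can zero out a coefficient while Ridge only shrinks it, consider a deliberately simplified case: one feature, no intercept, and the penalized costs $\sum (y - mx)^2 + \lambda m^2$ (Ridge) and $\sum (y - mx)^2 + \lambda |m|$ (Lasso). Under those assumptions both have closed-form solutions, sketched below with hypothetical function names and data.

```python
def ridge_slope(xs, ys, lam):
    """Minimizer of sum((y - m*x)^2) + lam * m^2 for a single slope m."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def lasso_slope(xs, ys, lam):
    """Minimizer of sum((y - m*x)^2) + lam * |m| (soft-thresholding)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    if sxy > lam / 2:
        return (sxy - lam / 2) / sxx
    if sxy < -lam / 2:
        return (sxy + lam / 2) / sxx
    return 0.0  # the penalty outweighs the fit, so the coefficient is dropped


xs, ys = [1, 2], [1, 2]          # hypothetical data; unpenalized slope is 1.0
print(ridge_slope(xs, ys, 10))   # shrunk but still non-zero
print(lasso_slope(xs, ys, 10))   # exactly 0.0
```

With several features the same effect appears coefficient by coefficient, which is what makes Lasso act as a feature selector.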
4.6 Gradient Descent
- An iterative procedure for minimizing error by moving "downhill" on the error surface.
- Steps:
- Initialize parameters (m, c).
- Compute cost (e.g., MSE).
- Compute the gradient.
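- Update parameters: $m \leftarrow m - \alpha \, \partial J / \partial m$ and $c \leftarrow c - \alpha \, \partial J / \partial c$, where $J$ is the cost and $\alpha$ is the learning rate.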
- Repeat until convergence.
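The steps above can be sketched for a one-feature linear model with MSE as the cost. This is a minimal illustration: the learning rate, the fixed iteration count (used here in place of a convergence test), and the data are all assumptions.

```python
def gradient_descent(xs, ys, learning_rate=0.05, iterations=5000):
    """Fit y = m*x + c by repeatedly stepping against the MSE gradient."""
    m, c = 0.0, 0.0                      # step 1: initialize parameters
    n = len(xs)
    for _ in range(iterations):          # step 5: repeat (fixed count here)
        errors = [(m * x + c) - y for x, y in zip(xs, ys)]
        # steps 2-3: gradient of MSE = (1/n) * sum(error^2) w.r.t. m and c
        grad_m = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_c = (2.0 / n) * sum(errors)
        # step 4: move downhill
        m -= learning_rate * grad_m
        c -= learning_rate * grad_c
    return m, c


# Hypothetical data lying on y = 2x + 1.
m, c = gradient_descent([1, 2, 3, 4], [3, 5, 7, 9])
print(round(m, 4), round(c, 4))
```

A learning rate that is too large makes the updates diverge, so the value usually needs tuning for the scale of the data.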
Chapter 5 — Classification — Part 1
5.2 k-Nearest Neighbours (k-NN)
- Mechanism: Uses majority voting among the $k$ closest points in the feature space.
- Worked Example: A vehicle is classified by length and weight. If $k = 3$ and the nearest neighbors are 1 Sedan and 2 SUVs, the vehicle is classified as an SUV.
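A minimal k-NN classifier fits in a few lines. The vehicle measurements below (length in metres, weight in tonnes) are invented for illustration, and the features are left unscaled for brevity; in practice they are usually standardized first.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: ((length, weight), label).
vehicles = [
    ((4.5, 1.4), "Sedan"),
    ((4.6, 1.5), "Sedan"),
    ((4.8, 1.9), "SUV"),
    ((4.9, 2.0), "SUV"),
    ((5.0, 2.2), "SUV"),
]

# The three nearest neighbours of this query are 2 SUVs and 1 Sedan.
label = knn_predict(vehicles, (4.8, 1.8), k=3)
print(label)
```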
5.3 Decision Trees
- Tree Structure: Composed of decision nodes (splits) and leaf nodes (class labels).
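- Entropy Formula: $H(Y) = -\sum_i p_i \log_2 p_i$, where $p_i$ is the proportion of class $i$.
- Conditional Entropy: $H(Y \mid X) = \sum_x P(X = x) \, H(Y \mid X = x)$.
- Information Gain: $IG(Y, X) = H(Y) - H(Y \mid X)$.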
- Worked Example (Balloons Dataset):
- Class variable "Inflated": . Total .
- Split on "Act": "Dip" (), "Stretch" ().
- .
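Entropy and information gain can be computed directly from label counts. The sketch below uses a tiny made-up table in the spirit of the balloons data; its rows are hypothetical and do not reproduce the dataset's actual counts.

```python
import math
from collections import Counter


def entropy(labels):
    """H(Y) = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


def information_gain(rows, feature, target):
    """IG = H(target) - H(target | feature) for a list of dict rows."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    conditional = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - conditional


# Hypothetical rows: "act" fully determines "inflated", "color" tells nothing.
rows = [
    {"act": "Dip", "color": "Yellow", "inflated": "F"},
    {"act": "Dip", "color": "Purple", "inflated": "F"},
    {"act": "Stretch", "color": "Yellow", "inflated": "T"},
    {"act": "Stretch", "color": "Purple", "inflated": "T"},
]
gain_act = information_gain(rows, "act", "inflated")
gain_color = information_gain(rows, "color", "inflated")
print(gain_act, gain_color)
```

A decision tree built on this table would split on "act" first, since it has the larger information gain.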
5.4 Random Forest
- An ensemble of many decision trees to prevent overfitting.
- Bootstrap sampling: Sampling N records with replacement for each tree.
- Random feature selection: Selecting a random subset of the available features at each node split.
- Aggregation: Majority vote for classification, average for regression.
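The three ingredients listed above are simple to express on their own. This sketch shows only the sampling and aggregation helpers, not tree training; the function names and the seeded generator are assumptions for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(records, rng):
    """Draw len(records) items with replacement (duplicates are expected)."""
    n = len(records)
    return [records[rng.randrange(n)] for _ in range(n)]


def random_feature_subset(features, size, rng):
    """Pick `size` distinct features to consider at one node split."""
    return rng.sample(features, size)


def majority_vote(predictions):
    """Aggregate per-tree class predictions for classification."""
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Aggregate per-tree numeric predictions for regression."""
    return sum(predictions) / len(predictions)


rng = random.Random(0)  # seeded only to make the sketch repeatable
sample = bootstrap_sample(list(range(10)), rng)
subset = random_feature_subset(["length", "weight", "height", "doors"], 2, rng)
print(sample, subset)
print(majority_vote(["SUV", "Sedan", "SUV"]), average([2.0, 3.0, 4.0]))
```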
Chapter 6 — Classification — Part 2
6.1 Logistic Regression
- Used for two-class problems to convert continuous output to probability.
- Sigmoid Function: $\sigma(z) = \frac{1}{1 + e^{-z}}$.
- Probability Model: $P(y = 1 \mid x) = \sigma(mx + c)$. Thresholded usually at $0.5$.
- Log-Likelihood: Maximized during training to find best parameters.
- ROC and AUC: Plotting True-Positive Rate (TPR) vs. False-Positive Rate (FPR). An AUC closer to $1$ signifies a good classifier.
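The sigmoid, $\sigma(z) = 1 / (1 + e^{-z})$, and the thresholding step can be sketched as follows. The parameter values are hypothetical, and the two-branch form is just one common way to avoid overflow for large negative inputs.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real score to (0, 1) without overflowing math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_label(x: float, m: float, c: float, threshold: float = 0.5) -> int:
    """Return 1 when the modelled probability reaches the threshold, else 0."""
    return 1 if sigmoid(m * x + c) >= threshold else 0


# Hypothetical fitted parameters m = 1.0, c = -1.0.
print(sigmoid(0.0))                    # 0.5
print(predict_label(2.0, 1.0, -1.0))   # score z = 1  -> class 1
print(predict_label(0.0, 1.0, -1.0))   # score z = -1 -> class 0
```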
6.2 Softmax Regression (Multinomial)
- Generalizes logistic regression for multiple categories.
- Softmax Function: $P(y = k \mid x) = \frac{e^{z_k}}{\sum_j e^{z_j}}$, where $z_k$ is the score for class $k$.
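A short sketch of softmax, which turns a list of class scores into probabilities that sum to 1. Subtracting the largest score first is a standard numerical-stability trick and does not change the result; the example scores are arbitrary.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)                        # guards against overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(probs.index(max(probs)))   # predicted class = index of the largest probability
```

With only two classes, softmax reduces to the sigmoid applied to the difference of the two scores.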
6.3 Naïve Bayes
- Based on Bayes' Theorem with the "naïve" assumption of independence between predictors.
- Formula: $P(C \mid x_1, \dots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)$.
- Variants:
- MultinomialNB: For counts/categories.
- GaussianNB: For continuous features assuming a normal distribution.
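The naïve independence assumption means a class score is just the prior multiplied by one likelihood per feature. The sketch below normalizes those scores into posteriors; the spam/ham probabilities are invented for illustration and the helper name is hypothetical.

```python
def naive_bayes_posteriors(priors, likelihoods, observed):
    """Return P(class | observed features) under the independence assumption.

    priors:      {class: P(class)}
    likelihoods: {class: {feature: {value: P(value | class)}}}
    observed:    {feature: value}
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature, value in observed.items():
            score *= likelihoods[cls][feature][value]
        scores[cls] = score
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}


# Hypothetical probabilities, not estimated from any real data.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"has_link": {True: 0.8, False: 0.2}},
    "ham": {"has_link": {True: 0.1, False: 0.9}},
}
posteriors = naive_bayes_posteriors(priors, likelihoods, {"has_link": True})
print(posteriors)
```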
6.4 Support Vector Machine (SVM)
- Searches for the optimal separating hyperplane in a higher dimension.
- Support Vectors: The points closest to the hyperplane.
- Margin: The distance between the hyperplane and the support vectors; SVM maximizes this margin.
Appendix — Python Coding Reference
A.5 The Universal scikit-learn ML Recipe
This 7-step pattern applies to virtually all models in Chapters 4-6:
- Import: Get the model and tools (e.g., from sklearn.linear_model import LinearRegression).
- Load: Load CSV using pandas: df = pd.read_csv("data.csv").
- Split X/y: X = df.drop(columns=["target"]), y = df["target"].
- Train/Test Split: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30).
- Create Model: Instantiate the specific model (e.g., model = KNeighborsClassifier(n_neighbors=3)).
- Train: model.fit(X_train, y_train).
- Predict & Score: preds = model.predict(X_test), print(accuracy_score(y_test, preds)). For regression models, score with a regression metric such as mean_squared_error instead of accuracy_score.
A.6 Model Creation Lines
- Linear Regression: model = LinearRegression()
- Ridge/Lasso: model = Ridge(alpha=1.0) or model = Lasso(alpha=1.0)
- k-NN: model = KNeighborsClassifier(n_neighbors=3)
- Decision Tree: model = DecisionTreeClassifier(criterion="entropy")
- Random Forest: model = RandomForestClassifier(n_estimators=100)
- Logistic Regression: model = LogisticRegression()
- Softmax: model = LogisticRegression(multi_class="multinomial"). In recent scikit-learn releases the multi_class argument is deprecated, and LogisticRegression() already fits a multinomial model when the target has more than two classes.
- SVM: model = SVC(kernel="linear")