Theoretical Computer Science & Machine Learning – Exam Notes

Algorithms and Data Structures

  • IPO-Model: Input → Process → Output; data structures supply input format, algorithms implement process.

- Digital vs. analog computers: discrete vs. continuous encoding; free (stored-program) programmability is characteristic of digital machines.

Algorithms

  • Formal definition (Webster 2021): finite, unambiguous, step-by-step procedure mapping finite inputs to outputs in finite time.
  • Correctness (specification / verification) and efficiency (time, space).
  • Euclidean GCD shown in iterative and recursive pseudocode.

- Key questions: Why? (specification) – Does it work? (verification) – How efficient? (analysis).
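The Euclidean GCD mentioned above can be sketched in both styles; a minimal Python version of the standard algorithm:

```python
def gcd_iterative(a: int, b: int) -> int:
    """Euclid's algorithm: repeatedly replace (a, b) with (b, a mod b)."""
    while b != 0:
        a, b = b, a % b
    return a

def gcd_recursive(a: int, b: int) -> int:
    """Same invariant expressed recursively: gcd(a, b) = gcd(b, a mod b)."""
    return a if b == 0 else gcd_recursive(b, a % b)
```

Both terminate because the second argument strictly decreases, illustrating the finiteness requirement from the definition above.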

Data Structures

  • Arrays (1-D & 2-D), Linked Lists, Queues (FIFO), Stacks (LIFO), Heaps (min-/max-binary tree).

- Choose structure by access pattern & resource constraints.
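The FIFO/LIFO access patterns can be shown with Python's built-in list and `collections.deque` (a common, not the only, way to realize these structures):

```python
from collections import deque

stack = []                 # LIFO: push and pop at the same (right) end
stack.append(1)
stack.append(2)
top = stack.pop()          # last in, first out

queue = deque()            # FIFO: append right, remove left in O(1)
queue.append(1)
queue.append(2)
front = queue.popleft()    # first in, first out
```

A plain list also works as a queue via `pop(0)`, but that costs O(n) per removal, which is exactly the kind of resource constraint the note above refers to.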

Graphs & Trees

  • Directed graph $G=(V,R,\alpha,\omega)$ with out-degree $g^+(v)$ and in-degree $g^-(v)$.
  • Undirected graph $G=(V,E,\gamma)$.
  • Representations (space):
    • Incidence matrix $\Theta(|V|\cdot|R|)$
    • Adjacency matrix $\Theta(|V|^2)$
    • Adjacency list $\Theta(|V|+|R|)$ (directed) / $\Theta(|V|+2|E|)$ (undirected).
  • Trees: directed acyclic graph with a single root (equivalently, undirected: a unique path between any two vertices). Binary tree: $g^+(v)\le 2$.

- Use cases: Euler’s Königsberg bridges, network attack graphs, BloodHound AD analysis.
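The space-efficient adjacency-list representation above can be sketched with a plain dict (vertex names are illustrative):

```python
# Directed graph as adjacency list: Θ(|V| + |R|) space.
graph = {
    "a": ["b", "c"],   # out-degree g+(a) = 2
    "b": ["c"],
    "c": [],           # no outgoing arcs
}

# Out-degree is the length of each list; in-degree needs one pass over all arcs.
out_degree = {v: len(nbrs) for v, nbrs in graph.items()}
in_degree = {v: 0 for v in graph}
for nbrs in graph.values():
    for w in nbrs:
        in_degree[w] += 1
```

The asymmetry (out-degree free, in-degree requires a scan) is one reason an adjacency matrix is sometimes preferred despite its $\Theta(|V|^2)$ space.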

Sorting & Searching

  • Selection, Insertion, Bubble Sort $\mathcal O(n^2)$; Quick Sort average $\mathcal O(n\log n)$.

- Searching: linear $\mathcal O(n)$, binary $\mathcal O(\log n)$ (requires sorted input); string matching: naïve, KMP $\mathcal O(m+n)$, Boyer-Moore.
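Binary search, the $\mathcal O(\log n)$ case above, halves the candidate range each step; a minimal sketch:

```python
def binary_search(xs, target):
    """O(log n) search in a sorted ascending list; returns index or -1."""
    lo, hi = 0, len(xs) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if xs[mid] == target:
            return mid
        if xs[mid] < target:
            lo = mid + 1       # target can only be in the right half
        else:
            hi = mid - 1       # target can only be in the left half
    return -1
```

The precondition (sorted input) is essential: on unsorted data the halving argument, and hence correctness, breaks down.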

Algorithm Analysis

  • Primitive operations → cost function $f(n)$; asymptotic growth classes: $\log n < n < n\log n < n^2 < b^n < n!$
  • Big-$\mathcal O$ upper bound, $\Omega$ lower bound, $\Theta$ tight bound.
  • Example loops: constants ignored; nested loops multiply.
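The "nested loops multiply" rule can be made concrete by literally counting primitive operations (a toy counter, not a profiling tool):

```python
def count_ops(n: int) -> int:
    """Count executions of the inner body of a doubly nested loop: Θ(n^2)."""
    ops = 0
    for i in range(n):          # outer loop: n iterations
        for j in range(n):      # inner loop: n iterations per outer pass
            ops += 1            # one primitive operation per inner iteration
    return ops
```

Doubling $n$ quadruples the count, which is the behavioral signature of $\Theta(n^2)$.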

Formal Languages & Automata

  • Grammar $G=(V,\Sigma,P,S)$ where $P$ is a set of productions $l\to r$.
  • Chomsky hierarchy: $L_0 \supset L_1 \supset L_2 \supset L_3$
    1. Type-3 Regular (right-linear) – DFA/NFA.
    2. Type-2 Context-free – PDA.
    3. Type-1 Context-sensitive – LBA.
    4. Type-0 Recursively enumerable – Turing machine.
  • Regular language example $(ab)^n$; conversion grammar ↔ DFA; Rabin-Scott: NFA ≡ DFA.
  • Context-free: $a^n b^n$, CNF conversion, CYK parsing; prefix function for KMP.
  • Context-sensitive: $a^n b^n c^n$; productions are non-contracting (never shorten the sentential form); recognized by linear-bounded automata.
  • Pushdown automaton $A=(S,\Sigma,\Gamma,\delta,s_0)$ with initial stack symbol $\bot$.
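The regular example $(ab)^n$ above is recognized by a two-state DFA (plus an implicit trap state); a minimal simulation sketch:

```python
def accepts_ab_power_n(w: str) -> bool:
    """DFA for (ab)^n, n >= 0: state q0 is both start and accepting state."""
    delta = {("q0", "a"): "q1", ("q1", "b"): "q0"}  # all other moves -> trap
    state = "q0"
    for ch in w:
        state = delta.get((state, ch))
        if state is None:      # implicit trap state: reject immediately
            return False
    return state == "q0"
```

The finite transition table is the whole automaton: no stack, no tape, which is exactly why $a^n b^n$ (needing unbounded counting) is out of reach for this machine class.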

Computability, Decidability & Complexity

  • Halting problem undecidable (Turing 1936).
  • Church-Turing Thesis: every effectively calculable function computable by Turing machine.
  • Decision-problem status by language class:

    | Problem     | Regular | CFG | CSG | Type-0 |
    |-------------|---------|-----|-----|--------|
    | Word        | ✓       | ✓   | ✓   | ✗      |
    | Emptiness   | ✓       | ✓   | ✗   | ✗      |
    | Finiteness  | ✓       | ✓   | ✗   | ✗      |
    | Equivalence | ✓       | ✗   | ✗   | ✗      |
  • Complexity classes (relations): $P \subseteq NP \subseteq PSPACE \subseteq EXPTIME \subseteq EXPSPACE$; NP-hard; NP-complete = NP ∩ NP-hard.
  • Quantum
    • Qubit state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$; Bloch sphere.
    • Algorithms: Shor (factorization) $\mathcal O((\log N)^3)$ on a quantum computer; Grover search $\mathcal O(\sqrt N)$.
    • Complexity class BQP; $P \subseteq BQP$ is known, its exact relation to NP is open.

Logic

Propositional Logic

  • Connectives tables; tautology vs. contradiction vs. contingency.
  • Equivalences: De Morgan; implication elimination $P \Rightarrow Q \equiv \lnot P \lor Q$.
  • Normal forms: CNF = AND of OR-clauses; DNF = OR of ANDs.
  • Inference rules (MP, MT, HS, DS, CD).
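With only two variables, the equivalences above can be verified exhaustively over all four truth assignments, which is the truth-table method in code form:

```python
from itertools import product

# Truth table of => written out explicitly, then checked against ¬P ∨ Q:
IMPLIES = {(False, False): True, (False, True): True,
           (True, False): False, (True, True): True}

# Implication elimination: P => Q  ≡  ¬P ∨ Q
elim_ok = all(IMPLIES[(p, q)] == ((not p) or q)
              for p, q in product([False, True], repeat=2))

# De Morgan: ¬(P ∧ Q)  ≡  ¬P ∨ ¬Q
de_morgan_ok = all((not (p and q)) == ((not p) or (not q))
                   for p, q in product([False, True], repeat=2))
```

An equivalence holds iff both sides agree on every row, and a tautology is a formula that is `True` on every row.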

Predicate Logic

  • Predicates, variables, quantifiers $\forall, \exists$; translating natural language.
  • Prenex & Skolem normal forms for resolution.

Proof Calculi

  • Resolution: convert to clausal CNF, derive the empty clause ◻; refutation-complete (◻ is derivable iff the clause set is unsatisfiable).
  • Tableau: alpha/beta expansion; closed tableau ⇒ unsatisfiable, open saturated branch ⇒ a model exists.
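A tiny worked refutation of {P}, {¬P ∨ Q}, {¬Q}; the encoding of clauses as frozensets of (name, polarity) literals is an illustrative choice, not a fixed convention:

```python
def resolve(c1, c2):
    """Return all resolvents of two clauses (sets of literals)."""
    out = []
    for (name, pol) in c1:
        if (name, not pol) in c2:          # complementary pair found
            out.append((c1 - {(name, pol)}) | (c2 - {(name, not pol)}))
    return out

P, Q = ("P", True), ("Q", True)
notP, notQ = ("P", False), ("Q", False)

c1 = frozenset({P})            # P
c2 = frozenset({notP, Q})      # ¬P ∨ Q
c3 = frozenset({notQ})         # ¬Q

step1 = resolve(c1, c2)[0]     # resolving on P yields {Q}
step2 = resolve(step1, c3)[0]  # resolving on Q yields the empty clause ◻
```

Deriving the empty set (no literal left to satisfy) proves the original clause set unsatisfiable, i.e. P, P ⇒ Q, ¬Q cannot all hold.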

Program Analysis & Verification

  • Static vs. Dynamic analysis; symbolic execution; data-flow; model checking.
  • Hoare triples $\{P\}\;S\;\{Q\}$, axiomatic semantics; inference rules for assignment, sequence, if, while.
  • Semantics descriptions
    • Algebraic: equational reasoning.
    • Operational: small-/big-step transitions.
    • Denotational: mapping syntax to mathematical functions.
  • Abstract Interpretation
    • Abstract domain (e.g., intervals); sound over-approximation.
    • Framework: abstraction function, Galois connection, widening.
    • Example: interval analysis of simple program to bound variable ranges.
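A minimal sketch of the interval domain (the transfer functions and example program are illustrative, not a full analyzer):

```python
# Interval abstract domain: each variable is over-approximated by (lo, hi).
def add(a, b):
    """Abstract addition: [a0,a1] + [b0,b1] = [a0+b0, a1+b1]."""
    return (a[0] + b[0], a[1] + b[1])

def join(a, b):
    """Least upper bound: smallest interval covering both (used at merges)."""
    return (min(a[0], b[0]), max(a[1], b[1]))

# Analyzing:  x = input in [0, 10]; if (...) x = x + 1; else x = x + 5;
x = (0, 10)
x_then = add(x, (1, 1))          # (1, 11)
x_else = add(x, (5, 5))          # (5, 15)
x_after = join(x_then, x_else)   # (1, 15): sound bound over all runs
```

The result is a sound over-approximation: every concrete run lands inside (1, 15), though some values in that interval (e.g. 2, 3, 4) are unreachable, which is the precision cost of abstraction.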

Artificial Intelligence & Machine Learning

Paradigms

  • Supervised (labeled) vs. Unsupervised (unlabeled clustering).
  • Regression: predict continuous $y \in \mathbb R$.
  • Classification: discrete labels; logistic function $g(z) = \frac{1}{1+e^{-z}}$.

Linear & Multiple Regression

  • Hypothesis $h_\theta(x) = \theta^T x$, cost $J(\theta) = \frac{1}{2m}\sum_i (h_\theta(x^{(i)}) - y^{(i)})^2$.
  • Gradient descent: $\theta_j := \theta_j - \alpha \frac{\partial J}{\partial \theta_j}$.
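A minimal sketch of batch gradient descent for one-feature linear regression; the data, iteration count, and learning rate $\alpha = 0.1$ are made up for illustration:

```python
# Training data generated by y = 1 + 2x, so descent should recover t0≈1, t1≈2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m = len(xs)
t0, t1, alpha = 0.0, 0.0, 0.1

for _ in range(2000):
    h = [t0 + t1 * x for x in xs]                       # current predictions
    # Partial derivatives of J(θ) = (1/2m) Σ (h - y)^2:
    g0 = sum(h[i] - ys[i] for i in range(m)) / m
    g1 = sum((h[i] - ys[i]) * xs[i] for i in range(m)) / m
    t0, t1 = t0 - alpha * g0, t1 - alpha * g1           # simultaneous update
```

The simultaneous update of both parameters matters: updating $\theta_0$ before computing $\theta_1$'s gradient would descend on a different (inconsistent) cost surface.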

Logistic Regression / Softmax

  • Binary cost $J = -\frac{1}{m}\sum\,[\,y \log h + (1-y)\log(1-h)\,]$.
  • One-vs-all for $K$ classes; predict $\operatorname{argmax}_k h^{(k)}(x)$.
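The one-vs-all prediction rule can be sketched as follows; the parameter values in `thetas` are hypothetical stand-ins for $K = 3$ already-trained binary classifiers, not real trained weights:

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x: float, thetas) -> int:
    """argmax_k h^(k)(x) over K binary classifiers, θ given as (bias, weight)."""
    scores = [sigmoid(t0 + t1 * x) for (t0, t1) in thetas]
    return max(range(len(scores)), key=scores.__getitem__)

# Three hypothetical one-feature classifiers, one per class:
thetas = [(-2.0, -1.0), (0.0, 0.0), (-2.0, 1.0)]
```

Each $h^{(k)}$ scores "class $k$ vs. the rest"; the prediction is simply the classifier that is most confident.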

Neural Networks

  • Neuron output $a = g(\theta^T x)$ with bias unit $x_0 = 1$.
  • Layers: input → hidden(s) → output; weight matrices $\Theta^{(l)}$.
  • Forward propagation yields $h_\Theta(x)$; backpropagation + gradient descent minimize $J(\Theta) = -\frac{1}{m}\sum_{i,k} y_k^{(i)} \log h_k^{(i)} + \frac{\lambda}{2m}\sum (\Theta)^2$.
  • Applications: malware e-mail multi-class detection, captcha OCR, intrusion detection.
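Forward propagation layer by layer can be sketched without any library; the 2-2-1 architecture and the weights below are made up for illustration (not trained values):

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, thetas):
    """Each layer l computes a = g(Θ^(l) · [1; a_prev]) with bias unit 1."""
    a = list(x)
    for theta in thetas:               # theta: one row of weights per unit
        a = [1.0] + a                  # prepend bias unit x0 = 1
        a = [sigmoid(sum(w * v for w, v in zip(row, a))) for row in theta]
    return a

# Illustrative 2-2-1 network:
thetas = [
    [[0.1, 0.4, -0.3], [-0.2, 0.2, 0.5]],  # Θ^(1): 2 hidden units, 2 inputs + bias
    [[0.3, 0.7, -0.6]],                    # Θ^(2): 1 output unit
]
output = forward([1.0, 0.5], thetas)
```

The output lies in (0, 1) because every unit applies the logistic $g$; for the multi-class applications listed above, the last layer would have one unit per class and the prediction would be its argmax.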