1/87
Vocabulary terms and definitions from the COMP5270 assignment and exam terminology glossary version 2, covering notation, probability, randomized algorithms, hashing, graph optimization, linear programming, LSH, and streaming.
Name | Mastery | Learn | Test | Matching | Spaced | Call with Kai |
|---|
No analytics yet
Send a link to your students to track their progress
[n]
The set {1,2,...,n}. Used as the item universe, element universe, vertex labels, and domains for SetCover and sampling.
S⊆T
S is a subset of T, meaning every element of S is also in T.
S∪T
Union of sets S and T; contains elements in S or T or both.
S∖T
Set difference; contains elements in S but not in T.
∥S∥
Cardinality, representing the number of elements in set S.
x∈{0,1}
A binary variable, usually representing yes/no or chosen/not chosen.
Pr[A]
The probability that event A occurs.
E[X]
Expectation, or average value, of random variable X.
Var(X)
Variance, a measure of spread around E[X].
1{event}
Indicator variable equal to 1 when the event happens and 0 otherwise.
X∼Bern(p)
Bernoulli random variable equal to 1 with probability p.
U∼Unif(0,1)
A uniformly random real number in the interval (0,1).
G∼N(0,Id)
A standard Gaussian random vector in d dimensions.
u.a.r.
Uniformly at random; every possible choice has the same probability.
with replacement
Samples are independent and the same object can be sampled again.
independently
Random choices do not influence each other.
O(f(n))
An asymptotic upper bound used for time, space, sample, and query complexity.
Ω(f(n))
An asymptotic lower bound.
Θ(f(n))
A tight asymptotic bound, both O and Ω.
ϵ
Accuracy parameter; controls additive or multiplicative error in approximation.
δ
Failure probability parameter, usually appearing inside log(1/δ) repetitions or sketch size.
OPT
The value of the true optimal solution.
OPT_LP
The optimum value of the LP relaxation, acting as a fractional benchmark before rounding.
x^\*
An optimal fractional LP solution.
⟨x,g⟩
Inner product of vectors x and g.
∥x−y∥_2
Euclidean distance between vectors x and y, used to distinguish near and far points.
hash function
A function mapping a large universe to a smaller range of buckets or values.
collision
Event where two distinct inputs receive the same hash value.
hash family
A set of hash functions from which one function is chosen randomly.
Bloom filter
A randomized dictionary structure with compact space and possible false positives, but no false negatives.
false positive
Occurs when the data structure says 'yes' even though the item is absent.
false negative
Occurs when the data structure says 'no' even though the item is present.
randomized algorithm
A randomized algorithm that may output a wrong answer with small probability.
Las Vegas algorithm
A randomized algorithm that always returns a correct answer but has random running time.
streaming algorithm
An algorithm that processes input items one at a time using space much smaller than the total stream.
one-pass model
The algorithm sees each stream item exactly once and cannot rewind.
reservoir sampling
Technique to maintain a uniform sample from a stream without knowing its final length in advance.
Bernoulli replacement rule
At time t, replacing a stored sample with probability 1/t or k/t to maintain a uniform sample.
CountMinSketch
A frequency estimation sketch with one-sided overestimation guarantees.
CountSketch
A streaming sketch using random signs (±1) to estimate frequencies, suitable for heavy hitters.
Frequent Elements
The problem of finding elements in a stream whose counts are above a certain threshold.
Approximate Nearest Neighbour (ANN)
Task to return a point close to the query mapping, allowing an approximation factor C.
Hamming space
Space of strings or vectors where distance counts the number of differing coordinates.
Hamming distance
The number of coordinates where two strings or vectors differ.
Euclidean space
Vector space Rd equipped with Euclidean distance.
Locality-Sensitive Hashing (LSH)
Hashing where close points have a higher probability of collision than far points.
(r, C, p, q)-LSH family
An LSH family where distances that are ≤r collide with probability ≥p, and distances that are ≥Cr collide with probability ≤q.
sensitivity parameter ρ
A parameter controlling LSH space/query complexity, often ρ=log(1/p)/log(1/q).
approximation factor C
In LSH, how much farther than the true near radius r the algorithm may accept.
collision probability
The probability that h(x)=h(y). LSH makes this larger for near pairs.
Gaussian projection
Projecting vectors onto a random Gaussian direction to reduce Euclidean distance to one-dimensional analysis.
random shift
Adding a random value U∼Unif(0,1) before taking the floor in an LSH hash.
bucket width w
The interval width in the Euclidean LSH hash: h_g,u(x)=⌊⟨x,g⟩/w+u⌋.
Baby ANN
A simpler ANN data structure used as a building block for the final Adult ANN.
doubling search
Trying geometrically increasing guesses for an unknown distance scale.
monotonic function
A function that either only increases or only decreases.
Taylor expansion
Approximation of a function by a power series, used to justify asymptotic behavior of ρ.
random cut
A cut produced by assigning vertices to sides randomly; expected to cut half the edges.
MaxCut
The problem of partitioning vertices to maximise the number of crossing edges.
Min-Cut
The task of finding a cut with the minimum number or weight of crossing edges.
complete graph K_n
A graph on n vertices where every pair of vertices is connected by an edge.
Minimum Spanning Tree
A minimum-weight tree connecting all vertices of a weighted connected graph.
independent set
A set of vertices in which no two vertices are connected by an edge.
integer linear program (ILP)
Optimization problem with linear objective, linear constraints, and integer variables.
feasible solution
An assignment of variables that satisfies all constraints of an optimization problem.
objective function
The quantity (such as cost or size) being maximised or minimised.
constraint
A condition every feasible solution must satisfy.
randomized rounding
Converting fractional LP values into random integer decisions using those values as probabilities.
SetCover
The problem of choosing sets whose union covers all elements while minimising total cost.
cost(C)
Total cost of the selected sets or objects.
Knapsack
NP-Hard problem to choose items under a budget/weight limit to maximise value.
NP-Hard
A class of problems at least as hard as the hardest problems in NP.
approximation factor
A measure of how close an algorithm's solution value is to the optimum.
coverage function
A function, often denoted Q(S), counting how many elements are covered by a chosen set.
constant-factor approximation
An approximation by a fixed constant factor independent of input size n.
Approximate Nearest Neighbour (ANN)
Task to return a point close to the query mapping, allowing an approximation factor C.
random bits
The finite random string used by an algorithm; if r bits are used, there are only 2r executions.
random subset
A subset drawn from all possible subsets, usually uniformly.
random permutation
A uniformly random ordering of n distinct items.
shuffle algorithm
An algorithm for producing a random permutation.
sampling with replacement
Every draw is independent and can repeat earlier objects.
indicator variable
A 0/1 variable for whether an event occurs, used to count covered elements, collisions, or successes.
empirical average
The average of observed samples.
failure event
The bad event whose probability must be controlled.
amplification
Repeating independent trials to reduce failure probability.
with high probability
Success probability close to 1, often 1−1/poly(n) or a stated constant such as 99%.
constant probability
A success probability bounded away from 0, such as 1/2, 2/3, or 9/10.
union bound
Pr[any bad event]≤sum of individual bad-event probabilities.