Looks like no one added any tags here yet for you.
graph
a set of edges connected by vertices
vertex/vertices
the points that connect each edge together
edges
the "links" between vertices
On a city map, ____ are vertices and ____ are edges
islands; bridges
valence/degee
the number of edges attached to a vertex
path
a sequence of vertices such that there is an edge between each vertex and the next one.
circuit
a path that starts and ends at the same vertex
eulerian circuit
a circuit that crosses each edge exactly once
A graph has a Eulerian circuit if and only if .....
the degree of all its vertices is even.
eulerian path
a path that crosses each edge exactly once.
A graph has a Eulerian path if and only if...
• either the degree of all its vertices is even (then it is even better: it has a Eulerian circuit); • or the degree of all its vertices except 2 of them is even (then it has a Eulerian path, but no Eulerian circuit)
hamiltonian circuit
a circuit that visits each vertex of the graph exactly once
hamiltonian path
a path that visits each vertex of the graph exactly once.
complete graph
a graph where there is exactly one edge between different pairs of vertices
weighted graph
a graph where there is a number attached to each edge
The weight of a path is the ....
sum of the weights on all edges encountered on this path (counted once for every passage)
traveling salesman problem
consists in finding a path / circuit that visits all vertices of a weighted graph (a Hamiltonian path / circuit); and that has the smallest weight.
brute force solution (for TSP)
Look at all possible Hamiltonian paths.
Check the weight of each of these paths.
Choose the one with the smallest weight.
n factorial
In a complete graph with n vertices, there are n! Hamiltonian paths Even if n is small, n! is humongous
nearest-neighbor algorithm
Start from any vertex.
Jump to the closest vertex (that is, the edge to this vertex has the smallest weight).
Then go to the closest vertex which has not been visited yet.
Continue until you have seen all the vertices.
If you want a circuit, eventually return to the starting vertex.
The nearest-neighbor algorithm provides a path with the following properties:
• It is usually “short”. • Its length can depend on the vertex that we start from. • Its length can depend of the choices that we make (if there are edges with the same length). • It usually does NOT provide a solution to the TSP (meaning, it is not always the shortest path).
sorted-edge algorithm
List the weight of all edges in increasing order.
Choose the first edge (the one with the smallest weight).
As long as not all vertices are included, add the next edge in the list (except if it makes three chosen edges meet at a vertex or it closes a circuit that does not include all the vertices)
When a Hamiltonian circuit has been created, we are done
The sorted-edge algorithm provides an approximate solution to the TSP. However...
-it is usually not the shortest path; -its length can depend of the choices that we make (if there are edges with the same length);
it only works to find a circuit
tree
a graph which contains no circuits
spanning tree
a tree that has only the edges of the original graph (but not always all the edges) and has all the vertices of the original graph
applications for a tree problem are:
Electrical grid: connect every home to the electrical grid at the minimum cost. Broadcasting: transmit one message to all recipients in the fastest way. Circuit board design: connect all parts at minimum cost.
How to obtain a spanning tree
• Start from the graph. • Remove edges one by one until no circuit remains.
The weight of a spanning tree is....
the sum of the weights of all its edges
minimum spanning tree
a spanning tree whose weight is minimum amongst all the possible spanning trees for a graph
kruskal's algorithm (MST)
List all the edges in increasing order of their weights.
At each step, add the next edge of the list if it does not create a circuit.
Stop when we get a spanning tree.
Prim's algorithm (MST)
Choose any vertex.
Look at all edges attached to this vertex, and choose the one with the minimum weight.
At each step, look at all the vertices that we saw, and all edges connected to these vertices, and choose the one with minimum weight, except if it creates a circuit.
Stop when we get a spanning tree
Prim’s algorithm is conceptually more _______ than Kruskal’s, but Kruskal’s needs to compare the weight of all the edges at the _______.
complicated; same time
coloring problem
consists in giving a color to each vertex, in such a way that two vertices that are connected by an edge have different colors.
chromatic number
the minimum number of colors needed to solve the vertexcoloring problem
how do you find the chromatic number?
Start from a vertex and give it a color.
Give a different color to a neighboring vertex.
Continue by going from neighboring vertex to neighboring vertex.
Try to use as few colors as possible: reuse colors as much as possible
planar graph
a graph that can be drawn without any edges intersecting
According to the four-color theorem, the chromatic number of a planar graph is at most ___
4
statistics
the art and science of • collecting data; • summarizing and analyzing data; • presenting data; • interpreting data
population
all the individuals for a problem
sample
all the individuals that we will actually gather data from
variables
the information that we gather from the experiment (ex: intended vote, grades, size, opinion, …)
discrete variable
an integer/whole number (1, 2, 3) ex: number of pets, number of classes taken
continuous variable
can be any number, not only whole; like 2.5, 3.9, etc ex: weight, height, price, size
how to find the frequency distribution of a variable:
Choose some non-overlapping intervals of values (for instance 0, 1 –2, 3 – 5, 6 – 10, more than 10).
Count how many times the variables takes such values.
how to find the relative frequency distribution of a variable:
Choose some non-overlapping intervals of values (for instance 0, 1 –2, 3 – 5, 6 – 10, more than 10).
Count the percentage of times the variables take such values.
histogram
a graphical representation of the distribution of a variable using bars of different heights.
outlier
an individual or data that does not fit the overall pattern
symmetric distribution
A distribution is symmetric if we can draw a vertical line on the histogram, and both sides are approximate mirror images of each other
skewed right distribution
A distribution is skewed right if the longer tail of the histogram is on the right side
skewed-left distribution
A distribution is skewed left if the longer tail of the histogram is on the left side
mean
the average of all the values
median
a list of values is a number such that half the values are higher than this number and half the values are lower than this number.
mode
the most frequently occurring value in a set of data
standard deviation
the average amount a value deviates from the mean; the square root of the variance
variance
The square of the standard deviation
how to find the variance and standard deviation
Compute the mean
Compute the deviations (x1-mean), (x2-mean),...
Compute the squared deviations (x1-mean)^2, (x2-mean)^2,....
Sum all these quantities.
Divide the result by n− 1 to get the variance.
Take the square root of the result to get the standard deviation
normal distribution
When the shape of a distribution is more or less "regular". • It looks like a bell curve. • The curve is an idealized version of the distribution
A normal distribution is characterized by two parameters:
The mean is where the curve reaches its maximum.
The standard deviation is the width of the curve, between the two points where it changes curvature (inflection points).
The 68-95-99.7 rule
For a normal distribution with a mean µ and a standard deviation σ: 68% of the values are between µ-σand µ+σ 95% of the values are between µ– 2σ and µ+ 2σ 99.7% of the values are between µ– 3σ and µ+ 3σ
For a normal distribution, less than ___% of the values are more than 2σ from the mean and less than ____% of the values are more than 3σ from the mean.
5%; 0.3%
distributions
we study one variable, and wish to analyze it: average value, spread of the values, etc.
relationships
we study two variables and wish to know whether they are related
response variable
measures the outcome of a study; is typically directly observed.
explanatory variable
a variable that explains the changes of a response variable; is typically hidden / not obvious.
correlation coefficient
a numerical value that measures the strength and direction of the relationship between two variables (usually denoted by r)
correlation
a relationship between two or more things
If r> 0, then it is a ________ association, if r< 0, then it is a ___________ association
positive; negative
The closer r is to 1 or -1, the more _______ associated/ correlated the variables are.
strongly
Correlation only makes sense for _______ values and ________ relationships
numerical; straight-line
If two variables are highly correlated, then....
• when one goes up, the other one goes up (r close to 1) • or when one goes up, the other one goes down (r close to -1)
correlation does not imply _______
causation
the equation of a line is
y=mx+b
m in a regression line equation is the _____
slope (rise over run; (y2-y1/x2-x1)
b is the ______ or a regression line equation
y-intercept
slope
rise over run (y2-y1/x2-x1)
(least-square)regression line
the line that fits the closest to all the points
how to find the least-square regression line
Take a straight line.
Compute the vertical distances from each point to the line.
Square these distances.
Sum all these quantities.
The line that makes this sum as small as possible is the least-square regression line
The regression line allows to make predictions only if the correlation is ______ (r close to -1 or +1) and only works for ______ correlations
high; linear
proportion
the number of times something happens out of the total outcomes
occam's razor
the principle that entities should not be multiplied needlessly; the simplest of two competing theories is to be preferred
statistical inference
the method of drawing conclusion about an entire population based on data from a sample
parameter
a fixed quantity that describes some characteristic of the population (for instance average height, proportion of women, result of a vote, opinion about hippos, etc.)
statistic
a quantity that describes some characteristic of a sample
central limit theorem
the larger the population, the closer the results will be to a normal distribution
confidence interval
the amount of confidence we have that a certain range of values will fall within 95% of the possible outcomes for that data
how to find the 95% confidence interval
find the approximate standard deviation (square root of p(1-0)/n)
take the percentage/probability you have and subract and add that by 2 times the approximate standard deviation ex: if your percentage is 56% and your approximate standard deviation is 5%, then you would calculate for 56%-(25%) and 56%+(25%)
Each poll comes with a _______, which is given by the standard deviation
Each poll comes with a margin of error
Sample Space (S)
the set of all possible outcomes of a random phenomenon
probability of an outcome (an element of the sample space)
the proportion of times the outcome occurs in an infinitely long series of repetitions
Probabilities can be expressed as....
decimals, percentages, or fractions
The probability of an outcome is between __ and ___ (inclusive)
Probability of a coin flip
For a coin flip: the probability of H or T is 1/2 = 0.5 = 50%
Probability of picking a card
the probability of each card (1S, 1H, …) is 1/52
event (E)
any set of outcomes of a random phenomenon; an event is the occurrence of something.
probability of events
For an event (E) we denote by P(E) the probability of E, that is, the proportion of times that the event E occurs when we repeat the same experiment infinitely many times
The goal of probability is to compute the _________: the proportion of time this event will occur when we repeat the _______ experiment infinitely many times
probability of events; same
property
The probability of an event is the sum of the individual probabilities of each outcome that it contains
The sum of the probabilities of all outcomes must be ____
1
The complement of an event A
-the event that happens when A does not occur; denoted by A^c -It contains all the outcomes of the sample space that; does not contain
The probability of the complement of an event is 1 minus the probability of the ______
event