What does EDA stand for?
Exploratory Data Analysis
What is a distribution in statistics?
The frequency or probability of each value a variable takes on.
Name two common graphical methods used in EDA.
Histogram and Boxplot.
What does a histogram represent?
The distribution of a continuous variable using discrete bins.
What is the difference between frequency and relative frequency?
Frequency is the count of occurrences, while relative frequency is the percentage of the total that this count represents.
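A minimal Python sketch of the distinction, using made-up letter grades (the data and the helper name `frequency_table` are illustrative assumptions, not part of the card set):

```python
from collections import Counter

def frequency_table(values):
    """Map each distinct value to (frequency, relative frequency)."""
    counts = Counter(values)
    total = len(values)
    return {value: (count, count / total) for value, count in sorted(counts.items())}

# Hypothetical data, for illustration only.
grades = ["A", "B", "B", "C", "B", "A", "C", "B"]
table = frequency_table(grades)

for value, (count, share) in table.items():
    print(f"{value}: frequency={count}, relative frequency={share:.0%}")
```

The frequencies sum to the number of observations, while the relative frequencies sum to 1 (100%).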
What is the purpose of a smoothed curve on a histogram?
To represent probabilities or proportions more conveniently.
What should you look for when interpreting graphs?
Overall patterns and striking deviations (outliers) from that pattern.
What does skewness describe?
The asymmetry of a distribution: which tail is longer, which also shows up in the relationship between the mean and the median.
What is a positively skewed distribution?
A distribution with a longer right tail, where the mean is typically greater than the median.
What is a stemplot?
A combined tabular and graphic display showing the distribution of a variable.
How is the median defined?
The midpoint of all values of a variable when sorted from smallest to largest.
What do quartiles represent?
They divide the observations into quarters, indicating the distribution of data points.
What is the mean?
The ordinary average of the observations.
How is standard deviation calculated?
By averaging the squared differences from the mean and then taking the square root. For sample data, the sum of squared differences is usually divided by n - 1 rather than n.
What does a standard deviation of 0 indicate?
No variability at all in the data.
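The calculation can be sketched in Python as follows. The function name `population_sd` and both data sets are assumptions made for illustration. The function divides by n, matching the "average of the squared differences" wording, while `statistics.stdev` from the standard library divides by n - 1.

```python
import math
import statistics

def population_sd(values):
    """Square root of the average squared difference from the mean (divides by n)."""
    mean = sum(values) / len(values)
    squared_diffs = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_diffs) / len(values))

# Hypothetical data, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
constant = [3, 3, 3, 3]

print(population_sd(data))       # divides by n
print(statistics.stdev(data))    # sample version, divides by n - 1
print(population_sd(constant))   # identical values: no variability
```

For `data` the mean is 5 and the squared differences sum to 32, so the n-divisor result is the square root of 4. The constant list gives 0, since every value equals the mean.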
What is the relationship between mean and median in skewed distributions?
The mean can be pulled away from the median due to extreme values (outliers).
What is the purpose of using counts and percentages in data analysis?
To compare different variables or groups more easily.
What is an outlier?
An individual observation that falls outside the overall pattern of the graph.
What does variability refer to in statistics?
The spread of data points around the mean.
How do you find the median of an even number of observations?
By taking the average of the two middle values.
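A short Python sketch of the rule, covering both the odd and the even case (the function name and sample values are illustrative assumptions):

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5]))     # odd count: the single middle value
print(median([7, 1, 5, 3]))  # even count: average of 3 and 5
```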
What does the first quartile (Q1) represent?
The 25th percentile of the data.
What does the third quartile (Q3) represent?
The 75th percentile of the data.
What is the significance of the second quartile (Q2)?
It is also called the median and represents the 50th percentile.
What is the effect of increasing variability on standard deviation?
As variability increases, the standard deviation becomes larger.
What is the formula for calculating the mean?
Mean = sum of observations / number of observations.
What does a relative frequency histogram show?
The percentage of times a value occurs in relation to the total number of observations.
What does the median indicate in a data set?
The median marks the middle rank of the data: about half the observations fall at or below it and half at or above it.
How does the mean differ from the median in terms of sensitivity to extreme values?
The mean is strongly influenced by a few extreme observations, while the median is not.
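To see the difference in sensitivity, here is a small Python sketch with made-up salary figures (in thousands; the numbers are assumptions chosen only for illustration):

```python
import statistics

# Hypothetical salaries in thousands, for illustration only.
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [400]  # add one extreme observation

print(statistics.mean(salaries), statistics.median(salaries))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

Adding the single extreme value moves the mean from 35 to roughly 95.8, while the median only moves from 35 to 36.5.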
What are the components of the five-number summary?
The five-number summary consists of the smallest observation, first quartile, median, third quartile, and largest observation.
What does a boxplot represent?
A boxplot is a graph of the five-number summary, showing the inter-quartile range and median.
How are outliers represented in a boxplot?
Outliers are shown as dots above and below the whiskers of the boxplot.
What defines a data point as an outlier in relation to the inter-quartile range?
A data point is considered an outlier if it lies more than 1.5 × IQR below the first quartile or more than 1.5 × IQR above the third quartile.
What is exploratory data analysis (EDA)?
EDA uses graphs and tables of data summaries to understand patterns, distributions, and variability in data.
What is a density curve?
A density curve is a smoothed approximation of a histogram, used to make statistical calculations easier.
What does the area under a density curve represent?
The area under a density curve represents proportions of the total number of observations.
What is the 68-95-99.7 rule for a normal distribution?
Approximately 68% of observations fall within one standard deviation, 95% within two, and 99.7% within three standard deviations of the mean.
What are the mean and standard deviation of the heights of young women aged 18 to 24?
Mean is 63.7 inches and standard deviation is 2.5 inches.
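As a rough check of the 68-95-99.7 rule on these figures, the sketch below assumes the heights follow a normal distribution with mean 63.7 and standard deviation 2.5 inches, and uses `statistics.NormalDist` from the Python standard library. The helper name `share_within` is an assumption for illustration.

```python
from statistics import NormalDist

# Assumption: heights are normally distributed with the mean and SD quoted above.
heights = NormalDist(mu=63.7, sigma=2.5)

def share_within(dist, k):
    """Interval mean ± k standard deviations and the share of the distribution inside it."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return low, high, dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    low, high, share = share_within(heights, k)
    print(f"within {k} SD: {low:.1f} to {high:.1f} inches, about {share:.1%}")
```

Under this assumption, about 68% of heights fall between 61.2 and 66.2 inches, about 95% between 58.7 and 68.7, and about 99.7% between 56.2 and 71.2.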
How can a normal distribution be standardized?
By converting each value into a z-score; the resulting z-scores have a mean of 0 and a standard deviation of 1.
What does a positive z-score indicate?
A positive z-score indicates the observation is above the mean.
How do you calculate a z-score?
Z-score is calculated as (observation - mean) / standard deviation.
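A minimal Python sketch of the formula, applied to the height figures quoted in this set (mean 63.7 inches, standard deviation 2.5 inches). The two example heights are assumptions chosen for illustration.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations the observation lies above (+) or below (-) the mean."""
    return (observation - mean) / sd

# Example heights are hypothetical; mean and SD come from the card on women aged 18 to 24.
print(z_score(68.7, 63.7, 2.5))  # two standard deviations above the mean
print(z_score(61.2, 63.7, 2.5))  # one standard deviation below the mean
```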
What is the significance of the median in terms of percentiles?
The median is the 50th percentile of any distribution.
What does the term 'quartiles' refer to?
Most often the first and third quartiles, which are the 25th and 75th percentiles of a distribution; the second quartile is the median.
What is the relationship between mean and median in a normal distribution?
In a normal distribution, the mean and median are equal.
What happens to the shape of a normal distribution when the standard deviation increases?
A greater standard deviation makes the normal distribution wider and flatter, indicating more variability.
What is the purpose of using tables with z-scores?
Tables are used to convert between z-scores and percentiles, showing percentages of data below a given z-score.
What does the term 'standardized normal distribution' refer to?
It refers to a normal distribution that has been standardized to have a mean of 0 and a standard deviation of 1.
What is the significance of the first and third quartiles in a boxplot?
They represent the boundaries of the inter-quartile range (IQR), which contains the middle 50% of the data.
What is the effect of changing the mean of a normal distribution?
Changing the mean shifts the normal distribution along the axis but does not change its shape.
What is the relationship between the total area under a probability distribution curve and the probabilities?
The total area under the curve equals 100%, representing the total probability.
How is the variability of a dataset reflected in a boxplot?
Variability is reflected in the length of the box and the whiskers, indicating the spread of the data.
What is the purpose of summary statistics in EDA?
Summary statistics provide a concise overview of the main characteristics of the data, such as mean, median, and standard deviation.
What does it mean if a distribution is described as 'symmetrical'?
A symmetrical distribution has equal halves on either side of the center point.
What is the purpose of a z-score table?
To find the corresponding percentile for a given z-score and vice versa.
What percentile corresponds to a z-score of 1.5?
93rd percentile.
What is the z-score at the 42nd percentile?
-0.2.
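In place of a printed table, the same lookups can be sketched with `statistics.NormalDist` from the Python standard library (the helper names are assumptions for illustration):

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def percentile_below(z):
    """Percentage of the standard normal distribution below a given z-score."""
    return 100 * standard_normal.cdf(z)

def z_at_percentile(percentile):
    """z-score with the given percentage of the distribution below it."""
    return standard_normal.inv_cdf(percentile / 100)

print(round(percentile_below(1.5), 1))  # close to the 93rd percentile
print(round(z_at_percentile(42), 1))    # close to -0.2
```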
How does probability model random events?
It describes the regular and predictable pattern of chance behavior in the long run.
What does a probability of 0 indicate?
The outcome never occurs.
What does a probability of 1/2 indicate?
The outcome occurs half the time in a very long series of trials.
What is the Law of Large Numbers?
In a large number of independent repetitions, averages become more stable while sums become more variable.
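A small simulation can illustrate both halves of this statement. The sketch below repeats a run of fair coin tosses many times and compares how much the count of heads (a sum) and the proportion of heads (an average) vary from run to run. The function name, the seed, and the numbers of tosses and repeats are assumptions chosen for illustration.

```python
import random
import statistics

def spread_of_counts_and_proportions(n_tosses, n_repeats=200, seed=1):
    """Run-to-run standard deviation of the heads count and of the heads proportion."""
    rng = random.Random(seed)
    counts = [sum(rng.random() < 0.5 for _ in range(n_tosses))
              for _ in range(n_repeats)]
    proportions = [count / n_tosses for count in counts]
    return statistics.pstdev(counts), statistics.pstdev(proportions)

small = spread_of_counts_and_proportions(10)
large = spread_of_counts_and_proportions(1000)
print("10 tosses:   count spread, proportion spread =", small)
print("1000 tosses: count spread, proportion spread =", large)
```

With more tosses per run, the spread of the counts grows while the spread of the proportions shrinks.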
What is an example of personal probability?
Joe estimating a 10% chance that the Chicago Cubs will go to the World Series next year.
What is the probability of a man aged 20-24 dying in one year according to the 2013 report?
0.0012 or 0.12%.
What is the probability of a woman aged 20-24 dying in one year according to the 2013 report?
0.0004 or 0.04%.
Why do individuals often misinterpret probabilities?
Psychological factors, emotions, and cognitive biases affect how we perceive risk.
What is the risk of dying in a car accident during a lifetime of driving?
About 15,000 out of 1,000,000.
What is the risk of a teacher getting cancer from asbestos exposure?
About 15 out of 1,000,000.
What is the gambler's fallacy?
The false belief that past random events affect future outcomes.
What does it mean for an event to be random?
Individual outcomes are uncertain, but there is a regular distribution of outcomes in many repetitions.
How does short-run regularity differ from long-run regularity in probability?
Short-run outcomes can appear random and irregular, while long-run outcomes show predictable patterns.
What happens to the proportion of heads in coin tosses as the number of trials increases?
It approaches 0.5.
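A quick simulation sketch of this behavior, assuming a fair coin and a fixed random seed (both are assumptions for illustration; the function name is hypothetical):

```python
import random

def proportion_of_heads(n_tosses, seed=0):
    """Simulate n_tosses fair coin tosses and return the proportion that come up heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

for n in (10, 100, 10_000, 100_000):
    print(n, proportion_of_heads(n))
```

Short runs can land well away from 0.5; the longer runs settle close to it.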
What is a common misconception about randomness?
That random phenomena should also be regular in the short run.
Why do people worry about unlikely threats more than likely ones?
They feel a lack of control over certain risks, leading to heightened concern.
What is the effect of cognitive biases on risk perception?
They can lead to overestimating small risks and underestimating larger risks.
What is the relationship between personal probabilities and theoretical probabilities?
Personal probabilities reflect subjective belief, while theoretical probabilities are based on long-run frequency.
What is an example of how psychological factors can distort risk assessment?
People may fear tornadoes more than heart attacks, despite the latter being statistically more dangerous.
What is the importance of understanding probability in decision-making?
It helps individuals make informed choices, especially in high-risk situations.
What influences our perception and actions regarding probability information?
Emotions, personal experiences, and cognitive biases such as overconfidence or fear of loss.
What does expected value represent?
The average (expected) value of an outcome variable.
How is expected value calculated?
By multiplying each outcome by its probability and then adding all the products.
What is the formula for expected value?
Expected value = a1p1 + a2p2 + … + akpk, where ai are outcomes and pi are their probabilities.
In the example of a three-digit number game, what is the expected value of playing?
$0.25 per game in expected winnings, before subtracting the cost to play.
What is the expected value when considering the cost to play in the three-digit number game?
On average, you would lose $0.25 per game.
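The calculation can be sketched in Python. The $250 prize and $0.50 cost appear elsewhere in this set; the winning probability of 1/1000 is an assumption (one winning number among the 1,000 possible three-digit numbers) that reproduces the $0.25 figure. The function name is illustrative.

```python
from fractions import Fraction

def expected_value(outcomes, probabilities):
    """Sum of each outcome multiplied by its probability."""
    if abs(sum(probabilities) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(a * p for a, p in zip(outcomes, probabilities))

# Assumption: a 1-in-1000 chance of winning the $250 prize, otherwise $0.
p_win = Fraction(1, 1000)
expected_winnings = expected_value([250, 0], [p_win, 1 - p_win])
expected_net = expected_winnings - Fraction(1, 2)  # subtract the $0.50 cost to play

print(float(expected_winnings))  # expected winnings per game, in dollars
print(float(expected_net))       # expected net result per game, in dollars
```

Under that assumption the expected winnings are $0.25 per game, and the expected net result after the $0.50 cost is a loss of $0.25 per game.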
What is the average number of motor vehicles per American household based on 2017 data?
1.85 vehicles per household.
How do casinos ensure a positive expected return?
By calculating the expected values of their games.
What does the law of large numbers state?
The mean of observed outcomes approaches the expected value as the number of trials increases.
How do insurance companies operate similarly to casinos?
They bet that policyholders will not die and set premiums based on predicted payouts.
What is a probability model?
A mathematical representation that specifies the possible outcomes and their probabilities.
What is the probability of a woman being not married if the probabilities of marital status are given?
P(not married) = P(never married) + P(widowed) + P(divorced).
What are the rules regarding probabilities?
Any probability is between 0 and 1, and the sum of probabilities for all outcomes must equal 1.
What is the complement of an event?
It includes all outcomes not in the event of interest.
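A minimal sketch of a probability model, the two rules, and the complement, using a fair six-sided die as a stand-in example (the die and all function names are assumptions for illustration, not taken from the card set):

```python
from fractions import Fraction

def is_valid_model(model):
    """Every probability lies between 0 and 1, and the probabilities sum to 1."""
    return all(0 <= p <= 1 for p in model.values()) and sum(model.values()) == 1

def probability(model, event):
    """Probability of an event, given as a set of outcomes."""
    return sum(model[outcome] for outcome in event)

def complement(model, event):
    """All outcomes in the model that are not in the event."""
    return set(model) - set(event)

# Hypothetical model: a fair die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
even = {2, 4, 6}

print(is_valid_model(die))
print(probability(die, even))
print(complement(die, even), probability(die, complement(die, even)))
```

The probability of the complement equals 1 minus the probability of the event, which is why both halves come out to 1/2 here.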
What do the odds against winning measure?
The relative probability of losing compared to winning.
How are betting odds typically expressed?
In the form 'Y to Z', indicating the ratio of losing to winning probabilities.
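As a sketch, odds against winning can be derived from a winning probability by taking the ratio of the losing probability to the winning probability and reducing it to lowest terms (the function name and example probabilities are assumptions for illustration):

```python
from fractions import Fraction

def odds_against(p_win):
    """Return (Y, Z) such that the odds against winning are 'Y to Z'."""
    p = Fraction(p_win)
    if not 0 < p < 1:
        raise ValueError("winning probability must be strictly between 0 and 1")
    ratio = (1 - p) / p  # losing probability relative to winning probability
    return ratio.numerator, ratio.denominator

print(odds_against(Fraction(1, 4)))  # 3 to 1
print(odds_against(Fraction(2, 5)))  # 3 to 2
```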
What does a probability of 0.478 for 'never married' indicate?
It represents the proportion of women aged 25 to 29 who have never married.
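What is the expected value of a game where you pay $0.50 and have a chance to win $250?
The expected winnings are $0.25 per game, found by weighting the win and the loss by their probabilities; after the $0.50 cost to play, that is an average loss of $0.25 per game.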
What happens to your average winnings if you keep increasing your bets without infinite funds?
Your average winnings (expected value) remain the same, and you will eventually go broke.
What is the expected value of the outcomes 0, 1, 2, 3, 4, 5, and 6 vehicles with their respective probabilities?
1.85 vehicles per household.
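What is the probability of having ever married if the probability of 'never married' is 0.478?
P(ever married) = 1 - P(never married) = 1 - 0.478 = 0.522. This complement covers married, widowed, and divorced women together, so it is not the probability of being currently married.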
What is the significance of the expected value in statistical reasoning?
It represents the long-run average outcome if the experiment is repeated many times.
What is the relationship between the probabilities of mutually exclusive events?
The probability that one or the other occurs is the sum of their individual probabilities.
How do casinos protect themselves from players trying to exploit betting strategies?
By implementing restrictions such as caps on bets.
What does it mean if the expected value is a decimal?
It represents a long-run average over many observations, not a single outcome.