What is Bayes' Theorem and when is it used?
A formula that allows you to update the probability of an event based on new evidence. It's used when you know P(A|B) and need to find P(B|A), essentially flipping conditional probabilities to revise your beliefs.
State the formula for Bayes' Theorem
P(B|A) = P(A|B) × P(B) / P(A)
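As a quick numeric sketch of the formula (the screening rates below are made up for illustration), it can be checked in a few lines of Python:

```python
# Hypothetical screening example: P(A|B) = 0.95 (positive test given
# disease), P(B) = 0.01 (base rate), P(A) = 0.059 (overall positive rate).
p_A_given_B = 0.95
p_B = 0.01
p_A = 0.059
# Bayes' Theorem: flip P(A|B) into P(B|A)
p_B_given_A = p_A_given_B * p_B / p_A
print(round(p_B_given_A, 3))  # about 0.161
```

Even with a 95% accurate test, the low base rate pulls P(B|A) down to about 16%, which is exactly the kind of belief revision the theorem captures.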
What does P(A|B) represent in Bayes' Theorem
The probability of event A occurring given that event B has occurred
When should you use the Law of Total Probability
When you need to find the total probability of an event that can occur through multiple, mutually exclusive pathways. It sums the probabilities of all possible ways the event can happen
State the Law of Total Probability formula
P(A) = Σ P(A|B_i) × P(B_i), summed over all mutually exclusive events B_i
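A minimal Python sketch of the formula, using made-up disease-screening numbers (disease present with probability 0.01; the test is positive with probability 0.95 if present and 0.05 if absent):

```python
# Mutually exclusive, exhaustive events B_1 (present), B_2 (absent)
priors = [0.01, 0.99]        # P(B_i)
likelihoods = [0.95, 0.05]   # P(A|B_i)
# Law of Total Probability: P(A) = sum of P(A|B_i) * P(B_i)
p_A = sum(l * b for l, b in zip(likelihoods, priors))
print(round(p_A, 3))  # 0.059
```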
Explain what a tree diagram is and its purpose
A visual tool that maps out all possible outcomes of events in branches. It helps simplify complex probability problems by clearly displaying conditional probabilities and pathways
List the steps to draw a tree diagram for a probability scenario
Start from a single point representing the initial event.
Draw branches for each possible outcome of this event.
From each branch, add subsequent branches for additional events.
Label each branch with the corresponding probability.
Multiply probabilities along the branches to find joint probabilities.
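The steps above can be sketched in Python with a hypothetical two-stage experiment (the branch probabilities are invented for illustration):

```python
# Stage 1: outcome R with prob 0.4, G with prob 0.6.
# Stage 2 probabilities depend on which first-stage branch was taken.
paths = {
    ("R", "R"): 0.4 * 0.3,  # multiply along the branches
    ("R", "G"): 0.4 * 0.7,
    ("G", "R"): 0.6 * 0.5,
    ("G", "G"): 0.6 * 0.5,
}
# Joint probability of one path through the tree:
print(round(paths[("R", "G")], 2))  # 0.28
# The paths are mutually exclusive and exhaustive, so they sum to 1:
print(round(sum(paths.values()), 10))  # 1.0
```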
How does Bayes' Theorem relate to tree diagrams
Tree diagrams visually represent the conditional probabilities needed, making it easier to identify and compute P(A), P(B), P(A|B), and P(B|A)
What are mutually exclusive and exhaustive events, and why are they important for the Law of Total Probability
Mutually exclusive events cannot occur simultaneously, and exhaustive events cover all possible outcomes. They ensure that all pathways are accounted for when calculating total probabilities
What is the Binomial Distribution and when should you use it
It models the number of successes in a fixed number of independent Bernoulli trials, each with the same probability of success. It is used when there are two possible outcomes (success or failure) in experiments like coin flips or quality control tests.
What are the four conditions required for a Binomial Distribution
1) a fixed number of trials, 2) each trial is independent, 3) there are two possible outcomes (success or failure), and 4) the probability of success remains constant across trials.
What is the general formula for the Binomial Probability of exactly x successes in n trials
The general formula is P(X = x) = C(n, x) × p^x × (1−p)^(n−x), where C(n, x) is the number of combinations of n trials taken x at a time, p is the probability of success, and (1−p) is the probability of failure.
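The formula translates directly into Python, using math.comb for the combinations term:

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Example: probability of exactly 3 heads in 5 fair coin flips
print(binom_pmf(3, 5, 0.5))  # 0.3125
```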
In RStudio, which function gives you the exact binomial probability of observing exactly x successes
dbinom(x, size = n, prob = p).
Which RStudio function computes the cumulative probability up to x successes in a binomial setting
pbinom(x, size = n, prob = p)
How do you decide between using dbinom() and pbinom() in RStudio
Use dbinom() when you need the probability of exactly x successes. Use pbinom() when you need the probability of x or fewer successes (cumulative probability)
Given a probability scenario, how do you choose the correct form of the binomial formula to find the desired probability
Exactly x successes: Use the standard binomial formula or dbinom(x, n, p).
At most x successes (≤ x): Use cumulative probability with pbinom(x, n, p).
At least x successes (≥ x): Calculate 1 - pbinom(x - 1, n, p)
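The three cases can be sketched in Python, where binom_cdf plays the role of R's pbinom by summing the pmf up to x:

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x), like dbinom(x, n, p)
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    # cumulative probability P(X <= x), like pbinom(x, n, p)
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

n, p = 10, 0.5
exactly_3 = binom_pmf(3, n, p)       # dbinom(3, 10, 0.5)
at_most_3 = binom_cdf(3, n, p)       # pbinom(3, 10, 0.5)
at_least_3 = 1 - binom_cdf(2, n, p)  # 1 - pbinom(2, 10, 0.5)
print(exactly_3, at_most_3, at_least_3)
```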
What is the Empirical Rule (68-95-99.7 Rule) in statistics
68% of data falls within ±1 standard deviation from the mean.
95% of data falls within ±2 standard deviations from the mean.
99.7% of data falls within ±3 standard deviations from the mean
What is a z-score and how do you calculate it
It measures how many standard deviations an individual data point (x) is from the mean (μ). It's calculated as:
z = (x − μ)/σ, where σ is the standard deviation of the dataset.
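For instance (numbers invented for illustration), a score of 130 on a scale with μ = 100 and σ = 15:

```python
x, mu, sigma = 130, 100, 15
z = (x - mu) / sigma
print(z)  # 2.0: the point sits two standard deviations above the mean
```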
How do you interpret a z-score
Positive z-score: The data point is above the mean.
Negative z-score: The data point is below the mean.
Magnitude: Indicates how far and in what direction the data point deviates from the mean
How do you "unstandardize" a z-score to find the original observed value (x)
x=μ+z×σ where μ is the mean and σ is the standard deviation of the dataset.
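Going the other way on the same made-up scale (μ = 100, σ = 15), a z-score of −1.5 corresponds to:

```python
mu, sigma, z = 100, 15, -1.5
x = mu + z * sigma  # unstandardize: original value from the z-score
print(x)  # 77.5
```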
What does the RStudio function pnorm(x, μ, σ) compute
It gives the probability that a randomly selected value is less than or equal to x.
How do you use qnorm(pct, μ, σ) in RStudio
It finds the value x such that a given percentage (pct) of the data falls below x
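Outside RStudio, what these two functions compute can be approximated with the Python standard library (pnorm via the error function, qnorm by bisecting pnorm). This is a sketch of the underlying math, not R's implementation:

```python
from math import erf, sqrt

def pnorm(x, mu=0.0, sigma=1.0):
    # P(X <= x) for X ~ Normal(mu, sigma)
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def qnorm(pct, mu=0.0, sigma=1.0):
    # invert the standard-normal CDF by bisection, then unstandardize
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if pnorm(mid) < pct:
            lo = mid
        else:
            hi = mid
    return mu + sigma * (lo + hi) / 2

print(round(pnorm(1.96), 3))   # 0.975
print(round(qnorm(0.975), 2))  # 1.96
```

Note that pnorm and qnorm are inverses of each other, which mirrors how the R pair is used.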
What are the steps to calculate and interpret a z-score
Calculate the z-score: z = (x − μ)/σ
Interpret:
If z > 0, x is above the mean.
If z < 0, x is below the mean.
The larger the absolute value of z, the further x is from the mean
What is a "sampling distribution," and what does it represent conceptually
A sampling distribution is the probability distribution of a statistic obtained through repeated sampling from a population. It represents how the statistic varies from sample to sample, illustrating the sampling variability.
What are the two conditions that ensure the sampling distribution of x bar will be approximately Normal in shape (Only one must be met)
The population distribution is Normal: If the population from which samples are drawn is normally distributed, then the sampling distribution of x̄ will also be Normal, regardless of sample size.
Large sample size (Central Limit Theorem applies): If the sample size n is large (typically n ≥ 30), the sampling distribution of x̄ will be approximately Normal, even if the population distribution is not Normal
Given a population mean μ and standard deviation σ, what are the shape, mean, and standard error of the sampling distribution of x̄ for random samples of size n?
Shape: Approximately Normal if the population is Normal or n is large (due to the CLT).
Mean of the sampling distribution (μ_x̄): Equal to the population mean μ.
Standard error (σ_x̄ = σ/√n): Equal to the population standard deviation divided by the square root of the sample size
How can you use a sampling distribution to find the probability that a random sample has a mean within a certain range
You can use the sampling distribution to determine the probability by calculating the z-scores for the sample mean within that range and using the standard normal distribution to find the corresponding probabilities.
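For example (population values invented for illustration): a population with μ = 100 and σ = 15, samples of size n = 36, and the question P(98 < x̄ < 103). A Python sketch using the standard-normal CDF:

```python
from math import erf, sqrt

def pnorm(x, mu=0.0, sigma=1.0):
    # Normal CDF via the error function
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma, n = 100, 15, 36
se = sigma / sqrt(n)  # standard error of x̄: 15/6 = 2.5
# P(98 < x̄ < 103) = Φ(z_upper) − Φ(z_lower) on the sampling distribution
prob = pnorm(103, mu, se) - pnorm(98, mu, se)
print(round(prob, 4))  # about 0.673
```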
Why does a larger sample size n lead to a sampling distribution that is more closely Normal
Because as n increases, the influence of individual data points diminishes, and the aggregate effect smooths out irregularities, resulting in a sampling distribution that approaches Normality due to the Central Limit Theorem
How do you calculate a confidence interval for an unknown population mean using a t critical value and a sample of data
Confidence Interval = x̄ ± t* × (s/√n), where x̄ is the sample mean, t* is the critical value from the t-distribution, s is the sample standard deviation, and n is the sample size
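A worked sketch in Python with invented sample statistics (n = 25, x̄ = 52.4, s = 6.0) and a t* of about 2.064, the value a t table gives for 95% confidence with df = 24:

```python
from math import sqrt

n, xbar, s = 25, 52.4, 6.0
t_star = 2.064  # 95% confidence, df = n - 1 = 24 (from a t table)
margin = t_star * s / sqrt(n)  # t* times the standard error
lower, upper = xbar - margin, xbar + margin
print(round(lower, 2), round(upper, 2))  # roughly 49.92 54.88
```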
How is the t critical value related to the level of confidence of the interval
The t* critical value corresponds to the desired confidence level and degrees of freedom (df = n − 1). A higher confidence level requires a larger t* value, resulting in a wider confidence interval; this ensures the interval has a higher probability of containing the true population mean. The t critical value thus increases as the confidence level rises, reflecting the trade-off between confidence and precision in estimating the population mean.
Why might there be two different t critical values for a 95% confidence interval
Because the t* critical value depends on the degrees of freedom (df), which is based on the sample size (n). Different sample sizes result in different degrees of freedom:
Smaller sample sizes (lower df): t-distribution is wider; larger t* value.
Larger sample sizes (higher df): t-distribution approaches the standard Normal distribution; smaller t* value.
Therefore, for the same confidence level (e.g., 95%), t* varies with df
How do you interpret a 95% confidence interval
It means that we are 95% confident that the true population mean lies within the interval. If we were to take many random samples and compute a confidence interval from each, approximately 95% of those intervals would contain the true population mean
How does sample size affect the width of the confidence interval
Increasing the sample size decreases the standard error (s/√n), which narrows the confidence interval. Conversely, a smaller sample size increases the standard error, resulting in a wider interval. So:
Larger n: Narrower interval
Smaller n: Wider interval
How does the confidence level affect the width of the confidence interval
Higher confidence level: Wider interval
Lower confidence level: Narrower interval
How do you write the null and alternative hypotheses for a research question
Null Hypothesis (H0): States that there is no effect or no difference. It's a statement of equality (e.g., μ = μ0).
Alternative Hypothesis (Ha): Represents what you're trying to prove, that there is an effect or a difference. It can be one-sided (e.g., μ > μ0 or μ < μ0) or two-sided (e.g., μ ≠ μ0).
What does a test statistic measure conceptually in hypothesis testing
It quantifies the difference between the observed sample statistic and the parameter stated in the null hypothesis, relative to the standard error. Essentially, it measures how many standard errors the sample result is from the null hypothesis value
What is a p-value, and how do you interpret it
The probability of observing a test statistic at least as extreme as the one calculated, assuming the null hypothesis is true. A small p-value (typically ≤ α) indicates strong evidence against the null hypothesis, leading you to reject H0. A large p-value suggests insufficient evidence to reject H0.
How is the significance level (α) of a hypothesis test related to the t critical value
The significance level α determines the threshold for rejecting H0. The t* critical value corresponds to this α in the t-distribution with appropriate degrees of freedom. It's the cutoff point beyond which we consider results statistically significant.
Given a t critical value, how do you decide whether to reject or fail to reject the null hypothesis
Compare the calculated test statistic t = (x̄ − μ0)/(s/√n) to the t critical value. If the test statistic exceeds the t critical value, you reject the null hypothesis; if not, you fail to reject it.
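A Python sketch of this decision rule, with hypothetical numbers (H0: μ = 50; a sample of n = 25 with x̄ = 52.4 and s = 6.0; two-sided test at α = 0.05, so t* is about 2.064 for df = 24, from a t table):

```python
from math import sqrt

n, xbar, s, mu0 = 25, 52.4, 6.0, 50.0
t_star = 2.064  # two-sided, alpha = 0.05, df = 24 (from a t table)
t = (xbar - mu0) / (s / sqrt(n))  # test statistic
decision = "reject H0" if abs(t) > t_star else "fail to reject H0"
print(round(t, 2), decision)  # 2.0 fail to reject H0
```

Here t = 2.0 falls short of t* = 2.064, so despite a sample mean above 50 there is not enough evidence to reject H0 at this significance level.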
Why is "failing to reject" the null hypothesis not the same as "accepting" it
Because failing to reject H0 simply means there's not enough evidence against it; it doesn't prove H0 is true. We can never confirm the null hypothesis; we can only gather evidence to reject or not reject it
What are α, β, and power in the context of hypothesis testing
α (Type I Error): The probability of incorrectly rejecting a true null hypothesis (a false positive).
β (Type II Error): The probability of failing to reject a false null hypothesis (a false negative).
Power (1 − β): The probability of correctly rejecting a false null hypothesis, detecting an effect when there is one
How are α, β, and power related
Increasing α: Decreases β and increases power; you're more likely to detect an effect but also more likely to make a Type I error.
Decreasing α: Increases β and decreases power; you're less likely to make a Type I error but more likely to miss detecting an effect.
There's a trade-off between α and β; adjusting one affects the other, impacting the test's power.
How does sample size affect α, β, and power
Larger Sample Size:
β: Decreases (less likely to miss detecting an effect).
Power: Increases (more sensitive test).
α: Remains unchanged (set by the researcher).
A bigger sample provides more information, reducing variability and making it easier to detect true effects.
How does effect size influence α, β, and power
Larger Effect Size:
β: Decreases (easier to detect a true effect).
Power: Increases (higher chance of detecting the effect).
α: Unaffected directly, but a larger effect size makes it easier to achieve significance at a given α.
What is the purpose of a one-sample t-test
The one-sample t-test assesses whether the mean of a single sample significantly differs from a known or hypothesized population mean. It helps determine if the observed sample mean is statistically different from the population mean due to chance or reflects a true effect.
Given a research question and sample summary statistics, how do you conduct a one-sample t-test
State the Hypotheses: Formulate the null and alternative hypotheses.
Calculate the Test Statistic: Use the sample mean, population mean, sample standard deviation, and sample size.
Determine the Critical Value: Based on the significance level (α) and degrees of freedom (df = n − 1).
Compare Test Statistic to Critical Value: Decide whether to reject or fail to reject the null hypothesis.
Draw a Conclusion: Interpret the results in the context of the research question.
How do you state the null (H0) and alternative (Ha) hypotheses for a one-sample t-test
Null Hypothesis (H0): μ = μ0. The population mean equals the hypothesized mean.
Alternative Hypothesis (Ha): Could be one of the following based on the research question:
Two-tailed: μ ≠ μ0. The population mean is not equal to the hypothesized mean.
Left-tailed: μ < μ0. The population mean is less than the hypothesized mean.
Right-tailed: μ > μ0. The population mean is greater than the hypothesized mean
μ: Actual population mean
μ0: Hypothesized population mean
What is the formula for the test statistic in a one-sample t-test, and what does each symbol represent
The formula for the test statistic in a one-sample t-test is t = (x̄ − μ0) / (s / √n), where t is the test statistic, x̄ is the sample mean, μ0 is the hypothesized population mean, s is the sample standard deviation, and n is the sample size.
Provide an example of a full conclusion for a one-sample t-test
Based on the sample data, the calculated t-statistic is 2.35 with 24 degrees of freedom. Since the p-value (0.026) is less than the significance level (α = 0.05), we reject the null hypothesis. This suggests that the true population mean significantly differs from the hypothesized mean. Therefore, we conclude that [insert context-specific conclusion]