Residual
The difference between the actual value (y) and the predicted value (y-hat; ŷ); calculated as e = y - ŷ.
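As a minimal sketch with made-up numbers, the residual calculation looks like:

```python
# Residual: e = y - y_hat, using hypothetical values for illustration
y = 12.0      # actual observed value
y_hat = 10.5  # value predicted by the regression line (ŷ)
e = y - y_hat
print(e)  # 1.5
```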
Least Squares Regression Line (LSRL)
The regression line that minimizes the sum of the squares of the residuals.
R² (R-squared)
A statistic that represents the proportion of the variance in the response variable that is explained by the regression line; values range from 0 to 1.
Assumptions for Regression
Conditions that must be met for regression analysis, including the quantitative variables condition, the straight enough condition, and the no-outliers condition.
Homoscedasticity
The condition where residuals have similar spread (constant variance) across the range of predicted values.
Standard Error
A measure that summarizes the typical size of the residuals, serving as an estimate of the model's accuracy.
Probability
The long-run relative frequency of an event's occurrence, expressed as a number between 0 and 1.
Independent Events
Two events are independent if the occurrence of one does not affect the probability of the other occurring.
Conditional Probability
The probability of an event occurring given that another event has already occurred, expressed as P(B | A) = P(A∩B) / P(A).
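A quick numeric check of the formula, with hypothetical probabilities:

```python
# Conditional probability: P(B | A) = P(A ∩ B) / P(A), hypothetical values
p_a = 0.5        # P(A)
p_a_and_b = 0.2  # P(A ∩ B)
p_b_given_a = p_a_and_b / p_a
print(p_b_given_a)  # 0.4
```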
Bernoulli Trials
A sequence of trials where each trial has exactly two outcomes: success or failure, and each trial is independent.
Binomial Model
A probability model for a random variable that counts the number of successes in a fixed number of Bernoulli Trials.
Complement Rule
The rule stating that the probability of the complement of an event A is given by P(A^C) = 1 - P(A).
General Addition Rule
The rule used when events are not necessarily disjoint, expressed as P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
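The rule can be checked with made-up probabilities; subtracting P(A ∩ B) avoids double-counting the overlap:

```python
# General addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B), hypothetical values
p_a, p_b, p_a_and_b = 0.6, 0.5, 0.3
p_a_or_b = p_a + p_b - p_a_and_b
print(round(p_a_or_b, 10))
```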
Simulation
The process of using random numbers to represent outcomes of uncertain events in a trial.
Sampling Distribution
The distribution of a statistic (such as the sample mean) over all possible samples of a given size drawn from a population.
Data
A collection of facts and statistics collected for reference or analysis.
Mean
The average value of a set of numbers, calculated by dividing the sum of the values by the number of values.
Median
The middle value in a list of numbers sorted in ascending order.
Mode
The value that appears most frequently in a data set.
Variance
A measure of how much the values in a data set differ from the mean.
Standard Deviation
A statistic that quantifies the amount of variation or dispersion in a set of values.
Population
The entire set of individuals or items that are of interest for a statistical study.
Sample
A subset of a population used to represent the entire group.
Hypothesis
A proposed explanation for a phenomenon, which can be tested through research and experimentation.
Null Hypothesis
A statement that there is no effect or difference, and it is the default position in statistical testing.
Alternative Hypothesis
The hypothesis that there is a significant effect or difference, contrary to the null hypothesis.
Type I Error
The error made when the null hypothesis is rejected even though it is actually true (a false positive).
Type II Error
The error made when the null hypothesis is not rejected even though it is actually false (a false negative).
Confidence Interval
A range of values that is likely to contain the population parameter with a specified level of confidence.
Regression Analysis
A statistical method for estimating the relationships among variables.
Correlation
A statistical measure that expresses the extent to which two variables are linearly related.
Outlier
A data point that significantly differs from other observations in the data set.
P-value
The probability of obtaining results at least as extreme as the observed results, assuming the null hypothesis is true.
Significance Level
A threshold for determining whether a result is statistically significant, often denoted as alpha (α).
z-score
A statistical measurement that describes a value's relation to the mean of a group of values.
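The usual formula, z = (x − μ)/σ, as a minimal sketch with hypothetical numbers:

```python
# z-score: z = (x - mu) / sigma, hypothetical values
x, mu, sigma = 85.0, 70.0, 10.0
z = (x - mu) / sigma
print(z)  # 1.5
```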
Probability Distribution
A function that describes the likelihood of obtaining the possible values of a random variable.
Binomial Probability
The probability of getting exactly k successes in n Bernoulli trials.
Central Limit Theorem
A statistical theory stating that the distribution of sample means approaches a normal distribution as the sample size increases.
Skewness
A measure of the asymmetry of the probability distribution of a real-valued random variable.
Kurtosis
A measure of the 'tailedness' of the probability distribution of a real-valued random variable.
Chi-Square Test
A statistical test to determine if there is a significant association between categorical variables.
ANOVA (Analysis of Variance)
A statistical procedure for determining whether the means of two or more groups differ significantly from one another.
Time Series Data
Data points collected or recorded at specific time intervals.
Qualitative Data
Non-numeric information that represents categories or qualities.
Quantitative Data
Numeric information that can be measured and calculated.
Data Mining
The computational process of discovering patterns and knowledge from large amounts of data.
Sampling Error
The error caused by observing a sample instead of the whole population.
Non-Response Bias
Bias that occurs when individuals selected for a survey do not respond, and their characteristics differ from those who do respond.
Response Bias
A bias that occurs when participants give inaccurate or untruthful responses.
Survey
A method of gathering information from individuals, usually through questionnaires.
Census
A complete enumeration of a population, often used to collect demographic information.
Statistical Inference
The process of drawing conclusions about a population based on sample data.
Control Group
A group in an experiment that does not receive the treatment or intervention being studied.
Experimental Group
The group in an experiment that receives the treatment being tested.
Randomization
The process of randomly assigning participants to different groups in an experiment to reduce bias.
Field Experiment
An experimental study conducted in a real-world setting as opposed to a laboratory.
Longitudinal Study
Research that follows subjects over a period of time to observe changes.
Cross-Sectional Study
A study that examines a population at one specific point in time.
Causal Relationship
A relationship where one event causes another event to happen.
Statistical Significance
A determination that a result is unlikely to have occurred by chance if the null hypothesis is true.
Effect Size
A quantitative measure of the magnitude of a phenomenon.
Bias
Systematic errors that lead to incorrect conclusions in research.
Reliability
The consistency of a measure; a reliable measure produces the same results under consistent conditions.
Validity
The extent to which a test measures what it claims to measure.
Cohort
A group of individuals sharing a common characteristic, often used in research studies.
Reciprocal Causation
A situation where two variables influence each other.
Meta-Analysis
A statistical technique for combining the findings from independent studies.
Observational Study
A study where researchers observe the subjects without manipulating variables.
Data Visualization
The graphical representation of data to help understand complex information.
Descriptive Statistics
Statistics that summarize or describe characteristics of a data set, including measures like mean, median, and mode.
Inferential Statistics
Methods that allow researchers to draw conclusions about a population based on a sample of data.
Normal Distribution
A symmetrical probability distribution where most observations cluster around the central peak, and probabilities for values further from the mean taper off equally in both directions.
Sampling Techniques
Methods used to select a sample from a population, including random sampling, stratified sampling, and cluster sampling.
Outlier Detection
The process of identifying and handling data points that deviate significantly from the overall pattern of data.
Multivariate Analysis
A set of statistical techniques used to analyze data that involves more than one variable.
Chi-Square Statistic
A measure used in statistical significance tests to determine if there is a significant association between categorical variables.
Coefficient of Determination
Another name for R², it indicates the proportion of the variance in the dependent variable that can be predicted from the independent variable(s).
Data Normalization
The process of adjusting values measured on different scales to a common scale.
Statistical Power
The probability that a statistical test will correctly reject a false null hypothesis; the ability to detect an effect if there is one.
Equation of a Least Squares Regression Line
The equation of the LSRL is typically written as ŷ = b₀ + b₁x, where b₀ is the y-intercept and b₁ is the slope.
Variance Formula
Variance is calculated using the formula: σ² = Σ(xᵢ - μ)² / N, where μ is the mean and N is the number of values.
Standard Deviation Formula
Standard Deviation (σ) is calculated as: σ = √(Σ(xᵢ - μ)² / N).
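Both formulas can be sketched directly (population versions, dividing by N); the data are made up:

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]  # example data
mu = sum(data) / len(data)       # mean μ
variance = sum((x - mu) ** 2 for x in data) / len(data)  # σ² = Σ(xᵢ - μ)² / N
sigma = math.sqrt(variance)      # σ = √(σ²)
print(variance, sigma)  # 4.0 2.0
```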
P-value Interpretation Rule
If P-value < α (significance level), reject the null hypothesis; if P-value ≥ α, fail to reject the null hypothesis.
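The decision rule as a one-line sketch (α and the P-value here are hypothetical):

```python
alpha = 0.05    # significance level (hypothetical)
p_value = 0.03  # observed P-value (hypothetical)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(decision)  # reject H0
```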
Central Limit Theorem Equation
The CLT states that as the sample size (n) increases, the sampling distribution of the sample mean approaches a normal distribution, regardless of the population's distribution.
Binomial Probability Formula
The probability of getting exactly k successes in n trials is given by P(X = k) = (n choose k) * p^k * (1 - p)^(n - k), where p is the probability of success.
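A direct translation of the formula using the standard library's `math.comb` (Python 3.8+); the coin-flip numbers are just an example:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 3 heads in 5 fair coin flips
prob = binomial_pmf(3, 5, 0.5)
print(prob)  # 0.3125
```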
Confidence Interval Formula
A confidence interval for a population mean is given by: CI = x̄ ± z*(σ/√n), where z* is the z-score corresponding to the desired confidence level.
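A minimal sketch of the z-interval x̄ ± z*(σ/√n), with hypothetical sample values (assumes σ is known):

```python
import math

x_bar, sigma, n = 50.0, 10.0, 25  # hypothetical sample mean, known σ, sample size
z_star = 1.96                     # z* for a 95% confidence level
margin = z_star * sigma / math.sqrt(n)
ci = (x_bar - margin, x_bar + margin)
print(ci)
```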
General Addition Rule Equation
For two events A and B, P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
Coefficient of Variation Formula
Coefficient of Variation (CV) = (σ / μ) * 100%, representing the ratio of the standard deviation to the mean.
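A direct numeric sketch with hypothetical σ and μ:

```python
sigma, mu = 4.0, 20.0    # hypothetical standard deviation and mean
cv = (sigma / mu) * 100  # coefficient of variation, as a percentage
print(cv)
```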
Regression Equation for Simple Linear Regression
The regression equation is expressed as ŷ = b₀ + b₁x, where b₁ = r * (σy / σx) and b₀ = ȳ - b₁x̄.
Residual Plot
A graphical representation of the residuals plotted against predicted values (ŷ); used to check the assumptions of linear regression.
Standard Error of the Mean (SEM)
An estimate of the standard deviation of the sampling distribution of the sample mean; used to gauge the accuracy of sample mean estimates. Formula: SEM = σ/√n.
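Plugging hypothetical values of σ and n into the formula:

```python
import math

sigma, n = 15.0, 36  # hypothetical population σ and sample size
sem = sigma / math.sqrt(n)  # SEM = σ/√n
print(sem)  # 2.5
```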
Multicollinearity
A situation in regression analysis where two or more independent variables are highly correlated, which can affect the stability of coefficient estimates.
Adjusted R²
A modified version of R² that adjusts for the number of predictors in the model; useful for comparing models with different numbers of predictors.
Logistic Regression
A regression model used when the dependent variable is binary; it predicts the probability that the outcome belongs to a particular category.
Power Analysis
A method used to determine the sample size required to detect an effect of a given size with a specified level of confidence; essential for study design.
Effect Size Interpretation Guidelines
Effect-size values indicate the strength of a relationship; by Cohen's conventions, small (0.2), medium (0.5), and large (0.8) are commonly used thresholds.
Null Hypothesis Significance Testing (NHST)
A framework for hypothesis testing that assesses the evidence against a null hypothesis; used widely in statistical analyses.
Regression Coefficient Interpretation
In the regression equation, the slope (b₁) indicates the change in the dependent variable for a one-unit increase in the independent variable.
Box Plot
A graphical representation of the distribution of a data set through its quartiles; useful for identifying outliers and the spread of data.
Normality Tests
Statistical tests (e.g., Shapiro-Wilk, Kolmogorov-Smirnov) used to determine if a dataset follows a normal distribution, crucial for many inferential statistics.