Residual
The difference between the actual value (y) and the predicted value (y-hat; ŷ); calculated as e = y - ŷ.
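As a minimal sketch with made-up numbers, the residual calculation looks like:

```python
# Residual: e = y - y_hat, using hypothetical values for illustration
y = 12.0      # actual observed value
y_hat = 10.5  # value predicted by the regression line (ŷ)
e = y - y_hat
print(e)  # 1.5
```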
Least Squares Regression Line (LSRL)
The regression line that minimizes the sum of the squares of the residuals.
R² (R-squared)
A statistic that represents the proportion of the variance in the response variable that is explained by the regression line; values range from 0 to 1.
Assumptions for Regression
Conditions that must be met for regression analysis, including the quantitative variables condition, the straight enough condition, and the no-outliers condition.
Homoscedasticity
The condition where residuals have similar spread (constant variance) across the range of predicted values.
Standard Error
A measure that summarizes the typical size of the residuals, serving as an estimate of the model's accuracy.
Probability
The long-run relative frequency of an event's occurrence, expressed as a number between 0 and 1.
Independent Events
Two events are independent if the occurrence of one does not affect the probability of the other occurring.
Conditional Probability
The probability of an event occurring given that another event has already occurred, expressed as P(B | A) = P(A∩B) / P(A).
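A quick numeric check of the formula, with hypothetical probabilities:

```python
# Conditional probability: P(B | A) = P(A ∩ B) / P(A), hypothetical values
p_a = 0.5        # P(A)
p_a_and_b = 0.2  # P(A ∩ B)
p_b_given_a = p_a_and_b / p_a
print(p_b_given_a)  # 0.4
```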
Bernoulli Trials
A sequence of trials where each trial has exactly two outcomes: success or failure, and each trial is independent.
Binomial Model
A probability model for a random variable that counts the number of successes in a fixed number of Bernoulli Trials.
Complement Rule
The rule stating that the probability of the complement of an event A is given by P(A^C) = 1 - P(A).
General Addition Rule
The rule used when events are not necessarily disjoint, expressed as P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
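The rule can be checked with made-up probabilities; subtracting P(A ∩ B) avoids double-counting the overlap:

```python
# General addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B), hypothetical values
p_a, p_b, p_a_and_b = 0.6, 0.5, 0.3
p_a_or_b = p_a + p_b - p_a_and_b
print(round(p_a_or_b, 10))
```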
Simulation
The process of using random numbers to represent outcomes of uncertain events in a trial.
Sampling Distribution
The distribution of a statistic (such as the sample mean) over all possible samples of a given size drawn from a population.
Data
A collection of facts and statistics collected for reference or analysis.
Mean
The average value of a set of numbers, calculated by dividing the sum of the values by the number of values.
Median
The middle value in a list of numbers sorted in ascending order.
Mode
The value that appears most frequently in a data set.
Variance
A measure of how much the values in a data set differ from the mean.
Standard Deviation
A statistic that quantifies the amount of variation or dispersion in a set of values.
Population
The entire set of individuals or items that are of interest for a statistical study.
Sample
A subset of a population used to represent the entire group.
Hypothesis
A proposed explanation for a phenomenon, which can be tested through research and experimentation.
Null Hypothesis
A statement that there is no effect or difference, and it is the default position in statistical testing.
Alternative Hypothesis
The hypothesis that there is a significant effect or difference, contrary to the null hypothesis.
Type I Error
The error made when the null hypothesis is rejected even though it is actually true (a false positive).
Type II Error
The error made when the null hypothesis is not rejected even though it is actually false (a false negative).
Confidence Interval
A range of values that is likely to contain the population parameter with a specified level of confidence.
Regression Analysis
A statistical method for estimating the relationships among variables.
Correlation
A statistical measure that expresses the extent to which two variables are linearly related.
Outlier
A data point that significantly differs from other observations in the data set.
P-value
The probability of obtaining results at least as extreme as the observed results, assuming the null hypothesis is true.
Significance Level
A threshold for determining whether a result is statistically significant, often denoted as alpha (α).
z-score
A statistical measurement that describes a value's relation to the mean of a group of values.
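The usual formula, z = (x − μ)/σ, as a minimal sketch with hypothetical numbers:

```python
# z-score: z = (x - mu) / sigma, hypothetical values
x, mu, sigma = 85.0, 70.0, 10.0
z = (x - mu) / sigma
print(z)  # 1.5
```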
Probability Distribution
A function that describes the likelihood of obtaining the possible values of a random variable.
Binomial Probability
The probability of getting exactly k successes in n Bernoulli trials.
Central Limit Theorem
A statistical theory stating that the distribution of sample means approaches a normal distribution as the sample size increases.
Skewness
A measure of the asymmetry of the probability distribution of a real-valued random variable.
Kurtosis
A measure of the 'tailedness' of the probability distribution of a real-valued random variable.
Chi-Square Test
A statistical test to determine if there is a significant association between categorical variables.
ANOVA (Analysis of Variance)
A statistical procedure for determining whether the means of two or more groups differ significantly from one another.
Time Series Data
Data points collected or recorded at specific time intervals.
Qualitative Data
Non-numeric information that represents categories or qualities.
Quantitative Data
Numeric information that can be measured and calculated.
Data Mining
The computational process of discovering patterns and knowledge from large amounts of data.
Sampling Error
The error caused by observing a sample instead of the whole population.
Non-Response Bias
Bias that occurs when individuals selected for a survey do not respond, and their characteristics differ from those who do respond.
Response Bias
A bias that occurs when participants give inaccurate or untruthful responses.
Survey
A method of gathering information from individuals, usually through questionnaires.
Census
A complete enumeration of a population, often used to collect demographic information.
Statistical Inference
The process of drawing conclusions about a population based on sample data.
Control Group
A group in an experiment that does not receive the treatment or intervention being studied.
Experimental Group
The group in an experiment that receives the treatment being tested.
Randomization
The process of randomly assigning participants to different groups in an experiment to reduce bias.
Field Experiment
An experimental study conducted in a real-world setting as opposed to a laboratory.
Longitudinal Study
Research that follows subjects over a period of time to observe changes.
Cross-Sectional Study
A study that examines a population at one specific point in time.
Causal Relationship
A relationship where one event causes another event to happen.
Statistical Significance
A determination that a result is unlikely to have occurred by chance if the null hypothesis is true.
Effect Size
A quantitative measure of the magnitude of a phenomenon.
Bias
Systematic errors that lead to incorrect conclusions in research.
Reliability
The consistency of a measure; a reliable measure produces the same results under consistent conditions.
Validity
The extent to which a test measures what it claims to measure.
Cohort
A group of individuals sharing a common characteristic, often used in research studies.
Reciprocal Causation
A situation where two variables influence each other.
Meta-Analysis
A statistical technique for combining the findings from independent studies.
Observational Study
A study where researchers observe the subjects without manipulating variables.
Data Visualization
The graphical representation of data to help understand complex information.
Descriptive Statistics
Statistics that summarize or describe characteristics of a data set, including measures like mean, median, and mode.
Inferential Statistics
Methods that allow researchers to draw conclusions about a population based on a sample of data.
Normal Distribution
A symmetrical probability distribution where most observations cluster around the central peak, and probabilities for values further from the mean taper off equally in both directions.
Sampling Techniques
Methods used to select a sample from a population, including random sampling, stratified sampling, and cluster sampling.
Outlier Detection
The process of identifying and handling data points that deviate significantly from the overall pattern of data.
Multivariate Analysis
A set of statistical techniques used to analyze data that involves more than one variable.
Chi-Square Statistic
A measure used in statistical significance tests to determine if there is a significant association between categorical variables.
Coefficient of Determination
Another name for R², it indicates the proportion of the variance in the dependent variable that can be predicted from the independent variable(s).
Data Normalization
The process of adjusting values measured on different scales to a common scale.
Statistical Power
The probability that a statistical test will correctly reject a false null hypothesis; the ability to detect an effect if there is one.
Equation of a Least Squares Regression Line
The equation of the LSRL is typically written as ŷ = b₀ + b₁x, where b₀ is the y-intercept and b₁ is the slope.
Variance Formula
Variance is calculated using the formula: σ² = Σ(xᵢ - μ)² / N, where μ is the mean and N is the number of values.
Standard Deviation Formula
Standard Deviation (σ) is calculated as: σ = √(Σ(xᵢ - μ)² / N).
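Both formulas can be sketched directly (population versions, dividing by N); the data are made up:

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]  # example data
mu = sum(data) / len(data)       # mean μ
variance = sum((x - mu) ** 2 for x in data) / len(data)  # σ² = Σ(xᵢ - μ)² / N
sigma = math.sqrt(variance)      # σ = √(σ²)
print(variance, sigma)  # 4.0 2.0
```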
P-value Interpretation Rule
If P-value < α (significance level), reject the null hypothesis; if P-value ≥ α, fail to reject the null hypothesis.
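The decision rule as a one-line sketch (α and the P-value here are hypothetical):

```python
alpha = 0.05    # significance level (hypothetical)
p_value = 0.03  # observed P-value (hypothetical)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(decision)  # reject H0
```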
Central Limit Theorem Equation
The CLT states that as the sample size (n) increases, the sampling distribution of the sample mean approaches a normal distribution, regardless of the population's distribution.
Binomial Probability Formula
The probability of getting exactly k successes in n trials is given by P(X = k) = (n choose k) * p^k * (1 - p)^(n - k), where p is the probability of success.
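A direct translation of the formula using the standard library's `math.comb` (Python 3.8+); the coin-flip numbers are just an example:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 3 heads in 5 fair coin flips
prob = binomial_pmf(3, 5, 0.5)
print(prob)  # 0.3125
```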
Confidence Interval Formula
A confidence interval for a population mean is given by: CI = x̄ ± z*(σ/√n), where z* is the z-score corresponding to the desired confidence level.
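A minimal sketch of the z-interval x̄ ± z*(σ/√n), with hypothetical sample values (assumes σ is known):

```python
import math

x_bar, sigma, n = 50.0, 10.0, 25  # hypothetical sample mean, known σ, sample size
z_star = 1.96                     # z* for a 95% confidence level
margin = z_star * sigma / math.sqrt(n)
ci = (x_bar - margin, x_bar + margin)
print(ci)
```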
General Addition Rule Equation
For two events A and B, P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
Coefficient of Variation Formula
Coefficient of Variation (CV) = (σ / μ) * 100%, representing the ratio of the standard deviation to the mean.
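A direct numeric sketch with hypothetical σ and μ:

```python
sigma, mu = 4.0, 20.0    # hypothetical standard deviation and mean
cv = (sigma / mu) * 100  # coefficient of variation, as a percentage
print(cv)
```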
Regression Equation for Simple Linear Regression
The regression equation is expressed as ŷ = b₀ + b₁x, where b₁ = r * (σy / σx) and b₀ = ȳ - b₁x̄.
Residual Plot
A graphical representation of the residuals plotted against predicted values (ŷ); used to check the assumptions of linear regression.
Standard Error of the Mean (SEM)
An estimate of the standard deviation of the sampling distribution of the sample mean; used to gauge the accuracy of sample mean estimates. Formula: SEM = σ/√n.
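Plugging hypothetical values of σ and n into the formula:

```python
import math

sigma, n = 15.0, 36  # hypothetical population σ and sample size
sem = sigma / math.sqrt(n)  # SEM = σ/√n
print(sem)  # 2.5
```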
Multicollinearity
A situation in regression analysis where two or more independent variables are highly correlated, which can affect the stability of coefficient estimates.
Adjusted R²
A modified version of R² that adjusts for the number of predictors in the model; useful for comparing models with different numbers of predictors.
Logistic Regression
A regression model used when the dependent variable is binary; it predicts the probability that the outcome belongs to a particular category.
Power Analysis
A method used to determine the sample size required to detect an effect of a given size with a specified level of confidence; essential for study design.
Effect Size Interpretation Guidelines
Effect-size values indicate the strength of a relationship; by Cohen's conventions, small (0.2), medium (0.5), and large (0.8) are commonly used thresholds.
Null Hypothesis Significance Testing (NHST)
A framework for hypothesis testing that assesses the evidence against a null hypothesis; used widely in statistical analyses.
Regression Coefficient Interpretation
In the regression equation, the slope (b₁) indicates the change in the dependent variable for a one-unit increase in the independent variable.
Box Plot
A graphical representation of the distribution of a data set through its quartiles; useful for identifying outliers and the spread of data.
Normality Tests
Statistical tests (e.g., Shapiro-Wilk, Kolmogorov-Smirnov) used to determine if a dataset follows a normal distribution, crucial for many inferential statistics.