Statistics
the science of collecting, analyzing, and interpreting numerical data to understand patterns, test hypotheses, and make decisions about populations based on sample data.
Data
individual pieces of information, often numerical, collected through observation, measurement, or experimentation and used for statistical analysis.
Population
the entire group of individuals, events, or objects that a researcher wants to study or draw conclusions about.
Sample
a subset of the population selected for analysis, used to estimate characteristics of the whole population.
Parameter
a fixed, often unknown value that describes a characteristic of a population, such as the true mean or proportion.
Statistic
a numerical value calculated from sample data that estimates a population parameter.
Sampling
the process of selecting a subset of individuals from a population to represent the whole in statistical analysis.
Sampling Bias
a systematic error that occurs when the sample is not representative of the population, leading to distorted results.
Sampling Variability
the natural variation in sample statistics that occurs when different samples are drawn from the same population.
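A minimal sketch of sampling variability, assuming a made-up population of the integers 1 to 1000: the population mean is a fixed parameter, while each random sample of 50 gives a slightly different sample mean (a statistic).

```python
import random
import statistics

# Hypothetical population: the integers 1..1000.
population = list(range(1, 1001))
mu = statistics.mean(population)  # parameter: the true population mean

random.seed(42)  # fixed seed so the run is reproducible
sample_means = []
for _ in range(5):
    sample = random.sample(population, 50)        # one random sample, n = 50
    sample_means.append(statistics.mean(sample))  # statistic: an estimate of mu

print("population mean:", mu)  # 500.5
for i, m in enumerate(sample_means, 1):
    print(f"sample {i} mean: {m:.1f}")  # varies from sample to sample
```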
Random Sampling
a method in which members of the population are selected by chance (in the simplest designs, each with an equal chance of being selected), which minimizes selection bias.
Simple Random Sample
a type of random sampling in which each individual, and each possible sample of the same size, has an equal probability of being selected.
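To illustrate the "each possible sample is equally likely" part, here is a small simulation under assumed toy values: a population of four labelled individuals and samples of size 2. Python's `random.sample` draws without replacement, so over many draws each of the 6 possible samples should appear about 1/6 of the time.

```python
import itertools
import random
from collections import Counter

population = ["A", "B", "C", "D"]  # hypothetical toy population
n = 2

# Every possible sample of size 2, ignoring order: C(4, 2) = 6 samples.
possible = list(itertools.combinations(population, n))

random.seed(0)
draws = 60_000
counts = Counter(
    tuple(sorted(random.sample(population, n)))  # sort so ('B', 'A') counts as ('A', 'B')
    for _ in range(draws)
)

for s in possible:
    print(s, round(counts[s] / draws, 3))  # each close to 1/6 ≈ 0.167
```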
Probability
a numerical measure between 0 and 1 that expresses the likelihood of an event occurring.
Properties of Probability
probabilities range from 0 to 1, with 0 meaning impossible and 1 meaning certain, and the probabilities of all possible outcomes sum to 1.
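A quick check of these properties on a fair six-sided die (an assumed example), using exact fractions so the sum is exactly 1 rather than a rounded float:

```python
from fractions import Fraction

# A fair die: six equally likely, mutually exclusive outcomes.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Property 1: every probability lies between 0 and 1.
assert all(0 <= p <= 1 for p in die.values())
# Property 2: the probabilities of all possible outcomes sum to exactly 1.
assert sum(die.values()) == 1

p_even = sum(p for face, p in die.items() if face % 2 == 0)
print(p_even)  # 1/2

p_seven = die.get(7, Fraction(0))  # a 7 cannot be rolled
print(p_seven)  # 0 (impossible)
```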
Random Process
a process whose individual outcomes are unpredictable in the short term but follow predictable patterns over many repetitions, governed by probability.
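A simulated fair coin is a simple random process. This sketch (seed and number of flips chosen arbitrarily) shows the short-run proportion of heads bouncing around while the long-run proportion settles near 0.5.

```python
import random

random.seed(1)  # fixed seed for a reproducible run
flips = [random.random() < 0.5 for _ in range(100_000)]  # True = heads, P(heads) = 0.5

# Short run: the share of heads in a handful of flips can be far from 0.5.
print("first 10 flips:", sum(flips[:10]) / 10)

# Long run: the running proportion settles close to the underlying probability.
for n in (100, 1_000, 10_000, 100_000):
    print(n, sum(flips[:n]) / n)

long_run = sum(flips) / len(flips)
```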
Variable
a characteristic or attribute that can take on different values among individuals in a population.
Categorical Variable
a variable whose values are expressed in distinct categories, such as sex or religion.
Nominal Variable
a categorical variable with no inherent order among categories, like blood type or ethnicity.
Ordinal Variable
a categorical variable with a meaningful order among categories, such as severity of illness or education level.
Binary Variable
a categorical variable with only two possible values, such as yes/no or male/female.
Quantitative Variable
a variable measured on a numerical scale, including discrete and continuous types.
Discrete Variable
a quantitative variable that takes on countable, isolated values, such as number of children.
Continuous Variable
a quantitative variable that can take on any value within a range, such as height or blood pressure.
Describing Categorical Variables
typically summarized using frequencies and percentages, and visualized with bar plots or pie charts.
Describing Quantitative Variables
summarized using measures of central tendency and variability, and visualized with histograms or box plots.
Frequency Distribution
a table or graph showing how often each value or category occurs in a dataset.
Relative Frequency
the proportion of observations in each category relative to the total number of observations.
Cumulative Frequency
the running total of frequencies up to a certain category or interval.
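A minimal sketch building a frequency distribution with relative and cumulative frequencies, assuming hypothetical ordinal responses. Cumulative frequency needs a meaningful category order, which is why the categories are listed explicitly.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical ordinal data: self-rated health of 10 respondents.
order = ["poor", "fair", "good", "excellent"]
responses = ["good", "fair", "good", "excellent", "poor",
             "good", "fair", "excellent", "good", "fair"]

freq = Counter(responses)
counts = [freq[c] for c in order]       # frequency, in category order
total = sum(counts)
relative = [c / total for c in counts]  # relative frequency (proportion of total)
cumulative = list(accumulate(counts))   # running total of frequencies

print(f"{'category':<10}{'freq':>5}{'rel':>6}{'cum':>5}")
for cat, f, r, cum in zip(order, counts, relative, cumulative):
    print(f"{cat:<10}{f:>5}{r:>6.2f}{cum:>5}")
```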
Bar Plot
a graph using rectangular bars to represent the frequency or percentage of categorical data.
Pie Chart
a circular graph divided into slices to show proportions of categorical data.
Histogram
a graph displaying the distribution of a continuous variable using adjacent bars for intervals.
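Behind every histogram is a count of values per interval. This sketch assumes hypothetical blood pressure readings, equal-width bins, and half-open intervals [low, high); other tools may use different bin edges or conventions. A text bar stands in for the plotted bar.

```python
# Hypothetical systolic blood pressure readings (mmHg).
readings = [112, 118, 121, 125, 127, 130, 131, 134, 138, 142, 145, 151]

def histogram_bins(values, start, width, n_bins):
    """Count values in adjacent intervals [start + k*width, start + (k+1)*width).

    Values outside all bins are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

counts = histogram_bins(readings, start=110, width=10, n_bins=5)
for k, c in enumerate(counts):
    lo = 110 + k * 10
    print(f"[{lo}, {lo + 10}) {'#' * c}")
```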
Skewness
a measure of asymmetry in a distribution: positive (right) skew indicates a longer tail to the right, and negative (left) skew a longer tail to the left.
Kurtosis
a measure of the "tailedness" of a distribution (sometimes loosely described as peakedness), indicating how heavy or light its tails are compared with a normal distribution.
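One common way to compute both measures is from central moments: skewness g1 = m3 / m2^1.5 and excess kurtosis g2 = m4 / m2^2 − 3, where m_k is the average of (x − mean)^k. This is a sketch of those plain moment formulas on made-up data; statistical packages often apply small-sample corrections, so their values may differ slightly.

```python
import statistics

def central_moment(xs, k):
    """Average of (x - mean)**k over the data."""
    m = statistics.fmean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    """Moment coefficient of skewness g1 (no small-sample correction)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """g2 = m4 / m2**2 - 3, so a normal distribution scores about 0."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 10]  # long tail to the right

print(skewness(symmetric))              # 0.0
print(skewness(right_skewed) > 0)       # True
print(excess_kurtosis(symmetric) < 0)   # True: lighter tails than a normal curve
```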
Measures of Central Tendency
statistics that describe the center or typical value of a dataset, including mean, median, and mode.
Mean
the arithmetic average of a dataset, calculated by summing all values and dividing by the number of observations.
Median
the middle value in an ordered dataset, dividing the data into two equal halves; when the number of observations is even, it is the average of the two middle values.
Mode
the value that occurs most frequently in a dataset, useful for identifying common values or peaks in distributions.
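Python's standard `statistics` module computes all three measures. In this sketch the data are invented, with one extreme value (40) included to show how it pulls the mean upward while the median stays put.

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 40]  # hypothetical; 40 is an extreme value

print(statistics.mean(data))       # 9.714... (pulled up by the 40)
print(statistics.median(data))     # 5
print(statistics.multimode(data))  # [3]

# With an even number of values, the median averages the two middle values.
print(statistics.median([2, 3, 5, 7]))  # 4.0
```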
Range
the difference between the maximum and minimum values in a dataset, providing a crude measure of variability.
Centiles
values that divide a dataset into 100 equal parts; the 25th centile is the first quartile (Q1), the 50th is the median (Q2), and the 75th is the third quartile (Q3).
Interquartile Range (IQR)
the difference between the 75th and 25th centiles (Q3 − Q1), measuring the spread of the middle 50% of the data; because it ignores the extremes, it is much less affected by outliers than the range.
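A sketch of range, quartiles, and IQR on invented data. Note that there are several conventions for computing centiles; `statistics.quantiles` defaults to `method="exclusive"`, and this example assumes the `"inclusive"` method, so other software may give slightly different quartiles.

```python
import statistics

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # hypothetical, already sorted

data_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")  # 25th, 50th, 75th centiles
iqr = q3 - q1

print(data_range, q1, q2, q3, iqr)  # 14 4.0 7.0 10.0 6.0

# n=100 returns the 99 centile cut points; index 89 is the 90th centile.
p90 = statistics.quantiles(data, n=100, method="inclusive")[89]
print(p90)  # 12.6
```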
Box Plot
a graphical summary of a dataset showing the median, quartiles, IQR, and potential outliers; useful for comparing distributions.
Outlier
a data point that lies far outside the range of most other values, possibly indicating variability, error, or special cases.
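The numbers behind a box plot can be computed directly. The cards above do not specify an outlier rule, so this sketch assumes the widely used 1.5 × IQR fences (Tukey's convention) and the `"inclusive"` quartile method; the data are invented. Many box plots draw whiskers to the most extreme values inside the fences and mark points beyond them individually.

```python
import statistics

data = [1, 3, 4, 6, 7, 8, 10, 12, 40]  # hypothetical; 40 looks suspicious

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed rule: flag points more than 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("min, Q1, median, Q3, max:", min(data), q1, median, q3, max(data))
print("fences:", lower_fence, upper_fence)  # -5.0 19.0
print("flagged outliers:", outliers)        # [40]
```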
Measures of Variability
statistics that describe the spread or dispersion of data, including range, variance, standard deviation, and coefficient of variation.
Variance
the average of the squared deviations from the mean, reflecting how much values differ from the average; for a sample, the sum of squared deviations is usually divided by n − 1 rather than n.
Standard Deviation (SD)
the square root of the variance, expressed in the same units as the data; it indicates the typical distance of values from the mean and is widely used to describe variability.
Coefficient of Variation
a standardized, unitless measure of dispersion, calculated as the SD divided by the mean (often expressed as a percentage), useful for comparing variability across datasets with different units or scales.
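A minimal sketch of variance, SD, and coefficient of variation on invented measurements, using the standard `statistics` module: `variance` and `stdev` divide by n − 1 (sample), while `pvariance` divides by n (population).

```python
import statistics

data = [4, 8, 6, 5, 3, 7]  # hypothetical measurements

mean = statistics.mean(data)
var_sample = statistics.variance(data)  # sum of squared deviations / (n - 1)
var_pop = statistics.pvariance(data)    # sum of squared deviations / n
sd = statistics.stdev(data)             # square root of the sample variance
cv = sd / mean                          # unitless ratio

print(mean, var_sample, round(var_pop, 4))  # 5.5 3.5 2.9167
print(round(sd, 3), f"{cv:.1%}")            # 1.871 34.0%
```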
Normal Distribution
a symmetric, bell-shaped distribution defined by its mean and SD, which many biological and health-related variables approximately follow.
Properties of the Normal Distribution
it is symmetrical about the mean; its mean, median, and mode are equal; it is defined entirely by its mean and SD; and approximately 68% of values lie within ±1 SD of the mean, 95% within ±2 SD (more precisely ±1.96 SD), and 99.7% within ±3 SD.
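The 68-95-99.7 percentages can be checked with the standard library's `statistics.NormalDist` (Python 3.8+). This sketch uses the standard normal (mean 0, SD 1), so "within ±k SD" is simply the area between −k and k.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

for k in (1, 2, 3):
    share = z.cdf(k) - z.cdf(-k)  # proportion within ±k SD of the mean
    print(f"within ±{k} SD: {share:.4f}")
# within ±1 SD: 0.6827
# within ±2 SD: 0.9545
# within ±3 SD: 0.9973

print(z.inv_cdf(0.975))  # ≈ 1.96: the cut-off that leaves 95% in the middle
```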
Fitting a Normal Distribution
once the sample mean and SD are known, a normal curve with those values can be fitted to the data to model its behavior and make predictions, such as the proportion of values expected above a threshold.
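A sketch of fitting and using a normal curve, assuming invented height data: `NormalDist.from_samples` takes the sample mean and the sample SD (n − 1 divisor), and the fitted curve's `cdf` then predicts the share above a chosen threshold. Such predictions are only as good as the assumption that the data are roughly normal.

```python
from statistics import NormalDist

# Hypothetical adult heights (cm).
heights = [162, 168, 170, 171, 174, 175, 177, 180, 183, 190]

fitted = NormalDist.from_samples(heights)  # sample mean and sample SD
print(round(fitted.mean, 1), round(fitted.stdev, 1))  # 175.0 8.0

# Predicted proportion of people taller than 185 cm under the fitted curve.
p_over_185 = 1 - fitted.cdf(185)
print(f"{p_over_185:.1%}")
```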