Population
All members of the group of interest
Element
Single member of the group of interest
Sample
A subset of the population
Census
Collection of data from every element in the population
Variable
A characteristic of an element (values will vary)
Descriptive Statistics
Methods that summarize and present data: graphs, charts, tables, and numerical summaries
Structured Data
Data that is organized in a defined manner, making it easily searchable.
Unstructured Data
Data that does not have a pre-defined format or organization, making it more complex to analyze.
Data Cleanup
The process of preparing data for analysis by correcting errors, translating units, and handling missing values.
Cross-Section
Data collected from many elements at one time period, providing a snapshot.
Time-Series
Data collected from one element over many time periods, showing trends.
Longitudinal Data
Data that combines both time series and cross-sectional data.
Categorical Variables
Also known as qualitative variables, these are labels and names that do not allow for mathematical operations.
Quantitative Variables
Also known as numeric variables, these are numbers and measurements that answer 'how much?' or 'how many?'.
Discrete Variable
A variable that takes separate, distinct values (often counts) from a finite or countably infinite set.
Continuous Variable
A variable that can take any value within an interval, including fractions.
Nominal Scale
A scale of measurement that uses labels or names without any order.
Ordinal Scale
A scale of measurement where the ordering of values is meaningful, but the distances between values are not.
Interval Scale
A scale of measurement where distances between values are meaningful, but zero does not signify nothingness.
Ratio Scale
A scale of measurement where both distances and ratios are meaningful, and zero signifies nothingness.
Frequency Distribution Table
A table that lists each value or class together with the count of observations that fall in it.
Relative Frequency
A category's count divided by the total number of observations.
Histograms
Graphs used for quantitative data, characterized by no spaces between bars.
Cumulative Relative Frequency
The running sum of relative frequencies up to and including each category; the value for the last category is 1.0.
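The three frequency cards above can be tied together in a short sketch. The `grades` list is made-up data used only for illustration.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical categorical data (not from the original notes).
grades = ["A", "B", "B", "C", "A", "B", "C", "B"]

counts = Counter(grades)                   # frequency of each category
total = sum(counts.values())               # total number of observations
categories = sorted(counts)                # fixed order for the running sum
rel_freq = [counts[c] / total for c in categories]
cum_rel_freq = list(accumulate(rel_freq))  # running total of relative frequencies

for c, rf, crf in zip(categories, rel_freq, cum_rel_freq):
    print(f"{c}: count={counts[c]}, relative={rf:.3f}, cumulative={crf:.3f}")
```

Plotting `cum_rel_freq` against the categories (or class boundaries, for quantitative data) would give the ogive.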
Ogive
A cumulative relative frequency polygon that typically has an elongated S shape and always ends at 1.
Scatter Plot
A graph used to display the relationship between two quantitative variables.
Line Graphs
Graphs used to show trends over time.
Mean
The arithmetic average, calculated by adding all values and dividing by the number of values.
Median
The middle value in a data set, determined after arranging values in ascending order.
Skewed Distribution
An asymmetric distribution. A mean that differs noticeably from the median is a sign of skew, because the mean is pulled toward the longer tail.
Mode
The most common value in a data set.
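As a quick illustration of mean, median, and mode, here is a minimal sketch using Python's standard `statistics` module. The `values` list is hypothetical and includes one large value to show how the mean is pulled away from the median.

```python
import statistics

# Hypothetical data; the value 40 stretches the right tail.
values = [2, 3, 3, 5, 7, 10, 40]

mean = statistics.mean(values)      # sum of values divided by their count
median = statistics.median(values)  # middle value after sorting
mode = statistics.mode(values)      # most frequent value

print(f"mean={mean}, median={median}, mode={mode}")
# A mean well above the median suggests a right-skewed data set.
print("right-skewed" if mean > median else "not right-skewed")
```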
Subgroup Mean
Used to find the average of subgroups within a whole data set.
Weighted Mean
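Calculated as the sum of each subgroup mean times its weight: $\bar{x}_w = w_1 \bar{x}_1 + w_2 \bar{x}_2 + \dots$, where the weights $w_i$ are relative frequencies that sum to 1. For example: mean sales price 1 × (relative frequency 1) + mean sales price 2 × (relative frequency 2) + etc.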
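A small sketch of the weighted mean, assuming two subgroups of home sales. The subgroup means and sizes are invented for illustration.

```python
# Hypothetical subgroup means (sales prices) and subgroup sizes.
subgroup_means = [200_000.0, 300_000.0]
subgroup_sizes = [30, 10]

total = sum(subgroup_sizes)
# Relative frequency of each subgroup serves as its weight.
rel_freqs = [n / total for n in subgroup_sizes]

# Weighted mean: each subgroup mean times its relative frequency, summed.
weighted_mean = sum(m * w for m, w in zip(subgroup_means, rel_freqs))
print(f"weighted mean = {weighted_mean:.2f}")
```

Note that the result sits closer to the mean of the larger subgroup, which a plain average of the two subgroup means would not capture.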
Percentiles
The pth percentile is a value such that about p% of the observations are smaller and about (100 - p)% are larger.
Quartiles
Special percentiles: Q1 is the 25th percentile, Q2 is the 50th percentile (Median), and Q3 is the 75th percentile.
Interquartile Range (IQR)
Calculated as Q3 - Q1, representing the spread of the middle 50% of values.
Lower Fence
Q1 - 1.5(IQR). Any value smaller than this could be considered an outlier.
Upper Fence
Q3 + 1.5(IQR). Any value larger than this could be considered an outlier.
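The quartile, IQR, and fence cards can be combined into one sketch. It uses `statistics.quantiles` with its default "exclusive" method; textbooks and software differ in how they interpolate quartiles, so the exact cut points may not match a hand calculation from class. The `data` list is hypothetical.

```python
import statistics

# Hypothetical data with one suspiciously large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1                  # spread of the middle 50% of values
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are candidates for outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
print(f"fences=({lower_fence}, {upper_fence}), outliers={outliers}")
```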
Range
Calculated as Max - Min, representing the spread of the data.
Variance
Population variance is calculated as $\sigma^2 = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \dots + (x_N - \mu)^2}{N}$.
Sample Variance
Calculated as $s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$.
Standard Deviation
Population standard deviation is the square root of the population variance, $\sigma = \sqrt{\sigma^2}$; likewise, the sample standard deviation is $s = \sqrt{s^2}$.
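To make the difference between dividing by N and dividing by n - 1 concrete, here is a minimal sketch that computes both variances by hand and checks them against Python's `statistics` module. The `data` list is hypothetical.

```python
import math
import statistics

# Hypothetical data, treated first as a whole population, then as a sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n
squared_devs = [(x - mean) ** 2 for x in data]

pop_var = sum(squared_devs) / n         # population variance: divide by N
samp_var = sum(squared_devs) / (n - 1)  # sample variance: divide by n - 1
pop_sd = math.sqrt(pop_var)             # standard deviation = sqrt(variance)
samp_sd = math.sqrt(samp_var)

# The standard library offers both versions directly.
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(samp_var, statistics.variance(data))

print(f"population: variance={pop_var}, sd={pop_sd}")
print(f"sample:     variance={samp_var:.4f}, sd={samp_sd:.4f}")
```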
Coefficient of Variation (C.V)
Measures how much observations vary relative to their mean, calculated as C.V = standard deviation/mean.
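The coefficient of variation is mostly useful for comparing spread across variables with different units or scales. A brief sketch, with made-up means and standard deviations:

```python
# Hypothetical summary statistics for two variables in different units.
height_mean, height_sd = 170.0, 10.0  # centimetres
weight_mean, weight_sd = 70.0, 14.0   # kilograms

cv_height = height_sd / height_mean   # standard deviation relative to the mean
cv_weight = weight_sd / weight_mean

print(f"C.V. height = {cv_height:.3f}, C.V. weight = {cv_weight:.3f}")
# The larger C.V. marks the variable that varies more relative to its mean.
```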
Empirical Rule
In a symmetric bell-shaped distribution, about 68% of observations are within one standard deviation of the mean, about 95% within two, and about 99.7% within three.
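The rule translates into concrete intervals once a mean and standard deviation are known. A sketch assuming a hypothetical bell-shaped variable with mean 100 and standard deviation 15:

```python
# Hypothetical mean and standard deviation of a bell-shaped variable.
mean = 100.0
sd = 15.0

# Approximate share of observations within k standard deviations of the mean.
rule = {1: 0.68, 2: 0.95, 3: 0.997}

# Interval mean +/- k * sd for each k in the rule.
intervals = {k: (mean - k * sd, mean + k * sd) for k in rule}

for k, share in rule.items():
    low, high = intervals[k]
    print(f"about {share:.1%} of observations fall between {low} and {high}")
```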
Z-scores
A measure of how many standard deviations an observation is from the mean, calculated as $z = \frac{x - \bar{x}}{s}$ (sample) or $z = \frac{x - \mu}{\sigma}$ (population).
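A short sketch of z-scores, treating a hypothetical `data` list as a full population (so it uses the population standard deviation; swap in `statistics.stdev` for a sample):

```python
import statistics

# Hypothetical data, treated here as a whole population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.mean(data)
sigma = statistics.pstdev(data)  # population standard deviation

# z = (x - mu) / sigma: distance from the mean in standard deviations.
z_scores = [(x - mu) / sigma for x in data]

for x, z in zip(data, z_scores):
    print(f"x={x}: z={z:+.2f}")
```

A positive z-score means the observation lies above the mean, a negative one below it.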
Covariance
Indicates how two variables change together. Sample covariance: $s_{xy} = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + \dots + (x_n - \bar{x})(y_n - \bar{y})}{n - 1}$.
Population Covariance
Calculated as $\sigma_{xy} = \frac{(x_1 - \mu_x)(y_1 - \mu_y) + \dots + (x_N - \mu_x)(y_N - \mu_y)}{N}$.
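Both covariance formulas share the same numerator and differ only in the divisor. A minimal sketch with hypothetical paired data `x` and `y`:

```python
# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sum of products of paired deviations from the means.
cross_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

sample_cov = cross_products / (n - 1)  # sample covariance: divide by n - 1
population_cov = cross_products / n    # population covariance: divide by N

print(f"sample covariance = {sample_cov}")
print(f"population covariance = {population_cov}")
```

A positive value indicates that the two variables tend to move in the same direction; the magnitude depends on the units, which is why correlation is used to judge strength.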
Correlation
Describes the strength and direction of the linear relationship between two variables.
Sample Correlation
Calculated as $r_{xy} = \frac{s_{xy}}{s_x s_y}$, the sample covariance divided by the product of the sample standard deviations.
Population Correlation
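Calculated as $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$, the population covariance divided by the product of the population standard deviations.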
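The sample and population correlation formulas can be checked side by side. This sketch uses hypothetical paired data and computes everything by hand so each piece of the formula is visible.

```python
import math

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sums of squared deviations and of cross-products.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample version: covariance and standard deviations divide by n - 1.
r = (sxy / (n - 1)) / (math.sqrt(sxx / (n - 1)) * math.sqrt(syy / (n - 1)))

# Population version: everything divides by N instead.
rho = (sxy / n) / (math.sqrt(sxx / n) * math.sqrt(syy / n))

print(f"sample correlation r = {r:.4f}")
print(f"population correlation rho = {rho:.4f}")
```

The divisors cancel, so both versions give the same number for the same data; the result always lies between -1 and 1.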