Statistics
The science of variability.
What are the 3 short statistical sayings?
What was compared?
Who's not here?
Incorporate "ish"-ness
Data
A representation of someone or something.
Tidy Data
A way of mapping the real world to a dataset.
Observations & attributes
Observations
Map to rows, & are the things we are interested in.
Attributes
Map to columns, & are the pieces of information we are interested in.
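A minimal Python sketch of tidy data, using a small hypothetical dataset: each dictionary is one observation (a row), and each key is one attribute (a column). The names and values are made up for illustration.

```python
# Hypothetical tidy dataset: one dict per observation (row),
# one key per attribute (column).
students = [
    {"name": "Ana", "age": 19, "major": "Biology"},
    {"name": "Ben", "age": 21, "major": "History"},
    {"name": "Cai", "age": 20, "major": "Biology"},
]

# The attributes are the column names shared by every row.
attributes = list(students[0].keys())

# Pull one attribute (a column) out across all observations.
ages = [row["age"] for row in students]

print(len(students), "observations")
print("attributes:", attributes)
print("ages:", ages)
```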
Measures
The way in which we collect information about observations.
Quantitative, categorical, rating scale, or time series data
Quantitative Data
Refers to data in which the values of an attribute for an observation are numbers representing a quantity of something.
Categorical Data
Refers to data in which the values of an attribute for an observation are selected from a set of different category labels.
Rating Scale Data (Ordinal)
A special type of categorical data that refers to data in which the values of an attribute for an observation are selected from a predetermined rating scale.
Time Series Data
Refers to data in which the values of an attribute for an observation indicate a moment in time (such as year, month, or day).
Reliability
Refers to the extent to which the data you collect from a measure truly represents & reflects the real world characteristics of the observation.
Data Validation
The act of ensuring that the values collected from each observation for each attribute are reliable.
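One simple form of data validation is checking each value against rules for what the attribute can plausibly contain. The sketch below assumes two hypothetical rules (ages between 0 and 120, ratings on a 1 to 5 scale); real rules depend on how each measure was collected.

```python
# Hypothetical validation rules for each attribute (assumed for illustration).
RULES = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "rating": lambda v: v in {1, 2, 3, 4, 5},
}

def find_invalid(rows):
    """Return (row_index, attribute, value) for every value that breaks a rule."""
    problems = []
    for i, row in enumerate(rows):
        for attr, is_valid in RULES.items():
            if attr in row and not is_valid(row[attr]):
                problems.append((i, attr, row[attr]))
    return problems

rows = [
    {"age": 34, "rating": 4},
    {"age": 340, "rating": 4},  # impossible age, perhaps a typo
    {"age": 27, "rating": 7},   # outside the 1-5 scale
]
print(find_invalid(rows))  # [(1, 'age', 340), (2, 'rating', 7)]
```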
Univariate Analysis
The analysis of a single attribute or variable at a time.
Standard Deviation
A measure of how spread out the observations are around the mean.
Rule of thumb: (width of the middle 95% of values) / 4 ≈ SD
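The "middle 95% divided by 4" rule works because, in a roughly normal distribution, about 95% of values fall within two standard deviations of the mean, so the middle 95% spans about four SDs. A minimal sketch comparing the two on simulated bell-shaped data (the mean of 50 and SD of 10 are assumptions for the simulation):

```python
import random
import statistics

random.seed(1)
# Simulated, roughly bell-shaped data (assumed mean 50, SD 10).
data = [random.gauss(50, 10) for _ in range(10_000)]

sd = statistics.stdev(data)  # sample standard deviation

# quantiles(n=40) returns 39 cut points; the first is the 2.5th
# percentile and the last is the 97.5th percentile.
cuts = statistics.quantiles(data, n=40)
middle_95_width = cuts[-1] - cuts[0]
sd_estimate = middle_95_width / 4

print(f"stdev: {sd:.2f}, middle 95% / 4: {sd_estimate:.2f}")
```

For strongly skewed or multi-modal data the rule of thumb can be far off, so it is best treated as a quick check rather than a replacement for computing the SD.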
Five Number Summary
Contains five numbers that help statisticians & data scientists understand the different values that the different observations have for an attribute.
min, Q1, median, Q3, max
Minimum
The smallest value that any observation has for the attribute.
First Quartile (Q1)
The 25th percentile value. 25% of observations have a value below the first quartile.
Median (Q2)
The 50th percentile value. Half of all observations have a value below the median.
Third Quartile (Q3)
The 75th percentile value. 75% of all observations have a value below the third quartile.
Maximum
The largest value that any observation has for the attribute.
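A minimal Python sketch of the five number summary using the standard library. Note that software packages use slightly different rules for computing quartiles; this uses `statistics.quantiles` with its default ("exclusive") method, so other tools may report slightly different Q1 and Q3 values for the same data.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return min(values), q1, median, q3, max(values)

scores = [2, 4, 4, 5, 7, 8, 9, 11, 12]
print(five_number_summary(scores))  # (2, 4.0, 7.0, 10.0, 12)
```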
Frequencies
The total number of observations whose response is equal to a particular value.
Relative Frequencies
The percentage of all observations without missing values whose value is equal to a particular response.
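A minimal sketch of frequencies and relative frequencies for hypothetical survey responses, where `None` stands in for a missing value. Following the definition above, missing values are dropped before computing percentages.

```python
from collections import Counter

# Hypothetical survey responses; None marks a missing value.
responses = ["Yes", "No", "Yes", None, "Unsure", "Yes", "No", None]

answered = [r for r in responses if r is not None]
frequencies = Counter(answered)

# Relative frequency: percentage of non-missing responses, rounded to 1 decimal.
relative = {
    value: round(100 * count / len(answered), 1)
    for value, count in frequencies.items()
}

print(frequencies)  # Counter({'Yes': 3, 'No': 2, 'Unsure': 1})
print(relative)     # {'Yes': 50.0, 'No': 33.3, 'Unsure': 16.7}
```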
Dot Plot
A graph where each observation is displayed as one point on the graph.
The height of each stack of dots tells you the frequency of that value.
Density Plot
Very similar to a dot plot with a line drawn across the top of all of the stacks of dots.
Word Cloud
Graphically depicts all of the words across all of the responses from all of the observations. The size of the word varies by the frequency of the word.
Bar Graph
Based on a frequency table, & has one bar for every response option.
The height of each bar is equal to that response option's frequency.
Aggregate Characteristics
The characteristics of a group of observations.
For text, rating scale, & categorical data, what are the main aggregate characteristics we focus on?
The frequencies & percentages of each of the response options.
For quantitative data, what are the main aggregate characteristics we focus on?
Shape of the data
Spread of the data
Location of the data
Distribution
The pattern that the responses from all the observations make.
What are the common shapes that we often see in quantitative distributions?
Normal distribution
Skewed distribution
Multi-Modal distribution
Normal Distribution
A bell-curve shape
Most of the observations have a value near the average
Approximately 95% of the observations will have a value within two standard deviations of the mean
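The "about 95% within two standard deviations" property can be checked by simulation. A minimal sketch, assuming normally distributed data with a made-up mean of 100 and SD of 15:

```python
import random
import statistics

random.seed(7)
data = [random.gauss(100, 15) for _ in range(10_000)]  # simulated bell-shaped data

mean = statistics.fmean(data)
sd = statistics.stdev(data)

# Count observations whose distance from the mean is at most two SDs.
within_two_sd = sum(1 for x in data if abs(x - mean) <= 2 * sd)
share = within_two_sd / len(data)
print(f"{share:.1%} of observations are within 2 SDs of the mean")
```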
Skewed Distributions
Look as though one side has been stretched out.
Right skew distributions look like the right side of a normal distribution has been stretched, which indicates that some observations have very large values.
Left skew distributions look like the left side of a normal distribution has been stretched, which indicates that some observations have very small values.
Multi-Modal Distributions
Multiple peaks.
Often seen when there are actually group differences in the attribute.
Key to thinking statistically:
Focusing on how each observation varies.
Two-Way Table
Similar to a frequency table, except that one attribute's frequencies are presented as different rows in the table, & a second attribute's frequencies are presented as columns.
Column Percents
Relative frequencies based only on the total from a single column.
Statisticians use column percents to compare the distribution of a categorical or rating scale attribute between two groups.
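A minimal sketch of a two-way table and its column percents, using hypothetical (group, response) pairs: responses form the rows, groups form the columns, and each cell is divided by its column total so the two groups can be compared even if they differ in size.

```python
from collections import Counter

# Hypothetical observations: (group, response) pairs.
obs = [
    ("Group A", "Agree"), ("Group A", "Agree"), ("Group A", "Disagree"),
    ("Group A", "Agree"), ("Group B", "Disagree"), ("Group B", "Agree"),
    ("Group B", "Disagree"), ("Group B", "Disagree"),
]

# Two-way table: count for each (response, group) cell.
table = Counter((response, group) for group, response in obs)
groups = sorted({g for g, _ in obs})       # columns
responses = sorted({r for _, r in obs})    # rows

# Column percents: divide each cell by its column (group) total.
col_totals = {g: sum(table[(r, g)] for r in responses) for g in groups}
col_pct = {
    (r, g): 100 * table[(r, g)] / col_totals[g]
    for r in responses
    for g in groups
}

for r in responses:
    row = "  ".join(f"{g}: {col_pct[(r, g)]:5.1f}%" for g in groups)
    print(f"{r:<9} {row}")
```

Each column's percents add up to 100%, which is what makes the columns directly comparable.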
Line Graph
Places time on the horizontal (x) axis, & the average value of an attribute or a percentage on the vertical axis.
Ratio of Standard Deviations
Equal to the larger of the two groups' standard deviations divided by the smaller one.
Used to compare the spread of the distribution between two groups.
If the ratio is approximately 3 or higher, we say that one distribution is more spread out than the other.
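A minimal sketch of the ratio of standard deviations for two hypothetical groups. It uses the sample standard deviation (`statistics.stdev`), which is an assumption since the notes do not say which version of the SD to use.

```python
import statistics

def sd_ratio(group1, group2):
    """Larger sample standard deviation divided by the smaller one."""
    sd1 = statistics.stdev(group1)
    sd2 = statistics.stdev(group2)
    return max(sd1, sd2) / min(sd1, sd2)

group_a = [10, 12, 14, 16, 18]  # sample SD = sqrt(10)
group_b = [0, 7, 14, 21, 28]    # sample SD = sqrt(122.5)

ratio = sd_ratio(group_a, group_b)
print(f"SD ratio: {ratio:.2f}")  # SD ratio: 3.50
```

With a ratio of about 3.5, the rule of thumb above would say group B's distribution is more spread out than group A's.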
Effect Size
Equal to the difference in the means, divided by the larger standard deviation.
Used to compare the relative difference in the means between two groups.
0.10 or less = no difference in the averages between groups.
Above 0.10 up to 0.25 = small difference in the averages between groups.
0.75 or more = large difference in the averages between groups.
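A minimal sketch of the effect size as defined above, for two hypothetical groups. Taking the absolute difference in means is an assumption made so the result is never negative; note that some other texts divide by a pooled standard deviation instead of the larger one.

```python
import statistics

def effect_size(group1, group2):
    """Absolute difference in means divided by the larger sample SD."""
    diff = abs(statistics.fmean(group1) - statistics.fmean(group2))
    larger_sd = max(statistics.stdev(group1), statistics.stdev(group2))
    return diff / larger_sd

control = [10, 12, 14, 16, 18]    # mean 14, SD = sqrt(10)
treatment = [14, 16, 18, 20, 22]  # mean 18, SD = sqrt(10)
print(f"effect size: {effect_size(control, treatment):.2f}")  # effect size: 1.26
```

Here the means differ by 4 and the larger SD is about 3.16, giving roughly 1.26, which the thresholds above would call a large difference.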
Grouped Bar Graph
Has one bar graph created separately for each of the different response options for the second attribute being considered in the two-way table.
Grouped Density Plots
Have one density curve for a quantitative attribute for each group all on the same plot.
Scatter Plot
A plot in which each observation is placed as a point on a graph according to its values for the two quantitative attributes.
Usually, the presumed causal (explanatory) attribute goes on the horizontal (x) axis.
Smoothed Trend Line
A line through the average value of the vertical axis attribute across all values of the horizontal axis attribute.
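A minimal sketch of the idea behind a smoothed trend line: average the vertical-axis attribute at each distinct value of the horizontal-axis attribute and connect those averages. This is a simplified stand-in; plotting software typically fits a smoother such as LOESS instead. The data below are hypothetical.

```python
from collections import defaultdict
import statistics

def mean_trend(xs, ys):
    """Return (x, average y) pairs for each distinct x, sorted by x."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return [(x, statistics.fmean(groups[x])) for x in sorted(groups)]

hours = [1, 1, 2, 2, 3, 3, 4, 4]             # horizontal-axis attribute
scores = [60, 64, 66, 70, 75, 73, 80, 84]    # vertical-axis attribute
print(mean_trend(hours, scores))  # [(1, 62.0), (2, 68.0), (3, 74.0), (4, 82.0)]
```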
Correlation Coefficient
A statistic that summarizes how related two attributes are to each other.
Close to 0 = no association between the attributes.
Close to -1 = strong negative association between the attributes.
Close to +1 = strong positive association between the attributes.
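The notes do not say which correlation coefficient they mean; the most common one is Pearson's r, which is assumed in this minimal sketch. It shows the three reference cases above on small made-up datasets. (Python 3.10+ also provides `statistics.correlation`, which computes Pearson's r by default.)

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both spreads, so it lies in [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (strong positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (strong negative)
print(pearson_r(x, [1, 3, 2, 3, 1]))   # 0.0  (no linear association)
```

A value of 0 only rules out a straight-line relationship; the third dataset rises and then falls, which a scatter plot would reveal even though r is 0.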