Statistics
Science dealing with the collection, analysis, interpretation, and presentation of numerical data
Descriptive Stats
Using data gathered on a group to reach conclusions about the same group
Inferential statistics
Using data gathered on a group to reach conclusions about the population
Population
Collection of persons, objects, or items of interest
Census
Gathering data from the entire population
Sample
A portion of the population that represents the entire population
Parameter
A descriptive measure of the population
Statistic
Descriptive measure of the sample
Variable
Characteristic of any entity being studied that is capable of taking on different values
Measurement
Occurs when a standard process is used to assign numbers to particular attributes or characteristics of a variable
Data
Measurement that is recorded and stored
Nominal
Data used only to classify or categorize
-no value statement
-no order
Ordinal
Data that is used to order/rank items
-no value statement
Interval
Data that can be ranked, and the differences between values are meaningful
Ratio
Data that can be ranked, and the differences between values are meaningful; additionally, zero means the absence of the characteristic
Big data
Large amount of either organized or unorganized data from different sources that is difficult to process
Variety
Different forms of data
Velocity
Speed at which data is available and can be processed
Veracity
Quality and accuracy of the data
Volume
Size of the data
Data Mining
Process of collecting, exploring, and analyzing large volumes of data in an effort to uncover hidden patterns/relationships
Data visualization
Study of the visual representation of data, employed to convey data or information by imparting it as visual objects displayed in graphs
Ungrouped data
Raw data or data that has not been summarized
Grouped data
Data that has been organized into a frequency distribution
Frequency distribution
Summary of data presented in the form of class intervals and frequencies
Range
The difference between the largest and smallest value in a set of numbers
-frequency distributions generally have between 5 and 15 classes
Class midpoint
Value halfway across a class interval
Relative Frequency
Proportion of the total frequency that is in any given class interval in a frequency distribution
= Individual class frequency/Total frequency (proportion of the total that the individual class makes up)
Cumulative frequency
Running total of frequencies through the classes of a frequency distribution
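The cards from frequency distribution through cumulative frequency can be tied together in one small table. This is a minimal sketch, assuming half-open class intervals [lower, upper) and a hand-picked starting point and class width; the function name `frequency_distribution` and the data are hypothetical.

```python
def frequency_distribution(data, start, width, num_classes):
    """Build class intervals [lower, upper) with frequency, class midpoint,
    relative frequency, and cumulative frequency for each class."""
    rows = []
    total = len(data)
    cumulative = 0
    for k in range(num_classes):
        lower = start + k * width
        upper = lower + width
        freq = sum(1 for x in data if lower <= x < upper)
        cumulative += freq  # running total through this class
        rows.append({
            "class": (lower, upper),
            "midpoint": (lower + upper) / 2,  # halfway across the interval
            "frequency": freq,
            "relative": freq / total,         # class frequency / total frequency
            "cumulative": cumulative,
        })
    return rows

data = [12, 15, 21, 22, 27, 31, 33, 38, 41, 45]
for row in frequency_distribution(data, start=10, width=10, num_classes=4):
    print(row)
```

Here the frequencies come out as 2, 3, 3, 2 and the cumulative frequencies as 2, 5, 8, 10, so the last cumulative value always equals the total count.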
Histogram
Vertical bar chart constructed by graphing segments for frequencies
-frequency on Y axis
-classes on X axis
Frequency Polygon
Graphical display of class frequencies
-line graph that connects class midpoints
Ogive
Line graph of the cumulative frequencies plotted at the class endpoints
Stem and Leaf Plot
Consists of Stems (left digit) in the first column and Leaf (right digit) coming out of the stems
-Stems ordered lowest value at the top
-Leaves ordered lowest value at the left
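A stem and leaf plot for two-digit numbers can be sketched by splitting each value into its tens digit (stem) and ones digit (leaf). This assumes non-negative integers; `stem_and_leaf` and the data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem (tens digit) to its leaves (ones digits), lowest first.
    Assumes non-negative integers."""
    plot = defaultdict(list)
    for x in sorted(data):              # sorting orders both stems and leaves
        stem, leaf = divmod(x, 10)
        plot[stem].append(leaf)
    return dict(plot)

data = [23, 41, 27, 35, 22, 48, 31, 29]
for stem, leaves in sorted(stem_and_leaf(data).items()):  # lowest stem at the top
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

This prints the rows `2 | 2 3 7 9`, `3 | 1 5`, and `4 | 1 8`.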
Pie chart
Circular depiction of data where area of the whole pie represents 100%
Bar chart
Chart containing two or more categories along one axis and bars along the other
Pareto chart
A vertical bar chart with categories graphed in descending order (highest value on the left)
-often includes a cumulative frequency line
-80/20 rule
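The ordering behind a Pareto chart, plus its cumulative line, can be computed before any plotting. This is a sketch with hypothetical defect counts; `pareto_table` is an illustrative name, and the cumulative line is expressed as a percentage of the total.

```python
def pareto_table(counts):
    """Sort categories in descending order of frequency and attach the
    cumulative percentage used for the Pareto chart's line."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    table, running = [], 0
    for category, freq in ordered:
        running += freq
        table.append((category, freq, 100 * running / total))
    return table

# Hypothetical defect counts from a quality inspection
defects = {"scratch": 10, "dent": 50, "misaligned": 25, "other": 15}
for row in pareto_table(defects):
    print(row)
```

The first two categories account for 75% of the defects, which is the kind of concentration the 80/20 rule describes.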
Cross Tabulation
Process for producing a two-dimensional table that displays frequency counts for two variables
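A cross tabulation just counts each combination of the two variables. A minimal sketch, assuming the raw data arrive as (row value, column value) pairs; `cross_tab` and the survey responses are hypothetical.

```python
from collections import Counter

def cross_tab(pairs):
    """Count how often each (row, column) combination of two variables occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]

responses = [("M", "Yes"), ("F", "No"), ("F", "Yes"), ("M", "Yes"), ("F", "Yes")]
rows, cols, table = cross_tab(responses)
print("", *cols, sep="\t")
for r, counts in zip(rows, table):
    print(r, *counts, sep="\t")
```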
Scatter plot
Two-dimensional plot of pairs of points from two variables
Time series
Data gathered on a given characteristic over a period of time at regular intervals
Measures of central tendency
One type of measure that is used to yield information about the center of a group of numbers
-Mean, Median, Mode
Mean
Average of a group of numbers
Median
Middle value in an ordered array of numbers
-the (N+1)/2 term
Mode
The most frequently occurring value in a set of data
Bimodal
Data set that has two modes
Multimodal
Data set that has more than two modes
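Python's standard `statistics` module computes the three measures of central tendency directly, and `multimode` reports every mode, which shows bimodal data. The data set is hypothetical; the by-hand median uses the (N+1)/2 term and is only valid for an odd N (for an even N, `statistics.median` averages the two middle values).

```python
import statistics

data = [3, 7, 7, 2, 9, 4, 4]            # hypothetical data set with two modes

mean = statistics.mean(data)            # 36 / 7, about 5.14
median = statistics.median(data)        # 4
modes = statistics.multimode(data)      # [7, 4], so the data are bimodal

# The median by hand: the (N+1)/2 term of the ordered array (odd N only)
ordered = sorted(data)
middle = ordered[(len(ordered) + 1) // 2 - 1]
print(mean, median, modes, middle)
```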
Percentiles
Measures of central tendency that divide a group into 100 parts
nth percentile means at least n% of the data is below that value
-always rounds down
Average the ith and (i+1)th values
When calculating a percentile with location i = (P/100)N, if i is a whole number, what do you do to find the percentile?
Take the value at the whole-number part of (i+1)
When calculating a percentile with location i = (P/100)N, if i is not a whole number, what do you do to find the percentile?
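The two percentile cards above can be sketched as one function, assuming the textbook location rule i = (P/100)N on the sorted data with 1-indexed positions. `percentile` is a hypothetical name; other software may use different interpolation rules and return slightly different values.

```python
import math

def percentile(data, p):
    """Pth percentile (0 < p < 100) using the location i = (P/100)N."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)
    i = p * len(values) / 100          # location, 1-indexed
    if i.is_integer():
        i = int(i)
        # whole number: average the ith and (i+1)th values
        return (values[i - 1] + values[i]) / 2
    # not a whole number: take the value at the whole-number part of i + 1
    return values[math.floor(i + 1) - 1]

data = [5, 1, 4, 2, 8, 7, 3, 6]
print(percentile(data, 25), percentile(data, 30), percentile(data, 50))
```

With N = 8, the 25th percentile has i = 2 (average of the 2nd and 3rd values, 2.5), while the 30th has i = 2.4, so it is the 3rd value, 3. Q1, Q2, and Q3 are the 25th, 50th, and 75th percentiles under the same rule.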
Quartiles
Measures of central tendency that divide a group of data into four parts
Q1 = 25th percentile
Q2 = Median
Q3 = 75th percentile
Measures of variability
Statistics that describe the spread or dispersion of a set of data
Interquartile range
Q3 - Q1
68%, 95%, 99.7%
The empirical rule states that if data is normally distributed, approximately (blank)% of the data is within 1 standard deviation of the mean, (blank)% is within 2 standard deviations, and (blank)% is within 3 standard deviations
1 - 1/K²
Chebyshev’s Theorem states that at least (blank) of the values will fall within K standard deviations of the mean (for K > 1)
-works regardless of shape of distribution
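Comparing Chebyshev's minimum with the empirical-rule figures shows the price of working for any distribution shape: the guarantee is much weaker. `chebyshev_minimum` is an illustrative name.

```python
def chebyshev_minimum(k):
    """Minimum proportion of values within k standard deviations of the mean,
    for any distribution shape (Chebyshev's theorem, k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

# Empirical-rule proportions, which assume a normal distribution
EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}

for k in (2, 3):
    print(f"k={k}: Chebyshev at least {chebyshev_minimum(k):.3f}, "
          f"empirical rule about {EMPIRICAL[k]}")
```

For k = 2, Chebyshev guarantees only 75% versus about 95% for normal data.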
Z score
The number of standard deviations by which a value is above or below the mean of a set of numbers, when the data is normally distributed
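The z score is the value's distance from the mean divided by the standard deviation, z = (x - mean) / standard deviation. A minimal sketch with hypothetical numbers:

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

print(z_score(85, 70, 10))   # 1.5
print(z_score(60, 70, 10))   # -1.0
```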
Skewness
The degree of asymmetry (lack of symmetry) of a distribution around the mean
-left skewed means the long tail is on the left (right means long tail on the right)
-Left skewed (from left to right): Mean, Median, Mode
-Right skewed (from left to right): Mode, Median, Mean
-Symmetrical: All in the middle
Box and Whisker plot
Diagram with the interquartile range as the box
Inner fences at Q1 - 1.5*IQR and Q3 + 1.5*IQR
Outer fences at Q1 - 3*IQR and Q3 + 3*IQR
-values beyond the inner fences but within the outer fences are mild outliers
-values beyond the outer fences are extreme outliers
-if the median is toward the right of the box, the data is skewed left
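The fence rules above translate into a short outlier check. This sketch takes Q1 and Q3 as already computed (for example with the percentile rule) and uses hypothetical values Q1 = 20 and Q3 = 30; `fences` and `classify` are illustrative names.

```python
def fences(q1, q3):
    """Return (inner, outer) fences as (lower, upper) pairs."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    return inner, outer

def classify(x, q1, q3):
    """Label x as an extreme outlier, a mild outlier, or neither."""
    inner, outer = fences(q1, q3)
    if x < outer[0] or x > outer[1]:
        return "extreme outlier"        # beyond the outer fences
    if x < inner[0] or x > inner[1]:
        return "mild outlier"           # between inner and outer fences
    return "not an outlier"

print(fences(20, 30))                   # ((5.0, 45.0), (-10, 60))
for x in (25, 50, 70):
    print(x, classify(x, 20, 30))
```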
Classical method (probability)
Assigning probability based on laws or rules (number of times event occurs/total number of outcomes)
Relative frequency of occurrence method (probability)
Probability based on historical data (number of times the event occurred / number of times it could have occurred)
Subjective method (probability)
Probability based on feelings or insight
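The first two methods reduce to simple ratios, shown here with exact fractions; the die is a standard example and the shipment record is hypothetical. The subjective method has no formula, so it is not shown.

```python
from fractions import Fraction

# Classical method: a fair six-sided die, event "roll an even number"
outcomes = range(1, 7)
even = [x for x in outcomes if x % 2 == 0]
classical = Fraction(len(even), len(outcomes))   # 3/6 = 1/2

# Relative frequency method: hypothetical record of 200 shipments, 14 arrived late
late, shipped = 14, 200
relative_frequency = Fraction(late, shipped)     # 7/100

print(classical, relative_frequency)
```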
Experiment
Process that produces outcomes
Event
Outcome of an experiment
-Broken down furthest into elementary events
Sample space
Complete roster or listing of all elementary events of an experiment
-can be denoted using set notation
Union
Combination of all the elements that are in either of two sets (X or Y)
-elements in both sets are listed only once
Intersection
Elements that are common to both sets
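Python's built-in sets follow the same notation: `|` is union and `&` is intersection. The two sets are hypothetical.

```python
X = {1, 2, 3, 4}
Y = {3, 4, 5}

union = X | Y          # {1, 2, 3, 4, 5}: shared elements 3 and 4 appear once
intersection = X & Y   # {3, 4}: elements common to both sets
print(sorted(union), sorted(intersection))
```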
Mutually exclusive events
Events such that the occurrence of one means the other cannot occur
Ex. Making a shot vs missing a shot
Independent events
Events such that the occurrence of one has no effect on the occurrence of the other
Collectively exhaustive events
Events that together contain all possible elementary events
-The entire sample space
Complement
An event that comprises all the elementary events not in one event
-Denoted A’
P(A’) = 1 - P(A)
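The complement rule can be checked on a small sample space by taking the set difference and comparing probabilities. The die example and the helper name `prob` are illustrative.

```python
from fractions import Fraction

sample_space = set(range(1, 7))   # fair six-sided die
A = {6}                           # event A: roll a six
A_complement = sample_space - A   # all elementary events not in A

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event), len(sample_space))

print(prob(A_complement), 1 - prob(A))  # both equal 5/6
```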
m*n counting rule
If one operation can be done m ways and a second operation n ways, what rule gives the total number of possible combinations?
Ex. A cake comes in 5 flavours and 5 sizes; how many possible combinations are there? (5*5 = 25)
N^n
When sampling with replacement, how many different samples are possible?
-where N is population size and n is sample size
N!/(n!(N-n)!)
When sampling without replacement, how many different samples are possible?
-where N is population size and n is sample size
n!/(n-r)!
When sampling where order matters, how many possible permutations are there?
-where n is the population size and r is the sample size
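The counting cards can be checked with the standard library: `math.comb(N, n)` computes N!/(n!(N-n)!) and `math.perm(N, n)` computes N!/(N-n)! (both require Python 3.8 or later). N = 6 and n = 2 are hypothetical; in the permutation card's letters, the population is n and the sample is r.

```python
import math

# m*n rule: 5 flavours and 5 sizes
cake_combinations = 5 * 5                     # 25

N, n = 6, 2   # hypothetical population size and sample size
with_replacement = N ** n                     # N^n = 36
without_replacement = math.comb(N, n)         # N!/(n!(N-n)!) = 15
ordered = math.perm(N, n)                     # N!/(N-n)! = 30
print(cake_combinations, with_replacement, without_replacement, ordered)
```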
Random variable
A variable that contains the outcome of a chance experiment
Discrete variable
A random variable whose set of possible values is finite or countably infinite
Continuous
A random variable that has values at every point over a given interval
Binomial Distribution
Discrete distribution with only 2 possible outcomes in a given trial (ex. success, failure)
-Assumption: Replacement/independence
n < 5% N
You can use the binomial distribution without assuming independence/replacement if:
What rule regarding n and N?
Number of trials, Number of successes desired, Probability of success, Probability of failure (n, x, p, q)
What information do you need to solve a binomial problem using the binomial formula?
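With n, x, p, and q in hand, the binomial formula P(X = x) = nCx * p^x * q^(n-x) is a one-liner. `binomial_probability` is an illustrative name, and the coin-flip numbers are hypothetical.

```python
import math

def binomial_probability(n, x, p):
    """P(X = x) = nCx * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return math.comb(n, x) * p**x * q**(n - x)

# Exactly 3 heads in 10 flips of a fair coin: 120 / 1024
print(binomial_probability(10, 3, 0.5))   # 0.1171875
```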
Normal distribution (Z)
-Continuous distribution
-Symmetrical about the mean
-Asymptotic (doesn’t touch horizontal axis)
-Unimodal
-Family of curves
-Area under the curve = 1
np > 5 and nq > 5
If (this condition) is met, we can use the Z distribution to solve binomial problems, after applying a correction factor
+0.50, -0.50, -0.50, +0.50
When using the Z distribution to solve binomial problems, what is the correction factor for solving for:
X >
X >=
X <
X <=
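The correction factors above can be applied with the standard library's `statistics.NormalDist` (Python 3.8 or later), using mean np and standard deviation sqrt(npq). This is a sketch; `normal_approx` and the `CORRECTION` table are illustrative names, and the example compares against the exact binomial sum.

```python
import math
from statistics import NormalDist

# Correction factor added to x for each type of question, per the card above
CORRECTION = {">": +0.5, ">=": -0.5, "<": -0.5, "<=": +0.5}

def normal_approx(n, p, op, x):
    """Approximate a binomial probability such as P(X >= x) with the Z distribution."""
    q = 1 - p
    if not (n * p > 5 and n * q > 5):
        raise ValueError("normal approximation needs np > 5 and nq > 5")
    dist = NormalDist(mu=n * p, sigma=math.sqrt(n * p * q))
    boundary = x + CORRECTION[op]
    if op in (">", ">="):
        return 1 - dist.cdf(boundary)   # upper-tail area
    return dist.cdf(boundary)           # lower-tail area

approx = normal_approx(60, 0.5, "<=", 30)
exact = sum(math.comb(60, k) for k in range(31)) / 2**60
print(approx, exact)   # both about 0.551
```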
Frame
A list, map, directory, or any source that can be used to represent a population
-can be overregistered or underregistered
Random sampling
Sampling in which every unit of the population has the same probability of being selected
Simple random sampling
The most elementary of the random sampling techniques, using a random number generator to pick items
Stratified random sampling
Random sampling in which the population is divided into various strata (ex. age), then items are picked from each stratum
-can be proportionate (pick so the sample reflects the proportion of each stratum in the population) or disproportionate
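Proportionate stratified sampling can be sketched as a simple random sample drawn from each stratum, sized by that stratum's share of N. The strata and the name `proportionate_stratified_sample` are hypothetical, and because of rounding the stratum sizes may not add up to exactly n for every population.

```python
import random

def proportionate_stratified_sample(strata, n, seed=None):
    """Randomly sample from each stratum in proportion to its share of N.
    strata: dict mapping stratum name -> list of population units."""
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    return {
        name: rng.sample(units, round(n * len(units) / N))
        for name, units in strata.items()
    }

# Hypothetical population of 100 people split into three age strata
strata = {
    "18-34": [f"A{i}" for i in range(60)],
    "35-54": [f"B{i}" for i in range(30)],
    "55+": [f"C{i}" for i in range(10)],
}
sample = proportionate_stratified_sample(strata, n=20, seed=42)
print({name: len(units) for name, units in sample.items()})  # 12, 6, 2
```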
Systematic sampling
A random sampling technique in which every kth item or person in a randomized list is selected
-where k = N/n
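Systematic sampling picks a random start among the first k units and then every kth unit. A sketch, assuming the frame is already in a randomized order and k is rounded down when N/n is not a whole number; `systematic_sample` is an illustrative name.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every kth unit after a random start, with k = N // n."""
    k = len(frame) // n
    if k < 1:
        raise ValueError("sample size cannot exceed the frame size")
    start = random.Random(seed).randrange(k)   # random start among the first k units
    return frame[start::k][:n]                 # trim in case N is not a multiple of n

print(systematic_sample(list(range(100)), 10, seed=1))
```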
Cluster Sampling
A random sampling technique in which the population is divided into clusters and elements are randomly sampled from clusters
-Homogeneity between clusters, heterogeneity within clusters
Non random sampling
Sampling in which not every unit of the population has the same probability of being selected for the sample
-not scientific
Convenience sampling
Selecting a sample at researcher’s convenience
Judgement sampling
Selecting a sample at researcher’s judgement
Quota sampling
Sample is selected non randomly to fit a desired quota
Snowball sampling
Survey subjects are selected based on referral from others
Sampling error
The error that results if the sample is not representative of the population
Central limit theorem
Regardless of the shape of a population, the distributions of sample means and sample proportions are approximately normal as long as n is large (n > 30 for means; np > 5 and nq > 5 for proportions)
-Thus we can use Z to solve sample problems
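A quick simulation illustrates the theorem: sample means drawn from a flat (uniform) population still cluster around the population mean with spread close to sigma / sqrt(n). This is a seeded sketch that samples with replacement; the population, n = 36, and the 2000 trials are arbitrary choices.

```python
import random
import statistics

def sample_means(population, n, trials, seed=0):
    """Draw `trials` random samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.mean(rng.choices(population, k=n)) for _ in range(trials)]

population = list(range(1, 101))      # uniform population, clearly not normal
means = sample_means(population, n=36, trials=2000)

print(statistics.mean(means))         # close to the population mean of 50.5
print(statistics.stdev(means))        # close to sigma / sqrt(n), about 4.81
```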
Sqrt((N-n)/(N-1))
When working with a finite population (and n is more than 5% of the population), what correction factor do we apply?
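One common place the correction factor appears is the standard error of the mean, sigma / sqrt(n), which it multiplies when n is more than 5% of N. A sketch with hypothetical numbers; `standard_error_of_mean` is an illustrative name.

```python
import math

def standard_error_of_mean(sigma, n, N=None):
    """sigma / sqrt(n), times sqrt((N - n)/(N - 1)) when a finite population
    of size N is given and n is more than 5% of N."""
    se = sigma / math.sqrt(n)
    if N is not None and n > 0.05 * N:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(standard_error_of_mean(10, 25))          # 2.0
print(standard_error_of_mean(10, 25, N=200))   # about 1.876
```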
T distribution
What distribution should you use when the population standard deviation is unknown but the sample standard deviation is known?
-Also assuming population is normally distributed
Robust
A term used to describe statistical techniques that are relatively insensitive to minor violations of their assumptions
Area between the mean and Z
What area does the Z value give?
Area between T and the upper/lower tail
What area does the T value give?
n - 1
What are the degrees of freedom for T?
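When only the sample standard deviation is available, the t statistic replaces sigma with s, and it is read against a t table with n - 1 degrees of freedom. This sketch computes the statistic and its degrees of freedom for a hypothetical sample and hypothesized mean; looking up the tail area is left to a t table, since the standard library has no t distribution.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """t statistic for a sample mean against a hypothesized population mean mu,
    using the sample standard deviation; returns (t, degrees of freedom)."""
    n = len(sample)
    s = statistics.stdev(sample)              # sample standard deviation (divides by n - 1)
    t = (statistics.mean(sample) - mu) / (s / math.sqrt(n))
    return t, n - 1

t, df = one_sample_t([10, 12, 14, 16, 18], mu=12)
print(round(t, 4), df)   # 1.4142 4
```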