Statistics
The science of collecting, analyzing, and drawing conclusions from data.
Descriptive - methods of organizing and summarizing data
Inferential - making generalizations from a sample to the population
Population
An entire collection of individuals or objects
Sample
A subset of the population selected for the study
Variable
Any characteristic whose value can change from one individual to another
Data
observations on one or more variables
Variables
categorical, numerical, univariate, bivariate, multivariate
Categorical (Qualitative)
-places individuals into groups or categories (e.g., eye color)
Numerical (Quantitative)
measurements or counts that take numerical values
Discrete - listable sets of values (counts)
Continuous - any value over an interval of values (measurements)
Univariate
One variable
Bivariate
Two variables
Multivariate
many variables
Types of distributions
symmetrical, uniform, skewed, bimodal
Symmetrical
Data in which both sides of the center have roughly the same shape and size, like mirror images. Example: the "bell curve"
Uniform
Every class has an equal frequency (number) "a rectangle"
Skewed
one side (tail) is longer than the other side. The skewness is in the direction that the tail points (left or right)
Bimodal
data with two peaks: two classes have large frequencies, separated by lower-frequency classes between them. "double hump camel"
How to describe numerical graphs - S.O.C.S
Shape, Outliers, Center, Spread
Shape
overall type (symmetrical, skewed right or left, uniform, or bimodal)
Statistic (e.g., x̄, s)
a value calculated from a sample, often used to estimate a population parameter.
Measures of Center
Median, Mean, Mode
Mean
μ is for a population (parameter) and x̄ (x-bar) is for a sample (statistic)
Variability
allows statisticians to distinguish between usual and unusual occurrences.
Resistant
-not strongly affected by outliers
Median and IQR
Non-resistant
Mean, Range, Variance, Standard Deviation, Correlation Coefficient (r), Least Squares Regression Line (LSRL) and Coefficient of Determination (r^2)
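As a minimal Python sketch with made-up data, here is a quick check of resistance. Replacing the largest value with an extreme outlier moves the mean but leaves the median unchanged.

```python
from statistics import mean, median

data = [2, 3, 3, 4, 5, 6, 7]          # hypothetical data
with_outlier = data[:-1] + [70]       # largest value replaced by an extreme outlier

print("mean:  ", mean(data), "->", mean(with_outlier))      # mean jumps by 9
print("median:", median(data), "->", median(with_outlier))  # median stays at 4
```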
Trimmed Mean
remove a set percentage of observations from both the top and the bottom of the ordered data, then average the remaining values. This can eliminate the effect of outliers.
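Below is a sketch of a trimmed mean. The helper `trimmed_mean` is hypothetical, and it assumes that floor(proportion × n) observations are removed from each end. Textbooks and software sometimes round differently.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the ordered data from EACH end.

    Assumption: floor(proportion * n) observations are trimmed per end.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(proportion * len(ordered))      # number trimmed from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [1, 4, 5, 5, 6, 6, 7, 8, 9, 95]      # hypothetical data with an outlier (95)
print(trimmed_mean(data, 0.10))             # drops 1 and 95, averages the middle 8
print(trimmed_mean(data, 0.0))              # ordinary mean, pulled up by the outlier
```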
Z-score
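is a standardized score. It tells you how many standard deviations an observation is above (+) or below (-) the mean. Converting every observation to a z-score gives a distribution with μ = 0 & σ = 1. If the original distribution is normal, the z-scores follow the standard normal curve.
z = (x - μ)/σ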
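The formula z = (x - μ)/σ translates directly to Python. The sketch below also uses a small hypothetical data set to show that standardizing every value produces a mean of 0 and an SD of 1.

```python
from statistics import mean, pstdev

def z_score(x, mu, sigma):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

print(z_score(85, 75, 5))   # 2.0: two SDs above a hypothetical mean of 75 (SD 5)

data = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical data set
mu, sigma = mean(data), pstdev(data)       # population mean 5, SD 2
z = [z_score(x, mu, sigma) for x in data]
print(mean(z), pstdev(z))                  # mean 0, SD 1
```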
5- Number Summary
Minimum, Q1, Median, Q3, Maximum
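This is a sketch of the 5-number summary using the "median of halves" rule for quartiles. The assumption is that when n is odd, the overall median is left out of both halves. This is the convention commonly taught in AP Statistics and used by TI-83/84 calculators, but other software (including Python's `statistics.quantiles`) may give slightly different quartiles.

```python
from statistics import median

def five_number_summary(values):
    """(min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    lower = data[: n // 2]            # values below the median position
    upper = data[(n + 1) // 2 :]      # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

print(five_number_summary([1, 3, 5, 7, 9, 11, 13]))  # odd n: median 7 excluded from halves
```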
Probability rules
Sample Space, Event, Complement, Union, Intersection, Mutually Exclusive, Independent, Experimental Probability, Law of Large Numbers
Sample Space
the collection of all possible outcomes
Event
any subset of outcomes from the sample space
Complement
all outcomes not in the event
Union
A or B: all outcomes in A, in B, or in both (everything inside either circle). A ∪ B
Intersection
A and B: outcomes in both A and B (where the circles overlap). A ∩ B
Mutually Exclusive (Disjoint)
A and B have no intersection. They cannot happen at the same time.
Independent
knowing that one event occurred does not change the probability of the other: P(B | A) = P(B), equivalently P(A and B) = P(A)·P(B)
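The rules above can be tried out on one roll of a fair die, using Python sets. The events and the helper `P` are illustrative assumptions. `P` only applies when all outcomes are equally likely.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space: one roll of a fair die
A = {2, 4, 6}            # event A: roll is even
B = {4, 5, 6}            # event B: roll is greater than 3
C = {1, 3}               # event C: roll is 1 or 3
D = {1, 2}               # event D: roll is 2 or less

def P(event):
    """Probability of an event when every outcome in S is equally likely."""
    return Fraction(len(event), len(S))

complement_A = S - A     # {1, 3, 5}
A_or_B = A | B           # union: {2, 4, 5, 6}
A_and_B = A & B          # intersection: {4, 6}

print(P(complement_A), P(A_or_B), P(A_and_B))   # 1/2 2/3 1/3
print(A & C)                     # set(): A and C are mutually exclusive
print(P(A & D) == P(A) * P(D))   # True: A and D are independent
print(P(A & B) == P(A) * P(B))   # False: A and B are not independent
```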
Experimental Probability
is the number of successes in an experiment divided by the total number of trials.
Law of Large Numbers
as an experiment is repeated, the experimental probability gets closer and closer to the true (theoretical) probability. The difference between the two probabilities approaches "0"
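Here is a small simulation of the Law of Large Numbers with simulated fair-coin flips. The helper `experimental_probability` and the fixed seed are illustrative choices. The gap from 0.5 tends to shrink as n grows, although it is not guaranteed to shrink at every step.

```python
import random

def experimental_probability(trials, p=0.5, seed=0):
    """Fraction of successes in `trials` simulated trials, each a success with probability p."""
    rng = random.Random(seed)   # seeded so the run is reproducible
    successes = sum(rng.random() < p for _ in range(trials))
    return successes / trials

for n in (10, 100, 10_000, 1_000_000):
    estimate = experimental_probability(n)
    print(n, estimate, abs(estimate - 0.5))   # gap from the true probability 0.5
```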
Correlation Coefficient - (r)
is a quantitative assessment of the strength and direction of a linear relationship.
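As a sketch, r can be computed as the sum of products of z-scores divided by n - 1, using sample standard deviations. The data below are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r = sum(zx * zy) / (n - 1), with sample SDs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))   # sample SD of x
    sy = sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))   # sample SD of y
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5]      # hypothetical bivariate data
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)   # about 0.77: positive, fairly strong linear association
print(r)
```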
Least Squares Regression Line (LSRL)
is the line of mathematical best fit for bivariate data. It minimizes the sum of the squared deviations (residuals) from the line.
Residuals (error)
is the vertical distance of a point from the LSRL: residual = observed y - predicted ŷ. All residuals sum to "0".
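The following minimal sketch fits the LSRL and computes residuals. The slope formula used here, b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², is algebraically equivalent to b = r·(sy/sx), and the intercept comes from a = ȳ - b·x̄. The function name and data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the LSRL y-hat = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = my - b * mx        # the line passes through (x-bar, y-bar)
    return a, b

xs = [1, 2, 3, 4, 5]       # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)                       # a ≈ 2.2, b ≈ 0.6
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]   # observed - predicted
print(residuals, sum(residuals))   # residuals sum to 0, up to rounding error
```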
Residual Plot
a scatterplot of the residuals against x. No pattern indicates that a linear model is appropriate.
Coefficient of Determination (r^2)
gives the proportion of variation in y (response) that is explained by the linear relationship between x and y. Never use the adjusted r^2.
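To make "proportion of variation explained" concrete, the sketch below computes r^2 = 1 - SSE/SST for the LSRL on hypothetical data and checks that it equals r squared.

```python
from math import sqrt

xs = [1, 2, 3, 4, 5]      # hypothetical data
ys = [2, 4, 5, 4, 5]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

b = sxy / sxx
a = my - b * mx
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # variation NOT explained by the line
sst = syy                                                  # total variation in y
r_squared = 1 - sse / sst    # proportion of variation in y explained by the LSRL
r = sxy / sqrt(sxx * syy)
print(r_squared, r ** 2)     # both about 0.6
```

Interpretation: about 60% of the variation in y can be explained by the LSRL of y on x.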
Interpretations
Slope (b)
For each one-unit increase in x, the predicted y increases/decreases by the slope amount (b).
Correlation coefficient (r)
There is a [strength], [direction], linear association between x and y.
Coefficient of determination (r^2)
Approximately r^2% of the variation in y can be explained by the LSRL of y on x.
Extrapolation
the LSRL should not be used to predict values outside the range of the original data
Influential Points
are points that if removed significantly change the LSRL.
Outliers (residuals)
are points with large residuals
Sampling Frame
is a list of everyone in the population.
Types of Sampling Designs
SRS, Stratified, Systematic, Cluster Sample
SRS (Simple Random Sample)
chosen so that each unit has an equal chance, and every set of n units has an equal chance, of being selected.
Advantages: easy and unbiased
Disadvantages: large variance (σ²) and must know population
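An SRS can be drawn in Python with `random.sample`. It samples without replacement, so every set of 25 units is equally likely. The sampling frame below is hypothetical.

```python
import random

population = [f"student_{i}" for i in range(1, 501)]   # hypothetical sampling frame
rng = random.Random(42)                                # seeded for reproducibility
srs = rng.sample(population, 25)                       # SRS of size 25, no repeats
print(srs[:5])
```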
Stratified
divide the population into homogeneous groups called strata, then take an SRS from each stratum
Advantages: more precise than an SRS and cost reduced if strata already available.
Disadvantages: difficult to divide into groups, more complex formulas & must know population
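The following is a sketch of stratified sampling, where a separate SRS is taken within each stratum. The strata, the helper name, and the equal sample size per stratum are assumptions. Allocating sample sizes in proportion to stratum size is a common alternative.

```python
import random

# Hypothetical population already divided into strata (grade level)
strata = {
    "9th":  [f"9-{i}" for i in range(200)],
    "10th": [f"10-{i}" for i in range(180)],
    "11th": [f"11-{i}" for i in range(160)],
    "12th": [f"12-{i}" for i in range(140)],
}

def stratified_sample(strata, per_stratum, seed=0):
    """Take a separate SRS of size `per_stratum` from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, per_stratum) for name, units in strata.items()}

sample = stratified_sample(strata, 10)
print({name: len(units) for name, units in sample.items()})
```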
Systematic
choose a random starting point, then select units at a fixed interval (e.g., every 50th).
Advantages: unbiased, the sample is evenly distributed across population & don't need to know population
Disadvantages: a large variance (σ²) and can be confounded by trends in the list
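Systematic sampling reduces to a random start followed by list slicing. The list and the helper name below are hypothetical.

```python
import random

def systematic_sample(frame, k, seed=0):
    """Pick a random start among the first k units, then take every kth unit."""
    start = random.Random(seed).randrange(k)   # random start in 0..k-1
    return frame[start::k]

frame = list(range(1, 1001))              # hypothetical ordered list of 1000 units
sample = systematic_sample(frame, 50)     # every 50th unit: 20 units
print(sample[:3])
```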
Cluster Sample
based on location. Select a random location and sample ALL at that location
Advantages: cost is reduced, is unbiased and don't need to know population
Disadvantages: May not be representative of population and has complex formulas.
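In the cluster sampling sketch below, whole clusters are chosen at random and then every unit in each chosen cluster is included. The homerooms and the helper name are hypothetical.

```python
import random

# Hypothetical clusters: 30 homerooms of 25 students each
clusters = {f"room_{r}": [f"room_{r}_student_{s}" for s in range(25)] for r in range(1, 31)}

def cluster_sample(clusters, num_clusters, seed=0):
    """Randomly choose whole clusters, then include EVERY unit in each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [unit for name in chosen for unit in clusters[name]]

sample = cluster_sample(clusters, 3)   # 3 rooms x 25 students = 75 units
print(len(sample))
```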
Random Digit Table
each entry is equally likely and each digit is independent of the rest
Random # Generator
Calculator or computer program
Bias
systematic error that favors certain outcomes. For a statistic, bias concerns the center of its sampling distribution: if the distribution is centered over the true parameter, the statistic is considered unbiased.
Sources of Bias
Voluntary Response, Convenience Sampling, Undercoverage, Non-response, Response, Wording of the Questions
Voluntary Response
People choose themselves to participate
Convenience Sampling
selecting individuals who are easy to reach or comfortable to ask (e.g., friends)
Undercoverage
some group(s) are left out of the selection process.
Non-response
someone cannot or does not want to be contacted or participate.
Response
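false or inaccurate answers - can be caused by lying, poor memory, or the interviewer's influence
Wording of Questions
leading or confusing questions that push respondents toward a particular answer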
Types of Experimental Designs
Observational study, experiment, experimental unit, factor, level, response variable, treatment, control group, placebo, blinding, double blinding.
Observational study
observe outcomes without imposing a treatment
Experiment
actively imposes a treatment on the subjects
Experimental unit
single individual or object that receives a treatment
Factor
the explanatory variable; what is being tested.
Level
a specific value for the factor
Response Variable
What you are measuring with the experiment
Treatment
experimental condition applied to each unit
Control Group
a group used as a baseline for comparing the effectiveness of the treatment - does NOT have to receive a placebo
Placebo
a treatment with no active ingredients (provides control)
Blinding
a method used so that the subjects are unaware of the treatment (who gets a placebo or the real treatment).
Double Blinding
neither the subjects nor the evaluators know which treatment is being given.
Principles
Control, Replication, Randomization
Control
Keep all extraneous variables (those not being studied) constant
Replication
uses many subjects to quantify the natural variation in the response
Randomization
uses chance to assign the subjects to the treatments.
How to create proper cause and effect
only a well-designed, well-controlled experiment can establish cause and effect
Experimental Designs
Completely Randomized, Randomized Block, Matched Pairs, Confounding Variables, Randomization, Blocking
Completely Randomized
all experimental units are assigned to the treatments completely at random
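One way to carry out a completely randomized design is to shuffle all units and then deal them out to the treatments in turn, which gives equal-sized groups. The subjects, treatment labels, and helper name are hypothetical.

```python
import random

def completely_randomized(units, treatments, seed=0):
    """Shuffle all units, then deal them out to the treatments in equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

subjects = [f"subject_{i}" for i in range(1, 31)]    # hypothetical subjects
groups = completely_randomized(subjects, ["new drug", "placebo"])
print({t: len(g) for t, g in groups.items()})
```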
Randomized Block
units are first grouped into blocks of similar units, then randomly assigned to treatments within each block - reduces variation
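A randomized block design repeats the random assignment separately inside each block. The blocks and labels below are hypothetical.

```python
import random

def randomized_block(blocks, treatments, seed=0):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)                     # randomize only within this block
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

blocks = {"male": ["M1", "M2", "M3", "M4"], "female": ["F1", "F2", "F3", "F4", "F5", "F6"]}
plan = randomized_block(blocks, ["A", "B"])
print(plan)
```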
Matched Pairs
units are matched up in pairs by similar characteristics, then one unit in each pair is randomly assigned a treatment; the other unit automatically receives the second treatment. OR each individual receives both treatments in random order (before/after or pretest/post-test). Assignment is dependent.
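In the first version of matched pairs, a coin flip for each pair decides which member gets which treatment. The pairs and the helper name below are hypothetical.

```python
import random

def matched_pairs(pairs, treatments=("A", "B"), seed=0):
    """For each pair, a fair coin decides which member gets the first treatment;
    the other member automatically gets the second treatment."""
    rng = random.Random(seed)
    plan = {}
    for first, second in pairs:
        if rng.random() < 0.5:
            plan[first], plan[second] = treatments
        else:
            plan[second], plan[first] = treatments
    return plan

pairs = [("twin1a", "twin1b"), ("twin2a", "twin2b"), ("twin3a", "twin3b")]
plan = matched_pairs(pairs)
print(plan)
```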
Confounding Variables
occur when the effect of a variable on the response cannot be separated from the effect of the factor being tested - common in observational studies. Random assignment to treatments guards against confounding variables.
Randomization (Designs)
reduces bias by spreading extraneous variables to all groups in the experiment
Blocking
helps reduce variability. Another way to reduce variability is to increase the sample size
Random variable
a numerical value that depends on the outcome of an experiment
Discrete
a random variable whose values can be counted (listed)
Continuous
a random variable that can take any value in an interval (measurements)
Discrete Probability Distributions
gives each possible value x and the probability associated with it; the probabilities sum to 1.
calculator shortcut - 1-Var Stats L1, L2 (values in L1, probabilities in L2)
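The calculator shortcut reports the mean and standard deviation of the distribution. Below is a sketch of the same computation, using a hypothetical distribution for X = number of heads in 2 fair coin flips.

```python
from math import sqrt

xs = [0, 1, 2]          # values (like list L1)
ps = [0.25, 0.5, 0.25]  # probabilities (like list L2)

assert abs(sum(ps) - 1) < 1e-12                                # probabilities must sum to 1
mu = sum(x * p for x, p in zip(xs, ps))                        # expected value E(X)
sigma = sqrt(sum((x - mu) ** 2 * p for x, p in zip(xs, ps)))   # standard deviation
print(mu, sigma)        # 1.0 and about 0.707
```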
Fair game
a fair game is one in which all pay-ins equal all pay-outs, so the expected net gain for a player is 0
Special discrete distributions
binomial distributions and geometric distributions
Binomial distribution
Properties- two mutually exclusive outcomes, fixed number of trials (n), each trial is independent, the probability (p) of success is the same for all trials.
Random variable- is the number of successes out of the fixed # of trials. Starts at X = 0 and is finite (ends at X = n).
μx = np, σx = sqrt(npq), where q = 1 - p
Calculator: binompdf (n, p, x) = single outcome P(X = x)
binomcdf (n, p, x) = cumulative outcome P(X ≤ x)
1 - binomcdf (n, p, (x-1)) = cumulative outcome P(X ≥ x)
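The binomial calculator commands can be mirrored in Python with `math.comb`. The helpers `binom_pdf` and `binom_cdf` are hypothetical names chosen to resemble the calculator commands. They are not a library API.

```python
from math import comb, sqrt

def binom_pdf(n, p, x):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(n, p, x):
    """P(X <= x): sum of the pdf from 0 through x."""
    return sum(binom_pdf(n, p, k) for k in range(x + 1))

n, p = 10, 0.5                       # e.g., 10 fair coin flips
exactly_5 = binom_pdf(n, p, 5)       # P(X = 5) = 252/1024
at_most_3 = binom_cdf(n, p, 3)       # P(X <= 3)
at_least_8 = 1 - binom_cdf(n, p, 7)  # P(X >= 8) = 1 - P(X <= 7)
mu, sigma = n * p, sqrt(n * p * (1 - p))   # np and sqrt(npq)
print(exactly_5, at_most_3, at_least_8, mu, sigma)
```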
Geometric Distributions
Properties - two mutually exclusive outcomes, each trial is independent, probability (p) of success is the same for all trials. (NOT a fixed number of trials)
Random Variable - the number of the trial on which the FIRST success occurs. Starts at 1 and is infinite
Calculator: geometpdf (p, a) = single outcome P(X = a)
geometcdf (p, a) = cumulative outcome P(X ≤ a)
1 - geometcdf (p, (a-1)) = cumulative outcome P(X ≥ a)
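The geometric probabilities have closed forms. P(X = a) = (1 - p)^(a-1)·p, and P(X ≤ a) = 1 - (1 - p)^a because the complement is a failures in a row. The helper names below are hypothetical.

```python
def geom_pdf(p, a):
    """P(X = a): first success on trial a (a = 1, 2, 3, ...)."""
    return (1 - p) ** (a - 1) * p

def geom_cdf(p, a):
    """P(X <= a): at least one success in the first a trials."""
    return 1 - (1 - p) ** a

p = 1 / 6                               # e.g., rolling a 6 on a fair die
first_six_on_3rd = geom_pdf(p, 3)       # (5/6)^2 * (1/6) = 25/216
within_4 = geom_cdf(p, 4)               # P(X <= 4)
at_least_5 = 1 - geom_cdf(p, 4)         # P(X >= 5) = (5/6)^4
print(first_six_on_3rd, within_4, at_least_5)
```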
Continuous Random Variable
numerical values that fall within an interval (measurements). Continuous random variables use density curves, where the total area under the curve always = 1. To find probabilities, find the area under the curve.
Unusual Density Curves - any shape (triangles, etc.)
Uniform Distributions - uniformly (evenly) distributed, shape of a rectangle.
Normal Distributions - symmetrical, unimodal, bell-shaped curves defined by the parameters μ and σ.
Calculator: Normalpdf - used for graphing only
Normalcdf (lower bound, upper bound, μ, σ) - finds probability
invNorm (p) = z-score with area p to its left OR invNorm (p, μ, σ) = x-value with area p to its left
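Python's standard library provides `statistics.NormalDist` (Python 3.8+), which can reproduce normalcdf and invNorm. The heights distribution below is hypothetical.

```python
from statistics import NormalDist

heights = NormalDist(mu=64, sigma=2.5)   # hypothetical: heights in inches

# normalcdf(59, 69, 64, 2.5): area between two bounds (within 2 SDs, about 0.9545)
p_between = heights.cdf(69) - heights.cdf(59)

# invNorm(0.90): z-score with area 0.90 to its left (about 1.2816)
z_90 = NormalDist().inv_cdf(0.90)

# invNorm(0.90, 64, 2.5): x-value with area 0.90 to its left (about 67.2)
x_90 = heights.inv_cdf(0.90)
print(p_between, z_90, x_90)
```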
To assess Normality
Use Graphs - dotplots, boxplots, histograms, or normal probability plot.
Distribution
is all of the possible values of a random variable together with their probabilities
Sampling Distribution
of a statistic is the distribution of its values over all possible samples of the same size. Use normalcdf to calculate probabilities - be sure to use the correct SD (the SD of the statistic, e.g., σ/sqrt(n) for x̄)
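Here is a worked example of using the correct SD. It assumes the sampling distribution of x̄ is approximately normal (the population is normal or n is large), and the population values are hypothetical. The SD of x̄ is σ/sqrt(n), not σ.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 15, 36        # hypothetical population mean, SD, and sample size
sd_xbar = sigma / sqrt(n)         # SD of the sampling distribution of x-bar: 15/6 = 2.5
xbar_dist = NormalDist(mu, sd_xbar)
p_above_105 = 1 - xbar_dist.cdf(105)   # P(x-bar > 105); z = 2, about 0.0228
print(sd_xbar, p_above_105)
```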
Standard error
estimate of the standard deviation of the statistic
Central Limit Theorem
when n is sufficiently large (rule of thumb: n ≥ 30) the sampling distribution of the sample mean is approximately normal, even if the population distribution is not normal
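The Central Limit Theorem can be observed by simulation. The sketch below draws many samples of size 40 from a right-skewed exponential population (mean 1, SD 1, chosen only as an example of a clearly non-normal population). The sample means center near 1, have SD near 1/sqrt(40), and about 95% fall within 1.96 of those SDs, as a normal model predicts.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(1)   # seeded for reproducibility

def sample_means(n, reps=5000):
    """Means of `reps` samples of size n from an exponential population (mean 1, SD 1)."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

means_40 = sample_means(40)
se = 1 / sqrt(40)                                  # SD of x-bar: sigma / sqrt(n)
within = sum(abs(m - 1) <= 1.96 * se for m in means_40) / len(means_40)
print(mean(means_40), stdev(means_40), within)     # near 1, near 0.158, near 0.95
```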