Descriptive Statistics - the organization, summarization, simplification, and presentation of data. (Describing Data)
Inferential Statistics - the mathematics and logic of how generalizations from sample to population can be made.
Inferential Statistics - can be used to support or refute theories, determine associations between variables, and determine whether findings are significant and whether we can generalize from our sample to the entire population.
Probability Theory - the basis for decision-making in statistical inference; refers to the relative frequency of outcomes that will occur in a population in the long run.
hypothesis - an educated guess about something in the world around you.
null hypothesis - a statement of no difference between means or no association between variables (the null condition).
Alternative hypothesis - a statement of differences or association between variables.
probability - is the study of patterns of random processes
Statistical Inference - statistics enable us to judge the probability that our inferences or estimates are close to the truth.
Random Selection – equal chance for anyone to be selected; makes the sample more representative.
Sampling distributions - theoretical distributions developed to organize statistical outcomes from various sample sizes and to determine their relative frequency.
Degrees of Freedom - the way in which the scientific tradition accounts for variation due to error; it specifies how many values are free to vary within a statistical test.
Type I error - rejecting a true null hypothesis
Type II error - accepting (failing to reject) a false null hypothesis
Parametric Measures of Association - These answer the question, “within a given population, is there a relationship between one variable and another variable?”
Correlation – answers the question, “What is the degree of relationship between ‘x’ and ‘y’?”
Parametric tests of significance - used if there are at least 30 observations, the population can be assumed to be normally distributed, variables are at least in an interval scale.
Z tests are used with samples over 30. There are four kinds (two samples or two categories).
t-tests are used when samples are 30 or less.
Mann Whitney U - an alternative to the independent t-test; requires at least ordinal data.
The Wilcoxon Matched Pairs test is an alternative to the paired t-test for analyzing repeated measures on the same individual. It involves comparing T values, and if the calculated T value is less than or equal to the critical T value from a table, the null hypothesis is rejected.
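The two non-parametric alternatives above can be sketched with `scipy.stats` (assuming scipy is available; the scores below are invented for illustration, not taken from the notes):

```python
from scipy import stats

# Hypothetical ordinal-or-better scores for two independent groups
group_a = [12, 15, 14, 10, 13, 18, 16, 11]   # e.g., control group
group_b = [22, 25, 19, 24, 28, 21, 23, 26]   # e.g., treatment group

# Mann-Whitney U: non-parametric alternative to the independent t-test
u_stat, u_p = stats.mannwhitneyu(group_a, group_b)

# Hypothetical repeated measures on the same individuals
before = [10, 12, 9, 14, 11, 13, 15, 10]
after = [14, 15, 12, 18, 13, 17, 19, 12]

# Wilcoxon matched-pairs: non-parametric alternative to the paired t-test
w_stat, w_p = stats.wilcoxon(before, after)

print(u_p, w_p)
```

With these clearly separated example groups, both p-values fall below .05, so the null hypothesis would be rejected in each case.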
Parametric Prediction – using a correlation, if you know a person's score on “x”, you can predict their score on “y”. Use regression analysis:
Simple linear regression – allows the prediction from one variable to another, you must have at least interval level data
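A minimal sketch of simple linear regression with `scipy.stats.linregress`; the hours-studied/exam-score data are hypothetical, chosen only to show prediction of "y" from "x":

```python
from scipy import stats

# Hypothetical interval-level data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 72, 79, 83]

result = stats.linregress(hours, scores)

# Predict y for a new x using the fitted line: y = intercept + slope * x
predicted = result.intercept + result.slope * 5.5
print(result.slope, result.intercept, predicted)
```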
Non-parametric Prediction - measures the extent to which you can reduce the error in predicting the dependent variable as a consequence of having some knowledge of the independent variable.
Kendall’s Tau – used with ordinal data and ranking; it is better than the Gamma because it takes ties into account.
Gamma - used with ordinal data to predict the rank of one variable by knowing the rank on another variable.
Lambda – can be used with nominal data; knowledge of the IV allows one to make a better prediction of the DV than if you had no knowledge at all.
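Kendall’s Tau can be computed with `scipy.stats.kendalltau`; the judges' rankings below are hypothetical:

```python
from scipy import stats

# Hypothetical ordinal data: two judges ranking the same 6 contestants
judge_1 = [1, 2, 3, 4, 5, 6]
judge_2 = [2, 1, 3, 4, 6, 5]

# Tau measures agreement between the two rankings (here, strongly positive)
tau, p_value = stats.kendalltau(judge_1, judge_2)
print(tau, p_value)
```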
null hypothesis
ex. there’s no difference/significant relationship between the [variable A] and [variable B]
If the .05 level is achieved (p is equal to or less than .05), then the researcher rejects the H0 and accepts the H1.
If the .05 significance level is not achieved, then the H0 is retained.
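The two-line decision rule above can be sketched as a small helper function (the name `decide` is illustrative, not from the notes):

```python
def decide(p_value, alpha=0.05):
    """Reject H0 when p <= alpha, otherwise retain H0 (per the rule above)."""
    return "reject H0" if p_value <= alpha else "retain H0"

print(decide(0.03))  # reject H0
print(decide(0.20))  # retain H0
```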
Random does not mean chaotic; rather, it means there’s a pattern that becomes apparent only when we examine a large number of events.
Correlation
use it when you want to know about the association or relationship between two continuous variables
it tells you If a linear relationship exists between two variables, and how strong that relationship is
T-tests
Use this when comparing the MEANS of a continuous variable in two samples
What do the results look like?
Student’s t
How do you interpret it?
By looking at corresponding p-value
If p ≤ .05, the means are significantly different from each other
If p > .05, the means are not significantly different from each other
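An independent t-test with `scipy.stats` illustrating the p-value rule above; the blood-pressure readings are invented for illustration:

```python
from scipy import stats

# Hypothetical continuous data: blood pressure in two independent groups
group_1 = [120, 125, 118, 130, 122, 128, 124, 126]
group_2 = [135, 140, 138, 132, 141, 137, 139, 136]

# Student's t for two independent samples
t_stat, p_value = stats.ttest_ind(group_1, group_2)
print(t_stat, p_value)
```

Here p falls well below .05, so the means would be judged significantly different.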
Chi-square
When to use it?
When you want to know if there is an association between two categorical (nominal) variables (i.e., between an exposure and outcome)
Ex) Smoking (yes/no) and lung cancer (yes/no)
What does a chi-square test tell you?
If the observed frequencies of occurrence in each group are significantly different from expected frequencies (i.e., a difference of proportions)
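The smoking/lung-cancer example can be tested with `scipy.stats.chi2_contingency`; the counts below are invented for illustration:

```python
from scipy import stats

# Hypothetical 2x2 contingency table: smoking (rows) vs lung cancer (columns)
observed = [[30, 70],   # smokers:     30 cancer, 70 no cancer
            [10, 90]]   # non-smokers: 10 cancer, 90 no cancer

# Compares observed frequencies against the expected frequencies
chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(chi2, p_value, dof)
```

For a 2x2 table there is 1 degree of freedom; with these counts the proportions differ significantly.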
Inferential Statistics are based on:
Probability Theory
Statistical Inference
Sampling Distributions
5 Steps of Inferential Statistics
State Hypothesis
Level of Significance
Computing Calculated Value
Obtain Critical Value
Testing Hypothesis (making the decision)
Types of Inferential Statistics
Correlation
T-tests/ANOVA
Chi-square
Logistic Regression
Types of Data and Analysis
Analysis of Continuous Data
Correlation
T-tests
Analysis of Categorical/Nominal Data
Chi-square
Logistic Regression
Guide for interpreting strength of correlations:
0 – 0.25 = Little or no relationship
0.25 – 0.50 = Fair degree of relationship
0.50 – 0.75 = Moderate degree of relationship
0.75 – 1.0 = Strong relationship
1.0 = perfect correlation
If r is positive, high values of one variable are associated with high values of the other variable
If r is negative, low values of one variable are associated with high values of the other variable
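Pearson's r can be computed with `scipy.stats.pearsonr`, and the value read against the strength guide above; the height/weight data are hypothetical:

```python
from scipy import stats

# Hypothetical continuous data: height (cm) and weight (kg) for 8 people
height = [150, 155, 160, 165, 170, 175, 180, 185]
weight = [50, 54, 59, 62, 68, 71, 77, 80]

# r near +1: high heights are associated with high weights
r, p_value = stats.pearsonr(height, weight)
print(r, p_value)
```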
t-tests
Single sample t-test (one sample)
Independent t-test (two samples)
To compare the MEANS of a continuous variable in TWO independent samples (i.e., two different groups of people)
Paired t-test (two non-independent samples)
When comparing the MEANS of a continuous variable in two non-independent samples (i.e., measurements on the same people before and after a treatment)