what is empirical research?
when we go and collect data
any activity in which data (quantitative or qualitative) are collected from some area of experience and then conclusions are drawn from the data about that area of experience
statistics
a branch of mathematics
involves the collection, analysis, interpretation, and presentation of numerical (quantitative) data
any quantitative research question requires what?
stats
identify the steps to the scientific method:
developing a research hypothesis
collecting data
analyzing data
conclusions about research hypothesis
communicating findings
what is a variable?
a property or characteristic that can take on different values
are constructs variables?
constructs are special types of variables that you cannot directly measure because they are theoretical in nature
how do we determine what the research hypothesis is?
identifying a question or issue to be examined
reviewing and evaluating relevant theories and research
define the independent variable (IV):
the variable manipulated by the researcher
define the dependent variable (DV):
the variable measured by the researcher
what do research hypotheses specify (as precisely as possible)?
the nature and direction of the relationship between variables
what are different types of research hypotheses?
directional vs. non-directional
research vs. statistical
define a directional research hypothesis:
states that there is a relationship between the IV and the DV, and which group will score higher or lower on the DV
ex. youth soccer players who wear head protection will have fewer concussions than those who do not wear head protection
define a non-directional research hypothesis:
states that there is a relationship between the IV and the DV but gives no direction; it doesn't say which group will score higher or lower on the DV
ex. there will be a difference in the number of concussions between youth soccer players who wear head protection and those who do not
research hypothesis vs. statistical hypothesis?
research hypothesis
A is related to B
A causes B
statistical hypothesis: mathematical expression
null hypothesis
alternative hypothesis
what is the null hypothesis
this is the statistical hypothesis to be rejected
it says nothing will happen; that there is no relationship
define population
the total number of possible units or elements that could be included in a study
this is the theoretical population
define sample
a subset of the population used to represent the population
what is the main assumption when drawing a sample from a population?
sample characteristics = population characteristics
what is the assumption with random sampling? what are limitations to this?
that there is an equal probability of any unit or element in the population being selected into the sample
but, you have to know who everybody in the population is in order to do this
the people who decide to participate in the study are often different from those who don't
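a minimal Python sketch of simple random sampling, assuming a made-up population list (the names and sizes are hypothetical). it shows why you need a full list of the population first: the draw can only pick from units you can enumerate.

```python
import random

# Hypothetical sampling frame: a complete list of everyone in the population.
population = [f"person_{i}" for i in range(1, 1001)]

rng = random.Random(42)  # fixed seed so the draw is repeatable

# Draw 50 units without replacement; every unit has an equal chance of selection.
sample = rng.sample(population, k=50)

print(len(sample), "of", len(population), "units selected")
```

this only covers the selection step; it can't fix the second limitation, since the people selected still decide whether to participate.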
define measurement:
concerned with the methods used to provide descriptions of the degree (value) to which an individual (or place, thing, etc.) possesses a defined characteristic (property)
what are the different levels of measurement?
discrete:
categorical nominal
ordinal
continuous
interval
ratio
what is categorical nominal?
values differ in category or type
a numerical value is used to denote a category but the actual number itself isn’t meaningful
ex. basketball = 1, football = 2
what is ordinal
values that can be placed in order relative to other values
ex. NHL draft prospect ranking
what is interval level of measurement
values are equally spaced on a numeric continuum with no absolute 0
ex. Likert-type scale
ex. temperature (in Celsius or Fahrenheit)
what is ratio level of measurement
values are equally spaced on a numeric continuum, true 0 point
distance
how many response options do you need to move from discrete ordinal to continuous interval
if you have a likert type scale:
4 or less = ordinal
5 or more = interval
what is a likert type scale?
you are given a statement and asked how much you agree or disagree with it
then, we assign values to those responses
why does the level of measurement matter?
what we can do statistically with the data
the mathematical operations that can be performed
how we interpret data
whether differences between individuals or groups are meaningful
descriptive statistics?
organise, summarise, and describe the data that has been collected
inferential statistics:
test hypotheses and draw conclusions about the data collected from the sample
inferences from samples to populations
ex. the mean arm hang of this sample could be used to represent the mean arm hang of all female Canadians aged 20-29
why does use of language matter when drawing conclusions about research hypotheses
we are wanting to know whether the results support the research hypothesis
there is an important distinction between support and prove
what is communicating findings
how researchers communicate and interpret the results — important within the field
reasons as to why examine data?
to gain an initial sense of the data
detecting data entry errors or data coding errors
to identify outliers
rare, extreme scores that are outside the range of most other scores in the data set
to evaluate research methodology
very similar scores may indicate problems with measure used
to determine whether data meet statistical criteria and assumptions
what is a frequency distribution table?
summarizes the number and percentage of participants for the different values of the variable
how do we create a frequency distribution table
identify all possible values for the variable
determine the frequency of participants who report each value
calculate the percentage for each value
in a frequency distribution table what is the difference between percent and valid percent
sometimes there are missing values in a data set (ex. someone didn't state the province they were from), so “percent” counts the missing people in the total number of scores:
ex. out of 150, if 3 didn't state a province, they are still included in the total and account for 2%
in “valid percent” the missing people are excluded from the total number of scores, so the new total is 147
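a small Python sketch of percent vs. valid percent, assuming made-up province data that matches the 150 / 3 missing example (the province counts themselves are invented):

```python
from collections import Counter

# Made-up data: 150 participants, 3 of whom did not state a province (None).
responses = ["ON"] * 80 + ["BC"] * 40 + ["AB"] * 27 + [None] * 3

counts = Counter(responses)
total = len(responses)               # 150: missing people are included
valid_total = total - counts[None]   # 147: missing people are excluded

table = {}
for value, freq in counts.items():
    percent = round(100 * freq / total, 1)
    # Missing responses get no valid percent because they are not in the valid total.
    valid_percent = None if value is None else round(100 * freq / valid_total, 1)
    table[value] = (freq, percent, valid_percent)

for value, row in table.items():
    print(value, row)
```

the missing row shows up as 2.0 under percent and has no valid percent, and every other value has a slightly higher valid percent than percent because the denominator shrank from 150 to 147.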
what is cumulative percent and when would we use/not use it? why
cumulative percent is the running total of the percentages for each value taken in order, so it tells you what percentage of people scored at or below a given value
ex. if 15% of the sample are 20 years old and 15% are 30 years old (the two youngest ages in the data), the cumulative percent at age 30 is 15% + 15% = 30%, so 30% of the sample are aged 30 or under
makes sense for ordinal, interval, or ratio variables because the values are ranked
doesn't make much sense for categorical/nominal variables because the data aren't ranked
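a quick Python sketch of cumulative percent as a running total, assuming made-up ages and percentages:

```python
from itertools import accumulate

# Made-up ranked values and the percent of the sample at each value.
ages = [20, 30, 40, 50]
percents = [15.0, 15.0, 40.0, 30.0]

# Running total of the percents, in order of the ranked values.
cumulative = list(accumulate(percents))

for age, pct, cum in zip(ages, percents, cumulative):
    print(f"age {age}: {pct}% of sample, {cum}% aged {age} or under")
```

the running total only means something because the ages are in order; shuffling nominal categories would change every cumulative value.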
what are the ways that frequency tables can identify “problem data”
incorrect entry ex. BMI = 333
restricted range
highly skewed data
missing data
what do you do with problem data?
in the real world there are algorithms that are extremely good at predicting what the value would have been, but for the purposes of this class, if you do not know what happened, delete the value
what is a grouped frequency distribution table:
a table that groups interval or ratio values of a variable into smaller numbers of intervals
frequencies and percentages are calculated within the intervals
not based on how you collected the data but grouped afterwards
what is a real lower limit?
the smallest value of a variable that would be grouped in a particular interval
what is a real upper limit?
the largest value of a variable that would be grouped into a particular interval
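a rough Python sketch of a grouped frequency distribution table, assuming made-up ages in whole years and an arbitrary interval width of 10. the real limits here follow the common half-unit convention (ex. 19.5 to 29.5 for the 20-29 interval), which is an assumption and may not match how your course defines them.

```python
# Made-up ages measured in whole years.
ages = [21, 24, 29, 30, 33, 35, 38, 41, 47, 52]
width = 10  # arbitrary interval width chosen after data collection

grouped = {}
for age in ages:
    lower = (age // width) * width   # apparent lower limit, ex. 20
    upper = lower + width - 1        # apparent upper limit, ex. 29
    grouped[(lower, upper)] = grouped.get((lower, upper), 0) + 1

for (lower, upper), freq in sorted(grouped.items()):
    percent = 100 * freq / len(ages)
    # Real limits extend half a unit beyond the apparent limits.
    print(f"{lower}-{upper}: n={freq}, {percent:.0f}%, "
          f"real limits {lower - 0.5} to {upper + 0.5}")
```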
what type of charts might we use for discrete; ordinal or nominal data?
bar chart
pie chart
bar chart
use bars to represent the frequency or percentage of values
bars do not touch, not a continuum
pie chart
represent the percentage of the sample corresponding to the value
what charts do we use for continuous levels of measurement such as interval and ratio data
histogram
frequency polygons
histograms
use bars to represent the frequency of values
bars touch, indicating an interval or ratio variable with an underlying numerical continuum
you can take a grouped frequency distribution table and plot it into a histogram
frequency polygons
are line graphs that use data points to represent frequencies
still on a continuum
what are the implications with poorly designed figures when drawing conclusions from figures
poorly designed figures might lead to inappropriate or misleading conclusions
ex. same data but differently scaled y axes (could make the results look way more exaggerated than they are)
what is important to note when describing distributions?
modality
symmetry
variability
modality
different types of modality are based on how many values (peaks) have the highest frequency
there NEEDS to be a gap between those values for them to count as separate modes
ex.
unimodal
bimodal
multimodal
symmetry
symmetric distributions have frequencies that change in a similar manner moving away from the mode
asymmetric distributions have a longer tail on one side, often because outliers skew the shape of the distribution
negative skew (longer tail toward low scores)
positive skew (longer tail toward high scores)
a skew doesn't always mean that there are outliers
skewness statistic
how do we quantify skewness? explain:
skewness statistic
positive statistic = positive skew
negative statistic = negative skew
0 = perfectly symmetric distribution (as in a normal distribution)
the further the skewness statistic is from 0 the more skewed the distribution
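a minimal Python sketch of a skewness statistic, assuming the simple moment-based formula (third central moment divided by the second central moment to the power 1.5). stats packages often apply a small-sample adjustment, so their numbers can differ a little from this; the data are made up.

```python
def skewness(scores):
    """Moment-based skewness: m3 / m2 ** 1.5 (no small-sample adjustment)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 2, 3, 3, 3, 20]          # one extreme high score
left_tail = [-20, -3, -3, -3, -2, -2, -1]    # mirror image: one extreme low score

print(skewness(symmetric), skewness(right_tail), skewness(left_tail))
```

the symmetric list gives 0, the list with a high extreme score gives a positive statistic, and its mirror image gives a negative statistic of the same size.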
variability
the amount of differences in the distribution of a variable
are there scores different from or similar to one another?
the flatter the distribution the MORE variability
kurtosis statistic
leptokurtic distribution
the tallest, most peaked
least variable
mesokurtic
in the middle
the variability of a normal distribution
a perfectly mesokurtic distribution has a kurtosis statistic of 0
neither peaked nor flat
platykurtic
the flattest
most variable
kurtosis statistic
positive statistic = indicates a leptokurtic distribution
negative statistic = indicates a platykurtic
0 = perfectly (mesokurtic) normal distribution
the further the kurtosis statistic is from 0 the more likely the distribution is to be not normal
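a minimal Python sketch of a kurtosis statistic, assuming the moment-based "excess kurtosis" formula (fourth central moment over the squared second central moment, minus 3 so that a normal distribution sits at 0). stats packages may use a sample-adjusted version, so treat the exact numbers as illustrative; the data are made up.

```python
def excess_kurtosis(scores):
    """Moment-based excess kurtosis: m4 / m2 ** 2 - 3 (no small-sample adjustment)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

peaked = [5, 5, 5, 5, 5, 5, 5, 5, 0, 10]  # most scores piled on one value
flat = [1, 2, 3, 4, 5, 6]                 # scores spread evenly

print(excess_kurtosis(peaked))  # positive: leptokurtic
print(excess_kurtosis(flat))    # negative: platykurtic
```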
what are the measures of central tendency
mean
median
mode
mean
the average of all the values in the data set
sum all of the values and divide by the number of values you have
median
the exact number that sits in the middle of the data set when arranged in ascending or descending order
if there is an even number of values, calculate the mean of the two middle numbers
mode
the value that appears most frequently in the dataset
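a short Python sketch of the three measures using the standard library `statistics` module, with made-up scores:

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # made-up data, already in ascending order

mean = statistics.mean(scores)      # (2 + 3 + 3 + 5 + 7 + 10) / 6
median = statistics.median(scores)  # even number of values: mean of 3 and 5
mode = statistics.mode(scores)      # 3 appears most often

print(mean, median, mode)
```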
what does it mean to describe the mean as a balancing point?
if we subtract the mean from each score and add up those differences, the total equals 0.
the sum of the negative differences will always be equal in size to the sum of the positive differences
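a tiny Python check of the balancing-point idea, with made-up scores:

```python
scores = [2, 4, 4, 6, 9]  # made-up data
mean = sum(scores) / len(scores)  # 25 / 5 = 5.0

deviations = [x - mean for x in scores]     # -3, -1, -1, 1, 4
below = sum(d for d in deviations if d < 0)  # total of negative differences
above = sum(d for d in deviations if d > 0)  # total of positive differences

print(sum(deviations), below, above)
```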
in a perfectly normal distribution, the mean and the median will be…?
the same value
in a positively skewed distribution the mean will be… compared to median
higher value than the median
in a negatively skewed distribution the mean will be… compared to median ?
a smaller value than the median
why does the mean sit more towards the tail of the distribution?
because of potential outliers, further away values that are having a disproportionate impact on the mean.
in an asymmetrical distribution that has a pretty significant skew to it, what is the best measure of central tendency?
median
what do measures of central tendency try and tell you?
where most of the data is.
what is variability?
quantifies the amount of difference among the scores
concerned with the spread of the scores
why does variability matter
you can have three distributions with the same modality and symmetry that are still completely different distributions
leptokurtic
mesokurtic
platykurtic
in kin we want to know why people are different. we want to:
describe the variability
how much do 14 year olds vary in BMI
understand variability
why do 14 year olds vary in BMI
explain variability
do genetic markers explain variability in 14-15 year olds' BMI scores
predict variability
do parents' BMI scores predict their 14 year old daughters' BMI scores?
what are the different ways that variability is measured?
range
variance
standard deviation
range
highest score - lowest score
examines the two endpoints of the distribution
when you report it, state the range and then the lowest and highest values
strengths of the range
easy to compute
provides some information about the sample
weaknesses of the range
only focuses on two scores out of the whole distribution
may not accurately reflect the variability of the whole distribution
cannot be used to test hypotheses about distributions
affected by outliers (extreme scores)
the interquartile range
the range of the middle 50% of the scores
removes the highest and lowest 25% of the distribution — minimizing the effect of outliers
strengths of the interquartile range
reduces the influence of outliers by focusing on the middle 50%
can be reported with the median (both compensating for outliers)
weaknesses of the interquartile range
ignores the top 25% and bottom 25%
may not accurately reflect the variability of the whole distribution
cannot be used to test hypotheses about distributions
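a short Python sketch comparing the range and the interquartile range on made-up scores with one outlier. it assumes the default quartile method of `statistics.quantiles`; other software uses slightly different quartile conventions, so the exact IQR can vary.

```python
import statistics

# Made-up scores with one extreme value (95).
scores = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 95]

value_range = max(scores) - min(scores)  # uses only the two endpoints

# Quartiles: cut points that split the ordered scores into four equal parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1  # spread of the middle 50% of the scores

print("range:", value_range, "IQR:", iqr, "median:", q2)
```

the single outlier stretches the range a lot but has no effect on the IQR, which is why the IQR pairs well with the median.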
population parameter
any value that refers to a population value
sample statistic
any value that's based on samples
the variance
Average squared deviation of a score from the mean
includes all the scores in the distribution
measures the variability by examining the extent to which each score differs from the mean
sum of squared scores (ΣX²)
square each score
sum the squared scores
sum of scores, squared ((ΣX)²)
sum the scores
square the sum
why n-1?
because it corrects for the bias when using a sample to estimate population variability
populations generally have more variability than samples do
n-1 corrects for this
dividing by a smaller number (n-1 instead of n) makes the variance larger
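a small Python sketch of the variance calculation on made-up scores, assuming the usual computational formula where the sum of squares is ΣX² - (ΣX)²/n. it shows the two ingredients (sum of squared scores vs. sum of scores squared) and how dividing by n-1 gives a larger value than dividing by n.

```python
import statistics

scores = [2, 4, 4, 6, 9]  # made-up sample data
n = len(scores)

sum_of_squared_scores = sum(x ** 2 for x in scores)  # square each score, then sum: 153
sum_of_scores_squared = sum(scores) ** 2             # sum the scores, then square: 625

# Sum of squared deviations from the mean, via the computational formula.
sum_of_squares = sum_of_squared_scores - sum_of_scores_squared / n

divide_by_n = sum_of_squares / n              # no correction
divide_by_n_minus_1 = sum_of_squares / (n - 1)  # corrected sample variance

print(divide_by_n, divide_by_n_minus_1)
print(statistics.pvariance(scores), statistics.variance(scores))  # library equivalents
```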
what is bias
systematic underrepresentation of the true score
standard deviation
the square root of the variance; roughly the average deviation of a score from the mean
what's the difference between the formula for the population standard deviation and the formula for the sample standard deviation?
there's no n-1 in the population formula
you don't have to correct because you're genuinely dealing with population data
important to distinguish when you're working with population vs. sample data
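a quick Python sketch of the two standard deviation formulas using the standard library, with made-up scores. `statistics.pstdev` divides by n (population) and `statistics.stdev` divides by n-1 (sample).

```python
import statistics

scores = [2, 4, 4, 6, 9]  # made-up data

population_sd = statistics.pstdev(scores)  # treats the scores as the whole population
sample_sd = statistics.stdev(scores)       # treats the scores as a sample (n-1)

print(population_sd, sample_sd)
```

the same five scores give a larger value when treated as a sample, which is the n-1 correction at work.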
what is standard deviation impacted by?
unit of measurement
the larger the standard deviation is, what does that mean in terms of variability?
the larger the standard deviation the more variability in the data
when can you compare standard deviations?
only when you have the same unit of measurement
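a small Python illustration of why the unit of measurement matters, using made-up heights recorded in centimetres and then converted to metres:

```python
import statistics

heights_cm = [150, 160, 170, 180, 190]      # made-up heights in centimetres
heights_m = [h / 100 for h in heights_cm]   # the same heights in metres

sd_cm = statistics.stdev(heights_cm)
sd_m = statistics.stdev(heights_m)

print(sd_cm, sd_m)
```

the people and their variability are identical, but the standard deviation in centimetres is 100 times the one in metres, so the two numbers can't be compared directly.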
what are z scores
indicate distance from the mean in standard deviation units
measured in standard deviation units
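a minimal Python sketch of z scores on made-up data, assuming the scores are a sample (so the n-1 standard deviation is used):

```python
import statistics

scores = [2, 4, 4, 6, 9]  # made-up sample data
mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation

# z = (score - mean) / standard deviation
z_scores = [(x - mean) / sd for x in scores]

for x, z in zip(scores, z_scores):
    print(f"score {x}: z = {z:+.2f}")
```

scores below the mean come out negative, scores above the mean come out positive, and the z scores sum to 0.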
normal distribution
unimodal
perfectly symmetrical
mesokurtic ; neither peaked nor flat
to be a normal distribution, it has to be based on a population of an infinite number of scores
generated from mathematical formulas — not collected data
frequency distributions are based…
data that we have collected, which typically means its based on samples
characteristics of normal distributions?
the left and right tails continue to infinity without touching the x axis
bell shaped curve
unimodal
symmetric
mesokurtic
the mean is the population mean (mu, μ)
the standard deviation is the population standard deviation (sigma, σ)
why do we care about normal distributions?
researchers believe that many variables are normally distributed; we expect variables such as height, broad jump, vert, and age to be normally distributed throughout the population
many inferential tests are based mathematically on the assumption of normal distributions
AND we can determine the proportion of the distribution associated with any given score; the proportion of people that are likely to fall between various scores
what are the three reasons why you might not be able to directly compare scores on a normal distribution
even if they have the same population standard deviation, they could have very different population means
they could have the same population means but different population standard deviations
they could also just be completely different units of measurement
research hypothesis
a statement regarding an expected or predicted relationship between variables
the standard normal distribution
you can take all those different normal distributions and put them all into the same metric on a distribution; standardising those distributions
to avoid confusion that may have come from different units of measurement, different population means or different population standard deviations
there's only one standard normal distribution
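a short Python sketch of why standardising helps, assuming two hypothetical tests with invented population means and standard deviations:

```python
def z_score(score, mean, sd):
    """Distance of a score from the mean, in standard deviation units."""
    return (score - mean) / sd

# Hypothetical population values, invented for illustration only.
vertical_jump_z = z_score(65, mean=50, sd=10)   # vertical jump of 65 cm
broad_jump_z = z_score(225, mean=200, sd=25)    # broad jump of 225 cm

print(vertical_jump_z, broad_jump_z)
```

the raw scores (65 vs. 225) can't be compared directly, but on the standard normal metric the vertical jump (z = 1.5) is further above its mean than the broad jump (z = 1.0).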
describing z scores
positive = above the mean
negative = below the mean
how much of the data will fall within ±1 standard deviation of the mean?
68.26% of all observations
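a quick Python check of that proportion using `statistics.NormalDist` from the standard library. the computed value rounds to about 68.27%; the 68.26% figure is what you get from a z table rounded to four decimals (2 × .3413).

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)  # mean 0, standard deviation 1

# Proportion of the distribution between z = -1 and z = +1.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(f"{within_one_sd:.4%}")
```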