data management
entire process of record keeping before, during, and after a research study
function of a codebook?
a guide written for a specific study; describes each variable + how collected data will be entered into a computer file
also describes how anticipated data problems (missing responses) will be handled
spreadsheet
file that stores data in cells… row-by-column table
database
data management system that stores data in tables in which each row represents one record; related records in different tables can be linked
double-entry
ensures accuracy - two individuals enter same data into separate computer files → compare → resolve discrepancies
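The compare step of double-entry can be sketched in a few lines of Python. This is only an illustration: the participant IDs, field names, and the `find_discrepancies` helper are made up for the example, and real data would be read from the two data files rather than typed in.

```python
# Hypothetical records keyed by participant ID, as typed in by two different people.
entry_a = {
    "P01": {"age": 34, "sex": "F"},
    "P02": {"age": 51, "sex": "M"},
    "P03": {"age": 27, "sex": "F"},
}
entry_b = {
    "P01": {"age": 34, "sex": "F"},
    "P02": {"age": 15, "sex": "M"},  # digits transposed by the second person
    "P03": {"age": 27, "sex": "F"},
}


def find_discrepancies(first, second):
    """Return (record_id, field, value_in_first, value_in_second) for every mismatch."""
    mismatches = []
    for record_id in sorted(set(first) | set(second)):
        row_a = first.get(record_id, {})
        row_b = second.get(record_id, {})
        for field in sorted(set(row_a) | set(row_b)):
            if row_a.get(field) != row_b.get(field):
                mismatches.append((record_id, field, row_a.get(field), row_b.get(field)))
    return mismatches


discrepancies = find_discrepancies(entry_a, entry_b)
for item in discrepancies:
    print(item)
```

Each mismatch still has to be resolved by checking the original paper or source record; the code only finds where the two entries disagree.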
data cleaning
correcting any typographical/other errors in data files
derived variable
new variable created during data analysis from existing variables in the data file
recoding
generating values for a new variable based on 1+ existing columns of data in a file
*always save a backup version of the data file first, then create new variables in a duplicate copy
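A minimal sketch of recoding and derived variables, assuming the data are held as a list of rows in Python. The field names, the age cut-offs, and the `age_group` helper are hypothetical; BMI is used as a familiar example of a variable derived from two existing ones.

```python
import copy

# Hypothetical data file: one dict per participant.
original = [
    {"id": 1, "age": 17, "weight_kg": 60.0, "height_m": 1.70},
    {"id": 2, "age": 40, "weight_kg": 81.0, "height_m": 1.80},
    {"id": 3, "age": 70, "weight_kg": 50.0, "height_m": 1.60},
]

# Work on a duplicate so the original file stays untouched as a backup.
working = copy.deepcopy(original)


def age_group(age):
    """Recode a numeric age into a category (cut-offs are illustrative)."""
    if age < 18:
        return "under 18"
    if age < 65:
        return "18-64"
    return "65+"


for row in working:
    row["age_group"] = age_group(row["age"])                         # recoded variable
    row["bmi"] = round(row["weight_kg"] / row["height_m"] ** 2, 1)   # derived variable

for row in working:
    print(row["id"], row["age_group"], row["bmi"])
```

The new columns exist only in `working`; `original` still has exactly the fields that were entered.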
how do stat software programs help w/ the research process?
1) most software programs can run all common stat functions
2) results from different programs are identical or nearly identical
general data security process for computer files? paper files?
paper files: store paper records (+ signed informed consent statements) safely in a locked and secure room
computer files: all files w/ sensitive info should be password protected, with access limited to essential research personnel
descriptive stats
means, medians, proportions, standard deviations, used to characterize distributions of quant. data
general analytic plans: case series
univariate analysis- describes the study pop
general analytic plans: cross-sectional
univariate + a little of bivariate
describe study pop + compare groups
general analytic plans: case control/cohort/experimental
univariate+bivariate+ a little of multivariate
describe study pop + compare + regression/other adv. analysis
univariate analysis
describes one variable in a data set using simple stats (counts aka frequencies, proportions, averages)
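As a rough illustration of univariate analysis, counts and proportions for one categorical variable can be tallied with Python's standard library. The variable name and responses below are invented for the example.

```python
from collections import Counter

# Hypothetical responses for a single categorical variable.
smoking_status = ["never", "former", "never", "current",
                  "never", "former", "never", "never"]

counts = Counter(smoking_status)   # frequencies of each response
total = len(smoking_status)
proportions = {value: count / total for value, count in counts.items()}

for value, count in counts.most_common():
    print(f"{value}: n={count}, proportion={proportions[value]:.3f}")
```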
bivariable analysis
rate ratios, odds ratios, other comparative stat tests used to examine associations between 2 variables
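One of the comparative measures named above, the odds ratio, can be worked through from a 2x2 table. The counts and the variable names are hypothetical, chosen only to show the arithmetic.

```python
# Hypothetical 2x2 table (exposure by outcome).
exposed_cases = 30        # a: exposed, has the outcome
exposed_noncases = 70     # b: exposed, no outcome
unexposed_cases = 10      # c: unexposed, has the outcome
unexposed_noncases = 90   # d: unexposed, no outcome

odds_exposed = exposed_cases / exposed_noncases        # a / b
odds_unexposed = unexposed_cases / unexposed_noncases  # c / d

# Odds ratio = (a / b) / (c / d), which equals (a * d) / (b * c).
odds_ratio = odds_exposed / odds_unexposed

print(f"odds ratio = {odds_ratio:.2f}")
```

An odds ratio above 1 means the odds of the outcome are higher in the exposed group; a value of 1 means no association.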
multivariable analysis
stat tests such as multiple regression models - used to examine relationships w 3+ variables
variable
characteristic that can be assigned more than one value
ratio variable
numeric variable, able to be plotted on a scale
value of zero indicates total absence of the characteristic
ex: income, height, weight, temperature in kelvins
interval variable
numeric variable + has order that is meaningful
zero does not indicate total absence of characteristic
ex: IQ score, temp in F or C, college years
continuous variable
numeric variable that can take on any value within a range
*can be plotted at any point along a scale
discrete variable
numeric variable that can ONLY take point values, no values in between
ex: number of ppl in this class (eye color is not numeric, so it is nominal, not discrete)
ordinal variable
aka ranked variable, responses that have an order, but the gaps between ranks are not equal/systematically meaningful
ex: tier rankings, best to worst, first to last
nominal/categorical variable
has values but has no inherent rank/order aka a list of stuff
dichotomous variable
only two possible answers
binomial variable
dichotomous variable- coded as having values of only 0 + 1
categories are mutually exclusive
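A small sketch of 0/1 coding, assuming a hypothetical yes/no question. One handy side effect of this coding is that the mean of the coded values is the proportion of "yes" answers.

```python
# Hypothetical answers to a yes/no question.
responses = ["yes", "no", "yes", "yes"]

# Code "yes" as 1 and "no" as 0; each response gets exactly one code.
coded = [1 if answer == "yes" else 0 for answer in responses]

proportion_yes = sum(coded) / len(coded)

print(coded)
print(proportion_yes)
```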
central tendency
types of average values- mean, median, mode
mean
add up all values, divide the sum by total number of individuals w/ value for variable
median
put all values in order from least → greatest, find middle number (if the count is even, average the two middle numbers)
mode
most frequently occurring value in a data set
how to describe central tendency for ratio + interval? ordinal? categorical?
ratio + interval - means, medians, mode
ordinal - median/mode
categorical - mode
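The pairing of variable type and average can be tried out with Python's `statistics` module. The data and the rating scale are made up, and taking the ordinal median through rank positions (`median_low` on the index of each rating) is one reasonable approach for the example, not the only one.

```python
import statistics

# Ratio/interval data: mean, median, and mode all make sense.
heights_cm = [150, 160, 160, 170, 185]
mean_height = statistics.mean(heights_cm)
median_height = statistics.median(heights_cm)
mode_height = statistics.mode(heights_cm)

# Ordinal data: use the median or mode. The median is found on rank positions.
order = ["poor", "fair", "good", "excellent"]   # hypothetical scale, lowest to highest
ratings = ["good", "poor", "excellent", "good", "fair"]
ranks = sorted(order.index(rating) for rating in ratings)
median_rating = order[statistics.median_low(ranks)]
mode_rating = statistics.mode(ratings)

# Categorical (nominal) data: only the mode makes sense.
eye_colors = ["brown", "blue", "brown", "green"]
mode_eye_color = statistics.mode(eye_colors)

print(mean_height, median_height, mode_height)
print(median_rating, mode_rating)
print(mode_eye_color)
```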
histogram
graphical representation of distribution of ratio + interval data
x-axis - values of responses
y-axis - count of # of times each response appears in data set
bars must all be the same width, with no gaps between bars
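What a histogram counts can be sketched without any plotting library: group the values into equal-width bins and count how many fall in each. The ages and the bin width of 10 are arbitrary choices for the example.

```python
from collections import Counter

# Hypothetical ages (ratio data).
ages = [21, 23, 25, 31, 34, 35, 38, 42, 47, 58]
bin_width = 10

# Map each age to the start of its bin (21 -> 20, 35 -> 30, ...) and count.
bin_counts = Counter((age // bin_width) * bin_width for age in ages)

# Text version of the plot: x-axis = bins of values, bar length = count.
for start in range(min(bin_counts), max(bin_counts) + bin_width, bin_width):
    print(f"{start}-{start + bin_width - 1}: {'#' * bin_counts[start]}")
```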
boxplot
aka box-and-whisker plot, graphical depiction of numeric variable
displays median, interquartile range, outliers
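The numbers behind a boxplot can be computed with `statistics.quantiles`. Two assumptions are worth flagging: the data are invented, and the 1.5 x IQR rule used to flag outliers is a common plotting convention that the card itself does not state. Different software also uses slightly different quartile formulas, so results can vary a little between programs.

```python
import statistics

# Hypothetical numeric data with one unusually large value.
values = [2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 40]

# Quartiles: n=4 splits the data into four groups and returns three cut points.
q1, median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1   # interquartile range = width of the box

# Assumed convention: points beyond 1.5 * IQR from the box count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(f"median={median}, Q1={q1}, Q3={q3}, IQR={iqr}, outliers={outliers}")
```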
bar chart
graph that presents categorical data using equal-width rectangles; bars can appear in any order
pie chart
circle, each wedge/slice shows percentage of participants w/ a specific value
normal distribution
aka gaussian distribution- bell shaped curve
kurtosis
how peaked/flat a distribution is
leptokurtic - sharply peaked, very tall in the center
platykurtic - flat, just a low hill in the middle
skewness
asymmetry of a distribution (departure from the symmetry of a normal dist.)
variance
add squares of differences between each observation + sample mean → divide by total number of observations
standard deviation
square root of variance
z-score
how far from the mean each data value is, using a standardized scale
z = (score - mean) / stand. dev.
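The variance, standard deviation, and z-score cards can be followed step by step in Python. The scores are made up. The variance here divides by the total number of observations, as the card describes; note that many software packages report the sample variance by default, which divides by n - 1 and gives a slightly larger value.

```python
import math

# Hypothetical scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean = sum(scores) / n

# Variance: sum of squared differences from the mean, divided by n.
variance = sum((x - mean) ** 2 for x in scores) / n

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)


def z_score(score):
    """Distance of a score from the mean, in standard deviation units."""
    return (score - mean) / sd


print(mean, variance, sd)
print(z_score(9))   # a score of 9 is two SDs above the mean
```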
standard deviations under normal distributions: 68%
68% of responses fall within 1 SD above/below mean
standard deviations under normal distributions: 95%
95% of responses are within 2 SDs above/below mean
standard deviations under normal distributions: more than 99%
more than 99% of responses are within 3 SDs above/below mean
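These three percentages can be checked against the standard normal curve using `statistics.NormalDist` from the Python standard library. The helper name `share_within` is made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The 68% and 95% figures are rounded; the exact 95% cut-off is closer to 1.96 SDs than to 2.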
small SD? large SD?
small SD- data points are tightly clustered around mean/ data has similar value
large SD- data points are more scattered + more varied
descriptive stats
describes basic characteristics of quant. data (means+proportions)
typical reporting: ratio+interval w/ normal dist
mean + SD
typical reporting: ordinal, or ratio+interval w/ non normal dist.
median / interquartile range
typical reporting: categorical
proportions of participants who provided particular responses
confidence intervals provide info ab
expected value of a measure in a source pop based on value of that measure in a study pop
sample size relates to confidence interval
larger sample size = narrower confidence interval (more precise estimate, less room for error)
95% confidence interval
5% of the time, CI is expected to miss capturing true value of a measure in source pop
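The link between sample size and interval width can be illustrated with a 95% confidence interval for a mean. This sketch assumes the usual normal approximation (mean ± 1.96 x SD / sqrt(n)); the SD of 10 and the sample sizes are invented, and `ci_half_width` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

# Multiplier for a 95% interval: about 1.96 under the normal approximation.
z_95 = NormalDist().inv_cdf(0.975)


def ci_half_width(sd, n):
    """Half-width of an approximate 95% CI for a mean."""
    return z_95 * sd / math.sqrt(n)


sd = 10
for n in (25, 100, 400):
    print(f"n={n}: mean ± {ci_half_width(sd, n):.2f}")
```

Quadrupling the sample size only halves the width, because the width shrinks with the square root of n.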
three most serious forms of research misconduct
fabrication: creation of fake data (making things up)
falsification: misrepresentation of results
plagiarism: use of other ppl's words, ideas, or work without giving credit
main benefits of consulting w/ a statistician early in the study design process
ensures that:
sample methods/size are appropriate
questionnaire will yield usable data
analysis plan is reasonable