Stats is a branch of math that studies variation (a change or difference in something as a result of random, genetic, or environmental factors)
Stats is a way of collecting, analyzing, and identifying patterns in variation to inform decisions, form conclusions, and communicate findings
stats is the study of
Descriptive stats → Involves summarizing and describing data to reveal patterns. Does not allow for conclusions beyond the initial data collected.
Includes the visualization of data, measures of central tendency, and variability
Inferential stats → Method of data analysis that allows for conclusions to be made about how the world works. Used to address specific questions.
Includes standard error of the mean (SEM), confidence intervals, hypothesis testing, t-tests, ANOVAs, correlations, regressions, and non-parametric tests
what are the two branches of statistics, describe the difference between them
biomedical research, evidence-based medicine, forensics, machine learning, and government census [population health]
list 5 instances in which statistics are used in the health sciences
Biomedical research - research field exploring disease prevention and treatment
Uses stats to identify probability of certain outcome (ex phenotype, hormone, drug dosage working, disease development, drug efficacy, etc.) and what factors may influence that outcome
Provides objective evidence of drug efficacy, allows researchers to report on the success of a study. Almost every study uses statistical analysis to provide an objective score on how good that data is
Not all studies use stats properly, u need to have ur own proficiency to understand results and determine whether u trust the results or not
describe how stats is used in biomed research
evidence based medicine is a multifaceted practicing method used by HCPs to make clinical decisions on patient care.
Depends greatly on accurate interpretation of stat findings
Misinterpretation can have grave consequences → ex. 1998 study by Wakefield et al. proposed that the Measles, Mumps, and Rubella (MMR) vaccine was linked to autism, had questionable methodology, but led to negative attention and anti-vaccination attitudes even after being retracted in 2010
describe how stats is used in evidence based medicine
Used in various clinical care settings to track patient information and to collect and share health data. Machine learning → statistical technique that draws patterns from raw data to make predictions
describe how stats is used in clinical medicine [machine learning]
Stats used to match bio samples to victims/perpetrators and help determine the likelihood of criminal activity vs coincidence
Used to identify patterns that can suggest an underlying malicious cause by comparing results to a comparator
describe how stats are used in the criminal justice system
Comparators are like "controls" and help you contextualize data by comparing a case to a normal situation
what is a comparator
With stats, one question often leads to more questions
Stats are a component of iterative cycle of investigation: PPDAC - FRAMEWORK FOR UNDERSTANDING/DISCUSSING STATS
Problem → What is the problem/question (ex. Is something malicious happening?)
Plan → What info do I need to ans the Q (ex. Comparing time of death w comparator)
Data → Collect high qual info
Analysis → Sort, graph info, and run stat test (ex comparison)
Conclusions → Interpret, communicate, & generate new ideas
what is the PPDAC cycle used for? describe each stage
Classifications describing the TYPE of info the data reps
what does the term "level of measurement" refer to?
what are the 4 levels of measurement
categorical data describes data in which numbers are used to represent categories of qualitative information
- it is often treated as 'discrete' data bc the values are usually whole numbers, not fractional
nominal data - arbitrary numbers assigned to group variables into qualitative categories. actual number assigned has no value, only the corresponding label holds meaningful value
ordinal data - ranked data (e.g. Likert scales), numbers group data into meaningful order which is described by the number itself (therefore, the number holds value). calcs on this data should be interpreted with caution bc the gaps between ranks are not necessarily equal
describe what is meant by "Categorical data"
give the 2 types of categorical data and describe the difference between them
scale data - quantitative MEASUREMENTS (or counts) where the difference between numerical values has significance
interval scale data - numerical measurement on a scale where each point is equidistant but there is NO true zero (i.e. capable of going into the negatives or going on forever in either direction)
ratio scale data - numerical measurement on a scale where each point is equidistant and there IS a true zero (zero means none of the quantity), so ratios/proportions are meaningful (e.g. twice as much)
describe what is meant by "scale data"
give the 2 types of scale data and describe the difference between them
TIME (clock time) - Quantitative where each point is equidistant [Ex. 9pm-10pm is 1 hr, 2pm-3pm is 1 hour]. No true zero, since 00:00 does not mean the absence of time. ∴ interval data (a type of scale data), not ratio data
why is clock time considered interval data instead of ratio data
Absolute frequency distribution tables → Use raw data to show HOW MANY counts/obvs are in each category.
Relative frequency distribution tables → Show the proportion of values in each category as a percent
→ Divide the number of values in an interval by the TOTAL number of values in the table. X 100 to see as a percent
what 2 tables are used to summarize data, describe how they are different
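A minimal Python sketch of the two tables, assuming a made-up list of blood types (the data and variable names are hypothetical, not from the course):

```python
from collections import Counter

# Hypothetical observations: blood types recorded for 10 patients.
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "O"]

# Absolute frequency: HOW MANY observations fall in each category.
absolute = Counter(blood_types)

# Relative frequency: count / TOTAL number of values, x 100 for a percent.
total = sum(absolute.values())
relative = {category: count * 100 / total for category, count in absolute.items()}

# Print both tables side by side: category, count, percent.
for category in sorted(absolute):
    print(f"{category:<3} {absolute[category]:>3} {relative[category]:>6.1f}%")
```

The relative column always adds up to 100%, which is a quick check that no observation was dropped.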
For categorical data, a frequency distribution shows a set of categories in one column then numerical counts in the other column.
For scale data, one column will group the scores into non-overlapping intervals and the other column will have the number of observations that fell into each interval
how are absolute frequency distributions different when used on scale vs. categorical data
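For scale data, the grouping into non-overlapping intervals could be sketched like this. The exam scores and the interval width of 10 are assumptions chosen for illustration:

```python
# Hypothetical exam scores (scale data).
scores = [52, 67, 71, 74, 78, 81, 85, 88, 93, 99]

# Assumed interval width; gives non-overlapping intervals 50-59, 60-69, ...
width = 10

counts = {}
for score in scores:
    lower = (score // width) * width          # lower bound of the interval
    label = f"{lower}-{lower + width - 1}"    # e.g. "70-79"
    counts[label] = counts.get(label, 0) + 1  # tally the observation

for label, count in counts.items():
    print(label, count)
```

Each score lands in exactly one interval, so the counts add up to the number of observations.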
mean - average
median - Middle value in data set when values are arranged in order from LOWEST TO HIGHEST, Divides the data set in half
mode - Most commonly occurring value, Occurs at highest frequency
define the 3 measures of central tendency
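A short sketch of the three measures using Python's standard `statistics` module, on a hypothetical set of five scores:

```python
import statistics

# Hypothetical quiz scores.
scores = [3, 5, 5, 6, 11]

mean = statistics.mean(scores)      # sum of values / number of values
median = statistics.median(scores)  # middle value once sorted low to high
mode = statistics.mode(scores)      # value that occurs most often

print(mean, median, mode)
```

Here the mean (6) sits above the median and mode (both 5) because the single high score of 11 pulls the average up.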
Mean
- SIGNIFICANTLY affected by outliers so may lead to misleading but statistically correct outcomes (eg. ave income/GPA, bill gates)
- Very useful in larger data sets bc uses calc
- Can be used in further calculations, stdev
Median and mode
- Both median and mode are less sensitive to outliers but are more difficult to identify in larger data sets bc they don't use a formula.
- When there are no outliers and the data are roughly symmetrical, all 3 stats will be similar if not identical.
describe the pros and cons of using the 3 measures of central tend: mean, median, and mode
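The outlier effect can be seen in a small sketch. The incomes below are invented, with one extreme earner added to mimic the Bill Gates example:

```python
import statistics

# Hypothetical annual incomes, in thousands of dollars.
incomes = [40, 45, 50, 55, 60]
with_outlier = incomes + [10_000]  # one extremely high earner joins the group

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"before: mean={mean_before}, median={median_before}")
print(f"after:  mean={mean_after:.1f}, median={median_after}")
```

The mean jumps from 50 to over 1,700 while the median only moves from 50 to 52.5, so the mean is statistically correct but no longer describes a typical person in the group.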
Mode
Since numbers are non-meaningful, doing calc doesn't make sense
Median has no meaning
Mode is best bc u can determine which category is most frequent
which MCT is best for nominal data and why
Any
Since numbers have meaning- you can take average or arrange to find median or mode. Mode may not make sense depending on what categories data rep but may be better than mean in some cases (ex. Average rating)
which MCT is best for ORDINAL data and why
Any
Numbers rep true values so u can perform calcs on them
They don't rep categories
which MCT is best for SCALE data and why
*Variability = differences amongst data within a set. Aka the "spread" of the data - how far the numbers are from the mean/median in a data set
VARIANCE - A measure that quantifies amount of spread/dispersion around the mean.
define variability/variance
range and IQR
RANGE: Range = maximum value - minimum value
Measures spread of data by describing difference btw min/max values in a set.
May also be written as (min value) to (max value). Both are accurate. Include unit
IQR - Identifies the range covered by the MIDDLE 50% of values, centred on the median. IQR = Q3 - Q1
Measures data spread by dividing set into QUARTILES to identify the range of the middle 50% of the data set (from Q1 to Q3). Calculated in 5 steps.
2 methods for calculating variability
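A sketch of both calculations on hypothetical heart-rate data. Quartile conventions differ between textbooks and software, so the `method="inclusive"` choice here is an assumption and may not match the 5-step method from the course exactly:

```python
import statistics

# Hypothetical resting heart rates, in beats per minute.
heart_rates = [58, 62, 65, 68, 70, 72, 75, 80, 96]

# Range = maximum value - minimum value.
data_range = max(heart_rates) - min(heart_rates)

# With n=4, statistics.quantiles returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(heart_rates, n=4, method="inclusive")

# IQR = Q3 - Q1, the width of the middle 50% of the data.
iqr = q3 - q1

print(f"range = {data_range} bpm")
print(f"Q1 = {q1}, Q2 (median) = {q2}, Q3 = {q3}, IQR = {iqr} bpm")
```

The range (38 bpm) is stretched by the single high value of 96, while the IQR (10 bpm) ignores the extremes and describes only the middle half of the data.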
Q1 = 25th percentile
Q2 = 50th percentile (the median)
Q3 = 75th percentile
what percentiles do the different quartiles correspond to
Measure that quantifies amount of spread/dispersion around the mean.
Involves identifying the DIFFERENCE btw each entry and the mean (average), squaring each difference, and then taking the AVE of those squared diffs
The raw differences are +/- (values above vs. below the mean) and always cancel out to 0 in the ∑ part of the formula, which would wrongly suggest there is no spread at all.
A spread of zero (or a negative spread) around the mean makes no sense when the values differ → Must be addressed
Variance squares each difference to make everything pos → why variance is repp'ed by s²
but the issue is that this also squares the units and makes the value much larger than most of the (observation - mean) differences
- to address this we take the square root to cancel out the square and keep the units the same
- easier to interpret --> where STDEV came from
what is VARIANCE (s²) and why is it not really used to describe variability
what is used instead and why
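The steps from differences to variance to standard deviation could be sketched as follows. The sample is hypothetical, and dividing by n - 1 (the usual convention for a sample variance s²) is an assumption, since the notes only say "average":

```python
import math
import statistics

# Hypothetical sample of 5 measurements, in cm.
sample = [2, 4, 4, 6, 9]
mean = statistics.mean(sample)

# Step 1: difference between each entry and the mean.
deviations = [x - mean for x in sample]
sum_of_deviations = sum(deviations)  # positives and negatives cancel out

# Step 2: square each difference so everything is positive.
squared = [d ** 2 for d in deviations]

# Step 3: average the squared differences (n - 1 for a sample) -> variance, in cm^2.
variance = sum(squared) / (len(sample) - 1)

# Step 4: square root brings the units back to cm -> standard deviation.
sd = math.sqrt(variance)

print(f"sum of deviations = {sum_of_deviations}")
print(f"variance = {variance} cm^2, SD = {sd:.2f} cm")
```

The deviations sum to 0, which is why they cannot be averaged directly. The hand calculation should agree with `statistics.variance(sample)` and `statistics.stdev(sample)`, which use the same n - 1 convention.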
Measures of central tendency provide a quick summary of data but SD and other measures of variability add CONTEXT which can help you interpret variation in samples
Need to understand variation to interpret sample results to make proper diagnosis.
together, both can be used to summarize info about a data set, but you can't have one w/o another
why is variation always used in conjunction with MCTs
*Data framing - INTENTIONALLY selecting a statistical number/descriptive to support one's argument.
1. What is the number measuring
2. Is it an absolute or relative number?
3. Does this number answer the research question --> PPDAC
what is data framing? list the 3 strategies you can implement when reading or communicating numbers to reduce it
Absolute number - raw # collected during data acquisition process [more accurate, under-used in media & research]
Relative number - An absolute number shown as a proportion or percentage
- more often used [bc easier to understand, provide scale and context] but they can exaggerate findings [in comparison to a low starting point] and minimize changes if dealing with large numbers
describe the difference btw absolute and relative numbers and how they can contribute to data framing
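A small sketch of how the same kind of change can be framed very differently. All numbers are invented for illustration, and `describe_change` is a hypothetical helper:

```python
def describe_change(before, after):
    """Return the absolute change and the relative change (as a percent)."""
    absolute_change = after - before
    relative_change = (after - before) * 100 / before
    return absolute_change, relative_change

# Low starting point: cases per 100,000 people go from 1 to 2.
small_abs, small_rel = describe_change(1, 2)

# Large numbers: cases go from 50,000 to 50,500.
large_abs, large_rel = describe_change(50_000, 50_500)

print(f"1 -> 2:           +{small_abs} case, +{small_rel:.0f}%")
print(f"50,000 -> 50,500: +{large_abs} cases, +{large_rel:.0f}%")
```

"Risk doubles (+100%)" describes only 1 extra case, while "+1%" hides 500 extra cases, so reporting both the absolute and the relative number gives the reader the full picture.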