HSCI 190: Module 1 - Intro to Statistics

27 Terms

1

Stats is a branch of math that studies variation (a change or difference in something as a result of random, genetic, or environmental factors)

Stats is a way of collecting, analyzing, and identifying patterns in variation to inform decisions, form conclusions, and communicate findings

stats is the study of

2

Descriptive stats → Involves summarizing and describing data to reveal patterns. Does not allow for conclusions beyond the initial data collected.

Includes the visualization of data, measures of central tendency, and variability

Inferential stats → Method of data analysis that allows for conclusions to be made about how the world works. Used to address specific questions.

Includes the standard error of the mean (SEM), confidence intervals, hypothesis testing, t-tests, ANOVAs, correlations, regressions, and non-parametric tests

what are the two branches of statistics, describe the difference between them

3

biomedical research, evidence-based medicine, forensics, machine learning, and government census [population health]

list 5 instances in which statistics are used in the health sciences

4

Biomedical research - research field exploring disease prevention and treatment

Uses stats to identify probability of certain outcome (ex phenotype, hormone, drug dosage working, disease development, drug efficacy, etc.) and what factors may influence that outcome

Provides objective evidence of drug efficacy, allows researchers to report on the success of a study. Almost every study uses statistical analysis to provide an objective score on how good that data is

Not all studies use stats properly, so you need your own proficiency to understand the results and decide whether to trust them

describe how stats is used in biomed research

5

Evidence-based medicine is a multifaceted approach to practice used by health care providers (HCPs) to make clinical decisions on patient care.

Depends greatly on accurate interpretation of stat findings

Misinterpretation can have grave consequences → ex. a 1998 study by Wakefield et al. proposed that the Measles, Mumps, and Rubella (MMR) vaccine was linked to autism. The study had questionable methodology and was retracted in 2010, but it still led to negative attention and anti-vaccination attitudes that persisted after the retraction

describe how stats is used in evidence based medicine

6

Used in various clinical care settings to track patient information and to collect and share health data. Machine learning → statistical technique that finds patterns in raw data to make predictions

describe how stats is used in clinical medicine [machine learning]

7

Stats used to match bio samples to victims/perpetrators and help determine the likelihood of criminal activity vs coincidence

Used to identify patterns that may suggest something malicious is going on, by comparing results to a comparator

describe how stats are used in the criminal justice system

8

Comparators are like "controls" and help you contextualize data by comparing a case to a normal situation

what is a comparator

9

With stats, one question often leads to more questions

Stats are a component of iterative cycle of investigation: PPDAC - FRAMEWORK FOR UNDERSTANDING/DISCUSSING STATS

Problem → What is the problem/question (ex. Is something malicious happening?)

Plan → What info do I need to ans the Q (ex. Comparing time of death w comparator)

Data → Collect high qual info

Analysis → Sort, graph info, and run stat test (ex comparison)

Conclusions → Interpret, communicate, & generate new ideas

what is the PPDAC cycle used for? describe each stage

10

Classifications describing the TYPE of info the data reps

The 4 levels: nominal, ordinal, interval, and ratio

what does the term "level of measurement" refer to?

what are the 4 levels of measurement

11

categorical data describes data in which numbers are used to represent categories of qualitative information

- it is also known as 'discrete' data bc the values are usually whole numbers, not fractional

nominal data - arbitrary numbers assigned to group variables into qualitative categories. the actual number assigned has no value, only the corresponding label holds meaningful value

ordinal data - ranked data (e.g. Likert scales), numbers group data into a meaningful order which is described by the number itself (therefore, the number holds value). calcs that depend on the size of the gap between ranks can't be performed meaningfully on this data, bc the gaps aren't guaranteed to be equal

describe what is meant by "Categorical data"

give the 2 types of categorical data and describe the difference between them

12

scale data - quantitative MEASUREMENTS (or counts) where the difference between numerical values has significance

interval scale data - numerical measurement on a scale where each point is equidistant but there is NO true zero (zero does not mean "none", so values can go into the negatives or go on in either direction)

ratio scale data - numerical measurement that is NOT restricted to certain values and where there IS a true zero (zero means "none"), so proportions/ratios between values are meaningful

describe what is meant by "scale data"

give the 2 types of scale data and describe the difference between them

13

TIME (clock time) - Quantitative where each point is equidistant [Ex. 9pm-10pm is 1 hr, 2pm-3pm is 1 hour]. No true zero, since 00:00 does not mean the absence of time. ∴ interval scale data, not ratio

why is time considered interval data instead of ratio

14

Absolute frequency distribution tables → Use raw data to show HOW MANY counts/observations are in each category.

Relative frequency distribution tables → Show the proportion of values in each category as a percent

→ Divide the number of values in a category (or interval) by the TOTAL number of values in the table, then × 100 to see it as a percent

what 2 tables are used to summarize data, describe how they are different
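
A minimal sketch of both tables in Python, assuming a small made-up list of blood types (the data and variable names are invented for illustration, not from the course):

```python
from collections import Counter

# Hypothetical raw observations, invented for illustration only.
observations = ["A", "O", "B", "O", "A", "O", "AB", "O"]

# Absolute frequency: how many observations fall in each category.
absolute = Counter(observations)

# Relative frequency: category count / total count, x 100 for a percent.
total = sum(absolute.values())
relative = {cat: count / total * 100 for cat, count in absolute.items()}

for cat in sorted(absolute):
    print(f"{cat:>2}  count={absolute[cat]}  percent={relative[cat]:.1f}%")
```

The relative column always adds up to 100%, which is a quick way to check the table.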

15

For categorical data, a frequency distribution shows a set of categories in one column then numerical counts in the other column.

For scale data, one column will group the scores into non-overlapping intervals and the other column will have the number of observations that fell into each interval

how are absolute frequency distributions different when used on scale vs. categorical data
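
A rough sketch of the scale-data case, assuming whole-number test scores and intervals 10 points wide (the scores, the width, and the helper `interval_label` are all hypothetical choices for illustration):

```python
from collections import Counter

# Hypothetical whole-number scores, invented for illustration only.
scores = [52, 67, 71, 74, 78, 81, 85, 88, 93, 99]

def interval_label(score, width=10):
    """Return the non-overlapping interval (e.g. '70-79') a score falls into."""
    low = (score // width) * width
    return f"{low}-{low + width - 1}"

# One column of intervals, one column of counts.
table = Counter(interval_label(s) for s in scores)

for interval in sorted(table):
    print(interval, table[interval])
```

Because each score maps to exactly one label, the intervals cannot overlap and every observation is counted once.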

16

mean - average (sum of all values divided by the number of values)

median - Middle value in data set when values are arranged in order from LOWEST TO HIGHEST, Divides the data set in half

mode - Most commonly occurring value, Occurs at highest frequency

define the 3 measures of central tendency

17

Mean

- SIGNIFICANTLY affected by outliers so may lead to misleading but statistically correct outcomes (eg. ave income/GPA, bill gates)

- Very useful in larger data sets bc uses calc

- Can be used in further calculations, stdev

Median and mode

- Both median and mode are less sensitive to outliers but are more difficult to identify in larger data sets bc they require sorting/counting the values rather than a single formula.

- When the data are roughly symmetrical with no outliers, all 3 stats will be similar if not identical.

describe the pros and cons of using the 3 measures of central tend: mean, median, and mode
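
A small sketch of the outlier effect using Python's built-in `statistics` module. The incomes (in thousands) are made up, and the single very large value stands in for the "bill gates" case:

```python
import statistics

# Hypothetical incomes in thousands, invented for illustration only.
incomes = [40, 45, 50, 50, 55]
with_outlier = incomes + [1000]  # add one extreme value

for label, data in [("no outlier", incomes), ("with outlier", with_outlier)]:
    print(
        label,
        "mean =", round(statistics.mean(data), 1),
        "median =", statistics.median(data),
        "mode =", statistics.mode(data),
    )
```

The mean jumps from 48 to over 200 once the outlier is added, while the median and mode both stay at 50.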

18

Mode

Since numbers are non-meaningful, doing calc doesn't make sense

Median has no meaning

Mode is best bc u can determine which category is most frequent

which MCT is best for nominal data and why

19

Any

Since numbers have meaning- you can take average or arrange to find median or mode. Mode may not make sense depending on what categories data rep but may be better than mean in some cases (ex. Average rating)

which MCT is best for ORDINAL data and why

20

Any

Numbers rep true values so u can perform calcs on them

They don't rep categories

which MCT is best for SCALE data and why

21

*Variability = differences amongst data within a set. Aka the "spread" of the data - how far the numbers are from the mean/median in a data set

VARIANCE - A measure that quantifies amount of spread/dispersion around the mean.

define variability/variance

22

range and IQR

RANGE: Range = maximum value - minimum value

Measures spread of data by describing difference btw min/max values in a set.

May also be written as (min value) to (max value). Both are accurate. Include unit

IQR (interquartile range) - Identifies the middle 50% of the values, centered on the median. IQR = Q3 - Q1

Measures data spread by dividing the set into QUARTILES to identify the range of the middle 50% of values around the median of the data set. Calculated in 5 steps.

2 methods for calculating variability
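
A minimal sketch of both measures in Python. The data set is made up, and the quartiles come from `statistics.quantiles` with `method="inclusive"`, which is an assumption: it is one common quartile convention and may not match the 5-step hand method from the course exactly.

```python
import statistics

# Hypothetical data set, invented for illustration only.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Range = maximum value - minimum value.
data_range = max(data) - min(data)

# Quartile cut points. Other quartile conventions can give
# slightly different values for Q1 and Q3 on the same data.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# IQR = Q3 - Q1, the width of the middle 50% of the data.
iqr = q3 - q1

print("range =", data_range)
print("Q1, Q2, Q3 =", q1, q2, q3)
print("IQR =", iqr)
```

Q2 here is the same value as the median, which is why the IQR is described as being centered on the median.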

23

Q1 = 25th percentile

Q2 = 50th percentile (the median)

Q3 = 75th percentile

what percentiles do the different quartiles correspond to

24

Measure that quantifies amount of spread/dispersion around the mean.

Involves identifying the DIFFERENCE btw each entry and the mean (average) and then taking the AVE of those diffs

Deviations above the mean are positive and those below are negative, so they always cancel out in the ∑ part of the formula (the deviations from the mean sum to zero), which would wrongly suggest there is no spread at all.

A zero or negative spread around the mean makes no sense when the values differ → Must be addressed

Variance squares each deviation to make everything pos → why variance is repp'ed by s²

but the issue is that it also squares the units and can make the value much larger than most of the (observation - mean) differences

- to address this we take the square root to cancel out the square and keep the units the same

- easier to interpret --> where STDEV came from

what is VARIANCE (s²) and why is it not really used to describe variability

what is used instead and why
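
The steps above can be sketched in Python with a made-up data set. One assumption to flag: the card describes "taking the AVE" of the squared differences (dividing by n), while the s² notation usually refers to the sample variance, which divides by n - 1. Both are shown.

```python
import math
import statistics

# Hypothetical data set, invented for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)

# Step 1: difference between each entry and the mean.
deviations = [x - mean for x in data]

# The raw deviations cancel out, so their sum says nothing about spread.
sum_of_deviations = sum(deviations)

# Step 2: square each deviation so everything is positive.
squared = [d ** 2 for d in deviations]

# Step 3: average the squared deviations.
population_variance = sum(squared) / len(data)    # divide by n
sample_variance = sum(squared) / (len(data) - 1)  # divide by n - 1 (assumed meaning of s²)

# Step 4: square root brings the units back to those of the data.
sample_sd = math.sqrt(sample_variance)

print("sum of deviations =", sum_of_deviations)
print("variance (n) =", population_variance)
print("variance (n - 1) =", round(sample_variance, 3))
print("standard deviation =", round(sample_sd, 3))
```

The hand calculation matches `statistics.variance` and `statistics.stdev`, which use the n - 1 version.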

25

Measures of central tendency provide a quick summary of data but SD and other measures of variability add CONTEXT which can help you interpret variation in samples

Need to understand variation to interpret sample results to make proper diagnosis.

together, both can be used to summarize info about a data set, but you can't have one w/o another

why is variation always used in conjunction with MCTs

26

*Data framing - INTENTIONALLY selecting a statistical number/descriptive to support one's argument.

1. What is the number measuring

2. Is it an absolute or relative number?

3. Does this number answer the research question --> PPDAC

what is data framing? list the 3 strategies you can implement when reading or communicating numbers to reduce it

27

Absolute number - raw # collected during data acquisition process [more accurate, under-used in media & research]

Relative number - An absolute number shown as a proportion or percentage

- more often used [easier to understand, provide scale and context] but they can exaggerate findings [in comparison to a low starting point] and minimize changes if dealing with large numbers

describe the difference btw absolute and relative numbers and how they can contribute to data framing
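
A quick sketch of how the same kind of change looks very different as an absolute vs. relative number. All case counts are hypothetical, and the two helper functions are illustrative names only:

```python
def absolute_change(before, after):
    """Raw difference between two counts."""
    return after - before

def relative_change(before, after):
    """Difference as a percent of the starting value."""
    return (after - before) / before * 100

# Hypothetical: cases per 10,000 people with a LOW starting point.
low_before, low_after = 1, 2
# Hypothetical: cases per 10,000 people with a LARGE starting point.
high_before, high_after = 5000, 5050

print("low start: ", absolute_change(low_before, low_after), "case,",
      relative_change(low_before, low_after), "% increase")
print("high start:", absolute_change(high_before, high_after), "cases,",
      relative_change(high_before, high_after), "% increase")
```

With a low starting point, 1 extra case reads as a "100% increase" (exaggerates). With a large starting point, 50 extra cases read as a "1% increase" (minimizes). Reporting both numbers avoids either framing.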