Biostatistics Cumulative Exam

125 Terms

1
New cards

Deductive Reasoning

Premises claim to be undeniable proof of the conclusion.

2
New cards

Inductive Reasoning

Premises claim to make the conclusion more likely.

3
New cards

Statistics

The discipline that concerns the collection, organization, analysis, interpretation, and presentation of data.

4
New cards

Two Types of Data

Categorical (qualitative) data and quantitative (numerical) data.

5
New cards

Quantitative variables

Discrete: whole-number values (1, 2, 3, ...), e.g., the number of people.

Continuous: can take decimal values (1.1, 2.34, ...), e.g., height, weight, or temperature; these are measured rather than counted.

6
New cards

Categorical data

States of being. Categorical data includes variables that can be divided into groups based on attributes or qualities.

Nominal data: order DOES NOT matter; examples include gender, color, or names.

Ordinal data: order DOES matter, such as rankings or ratings (e.g., the Likert scale).

Rank: ordered from most common to least common.

Binomial (dichotomous) data: only two categories, e.g., a coin toss (heads or tails) or yes/no.

Sage advice: code the more important category as “1”.

7
New cards

Which type of data can you not calculate averages for?

Categorical data like nominal, ordinal, and ranked data, where values represent categories rather than numerical quantities. The values do not represent magnitude and therefore do not allow for the calculation of meaningful averages.

8
New cards

Likert scale

A five-point scale often used in surveys, e.g., 5 = strongly agree and 1 = strongly disagree.

9
New cards

Independent variable

X-axis, this is the variable you are changing.

10
New cards

Dependent variable

Y-axis, this depends on the other variable, the one you are measuring. It is the outcome that is influenced by the independent variable.

11
New cards

Bar graphs

Numbers by categories: the independent variable is an ordinal or nominal variable.

12
New cards

Line graphs

Numbers by numbers, sequential relationship between two continuous variables, often used to show trends over time.

13
New cards

Scatter graphs

Pairs of numbers, exploratory data analysis, looking to see if there is a relationship.

14
New cards

Mean

The average value calculated by summing all observations and dividing by the number of observations.

15
New cards

Median

The middle value in a data set when arranged from smallest to largest. If there is an even number of observations, it is the average of the two middle values. The median is more robust than the mean: extreme values shift it far less.

16
New cards

Mode

The most frequently occurring value. We generally will not use this.
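The three measures of central tendency above can be checked on a small data set. This is a minimal sketch using Python's standard `statistics` module; the `scores` list is made-up data for illustration only.

```python
import statistics

# Hypothetical observations, not from the course materials.
scores = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(scores)      # sum of observations / number of observations
median_value = statistics.median(scores)  # even count, so the average of the two middle values
mode_value = statistics.mode(scores)      # the most frequently occurring value

print(mean_value, median_value, mode_value)
```

Because the data lean to the right (one large value, 10), the mean ends up above the median, which matches the idea that the median is less affected by extreme values.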

17
New cards

Symmetrical distribution

Mode = median = x̄ (x bar): in a perfectly symmetrical, single-peaked distribution, the mean, median, and mode are all equal.

18
New cards

Asymmetrical distribution

x̄ (x bar) does not equal the median or mode: a distribution whose values do not cluster symmetrically around the mean, so the mean, median, and mode are typically unequal. Examples are left-skewed and right-skewed (leaning) distributions.

19
New cards

Range

The difference between the biggest and smallest value (biggest-smallest). It indicates the spread of the data set and provides a measure of variability.

20
New cards

Interquartile range

The difference between the third and first quartiles of a data set (Q3 - Q1), measuring the spread of the middle 50% of values. A robust measure of variation.
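Range and interquartile range can be compared on the same data. The sketch below assumes one common textbook convention for quartiles (Q1 and Q3 are the medians of the lower and upper halves of the sorted data); other conventions exist and can give slightly different values. The helper name `quartiles` and the data are hypothetical.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumed convention: for an odd number of values the overall median
    is left out of both halves.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return statistics.median(lower_half), statistics.median(upper_half)

# Hypothetical observations.
data = [2, 4, 4, 5, 7, 9, 10, 12]

data_range = max(data) - min(data)  # biggest - smallest
q1, q3 = quartiles(data)
iqr = q3 - q1                       # spread of the middle 50% of values

print(data_range, q1, q3, iqr)
```

Replacing the largest value with something extreme (say 120) changes the range a lot but leaves the IQR unchanged, which is why the IQR is called robust.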

21
New cards

Probability

The likelihood that an outcome will occur: the number of times the outcome occurs divided by the total number of trials (or possible outcomes).

22
New cards

Sample space

All possible outcomes that there are in an experiment or random process.

23
New cards

Relative frequency

The proportion of times an outcome occurs relative to the total number of trials. It provides an estimate of the probability of an event. (the same as probability)

24
New cards

Classical approach of probability

Probability calculation assumes that all possible events are equiprobable.

25
New cards

Equiprobable

Events that have an equal chance of occurring.

26
New cards

Permutation

The number of ways objects can be arranged when order matters.

27
New cards

Combinations

The number of ways objects can be selected when order does not matter.
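The difference between permutations and combinations shows up clearly in a count. A small sketch using `math.perm` and `math.comb` (available in Python 3.8 and later); the choice of 5 objects taken 3 at a time is only an illustration.

```python
import math

n_objects = 5  # hypothetical number of objects
n_chosen = 3   # hypothetical number taken at a time

# Order matters: ABC and CBA count as different arrangements.
permutations = math.perm(n_objects, n_chosen)

# Order does not matter: ABC and CBA count as the same selection.
combinations = math.comb(n_objects, n_chosen)

print(permutations, combinations)
```

Each selection of 3 objects can be ordered in 3! = 6 ways, so the permutation count is 6 times the combination count.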

28
New cards

Complement

The complement of an event is the set of outcomes in a sample space that are not included in the event itself. It represents all other possible outcomes.

29
New cards

Intersection

P(A ∩ B) is the probability that both event A and event B occur. Keyword: “and”. In a Venn diagram it is the overlapping middle portion (so only the middle part would be shaded).

30
New cards

Union

P(A ∪ B) is the probability that event A or event B (or both) occurs. Keyword: “or”. In a Venn diagram, it includes all parts of both events (so both circles would be shaded entirely).

31
New cards

Independent events

These are events where the occurrence of one event does not affect the probability/outcome of the other event occurring.

32
New cards

Product rule

In probability, the product rule states that for two independent events A and B, the probability of both events occurring is P(A) × P(B). This rule is used to calculate the joint probability of independent events.

33
New cards

Mutually exclusive events

These are events that cannot occur at the same time. The occurrence of one event means the other cannot happen.

34
New cards

Summation rule

In probability, the summation rule states that for two mutually exclusive events, the probability of either event occurring is the sum of their individual probabilities, represented as P(A) + P(B). This rule is essential for calculating probabilities when events cannot happen simultaneously.

35
New cards

Union of independent events

In probability, the union of independent events refers to the likelihood that at least one of the events occurs. For two independent events A and B, it is calculated using P(A) + P(B) - P(A) × P(B).

36
New cards

Union of not independent events

In probability, the union of not independent events refers to the likelihood that at least one of the events occurs when the events are dependent on each other. It requires adjustments in calculations to account for the interaction between events, expressed as P(A) + P(B) - P(A ∩ B).
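The general union rule can be verified by counting outcomes. The sketch below assumes a single roll of a fair six-sided die with two hypothetical events, A = "even" and B = "greater than 3"; these events are not independent, so the overlap P(A ∩ B) has to be subtracted directly.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (assumed example)
event_a = {2, 4, 6}                # A: the roll is even
event_b = {4, 5, 6}                # B: the roll is greater than 3

def prob(event):
    """Classical probability: favorable outcomes / equiprobable outcomes."""
    return Fraction(len(event), len(sample_space))

p_union_counted = prob(event_a | event_b)                             # count the union directly
p_union_rule = prob(event_a) + prob(event_b) - prob(event_a & event_b)  # P(A) + P(B) - P(A ∩ B)

# Independence check: does P(A ∩ B) equal P(A) x P(B)?
independent = prob(event_a & event_b) == prob(event_a) * prob(event_b)

print(p_union_counted, p_union_rule, independent)
```

Both routes give 2/3, and the independence check is false (1/3 versus 1/4), so using P(A) × P(B) for the overlap would have given the wrong answer here.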

37
New cards

Mixture of independent and mutually exclusive events

In probability, a mixture of these events refers to a problem that involves both kinds of events, so the product rule and the summation rule are used together: multiply the probabilities of independent events that must occur together, then add the probabilities of the mutually exclusive ways the outcome can arise. Because those ways cannot occur at the same time, the sum needs no adjustment for overlap.

38
New cards

Conditional probability

The probability that event B occurs given that event A has already occurred. It measures how the occurrence of one event affects the likelihood of the other. P(B|A) = P(A ∩ B) / P(A). P(A) is in the denominator because A has already occurred: we only look at the cases in which A happened and ask how often B happens among them.
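A worked example of the conditional probability formula, using made-up counts (100 patients, 40 smokers, 10 smokers who also have a cough). All names and numbers are hypothetical.

```python
from fractions import Fraction

# Hypothetical counts for illustration.
total_patients = 100
smokers = 40             # event A
smokers_with_cough = 10  # event A and event B together

p_a = Fraction(smokers, total_patients)
p_a_and_b = Fraction(smokers_with_cough, total_patients)

# P(B|A) = P(A ∩ B) / P(A)
p_b_given_a = p_a_and_b / p_a

print(p_b_given_a)
```

The result equals 10/40: once we know A occurred, the 40 smokers become the whole "sample space" and we count how many of them have B.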

39
New cards

Dependent events

Events that are interconnected, meaning the occurrence of one event affects the probability of the other. For example, the probability of event B occurring may change based on whether event A has occurred.

40
New cards

Bayes theorem

Bayes' theorem is a mathematical formula used to determine conditional probabilities. It describes the probability of an event based on prior knowledge of related events, expressed as P(A|B) = P(B|A)P(A) / P(B).

41
New cards

Prevalence

The proportion of a population who have a specific characteristic or condition at a given time: P(E+), where E+ = event positive.

42
New cards

Sensitivity

The ability of a test to correctly identify those with the condition, calculated as the proportion of true positives among all actual positives. P(Test + | Event +)= P(T + ∩ E+)/ P(E+)

Sensitivity is crucial for assessing a test's performance and minimizing false negatives.

43
New cards

Specificity

The ability of a test to correctly identify those without the condition, calculated as the proportion of true negatives among all actual negatives. Specificity is essential in evaluating a diagnostic test's accuracy and reducing false positives. P(Test - | Event -) = P(T - ∩ E-) / P(E-)

44
New cards

False positive

A test result that indicates a person has a condition when they do not. False positives can lead to unnecessary anxiety and further testing. P(Test + | Event -) = P(T + ∩ E-) / P(E-)= 1- specificity.

45
New cards

False negative

A test result that indicates a person does not have a condition when they actually do. False negatives can result in missed diagnoses and delayed treatment. P(Test - | Event +) = P(T - ∩ E+) / P(E+) = 1 - sensitivity.

46
New cards

Predictive value

Probability that a result from a diagnostic test is correct. It estimates the likelihood that a patient has or does not have a condition based on the test result. Predictive values are influenced by the prevalence of the condition in the population.
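Predictive values can be derived from sensitivity, specificity, and prevalence with Bayes' theorem. The sketch below is one way to write that calculation; the function name `predictive_values` and the example numbers (90% sensitivity, 95% specificity, 1% prevalence) are assumptions for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) using Bayes' theorem.

    PPV = P(E+ | T+), NPV = P(E- | T-).
    """
    true_pos = sensitivity * prevalence               # P(T+ ∩ E+)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(T+ ∩ E-)
    true_neg = specificity * (1 - prevalence)         # P(T- ∩ E-)
    false_neg = (1 - sensitivity) * prevalence        # P(T- ∩ E+)

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Hypothetical test applied to a rare condition.
ppv, npv = predictive_values(sensitivity=0.90, specificity=0.95, prevalence=0.01)
print(round(ppv, 3), round(npv, 3))
```

With a rare condition the positive predictive value is low even for a fairly accurate test, because false positives from the large healthy group outnumber the true positives. Raising the prevalence argument raises the PPV.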

47
New cards

Relative risk

The ratio of the probability of an event occurring in an exposed group versus a non-exposed group. It provides insight into the likelihood of a particular outcome based on exposure to a certain risk factor. RR = P(Disease | Exposed) / P(Disease | Not Exposed)
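Relative risk can be computed from the counts in a 2×2 table. A minimal sketch with hypothetical counts (30 of 100 exposed people and 10 of 100 unexposed people develop the disease); `Fraction` is used so the ratio stays exact.

```python
from fractions import Fraction

def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """RR = P(Disease | Exposed) / P(Disease | Not Exposed)."""
    risk_exposed = Fraction(exposed_cases, exposed_total)
    risk_unexposed = Fraction(unexposed_cases, unexposed_total)
    return risk_exposed / risk_unexposed

# Hypothetical study counts.
rr = relative_risk(exposed_cases=30, exposed_total=100,
                   unexposed_cases=10, unexposed_total=100)
print(rr)
```

An RR of 1 would mean exposure makes no difference; values above 1 mean the outcome is more likely in the exposed group, values below 1 mean it is less likely.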

48
New cards

Variable

Any item being measured.

49
New cards

Random variable

Any variable whose value is determined by an element of chance.

50
New cards

Discrete random variable

Random variable with discrete values. It can take specific values and is often counted, such as the number of occurrences of an event.

51
New cards

Continuous random variable

Random variable with continuous values. It can take any value within a given range or interval, such as heights or weights.

52
New cards

Probability distribution

An accounting of the probability of all possible outcomes of a discrete random variable. It describes how probabilities are distributed over the values of a random variable.

53
New cards

Binomial distributions

Theoretical probability distribution of binomial outcomes.

54
New cards

What are the assumptions of binomial distributions?

The assumptions of binomial distributions include a fixed number of trials (observations), only two possible outcomes per trial (success or failure), a constant probability of success, and independence of trials.

55
New cards

Binomial events

Only two possible outcomes for each trial, often termed success or failure. Examples from the slides: sex (boy vs. girl), disease state (healthy vs. diseased).

56
New cards

Binomial probability

The probability of obtaining a certain number of successes in a fixed number of independent Bernoulli trials, where each trial has the same probability of success.
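The binomial probability of exactly k successes in n trials is C(n, k) × p^k × (1 - p)^(n - k). A small sketch of that formula; the function name `binomial_pmf` and the example (3 heads in 10 fair coin tosses) are illustrative assumptions.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with the same success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: exactly 3 heads in 10 tosses of a fair coin.
p_three_heads = binomial_pmf(k=3, n=10, p=0.5)

# The probabilities over all possible counts (0 through n) add up to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))

print(p_three_heads, total)
```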

57
New cards

Poisson distribution

Distribution of rare events.

58
New cards

What are the assumptions of poisson distribution?

Binomial conditions hold (constant p, fixed number of trials n), p is small, and n is large.

59
New cards

Normal distribution

A probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean.

60
New cards

Poisson distribution can be a good approximation of the binomial distribution when:

The number of trials (n) is large and the probability of success (p) is small.
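The quality of the Poisson approximation can be checked numerically. The sketch below compares the exact binomial probability with the Poisson probability using λ = n × p; the function names and the numbers (n = 1000, p = 0.002, k = 3, and the contrasting n = 10, p = 0.5) are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson probability of k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Rare event, many trials: the approximation should be close.
n, p, k = 1000, 0.002, 3
exact_rare = binomial_pmf(k, n, p)
approx_rare = poisson_pmf(k, n * p)

# Common event, few trials: the approximation should be poor.
exact_common = binomial_pmf(5, 10, 0.5)
approx_common = poisson_pmf(5, 10 * 0.5)

print(exact_rare, approx_rare)
print(exact_common, approx_common)
```

In the first case the two probabilities agree to about three decimal places; in the second they differ noticeably.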

61
New cards

Poisson distribution can be a poor approximation of the binomial distribution when:

The number of trials (n) is small, the probability of success (p) is large, or the average rate of success is high.

62
New cards

Binomial distribution can be approximated by a normal distribution when:

p is close to 0.5 and the number of trials (n) is relatively large, giving a unimodal distribution that is symmetric around the mean.

63
New cards

How can a normal distribution be described?

Any normal distribution is described by its mean (μ) and standard deviation (σ). Together these define the normal curve: they dictate the location and shape of the distribution on the number line.

64
New cards

When mean changes (related to normal distribution):

The location changes, and shape does not change. The mean shifts the center of the distribution along the number line, while the standard deviation remains constant, preserving the overall shape.

65
New cards

When standard deviation changes (related to normal distribution):

The shape of the distribution changes, becoming wider or narrower. Increasing standard deviation spreads the data more widely around the mean, while decreasing it results in a steeper curve centered around the mean. The location does not change.

66
New cards

Empirical rule

Also known as the 68-95-99.7 rule. It states that in a normal distribution approximately 68% of the data falls within one standard deviation of the mean, about 95% falls within two standard deviations, and about 99.7% falls within three standard deviations.
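The percentages in the empirical rule can be reproduced from the normal curve itself. A short sketch using `statistics.NormalDist` (Python 3.8 and later); the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(num_sd):
    """Proportion of a normal distribution within num_sd standard deviations of the mean."""
    return standard_normal.cdf(num_sd) - standard_normal.cdf(-num_sd)

within_one = share_within(1)
within_two = share_within(2)
within_three = share_within(3)

print(round(within_one, 4), round(within_two, 4), round(within_three, 4))
```

The same proportions hold for any mean and standard deviation, since changing μ only moves the curve and changing σ only stretches it.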

67
New cards

Standard normal distribution

A normal distribution with a mean of zero and a standard deviation of one. It allows for easier comparison of different datasets, as it standardizes scores.

68
New cards

Standard normal curve

A graphical representation of the standard normal distribution, which is bell-shaped and reflects the distribution of standardized scores. The mean equals 0 and the standard deviation equals 1.

69
New cards

Central limit theorem (CLT)

Describes the distribution of sample means (the sampling distribution): its shape, its mean, and its standard deviation.

70
New cards

Sampling distribution

The frequency distribution of the sample means (X bar).

71
New cards

CLT Assumptions

Random sampling, sample size is constant (n), and population does not change.

72
New cards

CLT Conclusions

  1. The sampling distribution is approximately normal, even if the sampled population is not, provided the sample size is large enough.

  2. The mean of the sampling distribution equals the population mean.

  3. The standard deviation of the sampling distribution, called the standard error, equals the population standard deviation divided by the square root of the sample size (σ/√n).
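The conclusions above can be illustrated with a quick simulation. This is only a sketch under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a clearly non-normal, right-skewed shape), the sample size is 30, and 2000 samples are drawn with a fixed seed so the run is repeatable.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

sample_size = 30    # n, held constant for every sample
num_samples = 2000  # how many sample means to collect

# Assumed population: exponential with mean 1 and standard deviation 1.
sample_means = [
    statistics.mean(rng.expovariate(1.0) for _ in range(sample_size))
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)  # should sit near the population mean (1)
observed_se = statistics.stdev(sample_means)   # spread of the sampling distribution
predicted_se = 1.0 / math.sqrt(sample_size)    # sigma / sqrt(n)

print(mean_of_means, observed_se, predicted_se)
```

The mean of the sample means lands close to the population mean, and the observed spread lands close to σ/√n. A histogram of `sample_means` would look roughly bell-shaped even though the population is skewed.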

73
New cards

Confidence Interval

A range of values used to estimate the true population parameter, with a specified level of confidence.

74
New cards

Range estimator

Range of values that describe the location of a population parameter.

75
New cards

Student’s t-distribution

A probability distribution used to estimate population parameters when the sample size is small and/or the population standard deviation is unknown.

76
New cards

Margin of error

The amount of uncertainty in a sample estimate, typically expressed as a percentage or a value range, that indicates how much the sample results are expected to vary from the true population value. It is the half-width of a confidence interval: estimate ± margin of error.
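A confidence interval for a mean can be built as estimate ± margin of error. The sketch below assumes the population standard deviation is known, so a z critical value is used; with an unknown standard deviation and a small sample, a critical value from Student's t-distribution would take its place (Python's standard library does not provide one). The sample numbers are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical sample summary.
sample_mean = 50.0
population_sd = 10.0  # assumed known
sample_size = 25
confidence = 0.95

alpha = 1 - confidence
z_critical = NormalDist().inv_cdf(1 - alpha / 2)        # two-tailed: alpha/2 in each tail
standard_error = population_sd / math.sqrt(sample_size)  # sigma / sqrt(n)
margin_of_error = z_critical * standard_error

lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error

print(round(margin_of_error, 2), round(lower, 2), round(upper, 2))
```

Increasing the sample size shrinks the standard error and therefore the margin of error; asking for more confidence (say 99%) widens the interval.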

77
New cards

One-tail test

A hypothesis test that evaluates the effect of a treatment in one direction only, determining if a parameter is greater than or less than a specified value.

78
New cards

Two-tailed test

A hypothesis test that assesses the possibility of a parameter being either greater than or less than a specified value, evaluating effects in both directions.

79
New cards

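Typical values of α (alpha)

Usually 0.05 or 0.01 in hypothesis testing.

α = 0.05 corresponds to 95% confidence: a one-tailed test puts all 0.05 in one tail, while a two-tailed test (or a 95% CI) puts α/2 = 0.025 in each tail. α = 0.01 corresponds to 99% confidence, with α/2 = 0.005 in each tail.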
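The way alpha is split between tails determines the critical value. A brief sketch using the standard normal distribution (z values; t-distribution values would be somewhat larger for small samples). The function name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(alpha, tails):
    """Critical z value: all of alpha in one tail, or alpha/2 in each of two tails."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    tail_area = alpha / tails
    return NormalDist().inv_cdf(1 - tail_area)

one_tail_05 = z_critical(0.05, tails=1)  # 0.05 in one tail
two_tail_05 = z_critical(0.05, tails=2)  # 0.025 in each tail
two_tail_01 = z_critical(0.01, tails=2)  # 0.005 in each tail

print(round(one_tail_05, 3), round(two_tail_05, 3), round(two_tail_01, 3))
```

For the same alpha, the two-tailed critical value is larger than the one-tailed value because each tail holds only half of alpha.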

80
New cards

Biased sampling

One of the ways to “lie” with statistics.

81
New cards

Publication Bias

Publication of research results is influenced by the nature and direction of the study findings. Studies with positive or significant results are more likely to be published, while those with negative or non-significant results are often not published, leading to an incomplete and biased body of scientific literature.

82
New cards

Reporting Bias

Researchers may selectively report results that support their hypotheses, while omitting or downplaying contradictory data, leading to an incomplete or distorted representation of the study findings.

83
New cards

Confounding Bias

AKA unreported variables. There are other variables, which in fact, influence your findings - but you report as if your variable of interest is the only important one.

84
New cards

Confirmation Bias

Researchers may have preconceived notions or expectations about the outcomes of their studies, leading them to interpret data in a way that confirms their hypotheses. This can lead to the inadvertent dismissal of conflicting data or the overemphasis of supporting evidence.

85
New cards

P-Hacking

Manipulating or analyzing data in various ways until a statistically significant result is obtained, without proper correction for multiple comparisons. Researchers may selectively analyze data or perform multiple statistical tests until a significant result is achieved, which can inflate the likelihood of false-positive findings.

86
New cards

HARKing

Hypothesizing After the Results are Known. It refers to the practice of presenting a post hoc (done after the event) hypothesis as if it were initially specified before the data were collected.

87
New cards

P-Hacking and HARKing

Both increase rate of false positives.

88
New cards

Bias

When the sample is systematically different from the population.

89
New cards

Sampling scheme

Strategies used to eliminate bias.

90
New cards

Sampling fraction

The proportion of the population that is sampled (n/N), where n = sample size and N = population size.

91
New cards

Simple random sampling

If the population is randomly distributed, one can simply sample from the population.

If the population isn’t randomly distributed, choose individuals randomly.

92
New cards

Sampling without replacement

Sampled individuals are kept out of the population, so previously selected individuals cannot be chosen again.

93
New cards

Sampling with replacement

- Each individual is put back into the population after selection, so individuals can be chosen multiple times.

- This is the preferred method when sampling could affect the outcome of the experiment.

- The drawback is that the same individual may be sampled twice.

- Double sampling isn’t likely when N is large.

94
New cards

How do you choose a random sample?

Use a mechanical mixer (e.g., bingo or lottery machines) or a random number generator.
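A random number generator can draw a simple random sample either without or with replacement. A minimal sketch using Python's `random` module; the population of 20 numbered individuals and the seed are assumptions for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the draw is repeatable

# Hypothetical population: individuals numbered 1 to 20.
population = list(range(1, 21))
sample_size = 5

# Without replacement: no individual can appear twice.
without_replacement = rng.sample(population, k=sample_size)

# With replacement: each draw is made from the full population,
# so the same individual may appear more than once.
with_replacement = rng.choices(population, k=sample_size)

sampling_fraction = sample_size / len(population)  # n / N

print(without_replacement, with_replacement, sampling_fraction)
```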

95
New cards

Systematic sampling

This is a method where individuals are selected at regular intervals from a sorted list or population, often starting from a random point. This is easier than simple random sampling.
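Systematic sampling as described above takes only a few lines. This sketch picks a random starting point within the first interval and then takes every `step`-th individual; the function name, the list of 20 individuals, and the interval of 4 are hypothetical.

```python
import random

def systematic_sample(items, step, rng):
    """Pick a random start in the first interval, then every step-th item."""
    if step < 1:
        raise ValueError("step must be at least 1")
    start = rng.randrange(step)  # random starting index from 0 to step - 1
    return items[start::step]

rng = random.Random(7)  # fixed seed so the draw is repeatable

# Hypothetical ordered list of 20 individuals.
roster = list(range(1, 21))
chosen = systematic_sample(roster, step=4, rng=rng)

print(chosen)
```

Only one random number is needed, which is why this is easier than simple random sampling. If the list has a repeating pattern that lines up with the interval, the sample can still be biased.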

96
New cards

Stratified random sampling

A method where the population is divided into distinct subgroups, or strata, and random samples are taken from each stratum to ensure representation of different segments.
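Stratified random sampling can be sketched as a simple random sample taken inside each stratum. The version below assumes proportional allocation (the same sampling fraction in every stratum), which is one common choice and not the only one; the strata names, sizes, and function name are hypothetical.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample (without replacement) from each stratum.

    Assumes proportional allocation: each stratum contributes the same
    fraction of its members, with at least one individual per stratum.
    """
    sample = {}
    for name, members in strata.items():
        k = max(1, round(len(members) * fraction))
        sample[name] = rng.sample(members, k)
    return sample

rng = random.Random(1)  # fixed seed so the draw is repeatable

# Hypothetical population split into two strata by age group.
strata = {
    "under_40": [f"U{i}" for i in range(60)],
    "40_plus": [f"P{i}" for i in range(40)],
}

chosen = stratified_sample(strata, fraction=0.10, rng=rng)
print({name: len(members) for name, members in chosen.items()})
```

Every stratum is guaranteed to appear in the sample, which a single simple random sample of the whole population does not guarantee.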

97
New cards

Strata

The distinct subgroups within a population used in stratified sampling to ensure representation.

98
New cards

Respondent Bias

A type of bias that occurs when respondents' answers are influenced by their personal opinions, experiences, or the way questions are framed, leading to inaccurate data collection.

99
New cards

Investigator’s Bias

A type of bias that occurs when a researcher's expectations or preferences influence the study's results, affecting the interpretation of data and outcomes.

100
New cards

Placebo effect

The phenomenon where a patient experiences a perceived improvement in condition after receiving a treatment with no therapeutic effect, due to their expectations or beliefs.