Deductive Reasoning
Premises claim to be undeniable proof of the conclusion.
Inductive Reasoning
Premises claim to make the conclusion more likely.
Statistics
The discipline that concerns the collection, organization, analysis, interpretation, and presentation of data.
Two Types of Data
Categorical (qualitative data) and quantitative data.
Quantitative variables
Discrete: takes whole-number values (1, 2, 3, ...), e.g. number of people.
Continuous: can take decimal values (1.1, 2.34, ...), e.g. height, weight, temperature; these are measured rather than counted.
Categorical data
States of being-
Nominal data: order DOES NOT matter examples include gender, color, or names.
Ordinal data: order DOES matter, such as rankings or ratings (the likert scale)
Rank: most common to least
Binomial data (dichotomous) ex. a coin toss, heads or tails, yes or no.
Sage advice: code the more important category as “1”. Categorical data includes variables that can be divided into groups based on attributes or qualities.
Which type of data can you not calculate averages for?
Categorical data like nominal, ordinal, and ranked data, where values represent categories rather than numerical quantities. The values do not represent magnitude and therefore do not allow for the calculation of meaningful averages.
Likert scale
A five-point scale often used in surveys ex. 5 being strongly agree and 1 being strongly disagree.
Independent variable
X-axis, this is the variable you are changing.
Dependent variable
Y-axis, this depends on the other variable, the one you are measuring. It is the outcome that is influenced by the independent variable.
Bar graphs
Numbers by categories: ordinal or nominal variable for the independent variable
Line graphs
Numbers by numbers, sequential relationship between two continuous variables, often used to show trends over time.
Scatter graphs
Pairs of numbers, exploratory data analysis, looking to see if there is a relationship.
Mean
The average value calculated by summing all observations and dividing by the number of observations.
Median
The middle value in a data set when arranged from smallest to largest. If there is an even number of observations, it is the average of the two middle values. The median is more robust to extreme values (outliers) than the mean.
Mode
The most repeated value, generally not going to use this.
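The three measures above can be computed directly with Python's standard-library `statistics` module; the data list here is a made-up example.

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7, 10]  # hypothetical sample

print(mean(data))    # sum / count = (2+3+3+5+7+10) / 6 = 5
print(median(data))  # even n: average of the two middle values, (3 + 5) / 2 = 4
print(mode(data))    # most repeated value: 3
```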
Symmetrical distribution
mode = median = x̄ (mean): in a perfectly symmetrical distribution, the mean, median, and mode are all equal.
Asymmetrical distribution
x̄ does not equal the median or mode: a distribution where the values do not cluster symmetrically around the mean; in such a distribution the mean, median, and mode are typically unequal. Examples include left- or right-skewed (leaning) distributions.
Range
The difference between the biggest and smallest value (biggest-smallest). It indicates the spread of the data set and provides a measure of variability.
Interquartile range
The difference between the first and third quartiles in a data set, measuring the spread of the middle 50% of values. A robust measurement of variation.
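Both spread measures above can be computed in the standard library; note that `statistics.quantiles` defaults to the "exclusive" interpolation method, so textbook quartile values may differ slightly. The data list is hypothetical.

```python
from statistics import quantiles

data = [1, 3, 5, 7, 9, 11, 13, 15]  # hypothetical sample

data_range = max(data) - min(data)   # biggest - smallest = 15 - 1 = 14

q1, q2, q3 = quantiles(data, n=4)    # quartiles (default "exclusive" method)
iqr = q3 - q1                        # spread of the middle 50% of values
```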
Probability
The likelihood that an outcome will occur: the number of times the outcome occurs divided by the number of opportunities for it to occur.
Sample space
All possible outcomes that there are in an experiment or random process.
Relative frequency
The proportion of times an outcome occurs relative to the total number of trials. It provides an estimate of the probability of an event and converges to the probability as the number of trials grows.
Classical approach of probability
Probability calculation assumes that all possible events are equiprobable.
Equiprobable
Events that have an equal chance of occurring.
Permutation
The number of ways objects can be arranged.
Combinations
Number of arrangements where the order does not matter.
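Python's `math` module computes both counts directly; the numbers 5 and 3 below are arbitrary examples.

```python
from math import perm, comb

# Permutations: arrangements of 3 objects chosen from 5, order matters
print(perm(5, 3))  # 5! / (5-3)! = 60

# Combinations: selections of 3 objects from 5, order does not matter
print(comb(5, 3))  # 5! / (3! * 2!) = 10
```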
Complement
The complement of an event is the set of outcomes in a sample space that are not included in the event itself. It represents all other possible outcomes.
Intersection
P(A ∩ B) is the probability that both event A and event B occur. key word “and”. Looking at a Venn diagram it’s the middle portion or the intersection (so only the middle part would be shaded).
Union
P(A ∪ B) is the probability that either event A or event B occurs. In a Venn diagram, it includes all parts of both events (so entire part would be shaded).
Independent events
These are events where the occurrence of one event does not affect the probability/outcome of the other event occurring.
Product rule
In probability, the product rule states that for two independent events A and B, the probability of both events occurring is P(A) × P(B). This rule is used to calculate the joint probability of independent events.
Mutually exclusive events
These are events that cannot occur at the same time. The occurrence of one event means the other cannot happen.
Summation rule
In probability, the summation rule states that for two mutually exclusive events, the probability of either event occurring is the sum of their individual probabilities, represented as P(A) + P(B). This rule is essential for calculating probabilities when events cannot happen simultaneously.
Union of independent events
In probability, the union of independent events refers to the likelihood that at least one of the events occurs. For two independent events A and B, it is calculated using P(A) + P(B) - P(A) × P(B).
Union of not independent events
In probability, the union of not independent events refers to the likelihood that at least one of the events occurs when the events are dependent on each other. It requires adjustments in calculations to account for the interaction between events, expressed as P(A) + P(B) - P(A ∩ B).
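The two union formulas above can be checked numerically. All probabilities here are hypothetical values chosen for illustration.

```python
# Hypothetical probabilities for illustration
p_a, p_b = 0.5, 0.4

# Independent events: P(A ∩ B) = P(A) × P(B)
union_indep = p_a + p_b - p_a * p_b      # 0.5 + 0.4 - 0.2 = 0.7

# Dependent events: the joint probability must be known separately
p_a_and_b = 0.3                          # hypothetical P(A ∩ B)
union_general = p_a + p_b - p_a_and_b    # 0.5 + 0.4 - 0.3 = 0.6
```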
Mixture of independent and mutually exclusive events
In probability, this refers to a situation combining independent events with mutually exclusive events. Because mutually exclusive events cannot co-occur, P(A ∩ B) = 0, so their union is simply P(A) + P(B) with no subtraction term needed.
Conditional probability
The probability of a second event given that another event occurred first. It measures how the occurrence of one event affects the likelihood of the other: P(B|A) = P(A ∩ B) / P(A). P(A) is in the denominator because A has already occurred; we want to know, out of all the times A occurs, how often B also occurs.
Dependent events
One thing can’t happen without the other happening. These events are interconnected, meaning the occurrence of one event affects the probability of the other. For example, the probability of event B occurring may change based on whether event A has occurred.
Bayes theorem
Bayes' theorem is a mathematical formula used to determine conditional probabilities. It describes the probability of an event based on prior knowledge of related events, expressed as P(A|B) = P(B|A)P(A) / P(B).
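A one-line numeric check of the formula above; all three input probabilities are hypothetical.

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a = 0.01          # hypothetical prior, P(A)
p_b_given_a = 0.9   # hypothetical likelihood, P(B|A)
p_b = 0.05          # hypothetical total probability, P(B)

p_a_given_b = p_b_given_a * p_a / p_b   # 0.9 * 0.01 / 0.05 = 0.18
```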
Prevalence
The proportion of a population who have a specific characteristic or condition at a given time. P(E+) E+= event positive.
Sensitivity
The ability of a test to correctly identify those with the condition, calculated as the proportion of true positives among all actual positives. P(Test + | Event +)= P(T + ∩ E+)/ P(E+)
Sensitivity is crucial for assessing a test's performance and minimizing false negatives.
Specificity
The ability of a test to correctly identify those without the condition, calculated as the proportion of true negatives among all actual negatives. Specificity is essential in evaluating a diagnostic test's accuracy and reducing false positives. P(Test - | Event -) = P(T - ∩ E-) / P(E-)
False positive
A test result that indicates a person has a condition when they do not. False positives can lead to unnecessary anxiety and further testing. P(Test + | Event -) = P(T+ ∩ E-) / P(E-) = 1 - specificity.
False negative
A test result that indicates a person does not have a condition when they actually do. False negatives can result in missed diagnoses and delayed treatment. P(Test - | Event +) = P(T - ∩ E+) / P(E+) = 1 - sensitivity.
Predictive value
Probability that a result from a diagnostic test is correct. It estimates the likelihood that a patient has or does not have a condition based on the test result. Predictive values are influenced by the prevalence of the condition in the population.
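The cards above all reduce to ratios over a 2×2 table of test outcomes. The counts below are a made-up example (100 people with the condition out of 1000).

```python
# Hypothetical 2x2 diagnostic-test counts
tp, fn = 90, 10      # test + / test -, among those WITH the condition
fp, tn = 45, 855     # test + / test -, among those WITHOUT the condition

sensitivity = tp / (tp + fn)                   # P(T+ | E+) = 90/100 = 0.90
specificity = tn / (tn + fp)                   # P(T- | E-) = 855/900 = 0.95
prevalence = (tp + fn) / (tp + fn + fp + tn)   # 100/1000 = 0.10

# Positive predictive value: P(E+ | T+), depends on prevalence
ppv = tp / (tp + fp)                           # 90/135 ≈ 0.667
```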
Relative risk
The ratio of the probability of an event occurring in an exposed group versus a non-exposed group. It provides insight into the likelihood of a particular outcome based on exposure to a certain risk factor. rr = P(Disease | Exposed) / P(Disease | Not Exposed)
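Relative risk is the ratio of two simple proportions; the cohort counts below are hypothetical.

```python
# Hypothetical cohort counts
exposed_disease, exposed_total = 30, 100
unexposed_disease, unexposed_total = 10, 100

risk_exposed = exposed_disease / exposed_total        # P(Disease | Exposed) = 0.30
risk_unexposed = unexposed_disease / unexposed_total  # P(Disease | Not Exposed) = 0.10

rr = risk_exposed / risk_unexposed  # exposed group is 3x as likely to have the disease
```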
Variable
Any item being measured.
Random variable
Any variable whose values are determined by an element of chance.
Discrete random variable
Random variable with discrete values. It can take specific values and is often counted, such as the number of occurrences of an event.
Continuous random variable
Random variable with continuous random values. It can take any value within a given range or interval, such as heights or weights.
Probability distribution
An accounting of the probability of all possible outcomes of a discrete random variable. It describes how probabilities are distributed over the values of a random variable.
Binomial distributions
Theoretical probability distribution of binomial outcomes.
What are the assumptions of binomial distributions?
The assumptions of binomial distributions include a fixed number of trials (observations), only two possible outcomes per trial (success or failure), a constant probability of success, and independence of trials.
Binomial events
Only two possible outcomes for each trial, often termed success or failure. Examples from the slides: sex (boy vs. girl), disease state (healthy vs. diseased).
Binomial probability
The probability of obtaining a certain number of successes in a fixed number of independent Bernoulli trials, where each trial has the same probability of success.
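The binomial probability formula, P(X = k) = C(n, k) · pᵏ · (1-p)ⁿ⁻ᵏ, is short enough to sketch directly; the coin-toss numbers are an example.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example: P(exactly 2 heads in 4 fair coin tosses)
print(binom_pmf(2, 4, 0.5))  # C(4,2) * 0.5^2 * 0.5^2 = 6/16 = 0.375
```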
Poisson distribution
Distribution of rare events.
What are the assumptions of poisson distribution?
The binomial assumptions hold (constant p, fixed number of trials n), p is small, and n is large.
Normal distribution
A probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean.
Poisson distribution can be a good approximation of the binomial distribution when:
The number of trials (n) is large and the probability of success (p) is small.
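A quick numeric comparison under these conditions, with hypothetical n and p (λ = np):

```python
from math import comb, exp, factorial

n, p = 1000, 0.002   # large n, small p (hypothetical values)
lam = n * p          # Poisson parameter: lambda = np = 2.0

k = 3
binomial = comb(n, k) * p**k * (1 - p)**(n - k)   # exact binomial P(X = 3)
poisson = lam**k * exp(-lam) / factorial(k)       # Poisson approximation

# With n this large and p this small, the two agree to about three decimals.
```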
Poisson distribution can be a poor approximation of the binomial distribution when:
The number of trials (n) is small, the probability of success (p) is large, or the average rate of success is high.
Binomial distribution can approximate a normal distribution when:
p ≈ 0.5 and n is relatively large; a large number of trials allows for a unimodal, symmetric distribution around the mean.
How can a normal distribution be described?
Any normal distribution is described by: its mean (μ) and standard deviation (σ). These define the normal curve, they dictate the shape and location of the distribution on the number line.
When mean changes (related to normal distribution):
The location changes, and shape does not change. The mean shifts the center of the distribution along the number line, while the standard deviation remains constant, preserving the overall shape.
When standard deviation changes (related to normal distribution):
The shape of the distribution changes, becoming wider or narrower. Increasing standard deviation spreads the data more widely around the mean, while decreasing it results in a steeper curve centered around the mean. The location does not change.
Empirical rule
Also known as the 68-95-99.7 rule: in a normal distribution, approximately 68% of the data falls within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three standard deviations.
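The three percentages can be verified from the standard normal CDF using the stdlib `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

within_1sd = z.cdf(1) - z.cdf(-1)   # P(-1 < Z < 1)  ~ 0.683
within_2sd = z.cdf(2) - z.cdf(-2)   # P(-2 < Z < 2)  ~ 0.954
within_3sd = z.cdf(3) - z.cdf(-3)   # P(-3 < Z < 3)  ~ 0.997
```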
Standard normal distribution
A normal distribution with a mean of zero and a standard deviation of one. It allows for easier comparison of different datasets, as it standardizes scores.
Standard normal curve
A graphical representation of the standard normal distribution, which is bell-shaped and reflects the distribution of standardized scores. Its mean equals 0 and its standard deviation equals 1.
Central limit theorem (CLT)
Describes the shape of the sampling distribution (the distribution of sample means).
Sampling distribution
The frequency distribution of the sample means (X bar).
CLT Assumptions
Random sampling, sample size is constant (n), and population does not change.
CLT Conclusions
Sampling distribution is normally distributed even if the sampled population is not.
The mean of the sampling distribution equals the population mean.
Standard error, the standard deviation of the sampling distribution, can be approximated by the population standard deviation divided by the square root of the sample size.
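The CLT conclusions can be checked by simulation: repeatedly sample from a skewed (exponential) population and look at the distribution of the sample means. The sample size and counts below are arbitrary choices.

```python
import random
from statistics import mean, stdev

random.seed(0)

# Skewed, non-normal population: exponential with mean 1 and sd 1
n = 30               # constant sample size
num_samples = 2000   # number of repeated samples

sample_means = [mean(random.expovariate(1.0) for _ in range(n))
                for _ in range(num_samples)]

# Mean of the sampling distribution is close to the population mean (1.0),
# and its sd (the standard error) is close to sigma / sqrt(n) = 1/sqrt(30).
print(mean(sample_means))
print(stdev(sample_means))
```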
Confidence Interval
A range of values used to estimate the true population parameter, with a specified level of confidence.
Range estimator
Range of values that describe the location of a population parameter.
Student’s t-distribution
A probability distribution used to estimate population parameters when the sample size is small and/or the population standard deviation is unknown.
Margin of error
The amount of uncertainty in a sample estimate, typically expressed as a percentage or a value range, that indicates how much the sample results are expected to vary from the true population value. It is the half-width of the confidence interval.
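A sketch of a 95% confidence interval using the normal critical value; the sample summary numbers are hypothetical. With a small n or unknown population sd, the Student's t-distribution above would replace z (the stdlib has no t-distribution, so a library such as SciPy would be needed for that).

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample summary: mean, standard deviation, size
x_bar, s, n = 50.0, 10.0, 100

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, ~1.96 for 95%

margin = z * s / sqrt(n)                  # margin of error
ci = (x_bar - margin, x_bar + margin)     # ~ (48.04, 51.96)
```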
One-tail test
A hypothesis test that evaluates the effect of a treatment in one direction only, determining if a parameter is greater than or less than a specified value.
Two-tailed test
A hypothesis test that assesses the possibility of a parameter being either greater than or less than a specified value, evaluating effects in both directions.
Typical values of α (alpha)
Usually 0.05 or 0.01 in hypothesis testing.
α = 0.05 corresponds to 95% confidence with the full α in one tail (one-tailed); a two-tailed test splits it, so α/2 = 0.025 gives a 95% CI and α/2 = 0.005 gives a 99% CI.
Biased sampling
One of the ways to “lie” with statistics.
Publication Bias
Publication of research results is influenced by the nature and direction of the study findings. Studies with positive or significant results are more likely to be published, while those with negative or non-significant results are often not published, leading to an incomplete and biased body of scientific literature.
Reporting Bias
Researchers may selectively report results that support their hypotheses, while omitting or downplaying contradictory data, leading to an incomplete or distorted representation of the study findings.
Confounding Bias
AKA unreported variables. There are other variables, which in fact, influence your findings - but you report as if your variable of interest is the only important one.
Confirmation Bias
Researchers may have preconceived notions or expectations about the outcomes of their studies, leading them to interpret data in a way that confirms their hypotheses. This can lead to the inadvertent dismissal of conflicting data or the overemphasis of supporting evidence.
P-Hacking
Manipulating or analyzing data in various ways until a statistically significant result is obtained, without proper correction for multiple comparisons. Researchers may selectively analyze data or perform multiple statistical tests until a significant result is achieved, which can inflate the likelihood of false-positive findings.
HARKing
Hypothesizing After the Results are Known. It refers to the practice of presenting a post hoc (done after the event) hypothesis as if it were initially specified before the data were collected.
P-Hacking and Harking
Both increase rate of false positives.
Bias
When the sample is systematically different than the population.
Sampling scheme
Strategies used to eliminate bias.
Sampling fraction
The proportion of the population sampled (n/N), where n = sample size and N = population size.
Simple random sampling
If the population is randomly distributed, one can simply sample from the population.
If the population isn’t randomly distributed, choose individuals randomly.
Sampling without replacement
Previously selected individuals are kept out of the population and cannot be chosen again.
Sampling with replacement
-You put each selected individual back into the population after selection, allowing individuals to be chosen multiple times.
-This is the preferred method when sampling could affect the outcome of the experiment.
-A drawback is the possibility of sampling the same individual twice.
-Double sampling is unlikely when N is large.
How do you choose a random sample?
Mechanical mixer ex. bingo, lottery games, and more. Or a random number generator.
Systematic sampling
This is a method where individuals are selected at regular intervals from a sorted list or population, often starting from a random point. This is easier than simple random sampling.
Stratified random sampling
A method where the population is divided into distinct subgroups, or strata, and random samples are taken from each stratum to ensure representation of different segments.
Strata
The distinct subgroups within a population used in stratified sampling to ensure representation.
Respondent Bias
A type of bias that occurs when respondents' answers are influenced by their personal opinions, experiences, or the way questions are framed, leading to inaccurate data collection.
Investigator’s Bias
A type of bias that occurs when a researcher's expectations or preferences influence the study's results, affecting the interpretation of data and outcomes.
Placebo effect
The phenomenon where a patient experiences a perceived improvement in condition after receiving a treatment with no therapeutic effect, due to their expectations or beliefs.