what is a sample?
subset of the population, used to find out information about the population as a whole
why should a sample have the same characteristics as the population it is representing?
- Generalizability: It allows you to apply findings from the sample to the whole population.
- Bias Reduction: A similar sample minimizes bias, ensuring results are accurate for everyone.
- Statistical Validity: Many statistical methods assume the sample reflects the population; if it doesn't, results can be misleading.
- Understanding Variability: A representative sample shows the diversity within the population, helping identify trends.
- Informed Decision-Making: It ensures that decisions based on research are relevant to the entire population.
What type of people are excluded from trials?
- Pregnant or Nursing Individuals: To protect the baby.
- Children: Due to ethical concerns and different reactions to treatments.
- Elderly: Often excluded because of multiple health issues.
- People with Multiple Health Conditions: To focus on specific effects of the treatment.
- Certain Ethnic or Racial Groups: Sometimes excluded for targeted studies or due to historical biases.
- Those on Other Medications: To avoid drug interactions.
- Individuals with Specific Allergies: To reduce risk.
- People with Mental Health Disorders: If it could affect study results
2 classifications of data
Qualitative (categorical) data
Quantitative (numerical) data
types of qualitative data?
nominal
ordinal
types of quantitative data
discrete
continuous
interval
ratio
nominal data
consists of names, labels, or categories
categories without any intrinsic order, size
no units
e.g gender, blood types, types of animal
ordinal data
ranking of some kind
meaningful order, but the intervals between ranks are not uniform, so the size of the difference between them is unknown
no units
e.g Customer satisfaction ratings (Poor, Fair, Good, Excellent), Education level (Secondary School, Bachelor's, Master's), Social class
interval data
ordered categories
equal diff between 2 points
e.g Temperature in Celsius, dates, IQ scores, pH
Can perform arithmetic operations like addition and subtraction but no meaningful multiplication or division because of the lack of a true zero
zero does not mean a lack of the thing being measured (e.g. 0 °C is not "no temperature")
ratio data
has true zero point, which indicates absence of the variable being measured
zero point designates where measurement begins
A meaningful conclusion can be made on the ratio between scores
e.g weight, height, income
can multiply, divide
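A quick sketch of why ratios only make sense with a true zero, using temperature. The helper name `celsius_to_kelvin` and the two temperatures are made up for illustration.

```python
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return c + 273.15

t1_c, t2_c = 10.0, 20.0

# Differences are meaningful on an interval scale:
# 20 C is 10 degrees warmer than 10 C.
diff_c = t2_c - t1_c

# Ratios are not: 20 / 10 = 2, but 20 C is not "twice as hot" as 10 C,
# because 0 C is an arbitrary point, not an absence of temperature.
naive_ratio = t2_c / t1_c

# Kelvin starts at absolute zero, so this ratio is meaningful.
true_ratio = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

print(diff_c, naive_ratio, round(true_ratio, 3))
```

The Kelvin ratio comes out only slightly above 1, which is why "twice as hot" is the wrong reading of 10 °C vs 20 °C.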
discrete data
counting things
whole numbers
e.g number of students in a class, number of cars in a parking lot, or number of pills in a bottle
continuous data
units of measurement
not limited to whole numbers
e.g weight, height, blood pressure
summary
nominal- mutually exclusive, no order
ordinal- order, no units
metric- order, units/ no of things
discrete- counting
continuous- measuring
ratio- measuring, true zero point
interval- measuring, no true zero point
what data is this: data on student grades: A, B, C, D, E, F
ordinal
what data is this: The number of pages in a book
ratio
what data is this: A survey asking "Which mode of transportation do you prefer?"
(Bus, Car, Bicycle, Walk)
nominal
descriptive statistics
describe or summarize a set of data
collect data e.g survey
most time consuming, most expensive, most difficult
This data could be presented e.g. tables and graphs
simplify large amounts of data, show patterns, make comparisons between groups, present in informative way
(see next cards)
graphical techniques of presenting data
diagrams for numerical data
diagrams for categorical data
graphs for qualitative (categorical) data
bar graph, pie chart
graphs for numerical data
dotplots, histograms, and stemplots
histogram
normal distribution curve: symmetrical, the peak (highest point) is at the median
mean + median are the same here
skewness: one tail stretches out farther than the other; mean + median are different here
Positively or right skewed: median is typically less than the mean and is located to the left of the center
Negatively or left skewed: median is usually greater than the mean and is located to the right of the center
mean or median
in normal distribution we normally look at mean
in skewed we normally look at median
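A small sketch of why the median is preferred for skewed data. The income figures are hypothetical, picked so that one large value creates a right skew.

```python
import statistics

# Hypothetical incomes (in thousands); one very large value creates a right skew.
incomes = [20, 22, 25, 27, 30, 32, 200]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# The single extreme value pulls the mean upward; the median stays
# with the bulk of the data.
print(round(mean_income, 1), median_income)
```

Here the mean ends up above six of the seven values, so it no longer describes a typical income, while the median still does.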
diff between bar chart + histogram
Histogram- continuous data
Bar chart- separate entities, gap in between
numerical techniques of presenting data
- frequencies
- central tendency
- dispersions
e.g Mean, standard deviation, range, mode, median, frequencies, percentages, incidence, prevalence, risk, odds
frequency distribution
number of occurrences in each of several categories
used to summarize large volumes of data values- you might group values into intervals like 0-10, 11-20, 21-30 etc
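A minimal sketch of grouping values into the intervals mentioned above (0-10, 11-20, 21-30). The ages and the helper `interval_label` are made up for illustration, and the helper assumes non-negative whole numbers.

```python
from collections import Counter

# Hypothetical ages, made up for illustration.
ages = [3, 7, 10, 11, 15, 20, 21, 22, 28, 30]

def interval_label(value, width=10):
    """Return a label such as '0-10' or '11-20' for a non-negative integer."""
    if value <= width:
        return f"0-{width}"
    lower = ((value - 1) // width) * width + 1
    return f"{lower}-{lower + width - 1}"

# Count how many values fall into each interval.
frequencies = Counter(interval_label(a) for a in ages)

for label, count in frequencies.items():
    print(label, count)
```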
mean
"balance point" of a data set, representing its central tendency
Advantages:
Incorporates every data point, making it comprehensive.
Easily combined with other statistical measures
Disadvantages:
Sensitive to extreme values (outliers), which can skew the result and introduce bias.
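lower SD
lower standard deviation means the values are clustered more tightly around the mean (less variability, more consistent); on its own it does not mean the measurements are more accurate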
median
middle value
It divides the data into two equal halves—50% above and 50% below
mode
The value that occurs most frequently in a given data set
possible to have more than one mode
may not be at the centre of a distribution
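A short sketch of a data set with more than one mode, using `statistics.multimode` from the Python standard library (Python 3.8+). The shoe sizes are hypothetical.

```python
import statistics

# Hypothetical shoe sizes, made up for illustration.
sizes = [4, 5, 5, 6, 7, 7, 9]

# multimode returns every value tied for most frequent.
modes = statistics.multimode(sizes)
print(modes)
```

Both 5 and 7 occur twice, so this data set is bimodal.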
range
Distance between the smallest value and highest
Not affected by skewness but is sensitive to the addition or removal of an outlier value
why is a smaller range better?
Consistency: Less variability in results.
Predictability: More reliable outcomes.
Reduced Risk: Fewer unexpected results.
Easier Analysis: Simpler to interpret data.
Improved Quality: Generally means higher quality and reliability.
Interquartile range
The Interquartile Range (IQR) measures the spread of the middle 50% of data.
Q1: Find the first quartile (25th percentile).
Q3: Find the third quartile (75th percentile).
Calculate IQR: Subtract Q1 from Q3:
IQR = Q3 − Q1
Why It Matters:
Less Affected by Outliers: IQR is robust and not influenced by extreme values.
Shows Data Spread: It helps you understand how data is distributed
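A sketch of the IQR calculation and of why it is robust to outliers. Textbooks and software use slightly different quartile rules; this one assumes the "inclusive" method of `statistics.quantiles`, and the data are made up.

```python
import statistics

def iqr(values):
    """Interquartile range (Q3 - Q1) using the 'inclusive' quartile method."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # 9 replaced by an extreme value

# Print IQR next to the range for each data set.
print(iqr(data), max(data) - min(data))
print(iqr(with_outlier), max(with_outlier) - min(with_outlier))
```

The outlier changes the range a lot but leaves the IQR unchanged, because the IQR only looks at the middle 50% of the data.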
standard deviation
roughly the average distance of the data values from the mean
Unlike the IQR, it uses all information in the data
Use alongside mean
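A worked sketch of the standard deviation. Strictly, it is the square root of the average squared distance from the mean. The values are made up, and both the population (divide by n) and sample (divide by n - 1) versions are shown since the notes do not say which one is meant.

```python
import math

values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)

# Squared distance of each value from the mean.
squared_diffs = [(v - mean) ** 2 for v in values]

# Population SD divides by n; sample SD divides by n - 1.
population_sd = math.sqrt(sum(squared_diffs) / len(values))
sample_sd = math.sqrt(sum(squared_diffs) / (len(values) - 1))

print(mean, population_sd, round(sample_sd, 3))
```

Every value contributes to `squared_diffs`, which is what "uses all information in the data" means here.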
scatterplot correlation
visually displays the relationship between two variables
Types of Correlation:
Positive Correlation:
As one variable increases, the other also increases.
Points trend upwards from left to right.
Negative Correlation:
As one variable increases, the other decreases.
Points trend downwards from left to right.
No Correlation:
There is no apparent relationship between the variables.
Points are scattered randomly
Strength of Correlation:
Strong Correlation: Points are close to a straight line.
Weak Correlation: Points are more spread out but still show a trend.
Correlation Coefficient:
A numerical value (between -1 and 1) that quantifies the correlation:
1: Perfect positive correlation
-1: Perfect negative correlation
0: No correlation
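A minimal sketch of how a correlation coefficient can be calculated. The notes do not name a specific coefficient, so this assumes Pearson's r; the function name `pearson_r` and the data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y vary together, and how much each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)

xs = [1, 2, 3, 4, 5]
r_positive = pearson_r(xs, [2, 4, 6, 8, 10])  # points rise in a straight line
r_negative = pearson_r(xs, [10, 8, 6, 4, 2])  # points fall in a straight line

print(r_positive, r_negative)
```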
linear regression
Linear regression is a method used to find the relationship between two variables by fitting a straight line to the data
we assume that a change in x will lead directly to a change in y
Key Points:
Dependent Variable (Y): What you want to predict.
Independent Variable (X): The predictor you use.
Equation:
The relationship is expressed as:
Y = a + bX
Y: Predicted value
a: Y-intercept (value of Y when X is 0)
b: Slope (how much Y changes for a one-unit change in X)
Steps:
Collect Data: Gather information for both variables.
Plot Data: Create a scatterplot to see the relationship.
Fit the Line: Find the best-fitting straight line.
Evaluate: Check how well the line predicts Y.
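A sketch of the "fit the line" step. The notes only say "best-fitting straight line", so this assumes the usual least-squares method; `fit_line` and the data are hypothetical, and the points are chosen to lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares estimates of a (intercept) and b (slope) in Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Made-up data that follow Y = 1 + 2X exactly.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = fit_line(xs, ys)

# Use the fitted line to predict Y for a new X.
predicted = a + b * 6

print(a, b, predicted)
```

With real data the points will not sit exactly on the line, which is why the last step (evaluate) matters.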
Why is it important to produce a scatter plot first before fitting a regression line?
because it:
Shows Relationships: Helps you see if there's a relationship between the two variables.
Reveals Patterns: Allows you to spot trends or patterns that might need a different model.
Identifies Outliers: Highlights unusual data points that could affect results.
Checks Assumptions: Helps verify if the data meets linear regression assumptions.
Guides Analysis: Informs whether you need to adjust your approach or use a more complex model
In short, a scatter plot provides valuable insights to ensure a proper regression analysis
systolic bp slide?
box & whisker plot
top whisker- max value
bottom whisker- lowest value
The top of the box is the value below which 75% of the values lie. This value is known as the 75th percentile or upper quartile
The middle of the box is the central value in the data after it has been arranged in ascending order. This value is known as the median.
The bottom of the box is the value below which 25% of the values lie. This value is known as the 25th percentile or lower quartile.
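A sketch of the five values a box & whisker plot is drawn from, following the description above (whiskers at the min and max). The systolic blood pressure readings are hypothetical, and the quartiles assume the "inclusive" method of `statistics.quantiles`; other software may give slightly different quartiles.

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg).
readings = [110, 115, 118, 120, 122, 125, 130, 135, 140]

lower_quartile, median, upper_quartile = statistics.quantiles(
    readings, n=4, method="inclusive"
)

five_number_summary = {
    "bottom whisker (min)": min(readings),
    "bottom of box (25th percentile)": lower_quartile,
    "middle of box (median)": median,
    "top of box (75th percentile)": upper_quartile,
    "top whisker (max)": max(readings),
}

for name, value in five_number_summary.items():
    print(name, value)
```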
Summary-Questions to ask when assessing descriptive statistics in published literature
Have several tests of normality been considered and reported?
Are appropriate statistics used to describe the centre and spread of the data?
Do the values of the mean ± 2SD represent a reasonable 95% range?
If a distribution is skewed, has the mean of either group been underestimated or overestimated?
If the data are skewed, have the median and inter-quartile range been reported?
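One way to try the mean ± 2SD question on a data set, as a rough sketch. The lengths of stay are hypothetical. The idea is that for something that cannot be negative, a lower limit below zero suggests the data are skewed and the mean and SD are poor summaries.

```python
import statistics

# Hypothetical lengths of hospital stay in days (cannot be negative).
stays = [1, 1, 2, 2, 3, 3, 4, 5, 20]

mean = statistics.mean(stays)
sd = statistics.stdev(stays)  # sample standard deviation
median = statistics.median(stays)

lower = mean - 2 * sd
upper = mean + 2 * sd

# A negative lower limit is impossible for a length of stay.
range_is_plausible = lower >= 0

print(round(lower, 2), round(upper, 2), range_is_plausible)
print(round(mean, 2), median)
```

Here the lower limit is negative and the mean sits well above the median, so the median and inter-quartile range would be the better statistics to report.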