PPT 1: Descriptive Statistics
Intro to Biostats
Why do we use statistics in biology?
Most things are probabilistic (data based on probabilities) rather than deterministic (data based on known facts)
Definitions
What is the difference between observational and experimental populations?
Observational- A finite population, but is difficult to count
Experimental- An infinite amount in the population
What is a sample survey (or an observational study)?
A study of the individuals actually present in a population (under what the investigator can control)
What is the difference between an experiment and an observation?
Observations are descriptions of patterns
Experiments are designed to collect observations according to a plan
What is a sample?
The part (subset) of a population that is actually measured
What is a sample unit?
An individual thing drawn from the population (e.g. an organism or a measurement)
What is inference?
A generalization from a sample to the whole population (needed because a sample rarely contains all of the population)
What is random sampling?
Truly random: all members of a population have an equal chance to be chosen
What is simple random sampling?
A random sample within the whole population
What is a stratified random sample?
A random sample taken within each group (e.g. males vs. females, age class 1 vs. age class 2, etc.)
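The two sampling schemes above can be sketched in a few lines. This is only an illustration: the population of 60 females and 40 males, the sample sizes, and the fixed seed are all made-up assumptions.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 60 females ("F") and 40 males ("M").
population = [("F", i) for i in range(60)] + [("M", i) for i in range(40)]

# Simple random sample: drawn from the whole population at once.
simple_sample = rng.sample(population, 10)

# Stratified random sample: a separate random draw within each group.
females = [p for p in population if p[0] == "F"]
males = [p for p in population if p[0] == "M"]
stratified_sample = rng.sample(females, 6) + rng.sample(males, 4)

n_female = sum(1 for sex, _ in stratified_sample if sex == "F")
print(len(simple_sample), n_female)
```

The stratified sample always contains 6 females and 4 males, while the sex ratio of the simple random sample varies from draw to draw.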
Biological Variables
What is continuous measurement of variables?
Any value between two extremes can be selected
What are discrete measurements of variables?
Only certain fixed values are possible between the extremes (e.g. whole-number counts)
What are rank variables?
Indicate more or less of variables based on their rank (e.g. smallest to greatest)
What are qualitative variables?
Categorical variables (e.g. male/female, living/dead)
What is a rate?
The quantity per unit of something else, such as time or mass (e.g. births per year)
What are indices (or index)?
Complex derived variables (e.g. condition index: condition of body)
What is the difference between accuracy and precision?
Accuracy- How close a value is to the true value
Precision- How close repeated measurements are to each other
What is bias?
Systematic departure from the true value
Frequency Distribution
What can frequency distribution show?
Location, dispersion, and symmetry
What are examples of frequency distributions?
Symmetrical unimodal, asymmetrical bimodal, symmetrical uniform, asymmetrical (skewed) unimodal, symmetrical bimodal, and extremely skewed (each shown as a figure in the slides)
What is the difference between absolute and relative frequencies?
Absolute- The vertical axis represents the real number of observations
Relative- The axis represents a percentage of observations
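A small sketch of the difference between absolute and relative frequencies. The data (eggs per nest) are invented for illustration.

```python
from collections import Counter

# Hypothetical observations: number of eggs in each of 10 nests.
observations = [2, 3, 3, 4, 4, 4, 5, 5, 3, 4]

# Absolute frequency: the real number of observations of each value.
absolute = Counter(observations)

# Relative frequency: each count as a proportion of all observations.
n = len(observations)
relative = {value: count / n for value, count in absolute.items()}

for value in sorted(absolute):
    print(value, absolute[value], f"{relative[value]:.0%}")
```

The shape of the distribution is the same either way; only the scale of the vertical axis changes.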
What does ∑ represent?
The sum of all of the values (observations) of a variable
What are statistics of location (e.g. the mean)?
The position of a sample along a given dimension representing a variable
What is the difference between an arithmetic mean and a weighted mean?
Arithmetic- The balance point of a distribution. All numbers are treated equally and have equal weight. Most commonly used
Weighted- Each value is multiplied by a weight before averaging, and the total is divided by the sum of the weights (weights may be based on prevalence or the overall percentage of some variables)
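A quick worked comparison of the two means. The values and weights are made up; the weights could stand for how many individuals each value represents.

```python
values = [10.0, 20.0, 30.0]
weights = [1, 1, 2]  # hypothetical weights, e.g. group sizes

# Arithmetic mean: every value counts equally.
arithmetic_mean = sum(values) / len(values)

# Weighted mean: sum of (weight x value) divided by the sum of the weights.
weighted_mean = sum(w * y for w, y in zip(weights, values)) / sum(weights)

print(arithmetic_mean, weighted_mean)  # 20.0 22.5
```

Because the value 30 carries twice the weight, the weighted mean is pulled above the arithmetic mean.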
Why do we transform our results?
We want to change the scale of the data so that its distribution is closer to a normal distribution
What is the geometric mean (GM)?
The back-transformed mean of a log transformed variable (Y becomes logY, and then is changed back to Y)
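What is a harmonic mean?
The back-transformed mean of a reciprocal-transformed variable (Y becomes 1/Y, the mean is taken, and then the reciprocal of that mean is taken)
The reciprocal transformation is used to make a (highly) skewed distribution closer to normal.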
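A sketch of both back-transformed means, assuming base-10 logs for the geometric mean (any base gives the same result). The three data values are invented.

```python
import math
import statistics

y = [1.0, 10.0, 100.0]  # hypothetical, strongly right-skewed values

# Geometric mean: transform Y to log Y, average, then back-transform.
log_mean = statistics.fmean([math.log10(v) for v in y])
geometric_mean = 10 ** log_mean

# Harmonic mean: transform Y to 1/Y, average, then take the reciprocal.
reciprocal_mean = statistics.fmean([1 / v for v in y])
harmonic_mean = 1 / reciprocal_mean

arithmetic_mean = statistics.fmean(y)
print(harmonic_mean, geometric_mean, arithmetic_mean)
```

For positive data that are not all equal, the harmonic mean is smaller than the geometric mean, which is smaller than the arithmetic mean.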
What is the median?
M or Y
It is the middle value of the distribution
What is mode?
The most frequent value. Can be bimodal or multimodal.
Where are the mode, median, and means in an asymmetric distribution?
Mode- Farthest from the tail
Median- in between
Mean- Closest to the tail
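The ordering above can be checked on a small right-skewed sample (long tail toward large values). The numbers are made up for illustration.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, a few are large.
y = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

mode = statistics.mode(y)      # most frequent value
median = statistics.median(y)  # middle value of the sorted data
mean = statistics.mean(y)      # pulled toward the long tail

print(mode, median, mean)
```

Here the mode is 2, the median is 3, and the mean is 5, so the mean sits closest to the tail and the mode farthest from it.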
What is the mean deviation?
The average of the absolute deviations of the observations from the mean
What is standard deviation?
A measure of the amount of variation of the observations around the mean
The square root of the variance
σ
What is variance?
The average squared deviation of the observations from their mean
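The three measures of dispersion can be computed by hand as below. This is a sketch with invented data, and it assumes the sample form of the variance (dividing by n - 1), which the notes do not state explicitly.

```python
import math

y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical measurements
n = len(y)
mean = sum(y) / n

# Mean deviation: average absolute distance from the mean.
mean_deviation = sum(abs(v - mean) for v in y) / n

# Variance: sum of squared deviations divided by n - 1 (sample form, assumed).
sum_of_squares = sum((v - mean) ** 2 for v in y)
variance = sum_of_squares / (n - 1)

# Standard deviation: square root of the variance.
standard_deviation = math.sqrt(variance)

print(mean, mean_deviation, variance, standard_deviation)
```

Squaring the deviations gives large departures more influence than the mean deviation does, and taking the square root returns the result to the original units.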
What is a parameter?
The true number value of a population (this number is the goal of an estimate)
What is a sample statistic?
The estimate of a parameter based on a sample
What is μ?
The population mean or expected value
What is ȳ?
The sample mean, an unbiased estimate of μ
What is σ?
The population standard deviation
If the means and standard deviations are the same, how do we measure interdependence (how variables are related)?
We use covariance. It describes, as a single number, the extent and direction of the joint variation of two numerical variables. For each observation you take the product of its two deviations (one from each variable's mean) and average these products.
What happens if there is a positive covariance?
Large Y1, and large Y2
Small Y1, and small Y2
(left in figure)
What happens if there is a negative covariance?
Large Y1, small Y2
Small Y1, large Y2
(right in figure)
What happens if the covariance is 0?
(middle of figure)
This means that knowing one variable tells you nothing about the other
What is standardized covariance?
The correlation coefficient
It scales the covariance between -1 and 1
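A minimal sketch of covariance and its standardized form, the correlation coefficient. The paired data are invented, and the n - 1 divisor is an assumption (sample form).

```python
import math

# Hypothetical paired measurements on five individuals.
y1 = [1.0, 2.0, 3.0, 4.0, 5.0]
y2 = [2.0, 4.0, 5.0, 4.0, 5.0]

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    """Average product of paired deviations (divisor n - 1, assumed)."""
    mean_a, mean_b = mean(a), mean(b)
    products = [(x - mean_a) * (y - mean_b) for x, y in zip(a, b)]
    return sum(products) / (len(a) - 1)

cov = covariance(y1, y2)

# Correlation: covariance divided by the product of the two standard
# deviations (the covariance of a variable with itself is its variance).
r = cov / math.sqrt(covariance(y1, y1) * covariance(y2, y2))

print(cov, r)
```

The covariance here is positive (large Y1 tends to go with large Y2), and dividing by the two standard deviations gives a unit-free value between -1 and 1.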
PPT 2: Probability and Distributions
What is n?
Equally likely outcomes
The number of possible events
What is s?
Successful outcomes
What is the probability of success?
s/n
What is a simple event?
Any 1 element in a sample space
Only one can occur in a trial
What is an event?
A set of one or more simple events (which ones count depends on the desired outcome)
What is an intersection?
An “and” relationship among common elements (eg. (A,B) = A⋂B)
What is a union?
An “either/or” relationship among all elements in both sets (e.g. A⋃B)
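Intersections and unions map directly onto set operations. The sketch below uses one roll of a fair die as a made-up sample space, with the probability of an event taken as s/n.

```python
from fractions import Fraction

# Hypothetical sample space: the six equally likely faces of a die.
sample_space = set(range(1, 7))
A = {2, 4, 6}  # event: the roll is even
B = {4, 5, 6}  # event: the roll is greater than 3

def probability(event):
    """s/n: successful outcomes over equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

p_a_and_b = probability(A & B)  # intersection: in A and in B
p_a_or_b = probability(A | B)   # union: in A or in B (or both)

print(p_a_and_b, p_a_or_b)  # 1/3 2/3
```

The union counts the shared outcomes only once, so P(A or B) = P(A) + P(B) - P(A and B).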
What is independence?
Among two events, when probability of one occurring does not affect the other
What is replacement?
With replacement, the probability of drawing D twice is P(D)²
When you return the drawn individual before the next draw, so the chance of getting it again stays the same (the sample space is unchanged)
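What happens to the chance of an event (2 draws, without replacement) happening as the population size increases?
The chance of drawing the same kind of individual twice increases, approaching the value expected with replacement
A smaller population results in a lower chance of the event, because removing one individual changes the sample space more
Can we multiply the probabilities of two events to get the probability of both events?
Only if the events are independent (e.g. sampling with replacement). Without replacement the draws are not independent, so the second probability must be adjusted for the first draw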
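A sketch of why replacement and population size matter for two draws. It assumes a population in which half of the individuals are of type D; the function name and the population sizes are made up.

```python
from fractions import Fraction

def p_two_d(n_d, n_total, replace):
    """Probability that two successive draws are both of type D."""
    first = Fraction(n_d, n_total)
    if replace:
        # Sample space unchanged: the second draw has the same probability.
        second = first
    else:
        # One D has been removed, so both counts shrink by one.
        second = Fraction(n_d - 1, n_total - 1)
    return first * second

with_replacement = p_two_d(5, 10, replace=True)   # P(D)^2 = 1/4
small_population = p_two_d(5, 10, replace=False)  # 5/10 * 4/9 = 2/9
large_population = p_two_d(500, 1000, replace=False)

print(with_replacement, small_population, float(large_population))
```

Without replacement the probability is below P(D)², and it climbs toward P(D)² as the population gets larger, because removing one individual changes a large population very little.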
What is π?
The probability of a success
What is 1-π?
The probability of a failure
What is n and r in C(n,r)?
The number of combinations of n things taken r at a time (r successes)
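The pieces defined above (π, 1-π, and C(n,r)) combine into the probability of exactly r successes. The sketch assumes a binomial setting with n independent trials; the choice of n = 4 and π = 0.5 is arbitrary.

```python
import math

def binomial_probability(r, n, pi):
    """C(n, r) * pi^r * (1 - pi)^(n - r): chance of exactly r successes."""
    return math.comb(n, r) * pi**r * (1 - pi) ** (n - r)

n, pi = 4, 0.5  # hypothetical: 4 trials, success probability 0.5
probabilities = [binomial_probability(r, n, pi) for r in range(n + 1)]

print(math.comb(4, 2))  # 6 ways to place 2 successes among 4 trials
print(probabilities)
```

The probabilities for r = 0 to n add up to 1, since some number of successes must occur.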
What is the equation for the mean of the distribution?
mean = ∑(wi × ri) / ∑wi
wi= frequencies (graph)
ri= # of successes
What is the equation for the expected mean (or the absolute frequency of success)?
μ (mean) = n (number of trials in the sample) × π (probability of success)
What is the equation for the s (standard deviation) of the distribution?
σ = √(nπ(1-π))
What happens to σ with changes in n?
σ increases, with an increase of n
σ decreases, with a decrease of n
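A quick check of how the expected mean and standard deviation change with n. This assumes the distribution is binomial, with μ = nπ and σ = √(nπ(1-π)); the values of n are arbitrary.

```python
import math

def binomial_mean_sd(n, pi):
    """Expected number of successes and its standard deviation (binomial, assumed)."""
    mu = n * pi
    sigma = math.sqrt(n * pi * (1 - pi))
    return mu, sigma

mu_10, sigma_10 = binomial_mean_sd(10, 0.5)
mu_40, sigma_40 = binomial_mean_sd(40, 0.5)

print(mu_10, sigma_10)
print(mu_40, sigma_40)
```

Quadrupling n quadruples the expected number of successes but only doubles σ, because σ grows with the square root of n.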
What is the distribution expected if π (probability of success) is 0.5?
A symmetrical distribution
What is a repulsed distribution?
s (observed SD) < σ (expected SD)
When there is an excess in the center and too few observations in the tails
What is a clumped distribution?
s (observed SD) > σ (expected SD)
When there is an excess in the tails
What is skewness?
When data has a tail to one side (left or right)
Can be measured with g1
What is kurtosis?
The shape and height of a distribution
E.g. leptokurtic (high peaked)
E.g. platykurtic (rounded)
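A rough sketch of measuring skewness and kurtosis from the moments of the data. The simple moment ratios used here are an assumption; the g1 in the slides may include a small-sample correction and give slightly different numbers. The data are invented.

```python
def central_moment(values, k):
    """Average of the k-th powers of the deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** k for v in values) / len(values)

def skewness(values):
    # Third moment scaled by the SD cubed; 0 for a symmetric sample.
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    # Fourth moment scaled by the variance squared, minus 3 (normal = 0).
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]        # hypothetical flat, symmetric sample
right_tailed = [1, 1, 2, 2, 3, 10]  # hypothetical sample with a right tail

print(skewness(symmetric), skewness(right_tailed))
print(excess_kurtosis(symmetric))
```

A positive skewness means a tail to the right and a negative one a tail to the left; a negative excess kurtosis indicates a platykurtic shape and a positive one a leptokurtic shape.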
What happens to the standard error of the mean with changes of n?
As n increases, the standard error of the mean decreases (SE = SD/√n)
What is a confidence interval?
A range of values calculated from a sample that is expected to include μ (the population mean) in a stated percent of repeated samples (e.g. 95%)
What is the central limit theorem?
As the sample size n increases, the distribution of the sample mean (ȳ) approaches a normal distribution, even if the original population is not normally distributed
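A small simulation can illustrate this. It is a sketch under assumptions: the population is exponential with mean 1 (strongly right-skewed), and the seed, sample sizes, and number of repeats are arbitrary.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

def sample_means(n, repeats=2000):
    """Means of many samples of size n from a right-skewed population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]

def skewness(values):
    """Moment-based skewness; close to 0 for a normal-looking distribution."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m3 = statistics.fmean([(v - m) ** 3 for v in values])
    return m3 / m2**1.5

means_small = sample_means(2)
means_large = sample_means(50)

print(statistics.stdev(means_small), skewness(means_small))
print(statistics.stdev(means_large), skewness(means_large))
```

With the larger sample size the sample means are both less spread out (a smaller standard error) and much less skewed, which is the behavior the theorem describes.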
What is Pr?
Pr stands for probability, as in the probability statement that defines the confidence interval
For a normal distribution, 95% of the values lie within 1.96 standard deviations of the mean
The 95% confidence interval is the sample mean ± 1.96 × the standard error
95% of intervals calculated this way from repeated samples will contain the true mean (μ)
Changing the 1.96 changes the % confidence of the interval
1.96 is the Z value for 95%
There is no 100% confidence interval
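A sketch of calculating a confidence interval from a sample. The measurements are invented, and using the Z value with a sample SD is an approximation assumed here for simplicity (small samples would normally call for a t value instead).

```python
import math
import statistics

# Hypothetical sample of 10 measurements.
y = [4.8, 5.1, 5.0, 4.7, 5.4, 5.2, 4.9, 5.3, 5.0, 5.1]

n = len(y)
sample_mean = statistics.fmean(y)
standard_error = statistics.stdev(y) / math.sqrt(n)

# Z value leaving 2.5% in each tail of the standard normal distribution.
z_95 = statistics.NormalDist().inv_cdf(0.975)
# A higher confidence level needs a larger Z value, so a wider interval.
z_99 = statistics.NormalDist().inv_cdf(0.995)

lower_95 = sample_mean - z_95 * standard_error
upper_95 = sample_mean + z_95 * standard_error

print(round(z_95, 2), round(z_99, 2))
print(lower_95, upper_95)
```

Pushing the confidence level toward 100% makes the Z value grow without limit, which is why no finite 100% interval exists.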