categorical data
ordinal
not ordinal
what is not ordinal data
data you canNOT sort
blood type
geo location
sex
colors
what is ordinal data
data you CAN sort
numbers
dates
size
time
what is quantitative data
discrete
continuous
what is discrete data
can count/list the possible values (whole, separate values)
# of eggs
colonies in agar
people in a room
what is continuous data
you cannot count/list all possible values (this largely depends on how accurate data is)
age
never-ending, non-repeating decimals (pi)
truly accurate weights/heights
in a histogram do the bars touch
yes
represents continuity
in a bar plot do the bars touch
no
what kind of data does a histogram represent
continuous data only
what are the types of distributions
symmetrical
long tail
skewed right
skewed left
exponential
bimodal (2 modes)
uniform
u shaped
what is “Y”
random variable
what is “y”
value of random variable
what is “n”
sample size
what is “yi”
value of y in the ith place in sample
what is a minimum
lowest number in sample
max
largest number in sample
mode
most common # in sample
median
middle # in sample after data has been ordered from lowest to highest
mean
average
range
max - min
avg absolute deviation
average distance of the data from the mean
variance
average squared distance of the data from the mean
standard deviation
square root of the variance; typical distance of the data from the mean
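A minimal sketch of the summary statistics above, using Python's standard `statistics` module. The sample values are made up for illustration, and the variance and standard deviation here use the n - 1 (sample) denominator.

```python
import statistics

# Hypothetical sample (made-up values, e.g. eggs per nest)
y = [2, 4, 4, 5, 7, 9, 11]
n = len(y)  # sample size

minimum = min(y)                # lowest number in sample
maximum = max(y)                # largest number in sample
mode = statistics.mode(y)       # most common value
median = statistics.median(y)   # middle value after sorting
mean = statistics.mean(y)       # average
data_range = maximum - minimum  # max - min

# Average absolute deviation: mean distance of each yi from the mean
avg_abs_dev = sum(abs(yi - mean) for yi in y) / n

# Sample variance and standard deviation (n - 1 in the denominator)
variance = statistics.variance(y)
std_dev = statistics.stdev(y)

print(minimum, maximum, mode, median, mean, data_range)
print(avg_abs_dev, variance, std_dev)
```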
when do you use median over mean
when a sample has MAJOR outliers
income in Seattle WA: mean=1.2 mill median=500 thousand
Bill Gates's income skews the data, so the mean is no longer representative
what are the steps to making a box plot
find median
find Q1 and Q3
split data into 2 parts at median.. find median of those 2 parts
get IQR=(Q3-Q1)
find 1.5x IQR
Find upper fence (if data above this=outlier)
Q3 + 1.5xIQR
find lower fence (if data below this=outlier)
Q1 - 1.5xIQR
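The box plot steps above can be sketched as follows. The data are made up, and this sketch assumes the median is left out of both halves when n is odd (textbooks and software differ on this convention, so quartiles may not match every tool).

```python
import statistics

# Hypothetical sample with one suspiciously large value
y = sorted([1, 3, 4, 5, 6, 7, 8, 9, 30])
n = len(y)

# Step 1: find the median
median = statistics.median(y)

# Step 2: split the data at the median and take the median of each half
half = n // 2
lower_half = y[:half]
upper_half = y[half + 1:] if n % 2 else y[half:]  # skip the median if n is odd
q1 = statistics.median(lower_half)
q3 = statistics.median(upper_half)

# Step 3: IQR and fences
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the fences is flagged as an outlier
outliers = [v for v in y if v < lower_fence or v > upper_fence]

print(median, q1, q3, iqr, lower_fence, upper_fence, outliers)
```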
μ
pop mean
σ
pop standard deviation
p
population proportion
ȳ
sample mean
estimates μ
s
sample standard deviation
estimates σ
p̂
sample proportion
estimates p
random sampling
everyone has equal chance of being picked
systematic sampling
measure values in set increments
every 20th person
stratified sampling
split the population into groups (strata) and sample each group in proportion to its size
opportunistic sampling
sample everything as you come across it
can be biased
what is basic probability
number of times event can occur / number of possible outcomes
number of outcomes you WANT / number of outcomes that CAN occur
what is the range a prob can be
0-1
what does P{E} mean
probability of event E
what can you do if 2 events are independent
Pr (H and roll 6)
multiply them
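As a small worked example, assuming a fair coin and a fair six-sided die, Pr(H and roll 6) can be computed by multiplying and then checked by listing every outcome:

```python
from fractions import Fraction
from itertools import product

# Assumed fair coin and fair six-sided die
p_heads = Fraction(1, 2)
p_six = Fraction(1, 6)

# Independent events: multiply the probabilities
p_both = p_heads * p_six

# Check by enumerating all 12 equally likely (coin, die) outcomes
outcomes = list(product(["H", "T"], range(1, 7)))
favorable = [o for o in outcomes if o == ("H", 6)]
p_enumerated = Fraction(len(favorable), len(outcomes))

print(p_both, p_enumerated)
```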
does conditional prob change outcomes? if so, how?
yes
states a condition that must happen, which limits the possible outcomes to those where the condition is met
what can you use the binomial coefficient for
finding how many different ways things can be arranged (e.g. k successes among n trials)
what can the binomial distribution formula be used for
finding prob of getting exactly k successes in n independent trials
what are the binomial distribution's parameters
sample size (n)
probability of success (p)
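A minimal sketch of the binomial coefficient and the binomial distribution formula, Pr(k) = C(n, k) p^k (1 - p)^(n - k). The helper name `binomial_pmf` and the values n = 5, p = 0.5 are chosen only for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 5, 0.5  # hypothetical: 5 fair coin flips

# Binomial coefficient: number of ways to place 2 heads among 5 flips
ways = comb(n, 2)

# Probability of exactly 2 heads
prob_2 = binomial_pmf(2, n, p)

# All possible outcomes (k = 0..n) add to one
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

print(ways, prob_2, total)
```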
what makes a distribution
all possible outcomes add to one
parameters determine shape
what are the parameters of the normal distribution
standard deviation (pop)
pop mean
what is the central limit theorem (CLT)
with a large enough sample, the sample mean is approx normal, so you can calc prob using the normal distribution no matter the original distribution
why do we use the normal distribution
many things in bio are approx normal
what do we need to make the standard normal curve
z
what are examples of continuous data distributions
T distrib
chi square
normal
what are discrete data distrib
binomial
poisson
uniform
what do you need in order to calc the poisson distrib
pop mean
how is ȳ typically distributed
normal distrib
what do you do if you don’t know the pop standard deviation
use sample standard deviation
what is SEȳ
standard error for ȳ
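The usual formula is SEȳ = s / sqrt(n). A short sketch with made-up data:

```python
import math
import statistics

# Hypothetical sample
y = [1, 2, 3, 4, 5, 6, 7, 8, 9]
n = len(y)

s = statistics.stdev(y)  # sample standard deviation (n - 1 denominator)
se = s / math.sqrt(n)    # standard error of the sample mean

print(s, se)
```

Because n is in the denominator, a larger sample gives a smaller standard error for the same s.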
noise
Variation in the data we are interested in
Statistic
a value calculated or derived from the data
does a bar plot represent continuous data?
no, it's discrete (categorical) data
show by not touching bars
how would you find Pr(z>-2.0)
a) 1 - Pr(z < -2.0)
b) Pr(z< 2.0).. look up 2 rather than -2
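In place of a z table, the same probability can be checked with Python's `statistics.NormalDist`, whose `cdf(z)` gives Pr(Z < z). This sketch computes Pr(z > -2.0) both as a complement, 1 - Pr(z < -2.0), and by symmetry, Pr(z < 2.0):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Route a: complement of the lower tail
p_complement = 1 - z.cdf(-2.0)

# Route b: symmetry, look up 2 rather than -2
p_symmetry = z.cdf(2.0)

print(round(p_complement, 4), round(p_symmetry, 4))
```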
Know when statistics might be appropriate (or inappropriate). Be able to give an example of each.
appropriate: analyze medical data
inappropriate: to lie using false/unspecific stats
90% effective with no parameters
Be able to give an example of an application of statistics
weather forecasting
testing drug effectiveness
election polling
What is a record ( = observational unit = case)?
individual entity or subject that data is collected on
Why do we use n - 1 in the denominator for the variance (and standard deviation)? Why not just n?
provides a more accurate estimate of the population variance by correcting for the bias introduced when using the sample mean to estimate the population mean
samples tend to be closer to the center than pop data
add extra room to var to account for this error
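One way to see the bias is to list every possible sample from a tiny population and average the two versions of the variance. This is a sketch under stated assumptions: the population is a made-up set of six values, and samples of size 2 are drawn with replacement.

```python
from fractions import Fraction
from itertools import product

# Hypothetical tiny population
population = [1, 2, 3, 4, 5, 6]
N = len(population)
mu = Fraction(sum(population), N)
pop_var = sum((x - mu) ** 2 for x in population) / N  # true population variance

# Every possible sample of size n = 2, drawn with replacement
n = 2
samples = list(product(population, repeat=n))

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    ybar = Fraction(sum(sample), len(sample))
    return sum((yi - ybar) ** 2 for yi in sample)

# Average variance estimate over all samples, dividing by n vs. n - 1
avg_var_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_var_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(pop_var, avg_var_n, avg_var_n_minus_1)
```

Dividing by n underestimates the population variance on average, while dividing by n - 1 matches it.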
What is the difference between a sample and a population?
sample = section of pop
pop= whole
When is sampling better than trying to measure everything?
more efficient/cost effective and, with a good sample, nearly as accurate
Why is it so important to define a population precisely?
students at GMU are a very different group than students at GMU taking biostats
defined pop helps narrow targeted group
minimizes risk of bias
Why do we sometimes have to be careful with studies done in zoos or labs?
limited sample size can lead to errors
Know the difference between estimates and parameters
parameter: specific characteristic of an entire population
estimate: calc based on sample to approx true pop parameter
What does the ^ (hat) symbol mean?
estimated value
How would you do random sampling?
define your population
determine your desired sample size
assign a unique number to each member of the population
use a random number generator or lottery to select the individuals that will be included in your sample
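The steps above can be sketched with Python's `random` module. The population of 100 numbered members, the sample size of 10, and the seed are all made up for illustration.

```python
import random

# Steps 1 and 3: define the population and give each member a unique number
population = list(range(1, 101))  # hypothetical IDs 1..100

# Step 2: choose the sample size
sample_size = 10

# Step 4: draw without replacement, so every member has an equal chance
rng = random.Random(42)  # seeded only so the example is repeatable
sample = rng.sample(population, sample_size)

# Re-running with the same seed gives the same "random" sample
repeat = random.Random(42).sample(population, sample_size)

print(sample)
print(repeat == sample)
```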
Are the random numbers generated by a computer truly random? Why, or why not?
no. they are pseudorandom: the numbers are produced by an algorithm
What is the effect of sample size?
larger sample = more precise and higher confidence
Why is probability important to statistics?
can determine the probability of getting this result by chance.
If this probability is low, we say the experiment had an effect on the outcome
what is a distribution
function that shows how values are spread out across a range of possible values
How does the shape of a binomial distribution depend on n and p?
p = 0.5: Symmetrical distribution regardless of n.
p < 0.5: Skewed right.
p > 0.5: Skewed left.
Large n: Even when p is not exactly 0.5, the distribution tends to appear more symmetrical due to the "central limit theorem" effect.
Small n: The skewness caused by p value is more pronounced
How does the shape of a normal distribution depend on μ and σ
μ:
increases= entire curve shifts to the right
decreases= curve shifts to the left
σ
larger= creates a wider, flatter curve
smaller= narrower, steeper curve
How does area relate to probability (for a continuous distribution)?
area under the curve over a range of values = Pr{E} for that range
How do you use the normal distribution? What table do you use
z score table
What is reverse lookup?
looking up the area (probability) in the z table and finding the z score that goes with it
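A quick sketch of reverse lookup using `statistics.NormalDist.inv_cdf`, which takes the area to the left and returns the matching z score. The area 0.975 is just an example value.

```python
from statistics import NormalDist

z_dist = NormalDist()  # standard normal: mean 0, standard deviation 1

area = 0.975                    # known area to the left of z
z_score = z_dist.inv_cdf(area)  # reverse lookup: area -> z

# Forward lookup on the result should give the area back
area_back = z_dist.cdf(z_score)

print(round(z_score, 2), round(area_back, 3))
```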
What is the mean (μ) of a binomial distribution? What is the standard deviation (σ)?
μ= np
σ= sqrt[np(1-p)]
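As a check on these two formulas, the sketch below compares them with the mean and variance computed directly from the binomial probabilities. The values n = 10 and p = 0.3 are made up for illustration.

```python
from math import comb, sqrt

n, p = 10, 0.3  # hypothetical parameters

# Shortcut formulas
mu = n * p
sigma = sqrt(n * p * (1 - p))

# Same quantities from the full list of probabilities for k = 0..n
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
mean_from_pmf = sum(k * pk for k, pk in enumerate(pmf))
var_from_pmf = sum((k - mean_from_pmf) ** 2 * pk for k, pk in enumerate(pmf))

print(mu, round(sigma, 3))
print(round(mean_from_pmf, 3), round(sqrt(var_from_pmf), 3))
```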
What is the mean (μ) of a standard normal distribution? What is the standard deviation (σ)?
mean=0
standard deviation= 1