population
entire group that you wish to study
sample
group within the population that you actually collect data from
inference
a conclusion about the population that we can make using sample data
what do you need for sample surveys
determine that you need to use a sample
randomize
determine a good sample size
census
type of sample that includes everyone in a population
can be problematic when studying big or constantly changing populations - impractical
biased
Sampling methods that tend to over- or underemphasize some characteristics of the population are said to be biased
-e.g. if we want to survey Sinclair students but only survey women and no men, the sample is biased
parameter
a measure used to talk about a population
-Parameters go with Population
-any number used to describe a population is called a parameter
statistic
a measure used to talk about a sample
-Statistics go with Samples
-any number used to describe a sample is called a statistic
types of samples
SRS
simple random sample
the standard that other sampling methods are measured against
the go-to sampling method
who gets chosen is completely random
Ex. Assign each person in the possible sample a number and use a random number generator to randomly select 100 of them. - like names out of a hat blindly
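The "names out of a hat" idea can be sketched in Python. This is a minimal sketch, assuming the people are represented by ID numbers 1 to 1000; the function name `simple_random_sample`, the roster, and the seed are made up for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)  # sampling without replacement

# Hypothetical roster: ID numbers 1..1000 stand in for the people.
roster = list(range(1, 1001))
chosen = simple_random_sample(roster, 100, seed=1)
print(len(chosen), len(set(chosen)))  # 100 100 (no one is picked twice)
```

Changing the seed gives a different set of 100 IDs, which is the behaviour described on the next card.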
sampling variability
applies to SRS
Samples drawn at random will be different; thus, the statistics associated with them may be different. This is known as sampling variability
the 100 people you sample may not give the same results as the 100 people someone else samples; the results should be close, but they will vary
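A small simulation can make sampling variability concrete. This is only a sketch: the population of "hours of sleep" values is invented (drawn from a normal distribution with assumed mean 7 and standard deviation 1.5), and the seed is arbitrary.

```python
import random
from statistics import mean

rng = random.Random(0)
# Hypothetical population: 10,000 made-up "hours of sleep" values.
population = [rng.gauss(7, 1.5) for _ in range(10_000)]

# Two simple random samples of 100 from the same population.
sample_a = rng.sample(population, 100)
sample_b = rng.sample(population, 100)

print(round(mean(population), 2))  # the parameter
print(round(mean(sample_a), 2))    # one statistic
print(round(mean(sample_b), 2))    # another statistic: close, but not identical
```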
What if we suspect that there are two (or more) distinct groups within our population that think differently on a topic?
we use stratified sampling
We can break the population into strata (ex. males and females) before sampling. We then draw from each stratum in proportion to its share of the entire population
—> reduces sampling variability
e.g. We think there will be a distinct difference in opinion between the 60% of the population that are young and middle aged voters (18-49) and the 40% of the population that is older people (50 and older). So we divide them into those two strata (have young and middle aged voters in one group and older voters in the other) and then randomly choose 60% of our sample from people ages 18-49 and 40% from people ages 50 and older
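The 60% / 40% voter example can be sketched as proportional allocation. This is a minimal sketch under assumptions: the voter lists are invented (600 and 400 placeholder names), and `stratified_sample` is a hypothetical helper, not a library function.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """strata maps a label to the list of members in that stratum."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for label, members in strata.items():
        # Proportional allocation: each stratum's share of the sample
        # matches its share of the population (rounded).
        k = round(total_n * len(members) / population_size)
        sample[label] = rng.sample(members, k)
    return sample

# Hypothetical voter list: 600 people aged 18-49, 400 aged 50 and older.
voters = {
    "18-49": [f"young_{i}" for i in range(600)],
    "50+": [f"older_{i}" for i in range(400)],
}
picked = stratified_sample(voters, 100, seed=2)
print({label: len(members) for label, members in picked.items()})
# {'18-49': 60, '50+': 40}
```

With less tidy proportions the rounded counts may not add up to exactly `total_n`, so a real survey would need a rule for the leftover.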
What if we cannot obtain a list of everyone in our population?
SRS cannot be used because we do not know everyone in our population, so we cannot assign everyone an ID number
stratified sampling cannot be used because we do not know what proportion of the population falls in each group
we use cluster sampling!
-> We can sample certain clusters of our population, as long as we have no reason to believe that the clusters differ from one another
Ex. We want to sample 100 high school students at a certain school. These students each have a home room to which they are randomly assigned. We could randomly pick a few home rooms and sample everyone in each of those home rooms.
—> we have all of these different clusters organized the same
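The home room example can be sketched like this. It assumes a made-up school of 20 home rooms with 25 students each; `cluster_sample` is a hypothetical helper.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then keep everyone inside them."""
    rng = random.Random(seed)
    chosen_labels = rng.sample(sorted(clusters), n_clusters)
    return {label: clusters[label] for label in chosen_labels}

# Hypothetical school: 20 home rooms of 25 students each.
home_rooms = {
    f"room_{r}": [f"student_{r}_{s}" for s in range(25)]
    for r in range(20)
}
surveyed = cluster_sample(home_rooms, 4, seed=3)
print(sum(len(students) for students in surveyed.values()))  # 4 rooms x 25 = 100
```

Note that only the room labels are randomized; no list of individual students is needed up front.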
Stratified Sampling vs. Cluster Sampling
Stratified - the population is split into groups (strata) that differ from one another in some way, e.g. age, with similar individuals inside each group - we survey people from every one of those groups, in proportion to the population
Cluster - all groups are pretty much the same, so we choose only a few and survey everyone in those few groups
Systematic Sampling
When the order of the individuals is in no way associated with their possible responses, we can use systematic sampling, in which we pick a random individual to start with and then survey every nth person (every 5th person, etc.)
Ex. If we want to survey 100 students off of a list of 1,000 students, pick a random number from 1 to 10 to choose where to start among the first ten students (ex. 7), and then choose every 10th student from there (ex. 17, 27, 37, etc.)
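The 100-out-of-1,000 example can be sketched as follows. This is a minimal sketch, assuming the list length is an exact multiple of the sample size; `systematic_sample` and the student list are invented for illustration.

```python
import random

def systematic_sample(ordered_list, n, seed=None):
    rng = random.Random(seed)
    step = len(ordered_list) // n   # 1000 // 100 = 10
    start = rng.randint(1, step)    # random position among the first `step`
    # Positions are 1-based to match the card: start, start + step, ...
    positions = range(start, len(ordered_list) + 1, step)
    return [ordered_list[p - 1] for p in positions]

students = list(range(1, 1001))  # hypothetical list: student i is at position i
sample = systematic_sample(students, 100, seed=4)
print(len(sample))               # 100
print(sample[1] - sample[0])     # 10
```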
good Sampling Methods
SRS - simple random sampling
stratified sampling
cluster sampling
systematic sampling
Pick every 10th passenger as people board the plane
systematic
From the boarding list, randomly choose 5 people flying first class and 25 of the other passengers.
stratified sampling
Randomly generate 30 seat numbers and survey the passengers who sit there.
simple random sampling SRS
Randomly select a seat position (right window, right center, right aisle, etc.) and survey all passengers sitting in those seats
cluster sampling, because we assume people in different seat positions have no reason to feel differently about their journey
Know What You Want To Know
When setting up your survey, be sure to consider exactly what you are investigating and what data will help you in that process.
How many hours did you sleep last night? vs. How much do you usually sleep?
How many hours did you sleep last night? is better because it is specific; the other one could lead to answers like “a lot”
Ask For Quantitative Results - How many magazines did you read last week? vs. How much do you read magazines: A lot, A moderate amount, A little, or Not at all?
How many magazines did you read last week? is better
because with the other question, we do not know what people define as a lot or a little
Be Careful In Your Question Phrasing!!!
Your questions should be clear and should NOT lead the respondent to any particular answer / influence their answer!
“In a recent study, students in an Algebra 1 course were given a 25 question basic skills test. On average, students used a graphing calculator to answer 21 out of 25 questions. Do you think graphing calculators are overused?” - what is wrong
the background info given before the question leads the respondent, because it implies graphing calculators are being used too much!
“By using a graphing calculator, students in an Algebra 1 course are able to make visual connections between equations and their graphs, reinforcing difficult concepts. Do you think graphing calculators are overused?” - what is wrong
the background info leads respondents to think that graphing calculators are not being overused - we should not be influencing them in any way before we ask the question!
“Do you like English or Math?” - what is wrong
the question is not specific enough to analyse: respondents could answer “not at all” or “a little bit” instead of naming a subject
“Do you send/receive text messages frequently?” - what is wrong
we are not asking for quantitative data, which we need. should ask how many texts they send/receive in a day
“Do you believe posting anti-drug posters in schools is salubrious?” - what is wrong
wording too complex, people may not know what salubrious means!
they could answer without knowing the meaning!
Voluntary Response Sample/Bias
we do NOT want to use this
a large group of individuals is invited to respond, and all who respond are counted. Ex. A poll on a web site - it is voluntary because you choose to visit the website and you choose to respond to the poll, so it is not very representative of the population
tends to overemphasise the opinions of those who feel strongly about the issue/experience, and underemphasise those who feel in the middle
Convenience Sampling
we do not want to use this
we simply survey those people that are conveniently available.
Ex. I survey all of the students in my classes, but I really want to find out about Sinclair students as a whole - that is not a representative sample!
Types of Bias Video Lesson
Undercoverage (an example of selection bias)
occurs when some portion of the population is not sampled.
Ex. A phone survey tends to miss people who work during the day or people who do not have a telephone.
selection bias
any time some group is going to be over- or underrepresented - there is a problem in the selection process
Nonresponse Bias
occurs when many members of the sample do not respond to the survey
It is better to create a smaller sample with a higher response rate than a larger sample with a high nonresponse rate.
Response Bias
Response bias refers to anything in the survey design that influences the responses. - something that affects how respondents respond to question
Ex. Respondents not wanting to reveal personal facts or admit to illegal or unapproved behavior. - something they are not going to be comfortable admitting/stating
Ex. The wording of the questions influencing responses.
Variable
what is being studied
a characteristic or measurement that can be determined for each member of a population
may be numerical (weight or time in hours) or categorical (category)
Data
a collection of facts or information from which conclusions can be drawn
Quantitative Variable
a variable that takes on a number for its value - e.g. the percentage a student gets on a test
Categorical/ Qualitative variable
a variable that takes on a name/label/category for its value e.g. colour: blue or name of a tv show
dependent variable (“y are u dependent”)
what you are measuring / what you are trying to study; also called the response variable
e.g. grade would be on the y axis - dependent
time spent studying would be on the x axis - it is what you can control - independent
observational studies
Observational studies are studies in which the researcher does not assign a treatment or design an experiment, instead just observing things that are happening or data that have already been collected
Ex. A researcher wants to know if taking an AP test helps students succeed in college. The researcher pulls the past five years' college GPA data on students who took AP tests in high school
- the researcher is not involved in any of these decisions: does not dictate who takes AP tests and has nothing to do with their success in college, just lets things happen naturally and studies the data compiled
It is IMPOSSIBLE for an observational study to ever show a cause-and-effect relationship!
-> because the researcher is not in control of any of the variables
These studies merely help us describe associations between certain data, helps us show correlation between variables but does not show causation
Can we ever actually prove anything is a cause-and-effect relationship?
In fact, we can! A properly set up experiment can do this for us!
Because we can control every aspect of an experiment, we can block out any lurking variables and actually prove a cause-and-effect relationship
A study randomly selected men from various countries around the world and collected information about diet and health. The men who ate Mediterranean diets had lower rates of heart disease than those men who did not.
observational study
population: all men
variables (things that change): the diet the person eats - categorical and independent; whether or not the men had heart disease - categorical and dependent
cannot prove a cause-and-effect relationship because this is an observational study
A study gathered 7,000 people in Spain who were at high risk of heart disease. Participants were randomly assigned to one of three groups: Mediterranean diet supplemented with extra-virgin olive oil, Mediterranean diet supplemented with mixed nuts, or Control group that were just advised to reduce their dietary fat intake. Participants were followed for five years. The groups on the Mediterranean diet had significantly lower rates of heart disease.
experiment
population: all people in Spain at high risk of heart disease
independent, categorical variable: diet. dependent, categorical variable: presence of heart disease
yes, cause and effect can be determined by this study because it is an experiment: random assignment lets us control for lurking variables
TEXTBOOK FLASHCARDS
quantitative discrete data
all data that are the result of counting, e.g. the number of phone calls you receive
often starts with “number of”
quantitative continuous data
Data that are not only made up of counting numbers, but that may include fractions, decimals, or irrational numbers, are called quantitative continuous data. Continuous data are often the results of measurements like lengths, weights, or times. A list of the lengths in minutes for all the phone calls that you make in a week, with numbers like 2.4, 7.5, or 11.0, would be quantitative continuous data.
e.g. 19lbs, 19.5 lbs
pie charts and bar graphs display what type of data
qualitative
pareto chart
bar graph, but the bars are ordered by category size, largest to smallest
when should pie charts not be used
when things can belong to more than one category or when percentages do not add to 100%, or when frequencies do not add to total!
use bar graph instead
how should pie charts be organized
by the size of each wedge
sampling with replacement
truly random sampling: once a member is picked, that member goes back into the population and may be chosen more than once
-> for practical reasons, for most populations simple random sampling is done without replacement
sampling errors
-errors caused by the sampling process itself; there will always be some sampling error because a sample never exactly represents the population - e.g. the sample is not large enough
nonsampling errors
errors not related to sampling e.g. counting device faulty
Self-selected samples
Responses only by people who choose to respond, such as call-in surveys, are often unreliable.
Confounding:
When the effects of multiple factors on a response cannot be separated. Confounding makes it difficult or impossible to draw valid conclusions about the effect of each factor.
practice creating sampling
Try It 1.11
You are going to use the random number generator to generate different types of samples from the data.
This table displays six sets of quiz scores (each quiz counts 10 points) for an elementary statistics class.
#1 | #2 | #3 | #4 | #5 | #6 |
|---|---|---|---|---|---|
5 | 7 | 10 | 9 | 8 | 3 |
10 | 5 | 9 | 8 | 7 | 6 |
9 | 10 | 8 | 6 | 7 | 9 |
9 | 10 | 10 | 9 | 8 | 9 |
7 | 8 | 9 | 5 | 7 | 4 |
9 | 9 | 9 | 10 | 8 | 7 |
7 | 7 | 10 | 9 | 8 | 8 |
8 | 8 | 9 | 10 | 8 | 8 |
9 | 7 | 8 | 7 | 7 | 8 |
8 | 8 | 10 | 9 | 8 | 7 |
Table 1.7
Instructions: Use the Random Number Generator to pick samples.
Create a stratified sample by column. Pick three quiz scores randomly from each column.
Number each row one through ten.
On your calculator, press MATH and arrow over to PRB.
For column 1, Press 5:randInt( and enter 1,10). Press ENTER. Record the number. Press ENTER 2 more times (even the repeats). Record these numbers. Record the three quiz scores in column one that correspond to these three numbers.
Repeat for columns two through six.
These 18 quiz scores are a stratified sample.
Create a cluster sample by picking two of the columns. Use the column numbers: one through six.
Press MATH and arrow over to PRB.
Press 5:randInt( and enter 1,6). Press ENTER. Record the number. Press ENTER and record that number.
The two numbers are for two of the columns.
The quiz scores (20 of them) in these 2 columns are the cluster sample.
Create a simple random sample of 15 quiz scores.
Use the numbering one through 60.
Press MATH. Arrow over to PRB. Press 5:randInt( and enter 1, 60).
Press ENTER 15 times and record the numbers.
Record the quiz scores that correspond to these numbers.
These 15 quiz scores are the random sample.
Create a systematic sample of 12 quiz scores.
Use the numbering one through 60.
Press MATH. Arrow over to PRB. Press 5:randInt( and enter 1, 60).
Press ENTER. Record the number and the first quiz score. From that number, count ten quiz scores and record that quiz score. Keep counting ten quiz scores and recording the quiz score until you have a sample of 12 quiz scores. You may wrap around (go back to the beginning).
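The calculator steps above can also be sketched in Python, with `random.Random.randint(a, b)` standing in for `randInt(a,b)` (both include the endpoints). Assumptions are marked in the comments: the 1 to 60 numbering order is a guess, since the exercise does not say how to number the scores, and the seed is arbitrary.

```python
import random

# Table 1.7: ten rows, six quizzes (columns).
rows = [
    [5, 7, 10, 9, 8, 3],
    [10, 5, 9, 8, 7, 6],
    [9, 10, 8, 6, 7, 9],
    [9, 10, 10, 9, 8, 9],
    [7, 8, 9, 5, 7, 4],
    [9, 9, 9, 10, 8, 7],
    [7, 7, 10, 9, 8, 8],
    [8, 8, 9, 10, 8, 8],
    [9, 7, 8, 7, 7, 8],
    [8, 8, 10, 9, 8, 7],
]
columns = [list(col) for col in zip(*rows)]  # columns[0] is quiz #1
# Assumption: scores are numbered 1-60 going down column 1, then column 2, ...
numbered = [score for col in columns for score in col]

rng = random.Random(5)  # arbitrary seed so the run is repeatable

# Stratified: three randInt(1,10) row picks per column, repeats allowed.
stratified = [col[rng.randint(1, 10) - 1] for col in columns for _ in range(3)]

# Cluster: two columns, every score in them.
# rng.sample picks two different columns; randInt(1,6) twice could repeat one.
cluster_cols = rng.sample(range(6), 2)
cluster = [score for c in cluster_cols for score in columns[c]]

# Simple random sample: fifteen randInt(1,60) picks, repeats allowed.
srs = [numbered[rng.randint(1, 60) - 1] for _ in range(15)]

# Systematic: random start, then every tenth score, wrapping around.
start = rng.randint(1, 60)
systematic = [numbered[(start - 1 + 10 * k) % 60] for k in range(12)]

print(len(stratified), len(cluster), len(srs), len(systematic))  # 18 20 15 12
```

One thing the sketch makes visible: counting by ten through 60 scores with wrap-around returns to the starting position after six picks, so the last six systematic scores repeat the first six.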
explanatory variable
When one variable causes change in another, we call the first variable the explanatory variable.
response variable
When one variable causes change in another, the affected variable is called the response variable.
treatments
The different values of the explanatory variable are called treatments
experimental unit
An experimental unit is a single object or individual to be measured.
variable
a characteristic of interest for each person or object in a population
lurking variable
additional variables that can cloud a study
how do you prove that the explanatory variable is causing a change in the response variable
you isolate the explanatory variable
-> you must design the experiment so that there is only one difference between the groups being compared: the planned treatments
—> you can do this by the random assignment of experimental units to treatment groups. When subjects are assigned treatments randomly, all of the potential lurking variables are spread equally among the groups.
—> the only difference between groups is the one imposed by the researcher. Different outcomes measured in the response variable, therefore, must be a direct result of the different treatments. In this way, an experiment can prove a cause-and-effect connection between the explanatory and response variables.
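Random assignment of experimental units to treatment groups can be sketched as a shuffle followed by dealing the units out in turn. This is an illustration only: the 30 subjects, the two group labels, and the helper `randomly_assign` are all made up.

```python
import random

def randomly_assign(units, treatments, seed=None):
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # chance alone decides the order
    groups = {t: [] for t in treatments}
    # Deal the shuffled units out like cards so group sizes differ by at most one.
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

subjects = [f"subject_{i}" for i in range(30)]  # hypothetical experimental units
groups = randomly_assign(subjects, ["treatment", "placebo"], seed=6)
print({t: len(members) for t, members in groups.items()})
# {'treatment': 15, 'placebo': 15}
```

Because the shuffle ignores everything about the subjects, any lurking variable (age, exercise habits, and so on) tends to be spread across both groups rather than concentrated in one.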
control group
group given a placebo treatment - treatment that cannot directly influence response variable
You want to investigate the effectiveness of vitamin E in preventing disease. You recruit a group of subjects and ask them if they regularly take vitamin E. You notice that the subjects who take vitamin E exhibit better health on average than those who do not. Does this prove that vitamin E is effective in disease prevention?
It does not. There are many differences between the two groups compared in addition to vitamin E consumption. People who take vitamin E regularly often take other steps to improve their health: exercise, diet, other vitamin supplements, choosing not to smoke. Any one of these factors could be influencing health. As described, this study does not prove that vitamin E is the key to disease prevention.
levels of measurement
nominal scale level
ordinal scale level
interval scale level
ratio scale level
nominal scale data (organic chem tutor)
qualitative/ categorical
names, colors, labels, gender etc.
order does not matter
cannot be used in calculation
counts of people's responses can be used in calculation though - like 5 people said their favourite color is red, 3 said blue, 2 said green
e.g. could assign 1 to red, 2 to blue, 3 to green, but these codes still cannot be used in calculation because they are qualitative labels
ordinal scale data (org chem)
order matters cuz is ranking/ placement
differences cannot be measured - e.g. knowing only 1st, 2nd and 3rd place, you would need the actual times to measure the differences between them
-> e.g. if given times, the difference between the 1st and 2nd place times need not equal the difference between the 2nd and 3rd place times
e.g. can assign values to survey results, 1 is excellent, 2 is good, 3 is satisfactory etc., but you cannot measure the difference between excellent and good etc.
interval scale data
order matters- can place in inc or dec order with meaning
differences can be measured, but ratios are not meaningful
no true “0” starting point
e.g. temps:
30F
60F
90F
—> differences between temperatures can be measured
-> but you cannot compare using ratios because 60F is not twice as hot as 30F
-> there are temperatures lower than 0 in both F and C, so the scale does not start at 0
ratio scale data
order matters - can place in meaningful order of inc or dec
differences are measurable (including ratios)
contains a “0” starting point
e.g. place grades in inc or dec order, you can measure difference between grades by just subtracting them from each other
0, 30, 56, 70, 82, 90
can measure with ratios - 90/30 is 3, so you can say that the student who scored 90 has a grade 3x as high as the student who scored 30
data truly starts at 0 cuz cant have grade lower than that
what scales of data can be labeled
nominal, ordinal , interval, ratio
what data scales have meaningful order
-ordinal, interval, ratio
what data scales have measurable difference
interval and ratio scale data
what scale data has a true zero starting point
ratio only
frequency
number of times a value of the data occurs
relative frequency
is the ratio (fraction or proportion) of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes.
cumulative relative frequency
the accumulation of the previous relative frequencies. To find the cumulative relative frequencies, add all the previous relative frequencies to the relative frequency for the current row
0.15, 0.15+0.25 =0.40, 0.40+ 0.15= 0.55 etc.
difference between frequencies table
DATA VALUE | FREQUENCY | RELATIVE | CUMULATIVE RELATIVE |
|---|---|---|---|
2 | 3 | 3/20 or 0.15 | 0.15 |
3 | 5 | 5/20 or 0.25 | 0.15 + 0.25 = 0.40 |
4 | 3 | 3/20 or 0.15 | 0.40 + 0.15 = 0.55 |
5 | 6 | 6/20 or 0.30 | 0.55 + 0.30 = 0.85 |
6 | 2 | 2/20 or 0.10 | 0.85 + 0.10 = 0.95 |
7 | 1 | 1/20 or 0.05 | 0.95 + 0.05 = 1.00 |
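The three columns of the table can be recomputed with a short script. This is a sketch that rebuilds the 20 data values from the frequency column; the variable names are made up.

```python
from collections import Counter
from itertools import accumulate

# Data rebuilt from the frequency column: three 2s, five 3s, and so on.
data = [2] * 3 + [3] * 5 + [4] * 3 + [5] * 6 + [6] * 2 + [7] * 1

counts = Counter(data)                            # frequency of each value
total = len(data)                                 # 20 outcomes in all
values = sorted(counts)
relative = [counts[v] / total for v in values]    # frequency / total
cumulative = list(accumulate(relative))           # running sum of relative frequencies

for v, rel, cum in zip(values, relative, cumulative):
    print(v, counts[v], round(rel, 2), round(cum, 2))
```

The last cumulative relative frequency should always come out to 1.00, which is a quick check that no row was missed.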
1.3 Problems for practice
descriptive statistics
Organizing and summarizing data, e.g. by graphing or finding the average
inferential statistics
formal methods of drawing conclusions from good data
representative sample
The sample must contain the characteristics of the population in order to be a representative sample.
ASK ABT POWER OF SUGGESTION!!! - highlighted in textbook
blinding in an experiment
when a person is blinded in a research study, they do not know who receives the active treatment and who receives the placebo treatment!
double-blind experiment
both the subjects and the researchers involved with the subjects are blinded - they do not know who is getting what
RANDOM ASSIGNMENT ELIMINATES WHAT
THE PROBLEM OF LURKING VARIABLES!