Intro to stats terms: sampling and general stats terms (1.1-1.4)


88 Terms

1

population

entire group that you wish to study

2

sample

  • group within the population that you actually collect data from

3

inference

a conclusion about the POPULATION that we CAN MAKE USING SAMPLE DATA!

4

FOR SAMPLE SURVEYS, WHAT DO YOU NEED TO DO?

  • DETERMINE THAT YOU NEED TO USE A SAMPLE

  • RANDOMIZE

  • DETERMINE A GOOD SAMPLE SIZE

5

CENSUS

a type of sample that includes everyone in the population

  • can be impractical when studying large populations or populations that are constantly changing

6

biased

Sample methods that tend to over- or underemphasize some characteristics of the population are said to be biased

-e.g. if we want to survey Sinclair students but only survey women and not men, the sample is biased

7

parameter

a measure used to describe a population

-Parameters go with Populations

-if you use any number to describe a population, you call it a parameter

8

statistic

a measure used to describe a sample

  • Statistics go with Samples

  • if you use any number to describe a sample, you call it a statistic
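To put numbers on the parameter vs. statistic distinction, here is a minimal Python sketch. The population of 200 test scores is hypothetical, generated only for illustration; the population mean plays the role of a parameter, and the mean of a 20-student sample is a statistic.

```python
import random
import statistics

# Hypothetical population: test scores for all 200 students in a made-up course.
rng = random.Random(1)
population = [rng.randint(50, 100) for _ in range(200)]

# Parameter: a number that describes the whole population.
population_mean = statistics.mean(population)

# Statistic: the same kind of number, but computed from a sample of 20 students.
sample = rng.sample(population, 20)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In practice we usually only know the statistic and use it to estimate the unknown parameter.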

9

types of samples

10

SRS

simple random sample

  • the standard that other sampling methods are measured against

  • the go-to sampling method

  • people are chosen completely at random

Ex. Assign each person in the population a number and use a random number generator to randomly select 100 of them, like blindly drawing names out of a hat.
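The "assign numbers, then pick randomly" example can be sketched in Python. This assumes a hypothetical list of 1,000 ID numbers; `random.sample` draws without replacement and gives every set of 100 IDs the same chance of being chosen.

```python
import random

def simple_random_sample(ids, n, seed=None):
    """Return an SRS of n IDs: drawn without replacement, every set of n IDs equally likely."""
    rng = random.Random(seed)
    return sorted(rng.sample(ids, n))

# Hypothetical sampling frame: each person in the population gets an ID number 1-1000.
population_ids = list(range(1, 1001))
chosen = simple_random_sample(population_ids, 100, seed=42)
print(chosen[:10])  # first few of the 100 randomly selected ID numbers
```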

11

sampling variability

  • applies to SRS

Samples drawn at random will differ from one another, so the statistics computed from them may also differ. This is known as sampling variability.

The 100 people you sample may not give the same results as the 100 people someone else samples; the results should be close, but they will vary.
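A small simulation makes sampling variability visible. In this sketch the population of 10,000 yes/no answers (exactly 40% "yes") is hypothetical; five researchers each take their own SRS of 100 and compute the proportion who said yes.

```python
import random

rng = random.Random(7)

# Hypothetical population of 10,000 people: 1 = "yes", 0 = "no" (exactly 40% say yes).
population = [1] * 4000 + [0] * 6000
parameter = sum(population) / len(population)

# Five different researchers each draw their own SRS of 100 people.
sample_proportions = []
for _ in range(5):
    sample = rng.sample(population, 100)
    sample_proportions.append(sum(sample) / len(sample))

print("population proportion:", parameter)
print("sample proportions:   ", sample_proportions)
```

The sample proportions typically land near 0.4 without all matching each other; that spread is the sampling variability described on the card.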

12

What if we suspect that there are two (or more) distinct groups within our population that think differently on a topic?


  • we use stratified sampling

  • We break the population into strata (ex. males and females) before sampling. We then randomly sample from each stratum in proportion to that stratum's share of the entire population

  • —> reduces sampling variability

  • e.g. We think there will be a distinct difference in opinion between the 60% of the population who are young and middle-aged voters (18-49) and the 40% who are older voters (50 and older). So we divide them into those two strata (young and middle-aged voters in one group, older voters in the other) and then randomly choose 60% of our sample from voters ages 18-49 and 40% from voters ages 50 and older
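The 60/40 voter example can be sketched as a proportional stratified sample in Python. The voter lists are hypothetical placeholders; the function takes an SRS from each stratum, sized by that stratum's share of the population.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional stratified sample: an SRS from every stratum, sized by the stratum's share.

    strata maps a stratum name to the list of its members. Because of rounding, the
    stratum sizes can add up to slightly more or less than n for awkward splits.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / total)  # stratum's share of the population times n
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical electorate of 10,000 voters: 60% ages 18-49, 40% ages 50+.
voters = {
    "18-49": [f"young{i}" for i in range(6000)],
    "50+": [f"older{i}" for i in range(4000)],
}
picked = stratified_sample(voters, 100, seed=1)
print({name: len(group) for name, group in picked.items()})  # {'18-49': 60, '50+': 40}
```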

13

What if we cannot obtain a list of everyone in our population?

  • SRS cannot be used: we do not know everyone in our population, so we cannot assign everyone an ID number

  • stratified sampling cannot be used because we do not know what proportion of the population falls into each group

  • WE USE CLUSTER SAMPLING!

  • —→ We can sample certain clusters of our population, as long as we have no reason to believe that the individuals in one cluster differ from those in another

  • Ex. We want to sample 100 high school students at a certain school. These students are each randomly assigned to a home room. We could randomly pick a few home rooms and sample everyone in each of those home rooms.

  • —> all of the clusters are organized the same way, so they are similar to one another
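The home room example can be sketched as a cluster sample in Python. The school here is hypothetical (20 home rooms of 25 students each, so 4 rooms give the 100 students from the card); whole clusters are picked at random and everyone in them is included.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters, then include every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    members = [person for c in chosen for person in clusters[c]]
    return chosen, members

# Hypothetical school: 20 home rooms (numbered 1-20) of 25 students each.
home_rooms = {room: [f"room{room}-student{s}" for s in range(1, 26)] for room in range(1, 21)}

rooms, students = cluster_sample(home_rooms, 4, seed=3)
print(rooms, len(students))  # 4 home rooms x 25 students = 100 students surveyed
```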

14

Stratified Sampling vs. Cluster Sampling

Stratified - the population is divided into groups (strata) of similar individuals, and the strata are distinct from one another (e.g. split by age). In a stratified sample, we survey people from every single stratum, in proportion to the population

Cluster - the groups (clusters) are all pretty much the same; we randomly choose only a few clusters and survey everyone in them

15

Systematic Sampling

When the order of the individuals is in no way associated with their possible responses, we can use systematic sampling: we pick a random individual to start with and then survey every nth person (e.g. every 5th person).

Ex. If we want to survey 100 students off of a list of 1,000 students, pick a random number from 1 to 10 to choose where to start among the first ten students (ex. 7), and then choose every 10th student from there (ex. 17, 27, 37, etc.)
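The 1,000-student example can be sketched in Python, assuming the list is simply the numbers 1 to 1000. With a sample of 100 the step is k = 1000 // 100 = 10, the start is chosen at random among the first ten, and then every 10th student is taken.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Random start among the first k members (k = len(frame) // n), then every kth member."""
    k = len(frame) // n
    start = random.Random(seed).randrange(k)  # 0-based index of the starting member
    return frame[start::k][:n]

# The card's example: 1,000 students on a list, sample of 100, so k = 10.
students = list(range(1, 1001))
sample = systematic_sample(students, 100, seed=5)
print(sample[:4])  # e.g. a start of 7 would give [7, 17, 27, 37]
```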

16

good sampling methods

SRS - simple random sampling

stratified sampling

cluster sampling

systematic sampling

17

Pick every 10th passenger as people board the plane

systematic

18

From the boarding list, randomly choose 5 people flying first class and 25 of the other passengers.

stratified sampling

19

Randomly generate 30 seat numbers and survey the passengers who sit there.

simple random sampling (SRS)

20

Randomly select a seat position (right window, right center, right aisle, etc.) and survey all passengers sitting in those seats

cluster sampling, because we assume passengers in different seat positions have no reason to feel differently about their journey

21

Know What You Want To Know

When setting up your survey, be sure to consider exactly what you are investigating and what data will help you in that process.

22

How many hours did you sleep last night? vs. How much do you usually sleep?

“How many hours did you sleep last night?” is better because it is specific; the other question could lead to answers like “a lot”

23

Ask For Quantitative Results - How many magazines did you read last week? vs. How much do you read magazines: A lot, A moderate amount, A little, or Not at all?

“How many magazines did you read last week?” is better

because with the other question, we do not know what people mean by “a lot” or “a little”

24

Be Careful In Your Question Phrasing!!!

Your questions should be clear and should NOT lead the respondent to any particular answer / influence their answer!

25

“In a recent study, students in an Algebra 1 course were given a 25-question basic skills test. On average, students used a graphing calculator to answer 21 out of 25 questions. Do you think graphing calculators are overused?” - what is wrong

  • by putting this background information before the question, we lead the respondent: it implies that graphing calculators are being used too much!

26

“By using a graphing calculator, students in an Algebra 1 course are able to make visual connections between equations and their graphs, reinforcing difficult concepts. Do you think graphing calculators are overused?” - what is wrong

  • the background information leads respondents to think that graphing calculators are not being overused. We should not influence respondents in any way before we ask the question!

27

“Do you like English or Math?” - what is wrong

the question is not specific enough and does not give us data we can analyze; we could get answers like “not at all” or “a little bit”

28

“Do you send/receive text messages frequently?” - what is wrong

we are not asking for quantitative data, which we need. We should ask how many text messages they send/receive in a day

29

“Do you believe posting anti-drug posters in schools is salubrious?” - what is wrong

  • the wording is too complex; people may not know what “salubrious” (healthful) means!

  • they could answer without knowing the meaning!

30

Voluntary Response Sample/Bias

  • we do NOT want to use this

  • a large group of individuals is invited to respond, and all who respond are counted. Ex. A poll on a website: it is voluntary because you choose to visit the website and you choose to respond to the poll, so it is not very representative of the population

  • tends to overemphasize the opinions of those who feel strongly about the issue/experience, and underemphasize those who feel somewhere in the middle

31

Convenience Sampling

  • we do not want to use this

  • we simply survey those people who are conveniently available.

Ex. I survey all of the students in my classes, but I really want to find out about Sinclair students as a whole - that is not a representative sample!

32

Types of Bias Video Lesson

33

Undercoverage (a type of selection bias)

occurs when some portion of the population is not sampled.

Ex. A phone survey tends to miss people who work during the day or people who do not have a telephone.

34

selection bias

occurs any time some group is over- or underrepresented because there is a problem in the selection process

35

Nonresponse Bias

occurs when many members of the sample do not respond; the people who do respond may differ from those who do not

It is better to create a smaller sample with a higher response rate than a larger sample with a high nonresponse rate.

36

Response Bias

Response bias refers to anything in the survey design that influences the responses. - something that affects how respondents respond to question

  • Ex. Respondents not wanting to reveal personal facts or admit to illegal or unapproved behavior. - something they are not going to be comfortable admitting/stating

  • Ex. The wording of the questions influencing responses.

37

Variable

what is being studied

  • a characteristic or measurement that can be determined for each member of a population

  • may be numerical (e.g. weight or time in hours) or categorical (a category or label)

38

Data

a collection of facts or information from which conclusions can be drawn

39

Quantitative Variable

a variable that takes on a number for its value, e.g. the percentage you get on a test

40

Categorical/ Qualitative variable

a variable that takes on a name/label/category for its value, e.g. color (blue) or the name of a TV show

41

“Y are you dependent?” (y is the dependent variable)

what you are measuring / what you are trying to study; also called the response variable

e.g. grade would be on the y-axis: dependent

time spent studying would be on the x-axis: it is what you can control, so it is independent

42

observational studies

Observational studies are studies in which the researcher does not assign a treatment or design an experiment, instead just observing things that are happening or data that have already been collected

Ex. A researcher wants to know if taking an AP test helps students succeed in college. The researcher pulls the past five years' college GPA data on students who took AP tests in high school

- the researcher is not involved in any of these decisions: they do not dictate who takes AP tests and have nothing to do with the students' success in college. They just let things happen naturally and study the data already compiled

It is IMPOSSIBLE for an observational study to ever show a cause-and-effect relationship!

—> because the researcher does not control any of the variables

  • These studies merely help us describe associations between certain data: they can show correlation between variables, but not causation

43

Can we ever actually prove that something is a cause-and-effect relationship?

In fact, we can! A properly setup experiment can do this for us!

Because we can control every aspect of an experiment, we can block out any lurking variables and actually prove a cause-and-effect relationship

44

A study randomly selected men from various countries around the world and collected information about diet and health. The men who ate Mediterranean diets had lower rates of heart disease than those men who did not.

  • observational study

  • population is ALL MEN

  • VARIABLES (THINGS THAT CHANGE): the diet the person eats - categorical and independent; whether or not the men had heart disease - categorical and dependent

  • we cannot prove a cause-and-effect relationship because this is an observational study

45

A study gathered 7,000 people in Spain who were at high risk of heart disease. Participants were randomly assigned to one of three groups: a Mediterranean diet supplemented with extra-virgin olive oil, a Mediterranean diet supplemented with mixed nuts, or a control group that was just advised to reduce their dietary fat intake. Participants were followed for five years. The groups on the Mediterranean diet had significantly lower rates of heart disease.

  • experiment

  • population: all people in Spain at high risk of heart disease

  • independent (explanatory) variable: diet - categorical. Dependent (response) variable: presence of heart disease - categorical

  • yes, cause and effect can be determined by this study because it is an experiment, so we can control all lurking variables

46

TEXTBOOK FLASHCARDS

47

quantitative discrete data

  • all data that are the result of counting, e.g. the number of phone calls you make

  • often starts with “the number of”

48

quantitative continuous data

Data that are not only made up of counting numbers, but that may include fractions, decimals, or irrational numbers, are called quantitative continuous data. Continuous data are often the results of measurements like lengths, weights, or times. A list of the lengths in minutes for all the phone calls that you make in a week, with numbers like 2.4, 7.5, or 11.0, would be quantitative continuous data.

e.g. 19 lbs, 19.5 lbs

49

pie charts and bar graphs display what type of data

qualitative (categorical)

50

pareto chart

a bar graph with the bars ordered by category size (frequency), from largest to smallest
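The only thing that makes a bar graph a Pareto chart is the sort order, which this short Python sketch shows. The color counts are hypothetical, and the "bars" are drawn as rows of `#` characters instead of a real plot.

```python
# Hypothetical survey counts for a categorical variable (favorite color).
counts = {"blue": 7, "red": 12, "green": 3, "yellow": 5}

# Pareto chart order: bars sorted by count, largest first.
pareto_order = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for category, count in pareto_order:
    print(f"{category:<7}{'#' * count}")  # a text "bar" of length count
```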

51

when should pie charts not be used

when things can belong to more than one category or when percentages do not add to 100%, or when frequencies do not add to total!

  • use bar graph instead

52

how should pie charts be organized

by the size of each wedge

53

sampling with replacement

in truly random sampling, once a member is picked, that member goes back into the population and may be chosen more than once

—> for practical reasons, with most populations, simple random sampling is done without replacement
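Python's `random` module happens to have one call for each case, which makes the difference easy to see. In this sketch the five-name population is hypothetical: `choices` samples with replacement (repeats possible), while `sample` samples without replacement (no one appears twice).

```python
import random

rng = random.Random(0)
population = ["Ana", "Ben", "Cal", "Dee", "Eli"]  # hypothetical five-member population

# With replacement: every pick is made from the full population, so repeats are possible.
with_replacement = rng.choices(population, k=5)

# Without replacement: once a member is picked, they cannot be picked again.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```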

54

sampling errors

-errors that result from the sampling process itself. There will always be some sampling error because a sample will never exactly represent the population; the error tends to be larger when the sample is not large enough

55

nonsampling errors

  • errors not related to the sampling process, e.g. a faulty counting device

56

Self-selected samples: Responses

Responses only by people who choose to respond, such as call-in surveys, are often unreliable.

57

Confounding:

When the effects of multiple factors on a response cannot be separated.  Confounding makes it difficult or impossible to draw valid conclusions about the effect of each factor.

58

practice creating samples

Try It 1.11

You are going to use the random number generator to generate different types of samples from the data.

This table displays six sets of quiz scores (each quiz counts 10 points) for an elementary statistics class.

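#1 | #2 | #3 | #4 | #5 | #6
5 | 7 | 10 | 9 | 8 | 3
10 | 5 | 9 | 8 | 7 | 6
9 | 10 | 8 | 6 | 7 | 9
9 | 10 | 10 | 9 | 8 | 9
7 | 8 | 9 | 5 | 7 | 4
9 | 9 | 9 | 10 | 8 | 7
7 | 7 | 10 | 9 | 8 | 8
8 | 8 | 9 | 10 | 8 | 8
9 | 7 | 8 | 7 | 7 | 8
8 | 8 | 10 | 9 | 8 | 7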

Table 1.7

Instructions: Use the Random Number Generator to pick samples.

  1. Create a stratified sample by column. Pick three quiz scores randomly from each column.

    • Number each row one through ten.

    • On your calculator, press MATH and arrow over to PRB.

    • For column 1, Press 5:randInt( and enter 1,10). Press ENTER. Record the number. Press ENTER 2 more times (even the repeats). Record these numbers. Record the three quiz scores in column one that correspond to these three numbers.

    • Repeat for columns two through six.

    • These 18 quiz scores are a stratified sample.

  2. Create a cluster sample by picking two of the columns. Use the column numbers: one through six.

    • Press MATH and arrow over to PRB.

    • Press 5:randInt( and enter 1,6). Press ENTER. Record the number. Press ENTER and record that number.

    • The two numbers are for two of the columns.

    • The quiz scores (20 of them) in these 2 columns are the cluster sample.

  3. Create a simple random sample of 15 quiz scores.

    • Use the numbering one through 60.

    • Press MATH. Arrow over to PRB. Press 5:randInt( and enter 1, 60).

    • Press ENTER 15 times and record the numbers.

    • Record the quiz scores that correspond to these numbers.

    • These 15 quiz scores are the random sample.

  4. Create a systematic sample of 12 quiz scores.

    • Use the numbering one through 60.

    • Press MATH. Arrow over to PRB. Press 5:randInt( and enter 1, 60).

    • Press ENTER. Record the number and the first quiz score. From that number, count ten quiz scores and record that quiz score. Keep counting ten quiz scores and recording the quiz score until you have a sample of 12 quiz scores. You may wrap around (go back to the beginning).
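Without a TI calculator, the four procedures can be mimicked in Python. This is a sketch, not the textbook's method: `random.randint(a, b)` stands in for the calculator's `randInt(a,b)` (both include the endpoints), and the numbering 1 to 60 is assumed to run down column #1 (1-10), then column #2 (11-20), and so on, because the instructions do not say which way to count.

```python
import random

# Table 1.7: TABLE[row][column], rows 1-10 (index 0-9), columns #1-#6 (index 0-5).
TABLE = [
    [5, 7, 10, 9, 8, 3],
    [10, 5, 9, 8, 7, 6],
    [9, 10, 8, 6, 7, 9],
    [9, 10, 10, 9, 8, 9],
    [7, 8, 9, 5, 7, 4],
    [9, 9, 9, 10, 8, 7],
    [7, 7, 10, 9, 8, 8],
    [8, 8, 9, 10, 8, 8],
    [9, 7, 8, 7, 7, 8],
    [8, 8, 10, 9, 8, 7],
]


def score(number):
    """Quiz score for a number 1-60 (assumed to count down column #1, then column #2, ...)."""
    column, row = divmod(number - 1, 10)
    return TABLE[row][column]


rng = random.Random(2024)
rand_int = rng.randint  # stands in for the calculator's randInt(low, high)

# 1. Stratified by column: three randInt(1,10) row numbers per column, repeats kept.
stratified = [TABLE[rand_int(1, 10) - 1][col] for col in range(6) for _ in range(3)]

# 2. Cluster: two randInt(1,6) column numbers; every score in those columns.
cluster_columns = [rand_int(1, 6), rand_int(1, 6)]
cluster = [TABLE[row][c - 1] for c in cluster_columns for row in range(10)]

# 3. Simple random sample: fifteen randInt(1,60) numbers, looked up in the table.
srs = [score(rand_int(1, 60)) for _ in range(15)]

# 4. Systematic: a random start, then every 10th number, wrapping from 60 back to 1.
start = rand_int(1, 60)
systematic = [score((start - 1 + 10 * i) % 60 + 1) for i in range(12)]

print("stratified:", stratified)
print("cluster columns:", cluster_columns, cluster)
print("SRS:", srs)
print("systematic:", systematic)
```

With 60 scores, counting ten at a time returns to the starting score after six picks, so the 12-score systematic sample as written repeats six positions twice; a step of 60 / 12 = 5 would reach 12 different positions.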

59

explanatory variable.

When one variable causes change in another, we call the first variable the explanatory variable.

60

response variable.

When one variable causes change in another, we call the first variable the explanatory variable. The affected variable is called the response variable.

61

treatments

The different values of the explanatory variable are called treatments

62

experimental unit

An experimental unit is a single object or individual to be measured.

63

variable

a characteristic of interest for each person or object in a population

64

lurking variable

  • additional variables, other than the explanatory variable, that can cloud a study by affecting the response variable

65

how do you prove that the explanatory variable is causing a change in the response variable?

you isolate the explanatory variable

—→ you must design the experiment so that there is only one difference between the groups being compared: the planned treatments

—> you can do this by the random assignment of experimental units to treatment groups. When subjects are assigned treatments randomly, all of the potential lurking variables are spread equally among the groups.

—> the only difference between groups is the one imposed by the researcher. Different outcomes measured in the response variable, therefore, must be a direct result of the different treatments. In this way, an experiment can prove a cause-and-effect connection between the explanatory and response variables.
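Random assignment itself is only a few lines of Python. This sketch uses 30 hypothetical subjects and two groups; shuffling the list and dealing the subjects out in turn is one simple way to randomize (not the only valid design), and it leaves chance, not the researcher, deciding who lands in which group.

```python
import random

def randomly_assign(units, treatments, seed=None):
    """Shuffle the experimental units, then deal them out to the treatment groups in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical experiment: 30 subjects, an active treatment vs. a placebo (control) group.
subjects = [f"subject{i}" for i in range(1, 31)]
groups = randomly_assign(subjects, ["treatment", "placebo"], seed=11)
print({name: len(members) for name, members in groups.items()})  # {'treatment': 15, 'placebo': 15}
```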

66

control group

  • group given a placebo treatment - treatment that cannot directly influence response variable

67

You want to investigate the effectiveness of vitamin E in preventing disease. You recruit a group of subjects and ask them if they regularly take vitamin E. You notice that the subjects who take vitamin E exhibit better health on average than those who do not. Does this prove that vitamin E is effective in disease prevention?

It does not. There are many differences between the two groups compared in addition to vitamin E consumption. People who take vitamin E regularly often take other steps to improve their health: exercise, diet, other vitamin supplements, choosing not to smoke. Any one of these factors could be influencing health. As described, this study does not prove that vitamin E is the key to disease prevention.

68

levels of measurement

  • nominal scale level

  • ordinal scale level

  • interval scale level

  • ratio scale level

69

nominal scale data (Organic Chemistry Tutor)

  • qualitative/categorical

  • names, colors, labels, gender, etc.

  • order does not matter

  • cannot be used in calculations

  • the counts of people's responses can be used in calculations, though, e.g. 5 people said their favorite color is red, 3 said blue, and 2 said green

e.g. you could assign 1 to red, 2 to blue, 3 to green, but these numbers cannot be used in calculations because the data are qualitative

70

ordinal scale data (Organic Chemistry Tutor)

  1. order matters because the data are rankings/placements

  2. differences cannot be measured, e.g. if you are only given 1st, 2nd, and 3rd place, you would need the actual times to measure the differences between them

  3. —→ e.g. if given the times, the difference between the 1st and 2nd place times is not necessarily the same as the difference between the 2nd and 3rd place times

  4. e.g. you can assign values to survey results (1 is excellent, 2 is good, 3 is satisfactory, etc.), but you cannot measure the difference between excellent and good

71

interval scale data

  • order matters: can place in increasing or decreasing order with meaning

  • differences can be measured (but ratios cannot)

  • no true “0” starting point

  • e.g. temperatures:

  • 30F

  • 60F

  • 90F

  • —> differences between temperatures can be measured

  • —→ but you cannot use ratios: 60F is not twice as hot as 30F

  • —→ there are temperatures below 0 in both F and C, so these scales do not start at a true 0
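One way to see why ratios fail on an interval scale is to convert the card's Fahrenheit temperatures to Celsius, the other scale it mentions, and compare. In this Python sketch, equal differences stay equal after conversion, but the ratio 60/30 changes from 2 to about -14, so "twice as hot" depends on where each scale happens to put its 0.

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

temps_f = [30, 60, 90]
temps_c = [f_to_c(t) for t in temps_f]

# Differences survive the change of scale: both 30 F gaps become equal gaps of about 16.67 C.
diff_low_c = temps_c[1] - temps_c[0]
diff_high_c = temps_c[2] - temps_c[1]

# Ratios do not: 60/30 is 2 in Fahrenheit but about -14 in Celsius.
ratio_f = temps_f[1] / temps_f[0]
ratio_c = temps_c[1] / temps_c[0]

print(f"C differences: {diff_low_c:.2f}, {diff_high_c:.2f}")
print(f"ratio in F: {ratio_f:.1f}, ratio in C: {ratio_c:.1f}")
```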

72

ratio scale data

  • order matters: can place in a meaningful increasing or decreasing order

  • differences are measurable (including ratios)

  • contains a true “0” starting point

  • e.g. you can place grades in increasing or decreasing order, and you can measure the difference between grades just by subtracting them

  • 0, 30, 56, 70, 82, 90

  • can measure with ratios: 90/30 is 3, so you can say that the student who scored 90 has a grade 3x as high as the student who scored 30

  • the data truly start at 0 because a grade cannot be lower than 0

73

what scales of data can be labeled

  • nominal, ordinal, interval, ratio

74

what data scales have meaningful order

-ordinal, interval, ratio

75

what data scales have measurable difference

  • interval and ratio scale data

76

what scale data has a true zero starting point

ratio only

77

frequency

  • the number of times a value of the data occurs

78

relative frequency

  • is the ratio (fraction or proportion) of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes.

79

cumulative relative frequency

  • the accumulation of the previous relative frequencies. To find the cumulative relative frequencies, add all the previous relative frequencies to the relative frequency for the current row

  • 0.15, 0.15 + 0.25 = 0.40, 0.40 + 0.15 = 0.55, etc.

80

frequency, relative frequency, and cumulative relative frequency: example table

DATA VALUE | FREQUENCY | RELATIVE FREQUENCY | CUMULATIVE RELATIVE FREQUENCY
2 | 3 | 3/20 or 0.15 | 0.15
3 | 5 | 5/20 or 0.25 | 0.15 + 0.25 = 0.40
4 | 3 | 3/20 or 0.15 | 0.40 + 0.15 = 0.55
5 | 6 | 6/20 or 0.30 | 0.55 + 0.30 = 0.85
6 | 2 | 2/20 or 0.10 | 0.85 + 0.10 = 0.95
7 | 1 | 1/20 or 0.05 | 0.95 + 0.05 = 1.00
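The three columns can be rebuilt from raw data in Python. This sketch reconstructs the 20 data values from the FREQUENCY column (it does not use any other data), counts them, divides by the total for relative frequency, and keeps a running total for cumulative relative frequency. `Fraction` keeps the arithmetic exact so the last row comes out to exactly 1.

```python
from collections import Counter
from fractions import Fraction

# The 20 data values implied by the FREQUENCY column (value 2 occurs 3 times, 3 occurs 5 times, ...).
data = [2] * 3 + [3] * 5 + [4] * 3 + [5] * 6 + [6] * 2 + [7] * 1

frequency = Counter(data)
n = len(data)

rows = []
cumulative = Fraction(0)
for value in sorted(frequency):
    relative = Fraction(frequency[value], n)  # frequency / total number of outcomes
    cumulative += relative                    # add this row to all previous relative frequencies
    rows.append((value, frequency[value], relative, cumulative))

for value, freq, rel, cum in rows:
    print(f"{value} | {freq} | {freq}/{n} or {float(rel):.2f} | {float(cum):.2f}")
```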

81

1.3 Problems for practice

82

descriptive statistics

Organizing and summarizing data, e.g. by graphing or finding an average

83

inferential statistics

  • formal methods of drawing conclusions from good data

84

representative sample

The sample must contain the characteristics of the population in order to be a representative sample.

85

ASK ABOUT THE POWER OF SUGGESTION!!! (highlighted in textbook)

86

blinding in an experiment

  • when a person is blinded in a research study, they do not know who receives the active treatment and who receives the placebo treatment!

87

double-blind experiment

  • both the subjects and the researchers involved with the subjects are blinded - they do not know who is getting what

88

RANDOM ASSIGNMENT ELIMINATES WHAT

THE PROBLEM OF LURKING VARIABLES!