1

Who was Karl Pearson and what did he say the problem with theories was?

a founder of modern statistics

he said the problem with theories is that there was no objective way to confirm whether a theory is right, so he developed a statistical method called the chi-square test to determine whether or not the data fit the theory (FOR EX. you gather 1000 data points and calculate a chi-square test)
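
A minimal sketch of how a chi-square statistic compares data with a theory. The counts are invented for illustration (a hypothetical theory predicting a 3:1 split among 1000 observations), and the function name is an assumption, not something from the course material. It computes only the statistic, the sum of (observed - expected)² / expected; turning it into a probability would also need the chi-square distribution, which is not shown here.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: the theory predicts 750 and 250 out of 1000 observations.
observed_counts = [740, 260]
expected_counts = [750, 250]

chi_square = chi_square_statistic(observed_counts, expected_counts)
print(round(chi_square, 3))  # 0.533
```

The closer the observed counts are to the expected counts, the smaller the statistic, so a small value means the data fit the theory well.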

New cards

2

How does Ronald Fisher’s technique differ from Pearson’s?

it uses an experiment and an analysis of variance (ANOVA)

he said that you should conduct an experiment that compares two or more conditions.

FOR EX. you can determine why the plots in a field are so variable, like what makes some plots lush and others sparse

New cards

3

How does Jerzy Neyman’s theory differ from both Pearson’s and Fisher’s, and what is it about?

null hypothesis significance testing (NHST)

null means no and hypothesis means explanation, so a null hypothesis states that there is no relationship between two variables, or no difference between two conditions

it is the most widely used approach in many disciplines, including anthropology, biology, chemistry, defence strategy, education, forestry, geology, health, immunology, jurisprudence, manufacturing, medicine, neurology, ophthalmology, political science, psychology, sociology, zoology, and others.

the problem with Fisher’s analysis is that it focuses exclusively on finding a difference between two conditions, but just because a difference isn’t detected doesn’t mean there isn’t one. it could just mean that the statistical test wasn’t sensitive enough to detect the difference

New cards

4

What did Jacob Cohen say?

NHST has its problems

we should calculate an effect size statistic which will show the size of the differences observed in NHST, not just that there were differences

New cards

5

Who else did not like NHST?

Geoff Cumming; he says to avoid NHST and that we can’t trust p-values, but NHST is still very common today

New cards

6

What is the theory put under attack?

null hypothesis significance testing

FOR EX. Cumming says we shouldn’t rely on it

new approaches have been developed in statistics that do not use NHST, but NHST is still widely used by researchers, and most of the thinking in NHST is required for the other approaches

New cards

7

What does statistics mean?

“state numbers”

facts about or summaries of data

referring to a country’s quantifiable political characteristics such as population, taxes and area

New cards

8

What does the field of statistics mean?

the branch of mathematics that deals with collecting, analyzing, interpreting, and presenting data

it ensures that info is presented and used in a way that is most accurate

provides order by giving a set of standardized techniques that are universally understood

New cards

9

What is data?

measurements / observations of a variable

a datum or data point is a single measure / observation

New cards

10

What is a data set?

a collection of measurements and observations

New cards

11

What are the two types of statistics? What do they each mean?

descriptive

inferential

New cards

12

What is descriptive statistics?

number or graph that conveys a particular characteristic of a set of data. basically a number or figure that summarizes or describes what certain data is showing you.

FOR EX. these would be measures of central tendency, graphs like line graphs, histograms, bar graphs and frequency polygons, measures of variability, or even the skew of a distribution

New cards

13

What is inferential statistics?

techniques used to make inferences about a population from measurements of a sample drawn from the larger, unmeasured population

FOR EX. we can infer this population has a certain gene from the sample taken from the population

null hypothesis significance testing is a type of inferential statistics used in stats today

New cards

14

Give an example of a descriptive and an inferential statistic.

you have two groups of university students and you calculate the average of their GPAs; this would be descriptive statistics

then you notice there is a difference between the averages and try to figure out why. either the difference is due to sampling error or chance (so there is no real difference), or there is a real difference because the two groups came from two different populations. answering this question is inferential statistics

New cards

15

What is a problem with samples?

samples depend partly on the “luck of the draw,” and chance determines the particular measurements you get

FOR EX. if you were to draw a handful of candy from a candy jar and you got more reds than whites, the handful might not be representative of the candy jar if the jar actually had more whites

if you had measurements for the entire population, chance wouldn’t play a part, because any difference or variance you see really exists within that population; it isn’t based on how you draw the sample. BUT sometimes a difference seen in a sample does reflect a real difference in the population
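
A small simulation of the candy jar idea, as a sketch only. The jar contents (60 whites, 40 reds), the handful size, and the function name are assumptions made up for this illustration. Each handful is drawn at random, so the colour counts change from draw to draw even though the jar itself never changes.

```python
import random


def draw_handful(jar, handful_size, rng):
    """Draw a handful without replacement and count each colour in it."""
    handful = rng.sample(jar, handful_size)
    return {colour: handful.count(colour) for colour in sorted(set(jar))}


# Hypothetical jar with more whites than reds.
jar = ["white"] * 60 + ["red"] * 40
rng = random.Random(1)  # fixed seed so the run can be repeated

handfuls = [draw_handful(jar, 10, rng) for _ in range(5)]
for counts in handfuls:
    print(counts)
```

Some handfuls may show more reds than whites purely by chance, which is the problem inferential statistics is meant to handle.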

New cards

16

How does inferential statistics account for this problem with samples?

inferential statistics is a method that takes chance factors into account when samples are used to reach conclusions about populations.

New cards

17

Give an example of how a discipline used NHST.

Today, there is a lot of evidence that people remember the tasks they fail to complete better than the tasks they complete. This is known as the Zeigarnik effect. Bluma Zeigarnik asked participants in her experiment to do about 20 tasks, such as work a puzzle, make a clay figure, and construct a box from cardboard. For each participant, half the tasks were interrupted before completion. Later, when the participants were asked to recall the tasks they worked on, they listed more of the interrupted tasks (average about 7) than the completed tasks (about 4). One good question to start with is, “Did interrupting make a big difference or a small difference?” In this case, interruption produced about three additional memory items compared to the completion condition. This is a 75% difference, which seems like a big change, given our experience with tests of memory. The question of “How big is the difference?” can often be answered by calculating an effect size index.

we can’t yet conclude that interruption improves memory. you could conduct the experiment again, but that would be expensive. instead, we can use inferential statistics like NHST

NHST begins with the actual data from the experiment. It ends with a probability—the probability of obtaining data like those actually obtained if it is true that interruption has no effect on memory. If the probability of getting the same result is very small, you can conclude that interruption does affect memory. For Zeigarnik’s data, the probability was tiny. Now for the conclusion. One version might be, “After completing about 20 tasks, memory for interrupted tasks (average about 7) was greater than memory for completed tasks (average about 4). The approximate 75% difference cannot be attributed to chance because chance by itself would rarely produce a difference between two samples as large as this one.”
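
The 75% figure in the Zeigarnik example can be checked with a little arithmetic. This is a sketch using the approximate averages quoted on the card (7 and 4); the function name is an assumption. It shows the size of the difference only and says nothing about whether chance could have produced it.

```python
def percent_difference(comparison_mean, baseline_mean):
    """Difference between two means as a percentage of the baseline mean."""
    return (comparison_mean - baseline_mean) / baseline_mean * 100


interrupted_mean = 7  # average interrupted tasks recalled (approximate)
completed_mean = 4    # average completed tasks recalled (approximate)

difference = interrupted_mean - completed_mean
percent = percent_difference(interrupted_mean, completed_mean)
print(difference, percent)  # 3 75.0
```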

New cards

18

What type of discipline is statistics?

Statistics is a dynamic discipline characterized by more than a little controversy. New techniques in both descriptive and inferential statistics continue to be developed. Controversy continues too, as you saw at the end of our exploration tour

New cards

19

What is a general example of how statistics is used in a wide variety of fields?

Researchers start with a phenomenon, event, or process that they want to understand better. They make measurements that produce numbers. The numbers are manipulated according to the rules and conventions of statistics. Based on the outcome of the statistical analysis, researchers draw conclusions and then write the story of their new understanding of the phenomenon, event, or process. Statistics is just one tool that researchers use, but it is often an essential tool.

New cards

20

What’s a population?

the people or group of individuals that you, the researcher, wish to study

all measurements of a specified group. The population is the thing of interest. It is defined by the investigator and includes all cases.

New cards

21

What is a sample?

a sample is a subset of a population that is selected to represent the larger group in a study

usually the population is so large or inaccessible that the researcher has to pick a smaller size to represent that population

New cards

22

What’s a parameter and what’s a statistic?

a parameter is a numerical (number) or nominal (name) characteristic of a population

a statistic is some numerical or nominal characteristic of a sample.

New cards

23

What’s the difference between a statistic and parameter?

parameters are constant; statistics are variable

a parameter is constant; it does not change unless the population itself changes. the mean of a population is exactly one number. but you can’t always measure an entire population. so the statistic is used to estimate the parameter. since it is the estimate of the parameter, one sample tends to differ from the other so if you have 5 different samples, you have 5 different sample means.
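
A sketch of "parameters are constant; statistics are variable." The population here is 1000 made-up scores generated at random, and the sample size of 25 is an arbitrary choice for the illustration. The population mean is computed once and stays fixed, while each of the 5 samples gives its own sample mean.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run can be repeated

# Hypothetical population of 1000 scores.
population = [rng.randint(50, 100) for _ in range(1000)]
mu = mean(population)  # the parameter: exactly one number

# Five samples of 25 scores each give five sample means (statistics).
sample_means = [mean(rng.sample(population, 25)) for _ in range(5)]

print("population mean:", mu)
for sample_mean in sample_means:
    print("sample mean:", sample_mean)
```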

New cards

24

What is a variable?

a variable is something that exists in more than one amount or in more than one form.

characteristics or conditions that change or have different values for different individuals

a quantity or characteristic that varies

FOR EX. height and eye colour

New cards

25

What are the two ways an independent variable can be defined?

experimentally defined: giving them something to make the two groups distinct

FOR EX. caffeine or no caffeine

naturally defined: the two groups are already distinct, and you can’t really change the variable

FOR EX. your age, genetics

you can have both in your study, but a variable can only be either naturally defined or experimentally defined

New cards

26

What is a score?

the result of measuring a variable

New cards

27

What are quantitative variables? What are the two types of quantitative variables?

quantitative variables tell you the degree or amount of the thing being measured

most, but not all, dependent measures in psychology are quantitative

New cards

28

What are the two types of quantitative variables?

continuous (measured) variables are quantitative variables whose scores can be any value or intermediate value over the variable’s possible range

continuous variables have lower and upper limits (the halfway points where a value would round to the next score): 6.5 is the lower limit and 7.5 is the upper limit of the score of 7. The idea is that recall can be any value between 6.5 and 7.5, but that all the recall values in this range are expressed as 7.

FOR EX. in a similar way, a charge indicator value of 62% on your cell phone stands for all the power values between 61.5% (the lower limit) and 62.5% (the upper limit). sometimes scores are expressed in tenths, hundredths, or thousandths. like integers, these scores have lower and upper limits that extend halfway to the next value on the quantitative scale.

discrete (counting) variables: quantitative variables with no intermediate values; scores can take only a limited set of values, which are often counts

FOR EX. the number of siblings you have, the number of times you’ve been hospitalized, and how many pairs of shoes you have are examples. intermediate scores such as 2½ just don’t make sense.

New cards

29

What is the lower limit?

bottom of the range of possible values that a measurement on a continuous variable can have.

New cards

30

What is an upper limit?

top of the range of possible values that a measurement on a continuous variable can have.
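
The limits of a reported score can be sketched as "half a unit below and half a unit above." The function name and the `unit` parameter are assumptions for this illustration; `Decimal` is used only so that tenths and hundredths come out exactly.

```python
from decimal import Decimal


def score_limits(score, unit="1"):
    """Return (lower limit, upper limit) for a score reported to the nearest `unit`."""
    score = Decimal(str(score))
    half_unit = Decimal(unit) / 2
    return score - half_unit, score + half_unit


print(score_limits(7))             # (Decimal('6.5'), Decimal('7.5'))
print(score_limits(62))            # (Decimal('61.5'), Decimal('62.5'))
print(score_limits("3.4", "0.1"))  # (Decimal('3.35'), Decimal('3.45'))
```

The first two calls reproduce the recall score of 7 and the 62% charge indicator; the last shows a score expressed in tenths.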

New cards

31

What are categorical variables?

produce scores that differ in kind and not amount. eye colour is a categorical variable.

you can use numbers to represent categories, but that doesn’t mean that it is now quantitative

all categorical variables produce discrete scores, but not all discrete scores are from a categorical variable

FOR EX. your major or political party

New cards

32

Who developed the 4 different scales of measurement and what are they?

S. S. Stevens

nominal

ordinal

interval

ratio

New cards

33

What’s a nominal scale of data?

numbers are used simply as names and have no real quantitative value.

numerals on sports uniforms are an example. thus, 45 is different from 32, but that is all you can say

New cards

34

What is an ordinal scale?

has the characteristic of the nominal scale (different numbers mean different things) plus the characteristic of indicating greater than or less than.

FOR EX. the object with the number 3 has less or more of something than the object with the number 5.

but for rankings like 1st and 3rd, the size of the difference between ranks can’t be measured

New cards

35

What is the interval scale?

which has the properties of both the nominal and ordinal scales plus the additional property that intervals between the numbers are equal.

“equal interval” means that the distance between the things represented by 2 and 3 is the same as the distance between the things represented by 3 and 4. Temperature is measured on an interval scale. The difference in temperature between 10°C and 20°C is the same as the difference between 40°C and 50°C, but you can’t make ratio statements like 100°C is twice as hot as 50°C

New cards

36

What is a ratio scale?

has all the characteristics of the nominal, ordinal, and interval scales plus one other: It has a true zero point, which indicates a complete absence of the thing measured. On a ratio scale, zero means “none.”

height, weight, and time are measured with ratio scales. zero height, zero weight, and zero time mean that no amount of these variables is present. With a true zero point, you can make ratio statements such as 16 kilograms is four times as heavy as 4 kilograms.
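
One way to see the difference between interval and ratio scales is to change units and check what survives. This is a sketch; the helper names and the rounded pounds-per-kilogram factor are assumptions. Equal intervals stay equal when Celsius is converted to Fahrenheit, but the "twice as hot" ratio does not, because 0° is not a true zero. The weight ratio is unchanged because zero kilograms really means no weight.

```python
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


def kilograms_to_pounds(kilograms):
    return kilograms * 2.20462  # approximate conversion factor


# Interval scale: equal differences stay equal after converting units.
gap_low = celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)
gap_high = celsius_to_fahrenheit(50) - celsius_to_fahrenheit(40)
print(gap_low, gap_high)  # 18.0 18.0

# ...but ratios do not survive, so "twice as hot" is not meaningful.
celsius_ratio = 100 / 50
fahrenheit_ratio = celsius_to_fahrenheit(100) / celsius_to_fahrenheit(50)
print(celsius_ratio, round(fahrenheit_ratio, 2))  # 2.0 1.74

# Ratio scale: 16 kg is four times 4 kg in any unit of weight.
kilogram_ratio = 16 / 4
pound_ratio = kilograms_to_pounds(16) / kilograms_to_pounds(4)
print(kilogram_ratio, round(pound_ratio, 2))  # 4.0 4.0
```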

New cards

37

Can descriptive statistics sometimes be computed for numbers that are not quantitative?

Yes

FOR EX. you can calculate the mean or average of SIN numbers, telephone numbers, or number-coded psychological diagnoses, but these numbers are categorical, so the result is not meaningful.

even words or letters can be coded as numbers, which does not make them quantitative.

New cards

38

What approach does the textbook take? What’s the difference?

Statistics viewpoint with some of the experimental design viewpoint

A statistics viewpoint fails to account for the other things that could happen in a problem.

New cards

39

What does an independent variable mean? What is the value of the independent variable called?

variable controlled by the researcher; changes in this variable may produce changes in the dependent variable.

a level

usually categorical, but not always

New cards

40

What does a dependent variable mean? What is the value of the dependent variable called?

observed variable that is expected to change as a result of changes in the independent variable in an experiment

a score or measurement is the value of the dependent variable (a treatment is a level of the independent variable)

aka the response or outcome variable

usually quantitative, but not always

New cards

41

How do the independent and dependent variables work in research?

The basic idea is that the researcher finds or creates two groups of participants that are similar except for the independent variable. These individuals are measured on the dependent variable. The question is whether the data will allow the experimenter to claim that the values on the dependent variable depend on the level of the independent variable.

New cards

42

What is usually the dependent variable in experiments?

the values of the dependent variable are found by measuring or observing participants in the investigation.

the dependent variable might be scores on a personality test, number of items remembered, or whether or not a passerby offered assistance.

New cards

43

What is usually the independent variable in experiments?

for the independent variable, the two groups might have been selected because they were already different—in age, gender, personality, and so forth. Alternatively, the experimenter might have produced the difference in the two groups by an experimental manipulation such as creating different amounts of anxiety or providing different levels of practice.

New cards

44

What is an example of an experiment in dependent and independent variables?

Suppose for a moment that as a budding gourmet cook you want to improve your spaghetti sauce. One of your buddies suggests adding marjoram. To investigate, you serve spaghetti sauce at two different gatherings. For one group of guests, the sauce is spiced with marjoram; for the other it is not. At both gatherings, you count the number of favorable comments about the spaghetti sauce. Stop reading; identify the independent and the dependent variables.

The dependent variable is the number of favorable comments, which is a measure of the taste of the sauce. The independent variable is marjoram, which has two levels: present and absent

New cards

45

What is an extraneous variable and how does it relate to our previous example?

variable other than the independent variable that may affect the dependent variable

extraneous variables include the amount and quality of the other ingredients in the sauce, the spaghetti itself, the “party moods” of the two groups, and how hungry everyone was.

If any of these extraneous variables was actually operating, it weakens the claim that a difference in the comments about the sauce is the result of the presence or absence of marjoram.

New cards

46

How do you remove an extraneous variable?

The simplest way to remove an extraneous variable is to be sure that all participants are equal on that variable. For example, you can ensure that the sauces are the same except for marjoram by mixing up the ingredients, dividing it into two batches, adding marjoram to one batch but not the other, and then cooking. The “party moods” variable can be controlled (equalized) by conducting the taste test in a laboratory. Controlling extraneous variables is a complex topic covered in courses that focus on research methods and experimental design. In many experiments, it is impossible or impractical to control all the extraneous variables. Sometimes researchers think they have controlled them all, only to find that they did not. The effect of an uncontrolled extraneous variable is to prevent a simple cause-and-effect conclusion. Even so, if the dependent variable changes when the independent variable changes, something is going on. In this case, researchers can say that the two variables are related, but that other variables may play a part, too.

New cards

47

Summarize the relationships between statistics and experimental design.

Researchers suspect that there is a relationship between two variables. They design and conduct an experiment; that is, they choose the levels of the independent variable (treatments), control the extraneous variables, and then measure the participants on the dependent variable. The measurements (data) are analyzed using statistical procedures. Finally, the researcher tells a story that is consistent with the results obtained and the procedures used.

New cards

48

What is epistemology? What is statistics’ place in it?

The study or theory of the nature of knowledge.

both reason and experience are the ways to acquire knowledge and the basis of mathematics is reason. mathematics starts with axioms that are assumed to be true and then theorems are thought up and are then proved by giving axioms as reasons. once a theorem is proved, it can be used as a reason in a proof of other theorems.

statistics has its foundations in mathematics, so a statistical analysis is based on reason (FOR EX. calculating the mean requires reason), but experimental design is different because it also includes experience and observation

A very common task of most human beings can be described as trying to understand. Statistics has helped many in their search for better understanding, and it is such people who have recommended (or demanded) that statistics be taught in school. A reasonable expectation is that you, too, will find statistics useful in your future efforts to understand and persuade. Speaking of persuasion, you have probably heard it said, “You can prove anything with statistics.” The implied message is that a conclusion based on statistics is suspect because statistical methods are unreliable. Well, it just isn’t true that statistical methods are unreliable, but it is true that people can misuse statistics (just as any tool can be misused). One of the great advantages of studying statistics is that you get better at recognizing statistics that are used improperly.

New cards

49

What are the 4 levels of statistical sophistication?

Category 1—those who understand statistical presentations

Category 2—those who understand, select, and apply statistical procedures

Category 3—applied statisticians who help others use statistics

Category 4—mathematical statisticians who develop new statistical techniques and discover new characteristics of old techniques

New cards

50

What are the 3 steps to analyze a data set?

The first step is exploratory. Read all the information and examine the data. Calculate descriptive statistics and focus on the differences that are revealed. In this textbook, descriptive statistics are emphasized in Chapters 2 through 6 and include graphs, means, and effect size indexes. Calculating descriptive statistics helps you develop preliminary ideas for your story (Step 3).

The second step is to answer the question, What are the effects that chance could have on the descriptive statistics I calculated? An answer requires inferential statistics (Chapter 7 through Chapter 15).

The third step is to write the story the data reveal. Incorporate the descriptive and inferential statistics to support the conclusions in the story. Of course, the skills you’ve learned and taught yourself about composition will be helpful as you compose and write your story. Don’t worry about length; most good statistical stories about simple data sets can be told in one paragraph. Write your story using journal style, which is quite different from textbook style. Textbook style, at least this textbook, is chatty, redundant, and laced with footnotes.

New cards

51

What is a raw score?

score obtained by observation or from an experiment.

FOR EX. if you gave a bunch of students a questionnaire and they answered it, their answers would be the raw scores

New cards

52

What is a simple frequency distribution?

scores arranged from highest to lowest, each with its frequency of occurrence

an ordered arrangement that shows the frequency of each score

the goal is to organize data to communicate how many observations exist at each category on the scale of measurement

it can be either in a graph or table

New cards

53

What do the symbols in a frequency distribution mean?

The generic name for all variables is X, which is the symbol used in formulas.

The Frequency (f) column shows how frequently a score occurred. The tally marks are used when you construct a rough-draft version of a table and are not usually included in the final form.

N is the number of scores and is found by summing the numbers in the f column / by adding up all the frequencies.
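
A sketch of a simple frequency distribution built from raw scores. The scores are made up for the illustration and the function name is an assumption. Each row pairs a score (X) with its frequency (f), arranged from highest to lowest, and N is found by summing the f column.

```python
from collections import Counter


def simple_frequency_distribution(scores):
    """Return (X, f) rows ordered from the highest score to the lowest."""
    counts = Counter(scores)
    return [(x, counts[x]) for x in sorted(counts, reverse=True)]


# Hypothetical raw scores.
scores = [7, 5, 7, 6, 4, 7, 5, 6, 7, 5]

distribution = simple_frequency_distribution(scores)
print("X  f")
for x, f in distribution:
    print(x, "", f)

n = sum(f for _, f in distribution)
print("N =", n)  # N = 10
```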

New cards

54

What does a formal presentation of a frequency distribution table not have?

tally marks or zero-frequency scores

this applies to simple frequency distributions; formal grouped frequency distributions also leave out tally marks, but zero-frequency intervals are included if they are within the range of the distribution

New cards

55

What is a grouped frequency distribution?

scores compiled into intervals of equal size. includes the frequency of scores in each interval.

raw data are usually condensed into a grouped frequency distribution when researchers want to present the data as a graph or as a table.

a grouped frequency distribution is usually used when there are more than 20 different X values.

New cards

56

What are class intervals?

a range of scores in a grouped frequency distribution.

in Table 2.4, the entire range of scores, from 35 to 5, is reduced to 11 class intervals. each interval covers three scores; the symbol i indicates the size of the interval. In Table 2.4, i = 3.

New cards

57

What’s the midpoint?

The midpoint of the interval is noteworthy because it represents all the scores in that interval.

For example, five students had scores of 15, 16, or 17. The midpoint of the class interval 15–17 is 16.

The midpoint, 16, represents all five scores. There are no scores in the interval 6–8, but zero-frequency intervals are included in formal grouped frequency distributions if they are within the range of the distribution.

New cards

58

What do class intervals have?

Class intervals have lower and upper limits, much like simple scores obtained by measuring a continuous variable. A class interval of 15–17 has a lower limit of 14.5 and an upper limit of 17.5.
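
A sketch that pulls together class intervals, midpoints, and limits. The scores are made up (they are not the data from Table 2.4), whole-number scores are assumed, and the function and field names are assumptions for the illustration. Intervals with a frequency of zero are kept as long as they fall within the range of the distribution.

```python
def grouped_frequency_distribution(scores, lowest, i):
    """Group whole-number scores into class intervals of size i, starting at `lowest`."""
    rows = []
    start = lowest
    while start <= max(scores):
        end = start + i - 1
        rows.append({
            "interval": (start, end),
            "midpoint": (start + end) / 2,
            "limits": (start - 0.5, end + 0.5),  # lower and upper limits
            "f": sum(start <= x <= end for x in scores),
        })
        start += i
    return list(reversed(rows))  # highest interval first


# Hypothetical scores; five of them fall in the interval 15-17.
scores = [15, 16, 17, 16, 15, 12, 9, 10, 20]

rows = grouped_frequency_distribution(scores, lowest=9, i=3)
for row in rows:
    print(row["interval"], row["midpoint"], row["limits"], row["f"])
```

For the interval 15–17 this gives a midpoint of 16 and limits of 14.5 and 17.5, matching the values on the cards.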

New cards

59

What is the difference between grouped frequency distributions and simple frequency distributions?

class intervals

New cards

60

What are graphics?

Pictures that present statistical data

New cards

61

What are the horizontal axis and vertical axis?

horizontal axis - x-axis (abscissa)

vertical axis - y-axis (ordinate)

New cards

62

What do frequency distribution graphs present?

an entire set of observations of a sample or a population

New cards

63

What are the 3 ways to present frequency distribution?

frequency polygons

histograms

bar graphs

New cards

64

If the variable being graphed is a continuous variable, use a ____

frequency polygon or histogram

New cards

65

If the variable is categorical or discrete, use ____

a bar graph

New cards

66

What is a frequency polygon and what does it represent?

used to graph continuous variables (on the x-axis); frequency polygons are closed at both ends

each point represents two numbers: the class midpoint directly below it on the x-axis and the frequency of that class directly across from it on the y-axis (FOR EX. if a class interval covers the scores 33, 34, and 35, the midpoint 34 is used to show the frequency of all 3 scores).

in cases where the lowest score in the distribution is well above zero, it is conventional to replace numbers smaller than the lowest score on the x-axis with a slash mark, which indicates that the scale to the origin is not continuous.

New cards

67

What is a histogram?

a histogram is another graphing technique that is appropriate for continuous variables.

A histogram is constructed by raising bars from the x-axis to the appropriate frequencies on the y-axis. The lines that separate the bars intersect the x-axis at the lower and upper limits of the class intervals.

New cards

68

How can you decide between using a frequency polygon or a histogram?

If you are displaying two overlapping distributions on the same axes, a frequency polygon is less cluttered and easier to comprehend than a histogram. However, it is easier to read frequencies from a histogram.

New cards

69

What is a bar graph?

a bar graph is used to present the frequencies of categorical variables and discrete variables

New cards

70

What’s the difference between a conventional bar graph and a histogram? Can a bar graph be ordinal?

A conventional bar graph looks like a histogram except it has wider spaces between the bars. The space is a signal that the variable is not continuous. Conventionally, bar graphs have the name of the variable being graphed on the x-axis and frequency on the y-axis.

If an ordinal scale variable is being graphed, the order of the values on the x-axis follows the order of the variable. If, however, the variable is a nominal scale variable, then any order on the x-axis is permissible. Alphabetizing might be best. Other considerations may lead to some other order.

New cards

71

When is a distribution’s shape meaningful?

For continuous variables and ordered categorical variables, a distribution’s shape is meaningful. For unordered categorical distributions, however, whatever shape the distribution has is arbitrary.

New cards

72

What’s a normal distribution (normal curve)?

A mathematically defined, theoretical distribution or a graph of observed scores with a particular shape.

New cards

73

What is a rectangular distribution (uniform distribution)?

A rectangular distribution (also called a uniform distribution) is a symmetrical distribution that occurs when the frequency of each value on the x-axis is the same.

New cards

74

What are skewed distributions?

In some distributions, the scores that occur most frequently are near one end of the scale, which leaves few scores at the other end. Such distributions are skewed. Skewed distributions, like a skewer, have one end that is thin and narrow.

New cards

75

What’s a positive skew?

On graphs, if the thin point is to the right (the positive direction), the curve has a positive skew.

New cards

76

What is a negative skew?

If the thin point is to the left, the curve is negatively skewed.

New cards

77

What are bimodal distributions?

a graph with two distinct humps is called a bimodal distribution

distributions with two modes

the two humps don’t have to be the same height; if two high-frequency scores are separated by scores with lower frequencies, then bimodal is appropriate

New cards

78

What’s the graph most frequently used by scientists?

line graphs

graph that uses lines to show the relationship between two variables.

New cards

79

What’s the serial position effect?

when you are more likely to remember words in the beginning or end instead of in the middle

New cards

80

What’s the difference between line graphs and frequency polygons?

Line graphs typically do not connect to the x-axis. With frequency polygons, the connection to the x-axis is a data point—no one received that score. Connecting a line graph to the x-axis says there is a data point there. Don’t do it unless the point actually represents data.

New cards

81

What are the three features every distribution has?

form

central tendency

variability

they are all independent of each other

New cards

82

What is central tendency?

descriptive statistics that indicate a typical or representative score

defined as a typical score or a representative score, measures of central tendency are almost a necessity for exploring a set of data.

used by themselves, however, they do not provide any information about form or variability.

New cards

83

What are the 3 measures of central tendency covered in the textbook?

mean

median

mode

New cards

84

What is the mean?

the arithmetic average; the sum of the scores divided by the number of scores

the symbol for the mean of a sample is X̄ (pronounced “mean” or “X-bar”). The symbol for the mean of a population is µ (a Greek letter, mu, pronounced “mew”). Of course, an X̄ is only one of many possible means from a population. Because other samples from that same population produce somewhat different X̄s, a degree of uncertainty goes with X̄.

New cards

85

What symbol would you use for an entire population?

If you had an entire population of scores, you could calculate µ and it would carry no uncertainty with it. Most of the time, however, the population is not available and you must make do with a sample.

The difference between X̄ and µ, then, is in the interpretation. X̄ carries some uncertainty with it; µ does not.

New cards

86

What’s an example of a sample mean?

Suppose a college freshman arrives at school in the fall with a promise of a monthly allowance for spending money. Sure enough, on the first of each month, there is money to spend. However, 3 months into the school term, our student discovers a recurring problem: too much month left at the end of the money. Unable to secure a calendar compressor, our student zeros in on money spent at the Student Center. For a 2-week period, he records everything bought at the center, a record that includes coffee, both regular and cappuccino Grande, bagels (with cream cheese), chips, soft drinks, ice cream, and the occasional banana.

X̄ = ΣX ⁄ N, where:

X̄ = the mean

Σ = an instruction to add (Σ is uppercase Greek sigma)

X = a score or observation;

ΣX means to add all the Xs

N = number of scores or observations

These data are for a 2-week period, but our freshman is interested in his expenditures for at least 1 month and, more likely, for many months. Thus, the result is a sample mean and the symbol X̄ is appropriate. The amount, $2.90, is an estimate of the amount that our friend spends at the Student Center each day. $2.90 may seem unrealistic to you. If so, go back and explore the data; you can find an explanation.
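
As a rough illustration, the formula X̄ = ΣX ⁄ N takes only a few lines of Python. The daily amounts below are made-up values, not the figures from Table 3.1, and the name `sample_mean` is an assumption chosen for this sketch.

```python
def sample_mean(scores):
    """Return the arithmetic mean: the sum of the scores divided by N."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(scores) / len(scores)


# Hypothetical daily expenditures in dollars (NOT the data from Table 3.1).
daily_spending = [2.00, 3.50, 0.00, 4.25, 2.75]

x_bar = sample_mean(daily_spending)  # X-bar, the sample mean
print(f"mean spent per day: ${x_bar:.2f}")
```

Because these five days are only a sample of the student's spending, the result is an X̄ and carries some uncertainty as an estimate of µ.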

New cards

87

How would you draw a graph for the statistical analysis?

Now we come to an important part of any statistical analysis, which is to answer the question, “So what?” Calculating numbers or drawing graphs is a part of almost every statistical problem, but unless you can tell the story of what the numbers and pictures mean, you won’t find statistics worthwhile. The first use you can make of Table 3.1 is to estimate the student’s monthly Student Center expenses. This is easy to do. Thirty days times $2.90 is $87.00. Now, let’s suppose our student decides that this $87.00 is an important part of the “monthly money problem.” The student has three apparent options. The first is to get more money. The second is to spend less at the Student Center. The third is to justify leaving things as they are. For this third option, our student might perform an economic analysis to determine what he gets in return for his almost $90 a month. His list might be pretty impressive: lots of visits with friends; information about classes, courses, and professors; a borrowed book that was just super; thousands of calories; and more. The point of all this is that part of the attack on the student’s money problem involved calculating a mean. However, an answer of $2.90 doesn’t have much meaning by itself. Interpretations and comparisons are called for.

New cards

88

What are the characteristics of the mean?

Three characteristics of the mean will be referred to again in this book.

First, if the mean of a distribution is subtracted from each score in that distribution and the differences are added, the sum will be zero; that is, Σ(X − X̄) = 0. The statistic X − X̄ is called a deviation score. To demonstrate to yourself that Σ(X − X̄) = 0, you might pick a few numbers to play with (numbers 1, 2, 3, 4, and 5 are easy to work with).

Second, if those deviation scores are squared and then summed, which is expressed as Σ(X − X̄)², the result will be smaller than the sum you get if any number other than X̄ is used. That is, using the mean minimizes the sum of squared deviations. Again, you can demonstrate this relationship for yourself by playing with a small set of scores.

Third, the formula ΣX ⁄ N produces a sample mean that is an unbiased estimator of µ, the population mean. Some sample statistics, however, are not unbiased estimators of their population parameters.
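
The first two characteristics can be checked with the numbers 1 through 5 suggested above. This is only a small demonstration, not a proof; the helper name `sum_squared_deviations` and the alternative centers tried are assumptions made for the sketch.

```python
scores = [1, 2, 3, 4, 5]
mean = sum(scores) / len(scores)  # 3.0

# Characteristic 1: the deviation scores (X - mean) sum to zero.
deviations = [x - mean for x in scores]
sum_of_deviations = sum(deviations)


def sum_squared_deviations(values, center):
    """Sum of squared deviations of the values from an arbitrary center."""
    return sum((x - center) ** 2 for x in values)


# Characteristic 2: squared deviations are smallest when taken about the mean.
ss_about_mean = sum_squared_deviations(scores, mean)
ss_about_other = {c: sum_squared_deviations(scores, c) for c in (2, 2.5, 3.5, 4)}

print(sum_of_deviations)  # 0.0
print(ss_about_mean)      # 10.0
print(ss_about_other)     # every value here is larger than 10.0
```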

New cards

89

What is the median and what is an example of the median?

Point that divides a distribution of scores into equal halves.

The median is the point that divides a distribution of scores into two equal parts. To find the median of the Student Center expense data, begin by arranging the daily expenditures from highest to lowest. The result is Table 3.2, which is called an array. Because there are 14 scores, the halfway point, or median, will have seven scores above it and seven scores below it. The seventh score from the bottom is $2.50. The seventh score from the top is $3.00. The median, then, is halfway between these two scores, or $2.75. (The halfway point between two numbers is the mean of the two numbers. Thus, [$2.50 + $3.00]/2 = $2.75.) The median is a hypothetical point in the distribution; it may or may not be an actual score. What is the interpretation of a median of $2.75? The simplest interpretation is that on half the days our student spends less than $2.75 in the Student Center and on the other half he spends more. What if there had been an odd number of days in the sample? Suppose the student chose to sample half a month, or 15 days. Then the median would be the eighth score. The eighth score has seven scores above and seven below. For example, if an additional day was included, during which $5.00 was spent, the median would be $3.00. If the additional day’s expenditure was zero, the median would be $2.50.

New cards

90

What is the formula for median?

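Median location = (N + 1) ⁄ 2

This gives the position of the median in the ordered scores, not the median itself.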
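
A minimal sketch of the location rule (N + 1) ⁄ 2 in Python follows. The six scores are hypothetical and chosen only so that the even-N and odd-N cases behave like the Student Center example; the function name `median` is likewise an assumption.

```python
def median(scores):
    """Median via the location rule (N + 1) / 2 applied to the ordered scores."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one score")
    mid = n // 2
    if n % 2 == 1:
        # Odd N: (N + 1) / 2 is a whole number, so the median is that score.
        return ordered[mid]
    # Even N: the location falls between two scores; take their mean.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical scores (not the textbook's Table 3.2).
spending = [1.00, 2.00, 2.50, 3.00, 4.00, 6.00]

print(median(spending))           # even N: halfway between 2.50 and 3.00
print(median(spending + [5.00]))  # odd N: an extra high score moves it up
print(median(spending + [0.00]))  # odd N: an extra low score moves it down
```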

New cards

91

What is the mode and an example of the mode?

score that occurs most frequently in a distribution.

New cards

92

How do you calculate the mean of a frequency distribution? (Page 50)

The first step in calculating the mean from a simple frequency distribution is to multiply each score in the X column by its corresponding f value, so that all the people who make a particular score are included. Next, sum the fX values and divide the total by N. (N is the sum of the f values.) The result is the mean. In terms of a formula, µ or X̄ = ΣfX ⁄ N.
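
The multiply, sum, and divide steps can be sketched as follows. The frequency table is a made-up example (it is not Table 3.3), stored as a dictionary that maps each score X to its frequency f; `frequency_mean` is a name assumed for this sketch.

```python
def frequency_mean(freq):
    """Mean of a simple frequency distribution: sum(f * X) / N."""
    n = sum(freq.values())                               # N is the sum of the f values
    total = sum(score * f for score, f in freq.items())  # sum of the fX products
    return total / n


# Hypothetical distribution: score -> frequency.
freq = {10: 2, 11: 5, 12: 8, 13: 5}

print(frequency_mean(freq))  # (20 + 55 + 96 + 65) / 20
```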

New cards

93

How do you calculate the median? (page 50)

Median location = (N + 1) ⁄ 2. Thus, for the scores in Table 3.3, median location = (100 + 1) ⁄ 2 = 50.5.

To find the 50.5th position, begin adding the frequencies in Table 3.3 from the bottom (2 + 2 + 2 + 1 + . . .). The total is 43 by the time you include the score of 24. Including 25 would make the total 52—more than you need. So the 50.5th score is among those nine scores of 25. The median is 25.

Suppose you start the quest for the median at the top of the array rather than at the bottom. Again, the location of the median is at the 50.5th position in the distribution. To get to 50.5, add the frequencies from the top (2 + 1 + 2 + . . .). The sum of the frequencies of the scores from 35 down to and including 26 is 48. The next score, 25, has a frequency of 9. Thus, the 50.5th position is among the scores of 25. The median is 25.

Calculating the median by starting from the top of the distribution produces the same result as calculating the median by starting from the bottom.
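
The counting procedure can be sketched in Python by accumulating frequencies from the lowest score upward. The table below is hypothetical (it is not Table 3.3), and both function names are assumptions made for this illustration.

```python
def score_at(freq, position):
    """Return the score at a 1-indexed position, counting up from the lowest score."""
    running = 0
    for score in sorted(freq):
        running += freq[score]
        if running >= position:
            return score
    raise ValueError("position exceeds N")


def frequency_median(freq):
    """Median of a simple frequency distribution via location (N + 1) / 2."""
    n = sum(freq.values())
    location = (n + 1) / 2  # for example, 50.5 when N = 100
    lower = int(location)   # position just below (or exactly at) the location
    upper = lower if n % 2 == 1 else lower + 1
    return (score_at(freq, lower) + score_at(freq, upper)) / 2


# Hypothetical distribution: score -> frequency (N = 20, location = 10.5).
freq = {10: 2, 11: 5, 12: 8, 13: 5}

print(frequency_median(freq))  # positions 10 and 11 both fall among the 12s
```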

New cards

94

How do you find the mode?

It is easy to find the mode from a simple frequency distribution. In Table 3.3, the score with the highest frequency, which is 10, is the mode. So, mode = 27.
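
Finding the highest frequency is a one-line lookup in code. The sketch below uses a made-up frequency table (not Table 3.3) and returns every score tied for the highest frequency, which is one possible convention; the function name `modes` is an assumption.

```python
def modes(freq):
    """Return every score whose frequency equals the highest frequency."""
    highest = max(freq.values())
    return sorted(score for score, f in freq.items() if f == highest)


# Hypothetical distribution: score -> frequency.
freq = {10: 2, 11: 5, 12: 8, 13: 5}

print(modes(freq))  # the score with frequency 8
```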

New cards

95

What is each measure of central tendency good for?

usually use the mean

A mean is appropriate for ratio or interval scale data, but not for ordinal or nominal distributions. A median is appropriate for ratio, interval, and ordinal scale data, but not for nominal data. The mode is appropriate for any of the four scales of measurement. You have already thought through part of this issue in working Problem 3.5. In it, you found that the yard sign names (very literally, a nominal variable) could be characterized with a mode, but it would be impossible to try to add up the names and divide by N or to find the median of the names. For an ordinal scale such as class standing in college, either the median or the mode makes sense. The median would probably be sophomore, and the mode would be freshman.

New cards

96

How is the mean misleading if you have skewed distributions?

Even if you have interval or ratio data, the mean may be a misleading choice if the distribution is severely skewed. Here’s a story to illustrate. The developer of Swampy Acres Retirement Homesites is attempting to sell building lots in a southern “paradise” to out-of-state buyers. The marks express concern about flooding. The developer reassures them: “The average elevation of the lots is 78.5 feet and the water level in this area has never, ever exceeded 25 feet.” The developer tells the truth, but this average truth is misleading. The actual lay of the land is shown in Figure 3.1. Now look at Table 3.4, which shows the elevations of the 100 lots arranged in a grouped frequency distribution. (Grouped frequency distributions are explained in Appendix B.)

New cards

97

How do you calculate the mean of a grouped frequency distribution?

To calculate the mean of a grouped frequency distribution, multiply the midpoint of each interval by its frequency, add the products, and divide by the total frequency. Thus, in Table 3.4, µ = ΣfX ⁄ N = 7850 ⁄ 100 = 78.5 feet.

[Figure 3.1: Elevation of Swampy Acres]

The mean is 78.5 feet, exactly as the developer said. However, only the 20 lots on the bluff are out of the flood zone; the other 80 lots are, on the average, under water. The mean, in this case, is misleading. What about the median? The median of the distribution in Table 3.4 is 12.5 feet, well under the high-water mark, and a better overall descriptor of Swampy Acres Retirement Homesites. The distribution in Table 3.4 is severely skewed, which leads to the big disparity between the mean and the median.

Income data present real-life examples of severely skewed distributions. For 2016, the U.S. Census Bureau reported that mean household income was $83,143. The median, however, was $59,039. For distributions that are severely or even moderately skewed, the median is often preferred because it is unaffected by extreme scores.
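
The grouped-mean calculation (midpoint times frequency, summed, divided by N) might look like this in Python. The intervals and frequencies are invented for illustration and are not the values in Table 3.4; they are skewed on purpose so that most cases fall below the mean. The name `grouped_mean` is an assumption.

```python
def grouped_mean(groups):
    """Mean of a grouped frequency distribution: sum(f * midpoint) / N."""
    n = sum(groups.values())
    total = sum(f * (low + high) / 2 for (low, high), f in groups.items())
    return total / n


# Hypothetical elevations in feet: (interval low, interval high) -> frequency.
lots = {(0, 9): 4, (10, 19): 4, (90, 99): 2}

mean_elevation = grouped_mean(lots)
# Count the lots whose whole interval lies below the mean.
below_mean = sum(f for (low, high), f in lots.items() if high < mean_elevation)

print(mean_elevation)  # (18 + 58 + 189) / 10
print(below_mean)      # most of the 10 lots sit below the mean
```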

New cards

98

What are open-ended class intervals? What are the best measures of central tendency for open-ended class intervals?

Even if you have interval or ratio data and the distribution is fairly symmetrical, there is a situation for which you cannot calculate a mean. If the highest interval or the lowest interval of a grouped frequency distribution is open-ended, there is no midpoint and you cannot calculate a mean. Age data are sometimes reported with the oldest being “75 and over.” The U.S. Census Bureau reports household income with the largest category as $200,000 and over. Because there is no midpoint to “75 and over” or to “$200,000 or more,” you cannot calculate a mean. Medians and modes are appropriate measures of central tendency when one or both of the extreme class intervals are open ended. In summary, use the mean if it is appropriate. To follow this advice, you must recognize data for which the mean is not appropriate. Perhaps Table 3.5 will help.

New cards

99

What are the relationships between the median and mean for positive and negative skews?

You can determine skewness by comparing the mean and median (most of the time). If the mean is smaller than the median, expect negative skew. If the mean is larger than the median, expect positive skew. Figure 3.2 shows the relationship of the mean to the median for a negatively skewed distribution (left) and a positively skewed distribution (right) of continuous data.

I’ll illustrate by changing the slightly skewed data in Table 3.1 into more severely skewed data. The original expenditures in Table 3.1 have a mean of $2.90 and a median of $2.75, a difference of just 15 cents. If I add an expenditure of $100.00 to the 14 scores, the mean jumps to $9.37 and the median moves up to $3.00. The difference now is $6.37 rather than 15 cents. This example follows the general rule that the greater the difference between the mean and median, the greater the skew. Note also that the mean of the new distribution ($9.37) is greater than every score in the distribution except for $100.00. For severely skewed distributions, the mean is not a typical score. This rule about the mean/median relationship usually works for continuous data such as SWLS scores and dollars, but it is less trustworthy for discrete data such as the number of adult residents in U.S. households (von Hippel, 2005). One response to severely skewed distributions is to use the median. Another solution, often found in inferential statistics, is the trimmed mean. A trimmed mean is calculated by excluding a certain percentage of the values from each tail of the distribution. Means trimmed by 10% and 20% are common.
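
A trimmed mean is simple to sketch in Python. The ten scores below are hypothetical, and rounding the number of trimmed scores down to a whole number is an assumption of this sketch (conventions differ). The last lines re-derive the $9.37 figure from the numbers given in the text: 14 scores with a mean of $2.90 plus one $100.00 expenditure.

```python
def trimmed_mean(scores, proportion):
    """Mean after dropping `proportion` of the scores from EACH tail."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(scores)
    k = int(len(ordered) * proportion)  # scores dropped per tail, rounded down
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical, positively skewed scores with one extreme value.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 100]

print(trimmed_mean(scores, 0.0))   # ordinary mean, pulled up by the 100
print(trimmed_mean(scores, 0.10))  # 10% trimmed: drops the 1 and the 100

# Check of the text's arithmetic: sum of 14 scores = 14 * 2.90, then add 100.00.
new_mean = (14 * 2.90 + 100.00) / 15
print(round(new_mean, 2))
```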

New cards

100

Which measures of central tendency are appropriate for each scale of measurement?

A mean is appropriate for ratio or interval scale data, but not for ordinal or nominal distributions.

A median is appropriate for ratio, interval, and ordinal scale data, but not for nominal data.

The mode is appropriate for any of the four scales of measurement.

For an ordinal scale such as class standing in college, either the median or the mode makes sense. The median would probably be sophomore, and the mode would be freshman.

New cards