Who was Karl Pearson and what did he say the problem with theories was?
founder of statistics
he said the problem with theories is that there was no objective way to confirm whether a theory is right, so he developed a statistical method, the chi-square test, to determine whether or not the data fit the theory (e.g. you gather 1,000 data points and calculate a chi-square test)
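A minimal sketch of the chi-square idea, assuming a goodness-of-fit setup where a theory predicts the counts in each category. The counts and the 3:1 ratio are hypothetical, not data from these notes, and turning the statistic into a probability needs the chi-square distribution, which is not shown here.

```python
# Pearson's chi-square goodness-of-fit statistic, computed by hand.
# All counts below are hypothetical illustration data.

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Suppose a theory predicts a 3:1 split among 1,000 observations.
observed = [740, 260]   # what was actually counted (hypothetical)
expected = [750, 250]   # what the theory predicts

chi_sq = chi_square_statistic(observed, expected)
print(round(chi_sq, 3))
```

The closer the statistic is to zero, the better the data fit the theory; large values mean the observed counts are far from the predicted ones.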
How does Ronald Fisher’s technique differ from Pearson’s?
with an experiment and an analysis of variance
he said that you should conduct an experiment that compares two or more conditions.
e.g. you can determine why a field is so variable, like what makes some plots lush and some flimsy
How does Jerzy Neyman’s theory differ from both theories and what is it about?
null hypothesis significance testing
null means no and hypothesis means explanation; the null hypothesis states that there is no difference between two conditions (no relationship between two variables)
it is the most used by many disciplines including anthropology, biology, chemistry, defence strategy, education, forestry, geology, health, immunology, jurisprudence, manufacturing, medicine, neurology, ophthalmology, political science, psychology, sociology, zoology, and others.
the problem with Fisher’s analysis is that it focuses exclusively on finding the difference between two variables, but just because a difference isn’t detected doesn’t mean there isn’t one. it could just mean that the statistical test wasn’t sensitive enough to detect the difference
What did Jacob Cohen say?
NHST has its problems
we should calculate an effect size statistic which will show the size of the differences observed in NHST, not just that there were differences
Who else did not like NHST?
Geoff Cumming; he says to avoid NHST and that we can’t trust p values, but NHST is still very common today
What is the theory put under attack?
null hypothesis significance testing
FOR EX. Cumming says we shouldn’t use it
new approaches have developed in statistics that do not use NHST, but it is still widely used by researchers and most of the thinking in NHST is required for the other approaches
What does statistics mean?
“state numbers”
facts about or summaries of data
referring to a country’s quantifiable political characteristics such as population, taxes and area
What does the field of statistics mean?
the branch of mathematics that deals with collecting, analyzing, interpreting, and presenting data
it ensures that info is presented and used in a way that is most accurate
provides order by giving a set of standardized techniques that are universally understood
What is data?
measurements / observations of a variable
a datum or data point is a single measure / observation
What is a data set?
a collection of measurements and observations
What are the two types of statistics? What do they each mean?
descriptive
inferential
What is descriptive statistics?
a number or graph that conveys a particular characteristic of a set of data; basically a number or figure that summarizes or describes what the data show.
FOR EX. measures of central tendency, graphs such as line graphs, histograms, bar graphs and frequency polygons, measures of variability, or even the skew of a distribution
What is inferential statistics?
techniques used to make inferences about a population from measurements of a sample taken from the larger, unmeasured population
FOR EX. we can infer this population has a certain gene from the sample taken from the population
null hypothesis significance testing is a type of inferential statistics used in stats today
Give an example of descriptive and inferential statistics.
you have two groups of university students and you calculate the average of their GPAs; this would be descriptive statistics
then you notice a difference between the averages and try to figure out whether it is due to sampling error (chance), meaning there is no real difference, or whether there is a real difference because the two groups come from two different populations; this would be inferential statistics
What is a problem with samples?
samples depend partly on the “luck of the draw” and chance determines the particular measurements you get
FOR EX. if you were to draw a handful of candy from a candy jar and you got more reds than whites, the handful might not be representative of the candy jar if the jar actually had more whites
if you had measurements for the entire population, chance wouldn’t play a part, because any difference or variance you see really exists within that population; it doesn’t depend on how you drew the sample. BUT sometimes the variance seen in a sample does reflect variance in the population
How does inferential statistics account for this problem with samples?
inferential statistics is a method that takes chance factors into account when samples are used to reach conclusions about populations.
Give an example of how a discipline used NHST.
Today, there is a lot of evidence that people remember the tasks they fail to complete better than the tasks they complete. This is known as the Zeigarnik effect. Bluma Zeigarnik asked participants in her experiment to do about 20 tasks, such as work a puzzle, make a clay figure, and construct a box from cardboard. For each participant, half the tasks were interrupted before completion. Later, when the participants were asked to recall the tasks they worked on, they listed more of the interrupted tasks (average about 7) than the completed tasks (about 4). One good question to start with is, “Did interrupting make a big difference or a small difference?” In this case, interruption produced about three additional memory items compared to the completion condition. This is a 75% difference, which seems like a big change, given our experience with tests of memory. The question of “How big is the difference?” can often be answered by calculating an effect size index.
we can’t yet conclude that interruption improves memory; you could conduct the experiment again, but that would be expensive. instead, we can use inferential statistics like NHST
NHST begins with the actual data from the experiment. It ends with a probability—the probability of obtaining data like those actually obtained if it is true that interruption has no effect on memory. If the probability of getting the same result is very small, you can conclude that interruption does affect memory. For Zeigarnik’s data, the probability was tiny. Now for the conclusion. One version might be, “After completing about 20 tasks, memory for interrupted tasks (average about 7) was greater than memory for completed tasks (average about 4). The approximate 75% difference cannot be attributed to chance because chance by itself would rarely produce a difference between two samples as large as this one.”
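The “about three additional items” and “75% difference” figures in this example can be checked with a few lines of arithmetic. This is only a sketch using the approximate averages quoted above (about 7 and about 4); the variable names are made up for illustration.

```python
# Approximate averages quoted in the Zeigarnik example.
interrupted_mean = 7   # tasks recalled, interrupted condition
completed_mean = 4     # tasks recalled, completed condition

# Raw difference and difference relative to the completed condition.
difference = interrupted_mean - completed_mean
percent_difference = difference / completed_mean * 100

print(difference, percent_difference)
```

Note that this only describes how big the difference is; it does not say how likely chance alone would be to produce it, which is the job of NHST.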
What type of discipline is statistics?
Statistics is a dynamic discipline characterized by more than a little controversy. New techniques in both descriptive and inferential statistics continue to be developed. Controversy continues too, as you saw at the end of our exploration tour.
What is a general example of how statistics is used in a wide variety of fields?
Researchers start with a phenomenon, event, or process that they want to understand better. They make measurements that produce numbers. The numbers are manipulated according to the rules and conventions of statistics. Based on the outcome of the statistical analysis, researchers draw conclusions and then write the story of their new understanding of the phenomenon, event, or process. Statistics is just one tool that researchers use, but it is often an essential tool.
What’s a population?
the people or group of individuals that you, the researcher, wish to study
all measurements of a specified group. The population is the thing of interest. It is defined by the investigator and includes all cases.
What is a sample?
a sample is a subset of a population that is selected to represent the larger group in a study
usually the population is so large or inaccessible that the researcher has to pick a smaller size to represent that population
What’s a parameter and what’s a statistic?
a parameter is a numerical (number) or nominal (name) characteristic of a population
a statistic is some numerical or nominal characteristic of a sample.
What’s the difference between a statistic and parameter?
parameters are constant; statistics are variable
a parameter is constant; it does not change unless the population itself changes. the mean of a population is exactly one number. but you can’t always measure an entire population, so a statistic is used to estimate the parameter. because a statistic is only an estimate of the parameter, one sample tends to differ from another, so if you have 5 different samples, you have 5 different sample means.
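A small simulation can make “parameters are constant; statistics are variable” concrete. This is a sketch under stated assumptions: the population is simulated (10,000 hypothetical scores), and the sample size of 25 is arbitrary.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 scores.
population = [random.gauss(100, 15) for _ in range(10_000)]

# The parameter: one fixed number for this population.
mu = statistics.mean(population)

# The statistic: a sample mean, recomputed for 5 different samples of 25.
sample_means = [
    statistics.mean(random.sample(population, 25)) for _ in range(5)
]

print("population mean:", round(mu, 2))
print("sample means:   ", [round(m, 2) for m in sample_means])
```

The population mean stays the same no matter how often it is computed, while the five sample means differ from each other and from the population mean.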
What is a variable?
a variable is something that exists in more than one amount or in more than one form.
characteristics or conditions that change or have different values for different individuals
a quantity or characteristic that varies
FOR EX. height and eye colour
What are the two ways an independent variable can be defined?
experimentally defined: the researcher gives participants something to make the two groups distinct
FOR EX. caffeine or no caffeine
naturally defined: the variable occurs naturally and you can’t really change it; the two groups are already distinct
FOR EX. your age, genetics
you can have both in your study, but a variable can only be either naturally defined or experimentally defined
What is a score?
the result of measuring a variable
What are quantitative variables?
quantitative variables tell you the degree or amount of the thing being measured
most dependent measures in psych but not all
What are the two types of quantitative variables?
continuous (measured) variables are quantitative variables whose scores can be any value or intermediate value over the variable’s possible range
continuous variables have lower and upper limits (the points at which a value would round to the neighbouring score): 6.5 is the lower limit and 7.5 is the upper limit of a score of 7. The idea is that recall can be any value between 6.5 and 7.5, but that all the recall values in this range are expressed as 7.
FOR EX. in a similar way, a charge indicator value of 62% on your cell phone stands for all the power values between 61.5% (the lower limit) and 62.5% (the upper limit). sometimes scores are expressed in tenths, hundredths, or thousandths. like integers, these scores have lower and upper limits that extend halfway to the next value on the quantitative scale.
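The lower and upper limits described above are just the score minus and plus half a measurement unit. A minimal sketch; the function name `real_limits` and the `unit` parameter are made up for illustration.

```python
def real_limits(score, unit=1.0):
    """Return (lower limit, upper limit) of a score on a continuous variable.

    `unit` is the size of one step on the scale (1 for whole numbers,
    0.1 for tenths, and so on); the limits extend halfway to the next value.
    """
    half = unit / 2
    return score - half, score + half

print(real_limits(7))    # recall score of 7
print(real_limits(62))   # phone charge of 62%
print(real_limits(2.3, unit=0.1))  # a score reported in tenths
```

For scores reported in tenths, the printed limits may show tiny floating-point rounding, but the idea is the same: halfway to the next value in each direction.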
discrete (counting) variables are quantitative variables with no intermediate values; their scores can take only a limited set of values, often counts
FOR EX. the number of siblings you have, the number of times you’ve been hospitalized, and how many pairs of shoes you have. intermediate scores such as 2½ just don’t make sense.
What is the lower limit?
bottom of the range of possible values that a measurement on a continuous variable can have.
What are categorical variables?
produce scores that differ in kind and not amount. eye colour is a categorical variable.
you can use numbers to represent categories, but that doesn’t mean that it is now quantitative
all categorical variables produce discrete scores, but not all discrete scores are from a categorical variable
FOR EX. your major or political party
Who developed the 4 different scales of measurement and what are they?
S. S. Stevens
nominal
ordinal
interval
ratio
What’s a nominal scale of data?
numbers are used simply as names and have no real quantitative value.
numerals on sports uniforms are an example. thus, 45 is different from 32, but that is all you can say
What is an ordinal scale?
has the characteristic of the nominal scale (different numbers mean different things) plus the characteristic of indicating greater than or less than.
FOR EX. the object with the number 3 has less or more of something than the object with the number 5.
but for rankings like 1st and 3rd, the size of the difference can’t be measured
What is the interval scale?
has the properties of both the nominal and ordinal scales plus the additional property that intervals between the numbers are equal.
“equal interval” means that the distance between the things represented by 2 and 3 is the same as the distance between the things represented by 3 and 4. Temperature is measured on an interval scale. The difference in temperature between 10°C and 20°C is the same as the difference between 40°C and 50°C, but you can’t make ratio statements like 100°C is twice as hot as 50°C
What is a ratio scale?
has all the characteristics of the nominal, ordinal, and interval scales plus one other: It has a true zero point, which indicates a complete absence of the thing measured. On a ratio scale, zero means “none.”
height, weight, and time are measured with ratio scales. zero height, zero weight, and zero time mean that no amount of these variables is present. With a true zero point, you can make ratio statements such as 16 kilograms is four times heavier than 4 kilograms.
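One way to see why ratio statements fail on an interval scale: convert Celsius to Kelvin (a temperature scale that does have a true zero) and compare the ratios. This is an illustrative sketch; the Kelvin conversion is not part of these notes.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins (0 K is a true zero)."""
    return c + 273.15

# Interval scale: the ratio of the numbers looks like "twice as hot"...
ratio_celsius = 100 / 50

# ...but measured from a true zero, the ratio is nowhere near 2.
ratio_kelvin = celsius_to_kelvin(100) / celsius_to_kelvin(50)

# Ratio scale: weight has a true zero, so this ratio is meaningful.
ratio_weight = 16 / 4

print(ratio_celsius, round(ratio_kelvin, 3), ratio_weight)
```

The Celsius ratio depends on where the arbitrary zero was placed, which is why “twice as hot” is not a valid claim, while “four times heavier” is.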
Does the scale of measurement affect which descriptive statistics you can compute?
Yes
FOR EX. it makes no sense to find the mean or average of SIN numbers, telephone numbers, or psychological diagnoses, because these numbers are categorical (nominal).
even words/letters can be inferential / numerical.
What approach does the textbook take? What’s the difference?
Statistics viewpoint with some of the experimental design viewpoint
A statistics viewpoint fails to account for the other things that could happen in a problem.
What does an independent variable mean? What is the value of the independent variable called?
variable controlled by the researcher; changes in this variable may produce changes in the dependent variable.
a level
usually categorical, but not always
What does a dependent variable mean? What is the value of the dependent variable called?
observed variable that is expected to change as a result of changes in the independent variable in an experiment
a score or measurement is the value of the dependent variable
aka the response or outcome variable
usually quantitative, but not always
How do the independent and dependent variables work in research?
The basic idea is that the researcher finds or creates two groups of participants that are similar except for the independent variable. These individuals are measured on the dependent variable. The question is whether the data will allow the experimenter to claim that the values on the dependent variable depend on the level of the independent variable.
What is usually the dependent variable in experiments?
the values of the dependent variable are found by measuring or observing participants in the investigation.
the dependent variable might be scores on a personality test, number of items remembered, or whether or not a passerby offered assistance.
What is usually the independent variable in experiments?
for the independent variable, the two groups might have been selected because they were already different—in age, gender, personality, and so forth. Alternatively, the experimenter might have produced the difference in the two groups by an experimental manipulation such as creating different amounts of anxiety or providing different levels of practice.
What is an example of an experiment with dependent and independent variables?
Suppose for a moment that as a budding gourmet cook you want to improve your spaghetti sauce. One of your buddies suggests adding marjoram. To investigate, you serve spaghetti sauce at two different gatherings. For one group of guests, the sauce is spiced with marjoram; for the other it is not. At both gatherings, you count the number of favorable comments about the spaghetti sauce. Stop reading; identify the independent and the dependent variables.
The dependent variable is the number of favorable comments, which is a measure of the taste of the sauce. The independent variable is marjoram, which has two levels: present and absent
What is an extraneous variable and how does it relate to our previous example?
variable other than the independent variable that may affect the dependent variable
extraneous variables include the amount and quality of the other ingredients in the sauce, the spaghetti itself, the “party moods” of the two groups, and how hungry everyone was.
If any of these extraneous variables was actually operating, it weakens the claim that a difference in the comments about the sauce is the result of the presence or absence of marjoram.
How do you remove an extraneous variable?
The simplest way to remove an extraneous variable is to be sure that all participants are equal on that variable. For example, you can ensure that the sauces are the same except for marjoram by mixing up the ingredients, dividing it into two batches, adding marjoram to one batch but not the other, and then cooking. The “party moods” variable can be controlled (equalized) by conducting the taste test in a laboratory. Controlling extraneous variables is a complex topic covered in courses that focus on research methods and experimental design. In many experiments, it is impossible or impractical to control all the extraneous variables. Sometimes researchers think they have controlled them all, only to find that they did not. The effect of an uncontrolled extraneous variable is to prevent a simple cause-and-effect conclusion. Even so, if the dependent variable changes when the independent variable changes, something is going on. In this case, researchers can say that the two variables are related, but that other variables may play a part, too.
Summarize the relationships between statistics and experimental design.
Researchers suspect that there is a relationship between two variables. They design and conduct an experiment; that is, they choose the levels of the independent variable (treatments), control the extraneous variables, and then measure the participants on the dependent variable. The measurements (data) are analyzed using statistical procedures. Finally, the researcher tells a story that is consistent with the results obtained and the procedures used.
What is epistemology? What is statistics’ place in it?
The study or theory of the nature of knowledge.
both reason and experience are ways to acquire knowledge, and the basis of mathematics is reason. mathematics starts with axioms that are assumed to be true; theorems are then thought up and proved by giving axioms as reasons. once a theorem is proved, it can be used as a reason in a proof of other theorems.
statistics has its foundations in mathematics, so a statistical analysis is based on reason (e.g. calculating the mean requires reason), but experimental design is different in that it also includes experience and observation
A very common task of most human beings can be described as trying to understand. Statistics has helped many in their search for better understanding, and it is such people who have recommended (or demanded) that statistics be taught in school. A reasonable expectation is that you, too, will find statistics useful in your future efforts to understand and persuade. Speaking of persuasion, you have probably heard it said, “You can prove anything with statistics.” The implied message is that a conclusion based on statistics is suspect because statistical methods are unreliable. Well, it just isn’t true that statistical methods are unreliable, but it is true that people can misuse statistics (just as any tool can be misused). One of the great advantages of studying statistics is that you get better at recognizing statistics that are used improperly.
What are the 4 levels of statistical sophistication?
Category 1—those who understand statistical presentations
Category 2—those who understand, select, and apply statistical procedures
Category 3—applied statisticians who help others use statistics
Category 4—mathematical statisticians who develop new statistical techniques and discover new characteristics of old techniques
What are the 3 steps to analyze a data set?
The first step is exploratory. Read all the information and examine the data. Calculate descriptive statistics and focus on the differences that are revealed. In this textbook, descriptive statistics are emphasized in Chapters 2 through 6 and include graphs, means, and effect size indexes. Calculating descriptive statistics helps you develop preliminary ideas for your story (Step 3).
The second step is to answer the question, What are the effects that chance could have on the descriptive statistics I calculated? An answer requires inferential statistics (Chapter 7 through Chapter 15).
The third step is to write the story the data reveal. Incorporate the descriptive and inferential statistics to support the conclusions in the story. Of course, the skills you’ve learned and taught yourself about composition will be helpful as you compose and write your story. Don’t worry about length; most good statistical stories about simple data sets can be told in one paragraph. Write your story using journal style, which is quite different from textbook style. Textbook style, at least this textbook, is chatty, redundant, and laced with footnotes.
What is a raw score?
score obtained by observation or from an experiment.
FOR EX. if you gave a bunch of students a questionnaire, each student’s answers would be raw scores
What is a simple frequency distribution?
scores arranged from highest to lowest, each with its frequency of occurrence
an ordered arrangement that shows the frequency of each score
the goal is to organize data to communicate how many observations exist at each category on the scale of measurement
it can be either in a graph or table
What do the symbols in a frequency distribution mean?
The generic name for all variables is X, which is the symbol used in formulas.
The Frequency (f) column shows how frequently a score occurred. The tally marks are used when you construct a rough-draft version of a table and are not usually included in the final form.
N is the number of scores and is found by summing the numbers in the f column / by adding up all the frequencies.
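A simple frequency distribution can be built by counting how often each score occurs. A minimal sketch with hypothetical raw scores, using `collections.Counter` from the standard library:

```python
from collections import Counter

# Hypothetical raw scores (X).
scores = [5, 3, 4, 5, 2, 4, 5, 3, 4, 4]

# f: how many times each score occurred.
freq = Counter(scores)

# Scores arranged from highest to lowest, each with its frequency.
print("X  f")
for x in sorted(freq, reverse=True):
    print(x, "", freq[x])

# N is found by summing the f column.
N = sum(freq.values())
print("N =", N)
```

Summing the frequencies gives the same N as counting the raw scores directly, which is a quick check that no score was missed in the table.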
What does a formal presentation of a frequency distribution table not have?
tally marks or zero-frequency scores
tally marks are left out of both simple and grouped distributions, but formal grouped distributions do include zero-frequency intervals that fall within the range of the distribution
What is a grouped frequency distribution?
scores compiled into intervals of equal size. includes the frequency of scores in each interval.
raw data are usually condensed into a grouped frequency distribution when researchers want to present the data as a graph or as a table.
grouped frequency distributions are usually used when there are more than 20 different X values.
What are class intervals?
a range of scores in a grouped frequency distribution.
in Table 2.4, the entire range of scores, from 35 to 5, is reduced to 11 class intervals. each interval covers three scores; the symbol i indicates the size of the interval. In Table 2.4, i = 3.
What’s the midpoint?
The midpoint of the interval is noteworthy because it represents all the scores in that interval.
For example, five students had scores of 15, 16, or 17. The midpoint of the class interval 15–17 is 16.
The midpoint, 16, represents all five scores. There are no scores in the interval 6–8, but zero-frequency intervals are included in formal grouped frequency distributions if they are within the range of the distribution.
What do class intervals have?
Class intervals have lower and upper limits, much like simple scores obtained by measuring a continuous variable. A class interval of 15–17 has a lower limit of 14.5 and an upper limit of 17.5.
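The interval size, midpoint, and limits of a class interval follow directly from its lowest and highest scores. A minimal sketch, assuming whole-number scores; the function name `interval_summary` is made up for illustration.

```python
def interval_summary(low, high):
    """Summarize a class interval of whole-number scores.

    Returns (i, midpoint, lower limit, upper limit), where i is the
    number of score values the interval covers.
    """
    i = high - low + 1           # 15, 16, 17 -> i = 3
    midpoint = (low + high) / 2  # the score that represents the interval
    lower_limit = low - 0.5      # halfway to the next lower score
    upper_limit = high + 0.5     # halfway to the next higher score
    return i, midpoint, lower_limit, upper_limit

print(interval_summary(15, 17))  # (3, 16.0, 14.5, 17.5)
```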
What is the difference between grouped frequency distributions and simple frequency distributions?
class intervals
What are graphics?
Pictures that present statistical data
What are the horizontal axis and vertical axis?
horizontal axis - x-axis (abscissa)
vertical axis - y-axis (ordinate)
What do frequency distribution graphs present?
an entire set of observations of a sample or a population
What are the 3 ways to present frequency distribution?
frequency polygons
histograms
bar graphs
If the variable being graphed is a continuous variable, use a ____
frequency polygon or histogram
If the variable is categorical or discrete, use ____
a bar graph
What is a frequency polygon and what does it represent?
used to graph continuous variables (on the x-axis); frequency polygons are closed at both ends
each point represents two numbers: the class midpoint directly below it on the x-axis and the frequency of that class directly across from it on the y-axis (FOR EX. for the class interval 33–35, the midpoint 34 is used to show the frequency of the three scores 33, 34 and 35).
in cases where the lowest score in the distribution is well above zero, it is conventional to replace numbers smaller than the lowest score on the x-axis with a slash mark, which indicates that the scale to the origin is not continuous.
What is a histogram?
a histogram is another graphing technique that is appropriate for continuous variables.
A histogram is constructed by raising bars from the x-axis to the appropriate frequencies on the y-axis. The lines that separate the bars intersect the x-axis at the lower and upper limits of the class intervals.
How can you decide between using a frequency polygon or a histogram?
If you are displaying two overlapping distributions on the same axes, a frequency polygon is less cluttered and easier to comprehend than a histogram. However, it is easier to read frequencies from a histogram.
What is a bar graph?
a bar graph is used to present the frequencies of categorical variables and discrete variables
What’s the difference between a conventional bar graph and a histogram? Can a bar graph be ordinal?
A conventional bar graph looks like a histogram except it has wider spaces between the bars. The space is a signal that the variable is not continuous. Conventionally, bar graphs have the name of the variable being graphed on the x-axis and frequency on the y-axis.
If an ordinal scale variable is being graphed, the order of the values on the x-axis follows the order of the variable. If, however, the variable is a nominal scale variable, then any order on the x-axis is permissible. Alphabetizing might be best. Other considerations may lead to some other order.
When is a distribution’s shape meaningful?
For continuous variables and ordered categorical variables, a distribution’s shape is meaningful. For unordered categorical distributions, however, whatever shape the distribution has is arbitrary.
What’s a normal distribution (normal curve)?
A mathematically defined, theoretical distribution or a graph of observed scores with a particular shape.
What is a rectangular distribution (uniform distribution)?
A rectangular distribution (also called a uniform distribution) is a symmetrical distribution that occurs when the frequency of each value on the x-axis is the same.
What are skewed distributions?
In some distributions, the scores that occur most frequently are near one end of the scale, which leaves few scores at the other end. Such distributions are skewed. Skewed distributions, like a skewer, have one end that is thin and narrow.
What’s a positive skew?
On graphs, if the thin point is to the right (the positive direction), the curve has a positive skew.
What is a negative skew?
If the thin point is to the left, the curve is negatively skewed.
What are bimodal distributions?
a graph with two distinct humps is called a bimodal distribution
distributions with two modes
the two humps don’t have to be the same height; if two high-frequency scores are separated by scores with lower frequency, then bimodal is appropriate
What’s the graph most frequently used by scientists?
line graphs
graph that uses lines to show the relationship between two variables.
What’s the serial position effect?
the tendency to remember words at the beginning or end of a list better than words in the middle
What’s the difference between line graphs and frequency polygons?
Line graphs typically do not connect to the x-axis. With frequency polygons, the connection to the x-axis is a data point—no one received that score. Connecting a line graph to the x-axis says there is a data point there. Don’t do it unless the point actually represents data.
What are the three features every distribution has?
form
central tendency
variability
they are all independent of each other
What is central tendency?
descriptive statistics that indicate a typical or representative score
defined as a typical score or a representative score, measures of central tendency are almost a necessity for exploring a set of data.
used by themselves, however, they do not provide any information about form or variability.
What are the 3 measures of central tendency covered in the textbook?
mean
median
mode
What is the mean?
the arithmetic average; the sum of the scores divided by the number of scores
the symbol for the mean of a sample is X̄ (pronounced “mean” or “X-bar”). The symbol for the mean of a population is µ (a Greek letter, mu, pronounced “mew”). Of course, an X̄ is only one of many possible means from a population. Because other samples from that same population produce somewhat different X̄s, a degree of uncertainty goes with X̄.
What symbol would you use for an entire population?
If you had an entire population of scores, you could calculate µ and it would carry no uncertainty with it. Most of the time, however, the population is not available and you must make do with a sample.
The difference between X̄ and µ, then, is in the interpretation. X̄ carries some uncertainty with it; µ does not.
What’s an example of a sample mean?
Suppose a college freshman arrives at school in the fall with a promise of a monthly allowance for spending money. Sure enough, on the first of each month, there is money to spend. However, 3 months into the school term, our student discovers a recurring problem: too much month left at the end of the money. Unable to secure a calendar compressor, our student zeros in on money spent at the Student Center. For a 2-week period, he records everything bought at the center, a record that includes coffee, both regular and cappuccino Grande, bagels (with cream cheese), chips, soft drinks, ice cream, and the occasional banana.
X̄ = ΣX / N, where:
X̄ = the mean
Σ = an instruction to add (Σ is uppercase Greek sigma)
X = a score or observation
ΣX means to add all the Xs
N = number of scores or observations
These data are for a 2-week period, but our freshman is interested in his expenditures for at least 1 month and, more likely, for many months. Thus, the result is a sample mean and the symbol X̄ is appropriate. The amount, $2.90, is an estimate of the amount that our friend spends at the Student Center each day. $2.90 may seem unrealistic to you. If so, go back and explore the data; you can find an explanation.
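The mean is the sum of the scores divided by the number of scores (ΣX / N). A minimal sketch; the textbook's expense table is not reproduced in these notes, so the daily amounts below are hypothetical and do not give the $2.90 figure. Only the last line uses the $2.90 mean quoted above.

```python
def sample_mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

# Hypothetical daily expenses in dollars (NOT the textbook's data).
daily_expenses = [2.00, 3.50, 0.00, 4.25, 2.75]
print(sample_mean(daily_expenses))

# Using the sample mean quoted in the notes to estimate a 30-day month.
quoted_mean = 2.90
monthly_estimate = round(30 * quoted_mean, 2)
print(monthly_estimate)
```

A day with zero spending still counts as a score, so it pulls the mean down; leaving such days out would overestimate daily spending.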
How would you draw a graph for the statistical analysis?
Now we come to an important part of any statistical analysis, which is to answer the question, “So what?” Calculating numbers or drawing graphs is a part of almost every statistical problem, but unless you can tell the story of what the numbers and pictures mean, you won’t find statistics worthwhile. The first use you can make of Table 3.1 is to estimate the student’s monthly Student Center expenses. This is easy to do. Thirty days times $2.90 is $87.00. Now, let’s suppose our student decides that this $87.00 is an important part of the “monthly money problem.” The student has three apparent options. The first is to get more money. The second is to spend less at the Student Center. The third is to justify leaving things as they are. For this third option, our student might perform an economic analysis to determine what he gets in return for his almost $90 a month. His list might be pretty impressive: lots of visits with friends; information about classes, courses, and professors; a borrowed book that was just super; thousands of calories; and more. The point of all this is that part of the attack on the student’s money problem involved calculating a mean. However, an answer of $2.90 doesn’t have much meaning by itself. Interpretations and comparisons are called for.
What are the characteristics of the mean?
Three characteristics of the mean will be referred to again in this book.
First, if the mean of a distribution is subtracted from each score in that distribution and the differences are added, the sum will be zero; that is, Σ(X − 𝑋) = 0. The statistic X − 𝑋 is called a deviation score. To demonstrate to yourself that Σ(X − 𝑋) = 0, you might pick a few numbers to play with (numbers 1, 2, 3, 4, and 5 are easy to work with).
Second, if those deviation scores are squared and then summed, which is expressed as Σ(X − 𝑋)², the result will be smaller than the sum you get if any number other than 𝑋 is used. That is, using the mean minimizes the sum of squared deviations. Again, you can demonstrate this relationship for yourself by playing with a small set of scores.
Third, the formula ΣX ⁄ N produces a sample mean that is an unbiased estimator of µ, the population mean. Some sample statistics, however, are not unbiased estimators of their population parameters.
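The first two characteristics can be checked with the numbers 1 through 5 suggested above. This is only a small demonstration, not a proof; the helper name `sum_squared_deviations` and the comparison points 2, 2.5, 3.5, and 4 are arbitrary choices for this sketch.

```python
scores = [1, 2, 3, 4, 5]
mean = sum(scores) / len(scores)  # 3.0

# Characteristic 1: deviation scores (X minus the mean) sum to zero.
deviations = [x - mean for x in scores]
sum_of_deviations = sum(deviations)

def sum_squared_deviations(values, center):
    """Sum of squared deviations of the values about any chosen center."""
    return sum((x - center) ** 2 for x in values)

# Characteristic 2: the sum of squared deviations is smallest about the mean.
ss_about_mean = sum_squared_deviations(scores, mean)
ss_about_other = {c: sum_squared_deviations(scores, c) for c in (2, 2.5, 3.5, 4)}

print(sum_of_deviations)  # 0.0
print(ss_about_mean)      # 10.0
print(ss_about_other)     # every value is larger than 10.0
```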
What is the median and what is an example of the median?
Point that divides a distribution of scores into equal halves.
The median is the point that divides a distribution of scores into two equal parts. To find the median of the Student Center expense data, begin by arranging the daily expenditures from highest to lowest. The result is Table 3.2, which is called an array. Because there are 14 scores, the halfway point, or median, will have seven scores above it and seven scores below it. The seventh score from the bottom is $2.50. The seventh score from the top is $3.00. The median, then, is halfway between these two scores, or $2.75. (The halfway point between two numbers is the mean of the two numbers. Thus, [$2.50 + $3.00]/2 = $2.75.) The median is a hypothetical point in the distribution; it may or may not be an actual score. What is the interpretation of a median of $2.75? The simplest interpretation is that on half the days our student spends less than $2.75 in the Student Center and on the other half he spends more. What if there had been an odd number of days in the sample? Suppose the student chose to sample half a month, or 15 days. Then the median would be the eighth score. The eighth score has seven scores above and seven below. For example, if an additional day was included, during which $5.00 was spent, the median would be $3.00. If the additional day’s expenditure was zero, the median would be $2.50.
What is the formula for median?
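Median location = (N + 1) / 2. This gives the position of the median in the ordered scores, not the median itself.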
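A minimal sketch of how the median location leads to the median, for both an even and an odd number of scores. The function name `median_by_location` and the scores are hypothetical; the scores are sorted from lowest to highest here, which gives the same median as sorting from highest to lowest.

```python
import math

def median_by_location(scores):
    """Return (location, median) using median location = (N + 1) / 2."""
    if not scores:
        raise ValueError("need at least one score")
    ordered = sorted(scores)               # ascending order
    location = (len(ordered) + 1) / 2      # 1-based position; ends in .5 when N is even
    lower = ordered[math.floor(location) - 1]
    upper = ordered[math.ceil(location) - 1]
    # When N is odd, lower and upper are the same score.
    return location, (lower + upper) / 2

# Hypothetical scores: four of them (even N), then five (odd N).
even_location, even_median = median_by_location([6.00, 1.00, 3.00, 2.50])
odd_location, odd_median = median_by_location([6.00, 1.00, 3.00, 2.50, 5.00])

print(even_location, even_median)  # 2.5 2.75
print(odd_location, odd_median)    # 3.0 3.0
```

With an even N the location falls between two scores, so the median is the mean of those two scores and may not be an actual score.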
What is the mode and an example of the mode?
score that occurs most frequently in a distribution.
How do you calculate the mean of a frequency distribution? (Page 50)
The first step in calculating the mean from a simple frequency distribution is to multiply each score in the X column by its corresponding f value, so that all the people who make a particular score are included. Next, sum the fX values and divide the total by N. (N is the sum of the f values.) The result is the mean. In terms of a formula, µ or 𝑋 = ΣfX / N.
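The steps above can be sketched as follows. The frequency table is hypothetical (it is not Table 3.3), and storing it as a dictionary that maps each score to its frequency is just one convenient choice.

```python
# Hypothetical simple frequency distribution: score (X) -> frequency (f).
freq = {5: 2, 4: 3, 3: 4, 2: 1}

n = sum(freq.values())                                # N is the sum of the f values
sum_fx = sum(score * f for score, f in freq.items())  # sum of the fX products
mean = sum_fx / n

print(n, sum_fx, mean)  # 10 36 3.6
```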
How do you calculate the median? (page 50)
Median location = (N + 1) / 2. Thus, for the scores in Table 3.3, median location = (N + 1) / 2 = (100 + 1) / 2 = 50.5.
To find the 50.5th position, begin adding the frequencies in Table 3.3 from the bottom (2 + 2 + 2 + 1 + . . .). The total is 43 by the time you include the score of 24. Including 25 would make the total 52—more than you need. So the 50.5th score is among those nine scores of 25. The median is 25.
Suppose you start the quest for the median at the top of the array rather than at the bottom. Again, the location of the median is at the 50.5th position in the distribution. To get to 50.5, add the frequencies from the top (2 + 1 + 2 + . . .). The sum of the frequencies of the scores from 35 down to and including 26 is 48. The next score, 25, has a frequency of 9. Thus, the 50.5th position is among the scores of 25. The median is 25.
Calculating the median by starting from the top of the distribution produces the same result as calculating the median by starting from the bottom.
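Here is a small sketch of the counting procedure: add frequencies from the lowest score upward until the median location is reached. The function name `median_from_frequencies` and both frequency tables are hypothetical (neither is Table 3.3).

```python
import math

def median_from_frequencies(freq):
    """Return (location, median) for a dict mapping score -> frequency."""
    n = sum(freq.values())
    if n == 0:
        raise ValueError("need at least one score")
    location = (n + 1) / 2
    lo_pos, hi_pos = math.floor(location), math.ceil(location)
    cumulative = 0
    lower = upper = None
    for score in sorted(freq):             # count up from the bottom
        cumulative += freq[score]
        if lower is None and cumulative >= lo_pos:
            lower = score                  # score holding position lo_pos
        if cumulative >= hi_pos:
            upper = score                  # score holding position hi_pos
            break
    return location, (lower + upper) / 2

# Hypothetical table: the location (15.5) falls inside the run of 24s.
location, median = median_from_frequencies({22: 3, 23: 5, 24: 8, 25: 9, 26: 4, 27: 1})
print(location, median)  # 15.5 24.0

# Hypothetical table where the location falls between two different scores.
split_location, split_median = median_from_frequencies({1: 2, 2: 2})
print(split_location, split_median)  # 2.5 1.5
```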
How do you find the mode?
It is easy to find the mode from a simple frequency distribution. In Table 3.3, the score with the highest frequency (f = 10) is 27, so the mode is 27.
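As a sketch, finding the mode from a frequency table means looking for the largest f value. The table below is hypothetical, and returning a list is a design choice made here so that ties (more than one mode) are not hidden.

```python
# Hypothetical simple frequency distribution: score -> frequency.
freq = {12: 4, 11: 7, 10: 5, 9: 2}

highest_frequency = max(freq.values())
modes = sorted(score for score, f in freq.items() if f == highest_frequency)

print(highest_frequency, modes)  # 7 [11]
```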
What is each measure of central tendency good for?
usually use the mean
A mean is appropriate for ratio or interval scale data, but not for ordinal or nominal distributions. A median is appropriate for ratio, interval, and ordinal scale data, but not for nominal data. The mode is appropriate for any of the four scales of measurement. You have already thought through part of this issue in working Problem 3.5. In it, you found that the yard sign names (very literally, a nominal variable) could be characterized with a mode, but it would be impossible to try to add up the names and divide by N or to find the median of the names. For an ordinal scale such as class standing in college, either the median or the mode makes sense. The median would probably be sophomore, and the mode would be freshman.
How is the mean misleading if you have skewed distributions?
Even if you have interval or ratio data, the mean may be a misleading choice if the distribution is severely skewed. Here’s a story to illustrate. The developer of Swampy Acres Retirement Homesites is attempting to sell building lots in a southern “paradise” to out-of-state buyers. The marks express concern about flooding. The developer reassures them: “The average elevation of the lots is 78.5 feet and the water level in this area has never, ever exceeded 25 feet.” The developer tells the truth, but this average truth is misleading. The actual lay of the land is shown in Figure 3.1. Now look at Table 3.4, which shows the elevations of the 100 lots arranged in a grouped frequency distribution. (Grouped frequency distributions are explained in Appendix B.)
How do you calculate the mean of a grouped frequency distribution?
To calculate the mean of a grouped frequency distribution, multiply the midpoint of each interval by its frequency, add the products, and divide by the total frequency. Thus, in Table 3.4, µ = ΣfX / N = 7850 / 100 = 78.5 feet. The mean is 78.5 feet, exactly as the developer said. However, only the 20 lots on the bluff are out of the flood zone; the other 80 lots are, on the average, under water. The mean, in this case, is misleading. What about the median? The median of the distribution in Table 3.4 is 12.5 feet, well under the high-water mark, and a better overall descriptor of Swampy Acres Retirement Homesites. The distribution in Table 3.4 is severely skewed, which leads to the big disparity between the mean and the median. Income data present real-life examples of severely skewed distributions. For 2016, the U.S. Census Bureau reported that mean household income was $83,143. The median, however, was $59,039. For distributions that are severely or even moderately skewed, the median is often preferred because it is unaffected by extreme scores.
[Figure 3.1: Elevation of Swampy Acres]
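A minimal sketch of the grouped-mean calculation. The intervals and frequencies are hypothetical and are not the Table 3.4 values; the midpoint is assumed here to be the average of an interval's lower and upper limits.

```python
# Hypothetical grouped frequency distribution: (lower limit, upper limit, frequency).
# These are NOT the Table 3.4 values.
intervals = [
    (0, 9, 30),
    (10, 19, 50),
    (300, 309, 20),
]

total_f = sum(f for _, _, f in intervals)
# Multiply each interval's midpoint by its frequency and add the products.
sum_f_midpoint = sum(((low + high) / 2) * f for low, high, f in intervals)
grouped_mean = sum_f_midpoint / total_f

print(total_f, sum_f_midpoint, grouped_mean)  # 100 6950.0 69.5
```

In this made-up distribution, 80 of the 100 scores fall below 20 while the mean is 69.5, the same kind of disparity as in the Swampy Acres story.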
What are open-ended class intervals? What are the best measures of central tendency for open-ended class intervals?
Even if you have interval or ratio data and the distribution is fairly symmetrical, there is a situation for which you cannot calculate a mean. If the highest interval or the lowest interval of a grouped frequency distribution is open-ended, there is no midpoint and you cannot calculate a mean. Age data are sometimes reported with the oldest being “75 and over.” The U.S. Census Bureau reports household income with the largest category as $200,000 and over. Because there is no midpoint to “75 and over” or to “$200,000 or more,” you cannot calculate a mean. Medians and modes are appropriate measures of central tendency when one or both of the extreme class intervals are open ended. In summary, use the mean if it is appropriate. To follow this advice, you must recognize data for which the mean is not appropriate. Perhaps Table 3.5 will help.
What are the relationships between the median and mean for positive and negative skews?
You can determine skewness by comparing the mean and median (most of the time). If the mean is smaller than the median, expect negative skew. If the mean is larger than the median, expect positive skew. Figure 3.2 shows the relationship of the mean to the median for a negatively skewed distribution (left) and a positively skewed distribution (right) of continuous data.
I’ll illustrate by changing the slightly skewed data in Table 3.1 into more severely skewed data. The original expenditures in Table 3.1 have a mean of $2.90 and a median of $2.75, a difference of just 15 cents. If I add an expenditure of $100.00 to the 14 scores, the mean jumps to $9.37 and the median moves up to $3.00. The difference now is $6.37 rather than 15 cents. This example follows the general rule that the greater the difference between the mean and median, the greater the skew. Note also that the mean of the new distribution ($9.37) is greater than every score in the distribution except for $100.00. For severely skewed distributions, the mean is not a typical score. This rule about the mean/median relationship usually works for continuous data such as SWLS scores and dollars, but it is less trustworthy for discrete data such as the number of adult residents in U.S. households (von Hippel, 2005). One response to severely skewed distributions is to use the median. Another solution, often found in inferential statistics, is the trimmed mean. A trimmed mean is calculated by excluding a certain percentage of the values from each tail of the distribution. Means trimmed by 10% and 20% are common.
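The effect of one extreme score, and the trimmed mean as a response, can be sketched as follows. The scores are hypothetical, and `trimmed_mean` is a helper written for this example; it assumes that the number of scores cut from each tail is rounded down to a whole number.

```python
import statistics

def trimmed_mean(scores, proportion):
    """Mean after dropping `proportion` of the scores from each tail."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be at least 0 and less than 0.5")
    ordered = sorted(scores)
    k = int(len(ordered) * proportion)      # scores cut from each end (rounded down)
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)

# Hypothetical positively skewed scores: one extreme value of 100.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 100]

mean = statistics.mean(scores)        # pulled toward the extreme score
median = statistics.median(scores)    # unaffected by how extreme 100 is
trimmed = trimmed_mean(scores, 0.10)  # drops the lowest and the highest score

print(mean, median, trimmed)  # 12.7 3.0 3.25
```

The mean is larger than the median, as expected for positive skew, and the 10% trimmed mean lands close to the median.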
Which measure of central tendency is appropriate for each scale of measurement?
A mean is appropriate for ratio or interval scale data, but not for ordinal or nominal distributions.
A median is appropriate for ratio, interval, and ordinal scale data, but not for nominal data.
The mode is appropriate for any of the four scales of measurement.
For an ordinal scale such as class standing in college, either the median or the mode makes sense. The median would probably be sophomore, and the mode would be freshman.