Who was Karl Pearson and what did he say the problem with theories was?
founder of statistics
he said the problem with theories is that there was no objective way to confirm whether a theory is right, so he developed a statistical method called the chi-square test to determine whether or not the data fit the theory (you gather 1000 data points and calculate a chi-square test)
How does Ronald Fisher’s technique differ from Pearson’s?
with an experiment and an analysis of variance
he said that you should conduct an experiment that compares two or more conditions.
e.g. you can determine why a field is so variable, like what makes some plots lush and some flimsy
How does Jerzy Neyman’s theory differ from both theories and what is it about?
null hypothesis significance testing
null means no and hypothesis means explanation, so a null hypothesis says there is no relationship between two variables, or no difference between two conditions
it is the most used by many disciplines including anthropology, biology, chemistry, defence strategy, education, forestry, geology, health, immunology, jurisprudence, manufacturing, medicine, neurology, ophthalmology, political science, psychology, sociology, zoology, and others.
the problem with Fisher’s analysis is that it focuses exclusively on finding the difference between two variables, but just because a difference isn’t detected doesn’t mean there isn’t one. it could just mean that the statistical test wasn’t sensitive enough to detect the difference
What did Jacob Cohen say?
NHST has its problems
we should calculate an effect size statistic which will show the size of the differences observed in NHST, not just that there were differences
Who else did not like NHST?
Geoff Cumming; he says to avoid NHST and that we can’t trust p values, but NHST is still very common today
What is the theory put under attack?
null hypothesis significance testing
FOR EX. Cumming says we shouldn’t claim it
new methods have been developed in statistics that do not use NHST, but NHST is still widely used by some researchers and most of the thinking in NHST is required for other approaches
What does statistics mean?
“state numbers”
fact about or summaries of data
referring to a country’s quantifiable political characteristics such as population, taxes and area
What does the field of statistics mean?
the branch of mathematics that deals with collecting, analyzing, interpreting, and presenting data
it ensures that info is presented and used in a way that is most accurate
provides order by giving a set of standardized techniques that are universally understood
What is data?
measurements / observations of a variable
a datum or data point is a single measure / observation
What is a data set?
a collection of measurements and observations
What are the two types of statistics? What do they each mean?
descriptive
inferential
What is descriptive statistics?
number or graph that conveys a particular characteristic of a set of data. basically a number or figure that summarizes or describes what certain data is showing you.
FOR EX. these would be measures of central tendency, graphs like line graphs, histograms, bar graphs and frequency polygons, measures of variability or even the skew of the graph
What is inferential statistics?
techniques used to make inferences about a population from measurements of a sample taken from the larger, unmeasured population
FOR EX. we can infer this population has a certain gene from the sample taken from the population
null hypothesis significance testing is a type of inferential statistics used in stats today
Give an example of descriptive and inferential statistics.
you have two groups of university students and you calculate the average of their GPAs; this would be descriptive statistics
then you notice there is a difference between the averages and try to figure out whether the difference is just sampling error or chance (so there is no real difference) or whether the two groups came from two different populations; this would be inferential statistics
What is a problem with samples?
samples depend partly on the “luck of draw” and chance determines the particular measurements you get
FOR EX. if you were to draw a handful of candy from a candy jar and you got more reds than whites, they might not be representative of the candy jar if the candy jar actually had more whites
if you had measurements for the entire population, chance doesn’t play a part because any difference or variance you see just exists within that population, it isn’t based on how you draw it BUT sometimes variance shown in statistics is actually shown in the population
How does inferential statistics account for this problem with samples?
inferential statistics is a method that takes chance factors into account when samples are used to reach conclusions about populations.
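One way to see how chance can be taken into account is a permutation test, sketched below. This is only an illustration and not the NHST procedure described in these notes; the GPA values and the function name `permutation_p_value` are made up for the example.

```python
import random

def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Estimate how often chance alone gives a mean difference at least
    as large as the observed one (two-sided)."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(group_b)
    extreme = 0
    for _ in range(n_shuffles):
        # Reassign scores to the two groups at random ("luck of the draw").
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float ties
            extreme += 1
    return observed, extreme / n_shuffles

# Hypothetical GPAs for two groups of university students.
gpa_a = [3.1, 3.4, 2.9, 3.8, 3.5, 3.2]
gpa_b = [2.8, 3.0, 2.7, 3.3, 2.9, 3.1]

observed_diff, p_value = permutation_p_value(gpa_a, gpa_b)
print(f"observed difference in means: {observed_diff:.2f}")
print(f"proportion of shuffles at least that large: {p_value:.3f}")
```

A small proportion suggests that chance alone would rarely produce a difference this large; a large proportion suggests the difference could easily be sampling error.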
Give an example of how a discipline used NHST.
Today, there is a lot of evidence that people remember the tasks they fail to complete better than the tasks they complete. This is known as the Zeigarnik effect. Bluma Zeigarnik asked participants in her experiment to do about 20 tasks, such as work a puzzle, make a clay figure, and construct a box from cardboard. For each participant, half the tasks were interrupted before completion. Later, when the participants were asked to recall the tasks they worked on, they listed more of the interrupted tasks (average about 7) than the completed tasks (about 4). One good question to start with is, “Did interrupting make a big difference or a small difference?” In this case, interruption produced about three additional memory items compared to the completion condition. This is a 75% difference, which seems like a big change, given our experience with tests of memory. The question of “How big is the difference?” can often be answered by calculating an effect size index.
we can’t conclude that interruption improves memory yet, you could conduct the experiment again but that would be more expensive. but we can use inferential statistics like NHST
NHST begins with the actual data from the experiment. It ends with a probability—the probability of obtaining data like those actually obtained if it is true that interruption has no effect on memory. If the probability of getting the same result is very small, you can conclude that interruption does affect memory. For Zeigarnik’s data, the probability was tiny. Now for the conclusion. One version might be, “After completing about 20 tasks, memory for interrupted tasks (average about 7) was greater than memory for completed tasks (average about 4). The approximate 75% difference cannot be attributed to chance because chance by itself would rarely produce a difference between two samples as large as this one.”
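The "about three additional items" and "75% difference" figures in this example can be checked with a few lines of arithmetic. This is a minimal sketch using the approximate averages quoted above; the function name `percent_difference` is just an illustrative choice.

```python
def percent_difference(new_mean, baseline_mean):
    """Percent change of new_mean relative to baseline_mean."""
    return (new_mean - baseline_mean) / baseline_mean * 100

interrupted_mean = 7  # approximate average recall, interrupted tasks
completed_mean = 4    # approximate average recall, completed tasks

extra_items = interrupted_mean - completed_mean
pct = percent_difference(interrupted_mean, completed_mean)
print(extra_items, pct)  # 3 75.0
```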
What type of discipline is statistics?
Statistics is a dynamic discipline characterized by more than a little controversy. New techniques in both descriptive and inferential statistics continue to be developed. Controversy continues too, as you saw at the end of our exploration tour.
What is a general example of how statistics is used in a wide variety of fields?
Researchers start with a phenomenon, event, or process that they want to understand better. They make measurements that produce numbers. The numbers are manipulated according to the rules and conventions of statistics. Based on the outcome of the statistical analysis, researchers draw conclusions and then write the story of their new understanding of the phenomenon, event, or process. Statistics is just one tool that researchers use, but it is often an essential tool.
What’s a population?
the people or group of individuals that you, the researcher, wish to study
all measurements of a specified group. The population is the thing of interest. It is defined by the investigator and includes all cases.
What is a sample?
a sample is a subset of a population that is selected to represent the larger group in a study
usually the population is so large or inaccessible that the researcher has to pick a smaller size to represent that population
What’s a parameter and what’s a statistic?
a parameter is a numerical (number) or nominal (name) characteristic of a population
a statistic is some numerical or nominal characteristic of a sample.
What’s the difference between a statistic and parameter?
parameters are constant; statistics are variable
a parameter is constant; it does not change unless the population itself changes. the mean of a population is exactly one number. but you can’t always measure an entire population. so the statistic is used to estimate the parameter. since it is the estimate of the parameter, one sample tends to differ from the other so if you have 5 different samples, you have 5 different sample means.
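The idea that a parameter stays fixed while statistics vary from sample to sample can be seen in a small simulation. This is a sketch under stated assumptions: the population of 10,000 scores is randomly generated for illustration, and the sample size of 25 is arbitrary.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 10,000 scores (made up for illustration).
population = [rng.gauss(100, 15) for _ in range(10_000)]

# The parameter: one fixed number for this population.
parameter = statistics.mean(population)

# Five different samples give five different statistics (sample means).
sample_means = []
for _ in range(5):
    sample = rng.sample(population, 25)
    sample_means.append(statistics.mean(sample))

print(f"population mean (parameter): {parameter:.2f}")
for k, m in enumerate(sample_means, start=1):
    print(f"sample {k} mean (statistic): {m:.2f}")
```

Each sample mean is an estimate of the same parameter, which is why the estimates cluster around it without matching it exactly.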
What is a variable?
a variable is something that exists in more than one amount or in more than one form.
characteristics or conditions that change or have different values for different individuals
a quantity or characteristic that varies
FOR EX. height and eye colour
What are the two ways an independent variable can be defined?
experimentally defined: giving them something to make the two groups distinct
FOR EX. caffeine or no caffeine
naturally defined: the variable occurs naturally and you can’t really change it, so the two groups are already distinct
FOR EX. your age, genetics
you can have both in your study, but a variable can only be either naturally defined or experimentally defined
What is a score?
the result of measuring a variable
What are quantitative variables?
quantitative variables tell you the degree or amount of the thing being measured
most dependent measures in psych but not all
What are the two types of quantitative variables?
continuous (measured) variables are quantitative variables whose scores can be any value or intermediate value over the variable’s possible range
continuous variables have lower and upper limits (the points halfway to the next value, where a measurement would round to a different score): 6.5 is the lower limit and 7.5 is the upper limit of the score of 7. The idea is that recall can be any value between 6.5 and 7.5, but that all the recall values in this range are expressed as 7.
FOR EX. in a similar way, a charge indicator value of 62% on your cell phone stands for all the power values between 61.5% (the lower limit) and 62.5% (the upper limit). sometimes scores are expressed in tenths, hundredths, or thousandths. like integers, these scores have lower and upper limits that extend halfway to the next value on the quantitative scale.
discrete (counting) variables: quantitative variables with no intermediate values; observations can take only a limited set of values, often counts
FOR EX. the number of siblings you have, the number of times you’ve been hospitalized, and how many pairs of shoes you have are examples. intermediate scores such as 2½ just don’t make sense.
What is the lower limit?
bottom of the range of possible values that a measurement on a continuous variable can have.
What is the upper limit?
top of the range of possible values that a measurement on a continuous variable can have.
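The limits of a score extend halfway to the next value on the scale, which can be sketched as a small helper. The function name `real_limits` and the `unit` parameter (the size of one step on the measurement scale) are illustrative assumptions, not terms from the textbook.

```python
def real_limits(score, unit=1.0):
    """Return (lower limit, upper limit) of a score on a continuous variable.

    `unit` is the step between reportable values: 1 for whole numbers,
    0.1 for tenths, and so on.
    """
    half = unit / 2
    return score - half, score + half

print(real_limits(7))   # (6.5, 7.5), the recall score example
print(real_limits(62))  # (61.5, 62.5), the phone charge example
```

For scores reported in tenths, calling `real_limits(score, unit=0.1)` applies the same halfway rule.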
What are categorical variables?
produce scores that differ in kind and not amount. eye colour is a categorical variable.
you can use numbers to represent categories, but that doesn’t mean that it is now quantitative
all categorical variables produce discrete scores, but not all discrete scores are from a categorical variable
FOR EX. your major or political party
Who developed the 4 different scales of measurement and what are they?
S. S. Stevens
nominal
ordinal
interval
ratio
What’s a nominal scale of data?
numbers are used simply as names and have no real quantitative value.
numerals on sports uniforms are an example. thus, 45 is different from 32, but that is all you can say
What is an ordinal scale?
has the characteristic of the nominal scale (different numbers mean different things) plus the characteristic of indicating greater than or less than.
FOR EX. the object with the number 3 has less or more of something than the object with the number 5.
but for ranking like 1st and 3rd, differences can’t be measured
What is the interval scale?
which has the properties of both the nominal and ordinal scales plus the additional property that intervals between the numbers are equal.
“equal interval” means that the distance between the things represented by 2 and 3 is the same as the distance between the things represented by 3 and 4. Temperature is measured on an interval scale. The difference in temperature between 10°C and 20°C is the same as the difference between 40°C and 50°C, but you can’t make ratio statements like 100°C is twice as hot as 50°C
What is a ratio scale?
has all the characteristics of the nominal, ordinal, and interval scales plus one other: It has a true zero point, which indicates a complete absence of the thing measured. On a ratio scale, zero means “none.”
height, weight, and time are measured with ratio scales. zero height, zero weight, and zero time mean that no amount of these variables is present. With a true zero point, you can make ratio statements such as 16 kilograms is four times heavier than 4 kilograms.
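A quick way to see why ratio statements need a true zero is to change units and check whether the ratio survives. The sketch below is an illustration only: the Fahrenheit conversion and the approximate kilogram-to-pound factor are standard facts brought in for the example, not part of the original notes.

```python
import math

def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def kg_to_lb(kg):
    return kg * 2.20462  # approximate conversion factor

# Interval scale: the ratio depends on the arbitrary zero point.
ratio_c = 100 / 50                                                # 2.0
ratio_f = celsius_to_fahrenheit(100) / celsius_to_fahrenheit(50)  # 212 / 122

# Ratio scale: the ratio is the same in any unit.
ratio_kg = 16 / 4
ratio_lb = kg_to_lb(16) / kg_to_lb(4)

print(f"temperature ratio in C: {ratio_c:.2f}, in F: {ratio_f:.2f}")
print(f"weight ratio in kg: {ratio_kg:.2f}, in lb: {ratio_lb:.2f}")
```

The temperature ratio changes with the unit, so "twice as hot" is not meaningful on an interval scale, while 16 kilograms stays four times as heavy as 4 kilograms in any unit.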
Can descriptive statistics be computed for numbers that are only labels?
Yes
FOR EX. you can calculate the mean of SIN numbers, telephone numbers, or number-coded psychological diagnoses, but the result is not meaningful because these numbers are categorical.
even words/letters can be inferential / numerical.
What approach does the textbook take? What’s the difference?
Statistics viewpoint with some of the experimental design viewpoint
A statistics viewpoint fails to account for the other things that could happen in a problem.
What does an independent variable mean? What is the value of the independent variable called?
variable controlled by the researcher; changes in this variable may produce changes in the dependent variable.
a level
usually categorical, but not always
What does a dependent variable mean? What is the value of the dependent variable called?
observed variable that is expected to change as a result of changes in the independent variable in an experiment
a score or measurement is the value of the dependent variable (a treatment is a level of the independent variable)
aka the response or outcome variable
usually quantitative, but not always
How do the independent and dependent variables work in research?
The basic idea is that the researcher finds or creates two groups of participants that are similar except for the independent variable. These individuals are measured on the dependent variable. The question is whether the data will allow the experimenter to claim that the values on the dependent variable depend on the level of the independent variable.
What is usually the dependent variable in experiments?
the values of the dependent variable are found by measuring or observing participants in the investigation.
the dependent variable might be scores on a personality test, number of items remembered, or whether or not a passerby offered assistance.
What is usually the independent variable in experiments?
for the independent variable, the two groups might have been selected because they were already different—in age, gender, personality, and so forth. Alternatively, the experimenter might have produced the difference in the two groups by an experimental manipulation such as creating different amounts of anxiety or providing different levels of practice.
What is an example of an experiment in dependent and independent variables?
Suppose for a moment that as a budding gourmet cook you want to improve your spaghetti sauce. One of your buddies suggests adding marjoram. To investigate, you serve spaghetti sauce at two different gatherings. For one group of guests, the sauce is spiced with marjoram; for the other it is not. At both gatherings, you count the number of favorable comments about the spaghetti sauce. Stop reading; identify the independent and the dependent variables.
The dependent variable is the number of favorable comments, which is a measure of the taste of the sauce. The independent variable is marjoram, which has two levels: present and absent
What is an extraneous variable and how does it relate to our previous example?
variable other than the independent variable that may affect the dependent variable
extraneous variables include the amount and quality of the other ingredients in the sauce, the spaghetti itself, the “party moods” of the two groups, and how hungry everyone was.
If any of these extraneous variables was actually operating, it weakens the claim that a difference in the comments about the sauce is the result of the presence or absence of marjoram.
How do you remove an extraneous variable?
The simplest way to remove an extraneous variable is to be sure that all participants are equal on that variable. For example, you can ensure that the sauces are the same except for marjoram by mixing up the ingredients, dividing it into two batches, adding marjoram to one batch but not the other, and then cooking. The “party moods” variable can be controlled (equalized) by conducting the taste test in a laboratory. Controlling extraneous variables is a complex topic covered in courses that focus on research methods and experimental design. In many experiments, it is impossible or impractical to control all the extraneous variables. Sometimes researchers think they have controlled them all, only to find that they did not. The effect of an uncontrolled extraneous variable is to prevent a simple cause-and-effect conclusion. Even so, if the dependent variable changes when the independent variable changes, something is going on. In this case, researchers can say that the two variables are related, but that other variables may play a part, too.
Summarize the relationships between statistics and experimental design.
Researchers suspect that there is a relationship between two variables. They design and conduct an experiment; that is, they choose the levels of the independent variable (treatments), control the extraneous variables, and then measure the participants on the dependent variable. The measurements (data) are analyzed using statistical procedures. Finally, the researcher tells a story that is consistent with the results obtained and the procedures used.
What is epistemology? What is statistics’ place in it?
The study or theory of the nature of knowledge.
both reason and experience are the ways to acquire knowledge and the basis of mathematics is reason. mathematics starts with axioms that are assumed to be true and then theorems are thought up and are then proved by giving axioms as reasons. once a theorem is proved, it can be used as a reason in a proof of other theorems.
statistics has its foundations in mathematics, so a statistical analysis is based on reason (for example, calculating the mean requires only reason); experimental design is different because it also includes experience and observation
A very common task of most human beings can be described as trying to understand. Statistics has helped many in their search for better understanding, and it is such people who have recommended (or demanded) that statistics be taught in school. A reasonable expectation is that you, too, will find statistics useful in your future efforts to understand and persuade. Speaking of persuasion, you have probably heard it said, “You can prove anything with statistics.” The implied message is that a conclusion based on statistics is suspect because statistical methods are unreliable. Well, it just isn’t true that statistical methods are unreliable, but it is true that people can misuse statistics (just as any tool can be misused). One of the great advantages of studying statistics is that you get better at recognizing statistics that are used improperly.
What are the 4 levels of statistical sophistication?
Category 1—those who understand statistical presentations
Category 2—those who understand, select, and apply statistical procedures
Category 3—applied statisticians who help others use statistics
Category 4—mathematical statisticians who develop new statistical techniques and discover new characteristics of old techniques
What are the 3 steps to analyze a data set?
The first step is exploratory. Read all the information and examine the data. Calculate descriptive statistics and focus on the differences that are revealed. In this textbook, descriptive statistics are emphasized in Chapters 2 through 6 and include graphs, means, and effect size indexes. Calculating descriptive statistics helps you develop preliminary ideas for your story (Step 3).
The second step is to answer the question, What are the effects that chance could have on the descriptive statistics I calculated? An answer requires inferential statistics (Chapter 7 through Chapter 15).
The third step is to write the story the data reveal. Incorporate the descriptive and inferential statistics to support the conclusions in the story. Of course, the skills you’ve learned and taught yourself about composition will be helpful as you compose and write your story. Don’t worry about length; most good statistical stories about simple data sets can be told in one paragraph. Write your story using journal style, which is quite different from textbook style. Textbook style, at least this textbook, is chatty, redundant, and laced with footnotes.
What is a raw score?
score obtained by observation or from an experiment.
FOR EX. the answers students give when you hand them a questionnaire
What is a simple frequency distribution?
scores arranged from highest to lowest, each with its frequency of occurrence
an ordered arrangement that shows the frequency of each score
the goal is to organize data to communicate how many observations exist at each category on the scale of measurement
it can be either in a graph or table
What do the symbols in a frequency distribution mean?
The generic name for all variables is X, which is the symbol used in formulas.
The Frequency (f) column shows how frequently a score occurred. The tally marks are used when you construct a rough-draft version of a table and are not usually included in the final form.
N is the number of scores and is found by summing the numbers in the f column / by adding up all the frequencies.
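The X, f, and N columns described above can be built with a short script. This is a minimal sketch; the scores are hypothetical and the function name `simple_frequency_distribution` is only an illustrative choice.

```python
from collections import Counter

def simple_frequency_distribution(scores):
    """Return (X, f) pairs ordered from the highest score to the lowest."""
    counts = Counter(scores)
    return [(x, counts[x]) for x in sorted(counts, reverse=True)]

# Hypothetical raw scores.
scores = [7, 5, 7, 6, 4, 7, 5, 6, 7, 3]

table = simple_frequency_distribution(scores)
N = sum(f for _, f in table)  # N is the sum of the f column

print(" X   f")
for x, f in table:
    print(f"{x:2d}  {f:2d}")
print(f"N = {N}")
```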
What does a formal presentation of a frequency distribution table not have?
tally marks or zero-frequency scores
this goes for both simple and grouped frequencies, but for formal grouped distributions zero frequency intervals are included if they are within the range of distribution
What is a grouped frequency distribution?
scores compiled into intervals of equal size. includes the frequency of scores in each interval.
raw data are usually condensed into a grouped frequency distribution when researchers want to present the data as a graph or as a table.
grouped frequency distributions are usually used when there are more than 20 different X values.
What are class intervals?
a range of scores in a grouped frequency distribution.
in Table 2.4, the entire range of scores, from 35 to 5, is reduced to 11 class intervals. each interval covers three scores; the symbol i indicates the size of the interval. In Table 2.4, i = 3.
What’s the midpoint?
The midpoint of the interval is noteworthy because it represents all the scores in that interval.
For example, five students had scores of 15, 16, or 17. The midpoint of the class interval 15–17 is 16.
The midpoint, 16, represents all five scores. There are no scores in the interval 6–8, but zero-frequency intervals are included in formal grouped frequency distributions if they are within the range of the distribution.
What do class intervals have?
Class intervals have lower and upper limits, much like simple scores obtained by measuring a continuous variable. A class interval of 15–17 has a lower limit of 14.5 and an upper limit of 17.5.
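Class intervals, midpoints, and limits can be computed together, as in the sketch below. The scores are hypothetical (chosen so that five scores fall in 15–17 and none in 6–8), and the function name and its `lowest` and `i` parameters are illustrative assumptions. It assumes whole-number scores.

```python
def grouped_frequency_distribution(scores, lowest, i):
    """Group whole-number scores into class intervals of size i.

    `lowest` is the bottom score of the lowest class interval.
    Zero-frequency intervals inside the range are kept.
    """
    rows = []
    low = lowest
    while low <= max(scores):
        high = low + i - 1
        f = sum(1 for x in scores if low <= x <= high)
        rows.append({
            "interval": (low, high),
            "midpoint": (low + high) / 2,
            "lower_limit": low - 0.5,
            "upper_limit": high + 0.5,
            "f": f,
        })
        low += i
    return rows[::-1]  # highest interval first

# Hypothetical scores.
scores = [15, 16, 16, 17, 15, 12, 13, 9, 10, 5]
rows = grouped_frequency_distribution(scores, lowest=3, i=3)

for r in rows:
    lo, hi = r["interval"]
    print(f"{lo:2d}-{hi:2d}  midpoint={r['midpoint']:4.1f}  "
          f"limits={r['lower_limit']}-{r['upper_limit']}  f={r['f']}")
```

The top row is the interval 15–17 with midpoint 16, limits 14.5 and 17.5, and a frequency of 5; the 6–8 interval appears with a frequency of 0.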
What is the difference between grouped frequency distributions and simple frequency distributions?
class intervals
What are graphics?
Pictures that present statistical data
What are the horizontal axis and vertical axis?
horizontal axis - x-axis (abscissa)
vertical axis - y-axis (ordinate)
What do frequency distribution graphs present?
an entire set of observations of a sample or a population
What are the 3 ways to present frequency distribution?
frequency polygons
histograms
bar graphs
If the variable being graphed is a continuous variable, use a ____
frequency polygon or histogram
If the variable is categorical or discrete, use ____
a bar graph
What is a frequency polygon and what does it represent?
used to graph continuous variables (on the x-axis); frequency polygons are closed at both ends
each point represents two numbers: the class midpoint directly below it on the x-axis and the frequency of that class directly across from it on the y-axis (FOR EX. for the class interval 33–35, the midpoint 34 is used to show the frequency of all 3 scores).
in cases where the lowest score in the distribution is well above zero, it is conventional to replace numbers smaller than the lowest score on the x-axis with a slash mark, which indicates that the scale to the origin is not continuous.
What is a histogram?
a histogram is another graphing technique that is appropriate for continuous variables.
A histogram is constructed by raising bars from the x-axis to the appropriate frequencies on the y-axis. The lines that separate the bars intersect the x-axis at the lower and upper limits of the class intervals.
How can you decide between using a frequency polygon or a histogram?
If you are displaying two overlapping distributions on the same axes, a frequency polygon is less cluttered and easier to comprehend than a histogram. However, it is easier to read frequencies from a histogram.
What is a bar graph?
a bar graph is used to present the frequencies of categorical variables and discrete variables
What’s the difference between a conventional bar graph and a histogram? Can a bar graph be ordinal?
A conventional bar graph looks like a histogram except it has wider spaces between the bars. The space is a signal that the variable is not continuous. Conventionally, bar graphs have the name of the variable being graphed on the x-axis and frequency on the y-axis. If an ordinal scale variable is being graphed, the order of the values on the x-axis follows the order of the variable. If, however, the variable is a nominal scale variable, then any order on the x-axis is permissible. Alphabetizing might be best. Other considerations may lead to some other order.
When is a distribution’s shape meaningful?
For continuous variables and ordered categorical variables, a distribution’s shape is meaningful. For unordered categorical distributions, however, whatever shape the distribution has is arbitrary.
What’s a normal distribution (normal curve)?
A mathematically defined, theoretical distribution or a graph of observed scores with a particular shape.
What is a rectangular distribution (uniform distribution)?
A rectangular distribution (also called a uniform distribution) is a symmetrical distribution that occurs when the frequency of each value on the x-axis is the same.
What are skewed distributions?
In some distributions, the scores that occur most frequently are near one end of the scale, which leaves few scores at the other end. Such distributions are skewed. Skewed distributions, like a skewer, have one end that is thin and narrow.
What’s a positive skew?
On graphs, if the thin point is to the right (the positive direction), the curve has a positive skew.
What is a negative skew?
If the thin point is to the left, the curve is negatively skewed.
What are bimodal distributions?
a graph with two distinct humps is called a bimodal distribution
distributions with two modes
the two humps don’t have to be the same height; if two high-frequency scores are separated by scores with lower frequency, then bimodal is appropriate
What’s the graph most frequently used by scientists?
line graphs
graph that uses lines to show the relationship between two variables.
What’s the serial position effect?
when you are more likely to remember words in the beginning or end instead of in the middle
What’s the difference between line graphs and frequency polygons?
Line graphs typically do not connect to the x-axis. With frequency polygons, the connection to the x-axis is a data point—no one received that score. Connecting a line graph to the x-axis says there is a data point there. Don’t do it unless the point actually represents data.
What are the three features every distribution has?
form
central tendency
variability
they are all independent of each other
What is central tendency?
descriptive statistics that indicate a typical or representative score
defined as a typical score or a representative score, measures of central tendency are almost a necessity for exploring a set of data.
used by themselves, however, they do not provide any information about form or variability.
What are the 3 measures of central tendency covered in the textbook?
mean
median
mode
What is the mean?
the arithmetic average; the sum of the scores divided by the number of scores
the symbol for the mean of a sample is X̄ (pronounced “mean” or “X-bar”). The symbol for the mean of a population is µ (a Greek letter, mu, pronounced “mew”). Of course, an X̄ is only one of many possible means from a population. Because other samples from that same population produce somewhat different X̄s, a degree of uncertainty goes with X̄.
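The definition of the mean (the sum of the scores divided by the number of scores) translates directly into code. This is a minimal sketch with hypothetical scores; the variable name `x_bar` stands in for the sample mean symbol.

```python
def mean(scores):
    """Arithmetic average: the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

# Hypothetical sample of five scores.
sample = [4, 7, 5, 8, 6]
x_bar = mean(sample)  # 30 / 5

print(x_bar)  # 6.0
```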