What is the difference between sampling bias and sample error?
SAMPLE ERROR
Differences that invariably exist between population parameters and sample statistics
Describes the natural differences that arise from sampling (samples will never 100% represent population)
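SAMPLE BIAS
A systematic error caused by how the sample is selected (some members of the population are more likely to be chosen than others), so the sample consistently misrepresents the population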
What is a variable?
A characteristic or condition
That changes or has different values for different individuals in a data set
What is a margin of error?
Estimated range of differences between statistic and parameter
What is the difference between discrete variables and continuous variables? Give examples.
DISCRETE VARIABLES
Countable
Whole numbers only
Nothing in-between
Only certain numbers on an interval
5 means 5
EXAMPLES = number of students in a class, number of items sold per day, number of players on a team, number of shoes in a closet
CONTINUOUS VARIABLES
Infinite
Decimal
Always something in-between
Any value at any point along an interval
5 could mean 5.1161, 5.03859, or 5.4224736
EXAMPLES = height, weight, time, temperature
What is a random sample?
each item has an equal chance of being chosen
What is the difference between a statistic and a parameter? Give examples.
statistic = used to describe a sample
parameter = used to describe a population
What is the difference between a sample and a population? Give examples.
sample = subset of a population that represents it
population = the entire group you’re studying
Why can’t you randomly select individuals from a population?
selection bias
self-reporting —> selection bias
What is the difference between descriptive and inferential statistics? Give examples of each.
DESCRIPTIVE STATISTICS
Summarize, organize, and simplify data about a sample
INFERENTIAL STATISTICS
Give us generalizable information about a population (because we usually have to estimate and infer population information from smaller sample information)
What are the different types of study methods?
Descriptive
Correlational
Comparative
Experimental
Nonexperimental
Pre-Post Study
What is the difference between an extraneous variable and a confounding variable?
EXTRANEOUS VARIABLE
Any variable that you're not investigating that can potentially affect the DV of your research study
Shouldn’t differ significantly between groups
EXAMPLE = participant characteristics (e.g., gender, age, race, class) and environmental factors (e.g., time of day, location, lighting)
CONFOUNDING VARIABLE
A type of extraneous variable that not only affects the DV, but is also related to the IV
EXAMPLE =
What is a construct?
a variable that cannot be directly observed
EXAMPLES = childhood, gender, nationality, loneliness
When examining the relationship between two variables, what are two reasons why there may be a difference? How do you determine whether or not this difference is significant?
1) There is a real difference between group scores that is attributable to the differences in group conditions
2) The difference is due to chance (idiosyncratic fluctuations in scores) and just an artifact of sampling error
Inferential statistics determine whether the difference is significant (larger than would be expected by chance)
What are three ways to ensure that participant characteristics don’t differ (control other variables)?
Random assignment
Each participant has an equal chance of being assigned to each of the treatment conditions
Distributes the participant characteristics evenly between the two groups so that neither group is noticeably more [insert characteristic] than the other
Matching characteristics in different groups
Ensures groups are equivalent in terms of participant variables and environmental variables
Accounts for any preexisting differences between participants
EXAMPLE = researcher could match groups by ensuring that each group has exactly 60% females and 40% males
Holding variables constant
EXAMPLE = The researcher in (Polman et al., 2008) used only 10-year-old boys as participants (holding age and gender constant).
The experimental method has what two characteristics that differentiate experiments from other types of research studies?
MANIPULATION
One variable (IV) must be manipulated to create treatment conditions
Another variable (DV) must be measured/observed to obtain scores for each condition and determine a cause-and-effect relationship
CONTROL
Extraneous variables (participant and environmental) must be controlled so they don’t affect the relationship being studied
What statistical tests are used in a correlational study?
Numerical = correlations
Categorical = chi-square
What is a pre-post study?
What is a nonexperimental study?
Examine relationships between variables by comparing groups of scores, BUT they do not have the rigor of true experiments and cannot produce cause-and-effect explanations
Uses a preexisting participant characteristic (such as older/younger) or the passage of time (before/after) to create the groups being compared
Can be within participants (pre-post study) or between participants (groups that naturally exist/nonequivalent groups)
Measure variables as they naturally occur without any further manipulation
What is a quasi-experimental study?
What are the different types of non-experimental study methods?
Descriptive/Qualitative
Observational
Case Study
Survey
Correlational
Cross-sectional
In what scenario(s) would you be unable to conduct an experiment?
1) Unethical methodology
2) Unable to reliably and feasibly measure the variable
What are scores? What symbol(s) represent(s) them?
Values that result from variable measurement
Represented by X or Y
What is a frequency distribution? What does it include?
A way to organize and summarize scores in a data set
A summary that is organized by the number of scores at each value
Includes
Set of categories from the original scale of measurement
Frequency (number) of individuals in/with each category/score
What is proportion (p)? State the equation.
The fraction of the total group that is associated with each score
p = f/N
score frequency divided by total number of scores
Can also be expressed as a percentage (% = p × 100)
What is the difference between a percentile rank and a percentile? How are they similar?
Both are used to describe the position of individual scores within a distribution
Percentile = refers to score
Percentile rank = refers to a percentage (of scores at or below a certain score)
EXAMPLE
You score a 1450 on the SAT, and you know that exactly 97% of all test takers had scores of 1450 or lower.
Percentile = 1450 (the score at the 97th percentile)
Percentile rank = 97%
What is a percentile rank?
The percentage of individuals with scores AT or BELOW the particular value in a distribution
AKA cumulative percentage (c%)
Gives the cumulative percentage associated with a particular score
lower limit
The boundary halfway between the reported value and the next value down; anything from this point up rounds UP to the reported value
EXAMPLE = 5 ft and 6 inches
Lower limit = 5 feet and 5.5 inches
upper limit
The boundary halfway between the reported value and the next value up; anything just below this point rounds DOWN to the reported value
EXAMPLE = 5 ft and 6 inches
Upper limit = 5 feet and 6.5 inches
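A minimal Python sketch of real limits, assuming the value was measured to the nearest whole unit (here inches, so 5 ft 6 in = 66 in). The helper name `real_limits` is hypothetical; it just places each limit half a measurement unit away from the reported value.

```python
def real_limits(value, unit=1.0):
    """Return (lower, upper) real limits: half a measurement unit below and above value."""
    half = unit / 2
    return value - half, value + half

# 5 ft 6 in measured to the nearest inch is 66 inches
lower, upper = real_limits(66)
print(lower, upper)  # 65.5 66.5  (5 ft 5.5 in and 5 ft 6.5 in)
```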
What are bar graphs used for?
non-continuous data (i.e., nominal or ordinal data, or numerical but discrete variables)
What are six common things in a frequency distribution table?
Scores (X)
Frequency (f)
Proportion (p)
Percentage (%)
Cumulative frequency (cf)
Cumulative percentage (c%)
What is the cumulative frequency (cf)?
Refers to the NUMBER of elements at or below a score in a distribution
Can be used to calculate the cumulative percentage (c%) or percentile rank
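A small Python sketch that builds the six frequency-table columns (X, f, p, %, cf, c%) from raw scores. The data are made up for illustration, and scores are listed from highest to lowest, a common layout for these tables. The function name `frequency_table` is hypothetical.

```python
from collections import Counter

def frequency_table(scores):
    """Return rows of (X, f, p, %, cf, c%), listing scores from highest to lowest."""
    n = len(scores)
    counts = Counter(scores)
    rows = []
    for x in sorted(counts, reverse=True):
        f = counts[x]
        p = f / n                                                  # proportion: p = f/N
        cf = sum(c for score, c in counts.items() if score <= x)  # number of scores at or below X
        rows.append((x, f, p, p * 100, cf, cf / n * 100))         # c% = cumulative percentage
    return rows

scores = [5, 4, 4, 3, 3, 3, 2, 1]  # hypothetical data, N = 8
print("X  f  p      %      cf  c%")
for x, f, p, pct, cf, cpct in frequency_table(scores):
    print(f"{x}  {f}  {p:.3f}  {pct:5.1f}  {cf}   {cpct:5.1f}")
```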
What is a skewed distribution? What are the two types? Give examples.
The scores tend to pile up toward one end of the scale and taper off gradually at the other end
Tail points in the direction of the skew
Positively skewed = scores pile up at the LOW end and the tail points toward the high (positive) end
EXAMPLE = hard exam → more low scores → positively skewed
Negatively skewed = scores pile up at the HIGH end and the tail points toward the low (negative) end
EXAMPLE = easy exam → more high scores → negatively skewed
What types of graph(s) are/is used for nominal or ordinal data?
Bar graphs
What types of graphs are used for population distribution?
The following are used when presented with large amounts of data:
Relative frequency graph
X-axis = category
Y-axis = percent
Smooth curves
X-axis = scores
Y-axis = frequency
What are three characteristics that can thoroughly describe any frequency distribution?
shape (symmetrical or skewed)
smooth curves
central tendency (mean/median/mode)
representative statistic (where the center of the distribution is located) → usually the average
variability (variation/SD)
extent of spread/clustering of data
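What is a symmetrical distribution?
A distribution where you can draw a vertical line through the middle so that one side is a mirror image of the other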
How do you graph continuous and discrete variables on a histogram?
CONTINUOUS
bars are touching
bar width corresponds to real limits of variable for continuous variables
DISCRETE
bars are separate
When are smooth curves used? What do they show? Explain axes.
When there’s a LOT of data (e.g., a population distribution)
Used to indicate that the graph does not show exact frequencies for specific scores
Shows where scores generally fall in a population
Axes
X = scores (X)
Y = frequencies (f)
Polygon
A dot appears where the center of the top of each bar would be in a histogram
corresponds to the frequency for the category
A continuous line is drawn from dot to dot to connect the series of dots
The graph is completed by drawing a line down to the X-axis (zero frequency) at each end of the range of scores. The final lines are usually drawn so that they reach the X-axis at a point that is one category below the lowest score on the left side and one category above the highest score on the right side.
What types of graphs can be used for interval or ratio data?
histograms
informal histograms
polygons
What is a stem and leaf plot?
A simple alternative to a grouped frequency distribution table or graph
Each score is separated into two parts: a stem and a leaf
Stem = First digit (or digits)
Leaf = Last digit
EXAMPLE = If X = 85 → stem = 8 and leaf = 5
Gives us similar information to graphs at first glance
Every individual score can be identified
Also gives us information about each score in the data set
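A minimal Python sketch of a stem and leaf plot, assuming non-negative whole-number scores: `divmod(x, 10)` splits each score into its stem (all digits but the last) and its leaf (the last digit). The scores are hypothetical, and stems with no leaves are simply skipped in this sketch.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Map each stem to its sorted list of leaves."""
    plot = defaultdict(list)
    for x in sorted(scores):
        stem, leaf = divmod(x, 10)  # 85 -> stem 8, leaf 5
        plot[stem].append(leaf)
    return dict(plot)

scores = [85, 72, 91, 78, 85, 63, 88, 74]  # hypothetical exam scores
for stem, leaves in sorted(stem_and_leaf(scores).items()):
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Every original score can still be read back from the plot (e.g., stem 7 with leaves 2 4 8 means 72, 74, 78).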
A data set has the following statistics:
mean = 89
median = 66
mode = 43
Is the distribution of this data set symmetrical or skewed? If skewed, is it positively or negatively skewed?
positively skewed
A data set has the following statistics:
mean = 29
median = 32
mode = 41
Is the distribution of this data set symmetrical or skewed? If skewed, is it positively or negatively skewed?
negatively skewed
What is the purpose of central tendency? What are the three measures of central tendency?
To determine the single value that identifies the center of the distribution and best represents the entire set of scores
Mean, median, mode
What are two reasons that may explain a difference between two groups?
1) Error (chance/sampling error)
2) A real (significant) difference
What is a nominal scale? Give examples.
Consists of categories that differ only in NAME and are not differentiated in terms of magnitude or direction
Label and categorize
No quantitative distinctions
Numbers have no ‘real meaning’ other than differentiating between objects
EXAMPLES
GENDER: Females = 1, Males = 2
UNIFORM NUMBERS: The numbers on the uniforms identify the different players and don’t provide insight into their positions or how skilled they are
Others: diagnosis, experimental or control
What is an ordinal scale? Give examples.
The categories are differentiated in terms of direction, forming an ordered series
Categorizes observations
Categories are organized by size or magnitude
Assigns numbers to categories, but there is a meaningful order (indicates placement)
Doesn’t indicate differences in magnitude
EXAMPLES
Rank in class (1st, 2nd, 3rd)
Level of satisfaction (e.g., on a survey)
Placement in an Election (1st, 2nd, 3rd)
Clothing sizes (S, M, L, XL)
Olympic medals (gold, silver, bronze)
What is an interval scale?
Consists of an ordered series of categories that are all equal-sized intervals
Possible to differentiate name, direction and distance (magnitude differences) between categories
Labeled categories (have names)
Ordered categories (numbers have order)
Interval between categories of equal size
Arbitrary or absent zero point → ratios are not meaningful
EXAMPLES
Temperature = difference between 79°C and 80°C is the same as that between 40°C and 41°C
80°C is higher than 79°C and 41°C is higher than 40°C
No true zero point → 0°C does not mean “no temperature” → ratios are NOT meaningful
80°C is NOT twice as hot as 40°C
IQ = difference between 100 and 110 is the same as that between 140 and 150
110 is higher than 100 and 150 is higher than 140
What is a ratio scale?
An interval scale for which the zero point indicates none of the variable being measured
Labeled categories (differentiated by name)
Ordered categories
Order has meaning
Equal interval between categories
Magnitude differences are meaningful
Absolute/true zero point → ratios are meaningful
EXAMPLES
Number of correct answers
Time to complete task
Height
Weight
What additional information is obtained from measurements on an ordinal scale compared to measurements on a nominal scale?
Direction of difference (greater or less) between two measurements (order has meaning)
You know the placement/position of each category relative to the others
EXAMPLE = runners in a race
1st place > 2nd place > 3rd place (ranked by finishing time)
EXAMPLE = ranking movie satisfaction on a scale from 1-5
1 = that shi sucks ass booty cheeks; im burning down any movie theaters that show it
5 = im giving the director/producer head fr
What additional information is obtained from measurements on an interval scale compared to measurements on an ordinal scale?
Magnitude of Difference b/w two measurements
You can know the numerical difference between each category b/c intervals are equal
No true zero → can only add/subtract values
Answers: “this movie got more votes/is more popular than the other (ordinal), but by how much?”
EXAMPLE = how hot you are (temperature)
Bill Nye is hotter than Michael B. Jordan by 400,000°C and hotter than Henry Cavill by 420,000°C
Bill Nye = 500,000°C (GYATT)
Michael B. Jordan = 100,000°C
Henry Cavill = 80,000°C
Nominal → names (bill, michael, henry)
Ordinal → bill > michael > henry
What additional information is obtained from measurements on a ratio scale compared to measurements on an interval scale?
True/Absolute Zero Point → can calculate ratios
The ratio of two measurements, which allow comparisons such as “twice as much.”
EXAMPLE = how many people you can pull in a semester
Out of all three people (Jack, Ronald, Colin), Ronald pulled 3x more people than Jack
Ronald = pulled 12 people (mostly because his name is so so hot) in a semester
Jack = pulled 4 bitchez
Colin = got ZERO bitchez this semester
Nominal → names (jack, ronald, colin)
Ordinal → ronald > jack > colin
Interval → ronald pulled 8 more ppl than jack and 12 ppl more than colin
Describe the data for a correlational research study and explain how these data are different from the data obtained in experimental and nonexperimental studies, which also evaluate relationships between two variables.
A correlational study has only one group of individuals and measures two (or more) different variables for each individual.
Other research methods evaluating relationships between variables compare two (or more) different groups of scores.
The results of a recent study showed that children who routinely drank reduced fat milk (1% or skim) were more likely to be overweight or obese at ages 2 and 4 compared to children who drank whole or 2% milk (Scharf, Demmer, & DeBoer, 2013).
a. Is this an example of an experimental or a nonexperimental study?
b. Explain how individual differences could provide an alternative explanation for the difference in weight between the groups.
c. Create a research study that would be able to differentiate among those interpretations of the results.
A
Nonexperimental
No IV is manipulated
Participants are not randomly assigned to groups that receive different amounts of milkfat
B
Individual differences between the groups (e.g., lifestyle, genetics, and home life) could affect weight (the DV) regardless of the type of milk the children drank
EXAMPLE = participants in the reduced milkfat (skim or 1% milk) group (regularly drank reduced-fat milk) also tended to be more sedentary
C
1 = A researcher could randomly assign participants to groups that receive different amounts of milkfat
2 = A researcher could assign participants to two groups that receive different amounts of milkfat
Holding characteristics constant (e.g., the amount of physical activity by participants in each group)
3 = A researcher could assign participants to two groups that receive different amounts of milkfat
Matching the two groups in the amount of physical activity
A tax form asks people to identify their age, annual income, number of dependents, and social security number.
For each of these four variables, identify the scale of measurement that probably is used and identify whether the variable is continuous or discrete.
Age: ratio scale and continuous
Although people usually report whole-number years, the variable is the amount of time and time is infinitely divisible
Income: ratio scale and discrete
Income is determined by units of currency. For U.S. dollars, the smallest unit is the penny and there are no intermediate values between 1 cent and 2 cents.
Dependents: ratio scale and discrete
Family size consists of whole-number categories with no intermediate values.
Social Security: nominal scale and discrete
Social security numbers are essentially names that are coded as 9-digit numbers. There are no intermediate values between two consecutive social security numbers.
Ackerman and Goldsmith (2011) compared learning performance for students who studied material printed on paper versus students who studied the same material presented on a computer screen. All students were then given a test on the material and the researchers recorded the number of correct answers.
a. Identify the dependent variable for this study.
b. Is the dependent variable discrete or continuous?
c. What scale of measurement (nominal, ordinal, interval, or ratio) is used to measure the dependent variable?
a. Number of correct answers on the test (a measure of knowledge of the material)
b. Knowledge is a continuous variable. If it is measured with a 10-question test, it may appear to be discrete but it could be measured with a 100-question test, which means that each category can be further divided.
c. Ratio scale (has absolute zero = complete absence of correct answers)
Doebel and Munakata (2018) discovered that delay of gratification by children is influenced by social context. All children were told that they were in the “green group” and were placed in a room with a single marshmallow. Participants were told that they could either eat the single marshmallow now or wait for the experimenter to return with two marshmallows. Before choosing between one marshmallow now or two later, children were randomly assigned to one of two conditions. They were told that either (1) other children in the green group waited and kids in the orange group didn’t wait or (2) other children in the green group didn’t wait and kids in the orange group waited. Children were more likely to choose to wait after being told that other members of their group waited.
a. Did this study use experimental or nonexperimental methods?
b. Identify the variables in this study.
A
Experimental method = participants were randomly assigned to groups that received different instructions
B
IV = instructions received by participants (being told that their group waited and the other didn’t vs. being told that their group didn’t wait and the other group waited)
DV = behavior (whether or not children chose to wait for a larger reward)
How does the median and mean differ in terms of their “middle” definition?
median = 50th percentile (by score)
mean = midpoint (by distance)
multimodal distribution
multiple modes
bimodal distribution
two modes
trimodal distribution
three modes
How do you calculate weighted mean?
(total sum of scores from both data sets) divided by (total items from both data sets)
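A minimal Python sketch of the weighted mean, assuming you know each sample's n and M, so each sample's total is ΣX = n × M. The sample sizes and means below are hypothetical, and `weighted_mean` is an illustrative name.

```python
def weighted_mean(samples):
    """Combine (n, M) pairs: total sum of scores divided by total number of scores."""
    total_sum = sum(n * m for n, m in samples)  # each sample's sum is n * M
    total_n = sum(n for n, _ in samples)
    return total_sum / total_n

# hypothetical: sample 1 has n = 12, M = 6; sample 2 has n = 8, M = 7
print(weighted_mean([(12, 6), (8, 7)]))  # (72 + 56) / 20 = 6.4
```

The result sits closer to 6 than to 7 because the first sample has more scores.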
What happens to the mean when changing (increasing/decreasing) a score in a data set?
↑ score =↑ mean
↓ score = ↓ mean
What happens to the mean when adding or removing a score (larger than, equal to, and smaller than the mean) to/from the data set?
ADDING OF A SCORE
Adding a score LARGER THAN mean = ↑ mean
Adding a score EQUAL TO mean = no change
Adding a score SMALLER THAN mean = ↓ mean
REMOVING A SCORE
Removing a score LARGER THAN mean = ↓ mean
Removing a score EQUAL TO mean = no change
Removing a score SMALLER THAN mean = ↑ mean
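A short Python sketch of the rules above: rebuild the total (n × M), add or subtract the score, and divide by the new n. The sample (n = 5, M = 10) and the helper names are hypothetical.

```python
def mean_after_adding(n, mean, x):
    """New mean after adding score x to a sample of n scores."""
    return (n * mean + x) / (n + 1)

def mean_after_removing(n, mean, x):
    """New mean after removing score x from a sample of n scores."""
    return (n * mean - x) / (n - 1)

# hypothetical sample: n = 5, M = 10, so the sum of scores is 50
for x in (16, 10, 4):    # larger than, equal to, smaller than the mean
    print("add", x, "->", mean_after_adding(5, 10, x))
for x in (14, 10, 6):    # larger than, equal to, smaller than the mean
    print("remove", x, "->", mean_after_removing(5, 10, x))
```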
Besides being the sum of the scores divided by the number of scores (arithmetic average), what are two alternative definitions of a mean?
Dividing the Total Equally = the amount each individual receives when the total is distributed equally
Balance Point = total distance below the mean is the same as (balanced by) the total distance above the mean
Regardless of the distribution, a seesaw would be balanced at the mean
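A quick Python check of both alternative definitions, using a hypothetical set of scores: sharing the total of 25 equally among 5 scores gives 5, and at that point the total distance below the mean equals the total distance above it, so the deviations sum to zero.

```python
scores = [1, 2, 6, 6, 10]                  # hypothetical scores, total = 25
mean = sum(scores) / len(scores)           # "dividing the total equally": 25 / 5
deviations = [x - mean for x in scores]    # distance (and direction) from the mean
below = sum(d for d in deviations if d < 0)
above = sum(d for d in deviations if d > 0)
print(mean, below, above, sum(deviations))  # 5.0 -7.0 7.0 0.0
```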
What happens to the mean when adding or subtracting a constant from each score?
The mean changes (increases/decreases) by the same amount as the constant
Adding 5 to each score changes the mean by +5
M = 20 → 25
Subtracting 5 from each score changes the mean by -5
M = 20 → 15
What happens to the mean when multiplying or dividing each score by a constant?
The mean changes by the same factor
Multiplying each score by 5 multiplies the mean by 5
M = 20 → 100
Dividing each score by 5 divides the mean by 5 (multiplies it by 1/5)
M = 20 → 4
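A small Python check of these rules on a hypothetical set of scores with M = 20; the `mean` helper is just sum divided by count.

```python
def mean(scores):
    return sum(scores) / len(scores)

scores = [10, 20, 30]                    # hypothetical data with M = 20
print(mean(scores))                      # 20.0
print(mean([x + 5 for x in scores]))     # 25.0  (add 5 -> mean + 5)
print(mean([x - 5 for x in scores]))     # 15.0  (subtract 5 -> mean - 5)
print(mean([x * 5 for x in scores]))     # 100.0 (multiply by 5 -> mean x 5)
print(mean([x / 5 for x in scores]))     # 4.0   (divide by 5 -> mean / 5)
```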
What is a skewed distribution?
A strong tendency for the mean, median, and mode to be located in predictably different positions
Mode = located toward the side where the scores pile up
Mean = pulled toward the extreme scores in the tail
Median = located between the mode and the mean
Negatively skewed or positively skewed
What does it mean if a distribution is positively skewed?
mode < median < mean
mean is the greatest value
What does it mean if a distribution is negatively skewed?
mean < median < mode
mean is the smallest value
Define the mean, median, and mode.
mean = balance point of distribution; midpoint of the dataset (by distance)
median = midpoint of the dataset or distribution (by score)
mode = score with the highest frequency
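A short Python sketch using the standard library's `statistics` module on a hypothetical positively skewed data set (scores pile up at the low end, one extreme high score). It shows the three definitions and the mode < median < mean order expected for positive skew.

```python
from statistics import mean, median, mode

# hypothetical positively skewed data: scores pile up low, one extreme score at the high end
scores = [2, 3, 3, 3, 4, 5, 6, 20]
print("mode:", mode(scores))      # most frequent score -> 3
print("median:", median(scores))  # midpoint by score -> (3 + 4) / 2 = 3.5
print("mean:", mean(scores))      # balance point -> 46 / 8 = 5.75
```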
Define the mean
TRUE OR FALSE: The mode is a frequency.
FALSE: The mode is the score or category with the largest frequency
When do you use the mode to describe a data set (over a mean)?
Nominal Scales
Discrete Variables
Describing Shape
When would you use the median to describe a data set (over a mean)?
Extreme Scores or Skewed Distributions
Unlike the mean, it isn’t easily influenced by outliers
Undetermined (infinite) Values
Impossible to calculate mean if there are infinite values
Open-Ended Distributions
Ordinal Scale
Why is the mean typically considered the best and most commonly used central tendency? What is it useful for?
Takes every score/value in the distribution/dataset into account
Closely related to variance and standard deviation
Useful for inferential statistics
One sample of n=4 scores has a mean of M=10, and the second sample of n=0 has a mean of M=0. If the two samples are combined, then what value will be obtained for the mean of the combined sample?
a) equal to 15
b) greater than 15 but less than 20
c) less than 15 but more than 10
d) none of the above.
c) less than 15 but more than 10
The combined mean is pulled toward the mean of the sample with more scores
Find the precise median for the following scores measuring a continuous variable.
Scores: 1 4 5 5 5 6 7 8
a) 5
b) 5.17
c) 5.67
d) 6
b) 5.17
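While the discrete median would simply be 5, for a continuous variable the three scores of 5 are spread across the real limits 4.5 to 5.5. Two scores (1 and 4) fall below 4.5, and the median needs 4 scores (50% of n = 8) below it, so it lies 2/3 of the way through that interval: 4.5 + (2/3)(1) ≈ 5.17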
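A Python sketch of the interpolation behind the precise median, assuming whole-number scores of a continuous variable, so each score X stands for the interval from X − 0.5 to X + 0.5 (its real limits). The function name `precise_median` is hypothetical.

```python
from collections import Counter

def precise_median(scores, width=1.0):
    """Interpolated median: find the interval holding the 50% point, then move into it proportionally."""
    counts = Counter(scores)
    half = len(scores) / 2   # number of scores that must fall below the median
    cf_below = 0             # cumulative frequency below the current interval
    for x in sorted(counts):
        f = counts[x]
        if cf_below + f >= half:
            lower_real_limit = x - width / 2
            return lower_real_limit + (half - cf_below) / f * width
        cf_below += f
    raise ValueError("no scores given")

print(round(precise_median([1, 4, 5, 5, 5, 6, 7, 8]), 2))  # 5.17
```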
A population of N=10 scores has a mean of 30. If every score in the distribution is multiplied by 3, then what is the value of the new mean?
a) still 30
b) 33
c) 60
d) 90
d) 90
Multiplying or dividing each score will change the mean by the same factor.
A population of scores has a mean of μ = 100. Calculate the mean for each of the following:
a. A constant value of 50 is added to each score
b. A constant value of 50 is subtracted from each score
c. Each score is multiplied by a constant value of 2
d. Each score is divided by a constant value of 50.
a) 150
b) 50
c) 200
d) 2
A sample of n=6 scores has a mean of M=10. If one score was changed from X=21 to X=10, what is the value of the new sample mean?
8.17 (ΣX = 6 × 10 = 60; after the change, ΣX = 60 − 21 + 10 = 49; M = 49/6 ≈ 8.17)
If the mean, median, and mode are all computed for a distribution of scores, which of the following statements cannot be true?
a) No one had a score equal to the mean
b) No one had a score equal to the median.
c) No one had a score equal to the mode.
d) None of the above.
c) No one had a score equal to the mode.
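The mode is, by definition, the score that occurs most often, so at least one person must have a score equal to the mode. The mean and median can fall between actual scores (e.g., for scores 1, 2, 3, 4 both equal 2.5, which no one scored).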
A sample of n=6 scores has a mean of M=10. If one score with a value of X=12 is removed from the sample, then what is the value of the new sample mean?
9.60
A sample has a mean of M=72. If one person with a score of X=98 is removed from the sample, what effect will it have on the sample mean?
a. The sample mean will increase.
b. The sample mean will decrease.
c. The sample mean will remain the same.
d. Cannot be determined from the information given.
b. The sample mean will decrease.
For a distribution of scores, the mean is equal to the median. What is the most likely shape of this distribution?
a. Symmetrical
b. Positively skewed
c. Negatively skewed
d. Impossible to determine the shape
a. Symmetrical
For a positively skewed distribution with a mode of X=20 and a median of X=25, what is the most likely value for the mean?
a. Greater than 25
b. Less than 20
c. Between 20 and 25
d. Cannot be determined from the information given
a. Greater than 25
For a positively skewed distribution, the mean is GREATER THAN the median, which is greater than the mode (mean>median>mode). Therefore, the mean has to be GREATER THAN both 20 and 25.
For a negatively skewed distribution with a mode of X=37 and a median of X=28, what is the most likely value for the mean?
a. Greater than 37
b. Less than 28
c. Between 28 and 37
d. Cannot be determined from the information given
b. Less than 28
For a negatively skewed distribution, the mean is LESS THAN the median, which is less than the mode (mean<median<mode). Therefore, the mean has to be LESS THAN both 37 and 28.
For a positively skewed distribution, what is the most probable order for the three measures of central tendency from smallest to largest?
a. Mean, median, mode
b. Mean, mode, median
c. Mode, mean, median
d. Mode, median, mean
d. Mode, median, mean
For a positively skewed distribution, the mean is GREATER THAN the median, which is greater than the mode (mean>median>mode).
Therefore, from smallest to largest, it is mode<median<mean.
A researcher is measuring problem-solving times for a sample of n=20 laboratory rats. However, one of the rats fails to solve the problem so the researcher has an undetermined score. What is the best measure of central tendency for these data?
a. The mean
b. The median
c. The mode
d. Central tendency cannot be determined for these data.
b. The median
The median would be chosen over the mean as a central tendency in the following scenarios: (1) Extreme Scores or Skewed Distributions, (2) Undetermined Values, (3) Open-Ended Distributions, and (4) Ordinal Scale.
What is the best measure of central tendency for an extremely skewed distribution of scores?
a. The mean
b. The median
c. The mode
d. Central tendency cannot be determined for a skewed distribution.
b. The median
The median would be chosen over the mean as a central tendency in the following scenarios: (1) Extreme Scores or Skewed Distributions, (2) Undetermined Values, (3) Open-Ended Distributions, and (4) Ordinal Scale.
One item on a questionnaire asks students to identify their preferred animal for the school mascot from three different choices. What is the best measure of central tendency for the data from this question?
a. The mean
b. The median
c. The mode
d. Central tendency cannot be determined for these data.
c. The mode
The mode would be chosen over the mean as a central tendency in the following scenarios: (1) Nominal Scale, (2) Discrete Variables, and (3) Describing shape.
What is variability?
The degree to which scores are clustered or spread out
How peaked the curve is
How similar data points are to one another
Consistency or diversity
Distance between scores
Distance between a score and the mean
Affects how well any single score could represent the whole
What are the four basic ways of measuring variability?
Range
Interquartile range
Standard deviation
Variance
What is range?
Distance covered by the set of scores (between smallest and largest score)
When upper and lower boundaries are precise
Xmax − Xmin
What are some drawbacks for using range for measuring variability?
Very sensitive to outliers
Reflects information about extreme values but not much about the middle values
What is interquartile range (IQR)? How is it calculated?
The distance covered by the middle 50% of the scores (from the 25th to the 75th percentile, i.e., from the 1st quartile to the 3rd quartile)
Imagine dividing combined scores into 100 equal parts
Each of these represents 1% of the data
quartile = score at each 25% interval
Q3 - Q1 = IQR
Why is IQR a better description of variability in data than range?
It trims the extreme scores and provides a range that reflects the middle 50% of scores in the center of the distribution
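A Python sketch comparing range (Xmax − Xmin) and IQR (Q3 − Q1) on two hypothetical data sets that differ only in their largest score. Quartiles come from the standard library's `statistics.quantiles` with `method="inclusive"`; textbooks and other software use slightly different quartile rules, so a hand calculation may not match exactly.

```python
from statistics import quantiles

def score_range(scores):
    return max(scores) - min(scores)

def iqr(scores):
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return q3 - q1

typical = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # largest score replaced by an extreme value
for data in (typical, with_outlier):
    print("range:", score_range(data), "IQR:", iqr(data))
```

The single extreme score changes the range from 8 to 99, while the IQR stays at 4.0.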
What is deviation?
The distance (and direction) of a score from the mean: deviation = X − μ
What is standard deviation? What symbols represent it?
The square root of the variance
Provides a measure of the standard distance from the mean
Symbols = σ (population) and s or SD (sample)
What is variance? What symbols represent it?
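The mean squared deviation: the average of the squared deviations from the mean (σ² = SS/N, where SS = sum of squares; a sample variance divides SS by n − 1)
Symbols = σ² (population) and s² (sample)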
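A minimal Python sketch of variance and standard deviation built from SS (the sum of squared deviations). The data are hypothetical. The population version divides SS by N, matching "mean squared deviation"; the sample version divides by n − 1, the usual textbook formula for s². Standard deviation is the square root of each.

```python
import math

def sum_of_squares(scores):
    """SS: the sum of squared deviations from the mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

def population_variance(scores):
    return sum_of_squares(scores) / len(scores)        # sigma^2 = SS / N

def sample_variance(scores):
    return sum_of_squares(scores) / (len(scores) - 1)  # s^2 = SS / (n - 1)

scores = [1, 9, 5, 8, 7]                                      # hypothetical data, mean = 6
print(sum_of_squares(scores))                                 # 40.0
print(population_variance(scores), sample_variance(scores))   # 8.0 10.0
print(round(math.sqrt(population_variance(scores)), 2))       # sigma = 2.83
print(round(math.sqrt(sample_variance(scores)), 2))           # s = 3.16
```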