Sample: refers to a selection of individual people or items from a population
Population: consists of all possible people or items who/which have a particular characteristic
Experimenter Bias: occurs when experimenters subconsciously choose to follow people who are likely to support their hypothesis
Parameters: descriptions of populations
Statistics: descriptions of samples
Measures of Central Tendency: give us an indication of the typical score in our sample; effectively an estimate of the middle point of our distribution of scores
Mean: the sum of all the scores in a sample divided by the number of scores in that sample
Example:
mean of the sample of the 4 scores: 5, 6, 9, 2
(5 + 6 + 9 + 2) / 4 = 5.5
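The same calculation can be sketched in Python. This is a minimal illustration using the four scores above; the variable names are chosen here for the example.

```python
from statistics import mean

scores = [5, 6, 9, 2]

# Sum of all the scores divided by the number of scores.
manual_mean = sum(scores) / len(scores)

print(manual_mean)   # 5.5
print(mean(scores))  # 5.5 (standard-library equivalent)
```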
Ranking: where we arrange a set of scores in ascending order and then assign a position number (rank) to each one
Median: the middle score/value once all scores in the sample have been put in rank order
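Example 1:
median for the sample: 2, 20, 20, 12, 12, 19, 19, 25, 20
ranked: 2, 12, 12, 19, 19, 20, 20, 20, 25; the middle (5th) score is 19
Example 2:
median for the sample: 2, 12, 12, 19, 19, 20, 20, 20, 25, 26
with an even number of scores, take the mean of the two middle scores: (19 + 20) / 2 = 19.5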
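A short Python sketch of the ranking approach for both samples. The helper name median_by_ranking is made up for this illustration, and it assumes the usual convention of averaging the two middle scores when the sample size is even.

```python
from statistics import median

sample_1 = [2, 20, 20, 12, 12, 19, 19, 25, 20]
sample_2 = [2, 12, 12, 19, 19, 20, 20, 20, 25, 26]

def median_by_ranking(scores):
    """Rank the scores, then pick the middle one (or average the middle two)."""
    ranked = sorted(scores)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]
    return (ranked[middle - 1] + ranked[middle]) / 2

print(median_by_ranking(sample_1))  # 19
print(median_by_ranking(sample_2))  # 19.5
```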
Mode: the most frequently occurring score in a sample
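As a rough illustration, the mode can be found with Python's standard library (multimode needs Python 3.8 or later). The first sample is the one from median Example 1; the second is invented here to show a tie.

```python
from statistics import multimode

sample = [2, 20, 20, 12, 12, 19, 19, 25, 20]
tied_sample = [1, 1, 2, 2, 3]  # invented: two scores share the top frequency

# multimode returns every score that occurs most often.
print(multimode(sample))       # [20]
print(multimode(tied_sample))  # [1, 2]
```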
Sampling Error: the difference between the population parameter and the sample statistic; the degree to which sample statistics differ from the equivalent population parameter
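A small simulation can make the idea concrete. Everything here is hypothetical: the population of 10,000 scores, its mean of 100 and SD of 15, and the sample size of 25 are all invented for the sketch.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 scores (invented for illustration).
population = [random.gauss(100, 15) for _ in range(10_000)]
population_mean = mean(population)  # the population parameter

# Draw one random sample and compute the matching sample statistic.
sample = random.sample(population, 25)
sample_mean = mean(sample)

# Sampling error: how far the statistic falls from the parameter.
sampling_error = sample_mean - population_mean
print(round(sampling_error, 2))
```

Drawing a different sample (for example, by changing the seed) gives a different sampling error, which is the point of the definition.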
Exploratory Data Analysis (EDA): where we explore the data that we have collected in order to describe it in more detail. These techniques simply describe our data and do not try to draw conclusions about any underlying populations.
Frequency Histogram: a graphical means of representing the frequency of occurrence of each score on a variable in our sample. The x-axis contains details of each score on our variable and the y-axis represents the frequency of occurrence of those scores.
The frequency histogram is a good way for us to inspect our data visually. It also gives useful information about how the scores are spread out; that is, how they are distributed.
The best way of generating a histogram by hand is to rank the data first. You then simply count up the number of times each score occurs in the data; this is the frequency of occurrence of each score.
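The rank-then-count procedure might be sketched as follows, printing a simple text histogram. The sample is the one from median Example 1, and the '#' bar is just one way to draw it.

```python
from collections import Counter

scores = [2, 20, 20, 12, 12, 19, 19, 25, 20]

# Rank the data, then count how many times each score occurs.
frequencies = Counter(sorted(scores))

# One row per score: the score, then one '#' per occurrence.
for score, count in sorted(frequencies.items()):
    print(f"{score:>3} | {'#' * count}")
```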
Stem and Leaf Plots: similar to histograms but the frequency of occurrence of a particular score is represented by repeatedly writing the particular score itself rather than drawing a bar on a chart
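A minimal sketch of a stem and leaf plot, assuming non-negative whole-number scores with the tens digit as the stem and the units digit as the leaf. The function name stem_and_leaf is made up for this example.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Group whole-number scores by tens digit (stem) and units digit (leaf)."""
    plot = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)
        plot[stem].append(leaf)
    return dict(plot)

scores = [2, 12, 12, 19, 19, 20, 20, 20, 25, 26]
plot = stem_and_leaf(scores)

# Each row shows one stem followed by every leaf that belongs to it.
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
```

Each leaf is written once per occurrence, so the length of a row plays the role of the bar height in a histogram.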
Box Plots/Box and Whisker Plots: enable us to easily identify extreme scores and to see how the scores in a sample are distributed.
Outliers/Extreme Scores: those scores in our sample that are a considerable distance either higher or lower than the majority of the other scores in the sample
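The notes do not say how far is "considerable". One common convention, assumed here, flags scores lying more than 1.5 times the interquartile range beyond the quartiles. The function name find_outliers and the extra score of 60 are invented for the sketch, and quartile values depend on the calculation method (this uses the default of statistics.quantiles).

```python
from statistics import quantiles

def find_outliers(scores, multiplier=1.5):
    """Flag scores beyond multiplier * IQR from the quartiles (assumed rule)."""
    q1, _median, q3 = quantiles(scores, n=4)
    iqr = q3 - q1
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr
    return [x for x in scores if x < lower_fence or x > upper_fence]

# Sample from median Example 2 plus one invented extreme score (60).
scores = [2, 12, 12, 19, 19, 20, 20, 20, 25, 26, 60]
print(find_outliers(scores))  # [60]
```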
One of the limitations of box plots is that it is often more difficult than with a histogram to tell when a distribution deviates from normality.
Scattergram: gives a graphical representation of the relationship between 2 variables. The scores on one variable are plotted on the x-axis and the scores on another variable are plotted on the y-axis.
Normal Distribution: a distribution of scores that is peaked in the middle and tails off symmetrically on either side of the peak; the distribution is often said to be 'bell-shaped'; for a perfectly normal distribution, the mean, median and mode will be represented by the peak of the curve.
For a distribution to be classed as normal it should have the following characteristics:
It should be symmetrical about the mean.
The tails should meet the x-axis at infinity.
It should be bell-shaped.
Once we have the mean and standard deviation, we can plot the normal distribution by putting these values into a formula.
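The formula in question is the normal density function. A sketch in Python, assuming a hypothetical mean of 100 and standard deviation of 15 (values invented for illustration):

```python
import math
from statistics import NormalDist

def normal_density(x, mean, sd):
    """Height of the normal curve at x:
    exp(-(x - mean)^2 / (2 * sd^2)) / (sd * sqrt(2 * pi))
    """
    exponent = -((x - mean) ** 2) / (2 * sd ** 2)
    return math.exp(exponent) / (sd * math.sqrt(2 * math.pi))

mean, sd = 100, 15  # hypothetical values, not from the notes

# Evaluate the curve at points from 3 SDs below to 3 SDs above the mean.
for x in range(55, 146, 15):
    print(x, round(normal_density(x, mean, sd), 5))
```

The printed heights rise to a peak at the mean and fall away symmetrically on either side, never quite reaching zero.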
The more scores from naturally occurring variables you plot, the more like the normal distribution they become.
Variance or Variation of Scores: indicates the degree to which the scores on a variable are different from one another.
Range: the highest score in a sample minus the lowest score
Mean Deviation: gives us an indication of how much the group as a whole differs from the sample mean; to calculate, we sum the absolute values of the individual deviations from the mean and divide by the number of scores we have (the signed deviations always sum to zero).
Variance: the average squared deviation of scores in a sample from the mean
Standard Deviation (SD): the degree to which the scores in a dataset deviate around the mean; it is an estimate of the average deviation of the scores from the mean; the square root of the variance
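The measures of spread defined above can be worked through on the four scores from the mean example (5, 6, 9, 2). This sketch divides by the number of scores, matching the "average squared deviation" wording; dividing by n - 1 instead is a common alternative when estimating a population variance from a sample. Because signed deviations from the mean always sum to zero, the mean deviation here averages their absolute values.

```python
from statistics import mean, pstdev, pvariance

scores = [5, 6, 9, 2]
m = mean(scores)                                # 5.5

score_range = max(scores) - min(scores)         # 9 - 2 = 7

deviations = [x - m for x in scores]            # -0.5, 0.5, 3.5, -3.5
print(sum(deviations))                          # 0.0

mean_deviation = sum(abs(d) for d in deviations) / len(scores)
variance = sum(d ** 2 for d in deviations) / len(scores)
standard_deviation = variance ** 0.5

print(score_range)         # 7
print(mean_deviation)      # 2.0
print(variance)            # 6.25
print(standard_deviation)  # 2.5
```

The standard-library functions pvariance and pstdev use the same divide-by-n definition and give the same variance and standard deviation.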
Kurtosis: a measure of how peaked a distribution is
Skewed Distributions: those where the peak is shifted away from the center of the distribution and there is an extended tail on one of the sides of the peak
Negatively Skewed Distribution: the peak has been shifted to the right, towards the high numbers of the scale, and the tail points towards the low numbers (or even into the negative numbers)
Positively Skewed Distribution: the peak has been shifted to the left, towards the low numbers, and the tail extends towards the high numbers
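The direction of skew can also be checked numerically. One common measure, assumed here, is moment-based skewness: the average cubed deviation from the mean divided by the cubed standard deviation. Both samples and the function name skewness are invented for this sketch.

```python
from statistics import mean, pstdev

def skewness(scores):
    """Average cubed deviation from the mean, divided by the cubed SD."""
    m = mean(scores)
    sd = pstdev(scores)
    return sum((x - m) ** 3 for x in scores) / (len(scores) * sd ** 3)

positively_skewed = [1, 1, 2, 2, 3, 10]   # invented: peak at low scores, tail to the high end
negatively_skewed = [1, 8, 9, 9, 10, 10]  # invented: peak at high scores, tail to the low end

print(skewness(positively_skewed) > 0)  # True
print(skewness(negatively_skewed) < 0)  # True
```

A value near zero would suggest a roughly symmetrical distribution.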
Bimodal Distribution: one that has two pronounced peaks; it is suggestive of there being two distinct populations underlying the data