Percentile rank
The percentage of individuals in the distribution with scores at or below a particular value
Percentile
When a score is defined by its percentile rank, that score is called a _____.
Cumulative frequencies (what are they? how to compute? identifying its column in a frequency distribution table?)
They show the number of individuals (scores) located at or below each score.
How to compute? → Find your X value, then add the frequency (f) for that value and the frequencies for all the X values below it.
Cumulative percentages (what are they? how to compute? identifying its column in a frequency distribution table? relationship to real limits of frequency distribution class intervals?)
Cumulative frequencies converted into percentages. Divide the cf value by N and multiply by 100. Ex: (cf/N)(100). Each cumulative percentage value is associated with the upper real limit of its interval.
Determining percentiles and percentile ranks from frequency distribution table (Be able to do it; how to do it if desired value does not appear directly in table?) cumsum() function (what does it do? Understand the vector of values returned by the function)
Some values can be determined directly from the table (ex: 3.5, 70%). Use interpolation if the value you need does not appear directly in the table.
The cumsum() function calculates cumulative frequencies for a frequency distribution table. It returns a vector containing the cumulative frequencies, one running total per row of the table.
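A minimal sketch of what cumsum() returns, using made-up frequencies for the scores X = 1 to 5 (the data are an assumption for illustration, not from the course):

```r
# Hypothetical frequencies for X = 1, 2, 3, 4, 5, listed from lowest X to highest X
freq <- c(1, 5, 8, 4, 2)

# cumsum() returns a running total: each element is the sum of
# that frequency and all the frequencies before it
cf <- cumsum(freq)
print(cf)   # 1 6 14 18 20

# The last cumulative frequency always equals N
N <- sum(freq)
print(cf[length(cf)] == N)   # TRUE
```

Each element of the returned vector is the cf for the matching row, so the frequencies have to be ordered from the lowest score to the highest before calling cumsum().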
Adding cumulative frequencies to a table?
You can add cumulative frequencies to a table using data.frame(): CF Table name <- data.frame(rev(Old FD Table name), rev(CFs))
Adding cumulative percentages to frequency distribution tables?
You can add cumulative percentages using cbind(), after calculating N as the sum() of the table's $Freq column.
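One possible way to build the full table is sketched below. The scores and the names fd, fd_cf, fd_full, and cpct are hypothetical, and the rows are reversed by index at the end so the highest score appears first.

```r
# Hypothetical raw scores
scores <- c(1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5)

# Frequency distribution as a data frame (columns: scores, Freq), lowest X first
fd <- as.data.frame(table(scores))

# Cumulative frequencies and N
cf <- cumsum(fd$Freq)
N <- sum(fd$Freq)

# Add the cf column with data.frame(), then the cumulative percentage with cbind()
fd_cf <- data.frame(fd, cf = cf)
fd_full <- cbind(fd_cf, cpct = 100 * cf / N)

# Put the highest score in the first row
fd_full <- fd_full[nrow(fd_full):1, ]
print(fd_full)
```

The cumulative columns are computed while the table is still in ascending order and the rows are flipped only afterward, which keeps each cf value attached to the correct score.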
Interpolation, what is it and its purpose?
Gives us a method for finding values that are located between two specified numbers. Estimates intermediate values.
General process of Interpolation?
Single interval is measured on two separate scales. The endpoints of the interval are known for each scale.
You are given an intermediate value on one of the scales. The problem is you need to find the corresponding intermediate value on the other scale.
Perform simple interpolation from frequency distribution table to find percentile rank. Percentile rank corresponding to X = 7
X = 7 is bounded by the real limits 6.5 and 7.5.
The cumulative percentages at these real limits are 20% (at 6.5) and 44% (at 7.5).
The interval width is 1 point on the X scale (7.5 - 6.5) and 24 points on the percentage scale (44 - 20).
7 is located 0.5 away from the upper real limit (7.5), and 0.5/1 = ½ of the interval.
So half of the percentage width (24) is 12.
Subtract 12 from the percentage at the top of the interval (44%): 44 - 12 = 32%
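The same steps can be sketched in R. The variable names are hypothetical; approx() is base R's linear interpolation function and is used here only as a cross-check on the hand calculation.

```r
# Real limits of the interval containing X = 7, and the cumulative
# percentages associated with those limits
lower_x <- 6.5;  upper_x <- 7.5
lower_pct <- 20; upper_pct <- 44
x <- 7

# Fraction of the interval between X and the upper real limit
fraction <- (upper_x - x) / (upper_x - lower_x)              # 0.5

# Move the same fraction down the percentage scale
pct_rank <- upper_pct - fraction * (upper_pct - lower_pct)   # 44 - 12 = 32

# Cross-check with linear interpolation
check <- approx(x = c(lower_x, upper_x), y = c(lower_pct, upper_pct), xout = x)$y
print(c(pct_rank, check))   # 32 32
```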
Perform simple interpolation from frequency distribution table to find percentile. Find the 50th Percentile
50% does not appear in the table, but it falls between 10% and 60%. Each percentage corresponds to the upper real limit of its interval: 10% → 0-4 (upper real limit: 4.5), 60% → 5-9 (upper real limit: 9.5)
Interval width on the X scale: 5 (9.5 - 4.5) | Interval width on the % scale: 50 (60 - 10)
50% is located 10 points down from the top of the percentage interval (60%), which is a fraction of 10/50 → ⅕
Now, multiply the X interval width (5) by that fraction: 5 × ⅕ = 1
Subtract 1 from the top of the X interval (9.5): 9.5 - 1 = 8.5
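Going in the other direction (from a percentage to a score) uses the same logic with the two scales swapped. A sketch with hypothetical variable names:

```r
# Upper real limits of the two class intervals and their cumulative percentages
x_limits <- c(4.5, 9.5)
pct_limits <- c(10, 60)
target_pct <- 50

# Fraction of the percentage interval between the target and the top (60%)
fraction <- (pct_limits[2] - target_pct) / (pct_limits[2] - pct_limits[1])   # 10/50

# Move the same fraction down the X scale from the top (9.5)
percentile_50 <- x_limits[2] - fraction * (x_limits[2] - x_limits[1])        # 9.5 - 1
print(percentile_50)   # 8.5
```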
Stem and leaf display, what is it?
Simple alternative to a frequency distribution table or graph.
What is a stem?
The first digit ( or digits )
What is a leaf?
The last digit ( or digits )
How to construct a stem and leaf display
List all of the stems in a column, then go through the data one score at a time and write the leaf for each score beside its stem.
Advantages of a stem and leaf display?
Easy to construct, Identifies every individual score in the data set, provides a picture of the distribution and a list of the scores. Easy to modify the display for a more detailed picture of the distribution.
splitting stems?
Listing each stem twice, which regroups the distribution using an interval width of 5 points instead of 10. Each stem is divided into lower leaves (0-4) and higher leaves (5-9).
stem() function, what does it do?
Creates a stem and leaf display from the vector of raw scores passed to it.
How does having “ordered” leaves help us?
It helps to pick out values of interest when reviewing data
Be able to pick out highest/lowest score of stem & leaf
The lowest score is formed by the first stem and its first leaf in the display.
The highest score is formed by the last stem and its last leaf in the display.
Know how to use the “scale” parameter
Setting scale = 4 → two stems for each first digit value.
Setting scale = 2 → doubles the default plot length (the default is scale = 1).
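A short sketch of stem() with and without the scale argument, using made-up scores. How many stems a given scale value produces depends on the data, so the printed displays are not reproduced here; run it to compare them.

```r
# Hypothetical raw scores
scores <- c(83, 82, 63, 62, 93, 78, 71, 68, 33, 76, 52, 97,
            85, 42, 46, 32, 57, 59, 56, 73, 74, 74, 81, 76)

# Default display (scale = 1); leaves are printed in order on each stem
stem(scores)

# A larger scale lengthens the display, which can split each stem
stem(scores, scale = 2)

# The lowest and highest scores can be confirmed directly
lowest <- min(scores)
highest <- max(scores)
print(c(lowest, highest))   # 32 97
```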
Measure of central tendency, what is it?,what does it identify?
Identifies a single score as the most representative or the most typical of an entire distribution. Usually a value in the middle of the distribution.
3 measures of central tendency?
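Mean, median, and mode. No single measure works best for every distribution, as three example distributions show:
First distribution: Symmetrical, easy to identify the center.
Second distribution: Negatively skewed, scores pile up around one area but taper off.
Third distribution: Symmetrical but has two distinct “piles”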
Mean, what is it? formulas? how to compute?
Computed by adding all the scores and dividing it by the number of scores (N).
Notation for sample/population?
Sample = X-bar, Population = mu
Weighted mean, what is it? how to calculate?
The overall mean for two or more groups combined. When the samples are not the same size, the larger group makes a larger contribution to the total group, hence the name “weighted” mean. Calculated by combining the sigma X values for each group, combining the N values for each group, and then dividing.
Calculate a weighted mean given group n values.
Section 1: n1 = 12 students, average score X1 = 6
Section 2: n2 = 8 students, average score X2 = 7
(6(12) + 7(8)) / (12 + 8) = 128 / 20 = 6.4
Calculate a weighted mean given group weights.
Consider all of the groups that will contribute to the overall mean.
Let's say there are 20 students overall. Divide each group's n by 20 to find its proportion (weight).
12/20 = 0.6 and 8/20 = 0.4. Now, multiply each mean by its weight and add the results. Recall that X1 = 6 and X2 = 7, so 0.6(6) + 0.4(7) = 6.4
How does changing a score affect the mean?
Changing the value of a score changes sigma X, so the mean changes. Adding or removing a score changes both sigma X and N.
adding/subtracting a constant?
The same constant will be added to or subtracted from the mean. Ex: Subtracting 2 from each score will lower the mean by 2. Mean = 4.33 - 2 = 2.33
multiplying/dividing by a constant?
The mean will be changed in the same way. Ex: Multiplying each score by 3 will also multiply the mean by 3.
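Both rules are easy to confirm with mean(). The three scores below are hypothetical, chosen so that the original mean is 4.33 as in the example above.

```r
# Hypothetical scores with a mean of 13/3 = 4.33
scores <- c(3, 4, 6)
original_mean <- mean(scores)

# Subtracting a constant from every score subtracts it from the mean
shifted_mean <- mean(scores - 2)     # 4.33 - 2 = 2.33

# Multiplying every score by a constant multiplies the mean by it
scaled_mean <- mean(scores * 3)      # 4.33 * 3 = 13

print(round(c(original_mean, shifted_mean, scaled_mean), 2))   # 4.33 2.33 13.00
```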
mean() function (what does it do? how to use it?)
Pass a set of scores to it and it will return the mean.
Calculating weighted mean in RStudio (how?)
Weighted mean is: sum of all groups' sigma X values / sum of all groups' N values
Each group's sigma X value = group mean * group N (Ex: groupmean[1] * N[1])
So, calculate the weighted mean by summing all groups' sigma X values and dividing by the sum of the group N values.
*with raw scores, no need to calculate each mean first: use sum() for each group's sigma X and length() for each N, then divide the added sums by the added lengths
*with supplied weights and means, multiply each mean by its weight and add the values together.
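The three approaches might look like this in R, using the section means and n values from the earlier card. The raw scores in the last part are hypothetical, and weighted.mean() is a base R function shown only as a cross-check.

```r
# Group means and group sizes (Section 1 and Section 2)
group_means <- c(6, 7)
group_n <- c(12, 8)

# 1) From means and n values: rebuild each sigma X, then divide by total N
sigma_x <- group_means * group_n                 # 72 and 56
wm_from_n <- sum(sigma_x) / sum(group_n)         # 128 / 20 = 6.4

# 2) From weights: each weight is the group's proportion of the total
weights <- group_n / sum(group_n)                # 0.6 and 0.4
wm_from_weights <- sum(weights * group_means)    # 6.4

# Cross-check with base R
wm_builtin <- weighted.mean(group_means, group_n)

# 3) From raw scores (hypothetical): add the sums, add the lengths, divide
section1 <- c(5, 6, 7)
section2 <- c(8, 10)
wm_raw <- (sum(section1) + sum(section2)) / (length(section1) + length(section2))

print(c(wm_from_n, wm_from_weights, wm_builtin, wm_raw))   # 6.4 6.4 6.4 7.2
```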
median (what is it? equivalent to what percentile? are there special symbols/notation?)
Score that divides the distribution in half. Equal to the 50th percentile. No special symbols or notations.
Calculating the median when N is odd?
List scores in order from lowest to highest, the median is the middle score in the list.
Calculating the median when N is even?
List scores in order from lowest to highest, find the two middle scores (Ex: 4 and 5), add them together, and divide by 2. (4+5) / 2 = 4.5
Median when there are several middle scores with the same value?
Use interpolation. Looking for the 50th percentile.
Using interpolation to calculate median (how? use interpolation to find median from frequency distribution table)
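Looking for the 50th percentile, which is between the cumulative percentages 40% and 90%.
The interval width is 1 on the X scale and 50 on the % scale (90 - 40).
50% is located 40 points down from the top (90%). 40/50 = ⅘
⅘ (1) = ⅘ = 0.8, so subtract 0.8 from the top of the X interval (4.5): 4.5 - 0.8 = 3.7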
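A sketch of the same calculation starting from raw scores. The data are hypothetical, chosen so that 40% of the scores fall below 3.5 and 90% fall below 4.5. The code assumes whole-number scores, so each upper real limit is the score plus 0.5; approx() is base R's linear interpolation.

```r
# Hypothetical raw scores: several tied scores sit in the middle
scores <- c(1, 2, 3, 3, 4, 4, 4, 4, 4, 5)

# Frequency table, upper real limits, and cumulative percentages
fd <- table(scores)
upper_limits <- as.numeric(names(fd)) + 0.5                   # 1.5 2.5 3.5 4.5 5.5
cum_pct <- 100 * cumsum(as.vector(fd)) / length(scores)       # 10 20 40 90 100

# Interpolate to find the X value where the cumulative percentage is 50
interp_med <- approx(x = cum_pct, y = upper_limits, xout = 50)$y
print(interp_med)   # 3.7
```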
median() function (what does it do? what method of calculating the median does it use?)
Takes a vector of raw scores. For an odd number of scores it returns the middle score; for an even number of scores it returns the average of the two middle scores. It does not use interpolation.
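A quick check of both cases with made-up scores:

```r
# Odd N: the middle score of the sorted list
odd_scores <- c(10, 3, 8, 5, 11)
median_odd <- median(odd_scores)      # sorted: 3 5 8 10 11 -> 8

# Even N: the average of the two middle scores
even_scores <- c(1, 1, 4, 5, 7, 8)
median_even <- median(even_scores)    # (4 + 5) / 2 = 4.5

print(c(median_odd, median_even))     # 8.0 4.5
```

median() sorts the scores itself, so the vector does not have to be ordered before it is passed in.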
interp.median() function (what does it do? package?)
Takes a vector of raw scores and finds the interpolated median. To use it, install and load the psych package.
Mode, what is it? how to determine?
Score or category that has the greatest frequency. In a graph, the mode will appear as the tallest part of the figure.
major mode? minor mode?
If the two peaks of a distribution are not equally tall:
The taller peak is called the major mode
The shorter peak is called the minor mode
Unimodal distribution? bimodal? multimodal? amodal?
Distributions with one mode are unimodal
Distributions with two modes are bimodal
Distributions with more than two modes are multimodal
Distributions with several equally high points can be described as amodal (no mode)
Finding the mode in RStudio (method/steps?)
Build a frequency table of the raw scores with table(), then pass that table to the sort() function with decreasing = TRUE. Then use the names() function to report the name of the first entry in the sorted table → mode <- names(SortedTableName)[1]
*assuming distribution is not multimodal
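These steps can be sketched as follows, with hypothetical scores and variable names. Note that names() returns a character string, so it is converted back to a number at the end.

```r
# Hypothetical raw scores with a single mode (4 occurs three times)
scores <- c(2, 3, 3, 4, 4, 4, 5, 6)

# Frequency table, sorted so the most frequent score comes first
freq_table <- table(scores)
sorted_table <- sort(freq_table, decreasing = TRUE)

# The name of the first entry is the mode (as text)
mode_label <- names(sorted_table)[1]
mode_value <- as.numeric(mode_label)
print(mode_value)   # 4
```

If two or more scores tie for the highest frequency, this reports only one of them, which is why the method assumes a single mode.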
most preferred measure in general? reasons why it's the most generally preferred?
The mean is usually the preferred measure. The mean is affected by every score in the distribution, and it is closely related to the common measures of variability (variance and standard deviation).
When to use the mode (advantages of mode? nominal scale?)
Only the mode can be used on a nominal scale (Ex: academic major); the mean and median cannot be calculated for nominal data. Advantages: easy to compute, can be used with any scale of measurement (nominal, ordinal, interval, ratio), and reporting the mode along with the mean can help indicate the shape of a distribution.
When to use the median (four situations?)
When there are a few extreme scores in the distribution
When some scores have undetermined values
When there is an open-ended distribution
When the data is measured on an ordinal scale
Extreme scores, what are they? is median easily affected by extreme scores?
Scores that are very different in value from most of the others in the distribution.
Median is not easily affected by extreme scores.
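A small illustration with made-up scores, where one value is replaced by an extreme score:

```r
# Hypothetical scores without and with one extreme score
typical <- c(2, 3, 4, 5, 6)
with_extreme <- c(2, 3, 4, 5, 96)

# The mean is pulled toward the extreme score
mean_typical <- mean(typical)          # 4
mean_extreme <- mean(with_extreme)     # 110 / 5 = 22

# The median depends only on the middle position, so it does not move
median_typical <- median(typical)        # 4
median_extreme <- median(with_extreme)   # 4

print(c(mean_typical, mean_extreme, median_typical, median_extreme))   # 4 22 4 4
```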
Used for reporting central tendency for skewed distributions?
The median
undetermined values? can median be used when there are undetermined values? can the mean?)
Incomplete or missing data values. Prevents us from computing the mean but since the median only relies on order, we can compute the median.
Open-ended distributions? (can the median be used when the distribution is open-ended? can the mean?)
When there is no upper or lower limit for one of the categories (Ex: 5 or more). The mean cannot be computed, but the median can still be found.
Ordinal scale (which measure of central tendency is preferred?)
The median is preferred.
How does distribution shape relate to measures of central tendency?
The relationship between the mean, median, and mode is determined by the shape of the distribution.
symmetrical distributions, Unimodal? bimodal? “rectangular” distribution?
Right hand side of the graph will be a mirror image of the left hand side. Mean will be exactly at the center.
If the distribution is bimodal, mean and median will be together in the center with modes on each side.
If the distribution is “rectangular”, there will be no mode, but the mean and the median will still be together in the center.
Positively or Negatively skewed distributions?
Positively skewed: the peak (mode) is on the left.
The mean will be located to the right of the median.
Negatively skewed: the peak (mode) is on the right.
The mean will be located to the left of the median.
Be able to identify distribution shape from given mean, median and mode values
Positively Skewed (Right-Skewed): Mean > Median > Mode
Negatively Skewed (Left-Skewed): Mean < Median < Mode
Symmetrical & Unimodal: Mean = Median = Mode
Symmetrical & Bimodal: Mean = Median, but two different modes
Symmetrical & Rectangular: Mean = Median, no clear mode
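The first three rules can be written as a small helper. The function name describe_shape and its arguments are hypothetical, and it covers only the unimodal patterns, since bimodal and rectangular distributions cannot be summarized by a single mode value.

```r
# Hypothetical helper: classify a unimodal distribution from its
# mean, median, and mode using the rules above
describe_shape <- function(mean_value, median_value, mode_value) {
  if (mean_value > median_value && median_value > mode_value) {
    "positively skewed"
  } else if (mean_value < median_value && median_value < mode_value) {
    "negatively skewed"
  } else if (isTRUE(all.equal(mean_value, median_value)) &&
             isTRUE(all.equal(median_value, mode_value))) {
    "symmetrical and unimodal"
  } else {
    "pattern not covered by these rules"
  }
}

print(describe_shape(10, 8, 6))   # "positively skewed"
print(describe_shape(4, 6, 7))    # "negatively skewed"
print(describe_shape(5, 5, 5))    # "symmetrical and unimodal"
```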