Cumulative frequency is a running total of occurrences less than a specific x value.
For grouped data (ranges), the upper boundary of each range is used. Upper boundaries are plotted on the x-axis against the cumulative frequency on the y-axis.
Consider the following score ranges and the frequency (number of scores) in each range:
10-20: 2
20-30: 5
30-40: 7
40-50: 21
50-60: 36
60-70: 40
70-80: 27
80-90: 9
90-100: 3
Upper bounds for each range:
20, 30, 40, 50, 60, 70, 80, 90, 100
Cumulative frequency is calculated by adding each range's frequency to the running total of all previous ranges:
2
2 + 5 = 7
7 + 7 = 14
14 + 21 = 35
35 + 36 = 71
71 + 40 = 111
111 + 27 = 138
138 + 9 = 147
147 + 3 = 150
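The running totals above can be reproduced with a short Python sketch. It uses the standard-library `itertools.accumulate`, and the list names are illustrative only.

```python
from itertools import accumulate

# Upper boundaries and frequencies for the ranges 10-20, 20-30, ..., 90-100
upper_bounds = [20, 30, 40, 50, 60, 70, 80, 90, 100]
frequencies = [2, 5, 7, 21, 36, 40, 27, 9, 3]

# accumulate() adds each frequency to the total of all previous ones
cumulative = list(accumulate(frequencies))

for bound, cf in zip(upper_bounds, cumulative):
    print(bound, cf)
```

The last cumulative frequency always equals the total number of data points (here 150).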
To find how many data points are less than a certain value, read the cumulative frequency at the upper boundary equal to that value.
For example, if asked how many points are less than 60, look at the cumulative frequency for the 50-60 range, which is 71.
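As a minimal sketch, the lookup can be expressed as a dictionary from upper boundary to cumulative frequency. The helper `count_less_than` is a hypothetical name, and it only works for values that are exactly an upper boundary; values between boundaries have to be estimated from the chart.

```python
from itertools import accumulate

upper_bounds = [20, 30, 40, 50, 60, 70, 80, 90, 100]
frequencies = [2, 5, 7, 21, 36, 40, 27, 9, 3]

# Map each upper boundary to the cumulative frequency at that boundary
cf_at = dict(zip(upper_bounds, accumulate(frequencies)))

def count_less_than(value):
    """Number of data points below `value`; `value` must be one of the upper boundaries."""
    return cf_at[value]

print(count_less_than(60))  # 71
```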
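A cumulative frequency chart (also called an ogive) plots each upper boundary on the x-axis against its cumulative frequency on the y-axis and joins the points with a curve, giving a visual representation of the data.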
For a data set of 150 points:
Median is at the 75th data point.
Quartile 1 is at the 37.5th data point.
Quartile 3 is at the 112.5th data point.
To find these values on the chart:
Locate the corresponding cumulative frequency on the y-axis.
Trace horizontally to the curve.
Drop down to the x-axis to find the value.
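The three chart-reading steps can be approximated numerically. This sketch assumes the chart starts at (10, 0), the lower end of the first range, and that neighbouring points are joined by straight lines; a hand-drawn smooth curve may give slightly different readings. The function name `read_chart` is hypothetical.

```python
from itertools import accumulate

# Points on the chart: (boundary, cumulative frequency), starting from (10, 0)
xs = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
ys = [0] + list(accumulate([2, 5, 7, 21, 36, 40, 27, 9, 3]))

def read_chart(target_cf):
    """Go across from target_cf on the y-axis to the line, then down to the x-axis."""
    for i in range(1, len(xs)):
        if target_cf <= ys[i]:
            # Linear interpolation inside the segment (xs[i-1], ys[i-1]) -> (xs[i], ys[i])
            fraction = (target_cf - ys[i - 1]) / (ys[i] - ys[i - 1])
            return xs[i - 1] + fraction * (xs[i] - xs[i - 1])
    raise ValueError("target is above the total frequency")

n = ys[-1]  # 150 data points
for name, position in [("Q1", n / 4), ("Median", n / 2), ("Q3", 3 * n / 4)]:
    print(name, position, round(read_chart(position), 2))
```

With these assumptions the median comes out at 61, Q1 at about 50.69 and Q3 at about 70.56.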
Calculations for Quartiles:
Quartile 1: Data Points * (1/4)
Median: Data Points * (1/2)
Quartile 3: Data Points * (3/4)
To determine where a certain percentile falls (e.g., the 90th percentile), multiply the percentile, written as a decimal, by the total number of data points.
Example: 90th percentile is 0.9 * 150 = 135. Find 135 on the cumulative frequency chart to determine the corresponding value.
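A small sketch of the percentile calculation: first the position p * n, then a value read off the chart. As before, it assumes straight lines between plotted points starting at (10, 0), so the result is an estimate. `percentile_value` is a hypothetical helper name.

```python
from itertools import accumulate

xs = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
ys = [0] + list(accumulate([2, 5, 7, 21, 36, 40, 27, 9, 3]))

def percentile_value(p):
    """Estimate the value at percentile p (a decimal, e.g. 0.9) from the chart."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    target = p * ys[-1]  # position on the cumulative frequency axis, e.g. 0.9 * 150 = 135
    for i in range(1, len(xs)):
        if target <= ys[i]:
            fraction = (target - ys[i - 1]) / (ys[i] - ys[i - 1])
            return xs[i - 1] + fraction * (xs[i] - xs[i - 1])

print(round(percentile_value(0.9), 2))
```

The quartiles are the special cases p = 0.25, 0.5 and 0.75.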
Mode: The value that appears most often in a dataset.
Modal Class: The range with the highest frequency.
Given the frequencies for the ranges:
10-20: 2
20-30: 5
30-40: 7
40-50: 21
50-60: 36
60-70: 40
70-80: 27
80-90: 9
90-100: 3
The modal class is 60-70 because it has the highest frequency (40).
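Finding the modal class amounts to picking the range paired with the largest frequency. A minimal Python sketch (note that `max` returns the first range if two ranges tie):

```python
ranges = ["10-20", "20-30", "30-40", "40-50", "50-60", "60-70", "70-80", "80-90", "90-100"]
frequencies = [2, 5, 7, 21, 36, 40, 27, 9, 3]

# Pair each range with its frequency and keep the pair with the largest frequency
modal_class, highest = max(zip(ranges, frequencies), key=lambda pair: pair[1])
print(modal_class, highest)  # 60-70 40
```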
Standard Deviation: A measure of spread; the square root of the average squared difference of the data points from the mean.
Calculator Steps:
Menu -> Statistics -> One-Variable Calculation
A small standard deviation indicates that data points are closely clustered around the mean.
A large standard deviation indicates a wider spread of data points from the mean.
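To see what the one-variable calculation does, here is a sketch that estimates the mean and standard deviation of the grouped scores. It assumes every score sits at its range midpoint (15, 25, ..., 95), the usual assumption when only grouped frequencies are known, and it computes the population standard deviation (dividing by n), which many calculators show as σx alongside the sample value sx.

```python
import math

midpoints = [15, 25, 35, 45, 55, 65, 75, 85, 95]
frequencies = [2, 5, 7, 21, 36, 40, 27, 9, 3]
n = sum(frequencies)

# Estimated mean: each midpoint weighted by how many scores fall in its range
mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Population variance: average squared distance from the mean; SD is its square root
variance = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints)) / n
sd = math.sqrt(variance)

print(mean, round(sd, 2))
```

Under the midpoint assumption this gives a mean of 60 and a standard deviation of about 15.61.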
Distance and Difference: The difference between a data point and the mean (data point - mean); the distance is the size of that difference, ignoring its sign.
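If the same number is added to or subtracted from every data point, the spread (standard deviation) does not change; only the mean shifts by that number.
If every data point is multiplied or divided by the same number, both the mean and the standard deviation are multiplied or divided by it (the standard deviation by its absolute value).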
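A quick numerical check of how shifting and scaling every data point changes the mean and standard deviation. The data list is a hypothetical example chosen only because its numbers are easy to follow; `statistics.pstdev` is the standard-library population standard deviation.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical example data
shifted = [x + 10 for x in data]  # add 10 to every point
scaled = [x * 3 for x in data]    # multiply every point by 3

for label, values in [("original", data), ("+10", shifted), ("*3", scaled)]:
    print(label, statistics.mean(values), statistics.pstdev(values))
```

Adding 10 moves the mean from 5 to 15 but leaves the standard deviation at 2.0; multiplying by 3 gives a mean of 15 and a standard deviation of 6.0.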
Pearson's correlation coefficient (r) measures the linear correlation between two sets of data.
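It ranges from -1 to 1: values near 1 indicate a strong positive linear correlation, values near -1 a strong negative linear correlation, and values near 0 little or no linear correlation.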
Calculator Steps:
Enter the x and y data in two named columns.
Menu -> Statistics -> Linear Regression (choose fx=ax+b).
The calculator provides the linear regression line equation y = ax + b (also written y = mx + b), where a (or m) is the slope and b is the y-intercept.
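The values the calculator reports can be computed by hand from sums of deviations. This sketch uses a small hypothetical data set and computes Pearson's r together with the least-squares slope and intercept; on Python 3.10+ the standard library's `statistics.correlation` and `statistics.linear_regression` give the same results.

```python
import math

# Hypothetical paired data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squared deviations and of cross-products of deviations
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)  # Pearson's correlation coefficient, between -1 and 1
a = sxy / sxx                   # least-squares slope
b = mean_y - a * mean_x         # y-intercept: the line passes through (mean_x, mean_y)

print(f"r = {r:.4f}, y = {a:.2f}x + {b:.2f}")
```

For this data r is about 0.7746 and the line is y = 0.6x + 2.2.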
Experiment: A repeatable procedure with a set of possible outcomes.
Sample Space: The set of all possible outcomes of an experiment.
Event: A subset of the sample space.
Relative Frequency: The number of times an event occurs divided by the total number of trials.
If all outcomes are equally likely and A is an event, the probability of A is the number of outcomes in A divided by the total number of possible outcomes.
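The definitions above map naturally onto Python sets. This sketch uses rolling one fair die as the experiment and "roll less than 3" as the event; it compares the equally-likely probability with a relative frequency from a seeded simulation (the seed and trial count are arbitrary choices).

```python
import random
from fractions import Fraction

# Experiment: roll one fair six-sided die
sample_space = {1, 2, 3, 4, 5, 6}
event_a = {x for x in sample_space if x < 3}  # event "roll less than 3" = {1, 2}

# Equally likely outcomes: outcomes in A divided by total outcomes
p_a = Fraction(len(event_a), len(sample_space))
print(p_a)  # 1/3

# Relative frequency: occurrences of A divided by the number of trials
rng = random.Random(42)
trials = 10_000
hits = sum(1 for _ in range(trials) if rng.randint(1, 6) in event_a)
print(hits / trials)
```

The relative frequency should land close to 1/3, and it tends to get closer as the number of trials grows.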
Complement of A: The event that A does not occur, denoted as A'.
P(A') = 1 - P(A)
Expected Value: E = n * p
Where n is the number of trials and p is the probability of the event on a single trial.
If you roll a die 12 times, how many rolls would you expect to be less than 3?
P(less than 3) = 2/6 = 1/3, so E = 12 * (1/3) = 4
You should expect 4 rolls to be less than 3.
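The die example can be checked with exact fractions. This sketch also applies the complement rule P(A') = 1 - P(A) to the same event.

```python
from fractions import Fraction

sample_space = range(1, 7)  # faces of a fair die
p_less_than_3 = Fraction(sum(1 for face in sample_space if face < 3), len(sample_space))
p_not = 1 - p_less_than_3   # complement: P(A') = 1 - P(A)

n = 12
expected = n * p_less_than_3  # E = n * p
print(p_less_than_3, p_not, expected)  # 1/3 2/3 4
```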
Venn Diagram: A visual representation of sets and their relationships, useful in probability.
Mutually Exclusive Events: Events that cannot occur at the same time.
P(A ∩ B) = 0
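With events modelled as sets of outcomes, "cannot occur at the same time" means the intersection is empty. In this sketch, A = "roll less than 3" and B = "roll greater than 4" are example events on one die roll; the last line also shows the standard addition rule for mutually exclusive events, P(A ∪ B) = P(A) + P(B).

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
a = {1, 2}  # roll less than 3
b = {5, 6}  # roll greater than 4

def p(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# Mutually exclusive: no shared outcomes, so the intersection is empty
print(a & b)     # set()
print(p(a & b))  # 0
# Addition rule for mutually exclusive events
print(p(a | b), p(a) + p(b))  # 2/3 2/3
```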