Notes on Variable Types, Between- and Within-Groups, and Frequency Distributions

Nominal: variables whose values are categories or labels with no inherent order; any numbers used serve only as labels.
Ordinal: variables that have ranking.
Continuous: variables whose values are numbers that can fall anywhere along a scale.
Ratio: interval variables that also have a meaningful zero.
- Note: The transcript describes ratio as having a meaningful zero; standard definitions separate interval (no true zero) from ratio (true zero). The notes capture the speaker’s wording and also reflect common statistical definitions.

The order in which subjects experience conditions can matter.
You would control for the order that participants experience those conditions by counterbalancing or randomizing order (to mitigate order effects).
After controlling for order, you would use statistics to see how outcomes differ across conditions.
The speaker asks if there are questions about between- and within-groups, indicating this is a foundational topic for the week’s material.
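
As a rough illustration of controlling for order, the sketch below builds every possible ordering of the conditions (full counterbalancing) and spreads participants across those orderings. The condition labels, participant IDs, and function names are hypothetical and do not come from the lecture; this is one simple way to do it, not necessarily the instructor's method.

```python
import itertools
import random


def counterbalanced_orders(conditions):
    """Return every possible ordering of the conditions (full counterbalancing)."""
    return list(itertools.permutations(conditions))


def assign_orders(participants, conditions, seed=0):
    """Shuffle participants, then cycle them through all orderings
    so each ordering is used about equally often."""
    orders = counterbalanced_orders(conditions)
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # randomize which participant gets which order
    return {p: orders[i % len(orders)] for i, p in enumerate(shuffled)}


conditions = ["A", "B"]                  # hypothetical condition labels
participants = ["P1", "P2", "P3", "P4"]  # hypothetical participant IDs
assignment = assign_orders(participants, conditions)
for person in participants:
    print(person, assignment[person])
```

With two conditions and four participants, two people experience A then B and two experience B then A, so any order effect is balanced across conditions.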

Frequency distributions: break a scale into smaller categories to visualize data or perform frequency analysis.
Grouped tables (grouped frequency tables) take your scale and break it into categories to make interpretation easier.
Plotting one variable on the x-axis with its frequency count on the y-axis lets you visualize the distribution as a histogram.
Histograms vs bar graphs:
- Histograms have a meaningfully arranged (numerical) x-axis. This is a key nuance for the class.
- Bar graphs typically display categorical x-axes without the same kind of numerical binning.
Grouped histograms involve binning continuous data into ranges; there is a caveat:
- You can influence the narrative by choosing specific bin values or groupings, which can “fudge” what the viewer perceives if not done carefully.
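
To make the caveat concrete, here is a small sketch that counts the same values under two different bin widths. The values are made up for illustration, and `bin_counts` is a hypothetical helper, not something from the lecture.

```python
def bin_counts(values, start, width, n_bins):
    """Count values in n_bins half-open bins [start + j*width, start + (j+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        j = int((v - start) // width)
        if 0 <= j < n_bins:
            counts[j] += 1
    return counts


# Made-up values for illustration only
values = [1, 2, 2, 3, 9, 10, 10, 11, 11, 12, 18, 19]

narrow = bin_counts(values, start=0, width=5, n_bins=4)
wide = bin_counts(values, start=0, width=10, n_bins=2)
print(narrow)  # [4, 1, 5, 2]
print(wide)    # [5, 7]
```

The narrow bins show a dip between two clusters, while the wide bins make the same data look fairly even. That is the sense in which bin choices can change what the viewer perceives.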

The material references Spotify data as an example of where grouped frequency distributions can be useful.
The instructor indicates there will be an example showing how to apply grouping to real-world data (Spotify) in class.

Identify which element(s) are being counted as frequencies. In the example, the focus is on the second column (the frequency counts).
To select a range in the spreadsheet (e.g., multiple rows), the instructor demonstrates using keyboard shortcuts such as Control+Shift+Down (Windows) or Command+Shift+Down (Mac).
Relative frequency (as a percentage) is used to express each category’s share of the total.
Steps to compute relative frequency in a column of counts:
- Determine the total number of counts, N, by summing the frequency column.
- Formula for total:
- $N = \sum_{i=1}^{k} f_i$
- Relative frequency for category i:
- $p_i = \frac{f_i}{N}$
- If you want percent form:
- $\text{rel\_freq}_i = p_i \times 100\% = \frac{f_i}{N} \times 100\%$
In practice, you might map each category to a percentage or a bar height in a chart.
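
The relative-frequency steps can be sketched in a few lines of Python. The category labels and counts below are invented for illustration, and `relative_frequencies` is a hypothetical helper name; in class the same calculation is done in a spreadsheet.

```python
def relative_frequencies(counts):
    """Map each category to its share of the total, f_i / N."""
    total = sum(counts.values())  # N: sum of the frequency column
    if total == 0:
        raise ValueError("total count is zero; relative frequency is undefined")
    return {category: f / total for category, f in counts.items()}


# Hypothetical frequency column (category -> count)
counts = {"0-999 ms": 5, "1000-1999 ms": 15, "2000-2999 ms": 30}

proportions = relative_frequencies(counts)
for category, p in proportions.items():
    print(f"{category}: {p * 100:.1f}%")
```

Here N is 50, so the three categories come out to 10.0%, 30.0%, and 60.0%. The proportions always sum to 1, which is a quick check that the total was computed over the right rows.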

For any chart, you typically want:
- A clear title.
- Labeled axes (e.g., x-axis representing the grouped categories or bin ranges, y-axis representing frequency or relative frequency).
- Units where applicable (e.g., milliseconds for time measurements).
The instructor uses the example of milliseconds to illustrate grouping variables and labeling in charts.

When grouping with a continuous variable or a category, you create bins (established ranges) to put values into.
Instead of using unique values, you group into bins.
Key design choices:
- Choose a minimum value a and maximum value b for the data range.
- Decide the number of groups (bins) k, typically between 10 and 20.
- Compute the bin width:
- $w = \frac{b - a}{k}$
- Bin j boundaries (for j = 1, …, k):
- $\text{Bin}_j = [a + (j-1)w, a + jw)\,.$
Practical notes:
- Start with a sensible minimum (e.g., 0 for time in milliseconds).
- Use increments that make sense for the data (the speaker mentions starting at 0 and using increments of 1000 for milliseconds).
- The choice of bin edges can affect interpretation, so be transparent about bin choices and consider sensitivity analyses.
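
As a minimal sketch of the practical notes, the code below starts bins at 0 and uses increments of 1000 ms, as the speaker mentions. The durations are made up, and `fixed_width_bin` is a hypothetical helper name.

```python
def fixed_width_bin(value_ms, start=0, width=1000):
    """Return the (lower, upper) edges of the half-open bin
    [lower, upper) that contains value_ms."""
    j = (value_ms - start) // width
    lower = start + j * width
    return lower, lower + width


durations_ms = [250, 999, 1000, 1750, 3200]  # made-up durations

table = {}
for d in durations_ms:
    edges = fixed_width_bin(d)
    table[edges] = table.get(edges, 0) + 1

# Only non-empty bins appear here; an empty bin such as [2000, 3000) is not listed.
for (lower, upper), f in sorted(table.items()):
    print(f"[{lower}, {upper}) ms: {f}")
```

Because the bins are half-open, 999 falls in [0, 1000) and 1000 falls in [1000, 2000), so no value is counted twice.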

Determine the data range: find min a and max b.
Choose number of bins k (10–20 recommended).
Compute bin width w:
- $w = \frac{b - a}{k}$
Create bin edges: start at a, add w repeatedly to reach b.
Assign each observation to the appropriate bin based on its value.
Plot the histogram with the x-axis representing the bin ranges and the y-axis representing frequencies or relative frequencies.
Label the chart with a descriptive title and axis labels; include units (e.g., milliseconds).
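
The steps above can be sketched end to end as follows. The data are made up, `grouped_frequency_table` is a hypothetical helper name, and k = 4 is used only to keep the output short (the notes recommend 10 to 20 bins for real data). One assumption beyond the notes: the maximum value b is placed in the last bin, since strictly half-open bins would otherwise leave it out.

```python
def grouped_frequency_table(values, k):
    """Split [min, max] into k equal-width bins and count values per bin."""
    a, b = min(values), max(values)
    if a == b:
        raise ValueError("all values are identical; bin width would be zero")
    w = (b - a) / k  # bin width
    counts = [0] * k
    for v in values:
        j = min(int((v - a) / w), k - 1)  # clamp so the max value lands in the last bin
        counts[j] += 1
    edges = [a + j * w for j in range(k + 1)]
    return edges, counts


values = [0, 120, 450, 800, 990, 1500, 2100, 2500, 3900, 4000]  # made-up, in ms
edges, counts = grouped_frequency_table(values, k=4)

print("Frequency by duration bin (ms)")  # title with units
for j, f in enumerate(counts):
    print(f"[{edges[j]:.0f}, {edges[j + 1]:.0f}) {'#' * f} ({f})")
```

The printed rows act as a rough text histogram: the bin ranges play the role of the x-axis and the row lengths the y-axis. The last row is printed with a `)` like the others, but it also includes the maximum value.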

Be mindful that choosing bin widths and bin edges can influence the apparent narrative of the data.
Avoid cherry-picking bin boundaries to support a desired conclusion; be transparent about binning decisions.
When comparing distributions, ensure that scales on axes are comparable and clearly labeled.

Total counts:
- $N = \sum_{i=1}^{k} f_i$
Relative frequency (as a proportion):
- $p_i = \frac{f_i}{N}$
Relative frequency (as a percent):
- $\text{rel\_freq}_i = \frac{f_i}{N} \times 100\%$
Bin width for k bins over [a, b]:
- $w = \frac{b - a}{k}$
Bin boundaries:
- $\text{Bin}_j = [a + (j-1)w, a + jw)\quad\text{for } j = 1,2,…,k$

The material hints at applying frequency distributions and binning to real-world data, such as Spotify data, to visualize distributions and analyze patterns.
When you work with such data, consider how you will group the data (bins) and what you will plot in the histogram to highlight meaningful differences.