R Plotting Essentials: Indexing, Named Vectors, and Plot Parameters

Data indexing in data frames

  • In data frames (and matrices), you index with two dimensions using a comma: row index, column index, i.e., [row, column].
  • Example from the transcript (the Alabama example): to get the value in the first row and first column, you give row 1 and column 1 inside the brackets.
    • If that cell holds the population figure for Alabama, 3615, then:
    • df[1, 1] returns 3615.
  • The transcript mentions a label in the top right corner and explains the idea of row/column indexing via the comma, illustrating how you access a specific element in a data structure.
  • Important note: In R, indexing is 1-based (the first row/column is index 1).
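
The [row, column] rule above can be tried on built-in data. This is a minimal sketch that assumes the Alabama example comes from R's built-in state.x77 matrix (the notes do not name the dataset); the name df and the conversion to a data frame are illustrative choices.

```r
# Assumption: the Alabama example refers to the built-in state.x77 matrix.
df <- as.data.frame(state.x77)

first_cell <- df[1, 1]                      # row 1, column 1
same_cell  <- df["Alabama", "Population"]   # same cell, addressed by name

first_row  <- df[1, ]    # empty column index: every column of row 1
first_col  <- df[, 1]    # empty row index: every row of column 1

print(first_cell)        # the notes report 3615 for this cell
print(dim(df))           # number of rows, then number of columns
```

Leaving one side of the comma empty selects the whole row or the whole column, which is why the comma is needed even when only one index is given.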

Named vectors and labels

  • The transcript discusses a numeric vector whose elements have names (labels).
  • Key idea: In an R vector, elements can have names; the numbers are the elements, while the characters (the names) are labels.
  • Conceptual example:
    • v <- c(10, 20, 30); names(v) <- c("low", "medium", "high")
    • The elements are 10, 20, 30; the labels are "low", "medium", "high".
    • Access by position: v[1] returns 10 (printed together with its label, low).
    • Access by name: v["medium"] returns 20.
  • The takeaway: Named elements allow you to reference elements by their label, not just by numeric position.
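
The conceptual example above can be run as written. The sketch below also shows double brackets, which are not mentioned in the notes but are the standard way to get a bare value without its label; the variable names are illustrative.

```r
v <- c(10, 20, 30)
names(v) <- c("low", "medium", "high")

by_position <- v[1]           # keeps the label: element named "low"
by_name     <- v["medium"]    # keeps the label: element named "medium"
bare_value  <- v[["medium"]]  # double brackets drop the label

# Several elements at once, selected by name
subset_v <- v[c("low", "high")]

print(v)
print(names(v))
```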

Plotting in R: mapping of axes and arguments

  • In the plot function, x represents the horizontal axis and y represents the vertical axis.
  • The transcript emphasizes two ways to specify arguments:
    • Positional arguments: plot(x, y) maps the first provided value to x and the second to y by default.
    • Named arguments: plot(x = …, y = …) allows you to override positional ordering and be explicit about what is mapped to which axis.
  • If you omit the names, the default positional mapping applies:
    • The first argument becomes x, the second becomes y.
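  • You can set the mapping explicitly by naming arguments. Named arguments may appear in any order, so reordering them does not swap the axes; to swap the axes you swap the values:
    • plot(x = some_x, y = some_y) and plot(y = some_y, x = some_x) draw the same plot; plot(x = some_y, y = some_x) swaps the axes.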
  • The transcript notes that there are many other settings to control a plot, each with its own name, such as those controlling axis intervals, labels, and titles (arguments to plot) and margins (a graphical parameter set with par).
  • Specific examples of these named arguments include (as discussed):
    • Axis interval/range: xlim, ylim
    • Axis labels: xlab, ylab
    • Plot title: main
    • Margins: not an argument to plot(); set beforehand via par(mar = c(bottom, left, top, right))
  • The idea is that each of these is a separate named setting that you can use to customize the plot.
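
The argument rules above can be sketched as follows. The data, labels, and margin values are hypothetical and chosen only for illustration.

```r
# Hypothetical data for illustration
some_x <- 1:10
some_y <- some_x^2

plot(some_x, some_y)              # positional: first -> x, second -> y
plot(y = some_y, x = some_x)      # named: same plot, order does not matter
plot(x = some_y, y = some_x)      # axes actually swapped

# Margins go through par(), not plot(); par() returns the old settings
old_par <- par(mar = c(4, 4, 2, 1))   # bottom, left, top, right (in lines)

plot(some_x, some_y,
     xlim = c(0, 12),             # horizontal axis range
     ylim = c(0, 120),            # vertical axis range
     xlab = "Input",              # horizontal axis label
     ylab = "Input squared",      # vertical axis label
     main = "A customized plot")  # title

par(old_par)                      # restore the previous margins
```

Saving the value returned by par() and passing it back afterwards keeps the margin change from leaking into later plots.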

Plot type: drawing lines vs points

  • The transcript discusses the use of type = 'l' to draw lines that connect observations.
  • Default behavior (without type specified) is typically points (type = 'p').
  • When you specify type = 'l', the plot connects the observations with a line, in the order they appear in the data, producing a line plot instead of a scatter of points.
  • Example:
    • plot(x, y, type = 'l') # line graph connecting points
  • The transcript also notes that using type = 'l' makes sense for a sequence of numbers where you want to visualize a trend between consecutive observations.
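
A short comparison of plot types, using a made-up sequence. The last step illustrates an assumption worth checking in practice: lines are drawn in data order, so rows that are not sorted by x should be ordered first. All data and names here are hypothetical.

```r
# Hypothetical sequence: one observation per time step
x <- 1:12
y <- c(3, 4, 6, 5, 7, 9, 8, 10, 12, 11, 13, 15)

plot(x, y)               # default type = "p": points only
plot(x, y, type = "l")   # lines connecting consecutive observations
plot(x, y, type = "b")   # both points and lines

# Lines follow data order, so sort by x first if the rows are shuffled
shuffled <- c(5, 1, 9, 3, 12, 7, 2, 10, 4, 8, 6, 11)   # hypothetical row order
xs <- x[shuffled]
ys <- y[shuffled]
ord <- order(xs)
plot(xs[ord], ys[ord], type = "l")
```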

Colors and data types: discrete vs continuous coloring

  • The transcript discusses how color specification can affect the interpretation of the plot.
  • Key distinction:
    • Color names (strings such as "red") pick specific colors, which suits discrete categories when you assign one color per group.
    • In base graphics, numeric values passed to col are not placed on a continuous scale: integers index the current palette(), so col = 2 is simply the second palette color. A gradient has to be built explicitly.
  • Intuition from the transcript: one variable may be categorical (e.g., a factor with levels) and the other continuous; the color mapping should follow the data type of the variable being shown:
    • A categorical (factor) variable calls for discrete color groups.
    • A continuous numeric variable calls for a gradient color scale.
  • Practical notes:
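    • Example of discrete colors: plot(x, y, col = c("red", "blue", "green")); the colors are assigned to the points in order and recycled when there are more points than colors. To color by group, index the color vector with a factor, e.g., col = c("red", "blue", "green")[group].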
    • Example of continuous coloring (conceptual): map a numeric variable to color via a gradient, e.g., using colorRampPalette or colorRamp with appropriate indexing.
  • The takeaway: match the color specification to the variable being shown: a fixed set of named colors for discrete groups, an explicitly built gradient for a continuous variable.
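
Both kinds of coloring can be sketched in base graphics. The data, the grouping, the gradient endpoints, and the choice of 100 gradient steps are all hypothetical; binning with cut() is one simple way to turn a numeric variable into gradient positions, not the only one.

```r
# Hypothetical data: a grouping factor and a numeric third variable
x <- 1:9
y <- c(2, 4, 3, 6, 5, 8, 7, 9, 10)
group <- factor(rep(c("a", "b", "c"), each = 3))
z <- y^2   # numeric variable to show as a gradient

# Discrete: one named color per factor level, looked up by level code
group_cols <- c("red", "blue", "green")[as.integer(group)]
plot(x, y, col = group_cols, pch = 19, main = "Discrete colors by group")

# Continuous: build a gradient, then bin z into its steps
n_steps <- 100
ramp <- colorRampPalette(c("lightblue", "darkblue"))(n_steps)
z_bin <- cut(z, breaks = n_steps, labels = FALSE)   # integer bin per point
plot(x, y, col = ramp[z_bin], pch = 19, main = "Gradient colors by z")
```

colorRampPalette() returns a function; calling it with n_steps produces that many interpolated colors, and the bin index selects one of them for each point.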

Two variables: one categorical and one continuous

  • When plotting two variables where one is categorical (factor) and the other is continuous, the plot shows the categorical grouping along one axis and the continuous variation along the other; in base R, plot(f, y) with a factor f and a numeric y draws one box plot per level. Color can then represent a further dimension, continuous or categorical.
  • Practical implication: using color to reflect a third numeric variable can add an informative dimension to the plot, but you must choose an appropriate color mapping (discrete for categories, continuous for numeric values).
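
A minimal sketch of the categorical-versus-continuous case, with hypothetical data. Overlaying the raw points in one color per level is an illustrative addition, not something taken from the transcript.

```r
# Hypothetical data: three groups, four measurements each
group <- factor(rep(c("a", "b", "c"), each = 4))
value <- c(1, 2, 3, 4,  3, 4, 5, 6,  6, 7, 8, 9)

# With a factor on x and a numeric y, plot() draws one box plot per level
plot(group, value, xlab = "Group", ylab = "Value")

# Overlay the raw points, colored discretely by group
level_cols <- c("red", "blue", "green")
point_cols <- level_cols[as.integer(group)]
points(as.integer(group), value, col = point_cols, pch = 19)
```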

Practical connections and implications

  • These concepts are foundational for data visualization in R: how data are stored (vectors with/without names, data frames), how to access elements, and how to map data to a visual representation via base graphics.
  • Mis-mapping axes or mis-specifying type can lead to misleading plots (e.g., implying a trend where there is none).
  • Understanding named vs positional arguments helps you write clearer, more robust plotting code and ensures you’re plotting the intended data.

Quick recap of examples to remember

  • Element access in a data frame:
    • df[1, 1] returns 3615
  • Named vectors:
    • v <- c(10, 20, 30); names(v) <- c("low", "medium", "high"); v["medium"] # 20
  • Plot mappings and options:
    • Positional: plot(x, y)
    • Named: plot(x = x, y = y); to swap the axes, swap the values: plot(x = y, y = x)
    • Axis limits: xlim = c(min(x), max(x)), ylim = c(min(y), max(y))
    • Labels and title: xlab, ylab, main
    • Margins: par(mar = c(bottom, left, top, right))
    • Type control: type = 'l' for line plot
  • Color behavior:
    • Discrete: named colors, one per category
    • Continuous: a numeric variable mapped to an explicitly built gradient (e.g., with colorRampPalette)