Distinguishing Between Variables and Data

Overview

  • This material distinguishes between individuals, variables, and data using a Seattle on-street parking meters dataset.

  • Data come from the City of Seattle data portal.

  • The table shows 11 randomly selected cars (the individuals) and several attributes (the variables) collected for each car.

Individuals, Variables, and Data

  • Individuals: the 11 different cars highlighted in blue in the table; each row corresponds to one individual car.

  • Variables: the column headers in the table represent the variables:

    • Payment method

    • Amount paid

    • Duration in minutes

    • Side of street

    • Parking space number

  • Data: the actual values recorded under the column headings for each variable, i.e., the values in each row for those five variables.

Example row (first individual, as described)

  • Payment method: credit card

  • Amount paid: 3.75

  • Duration in minutes: 30

  • Side of street: west

  • Parking space number: 458

  • The first individual is one of the 11 cars highlighted in blue; all other rows follow similarly for the other cars.
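
As a minimal sketch, the example row above can be written as a single record in which the keys play the role of variables and the values are the data for that one individual. The key names (such as `payment_method`) and the choice to store the parking space number as text are assumptions made for illustration; they are not taken from the Seattle dataset itself.

```python
# One individual (the first car), using the values given in the notes.
# Keys = variables, values = data. Key names are hypothetical.
first_car = {
    "payment_method": "credit card",
    "amount_paid": 3.75,             # dollars
    "duration_minutes": 30,          # minutes
    "side_of_street": "west",
    "parking_space_number": "458",   # kept as text: a label, not a quantity
}

variables = list(first_car.keys())      # the five column headers
data_values = list(first_car.values())  # the data for this one row

print(variables)
print(data_values)
```

Every other car in the table would be another record with the same five keys and its own values.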

Variables: what they are and how they’re categorized

  • The five variables are the column headers listed above.

  • These variables are the characteristics recorded for each individual car.

Qualitative vs Quantitative Variables

  • Qualitative (categorical) variables:

    • Payment method

    • Side of street

    • Parking space number

  • Characteristics:

    • They are categories or labels (not numeric measurements).

    • Side of street consists of categories like north, south, east, west, or combinations of those.

    • Parking space number is a category representing location, not a numeric quantity used in arithmetic.

  • Quantitative variables:

    • Amount paid

    • Duration in minutes

  • Characteristics:

    • They are numeric measurements.

    • Units: Amount paid is in dollars and cents; Duration is in minutes.
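
The classification above can be sketched as a small lookup table. The dictionary name, the key names, and the helper function are hypothetical. The point is only that the parking space number is grouped with the labels even though it is written with digits.

```python
# Variable types as classified in these notes. All names are illustrative.
VARIABLE_TYPES = {
    "payment_method": "qualitative",
    "side_of_street": "qualitative",
    "parking_space_number": "qualitative",  # digits, but used as a label
    "amount_paid": "quantitative",
    "duration_minutes": "quantitative",
}

def variables_of_type(kind):
    """Return the variable names of the given type, in alphabetical order."""
    return sorted(name for name, t in VARIABLE_TYPES.items() if t == kind)

qualitative = variables_of_type("qualitative")
quantitative = variables_of_type("quantitative")

print(qualitative)
print(quantitative)
```

A quick check for the parking space number: adding or averaging space numbers gives nothing meaningful, which is why it is treated as a category.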

Quantitative variables: continuous vs discrete

  • Duration in minutes (continuous):

    • Time is treated as a continuous variable, even though the data are reported to the nearest minute.

    • Conceptually, time can take on any value in an interval, not just integer minutes.

  • Amount paid (discrete in theory, continuous in practice):

    • Amount paid is recorded in cents, so strictly it is a discrete variable.

    • There are gaps between possible values: amounts move in steps of one cent, so no value exists between, say, 3.75 and 3.76 dollars.

    • In practice, dollar amounts rounded to the nearest penny are often treated as continuous for analysis.
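
The discrete nature of amount paid can be illustrated by converting dollars to whole cents. This is a sketch: the function name is hypothetical, and `Decimal` is used only to avoid floating-point rounding in the conversion.

```python
from decimal import Decimal

def dollars_to_cents(amount):
    """Convert a dollar amount given as text (e.g. "3.75") to whole cents."""
    return int(Decimal(amount) * 100)

cents = dollars_to_cents("3.75")   # the example amount from the notes
next_possible = cents + 1          # the next payable amount, one cent higher

# Duration, by contrast, can conceptually fall anywhere in an interval:
recorded_minutes = 30              # reported to the nearest minute
possible_true_minutes = 30.25      # a value like this is still meaningful

print(cents, next_possible)
```

Counting in cents makes the gaps visible: the possible amounts are whole numbers of cents, whereas a duration such as 30.25 minutes is meaningful even though the table reports only whole minutes.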

Data values and their organization

  • Data are the values recorded under the variable headings for each row (each individual car).

  • All five variables have data values: payment method, side of street, and parking space number hold qualitative data, while amount paid and duration in minutes hold quantitative data.

Notes on units and interpretation

  • Amount paid: units are dollars (and cents). Example value: 3.75 dollars.

  • Duration: units are minutes. Example value: 30 minutes.

  • Spatial/categorical identifiers:

    • Side of street: qualitative category (north, south, east, west, or combinations).

    • Parking space number: qualitative identifier indicating location.

Relationships and potential analyses (conceptual, based on the data type)

  • Since duration is continuous and amount paid is treated as continuous in practice, analyses like averages, ranges, and distributions can be computed for these two variables.

  • For qualitative variables, analyses focus on frequencies, proportions, and cross-tabulations (e.g., distribution of payment methods by side of street).

  • When comparing groups (e.g., average duration by side of street), use the qualitative variable to define the groups and the quantitative variable as the quantity being summarized; the appropriate statistical test depends on these variable types.
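
The analyses listed above can be sketched with the Python standard library. Only the first row below comes from the notes; the other two rows are invented purely for illustration, as are the key names and the "phone" payment category.

```python
from collections import Counter, defaultdict
from statistics import mean

rows = [
    # First row: values given in the notes.
    {"payment_method": "credit card", "side_of_street": "west",
     "duration_minutes": 30, "amount_paid": 3.75},
    # Remaining rows: HYPOTHETICAL values, not from the Seattle data.
    {"payment_method": "phone", "side_of_street": "east",
     "duration_minutes": 60, "amount_paid": 2.00},
    {"payment_method": "credit card", "side_of_street": "east",
     "duration_minutes": 90, "amount_paid": 3.00},
]

# Qualitative variable: count how often each category occurs.
payment_counts = Counter(row["payment_method"] for row in rows)

# Quantitative variable: an average is meaningful.
mean_duration = mean(row["duration_minutes"] for row in rows)

# Group comparison: the qualitative variable defines the groups,
# the quantitative variable is what gets averaged.
durations_by_side = defaultdict(list)
for row in rows:
    durations_by_side[row["side_of_street"]].append(row["duration_minutes"])
mean_by_side = {side: mean(values) for side, values in durations_by_side.items()}

print(payment_counts)
print(mean_duration)
print(mean_by_side)
```

The same pattern applies to the full table of 11 cars: counts and proportions for the qualitative columns, averages and ranges for the quantitative ones.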

Source and context

  • Data originate from the City of Seattle data portal and pertain to on-street parking meters.

  • The example emphasizes understanding the basic data-science distinction between individuals (rows), variables (columns), and data (cell values).

Key takeaways

  • Individuals correspond to rows (each car in the sample).

  • Variables correspond to columns (the attributes measured).

  • Data are the actual measurements/values in the table for each variable and each individual.

  • Qualitative variables are categories; quantitative variables are numerical measurements.

  • Among quantitative variables, duration is continuous; amount paid is discrete in theory but often treated as continuous in practice when using monetary values with pennies.