Study Notes on Statistical Data Representation and Analysis

1. Types of Data

1.1 Categorical Data
  • Defined as data that can be grouped into categories.
  • Examples include methods of transportation (Bicycle, Bus, Car, Other).
1.2 Numerical Data
  • Data that is represented in numerical form.
  • It can be further classified into univariate data (one variable measured per observation) and bivariate data (two variables measured per observation).

2. Graphical Representations of Data

2.1 Bar Graphs
  • Used to represent categorical data.
  • Example: Proportions of students using various methods of transportation to school. For instance, a bar graph might show the following proportions:
    - Bicycle: 0.4
    - Bus: 0.1
    - Car: 0.0
    - Other: not mentioned
2.2 Line Plot/Dot Plot
  • Used for univariate numerical data.
  • Displays individual data points on a number line.
2.3 Stem-and-Leaf Plots
  • A method for displaying quantitative data in graphic form, splitting values into 'stems' (leading digits) and 'leaves' (trailing digits).
    - Key: 1|2 = 12 (stem 1, leaf 2).
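
The stem/leaf split can be sketched in a few lines of Python. This is a minimal illustration, assuming two-digit whole numbers; the data values and the function name `stem_and_leaf` are hypothetical and do not come from the notes.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by stem (tens digit) with leaves (ones digit)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 12 -> stem 1, leaf 2
        groups[stem].append(leaf)
    return dict(groups)


# Hypothetical data, for illustration only.
data = [24, 12, 30, 15, 21, 24]
plot = stem_and_leaf(data)

for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Each printed row reads like the key: the row `1 | 2 5` stands for the values 12 and 15.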
2.4 Circle Graph/Pie Chart
  • A circular statistical graphic divided into slices to illustrate numerical proportions. Each slice represents a proportion of the total.
2.5 Segmented Bar Chart
  • Represents data as segments within a single bar, where the length of each segment shows that category's share of the whole. Often used for categorical variables.
2.6 Histograms
  • A graphical representation that groups numerical univariate data into user-specified ranges (bins) and shows how many values fall in each range.
    - Example: Data could be organized in the ranges 6-7, 8-9, 10-11, 12-13.
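
As a rough sketch of how values are sorted into those ranges, the Python below counts a small data set into the four bins named in the example. The data values and the helper name `bin_counts` are hypothetical, chosen only to illustrate the idea.

```python
# Bins taken from the example above; both ends of each range are inclusive.
bins = [(6, 7), (8, 9), (10, 11), (12, 13)]

# Hypothetical whole-number data, for illustration only.
data = [6, 7, 7, 8, 10, 10, 11, 13]


def bin_counts(values, bins):
    """Count how many values fall into each (low, high) range."""
    counts = {f"{lo}-{hi}": 0 for lo, hi in bins}
    for v in values:
        for lo, hi in bins:
            if lo <= v <= hi:
                counts[f"{lo}-{hi}"] += 1
                break  # a value belongs to at most one bin
    return counts


hist = bin_counts(data, bins)

# Text version of the histogram: one '#' per data value in the bin.
for label, n in hist.items():
    print(f"{label:>5} | {'#' * n}")
```

The bar heights in a drawn histogram correspond to the counts stored in `hist`.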
2.7 Box Plots
  • Used to show the distribution of data based on a five-number summary: Minimum, Q1, Median, Q3, and Maximum.
    - Example: For a dataset:
      - Minimum: 25
      - Q1: 32
      - Median: 45
      - Q3: 52
      - Maximum: 68
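
The five-number summary in the example is enough to compute the range and the interquartile range (IQR). The sketch below also applies the common 1.5 × IQR rule for flagging possible outliers; that rule is an assumption added here for illustration and is not stated in the notes.

```python
# Five-number summary from the example above.
summary = {"min": 25, "q1": 32, "median": 45, "q3": 52, "max": 68}

data_range = summary["max"] - summary["min"]  # spread of the whole data set
iqr = summary["q3"] - summary["q1"]           # spread of the middle half (the box)

# Assumed convention: values beyond 1.5 * IQR from the box are possible outliers.
lower_fence = summary["q1"] - 1.5 * iqr
upper_fence = summary["q3"] + 1.5 * iqr
has_possible_outliers = summary["min"] < lower_fence or summary["max"] > upper_fence

print(f"Range: {data_range}, IQR: {iqr}")
print(f"Fences: {lower_fence} to {upper_fence}")
print(f"Possible outliers: {has_possible_outliers}")
```

Under that convention, both the minimum (25) and the maximum (68) lie inside the fences, so the whiskers would extend all the way to them.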
2.8 Line Graphs
  • Used to display how a quantity changes over time. Example: the number of hybrid electric cars sold each year, starting from 2000.
2.9 Trend Lines/Scatter Plots
  • A scatter plot illustrates the relationship between two numerical variables. A trend line summarizes that relationship; it is drawn so that the points are balanced above and below it.
    - Example points: (2, 50) and (16, 500).
2.10 Two-Way Frequency Tables
  • Useful for summarizing categorical data. An example of a table for movie preferences shows numbers and percentages of respondents:
    - Evening respondents = 200 - 25%
    - Adults who prefer the afternoon shows: 20 out of 200 respondents (10%).

3. Conditional Relative Frequencies

3.1 By Row
  • Each cell count is divided by its row total, so the percentages within a row add up to 100%.
  • For example, the total percentages of respondents who prefer afternoon shows:
    - 200 respondents = 55% total students.
3.2 By Column
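  • Each cell count is divided by its column total, so the percentages within a column add up to 100%.
  • Analysis such as: Of all adult respondents, 33% prefer afternoon shows.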
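
The row and column calculations can be sketched as follows. The counts below are hypothetical: they were chosen so that 20 of 200 respondents are adults who prefer afternoon shows and about 33% of adults prefer afternoon shows, in line with the figures above, but the remaining cells are invented for illustration. The function names `by_row` and `by_column` are also assumptions.

```python
# Hypothetical two-way table: rows are show times, columns are respondent groups.
counts = {
    "Afternoon": {"Adults": 20, "Students": 130},
    "Evening": {"Adults": 40, "Students": 10},
}


def by_row(table):
    """Divide each cell by its row total."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: n / row_total for col, n in cells.items()}
    return result


def by_column(table):
    """Divide each cell by its column total."""
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {
        row: {col: n / col_totals[col] for col, n in cells.items()}
        for row, cells in table.items()
    }


row_freq = by_row(counts)
col_freq = by_column(counts)

# Of all adult respondents, what share prefers afternoon shows?
print(f"{col_freq['Afternoon']['Adults']:.0%}")
# Of all afternoon respondents, what share are adults?
print(f"{row_freq['Afternoon']['Adults']:.0%}")
```

The same cell (adults, afternoon) gives different percentages depending on whether it is divided by the column total (all adults) or the row total (all afternoon respondents), which is why the direction of conditioning has to be stated.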
3.3 Formulas for Point-Slope Relations
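  • Point-slope form: y - y1 = m(x - x1), where m is the slope and (x1, y1) is a known point on the line.
  • Example: for the trend line through the points (2, 50) and (16, 500), the slope is (500 - 50)/(16 - 2) ≈ 32.1, which gives y - 50 = 32.1(x - 2).
  • This representation is key for understanding linear relationships between two numerical variables, such as a trend line on a scatter plot; it applies to numerical, not categorical, variables.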
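
The point-slope calculation can be checked with a short Python sketch. It uses the two example points from the notes; the function names `slope` and `point_slope_line` are hypothetical.

```python
def slope(p1, p2):
    """Slope of the line through two points (x1, y1) and (x2, y2)."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)


def point_slope_line(point, m):
    """Return a function computing y from y - y1 = m(x - x1)."""
    x1, y1 = point
    return lambda x: y1 + m * (x - x1)


p1, p2 = (2, 50), (16, 500)
m = slope(p1, p2)                  # 450 / 14
predict = point_slope_line(p1, m)  # y = 50 + m * (x - 2)

print(f"slope = {m:.1f}")
print(f"y at x = 9: {predict(9):.1f}")
```

Because the line was built from the two points, `predict(2)` and `predict(16)` return their y-values (50 and 500, up to floating-point rounding), which is a quick way to confirm the equation.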

4. Application of Data Skills

4.1 Testing Hypotheses Through Trials
  • Example: trials recording the funding raised, where Trial 1 and Trial 2 have dollar amounts ranging from $10 to $70.
  • Comparing results across trials with the representations above helps in interpreting data across different contexts.