Study Notes on Statistical Data Representation and Analysis
1. Types of Data
1.1 Categorical Data
- Defined as data that can be grouped into categories.
- Examples include methods of transportation (Bicycle, Bus, Car, Other).
1.2 Numerical Data
- Data that is represented in numerical form.
- It can be further classified into univariate data (one variable per observation) and bivariate data (two variables per observation).
2. Graphical Representations of Data
2.1 Bar Graphs
- Used to represent categorical data.
- Example: Proportions of students using various methods of transportation to school. For instance, a bar graph might show the following proportions:
- Bicycle: 0.4
- Bus: 0.1
- Car: 0.0
- Other: Not mentioned.
2.2 Line Plot/Dot Plot
- Used for univariate numerical data.
- Displays individual data points on a number line.
2.3 Stem-and-Leaf Plots
- Display quantitative data by splitting each value into a 'stem' (leading digit or digits) and a 'leaf' (trailing digit). For example, with the key 1|2 = 12, the stem is 1 and the leaf is 2.
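The stem/leaf split can be sketched in a few lines of Python. This is a minimal illustration, assuming non-negative whole-number data with a one-digit leaf; the function name stem_and_leaf and the scores list are hypothetical and do not come from the notes.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by stem (all digits but the last) and leaf (last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # 12 -> stem 1, leaf 2
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical data, for illustration only.
scores = [12, 15, 21, 24, 24, 30]

for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
# Key: 1|2 = 12
```

Sorting first keeps the leaves in increasing order within each stem, which is the usual convention for this plot.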
2.4 Circle Graph/Pie Chart
- A circular statistical graphic divided into slices to illustrate numerical proportions. Each slice represents a proportion of the total, and the slices together make up the whole.
2.5 Segmented Bar Chart
- Represents data as segments within a single bar, where the length of each segment shows one category's share of the whole. Often used for categorical variables.
2.6 Histograms
- Used for univariate numerical data. Groups the data points into user-specified ranges (bins) and shows how many values fall in each range.
- Example: Data could be organized in the ranges: 6-7, 8-9, 10-11, 12-13.
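As a rough sketch of how values are sorted into those ranges, the Python below counts how many data points fall in each bin and prints a text histogram. The bins match the ranges above; the data list and the helper name bin_counts are hypothetical, and the sketch assumes whole-number data with both endpoints of each range included.

```python
def bin_counts(values, bins):
    """Count how many values fall in each inclusive (low, high) range."""
    counts = {f"{low}-{high}": 0 for low, high in bins}
    for value in values:
        for low, high in bins:
            if low <= value <= high:
                counts[f"{low}-{high}"] += 1
                break  # each value belongs to at most one bin
    return counts


bins = [(6, 7), (8, 9), (10, 11), (12, 13)]  # ranges from the notes
data = [6, 7, 7, 8, 10, 10, 11, 13]          # hypothetical values

counts = bin_counts(data, bins)
for label, count in counts.items():
    print(f"{label:>5} | {'#' * count}")
```

Values outside every range are simply not counted here; a fuller version might report them separately.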
2.7 Box Plots
- Used to show the distribution of data based on a five-number summary: Minimum, Q1, Median, Q3, and Maximum.
- Example: For a dataset:
- Minimum: 25
- Q1: 32
- Median: 45
- Q3: 52
- Maximum: 68
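A five-number summary like this one can be computed as sketched below. The dataset is hypothetical, chosen only so that it reproduces the values listed above. The quartiles use the "median of each half" convention, which is one common school-level method; other quartile conventions can give slightly different Q1 and Q3.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, the middle value is left out of both halves.
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Hypothetical dataset, picked to match the summary in the notes.
sample = [25, 32, 40, 45, 50, 52, 68]

minimum, q1, med, q3, maximum = five_number_summary(sample)
print(minimum, q1, med, q3, maximum)  # 25 32 45 52 68
print("Range:", maximum - minimum)    # Range: 43
print("IQR:", q3 - q1)                # IQR: 20
```

The box of the plot spans Q1 to Q3 (an interquartile range of 20 here), and the whiskers extend to the minimum and maximum.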
2.8 Line Graphs
- Used to display how data points change over time. Example: the number of hybrid electric cars sold each year, starting from 2000.
2.9 Trend Lines/Scatter Plots
- A scatter plot illustrates the relationship between two numerical variables. A trend line, drawn so that the points are balanced above and below it, indicates the overall trend.
- Example points: (2, 50) and (16, 500).
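The line through the two example points can be worked out as in the sketch below. This is only an illustration: the helper name line_through is hypothetical, and a real trend line would be judged against all the points in the scatter plot, not just two of them.

```python
def line_through(p1, p2):
    """Return the slope and a predictor for the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)

    def predict(x):
        # Point-slope form: y - y1 = slope * (x - x1)
        return y1 + slope * (x - x1)

    return slope, predict


slope, predict = line_through((2, 50), (16, 500))
print(round(slope, 1))       # 32.1
print(round(predict(9), 1))  # 275.0
```

The slope is (500 - 50) / (16 - 2) = 450 / 14, about 32.1, so the line in point-slope form is y - 50 = 32.1(x - 2).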
2.10 Two-Way Frequency Tables
- Summarize two categorical variables at once: each cell gives the count (or percentage) of respondents in one combination of categories. An example table for movie preferences, based on 200 respondents:
- Evening respondents: 25% of the 200 respondents.
- Afternoon respondents: 20 out of 200 (10%) are adults who prefer the afternoon shows.
3. Conditional Relative Frequencies
3.1 By Row
- Each cell is divided by its row total. For example, of the respondents who prefer afternoon shows, 55% are students (200 respondents in total).
3.2 By Column
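- Each cell is divided by its column total. For example, of all adult respondents, 33% prefer afternoon shows.
- Note on point-slope form: the equation y - 50 = 32.1(x - 2) is the trend line through the example points (2, 50) and (16, 500) from Section 2.9, with slope (500 - 50)/(16 - 2) ≈ 32.1. It relates two numerical variables, whereas conditional relative frequencies describe the relationship between two categorical variables.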
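Row and column conditional relative frequencies can be sketched in Python as follows. Only three figures are taken from the notes: 200 respondents in total, 20 adults who prefer afternoon shows, and 33% of adults preferring afternoon shows. Every other count in the table is hypothetical, chosen to make the totals work, and is not meant to reproduce the other percentages quoted above.

```python
# Rows are show times, columns are respondent groups.
# Only the Afternoon/Adult count (20) and the grand total (200) come from
# the notes; the remaining counts are hypothetical.
counts = {
    "Afternoon": {"Adult": 20, "Student": 90},
    "Evening": {"Adult": 40, "Student": 50},
}


def by_row(table):
    """Divide each cell by its row total."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: n / row_total for col, n in cells.items()}
    return result


def by_column(table):
    """Divide each cell by its column total."""
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {
        row: {col: n / col_totals[col] for col, n in cells.items()}
        for row, cells in table.items()
    }


grand_total = sum(sum(cells.values()) for cells in counts.values())
print(grand_total)                                          # 200
print(f"{counts['Afternoon']['Adult'] / grand_total:.0%}")  # 10%
print(f"{by_column(counts)['Afternoon']['Adult']:.0%}")     # 33%
print(f"{by_row(counts)['Afternoon']['Adult']:.0%}")        # 18%
```

The same cell (20 adults who prefer afternoon shows) gives three different percentages depending on the denominator: the grand total, the column total, or the row total.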
4. Application of Data Skills
4.1 Testing Hypotheses Through Trials
- Repeated trials record the funds raised:
- Trial 1 and Trial 2 have dollar amounts ranging from $10 to $70.
- Comparing the results across trials helps interpret the data in different contexts.