Unit 07 - Analyzing Data Notes

7.1 - Understanding Data Visualizations

  • The Data Analysis Process

  • Step 1: Collect or Choose Data

    • Gather the data needed for analysis.

  • Step 2: Clean/Filter

    • Remove errors and inconsistencies from the data, then filter it down to what is relevant to the analysis.

  • Step 3: Visualize and Find Patterns

    • Use graphs/charts to observe data for patterns.

  • Step 4: Generate New Information

    • Draw conclusions or produce new results from the patterns observed.
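
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the survey answers in `raw_responses` are made up, and the text "chart" stands in for a real graph.

```python
from collections import Counter

# Step 1: collect or choose data (made-up survey answers, for illustration only).
raw_responses = ["cat", "Dog", " dog", "", "cat", "bird", "DOG", None, "cat", "Cat "]

# Step 2: clean/filter. Drop missing or empty entries and normalize spelling.
cleaned = [r.strip().lower() for r in raw_responses if r and r.strip()]

# Step 3: visualize and find patterns. A text bar chart of how often each value occurs.
counts = Counter(cleaned)
for value, count in counts.most_common():
    print(f"{value:<5} {'#' * count} ({count})")

# Step 4: generate new information. Summarize what the pattern shows.
top_value, top_count = counts.most_common(1)[0]
share = top_count / len(cleaned)
print(f"Most common answer: {top_value} ({share:.0%} of valid responses)")
```

Cleaning happens before counting so that "Dog", " dog", and "DOG" are treated as the same answer.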

  • Data vs. Metadata

  • Data: Information collected for analysis.

  • Metadata: Data about data, including:

    • Time of data collection

    • Type of data

    • Location of data collection

    • Method of collection

    • Collector of the data
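
One way to picture the difference is to store the two separately. In this sketch, every reading and every metadata value is a hypothetical placeholder chosen for illustration.

```python
# Data: the measurements themselves (made-up temperature readings).
data = [21.5, 22.1, 19.8, 20.4]

# Metadata: information that describes the data. All values are hypothetical.
metadata = {
    "collected_at": "2024-03-01T09:00",             # time of data collection
    "data_type": "temperature in degrees Celsius",  # type of data
    "location": "school weather station",           # location of data collection
    "method": "digital thermometer",                # method of collection
    "collector": "science class",                   # collector of the data
}

# Analysis uses the data; the metadata tells a reader how to interpret it.
average = sum(data) / len(data)
print(f"Average of {len(data)} readings: {average:.2f}")
for field, value in metadata.items():
    print(f"{field}: {value}")
```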

  • Types of Visualizations

  • Bar Charts:

    • Can be vertical or horizontal.

    • Show how often each value occurs; taller or longer bars indicate more frequent values.

    • Insights from Bar Charts:

    • Identify the most and least common values, the range of values, and whether a particular value appears at all.
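
The insights listed for bar charts can also be read straight from the frequency counts behind the chart. A rough sketch, assuming a made-up list of quiz scores:

```python
from collections import Counter

# Made-up quiz scores, for illustration only.
scores = [7, 8, 8, 9, 10, 8, 7, 6, 8, 10, 9]

counts = Counter(scores)

# Horizontal bar chart: one row per value, bar length equals frequency.
for value in sorted(counts):
    print(f"{value:>2} | {'#' * counts[value]}")

most_common_value = max(counts, key=counts.get)   # longest bar
least_common_value = min(counts, key=counts.get)  # shortest bar
value_range = max(scores) - min(scores)           # spread of the values
has_five = 5 in counts                            # is the value 5 present at all?

print(most_common_value, least_common_value, value_range, has_five)
```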

  • Pie Charts:

    • Show each unique value's share of the dataset as a percentage.

    • Insights from Pie Charts:

    • Identify highest/lowest percentages and compare values.
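
The slice sizes in a pie chart are percentages of the whole. A small sketch of that arithmetic, using made-up survey answers:

```python
from collections import Counter

# Made-up answers to "How do you get to school?"
answers = ["bus", "car", "bus", "walk", "bus", "car", "bike", "bus"]

counts = Counter(answers)
total = sum(counts.values())

# Each slice is the value's count divided by the total, as a percentage.
percentages = {value: 100 * count / total for value, count in counts.items()}

# List the slices from largest to smallest.
for value, pct in sorted(percentages.items(), key=lambda item: item[1], reverse=True):
    print(f"{value:<5} {pct:5.1f}%")
```

Because every slice is a share of the same total, the percentages always add up to 100.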

  • Histograms:

    • Display how many values fall within each range (bin).

    • Read similarly to bar charts.

    • Insights from Histograms:

    • Identify most and least common ranges.
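
Unlike a bar chart, a histogram first groups values into ranges and then counts each range. A minimal sketch, assuming made-up ages and a bin width of 10 (both arbitrary choices):

```python
# Made-up ages, for illustration only.
ages = [3, 12, 15, 18, 21, 22, 25, 29, 31, 34, 35, 41, 47, 52]
bin_width = 10  # each bin covers 10 years: 0-9, 10-19, ...

# Map each age to the start of its bin and count how many land there.
bins = {}
for age in ages:
    start = (age // bin_width) * bin_width
    bins[start] = bins.get(start, 0) + 1

# Text histogram: one row per range, bar length equals frequency.
for start in sorted(bins):
    print(f"{start:>2}-{start + bin_width - 1:<2} | {'#' * bins[start]}")

most_common_range = max(bins, key=bins.get)
print(f"Most common range starts at {most_common_range}")
```

Changing `bin_width` changes the shape of the histogram, which is why the choice of ranges matters when reading one.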

  • Scatterplots:

    • Compare two columns of data to reveal relationships between them.

    • Types of relationships: direct (both increase together), inverse (one increases as the other decreases), or none.

    • Insights from Scatterplots:

    • Identify relationships and trends; make predictions.
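
A common way to put a number on the relationship seen in a scatterplot is the Pearson correlation coefficient, which is close to +1 for a direct relationship, close to -1 for an inverse one, and close to 0 for none. The sketch below is one possible illustration: the data are made up, and the 0.5 cutoff in `describe` is an arbitrary assumption, not a standard.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def describe(r, threshold=0.5):
    # The 0.5 cutoff is an arbitrary choice for this sketch.
    if r >= threshold:
        return "direct"
    if r <= -threshold:
        return "inverse"
    return "none"

# Made-up columns, for illustration only.
hours_studied = [1, 2, 3, 4, 5]
test_score = [55, 62, 70, 74, 85]
hours_gaming = [5, 4, 3, 2, 1]
shoe_size = [8, 6, 9, 6, 8]

print(describe(pearson_r(hours_studied, test_score)))    # rises together
print(describe(pearson_r(hours_studied, hours_gaming)))  # one rises, one falls
print(describe(pearson_r(hours_studied, shoe_size)))     # no clear pattern
```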

7.2 - Analyzing Trends

  • Correlation vs. Causation

  • Correlation: Two sets of data show a similar pattern, such as rising or falling together.

  • Causation: One event directly brings about another.

  • Important to remember: CORRELATION DOES NOT EQUAL CAUSATION.

  • Examples of Correlation with No Causation:

  • The divorce rate in Maine correlates with per capita margarine consumption, yet neither one causes the other.
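
One way a correlation can arise without causation is a hidden third factor that drives both measurements. The simulation below is a hypothetical illustration of that idea, not the Maine data: `temperature`, `ice_cream_sales`, and `sunburn_cases` are invented names with randomly generated values.

```python
import math
import random

random.seed(7)  # fixed seed so the run is repeatable

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hidden factor: daily temperature (hypothetical values).
temperature = [random.uniform(10, 35) for _ in range(200)]

# Both quantities depend on temperature plus noise, never on each other.
ice_cream_sales = [3 * t + random.gauss(0, 5) for t in temperature]
sunburn_cases = [2 * t + random.gauss(0, 5) for t in temperature]

r = pearson_r(ice_cream_sales, sunburn_cases)
print(f"Correlation between ice cream sales and sunburn cases: {r:.2f}")
```

The two series are strongly correlated even though the code never lets one influence the other, which is why a correlation alone cannot establish a cause.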

7.3 - Big, Open, & Crowdsourced Data

  • Big Data:

  • Collection Method: Very large datasets gathered automatically, for example through data mining and web scraping.

  • Problems Solved:

    • Business efficiency, disease identification in healthcare, crime prevention, supply chain management, and real-time data analysis.

  • Open Data:

  • Collection Method: Freely available data with minimal restrictions; sourced from open data repositories.

  • Problems Solved:

    • Promotes public oversight and helps track public health risks and environmental hazards.

  • Crowdsourced Data:

  • Collection Method: Data contributed by many ordinary people and used to inform decisions.

  • Problems Solved:

    • Similar to those of big data, with a focus on public health and on predictions that guide climate action.

7.4 - Machine Learning Limitations

  • Machine Learning:

  • Uses algorithms that analyze data and adapt based on what they find. It underlies many everyday tools and AI systems.

  • Limitations and Bias:

  • Algorithms may reflect human biases if the input data is not diverse.

  • Bias can occur when certain demographic data is overrepresented in training datasets.
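
A first check for this kind of bias is simply to measure how much of the training data each group contributes. A rough sketch, assuming hypothetical group labels "A", "B", and "C" and an arbitrary 10% threshold:

```python
from collections import Counter

# Hypothetical group label for each training example (not real data).
training_groups = ["A"] * 80 + ["B"] * 15 + ["C"] * 5

counts = Counter(training_groups)
total = len(training_groups)

# Share of the training set that each group makes up.
shares = {group: count / total for group, count in counts.items()}
for group, share in shares.items():
    print(f"group {group}: {share:.0%}")

# Flag groups below an arbitrary threshold chosen for this sketch.
threshold = 0.10
underrepresented = [group for group, share in shares.items() if share < threshold]
print("Underrepresented:", underrepresented)
```

A model trained on this set sees sixteen times as many examples from group A as from group C, so its mistakes are likely to fall unevenly.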

7.5 - Algorithmic Bias

  • Example of Bias:

  • Twitter’s cropping algorithm favored certain demographics due to biased training data.

  • Ways to Mitigate Bias:

  • Diversify training data by including underrepresented groups.
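
The notes recommend adding data from underrepresented groups. When new data is not available, a simpler stand-in is oversampling, which repeats existing examples from small groups until the groups are balanced. The sketch below shows oversampling only, with hypothetical groups and examples; it is not a substitute for collecting genuinely diverse data.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical training examples as (group, example_id) pairs.
examples = (
    [("A", i) for i in range(80)]
    + [("B", i) for i in range(15)]
    + [("C", i) for i in range(5)]
)

# Collect the examples belonging to each group.
by_group = {}
for group, item in examples:
    by_group.setdefault(group, []).append((group, item))

# Bring every group up to the size of the largest one.
target = max(len(items) for items in by_group.values())

balanced = []
for group, items in by_group.items():
    balanced.extend(items)
    # Resample with replacement to fill the gap (no extras for the largest group).
    balanced.extend(random.choices(items, k=target - len(items)))

balanced_counts = Counter(group for group, _ in balanced)
print(balanced_counts)
```

Oversampling equalizes the counts but adds no new information, so the repeated examples from group C are still only five distinct cases.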

7.6 - Simulations

  • Definition:

  • A simulation models real-world situations or events. It is useful for testing hypotheses when real experimentation is impractical or risky.

  • Usage:

  • Simulations abstract away the details of complex processes and provide insights that would be hard to obtain in real life.
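
As a small illustration of the idea, the sketch below simulates rolling two dice many times to test the hypothesis that a total of 7 comes up about one time in six. The dice example, the function name `estimate_probability_of_seven`, and the trial count are assumptions chosen for simplicity; real simulations model far more complex systems in the same spirit.

```python
import random

def estimate_probability_of_seven(trials, seed=42):
    """Estimate how often two dice total 7 by simulating many rolls."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Abstraction: each die is reduced to a random integer from 1 to 6.
        if rng.randint(1, 6) + rng.randint(1, 6) == 7:
            hits += 1
    return hits / trials

estimate = estimate_probability_of_seven(100_000)
print(f"Simulated: {estimate:.4f}  Exact: {1 / 6:.4f}")
```

Rolling real dice 100,000 times would be impractical, while the simulation finishes in well under a second and can be rerun with different assumptions.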