Data & Data Analysis Notes

3.1 Understanding Data and Data Analysis

  • By the end of this chapter, you should understand:
    • Different types, uses, and representations of data.
    • The role of big data and data analytics in extracting and processing information.
    • Opportunities and dilemmas concerning data in the digital society.

Key Concepts

Data vs. Information

  • Data:
    • Raw, unorganized facts and figures (numbers, letters, images).
    • Individual values that carry no inherent meaning on their own; stored and measured in bits and bytes.
    • Example: the bare values 50 and 75, with no label or unit.
  • Information:
    • Data that has been processed and organized so it is ready for visualization or analysis.
    • Provides context by answering questions such as who, what, where, and when.
    • Example: 50°C is a temperature reading; 75% is a test score.
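
The difference between data and information can be sketched in a few lines of Python. This is only an illustration: the field names (quantity, value, unit) and the describe helper are hypothetical choices, not part of any standard.

```python
# Raw data: bare values with no label or unit attached.
raw_values = [50, 75]

# Information: the same values paired with context.
# The keys "quantity", "value", and "unit" are illustrative assumptions.
records = [
    {"quantity": "temperature", "value": 50, "unit": "°C"},
    {"quantity": "test score", "value": 75, "unit": "%"},
]

def describe(record):
    """Turn a labeled record into a readable statement."""
    return f"{record['quantity']}: {record['value']}{record['unit']}"

for record in records:
    print(describe(record))
```

On its own, the list raw_values cannot tell a reader whether 50 is a temperature, an age, or a price; the labeled records can.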

Data, Information, Knowledge, and Wisdom (DIKW Pyramid)

  1. Data: Unstructured facts.
  2. Information: Structured data offering context.
  3. Knowledge: Application of information to make decisions.
  4. Wisdom: Utilizing knowledge to predict outcomes and make informed decisions.

Stages of Gaining Wisdom

  • Steps from information to wisdom (discussion points):
  1. Recognize the need for information.
  2. Process information for clarity.
  3. Apply information to formulate knowledge.
  4. Implement knowledge to make informed decisions.

Types of Data

  1. Financial Data: Quantitative information related to business finances (cash flow, balance sheets).
  2. Medical Data: Collected during patient care (electronic health records, clinical trial registrations).
  3. Meteorological Data: Weather and climate data collected via instruments and technology.
  4. Geographical Data: Data indicating an object's position in geographic space (e.g., coordinates recorded by GPS).

Examples of Data Collection and Usage

  • Citizen Science Example: Increased bird-watching during the pandemic led to a spike in data sharing about bird behaviors.

Data Collection Methods

  • Primary Data: Original data collected for the first time for a specific purpose (e.g., surveys, interviews).
  • Secondary Data: Existing data that was originally collected by someone else, often for a different purpose (e.g., published research articles).

Data Lifecycle

  1. Creation: Data can be generated manually or automatically.
  2. Storage: Data must be stored securely with appropriate access permissions.
  3. Usage: Data is processed or analyzed for various applications.
  4. Preservation: Data is kept available for future use.
  5. Destruction: Data is permanently removed in line with applicable regulations.

Ways to Organize Data

  • Data is stored in database tables (columns for attributes, rows for records).
  • Examples of common data types in databases:
    • Strings (text), Integers (whole numbers), Dates, Booleans (true/false).
  • Relational Databases: Store multiple tables. A primary key uniquely identifies each record in a table, and foreign keys link records across tables.
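
As a minimal sketch of tables linked by keys, the example below uses Python's built-in sqlite3 module with an in-memory database. The table and column names (students, scores, student_id, and so on) and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Each column holds one attribute; each row will hold one record.
conn.execute("""
    CREATE TABLE students (
        student_id  INTEGER PRIMARY KEY,  -- uniquely identifies each record
        name        TEXT NOT NULL,        -- string
        enrolled_on TEXT NOT NULL,        -- date stored as an ISO 8601 string
        is_active   INTEGER NOT NULL      -- Boolean stored as 0 or 1
    )
""")
conn.execute("""
    CREATE TABLE scores (
        score_id   INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL REFERENCES students(student_id),
        score      INTEGER NOT NULL
    )
""")

conn.execute("INSERT INTO students VALUES (1, 'Ada', '2024-09-01', 1)")
conn.execute("INSERT INTO scores VALUES (1, 1, 75)")

# Follow the key from scores back to students to combine the two tables.
rows = conn.execute("""
    SELECT students.name, scores.score
    FROM scores
    JOIN students ON scores.student_id = students.student_id
""").fetchall()
print(rows)

# A second record with the same primary key is rejected.
try:
    conn.execute("INSERT INTO students VALUES (1, 'Grace', '2024-09-02', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

SQLite has no dedicated date or Boolean column type, which is why this sketch stores them as text and as 0 or 1; other database systems may offer native types for both.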

Data Verification and Validation

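  • Validation: Checks that data is sensible and follows set rules (e.g., type, range, format), so that only suitable data enters the database.
  • Verification: Confirms that the entered data matches the original source (e.g., by entering it twice and comparing).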
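
The two checks can be illustrated with a short Python sketch. The function names and the 0 to 100 range rule are assumptions made for this example only.

```python
def validate_score(value):
    """Validation: accept only whole-number scores from 0 to 100."""
    is_whole_number = isinstance(value, int) and not isinstance(value, bool)
    return is_whole_number and 0 <= value <= 100

def verify_entry(entered, source):
    """Verification: confirm the entered value matches the original source."""
    return entered == source

source_value = 75   # what is written on the original paper form
typed_value = 57    # what was typed in, with two digits swapped

print(validate_score(typed_value))              # a suitable value, so it passes
print(verify_entry(typed_value, source_value))  # but it does not match the source
```

The swapped digits show why both checks matter: 57 is a perfectly valid score, so validation alone would not catch the typing mistake.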

Data Presentation

  • Effective data presentation uses charts, tables, and visualizations for clarity.
  • Examples include:
    • Bar charts for categorical data (e.g., total rainfall per month).
    • Line graphs for continuous data (e.g., temperature over time).
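
To show how a bar chart maps each category to a bar length, here is a small text-only sketch in Python. The rainfall figures are invented for illustration, and text_bar_chart is a hypothetical helper; a real report would more likely use a charting tool.

```python
# Invented monthly rainfall totals in millimetres (illustrative values only).
rainfall_mm = {"Jan": 80, "Feb": 60, "Mar": 45, "Apr": 30}

def text_bar_chart(data, scale=10):
    """Return one line per category, drawing one '#' per `scale` whole units."""
    lines = []
    for label, value in data.items():
        bar = "#" * (value // scale)
        lines.append(f"{label} | {bar} {value}")
    return lines

for line in text_bar_chart(rainfall_mm):
    print(line)
```

Each month is a separate category, so the bars are drawn independently of one another; a line graph would instead join successive readings to show change over time.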

Data Security

  • Data must be secured both in storage and during transmission.
  • Encryption: Converts data into an unreadable form to prevent unauthorized access. Types include:
    • Symmetric Key: The same key is used for encryption and decryption.
    • Public Key: A public key encrypts the data, and a matching private key decrypts it.
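
The difference between the two approaches can be sketched with toy examples in Python (3.8 or later). Neither is secure: the XOR cipher and the tiny RSA numbers below are teaching aids chosen as assumptions for this illustration, and real systems rely on vetted cryptographic libraries.

```python
# Symmetric key: the SAME key both encrypts and decrypts (toy XOR cipher).
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the repeating key; applying it twice restores the data."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

shared_key = b"k3y"
ciphertext = xor_cipher(b"score=75", shared_key)
plaintext = xor_cipher(ciphertext, shared_key)
print(plaintext)

# Public key: DIFFERENT keys encrypt and decrypt (toy RSA with tiny primes).
p, q = 61, 53
n = p * q                           # modulus, part of both keys
e = 17                              # public exponent: anyone may use it to encrypt
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent: kept secret, used to decrypt

message = 65                        # a number smaller than n
encrypted = pow(message, e, n)      # encrypt with the public key (e, n)
decrypted = pow(encrypted, d, n)    # decrypt with the private key (d, n)
print(decrypted == message)
```

With a symmetric key, both parties must already share the secret key. With a public key, only the private exponent needs to stay secret, so the public key can be handed out freely.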

Data Dilemmas

  • Ethics of Data Collection: Ensure ethical standards and privacy regulations are met during data collection and usage.
  • Discuss the implications of anonymity and potential misuse of data (e.g., cyberbullying).

Conclusion

  • Understanding how data, information, knowledge, and wisdom relate to one another, and the processes that connect them, is crucial to navigating the digital society effectively.