Digital Data Notes

1.2.1 Exercises

a. Examples of activities in each phase:
  • Collection phase:

    • Surveys, sensor data capture, web scraping, transaction recording.

  • Preparation phase:

    • Cleaning (removing duplicates, handling missing values), transforming (normalizing, aggregating), encoding (converting text to numerical values).

  • Analysis phase:

    • Statistical modeling, machine learning, visualization, trend identification.

  • Sharing phase:

    • Creating reports, dashboards, presentations, publishing findings.

  • Re-use phase:

    • Archiving, repurposing data for new projects, integrating into future analyses.
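
The preparation activities listed above can be sketched in plain Python. This is a minimal illustration, not a prescribed method: the records, the field names (`id`, `age`, `city`), and the choice to fill missing ages with the mean are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical raw records, as they might arrive from the collection phase.
raw = [
    {"id": 1, "age": 34, "city": "Leeds"},
    {"id": 2, "age": None, "city": "York"},
    {"id": 1, "age": 34, "city": "Leeds"},  # exact duplicate of the first record
    {"id": 3, "age": 50, "city": "Leeds"},
]


def prepare(records):
    # Cleaning: drop exact duplicates, keeping the first occurrence.
    seen, unique = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(r))  # copy so the raw data stays untouched

    # Cleaning: fill missing ages with the mean of the known ages
    # (an assumed strategy; dropping the record is another option).
    known = [r["age"] for r in unique if r["age"] is not None]
    fill = mean(known)
    for r in unique:
        if r["age"] is None:
            r["age"] = fill

    # Transforming: min-max normalise age into the range [0, 1].
    lo = min(r["age"] for r in unique)
    hi = max(r["age"] for r in unique)
    span = (hi - lo) or 1  # avoid dividing by zero when all ages are equal
    for r in unique:
        r["age_norm"] = (r["age"] - lo) / span

    # Encoding: map each city name to an integer code.
    codes = {c: i for i, c in enumerate(sorted({r["city"] for r in unique}))}
    for r in unique:
        r["city_code"] = codes[r["city"]]
    return unique


prepared = prepare(raw)
```

Here `prepared` holds three records instead of four, every record has an age, and the text field `city` has a numeric counterpart `city_code`, so the data is ready for the analysis phase.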

b. Why is prepared data more valuable than collected data?

Prepared data is structured, cleaned, and formatted for analysis, which reduces errors and saves time in later phases. Raw data may be incomplete or inconsistent, whereas prepared data can be used immediately.

c. Why is analysed data more valuable than prepared data?

Analysed data provides actionable insights (e.g., trends, predictions) that drive decision-making. Prepared data is just input; analysed data answers questions.

d. Why is shared data more valuable than analysed data?

Sharing data disseminates knowledge to stakeholders, enabling collaboration and informed action. Analysed data has limited impact if it is not communicated effectively.

e. How data differs throughout the life cycle:
  • Form: Raw → Structured → Interpreted → Visualized/Reported.

  • Volume: May decrease during cleaning/aggregation.

  • Complexity: Increases with transformations (e.g., derived metrics).

  • Audience: Expands from analysts to decision-makers/public.
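
As a small illustration of the volume and complexity points, the sketch below aggregates hypothetical transaction rows into per-customer summaries. There are fewer rows afterwards, but each row carries derived metrics that did not exist in the raw data. The field names and values are invented for the example.

```python
from collections import defaultdict

# Hypothetical raw transactions (collection phase): one row per purchase.
transactions = [
    {"customer": "A", "amount": 20.0},
    {"customer": "A", "amount": 30.0},
    {"customer": "B", "amount": 10.0},
    {"customer": "A", "amount": 10.0},
    {"customer": "B", "amount": 50.0},
]


def summarise(rows):
    # Group the purchase amounts by customer.
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["customer"]].append(row["amount"])
    # One output row per customer, with derived metrics added.
    return [
        {
            "customer": customer,
            "n_purchases": len(amounts),
            "total": sum(amounts),
            "average": sum(amounts) / len(amounts),
        }
        for customer, amounts in sorted(grouped.items())
    ]


summary = summarise(transactions)
```

Five raw rows become two summary rows (volume decreases), while each summary row gains three derived fields (complexity increases).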

f. Validation across phases:
  • Collection: Check for completeness (e.g., missing survey responses).

  • Preparation: Verify consistency (e.g., type and range checks, duplicate and outlier detection).

  • Analysis: Validate models (e.g., statistical significance tests).

  • Sharing: Peer review, stakeholder feedback.
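
The collection and preparation checks can be sketched as simple functions. This is illustrative only: the field names, the sample values, and the 1.5 × IQR rule used to flag outliers are assumptions, and other outlier rules are equally valid.

```python
from statistics import quantiles


def missing_fields(records, required):
    """Completeness check: list (record index, field) pairs with no value."""
    return [
        (i, field)
        for i, rec in enumerate(records)
        for field in required
        if rec.get(field) in (None, "")
    ]


def iqr_outliers(values, k=1.5):
    """Flag values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical survey responses; the second respondent skipped question q1.
responses = [
    {"respondent": 1, "q1": "yes", "q2": 4},
    {"respondent": 2, "q1": "", "q2": 5},
]

# Hypothetical sensor readings with one implausible value.
readings = [10, 11, 12, 13, 14, 15, 16, 95]

incomplete = missing_fields(responses, ["q1", "q2"])
suspicious = iqr_outliers(readings)
```

A flagged value is only a candidate for review; whether it is an error or a genuine extreme still needs a human decision.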

g. Storage and management across phases:
  • Collection: Raw data in databases/cloud storage.

  • Preparation: Version-controlled intermediate files (e.g., CSV, Parquet).

  • Analysis: Results stored in databases or dashboards (e.g., Power BI).

  • Sharing/Re-use: Archived in repositories (e.g., SQL databases, SharePoint).
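
For the preparation phase, one simple way to keep intermediate files distinguishable is to name each file after a hash of its content. The sketch below is a lightweight stand-in for version control, not a replacement for a real system such as Git; the function name `save_versioned_csv`, the file stem, and the sample rows are hypothetical.

```python
import csv
import hashlib
import io
import tempfile
from pathlib import Path


def save_versioned_csv(rows, fieldnames, directory, stem):
    """Write rows to <stem>-<hash>.csv, where <hash> depends on the content."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    text = buffer.getvalue()
    # Identical content always yields the same name; any change yields a new one.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    path = Path(directory) / f"{stem}-{digest}.csv"
    path.write_text(text, encoding="utf-8")
    return path


rows = [{"id": 1, "age_norm": 0.0}, {"id": 2, "age_norm": 0.5}]
with tempfile.TemporaryDirectory() as tmp:
    first = save_versioned_csv(rows, ["id", "age_norm"], tmp, "prepared")
    second = save_versioned_csv(rows, ["id", "age_norm"], tmp, "prepared")
    changed = save_versioned_csv(rows[:1], ["id", "age_norm"], tmp, "prepared")
    same_name = first.name == second.name
    different_name = changed.name != first.name
```

Saving unchanged data twice reuses the same file name, while any change to the rows produces a new file, so earlier intermediate versions are never silently overwritten.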

h. Documentation across phases:
  • Collection: Metadata (e.g., source, timestamp).

  • Preparation: Data dictionaries, transformation scripts.

  • Analysis: Methodologies, assumptions, code comments.

  • Sharing: Reports with clear explanations, annotations.
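
One lightweight way to keep collection metadata and a data dictionary alongside a dataset is a small JSON document. The sketch below is an assumption about how this might be organised, not a standard format; the dataset name, source, and column descriptions are hypothetical.

```python
import json
from datetime import datetime, timezone


def build_documentation(name, source, columns, collected_at=None):
    """Bundle collection metadata and a data dictionary into one dict."""
    collected_at = collected_at or datetime.now(timezone.utc)
    return {
        "dataset": name,
        # Collection-phase metadata: where the data came from and when.
        "metadata": {
            "source": source,
            "collected_at": collected_at.isoformat(),
        },
        # Preparation-phase data dictionary: one entry per column.
        "data_dictionary": [
            {"column": col, "type": typ, "description": desc}
            for col, typ, desc in columns
        ],
    }


doc = build_documentation(
    name="customer_survey",  # hypothetical dataset name
    source="online questionnaire",
    columns=[
        ("respondent", "integer", "Anonymous respondent identifier"),
        ("q2", "integer", "Satisfaction score from 1 (low) to 5 (high)"),
    ],
    collected_at=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
)
print(json.dumps(doc, indent=2))
```

Storing this document next to the data file means anyone re-using the dataset later can see its source, collection time, and the meaning of each column without consulting the original author.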