2.3 Extracting Information from Data

Overview of Extracting Information from Data

  • Definition: Transforming raw data into meaningful insights and actionable conclusions.

  • Purpose: To analyze, summarize, and interpret data in order to answer specific questions or solve specific problems.

Data Collection

  • Gathering data from various sources:

    • Databases

    • Spreadsheets

    • Sensors

    • External datasets

What is Data Science?

Role of Data Science

  • Involves extraction and visualization of insights from large datasets.

  • Data is omnipresent in digital interactions, such as social media, online shopping, and advertising.

Sources of Data

  • Millions of data points are collected every second through:

    • Sensors tracking air quality, noise, temperature.

    • User-generated data from smartphones and online interactions.

Challenges in Data Interpretation

  • Data must be given meaning before it can convey stories about businesses, customers, or societal trends.

Data and Information

Differentiation Between Data and Information

  • Data: Raw, unprocessed facts (numbers, text, measurements) with no context attached.

  • Information: Structured and organized data presenting relationships.

  • Visualization techniques: Tables, charts, time series, line graphs, scatter plots.
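
The difference between data and information can be sketched in a few lines. This is only an illustration: the `readings` list and its temperature values are invented for the example. The raw list is data; the per-day averages and the warmest day are information derived from it.

```python
from statistics import mean

# Hypothetical raw data: (day, temperature in °C) pairs with no summary attached.
readings = [
    ("Mon", 18.5), ("Mon", 21.0),
    ("Tue", 19.0), ("Tue", 23.5),
    ("Wed", 17.0), ("Wed", 20.0),
]

def summarize(rows):
    """Group readings by day and return the average temperature per day."""
    by_day = {}
    for day, temp in rows:
        by_day.setdefault(day, []).append(temp)
    return {day: round(mean(temps), 2) for day, temps in by_day.items()}

# Information: organized values that show a relationship (day -> average).
daily_average = summarize(readings)
warmest_day = max(daily_average, key=daily_average.get)

print(daily_average)
print("Warmest day:", warmest_day)
```

The same dictionary could be shown as a table or a line graph; the visualization is a presentation of the information, not a new source of it.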

Data Analysis

Understanding Data Analysis

  • Data analysis covers the processes involved in organizing and interpreting data.

  • Geographic data organization and analysis often occur concurrently.

  • Data manipulation (cleaning, sorting, reshaping) is distinct from the analysis performed on the result.

Data Models

Purpose of Data Models

  • To understand correlations and patterns from multiple datasets.

  • Capability to analyze changes over time using line graphs and scatter plots for trend identification; histograms show how values are distributed rather than how they change.
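
One simple way to identify a trend is to fit a straight line through the points. The sketch below uses an ordinary least-squares fit written by hand. The `years` and `users_millions` values are hypothetical and are not taken from any exercise in these notes.

```python
def linear_trend(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: registered users (in millions) at the end of each year.
years = [1, 2, 3, 4, 5]
users_millions = [2.0, 4.1, 5.9, 8.0, 10.0]

slope, intercept = linear_trend(years, users_millions)
print(f"Estimated growth: {slope:.2f} million users per year")

# Extrapolating assumes the trend stays linear, which real data may not do.
estimate_year_6 = slope * 6 + intercept
print(f"Estimate for year 6: {estimate_year_6:.2f} million")
```

A positive slope indicates an upward trend. The extrapolated value is only as reliable as the assumption that growth remains steady.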

Structured vs. Unstructured Data

Structured Data

  • Characteristics:

    1. Highly organized, easy to query and process.

    2. Stored in defined formats (databases, spreadsheets).

    3. Represented through schemas or data models.

    4. Examples: Customer names, addresses, product prices.
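
Because structured data follows a schema, it can be queried directly. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `products` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema defines the structure: every row has a name and a price.
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("pen", 1.50), ("notebook", 4.25), ("backpack", 29.99)],  # hypothetical rows
)

# A query can filter and sort because each field has a known meaning and type.
cheap = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM products WHERE price < ? ORDER BY price", (5.0,)
    )
]
conn.close()

print(cheap)
```

The same question ("which products cost less than 5?") would be much harder to answer if the prices were buried in free-form text.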

Unstructured Data

  • Characteristics:

    1. Lacks a predefined structure or organization; often raw or free-form.

    2. Various forms: Text documents, images, audio, video, social media content.

  • Examples include emails, images, sensor data.

Issues with Unstructured Data

  • Difficulty in sorting, managing, and organizing.

  • Duplicate data and varying formats complicate clarity and analysis.

  • Necessity for data mining or analytics to add structure.
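
As a rough illustration of adding structure to unstructured text, the sketch below pulls an order number and a date out of free-form messages and drops duplicates. The messages, the `#number` convention, and the date format are assumptions made for the example; real text usually needs more robust handling.

```python
import re

# Hypothetical free-form messages; the second is a duplicate with different casing.
messages = [
    "Order #1042 shipped to Lisbon on 2024-03-05",
    "order #1042 shipped to Lisbon on 2024-03-05",
    "Customer says: order #2210 arrived late (2024-03-09)",
]

# Assumed convention: an order id written as '#digits', followed later by a YYYY-MM-DD date.
PATTERN = re.compile(r"#(\d+).*?(\d{4}-\d{2}-\d{2})")

def extract_records(texts):
    """Turn free text into structured records, skipping duplicates and non-matches."""
    seen = set()
    records = []
    for text in texts:
        match = PATTERN.search(text)
        if match is None:
            continue  # No recognizable structure in this message.
        record = {"order_id": int(match.group(1)), "date": match.group(2)}
        key = (record["order_id"], record["date"])
        if key not in seen:
            seen.add(key)
            records.append(record)
    return records

records = extract_records(messages)
print(records)
```

Once the records exist as dictionaries with fixed fields, they can be sorted, counted, or loaded into a table like any other structured data.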

Metadata

Concept of Metadata

  • Definition: Data that provides information about other data.

  • Offers context and structure to understand associated data better.

Comparison of Data and Metadata

  • Data: The content itself, which can be raw and is not always informative on its own.

  • Metadata: Descriptive information that references other data (for example author, creation date, file size, or location) and gives it context.
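
The sketch below separates a photo's content from its metadata and answers a question using the metadata alone. The `photos` list, its field names, and its values are hypothetical and chosen only to illustrate the idea.

```python
from collections import Counter

# Each hypothetical photo has content ("data") and descriptive fields ("metadata").
photos = [
    {"data": b"\xff\xd8\x01", "metadata": {"location": "Park", "taken": "2024-05-01"}},
    {"data": b"\xff\xd8\x02", "metadata": {"location": "Beach", "taken": "2024-05-02"}},
    {"data": b"\xff\xd8\x03", "metadata": {"location": "Park", "taken": "2024-05-03"}},
]

# Counting photos per location needs only the metadata, never the image bytes.
photos_per_location = Counter(p["metadata"]["location"] for p in photos)

print(photos_per_location["Park"])
print(photos_per_location["Beach"])
```

Questions about what a photo actually shows cannot be answered this way, because that information lives in the data rather than in the metadata.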

Sample Exercises

Exercise 1

  • Photo data can determine:

    • Count of photos taken at specific locations (Correct: I and II)

Exercise 2

  • User growth estimate based on a steady increase in registrations (Answer: B, ~31.2 million in Year 12).

Exercise 3

  • Hypothesis consistent with data regarding mobile app usage and message length (Answer: D, average message length decreases).

Exercise 4

  • Least likely metadata in an e-book (Answer: A, archive of previous versions).

Exercise 5

  • Interest in application related to reading habits (Answer: A, more reading leads to more interest).