2.3 Extracting Information from Data
Overview of Extracting Information from Data
Definition: Transforming raw data into meaningful insights and actionable conclusions.
Purpose: To analyze, summarize, and interpret data for specific questions or problems.
Data Collection
Gathering data from various sources:
Databases
Spreadsheets
Sensors
External datasets
What is Data Science?
Role of Data Science
Involves extraction and visualization of insights from large datasets.
Data is omnipresent in digital interactions, such as social media, online shopping, and advertising.
Sources of Data
Millions of data points are collected every second through:
Sensors tracking air quality, noise, and temperature.
User-generated data from smartphones and online interactions.
Challenges in Data Interpretation
Data must be given meaning before it can tell a story about businesses, customers, or societal trends.
Data and Information
Differentiation Between Data and Information
Data: Raw facts or values that have not yet been organized or given context.
Information: Structured and organized data presenting relationships.
Visualization techniques: Tables, charts, time series, line graphs, scatter plots.
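The step from data to information can be sketched in a few lines of Python. The readings and the function name summarize_by_city below are hypothetical, chosen only for illustration: the raw list is data, and the per-city averages are information because they present a relationship between city and temperature.

```python
from statistics import mean

# Hypothetical raw data: (city, temperature in degrees Celsius) in no particular order.
raw_readings = [
    ("Oslo", 4.0), ("Cairo", 31.0), ("Oslo", 6.0),
    ("Cairo", 29.0), ("Oslo", 5.0),
]


def summarize_by_city(readings):
    """Group readings by city and return the average temperature per city."""
    grouped = {}
    for city, temp in readings:
        grouped.setdefault(city, []).append(temp)
    return {city: mean(temps) for city, temps in grouped.items()}


# The summary is the "information": organized, and showing a relationship.
summary = summarize_by_city(raw_readings)
for city, avg in sorted(summary.items()):
    print(f"{city}: {avg:.1f}")
```

The same summary could then feed a table or chart, which is where the visualization techniques listed above come in.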
Data Analysis
Understanding Data Analysis
Data analysis covers the processes of organizing and interpreting data.
For geographic data, organization and analysis often occur concurrently.
Data manipulation (preparing and rearranging data) is distinct from the analysis that draws conclusions from the result.
Data Models
Purpose of Data Models
To understand correlations and patterns from multiple datasets.
To analyze changes over time and identify trends using graphs such as line graphs and scatter plots; histograms show how values are distributed.
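One common way to quantify a correlation between two datasets is the Pearson correlation coefficient. The notes do not name a specific measure, so this is an assumed choice, and the two lists below (hours_read, books_bought) are hypothetical sample data. A minimal sketch:

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Hypothetical datasets: weekly hours spent reading and books bought per month.
hours_read = [1, 2, 3, 4, 5]
books_bought = [2, 4, 5, 4, 5]

r = pearson(hours_read, books_bought)
print(f"r = {r:.2f}")
```

A value near 1 suggests the two quantities rise together, a value near -1 suggests one falls as the other rises, and a value near 0 suggests no linear relationship. Correlation alone does not show that one causes the other.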
Structured vs. Unstructured Data
Structured Data
Characteristics:
Highly organized, easy to query and process.
Stored in defined formats (databases, spreadsheets).
Represented through schemas or data models.
Examples: Customer names, addresses, product prices.
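The "easy to query" property can be illustrated with Python's built-in sqlite3 module. The table name, columns, and rows below are hypothetical; the point of the sketch is only that a schema lets a single query pick out exactly the records of interest.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema defines the structure: every row has the same named, typed fields.
conn.execute(
    "CREATE TABLE orders (customer_name TEXT, address TEXT, product_price REAL)"
)

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Ada", "1 Main St", 19.99),
        ("Grace", "22 Oak Ave", 45.50),
        ("Linus", "9 Pine Rd", 30.00),
    ],
)

# Because the data is structured, filtering and sorting take one query.
rows = conn.execute(
    "SELECT customer_name FROM orders "
    "WHERE product_price > ? ORDER BY customer_name",
    (20,),
).fetchall()
expensive = [name for (name,) in rows]
print(expensive)

conn.close()
```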
Unstructured Data
Characteristics:
Lacks a predefined structure or organization; often raw or free-form.
Various forms: Text documents, images, audio, video, social media content.
Examples: Emails, images, raw sensor data.
Issues with Unstructured Data
Difficulty in sorting, managing, and organizing.
Duplicate data and varying formats complicate clarity and analysis.
Necessity for data mining or analytics to add structure.
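As a small illustration of adding structure to free-form text, the sketch below pulls email addresses out of hypothetical notes, normalizes their case, and drops duplicates. The regular expression is deliberately simple and is an assumption made for this example, not a complete email validator.

```python
import re

# Hypothetical free-form notes: varying formats, one near-duplicate, one with no address.
notes = [
    "Contact ADA@example.com about the invoice.",
    "contact ada@example.com about the invoice.",
    "Grace (grace@example.org) called on 2024-03-05",
    "no contact details here",
]

# Simplified pattern: local part, "@", then a domain with at least one dot.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_records(texts):
    """Return one structured record per distinct email address found."""
    seen = set()
    records = []
    for text in texts:
        match = EMAIL_RE.search(text)
        if match is None:
            continue  # nothing to extract from this note
        email = match.group(0).lower()  # normalize varying formats
        if email in seen:
            continue  # skip duplicate data
        seen.add(email)
        records.append({"email": email, "source_text": text})
    return records


records = extract_records(notes)
for record in records:
    print(record["email"])
```

Once extracted, the records have a fixed set of fields and can be sorted, counted, or stored like any other structured data.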
Metadata
Concept of Metadata
Definition: Data that provides information about other data.
Offers context and structure to understand associated data better.
Comparison of Data and Metadata
Data: The content itself; it can be raw and is not always informative on its own.
Metadata: Descriptive information that refers to other data and gives it context, such as who created it, when, and where.
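The split between data and metadata can be made concrete with a photo record. In the sketch below the Photo class, its field names, and the sample values are all hypothetical: the pixel bytes stand in for the data, while location and date are metadata. Counting photos per location uses only the metadata and never looks at the image content.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Photo:
    pixels: bytes   # the data itself (image content)
    location: str   # metadata: where the photo was taken
    taken_on: str   # metadata: date the photo was taken (ISO format)


# Hypothetical sample photos; the bytes are placeholders, not real images.
photos = [
    Photo(b"\x00\x01", "Park", "2024-05-01"),
    Photo(b"\x02\x03", "Beach", "2024-05-01"),
    Photo(b"\x04\x05", "Park", "2024-05-02"),
]


def photos_per_location(items):
    """Count photos by location using metadata only."""
    return Counter(photo.location for photo in items)


counts = photos_per_location(photos)
for location, count in sorted(counts.items()):
    print(f"{location}: {count}")
```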
Sample Exercises
Exercise 1
Photo data can determine:
Count of photos taken at specific locations (Correct: I and II)
Exercise 2
User growth estimate based on a steady increase in registrations (Answer: B, ~31.2 million in Year 12).
Exercise 3
Hypothesis consistent with data regarding mobile app usage and message length (Answer: D, average message length decreases).
Exercise 4
Least likely metadata in an e-book (Answer: A, archive of previous versions).
Exercise 5
Interest in application related to reading habits (Answer: A, more reading leads to more interest).