Intro to Data Concepts
Data Value and Importance
Data is a vital asset for businesses and organizations across multiple industries, often referred to as the "new oil" of the digital economy.
It provides insights that help organizations improve operations, understand customer behavior (for example, through predictive analytics), and drive innovation with machine learning models.
Non-profits and governments leverage data to address global challenges, such as tracking disease outbreaks or optimizing resource distribution during crises.
Data-Driven Decision Making (DDDM): The practice of basing decisions on the analysis of data rather than purely on intuition.
Types of Data
Structured Data
Organized in rows and columns (e.g., spreadsheets or relational tables).
Easily searched and queried using Structured Query Language (SQL).
Follows a rigid schema where data must fit into predefined fields.
Examples: Names, dates, addresses, credit card numbers, and stock trade information.
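To make the idea of a rigid schema concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table name trades, its columns, and the sample rows are hypothetical and invented for illustration. SQLite enforces the NOT NULL constraints shown, but it is more lenient about column types than many other relational databases.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema fixes the fields up front; every row must fit these columns.
conn.execute(
    """
    CREATE TABLE trades (
        trade_date TEXT NOT NULL,   -- ISO 8601 date string
        ticker     TEXT NOT NULL,
        shares     INTEGER NOT NULL,
        price      REAL NOT NULL
    )
    """
)

# Hypothetical sample rows, invented for illustration.
rows = [
    ("2024-01-02", "ABC", 100, 12.50),
    ("2024-01-02", "XYZ", 40, 98.10),
    ("2024-01-03", "ABC", 250, 12.80),
]
conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?)", rows)

# Every row has the same named columns, so SQL can filter and aggregate directly.
total_abc_shares = conn.execute(
    "SELECT SUM(shares) FROM trades WHERE ticker = ?", ("ABC",)
).fetchone()[0]
print(total_abc_shares)  # 350

# A row that does not fit the schema (missing share count) is rejected.
try:
    conn.execute(
        "INSERT INTO trades VALUES (?, ?, ?, ?)", ("2024-01-04", "XYZ", None, 97.0)
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```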
Unstructured Data
Lacks a predefined format or organization, making it more complex to collect and process.
Examples: Images, audio files, videos, social media posts, and PDF documents.
Often stored in Data Lakes rather than traditional databases.
The majority of data generated globally is unstructured, and companies use Natural Language Processing (NLP) and other AI techniques to extract meaning from it.
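Production NLP relies on tokenizers, trained language models, and similar tooling. The sketch below is a deliberately tiny stand-in that uses only regular expressions and word counts. It illustrates the basic idea of turning free text into structured fields. The posts, the stop-word list, and the to_record helper are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical social media posts: free text with no fixed fields.
posts = [
    "Loving the new phone! Battery life is great #tech #review",
    "Battery died after two hours. Not impressed. #review",
    "Great camera, great screen #tech",
]

# A tiny stop-word list, assumed for illustration only.
STOP_WORDS = {"the", "is", "after", "not", "new", "two"}


def to_record(text: str) -> dict:
    """Turn one free-text post into a record with fixed fields (hypothetical helper)."""
    hashtags = re.findall(r"#(\w+)", text)
    # Strip hashtags, lowercase, split into letter-only tokens, drop stop words.
    without_tags = re.sub(r"#\w+", "", text.lower())
    words = [w for w in re.findall(r"[a-z]+", without_tags) if w not in STOP_WORDS]
    return {"hashtags": hashtags, "word_count": len(words), "words": words}


records = [to_record(p) for p in posts]

# Term frequencies across all posts: a crude signal of what people talk about.
term_freq = Counter(w for r in records for w in r["words"])
print(term_freq.most_common(2))  # [('great', 3), ('battery', 2)]
```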
Databases
Databases are organized collections of data, most often structured, that are stored electronically and managed by a database management system (DBMS).
Relational Databases (RDBMS): Store data in tables. Each row is uniquely identified by a Primary Key, and tables are linked when a Foreign Key in one table references the Primary Key of another.
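Non-Relational Databases (NoSQL): Use flexible data models (such as documents, key-value pairs, or graphs) that are better suited to semi-structured or unstructured data (e.g., MongoDB, a document database).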
Scalability: Databases can scale vertically (adding CPU, memory, or storage to a single server) or horizontally (distributing data across multiple servers) to handle growing amounts of information.
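The following sketch again uses Python's sqlite3 module, this time with hypothetical customers and orders tables. It shows a Primary Key identifying each row, a Foreign Key linking the two tables, and a JOIN that follows that link. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- uniquely identifies each customer row
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,   -- uniquely identifies each order row
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- foreign key
        amount      REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
    """
)

# Follow the foreign key from orders back to customers and total per customer.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.name
    """
).fetchall()
print(rows)  # [('Ana', 65.0), ('Ben', 15.0)]

# An order pointing at a customer that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```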
Data Types
Quantitative Data
Numerical data that can be counted or measured.
Discrete Data: Countable values, usually whole numbers, that cannot be meaningfully subdivided.
Examples: the number of employees, children, DVDs, or chickens.
Continuous Data: Measured values that can take any value within a range and can be broken down into ever finer parts, limited only by the precision of the measurement.
Examples: temperature, time in minutes, speed in kilometers per hour, or height in meters.
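Here is a short Python sketch of how the two kinds of quantitative data are typically handled, using invented household data. Discrete counts can be tallied value by value. Continuous measurements are usually grouped into ranges (bins) first. The 1.70 m cut-off is an arbitrary choice for illustration.

```python
from collections import Counter
from statistics import mean

# Discrete: hypothetical counts of children per household (whole numbers only).
children_per_household = [0, 2, 1, 3, 2]

# Continuous: hypothetical heights in meters; precision is limited by the instrument.
heights_m = [1.62, 1.75, 1.80, 1.68, 1.71]

# Discrete values can be tallied exactly, one bucket per possible count.
children_counts = Counter(children_per_household)

# Continuous values rarely repeat exactly, so they are grouped into ranges first.
height_bins = Counter("under 1.70" if h < 1.70 else "1.70 and over" for h in heights_m)

# The mean of discrete counts can be fractional (1.6) even though no single
# household has 1.6 children.
avg_children = mean(children_per_household)
avg_height = mean(heights_m)

print(children_counts)
print(height_bins)
print(avg_children, round(avg_height, 3))
```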
Qualitative Data
Non-numerical data that describes characteristics, qualities, or attributes.
Nominal Data: Categories or labels without a natural order or rank.
Examples: Nationality (e.g., Greek), marital status (e.g., Married), or hair color (e.g., Blonde).
Ordinal Data: Categories with a meaningful order, but the intervals between them are undefined or unequal.
Examples: Survey responses (e.g., Very likely, Likely, Neutral, Unlikely, Very unlikely) or education levels (e.g., High School, Bachelor's, Master's).
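One way to encode the difference in Python uses the standard enum module. An IntEnum for an ordinal scale lets values be compared. A plain Enum for a nominal attribute supports only equality. The class names and respondents below are hypothetical.

```python
from enum import Enum, IntEnum


class HairColor(Enum):
    """Nominal: labels only. Members support equality, not < or >."""
    BLONDE = "Blonde"
    BROWN = "Brown"
    BLACK = "Black"


class Education(IntEnum):
    """Ordinal: the integers encode rank only; gaps between levels are not equal units."""
    HIGH_SCHOOL = 1
    BACHELORS = 2
    MASTERS = 3


# Hypothetical respondents: (hair color, highest education level).
respondents = [
    (HairColor.BROWN, Education.BACHELORS),
    (HairColor.BLONDE, Education.HIGH_SCHOOL),
    (HairColor.BROWN, Education.MASTERS),
]

# Ordinal data supports ordering questions such as "at least a Bachelor's degree".
at_least_bachelors = sum(1 for _, edu in respondents if edu >= Education.BACHELORS)

# Nominal data supports only equality tests such as "has brown hair".
brown_haired = sum(1 for hair, _ in respondents if hair is HairColor.BROWN)

# Ordering nominal labels is rejected: plain Enum members do not define "<".
try:
    HairColor.BLONDE < HairColor.BROWN
    nominal_orderable = True
except TypeError:
    nominal_orderable = False

print(at_least_bachelors, brown_haired, nominal_orderable)
```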
Data Collection and Analysis
The Analysis Process:
Collection: Gathering raw data from various sources.
Cleaning (Wrangling): Removing errors, duplicates, and inconsistencies to ensure data integrity.
Analysis: Exploring patterns, correlations, and trends.
Visualization: Presenting findings through charts and dashboards for stakeholders.
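The four steps above can be sketched end to end in plain Python. Several choices here are assumptions made to keep the example self-contained: the raw records, dropping incomplete rows rather than filling them in, and the text-based bar chart. Real projects would typically use dedicated libraries for cleaning and charting.

```python
from statistics import mean

# 1. Collection: hypothetical raw records, as they might arrive from several sources.
raw = [
    {"region": "North", "sales": "120"},
    {"region": "south", "sales": "95"},    # inconsistent capitalization
    {"region": "North", "sales": "120"},   # exact duplicate
    {"region": "East", "sales": ""},       # missing value
    {"region": "South", "sales": "105"},
]

# 2. Cleaning: normalize labels, drop rows with missing values, remove duplicates.
seen = set()
clean = []
for row in raw:
    region = row["region"].strip().title()
    if not row["sales"].strip():
        continue  # skip incomplete rows (one of several possible strategies)
    record = (region, int(row["sales"]))
    if record in seen:
        continue  # skip exact duplicates
    seen.add(record)
    clean.append(record)

# 3. Analysis: average sales per region.
by_region = {}
for region, sales in clean:
    by_region.setdefault(region, []).append(sales)
averages = {region: mean(values) for region, values in by_region.items()}

# 4. Visualization: a plain-text bar chart, one '#' per 10 units of sales.
for region, avg in sorted(averages.items()):
    print(f"{region:<6} {'#' * int(avg // 10)} {avg:g}")
```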
Understanding data types and structures is a foundational requirement for selecting the correct statistical tests and analytical tools.
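As a small illustration of matching the tool to the data type, the hypothetical summarize function below picks a measure of central tendency by level of measurement. It uses the mode for nominal data, the median for ordinal data, and the mean for quantitative data. This is a common rule of thumb, not a complete procedure for choosing statistical tests.

```python
from collections import Counter
from statistics import mean, median_low


def summarize(values, level):
    """Return a central-tendency summary suited to the level of measurement.

    `level` is one of "nominal", "ordinal", or "quantitative" (a simplified
    taxonomy assumed for this sketch). Ordinal values must already be encoded
    so that sorting reflects their order, e.g. as integer ranks.
    """
    if level == "nominal":
        # Mode: with unordered categories, only counting is meaningful.
        return Counter(values).most_common(1)[0][0]
    if level == "ordinal":
        # Median: order is meaningful, distances between categories are not.
        # median_low always returns an actual category, never an in-between value.
        return median_low(values)
    if level == "quantitative":
        # Mean: differences between values are meaningful.
        return mean(values)
    raise ValueError(f"unknown level of measurement: {level!r}")


print(summarize(["Greek", "Irish", "Greek"], "nominal"))  # Greek
print(summarize([1, 3, 2, 3, 5], "ordinal"))              # 3
print(summarize([1.5, 2.5, 3.5], "quantitative"))         # 2.5
```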