Lecture 1: Introduction to Data Science

What is Data?

  • Definition: Data refers to facts and statistics collected for reference or analysis.

  • Aspects of Data:

    • Data comes from facts and statistics.

    • Data is collected.

    • Data is used for reference or analysis (focus of this course).

  • Terminology:

    • "Data" is the plural of "datum".

  • Definitions:

    • Objects: Individual entries (e.g., people in a company).

    • Features: Characteristics of objects (e.g., salary and age).

  • Structured Data: Organized in a predictable format (e.g., databases).

    • Example: Data in tables.

  • Unstructured Data: Lacks a specific format (e.g., text documents).

    • Example: Emails, social media posts.

  • Semi-structured Data: Hybrid form; lacks a strict schema but carries tags or keys that label its parts (e.g., XML or JSON).
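
To make the terms above concrete, here is a minimal Python sketch of objects, features, and the three kinds of data. All names and values (the employees, the JSON string, the email text) are invented for illustration and are not from the lecture.

```python
import json

# Structured: every object (row) has the same features (columns).
# These employee records are made-up illustration data.
employees = [
    {"name": "Ana", "age": 34, "salary": 52000},
    {"name": "Ben", "age": 45, "salary": 61000},
]
features = sorted(employees[0].keys())

# Semi-structured: JSON labels its parts with keys, but records
# do not have to share one fixed schema (Ben has no "skills").
raw_json = '[{"name": "Ana", "skills": ["SQL", "Python"]}, {"name": "Ben"}]'
records = json.loads(raw_json)
keys_per_record = [sorted(r.keys()) for r in records]

# Unstructured: free text with no predefined fields.
email = "Hi team, the quarterly report is attached. Thanks, Ana"
word_count = len(email.split())

print(features)         # feature names shared by every object
print(keys_per_record)  # keys differ from record to record
print(word_count)       # about all we can say without further processing
```

Each dictionary in employees is one object, and its keys are the features. The unstructured email needs extra processing (here just a word count) before it can be analyzed.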

Introduction to Data Science

  • Definition: Field focusing on techniques and algorithms to extract insights from data.

  • Objective: Analyze data to gain valuable insights using advanced tools and methodologies.

  • Future Relevance: Integral to artificial intelligence development.

Importance of Data Science

  • Data is often likened to oil: it fuels modern business intelligence.

  • Advantages:

    • Informed decision-making.

    • Predictive analytics for future trends.

    • Detection of patterns and anomalies.

    • Support for machine intelligence (AI).

    • Sentiment analysis to understand consumer behavior.

  • Tools and Languages:

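    • Power BI: Builds reports and dashboards, mainly from structured data.

    • MS Excel: Handy for small datasets but limited for big data applications (a worksheet holds at most 1,048,576 rows).

    • Python: Widely used for data processing and analysis.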

    • R Language: Powerful for statistical analysis.

  • Big Data Storage and Processing:

    • Apache Hadoop: For distributed storage (HDFS) and batch processing of big data.

    • Apache Spark: For fast, in-memory processing of large datasets.

  • Visualization Tools:

    • Tableau, Matplotlib: Aid in data visualization.
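
As a rough illustration of the "detection of patterns and anomalies" advantage listed above, the sketch below flags unusual values with a simple z-score rule using only Python's standard library. The function name find_anomalies, the threshold of 2.0, and the sales figures are assumptions made for this example. Real projects often use more robust methods.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` sample standard
    deviations away from the mean (a simple z-score rule)."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical daily sales counts with one obvious spike.
daily_sales = [102, 98, 105, 99, 101, 100, 97, 103, 250, 104]
print(find_anomalies(daily_sales))
```

The spike of 250 sits far from the other values and is the only one flagged. Note that an extreme value also inflates the mean and standard deviation, which is one reason this rule is only a starting point.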

Data Science Components

  1. Big Data: Very large volumes of data, much of it unstructured, produced every day.

  2. Machine Learning: Algorithms that learn patterns from data to make predictions or decisions; a core part of data science.

  3. Business Intelligence: Helps make informed business choices through data analysis.

  4. Statistics: Provides methods for data analysis and interpretation.

  5. Domain Expertise: Specialized knowledge of the field, needed to interpret data in context.

  6. Data Engineering: Collects, cleans, and transforms data so it is ready for analysis.

  7. Visualization: Represents data visually for better understanding.

  8. Mathematics: Fundamental for quantitative analysis.
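
To show how the Statistics, Mathematics, and Machine Learning components fit together, here is a small sketch that summarizes a toy dataset and then fits a straight line by ordinary least squares. The ages and salaries are invented, and least squares is used only as a very simple stand-in for "learning from data". The helper name fit_line is an assumption of this example.

```python
from statistics import mean, median

# Toy dataset: one feature (age) and one target (salary). Invented values.
ages = [25, 32, 41, 50]
salaries = [30000, 37000, 46000, 55000]

# Statistics: summarize the data.
avg_salary = mean(salaries)
mid_salary = median(salaries)

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept

# "Learning": estimate the line's parameters from the data.
slope, intercept = fit_line(ages, salaries)

# Prediction for an age that is not in the dataset.
predicted = slope * 40 + intercept

print(avg_salary, mid_salary)
print(slope, intercept, predicted)
```

The same pattern (summarize, fit a model, predict) underlies much larger machine learning workflows, where libraries handle the fitting step.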

Applications of Data Science

  • Healthcare: Enhances disease detection and treatment.

  • Fraud Detection: Safeguards banking transactions.

  • Image Recognition: Powers social media tagging and filtering.

  • Recommendation Systems: Offers personalized suggestions on platforms like Netflix and Amazon.

  • Dynamic Pricing: Adjusts prices based on market data.

  • Sentiment Analysis: Assesses public perception and brand loyalty.

  • Self-driving Cars: Rely on data science for navigation and decision-making.
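
As one example from the applications above, sentiment analysis can be sketched with a tiny word list. This is a deliberately naive, lexicon-based approach. The word sets, function names, and reviews are all hypothetical, and production systems typically rely on trained models instead.

```python
# Hypothetical mini-lexicons; real ones contain thousands of entries.
POSITIVE = {"great", "love", "excellent", "good"}
NEGATIVE = {"bad", "terrible", "slow", "hate"}

def sentiment_score(text):
    """Count positive words minus negative words in the text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives

def label(text):
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Invented customer reviews.
reviews = [
    "I love this phone, the camera is excellent!",
    "Terrible battery and slow delivery.",
    "It arrived on Tuesday.",
]
for review in reviews:
    print(label(review), "|", review)
```

Counting words ignores negation and sarcasm ("not good" would score as positive), which is why practical sentiment analysis uses models trained on labeled text.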