03_Types_of_Data_and_DataTypes_annotated

INTRODUCTION TO DATA SCIENCE

  • Overview of data science concepts.

  • Definitions of types of data, data types, and data categories.

Page 1: Types of Data, Data Types, and Data Categories

Page 2: Recap of Last Week

  • Code example: import pandas as pd; df = pd.read_csv('/kaggle/input/california-housing-prices/housing.csv')

  • Display the first five rows of the dataset using df.head().

  • Example data displayed:

    • Longitude, Latitude, Housing Median Age, Total Rooms, Total Bedrooms, Population, Households, Median Income, Median House Value, and Ocean Proximity.

Page 3: Types of Data

  • Introduction to various types of data relevant to data science.

Page 4: What is Data?

  • Definition of Data: Raw information, facts, or statistics in various forms (numbers, text, images, etc.).

Page 6: Broad Categories of Data

  • Quantitative Data: Numerical data that can be measured.

    • Discrete Data: Countable values (e.g., number of cars, laptops).

    • Continuous Data: Measurable values (e.g., height, weight).

  • Qualitative Data: Descriptive data that is grouped into categories rather than measured numerically.

    • Includes Structured Data and Unstructured Data.

Page 7: Categories of Data

  • Types of data we will cover in the course:

    • Tabular, Text, Images, JSON, XML, HTML, Audio.

Page 8: Tabular Data

  • Definition: Structured data organized in rows and columns; resembles spreadsheets or database tables.

  • Examples: Demographic information, grades, etc.
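A minimal pandas sketch of tabular data, assuming a few hypothetical student records (the names, ages, and grades are invented for illustration). Each dictionary key becomes a column, each list position becomes a row, and every column gets its own data type.

```python
import pandas as pd

# Hypothetical student records: each key is a column, each list position a row
records = {
    "name": ["Ana", "Ben", "Chloe"],
    "age": [20, 22, 21],
    "grade": [88.5, 92.0, 79.5],
}
df = pd.DataFrame(records)

print(df.shape)   # (rows, columns)
print(df.dtypes)  # one data type per column
print(df)
```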

Page 9: Text Data

  • Examples include reviews, articles, emails, and social media posts.

  • Focus on natural language and human-readable text.
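A common first step with text data is turning free text into countable tokens. The sketch below assumes three hypothetical reviews and a deliberately simple tokenizer (lowercase, keep runs of letters and apostrophes); real NLP pipelines use more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical product reviews (text data)
reviews = [
    "Great battery life, great screen.",
    "The screen cracked after a week.",
    "Battery life is great!",
]

def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of letters/apostrophes as words
    return re.findall(r"[a-z']+", text.lower())

# Count how often each word appears across all reviews
word_counts = Counter(word for review in reviews for word in tokenize(review))
print(word_counts.most_common(3))
```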

Page 10: Graph Data

  • Represents relationships between entities using nodes and edges.

  • Examples: Social connections, websites, network traffic.
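One simple way to hold graph data in Python is an adjacency list: a dict mapping each node to its neighbors. This sketch assumes a small hypothetical friendship network and uses breadth-first search to count the edges on a shortest path between two people.

```python
from collections import deque

# Hypothetical social network: each node maps to the nodes it shares an edge with
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob"],
}

def degrees_of_separation(graph, start, goal):
    """Breadth-first search: number of edges on a shortest path, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

print(degrees_of_separation(friends, "carol", "dave"))  # → 3
```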

Page 11: Unstructured Data

  • Lacks a predefined structure (such as rows and columns), which makes it challenging to analyze.

    • Examples:

      • Videos (e.g., TikTok)

      • Images (e.g., James Webb Space Telescope photos, faces, handwriting)

      • Audio (Alexa, music)

      • Biometrics (fingerprints, facial recognition)

      • Haptics (e.g., phone vibration notifications)

Page 12: More Examples of Different Types of Data

  • Tabular Data: Heights of class members.

  • Graph Data: Social networks, dependencies, course prerequisites.

  • Geo Data: Flight paths, weather patterns.
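Geo data such as a flight path is a sequence of (latitude, longitude) points, and a basic operation on it is the distance between two points. A sketch using the haversine formula, assuming a spherical Earth with a mean radius of 6371 km (real geodesy libraries use more precise ellipsoid models):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# One degree of longitude along the equator is roughly 111 km
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))
```

Summing this distance over consecutive points of a path gives its approximate total length.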

Page 13: Raw and Hierarchical Data

  • Raw Data: Images, video, audio, telemetry data.

  • Hierarchies:

    • Taxonomies, family trees, file directories.
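Hierarchical data maps naturally onto nested dictionaries. This sketch assumes a hypothetical file directory tree in which dicts are folders and None marks a file, and walks it depth-first to list every file path.

```python
# Hypothetical file directory tree: dicts are folders, None marks a file
tree = {
    "project": {
        "data": {"housing.csv": None, "playlist.tsv": None},
        "notebooks": {"eda.ipynb": None},
        "README.md": None,
    }
}

def list_paths(node, prefix=""):
    """Depth-first walk that yields the full path of every file in the tree."""
    for name, child in node.items():
        path = f"{prefix}/{name}" if prefix else name
        if child is None:
            yield path
        else:
            yield from list_paths(child, path)

for path in list_paths(tree):
    print(path)
```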

Page 14: Data Formats

  • Common formats: CSV, image formats (.jpg, .png), audio formats (.wav, .mp3), SQL databases.

Page 15: CSV/TSV Formats

  • CSV (Comma-Separated Values): Plain-text format in which each line is a row and fields are separated by commas.

  • TSV (Tab-Separated Values): Same layout, but fields are separated by tabs instead of commas.

  • These formats facilitate data import/export across various tools.
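The only difference between CSV and TSV is the field delimiter, which Python's standard csv module takes as a parameter. A sketch assuming the same small hypothetical table serialized both ways:

```python
import csv
import io

# The same hypothetical table serialized two ways
csv_text = "name,score\nAna,88\nBen,92\n"
tsv_text = "name\tscore\nAna\t88\nBen\t92\n"

def read_rows(text: str, delimiter: str) -> list[dict]:
    # Each line is a row, the delimiter separates fields, and the first line is the header
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

print(read_rows(csv_text, ","))
print(read_rows(tsv_text, "\t"))
```

Both calls yield the same rows. Note that every value comes back as a string; converting "88" to a number is a separate step.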

Page 16: Tabular Data - Example

  • An example CSV file (classic rock playlist).

    • Columns: Artist, Music, Album, Year, Genre.

Page 17: Tabular Data Representation

  • Example format of CSV file:

    • `Artist, Music, Album, Year, Genre`.

  • Use Python's pandas library for data manipulation.
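A sketch of loading a playlist like this with pandas. The two rows are illustrative and are not the contents of the original file; io.StringIO stands in for a file path so the example runs on its own.

```python
import io

import pandas as pd

# Illustrative rows using the slide's column names (not the original file's contents)
playlist_csv = """Artist,Music,Album,Year,Genre
Led Zeppelin,Stairway to Heaven,Led Zeppelin IV,1971,Rock
Queen,Bohemian Rhapsody,A Night at the Opera,1975,Rock
"""

df = pd.read_csv(io.StringIO(playlist_csv))
print(df.head())

# Rows can be filtered on a column, much like a spreadsheet filter
early = df[df["Year"] < 1973]
print(early[["Artist", "Music"]])
```

If a header is written with spaces after the commas (as in `Artist, Music, ...`), passing skipinitialspace=True to pd.read_csv strips them from the column names.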

Page 18: Data Format: Images

  • Image Data: Visual content described by properties such as colors, shapes, and pixel values.

Page 19: Pixel Structures in Images

  • Images are composed of pixels arranged in a rectangular grid.

  • Each pixel holds color information (e.g., red, green, and blue channel values).
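In Python, an RGB image is commonly held as a NumPy array of shape (height, width, 3) with 8-bit values from 0 to 255. A sketch assuming a tiny hypothetical 2x3 image built by hand rather than loaded from a file:

```python
import numpy as np

# A hypothetical 2x3 RGB image: shape is (height, width, channels), values 0-255
image = np.zeros((2, 3, 3), dtype=np.uint8)
image[0, 0] = [255, 0, 0]      # top-left pixel: pure red
image[1, 2] = [255, 255, 255]  # bottom-right pixel: white

print(image.shape)    # (2, 3, 3)
print(image[0, 0])    # the RGB values of one pixel
print(image[..., 0])  # the red channel as a 2x3 grid
```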

Page 20: Image Compression

  • Lossy Compression: Reduces file size by permanently discarding some data (e.g., JPEG).

  • Lossless Compression: Reduces file size while preserving every pixel exactly; used where quality must not degrade (e.g., PNG).
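The difference can be seen on a hypothetical grayscale gradient. The lossless half uses zlib's DEFLATE, the algorithm PNG also uses, and recovers the pixels exactly. The lossy half is a toy stand-in, not JPEG (which quantizes frequency coefficients): it simply drops the low 4 bits of each value, so the discarded detail cannot be restored.

```python
import zlib

import numpy as np

# Hypothetical 64x64 8-bit grayscale image: a smooth horizontal gradient
pixels = np.tile(np.arange(0, 256, 4, dtype=np.uint8), (64, 1))
raw = pixels.tobytes()

# Lossless: DEFLATE round-trips every byte exactly
packed = zlib.compress(raw, 9)
restored = np.frombuffer(zlib.decompress(packed), dtype=np.uint8).reshape(pixels.shape)
print(len(raw), len(packed), np.array_equal(restored, pixels))

# Lossy (toy illustration): keep only the top 4 bits of each value
quantized = (pixels >> 4) << 4
max_error = int(np.abs(pixels.astype(int) - quantized.astype(int)).max())
print(max_error)
```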

Page 21: Databases

  • Definition: Organized collections of structured information stored electronically.

  • Database systems manage complex data relationships efficiently.
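A minimal relational example using Python's built-in sqlite3 module and an in-memory database. The two-table schema is hypothetical; the JOIN shows how a database follows the relationship between an album and its artist.

```python
import sqlite3

# In-memory database with two related tables (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE albums (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        year INTEGER,
        artist_id INTEGER REFERENCES artists(id)
    );
    INSERT INTO artists VALUES (1, 'Queen');
    INSERT INTO albums VALUES (1, 'A Night at the Opera', 1975, 1);
""")

# A JOIN follows the artist_id relationship between the two tables
rows = conn.execute("""
    SELECT artists.name, albums.title, albums.year
    FROM albums JOIN artists ON albums.artist_id = artists.id
""").fetchall()
print(rows)
conn.close()
```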

Page 23: JSON - JavaScript Object Notation

  • Lightweight data interchange format that is easy for humans to read and write and for machines to parse and generate.

  • Used in web APIs and client-server communication.

Page 24: JSON Structure

  • Represents data with key-value pairs; organized hierarchically.

  • Supports six value types: strings, numbers, booleans, null, objects, and arrays.

Page 25: JSON Example

  • Example showcasing the structure of JSON data:

    • Demonstrates nested data with objects and arrays.
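A hypothetical JSON document (not the slide's example) that nests an object and an array of objects and uses all six value types. Once parsed in Python, objects become dicts and arrays become lists, so nested values are reached by chaining keys and indices.

```python
import json

# Hypothetical JSON document with a nested object and an array of objects
document = """
{
  "name": "Ana",
  "enrolled": true,
  "advisor": null,
  "address": {"city": "Austin", "zip": "78701"},
  "courses": [
    {"code": "DS101", "credits": 3},
    {"code": "CS201", "credits": 4}
  ]
}
"""

student = json.loads(document)

# Chain keys (objects) and iterate (arrays) to reach nested values
print(student["address"]["city"])
print([course["code"] for course in student["courses"]])
print(sum(course["credits"] for course in student["courses"]))
```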

Page 26: JSON in Python

  • Use the built-in json module to work with JSON data in Python:

    • json.dumps(): Convert a Python object to a JSON-formatted string.

    • json.loads(): Parse a JSON string back into Python objects.
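A round-trip sketch with a hypothetical record, showing how Python types map to JSON: True becomes true, None becomes null, and tuples become arrays. Because JSON has no tuple type, the tuple comes back as a list, so the round trip is not always identical.

```python
import json

# Python -> JSON: dict -> object, tuple/list -> array, True -> true, None -> null
record = {"id": 7, "tags": ("csv", "json"), "verified": True, "notes": None}
text = json.dumps(record, indent=2)
print(text)

# JSON -> Python: parse the string back into Python objects
parsed = json.loads(text)
print(parsed)

# The tuple comes back as a list, so parsed differs from record
print(type(parsed["tags"]).__name__, parsed == record)
```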

Page 27: XML / HTML

  • HTML: Used to create web pages; uses a predefined set of tags to structure content.

  • XML: Used for data transport and storage; allows custom tags.
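Because XML lets authors define their own tags, reading it means knowing the document's element names. A sketch using the standard library's xml.etree.ElementTree on a hypothetical playlist document whose tag and attribute names are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Custom, hypothetical tags: <playlist>, <track>, <artist>, <title>
xml_text = """
<playlist name="Classic Rock">
  <track year="1971">
    <artist>Led Zeppelin</artist>
    <title>Stairway to Heaven</title>
  </track>
  <track year="1975">
    <artist>Queen</artist>
    <title>Bohemian Rhapsody</title>
  </track>
</playlist>
"""

root = ET.fromstring(xml_text.strip())
print(root.tag, root.get("name"))
for track in root.findall("track"):
    # Attributes are read with .get(); child element text with .findtext()
    print(track.get("year"), track.findtext("artist"), "-", track.findtext("title"))
```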

Page 30: Data Acquisition Methods

  • Sources to get data:

    • Provided by companies.

    • Gathered from databases and the internet.

    • Using RESTful APIs.

Page 31: Beautiful Soup

  • Python library for parsing HTML and XML.

  • Facilitates web scraping and data extraction.
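A minimal Beautiful Soup sketch, assuming the third-party beautifulsoup4 package is installed and using a hypothetical HTML snippet in place of a downloaded page. Fetching a live page (and checking the site's terms of use) is a separate step.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# A hypothetical HTML snippet standing in for a downloaded web page
html = """
<html><body>
  <h1>Top Songs</h1>
  <ul id="songs">
    <li class="song"><a href="/s/1">Stairway to Heaven</a></li>
    <li class="song"><a href="/s/2">Bohemian Rhapsody</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
print(soup.h1.get_text())

# find_all() returns every matching tag; class_ avoids clashing with Python's keyword
for item in soup.find_all("li", class_="song"):
    link = item.find("a")
    print(link.get_text(), link["href"])
```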

Page 32: RESTful APIs

  • Structured way to access web data over HTTP; a client sends requests and receives responses (often JSON).

  • Documentation is crucial for proper usage and data interpretation.
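A standard-library sketch of a REST-style GET request. The endpoint https://api.example.com/v1/songs and its genre and year parameters are hypothetical; a real API's base URL, paths, parameters, and authentication come from its documentation. The network call is left commented out so the example runs offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; consult the real API's documentation for actual paths and parameters
BASE_URL = "https://api.example.com/v1/songs"

def build_url(base: str, params: dict) -> str:
    # Query parameters are URL-encoded and appended after "?"
    return f"{base}?{urlencode(params)}"

def get_json(url: str, timeout: float = 10.0):
    # Send an HTTP GET request and parse the JSON response body
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)

url = build_url(BASE_URL, {"genre": "rock", "year": 1975})
print(url)
# data = get_json(url)  # requires network access and a real endpoint
```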

Page 34: Data Types

  • Overview of data types in the context of data science.

Page 36: Broad Data Categories

  • Revisits key data categories:

    • Quantitative (Discrete, Continuous) and Qualitative.

Page 37: Data Categories Defined

  • Further classification of data:

    1. Continuous or Discrete.

    2. Categorical or Non-Categorical.

    3. Ordinal or Non-Ordinal.

Page 38: Discrete vs Continuous Attributes

  • Discrete Attribute: Finite or countable set of values (e.g., zip codes).

  • Continuous Attribute: Real-number values (e.g., weight).

Page 40: Types of Attribute Values

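  • Nominal: Categories with no inherent order (e.g., profession).

  • Ordinal: Categories with a meaningful order, though the gaps between them are not necessarily equal (e.g., rankings).

  • Binary: Only two states (0 and 1).

  • Interval: Measured in equal-size units, so differences are meaningful, but there is no true zero (e.g., temperature in °C or °F).

  • Ratio: Has a true zero, so both differences and ratios are meaningful (e.g., length).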
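A sketch of how these attribute types behave in pandas, with hypothetical values. An unordered Categorical models nominal data; passing ordered=True models ordinal data, so comparisons and max() become meaningful. The last lines show why Celsius is interval rather than ratio: dividing Celsius values is misleading, while Kelvin has a true zero.

```python
import pandas as pd

# Nominal: categories with no order
profession = pd.Categorical(["nurse", "engineer", "nurse"])

# Ordinal: an explicit category order makes "<" and max() meaningful
size = pd.Categorical(["small", "large", "medium"],
                      categories=["small", "medium", "large"], ordered=True)
print(size < "large")  # element-wise comparison against one category
print(size.max())

# Interval vs ratio: 0 °C is not "no temperature", but 0 K is
c1, c2 = 10.0, 20.0
print(c2 / c1)                        # 2.0, yet 20 °C is not twice as hot as 10 °C
print((c2 + 273.15) / (c1 + 273.15))  # about 1.035 on the Kelvin (ratio) scale
```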

Page 46: Summary

  • Types of Data: The type of data determines how it must be prepared for analysis.

  • File Formats: Essential for data ingestion and transformation.

  • Databases: Central to data management.

  • Data Acquisition: RESTful APIs and web scraping as data gathering methods.
