03_Types_of_Data_and_DataTypes_annotated
INTRODUCTION TO DATA SCIENCE
Overview of data science concepts.
Definitions of types of data, data types, and data categories.
Page 1: Types of Data, Data Types, and Data Categories
Page 2: Recap of Last Week
Code example:
    import pandas as pd
    df = pd.read_csv('/kaggle/input/california-housing-prices/housing.csv')
    df.head()  # display the first five rows of the dataset
Example data displayed (columns):
Longitude, Latitude, Housing Median Age, Total Rooms, Total Bedrooms, Population, Households, Median Income, Median House Value, and Ocean Proximity.
Page 3: Types of Data
Introduction to various types of data relevant to data science.
Page 4: What is Data?
Definition of Data: Raw information, facts, or statistics in various forms (numbers, text, images, etc.).
Page 6: Broad Categories of Data
Quantitative Data: Numerical data that can be measured.
Discrete Data: Countable values (e.g., number of cars, laptops).
Continuous Data: Measurable values (e.g., height, weight).
Qualitative Data: Descriptive data that is grouped into categories rather than measured numerically.
Includes Structured Data and Unstructured Data.
Page 7: Categories of Data
Types of data we will cover in the course:
Tabular, Text, Images, JSON, XML, HTML, Audio.
Page 8: Tabular Data
Definition: Structured data organized in rows and columns; resembles spreadsheets or database tables.
Examples: Demographic information, grades, etc.
Page 9: Text Data
Examples include reviews, articles, emails, and social media posts.
Focus on natural language and human-readable text.
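Before text can be counted or modeled, it is usually split into tokens. A minimal sketch, assuming a deliberately simple lowercase-and-split tokenizer (real pipelines typically use a dedicated tokenizer) and a hypothetical product review, counts word frequencies:

```python
import re
from collections import Counter

# Hypothetical product review used only for illustration
review = "Great phone. Great battery, but the camera is not great in low light."

# Lowercase, then keep runs of letters/apostrophes as tokens (a simple tokenizer)
tokens = re.findall(r"[a-z']+", review.lower())

# Count how often each token appears
word_counts = Counter(tokens)
print(word_counts.most_common(3))
```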
Page 10: Graph Data
Represents relationships between entities using nodes and edges.
Examples: Social connections, websites, network traffic.
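One common way to hold graph data in memory is an adjacency list. The sketch below uses a plain Python dict of sets and a hypothetical friendship network (the names are invented for illustration); a breadth-first search then answers a relationship question such as how many hops separate two people:

```python
from collections import deque

# Hypothetical social network: each pair is an edge between two people (nodes)
edges = [("Ana", "Ben"), ("Ben", "Cai"), ("Cai", "Dee"), ("Ana", "Eli")]

graph = {}
for a, b in edges:
    # Friendship is mutual, so store each undirected edge in both directions
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def hops(graph, start, goal):
    """Breadth-first search: number of edges on a shortest path, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

print(sorted(graph["Ben"]))       # Ben's direct connections
print(hops(graph, "Eli", "Dee"))  # degrees of separation between Eli and Dee
```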
Page 11: Unstructured Data
Lacks predefined structure; challenging to analyze.
Examples:
Videos (e.g., TikTok)
Images (e.g., James Webb Space Telescope photos, faces, handwriting)
Audio (e.g., Alexa voice requests, music)
Biometrics (e.g., fingerprints, facial recognition)
Haptics (e.g., phone vibration alerts)
Page 12: More Examples of Different Types of Data
Tabular Data: Heights of class members.
Graph Data: Social networks and dependencies, coursework prerequisites.
Geo Data: Flight paths, weather patterns.
Page 13: Raw and Hierarchical Data
Raw Data: Images, video, audio, telemetry data.
Hierarchies:
Taxonomy, family trees, file directories.
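Hierarchical data is naturally a tree in which each item has exactly one parent. A minimal sketch, assuming a hypothetical course folder stored as nested Python dicts (dicts are folders, None marks a file), walks a file-directory hierarchy recursively:

```python
# Hypothetical directory tree: dicts are folders, None marks a file
tree = {
    "course": {
        "week1": {"slides.pdf": None, "notes.txt": None},
        "week2": {"data": {"housing.csv": None}, "lab.ipynb": None},
    }
}

def walk(node, path=""):
    """Recursively yield the full path of every file in the hierarchy."""
    for name, child in node.items():
        full = f"{path}/{name}" if path else name
        if child is None:   # leaf node: a file
            yield full
        else:               # internal node: a folder, so descend into it
            yield from walk(child, full)

files = list(walk(tree))
for f in files:
    print(f)
```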
Page 14: Data Formats
Common formats: CSV, image formats (.jpg, .png), audio formats (.wav, .mp3), and SQL databases.
Page 15: CSV/TSV Formats
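CSV (Comma-Separated Values): Plain-text format in which each line is a row and the fields within a row are separated by commas.
TSV (Tab-Separated Values): The same one-row-per-line layout, with fields separated by tabs.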
These formats facilitate data import/export across various tools.
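The two formats differ only in the delimiter, which Python's standard csv module takes as a parameter. A small sketch with a hypothetical three-row table written once as CSV and once as TSV:

```python
import csv
import io

# The same small table written once as CSV and once as TSV (hypothetical data)
csv_text = 'name,city,score\nAna,Lisbon,91\nBen,"Austin, TX",85\n'
tsv_text = "name\tcity\tscore\nAna\tLisbon\t91\nBen\tAustin, TX\t85\n"

# csv.reader splits each line (row) into fields using the given delimiter
csv_rows = list(csv.reader(io.StringIO(csv_text)))                  # default delimiter=","
tsv_rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))

print(csv_rows)
print(csv_rows == tsv_rows)  # both formats hold the same rows
```

A field that contains the delimiter itself (the comma in "Austin, TX") has to be quoted in CSV but not in TSV. Every value comes back as a string; converting "91" to a number is a separate step.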
Page 16: Tabular Data - Example
An example CSV file (classic rock playlist).
Structure includes Artist, Music, Album, Year, Genre.
Page 17: Tabular Data Representation
Example format of CSV file:
`Artist, Music, Album, Year, Genre`.
Use Python's pandas library for data manipulation.
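A short sketch of loading this format with pandas. The rows are illustrative, not taken from the course's playlist file, and an in-memory string stands in for a file path. Because the header shown above has a space after each comma, the sketch passes skipinitialspace=True so the column names come out without leading spaces:

```python
import io
import pandas as pd

# A few illustrative rows in the playlist format (not the original course file)
playlist_csv = """Artist, Music, Album, Year, Genre
Led Zeppelin, Stairway to Heaven, Led Zeppelin IV, 1971, Rock
Queen, Bohemian Rhapsody, A Night at the Opera, 1975, Rock
Eagles, Hotel California, Hotel California, 1976, Rock
"""

# skipinitialspace=True drops the space after each comma, so the column is "Year", not " Year"
df = pd.read_csv(io.StringIO(playlist_csv), skipinitialspace=True)

print(df.columns.tolist())
print(df[df["Year"] < 1976][["Artist", "Music"]])  # songs released before 1976
print(df.dtypes)                                   # Year is parsed as an integer column
```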
Page 18: Data Format: Images
Image Data: Visual content described by properties such as colors, shapes, and pixel values.
Page 19: Pixel Structures in Images
Images are composed of pixels arranged in a grid of rows and columns.
Each pixel holds color information, typically as red, green, and blue (RGB) channel values.
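In Python, an image is commonly held as a NumPy array with one axis for pixel rows, one for pixel columns, and a last axis for the color channels. A minimal sketch, assuming an 8-bit RGB image (values 0 to 255 per channel) built by hand rather than loaded from a file:

```python
import numpy as np

# A hypothetical 2x3 RGB image: shape is (height, width, channels)
image = np.zeros((2, 3, 3), dtype=np.uint8)  # every pixel starts black (0, 0, 0)

image[0, 0] = [255, 0, 0]      # top-left pixel: pure red
image[1, 2] = [255, 255, 255]  # bottom-right pixel: white

print(image.shape)    # (rows, columns, RGB channels)
print(image[0, 0])    # the three channel values of one pixel
print(image[..., 0])  # the red channel alone, as a 2x3 grid
```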
Page 20: Image Compression
Lossy Compression: Reduces size by sacrificing some data (e.g., JPEG).
Lossless Compression: Reduces size without discarding any data, so the original image can be reconstructed exactly; used where image fidelity is critical (e.g., PNG).
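The difference can be seen with Python's standard zlib module, which implements DEFLATE, the lossless compression PNG applies after its own per-row filtering (the filtering step is not shown). The lossy half is only a toy stand-in based on simple quantization, not how JPEG actually works:

```python
import zlib

# Hypothetical 8-bit grayscale pixel values with some repetition, as raw bytes
pixels = bytes([12, 12, 12, 13, 200, 201, 200, 12, 12, 12] * 50)

# Lossless: DEFLATE round-trips to exactly the same bytes
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)
print(len(pixels), len(packed), restored == pixels)

# Lossy (toy illustration only, not JPEG): round each value down to a multiple of 16.
# The data becomes more uniform, but the exact original values cannot be recovered.
quantized = bytes((p // 16) * 16 for p in pixels)
print(list(quantized[:7]))  # 12 and 13 both become 0; 200 and 201 both become 192
print(quantized == pixels)  # information was discarded
```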
Page 21: Databases
Definition: Organized collections of structured information stored electronically.
Manages complex data relationships efficiently.
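A relational database stores related tables and lets a query language combine them. A minimal sketch using Python's built-in sqlite3 module, with a hypothetical students/grades schema, shows one relationship (each grade belongs to a student) being followed with a SQL JOIN:

```python
import sqlite3

# In-memory SQLite database; table names and rows are hypothetical
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE grades (student_id INTEGER REFERENCES students(id),
                         course TEXT, score REAL);
    INSERT INTO students VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO grades VALUES (1, 'Data Science', 91), (1, 'Statistics', 84),
                              (2, 'Data Science', 78);
""")

# The JOIN follows the student_id -> id relationship between the two tables
rows = conn.execute("""
    SELECT s.name, AVG(g.score)
    FROM students AS s JOIN grades AS g ON g.student_id = s.id
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()
print(rows)  # average score per student
conn.close()
```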
Page 23: JSON - JavaScript Object Notation
Lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate.
Used in web APIs and client-server communication.
Page 24: JSON Structure
Represents data with key-value pairs; organized hierarchically.
Supports the value types string, number, boolean, null, array, and object.
Page 25: JSON Example
Example showcasing the structure of JSON data:
Demonstrates nested data with objects and arrays.
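A hypothetical JSON document with this kind of nesting (an object that holds an array of strings and an array of objects), parsed in Python to show how each nested level is reached:

```python
import json

# Hypothetical JSON document: an object containing arrays and nested objects
doc = """
{
  "course": "Introduction to Data Science",
  "week": 3,
  "topics": ["tabular", "text", "images"],
  "datasets": [
    {"name": "housing", "format": "csv", "rows": 20640},
    {"name": "playlist", "format": "csv", "rows": null}
  ]
}
"""

data = json.loads(doc)  # objects -> dict, arrays -> list, null -> None

print(data["topics"][1])            # index into an array
print(data["datasets"][0]["name"])  # object inside an array inside an object
print([d["name"] for d in data["datasets"] if d["rows"] is None])
```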
Page 26: JSON in Python
Use the json module to work with JSON data in Python:
json.dumps(): Convert a Python object to a JSON-formatted string.
json.loads(): Parse a JSON string back into Python objects.
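A small sketch of the two functions on a hypothetical record. Python's False and None become JSON's false and null, and the last line shows one round-trip detail worth knowing:

```python
import json

# A Python object mixing several types (hypothetical values)
record = {"name": "Ana", "scores": (91, 84), "graduated": False, "advisor": None}

text = json.dumps(record, indent=2)  # Python object -> JSON-formatted string
print(text)

back = json.loads(text)              # JSON string -> Python object
print(back)

# Round-tripping is not always exact: JSON has no tuple type, so the tuple returns as a list
print(back["scores"], back == record)
```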
Page 27: XML / HTML
HTML: Used for webpage creation; predefined tags for content.
XML: Used for data transport and storage; allows custom tags.
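Because XML tags are user-defined, reading XML means navigating whatever structure the document's author chose. A sketch using Python's standard xml.etree.ElementTree module on a hypothetical playlist document with invented tags:

```python
import xml.etree.ElementTree as ET

# XML lets authors define their own tags; these (<playlist>, <song>, ...) are hypothetical
xml_text = """
<playlist name="classic rock">
  <song year="1971"><artist>Led Zeppelin</artist><title>Stairway to Heaven</title></song>
  <song year="1975"><artist>Queen</artist><title>Bohemian Rhapsody</title></song>
</playlist>
"""

root = ET.fromstring(xml_text.strip())
print(root.tag, root.get("name"))  # root element and one of its attributes

for song in root.findall("song"):
    # findtext returns the text inside the first matching child element
    print(song.get("year"), song.findtext("artist"), "-", song.findtext("title"))
```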
Page 30: Data Acquisition Methods
Sources to get data:
Provided by companies.
Gathered from databases and the internet.
Using RESTful APIs.
Page 31: Beautiful Soup
Python library for parsing HTML and XML.
Facilitates web scraping and data extraction.
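A minimal scraping sketch. It assumes the third-party beautifulsoup4 package is installed and parses a hypothetical HTML snippet held in a string, so no network access is needed; the page content and link paths are made up for illustration:

```python
from bs4 import BeautifulSoup  # third-party package: beautifulsoup4

# A small hypothetical HTML page; in practice this text would come from a web request
html = """
<html><body>
  <h1>Course Datasets</h1>
  <ul>
    <li><a href="/data/housing.csv">Housing prices</a></li>
    <li><a href="/data/playlist.csv">Classic rock playlist</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")  # use Python's built-in HTML parser

print(soup.h1.get_text())  # text of the first <h1>
links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]
print(links)               # (link text, URL) for every <a> tag
```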
Page 32: RESTful APIs
Structured way to access web data over HTTP: a client sends a request to an endpoint and the server returns a response.
The API's documentation is essential for forming valid requests and interpreting the returned data.
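The request/response cycle can be sketched with Python's standard library. Everything API-specific here (the api.example.com endpoint, its parameters, and the response fields) is hypothetical, which is exactly why a real API's documentation matters. The sketch builds the request and parses a canned response instead of contacting a server:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and parameters; a real API's documentation defines these
BASE_URL = "https://api.example.com/v1/weather"
params = {"city": "Lisbon", "units": "metric"}

# Build the GET request: resource path plus query-string parameters
req = Request(f"{BASE_URL}?{urlencode(params)}", headers={"Accept": "application/json"})
print(req.get_method(), req.full_url)

# Sending it would be: urllib.request.urlopen(req).read()
# Here a canned JSON body stands in for the server's response (hypothetical fields)
response_body = '{"city": "Lisbon", "temperature": 21.5, "units": "metric"}'
data = json.loads(response_body)
print(data["temperature"])
```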
Page 34: Data Types
Overview of data types in the context of data science.
Page 36: Broad Data Categories
Revisits key data categories:
Quantitative (Discrete, Continuous) and Qualitative.
Page 37: Data Categories Defined
Further classification of data:
Continuous or Discrete.
Categorical or Non-Categorical.
Ordinal or Non-Ordinal.
Page 38: Discrete vs Continuous Attributes
Discrete Attribute: Has a finite or countably infinite set of values (e.g., zip codes).
Continuous Attribute: Takes real-number values (e.g., weight).
Page 40: Types of Attribute Values
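Nominal: Categories with no inherent order (e.g., profession).
Ordinal: Categories with a meaningful order, though the gaps between them are not necessarily equal (e.g., rankings).
Binary: Only two states (0 and 1).
Interval: Measured in equal-size units, so differences are meaningful, but there is no true zero point (e.g., temperature in Celsius or Fahrenheit).
Ratio: Has a true zero point, so both differences and ratios are meaningful (e.g., length).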
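The distinction between interval and ratio scales shows up as soon as the unit changes. A small arithmetic sketch with hypothetical values: a ratio computed on an interval scale (Celsius) changes when the same temperatures are expressed in kelvin, which has a true zero, while a ratio of lengths is the same in metres and feet:

```python
import math

# Ordinal: order is meaningful, but the gaps between levels are not measured
sizes = ["small", "medium", "large"]                # hypothetical ordered categories
print(sizes.index("large") > sizes.index("small"))  # comparing order is valid

# Interval: Celsius has an arbitrary zero point, so ratios change with the unit
c_low, c_high = 10.0, 20.0
k_low, k_high = c_low + 273.15, c_high + 273.15     # the same temperatures in kelvin
print(c_high / c_low)                               # 2.0 in Celsius...
print(round(k_high / k_low, 3))                     # ...but about 1.035 in kelvin
print(math.isclose(c_high - c_low, k_high - k_low)) # differences are preserved

# Ratio: length has a true zero, so ratios do not depend on the unit
m_short, m_long = 2.0, 4.0
FT_PER_M = 3.28084                                  # feet per metre
print(math.isclose(m_long / m_short, (m_long * FT_PER_M) / (m_short * FT_PER_M)))
```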
Page 46: Summary
Types of Data: The type of data determines how it must be prepared in a data science workflow.
File Formats: Essential for data ingestion and transformation.
Databases: Central to data management.
Data Acquisition: RESTful APIs and web scraping as data gathering methods.