Lecture 1: Intro to Data Science
What is Data?
Definition: Data refers to facts and statistics collected for reference or analysis.
Aspects of Data:
Data comes from facts and statistics.
Data is collected.
Data is used for reference or analysis (focus of this course).
Terminology:
"Data" is the plural of "datum".
Definitions:
Objects: Individual entries (e.g., people in a company).
Features: Characteristics of objects (e.g., salary and age).
Structured Data: Organized in a predictable format (e.g., databases).
Example: Data in tables.
Unstructured Data: Lacks a specific format (e.g., text documents).
Example: Emails, social media posts.
Semi-structured Data: Hybrid form; lacks strict structure yet contains tags (e.g., XML or JSON).
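The definitions above can be sketched in a few lines of Python. This is only an illustration: the employee names, salaries, and JSON fields are made up, and the `average` helper is a hypothetical name introduced here.

```python
import json

# Structured data: every object (row) has the same features (columns).
employees = [
    {"name": "Asha", "age": 34, "salary": 72000},
    {"name": "Ben", "age": 45, "salary": 88000},
]

# Each dictionary is one object; its keys are the features.
feature_names = sorted(employees[0].keys())

# Semi-structured data: fields are tagged, but records may differ in shape.
raw_json = '{"name": "Chen", "skills": ["SQL", "Python"], "contact": {"email": "chen@example.com"}}'
record = json.loads(raw_json)

# Unstructured data: free text with no predefined fields.
email_body = "Hi team, the quarterly report is attached. Thanks!"


def average(rows, feature):
    """Return the mean of a numeric feature across all objects."""
    return sum(row[feature] for row in rows) / len(rows)


print(feature_names)
print(average(employees, "salary"))
print(record["contact"]["email"])
```

Because the structured rows share the same features, a single function can summarize any numeric column. The JSON record needs key-by-key navigation, and the email text would need further processing before any analysis.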
Introduction to Data Science
Definition: Field focusing on techniques and algorithms to extract insights from data.
Objective: Analyze data to gain valuable insights using advanced tools and methodologies.
Future Relevance: Integral to artificial intelligence development.
Importance of Data Science
Data is likened to oil; it fuels modern business intelligence.
Advantages:
Informed decision-making.
Predictive analytics for future trends.
Detection of patterns and anomalies.
Support for machine intelligence (AI).
Sentiment analysis to understand consumer behavior.
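As one concrete instance of anomaly detection, the sketch below flags values that lie far from the mean, measured in standard deviations (a z-score rule). The sales figures, the `find_anomalies` name, and the threshold of 2.0 are assumptions chosen for illustration, and real systems often use more robust methods.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return values whose z-score exceeds the threshold in absolute value."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily sales with one unusual day.
daily_sales = [100, 102, 98, 101, 99, 103, 97, 250]
print(find_anomalies(daily_sales))
```

On this made-up series, only the 250 reading is more than two standard deviations from the mean, so it is the only value flagged.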
Tools and Languages:
Power BI: Business intelligence tool for building reports and dashboards, mainly from structured data.
MS Excel: Suited to small datasets; limited for big data applications.
Python: Essential for data processing.
R Language: Powerful for statistical analysis.
Data Storage and Processing:
Apache Hadoop: For distributed storage (HDFS) and batch processing of big data.
Apache Spark: For fast, in-memory processing of large datasets.
Visualization Tools:
Tableau: Builds interactive dashboards.
Matplotlib: Python library for plotting charts.
Data Science Components
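Big Data: Datasets too large or complex for traditional tools; large volumes, much of it unstructured, are produced every day.
Machine Learning: Core of data science; algorithms that learn patterns from data, mimicking aspects of human learning.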
Business Intelligence: Helps make informed business choices through data analysis.
Statistics: Provides methods for data analysis and interpretation.
Domain Expertise: Specialized knowledge critical to data context.
Data Engineering: Involves collecting, cleaning, and transforming data so it is ready for analysis.
Visualization: Represents data visually for better understanding.
Mathematics: Fundamental for quantitative analysis.
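To show how statistics, mathematics, and machine learning fit together, here is a minimal sketch of simple linear regression by ordinary least squares, written without external libraries. The experience and salary numbers are invented, and `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations of x.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Apply the fitted line to a new input."""
    return slope * x + intercept


# Hypothetical data: years of experience vs. salary in thousands.
years = [1, 2, 3, 4]
salary = [30, 35, 40, 45]

slope, intercept = fit_line(years, salary)
print(slope, intercept)
print(predict(6, slope, intercept))
```

The fitted line is the "pattern" learned from the data, and `predict` applies that pattern to an unseen value, which is the basic idea behind predictive analytics.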
Applications of Data Science
Healthcare: Enhances disease detection and treatment.
Fraud Detection: Safeguards banking transactions.
Image Recognition: Powers social media tagging and filtering.
Recommendation Systems: Offers personalized suggestions on platforms like Netflix and Amazon.
Dynamic Pricing: Adjusts prices based on market data.
Sentiment Analysis: Assesses public perception and brand loyalty.
Self-driving Cars: Relies on data science for navigation and decision-making.
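To make one of these applications concrete, the sketch below scores sentiment by counting words from small positive and negative word lists. The word lists, function names, and sample reviews are assumptions for illustration; production sentiment analysis typically relies on trained models, not a fixed lexicon.

```python
# Tiny hand-made lexicons (hypothetical, for illustration only).
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}


def sentiment_score(text):
    """Return (positive word count) minus (negative word count)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos - neg


def label(text):
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reviews = [
    "I love this phone, the camera is great!",
    "Terrible battery and slow delivery.",
    "The package arrived on Tuesday.",
]
for review in reviews:
    print(label(review), "-", review)
```

A word-count approach like this misses negation and sarcasm ("not good" still counts as positive), which is one reason learned models are preferred in practice.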