INTRO TO DS
INTRODUCTION TO DATA SCIENCE LECTURE NOTES
UNIT - 1 Introduction to Data Science
Definition of Data Science
- Data Science: A domain of study dealing with vast volumes of data using modern tools and techniques to find unseen patterns, derive meaningful information, and make business decisions.
- Uses complex machine learning algorithms to build predictive models.
- Analyzes data from various sources, in different formats.
- Involves extraction, preparation, analysis, visualization, and maintenance of information.
- A cross-disciplinary field using scientific methods and processes to draw insights from data.
Data Science Lifecycle
The Data Science Lifecycle consists of five distinct stages:
Capture:
- Data Acquisition, Data Entry, Signal Reception, Data Extraction.
- Gathering raw structured and unstructured data.
Maintain:
- Data Warehousing, Data Cleansing, Data Staging, Data Processing, Data Architecture.
- Transforming raw data into usable formats.
Process:
- Data Mining, Clustering/Classification, Data Modeling, Data Summarization.
- Examining patterns, ranges, and biases in prepared data to assess its value for predictive analysis.
Analyze:
- Exploratory/Confirmatory, Predictive Analysis, Regression, Text Mining, Qualitative Analysis.
- Involves performing various analyses on the data.
Communicate:
- Data Reporting, Data Visualization, Business Intelligence, Decision Making.
- Presenting analyses in readable formats such as charts and reports.
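The five stages above can be sketched as a small pipeline. This is only an illustrative sketch using the Python standard library: the sample data, the column names (`region`, `units_sold`), and the function names are all made up for the example and are not part of any standard lifecycle tooling.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: one row has a missing value, one has stray whitespace.
RAW_CSV = """region,units_sold
north,10
south,
north,14
 south ,6
east,9
"""

def capture(raw_text):
    """Capture: read raw records from a source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def maintain(records):
    """Maintain: cleanse records and convert them to a usable format."""
    cleaned = []
    for row in records:
        region = row["region"].strip()
        units = row["units_sold"].strip()
        if not region or not units:
            continue  # drop incomplete rows
        cleaned.append({"region": region, "units_sold": int(units)})
    return cleaned

def process(records):
    """Process: summarize the prepared data by grouping values per region."""
    groups = defaultdict(list)
    for row in records:
        groups[row["region"]].append(row["units_sold"])
    return dict(groups)

def analyze(groups):
    """Analyze: compute a simple descriptive statistic (mean units per region)."""
    return {region: mean(values) for region, values in groups.items()}

def communicate(averages):
    """Communicate: present the result as a readable text report."""
    lines = ["Average units sold per region"]
    for region in sorted(averages):
        lines.append(f"{region}: {averages[region]:.1f}")
    return "\n".join(lines)

report = communicate(analyze(process(maintain(capture(RAW_CSV)))))
print(report)
```

In practice each stage is far larger (databases, warehouses, modeling libraries, dashboards), but the flow of data from one stage to the next follows the same shape.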
Evolution of Data Science: Growth & Innovation
- Emerged from the merging of applied statistics with computer science to leverage modern computing.
- 1962: John W. Tukey articulates the "data science" vision in "The Future of Data Analysis."
- 1977: Establishment of the International Association for Statistical Computing (IASC) to link statistical methods, computer technology, and domain expertise.
- 1980s and 1990s: Significant strides, including the founding of the International Federation of Classification Societies (IFCS) and the first Knowledge Discovery in Databases (KDD) workshop.
- 1994: Business Week publishes a cover story on “Database Marketing.”
- 1990s and early 2000s: Growth of data science as a recognized field with the emergence of academic journals.
- 2000s: Increased internet connectivity enables massive data collection capabilities.
- 2005: The term "Big Data" comes into use as companies like Google and Facebook handle data at a scale that calls for technologies such as Hadoop, Spark, and Cassandra.
- 2014: Demand for data scientists surges as organizations seek data-driven insights.
- 2015: Machine learning, deep learning, and Artificial Intelligence (AI) enter data science.
- 2018: New data-protection regulations, such as the EU's General Data Protection Regulation (GDPR), begin to affect data science practices.
- 2020s: Breakthroughs in AI and machine learning continue, along with rising demand for big data professionals.
Roles in Data Science
Data Analyst:
- Responsibilities: Visualizing, munging, and processing data; performing database queries.
- Key skills: SQL, R, SAS, Python.
- Important responsibilities include:
- Extracting data from sources.
- Maintaining databases.
- Performing data analysis and report generation with recommendations.
Data Engineer:
- Responsibilities: Building and testing scalable Big Data ecosystems, updating systems for efficiency.
- Key skills: Hive, NoSQL, R, Ruby, Java, C++, Matlab.
- Important responsibilities include:
- Designing and maintaining data management systems.
- Data collection and management.
- Conducting research.
Database Administrator:
- Responsibilities: Ensuring proper database functioning, managing data access services.
- Important responsibilities include:
- Managing database software.
- Designing and developing databases.
- Implementing security measures.
Machine Learning Engineer:
- Responsibilities: Designing ML systems, testing systems, implementing algorithms.
- Key skills: SQL, REST APIs.
- Important responsibilities include:
- Developing ML systems.
- Researching ML algorithms.
Data Scientist:
- Responsibilities: Understanding business challenges, performing predictive analysis.
- Key skills: R, Matlab, SQL, Python.
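As a rough illustration of the predictive analysis mentioned for the Data Scientist role, the sketch below fits a straight line to a handful of points by ordinary least squares and uses it to forecast a new value. The data (advertising spend versus sales) is invented for the example, and the helper names `fit_line` and `predict` are hypothetical; real projects would typically rely on a statistics or machine learning library instead.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of x, and of the x-y cross deviations.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def predict(slope, intercept, x):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept

# Made-up data that lies exactly on y = 2x + 1.
spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

slope, intercept = fit_line(spend, sales)
forecast = predict(slope, intercept, 5)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast for x=5: {forecast:.2f}")
```

Because the sample points fall exactly on a line, the fit recovers it exactly; with real, noisy data the line would only approximate the trend, and the forecast would carry uncertainty.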