Data Science - UNIT-1 - Notes

Unit I: Introduction to Microsoft Excel

  • Creating Excel Tables

    • Data in Excel must be structured in rows and columns.

    • A range of data does not become a table automatically; it must be converted explicitly (for example, Insert > Table).

    • Features of Excel Tables:

      • Dynamic (expand and contract automatically).

      • Integrated sort and filter options.

      • Easy formatting with built-in table styles.

      • Quick totals allow summation, counting, and averaging.

      • Calculated columns enable easy computation over entire columns.

      • Dynamic charts adjust automatically with table data changes.

  • Basic Operations in Excel

    • Addition, Subtraction, Multiplication, Division.

    • Understand how to apply operators correctly.

  • Formulas and Functions

    • Excel provides logical functions (AND, OR, XOR, NOT) for building compound conditions inside formulas.

    • Formulas begin with an equal sign (=).

    • Function library contains over 400 functions.

  • Data Validation, Filters, Grouping

    • Excel Data Validation restricts inputs to specified criteria.

    • Filters allow users to view only specific data.

    • Grouping permits users to collapse/expand data for better organization.
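
As a rough illustration of what a validation rule checks, the Python sketch below mimics two common rule types (a list of allowed values and a whole-number range) outside Excel. The function names `validate_list` and `validate_whole_number` and the sample values are hypothetical, chosen only for this example; they are not part of Excel or of any library.

```python
def validate_list(value, allowed):
    """Mimic a 'List' rule: accept only values from a fixed set."""
    return value in allowed


def validate_whole_number(value, minimum, maximum):
    """Mimic a 'Whole number between' rule: accept integers in [minimum, maximum]."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return minimum <= value <= maximum


departments = ["Sales", "HR", "IT"]  # illustrative allowed values

print(validate_list("HR", departments))     # True
print(validate_list("Legal", departments))  # False
print(validate_whole_number(25, 18, 60))    # True
print(validate_whole_number(75, 18, 60))    # False
```

In Excel itself the same checks are configured through the Data Validation dialog rather than written as code; the sketch only shows the accept/reject logic.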

  • Visualizing Data Using Charts

    • Charts help represent data graphically, enhancing data comprehension.

  • Importing Data into Excel

    • XML, CSV, and MS Access data can be imported directly into Excel.

    • Each import method has specific steps to follow for successful data integration.

Unit II: Introduction to Data Science

  • What is Data Science?

    • Draws on methods from statistics, process automation, AI, etc. to extract insight from data.

  • Probability Theory

    • Bayes theorem and its applications in predictive modeling.
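
Bayes' theorem states that P(A|B) = P(B|A) · P(A) / P(B). A minimal worked example in Python is sketched below; the scenario (a diagnostic test) and all three input probabilities are illustrative assumptions, not figures from these notes, and the function name `posterior` is hypothetical.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(A|B) from P(A), P(B|A), and P(B|not A) using Bayes' theorem."""
    # Total probability of the evidence B: P(B|A)P(A) + P(B|not A)P(not A)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Assumed numbers: 1% of people have the condition, the test detects it
# 95% of the time, and it gives a false positive 5% of the time.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 3))  # 0.161
```

Even with a fairly accurate test, the posterior stays low because the prior is small, which is the kind of adjustment predictive models based on Bayes' theorem rely on.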

  • Introduction to SQL

    • Basic operations: creating tables, then inserting, deleting, and retrieving rows (CREATE TABLE, INSERT, DELETE, SELECT).

    • Hands-on practice with SQL to reinforce learning.
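
One way to practise the basic operations without installing a database server is Python's built-in `sqlite3` module, as sketched below. The table name `students`, its columns, and the sample rows are made up for illustration.

```python
import sqlite3

# An in-memory database disappears when the connection is closed.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Creation: define a table.
cur.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)"
)

# Insertion: add rows using parameter placeholders.
cur.executemany(
    "INSERT INTO students (name, marks) VALUES (?, ?)",
    [("Asha", 82), ("Ravi", 67), ("Meena", 91)],
)

# Deletion: remove rows that match a condition.
cur.execute("DELETE FROM students WHERE marks < ?", (70,))

# Retrieval: read the remaining rows back, highest marks first.
cur.execute("SELECT name, marks FROM students ORDER BY marks DESC")
rows = cur.fetchall()
print(rows)  # [('Meena', 91), ('Asha', 82)]

# Dropping the table removes it entirely (as opposed to deleting rows).
cur.execute("DROP TABLE students")
conn.close()
```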

Unit III: Data Science Components

  • Tools for Data Science

    • Become familiar with popular data science tools and their functionalities.

  • Machine Learning

    • Differentiate between types of machine learning: supervised, unsupervised, reinforcement.

    • List key algorithms for classification, clustering, and feature selection.

  • Linear Regression and Logistic Regression

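    • Understand the concepts and their applications: linear regression predicts a continuous value, while logistic regression predicts the probability of a class.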

    • Gaussian distribution and its importance in predicting outcomes.

    • Standard Normal Probability Distribution

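      • Z-scores (z = (x − μ) / σ) express how many standard deviations a value lies from the mean; the Central Limit Theorem states that means of sufficiently large samples are approximately normally distributed. Both are crucial for statistical analysis.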
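
A short sketch of a z-score calculation follows, using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The mean, standard deviation, observed value, and sample size are assumed numbers chosen for illustration.

```python
import math
from statistics import NormalDist

mu, sigma = 70.0, 10.0  # assumed population mean and standard deviation
x = 85.0                # assumed observed value

# Z-score: distance from the mean in units of standard deviation.
z = (x - mu) / sigma
print(z)  # 1.5

# Standard normal CDF: proportion of values expected to fall below x.
p_below = NormalDist().cdf(z)
print(round(p_below, 4))  # 0.9332

# Central Limit Theorem: the mean of n observations has standard
# deviation sigma / sqrt(n), known as the standard error.
n = 25
standard_error = sigma / math.sqrt(n)
print(standard_error)  # 2.0
```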

Unit IV: Data Visualization

  • Graphical Representation

    • Use scatter plots, charts, graphs, histograms, and maps for data visualization.

  • Statistical Analysis

    • Descriptive statistics: Mean, Standard Deviation, Frequency, and Percentage.

    • Discuss applications of data science and its life cycle.
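
The descriptive statistics listed above (mean, standard deviation, frequency, percentage) can be computed with the Python standard library, as in the sketch below. The `scores` and `grades` data are invented for the example.

```python
from collections import Counter
from statistics import mean, pstdev, stdev

scores = [4.0, 8.0, 6.0, 5.0, 7.0]       # illustrative numeric data
grades = ["A", "B", "A", "C", "B", "A"]  # illustrative categorical data

# Mean: the sum of the values divided by their count.
print(mean(scores))  # 6.0

# Standard deviation: sample (divides by n - 1) and population (divides by n).
print(round(stdev(scores), 3))   # 1.581
print(round(pstdev(scores), 3))  # 1.414

# Frequency: how often each category occurs.
frequency = Counter(grades)

# Percentage: each frequency as a share of the total, rounded to 1 decimal.
percentage = {g: round(100 * c / len(grades), 1) for g, c in frequency.items()}
print(percentage)  # {'A': 50.0, 'B': 33.3, 'C': 16.7}
```

The sample/population distinction mirrors the choice between STDEV.S and STDEV.P in Excel.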


Page 1: Excel Basics

  • Excel Workbook

    • Workbook = entire Excel file.

    • A new workbook is named Book1 by default; it contains three sheets in Excel 2010 and earlier, and one sheet in later versions.

  • Worksheet

    • A grid of cells that holds data; the default sheet names (Sheet1, Sheet2, ...) can be renamed easily.

Page 2: Data Types and Structures

  • Cell Name and Data Types

    • Cells are named by their location, e.g., A1.

    • Data types include labels (text), numbers (values), and formulas.

  • Formula Bar

    • Area to build formulas, starting with '='.

  • Ribbon Functionality

    • Ribbon contains the tools and functionalities needed for tasks.

Page 3: Creating and Utilizing Tables

  • Excel Tables

    • Enable dynamic management of data sets, offering sorting, filtering, and calculated columns.

  • Creating a Table in Excel

    • Steps: select the data; choose Insert > Table (Ctrl+T) or Home > Format as Table; confirm the range and the header option in the ‘Create Table’ dialog.

Page 4: Table Features

  • Advantages of Excel Tables

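    • Table headers stay visible while scrolling (they replace the column letters).

    • An optional Total Row provides quick calculations such as sum, count, and average.

    • Tables expand automatically when data is entered in an adjacent row or column.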

  • Dynamic Design

    • Tables adjust visually as data is added or removed.

Page 5: Functions and Operators in Excel

  • Types of Operators

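    • Arithmetic (+, -, *, /, ^, %), comparison (=, <>, <, >, <=, >=), text concatenation (&), and reference (colon for ranges, comma for unions, space for intersections) operators.

    • Understand the syntax of each operator and the functions that go with it.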

Page 6: Logical Functions

  • Excel provides four Boolean logical functions:

    • AND, OR, XOR, NOT: combine or invert conditions, typically inside IF, to build complex decision-making formulas.
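
The behaviour of the four functions can be sketched in Python as follows. The `excel_*` helper names are hypothetical and exist only for this illustration; in a worksheet the equivalent would be a formula such as =AND(A1>=40, B1>=0.75).

```python
def excel_and(*conditions):
    """TRUE only if every condition is true."""
    return all(conditions)


def excel_or(*conditions):
    """TRUE if at least one condition is true."""
    return any(conditions)


def excel_xor(*conditions):
    """TRUE if an odd number of conditions are true."""
    return sum(bool(c) for c in conditions) % 2 == 1


def excel_not(condition):
    """Invert a single condition."""
    return not condition


score, attendance = 72, 0.60  # illustrative values

print(excel_and(score >= 40, attendance >= 0.75))  # False
print(excel_or(score >= 40, attendance >= 0.75))   # True
print(excel_xor(score >= 40, attendance >= 0.75))  # True
print(excel_not(score >= 40))                      # False
```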

Page 7: Exercises

  • Engage in practical exercises to reinforce arithmetic, logical operations, and function usage.

Key Functions to Know

  • Common Functions:

    • SUM, AVERAGE, MAX, MIN, IF, TRIM, LEN, CONCATENATE.

  • Data Validation Options: Apply restrictions on cell input, including lists and ranges.
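
For readers who know a little Python, the common functions above map roughly onto the following built-ins. This is a loose analogy under illustrative data, not a description of how Excel implements them.

```python
from statistics import mean

values = [10, 20, 30, 40]    # stands in for a cell range such as A1:A4
text = "  data   science  "  # stands in for a text cell

total = sum(values)      # SUM
average = mean(values)   # AVERAGE
highest = max(values)    # MAX
lowest = min(values)     # MIN

# IF(condition, value_if_true, value_if_false)
label = "High" if total > 50 else "Low"

# TRIM removes leading and trailing spaces and collapses runs of
# spaces between words into a single space.
trimmed = " ".join(part for part in text.split(" ") if part)

length = len(trimmed)        # LEN
joined = "Unit" + " " + "1"  # CONCATENATE (or the & operator)

print(total, highest, lowest, label)  # 100 40 10 High
print(repr(trimmed), length)          # 'data science' 12
print(joined)                         # Unit 1
```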

Each page builds on the foundation laid by the previous ones, going deeper into Excel functionality (data validation, filtering, grouping, and importing from different sources) and then into hands-on applications in data science and visualization. The material progresses in complexity from beginner to advanced level.