Data Science - UNIT-1- Notes
Unit I: Introduction to Microsoft Excel
Creating Excel Tables
Data in Excel must be structured in rows and columns.
Excel does not treat a range as a table automatically; the range must be converted explicitly (Insert > Table, or Ctrl+T).
Features of Excel Tables:
Dynamic (expand and contract automatically).
Integrated sort and filter options.
Easy formatting with built-in table styles.
Quick totals allow summation, counting, and averaging.
Calculated columns enable easy computation over entire columns.
Dynamic charts adjust automatically with table data changes.
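The table features listed above (automatic expansion, calculated columns, quick totals) can be mimicked in a few lines of Python. This is only an illustrative sketch: the sample data and the helper names add_row, with_total_column, and total_row are hypothetical and are not part of Excel.

```python
# Hypothetical sales data standing in for an Excel table.
rows = [
    {"item": "Pen", "qty": 10, "price": 1.5},
    {"item": "Book", "qty": 2, "price": 12.0},
]

def add_row(table, item, qty, price):
    """Append a row; the list grows the way an Excel table expands."""
    table.append({"item": item, "qty": qty, "price": price})

def with_total_column(table):
    """Calculated column: one formula applied to every row."""
    return [{**r, "total": r["qty"] * r["price"]} for r in table]

def total_row(table):
    """Quick totals: sum, count, and average of the calculated column."""
    totals = [r["total"] for r in table]
    return {
        "sum": sum(totals),
        "count": len(totals),
        "average": sum(totals) / len(totals),
    }

add_row(rows, "Bag", 1, 21.0)
table = with_total_column(rows)
summary = total_row(table)
print(summary)
```

Adding a row and recomputing the totals is automatic in a real Excel table; here it has to be done by calling the helpers again.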
Basic Operations in Excel
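Addition (+), Subtraction (-), Multiplication (*), Division (/).
Understand how to apply operators correctly, including the order in which Excel evaluates them and how parentheses change that order.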
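The effect of operator order can be checked with a short Python sketch. The Excel formulas in the comments are illustrative examples chosen for this note; Python follows the same order for these four basic operators, but it writes exponentiation as ** where Excel uses ^.

```python
# Excel formula: =2+3*4   (multiplication runs before addition)
a = 2 + 3 * 4

# Excel formula: =(2+3)*4 (parentheses force the addition first)
b = (2 + 3) * 4

# Excel formula: =10/4    (division returns a decimal result)
c = 10 / 4

# Excel formula: =2^3     (Excel's ^ is written ** in Python)
d = 2 ** 3

print(a, b, c, d)
```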
Formulas and Functions
Excel provides logical functions (AND, OR, XOR, NOT) for combining conditions inside formulas.
Formulas begin with an equal sign (=).
Function library contains over 400 functions.
Data Validation, Filters, Grouping
Excel Data Validation restricts inputs to specified criteria.
Filters allow users to view only specific data.
Grouping permits users to collapse/expand data for better organization.
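The three ideas above can be illustrated outside Excel with a small Python sketch. The records, the allowed-region list, and the 0 to 1000 amount limit are all made-up assumptions for the example; Excel applies the same logic through its Data Validation, Filter, and Group commands rather than through code.

```python
from itertools import groupby

ALLOWED_REGIONS = {"North", "South"}  # plays the role of a validation list

def is_valid(record):
    """Accept a record only if the region is listed and the amount is 0..1000."""
    return record["region"] in ALLOWED_REGIONS and 0 <= record["amount"] <= 1000

records = [
    {"region": "North", "amount": 200},
    {"region": "South", "amount": 50},
    {"region": "East", "amount": 75},     # rejected: region not in the list
    {"region": "North", "amount": 5000},  # rejected: amount out of range
    {"region": "North", "amount": 300},
]

# Validation: keep only inputs that meet the criteria.
valid = [r for r in records if is_valid(r)]

# Filter: view only the rows that match a condition.
north_only = [r for r in valid if r["region"] == "North"]

# Grouping: collapse the rows into one subtotal per region.
by_region = sorted(valid, key=lambda r: r["region"])
subtotals = {
    region: sum(r["amount"] for r in group)
    for region, group in groupby(by_region, key=lambda r: r["region"])
}
print(subtotals)
```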
Visualizing Data Using Charts
Charts help represent data graphically, enhancing data comprehension.
Importing Data into Excel
XML, CSV, and MS Access data can be imported directly into Excel.
Each import method has specific steps to follow for successful data integration.
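To see what an import has to do with each format, the sketch below reads the same small data set from CSV text and from XML text using only the Python standard library. The student data and the element names are invented for the example, and an MS Access import is left out because it needs a database driver.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data in two formats.
CSV_TEXT = "name,score\nAsha,82\nRavi,75\n"
XML_TEXT = (
    "<students>"
    "<student><name>Asha</name><score>82</score></student>"
    "<student><name>Ravi</name><score>75</score></student>"
    "</students>"
)

def read_csv(text):
    """The first line supplies the column names; each later line is a row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def read_xml(text):
    """Each <student> element becomes a row; child tags become column names."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in student}
            for student in root.findall("student")]

csv_rows = read_csv(CSV_TEXT)
xml_rows = read_xml(XML_TEXT)
print(csv_rows == xml_rows)
```

Both readers return every value as text, so numeric columns still need converting, which Excel normally does during import.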
Unit II: Introduction to Data Science
What is Data Science?
Draws on methods from statistics, process automation, AI, and related fields to extract insight from data.
Probability Theory
Bayes theorem and its applications in predictive modeling.
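A worked example may help make Bayes' theorem concrete. The sketch below computes P(H | E) = P(E | H) P(H) / P(E) for a hypothetical screening test; the 1% prior, 90% detection rate, and 5% false positive rate are assumed numbers chosen only for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) using Bayes' theorem.

    P(E) is expanded with the law of total probability:
    P(E) = P(E | H) P(H) + P(E | not H) P(not H).
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Assumed inputs: 1% of cases are positive, the test detects 90% of them,
# and it wrongly flags 5% of the negative cases.
p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(p, 4))
```

With these assumed inputs the probability that a flagged case is truly positive is only about 15%, because true positives are rare compared with false alarms.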
Introduction to SQL
Basic operations: creating tables, and inserting, deleting, and retrieving rows.
Hands-on practice with SQL to reinforce learning.
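For hands-on practice, the basic operations can be tried with the sqlite3 module from the Python standard library and an in-memory database. The students table and its rows are hypothetical; the SQL statements themselves (CREATE TABLE, INSERT, DELETE, SELECT) are standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database, nothing written to disk
cur = conn.cursor()

# Creation: define a table and its columns.
cur.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)"
)

# Insertion: add rows.
cur.executemany(
    "INSERT INTO students (name, marks) VALUES (?, ?)",
    [("Asha", 82), ("Ravi", 75), ("Meena", 91)],
)

# Deletion: remove the rows that match a condition.
cur.execute("DELETE FROM students WHERE name = ?", ("Ravi",))

# Retrieval: query the remaining rows, highest marks first.
cur.execute("SELECT name, marks FROM students ORDER BY marks DESC")
result = cur.fetchall()
conn.close()
print(result)
```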
Unit III: Data Science Components
Tools for Data Science
Become familiar with popular data science tools and their functionalities.
Machine Learning
Differentiate between types of machine learning: supervised, unsupervised, reinforcement.
List key algorithms for classification, clustering, and feature selection.
Linear Regression and Logistic Regression
Understanding the concepts and their applications.
Gaussian distribution and its importance in predicting outcomes.
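The two models can be sketched with the standard library alone. The code below fits a straight line by ordinary least squares and shows the sigmoid function that logistic regression uses to turn a score into a probability. The data points are invented, and this is a minimal illustration rather than a full training procedure.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sigmoid(z):
    """Map any real number to the range (0, 1), as logistic regression does."""
    return 1 / (1 + math.exp(-z))

# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, sigmoid(0))
```

Linear regression predicts a numeric value directly, while logistic regression passes a linear score through the sigmoid to predict the probability of a class.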
Standard Normal Probability Distribution
Z-scores and the Central Limit Theorem are crucial for statistical analysis.
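A z-score expresses how many standard deviations a value lies from the mean: z = (x - mean) / standard deviation. The sketch below computes one for an invented list of scores and looks up the matching area under the standard normal curve with statistics.NormalDist (available from Python 3.8). Treating the list as a whole population is an assumption made for the example.

```python
from statistics import NormalDist, mean, pstdev

scores = [60, 70, 80, 90, 100]  # hypothetical data

mu = mean(scores)       # population mean
sigma = pstdev(scores)  # population standard deviation

# z-score of the highest value.
z = (100 - mu) / sigma

# Share of a standard normal distribution lying below that z-score.
below = NormalDist().cdf(z)
print(round(z, 3), round(below, 3))
```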
Unit IV: Data Visualization
Graphical Representation
Use scatter plots, charts, graphs, histograms, and maps for data visualization.
Statistical Analysis
Descriptive statistics: Mean, Standard Deviation, Frequency, and Percentage.
Discuss applications of data science and its life cycle.
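The descriptive statistics named above (mean, standard deviation, frequency, percentage) can be computed with the Python standard library. The marks and grades below are made-up sample data, and the sample standard deviation is used here as an assumption; pstdev would give the population version.

```python
from collections import Counter
from statistics import mean, stdev

marks = [2, 4, 4, 4, 5, 5, 7, 9]                   # hypothetical numeric data
grades = ["A", "B", "A", "C", "A", "B", "B", "A"]  # hypothetical categories

average = mean(marks)   # central value
spread = stdev(marks)   # sample standard deviation

# Frequency: how often each category occurs.
frequency = Counter(grades)

# Percentage: each frequency as a share of all observations.
percentage = {g: 100 * n / len(grades) for g, n in frequency.items()}

print(average, round(spread, 3), dict(frequency), percentage)
```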
Page 1: Excel Basics
Excel Workbook
Workbook = entire Excel file.
Default name is Book1; a new workbook contained three worksheets in earlier versions of Excel (one in Excel 2013 and later).
Worksheet
A collection of cells that holds data. The default sheet names (Sheet1, Sheet2, ...) can be renamed easily.
Page 2: Data Types and Structures
Cell Name and Data Types
Cells are named by their location, e.g., A1.
Data types include labels (text), numbers, and formulas.
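Cell names combine a column letter with a row number, and columns past Z continue as AA, AB, and so on. The hypothetical helper below, parse_cell, is a sketch of how such a name maps to numeric column and row positions; it is not an Excel function.

```python
import re

def parse_cell(ref):
    """Split a reference such as 'A1' into (column number, row number)."""
    match = re.fullmatch(r"([A-Z]+)([1-9][0-9]*)", ref.upper())
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for ch in letters:
        # Column letters work like base-26 digits where A = 1 and Z = 26.
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return column, int(digits)

print(parse_cell("A1"), parse_cell("Z10"), parse_cell("AA3"))
```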
Formula Bar
Area above the worksheet where formulas are entered and edited; every formula starts with '='.
Ribbon Functionality
The Ribbon groups Excel's commands into tabs (such as Home, Insert, and Data) so the tools needed for a task are easy to find.
Page 3: Creating and Utilizing Tables
Excel Tables
Enable dynamic management of data sets, offering sorting, filtering, and calculated columns.
Creating a Table in Excel
Steps: Select the data, choose Insert > Table (or Home > Format as Table), then confirm the range and header option in the ‘Create Table’ dialog.
Page 4: Table Features
Advantages of Excel Tables
Table headers stay visible while scrolling down through the table.
Total row gives quick calculations such as sum, count, and average.
Tables expand automatically when data is entered in an adjacent row or column.
Dynamic Design
Tables adjust visually as data is added or removed.
Page 5: Functions and Operators in Excel
Types of Operators
Arithmetic, comparison, text concatenation, and reference operators.
Understand the syntax of each operator type and which functions can be combined with it.
Page 6: Logical Functions
Excel provides four logical functions for working with logical values: AND, OR, XOR, and NOT.
They combine or invert conditions to support complex decision-making formulas.
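The behaviour of the four functions can be sketched in Python. The excel_ prefixed names are hypothetical helpers written for this note, and the pass rule in the example (marks of at least 50 and attendance of at least 75%) is an assumption. With more than two arguments, Excel's XOR returns TRUE when an odd number of them are TRUE, which the sketch reproduces.

```python
def excel_and(*conditions):
    """TRUE only if every condition is TRUE."""
    return all(conditions)

def excel_or(*conditions):
    """TRUE if at least one condition is TRUE."""
    return any(conditions)

def excel_xor(*conditions):
    """TRUE if an odd number of conditions are TRUE."""
    return sum(bool(c) for c in conditions) % 2 == 1

def excel_not(condition):
    """Reverse a logical value."""
    return not condition

# Hypothetical rule: pass needs marks >= 50 and attendance >= 75%.
marks, attendance = 72, 0.80
passed = excel_and(marks >= 50, attendance >= 0.75)
print(passed, excel_xor(True, True), excel_not(passed))
```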
Page 7: Exercises
Engage in practical exercises to reinforce arithmetic, logical operations, and function usage.
Key Functions to Know
Common Functions:
SUM, AVERAGE, MAX, MIN, IF, TRIM, LEN, CONCATENATE.
Data Validation Options: Apply restrictions on cell input, including lists and ranges.
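As a quick reference, the common functions listed above have close Python counterparts. The sketch uses invented marks and an invented name; excel_trim is a hypothetical helper that imitates TRIM by removing leading, trailing, and repeated spaces.

```python
def excel_trim(text):
    """Imitate TRIM: drop extra spaces, keep single spaces between words."""
    return " ".join(part for part in text.split(" ") if part)

marks = [40, 55, 70, 85]  # hypothetical data

total = sum(marks)                              # SUM
average = total / len(marks)                    # AVERAGE
highest = max(marks)                            # MAX
lowest = min(marks)                             # MIN
result = "Pass" if average >= 50 else "Fail"    # IF
name = excel_trim("  Asha   Rao ")              # TRIM
length = len(name)                              # LEN
label = name + ": " + result                    # CONCATENATE

print(total, average, highest, lowest, label, length)
```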
The following pages build on this foundation, covering data validation, filtering, grouping, importing data from different sources, and hands-on applications in data science and visualization. The material progresses from beginner to advanced level.