What is Data Science?
Data Science:
Data Science is the process of extracting knowledge and insights from data using scientific methods. It draws on programming, statistics and business knowledge.
Key programming languages used in data science include MATLAB, C, C++, Python and SQL.
Data Life Cycle
The Data Life Cycle consists of the different stages that data goes through. The cycle is:
BUSINESS REQUIREMENT → DATA ACQUISITION → DATA PROCESSING → DATA EXPLORATION → MODELLING → DEPLOYMENT
Role of Statistics and Probability
Data refers to facts and statistics collected together for reference and analysis. Thus, without statistics, the data gathered would be useless.
Data is of two types. It can be either qualitative data or quantitative data.
Qualitative Data
Qualitative data deals with characteristics and descriptors that can’t be easily measured but can be observed subjectively. It is of two types, nominal data and ordinal data.
Nominal Data is data with no inherent order or ranking, e.g. gender, race, etc.
Ordinal Data is data whose values follow a meaningful order, e.g. a customer rating of poor, average or good.
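The difference between nominal and ordinal data shows up in which operations make sense on them. The following is a minimal Python sketch, not a definitive method: the blood groups, the ratings and the three-step rating scale are all hypothetical values chosen for illustration. Nominal labels can only be counted, while ordinal labels can also be sorted and ranked.

```python
from collections import Counter

# Nominal data: labels with no inherent order. Counting is meaningful, ranking is not.
blood_groups = ["A", "O", "B", "O", "AB", "O"]  # hypothetical sample values
most_common_group = Counter(blood_groups).most_common(1)[0][0]

# Ordinal data: labels with a meaningful order, encoded here as explicit ranks.
RATING_ORDER = ["poor", "average", "good"]  # assumed scale, lowest to highest
rank = {label: i for i, label in enumerate(RATING_ORDER)}

ratings = ["good", "poor", "average", "good", "average"]  # hypothetical sample values
sorted_ratings = sorted(ratings, key=rank.get)

# Because the labels are ordered, a middle (median) rating is well defined.
median_rating = sorted_ratings[len(sorted_ratings) // 2]

print(most_common_group)
print(sorted_ratings)
print(median_rating)
```

Sorting the blood groups the same way would only give alphabetical order, which carries no meaning for the data itself.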
Quantitative Data
Quantitative data deals with numbers and things that can be measured objectively. It is also of two types, discrete data and continuous data.
Discrete Data can hold only a finite or countable number of possible values, e.g. the number of students in a classroom.
Continuous Data can take any value within a range, so it has infinitely many possible values, e.g. weight.
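As a rough illustration of the two quantitative types, the Python sketch below uses made-up classroom counts and weights (both lists are assumptions, not real measurements). Counts come in whole-number steps, whereas between any two weights there is always another possible weight.

```python
# Discrete data: whole-number counts, e.g. students per classroom (hypothetical values).
students_per_class = [28, 31, 30, 28]

# Continuous data: measurements that can fall anywhere in a range (hypothetical values).
weights_kg = [61.2, 58.75, 70.031, 64.5]

# Every count is a whole number, and only a few distinct values occur.
all_counts_whole = all(isinstance(n, int) for n in students_per_class)
distinct_counts = sorted(set(students_per_class))

# Between two weights there is always room for another value.
low, high = 61.2, 61.3
midpoint = (low + high) / 2
has_value_between = low < midpoint < high

print(all_counts_whole, distinct_counts)
print(has_value_between)
```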
Statistics
Statistics is an area of applied mathematics concerned with data collection, analysis, interpretation and presentation.
Important Statistics Terminology
Population: It is a collection or set of individuals, objects or events whose properties are to be analyzed.
Sample: It is a subset of the population.
Sampling: It is a statistical method that deals with the selection of individual observations within a population. It is performed in order to infer statistical knowledge about a population.
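The relationship between population, sample and sampling can be sketched with Python's standard library. This is only an illustration under stated assumptions: the population is simulated (1,000 hypothetical heights in cm), and simple random sampling without replacement is just one of several sampling methods.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: simulated heights in cm of 1,000 individuals.
population = [random.gauss(170, 10) for _ in range(1000)]

# Sampling: pick 50 individuals at random, without replacement.
sample = random.sample(population, k=50)

# The sample mean is used to infer (estimate) the population mean.
population_mean = mean(population)
sample_mean = mean(sample)

print(f"population mean: {population_mean:.1f}")
print(f"sample mean:     {sample_mean:.1f}")
```

The two means will usually be close but not identical. That gap is the sampling error, and it tends to shrink as the sample size grows.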