Introduction to Data Science

Data Science is the **process of extracting knowledge and insights from data by using scientific methods**. In practice it combines programming, statistics and business (domain) knowledge.

Key programming languages used in data science include MATLAB, C, C++, Python and SQL.

The Data Life Cycle consists of the stages that data goes through in a project. The cycle is:

**BUSINESS REQUIREMENT → DATA ACQUISITION → DATA PROCESSING → DATA EXPLORATION → MODELLING → DEPLOYMENT**

Data refers to facts and statistics collected together for reference and analysis. Without statistical methods to analyze it, the data gathered would be of little use.

Data is of two types. It can be either **qualitative** data or **quantitative** data.

Qualitative data deals with characteristics and descriptors that can’t be easily measured but can be observed subjectively. It is of two types, nominal data and ordinal data.

**Nominal Data** is data with no inherent order or ranking, e.g. gender, race, etc.

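**Ordinal Data** is data whose categories have a meaningful order, although the gaps between categories are not necessarily equal, e.g. a customer rating of poor, average or good.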
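The difference between the two qualitative types shows up in which operations make sense. The sketch below is only an illustration: the blood group and rating values are made-up sample data, and `RATING_ORDER` is a hypothetical name chosen here. Nominal labels can be counted but not ranked, while ordinal labels can also be sorted and given a median.

```python
from collections import Counter

# Nominal data: labels with no inherent order (hypothetical sample values).
# Counting is meaningful; sorting or ranking the labels is not.
blood_groups = ["A", "O", "B", "O", "AB", "O"]
group_counts = Counter(blood_groups)
most_common_group = group_counts.most_common(1)[0][0]

# Ordinal data: labels with a meaningful order (assumed order, lowest first).
RATING_ORDER = ["poor", "average", "good"]
rank = {label: position for position, label in enumerate(RATING_ORDER)}

ratings = ["good", "poor", "average", "good", "average"]
# Sort by position in RATING_ORDER rather than alphabetically.
sorted_ratings = sorted(ratings, key=rank.__getitem__)
# With an order defined, a median category exists (middle of the sorted list).
median_rating = sorted_ratings[len(sorted_ratings) // 2]

print(most_common_group)
print(sorted_ratings)
print(median_rating)
```

Note that the ranks are used only for ordering; averaging them would assume equal spacing between categories, which ordinal data does not guarantee.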

Quantitative data deals with numbers and things that can be measured objectively. It is also of two types, discrete data and continuous data.

**Discrete Data** can take only a finite or countable number of distinct values, e.g. the number of students in a classroom.

**Continuous Data** is data that can take any value within a range, so the number of possible values is infinite, e.g. a person's weight.
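As a rough illustration of the two quantitative types, the sketch below uses made-up numbers: `class_sizes` and `weights_kg` are hypothetical names and values, not data from these notes. Discrete data is stored as whole-number counts, while continuous data is stored as floating-point measurements.

```python
from statistics import mean

# Discrete data: counts that can only be whole numbers (hypothetical values).
class_sizes = [28, 31, 25, 30]

# Continuous data: measurements that can take any value in a range,
# limited only by the precision of the instrument (hypothetical values).
weights_kg = [61.2, 58.75, 70.0, 64.31]

all_counts_are_whole = all(isinstance(n, int) for n in class_sizes)

# A summary of discrete data need not itself be a valid discrete value:
# no classroom has 28.5 students, but the mean class size can be 28.5.
mean_class_size = mean(class_sizes)
mean_weight = mean(weights_kg)

print(all_counts_are_whole)
print(mean_class_size)
print(round(mean_weight, 2))
```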

Statistics is an area of applied mathematics concerned with data collection, analysis, interpretation and presentation.

**Population**: It is a collection or set of individuals or objects or events whose properties are to be analyzed.

**Sample**: It is a subset of the population.

**Sampling**: It is a statistical method that deals with the selection of individual observations within a population. It is performed in order to infer statistical knowledge about a population.
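The relationship between population, sample and sampling can be sketched with Python's standard library. This is a minimal illustration under stated assumptions: the population is simulated (10,000 heights drawn from a normal distribution with mean 170 and standard deviation 10), and the sample size of 200 and the seed are arbitrary choices. It uses simple random sampling, which is only one of several sampling methods.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the run is repeatable

# Population: every individual we want to learn about (simulated here).
population = [random.gauss(170, 10) for _ in range(10_000)]

# Sampling: select a subset without replacement, each individual equally likely.
sample = random.sample(population, k=200)

# Inference: use the sample statistic as an estimate of the population parameter.
population_mean = mean(population)
sample_mean = mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The sample mean will usually be close to the population mean but not equal to it, and a larger sample tends to narrow the gap.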
