Introduction to Data Science

What is Data Science?

Data Science:

Data science is the process of extracting knowledge and insights from data by using scientific methods. In practice it combines programming, statistics and business (domain) knowledge.

Key programming languages used in data science include MATLAB, C, C++, Python and SQL.

Data Life Cycle

The data life cycle consists of the different stages that data goes through. The cycle is:

BUSINESS REQUIREMENT → DATA ACQUISITION → DATA PROCESSING → DATA EXPLORATION → MODELLING → DEPLOYMENT

Role of Statistics and Probability

Data refers to facts and statistics collected together for reference and analysis. Without statistics, the data gathered would be of little use, because statistics supplies the methods for analyzing and interpreting it.

Data is of two types: qualitative data and quantitative data.

Qualitative Data

Qualitative data deals with characteristics and descriptors that can't be easily measured but can be observed subjectively. It is of two types: nominal data and ordinal data.

Nominal Data is data with no inherent order or ranking, e.g. gender, race, etc.

Ordinal Data is data whose values follow a meaningful order, e.g. ratings such as poor, fair and good.
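The difference between nominal and ordinal data can be sketched in Python. This is a minimal illustration, not a standard recipe: the blood group and rating values are made up, and the rating scale in RATING_ORDER is an assumed ordering chosen for the example.

```python
from collections import Counter

# Hypothetical example values, not taken from any real data set.
blood_groups = ["B", "O", "A", "O", "AB"]            # nominal: labels only
ratings = ["good", "poor", "fair", "good", "fair"]   # ordinal: labels with a rank

# Nominal data supports counting and equality checks, but has no ranking.
blood_group_counts = Counter(blood_groups)

# Ordinal data needs an explicit order; alphabetical sorting would be wrong here.
RATING_ORDER = ["poor", "fair", "good"]  # assumed scale, lowest to highest
sorted_ratings = sorted(ratings, key=RATING_ORDER.index)

# A median is meaningful for ordinal data because the values can be ranked.
median_rating = sorted_ratings[len(sorted_ratings) // 2]

print(blood_group_counts)
print(sorted_ratings)
print(median_rating)
```

Counting works for both kinds of data, but sorting and taking a median only make sense once an order has been defined, which is what sets ordinal data apart.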

Quantitative Data

Quantitative data deals with numbers and things that can be measured objectively. It is also of two types: discrete data and continuous data.

Discrete Data is data that can take only a finite or countable number of possible values, e.g. the number of students in a classroom. It should not be confused with categorical data, which is qualitative.

Continuous Data is data that can take any of an infinite number of values within a range, e.g. weight.
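A short Python sketch may help separate the two quantitative types. The numbers below are invented for illustration; the variable names are hypothetical.

```python
from statistics import mean

# Hypothetical values for illustration only.
students_per_class = [28, 31, 30, 27]      # discrete: whole-number counts
weights_kg = [61.2, 58.75, 70.0, 64.05]    # continuous: any value in a range

# Discrete counts are naturally represented as integers.
all_counts_are_integers = all(isinstance(n, int) for n in students_per_class)

# Between any two continuous values there is always another possible value.
midpoint = (weights_kg[0] + weights_kg[1]) / 2

average_class_size = mean(students_per_class)

print(all_counts_are_integers)
print(midpoint)
print(average_class_size)
```

A class cannot have 28.5 students, whereas a weight halfway between two measured weights is still a valid weight.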

Statistics

Statistics is an area of applied mathematics concerned with data collection, analysis, interpretation and presentation.

Important Statistics Terminology

Population: A collection or set of individuals, objects or events whose properties are to be analyzed.

Sample: A subset of the population.

Sampling: A statistical method that deals with the selection of individual observations from within a population. It is performed in order to infer statistical knowledge about the population without examining every member.
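The three terms above can be tied together with a small Python sketch. It assumes simple random sampling, which is only one of several sampling techniques, and uses a synthetic population of heights generated for the example (the mean of 170 and spread of 10 are arbitrary choices).

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: synthetic heights in cm for 1,000 individuals.
population = [random.gauss(170, 10) for _ in range(1000)]

# Sampling: select 50 observations at random, without replacement.
sample = random.sample(population, k=50)

# Use the sample to estimate a property of the whole population.
population_mean = mean(population)
sample_mean = mean(sample)

print(f"population mean: {population_mean:.1f}")
print(f"sample mean:     {sample_mean:.1f}")
```

The sample mean is an estimate of the population mean. The two values will usually be close but not identical, and the gap tends to shrink as the sample size grows.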
