Quanti-Reviewer-Flashcards

QUANTITATIVE METHODS

Quantitative methods – refer to a set of statistical and mathematical techniques used to analyze and interpret data in order to draw conclusions and make predictions.

Objectivity – one of the main advantages of quantitative methods is that they provide an objective approach to research.

Modeling techniques – involve using statistical models to represent relationships between variables in the data.

  • This includes linear regression models, logistic regression models, and ANOVA (analysis of variance) models, among others.
  • Modeling techniques are used to identify patterns or relationships in the data and to make predictions about future outcomes.

Steps in quantitative research process

Defining the research question – the first step in any research project is to clearly define the research question or problem that you want to investigate.

Selecting a sample – once you have defined your research question, you need to select a representative sample from the population you are studying. This is typically done through random sampling techniques.

Collecting data – data can be collected through a variety of methods, including surveys, questionnaires, experiments, or observational studies.

Analyzing data – once the data has been collected, it needs to be analyzed using statistical methods to identify patterns and relationships between variables.

Drawing conclusions – based on the results of the data analysis, you can draw conclusions and make recommendations.

Some quantitative methods

Surveys – involve asking a set of standardized questions of a sample of individuals in order to collect data on their attitudes, beliefs, or behaviors.

Experiments – involve manipulating one or more variables in a controlled setting to measure the effect on other variables

Observational studies – involve observing and recording data on a particular phenomenon without manipulating any variables

Statistical analysis – involves using mathematical techniques to analyze data, such as calculating means, standard deviations and correlations.
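
As a small illustration of the correlation part, the sketch below computes the Pearson correlation between two made-up variables using Python's built-in statistics module (statistics.correlation needs Python 3.10 or newer).

    import statistics

    x = [2, 4, 6, 8, 10]   # made-up values of one variable
    y = [1, 3, 7, 9, 12]   # made-up values of a second variable

    print(statistics.correlation(x, y))   # Pearson correlation coefficient between x and y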

Analytics – is the process of using data, statistical and quantitative methods, and computational techniques to uncover insights and make informed decisions.

Defining the problem – understanding the business problem or question that needs to be answered.

Data collection – collecting relevant data from various sources, including internal and external sources.

Data preparation - cleaning, transforming, and structuring the data to make it ready for analysis.

Exploratory data analysis – analyzing and visualizing the data to understand the patterns and relationships

Statistical analysis – using statistical methods to draw insights and make inferences from the data

Predictive modeling – building models to predict future outcomes or behaviors based on historical data

Prescriptive analytics – providing recommendations or suggestions for action based on the insights and predictions from the data

Types of analytics

Descriptive analytics – descriptive analytics helps to summarize and describe past events and trends.

Diagnostic analytics – diagnostic analytics helps to identify the root cause of a problem or issue

Predictive analytics – uses statistical models and machine learning algorithms to predict future outcomes based on historical data.

Prescriptive analytics – helps to identify the best course of action to take in a given situation

Descriptive statistics – is a branch of statistics that involves the collection, presentation, and analysis of data.

  • The main goal of descriptive statistics is to summarize and describe the key features of a data set.

Measures of central tendency – the first step in analyzing a data set is to determine its measures of central tendency.

  • These measures help to describe the typical value or behavior of the data set.

Mean – is the average of all the data points in a data set. It is calculated by adding up all the data points and dividing by the total number of data points.

Median – is the middle value in a data set. To calculate the median, the data must first be sorted from smallest to largest.

  • If there is an even number of data points, then the median is the average of the two middle values

Mode – is the most frequently occurring value in a data set.

  • If there is no value that occurs more than once, the data set has no mode.
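
A minimal sketch of these three measures using Python's built-in statistics module, on a small made-up data set:

    import statistics

    data = [4, 8, 6, 5, 3, 8, 9]   # made-up data set

    print(statistics.mean(data))     # mean: sum of the values divided by their count
    print(statistics.median(data))   # median: middle value after sorting
    print(statistics.mode(data))     # mode: most frequently occurring value (8 here)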

Measures of variability - describe the spread or dispersion of a data set.

Range – is the difference between the largest and smallest values in a data set

Variance - measures how spread out the data is from the mean. It is calculated by taking the sum of the squared differences between each data point and the mean, and then dividing by the total number of data points minus one.

Standard deviation – is the square root of the variance. It is a measure of how spread out the data is from the mean.

Frequency distribution – is a way to summarize categorical data. It shows how often each category occurs in the dataset
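
The measures of variability and the frequency distribution above can be sketched the same way; statistics.variance and statistics.stdev divide by n minus one as described, and collections.Counter serves as a simple frequency table. The data are made up.

    import statistics
    from collections import Counter

    data = [4, 8, 6, 5, 3, 8, 9]

    print(max(data) - min(data))       # range: largest value minus smallest value
    print(statistics.variance(data))   # sample variance (divides by n - 1)
    print(statistics.stdev(data))      # standard deviation: square root of the variance

    colors = ["red", "blue", "red", "green", "blue", "red"]
    print(Counter(colors))             # frequency of each category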

Inferential statistics – are used to draw conclusions or make predictions about a population based on a sample of data. This includes hypothesis testing, confidence intervals, and regression analysis.

Hypothesis testing – is a technique used to test a hypothesis about a population parameter, such as the mean or proportion.

  • It involves comparing the sample statistics to the expected values under the null hypothesis, and calculating the probability of observing the sample data given the null hypothesis.
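
As a hedged example, the sketch below runs a one-sample t-test, one common form of hypothesis testing, assuming SciPy is available; the sample values and the hypothesized mean of 50 are made up.

    from scipy import stats

    sample = [52, 48, 51, 55, 49, 53, 50, 54]   # made-up sample

    # H0: the population mean is 50; H1: it is not
    t_stat, p_value = stats.ttest_1samp(sample, popmean=50)

    print(t_stat, p_value)   # reject H0 at the 5% level if p_value < 0.05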

Confidence intervals – are a range of values around a sample statistic that is likely to contain the true population parameter with a certain level of confidence. The confidence level is typically set at 95% or 99%.
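
A minimal sketch of a 95% confidence interval for a mean, using the usual formula mean ± t × (s / √n); the t critical value is taken from SciPy and the data are the same made-up sample as above.

    import math
    import statistics
    from scipy import stats

    sample = [52, 48, 51, 55, 49, 53, 50, 54]
    n = len(sample)
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)     # standard error of the mean

    t_crit = stats.t.ppf(0.975, df=n - 1)             # t critical value for 95% confidence
    print(mean - t_crit * sem, mean + t_crit * sem)   # lower and upper confidence limits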

Regression analysis – is a technique used to model the relationship between two or more variables. It allows researchers to make predictions about the dependent variable based on the values of the independent variables.
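
A short sketch of simple linear regression with scipy.stats.linregress (one possible tool among several); the x and y values are invented, and the fitted line is then used to predict the dependent variable for a new value of the independent variable.

    from scipy import stats

    x = [1, 2, 3, 4, 5]              # independent variable
    y = [2.1, 3.9, 6.2, 8.1, 9.8]    # dependent variable

    result = stats.linregress(x, y)
    print(result.slope, result.intercept, result.rvalue)

    new_x = 6
    print(result.intercept + result.slope * new_x)   # predicted y for a new x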

Chi-squared test – is a technique used to test the independence of two categorical variables. It involves comparing the observed frequencies to the expected frequencies under the null hypothesis.
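
A hedged sketch of a chi-squared test of independence on a small made-up contingency table (two categorical variables), assuming SciPy is available.

    from scipy.stats import chi2_contingency

    # observed frequencies: rows are one categorical variable, columns the other
    observed = [[30, 10],
                [20, 40]]

    chi2, p_value, dof, expected = chi2_contingency(observed)
    print(chi2, p_value)   # a small p-value suggests the two variables are not independent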

Mathematical modeling – involves using mathematical equations to describe and predict the behavior of complex systems. This includes techniques such as linear and nonlinear models, optimization and simulations.

  • Is the process of creating a mathematical representation of a real-world system or problem.
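
As a toy illustration only (not an example from the lesson), the sketch below models population growth with the simple equation P(t + 1) = P(t) + r × P(t) and simulates it for ten periods; the growth rate and starting population are assumptions made up for the example.

    # toy mathematical model: discrete-time exponential growth
    r = 0.05            # assumed growth rate per period
    population = 100.0  # assumed starting population

    for t in range(10):
        population = population + r * population   # apply the model equation each period
        print(t + 1, round(population, 2))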

Process of mathematical modeling

Define the problem – the first step in mathematical modeling is to define the problem or system being studied. This involves identifying the key variables, parameters, and relationships that are relevant to the problem.

Formulate the model – once the problem has been defined, the next step is to create a mathematical model that represents the system or phenomenon. This involves choosing the appropriate equations, variables, and parameters to describe the behavior of the system.

Analyze the model – after the model has been formulated, it is analyzed to understand its properties and behavior. This may involve solving the equations analytically, simulating the model numerically, or using other computational techniques.

Validate the model – the model is validated by comparing its predictions to real-world data or experimental results.

Use the model – once the model has been validated, it can be used to explore the behavior of the system under different conditions, make predictions about future outcomes or optimize the performance of the system.

Data visualization – involves creating graphical representations of data to communicate patterns and relationships. This includes techniques such as scatter plots, histograms, and box plots.
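
A minimal sketch of the chart types mentioned above, assuming Matplotlib is installed; the data are random numbers generated only for illustration.

    import random
    import matplotlib.pyplot as plt

    x = [random.gauss(0, 1) for _ in range(200)]
    y = [2 * value + random.gauss(0, 1) for value in x]

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    axes[0].hist(x, bins=20)    # histogram: distribution of one variable
    axes[1].scatter(x, y)       # scatter plot: relationship between two variables
    axes[2].boxplot([x, y])     # box plots: spread and outliers
    plt.show()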

Probability theory – is a branch of mathematics focusing on the analysis of random phenomena. It is an important skill for data scientists using data affected by chance.

  • Commonly used by data scientists to model situations where experiments, conducted under similar circumstances, yield different results.
  • A tool employed by researchers, businesses, investment analysts, and countless others for risk management and scenario analysis.

Classical – also known as the axiomatic method, this type of probability has a set of axioms (rules) attached to it.

Relative frequency – this involves looking at the occurrence ratio of a singular event in comparison to the total number of outcomes. This type of probability is often used after the data from an experiment has been gathered to compare a subset of data to the total amount of collected data.
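
To make the contrast concrete: classically, the probability of rolling a six with a fair die is 1/6; the sketch below estimates the same probability as a relative frequency from simulated rolls (the number of rolls is arbitrary).

    import random

    rolls = [random.randint(1, 6) for _ in range(10_000)]   # simulated experiment
    sixes = rolls.count(6)

    print(sixes / len(rolls))   # relative frequency estimate, close to the classical 1/6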

Subjective probability – when using the subjective approach, probability is the likelihood of something happening based on one's experiences or personal judgment. There are no formal calculations for subjective probability, for it is based on one's beliefs, judgment, and personal reasoning.

Epidemiology – is the science of disease distribution. Researchers in this field study disease frequency, assessing how the probability differs across groups of people.

Insurance – professionals in the insurance industry, such as actuaries, make primary use of probability, statistics, and other data science tools to calculate the probability of uncertain future events occurring over a period of time.

Small business – owners cannot always turn to their hunches and instincts to run a successful company.

Classical – is used when all probable outcomes have an equal likelihood of happening and every outcome is known in advance.

Relative frequency – offers the advantage of being able to handle scenarios where outcomes have different theoretical probabilities of occurring. This approach can also manage a probability situation where possible outcomes are unknown.

Subjective – problems that benefit from subjective probability are those that require some level of belief to make possible.

Quantitative Methods - refer to a set of statistical and mathematical techniques used to measure, quantify, analyze, and interpret data in order to draw conclusions and make predictions.

- provide a systematic and objective approach to gathering, analyzing, and interpreting data.

- refer to the collection and analysis of numerical data to answer research questions or test hypotheses.

Importance of quantitative methods:

Objectivity - quantitative methods provide an objective approach to gathering, analyzing, and interpreting data.

Precision - results are expressed numerically, which allows them to be measured and compared exactly.

Quantitative Research Process:

1. Defining the Research Question - to clearly define the research question or problem that you want to investigate.

2. Selecting a Sample - to select a representative sample from the population you are studying.

3. Collecting Data - data can be collected through a variety of methods, including:

Surveys/Questionnaires - a set of standardized questions asked of a sample of individuals in order to collect data.

Experiments - manipulating one/more variables in a controlled setting to measure the effect on other variables.

Observational studies - observing and recording data on a particular phenomenon w/o manipulating any variables.

Statistical Analysis - using mathematical techniques to analyze data, such as calculating mean, standard deviations and correlations.

4. Analyzing Data - the collected data needs to be analyzed using statistical methods to identify patterns and relationships between variables.

5. Drawing Conclusions - draw conclusions and make recommendations based on the results of the data analysis.

Types of statistical analysis

Descriptive statistics

- summarizing and describing the main features of the data that has been collected.

- provide a snapshot of the data and help identify patterns or trends.

- includes measures of central tendency and measures of dispersion/variability.

Inferential statistics

- use the data that has been collected from the sample to make inferences about the larger population.

- are used to test hypotheses, make predictions, and draw conclusions about the population based on the sample data.

- involves probability theory

Modeling Techniques

- represent relationships between variables in the data.

- used to identify relationships in the data and to make predictions about future outcomes.

- includes linear regression models, logistic regression models, ANOVA models.

Role of Statistical analysis

- provide a framework for collecting, organizing, analyzing, interpreting and presenting data.

- draw meaningful conclusions, make predictions, and inform decision-making based on data.

LESSON 1

Analytics - is the process of using data, statistical and quantitative methods, and computational techniques to uncover insights and make informed decisions.

Data Analytics Process:

1. Defining the Problem - understanding the business problem or question that needs to be answered.

2. Data Collection - collecting relevant data from various sources, including internal and external sources.

3. Data Preparation - cleaning, transforming and structuring the data to make it ready for analysis.

4. Exploratory Data Analysis- analyzing and visualizing the data to understand the patterns and relationships.

5. Statistical Analysis - using statistical methods to draw insights and inferences from the data.

6. Predictive Modeling - building models to predict future outcomes or behaviors based on historical data.

7. Prescriptive Analytics - providing recommendations for action based on the insights and predictions from the data.

Types of Analytics:

Descriptive Analytics - Helps to summarize and describe past events and trends.

Diagnostic Analytics - helps to identify the root cause of a problem or issue.

Predictive Analytics - uses statistical models and machine learning algorithms to predict future outcomes based on historical data.

Prescriptive Analytics - helps to identify the best course of action to take in a given situation.

LESSON 2

Common Quantitative Methods

Descriptive Statistics - is a branch of statistics that involves the collection, presentation, and analysis of data.

- main goal is to summarize and describe the key features of a data set.

- can be done using various measures of central tendency (mean, median, and mode) and measures of variability (range, variance, and standard deviation).

Types of Data:

Categorical Data - is non-numeric data that is typically grouped into categories or labels (such as gender or color).

Numerical Data - is quantitative data that can be measured and expressed as numbers (such as age or weight).

Common techniques in descriptive statistics:

Measures of Central Tendency

- these measures help to describe the typical value of the data set.

Common measures of Central Tendency:

1. Mean - is the average of all data points in a data set. The mean is sensitive to outliers.

Mean = (sum of values) / (no. of values)

2. Median - is the middle value in a data set. Median is less sensitive to outliers.

If the number of values is even, add the two middle values and divide by 2.

Ex. 1, 2, 3, 4, 5, 6

(3 + 4) / 2 = 3.5

Note: the values must first be arranged from smallest to largest.

3. Mode - is the most frequently occurring value in a data set.

Measures of Variability

- describe the spread / dispersion of a data set.

Common measures of Variability:

1. Range - is the difference between the largest and smallest values in a data set.

2. Variance - measures how spread out the data is from the mean.

3. Standard Deviation - is the square root of the variance.

Frequency Distribution - is a way to summarize categorical data. It shows how often each category occurs in the data set.

Data Presentation

- one of the most important aspects of descriptive statistics.

- Data can be presented in various forms (tables, charts, or graphs).

- The presentation format depends on the type of data and the purpose of the analysis.

- Tables are useful for categorical data (such as frequency distributions or contingency tables).

- Charts and graphs are often used to present numerical data (such as histograms or scatter plots).

Inferential Statistics - are used to draw conclusions or make predictions about a population based on a sample of data.

Common techniques in inferential statistics:

Hypothesis testing - is a technique used to test a hypothesis about a population parameter.

Confidence Intervals - are a range of values around a sample statistic that is likely to contain the true population parameter with a certain level of confidence. Confidence level is typically set at 95% or 99%.

Regression Analysis - is a technique used to model the relationship between two or more variables.

- Allows researchers to make predictions about the dependent variable based on the values of the independent variables.

Analysis Of Variance (ANOVA) - is a technique used to compare the means of three or more groups.

- It tests whether the differences between the groups are significant or due to chance.
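
A hedged sketch of a one-way ANOVA comparing three made-up groups, assuming SciPy is available.

    from scipy.stats import f_oneway

    group_a = [23, 25, 28, 30, 27]
    group_b = [31, 29, 34, 33, 30]
    group_c = [22, 24, 21, 26, 23]

    f_stat, p_value = f_oneway(group_a, group_b, group_c)
    print(f_stat, p_value)   # a small p-value means the group means differ more than chance would explain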

Chi-squared Test - is a technique used to test the independence of two categorical variables.

- it compares the observed frequencies to the expected frequencies under the null hypothesis.

Mathematical Modeling - using mathematical equations to describe and predict the behavior of complex systems.

- includes linear and non-linear models, optimization and simulation.

- powerful tool for understanding complex systems and making predictions about their behavior.

Process of mathematical modeling:

Define the problem - identify the key variables, parameters and relationships that are relevant to the problem.

Formulate the model - create a mathematical model that represents the system/phenomenon by choosing the appropriate equations, variables, and parameters to describe the behavior of the system.

Analyze the model - solve the equations analytically, simulate the model numerically, or use other computational techniques.

Validate the Model - validate the model by comparing its predictions to real-world data or experimental results.

Use the Model - explore the behavior of the system under different conditions, make predictions about future outcomes or optimize the performance of the system.

Data visualization - creating graphical representations of data to communicate patterns and relationships.

- includes scatter plots, histograms and box plots.

- effective visualization requires careful planning and design.

Probability Theory

- branch of mathematics

- it is an important skill for data scientists using data affected by chance.

3 Types of Probability

Classical - also known as the axiomatic method; it has a set of axioms (rules) attached to it.

Relative Frequency - is often used after data from the experiment has been gathered to compare a subset of data to the total amount of collected data.

Subjective probability - no formal calculations; it is based on one's beliefs, judgment, and personal reasoning.