Introduction to Data Analysis (Engineering Data Analysis)


Description and Tags

These flashcards provide key terms and definitions related to the concepts of data analysis covered in the course.


98 Terms

New cards

What is the key engineering mindset?

-Statistics
-Decision Making

-Efficient Application

-Process Improvement

New cards

Statistics

a science that helps us make decisions and draw conclusions in the presence of variability

New cards

Decision Making

Engineers are the professionals who formulate [BLANK] based on technical and reliable data

New cards

Efficient Application

An engineer is someone who solves problems of interest to society by the [BLANK] of scientific principles

New cards

Process Improvement (The Engineering Method)

Engineers solve the problem by either refining an existing process or designing a new process that meets clients’ needs

New cards

Scientific Data

The use of statistical methods in various sectors, including industry, research and development, and many other areas, involves the gathering of information or [BLANK].

New cards

Statistics my love

The field of [BLANK] deals with the collection, presentation, analysis, and use of data to make decisions, solve problems, and design products and processes

New cards

Statistical techniques

[BLANK] can be powerful aids in designing new products and systems, improving existing designs, and designing, developing, and improving production processes.

New cards

THATS THE STARING LINE LMAO

HARDMODE IN COMIN love u tho

New cards

Mean

It is simply the numerical average of the data values.

New cards

Other mean formulas?

Geometric mean | Weighted mean

New cards

Median

It is the midpoint of the data array once the values are arranged in order.

New cards

Mode

It is the most frequent data value in a data set.

-If the data set has 2 modes, it is bimodal.

-If the data set has 3 modes, it is trimodal.
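The mean, median, and mode cards above can be checked with a short sketch. This is only an illustration using Python's standard `statistics` module; the sample values, the weights, and the variable names are made up for the example and are not part of the course material.

```python
import statistics

# Hypothetical sample values chosen only for illustration.
data = [2, 3, 3, 5, 7, 7, 9]

mean = statistics.mean(data)        # numerical average
median = statistics.median(data)    # midpoint of the ordered data
modes = statistics.multimode(data)  # every value tied for most frequent

# Name the data set by how many modes it has.
names = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
label = names.get(len(modes), "multimodal")

# Other mean formulas: geometric and weighted (hypothetical inputs).
geometric = statistics.geometric_mean([2, 8])
scores, weights = [80, 90, 70], [1, 2, 1]
weighted = sum(s * w for s, w in zip(scores, weights)) / sum(weights)

print(mean, median, modes, label, geometric, weighted)
```

Here 3 and 7 each appear twice, so the sample is bimodal.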

New cards

U GOOD? TAKE 3 mins rest

BREATHEEEEEEEEEEEEEEEEEEE

New cards

Fractiles and Quantiles

They describe or locate the position of a certain piece of data relative to the entire set of data

New cards

Quartile

It is the fractile obtained by dividing the set of data into four equal parts

New cards

Four parts Of Quartile habibi?

Q1: lower quartile - the value below which the lowest 25% of the data fall.

Q2: median - the value that divides the data into two equal parts.

Q3: upper quartile - the value below which the lowest 75% of the data fall (the highest 25% lie above it).

New cards

Deciles?

It is the fractile obtained by dividing the set of data into ten equal parts

New cards

Percentile YA KNOW DIS YES?

It is the measure of position obtained by dividing the set of data into one hundred equal parts

New cards

Finding Quartiles, Deciles and Percentiles lemme tell u :3

1. Arrange the data set from least to greatest.

2. Calculate the position of the fractile element by the formula:
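The formula itself did not survive on this card, so the sketch below assumes one common textbook rule: the k-th fractile of n ordered values sits at position k(n + 1)/parts, with linear interpolation between neighbors. Both the rule and the helper name `fractile` are assumptions made for illustration; the course may use a different position formula.

```python
def fractile(data, k, parts):
    """Return the k-th fractile when the data are split into `parts` equal parts.

    Assumed position rule: k * (n + 1) / parts, with linear interpolation.
    """
    ordered = sorted(data)             # step 1: least to greatest
    n = len(ordered)
    pos = k * (n + 1) / parts          # step 2: 1-based position
    pos = min(max(pos, 1), n)          # keep the position inside the data
    lower = int(pos)                   # whole part of the position
    frac = pos - lower                 # fractional part of the position
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


# Hypothetical data set.
data = [12, 7, 3, 15, 9, 21, 18]

q1 = fractile(data, 1, 4)      # first quartile
d5 = fractile(data, 5, 10)     # fifth decile
p75 = fractile(data, 75, 100)  # 75th percentile
print(q1, d5, p75)
```

Under this assumed rule the fifth decile equals the median and the 75th percentile equals Q3, which is a quick sanity check on any position formula.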

New cards

Five Number Summary

A numerical summary made up of the following values:

- Minimum (Lowest Value)

- Q1: lower quartile

- Median

- Q3: upper quartile

- Maximum (Highest Value)

New cards

Box and Whisker Plot

The graphical representation of the five-number summary:

-Minimum (Lowest Value)

-Q1: lower quartile

-Median

-Q3: upper quartile

-Maximum (Highest Value)

-Fences

Lower Fence = Q1 – 1.5 × IQR

Upper Fence = Q3 + 1.5 × IQR, where IQR = Q3 – Q1
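A minimal sketch of the five-number summary and the fences, assuming Python's `statistics.quantiles` with its default "exclusive" method as the quartile rule (the course's own quartile formula may give slightly different values). The data set is hypothetical.

```python
import statistics

# Hypothetical data; 60 is deliberately far from the rest.
data = [3, 7, 9, 12, 15, 18, 21, 60]

# Quartiles from the standard library (default "exclusive" method).
q1, q2, q3 = statistics.quantiles(data, n=4)

five_number = (min(data), q1, q2, q3, max(data))

iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are flagged as possible outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(five_number, lower_fence, upper_fence, outliers)
```

Only 60 falls outside the fences here, so it is the one value a boxplot would draw as a separate point.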

New cards

GO TO SLEEP M8

LATE AT NIGHT UTUTUTUTUUTUTUTUTUTTUT

New cards

Range

The difference between the largest and smallest number in the set.


New cards

Mean Absolute Deviation

The average of the absolute (unsigned) deviations from the mean.

New cards

Standard Deviation

A measure of how closely the data values cluster around the mean; it is the square root of the variance.

New cards

Variance

The measure of variation equal to the square of the standard deviation.

New cards

Coefficient of Variation

The ratio of the standard deviation to the mean, expressed as a percentage.
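The measures of variation above can be computed side by side. This is a sketch on a hypothetical sample; it assumes the sample forms of variance and standard deviation (dividing by n - 1), which is what `statistics.variance` and `statistics.stdev` compute.

```python
import statistics

# Hypothetical sample.
data = [4, 8, 6, 5, 7]

mean = statistics.mean(data)

data_range = max(data) - min(data)                   # largest minus smallest
mad = sum(abs(x - mean) for x in data) / len(data)   # mean absolute deviation
variance = statistics.variance(data)                 # sample variance, n - 1 divisor
std_dev = statistics.stdev(data)                     # square root of the variance
cv_percent = std_dev / mean * 100                    # coefficient of variation in %

print(data_range, mad, variance, std_dev, cv_percent)
```

If the data are a whole population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by n instead.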

New cards

FAR MORE AWAY SO BREATHEEE AND MAYBE REST pls :3

U get kisses for finding this

New cards

Skewness

-The degree of asymmetry of a distribution about its mean.

-Based on the computed values of skewness, with a certain degree of tolerance, the data distribution can be interpreted as symmetric, positively skewed or negatively skewed.

New cards

Skewness (skewness coefficient)

The skewness coefficient of the distribution can be used to interpret the skewness of a data set as shown:

New cards

The skewness coefficient of the distribution can be determined using the following:

❑ Moment-Based Coefficient of Skewness

❑ Bowley’s Coefficient of Skewness

❑ Pearsonian Coefficient of Skewness

New cards

Moment-Based Coefficient of Skewness

The coefficient of skewness based on the third-degree moment about the mean

New cards

Bowley’s Coefficient of Skewness

The coefficient of skewness based on three quartiles in a given data set.


New cards

Pearsonian Coefficient of Skewness

The most practical estimation of the skewness of a data set
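The three skewness coefficients can be sketched as below. The cards do not show their formulas, so the forms used here are assumptions taken from common textbook definitions: the third moment divided by the 1.5 power of the second moment, Bowley's (Q3 + Q1 - 2·Q2)/(Q3 - Q1), and Pearson's 3(mean - median)/s. Function names and data are hypothetical.

```python
import statistics


def moment_skewness(data):
    """Third moment about the mean, scaled by the second moment (assumed form)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


def bowley_skewness(data):
    """Quartile-based coefficient: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return (q3 + q1 - 2 * q2) / (q3 - q1)


def pearson_skewness(data):
    """Pearson's median-based coefficient: 3 * (mean - median) / s."""
    spread = statistics.stdev(data)
    return 3 * (statistics.mean(data) - statistics.median(data)) / spread


# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]

for name, values in (("symmetric", symmetric), ("right_tailed", right_tailed)):
    print(name, moment_skewness(values), bowley_skewness(values), pearson_skewness(values))
```

All three coefficients are zero for the symmetric set. The different coefficients need not agree in sign on small samples, which is one reason a tolerance is used when interpreting them.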

New cards

SAKIT SA MATA NYAN PERDS

HEHEHEHEHHEEHHEHEHEHE

New cards

Kurtosis

-The degree of peakedness of a distribution relative to a normal distribution

-Based on the computed values of kurtosis, with a certain degree of tolerance, the data distribution can be interpreted as leptokurtic, platykurtic or mesokurtic

New cards

Kurtosis (The kurtosis of the distribution)

The kurtosis of the distribution can be used to interpret the peakedness of a data set as shown:

New cards

Kurtosis DATA YES DATA FORMULAAAAAAAAAAAA

The kurtosis of the data set can be computed as:
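The formula is missing from this card, so the sketch below assumes the common moment-based definition: the fourth moment about the mean divided by the square of the second moment, which equals 3 for a normal distribution. The tolerance of 0.5 and the function names are hypothetical choices for illustration.

```python
def kurtosis(data):
    """Moment-based kurtosis: m4 / m2**2 (assumed form; about 3 for normal data)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2


def classify(k, tolerance=0.5):
    """Label a kurtosis value; the tolerance is an arbitrary illustrative choice."""
    if k > 3 + tolerance:
        return "leptokurtic"   # more peaked than normal
    if k < 3 - tolerance:
        return "platykurtic"   # flatter than normal
    return "mesokurtic"        # about as peaked as normal


# Hypothetical data sets.
flat = [1, 2, 3, 4, 5, 6]
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 1, 9]

print(classify(kurtosis(flat)), classify(kurtosis(peaked)))
```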

New cards

Data

-Factual information used as a basis for reasoning, discussion, or calculation

-A group of raw information from which statistics are created

New cards

Primary Data

a set of data obtained or directly generated by the researcher through surveys, interviews, or experiments

New cards

Secondary Data

a set of existing data obtained by previous researchers and presented in publications such as journals, research articles, censuses, reports, and previous records of various institutions and organizations

New cards

Primary vs Secondary Data

New cards

Parameter

a numerical measurement describing some characteristic of a population.

New cards

Statistic

a numerical measurement describing some characteristic of a sample.

New cards

Types of Data:

New cards

Quantitative Data

A set of data consisting of numbers representing counts or measurements with units

New cards

Discrete Data

refers to the data values that are finite or countable

New cards

Continuous Data

refers to data values that result from infinitely many possible quantitative values (which are no longer countable)

New cards

Types of Data AGAIN

Qualitative Data

A set of data consisting of names or labels (or even numbers that do not represent any count or measurement).
Nominal Data – data that consist of names, labels, or categories only and cannot be arranged in any order (such as low to high).
Ordinal Data – data that can be arranged in some order, but differences (obtained by subtraction) between data values either cannot be determined or are meaningless.

JUST READ THE HIGHLTED IF SKIMMIN

New cards

Common Data Collection Methods

Sample Survey

Census

Experiment

Observational Study

New cards

Sample Survey

the data collection process in which a sample is collected and studied to gain information about a population

New cards

Census (Big Sample Survey)

the process of complete enumeration. It is a process of collecting information from every unit in the target population.

New cards

Experimental Study (Designed Experiment)

a study that makes deliberate or purposeful changes in the controllable variables of the system or process, observes the resulting system output, then makes an inference or decision about which variables are responsible for the observed changes in output performance

New cards

Experiment

a planned activity designed to compare “treatments.”

New cards

Observational Study

a data collection activity in which the experimenter merely plays the role of an observer

New cards

Retrospective Study

a study that utilizes past or historical data (typically log records or past journal data)

New cards

Population

the entire group of individuals or items in a study

New cards

Sample

a part of a population that is actually studied

New cards

Sampling Techniques

-Non-Probability

Convenience Sampling

Quota Sampling

Purposive Sampling

-Probability

Simple Random Sampling

Systematic Sampling

Cluster Sampling

Stratified Sampling

New cards

Non-Probability Sampling

❑Convenience Sampling– a sampling technique based primarily on the availability of the respondents (or experimental conditions)

❑Quota Sampling– a sampling technique where there is a desired number of samples and the respondents are taken as volunteers until that number is reached.

❑Purposive Sampling– a type of sampling where the sample is obtained based on a certain premise.

New cards

Probability Sampling - Simple Random Sampling

a sampling technique performed by listing the population according to a certain rule, numbering each element, and drawing the sample by a randomizing procedure.

New cards

Probability Sampling - Systematic Sampling

a sampling technique done by arranging the population in a certain order, dividing it into equal groups, and taking the kth element of each group

New cards

Probability Sampling - Stratified Sampling

a sampling technique performed by dividing the population into strata (subpopulations) whose members have generally homogeneous or similar characteristics. Random sampling is then performed within each stratum, in proportion to the size of the stratum.

New cards

Probability Sampling - Cluster Sampling

a sampling technique performed by dividing the population into clusters whose elements are as heterogeneous (diverse) in characteristics as possible. Whole clusters are then selected at random.
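The four probability sampling techniques can be compared in one sketch. Everything here is hypothetical: a population of 100 numbered elements, a target sample of 10, three strata of sizes 60/30/10, and five clusters of 20. The fixed seed only makes the run repeatable.

```python
import random

rng = random.Random(42)             # fixed seed so the sketch is repeatable
population = list(range(1, 101))    # hypothetical numbered population
sample_size = 10

# Simple random sampling: draw elements directly at random.
srs = rng.sample(population, sample_size)

# Systematic sampling: random start in the first group, then every k-th element.
k = len(population) // sample_size
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sampling: random draw inside each stratum, proportional to its size.
strata = {"A": population[:60], "B": population[60:90], "C": population[90:]}
stratified = []
for members in strata.values():
    share = round(sample_size * len(members) / len(population))
    stratified.extend(rng.sample(members, share))

# Cluster sampling: pick whole clusters at random and keep all their elements.
clusters = [population[i:i + 20] for i in range(0, len(population), 20)]
chosen = rng.sample(clusters, 2)
cluster_sample = [x for cluster in chosen for x in cluster]

print(srs, systematic, stratified, len(cluster_sample))
```

Note the difference in what gets randomized: individual elements in the first three techniques, whole groups in cluster sampling.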

New cards

Qualitative Data (DESCRIPTIVE/CATEGORICAL)

-Bar Chart

-Pie Chart

-Frequency Polygon

-Pareto Chart

New cards

Bar Chart

❑ A bar chart is created by plotting all the categories in the data on one axis and the frequency (or relative frequency or percentages) of occurrence of each category in the data on the other axis.

❑ Either horizontal or vertical bars of height (or length) equal to the frequency are drawn.


New cards

Pie Chart

A circular chart representing the percentage occurrence of each group in the sample.

New cards

Frequency Polygon

A line graph that is used to compare the rate of increase and decrease per class interval (class marks versus frequency).

New cards

Pareto Chart

the process of sorting out the few vital causes from the trivial many.

The Pareto chart follows from the ‘Pareto Principle’, which grew out of Pareto's observation of the imbalance in land distribution and ownership in Italy.

“80% of consequences come from 20% of the causes”
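The table behind a Pareto chart can be built by sorting causes from most to least frequent and accumulating percentages. The defect categories and counts below are invented for illustration, and the 80% cutoff is used only because the card quotes the 80/20 rule.

```python
# Hypothetical defect counts from an inspection log.
defects = {"scratch": 45, "dent": 30, "misalignment": 15, "stain": 7, "other": 3}

# Sort causes from most to least frequent, as the bars of a Pareto chart are.
ordered = sorted(defects.items(), key=lambda item: item[1], reverse=True)
total = sum(defects.values())

# Build rows of (cause, count, cumulative percentage).
pareto_table = []
running = 0
for cause, count in ordered:
    running += count
    pareto_table.append((cause, count, 100 * running / total))

# The "vital few": causes needed to reach 80% of all occurrences.
vital_few = []
for cause, _, cumulative in pareto_table:
    vital_few.append(cause)
    if cumulative >= 80:
        break

print(pareto_table)
print(vital_few)
```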

New cards

Quantitative Data (NUMERICAL)

Stem and Leaf Diagram
Histograms
Scatter Diagrams
Boxplots
Time Sequence Plots

New cards

Stem and Leaf Diagram (Dotplot)

❑ The dotplot is a useful data display for small samples up to about 20 observations.

❑ If used for many data observations, the dotplot is no longer efficient

New cards

Boxplot (Box-and-Whisker Plot)

A graphical display that simultaneously describes several important features of a data set, such as center, spread, departure from symmetry, and identification of unusual observations or outliers.


New cards

Boxplot (Box-and-Whisker Plot)

The graphical representation of the five-number summary:

❑ Minimum (Lowest Value)

❑ Q1: lower quartile

❑ Median

❑ Q3: upper quartile

❑ Maximum (Highest Value)

❑ Fences

❑ Lower Fence = Q1 – 1.5 × IQR

❑ Upper Fence = Q3 + 1.5 × IQR

New cards

Time Sequence Plots

❑ Time Series - a data set in which the observations are recorded in the order in which they occur.

❑ Time Series Plot - a graph in which the vertical axis denotes the observed value of the variable (x) and the horizontal axis denotes the time (which could be minutes, days, years, etc.).

❑ When measurements are plotted as a time series, we often see trends, cycles, or other broad features of the data that could not be seen otherwise


New cards

Time Sequence Plots

a graph that combines a stem-and-leaf plot and a time series plot.

New cards

Scatter Diagram

It is a graphical representation of the possible relationship between two measurement variables.

New cards

Scatter Diagram - Illustration

New cards

Scatter Matrix Diagram

A set of scatter diagrams drawn when two or more variables are analyzed for pairwise relationships between the variables in the sample.

New cards

Sample Correlation Coefficient

a quantitative measure of the strength of the linear relationship between two random variables (as x and y).
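A small sketch of the sample correlation coefficient, assuming the usual Pearson form r = Sxy / sqrt(Sxx · Syy), where the S terms are sums of products of deviations from the means. The function name and the data are hypothetical.

```python
import math


def sample_correlation(x, y):
    """Pearson sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # rises perfectly with x
y_down = [10, 8, 6, 4, 2]   # falls perfectly with x
y_mixed = [2, 1, 4, 3, 5]   # rises with x, but not perfectly

print(sample_correlation(x, y_up))
print(sample_correlation(x, y_down))
print(sample_correlation(x, y_mixed))
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 a weak one; r says nothing about nonlinear relationships.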

New cards

Measurement Bias

A sampling method is biased if it tends to give samples in which some characteristic of the population is underrepresented or overrepresented


New cards

Measurement Bias: Mathematical Representation

New cards

Key Elements of a Survey

❑ Stating Clear Objectives (General and Specific)
❑ Defining Target Population
❑ Sample Selection Plan (to reduce bias)
❑ Method of Measurement (to minimize bias)
❑ Pretest/Simulation
❑ Organized Data Collection and Data Management
❑ Thorough Data Analysis
❑ Conclusions (in line with the objectives)

New cards

Sources of Bias in Surveys

Wording Effect Bias - Improper wording of a question
Nonrandom selection of sample
Response Bias – bias that occurs due to the behavior of the interviewer or the respondent.
Nonresponse Bias – bias that occurs when members of the selected sample refuse to answer
Undercoverage Bias – bias that occurs if a part of the population is left out of the selection process

New cards

Experiment Flowchart: At a Glance

❑ In a designed experiment, a researcher makes deliberate or purposeful changes in the variables of the system or process, observes the resulting system output data, and then makes an inference or decision about which variables are responsible for the observed changes in output performance.

❑ Experiments designed with basic principles such as randomization are needed to establish cause-and-effect relationships.

New cards

Types of Experimental Studies

❑ Comparative Experiment – a research study that investigates the difference in output between two or more setups.

❑ Factorial Experiment – a research study that investigates the effect of different factors (or treatments) on a certain measurable output used as an indicator. Each factor forms a basic experiment of more than two settings.

New cards

WERE SO CLOSEEEEEEE;)

🙂 😉 plsssssssssss

New cards

Comparative Experiment

New cards

Factorial Experiment

New cards

Factorial Experiment

New cards

Factorial Experiment

❑ An important advantage of factorial experiments is that they allow one to detect an interaction between factors.

❑ A fractional factorial experiment is a variation of the basic factorial arrangement in which only a subset of the factor combinations is actually tested


New cards

Key Terminologies in Experimental Study

❑ Factor– a variable whose effect on the response is of interest in the experiment

❑ Levels – the values of a factor used in the experiment

❑ Treatments – the factor-level combinations used in the experiment

❑ Experimental Unit – the smallest unit to which a treatment is applied
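To see how factors, levels, and treatments relate, the sketch below lists every factor-level combination of a full factorial experiment. The factor names and levels are invented for illustration.

```python
from itertools import product

# Hypothetical factors (keys) and their levels (values).
factors = {
    "temperature_C": [150, 180],
    "pressure_kPa": [100, 200],
    "catalyst": ["A", "B"],
}

# Each treatment is one factor-level combination.
treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for number, treatment in enumerate(treatments, start=1):
    print(number, treatment)
```

Three factors at two levels each give 2 × 2 × 2 = 8 treatments; a fractional factorial experiment would test only a chosen subset of these rows.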

New cards

Key Terminologies in Experimental Study

❑ Response (Output) Variable - the outcome of interest to be measured in an experiment. (This is sometimes called the dependent variable.)

❑ Explanatory variable - a variable that attempts to explain differences among responses. (This is sometimes called the independent variable.)

❑ Confounding variable - a variable whose effect on the response cannot be separated from the effect of the treatments.

New cards

Key Terminologies in Experimental Study

❑ Block – a group of homogeneous (similar in characteristics) experimental units.

❑ Interaction Effect – when one factor produces a different pattern of responses at one level of a second factor than it does at another level

❑ Replication– the number of times a treatment appears in an experiment or the number of experimental units to which each treatment is applied in an experiment

New cards

Design of Experiments

❑ A protocol that describes exactly how the experiment is to be done is known as the design of an experiment.

❑ In order to develop an appropriate design for an experiment, first the researcher must identify major sources of variation that possibly affect the response.

New cards

Designs of Experiments

❑ Completely Randomized Design – treatments are assigned randomly to the experimental units

❑ Randomized Block Design – all experimental units are grouped by certain characteristics to form homogeneous blocks, and a completely randomized design is applied within each block.
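The two designs differ only in where the randomization happens, which a short sketch can show. The unit names, block names, and treatment labels are hypothetical, and the seed is fixed only to make the run repeatable.

```python
import random

rng = random.Random(7)              # fixed seed so the sketch is repeatable
treatments = ["T1", "T2", "T3"]     # hypothetical treatment labels

# Completely randomized design: shuffle all units, then deal treatments out evenly.
units = [f"unit{i}" for i in range(1, 10)]
shuffled = units[:]
rng.shuffle(shuffled)
crd = {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}

# Randomized block design: group similar units into blocks,
# then randomize the treatment order separately inside each block.
blocks = {
    "batch1": ["u1", "u2", "u3"],
    "batch2": ["u4", "u5", "u6"],
    "batch3": ["u7", "u8", "u9"],
}
rbd = {}
for block, members in blocks.items():
    order = treatments[:]
    rng.shuffle(order)
    rbd[block] = dict(zip(members, order))

print(crd)
print(rbd)
```

In the blocked version every treatment appears exactly once per block, so differences between batches cannot be mistaken for treatment effects.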

New cards

DAs all mylove

well done