L03 - Exploratory Data Analysis

Description and Tags

1. What is data? 2. Data Collection 3. Data Quality 4. Use Case: Aircraft Engines 5. Exploratory Data Analysis

19 Terms

Illustration of measurement errors

Errors can be classified into three types that can overlap.

What are measurement errors?

Measurement errors are samples that do not represent the real physical value. However, classifying a measurement error is difficult when the sensor’s own readings are the only information available.

What are data quality and metadata?

Data quality: Data are of high quality if they are fit for their intended use in operations, decision support, and planning.
Metadata: Structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information source.

What is maintenance?

Maintenance is the combination of all technical, administrative and managerial actions during the life cycle of an item intended to retain it in, or restore it to, a state in which it can perform the required function.

What is remaining useful lifetime?

Remaining useful lifetime (RUL) is the time between the forecast start point and the point at which the item, component, or system is expected to fail.
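
As a minimal illustration of this definition, the sketch below computes the remaining useful lifetime as the difference between an expected failure time and the forecast start point. The function name and the cycle counts are made up for illustration and do not come from the lecture.

```python
def remaining_useful_lifetime(forecast_start, predicted_failure):
    """Return the time from the forecast start point to the expected failure."""
    if predicted_failure < forecast_start:
        raise ValueError("predicted failure lies before the forecast start point")
    return predicted_failure - forecast_start


# Hypothetical engine: forecast made at cycle 120, failure expected at cycle 192.
rul = remaining_useful_lifetime(forecast_start=120, predicted_failure=192)
print(rul)  # 72
```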

Data Understanding - Purposes

Getting to know the data

What attributes make up the data?
What kind of values does each attribute have?
How are the values distributed?
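
The three questions above can be answered with a few lines of code. The following is only a sketch on a hypothetical list of engine records; the attribute names and values are invented for illustration.

```python
from collections import Counter

# Hypothetical engine records; names and values are made up for illustration.
records = [
    {"engine_id": 1, "status": "ok", "temperature": 518.7},
    {"engine_id": 2, "status": "ok", "temperature": 521.3},
    {"engine_id": 3, "status": "degraded", "temperature": 540.2},
    {"engine_id": 4, "status": "ok", "temperature": 519.9},
]

# What attributes make up the data?
attributes = list(records[0].keys())

# What kind of values does each attribute have?
value_types = {name: type(records[0][name]).__name__ for name in attributes}

# How are the values distributed? Counts for a categorical attribute,
# minimum and maximum for a numeric one.
status_counts = Counter(record["status"] for record in records)
temperatures = [record["temperature"] for record in records]
temperature_span = (min(temperatures), max(temperatures))

print(attributes, value_types, status_counts, temperature_span)
```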

Useful and indispensable step before data preprocessing and modeling

GIGO: garbage in – garbage out

Attribute Types

Nominal Attributes

Categorical data
No meaningful order
Not quantitative

Ordinal Attributes

Values have a meaningful order
Difference between successive values not known

Numeric Attributes

Quantitative values
Values have a meaningful order
Differences between values can be quantified
Values can be discrete or continuous
Interval-scaled attributes:
- Arbitrary zero-point
- Ratios and multiples not meaningful
Ratio-scaled attributes:
- Inherent zero-point
- Ratios and multiples can be quantified
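
Temperature is a common example of the difference between the two scales, although it is an assumption here and not taken from the card: Celsius is interval-scaled (arbitrary zero-point), Kelvin is ratio-scaled (inherent zero-point). The sketch below shows that differences agree on both scales while ratios are only meaningful in Kelvin.

```python
def celsius_to_kelvin(t_celsius):
    """Convert from the interval-scaled Celsius scale to the ratio-scaled Kelvin scale."""
    return t_celsius + 273.15


t1_c, t2_c = 10.0, 20.0
t1_k, t2_k = celsius_to_kelvin(t1_c), celsius_to_kelvin(t2_c)

# Differences can be quantified on both scales and agree.
diff_c = t2_c - t1_c
diff_k = t2_k - t1_k

# The Celsius ratio depends on the arbitrary zero-point and has no physical meaning.
ratio_c = t2_c / t1_c  # 2.0, but 20 °C is not "twice as hot" as 10 °C
# The Kelvin ratio is meaningful because 0 K is an inherent zero-point.
ratio_k = t2_k / t1_k  # about 1.035

print(diff_c, diff_k, ratio_c, ratio_k)
```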

Basic statistical descriptions

Central Tendency

Mean, median and mode are different measures of the center of a data distribution.

Mean: average value
- For numeric data
Median: middle value, separates the data into two equal-sized halves
- For numeric and ordinal data
Mode: value that occurs most frequently
- For numeric, ordinal and nominal data
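
The sketch below applies each measure to the attribute types it is defined for, using Python's standard `statistics` module. The sample values and the `low`/`medium`/`high` ordering are made up for illustration.

```python
import statistics

cycles = [2, 3, 3, 5, 7, 10]                  # numeric: mean, median and mode apply
grades = ["low", "medium", "medium", "high"]  # ordinal: median and mode apply
colors = ["red", "blue", "red", "green"]      # nominal: only the mode applies

mean_cycles = statistics.mean(cycles)      # 5
median_cycles = statistics.median(cycles)  # 4.0 (average of the two middle values)
mode_cycles = statistics.mode(cycles)      # 3

# For ordinal data, take the median on the rank of each value, not on the label text.
order = ["low", "medium", "high"]
ranks = [order.index(grade) for grade in grades]
median_grade = order[statistics.median_low(ranks)]  # 'medium'

mode_colors = statistics.mode(colors)  # 'red'

print(mean_cycles, median_cycles, mode_cycles, median_grade, mode_colors)
```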

Basic statistical descriptions Central Tendency

Which measure should be used?

Mean value is more sensitive to outliers
Median is a better measure for skewed data
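
A small numeric example makes the difference visible. The readings below are invented; a single outlier more than doubles the mean while the median barely moves.

```python
import statistics

readings = [10, 11, 12, 13, 14]
with_outlier = readings + [100]  # one faulty reading added

mean_clean = statistics.mean(readings)        # 12
median_clean = statistics.median(readings)    # 12
mean_dirty = statistics.mean(with_outlier)    # about 26.67
median_dirty = statistics.median(with_outlier)  # 12.5

print(mean_clean, median_clean, mean_dirty, median_dirty)
```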

Basic statistical descriptions Dispersion of the data

Range, Quantiles

Range = maximum – minimum
Quantiles: split the data into equal-size sets
- Quartiles Q1, Q2 (= median) and Q3
- Percentiles
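
The following sketch computes the range and the quartiles with `statistics.quantiles`. Note that several interpolation conventions for quantiles exist; the `inclusive` method used here is one choice and may differ slightly from the one used in the lecture.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

value_range = max(data) - min(data)  # 8

# n=4 returns the three cut points Q1, Q2 (median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")  # 3.0, 5.0, 7.0

# n=100 returns the 99 percentile cut points; index 89 is the 90th percentile.
percentiles = statistics.quantiles(data, n=100, method="inclusive")
p90 = percentiles[89]

print(value_range, q1, q2, q3, p90)
```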

Basic Statistical Descriptions Visualization of a Data Distribution

5-point descriptions

Minimum
Lower quartile Q1
Median Q2
Upper quartile Q3
Maximum
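
These five values (often called the five-number summary, and the basis of a box plot) can be collected in one helper. The function name and the sample data are hypothetical, and the quartiles again use the `inclusive` interpolation method as an assumption.

```python
import statistics


def five_point_description(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, q2, q3, max(data))


summary = five_point_description([4, 8, 15, 16, 23, 42, 50])
print(summary)  # (4, 11.5, 16.0, 32.5, 50)
```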

Definition of variance and standard deviation

Variance and standard deviation measure how much a data distribution spreads around the arithmetic mean.
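
As a sketch of what "spread around the arithmetic mean" means, the code below computes the population variance as the mean squared deviation from the mean and the standard deviation as its square root. The card does not say whether the population (divide by n) or sample (divide by n - 1) form is meant, so both are shown; the data are invented.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)  # 5.0

# Population variance: average squared deviation from the mean.
population_variance = sum((x - mean) ** 2 for x in data) / len(data)  # 4.0
population_std = math.sqrt(population_variance)                       # 2.0

# Sample variance divides by n - 1 instead of n.
sample_variance = statistics.variance(data)

print(population_variance, population_std, sample_variance)
```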

Definition of Skewness and Kurtosis

The skewness is a measure of the asymmetry of a data distribution.
The kurtosis is a measure of the tailedness of a data distribution.
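
The card gives no formulas, so the sketch below assumes the common moment-based definitions: skewness as the third central moment divided by the standard deviation cubed, and kurtosis as the fourth central moment divided by the variance squared (a normal distribution then has kurtosis 3). Function names and data are made up for illustration.

```python
def central_moment(data, k):
    """k-th central moment: mean of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Moment-based skewness; 0 for a symmetric distribution."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data):
    """Moment-based (non-excess) kurtosis; larger values mean heavier tails."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]  # long tail towards large values

print(skewness(symmetric), skewness(right_skewed), kurtosis(symmetric))
```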

EXPLORATIVE STATISTICS
Correlation, Signal Processing

Cross-correlation and Auto-correlation

Cross-correlation is a measure of similarity between a random signal 𝑥(𝑡) and a time-shifted random signal 𝑦(𝑡).

Auto-correlation is a measure of similarity between a random signal 𝑥(𝑡) and a time-shifted version of itself.
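
For sampled signals, one simple way to sketch both measures is an unnormalized sum of products at a given lag. This discrete, unnormalized form is an assumption (the card does not give a formula), and the signals are invented: `y` is `x` delayed by two samples, so the cross-correlation peaks at lag 2, while the auto-correlation of `x` peaks at lag 0.

```python
def cross_correlation(x, y, lag):
    """Sum of x[i] * y[i + lag] over all i where both samples exist."""
    return sum(
        x[i] * y[i + lag]
        for i in range(len(x))
        if 0 <= i + lag < len(y)
    )


x = [0, 1, 2, 1, 0, 0, 0]
y = [0, 0, 0, 1, 2, 1, 0]  # x delayed by two samples

lags = range(-3, 4)
best_lag = max(lags, key=lambda lag: cross_correlation(x, y, lag))  # 2

# Auto-correlation is the cross-correlation of a signal with itself.
auto = {lag: cross_correlation(x, x, lag) for lag in lags}

print(best_lag, auto)
```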

Correlation Statistical Correlation

Pearson correlation coefficient

The Pearson correlation coefficient is a statistical measure of the strength of a linear relationship between paired data.
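
A minimal sketch of the coefficient, using the standard definition (covariance divided by the product of the standard deviations). The function name and the paired data are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
r_up = pearson(x, [2, 4, 6, 8, 10])    # perfect increasing line: 1.0
r_down = pearson(x, [10, 8, 6, 4, 2])  # perfect decreasing line: -1.0
r_noisy = pearson(x, [2, 5, 5, 9, 10]) # roughly linear: close to, but below, 1

print(r_up, r_down, r_noisy)
```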

Correlation

Pearson correlation coefficient

Prerequisites:

Linearity
Attributes must be numeric or binary.
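
The sketch below illustrates both prerequisites on invented data. A perfectly dependent but non-linear relationship (y = x squared on symmetric x) yields a coefficient of 0, which shows why linearity matters; a binary attribute coded as 0/1 can be used like a numeric one.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (covariance over product of deviations)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Linearity violated: y depends fully on x, yet the coefficient is 0.
x = [-2, -1, 0, 1, 2]
y = [value ** 2 for value in x]
r_nonlinear = pearson(x, y)

# Binary attribute (0/1) paired with a numeric attribute.
group = [0, 0, 1, 1]
value = [1.0, 2.0, 3.0, 4.0]
r_binary = pearson(group, value)  # 2 / sqrt(5), about 0.894

print(r_nonlinear, r_binary)
```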

The Spearman rank correlation

The Spearman rank correlation coefficient or Spearman’s rho is a statistical measure of the monotonic relationship between two variables.
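
One common way to compute Spearman's rho, sketched here under that assumption, is to replace each value by its rank (ties get the average rank) and then apply the Pearson coefficient to the ranks. The helper names and the data are hypothetical. For a monotonic but non-linear relationship such as y = x cubed, rho is 1 while the Pearson coefficient stays below 1.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two paired sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [value ** 3 for value in x]  # monotonic, but not linear

rho = spearman(x, y)       # 1.0
r_linear = pearson(x, y)   # below 1

print(rho, r_linear)
```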