L03 - Exploratory Data Analysis


Description and Tags

1. What is data?
2. Data Collection
3. Data Quality
4. Use Case: Aircraft Engines
5. Exploratory Data Analysis


19 Terms

1

Illustration of measurement errors

Errors can be classified into three types that can overlap.

What are measurement errors?

Measurement errors are samples that do not represent the real physical value. However, when only the sensor’s readings are available, it is difficult to classify measurement errors.

2

What are data quality and metadata?

  • Data quality: Data are of high quality if they are fit for their intended uses in operations, decision support, and planning.

  • Metadata: Structured information that describes, explains, or locates an information source, or otherwise makes it easier to retrieve, use, or manage.

3

What is maintenance?

Maintenance is the combination of all technical, administrative and managerial actions during the life cycle of an item intended to retain it in, or restore it to, a state in which it can perform the required function.

4

What is remaining useful lifetime (RUL)?

Remaining useful lifetime (RUL) is the time between the start of the forecast and the time at which the item/component/system is expected to fail.
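As an illustration, the sketch below derives RUL values from run-to-failure records, such as those in the aircraft-engine use case. It assumes that each record is a hypothetical `(unit_id, cycle)` pair and that the last recorded cycle of each unit is its failure time. The function name `rul_labels` is also illustrative and does not come from any library.

```python
from collections import defaultdict


def rul_labels(records):
    """Return the RUL of every (unit_id, cycle) record.

    Assumption: run-to-failure data, i.e. the last recorded cycle
    of each unit is taken as its failure time.
    """
    # First pass: find the failure (last) cycle of every unit
    failure_cycle = defaultdict(int)
    for unit, cycle in records:
        failure_cycle[unit] = max(failure_cycle[unit], cycle)
    # Second pass: RUL = failure time - current time
    return [failure_cycle[unit] - cycle for unit, cycle in records]


# Hypothetical data: unit 1 runs for 3 cycles, unit 2 for 2 cycles
records = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
print(rul_labels(records))  # [2, 1, 0, 1, 0]
```

With real sensor data, the failure time of in-service units is unknown. It then has to be forecast rather than read from the records.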

5

Data Understanding - Purposes

Getting to know the data

  • What attributes make up the data?

  • What kind of values does each attribute have?

  • How are the values distributed?

A useful and indispensable step before data preprocessing and modeling

  • GIGO: garbage in – garbage out
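A minimal way to answer these three questions without third-party libraries is to tally the values of every attribute. The sketch below assumes the data is a list of dictionaries with identical keys. The helper `profile` and the engine records are hypothetical toy examples.

```python
from collections import Counter


def profile(rows):
    """Map each attribute name to a Counter of its values (value -> frequency)."""
    summary = {}
    for attr in rows[0]:  # which attributes? (names taken from the first row)
        summary[attr] = Counter(row[attr] for row in rows)
    return summary


# Hypothetical toy records
rows = [
    {"engine": "A", "status": "ok", "temp": 512.3},
    {"engine": "A", "status": "ok", "temp": 515.0},
    {"engine": "B", "status": "worn", "temp": 530.1},
]
# Which values does each attribute take, and how often?
for attr, counts in profile(rows).items():
    print(attr, dict(counts))
# engine {'A': 2, 'B': 1}
# status {'ok': 2, 'worn': 1}
# temp {512.3: 1, 515.0: 1, 530.1: 1}
```

Raw counts work well for nominal and ordinal attributes. For numeric attributes with many distinct values, summary statistics or a histogram usually say more about the distribution.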

6

Attribute Types

7

Nominal Attributes

  • Categorical data

  • No meaningful order

  • Not quantitative

8

Ordinal Attributes

  • Values have a meaningful order

  • Differences between successive values are not known

9

Numeric Attributes

  • Quantitative values

  • Values have a meaningful order

  • Differences between values can be quantified

  • Values can be discrete or continuous

  • Interval-scaled attributes:

    • Arbitrary zero point

    • Ratios and multiples are not meaningful

  • Ratio-scaled attributes:

    • Inherent zero point

    • Ratios and multiples are meaningful
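Temperature is a common illustration of the difference. Degrees Celsius are interval-scaled because 0 °C is an arbitrary reference point. Kelvin is ratio-scaled because 0 K is absolute zero. The sketch below shows that the ratio 20 °C / 10 °C = 2 has no physical meaning. The helper `celsius_to_kelvin` is illustrative.

```python
def celsius_to_kelvin(c):
    # Kelvin has an inherent zero point (absolute zero); Celsius does not
    return c + 273.15


a, b = 10.0, 20.0  # two temperatures in °C

print(b - a)  # 10.0: differences are meaningful on an interval scale
print(b / a)  # 2.0: but "twice as hot" is an artifact of where 0 °C lies

# On the ratio-scaled Kelvin scale the same two temperatures give:
print(celsius_to_kelvin(b) / celsius_to_kelvin(a))  # about 1.035
```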

10

Basic statistical descriptions

Central Tendency

Mean, median and mode are different measures of the center of a data distribution.

  • Mean: average value

    • For numeric data

  • Median: middle value that separates the data into two equal-sized halves

    • For numeric and ordinal data

  • Mode: value that occurs most frequently

    • For numeric, ordinal and nominal data
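The sketch below uses Python's standard `statistics` module to apply each measure to the attribute types it suits. The data values and category names are made up for illustration. In the ordinal example, categories are mapped to their rank, and `median_low` is used so that the result is always an existing category rather than an average of two ranks.

```python
import statistics

# Numeric attribute: mean, median and mode are all defined
temps = [21.0, 22.5, 22.5, 23.0, 30.0]
print(statistics.mean(temps))    # 23.8
print(statistics.median(temps))  # 22.5
print(statistics.mode(temps))    # 22.5

# Ordinal attribute: use the order via ranks, but never average categories
levels = ["low", "medium", "high"]            # ordered categories
wear = ["low", "high", "medium", "medium", "low"]
ranks = [levels.index(w) for w in wear]       # low=0, medium=1, high=2
print(levels[statistics.median_low(ranks)])   # medium

# Nominal attribute: only the mode makes sense
colors = ["red", "blue", "red", "green"]
print(statistics.mode(colors))  # red
```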

11

Basic statistical descriptions: Central Tendency

Which measure should be used?

  • The mean is more sensitive to outliers than the median

  • The median is a better measure for skewed data
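A single faulty sensor spike is enough to show the difference. In this sketch with made-up readings, the outlier drags the mean far from the typical values, while the median barely moves.

```python
import statistics

# Made-up sensor readings; the last one is a faulty spike
readings = [10, 11, 12, 11, 10, 95]
print(statistics.mean(readings))    # about 24.83: pulled up by the spike
print(statistics.median(readings))  # 11.0: unaffected

clean = readings[:-1]               # the same readings without the spike
print(statistics.mean(clean), statistics.median(clean))  # 10.8 11
```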

12

Basic statistical descriptions: Dispersion of the data

Range, Quantiles

  • Range = maximum – minimum

  • Quantiles: points that split the sorted data into equal-size subsets

    • Quartiles Q1, Q2 (= median) and Q3

    • Percentiles
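With Python's standard library, the range and quartiles can be computed as follows (toy data). Note that `statistics.quantiles` uses the "exclusive" interpolation method by default. Other tools may interpolate differently and return slightly different cut points for the same data.

```python
import statistics

data = [2, 4, 4, 5, 7, 8, 9, 11]

# Range: spread between the extreme values
data_range = max(data) - min(data)

# Quartiles: n=4 returns the 3 cut points Q1, Q2, Q3
# (n=100 would return the 99 percentile cut points instead)
q1, q2, q3 = statistics.quantiles(data, n=4)

print(data_range)                     # 9
print(q1, q2, q3)                     # 4.0 6.0 8.75
print(q2 == statistics.median(data))  # True: Q2 is the median
```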

13

Basic Statistical Descriptions: Visualization of a Data Distribution

Five-number summary

  • Minimum

  • Lower quartile Q1

  • Median Q2

  • Upper quartile Q3

  • Maximum
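The five values can be collected in one helper, and a boxplot is drawn from exactly these numbers. This is a sketch: `five_number_summary` is a hypothetical name, and the quartiles come from `statistics.quantiles` with its default method.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) of the values."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


print(five_number_summary([1, 3, 5, 7, 9, 11, 13]))  # (1, 3.0, 7.0, 11.0, 13)
```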

14

Definition of variance and standard deviation

Variance and standard deviation measure how much a data distribution spreads around the arithmetic mean.
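For N values x_1, ..., x_N with arithmetic mean x̄, the population variance is σ² = (1/N) Σ (x_i − x̄)², and the standard deviation σ is its square root. The sketch below computes both directly and cross-checks them against the standard library. The sample variance (`statistics.variance`) divides by N − 1 instead of N. The helper `population_variance` is illustrative.

```python
import math
import statistics


def population_variance(values):
    # sigma^2 = (1/N) * sum over i of (x_i - mean)^2
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]
var = population_variance(data)  # mean is 5, squared deviations sum to 32
std = math.sqrt(var)             # standard deviation = sqrt(variance)
print(var, std)                  # 4.0 2.0

# Cross-check with the standard library
print(statistics.pvariance(data) == var)  # True (divides by N)
print(statistics.variance(data))          # about 4.571 (divides by N - 1)
```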

15

Definition of Skewness and Kurtosis

  • Skewness is a measure of the asymmetry of a data distribution.

  • Kurtosis is a measure of the tailedness of a data distribution.
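One common way to compute both is through standardized central moments: skewness = m₃ / m₂^(3/2) and kurtosis = m₄ / m₂², where m_k is the k-th central moment. The sketch below uses these population (uncorrected) formulas. Libraries may instead report excess kurtosis (kurtosis − 3) or apply small-sample bias corrections, so their numbers can differ. Function names are illustrative.

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean)^k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


def skewness(values):
    # Third standardized moment: 0 for symmetric data,
    # > 0 for a long right tail, < 0 for a long left tail
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    # Fourth standardized moment: 3 for a normal distribution;
    # larger values indicate heavier tails
    return central_moment(values, 4) / central_moment(values, 2) ** 2


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]
print(skewness(symmetric))     # 0.0
print(skewness(right_tailed))  # about 1.46
print(kurtosis(symmetric))     # about 1.7 (flat, light-tailed)
```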

16

EXPLORATIVE STATISTICS
Correlation, Signal Processing

Cross-correlation and Auto-correlation

Cross-correlation is a measure of similarity between a random signal 𝑥(𝑡) and a time-shifted version of another random signal 𝑦(𝑡).

Auto-correlation is a measure of similarity between a random signal 𝑥(𝑡) and a time-shifted version of itself.
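For finite, equally sampled discrete signals, both quantities can be estimated as sketched below. The sketch makes two assumptions:

  • Lags are non-negative integers.
  • The result is normalized by the full-length sums of squares. This is one common convention; others divide by the number of overlapping samples.

The function names are illustrative. The example delays a signal by two samples and recovers that delay as the lag with the highest cross-correlation.

```python
import math


def cross_correlation(x, y, lag):
    """Normalized sample cross-correlation between x(t) and y(t + lag).

    Assumes equally long, equally sampled signals and 0 <= lag < len(x).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Normalize by the full-length sums of squares (one common convention)
    norm = math.sqrt(sum((v - mx) ** 2 for v in x) * sum((v - my) ** 2 for v in y))
    # Only the n - lag overlapping samples contribute
    num = sum((x[t] - mx) * (y[t + lag] - my) for t in range(n - lag))
    return num / norm


def auto_correlation(x, lag):
    # Auto-correlation: a signal compared with a shifted copy of itself
    return cross_correlation(x, x, lag)


x = [0, 1, 0, 0, 3, 0, 0, 1, 0, 0]
y = [0, 0] + x[:-2]  # y is x delayed by two samples

best_lag = max(range(5), key=lambda k: cross_correlation(x, y, k))
print(best_lag)                # 2
print(auto_correlation(x, 0))  # about 1.0: every signal matches itself at lag 0
```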

17

Correlation: Statistical Correlation

Pearson correlation coefficient

The Pearson correlation coefficient is a statistical measure of the strength of a linear relationship between paired data.
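The coefficient is the covariance of the paired values divided by the product of their standard deviations:

r = Σ(x_i − x̄)(y_i − ȳ) / √(Σ(x_i − x̄)² · Σ(y_i − ȳ)²)

It ranges from −1 to +1. A direct, dependency-free sketch follows; the function name `pearson` is illustrative. The last example shows why linearity matters: a perfect but U-shaped relationship yields r = 0.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of paired numeric data."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
print(pearson(x, [2 * v + 1 for v in x]))  # about 1.0: perfect positive linear relation
print(pearson(x, [10 - v for v in x]))     # about -1.0: perfect negative linear relation
print(pearson(x, [4, 1, 0, 1, 4]))         # 0.0: exact but non-linear (U-shaped) relation
```

Since Python 3.10, `statistics.correlation(x, y)` computes the same coefficient.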

18

Correlation

Pearson correlation coefficient

Prerequisites:

  • Linearity

  • Attributes must be numeric or binary.

19

The Spearman rank correlation

The Spearman rank correlation coefficient or Spearman’s rho is a statistical measure of the monotonic relationship between two variables.
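Spearman's rho can be computed as the Pearson correlation of the ranks, with tied values receiving the average of their ranks. The sketch below shows a relationship that is monotonic but not linear: Spearman's rho is 1, while Pearson's r is below 1. Helper names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]]
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of paired numeric data."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    # Spearman's rho = Pearson correlation coefficient of the ranks
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic, but not linear
print(spearman(x, y))    # about 1.0: perfectly monotonic
print(pearson(x, y))     # about 0.943: the linear fit is imperfect
print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```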