Descriptive Analytics in Information Systems and Supply Chain Management

Dispersion Measures: Indicate the spread of data points.
- Range: Difference between the maximum and minimum values in the dataset.
- Calculation: $\text{Range} = \text{Max} - \text{Min}$.
- Interquartile Range (IQR): Difference between the third quartile (Q3) and the first quartile (Q1); it spans the middle 50% of the data.
- Calculation: $\text{IQR} = Q3 - Q1$.
- Standard Deviation: Indicates how spread out data points are around the mean.
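
The three dispersion measures above can be computed in base R. The sketch below is illustrative only: the prices vector is hypothetical data, not taken from the course material. Note that base R's range() returns the minimum and maximum as a pair, so the range as defined here is their difference.

```r
# Hypothetical prices (in thousands); invented for illustration.
prices <- c(150, 220, 240, 350, 390, 410)

price_range <- max(prices) - min(prices)  # Range = Max - Min
price_iqr   <- IQR(prices)                # IQR = Q3 - Q1 (default quantile method)
price_sd    <- sd(prices)                 # sample standard deviation (n - 1 denominator)

price_range
price_iqr
price_sd
```

diff(range(prices)) gives the same range value in one call.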

General R commands for data analysis:
- nrow(): Provides total number of rows in a data frame.
- Example: nrow(real_estate) returns the number of rows in the real_estate data frame.
- ncol(): Provides total number of columns in a data frame.
- Example: ncol(real_estate) returns the number of columns in the real_estate data frame.

head(): Displays the first n observations of a data frame.
- Syntax: head(<data frame object>, n=<number of rows to show>)
tail(): Displays the last n observations in a data frame.
- Syntax: tail(<data frame object>, n=<number of rows to show>)
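
A minimal sketch of the four inspection functions above (nrow, ncol, head, tail). The real_estate data frame built here is a hypothetical stand-in with invented values; the actual course data set is assumed to have more rows and columns.

```r
# Hypothetical stand-in for the real_estate data frame.
real_estate <- data.frame(
  Location = c("Downtown", "Suburb", "Downtown", "Rural", "Suburb", "Downtown"),
  Price    = c(350, 220, 410, 150, 240, 390)
)

nrow(real_estate)                        # total number of rows
ncol(real_estate)                        # total number of columns

first_rows <- head(real_estate, n = 2)   # first 2 observations
last_rows  <- tail(real_estate, n = 2)   # last 2 observations
first_rows
last_rows
```

If n is omitted, head() and tail() show 6 rows by default.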

unique(): Returns distinct values from a specified column.
- Example: unique(real_estate$Location)
- Can be used to create a new data frame without duplicates:
- Syntax: <new data frame object name> = unique(<data frame object>)
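
Both uses of unique() can be sketched as follows. The data frame is hypothetical and deliberately contains one fully duplicated row; the name real_estate_dedup is an illustrative choice.

```r
# Hypothetical data; row 3 repeats row 1 in every column.
real_estate <- data.frame(
  Location = c("Downtown", "Suburb", "Downtown", "Suburb"),
  Price    = c(350, 220, 350, 240)
)

# Distinct values of one column, in order of first appearance.
locations <- unique(real_estate$Location)

# New data frame without rows that duplicate an earlier row in all columns.
real_estate_dedup <- unique(real_estate)

locations
real_estate_dedup
```

A row is dropped only when every column matches an earlier row, so row 4 (same location, different price) is kept.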

summary(): Returns essential summary statistics:
- Outputs include Minimum (Min.), First Quartile (1st Qu.), Median, Mean, Third Quartile (3rd Qu.), and Maximum (Max.).
- Excluding the mean, these values form the 5-number summary, useful for boxplots and skewness analysis.
- Syntax: summary(<data frame object>)
To get a summary for a specific column:
- Syntax: summary(<data frame name>$<column name>)
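
A short sketch of summary() on a single column, using hypothetical data. The call to quantile() is an addition for comparison, not part of the notes above: with default settings it returns the minimum, the three quartiles, and the maximum, without the mean.

```r
# Hypothetical stand-in for the real_estate data frame.
real_estate <- data.frame(
  Location = c("Downtown", "Suburb", "Downtown", "Rural", "Suburb", "Downtown"),
  Price    = c(350, 220, 410, 150, 240, 390)
)

price_summary <- summary(real_estate$Price)
price_summary
names(price_summary)   # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."

# Minimum, Q1, median, Q3, maximum (labelled 0%, 25%, 50%, 75%, 100%).
five_number <- quantile(real_estate$Price)
five_number
```

Comparing the mean with the median in this output is a quick first check for skewness.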

by(): Creates summary statistics grouped by a specific column.
- Syntax: by(<data frame object>, <data frame object>$<column>, summary)
- Example: by(real_estate, real_estate$Location, summary) gives summaries for each location.
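
The grouped summary can be tried on a small hypothetical data frame. The Bedrooms column and all values are assumptions made for illustration. The second call, which restricts by() to the numerical columns, is a suggestion for more compact output rather than something the notes prescribe.

```r
# Hypothetical stand-in for the real_estate data frame.
real_estate <- data.frame(
  Location = c("Downtown", "Suburb", "Downtown", "Rural", "Suburb", "Downtown"),
  Price    = c(350, 220, 410, 150, 240, 390),
  Bedrooms = c(2, 3, 3, 4, 3, 1)
)

# One summary() table per distinct Location value.
by_location <- by(real_estate, real_estate$Location, summary)
by_location

# Same grouping, summarising only the numerical columns.
by_numeric <- by(real_estate[, c("Price", "Bedrooms")], real_estate$Location, summary)
by_numeric
```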

table(): Returns the frequency distribution of a column, i.e. how often each distinct value occurs.
- Syntax: table(<data frame object>$<column>)
- Example: table(real_estate$Location) counts the rows for each location.
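
A minimal frequency-table sketch on hypothetical data. prop.table() is an addition not mentioned in the notes; it converts the counts into relative frequencies.

```r
# Hypothetical stand-in for the real_estate data frame.
real_estate <- data.frame(
  Location = c("Downtown", "Suburb", "Downtown", "Rural", "Suburb", "Downtown")
)

location_counts <- table(real_estate$Location)   # absolute frequencies
location_counts

location_shares <- prop.table(location_counts)   # relative frequencies (sum to 1)
location_shares
```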

tapply(): Tabulates statistics based on categories from another column.
- Syntax: tapply(<data frame object>$<numerical column>, <data frame object>$<categorical column>, <statistical measure>)
- Example: tapply(real_estate$Price, real_estate$Location, mean) averages prices by location.
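
The tapply() example can be run on a small hypothetical data frame as follows; the values are invented and the prices are assumed to be in thousands.

```r
# Hypothetical stand-in for the real_estate data frame.
real_estate <- data.frame(
  Location = c("Downtown", "Suburb", "Downtown", "Rural", "Suburb", "Downtown"),
  Price    = c(350, 220, 410, 150, 240, 390)
)

# Mean price within each location.
mean_price <- tapply(real_estate$Price, real_estate$Location, mean)
mean_price

# Any function of a numeric vector can replace mean, e.g. median or sd.
median_price <- tapply(real_estate$Price, real_estate$Location, median)
median_price
```

A group with a single observation (Rural here) returns that value as its mean and median; sd would return NA for it.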

Example: Modify an entry dynamically:
- real_estate[7,3] = mean(real_estate$PriceUSD)
- This substitutes the value in row 7, column 3 with the calculated mean of 'PriceUSD'.
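
One common reason for such a substitution is filling in a missing value. The sketch below assumes that case: the data frame is hypothetical, its third column is PriceUSD, and row 7 is NA. Because mean() returns NA when any value is missing, na.rm = TRUE is added here; that argument is an assumption about the data, not part of the original command.

```r
# Hypothetical data frame; column 3 is PriceUSD and row 7 is missing.
real_estate <- data.frame(
  ID       = 1:7,
  Location = c("Downtown", "Suburb", "Downtown", "Rural", "Suburb", "Downtown", "Rural"),
  PriceUSD = c(350000, 220000, 410000, 150000, 240000, 390000, NA)
)

# Replace row 7, column 3 with the mean of the non-missing prices.
real_estate[7, 3] <- mean(real_estate$PriceUSD, na.rm = TRUE)

real_estate[7, ]
```

Indexing by name, real_estate[7, "PriceUSD"], does the same and does not break if the column order changes.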

Barplot: Represents frequencies of a categorical variable.
- Syntax: barplot(table(<data frame object>$<categorical column>), col ="<insert color name>", main="<title of plot>", xlab = "<x-axis label>", ylab= "<y-axis label>")
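
A filled-in version of the barplot syntax, using hypothetical data; the color, title, and axis labels are illustrative choices.

```r
# Hypothetical stand-in for the real_estate data frame.
real_estate <- data.frame(
  Location = c("Downtown", "Suburb", "Downtown", "Rural", "Suburb", "Downtown")
)

# table() supplies the frequencies; barplot() draws one bar per category.
bar_midpoints <- barplot(
  table(real_estate$Location),
  col  = "steelblue",
  main = "Listings by Location",
  xlab = "Location",
  ylab = "Number of listings"
)
```

barplot() also returns the x-positions of the bar centers, which is why the result can be stored in a variable.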

Histogram: Displays frequency of numerical variables within range classes.
- Classes on the x-axis, frequencies on the y-axis.
- Syntax: hist(<data frame object>$<numerical column>, col = "<insert color name>", main="<title of plot>", xlab = "<x-axis label>", ylab= "<y-axis label>")
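
A filled-in version of the histogram syntax on hypothetical prices (assumed to be in thousands). The returned object is stored so that the class boundaries and frequencies R chose can be inspected.

```r
# Hypothetical stand-in for the real_estate data frame.
real_estate <- data.frame(
  Price = c(350, 220, 410, 150, 240, 390)
)

price_hist <- hist(
  real_estate$Price,
  col  = "lightgreen",
  main = "Distribution of Prices",
  xlab = "Price (thousands)",
  ylab = "Frequency"
)

price_hist$breaks   # class boundaries on the x-axis
price_hist$counts   # frequency in each class
```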

Boxplot: Visual representation based on the 5-number summary.
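- Displays the quartiles (including the median) as a box, whiskers reaching to the smallest and largest values that are not outliers, and outliers as individual points.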
- Syntax for one numerical variable:
- boxplot(<data frame object>$<numerical column>, col ="<insert color name>", main="<title of plot>", xlab = "<x-axis label>", ylab= "<y-axis label>")
- Syntax for categorical group comparison:
- boxplot(<data frame object>$<numerical column> ~ <data frame object>$<categorical column>, col ="<insert color name>", main="<title of plot>", xlab = "<x-axis label>", ylab= "<y-axis label>", horizontal = TRUE)
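
Both boxplot forms, filled in with hypothetical data. The horizontal = TRUE argument from the syntax above is left out in this sketch, so the boxes are drawn vertically.

```r
# Hypothetical stand-in for the real_estate data frame.
real_estate <- data.frame(
  Location = c("Downtown", "Suburb", "Downtown", "Rural", "Suburb", "Downtown"),
  Price    = c(350, 220, 410, 150, 240, 390)
)

# One numerical variable.
boxplot(real_estate$Price, col = "orange",
        main = "Prices", ylab = "Price (thousands)")

# One box per category of Location.
price_box <- boxplot(real_estate$Price ~ real_estate$Location, col = "orange",
                     main = "Price by Location",
                     xlab = "Location", ylab = "Price (thousands)")

price_box$stats   # one column per group: lower whisker, lower hinge, median, upper hinge, upper whisker
price_box$out     # values drawn as outlier points (possibly empty)
```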

Quartiles: Divide data set into four equal parts.
IQR: $\text{IQR} = Q3 - Q1$
Outliers: Values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR).
- Extreme outliers: values below Q1 - 3(IQR) or above Q3 + 3(IQR).
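
The outlier rules above can be applied directly. This is a sketch on a hypothetical vector with one unusually large value; quartiles come from quantile() with its default method, which is one of several conventions and may differ slightly from hand calculations.

```r
# Hypothetical prices (in thousands) with one unusually large value.
prices <- c(150, 220, 240, 350, 390, 410, 1200)

q1  <- unname(quantile(prices, 0.25))
q3  <- unname(quantile(prices, 0.75))
iqr <- q3 - q1

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Outliers: beyond 1.5 * IQR from the quartiles.
outliers <- prices[prices < lower_fence | prices > upper_fence]

# Extreme outliers: beyond 3 * IQR from the quartiles.
extreme_outliers <- prices[prices < q1 - 3 * iqr | prices > q3 + 3 * iqr]

outliers
extreme_outliers
```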

Scatterplot: Visualizes the relationship between two numerical variables.
- Hypothesized independent variable on x-axis, dependent variable on y-axis.
- Correlation does not imply causation.
- Note: plot(<data frame object>$<numerical column>) with a single column plots the values against their row index; a scatterplot requires two columns, the x-axis column first and the y-axis column second.

Positive correlation: Dots trend upwards.
- Higher independent values correspond with higher dependent values.
Negative correlation: Dots trend downwards.
- Higher independent values correspond with lower dependent values.
No relationship: Dots display no clear trend.

Syntax: plot(<data frame object>$<x-axis numerical column>, <data frame object>$<y-axis numerical column>, col ="<insert color name>", main="<title of plot>", xlab = "<x-axis label>", ylab = "<y-axis label>")
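
A filled-in scatterplot on hypothetical data. The SqFt column is an assumed independent variable, invented for illustration. The cor() call is an addition to the notes; it puts a number on the direction of the trend (positive values for an upward trend, negative for a downward one).

```r
# Hypothetical data: floor area as the independent variable, price as the dependent one.
real_estate <- data.frame(
  SqFt  = c(800, 1200, 1500, 2000, 2300, 2600),
  Price = c(150, 220, 240, 350, 390, 410)
)

# x-axis column first, y-axis column second.
plot(real_estate$SqFt, real_estate$Price,
     col  = "blue",
     main = "Price vs. Floor Area",
     xlab = "Floor area (sq ft)",
     ylab = "Price (thousands)")

# Correlation coefficient between the two columns (ranges from -1 to 1).
price_sqft_cor <- cor(real_estate$SqFt, real_estate$Price)
price_sqft_cor
```

A coefficient near 1 here reflects only that the invented points trend upward together; it says nothing about causation.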