1/19
This set of flashcards covers key concepts and commands related to descriptive analytics, focusing on measures of central tendency, dispersion, and visualization in R.
Name | Mastery | Learn | Test | Matching | Spaced | Call with Kai |
|---|
No study sessions yet.
What is the purpose of descriptive statistics?
To describe, summarize, and visualize data.
What are the three measures of central tendency?
Mean, Median, Mode.
How is the Mean calculated?
By dividing the sum of the values by the number of observations.
What is the Median?
The middle value in a sorted dataset.
What does the Mode represent?
The most frequently occurring value in the dataset.
How is the Range calculated?
As the difference between the maximum and minimum values.
What is the Interquartile Range?
The difference between the first (Q1) and third (Q3) quartiles.
What does standard deviation measure?
The variability of data points from each other.
What is frequency distribution?
It describes how often values occur in a dataset.
What does Skewness indicate?
The asymmetry in the distribution of data.
How to create a histogram in R?
Use hist(dataframe$column, main="title", xlab="x-axis label", ylab="y-axis label").
What is a boxplot based on?
The 5 Number Summary (Min, Q1, Median, Q3, Max).
What do outliers represent in a boxplot?
Values that fall below Q1-1.5(IQR) or above Q3+1.5(IQR).
What command in R displays the first n observations of a dataframe?
head(dataframe, n).
What function returns the distinct values of a column in R?
unique(dataframe$column).
What is the full summary function in R used for?
To provide a summary statistics including Min, Q1, Median, Mean, Q3, Max.
How do you check for missing values in calculating mean in R?
Use mean(dataframe$column, na.rm=TRUE) to ignore NA values.
What is tapply used for in R?
To apply a function to subsets of a vector, categorized by another vector.
How do you visualize the relationship between two numerical variables?
By using a scatterplot.
What is a scatterplot meant to illustrate?
The potential relationship between two numerical variables.