ML

Chapter 1 Definitions

Individual: An object described in a set of data. Individuals can be people, animals, or things.

Variable: An attribute that can take different values for different individuals.

Categorical variable: Assigns labels that place each individual into a particular group, called a category.

Quantitative variable: Takes number values that are quantities (counts or measurements).

Univariate data: Data in which only one measurement is taken on each object.

Bivariate data: Data in which two measurements are taken on each object.

Discrete variable: A quantitative variable that takes a fixed set of possible values with gaps between them. Discrete variables result from counting something.

Continuous variable: A quantitative variable that can take any value in an interval on the number line. Continuous variables result from measuring something.

Distribution: This tells us what values the variable takes and how often it takes those values

Frequency table: Shows the number of individuals having each value. The frequency of a value is the number of times that value occurs.

Relative frequency table: Shows the proportion or percent of individuals having each value. The relative frequency of a value is the ratio of the frequency to the total number of observations.
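As a minimal sketch of the two tables above, the following Python snippet counts each value and divides by the total number of observations. The data set (`pets`) is made up purely for illustration, and the variable names are not from these notes.

```python
from collections import Counter

# Hypothetical data: favorite pet reported by 10 individuals (invented for illustration).
pets = ["dog", "cat", "dog", "fish", "cat", "dog", "dog", "bird", "cat", "dog"]

# Frequency: the number of times each value occurs.
frequency = Counter(pets)

# Relative frequency: frequency divided by the total number of observations.
total = len(pets)
relative_frequency = {value: count / total for value, count in frequency.items()}

# Print one row per category: value, frequency, relative frequency.
for value in sorted(frequency):
    print(f"{value:5s} {frequency[value]:3d} {relative_frequency[value]:.2f}")
```

The relative frequencies of all categories add up to 1 (or 100%), which is a quick way to check a relative frequency table.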

Cumulative Frequency: It gives the number of observations less than or equal to a specified value.
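A small sketch of how cumulative frequencies can be computed: sort the distinct values, then keep a running total of their frequencies. The `siblings` data is hypothetical and only meant to illustrate the definition.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical data: number of siblings reported by 8 individuals.
siblings = [0, 1, 1, 2, 2, 2, 3, 5]

frequency = Counter(siblings)

# Sort the distinct values so the running total moves from smallest to largest.
values = sorted(frequency)
counts = [frequency[v] for v in values]

# Cumulative frequency: number of observations less than or equal to each value.
cumulative = dict(zip(values, accumulate(counts)))

print(cumulative)  # {0: 1, 1: 3, 2: 6, 3: 7, 5: 8}
```

The last cumulative frequency always equals the total number of observations (8 here).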

Frequency Distribution Table: A table giving all possible values of a variable and their frequencies.

Bar Graph: It shows each category as a bar. The heights of the bars show the category frequencies or relative frequencies. Bar graphs display categorical data, and the bars are separated by spaces.

Pie chart: It shows each category as a slice of the pie. The areas of the slices are proportional to the category frequencies or relative frequencies. Like bar graphs, pie charts display the frequencies of categories in a set of data.

Two-way table: It is a table of counts that summarizes data on the relationship between two categorical variables for some group of individuals.

Marginal Relative Frequency: It gives the percent or proportion of individuals that have a specific value for one categorical variable.

Joint Relative Frequency: It gives the percent or proportion of individuals that have a specific value for one categorical variable and a specific value for another categorical variable.

Conditional Relative Frequency: It gives the percent or proportion of individuals that have a specific value for a categorical variable among individuals who share the same value of another categorical variable.
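The three relative frequencies differ only in what is counted and what it is divided by. The sketch below works through one example of each from a two-way table of counts. The `records` data (grade level and preferred snack for 10 students) is invented for illustration.

```python
from collections import Counter

# Hypothetical data: (grade level, preferred snack) for 10 students.
records = (
    [("junior", "fruit")] * 2
    + [("junior", "chips")] * 3
    + [("senior", "fruit")] * 4
    + [("senior", "chips")] * 1
)

total = len(records)

# Two-way table: count of individuals in each (grade, snack) cell.
two_way = Counter(records)

# Row totals for the grade variable alone.
grade_totals = Counter(grade for grade, _ in records)

# Marginal: seniors out of ALL individuals (one variable only).
marginal_senior = grade_totals["senior"] / total

# Joint: seniors who prefer fruit out of ALL individuals (both variables).
joint_senior_fruit = two_way[("senior", "fruit")] / total

# Conditional: fruit preference AMONG seniors only (divide by the senior total).
conditional_fruit_given_senior = two_way[("senior", "fruit")] / grade_totals["senior"]

print(marginal_senior, joint_senior_fruit, conditional_fruit_given_senior)
```

Note the denominators: marginal and joint relative frequencies divide by the table total, while a conditional relative frequency divides by the total of the group being conditioned on.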

Side by Side Bar Graph: It displays the distribution of a categorical variable for each value of another categorical variable. The bars are grouped based on the values of one of the categorical variables and placed side by side.

Segmented Bar Graph: It displays the distribution of a categorical variable as segments of a rectangle, with the area of each segment proportional to the percent of individuals in the corresponding category.

Mosaic Plot: A modified segmented bar graph in which the width of each rectangle is proportional to the number of individuals in the corresponding category.

Association: There is an association between two variables if knowing the value of one variable helps us predict the value of the other variable. If knowing the value of one variable does not help us predict the value of the other, then there is no association between the variables.
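One informal way to look for an association between two categorical variables is to compare the conditional relative frequencies of one variable across the values of the other. The sketch below does this for made-up data; the helper `conditional_distribution` and the `records` list are hypothetical names introduced for this example. This is a descriptive comparison only, not a formal statistical test.

```python
from collections import Counter

def conditional_distribution(records, given_value):
    """Relative frequencies of the second variable among records
    whose first variable equals given_value."""
    subset = [second for first, second in records if first == given_value]
    counts = Counter(subset)
    return {value: count / len(subset) for value, count in counts.items()}

# Hypothetical data: (grade level, preferred snack) for 10 students.
records = (
    [("junior", "fruit")] * 2
    + [("junior", "chips")] * 3
    + [("senior", "fruit")] * 4
    + [("senior", "chips")] * 1
)

juniors = conditional_distribution(records, "junior")
seniors = conditional_distribution(records, "senior")

print("juniors:", juniors)
print("seniors:", seniors)
```

In this invented sample, 40% of juniors prefer fruit compared with 80% of seniors, so knowing a student's grade level helps predict the snack preference. That difference suggests an association in these data. If the two conditional distributions were the same, knowing the grade level would not help, and there would be no association.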