cases
the objects described by a set of data
variable
a characteristic of a case
different cases can have different _____ of the variables
values
label
a special variable used in some data sets to uniquely identify different cases
categorical variable
a type of variable that places a case into one of several groups or categories (e.g., location)
quantitative variable
a type of variable that takes numerical values for which arithmetic operations such as adding and averaging make sense (e.g., amount)
converting categorical variables into quantitative variables
letter grades (categorical) can be converted to numerical values to compute a GPA. the result is quantitative because the numbers can be averaged. be careful with such conversions, because the scale is not always evenly spaced (e.g., the difference between an A and a B may not mean the same as the difference between a D and an F).
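A minimal sketch of the conversion in R. The grades and the 4-point mapping below are assumptions for illustration, not taken from the notes; a real grading scale may differ.

```r
# Hypothetical letter grades for one student (illustrative data only)
grades <- c("A", "B", "B", "D", "F")

# Assumed 4-point scale; actual schools may map letters differently
grade_points <- c(A = 4, B = 3, C = 2, D = 1, F = 0)

# Look up each letter's numeric value by name, then average the results
points <- unname(grade_points[grades])
gpa <- mean(points)
gpa  # 2.2
```

The mapping treats every one-letter step as exactly one point, which is the equal-spacing assumption the card warns about.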
you can adjust one variable to create another, like when calculating a ____
rate
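As a rough illustration of deriving a rate from two other variables, the sketch below uses made-up counts for three towns; the variable names and numbers are hypothetical.

```r
# Hypothetical counts for three towns (illustrative data only)
accidents <- c(50, 120, 30)
vehicles  <- c(10000, 40000, 2000)

# A rate rescales the count by the size of the group it came from
accidents_per_1000 <- accidents / vehicles * 1000
accidents_per_1000  # 5 3 15
```

The town with the fewest accidents has the highest rate, which is why the adjusted variable can tell a different story than the raw count.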
vector (in R)
basic data structure, an ordered collection of elements of the same data type.
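A short sketch of what "ordered" and "same data type" mean in practice. The values are arbitrary examples.

```r
# c() combines values into a vector; every element here is numeric
amounts <- c(12.5, 8, 20.25)
class(amounts)   # "numeric"
length(amounts)  # 3

# Order is preserved, so elements can be picked by position (R counts from 1)
amounts[2]       # 8

# Mixing types forces everything to a single common type (here: character)
mixed <- c(1, "a", TRUE)
class(mixed)     # "character"
```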
what to put in comments in R after the #
the why of your actions, because this is hard or impossible to recover just from reading the code. the how and the what can be worked out, even if tediously, from the code itself. comments also make it easier to keep track of what you've done so far: explain your plan and mode of attack, and record important insights as you meet them.
<-
read as "gets". it is used to assign a value to a variable or an object
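A small sketch that combines assignment with "why" comments. The scores are hypothetical, and the assignment operator is typed as a less-than sign followed by a hyphen.

```r
# Keep the raw scores in one object so every later step uses the same data
scores <- c(72, 85, 90)   # read as: scores gets c(72, 85, 90)

# Use the median rather than the mean because one unusually low score
# would pull the mean down and misrepresent a typical result
typical <- median(scores)
typical  # 85
```

Both comments record a reason for the step; what each line does is already visible from the code.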
resistant measure
a measure whose value does not respond strongly to changes in a few observations, no matter how large the changes are, so the influence of outliers is limited. the mean is not a resistant measure of center because it cannot resist the influence of extreme observations (i.e., outliers); the median is. sometimes called a robust measure.
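A quick check of the idea with made-up numbers: replacing one observation with an extreme value moves the mean a lot and leaves the median unchanged.

```r
# Hypothetical data, then the same data with one extreme observation
x     <- c(10, 12, 13, 15, 16)
x_out <- c(10, 12, 13, 15, 160)

mean(x)        # 13.2
mean(x_out)    # 42   (dragged upward by the single outlier)

median(x)      # 13
median(x_out)  # 13   (unchanged, so the median is resistant)
```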
(n + 1) / 2
the position of the median in the ordered list of n observations. e.g., if there are 24 observations, 25/2 = 12.5, so the median is the average of the 12th and 13th observations.
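The position rule can be sketched by hand and compared with R's built-in median(). The data are hypothetical, and the floor/ceiling step is just one way to handle a position that ends in .5.

```r
# Hypothetical data with an even number of observations
x <- c(12, 3, 20, 8, 15, 7)

n   <- length(x)     # 6
pos <- (n + 1) / 2   # 3.5, so the median lies between the 3rd and 4th values

# Sort first: the position refers to the ordered observations
sorted <- sort(x)    # 3 7 8 12 15 20

# Average the observations on either side of the position;
# for odd n, floor and ceiling coincide and this picks the middle value
manual_median <- mean(sorted[c(floor(pos), ceiling(pos))])
manual_median        # 10
median(x)            # 10
```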
the five number summary
gives a quick summary of both center and spread: the smallest observation (minimum), the first quartile (Q1), the median (M), the third quartile (Q3), and the largest observation (maximum). displayed graphically by a boxplot.
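In base R, fivenum() returns these five values in order. The data below are hypothetical. Note that software uses slightly different conventions for quartiles, so fivenum(), summary(), and hand calculations can disagree a little on other data sets.

```r
# Hypothetical data (illustrative only)
x <- c(2, 4, 6, 8, 10, 12, 14)

# Minimum, Q1, median, Q3, maximum
five <- fivenum(x)
five  # 2 5 8 11 14
```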
1.5 x IQR rule for outliers
call an observation a suspected outlier if it falls more than 1.5 x IQR below the first quartile or above the third quartile.
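A sketch of the rule applied step by step. The data and the names lower_fence and upper_fence are made up for illustration, and the quartiles come from quantile() with its default method, which may differ slightly from a textbook's hand method.

```r
# Hypothetical data with one unusually large value
x <- c(2, 4, 6, 8, 10, 12, 14, 40)

q1  <- unname(quantile(x, 0.25))  # 5.5
q3  <- unname(quantile(x, 0.75))  # 12.5
iqr <- q3 - q1                    # 7

# Anything beyond these cutoffs is a suspected outlier
lower_fence <- q1 - 1.5 * iqr     # -5
upper_fence <- q3 + 1.5 * iqr     # 23

suspected <- x[x < lower_fence | x > upper_fence]
suspected  # 40
```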
modified boxplot
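a boxplot that applies the 1.5 x IQR rule: the lines (whiskers) extend only to the smallest and largest observations that are not suspected outliers, and suspected outliers are plotted as individual points.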
side by side boxplots
use two or more boxplots in the same graph to compare groups measured on the same variable.
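A minimal sketch with base R's boxplot(), using a hypothetical data frame with one measured variable and one grouping variable. By default boxplot() draws modified boxplots (its range argument defaults to 1.5), so the extreme value in group B appears as a separate point.

```r
# Hypothetical scores for two groups (illustrative data only)
df <- data.frame(
  group = rep(c("A", "B"), each = 7),
  score = c(10, 12, 13, 14, 15, 16, 18,
            20, 22, 23, 24, 25, 26, 60)
)

# The formula score ~ group draws one box per group on a shared axis;
# the statistics used for the plot are also returned
bp <- boxplot(score ~ group, data = df, ylab = "score")

bp$names  # "A" "B"
bp$out    # 60, the point flagged as a suspected outlier
```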
min() and max()
return the smallest and largest values, respectively
quantile(x, 0.25) or quantile(x, 0.75)
finds the value of x that is greater than 25% (or 75%) of the values, i.e., the first (or third) quartile
IQR()
quantile(x, 0.75) - quantile(x, 0.25)
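The summary functions above can be tried together on a small hypothetical vector. quantile() returns a named value, so unname() is used here only to keep the results plain.

```r
# Hypothetical data (illustrative only)
x <- c(2, 4, 6, 8, 10, 12, 14)

smallest <- min(x)                      # 2
largest  <- max(x)                      # 14

q1 <- unname(quantile(x, 0.25))         # 5
q3 <- unname(quantile(x, 0.75))         # 11

# IQR() gives the same result as subtracting the two quartiles
spread <- IQR(x)                        # 6
spread == q3 - q1                       # TRUE
```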
first(x), last(x), or nth(x, n)
extracts a value at a specific position
filter()
keeps only the observations (rows) that meet a condition; the result still has all variables, with each matching observation in a separate row
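A sketch of the position helpers and filter(), assuming these cards refer to the dplyr package (the notes do not name it) and that dplyr is installed. The data frame and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical data: one row per student (illustrative only)
scores <- data.frame(
  student = c("s1", "s2", "s3", "s4"),
  score   = c(72, 85, 90, 64)
)

# Position helpers return a single value from a vector
first_score  <- first(scores$score)    # 72
last_score   <- last(scores$score)     # 64
second_score <- nth(scores$score, 2)   # 85

# filter() returns whole rows: every variable is kept for each match
high <- dplyr::filter(scores, score > 80)
high  # the rows for s2 and s3, with both columns
```

Writing dplyr::filter() avoids confusion with the unrelated filter() function in base R's stats package.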