Exploring Data

49 Terms

1

descriptive methods

the different methods for organizing and summarizing collected data, including tabular methods, graphical methods, and numerical methods

2

categorical/qualitative variables

variables that place the individuals being studied into one of several groups or categories

3

numerical/quantitative variables

variables that have outcomes that can be analysed using arithmetic operations

4

univariate data

data with only one measurement on each object

5

bivariate data

data with two measurements on each object

6

frequency

the number of times that an observation occurs, usually denoted with f

7

relative frequency

the ratio of the frequency (f) to the total number of observations (n), usually denoted by rf, and rf = f/n
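
A minimal Python sketch of the frequency and relative frequency cards, assuming the observations are held in a plain list. The colour data and the function name `frequencies` are hypothetical; `collections.Counter` supplies each frequency f, and dividing by n gives rf = f/n.

```python
from collections import Counter

def frequencies(observations):
    """Map each distinct value to (f, rf), where rf = f / n."""
    n = len(observations)
    counts = Counter(observations)
    return {value: (f, f / n) for value, f in counts.items()}

# Hypothetical sample: favourite colour of 8 students
colours = ["red", "blue", "red", "green", "blue", "red", "blue", "red"]
for value, (f, rf) in frequencies(colours).items():
    print(value, f, rf)
```

The relative frequencies always sum to 1 (up to floating-point rounding), which is a quick check on a hand-built table.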

8

cumulative frequency

the number of observations less than or equal to a specified value, usually denoted by cf

9

frequency distribution table

a table giving all possible values of a variable and their frequencies
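
The sketch below builds a frequency distribution table with the f, rf, and cf columns from the cards above. It is one possible layout, not a prescribed format; the function name `frequency_table` and the sibling counts are hypothetical. It lists only values that actually occur, so values with zero frequency would have to be added separately.

```python
from collections import Counter

def frequency_table(observations):
    """Return rows (value, f, rf, cf) sorted by value.

    cf counts the observations less than or equal to each value, so the
    values must be orderable (numerical data, for example).
    """
    n = len(observations)
    counts = Counter(observations)
    rows = []
    cf = 0
    for value in sorted(counts):
        f = counts[value]
        cf += f  # running total of observations <= value
        rows.append((value, f, f / n, cf))
    return rows

# Hypothetical data: number of siblings reported by 10 students
siblings = [0, 1, 1, 2, 1, 3, 0, 2, 1, 4]
print(f"{'x':>3} {'f':>3} {'rf':>5} {'cf':>3}")
for x, f, rf, cf in frequency_table(siblings):
    print(f"{x:>3} {f:>3} {rf:>5.2f} {cf:>3}")
```

The last cumulative frequency always equals n, the total number of observations.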

10

center of a distribution

the “typical” or central data point, measured in several ways, including mean, median, and mode

11

spread of a distribution

how far the data points are spread out around the center, measured by the range, interquartile range, standard deviation, or variance

12

shape of a distribution

the overall pattern of where most of the data fall, for example symmetric or skewed

13

symmetric distribution

when the left half of the distribution is approximately a mirror image of the right half, meaning that the data is spread out in the same way on both sides, with the same amount of data on both sides of the center

14

skewed distribution

when extreme values lie in only one direction, causing one side to have a longer tail; the distribution is right-skewed if the long tail is on the right and left-skewed if it is on the left

15

outliers

observations that are surprisingly different from the rest of the data

16

stem in stemplot

the leading part of each observation (usually all digits except the last)

17

leaf in stemplot

the remaining part of each observation after the stem is removed (usually the last digit)
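
A minimal stemplot sketch, assuming non-negative whole numbers with the leaf as the final digit and the stem as everything to its left (value // 10). That is one common convention; split stems or other units would need a different rule. The function name `stemplot` and the scores are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines for non-negative integers (leaf = last digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)  # stem = leading digits, leaf = last digit
    lines = []
    # Include empty stems between the smallest and largest so gaps stay visible
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}")
    return lines

# Hypothetical test scores
scores = [72, 85, 91, 68, 77, 85, 93, 79, 64, 88]
print("\n".join(stemplot(scores)))
```

Because the values are sorted first, the leaves on each stem come out in increasing order, which makes the shape easy to read.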

18

percentage frequency/relative frequency

the frequency of an observation relative to the whole sample, often expressed as a percentage: 100 × f/n

19

population

the entire group of individuals or things

20

sample

the part of the population that is studied

21

mean

the arithmetic average of a data set: the sum of the measurements divided by the number of measurements. nonresistant: affected by extreme or outlier measurements. for a population, denoted by μ, and for a sample, denoted by x̄

22

median

the point that divides the measurements in half. resistant and not affected by extreme or outlier measurements, better to use for skewed data or data sets with outliers. sometimes denoted as M
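
The resistance claims on the mean and median cards can be checked with Python's standard `statistics` module. The exercise data are hypothetical; the same formula gives μ for a population and x̄ for a sample, only the data set differs.

```python
from statistics import mean, median

# Hypothetical data: weekly hours of exercise for 7 people
hours = [2, 3, 3, 4, 5, 5, 6]
# Same data with the largest value replaced by an extreme one
with_outlier = hours[:-1] + [40]

print("original:     mean =", mean(hours), " median =", median(hours))
print("with outlier: mean =", round(mean(with_outlier), 2), " median =", median(with_outlier))
```

Replacing 6 with 40 pulls the mean from 4 up to about 8.86, while the median stays at 4, which is why the median is preferred for skewed data or data with outliers.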

23

range

the difference between the largest and smallest measurements in a data set; not reliable, as it depends only on the two most extreme measurements

24

interquartile range (IQR)

the range of the middle 50% of the data, or the difference between the third and first quartiles. resistant and not affected by extreme or outlier measurements
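
A sketch contrasting the range and the IQR on hypothetical data. It assumes the "median of each half" rule for quartiles (the overall median is left out of both halves when n is odd); the cards do not fix a rule, and other conventions, including the default method of Python's `statistics.quantiles`, can give slightly different Q1 and Q3 on some data sets. The names `quartiles` and `iqr` are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the 'median of each half' convention.

    Needs at least two values.
    """
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]        # values below the median position
    upper = data[(n + 1) // 2 :]  # values above the median position
    return median(lower), median(data), median(upper)

def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]
extreme = data[:-1] + [150]  # replace the largest value with an outlier

print("range:", max(data) - min(data), "->", max(extreme) - min(extreme))
print("IQR:  ", iqr(data), "->", iqr(extreme))
```

The range jumps from 14 to 149 when one value becomes extreme, while the IQR stays at 7.5, illustrating that the IQR is resistant and the range is not.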

25

standard deviation

a measure of variation that takes every measurement into account, roughly the typical distance of the measurements from the mean. nonresistant and affected by extreme or outlier measurements

26

variance

the square of the standard deviation
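
The cards do not say which denominator to use; this sketch assumes the usual sample convention (divide by n - 1) and compares a hand-written version with Python's `statistics.variance` and `statistics.stdev`. The population versions, `statistics.pvariance` and `statistics.pstdev`, divide by n instead. The function names `sample_variance` and `sample_sd` and the data are hypothetical.

```python
import math
import statistics

def sample_variance(values):
    """s^2: squared deviations from the mean, summed, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """s: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data), statistics.variance(data))  # should agree
print(sample_sd(data), statistics.stdev(data))           # should agree
# Population versions divide by n instead of n - 1
print(statistics.pvariance(data), statistics.pstdev(data))
```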

27

percentiles

the values that divide an ordered data set into 100 equal parts; the pth percentile is the value at or below which about p% of the observations fall

28

quartiles

the three values that divide an ordered data set into four equal parts: the 25th percentile (Q1), the 50th percentile (the median), and the 75th percentile (Q3)

29

standardized scores/z-scores

the number of standard deviations an observation lies above or below the mean: z = (observed measurement - mean) / standard deviation
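
A short sketch of standardizing a data set, assuming the sample standard deviation is used (the card does not specify). The function name `z_scores` and the exam scores are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize each value: z = (x - x̄) / s, using the sample SD s."""
    x_bar = mean(values)
    s = stdev(values)
    return [(x - x_bar) / s for x in values]

# Hypothetical exam scores
scores = [60, 70, 80, 90, 100]
for x, z in zip(scores, z_scores(scores)):
    print(x, round(z, 3))
```

A score of 100 gets a z-score of about 1.26, meaning it lies 1.26 standard deviations above the mean of 80; a set of z-scores always has mean 0 and standard deviation 1.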

30

linear regression

a model of the relationship between two quantitative variables that are linearly related, used to describe and predict how the response variable changes with the explanatory variable

31

Pearson’s correlation coefficient

a numerical summary, always between -1 and 1, of the strength and direction of the linear relationship between two quantitative variables. the further it is from 0, the stronger the linear relationship
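
A minimal sketch of the usual computing formula for Pearson's r, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²). The function name `pearson_r` and the study-hours data are hypothetical; on Python 3.10 or later, `statistics.correlation` computes the same quantity and can serve as a cross-check.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data xs, ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and exam score for 5 students
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
print(pearson_r(hours, score))  # close to 1: strong positive linear relation
```

Points lying exactly on an increasing line give r = 1, and exactly on a decreasing line give r = -1.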

32

scatterplot

a graph used to describe the nature, degree, and direction of the relation between two quantitative variables x and y, where each point (x, y) is a pair of measurements on one individual

33

linear regression model equation

Y = α + βX, where Y is the response variable, X is the explanatory variable, α is the y-intercept, and β is the slope

34

predicted value of y

ŷ = a + bx, where a and b are the estimates of α and β calculated from the sample

35

least-squares regression line

the line that minimizes the sum of the squared residuals, also known as the line of best fit. it always passes through the point (x̄, ȳ) and has slope b = r(sy/sx), where sx and sy are the sample standard deviations of x and y
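
The slope formula on this card can be checked numerically. This sketch computes b = r(sy/sx) and then a = ȳ - b·x̄, which is what forces the line through (x̄, ȳ). The function name `least_squares_line` and the study-hours data are hypothetical; on Python 3.10 or later, `statistics.linear_regression` returns the same slope and intercept and can serve as a cross-check.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x, using b = r * (s_y / s_x)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    # r as the sum of products of z-scores divided by n - 1
    r = sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x
    a = y_bar - b * x_bar  # makes the line pass through (x̄, ȳ)
    return a, b

# Hypothetical data: hours studied and exam score for 5 students
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
a, b = least_squares_line(hours, score)
print(f"y-hat = {a:.2f} + {b:.2f}x")
```

For these data the line is ŷ = 46.90 + 4.50x, and plugging in x̄ = 3 returns ȳ = 60.4.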

36

coefficient of determination

measures the proportion (often stated as a percentage) of the variation in the Y-values that is explained by the linear relation between the X- and Y-values. denoted by R², which equals the square of the correlation coefficient r; always between 0 and 1
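
A sketch computing the coefficient of determination two ways for a least-squares line on hypothetical data: as 1 - SSE/SST (the share of total variation not left in the residuals) and as r². The function name `r_squared_two_ways` is hypothetical.

```python
from statistics import mean

def r_squared_two_ways(xs, ys):
    """Return (1 - SSE/SST, r**2) for the least-squares line of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)  # SST: total variation in y
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    r = sxy / (sxx * syy) ** 0.5
    return 1 - sse / syy, r ** 2

# Hypothetical data: hours studied and exam score for 5 students
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
print(r_squared_two_ways(hours, score))  # the two numbers agree
```

Here R² is about 0.987, so roughly 98.7% of the variation in these scores is explained by the linear relation with hours studied.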

37

random error

the amount by which a measured value of Y differs from the value given by the model α + βX, denoted by ε; for a fitted line it is estimated by the residual, y - ŷ

38

influential observation

an observation whose removal would markedly change a statistic, such as the slope or intercept of the regression line

39

residual plot

a plot of residuals versus the predicted values of Y
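
This sketch produces the (predicted value, residual) pairs that a residual plot displays, assuming the standard definition residual = y - ŷ for the least-squares line. The function name `residual_plot_points` and the data are hypothetical; the pairs can be passed to any plotting tool. A patternless scatter around 0 supports a linear model, while a curve or a fan shape suggests a transformation.

```python
from statistics import mean

def residual_plot_points(xs, ys):
    """Return (y-hat, residual) pairs for the least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    points = []
    for x, y in zip(xs, ys):
        y_hat = a + b * x
        points.append((y_hat, y - y_hat))  # residual = observed - predicted
    return points

# Hypothetical data: hours studied and exam score for 5 students
pts = residual_plot_points([1, 2, 3, 4, 5], [52, 55, 61, 64, 70])
for y_hat, resid in pts:
    print(f"{y_hat:6.2f} {resid:+.2f}")
```

Residuals from a least-squares line always sum to zero, so the points are centered on the horizontal line at 0.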

40

transformation

a change of scale applied to one or both variables (for example, taking logarithms) so that the relationship between them becomes approximately linear

41

log transformation

Z = ln(Y) used to linearize the regression model when the relationship between Y and X suggests a model with a consistently increasing slope
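
To illustrate the log transformation, this sketch assumes an exponential curve, Y = 2e^(0.5X), as a hypothetical example of a relation whose slope keeps increasing; the card itself does not name a model. Before the transformation the slopes between consecutive points grow, and after Z = ln(Y) they are constant, so Z is linear in X. The helper name `slopes` is hypothetical.

```python
import math

def slopes(xs, vs):
    """Slopes between consecutive points (x, v)."""
    return [(v2 - v1) / (x2 - x1) for x1, x2, v1, v2 in zip(xs, xs[1:], vs, vs[1:])]

# Hypothetical curve whose slope keeps increasing: Y = 2 * e^(0.5 X)
xs = [0, 1, 2, 3, 4]
ys = [2 * math.exp(0.5 * x) for x in xs]
z = [math.log(y) for y in ys]  # Z = ln(Y)

print(slopes(xs, ys))  # increasing slopes
print(slopes(xs, z))   # all about 0.5: Z is linear in X
```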

42

square root transformation

Z = √Y = Y^(1/2), used to linearize the regression model when the spread of observations increases with the mean

43

reciprocal transformation

Z = 1/Y, used to minimize the effect of large values of X

44

square transformation

Z = Y², used when the slope of the relation consistently decreases as the independent variable increases

45

power transformation

ln(Y) and ln(X), used if the relation between the dependent and independent variables is modeled by Y = aX^b; taking logs gives ln(Y) = ln(a) + b·ln(X), which is linear in ln(X)
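
A sketch of the power transformation: fit a straight line to (ln X, ln Y) by least squares, then read b from the slope and a from exp(intercept). It requires positive X and Y values. The function name `fit_power_model` is hypothetical, and the data are generated from Y = 3X² so the recovered constants can be checked.

```python
import math

def fit_power_model(xs, ys):
    """Fit Y = a * X**b by least squares on (ln X, ln Y); returns (a, b)."""
    u = [math.log(x) for x in xs]
    v = [math.log(y) for y in ys]
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    # Slope of the line ln(Y) = ln(a) + b * ln(X)
    b = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v)) / sum(
        (ui - u_bar) ** 2 for ui in u
    )
    ln_a = v_bar - b * u_bar  # intercept is ln(a)
    return math.exp(ln_a), b

xs = [1, 2, 3, 4, 5]
ys = [3 * x ** 2 for x in xs]  # hypothetical data following Y = 3X^2 exactly
a, b = fit_power_model(xs, ys)
print(round(a, 6), round(b, 6))
```

With real data the points will not fall exactly on the curve, and the fitted a and b are estimates rather than exact constants.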

46

contingency table

a table of counts for individuals classified by two categorical variables, with r rows for the categories of the first variable and c columns for the categories of the second (an r × c table)

47

marginal frequency

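the total frequency of each category of one variable, summed over all categories of the other; these are the row and column totals in the margins of a contingency table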

48

conditional relative frequency

the relative frequency of a category of one variable among only the individuals in a given category of the other variable (a cell count divided by its row or column total)
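
A sketch of the contingency-table cards using a hypothetical 2 × 3 table stored as nested dictionaries (rows: class year, columns: preferred sport). The helper names `row_margins`, `column_margins`, and `conditional_given_row` are hypothetical. Comparing the conditional relative frequencies across rows is how an association between the two variables shows up.

```python
# Hypothetical 2 x 3 contingency table: rows = class year, columns = preferred sport
table = {
    "juniors": {"soccer": 20, "tennis": 15, "swimming": 15},
    "seniors": {"soccer": 30, "tennis": 10, "swimming": 10},
}

def row_margins(t):
    """Marginal frequency of each row category (row totals)."""
    return {r: sum(cols.values()) for r, cols in t.items()}

def column_margins(t):
    """Marginal frequency of each column category (column totals)."""
    totals = {}
    for cols in t.values():
        for c, count in cols.items():
            totals[c] = totals.get(c, 0) + count
    return totals

def conditional_given_row(t, row):
    """Relative frequency of each column category among individuals in `row`."""
    total = sum(t[row].values())
    return {c: count / total for c, count in t[row].items()}

print(row_margins(table))
print(column_margins(table))
for r in table:
    print(r, conditional_given_row(table, r))
```

Here 40% of juniors but 60% of seniors prefer soccer; because the conditional distributions differ, class year and sport preference are associated in these data.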

49

association

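a relationship between two categorical variables: the variables are associated if the conditional relative frequencies of one differ across the categories of the other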