STA 2023 Exam 1 Study Guide (University of Florida)


141 Terms

1

Statistics

the art and science of learning from data, which involves data collection, organization, and interpretation.

2

Design

the process of determining how to get data

3

Description

the process of organizing data in a meaningful way, with numerical summaries or graphs.

4

Inference

drawing conclusions about the population of interest based on a random, representative sample

5

Population

all of the subjects we're interested in

6

Sample

the subset of the population for whom we have data

7

Statistic

a numerical summary of the data for a SAMPLE

8

Parameter

a numerical summary of the data for a POPULATION

9

Categorical variables

variables that place observations into groups rather than taking on meaningful numerical values (e.g., gender, race, major)

10

Are zip codes quantitative variables?

No, they are categorical. They cannot be meaningfully averaged together.

11

Quantitative variables

variables that take on meaningful numerical values, which can be described in terms of center and spread

12

Discrete variables

a type of quantitative variable whose possible values form a set of separate numbers, such as 0, 1, 2, ... (e.g., the number of defective items)

13

Continuous variables

a type of quantitative variable that can take on any value in an interval, giving an infinite continuum of possible values (e.g., height and weight).

14

Which graphical summaries are appropriate for categorical variables?

bar charts and pie charts

15

Which graphical summaries are appropriate for quantitative variables?

dotplots, stemplots, and histograms

16

On stemplots, what does the number with the parentheses mean?

it marks the row that contains the median; the number in parentheses is the count of observations in that row

17

"Mound" or "bell-shaped" distribution

where most of the values are concentrated in the middle—unimodal and roughly symmetric

18

"Uniform" or "rectangular distribution"

a distribution in which all of the options are equally likely

19

"bimodal" distribution

a distribution with two modes, which are represented by two big bumps in the associated histogram

20

"skewed left" distribution

a distribution with a tail extending to the left, where most of the observations are clustered around the higher values

21

"skewed right" distribution

a distribution with a tail extending to the right, where most of the observations are clustered around the lower values

22

mean

the average of the observations: their sum divided by the number of observations (x bar)

23

median

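the middle observation when the data are ordered from smallest to largest; with an even number of observations, the average of the two middle values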

24

mode

the most common observation
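
The three measures of center above can be compared on one small data set. This is a minimal sketch using Python's standard `statistics` module; the data values are made up for illustration and are not from the course.

```python
import statistics

# Hypothetical data set, made up for illustration.
values = [2, 3, 3, 5, 7, 10, 12]

mean_value = statistics.mean(values)      # sum of the values divided by how many there are
median_value = statistics.median(values)  # middle value once the data are sorted
mode_value = statistics.mode(values)      # value that occurs most often

print(mean_value, median_value, mode_value)
```

In this example the three measures disagree (6, 5, and 3), which is typical when the data are skewed right.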

25

position formula

a formula used to find the position of the median in the ordered data set: (n + 1)/2
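
As a rough sketch, assuming the card refers to the usual (n + 1)/2 rule for the median's position in the sorted data, the calculation might look like this. The function names and the sample values are hypothetical.

```python
import math

def median_position(n):
    """Position of the median in the sorted data, counting from 1."""
    return (n + 1) / 2

def median_from_position(values):
    data = sorted(values)
    pos = median_position(len(data))
    # Convert the 1-based position to 0-based indexes. When pos ends in .5,
    # the median is the average of the two neighboring observations.
    lower = data[math.floor(pos) - 1]
    upper = data[math.ceil(pos) - 1]
    return (lower + upper) / 2

sample = [12, 5, 9, 3, 7, 10]  # hypothetical values
print(median_position(len(sample)), median_from_position(sample))
```

With six observations the position is 3.5, so the median is halfway between the 3rd and 4th ordered values.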

26

Is the mean resistant to outliers?

No, the mean is not resistant; it is pulled toward outliers.

27

Is the median resistant to outliers?

Yes, the median is resistant to outliers.

28

Is the mode resistant to outliers?

Yes, the mode is resistant to outliers.
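
One way to see the difference in resistance is to add a single extreme value to a data set and recompute each measure. The incomes below are hypothetical, chosen only to make the effect obvious.

```python
import statistics

incomes = [30, 32, 35, 35, 38, 40]   # hypothetical incomes, in $1000s
with_outlier = incomes + [500]       # same data plus one extreme value

for label, data in (("original", incomes), ("with outlier", with_outlier)):
    print(label,
          round(statistics.mean(data), 1),   # shifts a lot when the outlier is added
          statistics.median(data),           # stays the same
          statistics.mode(data))             # stays the same
```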

29

Range

the difference between the largest and smallest values in a data set

30

Variance

(s^2) roughly the average of the squared deviations from the mean; for a sample, the sum of the squared deviations divided by n - 1

31

Standard deviation

(s) the square root of the variance, expressed in the same units as the data.

32

Can the standard deviation be negative or zero?

It can never be negative. It is zero only when all of the observations are the same.
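
The variance and standard deviation can be worked out step by step. The sketch below assumes the sample versions (dividing by n - 1) and uses made-up data; the last line checks the case where every observation is identical.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
n = len(data)
x_bar = sum(data) / n

# Squared deviation of each observation from the mean.
squared_deviations = [(x - x_bar) ** 2 for x in data]

sample_variance = sum(squared_deviations) / (n - 1)  # s^2
sample_sd = math.sqrt(sample_variance)               # s, in the units of the data

# When all observations are equal, every deviation is 0, so s is 0.
constant_sd = statistics.stdev([3, 3, 3])

print(sample_variance, sample_sd, constant_sd)
```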

33

What percentage of observations fall within one standard deviation of the mean?

about 68%, by the empirical rule (applies to roughly bell-shaped distributions)

34

What percentage of observations fall within two standard deviations of the mean?

about 95%, by the empirical rule (applies to roughly bell-shaped distributions)

35

What percentage of observations fall within three standard deviations of the mean?

about 99.7%, by the empirical rule (applies to roughly bell-shaped distributions)
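
The three percentages can be checked against an exact normal curve. This sketch assumes that "bell-shaped" is modeled by a normal distribution, and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Share of the distribution within k standard deviations of the mean.
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} SD: {share:.1%}")
```

Real data that are only roughly bell-shaped will match these figures only approximately.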

36

Lower quartile (Q1)

the 25th percentile; the median of the lower half of the data set; 25% of the data is lower than this value

37

Upper quartile (Q3)

the 75th percentile; the median of the upper half of the data set; 75% of the data is lower than this value

38

Is the range resistant to outliers?

No, the range is strongly affected by outliers.

39

Interquartile range (IQR)

a measure of spread; describes how spread out the central 50% of data is; (Q3-Q1)

40

Five number summary of positions

the minimum value, lower quartile, median, upper quartile, and maximum value
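
A minimal sketch of the five number summary, using the "median of each half" definition of the quartiles given on the cards. It assumes the convention that the middle observation is left out of both halves when n is odd; software packages often use slightly different quartile rules. The data are hypothetical.

```python
import statistics

def five_number_summary(values):
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]           # observations below the median position
    upper_half = data[half + n % 2:]   # observations above it (skips the middle value if n is odd)
    return (data[0],
            statistics.median(lower_half),   # Q1
            statistics.median(data),         # median
            statistics.median(upper_half),   # Q3
            data[-1])

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # hypothetical observations
minimum, q1, median, q3, maximum = five_number_summary(data)
iqr = q3 - q1  # spread of the central 50% of the data

print(minimum, q1, median, q3, maximum, iqr)
```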

41

Boxplot

a graphical way to present the five number summary of positions

42

What do the asterisks at the end of the whiskers in a boxplot stand for?

potential outliers: observations more than 1.5 x IQR below Q1 or above Q3
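
Boxplot software commonly flags a point as a potential outlier when it lies more than 1.5 x IQR beyond a quartile. The sketch below assumes that convention; the function name, data, and quartile values are hypothetical, with the quartiles found by hand as the medians of the lower and upper halves.

```python
def flag_outliers(values, q1, q3):
    """Return the values lying beyond the 1.5 x IQR fences."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [1, 3, 4, 6, 7, 8, 10, 12, 40]  # hypothetical observations
outliers = flag_outliers(data, q1=3.5, q3=11)

print(outliers)
```

Here the fences are -7.75 and 22.25, so only the value 40 would be drawn as an asterisk.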

43

What does a skewed right boxplot look like?

the box sits toward the lower values, with a longer whisker (and any outliers) stretching toward the higher values

44

What does a skewed left boxplot look like?

the box sits toward the higher values, with a longer whisker (and any outliers) stretching toward the lower values

45

What does a roughly symmetric boxplot look like?

the box sits in the middle of the values, with the median near the center of the box and whiskers of similar length

46

Explanatory variable

the independent variable, what we manipulate (also called predictor variable; X)

47

Response variable

the dependent variable, what we're trying to make a statement about (Y)

48

Contingency tables are used to describe the association between...?

two categorical variables

49

Scatterplots are used to describe the association between...?

two quantitative variables

50

Correlation coefficient

r; a numerical measure of the strength and direction of the linear relationship between two quantitative variables

51

the correlation coefficient can vary between...?

-1.00 (perfect negative correlation) and +1.00 (perfect positive correlation)

52

Coefficient of determination

R^2; equal to the square of the correlation; tells us what percentage of the variability in y can be explained by the linear regression on x.
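
To make r and R^2 concrete, here is a small sketch that computes the correlation directly from deviations about the means. The function name and the hours/score data are hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r between two quantitative variables."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]         # hypothetical explanatory variable
scores = [52, 60, 61, 70, 77]   # hypothetical response variable

r = correlation(hours, scores)
r_squared = r ** 2  # share of the variability in y explained by the line

print(round(r, 3), round(r_squared, 3))
```

For these made-up numbers r is about 0.98, so roughly 96% of the variability in the scores is explained by the linear relationship with hours.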

53

When do we not interpret the vertical intercept of a regression line?

when x = 0 is not a meaningful value or lies outside the range of the observed data

54

What is the general equation of the least-squares regression line?

yhat = a + bx

55

In the general equation of the least-squares regression line, what does a mean?

the vertical intercept

56

In the general equation of the least-squares regression line, what does b mean?

the slope

57

In the general equation of the least-squares regression line, what does yhat mean?

the predicted value of y for a given value of x, read off the line of best fit

58

Residual

the difference between the observed value and the predicted value; y-yhat

59

Least squares regression method

a regression method that fits a line to the data by minimizing the sum of the squared residuals

60

Does a least-squares regression line pass through the point (xbar, ybar)?

Yes, the LSRL goes through that point.
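
The regression cards above can be tied together in one short sketch: fit yhat = a + bx using the standard least-squares formulas, compute the residuals y - yhat, and check that the line passes through (xbar, ybar). The function name and data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line yhat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # vertical intercept
    return a, b

hours = [1, 2, 3, 4, 5]         # hypothetical x values
scores = [52, 60, 61, 70, 77]   # hypothetical y values

a, b = least_squares_line(hours, scores)
predicted = [a + b * x for x in hours]                 # yhat for each x
residuals = [y - p for y, p in zip(scores, predicted)] # observed minus predicted

x_bar = sum(hours) / len(hours)
y_bar = sum(scores) / len(scores)
print(a, b, residuals, a + b * x_bar == y_bar)
```

For these numbers the line is yhat = 46 + 6x, and the residuals sum to zero, which is always the case for a least-squares line with an intercept.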

61

Extrapolation

using a regression line to predict y for x values that lie outside the range of the observed data

62

Influential outlier

an outlier that lies so far outside the rest of the data that it causes a major change in the correlation coefficient, coefficient of determination, and the least-squares regression line.

63

Lurking variable

an extraneous variable that influences the association between the variables we're interested in

64

Confounding

occurs when the effects of two variables on the response variable are so intertwined that they cannot be separated

65

Simpson's paradox

when the direction of the association between two categorical variables reverses once a third variable is taken into account
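
A small numerical illustration may help here. The counts below are invented: treatment A has the higher success rate within each severity group, yet treatment B looks better once the groups are combined, because A was mostly given to severe cases.

```python
from fractions import Fraction

# Hypothetical (successes, patients) counts, invented for illustration.
counts = {
    "A": {"mild": (9, 10), "severe": (30, 100)},
    "B": {"mild": (80, 100), "severe": (2, 10)},
}

def rate(successes, total):
    return Fraction(successes, total)

def overall_rate(groups):
    """Success rate after pooling all severity groups together."""
    successes = sum(s for s, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return Fraction(successes, total)

for treatment, groups in counts.items():
    by_group = {g: float(rate(*c)) for g, c in groups.items()}
    print(treatment, by_group, round(float(overall_rate(groups)), 3))
```

Severity plays the role of the third variable: ignoring it reverses the apparent direction of the association between treatment and outcome.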

66

Experiment

a study in which the researcher assigns subjects to experimental treatments and then observes the response

67

Observational study

involves simply witnessing what's happening, without assigning treatments to specific groups

68

Volunteer sample

Sample that consists of people who volunteered to participate, rather than those randomly selected

69

Convenience sample

Sample that has people who were selected not at random, but rather because selecting them was easy.

70

Random sampling

a practice in which individuals are chosen by chance, so that every possible sample of a given size has an equal chance of being selected.
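
As a minimal sketch, a simple random sample can be drawn with `random.sample` from the Python standard library, which picks distinct items so that every subset of the requested size is equally likely. The ID numbers and the seed are hypothetical.

```python
import random

population = list(range(1, 21))        # hypothetical ID numbers 1 through 20
rng = random.Random(2023)              # fixed seed so the run is repeatable
sample = rng.sample(population, k=5)   # 5 distinct IDs chosen by chance

print(sorted(sample))
```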

71

What are the three methods for delivering a survey?

personal interviews, telephone interviews, and questionnaires

72

margin of error

approximately one over the square root of n for a random sample of size n; describes how far a sample percentage is likely to fall from the true population percentage
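
The 1/sqrt(n) approximation is easy to tabulate. The sketch below uses a hypothetical function name and a few example sample sizes to show how slowly the margin shrinks as n grows.

```python
import math

def approx_margin_of_error(n):
    """Approximate margin of error, as a proportion, for a random sample of size n."""
    return 1 / math.sqrt(n)

for n in (100, 400, 1600):
    print(n, f"{approx_margin_of_error(n):.1%}")
```

Quadrupling the sample size only halves the margin of error: 10% at n = 100, 5% at n = 400, and 2.5% at n = 1600.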

73

Undercoverage

occurs when the sampling frame is missing certain parts of the population

74

Nonresponse bias

occurs when some people are unwilling to participate in a survey, and those people may have different positions on relevant issues than those who participated

75

Response bias

occurs when a person who responds to a survey gives false information, either intentionally or unintentionally

76

Experimental units

the individuals or subjects involved in the experiment

77

Placebo

a treatment with no active ingredient, given to control for the psychological effects of simply receiving a treatment

78

Blind study

a study in which the subject does not know which treatment they are getting

79

Double blind study

a study in which neither the subject nor the person administering the treatment or making the measurement knows which treatment was given

80

Control group

in an experiment, the group that gets either the placebo treatment or no treatment at all

81

Replications

the number of experimental units that receive each treatment

82

Factors

an experiment's categorical explanatory variables (the x variables, the things being changed)

83

Levels

the different alternatives available for each factor

84

Treatments

a specific combination of levels, one from each factor

85

Matched pairs design

an advanced form of experimentation in which similar experimental units (such as twins) are matched and each receives a different treatment

86

Cross-over design

an advanced form of experimental design in which the same experimental unit is given different treatments at different times

87

Block design

an experimental design similar to a matched pairs design, except blocks of three or more experimental units are used (instead of two experimental units)

88

Cross-sectional study

Takes a 'snapshot' in time, with observations of the here-and-now.

89

Case-control study

a retrospective study that compares a group of people who had the outcome of interest (cases) with a group who did not (controls), looking back for differences between them.

90

Prospective study

a forward-looking study in which subjects are identified and followed into the future

91

Probability (of a random event)

the proportion of occurrences of that outcome in an extremely long series of independent trials

92

What does it mean to say that trials are independent?

the outcome of one trial is not affected by the outcome of other trials

93

Sample space

the set of all possible outcomes

94

Event

a particular outcome or group of outcomes (a subset of the sample space)

95

Complement rule

the probability that an event will NOT happen is one minus the probability that the event WILL happen; P(not A) = 1 - P(A).

96

Disjoint events

two events that have no outcomes in common, so they cannot both occur at once

97

Multiplication rule

the probability of two INDEPENDENT events occurring is the product of their probabilities; P(A and B) = P(A) x P(B)
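
The complement and multiplication rules can be checked by listing a sample space and counting outcomes. This sketch assumes two fair six-sided dice, so all 36 outcomes are equally likely; the helper function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (first die, second die) pair.
sample_space = list(product(range(1, 7), repeat=2))

def probability(event):
    """Proportion of outcomes in the sample space that belong to the event."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

p_first_six = probability(lambda o: o[0] == 6)
p_second_six = probability(lambda o: o[1] == 6)
p_both_six = probability(lambda o: o[0] == 6 and o[1] == 6)

# Complement rule: P(at least one six) = 1 - P(no six).
p_no_six = probability(lambda o: 6 not in o)
p_at_least_one_six = 1 - p_no_six

print(p_both_six, p_first_six * p_second_six, p_at_least_one_six)
```

Because the two dice are independent, counting outcomes directly gives the same answer as multiplying the two probabilities (1/36).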

98

Conditional probability

the probability that one event will occur GIVEN that another has occurred.

99

How can you read the probability P(A | B)?

The probability that A will occur GIVEN that B has occurred.

100

False positive

a case in which a test says that the subject has the condition we're testing for when he or she really doesn't
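
Conditional probabilities such as the false positive rate can be read off a table of counts, using P(A | B) = P(A and B) / P(B). The counts below are hypothetical, chosen to describe a rare condition in 1,000 people.

```python
from fractions import Fraction

# Hypothetical counts: (true status, test result) -> number of people.
counts = {
    ("condition", "positive"): 9,
    ("condition", "negative"): 1,
    ("no condition", "positive"): 99,
    ("no condition", "negative"): 891,
}

def p_given(event, condition):
    """P(event | condition): count(event and condition) / count(condition)."""
    both = sum(n for key, n in counts.items() if event(key) and condition(key))
    cond = sum(n for key, n in counts.items() if condition(key))
    return Fraction(both, cond)

# P(test positive | no condition): the false positive rate.
false_positive_rate = p_given(lambda k: k[1] == "positive",
                              lambda k: k[0] == "no condition")

# P(condition | test positive): swapping the two events gives a different number.
p_condition_given_positive = p_given(lambda k: k[0] == "condition",
                                     lambda k: k[1] == "positive")

print(false_positive_rate, p_condition_given_positive)
```

With these made-up counts, only 1 in 12 positive results belongs to someone who has the condition, even though the false positive rate is just 1 in 10, which shows that P(A | B) and P(B | A) are not interchangeable.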
