Statistics and Data Analysis: Key Concepts for Students


144 Terms

1

Which one of the following activities is not an example of data gathering?

Reaching a conclusion about the results of a reading program

2

Which of these are categorical data?

The different types of anteaters

3

Which of the variables you collect are continuous data?

Only average annual income

4

What are the categorical variables in your survey?

Grade of youngest child, dad's occupation, whether mom works, and kind of pets

5

True or False: In the survey of your neighbors described above, the only discrete quantitative data you're collecting about your neighbors is family size.

True

6

Which of the following would most likely be graphed as a bar chart rather than a histogram?

All of the above

7

Consider a complete table of relative frequencies. The sum of the relative frequency column in such a table must be:

1

8

Histograms are most useful in displaying:

Large numeric data sets

9

The relative frequency of students who study 5 - 6 hours a day is:

0.223

10

The cumulative relative frequency of students who study 4 hours or fewer a day is:

0.615

11

Estimate the relative frequency of countries with 20 - 22 days of no rainfall in the month of January.

0.32

12

Which league has the team that had hit the most home runs by midseason, and how many home runs had that team hit?

American, 149

13

True or False: The number of home runs hit by most National League teams was in the 90s; the number of home runs hit by most teams overall was in the 80s.

True

14

How many fewer home runs did the team with the fewest home runs in the American League have than the team with the fewest home runs in the National League?

14

15

How many more home runs did the team with the most home runs in the American League have than the team with the most home runs in the National League?

33

16

A symmetric distribution can't have which of the following characteristics?

A long tail on one side

17

True or False: If the population of interest is all day care centers in the United States, a sample of day care centers could be either all day care centers in New York City or a randomly selected group of day care centers throughout the US. Either sample is equally good.

False

18

The kind of sampling strategy least likely to produce statistics that are good estimates of population parameters is a:

haphazard sample

19

Consider these eight observations: {11,6,2,5,8,4,4,9}. What is the mean?

6.125

20

Consider these eight observations: {11,6,2,5,8,4,4,9}. What is the median?

5.5
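
The mean and median on the two cards above can be checked with a minimal Python sketch. It uses only the standard library `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

observations = [11, 6, 2, 5, 8, 4, 4, 9]

# Mean: the sum of the values (49) divided by the count (8).
obs_mean = mean(observations)

# Median: with an even count, the average of the two middle
# values after sorting (2, 4, 4, 5, 6, 8, 9, 11 -> 5 and 6).
obs_median = median(observations)

print(obs_mean, obs_median)
```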

21

Since the distribution of housing prices in a community is usually skewed right, which measure of center should you use for housing prices?

Median

22

Six radio listeners are surveyed. Their favorite FM stations are: 89.1, 89.1, 89.1, 94.7, 94.7 and 104.3. Based on these data, you want to name the favorite station of a typical listener. You should name:

The mode, which is 89.1

23

Calculate, to the nearest whole number, the sample standard deviation of this data set: {71, 75, 65, 73, 69, 77, 67}.

4
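
As a rough check of the answer above, the sample standard deviation divides the sum of squared deviations by n - 1, not n. The sketch below computes it by hand and compares it with `statistics.stdev`; the variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

data = [71, 75, 65, 73, 69, 77, 67]

n = len(data)
x_bar = sum(data) / n  # 497 / 7 = 71

# Sum of squared deviations from the mean: 112 for this data set.
ss = sum((x - x_bar) ** 2 for x in data)

# Sample standard deviation uses n - 1 in the denominator.
s_by_hand = sqrt(ss / (n - 1))
s_library = stdev(data)

print(s_by_hand, s_library)  # both about 4.32, which rounds to 4
```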

24

All the following statements about the sample standard deviation are true, except:

the standard deviation is negative when there are extreme values in the sample.

25

The box plot below represents a distribution that is:

right skewed

26

The median points per player per game for each team are:

Boston = 10.0; Chicago = 9.0; Seattle = 8.9.

27

Create a modified box-and-whisker plot for each of the three data sets. Which team has the greatest variation in the points per game for the middle 50% of their observations?

Boston

28

The following hypothetical data set shows the purchase price (in thousands) for a sample of 3-bedroom, 2-bathroom homes in Essex County, MA over the past year. How many outliers are present in this distribution?

3

29

What is the median for the data in the table?

306.5

30

What is the upper quartile for the data in the table?

319.5

31

What is the lower quartile for the data in the table?

294

32

What is the interquartile range for the data in the table?

25.5

33

Based on the data in the table, what is the largest number of births in a month you could possibly have and still have a lower outlier?

255
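
The answer above follows from the usual 1.5 × IQR rule applied to the quartiles on the preceding cards. The sketch below assumes that rule (a value is a lower outlier when it falls strictly below Q1 - 1.5 × IQR) and assumes birth counts are whole numbers; the underlying table is not reproduced here.

```python
import math

q1 = 294.0   # lower quartile, from the card above
q3 = 319.5   # upper quartile, from the card above

iqr = q3 - q1                    # 25.5
lower_fence = q1 - 1.5 * iqr     # 294 - 38.25 = 255.75

# Largest whole number strictly below the fence.
largest_lower_outlier = math.ceil(lower_fence) - 1

print(iqr, lower_fence, largest_lower_outlier)
```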

34

Why is the IQR considered to be a resistant statistic?

Adding a new extreme observation has little effect on it.

35

Which measure of central tendency and measure of variation should be used with a normal distribution?

The mean and standard deviation

36

Which of the following can stemplots show?

I, II, III, and IV

37

Which of the following can outliers affect significantly?

I, III, and IV

38

Give, in millimeters, a minimum and maximum thickness that includes 68% of the population of bolts.

19.99 to 20.01 millimeters

39

Give, in millimeters, a minimum and a maximum thickness that will include 95% of the population of bolts.

19.98 to 20.02 millimeters
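
The two bolt answers above are consistent with the empirical rule if the bolts have a mean thickness of 20 mm and a standard deviation of 0.01 mm. Those parameters are not stated on the cards; they are inferred from the answers and should be read as assumptions. Under them, a minimal sketch looks like this:

```python
from statistics import NormalDist

# Assumed parameters, inferred from the card answers (not given on the cards).
mu, sigma = 20.0, 0.01
bolts = NormalDist(mu, sigma)

def interval(k):
    """Return (low, high) for the mean plus or minus k standard deviations."""
    return mu - k * sigma, mu + k * sigma

for k in (1, 2):
    low, high = interval(k)
    coverage = bolts.cdf(high) - bolts.cdf(low)
    print(f"{low:.2f} to {high:.2f} mm covers about {coverage:.1%}")
```

The empirical rule rounds these coverages to 68% and 95%.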

40

Assume that normal curve A and normal curve B have identical population means. Which curve is taller, and why?

Curve B is taller because smaller standard deviations produce thinner curves.

41

Using the empirical rule, you can assume that what percent of the normal distribution is outside two standard deviations of the mean in both directions?

5%

42

For population W, find the z-score associated with a weight of 120 pounds.

z = -1.6

43

For population W, what is the percentile for the weight 160 pounds?

50th percentile

44

In population W, what is the probability that a randomly selected subject will weigh between 140 and 180 pounds?

0.576
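
The three population W cards fit together if W is normal with mean 160 pounds (the 50th percentile) and standard deviation 25 pounds (implied by z = -1.6 at 120 pounds). Neither parameter is stated on the cards, so both are assumptions inferred from the answers. A sketch under those assumptions:

```python
from statistics import NormalDist

# Assumed: the mean equals the 50th percentile given on the card above.
mu = 160.0
# Assumed: sigma solved from z = (x - mu) / sigma with x = 120, z = -1.6.
sigma = (120 - mu) / -1.6   # about 25 pounds

w = NormalDist(mu, sigma)

z_120 = (120 - mu) / sigma             # z-score for 120 pounds
p_140_180 = w.cdf(180) - w.cdf(140)    # area between z = -0.8 and z = 0.8

print(round(z_120, 1), round(p_140_180, 3))
```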

45

In population H, what is the height, to the nearest tenth of an inch, of the 70th percentile?

67.3 inches

46

In population H, what is the z-score, to the nearest tenth, associated with the height 65 inches?

z = -0.4

47

To the nearest whole number, what percentile is associated with z = -0.68?

25th percentile

48

To the nearest whole number, what percentile is associated with z = 1.2?

88th percentile

49

What area, to the nearest whole percent, of the normal curve is located between z = -0.6 and z = 1.4?

64%
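
The three z-score cards above (z = -0.68, z = 1.2, and the area between z = -0.6 and z = 1.4) can be checked without a printed table. The sketch below uses the standard normal CDF from Python's `statistics.NormalDist`; a percentile is the area to the left of z, and an area between two z-scores is the difference of two CDF values.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Percentile = area to the left of the z-score.
pct_low = std_normal.cdf(-0.68)    # about 0.248
pct_high = std_normal.cdf(1.2)     # about 0.885

# Area between two z-scores = difference of the two left-tail areas.
area_between = std_normal.cdf(1.4) - std_normal.cdf(-0.6)   # about 0.645

print(pct_low, pct_high, area_between)
```

Note that the area between the two z-scores is just under 0.645, so it rounds to 64% and not 65%.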

50

What percentage of applicants scored between 500 and 700?

60%

51

What percentage of applicants scored above 450 on the GRE?

82%

52

What percentage of applicants had a GRE score below 625?

78%

53

What is the GRE score at the 77th percentile?

620

54

Find the GRE score at the upper quartile, Q3.

613

55

Find the z-score for the lower quartile of any normal curve.

-0.67
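
The lower quartile is the value with 25% of the area to its left, so its z-score is the inverse of the standard normal CDF at 0.25. A short sketch using the standard library:

```python
from statistics import NormalDist

# Inverse CDF: the z-score that has 25% of the area below it.
z_q1 = NormalDist().inv_cdf(0.25)

# By symmetry, the upper quartile sits at the same distance above the mean.
z_q3 = NormalDist().inv_cdf(0.75)

print(round(z_q1, 2), round(z_q3, 2))
```

Because this holds for the standard normal curve, it holds for any normal curve once values are converted to z-scores.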

56

Consider a normal distribution with μ = 65 and σ = 4. A sample of size 950 is drawn from this population. Approximately how many of the 950 cases would you expect to find between 57 and 73?

903
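
The answer above appears to rest on the empirical rule: 57 and 73 are two standard deviations either side of the mean, so about 95% of cases are expected there. The sketch below shows that estimate alongside the slightly larger figure from the exact normal CDF; which one a course expects is an assumption to confirm with the instructor.

```python
from statistics import NormalDist

mu, sigma, n = 65, 4, 950
dist = NormalDist(mu, sigma)

# 57 and 73 are exactly mu - 2*sigma and mu + 2*sigma.
empirical_share = 0.95                       # empirical-rule approximation
exact_share = dist.cdf(73) - dist.cdf(57)    # about 0.9545

expected_empirical = n * empirical_share     # 902.5, reported on the card as 903
expected_exact = n * exact_share             # about 906.8

print(expected_empirical, expected_exact)
```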

57

A normal probability plot:

graphs raw scores against z-scores of percentile ranks

58

What does a normal probability plot of 30 data points indicate?

It provides evidence that the original data is normally distributed.

59

Which statement about the normal distribution is false?

The normal curve crosses the x-axis at z-scores above 3.0 and below -3.0.

60

What is the proper notation for a normal distribution with a mean of 250 and standard deviation of 25?

N(250, 25)

61

What is true about the areas under the normal curve?

Fewer than one percent of the cases are located more than three standard deviations above or below the mean.

62

In a normal distribution with a mean of 30 and a standard deviation of 5, where is the largest proportion of cases found?

Between x = 25 and x = 35.

63

What can be inferred if the area above a given value of x is 0.35 in a normal distribution?

None of the above.

64

What does a normal curve table indicate for z = -1?

The probability of a value lying below z = -1 is 0.1587.

65

In which population is the area under the normal curve greatest?

Area above 65 in Population A.

66

How many crimes a month are predicted with 7 streetlights on a block?

1.0

67

What is the residual for a block with 10 streetlights and 1 crime a month?

0.6

68

What does the slope of the regression equation indicate?

For every additional streetlight per block, the crimes per month decrease by 0.2.
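
The three streetlight cards above are consistent with one regression line. The slope (-0.2) and the prediction at 7 streetlights (1.0) come from the cards; the intercept of 2.4 is not stated and is inferred here from those two values, so treat it as an assumption. A residual is the observed value minus the predicted value.

```python
slope = -0.2                      # from the slope card
x_known, y_hat_known = 7, 1.0     # from the prediction card

# Inferred intercept (assumption): solve y_hat = intercept + slope * x.
intercept = y_hat_known - slope * x_known   # about 2.4

def predict(streetlights):
    """Predicted crimes per month for a given number of streetlights."""
    return intercept + slope * streetlights

observed_crimes = 1.0
residual = observed_crimes - predict(10)    # observed minus predicted

print(round(predict(10), 1), round(residual, 1))
```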

69

True or False: On the least-squares regression line, the point (x̄, ȳ) always has a residual of 0.

True

70

What is the least-squares regression line?

y = 1.2 + 0.3x

71

How do you calculate the correlation coefficient?

0.

72

True or False: All positive correlations indicate stronger relationships than all negative correlations.

False

73

What determines the sign of r?

Whether the value of y increases or decreases as the value of x increases.

74

If all sample data points are on the same line with a positive slope, what would r be?

r would be +1.0.

75

What does an r^2 of 0.85 in a bivariate scatterplot mean?

85% of the variation in y is explained by the changes in x.

76

What is the proper interpretation of the coefficient of determination for data set A: (2,8), (3,6), (4,9), (5,9)?

Thirty percent of the variation in the y-values can be explained by variation of the x-values.
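
The 30% figure above can be reproduced directly from data set A. The sketch below computes r from the sums of squares and cross-products and then squares it; the variable names are illustrative.

```python
from math import sqrt

xs = [2, 3, 4, 5]
ys = [8, 6, 9, 9]

n = len(xs)
x_bar = sum(xs) / n   # 3.5
y_bar = sum(ys) / n   # 8.0

# Sums of squares and cross-products around the means.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))   # 3
sxx = sum((x - x_bar) ** 2 for x in xs)                        # 5
syy = sum((y - y_bar) ** 2 for y in ys)                        # 6

r = sxy / sqrt(sxx * syy)   # about 0.548
r_squared = r ** 2          # 9 / 30 = 0.30

print(round(r, 3), round(r_squared, 2))
```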

77

What is a residual?

It is how much an observed y-value differs from a predicted y-value.

78

What is an outlier?

It usually has a strong effect on the correlation coefficient and regression line and can also be an influential point.

79

True or False: An r of -1.0 proves a strong cause and effect relationship between x and y.

False

80

What should be done with influential points and outliers?

They should be examined carefully to determine if they're part of the data set.

81

What does a linear regression line indicate in terms of CuSO₄ dissolution?

For each one-degree rise in temperature, you can dissolve 0.51 more grams of CuSO₄.

82

What is the explanatory variable in a scatterplot?

The variable along the horizontal axis.

83

Which indicates the strongest relationship between two variables?

r^2 = 0.23.

84

What is the goal of the least-squares regression?

To compute a line that minimizes the sum of the squared residuals.

85

What is the predicted test score for an individual who studies 8 hours a day?

95

86

What is it called when you predict a test score for a student who studied 8 hours, which exceeds the maximum number of hours in the sample?

Extrapolation.

87

What is the marginal distribution for crime rate?

51, 56, 43.

88

What is the conditional distribution of temperature by above average crime rate?

0.12, 0.56, 0.33.

89

Which table shows the aggregation of the data?

Treatment | Remissions | Deaths | Rate
A | 9 | 7 | 0.563
B | 11 | 8 | 0.579
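
The rates in the aggregated table above are remissions divided by total patients (remissions plus deaths). A small sketch, using only the counts shown on the card:

```python
# Aggregated counts from the card: (remissions, deaths) per treatment.
counts = {"A": (9, 7), "B": (11, 8)}

rates = {}
for treatment, (remissions, deaths) in counts.items():
    total = remissions + deaths
    rates[treatment] = remissions / total   # remission rate

print(rates)   # A is 9/16 = 0.5625, B is 11/19, about 0.5789
```

In the combined data Treatment B has the slightly higher rate, which is the aggregated side of the comparison the next cards describe.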

90

What is true about success rates for early and advanced cancers?

When kept separate, Treatment A has a higher success rate, but combined, Treatment B has a higher rate.

91

What is the lurking variable when data for early and advanced cancers are combined?

Stage of cancer (Early or Advanced).

92

Which treatment is actually more effective?

Treatment A.

93

Which situation probably does not have a lurking variable?

None of the above.

94

What is a characteristic of a census?

It gathers data from every member of a population.

95

What might be the treatment in an experimental study on vitamin C and flu recovery?

The amount of vitamin C taken per day: 0 mg, 1000 mg, 2000 mg, or 3000 mg.

96

Why might a count estimated from random samples be more accurate than a census?

A census often can't find every population member, so some groups are often under-represented.

97

What is the best representative sample of the adult population in the United States?

Simple random sample of 1000 adults from across the country.

98

What is a block in research?

A group of subjects that are similar in some way known to affect the response to the treatment.

99

What is replication in research?

The policy of repeating an experiment on different subjects to reduce chance variation.

100

What is double-blind in research?

A design in which neither the experimenter nor the subject knows who is in the treatment group.