data 101 quiz 2


Last updated 7:47 PM on 4/8/26


288 Terms

New cards

A decision tree trained to predict the hired outcome contains the following node:

Experience = Low 800 120 Yes

How many candidates in this node are correctly predicted as hired?

C. 680
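The arithmetic behind this family of cards can be sketched in R. When rpart prints a tree, each node line lists the split, then n (observations reaching the node), then loss (observations misclassified by the node's prediction), then yval (the predicted class). The helper name `node_counts` below is hypothetical and only illustrates the subtraction.

```r
# Hypothetical helper: interpret the "n loss yval" part of an rpart node line.
node_counts <- function(n, loss) {
  c(total = n,             # observations that reach the node
    misclassified = loss,  # true class differs from the predicted yval
    correct = n - loss)    # true class matches the predicted yval
}

# Node "Experience = Low 800 120 Yes": predicted class is Yes (hired).
node_counts(n = 800, loss = 120)
```

With yval = Yes, the `correct` count is the number actually hired and the `misclassified` count is the number actually not hired.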

New cards

A node in an rpart model summarizing applicants with basic education is given as:

Education = Basic 500 50 Yes

How many individuals in this group are actually not hired?

B. 50

New cards

Consider a classification tree node describing candidates with poor skills:

Skills = Poor 900 180 Yes

How many observations correspond to correct predictions of hiring?

B. 720

New cards

A split in a hiring prediction tree isolates candidates with low GPA:

GPA = Low 300 30 Yes

How many cases in this node are misclassified?

A. 30

New cards

A terminal node of the model is defined for applicants with weak portfolios:

Portfolio = Weak 750 75 Yes

How many candidates are correctly predicted as hired?

B. 675

New cards

In a hiring decision tree, the following node appears:

Internship = None 1000 100 Yes

How many applicants in this group are actually hired?

B. 900

New cards

A node summarizing candidates with poor communication skills is shown below:

Communication = Poor 620 62 Yes

How many candidates in this node are not hired?

A. 62

New cards

The model isolates candidates with low project quality in the following node:

ProjectQuality = Low 410 41 Yes

How many observations are correctly classified as hired?

B. 369

New cards

Suppose a node in the tree is defined as follows:

CodingScore = Low 200 20 Yes

How many candidates in this node are misclassified?

B. 20

New cards

A node describing candidates with weak recommendations is given as:

Recommendation = Weak 850 170 Yes

How many candidates are correctly predicted to be hired?

C. 680

New cards

A decision tree node for medium experience applicants is shown below:

Experience = Medium 400 80 Yes

How many candidates in this group are actually hired?

B. 320

New cards

Let the prediction model be defined by the following vector:

prediction <- rep('NotRated', nrow(imdb))

prediction[imdb$Genre == 'Action' & imdb$Imdb_score < 6.0] <- 'PG'

prediction[imdb$Country == 'Italy'] <- 'PG-13'

prediction[imdb$Imdb_score > 8.0] <- 'R'

What rating will be assigned to a US Action movie with imdb score 5.5?

C. PG
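The rule-vector cards all rely on the same idea: assignments run top to bottom, and a later assignment overwrites an earlier one for any row that matches both. A minimal sketch follows, assuming a toy `imdb` data frame with made-up rows (the real dataset is not shown in these cards).

```r
# Toy stand-in for the imdb data frame (hypothetical rows for illustration).
imdb <- data.frame(
  Genre      = c('Action', 'Action', 'Drama'),
  Country    = c('USA', 'Italy', 'USA'),
  Imdb_score = c(5.5, 5.5, 8.5),
  stringsAsFactors = FALSE
)

# Start with the default label, then apply the rules in order.
prediction <- rep('NotRated', nrow(imdb))
prediction[imdb$Genre == 'Action' & imdb$Imdb_score < 6.0] <- 'PG'
prediction[imdb$Country == 'Italy'] <- 'PG-13'   # overwrites 'PG' for Italian rows
prediction[imdb$Imdb_score > 8.0] <- 'R'         # overwrites anything set earlier

prediction
```

To answer a card by hand, walk the rules in order and keep the label from the last rule whose condition the movie satisfies; if none matches, the default from `rep()` stays.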

New cards

Consider a model that assigns movie ratings according to the following rules:

rating <- rep('NR', nrow(imdb))

rating[imdb$Genre == 'Comedy'] <- 'PG'

rating[imdb$Country == 'France'] <- 'PG-13'

rating[imdb$Imdb_score < 5.0] <- 'R'

What rating is assigned to a French Comedy movie with imdb score 6.5?

B. PG-13

New cards

Assume predictions are generated using the following assignment vector:

label <- rep('NotRated', nrow(imdb))

label[imdb$Imdb_score < 6.0] <- 'PG'

label[imdb$Genre == 'Drama'] <- 'R'

label[imdb$Country == 'Italy'] <- 'PG-13'

What rating is predicted for an Italian Drama movie with imdb score 5.5?

C. PG-13

New cards

The following code defines a classification rule for movies:

outcome <- rep('NR', nrow(imdb))

outcome[imdb$Genre == 'Action' & imdb$Imdb_score < 6.5] <- 'PG'

outcome[imdb$Genre == 'Action' & imdb$Imdb_score >= 6.5] <- 'PG-13'

What rating is assigned to a US Action movie with imdb score 6.0?

A. PG

New cards

A prediction vector is constructed as follows:

decision <- rep('NotRated', nrow(imdb))

decision[imdb$Country == 'Italy'] <- 'PG-13'

decision[imdb$Genre == 'Comedy'] <- 'PG'

decision[imdb$Genre == 'Comedy' & imdb$Imdb_score > 7.5] <- 'G'

What rating will be assigned to an Italian Comedy movie with imdb score 8.0?

A. G

New cards

Suppose the following assignment rules are used:

pred <- rep('NR', nrow(imdb))

pred[imdb$Imdb_score > 8.5] <- 'R'

pred[imdb$Country == 'Germany'] <- 'PG'

What rating is predicted for a German Drama movie with imdb score 9.0?

A. PG

New cards

The model below determines the output rating:

result <- rep('NotRated', nrow(imdb))

result[imdb$Genre == 'Action'] <- 'PG'

result[imdb$Country == 'Italy'] <- 'PG-13'

result[imdb$Imdb_score < 5.0] <- 'R'

What rating is assigned to an Italian Action movie with imdb score 4.0?

C. R

New cards

Consider the following rule-based predictor:

assign <- rep('NR', nrow(imdb))

assign[imdb$Genre == 'Comedy'] <- 'PG'

assign[imdb$Imdb_score > 7.0] <- 'PG-13'

assign[imdb$Country == 'Italy'] <- 'R'

What rating is associated to an Italian Comedy movie with imdb score 8.0?

C. R

New cards

Ratings are produced using this vector definition:

classification <- rep('NotRated', nrow(imdb))

classification[imdb$Imdb_score < 6.0] <- 'PG'

classification[imdb$Genre == 'Drama'] <- 'R'

What rating is assigned to a French Drama movie with imdb score 5.0?

B. R

New cards

The following vector specifies the prediction logic:

yhat <- rep('NR', nrow(imdb))

yhat[imdb$Country == 'Italy'] <- 'PG-13'

yhat[imdb$Genre == 'Action'] <- 'PG'

yhat[imdb$Genre == 'Action' & imdb$Imdb_score > 7.5] <- 'PG-13'

What rating is predicted for an Italian Action movie with imdb score 9.0?

B. PG-13

New cards

Let the prediction model be defined by the following vector:

prediction <- rep('NR', nrow(imdb))

prediction[imdb$Genre == 'Action'] <- 'PG'

prediction[imdb$Country == 'Italy'] <- 'PG-13'

What rating is assigned to a US Drama movie with imdb score 7.0?

B. NR

New cards

Consider the following rule-based assignment:

rating <- rep('NotRated', nrow(imdb))

rating[imdb$Genre == 'Comedy'] <- 'PG'

rating[imdb$Country == 'France'] <- 'PG-13'

What rating is assigned to a German Drama movie with imdb score 6.5?

C. NotRated

New cards

Assume predictions are produced using this vector:

label <- rep('NR', nrow(imdb))

label[imdb$Genre == 'Action' & imdb$Imdb_score < 6.0] <- 'PG'

What rating is predicted for a US Drama movie with imdb score 7.5?

B. NR

New cards

The following code defines a rating model:

outcome <- rep('NotRated', nrow(imdb))

outcome[imdb$Genre == 'Horror'] <- 'R'

outcome[imdb$Country == 'Spain'] <- 'PG-13'

What rating is assigned to a US Comedy movie with imdb score 6.0?

A. NotRated

New cards

A classification vector is defined as follows:

result <- rep('NR', nrow(imdb))

result[imdb$Genre == 'Drama' & imdb$Imdb_score > 8.0] <- 'PG-13'

What rating is assigned to a French Comedy movie with imdb score 6.5?

B. NR

New cards

The model uses the following rules:

assign <- rep('NotRated', nrow(imdb))

assign[imdb$Imdb_score > 9.0] <- 'NC-17'

assign[imdb$Country == 'Italy'] <- 'PG-13'

What rating is assigned to a US Drama movie with imdb score 8.0?

C. NotRated

New cards

We run a permutation test with 10,000 permutations. The observed value of the test statistic is D = 6.5. Out of these permutations, 7,000 produce values less than 5, and 500 produce values greater than 8. What are the best bounds on the p-value?

B. 0.05 <= p <= 0.30
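The bounds can be sketched in R, assuming the one-sided p-value is the fraction of permuted values at least as large as the observed D. Every value above a cutoff that lies above D must count toward the p-value (lower bound), and every value below a cutoff that lies below D cannot count (upper bound). The helper name `p_value_bounds` is hypothetical.

```r
# Hypothetical helper: bounds on P(D_perm >= d_obs) from two tail counts.
#   n_perm  : total number of permutations
#   n_below : permutations below a cutoff that is <= d_obs (cannot count)
#   n_above : permutations above a cutoff that is >= d_obs (must count)
p_value_bounds <- function(n_perm, n_below, n_above) {
  c(lower = n_above / n_perm,
    upper = (n_perm - n_below) / n_perm)
}

# 10,000 permutations, 7,000 below 5, 500 above 8, observed D = 6.5.
p_value_bounds(n_perm = 10000, n_below = 7000, n_above = 500)
```

The permutations that fall between the two cutoffs are the unknown part: they may or may not be at least as large as D, which is why only bounds are available.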

New cards

We perform 10,000 permutation tests. The observed value is D = 4.5. Out of these permutations, 5,000 produce values less than 3, and 1,000 produce values greater than 6. What are the best bounds on the p-value?

B. 0.10 <= p <= 0.50

New cards

We run a permutation test with 10,000 permutations and observe D = 10. Out of the permutations, 5,000 produce values less than 6, and 700 produce values greater than 9. What is the best upper bound on the p-value?

A. p <= 0.07

New cards

We run a permutation test with 10,000 permutations and observe D = 8. Out of the permutations, 8,000 produce values less than 6, and 400 produce values greater than 10. What are the best bounds on the p-value?

B. 0.04 <= p <= 0.20

New cards

We run a permutation test with 10,000 permutations and observe D = 2.5. Out of the permutations, 7,200 produce values less than 4, and 600 produce values greater than 7. What is the best lower bound on the p-value?

D. p >= 0.28

New cards

We run a permutation test with 10,000 permutations and observe D = 4. Out of the permutations, 4,500 produce values less than 5, and 500 produce values greater than 9. What is the best lower bound on the p-value?

D. p >= 0.55

New cards

We run a permutation test with 10,000 permutations and observe D = 9. Out of the permutations, 9,000 produce values less than 8, and 300 produce values greater than 10. What are the best bounds on the p-value?

C. 0.03 <= p <= 0.10

New cards

A permutation test returns p = 0.002

What is the minimum number of permutations that must have been run?

A. 500
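A sketch of the reasoning, assuming the p-value is computed as k/N, where k is a whole number of permutations (at least one) out of N. The smallest N is then the first one for which N * p is a whole number; when 1/p is itself whole, that is simply 1/p. The function name `min_permutations` and its search limit are hypothetical.

```r
# Hypothetical helper: smallest N such that p = k / N for a whole number k >= 1.
min_permutations <- function(p, max_n = 1e5, tol = 1e-9) {
  for (n in seq_len(max_n)) {
    k <- n * p
    # k must be at least 1 and (up to rounding error) a whole number
    if (k >= 1 - tol && abs(k - round(k)) < tol) return(n)
  }
  NA_integer_  # no N found within the search limit
}

min_permutations(0.002)
min_permutations(0.0025)
```

Some courses define the permutation p-value as (k + 1)/(N + 1) instead; these cards appear to use the plain k/N convention, which is what the sketch assumes.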

New cards

A randomization test produced a p-value of 0.002.

What is the smallest number of shuffles that could have been used? (Hint: 1/0.002)

B. 500

New cards

A permutation procedure reports probability value = 0.004

What is the minimum number of permutations consistent with that result?

A. 250

New cards

A shuffling-based hypothesis test gives p = 0.001

What is the fewest number of random permutations that must have been generated?

C. 1000

New cards

In a permutation test, the reported p-value is 0.005

What is the smallest possible number of permutation runs?

B. 200

New cards

A randomization experiment returns p = 0.01

What is the minimum number of shuffled samples that must have been used?

C. 100

New cards

A permutation based calculation yields p = 0.0005

What is the least number of permutations required to make that value possible?

C. 2000

New cards

A test based on repeated label shuffling reports p = 0.02

What is the minimum number of permutations that must have been performed?

B. 50

New cards

A permutation method outputs p = 0.0025

What is the smallest number of shuffled trials compatible with that result?

B. 400

New cards

A randomized test reports p = 0.0002

What is the minimum number of permutations that must have been run?

B. 5000

New cards

A permutation test returns p = 0.05

What is the fewest number of permutation samples that could produce this value?

C. 20

New cards

A shuffling test gives p = 0.002

What is the minimum number of random rearrangements needed?

C. 500

New cards

Suppose one group has an average value that is twice as large as another group. When testing whether the first group has a higher mean, what can we conclude?

B. The decision depends on the standard deviation of the difference between the groups
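A small numeric illustration, with made-up means and standard deviations: the same gap between two means can give a large or a small z-score depending on the standard deviation of the difference. The helper name `z_score` is hypothetical.

```r
# Hypothetical helper: standardized difference between two group means.
z_score <- function(mean_a, mean_b, sd_diff) {
  (mean_a - mean_b) / sd_diff
}

# Group A's mean is twice Group B's in both cases (20 vs 10).
z_score(20, 10, sd_diff = 2)    # small variability: large z-score
z_score(20, 10, sd_diff = 20)   # large variability: small z-score
```

In the first case the difference is five standard deviations from zero; in the second it is half a standard deviation, so the ratio of the means alone does not settle the test.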

New cards

If the observed mean of Group A is 1.5 times that of Group B, what determines whether the difference is statistically significant?

C. The z-score of the difference in means

New cards

A dataset shows that one category has an average twice as high as another. What must be checked before rejecting the null hypothesis?

B. The standard deviation associated with the difference in means

New cards

Even if one mean is ten times another, what determines whether we reject the null hypothesis?

B. The variability of the data relative to the difference

New cards

Suppose one group clearly has a larger average than another. What additional factor determines the outcome of a hypothesis test?

A. The standard deviation of the difference between the means

New cards

If the observed difference in means is large, what ultimately determines whether the null hypothesis is rejected?

B. The z-score associated with the difference

New cards

A comparison shows one group has a much higher average than another. What is required to make a statistical decision?

B. Evaluation of how large the difference is relative to its standard deviation

New cards

Even with a strong observed difference between two means, what determines statistical significance?

B. The standardized difference (z-score)

New cards

If the average of one group is twice that of another, what must be considered before concluding that the difference is real?

B. The variability of the difference between groups

New cards

A large difference in means is observed between two groups. What determines whether this difference is statistically meaningful?

B. The standard deviation of the difference between the groups

New cards

Let the observed difference D* between the means of IMDB scores of dramas and comedies be 0.5.

Suppose we run a permutation test 1000 times, and for 60 permutations we obtain a value of D larger than 0.8.

Can you estimate the p-value?

A. p >= 0.06
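Why only a lower bound: every permuted value above a cutoff that is larger than D* is also at least as large as D*, but other permuted values between D* and the cutoff may count too. The sketch below uses simulated numbers as a stand-in for real permuted differences (the normal draws, seed, and spread are arbitrary assumptions).

```r
set.seed(1)                           # arbitrary seed for reproducibility
perm_stats <- rnorm(1000, sd = 0.5)   # stand-in for 1000 permuted differences
d_obs      <- 0.5                     # observed difference D*
threshold  <- 0.8                     # any cutoff above d_obs

p_value     <- mean(perm_stats >= d_obs)      # one-sided permutation p-value
lower_bound <- mean(perm_stats > threshold)   # fraction known to exceed the cutoff

p_value >= lower_bound   # always TRUE when threshold >= d_obs
```

In the card, the fraction above the cutoff is 60/1000 = 0.06, so the p-value is at least 0.06.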

New cards

The observed difference in average ratings is 0.4

After 2000 random permutations, 150 of them produce a difference greater than 0.6

What can we conclude about the p-value?

B. p >= 0.075

New cards

A permutation test has observed difference 1.2

Out of 5000 shuffled datasets, 200 produce a statistic larger than 1.5

Which statement must be true about the p-value?

C. p >= 0.04

New cards

The observed gap between two sample means is 2.5.

In 4000 permutations, 120 generated a value above 3.0.

What is the best guaranteed statement about the p-value?

A. p >= 0.03

New cards

Suppose the observed difference in means equals 0.8.

A permutation procedure is run 10,000 times, and 500 shuffled samples give a value above 1.0.

Which conclusion about the p-value is definitely correct?

B. p >= 0.05

New cards

The observed test statistic is 1.1.

Among 3000 permutations, 90 produce a statistic greater than 1.4.

What must be true about the p-value?

C. p >= 0.03

New cards

An observed mean difference of 4 is reported.

In 20,000 permutations, 600 values exceed 5.

Which lower bound for the p-value is valid?

B. p >= 0.03

New cards

The observed difference between the two groups is 0.25.

Out of 800 permutations, 40 produce a value greater than 0.4.

Which statement is necessarily true about the p-value?

D. p >= 0.05

New cards

A permutation analysis gives an observed statistic of 3.2.

After 5000 permutations, 250 outcomes are larger than 3.8.

What can we safely say about the p-value?

A. p >= 0.05

New cards

The observed difference in averages is 6.

A total of 1000 permutations are run, and 80 of them produce a value above 7.

Which conclusion about the p-value is certain?

B. p >= 0.08

New cards

Suppose the observed statistic is 1.7.

Among 2500 shuffled samples, 125 are larger than 2.1.

What must hold for the p-value?

C. p >= 0.05

New cards

Let the observed difference D* between the mean prices of German cars and American cars be 5000.

Suppose we run a permutation test 10,000 times, and for 1000 permutations we obtain a value of D larger than 8000.

Can you provide the best estimate of the p-value?

A. p >= 0.10

New cards

The observed difference in mean salaries between two cities is 12,000.

A permutation test is run 20,000 times, and 3,000 shuffled datasets produce a difference greater than 15,000.

Which statement about the p-value must be true?

B. p >= 0.15

New cards

Suppose the observed gap between two average rents is 800.

Out of 5,000 permutations, 600 generate a value above 1,000.

What is the best conclusion about the p-value?

C. p >= 0.12

New cards

The observed difference between two average test scores is 18.

After 4,000 permutations, 500 results exceed 22.

Which statement is definitely correct?

C. p >= 0.125

New cards

An observed mean difference of 2.5 is reported.

In 10,000 permutations, 2,000 values are larger than 3.0.

What must be true about the p-value?

D. p >= 0.20

New cards

The observed difference in average prices is 90.

Among 2,000 shuffled samples, 300 produce a value greater than 120.

Which is the strongest valid statement about the p-value?

A. p >= 0.15

New cards

Suppose the observed difference in means is 40.

A permutation experiment with 8,000 runs finds 1,200 outcomes above 55.

What can we conclude about the p-value?

B. p >= 0.15

New cards

The observed gap between two group averages is 0.9.

Out of 1,000 permutations, 200 values exceed 1.3.

Which statement must hold?

C. p >= 0.20

New cards

An observed difference in average spending equals 300.

In 6,000 permutations, 900 are greater than 420.

Which is the best statement about the p-value?

D. p >= 0.15

New cards

The observed difference in means is 7.

A total of 3,000 permutations are run, and 450 produce a value larger than 9.

What is definitely true about the p-value?

B. p >= 0.15

New cards

Suppose the observed difference between two sample means is 25.

After 12,000 random permutations, 1,800 results exceed 30.

Which conclusion is valid?

A. p >= 0.15

New cards

Suppose the observed average IMDB score of dramas is higher than the observed average IMDB score of comedies.

What is the correct null hypothesis?

B. The mean IMDB score of dramas is equal to the mean IMDB score of comedies
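The null hypothesis of equal means is what justifies shuffling the group labels: if genre makes no difference, any relabeling of the scores is equally plausible. A minimal sketch of a one-sided permutation test follows, using made-up scores (the vectors, seed, and the helper `mean_diff` are hypothetical).

```r
set.seed(42)                                            # arbitrary seed
scores <- c(7.1, 6.8, 7.5, 8.0, 6.2, 6.0, 6.9, 5.8)     # hypothetical IMDB scores
genre  <- rep(c('Drama', 'Comedy'), each = 4)           # first four are dramas

# Difference in means: dramas minus comedies.
mean_diff <- function(x, g) mean(x[g == 'Drama']) - mean(x[g == 'Comedy'])

d_obs <- mean_diff(scores, genre)                       # observed difference

# Under the null, labels are exchangeable, so shuffle them and recompute.
perm <- replicate(1000, mean_diff(scores, sample(genre)))

p_value <- mean(perm >= d_obs)   # share of shuffles at least as extreme as observed
p_value
```

A small p-value would suggest the observed gap is hard to explain by random labeling alone, which is evidence against the null of equal means.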

New cards

The sample average salary in City A is higher than the sample average salary in City B.

Which statement is the correct null hypothesis?

A. The population mean salary in City A is the same as the population mean salary in City B

New cards

In the data, students who studied with music have a higher observed average score than students who studied in silence.

What is the proper null hypothesis?

A. The true mean score is equal in the two groups

New cards

The observed average rent is larger in neighborhood X than in neighborhood Y.

Which option states the correct null hypothesis?

B. The true mean rent is the same in both neighborhoods

New cards

Suppose the sample mean blood pressure is greater for Group 1 than for Group 2.

What should the null hypothesis be?

A. The population means are equal

New cards

The observed average waiting time is higher on weekends than on weekdays.

Which of the following is the correct null hypothesis?

C. The mean waiting time is equal on weekends and weekdays

New cards

In a sample, electric cars have a higher average resale value than gas cars.

What is the appropriate null hypothesis?

C. The two population mean resale values are equal

New cards

The observed mean exam score is greater for students in the front row than for students in the back row.

Which statement is the correct null hypothesis?

C. The true mean scores are equal

New cards

The data show a higher average spending amount for customers using coupons than for customers not using coupons.

What should be used as the null hypothesis?

B. The population mean spending amount is equal for the two groups

New cards

Suppose one treatment group shows a higher observed mean improvement than another treatment group.

What is the correct null hypothesis?

A. The treatment means are equal in the population

New cards

The sample average house price is higher in coastal towns than in inland towns.

Which statement is the right null hypothesis?

C. The true mean house prices are equal

New cards

Suppose you run a permutation test with 1000 permutations. Is it possible for the test to return p = 0.0035?

B. No
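The check behind these cards can be sketched in R, assuming the p-value is a count of permutations divided by the total number of permutations. A value is attainable only if p times the number of permutations is a whole number. The helper name `p_is_attainable` is hypothetical.

```r
# Hypothetical helper: can p arise as k / n_perm for a whole number k?
p_is_attainable <- function(p, n_perm, tol = 1e-9) {
  k <- p * n_perm              # implied number of permutations
  abs(k - round(k)) < tol      # whole number, allowing for rounding error
}

p_is_attainable(0.0035, 1000)  # implies 3.5 permutations
p_is_attainable(0.0045, 2000)  # implies 9 permutations
```

The tolerance guards against floating-point noise, since a product such as 0.0045 * 2000 may not come out as exactly 9 in binary arithmetic.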

New cards

A permutation procedure is carried out using 500 shuffled samples.

Can the reported p-value equal 0.003?

B. Such a value is not attainable

New cards

You perform a randomization test with 2000 permutations.

Is it possible for the p-value to be 0.0045?

A. Yes, this value can occur

New cards

A permutation experiment uses 100 samples.

Could the resulting p-value be 0.035?

C. This value cannot be produced

New cards

Suppose 400 permutations are used in a test.

Is a p-value of 0.0075 possible?

B. Yes, it is a valid outcome

New cards

A randomization test is based on 800 permutations.

Can the p-value equal 0.003?

A. No, such a value cannot arise

New cards

A permutation method uses 250 shuffled datasets.

Could the p-value be 0.012?

C. This value is possible

New cards

A test relies on 1500 permutations.

Is it possible to obtain a p-value of 0.0035?

B. No

New cards

Using 600 permutations, can a test produce p = 0.005?

B. This value is achievable

New cards

A permutation test is performed with 300 permutations.

Could the p-value be 0.01?

C. This value is attainable


New cards

A randomization test uses only 40 permutations.

Is a p-value of 0.125 possible?

B. Yes