Unit 5 Data Test Review

15 Terms

1

A photographer stores digital photographs on her computer. In this case, the photographs are considered the data. Each photograph also includes multiple pieces of metadata, including:

Date: The date the photograph was taken

Time: The time the photograph was taken

Location: The location where the photograph was taken

Device: The camera the photograph was taken with

Which of the following could the photographer NOT do based on this metadata?

a. Filter photos to those taken in the last week

b. Filter photos to those taken in a particular country

c. Filter photos to those taken of buildings

d. Filter photos to those taken with her favorite camera

c. Filter photos to those taken of buildings

Explanation: Since the metadata contains nothing about the content of the images themselves, the photographer could not filter photos to those taken of any particular object, including buildings.
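To see why only choice (c) is impossible, here is a minimal sketch that models each photo's metadata as a record and writes the filters for choices (a), (b), and (d). The field names, file names, and values are hypothetical, and the location is simplified to a country name. No field describes what the photo shows, so there is nothing to filter on for "buildings."

```python
from dataclasses import dataclass
from datetime import date, time, timedelta

# Hypothetical metadata record for one photo (the image data itself is not modeled).
@dataclass
class PhotoMetadata:
    filename: str
    taken_on: date
    taken_at: time
    location: str  # simplified to a country name for this sketch
    device: str

photos = [
    PhotoMetadata("img_001.jpg", date(2024, 5, 1), time(8, 15), "France", "Camera A"),
    PhotoMetadata("img_002.jpg", date(2024, 5, 20), time(17, 40), "Japan", "Camera B"),
    PhotoMetadata("img_003.jpg", date(2024, 5, 22), time(9, 5), "France", "Camera B"),
]

today = date(2024, 5, 23)  # fixed "today" so the example is reproducible

# (a) photos taken in the last week
last_week = [p.filename for p in photos if today - p.taken_on <= timedelta(days=7)]
# (b) photos taken in a particular country
in_france = [p.filename for p in photos if p.location == "France"]
# (d) photos taken with her favorite camera
favorite = [p.filename for p in photos if p.device == "Camera B"]

# (c) There is no field describing the content of a photo, so a filter such as
# "photos of buildings" cannot be written from this metadata alone.

print(last_week)  # ['img_002.jpg', 'img_003.jpg']
print(in_france)  # ['img_001.jpg', 'img_003.jpg']
print(favorite)   # ['img_002.jpg', 'img_003.jpg']
```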

2

This question refers to the same collection of photos and their metadata described in the previous question.

Due to a computer error, the Time metadata for all of the photographs is accidentally set to the same time of day. The other pieces of metadata and the data itself are not affected by the error. Which of the following is MOST likely to be the result of this problem?

answer choices

a. The photographer will not be able to view any of the images

b. The photographer will not be able to sort images by the date they were taken

c. The photographer will not be able to filter to photographs taken in the morning

d. The photographer will not be able to find images taken on the same date

c. The photographer will not be able to filter to photographs taken in the morning

Explanation:

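With the Time metadata for every photograph set to the same time of day, the photographer can no longer tell which photos were actually taken at that time and when the others were really taken, so she cannot filter to photographs taken in the morning. The Date metadata and the images themselves are unaffected, so she can still view the photos, sort them by date, and find images taken on the same date.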
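A small sketch of the effect, using hypothetical file names and times: a "taken in the morning" filter works on the original times, but once every time is overwritten with one value, the filter returns either every photo or none (depending on that single value), so it no longer tells the photos apart.

```python
from datetime import time

# Hypothetical times of day before the error (filename -> time the photo was taken).
original_times = {
    "img_001.jpg": time(8, 15),
    "img_002.jpg": time(17, 40),
    "img_003.jpg": time(9, 5),
}

def taken_in_morning(times):
    """Return the filenames whose recorded time is before noon, in sorted order."""
    return sorted(name for name, t in times.items() if t < time(12, 0))

# Simulate the error: every Time value is overwritten with the same time of day.
corrupted_times = {name: time(14, 0) for name in original_times}

print(taken_in_morning(original_times))   # ['img_001.jpg', 'img_003.jpg']
print(taken_in_morning(corrupted_times))  # []
```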

3

The next three questions all refer to the same dataset described below.

A school is conducting a survey of students to learn more about how they get to school. Students were asked how they travel to school, how long it takes them to get there, what time they arrive, and to describe the most significant challenges they face when traveling to school. Several rows of the data collected are shown in the table below.

Which column of data will likely be the most difficult to visualize or analyze?

answer choices

a. How Travel

b. How Long

c. Time Arrive

d. Biggest Challenges

d. Biggest Challenges

Explanation:

The "Biggest Challenges" column has incomplete data, and the data it does contain is not standardized in format. Both of these factors make it difficult to visualize and analyze the data in this column.

4

Which column of data could be visualized or analyzed, but would first need to be cleaned?

answer choices

a. How Travel

b. How Long

c. Time Arrive

d. Biggest Challenges

b. How Long

Explanation: Once a standard format is chosen for this column (for example, a number of minutes), every entry can be converted to that format and then visualized and analyzed.
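Cleaning this column can be sketched as converting each free-form answer into a number of minutes. The survey table itself is not reproduced here, so the raw responses below are hypothetical, and the rule that a bare number means minutes is an assumption. Entries the pattern does not recognize return `None` so a person can review them instead of guessing.

```python
import re

# Hypothetical raw "How Long" responses; the real survey rows are not shown here.
raw_how_long = ["15 minutes", "15 min", "1 hour", "20", "45 mins", "1.5 hours"]

def to_minutes(response):
    """Convert one free-form duration to minutes, or return None if unrecognized."""
    text = response.strip().lower()
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(hours?|hrs?|minutes?|mins?)?", text)
    if match is None:
        return None  # leave unrecognized entries for a human to review
    value = float(match.group(1))
    unit = match.group(2) or "minutes"  # assumption: a bare number means minutes
    return value * 60 if unit.startswith("h") else value

cleaned = [to_minutes(r) for r in raw_how_long]
print(cleaned)  # [15.0, 15.0, 60.0, 20.0, 45.0, 90.0]
```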

5

Which of the following could most likely NOT be determined from the data collected?

answer choices

a. Whether students who are new to the school take longer to get to school than other students

b. The average time it takes students to get to school

c. The most common travel mode students use to get to school

d. Whether students who arrive at school later take longer to get to school

a. Whether students who are new to the school take longer to get to school than other students

Explanation: Since there is no information about how long each student has attended the school, we can't make any claims about how long new students take to get to school.
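For contrast, choices (b), (c), and (d) can each be computed directly from the collected columns. This sketch uses a few hypothetical, already-cleaned rows (travel mode, minutes, arrival time as zero-padded "HH:MM" strings, which compare correctly as text). There is no column for how long a student has attended the school, so no similar computation exists for choice (a).

```python
from collections import Counter
from statistics import mean

# Hypothetical cleaned rows: (how_travel, minutes_to_school, arrival "HH:MM").
rows = [
    ("Bus", 25, "07:50"),
    ("Car", 10, "07:55"),
    ("Walk", 15, "08:05"),
    ("Bus", 30, "08:10"),
]

# (b) average time it takes students to get to school
average_minutes = mean(m for _, m, _ in rows)

# (c) most common travel mode
most_common_mode = Counter(mode for mode, _, _ in rows).most_common(1)[0][0]

# (d) do later arrivals (after 08:00 here) take longer than earlier arrivals?
late_avg = mean(m for _, m, t in rows if t > "08:00")
early_avg = mean(m for _, m, t in rows if t <= "08:00")

print(average_minutes, most_common_mode, late_avg, early_avg)  # 20 Bus 22.5 17.5
```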

6

The chart above shows the frequency of searches for the words "pencil" and "pen" over a 5-year period. Which of the following is the best interpretation of the chart above?

answer choices

a. Pens are a better writing tool than pencils

b. More people use pens than pencils

c. The number of pens sold has increased slowly over time

d. At any point in time there are typically more people searching for the word "pen" than "pencil"

d. At any point in time there are typically more people searching for the word "pen" than "pencil"

Explanation: The chart only shows how often each word was searched. The other options are either opinions or claims about how pens are used or sold, which would require more data than what is shown in the graph.

7

The mayor of a city is interested in learning what goals are most important to the city's residents. Members of her staff visit one of the many neighborhoods in the city and ask 20 residents to fill out a survey. The mayor is concerned that the survey may be biased and may not accurately reflect the overall interests of the city. Which of the following strategies is MOST likely to address concerns about the data being biased or inaccurate?

answer choices

a. Finding the same people surveyed previously to ask more detailed questions

b. Visiting the same neighborhood to collect more survey responses

c. Visiting multiple new neighborhoods to collect more survey responses

d. Having her staff collect the data using an app rather than paper surveys

c. Visiting multiple new neighborhoods to collect more survey responses

Explanation: Compared with the other options, visiting multiple new neighborhoods will allow the mayor to collect a set of responses that is more representative of the city as a whole.
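A toy illustration of why sampling one neighborhood can mislead: with the hypothetical neighborhood names and response counts below, the most common goal in a single neighborhood differs from the most common goal across several. Pooling neighborhoods does not guarantee a representative sample, but it reduces the chance that one area's preferences dominate the result.

```python
from collections import Counter

# Hypothetical "most important goal" responses, grouped by neighborhood.
responses_by_neighborhood = {
    "Riverside": ["parks"] * 15 + ["transit"] * 5,
    "Downtown": ["transit"] * 14 + ["housing"] * 6,
    "Hillcrest": ["housing"] * 12 + ["transit"] * 8,
}

def top_goal(responses):
    """Return the single most frequent response."""
    return Counter(responses).most_common(1)[0][0]

one_neighborhood = responses_by_neighborhood["Riverside"]
many_neighborhoods = [r for group in responses_by_neighborhood.values() for r in group]

print(top_goal(one_neighborhood))    # parks
print(top_goal(many_neighborhoods))  # transit
```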

8

In which of the following situations would parallel systems MOST likely be used to help analyze data?

answer choices

a. Data analysis involving two or more columns of data

b. Data analysis involving both string and numeric data

c. Data analysis involving large datasets

d. Data analysis that could result in two or more different types of visualizations

c. Data analysis involving large datasets

Explanation: Since parallel systems are easily scalable, they lend themselves to handling datasets too big to be processed on one computer.
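The core idea of parallel data processing is split, process each piece independently, then combine. The sketch below shows that pattern with Python's `concurrent.futures` and a hypothetical `parallel_sum` helper. It uses a thread pool only to keep the example simple; in CPython, threads do not speed up CPU-bound work like summing because of the global interpreter lock, so real parallel systems would spread the chunks across processes or separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(data, n_chunks):
    """Split data into at most n_chunks contiguous slices of roughly equal size."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, never zero
    return [data[i:i + size] for i in range(0, len(data), size)]

def parallel_sum(data, workers=4):
    # Each worker sums its own slice; the partial results are then combined.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunk(data, workers)))
    return sum(partial_sums)

values = list(range(1_000_000))
print(parallel_sum(values) == sum(values))  # True
```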

9

The next two questions refer to the following chart.

The chart below has a single point for every breed of dog. Each point shows the maximum weight of that breed and the maximum number of years that dogs of that breed live.

Which of the following conclusions is BEST supported by the scatter plot?

answer choices

a. Dog breeds with a higher maximum weight tend to have shorter maximum lifespans

b. Dog breeds with a higher maximum weight tend to have longer maximum lifespans

c. All dog breeds tend to have the same average lifespan

d. Older dogs weigh more than younger dogs

a. Dog breeds with a higher maximum weight tend to have shorter maximum lifespans

Explanation: Points on the left side of the graph (lighter breeds) tend to be higher (longer maximum lifespans) than points on the right side (heavier breeds). Adding a trend line to the data makes this downward pattern easy to see.
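A trend line is typically the ordinary least-squares line through the points; a negative slope means the y-value tends to fall as the x-value rises. This sketch computes that slope for hypothetical (maximum weight, maximum lifespan) pairs, not the chart's actual data.

```python
# Hypothetical (max weight in lbs, max lifespan in years) pairs; not the chart's data.
breeds = [(10, 16), (25, 15), (50, 13), (80, 12), (120, 10), (160, 8)]

def least_squares_slope(points):
    """Slope of the ordinary least-squares trend line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    return covariance / variance

slope = least_squares_slope(breeds)
print(slope < 0)  # True: heavier breeds tend to have shorter maximum lifespans
```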

10

Which of the following conclusions is BEST supported by the scatter plot?

answer choices

a. Most dog breeds have a maximum weight of 100 lbs or more

b. Most dog breeds have a maximum life span of 10 or fewer years

c. All dog breeds that weigh less than 50 pounds have a maximum lifespan of more than 10 years

d. No dog breeds that weigh more than 150 pounds are expected to live more than 10 years

c. All dog breeds that weigh less than 50 pounds have a maximum lifespan of more than 10 years

Explanation: In the picture below, you can see that among breeds weighing less than 50 pounds, there are no data points below 10 years on the y-axis. All of the other claims are contradicted by the data shown on the graph.
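Each answer choice is a claim that can be checked against the points: "all" and "no" claims fail on a single counterexample, while "most" claims compare a count with half the total. The data below is hypothetical, chosen so that only claim (c) holds, mirroring the pattern the explanation describes; it is not the chart's data.

```python
# Hypothetical (max weight in lbs, max lifespan in years) pairs; not the chart's data.
breeds = [(8, 16), (20, 15), (45, 12), (70, 11), (110, 9), (160, 12)]
half = len(breeds) / 2

# (a) most breeds have a maximum weight of 100 lbs or more
claim_a = sum(weight >= 100 for weight, _ in breeds) > half
# (b) most breeds have a maximum lifespan of 10 or fewer years
claim_b = sum(life <= 10 for _, life in breeds) > half
# (c) every breed under 50 lbs has a maximum lifespan of more than 10 years
claim_c = all(life > 10 for weight, life in breeds if weight < 50)
# (d) no breed over 150 lbs lives more than 10 years
claim_d = all(life <= 10 for weight, life in breeds if weight > 150)

print(claim_a, claim_b, claim_c, claim_d)  # False False True False
```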

11

The histogram chart below includes every state in the United States. Each state is placed in a bucket, or range, based on its area in square miles, and each bar shows how many states fall in that range.

Which of the following conclusions is BEST supported by the histogram chart?

answer choices

a. More than half of all states are larger than 100,000 square miles

b. More than half of all states are smaller than 50,000 square miles

c. Some states are larger than 200,000 square miles

d. The most common size of states is between 50,000 and 100,000 square miles

d. The most common size of states is between 50,000 and 100,000 square miles

Explanation: Histograms show how frequently data falls within each range, or "bucket." In this case, the bucket containing the most states is 50,000-100,000 square miles.
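Building a histogram amounts to assigning each value to a bucket and counting how many values land in each one; the tallest bar is the most common bucket. The 50,000-square-mile bucket width is assumed to match the chart, and the areas below are hypothetical values, not real state areas. Each bucket includes its lower bound, so exactly 100,000 falls in the 100,000-150,000 bucket.

```python
from collections import Counter

BUCKET_WIDTH = 50_000  # square miles per bucket (assumed to match the chart)

def bucket_label(area):
    """Label the bucket for an area; the lower bound is inclusive."""
    low = (area // BUCKET_WIDTH) * BUCKET_WIDTH
    return f"{low:,}-{low + BUCKET_WIDTH:,}"

# Hypothetical areas in square miles; not real state values.
areas = [12_000, 48_000, 56_000, 69_000, 83_000, 97_000, 104_000, 163_000]

counts = Counter(bucket_label(a) for a in areas)
print(counts.most_common(1))  # [('50,000-100,000', 4)]
```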

12

A restaurant is interested in learning about the food preferences of people living near the restaurant and intends to use survey data to help decide which new items to add to the menu. Which of the following is LEAST likely to be part of the process used to analyze the data?

answer choices

a. Cleaning a data visualization to remove unwanted patterns

b. Iteratively creating visualizations to ask and answer new questions

c. Cleaning data to remove inconsistencies

d. Filtering the data to look at the responses from only certain groups

a. Cleaning a data visualization to remove unwanted patterns

Explanation: While cleaning raw data can be an important part of the Data Analysis Process, visualizations are not cleaned or altered; instead, they are interpreted to discover what patterns exist in the data.

13

A researcher is interested in learning more about the different kinds of plants growing in different areas of the state she lives in. The researcher creates an app that allows residents of the state to photograph plants in their area using a smartphone and record the date, time, and location of each photograph. Afterward, the researcher will analyze the data to try to determine where different kinds of plants grow in the state.

Which of the following does this situation best demonstrate?

answer choices

a. Citizen science

b. Crowdfunding

c. Open Data

d. Machine Learning

a. Citizen science

14

A town decides to publish data it has collected about electricity usage around the town. The data is freely available for anyone to use and analyze, in the hope that someone can identify more efficient energy usage strategies.

Which of the following does this situation best demonstrate?

answer choices

a. Citizen science

b. Crowdfunding

c. Open Data

d. Machine Learning

c. Open Data

15

A bank intends to develop an algorithm for making loan decisions using machine learning techniques. The algorithm will be trained using data from past loan decisions made by human bankers.

Which of the following best describes whether this algorithm will include bias?

answer choices

a. The algorithm will not be biased because using machine learning eliminates human biases

b. While the algorithm may be biased, the eventual decision made by the algorithm will not be

c. The algorithm will likely reflect the human biases in the data used to train it

d. Machine learning algorithms cannot be developed using biased data so if there is bias in the data it will be impossible to develop the algorithm

c. The algorithm will likely reflect the human biases in the data used to train it
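A deliberately simplified stand-in for machine learning shows why: a model fitted to past decisions reproduces whatever patterns those decisions contain. The "model" here is just a lookup of historical approval rates, not a real learning algorithm, and the groups, income bands, and decisions are hypothetical. Applicants in both groups have the same income band, yet the model approves group "A" and rejects group "B" because the past bankers did.

```python
from collections import defaultdict

# Hypothetical past decisions by human bankers: (applicant_group, income_band, approved).
history = [
    ("A", "medium", True), ("A", "medium", True), ("A", "medium", True), ("A", "medium", False),
    ("B", "medium", True), ("B", "medium", False), ("B", "medium", False), ("B", "medium", False),
]

def train(records):
    """Simplified 'model': the past approval rate for each (group, income) pair."""
    totals, approvals = defaultdict(int), defaultdict(int)
    for group, income, approved in records:
        totals[(group, income)] += 1
        approvals[(group, income)] += approved
    return {key: approvals[key] / totals[key] for key in totals}

def predict(model, group, income):
    # Approve when past bankers approved at least half of similar applicants.
    return model[(group, income)] >= 0.5

model = train(history)
print(predict(model, "A", "medium"))  # True
print(predict(model, "B", "medium"))  # False
```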