Pt 2 Technology and Values

21 Terms

"flights" is a dataset containing observations of flights from the NYC area in 2013. The attribute "origin" indicates the airport of departure. The attribute "dest" indicates the airport of arrival. The attributes "month" and "day" indicate the date of the flight. The R/dplyr code: flights %>% filter(origin == "EWR") %>% filter(dest == "MSP") returns all of the flights that departed from Newark airport (which has the airport code "EWR") to Minneapolis-St Paul International Airport (airport code "MSP").

True

False

True
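
A minimal sketch of the code above, assuming the dplyr package is installed. The tibble flights_demo is a hypothetical four-row stand-in for the real flights data, with invented values.

```r
library(dplyr)

# Hypothetical miniature stand-in for the flights data (values invented)
flights_demo <- tibble(
  origin = c("EWR", "EWR", "JFK", "LGA"),
  dest   = c("MSP", "ORD", "MSP", "MSP"),
  month  = c(1, 1, 2, 3),
  day    = c(1, 2, 3, 4)
)

# Chained filters: keep flights departing EWR, then keep those arriving at MSP
ewr_to_msp <- flights_demo %>%
  filter(origin == "EWR") %>%
  filter(dest == "MSP")

# Equivalent single call: comma-separated conditions must all be TRUE
ewr_to_msp2 <- flights_demo %>%
  filter(origin == "EWR", dest == "MSP")

print(ewr_to_msp)
```

Both versions keep only the rows where origin is "EWR" and dest is "MSP"; here that is the single flight on day 1.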

In R, running the code 2 = 3 returns:

An error.

A dataset.

FALSE

TRUE

An error.
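
In R, = at the top level is an assignment operator, and a numeric constant cannot be the target of an assignment, so 2 = 3 fails. The sketch below (base R only) evaluates the expression from text so the error can be caught with tryCatch(), then shows the comparison 2 == 3 that was most likely intended.

```r
# Parse "2 = 3" from text and evaluate it, catching the resulting error
result <- tryCatch(
  eval(parse(text = "2 = 3")),
  error = function(e) e
)

is_error <- inherits(result, "error")
print(conditionMessage(result))  # exact wording may differ between R versions

# == compares the two values and returns a logical value instead
comparison <- 2 == 3
print(comparison)
```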

In general, dplyr "verb" functions, such as filter() or summarise():

take the "verb" to be performed to the dataset as the first argument and a dataset as the second argument.

take a comparison as a first argument and a verb as a second argument.

cannot be used with the pipeline operator.

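take a dataset as a first argument, with subsequent arguments indicating the "verb" to be performed to the dataset.

take a dataset as a first argument, with subsequent arguments indicating the "verb" to be performed to the dataset.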
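
A short illustration, assuming dplyr is installed and using a made-up data frame df. Both calls give filter() the dataset as its first argument; the pipe %>% simply supplies that argument from its left-hand side.

```r
library(dplyr)

# Made-up example data
df <- data.frame(x = c(1, 5, 10), y = c("a", "b", "c"))

# Dataset first, then the condition that tells filter() which rows to keep
direct <- filter(df, x > 2)

# The pipe inserts df as the first argument, so this is the same call
piped <- df %>% filter(x > 2)
```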

Deriving attributes refers to:

enhancing financial data by integrating external data sources, such as market data, economic indicators, or industry benchmarks.

creating new attributes or variables based on existing data.

identifying and eliminating duplicate records within the data to avoid double-counting or erroneous analysis.

applying conditions to include or exclude specific data based on predefined criteria or business rules.

creating new attributes or variables based on existing data.
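
In dplyr, attributes are usually derived with mutate(). The sketch below assumes dplyr is installed and uses a hypothetical loans table with invented values; it derives a loan-to-income ratio from two existing columns.

```r
library(dplyr)

# Hypothetical loan records (values invented for illustration)
loans <- tibble(
  loan_amnt  = c(10000, 25000, 5000),
  annual_inc = c(50000, 100000, 40000)
)

# mutate() adds a new column computed from existing ones
loans <- loans %>%
  mutate(loan_to_income = loan_amnt / annual_inc)

print(loans)
```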

In R, factors are useful for sorting rows alphabetically.

False

True

False
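
Character columns already sort alphabetically; a factor is useful when you want a different, meaningful order, because sorting a factor follows the order of its levels. A sketch assuming dplyr is installed, with a made-up size column:

```r
library(dplyr)

sizes <- tibble(size = c("small", "large", "medium", "small"))

# Character column: sorted alphabetically (large, medium, small, small)
alpha <- sizes %>% arrange(size)

# Factor column: sorted by level order (small, small, medium, large)
by_level <- sizes %>%
  mutate(size = factor(size, levels = c("small", "medium", "large"))) %>%
  arrange(size)
```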

Filtering a dataset is primarily useful for:

Combining multiple values into a single value, such as a sum or average.

Performing statistical analysis on the relationships between data items.

Selecting observations based on the value of a data item.

Modifying values or creating new ones.

Selecting observations based on the value of a data item.
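
A small sketch of filtering, assuming dplyr is installed; the trades table is hypothetical. filter() keeps the observations (rows) that meet a condition on a data item and leaves the columns unchanged, unlike summarise() (which combines values) or mutate() (which creates values).

```r
library(dplyr)

# Hypothetical trades (values invented)
trades <- tibble(
  ticker = c("AAA", "BBB", "AAA", "CCC"),
  amount = c(100, 2500, 4000, 50)
)

# Keep only the observations whose amount exceeds 1000
large_trades <- trades %>% filter(amount > 1000)
```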

In R, running the code 1 == 1 returns:

FALSE

An R object called "1" that is equal to 1.

TRUE

An error.

TRUE
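
A few base R comparisons for reference. == returns a logical vector, comparing element by element when given vectors. Exact equality can fail for decimal values because of floating-point rounding, so all.equal() (wrapped in isTRUE()) is the usual tolerant check.

```r
one_equals_one <- 1 == 1          # TRUE
elementwise <- c(1, 2, 3) == 2    # FALSE TRUE FALSE

# Floating-point rounding makes this exact comparison FALSE
exact <- 0.1 + 0.2 == 0.3

# all.equal() compares with a small tolerance; isTRUE() reduces it to TRUE/FALSE
approx <- isTRUE(all.equal(0.1 + 0.2, 0.3))
```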

In R and its packages, you will more often refer to data by location than by name or its characteristics.

True

False

False
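
To contrast the approaches, a sketch assuming dplyr is installed, with a made-up table df. Referring to a column by position depends on the column order; dplyr code normally refers to columns by name, or by characteristics such as their type.

```r
library(dplyr)

df <- tibble(ticker = c("AAA", "BBB"), price = c(10.5, 20.0), volume = c(300, 150))

# By location: the second column (breaks if the columns are reordered)
by_position <- df[[2]]

# By name: robust to column reordering
by_name <- df %>% pull(price)

# By characteristics: every numeric column
numeric_cols <- df %>% select(where(is.numeric))
```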

In the ETL process, the L represents:

loading data into a target system or data warehouse for storage and analysis.

a set of procedures used to extract data from various sources, transform it into a consistent format, and load it into a target system or data warehouse for further analysis and reporting.

transforming data to ensure consistency, accuracy, and compatibility across different sources.

extracting financial data from various source systems.

loading data into a target system or data warehouse for storage and analysis.
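
A toy walk through all three ETL steps in base R. The inline CSV text stands in for a source system and a temporary CSV file stands in for the target system; both are assumptions for illustration, not a real warehouse setup.

```r
# Extract: read raw CSV text (standing in for a source system)
raw_csv <- "date,amount\n2013-01-01,100\n2013-01-02,250\n"
extracted <- read.csv(text = raw_csv)

# Transform: fix types and derive a field in consistent units
transformed <- extracted
transformed$date <- as.Date(transformed$date)
transformed$amount_k <- transformed$amount / 1000

# Load: write to the target (a temporary file standing in for a warehouse)
target <- tempfile(fileext = ".csv")
write.csv(transformed, target, row.names = FALSE)

# Read the target back to confirm what was loaded
loaded <- read.csv(target)
```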

Relational programming can refer to:

Code based on the relational database management system, in which data is organized into tables with rows and columns and relationships between tables are established using keys.

Code that is organized around objects, which encapsulate data and behavior together.

Code in which computations are expressed as the evaluation of mathematical functions. Emphasis is placed on writing pure functions that do not have side effects and are deterministic.

Code based on the constraints of logic, in which the program is asked to identify relations that satisfy the constraints.

Code based on the relational database management system, in which data is organized into tables with rows and columns and relationships between tables are established using keys.

lc %>% select(contains("grade")) This R code will:

Count the number of observations (rows) in lc for which the attribute "grade" is not missing.

Subset observations (rows) from lc for which the attribute "grade" is not missing.

Subset attributes (columns) by name.

Collapses a dataset into a single row of values.

Subset attributes (columns) by name.
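
A sketch of the code above, assuming dplyr is installed. Here lc is a hypothetical stand-in for the loan dataset, with invented columns and values. contains("grade") matches every column whose name includes "grade" (ignoring case by default) and keeps all rows.

```r
library(dplyr)

# Hypothetical stand-in for lc (columns and values invented)
lc <- tibble(
  loan_amnt = c(10000, 5000),
  grade     = c("A", "C"),
  sub_grade = c("A2", "C4"),
  int_rate  = c(7.5, 15.2)
)

# Keep only the columns whose names contain "grade"
grade_cols <- lc %>% select(contains("grade"))
```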

Aggregation involves:

implementing validation checks to ensure the integrity and consistency of financial data, including verifying data types, ranges, and constraints.

identifying and handling outliers that may skew financial analysis.

identifying and eliminating duplicate records within the data to avoid double-counting or erroneous analysis.

summarizing data by aggregating values at various levels such as time periods and organizations.

summarizing data by aggregating values at various levels such as time periods and organizations.
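
A sketch of aggregation by organization and time period, assuming dplyr is installed; the revenue table is hypothetical. group_by() sets the level of aggregation and summarise() collapses each group to one row.

```r
library(dplyr)

# Hypothetical revenue records (values invented)
revenue <- tibble(
  org     = c("A", "A", "B", "B", "B"),
  quarter = c("Q1", "Q2", "Q1", "Q1", "Q2"),
  amount  = c(100, 200, 50, 70, 30)
)

# One row per organization and quarter, with the summed amount
by_org_quarter <- revenue %>%
  group_by(org, quarter) %>%
  summarise(total = sum(amount), .groups = "drop")
```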

An inner join of datasets X and Y will combine:

all observations from Y and matching observations from X, with missing values inserted as necessary.

all matching observations between X and Y.

all observations from X and matching observations from Y, with missing values inserted as necessary.

all observations from X and all observations from Y, with missing values inserted as necessary.

all matching observations between X and Y.
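
A sketch comparing the join types described in the options, assuming dplyr is installed; x and y are made-up tables sharing the key column id.

```r
library(dplyr)

x <- tibble(id = c(1, 2, 3), price = c(10, 20, 30))
y <- tibble(id = c(2, 3, 4), qty   = c(5, 6, 7))

inner <- inner_join(x, y, by = "id")  # only ids present in both: 2 and 3
left  <- left_join(x, y, by = "id")   # all of x; qty is NA for id 1
full  <- full_join(x, y, by = "id")   # all ids from both; NA where no match
```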

Financial data possesses several dimensions or characteristics that define its nature and influence the way it is analyzed and interpreted. The source dimension:

helps interpret and relate financial data to relevant real-world factors and conditions.

refers to the source of the data and its quality, consistency, and reliability.

refers to the level of detail or aggregation in financial data.

refers to the correctness and precision of the data.

refers to the source of the data and its quality, consistency, and reliability.

"flights" is a dataset containing observations of flights from the NYC area in 2013. The attribute "arr_delay" indicates delays to a flight's arrival time, with positive numbers indicating a late flight. The attribute carrier indicates the airline to which the flight belongs. The R/dplyr code: flights %>% group_by(carrier) %>% summarise(mean_arr_delay = mean(arr_delay, na.rm = T)) %>% filter(mean_arr_delay < 0) returns the average delay by airline only for airlines that have late flights on average.

False

True

False
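
The statement is false because filter(mean_arr_delay < 0) keeps airlines whose flights arrive early on average. A sketch with a hypothetical two-airline table (dplyr assumed installed):

```r
library(dplyr)

# Hypothetical flights (values invented); NA is an unrecorded delay
flights_demo <- tibble(
  carrier   = c("AA", "AA", "UA", "UA"),
  arr_delay = c(10, 20, -15, NA)
)

mean_delays <- flights_demo %>%
  group_by(carrier) %>%
  summarise(mean_arr_delay = mean(arr_delay, na.rm = TRUE))
# AA averages 15 (late); UA averages -15 (early)

early <- mean_delays %>% filter(mean_arr_delay < 0)  # what the card's code returns
late  <- mean_delays %>% filter(mean_arr_delay > 0)  # what the statement describes
```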

Commas are a common way to separate attributes in text files.

True

False

True
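
For example, base R's read.csv() splits each line at the commas, treating the first line as attribute names. The CSV text here is made up.

```r
# Made-up comma-separated text: a header line, then one observation per line
csv_text <- "ticker,price,volume\nAAA,10.5,300\nBBB,20,150\n"

prices <- read.csv(text = csv_text)
print(prices)
```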

"flights" is a dataset containing observations of flights from the NYC area in 2013. The attribute "origin" indicates the airport of departure. The attribute "dest" indicates the airport of arrival. The attributes "month" and "day" indicate the date of the flight. The R/dplyr code: flights %>% filter("origin" == "EWR") returns all of the flights that departed from Newark airport (which has the airport code "EWR").

False

True

False
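
The answer is False because "origin" in quotes is a string literal, not the column: the condition compares the text "origin" with "EWR", which is FALSE for every row. A sketch with a hypothetical three-row table (dplyr assumed installed):

```r
library(dplyr)

flights_demo <- tibble(origin = c("EWR", "JFK", "EWR"))

# Quoted: compares two strings, always FALSE, so no rows are returned
quoted <- flights_demo %>% filter("origin" == "EWR")

# Unquoted: refers to the origin column, so the EWR flights are returned
unquoted <- flights_demo %>% filter(origin == "EWR")
```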

Handling errors and exceptions is crucial for dealing with unexpected situations that may occur during data processing.

False

True

False
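
For reference, R's standard mechanism for catching errors is tryCatch(). In this sketch, safe_log() is a hypothetical helper that reports a bad input and returns NA instead of stopping the whole computation.

```r
# Hypothetical helper: log of x, or NA (with a message) if x is invalid
safe_log <- function(x) {
  tryCatch(
    {
      if (!is.numeric(x)) stop("input must be numeric")
      log(x)
    },
    error = function(e) {
      message("Skipping bad input: ", conditionMessage(e))
      NA_real_
    }
  )
}

# The bad input in the middle does not stop the other values being processed
results <- sapply(list(1, "abc", exp(2)), safe_log)
```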

In programming, conditional statements are used to:

Perform mathematical calculations.

Repeat a block of code until a certain condition is met.

Make decisions within code based on certain conditions.

Encapsulate blocks of reusable code that perform specific tasks.

Make decisions within code based on certain conditions.
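
A base R sketch of conditional statements: classify_delay() is a hypothetical function whose if / else if / else chain picks a label depending on the value it receives.

```r
# Hypothetical helper: label an arrival delay in minutes
classify_delay <- function(arr_delay) {
  if (is.na(arr_delay)) {
    "unknown"
  } else if (arr_delay > 0) {
    "late"
  } else {
    "on time or early"
  }
}

labels <- sapply(c(12, -3, NA), classify_delay)
print(labels)
```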

What does the "group_by" function do in R?

Subset attributes (columns) by name.

Changes the dataset so that later functions operate on sets of rows that share an attribute value.

Subset observations (rows) based on values of attributes (columns).

Collapses a dataset into a single row of values.

Changes the dataset so that later functions operate on sets of rows that share an attribute value.
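
A sketch assuming dplyr is installed, with a hypothetical flights table. group_by() alone leaves the rows and columns unchanged and only records the grouping; a later mutate() then computes the mean separately within each carrier.

```r
library(dplyr)

# Hypothetical flights (values invented)
flights_demo <- tibble(
  carrier   = c("AA", "AA", "UA", "UA"),
  arr_delay = c(10, 20, -10, 30)
)

# Same data, now marked as grouped by carrier
grouped <- flights_demo %>% group_by(carrier)

# mean() is evaluated per carrier: AA rows get 15, UA rows get 10
with_group_mean <- grouped %>%
  mutate(carrier_mean = mean(arr_delay)) %>%
  ungroup()
```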

"flights" is a dataset containing observations of flights from the NYC area in 2013. The attribute "arr_delay" indicates delays to a flight's arrival time, with positive numbers indicating a late flight. If the R/dplyr code: flights %>% summarise(mean_arr_delay = mean(arr_delay, na.rm = T)) returns a positive number, then the code flights %>% summarise(mean_arr_delay = mean(arr_delay, na.rm = T)) %>% filter(mean_arr_delay > 0) %>% filter(mean_arr_delay == min(mean_arr_delay)) will return the same result.

True

False

True
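
Why this holds: summarise() without grouping returns a single row. If its value is positive, filter(mean_arr_delay > 0) keeps that row, and the minimum of a single value is the value itself, so the second filter keeps it too. A sketch with invented delays (dplyr assumed installed):

```r
library(dplyr)

# Hypothetical delays (values invented)
flights_demo <- tibble(arr_delay = c(10, -5, 25, NA))

# One row: (10 - 5 + 25) / 3 = 10
overall <- flights_demo %>%
  summarise(mean_arr_delay = mean(arr_delay, na.rm = TRUE))

chained <- overall %>%
  filter(mean_arr_delay > 0) %>%                  # 10 > 0, row kept
  filter(mean_arr_delay == min(mean_arr_delay))   # min of one value is itself
```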