Motivation
The reason the analysis is being performed
Opportunities
New, potentially advantageous occasions that are a motivation for data analytics
Professional Issues and Requirements
Ensuring compliance with regulation, which is a motivation for data analytics
Problem Solving
Solving problems related to risks and issues around clients, which could be a motivation for data analytics
Process and Performance Assessment
Evaluating financial statements for potential material misstatements or risks of material misstatement, which could be a motivation for data analysis
Objective
Should follow from the motivation; the goal needs to be clear
Articulating Questions
Good questions
Role of critical thinking
motivation, objectives, and questions need to be critically evaluated
Data analysis objective
requires knowing why the task is being performed
descriptive analytics
objectives involving understanding something that is currently happening, or has happened
Descriptive Questions
often asked prior to advanced analyses
Descriptive Questions
identify purpose of analysis
break down the purpose into questions
Analyses Used to Answer Descriptive Questions
Frequency measures, measures of location, measures of dispersion, and measures of percentage change
Frequency measures
understand categories of data
Measure of location
reveal the average or typical observation in a data set
measure of dispersion
show how much variance there is among the observations in the data set
measure of percentage change
help compare to prior periods and the percent of the total
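A minimal sketch of these four descriptive measures in Python with pandas; the data and column names are hypothetical:

import pandas as pd

# Hypothetical sales data
sales = pd.DataFrame({
    "Region": ["East", "West", "East", "South", "East"],
    "Amount": [1200.0, 950.0, 1100.0, 700.0, 1300.0],
})

print(sales["Region"].value_counts())                    # frequency measure: counts per category
print(sales["Amount"].mean(), sales["Amount"].median())  # measures of location
print(sales["Amount"].std(), sales["Amount"].var())      # measures of dispersion

prior_total = 4800.0                                     # hypothetical prior-period total
change = (sales["Amount"].sum() - prior_total) / prior_total * 100
print(change)                                            # percentage change vs. prior period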
Diagnostic Questions
build on descriptive questions; explore the data to find the cause of an outcome
What do diagnostic questions look for?
anomalies, correlations, patterns, trends
Diagnostic Analyses
Anomaly detection, correlation, pattern detection, trend analysis
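As a sketch of one of these diagnostic analyses, a correlation between two hypothetical fields in Python with pandas:

import pandas as pd

df = pd.DataFrame({
    "AdvertisingSpend": [10, 12, 9, 15, 14],
    "UnitsSold": [100, 115, 90, 140, 133],
})

# Pearson correlation: values near +1 or -1 suggest a strong linear relationship
print(df["AdvertisingSpend"].corr(df["UnitsSold"]))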
Predictive Questions
what may happen in the future
Use of predictive analyses
financial accountants: use to find trends in sales/expenses
cost accountants: predict costs, create forecasts, and evaluate cost drivers
auditors: identify potential material misstatements
tax accountants: tax planning
Trendlines
Used by predictive analyses; show the underlying relationship of the data (functional relationships and linear functions)
Functional relationships
effect of an independent variable on a dependent variable
linear function
steady increases or decreases over the range of the independent variable
Linear regression
tool for building mathematical and statistical models; explains the relationship between a dependent variable and one or more independent variables (predictive)
Building a predictive model
Variable, Dependent Variable, Independent Variables
Variable
data field used for analysis
Dependent Variable
outcome measure (warranty expenses)
independent variable
variables that influence the dependent variable
Multiple R
correlation coefficient, measures strength of relationship between dependent and independent (-1 to 1)
R Squared
Coefficient of determination, measure of how well the regression line fits the data (closer to 1, the better the regression line fits to the data)
Adjusted R-Squared
Explains how well the regression line fits the data; modifies R-squared (the proportion of the variation in the dependent variable explained by the model) to account for the number of independent variables
standard error
variability of the observed dependent variable values from the values predicted by the model
regression statistics
statistical measures used to evaluate the model
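A minimal sketch producing these regression statistics in Python, assuming the statsmodels library and hypothetical warranty data:

import numpy as np
import statsmodels.api as sm

units = np.array([100, 120, 90, 150, 130, 110])            # independent variable
warranty = np.array([2100, 2500, 1900, 3100, 2700, 2300])  # dependent variable

X = sm.add_constant(units)        # add the intercept term
model = sm.OLS(warranty, X).fit()

print(model.rsquared)             # R Squared (coefficient of determination)
print(model.rsquared_adj)         # Adjusted R-Squared
print(np.sqrt(model.rsquared))    # Multiple R (magnitude of the correlation)
print(np.sqrt(model.mse_resid))   # standard error of the regression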
Data Analysis Plan
focus on the objective
select a data strategy
select an analysis strategy
consider risks
embed controls
Best Data strategy Alternative
The one with the highest overall factor ratings
In Progress Project Plans
data strategy and analysis strategy risks, and data strategy and analysis strategy controls
data strategy risks
data errors, data extraction errors, inaccurate management assumptions and estimates
analysis strategy risks
errors in age group categorization and calculations, erroneous application of bad debt estimation percentages to age group totals, erroneous summing of estimated bad debts across age groups, errors in calculating the amount to use for bad debt expense
data strategy controls
compare data to source documentation, compare to prior period to verify management estimates and assumptions, review customer/policy/procedure changes
analysis strategy controls
verify all classifications and groupings, verify all calculations, verify correct use of analysis technologies
When is data appropriate for analysis?
relevant, available, and characteristics match the analysis method requirements
Internal data
generated within the organization and more easily controlled/verified by organization (sales data, purchase data)
external data
obtained from sources outside the organization and somewhat riskier to use, but can provide insights that internal data alone cannot (weather data, publicly available competitor data)
Measured raw data
data created by a controlled process capturing the value of the data (price, cost, number on hand, weight, etc.); discrete or continuous
ex: InventoryCost, NumberOnHand
non-measured raw data
data created automatically by the computer or by company policy for control; discrete format (identification codes, standard descriptions)
ex: InventoryCode, ProductCategory
Calculated Data
data created when math operations are applied to one or more fields in a particular record (row); discrete or continuous
ex: =UnitCost x NumberOnHand
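A sketch of this calculated column in Python with pandas; the result column name ExtendedCost is hypothetical:

import pandas as pd

inventory = pd.DataFrame({
    "UnitCost": [4.50, 12.00, 7.25],
    "NumberOnHand": [100, 40, 60],
})

# Calculated data: one value computed per record (row)
inventory["ExtendedCost"] = inventory["UnitCost"] * inventory["NumberOnHand"]
print(inventory)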
Objective question
what is a reasonable estimate of the uncollectible receivables in the 2025 year-end outstanding accounts receivable?
Data strategy alternatives example
all credit sales invoices and collections data that pertain to the accounting period
Analysis Strategy Examples
Different methods for determining the age of an outstanding invoice
days outstanding from the invoice date forward
Data Strategy Risk and Controls Example
Missing or incorrectly included invoices in the general ledger or the subsidiary ledger
Possible Risks
human bias, changes in customer behavior, business process changes
Analysis Strategy Risk and Controls Examples
Errors in age group categorization
Identifying Appropriate Data
relevant, available, and the characteristics match the analysis method requirements
data set
collection of data columns and rows available for analysis
statistical measures and tests often require certain data characteristics
Fields
individual columns in a data set are called fields
attributes
the columns if the source of the data was a database
each column describes and represents one unique characteristic
records
rows in a data set from a database which represent the collection of columns that hold the description of a single occurrence of the data set’s purpose
Typical elements in a data set
column = data field = attribute
data set is the entire collection of rows and columns
row = record
Why is considering the source of the data important?
the quality of the data in the fields impacts the quality of the analysis
automatically sequentially assigned field data
can be confident that the data is consistently correct
categorical (nominal) data
labeled or named data that can be sorted into groups according to specific characteristics, do not have a quantitative value
ex: regions, product type, account number, account category, location
Ordinal Data
ordered or ranked categorical data, distance between the categories does not need to be known or equal
ex: survey question asking about customer service on a scale of 1-10
interval data
ordinal data that have equal and constant differences between observations and an arbitrary zero point
ex: temperature, time, credit scores
ratio data
interval data with a natural zero point (a natural zero point means that it is not arbitrary)
ex: economic data (dollars)
Data Risks
non-representative sample selection, outlier data points, dirty data
controls for data risks
verify that the sample is representative
perform a histogram or quartile analysis to identify outliers, then explain the rule used for outlier adjustment or removal
verify the integrity of the data set and clean up dirty data issues
outlier data points
unusual data points compared to rest of data
controlled by visualizing the data to check whether a measure is very different from the rest of the data
dirty data
missing, invalid, duplicate, and incorrectly formatted data
test before the analysis is performed
compare to source documents or test for reasonableness
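A minimal sketch of such dirty-data tests in Python with pandas; the sample data are hypothetical:

import pandas as pd

customers = pd.DataFrame({
    "CustomerID": [101, 102, 102, 104],
    "State": ["NY", None, "ny", "CA"],
})

print(customers.isna().sum())                            # missing values per column
print(customers.duplicated(subset="CustomerID").sum())   # duplicate IDs
print(customers["State"].str.upper().unique())           # surfaces inconsistent formatting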
table names
must be correct, intuitive, and clear
scan tables for incorrect or ambiguous names and rename
problem
incorrect or ambiguous table names make it harder to understand and work with a data set
detect (data profiling)
visually scan tables for incorrect or ambiguous names
correct (ETL)
rename tables
Pattern 3
irrelevant and unreliable data that bloat the data model
scan columns for irrelevant and unreliable data
can be done visually, data dictionary can be a good tool
pattern 4
incorrect and ambiguous column names; column names become variables during data exploration and interpretation
names are important because other people may use the analytical database
visually scan a column’s content
pattern 5
incorrect data types, integral part of column definition because they determine what can and cannot be done with the data in a column
inspect data type
change the data type
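A sketch of inspecting and correcting data types in Python with pandas; the columns are hypothetical:

import pandas as pd

df = pd.DataFrame({
    "InvoiceDate": ["2025-01-15", "2025-02-03"],
    "Amount": ["1200.50", "980.00"],
})

print(df.dtypes)                                       # inspect: both columns load as text (object)
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])  # change to a date type
df["Amount"] = df["Amount"].astype(float)              # change to a numeric type
print(df.dtypes)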
pattern 6
composite and multi-valued columns; each cell should contain one value describing one characteristic
2 or more values in the same cell make analysis difficult
columns that violate the single-valued rule are composite or multi-valued columns and make analysis more complex
detect them by visually scanning; the solution is to split the column
By delimiter
used to split columns
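A sketch of splitting a composite column by delimiter in Python with pandas; FullName is a hypothetical field:

import pandas as pd

df = pd.DataFrame({"FullName": ["Smith, Ann", "Lee, Bo"]})

# Split the composite column on its delimiter into two single-valued columns
df[["LastName", "FirstName"]] = df["FullName"].str.split(", ", expand=True)
print(df)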
pattern 7
wrong value is assigned to one of the entities’ characteristics
helpful to look for outlying values in numeric data
more than 1.5 times the IQR below the first quartile or above the third quartile
identify the root cause and eliminate it, or correct the value in the source data; the value could also be corrected in the analytical database
outlier
falls more than 1.5 times the IQR below the first quartile or above the third quartile
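A minimal sketch of the 1.5 x IQR rule in Python with pandas; the amounts are hypothetical:

import pandas as pd

amounts = pd.Series([210, 225, 198, 240, 2150, 205])

q1, q3 = amounts.quantile(0.25), amounts.quantile(0.75)
iqr = q3 - q1
# Flag values more than 1.5 times the IQR below Q1 or above Q3
outliers = amounts[(amounts < q1 - 1.5 * iqr) | (amounts > q3 + 1.5 * iqr)]
print(outliers)   # flags 2150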
pattern 8
data inconsistency occurs when two or more different representations of the same value are mixed in the same column
distinct values: visually scanning the distinct values of a column
frequencies: values with a low frequency could indicate inconsistent data
Eliminate bad data by identifying the root cause and eliminating/modifying the values in either the source data or the analytical database
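A sketch of profiling distinct values and frequencies in Python with pandas; the values are hypothetical:

import pandas as pd

states = pd.Series(["NY", "New York", "NY", "CA", "NY", "CA"])

print(states.unique())        # distinct values reveal "NY" vs. "New York" in the same column
print(states.value_counts())  # low-frequency values can signal inconsistent data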
Pattern 9
incomplete values
addresses the incompleteness that might make data unusable and unreliable
investigate null values (ETL tools reveal them on a column-by-column basis)
replace
remove
design a consistent schema
pattern 10
invalid values
domain-specific rules that determine whether data are acceptable; can be created for most columns
create and apply validation rules
rely on data profiling
for a mandatory column, stats about null values provided by ETL tools can be used for validation
eliminate the root cause, change the value in the source data, or change the value in the analytical database
Design and Implementation of Validation Rules
actual hours for a service must be positive, and can’t exceed 14
ACTUALHOURSVALID = IF SERVICE.ACTUALTIME > 0 AND SERVICE.ACTUALTIME <= 14, THEN "YES", ELSE "NO"
minimum employee rate is $150, and the maximum employee rate must be lower than $500
RATEVALID = IF EMPLOYEE.RATE >= 150 AND EMPLOYEE.RATE < 500, THEN "YES", ELSE "NO"
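A sketch of the same two validation rules in Python with pandas rather than the formula notation above; the sample data are hypothetical:

import numpy as np
import pandas as pd

service = pd.DataFrame({"ACTUALTIME": [3.5, 0.0, 16.0]})
employee = pd.DataFrame({"RATE": [150, 495, 500]})

# Actual hours must be positive and cannot exceed 14
service["ACTUALHOURSVALID"] = np.where(
    (service["ACTUALTIME"] > 0) & (service["ACTUALTIME"] <= 14), "YES", "NO")

# Rate must be at least $150 and below $500
employee["RATEVALID"] = np.where(
    (employee["RATE"] >= 150) & (employee["RATE"] < 500), "YES", "NO")

print(service)
print(employee)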
Pattern 11
non-intuitive and ambiguous table names
part of data model and data set vocab, so must be correct, intuitive, and clear
scan tables for ambiguous names and rename
check whether each name accurately reflects the table's content
names should be intuitive and avoid spaces, underscores, and special coding
pattern 12
missing primary keys
tables are descriptions of entities, and each instance of an entity should be uniquely identified
the key column must have a unique value for each instance and no nulls
often already in place when the data are extracted
pattern 13
redundant content across columns
data inconsistencies occur when the same data are recorded more than once and changed in one place but not the other (e.g., an email address)
Possible scenarios
overlap, such as an address field that contains state info alongside a separate state field
dependency, which exists when one column’s values are dependent on the values of another column in the same table
perform a column-by-column comparison; delete redundant and dependent columns
pattern 14
find invalid values with intra-table rules
create and apply intra-table rules that identify invalid data
creating validation rules requires in-depth knowledge
Transform Models
search for data issues across tables
ex: data that describe the same entity spread across multiple tables, models with a structure that is hard to understand, models that do not support efficient processing
Pattern 15
data spread across tables
analysis is more challenging when data that describe the same entity are spread across multiple tables
Identify similarly structured tables or tables describing different characteristics of the same entity, and combine the tables
Pattern 16
data models do not comply with principles of dimensional modeling
technique of creating data models with fact tables surrounded by dimension tables
star schemas
determine whether fields belong in the correct fact or dimension tables
Pattern 17
find invalid values with inter-table rules
determine the validity of a column's values based on values in one or more other tables
referential integrity
create and apply inter-table validation rules that identify invalid data; modify invalid values
referential integrity
all values in a foreign key should also exist as values in the corresponding primary key
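A minimal sketch of a referential integrity check in Python with pandas; the tables and keys are hypothetical:

import pandas as pd

customer = pd.DataFrame({"CustomerID": [101, 102, 103]})   # primary key
sales = pd.DataFrame({"InvoiceID": [1, 2, 3],
                      "CustomerID": [101, 104, 102]})      # foreign key

# Every foreign-key value must exist in the corresponding primary key; 104 violates the rule
violations = sales[~sales["CustomerID"].isin(customer["CustomerID"])]
print(violations)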
Star schema
database design used to organize data in a warehouse/data mart, simplifying data analysis and querying
information modeling
process of generating additional knowledge from data that is relevant for analysis purposes
data is the input (raw facts and figures)
algorithms
set of instructions that transform the data into information, which is the output of additional knowledge gained from the data
link the input of facts and the output of useful info
used to calculate depreciation, cost, ratios, and more
can be simple or complex
calculated columns
value is calculated for each cell in the column, integral parts of the table
ex: NetAmount and TransactionSize
Measure
aggregate, or total, that can be used in reports, and thus for analytical purposes
created by algorithms, not integral parts of a table
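A sketch contrasting a calculated column with a measure in Python with pandas, reusing the NetAmount example; the data are hypothetical:

import pandas as pd

sales = pd.DataFrame({
    "Quantity": [2, 5, 1],
    "UnitPrice": [10.0, 4.0, 25.0],
})

# Calculated column: one value per row, stored as an integral part of the table
sales["NetAmount"] = sales["Quantity"] * sales["UnitPrice"]

# Measure: an aggregate computed for reporting, not stored as a table column
total_net_amount = sales["NetAmount"].sum()
print(total_net_amount)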
Data model
shows the structure of a data set
shows the concepts being described, the tables, and the fields used to describe the concepts
Extended Sales Star Schema
shows the extended sales star schema with field names such as brand, country, and gender
fact table describes sales transactions
fields with question marks indicate the information model’s calculated columns and measures