ACIS 2504 Exam 2 (Final)

152 Terms

1
New cards

Motivation

The reason the analysis is being performed

2
New cards

Opportunities

New, potentially advantageous occasions that are a motivation for data analytics

3
New cards

Professional Issues and Requirements

Ensuring compliance with regulation, which is a motivation for data analytics

4
New cards

Problem Solving

Solving problems related to risks and issues around clients, which could be a motivation for data analytics

5
New cards

Process and Performance Assessment

Evaluating financial statements for potential material misstatements or risks of material misstatement, which could be a motivation for data analysis

6
New cards

Objective

Should follow the motivation; the goal needs to be clear

7
New cards

Articulating Questions

Good questions

8
New cards

Role of critical thinking

motivation, objectives, and questions need to be critically evaluated

9
New cards

Data analysis objective

requires knowing why the task is being performed

10
New cards

descriptive analytics

objectives involving understanding something that is currently happening, or has happened

11
New cards

Descriptive Questions

often asked prior to advanced analyses

12
New cards

Descriptive Questions

  1. identify purpose of analysis

  2. break down the purpose into questions

13
New cards

Analyses Used to Answer Descriptive Questions

Frequency measures, measures of location, measure of dispersion, measure of percentage change
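
The four descriptive analyses on this card can be sketched with Python's standard library; the sales figures and regions below are made-up illustration data.

```python
# Descriptive analyses: frequency, location, dispersion, percentage change.
# All values are hypothetical illustration data.
from collections import Counter
import statistics

sales = [1200, 1500, 1500, 1800, 2400, 9000]          # monthly sales (hypothetical)
regions = ["East", "West", "East", "South", "East", "West"]

frequency = Counter(regions)                          # frequency measure: counts per category
location = statistics.mean(sales)                     # measure of location: the average
dispersion = statistics.stdev(sales)                  # measure of dispersion: variability
pct_change = (sales[-1] - sales[0]) / sales[0] * 100  # percentage change vs. an earlier period
```

Frequency measures answer "how many in each category," while the other three summarize the numeric field itself.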

14
New cards

Frequency measures

understand categories of data

15
New cards

Measure of location

reveal average observations in a data set

16
New cards

measure of dispersion

show how much variance there is among the observations in the data set

17
New cards

measure of percentage change

help compare to prior periods and the percent of the total

18
New cards

Diagnostic Questions

build on descriptive questions; explore the data to find the cause of an outcome

19
New cards

What do diagnostic questions look for?

anomalies, correlations, patterns, trends

20
New cards

Diagnostic Analyses

Anomaly detection, correlation, pattern detection, trend analysis

21
New cards

Predictive Questions

what may happen in the future

22
New cards

Use of predictive analyses

financial accountants: use to find trends in sales/expenses

cost accountants: predict costs, create forecasts, and evaluate cost drivers

auditors: identify potential material misstatements

tax accountants: tax planning

23
New cards

Trendlines

Used by predictive analyses, show underlying relationship of data (functional relationship and linear function)

24
New cards

Functional relationships

effect of an independent variable on a dependent variable

25
New cards

linear function

steady increases or decreases over the range of the independent variable

26
New cards

Linear regression

tool for building mathematical and statistical models, explains the relationship between dependent and one or more independent variables (predictive)

27
New cards

Building a predictive model

Variable, Dependent Variable, Independent Variables

28
New cards

Variable

data field used for analysis

29
New cards

Dependent Variable

outcome measure (warranty expenses)

30
New cards

independent variable

variables that influence the dependent variable

31
New cards

Multiple R

correlation coefficient, measures strength of relationship between dependent and independent (-1 to 1)

32
New cards

R Squared

Coefficient of determination, measure of how well the regression line fits the data (closer to 1, the better the regression line fits to the data)

33
New cards

Adjusted R-Squared

Explains how well the regression line fits the data; modifies r-squared given the proportion of the variation in the dependent variable

34
New cards

standard error

variability of the observed dependent variable values from the values predicted by the model

35
New cards

regression statistics

statistical measures used to evaluate the model

36
New cards

Data Analysis Plan

  1. focus on the objective

  2. select a data strategy

  3. select an analysis strategy

  4. consider risks

  5. embed controls

37
New cards

Best Data strategy Alternative

The one with the highest overall factor ratings

38
New cards

In Progress Project Plans

Data strategy risks, analysis strategy risks, data strategy controls, and analysis strategy controls

39
New cards

data strategy risks

data errors, data extraction errors, inaccurate management assumptions and estimates

40
New cards

analysis strategy risks

errors in age group categorization and calculations, erroneous application of bad debt estimation percentages to age group totals, erroneous summing of estimated bad debts across age groups, errors in the calculation of the amount to use for bad debt expense

41
New cards

data strategy controls

compare data to source documentation, compare to prior period to verify management estimates and assumptions, review customer/policy/procedure changes

42
New cards

analysis strategy controls

verify all classifications and groupings, verify all calculations, verify correct use of analysis technologies

43
New cards

When is data appropriate for analysis?

relevant, available, and characteristics match the analysis method requirements

44
New cards

Internal data

generated within the organization and more easily controlled/verified by organization (sales data, purchase data)

45
New cards

external data

obtained from sources outside the organization and somewhat riskier to use; can provide insights that internal data alone cannot provide (weather data, publicly available competitor data)

46
New cards

Measured raw data

data created by a controlled process capturing the value of the data (price, cost, number on hand, weight, etc.), discrete or continuous

ex: InventoryCost, NumberOnHand

47
New cards

non-measured raw data

data created automatically by the computer or by company policy for control, discrete format (identification codes, standard descriptions)

ex: InventoryCode, ProductCategory

48
New cards

Calculated Data

data created when one or more fields in a particular record (row) have any number of math operations, discrete or continuous

ex: =UnitCost x NumberOnHand

49
New cards

Objective question

what is a reasonable estimate of the uncollectible receivables in the 2025 year-end outstanding accounts receivable?

50
New cards

Data strategy alternatives example

all credit sales invoices and collections data that pertain to the accounting period

51
New cards

Analysis Strategy Examples

Different methods for determining the age of an outstanding invoice

  • days outstanding from the invoice date forward

52
New cards

Data Strategy Risk and Controls Example

Missing or incorrectly included invoices in the general or the subsidiary ledger

53
New cards

Possible Risks

human bias, changes in customer behavior, business process changes

54
New cards

Analysis Strategy Risk and Controls Examples

Errors in age group categorization

55
New cards

Identifying Appropriate Data

relevant, available, and the characteristics match the analysis method requirements

56
New cards

data set

collection of data columns and rows available for analysis

  • statistical measures and tests often require certain data characteristics

57
New cards

Fields

individual columns in a data set are called fields

58
New cards

attributes

what the columns are called if the source of the data is a database

  • each column describes and represents one unique characteristic

59
New cards

records

rows in a data set from a database which represent the collection of columns that hold the description of a single occurrence of the data set’s purpose

60
New cards

Typical elements in a data set

  • column = data field = attribute

  • data set is the entire collection of rows and columns

  • row = record

61
New cards

Why is considering the source of the data important?

  • quality of the data in the fields impacts the quality of the analysis

  • automatically sequentially assigned field data

    • can be confident that the data is consistently correct

62
New cards

categorical (nominal) data

labeled or named data that can be sorted into groups according to specific characteristics, do not have a quantitative value

  • ex: regions, product type, account number, account category, location

63
New cards

Ordinal Data

ordered or ranked categorical data, distance between the categories does not need to be known or equal

  • ex: survey question asking about customer service on a scale of 1-10

64
New cards

interval data

ordinal data that have equal and constant differences between observations and an arbitrary zero point

  • ex: temperature, time, credit scores

65
New cards

ratio data

interval data with a natural zero point (a natural zero point means that it is not arbitrary)

  • ex: economic data (dollars)

66
New cards

Data Risks

non-representative sample selection, outlier data points, dirty data

67
New cards

controls risk

  • verify representativeness of the sample,

  • perform a histogram or quartile analysis used to identify outliers, then explain the rule used for outlier adjustment or removal

    • verify integrity of data set and clean up dirty data issues

68
New cards

outlier data points

unusual data points compared to rest of data

  • controlled by visualizing the data to check whether a measure is very different from the rest of the data

69
New cards

dirty data

missing, invalid, duplicate, and incorrectly formatted data

  • test before the analysis performed

  • compare to source documents or test for reasonableness
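
Testing for dirty data before the analysis is performed can be sketched in plain Python; the invoice records below are hypothetical, and the three checks mirror the card's list (missing, duplicate, and incorrectly formatted data).

```python
# Detect missing values, duplicate keys, and badly formatted amounts
# in a small hypothetical set of invoice records.
records = [
    {"invoice": "INV-001", "amount": "1200.00"},
    {"invoice": "INV-002", "amount": None},        # missing value
    {"invoice": "INV-002", "amount": "450.00"},    # duplicate invoice number
    {"invoice": "INV-003", "amount": "1,050"},     # incorrectly formatted amount
]

missing = [r for r in records if r["amount"] is None]

seen, duplicates = set(), []
for r in records:
    if r["invoice"] in seen:
        duplicates.append(r["invoice"])
    seen.add(r["invoice"])

def well_formatted(a):
    # amounts should parse as plain decimals (no thousands separators)
    try:
        float(a)
        return True
    except (TypeError, ValueError):
        return False

bad_format = [r for r in records
              if r["amount"] is not None and not well_formatted(r["amount"])]
```

Each list that comes back non-empty is a dirty-data issue to resolve against source documents before analysis.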

70
New cards

table names

must be correct, intuitive, and clear

  • scan tables for incorrect or ambiguous names and rename

71
New cards

problem

incorrect or ambiguous table names make it harder to understand and work with a data set

72
New cards

detect (data profiling)

visually scan tables for incorrect or ambiguous names

73
New cards

correct (ETL)

rename tables

74
New cards

Pattern 3

irrelevant and unreliable data that bloat the data model

  • scan columns for irrelevant and unreliable data

    • can be done visually, data dictionary can be a good tool

75
New cards

pattern 4

incorrect and ambiguous column names, become variables during data exploration and interpretation

  • names are important because other people may use the analytical database

  • visually scan a column’s content

76
New cards

pattern 5

incorrect data types, integral part of column definition because they determine what can and cannot be done with the data in a column

  • inspect data type

  • change the data type

77
New cards

pattern 6

composite and multi-valued columns, each cell should contain one value describing one characteristics

  • 2 or more values in the same cell make analysis difficult

    • scenarios that violate the single-valued rule, i.e., composite and multi-valued columns, make analysis more complex

    • detect them by visually scanning, solution is to split it

78
New cards

By delimiter

used to split columns
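
A split-by-delimiter can be sketched in plain Python: a composite "City, ST" column becomes two single-valued columns. The values and the ", " delimiter are hypothetical.

```python
# Split a composite column into two single-valued columns by delimiter.
city_state = ["Blacksburg, VA", "Richmond, VA", "Raleigh, NC"]  # hypothetical data

cities, states = [], []
for value in city_state:
    city, state = value.split(", ")   # the delimiter here is ", "
    cities.append(city)
    states.append(state)
```

After the split, each cell holds exactly one value describing one characteristic, which is what analysis requires.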

79
New cards

pattern 7

wrong value is assigned to one of the entities’ characteristics

  • helpful to look for outlying values in numeric data

  • more than 1.5 times the IQR outside the first or third quartile

    • identify the root cause and eliminate it; correct the value in the source data, or correct the value in the analytical database

80
New cards

outlier

falls more than 1.5 times the IQR below the first quartile or above the third quartile
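
The 1.5 × IQR rule can be sketched with the standard library; the data points are hypothetical, and `statistics.quantiles` uses its default (exclusive) quartile method, so exact cutoffs can differ slightly from other tools.

```python
# Flag values more than 1.5 * IQR below Q1 or above Q3.
import statistics

data = [10, 12, 13, 14, 15, 16, 18, 95]          # hypothetical observations
q1, _, q3 = statistics.quantiles(data, n=4)      # first and third quartiles
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low or x > high]
```

Any flagged value should then be investigated and the rule used for its adjustment or removal documented, per the controls card above.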

81
New cards

pattern 8

data inconsistency occurs when two or more different representations of the same value are mixed in the same column

  • distinct values: visually scanning the distinct values of a column

  • frequencies: values with a low frequency could indicate inconsistent data

Eliminate bad data by identifying the root cause and eliminating/modifying the values in either the source data or the analytical database

82
New cards

Pattern 9

incomplete values

  • addresses the incompleteness that might make data unusable and unreliable

  • investigate null values (ETL tools reveal, on a column by column basis)

    • replace

    • remove

    • design a consistent schema

83
New cards

pattern 10

invalid values

  • domain specific rules that determine whether data is acceptable, can be created for most columns

  • create and apply validation rules

    • rely on profiling data

    • mandatory column, stats about null values provided by ETL tools can be used for validation

    eliminate the root cause, change the value in the source, or change the value in the database

84
New cards

Design and Implementation of Validation Rules

  • actual hours for a service must be positive and can’t exceed 14

    • ACTUALHOURSVALID = IF SERVICE.ACTUALTIME > 0 AND SERVICE.ACTUALTIME <= 14, THEN “YES”, ELSE “NO”

  • minimum employee rate is $150, and the maximum employee rate must be lower than $500

    • RATEVALID = IF EMPLOYEE.RATE >= 150 AND EMPLOYEE.RATE < 500, THEN “YES”, ELSE “NO”
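
The two validation rules on this card can be restated as Python functions; the thresholds (positive hours up to 14, rates from $150 up to but not including $500) come straight from the card, while the function names are just illustrative.

```python
# Card 84's validation rules as boolean-style checks returning "YES"/"NO".
def actual_hours_valid(actual_time):
    # actual hours for a service must be positive and can't exceed 14
    return "YES" if 0 < actual_time <= 14 else "NO"

def rate_valid(rate):
    # minimum employee rate is $150; maximum must be lower than $500
    return "YES" if 150 <= rate < 500 else "NO"
```

Applied column-by-column, such rules flag rows whose values violate the domain-specific constraints.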

85
New cards

Pattern 11

non-intuitive and ambiguous table names

  • part of data model and data set vocab, so must be correct, intuitive, and clear

  • scan tables for ambiguous names and rename

    • whether it accurately reflects content

    • intuitive, avoid spaces, underscores, and special coding

86
New cards

pattern 12

missing primary keys

  • tables are descriptions of entities, and each instance of an entity should be uniquely identified

    • column must have a unique value for each instance and no nulls

      • already in place when data is extracted

87
New cards

pattern 13

redundant content across columns

  • data inconsistencies occur when the same data are recorded more than once and changed in one place but not the other (email address)

    • Possible scenarios

      • overlap as an address that contains state info and separate state field

      • dependency, which exists when one column’s values are dependent on the values of another column in the same table

      • perform column by column comparison, delete redundant and dependent columns

88
New cards

pattern 14

find invalid values with intra-table rules

  • create and apply intra-table rules

    • identify invalid data

      • creating validation rules requires in-depth knowledge

89
New cards

Transform Models

search for data issues across tables

  • ex: data that describe the same entity spread across multiple tables, models with a structure that is hard to understand, models that do not support efficient processing

90
New cards

Pattern 15

data spread across tables

  • analysis is more challenging when data that describe the same entity are spread across multiple tables

Identify similarly structured tables, or tables describing different characteristics of the same entity, and combine the tables

91
New cards

Pattern 16

data models do not comply with principles of dimensional modeling

  • technique of creating data models with fact tables surrounded by dimension tables

    • stars schemas

    Determine that facts and dimensions belong to the correct fact and dimension tables

92
New cards

Pattern 17

find invalid values with inter-table rules

  • determine validity of a column’s values based on values in one or more tables

    • referential integrity

create and apply inter-table validation rules that identify invalid data, modify invalid rules

93
New cards

referential integrity

all values in a foreign key should also exist as values in the corresponding primary key
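
Referential integrity can be checked with Python sets: every foreign-key value must exist among the corresponding primary-key values. The customer IDs and sales rows below are hypothetical.

```python
# Check that every Sales foreign key exists as a Customer primary key.
customer_ids = {"C001", "C002", "C003"}               # primary key values (hypothetical)
sales_customer_fk = ["C001", "C003", "C003", "C009"]  # foreign key values (hypothetical)

violations = [fk for fk in sales_customer_fk if fk not in customer_ids]
```

A non-empty `violations` list means some sales rows reference customers that do not exist, i.e., referential integrity is broken.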

94
New cards

Star schema

database used to organize data in a warehouse/data mart, simplifying data analysis and querying

95
New cards

information modeling

process of generating additional knowledge from data that is relevant for analysis purposes

  • data is the input (raw facts and figures)

96
New cards

algorithms

set of instructions that transform the data into information, which is the output of additional knowledge gained from the data

  • link input of facts and the output of useful info

  • used to calculate depreciation, cost, ratios, and more

  • simple and complex

97
New cards

calculated columns

value is calculated for each cell in the column, integral parts of the table

ex: NetAmount and TransactionSize

98
New cards

Measure

aggregate, or total, that can be used in reports, and thus for analytical purposes

  • created by algorithms, not integral parts of a table
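
The distinction between a calculated column (one value per row, part of the table) and a measure (one aggregate over rows, not part of the table) can be sketched in Python. NetAmount is the example from card 97; the transaction rows and the discount field are hypothetical.

```python
# Calculated column vs. measure on a small hypothetical transactions table.
transactions = [
    {"Amount": 100.0, "Discount": 10.0},
    {"Amount": 250.0, "Discount": 0.0},
    {"Amount": 80.0,  "Discount": 5.0},
]

# calculated column: computed cell-by-cell, becomes an integral part of the table
for row in transactions:
    row["NetAmount"] = row["Amount"] - row["Discount"]

# measure: a single aggregate usable in reports, not stored in the table rows
total_net_sales = sum(row["NetAmount"] for row in transactions)
```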

99
New cards

Data model

shows the structure of a data set

  • shows the concepts being described, the tables, and the fields used to describe the concepts

100
New cards

Extended Sales Star Schema

shows the extended sales star schema with field names such as brand, country, and gender

  • fact table describes sales transaction

    • fields with question marks indicate the information model’s calculated columns and measures