CBIS 4120 Chapter 2


55 Terms

1

Data provisioning

the process of providing users and systems with access to data

  • Includes maintaining security authorizations to limit access

  • Provisioning from external sources requires obtaining the sources' permission to acquire and use the data, unless the data are open source (freely available)

2

During the provisioning process data may be

replicated or copied

3

Replication

Ensures that the source data remain intact

  • Can be performed in real time or in batches

  • Essential to consider the impact of replication on the performance of the source systems so as to minimize disruption to the organization's business systems

  • One reason we replicate from the source to the analysis system is that the analysis could burden the source system and slow it down

  • Another reason is to transform the data into a more usable form before we analyze it
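
A minimal sketch of batch replication, assuming SQLite stands in for both the source and the analysis system. The table and column names (sales, amount_cents, amount_dollars) are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical source (transactional) system with one table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount_cents INTEGER)")
source.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 1250), (2, 399)])
source.commit()

# Separate analysis system: analytical queries run here, not on the source.
analysis = sqlite3.connect(":memory:")
analysis.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount_dollars REAL)")

# Batch replication: read once from the source, transform, then load the copy.
rows = source.execute("SELECT id, amount_cents FROM sales").fetchall()
analysis.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(row_id, cents / 100) for row_id, cents in rows],
)
analysis.commit()

# The source rows are untouched; the replica holds the transformed values.
source_count = source.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
replica = analysis.execute(
    "SELECT id, amount_dollars FROM sales ORDER BY id"
).fetchall()
```

The source is read with a single SELECT, so any heavy analysis afterwards touches only the replica.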

4

Structured data:

Are computer readable, highly organized, and searchable

  • Data sourced from databases, spreadsheets, flat files, and other systems with fields, cells, rows, and columns are organized so that a computer can understand them

  • Can be formatted as strings (text), numbers, or dates that a computer can read

5

T/F: The fact that acquired data are structured, however, does not mean they are ready to use for analysis

TRUE

  • Structured data frequently contain errors, redundancies, and omissions

6

Structured data are based on data models that contain _____

metadata (data about the data)

  • provide context, meaning and purpose to data

7

Unstructured data

  • Do not conform to data models and associated metadata

  • Computers have a harder time reading this data

  • Include text files, pictures, audio recordings, webpages, social media content, and videos

8

Spreadsheets

a widely used business tool for storing and manipulating data

  • Stored in rows and columns

  • Problem: because protection of the data in a spreadsheet is limited, users can easily introduce errors into formulas if they are unfamiliar with how a spreadsheet works

  • Lack of input control as well as lack of access control

9

Flat file

contains data in text format with no structured relationship among the data or to other files

  • .csv, ASCII, other delimited files

  • They are more frequently used to transfer data from one location to another

  • Ex: we can download data from one database into a flat file and then use that flat file to upload the data to another database

  • System configuration files are often stored in flat files

  • Amazon SimpleDB
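
The download-then-upload example above can be sketched with Python's standard csv and sqlite3 modules. This is an illustration only: the customers table and its contents are hypothetical, and an in-memory text buffer stands in for the .csv file on disk.

```python
import csv
import io
import sqlite3

# Hypothetical source database.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
src.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

# Download: write the rows to a comma-delimited flat file (a buffer here).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "name"])  # header row
writer.writerows(src.execute("SELECT id, name FROM customers ORDER BY id"))
flat_file_text = buffer.getvalue()

# Upload: read the flat file back and load it into a second database.
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
reader = csv.DictReader(io.StringIO(flat_file_text))
dst.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(int(row["id"]), row["name"]) for row in reader],
)
loaded = dst.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
```

Note that the flat file carries only text: the id column has to be converted back to an integer on upload, because the file itself stores no data types or relationships.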

10

Databases

Organized collections of data that enable users to access, manage, and update the data.

Most popular: Relational database (A collection of tables linked together via relationships)

11

Data model:

The structure of a database

12

In a relational database, the relationships between tables are created through the use of unique identifiers called _____

primary keys

  • Uniquely identifies each row in the table

13

Primary keys are referenced in other tables, where they are called

foreign keys
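
A small sketch of how a primary key in one table is referenced as a foreign key in another, using SQLite through Python's sqlite3 module. The customers and sales tables are hypothetical examples, not taken from the chapter.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# customer_id is the primary key of customers ...
db.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
# ... and appears in sales as a foreign key that links the two tables.
db.execute(
    "CREATE TABLE sales ("
    " sale_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(customer_id),"
    " amount REAL)"
)
db.execute("INSERT INTO customers VALUES (1, 'Ada')")
db.execute("INSERT INTO sales VALUES (100, 1, 25.0)")

# The relationship lets us join each sale to its customer.
joined = db.execute(
    "SELECT s.sale_id, c.name FROM sales s"
    " JOIN customers c ON s.customer_id = c.customer_id"
).fetchall()

# A sale that references a customer who does not exist is rejected.
try:
    db.execute("INSERT INTO sales VALUES (101, 99, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```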

14

Four types of interactions possible in a database

  • Create new records or rows (modify)

  • Read records (has no impact)

  • Update or change (modify)

  • Delete records (modify)

CRUD
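
The four CRUD interactions map directly onto SQL statements. The following is a minimal sketch using SQLite; the products table and its values are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Create: add new rows (modifies the data).
db.execute("INSERT INTO products VALUES (1, 'Widget', 9.99)")
db.execute("INSERT INTO products VALUES (2, 'Gadget', 19.5)")

# Read: query rows (has no impact on the data).
after_create = db.execute(
    "SELECT name, price FROM products ORDER BY id"
).fetchall()

# Update: change an existing row (modifies the data).
db.execute("UPDATE products SET price = 10.49 WHERE id = 1")

# Delete: remove a row (modifies the data).
db.execute("DELETE FROM products WHERE id = 2")

after_changes = db.execute(
    "SELECT name, price FROM products ORDER BY id"
).fetchall()
```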

15

Anomalies

Irregularities that arise when the database is not structured properly

  • a serious problem because they threaten data integrity

16

update anomalies

occur when the same data are stored in multiple places and therefore may or may not update correctly when a data value changes

17

insert anomalies

result when there is no place within the table to store the new data until another event occurs

  • Ex: if the only place to store the customer name and address is in the sales transaction table, a new customer cannot be recorded until a sale occurs

18

delete anomalies

occur when deleting some data results in the unintentional deletion of other data

  • Ex: if we delete a customer who has not purchased from us for a long time, their associated sales records are deleted as well

19

Normalization

the process of decomposing a database table into more tables until the database is no longer susceptible to modification anomalies

  • 1NF, 2NF, 3NF, BCNF (Boyce-Codd normal form)
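
To make the idea concrete, here is a sketch in plain Python of an update anomaly in a single unnormalized table, and of how decomposing it into two tables removes the anomaly. The data structures and values are hypothetical stand-ins for database tables.

```python
# Unnormalized: the customer's address is repeated on every sale row.
sales_flat = [
    {"sale_id": 1, "customer": "Ada", "address": "1 Main St", "amount": 25.0},
    {"sale_id": 2, "customer": "Ada", "address": "1 Main St", "amount": 40.0},
]

# Update anomaly: changing only one copy leaves two conflicting addresses.
sales_flat[0]["address"] = "9 Oak Ave"
addresses_seen = {row["address"] for row in sales_flat if row["customer"] == "Ada"}

# Normalized: customer data live in their own table, keyed by customer_id,
# and each sale refers to the customer through that key.
customers = {1: {"name": "Ada", "address": "1 Main St"}}
sales = [
    {"sale_id": 1, "customer_id": 1, "amount": 25.0},
    {"sale_id": 2, "customer_id": 1, "amount": 40.0},
]

# One update in one place is now seen by every related sale.
customers[1]["address"] = "9 Oak Ave"
normalized_addresses = {customers[s["customer_id"]]["address"] for s in sales}
```

The same decomposition also addresses insert anomalies: a customer can be added to the customers table before any sale exists.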

20

Business systems (transactional systems) are typically normalized up to the ____ normal form

3rd

21

Most well known method for dealing with unstructured data

Tagging

22

tagged data

employ identifiers known as tags that are attached to the data elements to make them readable by a computer

  • Enclosed within <> brackets

23

Hypertext Markup Language (HTML)

uses tags to mark how content is structured within a webpage so that a web browser can process the tags and display the intended content

24

Extensible markup language (XML)

looks very similar to HTML, but it is used to describe data to both humans and computers.

  • a method of tagging or coding data in documents, so that they can be read by both people and computers.

25

T/F: Unstructured data may be un-understandable to a computer in its native form

TRUE

26

XML tags are used to

create metadata about data so that the data can be understood by computers for further processing and structuring.
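
As an illustration of tags acting as metadata, the sketch below parses a small XML fragment with Python's standard xml.etree.ElementTree module. The invoice document and its tag names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged document: the tags describe what each value means.
document = (
    "<invoice>"
    "<customer>Ada</customer>"
    '<total currency="USD">125.50</total>'
    "</invoice>"
)

root = ET.fromstring(document)

# The tag names let a program pick out each value and give it structure.
customer = root.findtext("customer")
total_element = root.find("total")
record = {
    "customer": customer,
    "total": float(total_element.text),
    "currency": total_element.get("currency"),
}
```

Without the tags, "Ada" and "125.50" would just be text; with them, the program knows which value is the customer and which is the total.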

27

extensible business reporting language (XBRL)

  • developed by accounting professionals to facilitate data sharing for reports and analysis

  • Essentially any activity that requires communicating unstructured data to a computer, and that has a structured taxonomy of tags, can use XBRL

  • XBRL then must be stripped of its tags to be used in analysis

  • XBRL converts blocks of text into content that is meaningful to a computer

28

Natural language processing (NLP)

  • People speak, and the computer translates the speech into commands that it can understand

  • Ex: Python package NLTK

  • NLP can be employed by analysts to convert source data into machine-understandable data

  • Often considered a type of AI

29

Image recognition

scans a picture and translates what it ‘sees’ into a textual description of whatever is depicted in the picture

30

transactional systems

store and process the business data required for each of the business's operations

  • designed to process transactions quickly, reliably, and accurately

  • Most transactional systems are based on an underlying relational database

  • Configured in a three-tiered architecture

31

Transactional systems are also called

online transaction processing (OLTP) systems

  • Designed to support high-volume business transactions

32

Transactional systems generally are configured to use a three-tiered architecture that consists of the following components:

  • The user interface or presentation tier (most users will see, use and understand the transactional system only via this interface)

  • The business services tier (also called the business logic, application, middle, or logic tier)

    • Can be used to enforce data rules

  • The data services or data storage tier

The tiers represent the layers of the application and are logical rather than physical

33

Business rules

the logic by which business data are operated on

  • Include workflow, business processes, and user roles

  • Typically reside on a separate server machine

34

The Database management system (DBMS) resides at the

data services or data tier

  • Where data are stored and accessed by the business services tier

35

Characteristics of transactional systems

  • Availability

  • Level of detail

  • Updatable

  • Speed

  • Current

  • Operational

  • Concurrent

  • Support Requirements of business processes

  • Small uniform transactions

  • optimized for storage

  • Data are functionally or process oriented

36

Enterprise resource planning (ERP) systems

  • Integrated transactional systems that enable all the functional areas of a business to share data

37

benefits of an ERP:

  • Transactional data need to be entered only once and then can be shared across all pertinent areas

  • Changes made to master data are entered only once then used many times

    • This is not the case with non-integrated systems

  • The data processing and storage functionality of all the business processes are consolidated in a single system

38

Informational systems

are used to provide a place for data to be stored and prepared for analytical purposes

  • Users can access them to make data-driven decisions

    • Optimized for read-only access and therefore frequently kept separate from transactional systems

39

Informational systems are sometimes referred to as

Online analytical processing (OLAP)

  • Contain large quantity of data that can be from multiple sources

  • Both data mining and analytics may be accomplished via OLAP

40

Characteristics of informational systems

  • level of detail

  • periodic

  • requirements are not always known

  • managerial requirements

  • Optimized for access

  • Historical data

  • data may be integrated

  • availability

41

Out-of-date computer systems are referred to as

legacy systems

42

Web service

an XML-based software system that enables users to access computing resources via a network

43

Simple object access protocol (SOAP)

  • Web services are application components that communicate via open protocols

  • Allow different systems to communicate with each other

  • Ex: SOAP enables a Windows-based system to share data with a Unix system

44

T/F: A key characteristic of web services is that they have no user interface

TRUE

45

Web Crawlers

  • Also known as info agents or web spiders

  • Search websites one page at a time for information

  • Typically ask permission before pulling data from site

  • Also used for web scraping

46

Web Scraping

the process of searching for information on webpages and then stripping the HTML tags so the data can be stored in a structured format

  • Used for marketing purposes

  • Web scraping may be accomplished with the use of site-specific application programming interfaces (APIs)
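
A minimal sketch of the "strip the tags, keep the data" step, using Python's standard html.parser module on a hard-coded page fragment. The CellExtractor class and the sample table are hypothetical; a real scraper would also need to fetch the page and respect the site's terms of use.

```python
from html.parser import HTMLParser


class CellExtractor(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])       # start a new row
        elif tag == "td":
            self._in_cell = True
            self.rows[-1].append("")   # start a new cell in the current row

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data  # keep only text found inside a cell


page = (
    "<table>"
    "<tr><td>Widget</td><td>9.99</td></tr>"
    "<tr><td>Gadget</td><td>19.50</td></tr>"
    "</table>"
)

parser = CellExtractor()
parser.feed(page)
parser.close()
rows = parser.rows  # tag-free data in a row-and-column structure
```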

47

Typically clickstream data are stored as:

  • Semi-structured data (contain both text and structured data stored automatically by the system)

48

Sensor data

gathered from devices such as heating units, vehicles, electrical transformers, satellites, airplanes, health monitors, etc

  • Have applications in many areas of life

  • Going to be ever more important as the IOT becomes more prevalent in our lives

  • Manufacturers use to monitor the health of their products

49

Sampling

the act of extracting only certain data values from a dataset (subset)

  • This approach is employed in situations where a sample of the dataset tells the same story as the entire dataset

50

Sampling is appropriate when

  • The analyst is certain that the sample is representative of the entire set

  • the source is too large for the planned analysis

  • The application specifically calls for a data sample (accounting)
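
A short sketch of simple random sampling with Python's standard random module. The population of 1,000 record IDs, the sample size of 50, and the seed are arbitrary choices made for illustration.

```python
import random

# Hypothetical dataset: 1,000 record IDs.
population = list(range(1, 1001))

# A seeded generator makes the draw repeatable, which helps when the
# sample has to be documented or re-performed.
rng = random.Random(42)

# Simple random sample without replacement: every record is equally likely.
sample = rng.sample(population, k=50)

# Comparing a summary statistic is one rough check of representativeness.
population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
```

Random selection is one common way to support the claim that a sample is representative; it does not guarantee it for any single draw.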

51

Scaling

Standardizes data to a common scale (for example, a mean of 0 and a standard deviation of 1)

  • Necessary when the output of the analysis needs to fall within a range of values
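
Two common ways to scale a column of values, sketched with Python's standard statistics module: z-score standardization and min-max scaling into the range 0 to 1. The input values are made up. Note that both methods change the location and spread of the data, not the shape of its distribution.

```python
from statistics import mean, pstdev

values = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical raw data

# Z-score standardization: subtract the mean, divide by the standard deviation.
mu = mean(values)
sigma = pstdev(values)  # population standard deviation
z_scores = [(v - mu) / sigma for v in values]

# Min-max scaling: map the smallest value to 0 and the largest to 1.
lowest, highest = min(values), max(values)
min_max = [(v - lowest) / (highest - lowest) for v in values]
```

Min-max scaling is the natural choice when results must fall within a fixed range; z-scores are useful when variables measured in different units need to be compared.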

52

Systems that examine events and transactions in real time are called

continuous audit modules or continuous audit tools

53

To collect data from the transactional system, an auditor will sometimes employ an

embedded audit module or EAM

  • examines transactional data to identify abnormalities
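
Conceptually, an embedded audit module is a check that runs alongside normal transaction processing and sets aside anything unusual. The sketch below is a simplified illustration, not an actual EAM product: the audit_hook function, the approval limit, and the transaction fields are all hypothetical.

```python
APPROVAL_LIMIT = 10_000.00  # hypothetical threshold set by the auditor


def audit_hook(transaction, flagged):
    """Record the ID of any transaction that looks abnormal."""
    over_limit = transaction["amount"] > APPROVAL_LIMIT
    unapproved = transaction["approver"] is None
    if over_limit or unapproved:
        flagged.append(transaction["id"])


# Hypothetical stream of transactions passing through the system.
transactions = [
    {"id": "T1", "amount": 250.00, "approver": "jlee"},
    {"id": "T2", "amount": 18_500.00, "approver": "jlee"},
    {"id": "T3", "amount": 75.00, "approver": None},
]

flagged_ids = []
for transaction in transactions:
    # Normal processing would happen here; the hook only observes.
    audit_hook(transaction, flagged_ids)
```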

54

Intelligent control agents

software processes that work autonomously within distributed systems to control or run a system both with and without human intervention

55

Data may be collected automatically through:

Continuous monitoring, feedback mechanisms, control agents