Data Analytics - Chapter 4

0.0(0)
studied byStudied by 0 people
full-widthCall with Kai
GameKnowt Play
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
Card Sorting

1/31

flashcard set

Earn XP

Description and Tags

Master the Data: Preparing Data for Analysis

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced

No study sessions yet.

32 Terms

1
New cards

Database

Structured dataset that can be accessed by many potential authorized users via a computer system/network

2
New cards

The most common/popular database model is ______

Relational

3
New cards

Relational Databases

Breaks the data into separate tables, each containing a unique list of the items stored

4
New cards

Relational Databases are comprised of (2):

1) Fields

2) Record

5
New cards

1) Fields

Variables; the columns that contain descriptive characteristics about the observations in the table

6
New cards

2) Records

The rows, with each observation corresponding to a record, or unique instance, or what is being described in the table

7
New cards

Primary Key

A special name assigned to any field that functions as a unique identifier in a table

  • Ex. Customers Table that contains a unique list of details about each customer; those individual rows have a Primary Key

8
New cards

Foreign Key

Exists to create relationships between two tables so that users of the database can look up details of the observation based on the Primary Key/Foreign Key relationship

  • Ex. Customer Table has a Primary Key for Customer_ID, and in the Transaction Table, there is a Foreign Key for Customer_ID that originates from the Customer Table; it is the same information, but helps to link the two tables together to show how the information is related/important

9
New cards

Are Primary Keys required in every table?

Yes

10
New cards

Are Foreign Keys required in every table?

No

11
New cards

Relational Database Data Dictionary

A multi-table database with additional fields

  • Ex. Attribute Name, Primary/Foreign Key, Field Size

12
New cards

Relational Database Diagram

Communicates the same pieces of information as a data dictionary, but in a visual form

13
New cards

Two Most Popular Methods for Modeling Database Designs

1) Entity-Relationship (ER) Diagram - the one we will be focusing on

2) Unified Modeling Language (UML) Class Diagram

14
New cards

Entity-Relationship (ER) Diagrams

A graphical representation of an information system, illustrating relationships among people, objects, places, and events within that system

15
New cards

As an Accountant, the main usage for Excel should be ___ ___

Data Analysis

16
New cards

Data Integrity

Truth in data; accounting information must be both relevant and a faithful representation

17
New cards

The Three Characteristics of Faithful Representation

1) Free from Error

2) Complete

3) Neutrality

18
New cards

Data should always be…

Up-to-Date to avoid non-faithful information

19
New cards

Four Main Benefits of Storing Data in a Relational Database

1) Preventative internal controls are easier to enforce

  • Mistakes are easier to avoid

2) Security around data entry/table access can aid in creating/enforcing data entry internal controls

  • Can set up table-level security to indicate which employees have read/edit/write permission

3) Reduced redundancy cuts down on errors

  • Confirms there is a unique listing of each observation stored in only one place rather than repeating customer details every purchase (if using only one table)

4) Version control reduces the possibility of having more than one version of the data

  • Maintains data integrity when data is stored in one centralized database that users can connect directly to via Excel, Tableau, etc. rather than multiple desktop databases

20
New cards

The Concept of ETL stands for

Extract, Transform, and Load

21
New cards

The Three Steps of ETL

1) Extracting Data

2) Transforming Data

3) Loading it into the Tool Being Used (ex. Tableau)

22
New cards

The Transformation Step is necessary when…

The data is not formatted in the manner it needs to be for analysis (ex. changing a string of text to geographc)

23
New cards

Structure Query Language (SQL)

A universal database language that can be used to create, update, and delete records and tables in relational databases

  • In this case, the main focus is on combining data from one or more tables/organizing it

24
New cards

The majority of SQL queries created to extract data will be comprised of at least two lines of code:

1) SELECT

2) FROM

25
New cards

Which clause identifies the columns that you would like to extract?

SELECT

26
New cards

What does the command SELECT * denote?

Return every column

27
New cards

Which clause identifies the tables that store the data that you would like to extract?

FROM

28
New cards

To restrict the number of rows returned, add a ____ clause to the query

WHERE

  • Acts as a filter

29
New cards

If you wish to select data from more than one table, extend the FROM clause to contain…

An INNER JOIN

30
New cards

Data can be received from ____ and ____ sources

(a) Internal

(b) External

  • If not publicly available, permission is needed for information

31
New cards

_____ is the best at data visualizations

Tableau

32
New cards

When accessing data, _____ requires that you connect to an external datasource

Tableau