1/31
Master the Data: Preparing Data for Analysis
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
Database
Structured dataset that can be accessed by many potential authorized users via a computer system/network
The most common/popular database model is ______
Relational
Relational Databases
Breaks the data into separate tables, each containing a unique list of the items stored
Relational Databases are comprised of (2):
1) Fields
2) Record
1) Fields
Variables; the columns that contain descriptive characteristics about the observations in the table
2) Records
The rows, with each observation corresponding to a record, or unique instance, or what is being described in the table
Primary Key
A special name assigned to any field that functions as a unique identifier in a table
Ex. Customers Table that contains a unique list of details about each customer; those individual rows have a Primary Key
Foreign Key
Exists to create relationships between two tables so that users of the database can look up details of the observation based on the Primary Key/Foreign Key relationship
Ex. Customer Table has a Primary Key for Customer_ID, and in the Transaction Table, there is a Foreign Key for Customer_ID that originates from the Customer Table; it is the same information, but helps to link the two tables together to show how the information is related/important
Are Primary Keys required in every table?
Yes
Are Foreign Keys required in every table?
No
Relational Database Data Dictionary
A multi-table database with additional fields
Ex. Attribute Name, Primary/Foreign Key, Field Size
Relational Database Diagram
Communicates the same pieces of information as a data dictionary, but in a visual form
Two Most Popular Methods for Modeling Database Designs
1) Entity-Relationship (ER) Diagram - the one we will be focusing on
2) Unified Modeling Language (UML) Class Diagram
Entity-Relationship (ER) Diagrams
A graphical representation of an information system, illustrating relationships among people, objects, places, and events within that system
As an Accountant, the main usage for Excel should be ___ ___
Data Analysis
Data Integrity
Truth in data; accounting information must be both relevant and a faithful representation
The Three Characteristics of Faithful Representation
1) Free from Error
2) Complete
3) Neutrality
Data should always be…
Up-to-Date to avoid non-faithful information
Four Main Benefits of Storing Data in a Relational Database
1) Preventative internal controls are easier to enforce
Mistakes are easier to avoid
2) Security around data entry/table access can aid in creating/enforcing data entry internal controls
Can set up table-level security to indicate which employees have read/edit/write permission
3) Reduced redundancy cuts down on errors
Confirms there is a unique listing of each observation stored in only one place rather than repeating customer details every purchase (if using only one table)
4) Version control reduces the possibility of having more than one version of the data
Maintains data integrity when data is stored in one centralized database that users can connect directly to via Excel, Tableau, etc. rather than multiple desktop databases
The Concept of ETL stands for
Extract, Transform, and Load
The Three Steps of ETL
1) Extracting Data
2) Transforming Data
3) Loading it into the Tool Being Used (ex. Tableau)
The Transformation Step is necessary when…
The data is not formatted in the manner it needs to be for analysis (ex. changing a string of text to geographc)
Structure Query Language (SQL)
A universal database language that can be used to create, update, and delete records and tables in relational databases
In this case, the main focus is on combining data from one or more tables/organizing it
The majority of SQL queries created to extract data will be comprised of at least two lines of code:
1) SELECT
2) FROM
Which clause identifies the columns that you would like to extract?
SELECT
What does the command SELECT * denote?
Return every column
Which clause identifies the tables that store the data that you would like to extract?
FROM
To restrict the number of rows returned, add a ____ clause to the query
WHERE
Acts as a filter
If you wish to select data from more than one table, extend the FROM clause to contain…
An INNER JOIN
Data can be received from ____ and ____ sources
(a) Internal
(b) External
If not publicly available, permission is needed for information
_____ is the best at data visualizations
Tableau
When accessing data, _____ requires that you connect to an external datasource
Tableau