data
observations, symbols, or representations that are recorded
information
data placed into a meaningful context
knowledge
the application of information to achieve a goal
databases
a collection of files that are organized as tables
tables
made up of records (rows)
rows
made up of fields (columns)
fields
made up of characters
flat file database
a single table
relational database
two or more tables linked by relationships; tables are related to each other through a common key field, and the joined tables look like a flat file database
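A minimal sketch of two related tables joined on a common key field, using Python's built-in sqlite3 module. The table names, column names, and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical schema and data: two tables that share the key field customer_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# Join on the common key field; the result reads like a single flat table.
joined_rows = conn.execute("""
    SELECT o.order_id, c.name, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

for row in joined_rows:
    print(row)

conn.close()
```

Each customer name is stored once in customers and reused by every matching order, which is where the reduction in errors and input comes from.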
pros of relational databases
reduces errors and the amount of input required
steps to using a database
understand the business needs
define entities
define properties/fields
define relationships
data quality definition
the degree to which the data is fit for its intended use (does the data make sense?)
data consistency
the same data may be stored in different formats; inconsistent data makes it difficult to combine data into a single file for analysis, so someone has to decide which standard to use
common data issues
redundant data across the organization, some data elements stored in different formats, different naming conventions, different unique identifiers
data agency problem
the data creator is usually not the data consumer; the entire organization/project incurs the cost of bad data, and it must be cleaned up at some point
productivity tax
it takes 10 times as much effort to complete a unit of work when the data is flawed in any way as it does when the data is good; this is the cost of “non-value-added” work
dealing with problematic data (outliers) possible options:
remove it
use average of other data points
guess at the correct values
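The first two options above can be sketched in Python as follows. The valid range (0 to 120) and the sample ages are assumptions made for illustration; in practice the cutoff depends on the data and the business context.

```python
from statistics import mean

def split_outliers(values, low, high):
    """Separate values inside [low, high] from those outside it."""
    kept = [v for v in values if low <= v <= high]
    flagged = [v for v in values if not (low <= v <= high)]
    return kept, flagged

def replace_outliers_with_mean(values, low, high):
    """Replace each out-of-range value with the average of the in-range values."""
    kept, _ = split_outliers(values, low, high)
    fill = mean(kept)
    return [v if low <= v <= high else fill for v in values]

# Hypothetical ages; 380 is almost certainly a data entry error.
ages = [34, 29, 41, 380, 37]

removed, flagged = split_outliers(ages, 0, 120)      # option 1: remove it
imputed = replace_outliers_with_mean(ages, 0, 120)   # option 2: use the average

print(removed)  # [34, 29, 41, 37]
print(imputed)  # [34, 29, 41, 35.25, 37]
```

Removing the value shrinks the data set, while averaging keeps the record but introduces a value that was never observed.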
dealing with problematic data (inconsistent values) possible options:
fix for consistency so analysis can be done (state names, dates, currencies)
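A minimal sketch of fixing inconsistent values, assuming a small hypothetical lookup table for state names and a short list of accepted date formats. It also assumes slash-separated dates are month-first (US style); that is a choice someone has to make.

```python
from datetime import datetime

# Hypothetical lookup table; a real one would cover every state and known variant.
STATE_ABBREVIATIONS = {
    "california": "CA", "calif.": "CA", "ca": "CA",
    "texas": "TX", "tx": "TX",
}

# Accepted input formats, tried in order.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y")

def normalize_state(raw):
    """Map known spellings to a two-letter code; leave unknown values unchanged."""
    cleaned = raw.strip()
    return STATE_ABBREVIATIONS.get(cleaned.lower(), cleaned)

def normalize_date(raw):
    """Return the date as an ISO 8601 string, or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

print(normalize_state(" California "))   # CA
print(normalize_date("03/07/2021"))      # 2021-03-07
print(normalize_date("March 7, 2021"))   # 2021-03-07
```

Values that match no rule are left alone or returned as None so they can be reviewed rather than silently changed.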
connecting diverse data sets: joining
combines tables for analysis, same thing as merging
connecting diverse data sets: appending
enriches existing records by adding value-added information from external sources
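A minimal sketch of appending in the sense defined above: existing records are enriched with fields from an external source, matched on a key. The records, the external lookup, and the field names are all hypothetical.

```python
# Hypothetical internal records and an external source keyed by customer_id.
internal_records = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
external_source = {
    1: {"region": "West"},
    3: {"region": "East"},
}

def append_external(records, lookup, key):
    """Add external fields to each record that has a match; keep the rest as-is."""
    enriched = []
    for record in records:
        extra = lookup.get(record[key], {})
        enriched.append({**record, **extra})
    return enriched

enriched = append_external(internal_records, external_source, "customer_id")
for record in enriched:
    print(record)
```

Unlike a join used to combine tables for analysis, the set of records does not change here; only the matched records gain new fields.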
metadata
data about data, such as title, description, or a data type (not the data itself)
importance of metadata
helps resolve inconsistencies because it can explain why data is organized in a certain way (fields/descriptions)
data lake
a place to store data that has not been organized yet
type of file pivot tables use
flat file with all related data in the same columns, which allows you to aggregate data by column name
pivot table operations
select measures
select the dimensions (columns)
select granularity
select aggregation operation
optional: select filter conditions and create calculated fields
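The operations above can be sketched in plain Python over a flat file represented as a list of rows. The sales data and the pivot helper are hypothetical; spreadsheet tools perform the same grouping and aggregation internally.

```python
from collections import defaultdict

# Hypothetical flat file: every row has the same columns.
sales = [
    {"region": "West", "year": 2023, "product": "A", "revenue": 100.0},
    {"region": "West", "year": 2023, "product": "B", "revenue": 50.0},
    {"region": "East", "year": 2023, "product": "A", "revenue": 70.0},
    {"region": "East", "year": 2024, "product": "A", "revenue": 30.0},
]

def pivot(rows, dimensions, measure, aggregate=sum, keep=lambda row: True):
    """Group rows by the dimension columns and aggregate the measure column."""
    groups = defaultdict(list)
    for row in rows:
        if keep(row):  # optional filter condition
            group_key = tuple(row[d] for d in dimensions)
            groups[group_key].append(row[measure])
    return {key: aggregate(values) for key, values in groups.items()}

# Measure: revenue. Dimension: region. Aggregation: sum.
by_region = pivot(sales, ["region"], "revenue")

# Finer granularity (region and year), filtered to product A only.
by_region_year = pivot(
    sales, ["region", "year"], "revenue", keep=lambda r: r["product"] == "A"
)

print(by_region)       # {('West',): 150.0, ('East',): 100.0}
print(by_region_year)  # {('West', 2023): 100.0, ('East', 2023): 70.0, ('East', 2024): 30.0}
```

Adding a dimension makes the result more granular, and swapping `sum` for `max` or `len` changes the aggregation without touching the grouping.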