what is a database
a related group of files that efficiently and centrally coordinates information
a file is
a related group of records
a record is
a related group of fields
a field is
a specific attribute of interest for the entity
advantages of database
data is integrated
data sharing
minimize data redundancy and inconsistencies
data is independent of the programs that use the data
data is easily accessed for reporting and cross-functional analysis
relational database
represents the conceptual and external schema as if that “data view” were truly stored in one table
in the conceptual view, it appears to the user that this information is in one big table, but it really is a set of tables that relate to one another
update anomaly
if a database isn’t normalized, a piece of information stored in several places has to be updated everywhere it appears in the database
a copy might be missed, leaving inconsistent data
insert anomaly
when we can’t put in a new record, because something doesn’t exist yet
delete anomaly
when deleting a record unintentionally removes other information that was stored only in that record
relational database design rules
every column in a row must be single valued
primary key cannot be null (entity integrity)
if a foreign key is not null, it must have a value that corresponds to the value of a primary key in another table (referential integrity)
all other attributes in the table must describe characteristics of the object identified by the primary key
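The entity integrity and referential integrity rules above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a prescribed design: the customer and invoice tables and their values are hypothetical, and SQLite only checks foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

# Hypothetical tables: every invoice may point at one customer.
conn.execute("CREATE TABLE customer (customer_id TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute(
    "CREATE TABLE invoice ("
    " invoice_id TEXT PRIMARY KEY NOT NULL,"
    " customer_id TEXT REFERENCES customer(customer_id),"
    " amount REAL)"
)
conn.execute("INSERT INTO customer VALUES ('C1', 'Acme')")
conn.execute("INSERT INTO invoice VALUES ('I100', 'C1', 250.0)")  # foreign key matches C1
conn.execute("INSERT INTO invoice VALUES ('I101', NULL, 80.0)")   # null foreign key is allowed


def is_rejected(sql):
    """Return True when the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Entity integrity: a null primary key is refused.
null_key_rejected = is_rejected("INSERT INTO customer VALUES (NULL, 'Nameless')")
# Referential integrity: a foreign key with no matching primary key is refused.
orphan_rejected = is_rejected("INSERT INTO invoice VALUES ('I102', 'C99', 10.0)")
invoice_count = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
```

Both rejected statements leave the tables unchanged, so only the two valid invoices remain.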
data presentation
visualized data is processed faster than written or tabular information
easier to use; users need less guidance to find information with visualized data
supports the dominant learning style of the population because most learners are visual learners
visualization: comparison
used for comparing data across categories or groups; require numeric and categorical data values
visualization: correlation
used to compare how two numeric variables fluctuate with each other
visualization: distribution
used to show spread in numeric values
visualization: trend evaluation
used to show changes over an ordered variable, usually time
visualization: part to whole
used to show which items make up parts of a total
high-quality visualization: simplification
refers to making a visualization easy to interpret and understand
distance: how far apart related data is presented
orientation: change the direction of the entire chart
visualization: emphasis
assuring the most important message is easily identifiable
highlighting: using color, contrasts, labels, arrows, fonts, etc. to draw attention to an item
weighting: amount of attention an element attracts
ordering: intentional arranging to produce emphasis
high-quality visualization: ethical presentation
refers to avoiding the intentional or unintentional use of deceptive practices that can alter the user’s understanding of the data being presented
data deception: graphical depiction of information, designed with or without intent to deceive, that may create a message that varies from the actual message
alternative hypothesis
statement of inequality, suggesting that one concept, idea, or group is related to another concept, idea, or group
categorical data
limited number of assigned values to represent different groups, while numeric data are continuous or near continuous
classification analyses
techniques that identify key characteristics of groups or populations and use them to classify new observations into one of those groups
confirmatory data analysis
testing a hypothesis and providing statistical measures of the likelihood that the evidence refutes or supports a hypothesis
data ordering
refers to the intentional arrangement of visualization items to produce emphasis
data overfitting
the model fits training data very well but does not predict well when applied to other datasets
effect size
quantitative measure of the magnitude of the effect, provides insight into the importance of the relationship
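One common effect size measure is Cohen's d, the difference between two group means divided by their pooled standard deviation. The flashcard does not name a specific measure, so treat this as one illustrative choice; the function name and sample numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized difference between two group means (pooled sample variance)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


d = cohens_d([2, 4, 6], [1, 3, 5])  # means 4 and 3, pooled variance 4
```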
exploratory data analysis
an approach that studies data without testing formal models or hypotheses
extrapolation beyond the range
the process of estimating a value that is beyond the data used to create the model
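A minimal sketch of how a prediction could be flagged as extrapolation: fit a least-squares line, then check whether the new x value falls outside the range of x values used to build the model. The helper names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(x, xs, ys):
    """Return (prediction, extrapolated) where extrapolated is True outside the fitted range."""
    slope, intercept = fit_line(xs, ys)
    extrapolated = x < min(xs) or x > max(xs)
    return slope * x + intercept, extrapolated


xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]
inside = predict(3, xs, ys)    # within the range 1 to 4
outside = predict(10, xs, ys)  # beyond the range, so the estimate deserves extra caution
```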
machine learning
an application of AI that allows computer systems to improve and update prediction by using algorithms and statistical models to analyze and draw inferences from patterns in data
outlier
data point that lies far from other values in the data
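One widely used rule of thumb, assumed here and not taken from the flashcards, flags values more than 1.5 times the interquartile range below the first quartile or above the third quartile. The sample values are hypothetical.

```python
from statistics import quantiles

values = [10, 12, 11, 13, 12, 11, 95]

q1, _, q3 = quantiles(values, n=4)  # quartile cut points
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Keep the values that fall outside the fences.
outliers = [v for v in values if v < lower_fence or v > upper_fence]
```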
simplification
making a visualization easy to interpret and understand
test dataset
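used to evaluate how well a model predicts on data it has not seen; the training dataset is the one used to create the model for future prediction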
analytics mindset
asked the questions and formed some predictions
also performed ETL
descriptive analytics
what happened
understanding how the data has behaved
performance ratios: profitability, turnover
exploratory analysis includes
finding any mistakes
understand the structure of the data
check assumptions for higher level analytics
determine existing relationships in the data
four categories of data analysis
descriptive - what happened
diagnostic - why did this happen
predictive - what is likely to happen in the future
prescriptive - what should be done
what can go wrong with data analytics?
poor data leading to inappropriate conclusions
overfitting the data
extrapolating beyond the range
failing to appreciate the level of variation
automation
refers to the use of machines to perform tasks previously carried out by humans
bot
autonomous computer program that users create with RPA software to perform specific tasks
dark data
data that the organization has collected and stored but is not analyzed and is therefore generally ignored
data lake
collection of structured, semi-structured, and unstructured data stored in a single location
data mart
smaller data repository holding structured data for a subset of an organization; it is often more efficient to process data in these smaller repositories
data storytelling
process of translating complex data analyses into simpler terms to aid in better decision making
data swamps
data repositories that are not accurately documented, so their stored data cannot be properly identified and analyzed
data variety
different forms data can take
data velocity
pace at which data is created and stored
data veracity
quality or trustworthiness of data
delimiter
a character, or series of characters, that separates one field from another
ETL process
process of extracting, transforming, and loading data
flat file
text file that consolidates data from multiple tables or sources into a single row
metadata
data that describes other data, such as the number of characters allowed in different fields, the type of characters allowed, and the format of data in a particular field
robotic process automation
allow users to create autonomous computer programs, called bots, to perform specific tasks across different applications
structured data
data that are highly organized and fit into fixed fields
text qualifier
two characters that indicate the start and end of a field and tell the program to ignore any delimiters contained between the qualifiers
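A small sketch of how a delimiter and a text qualifier work together, using Python's csv module. The sample data is hypothetical: the comma is the delimiter and the double quote is the text qualifier, so the comma inside a quoted name is not treated as a field break.

```python
import csv
import io

raw = 'id,name,city\n1,"Smith, Jane",Provo\n2,"Lee, Sam","Salt Lake City"\n'

# delimiter separates fields; quotechar plays the role of the text qualifier
rows = list(csv.reader(io.StringIO(raw), delimiter=",", quotechar='"'))

# Splitting on the delimiter alone ignores the qualifier and breaks the name apart.
naive_fields = raw.splitlines()[1].split(",")
```

Here `rows[1]` has three fields, while the naive split produces four because it cuts the name at its embedded comma.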
unstructured data
have no uniform structure and include items such as images, audio files, documents, tweets, e-mails, videos, and presentations
analytics mindset
the ability to visualize, articulate, conceptualize, or solve both complex and simple problems by making decisions that are sensible given the available information and ability to identify trends through analysis of the data/information
ask the right questions
extract, transform and load relevant data
apply appropriate data analytics techniques
interpret and share the results with stakeholders
data analytics is _____
forward looking
what will happen next that can be improved
extracting data - 3 step process
understand the data needs and available data
perform the data extraction
verify the data extraction and document what you’ve done
data dictionary includes
metadata, which is data about data
examine the data dictionary before you start to process your data
make sure you understand what is in the data
ETL process
most people say this takes up to 80% of the time when performing data analysis
extracting, transforming, loading
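The three ETL steps can be sketched end to end with the standard library. This is a toy illustration under stated assumptions: the CSV text, column names, and invoice table are all hypothetical, and real extractions usually read from files or source systems.

```python
import csv
import io
import sqlite3

raw = "invoice_id,amount,region\n 100 ,250.00,west\n101,80.5,East\n"

# Extract: read the delimited text into dictionaries keyed by column name.
records = list(csv.DictReader(io.StringIO(raw)))

# Transform: trim whitespace, cast types, and standardize capitalization.
clean = [
    (int(r["invoice_id"].strip()), float(r["amount"]), r["region"].strip().title())
    for r in records
]

# Load: write the cleaned rows into a database table ready for analysis.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, amount REAL, region TEXT)")
conn.executemany("INSERT INTO invoice VALUES (?, ?, ?)", clean)
total = conn.execute("SELECT SUM(amount) FROM invoice").fetchone()[0]
```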
data structuring
process of changing the organization and relationship among data fields; rearranging for analysis
aggregate
joining
pivoting
aggregate
summary with fewer details than original
joining
bringing data from different tables together
pivoting
rotating data from rows to columns
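The three structuring operations above (aggregating, joining, pivoting) can be sketched with plain Python dictionaries. The sales rows and customer names are hypothetical; in practice a tool such as a database or spreadsheet would usually do this work.

```python
from collections import defaultdict

sales = [
    {"customer_id": 1, "quarter": "Q1", "amount": 100},
    {"customer_id": 1, "quarter": "Q2", "amount": 150},
    {"customer_id": 2, "quarter": "Q1", "amount": 200},
    {"customer_id": 2, "quarter": "Q1", "amount": 50},
]
customers = {1: "Acme", 2: "Birch"}

# Joining: bring the customer name from the second table into each sales row.
joined = [{**row, "name": customers[row["customer_id"]]} for row in sales]

# Aggregating: one total per customer, with fewer details than the original rows.
totals = defaultdict(int)
for row in joined:
    totals[row["name"]] += row["amount"]

# Pivoting: quarter values that were stored in rows become columns per customer.
pivot = defaultdict(dict)
for row in joined:
    by_quarter = pivot[row["name"]]
    by_quarter[row["quarter"]] = by_quarter.get(row["quarter"], 0) + row["amount"]
```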
data standardization
standardizing structure and meaning of each data element so it can be analyzed and used
ensure data uses consistent syntax throughout
parsing
concatenation
cryptic data
misfielded data
parsing
separating data from a single field into multiple fields
concatenation
combining data from multiple fields into one
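Parsing and concatenation are opposites, which a short sketch with hypothetical name and address values can show.

```python
# Parsing: one "Last, First" field becomes two separate fields.
full_name = "Smith, Jane"
last_name, first_name = [part.strip() for part in full_name.split(",", 1)]

# Concatenation: three address fields are combined into one field.
street, city, state = "12 Main St", "Provo", "UT"
address = f"{street}, {city}, {state}"
```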
cryptic data
data items that have no apparent meaning without a coding scheme to decode them
dummy variables
variables that contain only two possible values, usually 0 or 1
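As a small illustration, a categorical field can be turned into a dummy variable that records whether each row belongs to one group. The region values are hypothetical.

```python
regions = ["west", "east", "west", "north"]

# 1 when the row is in the west region, 0 otherwise
is_west = [1 if region == "west" else 0 for region in regions]
```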
misfielded data
correctly formatted but in wrong field
data cleaning
process of updating data to be consistent, accurate and complete
data de-duplication
process of analyzing data and removing two or more records that contain identical information
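A minimal sketch of de-duplication, assuming that "identical information" means every field matches exactly. The vendor records are hypothetical; near-duplicates with small spelling differences would need fuzzier matching than this.

```python
records = [
    ("V001", "Acme Supply", "Provo"),
    ("V002", "Birch Tools", "Ogden"),
    ("V001", "Acme Supply", "Provo"),  # exact duplicate of the first record
]

seen = set()
deduplicated = []
for record in records:
    if record not in seen:  # keep only the first occurrence of each record
        seen.add(record)
        deduplicated.append(record)
```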
data filtering
process of removing records or fields of information from a data source
data imputation
process of replacing a null or missing value with a substituted value
only works with numeric data
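One simple imputation approach, chosen here only for illustration, replaces each missing value with the mean of the observed values. The numbers are hypothetical, and None stands in for a null.

```python
from statistics import mean

values = [10.0, None, 14.0, None, 12.0]

# Average only the values that are actually present.
observed_mean = mean(v for v in values if v is not None)

# Substitute the mean wherever a value is missing.
imputed = [observed_mean if v is None else v for v in values]
```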
data contradiction errors
errors that exist when the same entity is described in two conflicting ways
need to be investigated and resolved appropriately
data threshold violations
data errors that occur when a data value falls outside an allowable level
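A threshold check can be as simple as comparing each value with an allowable range. In this sketch the field (weekly hours worked) and its limits of 0 to 168 are hypothetical assumptions.

```python
MIN_HOURS, MAX_HOURS = 0, 168  # assumed allowable range: hours in one week

weekly_hours = {"E01": 40, "E02": 172, "E03": -3, "E04": 38}

# Collect every record whose value falls outside the allowable level.
violations = {
    employee: hours
    for employee, hours in weekly_hours.items()
    if not MIN_HOURS <= hours <= MAX_HOURS
}
```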
violated attribute dependencies
errors that occur when a secondary attribute in a row of data does not match the primary attribute
data entry errors
are all types of errors that come from inputting data incorrectly
often occur in human data entry and can also be introduced by the computer system
data validation
process of analyzing data to make certain the data has the properties of high-quality data
it should happen throughout data transformation
visual inspection
the process of examining data using human vision to see if there are problems
basic statistical tests
can be performed to validate the data
audit a sample
one of the best techniques for assuring data quality
advanced testing techniques
possible with a deeper understanding of the content of data
data consistency
every value in a field should be stored in the same way
regular expression (regex)
sequence of characters that specifies a search pattern
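A regular expression can support a data consistency check by finding values that are not stored in the expected format. This sketch assumes, purely for illustration, that dates should be stored as four digits, two digits, two digits separated by hyphens; the sample values are hypothetical.

```python
import re

# Pattern: 4 digits, hyphen, 2 digits, hyphen, 2 digits (for example 2023-01-15)
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

dates = ["2023-01-15", "01/15/2023", "2023-1-5", "2023-12-31"]

# fullmatch requires the whole value to fit the pattern, not just part of it.
inconsistent = [d for d in dates if not DATE_PATTERN.fullmatch(d)]
```

The pattern checks format only, so a value such as 2023-99-99 would still pass and would need a separate validity check.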