____ ____ language is used to define the structure of the database
data definition language
_____ _____ language is used to add, remove, retrieve, and change data in the database
data manipulation language
the DDL command _____ creates new tables and databases
CREATE
the DDL command _____ is used to change the structure of existing databases and tables
ALTER
the DDL command ______ is used to delete entire tables and databases
DROP
the DML command _____ is used to retrieve information from a database
SELECT
the DML command ______ is used to add new information to a database
INSERT
the DML command _____ is used to modify information in a database
UPDATE
the DML command _____ is used to delete records from a database
DELETE
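A minimal sketch of the DDL and DML commands above, run against an in-memory SQLite database through Python's `sqlite3` module. The `students` table and its columns are made up for illustration, and other database systems may differ in syntax details.

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define and change structure.
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("ALTER TABLE students ADD COLUMN grade INTEGER")

# DML: add, change, remove, and retrieve data.
cur.execute("INSERT INTO students (name, grade) VALUES ('Ana', 90)")
cur.execute("INSERT INTO students (name, grade) VALUES ('Ben', 75)")
cur.execute("UPDATE students SET grade = 80 WHERE name = 'Ben'")
cur.execute("DELETE FROM students WHERE name = 'Ana'")
rows = cur.execute("SELECT name, grade FROM students").fetchall()
print(rows)  # [('Ben', 80)]

# DDL: remove the whole table.
cur.execute("DROP TABLE students")
remaining_tables = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(remaining_tables)  # []
```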
a ______ value in SQL represents missing or unknown data
NULL
aggregating data summarizes ______ datasets
large
_______ determines the number of records
COUNT
_____ adds up the values in a field
SUM
_____ finds the smallest value in a field
MIN
_____ finds the largest value in a field
MAX
______ calculates the average value in a field
AVG (the mean)
GROUP BY allows you to break data out into ______
subcategories
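The aggregate functions and GROUP BY can be tried out with a small sketch like the one below. It uses SQLite through Python's `sqlite3` module, and the `sales` table and its values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 30), ("west", 20)],
)

# Aggregate over the whole table.
totals = conn.execute(
    "SELECT COUNT(*), SUM(amount), MIN(amount), MAX(amount), AVG(amount) FROM sales"
).fetchone()
print(totals)  # (3, 60, 10, 30, 20.0)

# GROUP BY breaks the same kind of summary out by category.
by_region = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(by_region)  # [('east', 2, 40), ('west', 1, 20)]
```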
date _______ allows you to extract elements from a date value
deconstruction
date ______ allows you to create date values from text elements
construction
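Date deconstruction and construction can be illustrated outside SQL as well. This sketch uses Python's `datetime` module; the date and the `%Y-%m-%d` text format are assumptions chosen for the example.

```python
from datetime import date, datetime

# Date deconstruction: pull elements out of a date value.
d = date(2024, 3, 15)
parts = (d.year, d.month, d.day)
print(parts)  # (2024, 3, 15)

# Date construction: build a date value from text elements.
built = datetime.strptime("2024-03-15", "%Y-%m-%d").date()
print(built == d)  # True
```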
________ values are calculated from stored values
derived
the _______ operator combines the unique rows from both result sets into one table
UNION
the _______ operator returns only the rows that appear in both result sets
INTERSECT
the ______ operator returns the unique rows that appear in the FIRST result set but not the second
EXCEPT
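A small sketch of the three set operators, using SQLite through Python's `sqlite3` module. The tables `a` and `b` and the helper `run` are hypothetical names for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (n INTEGER);
    CREATE TABLE b (n INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

def run(op):
    """Apply one set operator to the two single-column result sets."""
    sql = f"SELECT n FROM a {op} SELECT n FROM b ORDER BY n"
    return [row[0] for row in conn.execute(sql)]

union_rows = run("UNION")          # unique rows from both sets
intersect_rows = run("INTERSECT")  # rows present in both sets
except_rows = run("EXCEPT")        # rows in the first set but not the second
print(union_rows, intersect_rows, except_rows)  # [1, 2, 3, 4] [2, 3] [1]
```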
Only returns matched records from the tables that are being joined
INNER JOIN
Returns the matched records plus the non-matching records from one or both of the tables that are being joined
OUTER JOIN
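The difference between the two joins is easiest to see side by side. In standard SQL, a LEFT OUTER JOIN keeps the matched rows and also the unmatched rows from the left table, with NULL filling the missing columns. The sketch below uses SQLite through Python's `sqlite3` module; the `customers` and `orders` tables are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (1, 'book');
""")

# INNER JOIN: only customers that have a matching order.
inner_rows = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "INNER JOIN orders o ON o.customer_id = c.id ORDER BY c.id"
).fetchall()
print(inner_rows)  # [('Ana', 'book')]

# LEFT OUTER JOIN: every customer; NULL (None) where no order matches.
left_rows = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "LEFT OUTER JOIN orders o ON o.customer_id = c.id ORDER BY c.id"
).fetchall()
print(left_rows)  # [('Ana', 'book'), ('Ben', None)]
```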
________ queries embed one query inside of another
Nested queries
database _____ help us efficiently find information in a database
indexes
in a ______ query, indexes exist for all query columns
covered
______ tables store data for short term use during a database session
temporary table
what keyword in a SQL query statement creates a temporary table?
INTO
query _______ plan describes how the database will execute the query
query execution plan
what is the purpose of a query execution plan?
to optimize query performance
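To see an index and an execution plan together, SQLite offers `EXPLAIN QUERY PLAN`. The sketch below is one way to inspect it from Python; the `users` table and the index name `idx_users_email` are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")

# Ask the database how it would run the query, without running it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT email FROM users WHERE email = ?",
    ("a@example.com",),
).fetchall()

# The last column of each plan row is a human-readable description.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

Because the query only touches the indexed `email` column, the plan should mention the index rather than a full table scan.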
in a _______ query, the SQL template is precompiled on the database server
parameterized query
a ______ procedure is a query template stored on the database server; the application passes arguments, and the database admin maintains it
stored
a parameterized query is a template that is stored in the ______ code; the application passes the template and arguments to the database, and the app developer maintains it
application
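A minimal sketch of a parameterized query kept in application code, using Python's `sqlite3` module, where `?` is the placeholder style. The `users` table, the `QUERY` constant, and the `names_with_role` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("ana", "admin"), ("ben", "viewer")],
)

# The template lives in application code; "?" marks where arguments go.
QUERY = "SELECT name FROM users WHERE role = ?"

def names_with_role(role):
    """Send the template and its argument to the database separately."""
    return [row[0] for row in conn.execute(QUERY, (role,))]

admins = names_with_role("admin")
print(admins)  # ['ana']

# Input that looks like SQL is treated as plain data, not executed.
injected = names_with_role("admin' OR '1'='1")
print(injected)  # []
```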
_____ is data that is missing completely at random
MCAR
_____ is data that is missing at random
MAR
_____ is data that is missing not at random
MNAR
data ______ are data points that are distant from the norm
outliers
_______ group datapoints into bins representing ranges of a numerical value
histograms
______ display summary statistics about a dataset
boxplots
________ _______ removes the entire record from the dataset
remove row
______ ______ replaces the specific outlier values with NA
remove value
______ ______ replaces the outlier values with the correct values obtained from other sources
correct values
_________ replaces missing values with the mean, median, or mode
imputation
________ sets maximum and minimum values for the dataset and replaces outliers with those values
capping
______ replaces the outlier values with values predicted by a model
prediction
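Two of the techniques above, imputation and capping, can be sketched in a few lines of plain Python. The sample values and the limits `low` and `high` are assumptions picked for illustration; in practice the limits would come from the data or from domain knowledge.

```python
from statistics import median

values = [10, 12, None, 11, 95, 13]  # None marks a missing value

# Imputation: replace missing values with the median of the known values.
known = [v for v in values if v is not None]
fill = median(known)
imputed = [fill if v is None else v for v in values]
print(imputed)  # [10, 12, 12, 11, 95, 13]

# Capping: clamp every value into the range [low, high].
low, high = 5, 20  # hypothetical limits
capped = [min(max(v, low), high) for v in imputed]
print(capped)  # [10, 12, 12, 11, 20, 13]
```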
______ datasets adds the new data as new rows
appending
____ datasets adds the new data as new columns
merging
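A rough illustration of the difference, using plain Python lists of dictionaries as stand-in datasets. The field names and the `regions` lookup are invented for the example.

```python
# Two datasets with the same columns.
q1 = [{"id": 1, "sales": 100}]
q2 = [{"id": 2, "sales": 150}]

# Appending: stack the rows of one dataset under the other.
appended = q1 + q2
print(len(appended))  # 2

# Merging: match rows on a key and add new columns.
regions = {1: "east", 2: "west"}  # second dataset, keyed by id
merged = [{**row, "region": regions[row["id"]]} for row in appended]
print(merged[0])  # {'id': 1, 'sales': 100, 'region': 'east'}
```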
_______ changes data to a different unit of measurement
conversion
______ adjusts the magnitude of data for comparison
scaling
______ adjusts the data so that it has a mean of 0 and a standard deviation of 1
standardization
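A short sketch of both ideas with Python's `statistics` module. Min-max scaling to the range 0 to 1 is used here as one common form of scaling, which is an assumption since the card does not name a method. Standardization uses the population standard deviation.

```python
from statistics import mean, pstdev

data = [2.0, 4.0, 6.0, 8.0]

# Scaling (min-max): squeeze values into the range 0..1.
lo, hi = min(data), max(data)
scaled = [(x - lo) / (hi - lo) for x in data]

# Standardization (z-score): subtract the mean, divide by the std deviation.
mu, sigma = mean(data), pstdev(data)
standardized = [(x - mu) / sigma for x in data]

# After standardization the mean is about 0 and the std deviation about 1.
print(round(mean(standardized), 6), round(pstdev(standardized), 6))
```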
________ ______ allow you to search for patterns in text
regular expressions (regex)
the metacharacter ______ in regex matches any single character
.
the metacharacter ______ in regex defines a set or range of characters
[ ]
the metacharacter ______ in regex means one or more of the previous element
+
the metacharacter ______ in regex means zero or more of the previous element
*
the metacharacter ______ in regex represents a word boundary
\b
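The metacharacters above can be tried with Python's `re` module. The sample strings are made up for the example.

```python
import re

text = "cat cot coat scatter"

# .  matches any single character
dot = re.findall(r"c.t", text)

# [ ] matches one character from a set or range
bracket = re.findall(r"[a-c]at", "bat cat hat")

# +  one or more of the previous element
plus = re.findall(r"co+t", "ct cot coot")

# *  zero or more of the previous element
star = re.findall(r"co*t", "ct cot coot")

# \b word boundary: "cat" as a whole word, not inside "scatter"
boundary = re.findall(r"\bcat\b", text)

print(dot)       # ['cat', 'cot', 'cat']
print(bracket)   # ['bat', 'cat']
print(plus)      # ['cot', 'coot']
print(star)      # ['ct', 'cot', 'coot']
print(boundary)  # ['cat']
```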
______ divides a continuous variable into discrete intervals
binning
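As a sketch of binning, the snippet below assigns ages to labeled intervals. The bin edges, labels, and the `bin_label` helper are hypothetical choices for illustration.

```python
ages = [3, 17, 25, 42, 68]

# Hypothetical bins: (label, lower bound inclusive, upper bound exclusive).
bins = [("child", 0, 18), ("adult", 18, 65), ("senior", 65, 200)]

def bin_label(value):
    """Return the label of the interval that contains the value."""
    for label, lower, upper in bins:
        if lower <= value < upper:
            return label
    return None  # value falls outside every bin

labels = [bin_label(a) for a in ages]
print(labels)  # ['child', 'child', 'adult', 'adult', 'senior']
```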
______ groups data based upon similarity
clustering
data _______ enhances a dataset with additional features
data augmentation
_______ fields are created by calculations/logic on existing data
derived fields
_____ create summary data
aggregations
_________ data break down complex fields for analysis
exploding data
when presenting data to stakeholders, make sure you ______ your audience
know
when presenting your data, be sure to handle _____ and ____ data appropriately
sensitive and non-sensitive
________ applies mathematical techniques to data to help us analyze, interpret and present that data
statistics
______ statistics describe our data by summarizing it and providing us with visible features
descriptive
_______ statistics draw conclusions from our data by making generalizations and predictions
inferential
_______ statistics make forecasts about the future based upon our data
predictive
______ statistics recommend actions to optimize future results
prescriptive
descriptive statistics are also known as _______ statistics
summary
the ___________ _______ ______ ______ summarize data using a single value that identifies the center of the dataset
measures of central tendency
what are the 3 types of measures of central tendency?
mean, median, mode
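what is the mean?
the average: the sum of the values divided by the number of values
what is the median?
the middle value when the data is sorted
what is the mode?
the value that occurs most often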
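The three measures of central tendency are available in Python's `statistics` module. The sample data is invented for the example.

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7]

avg = mean(data)       # sum of the values divided by how many there are
middle = median(data)  # middle value of the sorted data
common = mode(data)    # value that occurs most often

print(avg, middle, common)  # 4 3 3
```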
the measures of ______ describe how spread out the dataset is around the center
dispersion
_____ ______ data does not fit any statistical distribution
non-parametric
_______ is the average of the squared differences of each value in the dataset from the mean
variance
how do you calculate standard deviation?
take the square root of the variance
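A worked sketch of both calculations, following the definitions on the cards (the population form, dividing by the number of values). The sample data is made up; `statistics.pvariance` is used only as a cross-check.

```python
import math
from statistics import mean, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]
mu = mean(data)  # 5

# Variance: average of the squared differences from the mean.
variance = sum((x - mu) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(variance, std_dev)  # 4.0 2.0
print(pvariance(data) == variance)  # True
```

Sample variance divides by one less than the number of values instead; `statistics.variance` and `statistics.stdev` implement that form.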
what can you enable to assist in troubleshooting
logs
what are 3 techniques to validate that the data from a source is complete?
header row, to check for proper structure
footer row, to check the total number of records added
checksums, to check that the data wasn't altered
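The three checks can be sketched together on a tiny in-memory file. Everything about the file layout here is an assumption for illustration: the header text, the `ROWS=` footer convention, and the use of SHA-256 as the checksum.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Return the SHA-256 checksum of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical file: header row, two data rows, footer with the row count.
payload = b"id,name\n1,Ana\n2,Ben\nROWS=2\n"
expected = sha256_of(payload)  # stands in for a checksum sent by the source

lines = payload.decode().splitlines()
header_ok = lines[0] == "id,name"                           # structure check
footer_ok = int(lines[-1].split("=")[1]) == len(lines) - 2  # row count check
checksum_ok = sha256_of(payload) == expected                # unaltered data
tampered_ok = sha256_of(payload + b"x") == expected         # altered data

print(header_ok, footer_ok, checksum_ok, tampered_ok)  # True True True False
```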
what are the 2 most common ways to communicate data to stakeholders
reports and dashboards
_____ are paper or electronic documents that contain data from a given point in time
reports
_____ are dynamic presentations of data that allow users to explore the data interactively
dashboards
what are the two main ways users can access a report?
pull or push report
a _____ report makes reports available to users on a self-service, on-demand basis
pull report
a _____ report is sent to users on a scheduled basis
push report
a _________ _______ creates a first impression of your work
cover page
the _______ ______ provides an overview of its contents and distills key insights from the report
executive summary
well designed reports should convey information ____ and _____
clearly and efficiently
when designing your reports use a _______ report layout to communicate your message
thoughtful
the report ___ ____ is the date that the report was generated
report run date
the data ____ ____ is the date when the data was last updated
data refresh date
_______ _______ identify the template used to create a report or dashboard
version numbers
you should include _____ for all of the data that you use in your report
sources