What is SQL?
Structured Query Language
What is a "relational database"?
The type of database that enterprise systems run on top of
The tables can be connected to other tables based on common columns
One-to-one
a link between the information in two tables, where each record in each table only appears once.
One-to-many
one record in a table can be associated with one or more records in another table
Many-to-many
when one or more items in one table can have a relationship to one or more items in another table
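A minimal sketch of a one-to-many relationship, using Python's built-in `sqlite3` module. The `customers` and `orders` tables, their columns, and the sample rows are hypothetical and chosen only for illustration; the two tables are connected through the common column `customer_id`.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    amount REAL
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Ben');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# One customer record can match many order records (one-to-many).
rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)
```

Here the customer 'Ada' appears once in `customers` but is linked to two rows in `orders`.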
What is data mining?
the process of sorting through large data sets to identify patterns and relationships that can help solve business problems through data analysis
What is unsupervised learning?
descriptive data-mining methods in which there is no outcome variable to predict
Market segmentation
is a process commonly used to divide customers into different homogenous groups
k-means clustering
iteratively assigns each observation to one of k clusters composed of similar observations.
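The iterative assignment can be sketched in plain Python. This is a simplified illustration, not a production implementation: the function name `k_means`, the choice of the first k points as starting centroids, the fixed iteration count, and the sample points are all assumptions.

```python
def k_means(points, k, iterations=10):
    """Simplified k-means on 2-D points (illustrative sketch)."""
    # Assumption: use the first k points as the starting centroids.
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each observation in the nearest cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, clusters

points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
print(clusters)
```

With these sample points, the three observations near (1, 1) end up in one cluster and the three near (9, 9) in the other.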
Hierarchical clustering
starts with single-observation clusters and sequentially merges the most similar ones to create a set of nested clusters.
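A rough sketch of the merging process on one-dimensional data. The card does not say how similarity between clusters is measured; this example assumes single linkage (the distance between the closest pair of members), and the function name `hierarchical_merges` and the sample values are hypothetical.

```python
def hierarchical_merges(values):
    """Return the sequence of merged clusters (single linkage, 1-D sketch)."""
    # Start with one cluster per observation.
    clusters = [[v] for v in values]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the most similar (closest) pair of clusters.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        merges.append(merged)
        # Replace the two merged clusters with their union.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges

merges = hierarchical_merges([1, 2, 6, 7, 20])
print(merges)
```

Each entry in `merges` contains the clusters merged before it, which is what makes the result a set of nested clusters.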
What is text mining?
the process of extracting information from text data
Document
a contiguous piece of text
Terms
words
Corpus
(the body of text) a collection of text documents to be analyzed
bag of words
treats every document as a collection of individual words or terms
presence/absence document-term matrix
aka binary document-term matrix; rows represent documents and columns represent words, and each entry is 1 if the word is present in the document and 0 if it is absent
Tokenization
is the process of dividing text into separate terms
term normalization
the process of mapping tokens to a standardized form, for example by converting all letters to lowercase
Stopwords
common words such as “the”, “and”, and “of” that are removed from the document during preprocessing
Stemming
the process of converting different forms of the same word to their common stem or root word
frequency term-document matrix
The rows represent the documents
The columns represent the tokens
The entries in the matrix are the frequency of occurrence of each token in each document
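A small sketch of how tokenization, lowercasing, stopword removal, and the two document-term matrices fit together. The function names, the three-word stopword list, and the sample corpus are assumptions made for illustration; stemming is deliberately left out.

```python
import re
from collections import Counter

# Assumption: a tiny stopword list taken from the examples on the card.
STOPWORDS = {"the", "and", "of"}

def tokenize(text):
    """Lowercase the text, split it into terms, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def document_term_matrix(corpus, binary=False):
    """Rows are documents, columns are tokens (sorted alphabetically)."""
    token_lists = [tokenize(doc) for doc in corpus]
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    matrix = []
    for tokens in token_lists:
        counts = Counter(tokens)
        row = [counts[t] for t in vocabulary]
        if binary:
            # Presence/absence version: 1 if the token occurs, else 0.
            row = [1 if c > 0 else 0 for c in row]
        matrix.append(row)
    return vocabulary, matrix

corpus = ["The service was great, great food", "The food was cold"]
vocab, freq = document_term_matrix(corpus)
_, presence = document_term_matrix(corpus, binary=True)
print(vocab)
print(freq)
print(presence)
```

The only difference between the two matrices is that the frequency version keeps the count of "great" in the first document as 2, while the presence/absence version records it as 1.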
Word cloud
is a visual display that contains the key terms of a document
What is regression?
is a set of data mining methods having the task of predicting a quantitative outcome using a set of input variables
Features
the input variables used to predict the outcome variable
What is supervised learning?
Predictive data mining methods that use known values of the outcome variable to “supervise” the learning process of how to use the input features to predict future values of the outcome variable
Steps: data sampling, data preparation, data partitioning, model construction, and model assessment
What is a neural network?
a predictive data mining method whose structure is motivated by the biological functioning of the brain
What is classification?
is a set of data mining methods having the task of predicting categorical outcome variable using a set of input variables, or features.
What is a confusion matrix?
a cross-tabulation of the actual and predicted class of each observation that displays a model’s correct and incorrect classifications.
Class 1 Error
False negative
Class 0 Error
False positive
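The four cells of a two-class confusion matrix can be tallied as in the sketch below. The function name `confusion_counts` and the sample labels are hypothetical; classes are assumed to be coded as 1 and 0.

```python
def confusion_counts(actual, predicted):
    """Count correct and incorrect classifications for classes 1 and 0."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1  # actual 1, predicted 1 (correct)
        elif a == 0 and p == 0:
            counts["TN"] += 1  # actual 0, predicted 0 (correct)
        elif a == 0 and p == 1:
            counts["FP"] += 1  # Class 0 error: false positive
        else:
            counts["FN"] += 1  # Class 1 error: false negative
    return counts

actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)
```

In this sample, one actual Class 1 observation is predicted as Class 0 (a Class 1 error) and one actual Class 0 observation is predicted as Class 1 (a Class 0 error).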
What is a classification tree?
is a predictive model consisting of a sequence of rules on the input features.
What is analytics?
Data, Information, Knowledge, Action (DIKA)
Production
assembly/manufacturing
Procurement
ordering from a supplier/vendor
Fulfillment
delivering the product/having and fulfilling customer order
Descriptive Analytics
describing what happened in the past (e.g., a scorecard)
Predictive Analytics
forecasting what will happen (e.g., a weather forecast)
Prescriptive Analytics
what you should do: a plan of action (e.g., an optimization model)
Digitization
converting paper records to a computer/digital format
Digitalization
continuously improving digital form
4 Vs of Big Data
volume, variety, velocity, veracity
Decision-making process
Identify and define the problem
Determine the criteria that will be used to evaluate alternative solutions.
Determine the set of alternative solutions.
Evaluate the alternatives.
Choose an alternative.
Data Flow Diagram
a process model used to depict the flow of data through a system and the work or processing performed by the system.
DFD Symbols
Data flow
1 piece of data
Composite data flow
multi data pieces together in one data flow
ex. receipt
Context Diagram (level 0 diagram)
Decomposition Diagram
High-level Diagram (level 1 diagram)
Assembling
Components - nuts, bolts, wheels-> skateboard
Manufacturing
Raw materials- plastic pellets -> plastic plate
Discrete
distinct items (countable). Usually identifiable (bill of materials)
Process Manufacturing
like a recipe; cannot be easily disassembled
Instance Level Information
status of production for a particular order, required inventory status report, stock requirements list (at that moment) (status)
Process Level Information
average time to produce, how many on time/delayed, reasons for delay
Intra-
Inside/within (Human Capital Management, Asset Management, Human resources)
Inter-
between, among companies (Supply Chain Management, Supplier relationship management)
Preattentive attributes
features that can be used in a data visualization to reduce the cognitive load required to interpret it
Include color, shape, size, length
Data-ink
the ink used in a table or chart that is necessary to convey the meaning of the data to the audience
Data-ink Ratio
the proportion of ink used for data to the total amount of ink in a table or chart (high ratio is good)
Table Design Principles
Keep the data-ink ratio high
Use lines only to separate labels from data and calculated fields
Labels should be left-aligned
Values should be right-aligned
Center vertical labels
What is a data dashboard?
a data visualization tool that illustrates multiple KPIs and automatically updates as new data becomes available
Objective of data wrangling
to produce a final dataset that is accurate, reliable, and accessible
Activities of data wrangling
Cleaning, managing, transforming
Right data, right place, right time, right form
Structured Data
refers to data arrayed in a predetermined pattern to make them easy to manage and search (flat file is most common pattern)
Unstructured Data
data not arranged in a predetermined pattern
Semi-structured Data
not organized as structured data but contain elements that allow for isolating some raw data elements
How do we assess risk?
Likelihood (probability)
Impact (severity)
Likelihood
probability
Impact
severity
The only document that contains weight is
packing list
In Tableau, tables are connected
by common fields
rows
records, observations
fields
columns
Query aggregation operators
COUNT, AVG, MIN, MAX, or SUM
In SQL, a query with a GROUP BY clause typically includes at least one of these
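A short sketch of the aggregation operators used together with GROUP BY, run through Python's built-in `sqlite3` module. The `sales` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory database; the sales table is an illustrative assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
INSERT INTO sales VALUES ('East', 100), ('East', 50), ('West', 70);
""")

# One output row per region, each summarizing that region's records.
summary = conn.execute("""
    SELECT region,
           COUNT(*)    AS n_sales,
           SUM(amount) AS total,
           AVG(amount) AS average,
           MIN(amount) AS smallest,
           MAX(amount) AS largest
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()
print(summary)
```

The GROUP BY clause collapses the three sales records into one row per region, and each aggregation operator summarizes the records within that group.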
Logical operators
used to combine or test more than one condition
AND, OR, NOT, BETWEEN, EXISTS, or IN
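A brief sketch of several logical operators in a WHERE clause, again using `sqlite3`. The `orders` table, its columns, and the sample rows are assumptions for illustration; EXISTS is not shown.

```python
import sqlite3

# In-memory database; the orders table is an illustrative assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount INTEGER, status TEXT);
INSERT INTO orders VALUES
    (1, 20, 'open'), (2, 80, 'shipped'), (3, 150, 'open'), (4, 60, 'cancelled');
""")

# AND, BETWEEN, IN: both conditions must hold.
active_mid_size = [r[0] for r in conn.execute("""
    SELECT order_id FROM orders
    WHERE amount BETWEEN 50 AND 200
      AND status IN ('open', 'shipped')
    ORDER BY order_id
""")]

# NOT, OR: either condition may hold.
closed_or_small = [r[0] for r in conn.execute("""
    SELECT order_id FROM orders
    WHERE NOT (status = 'open') OR amount < 30
    ORDER BY order_id
""")]
print(active_mid_size, closed_or_small)
```

The first query keeps only orders that satisfy both conditions, while the second keeps orders that satisfy at least one.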
Small and Medium Manufacturers
make up 95%
CRM
Customer relationship management
PO
purchase order
SOM
serviceable obtainable market
TDP
total distribution points
SQL
Structured Query Language
DFD
data flow diagram
PPF
production possibilities frontier
DIKA
Data, Information, Knowledge, Action