Visual Analytics Exam 3

85 Terms

1
New cards

What is SQL?

Structured Query Language: the standard language for querying and managing data in relational databases

2
New cards

What is a "relational database"?

The type of database that enterprise systems run on top of

  • Data is stored in tables

  • The tables can be connected to other tables based on common columns
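
Connecting tables on a common column can be sketched with Python's built-in sqlite3 module. The table names, column names, and rows below are made up for illustration; only the idea of joining on a shared column comes from the card.

```python
import sqlite3

# Hypothetical tables: customers and orders share the customer_id column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Connect the two tables on their common column.
joined = conn.execute("""
    SELECT c.name, o.order_id, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()
print(joined)
```

Each customer row can match several order rows, so the joined result repeats the customer name once per order.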

3
New cards

One-to-one

a relationship between two tables in which each record in one table is linked to at most one record in the other table

4
New cards

One-to-many

 one record in a table can be associated with one or more records in another table

5
New cards

Many-to-many

when one or more items in one table can have a relationship to one or more items in another table
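
A many-to-many relationship is commonly stored through a third "junction" table. The sketch below assumes hypothetical students, courses, and enrollments tables; none of these names come from the card.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses (course_id INTEGER PRIMARY KEY, title TEXT);
    -- Junction table: each row links one student to one course.
    CREATE TABLE enrollments (student_id INTEGER, course_id INTEGER);
    INSERT INTO students VALUES (1, 'Ada'), (2, 'Ben');
    INSERT INTO courses VALUES (100, 'SQL'), (200, 'Tableau');
    INSERT INTO enrollments VALUES (1, 100), (1, 200), (2, 100);
""")

# One student can appear with many courses, and one course with many students.
pairs = conn.execute("""
    SELECT s.name, c.title
    FROM enrollments AS e
    JOIN students AS s ON s.student_id = e.student_id
    JOIN courses AS c ON c.course_id = e.course_id
    ORDER BY s.name, c.title
""").fetchall()
print(pairs)
```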

6
New cards

What is data mining?

the process of sorting through large data sets to identify patterns and relationships that can help solve business problems through data analysis

7
New cards

What is unsupervised learning?

descriptive data mining methods in which there is no outcome variable to predict

8
New cards

Market segmentation

a process commonly used to divide customers into distinct, homogeneous groups

9
New cards

k-means clustering

iteratively assigns each observation to one of k clusters composed of similar observations.
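
The iterative assignment can be sketched in plain Python. This is a minimal illustration that assumes one-dimensional data, caller-supplied starting centroids, and absolute difference as the distance; the function name and the sample values are hypothetical.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: each observation joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, clusters


spend = [10, 12, 11, 50, 52, 49]
centroids, clusters = k_means_1d(spend, centroids=[10, 50])
print(centroids, clusters)
```

Here k is simply the number of starting centroids passed in (two in this usage).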

10
New cards

Hierarchical clustering

starts with single-observation clusters and sequentially merges the most similar ones to create a set of nested clusters.
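
The merge sequence can be sketched as follows. The card does not say how similarity between clusters is measured, so this example assumes single linkage (distance between the closest members) on one-dimensional points; the function name is hypothetical.

```python
def agglomerate(points):
    """Single-linkage agglomerative clustering on 1-D points.

    Returns every clustering produced along the way, from all
    single-observation clusters down to one cluster.
    """
    clusters = [[p] for p in points]
    history = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        history.append([list(c) for c in clusters])
    return history


levels = agglomerate([1, 2, 6, 7, 20])
for level in levels:
    print(level)
```

Each printed level has one fewer cluster than the one before it, which is the nested structure the definition describes.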

11
New cards

What is text mining?

the process of extracting information from text data

12
New cards

Document

 a contiguous piece of text

13
New cards

Terms

words

14
New cards

Corpus

(the body of text) a collection of text documents to be analyzed

15
New cards

bag of words

treats every document as a collection of individual words or terms

16
New cards

presence/absence document-term matrix

 aka binary document-term matrix; rows represent documents, columns represent words, and each entry is 1 if the word appears in the document and 0 otherwise
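
A binary document-term matrix can be built in a few lines. The three sample documents are invented, and splitting on whitespace is a simplifying assumption.

```python
docs = ["the service was great", "great food great price", "the food was cold"]

# Tokenize naively by splitting on whitespace.
tokenized = [d.split() for d in docs]

# One column per distinct word, in alphabetical order.
vocab = sorted({term for doc in tokenized for term in doc})

# One row per document: 1 if the word is present, 0 if absent.
binary_dtm = [[1 if term in doc else 0 for term in vocab] for doc in tokenized]

print(vocab)
for row in binary_dtm:
    print(row)
```

Note that "great" appears twice in the second document but is still recorded as 1.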

17
New cards

Tokenization

is the process of dividing text into separate terms

18
New cards

term normalization

 the process of converting tokens to a standardized form (for example, lowercasing, removing stopwords, and stemming)

19
New cards

Stopwords

 common words such as “the”, “and”, and “of” that carry little meaning; they are removed from the document during preprocessing, and all letters are converted to lowercase

20
New cards

Stemming

 the process of converting different forms of the same word to its stem or root word
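
Tokenization, lowercasing, stopword removal, and stemming can be chained into one preprocessing step. This is only a sketch: the stopword list is a small made-up sample, and the suffix-stripping rule is a crude stand-in for a real stemmer, not an established algorithm.

```python
import re

STOPWORDS = {"the", "and", "of", "a", "is", "was"}  # small illustrative list
SUFFIXES = ("ing", "ed", "s")  # crude rules, assumed for this example only


def preprocess(text):
    # Tokenization: lowercase the text and pull out runs of letters.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Stopword removal.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Stemming: strip the first matching suffix, keeping at least 3 letters.
    stems = []
    for t in tokens:
        for suffix in SUFFIXES:
            if t.endswith(suffix) and len(t) - len(suffix) >= 3:
                t = t[: -len(suffix)]
                break
        stems.append(t)
    return stems


stems = preprocess("The customer returned the items and the customers keep returning")
print(stems)
```

After preprocessing, "returned" and "returning" share one stem, as do "customer" and "customers".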

21
New cards

frequency document-term matrix

  • The rows represent the documents 

  • The columns represent the tokens

  • The entries in the matrix are the frequency of occurrence of each token in each document
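
The frequency version of the matrix can be sketched with collections.Counter. The two documents are invented and whitespace tokenization is assumed.

```python
from collections import Counter

docs = ["great food great price", "the food was cold"]
tokenized = [d.split() for d in docs]

# Columns: every distinct token, in alphabetical order.
vocab = sorted({term for doc in tokenized for term in doc})

# Rows: one per document, holding how often each token occurs in it.
counts = [Counter(doc) for doc in tokenized]
freq_dtm = [[c[term] for term in vocab] for c in counts]

print(vocab)
for row in freq_dtm:
    print(row)
```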

22
New cards

Word cloud

  • is a visual display that contains the key terms of a document

23
New cards

What is regression?

  • is a set of data mining methods having the task of predicting a quantitative outcome using a set of input variables
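
As one concrete instance of regression, the sketch below fits a straight line by ordinary least squares with a single input variable. The card covers regression methods in general; the choice of simple linear regression and the sample numbers are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


ad_spend = [1, 2, 3, 4]  # input variable (hypothetical)
sales = [3, 5, 7, 9]     # quantitative outcome (hypothetical)

slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 5 + intercept  # predict the outcome for a new input
print(slope, intercept, predicted)
```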

24
New cards

Features

  • the input variables used to predict the outcome variable

25
New cards

What is supervised learning?

Predictive data mining methods that use known values of the outcome variable to “supervise” the learning process of how to use the input features to predict future values of the outcome variable

  • Steps: data sampling, data preparation, data partitioning, model construction, and model assessment
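
The data partitioning step can be sketched as a random split into a training set and a validation set. The 70/30 split, the fixed seed, and the function name are assumptions; the card does not specify how the data should be divided.

```python
import random


def partition(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split it into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for 10 observations
train, validation = partition(rows)
print(len(train), len(validation))
```

The model would be constructed on the training rows and assessed on the validation rows it has not seen.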

26
New cards

What is a neural network?

  •  a predictive data mining method whose structure is motivated by the biological functioning of the brain

27
New cards

What is classification?

  • is a set of data mining methods having the task of predicting a categorical outcome variable using a set of input variables, or features.

28
New cards

What is a confusion matrix?

  • a cross-tabulation of the actual class and the predicted class of each observation that displays a model’s correct and incorrect classifications.

29
New cards

Class 1 Error

False negative (an actual Class 1 observation predicted as Class 0)

30
New cards

Class 0 Error

False positive (an actual Class 0 observation predicted as Class 1)
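
The four cells of a confusion matrix, including the Class 1 and Class 0 errors, can be counted directly from actual and predicted labels. The label lists below are made up for illustration.

```python
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
for a, p in zip(actual, predicted):
    if a == 1 and p == 1:
        counts["TP"] += 1  # correct Class 1 prediction
    elif a == 0 and p == 0:
        counts["TN"] += 1  # correct Class 0 prediction
    elif a == 0 and p == 1:
        counts["FP"] += 1  # Class 0 error (false positive)
    else:
        counts["FN"] += 1  # Class 1 error (false negative)

error_rate = (counts["FP"] + counts["FN"]) / len(actual)
print(counts, error_rate)
```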

31
New cards

What is a classification tree?

  •  is a predictive model consisting of a sequence of rules on the input features.
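
A classification tree reads naturally as nested if/else rules. The tree below is entirely hypothetical: the features, the 50000 threshold, and the class labels are invented to show the shape of the rules, not learned from data.

```python
def classify_loan(income, has_default):
    """Hypothetical two-level tree; each 'if' is one rule on an input feature."""
    if has_default:          # rule 1: split on prior default
        return "reject"
    if income >= 50000:      # rule 2: split on income
        return "approve"
    return "review"


print(classify_loan(income=60000, has_default=False))
print(classify_loan(income=30000, has_default=False))
print(classify_loan(income=90000, has_default=True))
```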

32
New cards

What is analytics?

  • Data, Information, Knowledge, Action (DIKA)

33
New cards

Production

 assembly/manufacturing

34
New cards

Procurement

ordering from a supplier/vendor

35
New cards

Fulfillment

delivering the product/having and fulfilling customer order

36
New cards

Descriptive Analytics

describing from the past (scorecard)

37
New cards

Predictive Analytics

forecast/weather - what will happen

38
New cards

Prescriptive Analytics

  • what should you do- plan of action, optimization model

39
New cards

Digitization

paper to computer/ digital format

40
New cards

Digitalization

using digital data and technology to continuously improve processes

41
New cards

4 Vs of Big Data

  • volume, variety, velocity, veracity

42
New cards

Decision-making process

  1. Identify and define the problem

  2. Determine the criteria that will be used to evaluate alternative solutions.

  3. Determine the set of alternative solutions.

  4. Evaluate the alternatives.

  5. Choose an alternative.

43
New cards

Data Flow Diagram

 a process model used to depict the flow of data through a system and the work or processing performed by the system.

44
New cards

DFD Symbols

45
New cards

Data flow

a single piece of data moving through the system

46
New cards

Composite data flow

multiple pieces of data moving together in one data flow

e.g., a receipt

47
New cards

Context Diagram (level 0 diagram)

48
New cards

Decomposition Diagram

49
New cards

High-level Diagram (level 1 diagram)

50
New cards

Assembling

Components - nuts, bolts, wheels-> skateboard

51
New cards

Manufacturing

Raw materials- plastic pellets -> plastic plate

52
New cards

Discrete

distinct items (countable). Usually identifiable (bill of materials)

53
New cards

Process Manufacturing

like a recipe; the product cannot be easily disassembled

54
New cards

Instance Level Information

status of production for a particular order, required inventory status report, stock requirements list (at that moment) (status)

55
New cards

 Process Level Information

average time to produce, how many on time/delayed, reasons for delay 

56
New cards

Intra-

Inside/within (Human Capital Management, Asset Management, Human resources)

57
New cards

Inter-

 between, among companies (Supply Chain Management, Supplier relationship management)

58
New cards

Preattentive attributes

features that can be used in a data visualization to reduce the cognitive load required to interpret it

  • Include color, shape, size, length

59
New cards

Data-ink

 the ink used in a table or chart that is necessary to convey the meaning of the data to the audience

60
New cards

Data-ink Ratio

the proportion of ink used for data to the total amount of ink in a table or chart (high ratio is good)

61
New cards

Table Design Principles

  • Keep the data-ink ratio high

  • Use lines only to separate labels from data and calculated fields

  • Labels should be left-aligned

  • Values should be right-aligned

  • Center vertical labels

62
New cards

What is a data dashboard?

  • a data visualization tool that illustrates multiple KPIs and automatically updates as new data becomes available

63
New cards

Objective of data wrangling

  •  to produce a final dataset that is accurate, reliable, and accessible

64
New cards

Activities of data wrangling

  • Cleaning, managing, transforming

  • Right data, right place, right time, right form

65
New cards

Structured Data

  • refers to data arrayed in a predetermined pattern to make them easy to manage and search (flat file is most common pattern)

66
New cards

Unstructured Data

data not arranged in a predetermined pattern

67
New cards

Semi-structured Data

  • not organized as structured data but contain elements that allow for isolating some raw data elements

68
New cards

How do we assess risk?

Likelihood (probability)

Impact (severity)
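
One common way to combine the two dimensions is a risk score equal to likelihood times impact. The card only names the two dimensions; the multiplication, the 1 to 5 rating scale, and the function name are assumptions made for this sketch.

```python
def risk_score(likelihood, impact):
    """Combine two ratings on an assumed 1-5 scale; higher means higher priority."""
    if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
        raise ValueError("ratings must be between 1 and 5")
    return likelihood * impact


# Hypothetical risks rated as (likelihood, impact).
risks = {"supplier delay": (4, 3), "data breach": (2, 5), "typo in report": (3, 1)}
ranked = sorted(risks, key=lambda name: risk_score(*risks[name]), reverse=True)
print(ranked)
```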

69
New cards

Likelihood

probability

70
New cards

Impact

severity

71
New cards

The only document that contains weight is

packing list

72
New cards

In Tableau, tables are connected

by common fields

73
New cards

rows

records, observations

74
New cards

fields

columns

75
New cards

Query aggregation operators

In SQL, a GROUP BY clause is typically used together with an aggregate function

  • COUNT, AVG, MIN, MAX, or SUM
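
The aggregate functions can be tried out against an in-memory SQLite database from Python. The sales table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('East', 100), ('East', 50), ('West', 30);
""")

# One output row per region, with each aggregate computed within that group.
summary = conn.execute("""
    SELECT region, COUNT(*), SUM(amount), AVG(amount), MIN(amount), MAX(amount)
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()
print(summary)
```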

76
New cards

Logical operators

used to combine or test more than one condition

  • AND, OR, NOT, BETWEEN, EXISTS, IN
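
Several of these operators can be combined in a single WHERE clause. The sketch uses Python's sqlite3 module with a hypothetical orders table; EXISTS is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'East', 20), (2, 'West', 75), (3, 'North', 120), (4, 'East', 60);
""")

# IN, BETWEEN, AND, and NOT together.
mid_sized_east = conn.execute("""
    SELECT order_id FROM orders
    WHERE region IN ('East', 'West')
      AND amount BETWEEN 50 AND 100
      AND NOT region = 'West'
    ORDER BY order_id
""").fetchall()

# OR: either condition is enough.
extremes = conn.execute("""
    SELECT order_id FROM orders
    WHERE amount < 30 OR amount > 100
    ORDER BY order_id
""").fetchall()

print(mid_sized_east, extremes)
```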

77
New cards

Small and Medium Manufacturers

make up 95%

78
New cards

CRM

Customer relationship management

79
New cards

PO

purchase order

80
New cards

SOM

serviceable obtainable market

81
New cards

TDP

total distribution points

82
New cards

SQL

Structured Query Language

83
New cards

DFD

data flow diagram

84
New cards

PPF

production possibilities frontier

85
New cards

DIKA

Data, Information, Knowledge, Action