BIA 484 Exam 1

324 Terms

1

Business Intelligence (BI)

An umbrella term that includes the applications, infrastructure, tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance

2

What is Machine Learning

The process of solving practical problems by 1. gathering a dataset and 2. algorithmically building a statistical model based on the dataset

3

What is Machine Learning pt 2

The use and development of computer systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyze and draw inferences from patterns in data

4

What is Machine Learning pt 3

A branch of artificial intelligence and computer science that focuses on the use of data and algorithms to imitate the way humans learn, gradually improving its accuracy

5

Why use Machine Learning Now?

Big data: unprecedented amount of data
Competitive landscape: rapidly changing and competitive market
Storage: availability of processing and storage space

6

Data Mining

process of discovering interesting and meaningful patterns in data
- data driven process not user driven process

7

How is Predictive Analytics different

- BI is more user driven, PA is a subset
- Statistics - more model (theory) than data, taught not to data mine
- Data mining: more connotation about privacy

8

Machine Learning Definition

Emphasis on the machine recognizing patterns
- less emphasis on reporting and visualization

9

Statisticians:

Rigor
- make sure everything follows every assumption

10

Machine Learning:

Performance
- Get the best outcome variables regardless of the context

11

Data Analyst:

Storytelling
- Explain something that makes sense

12

Two Types of Data Mining

Descriptive
Predictive
* Both use Present data

13

Descriptive Data Mining

Describes the data
- who were the best customers
- what did customers buy together?

14

Predictive Data Mining

Makes predictions based on past data
- who will be the best customers
- what will customers buy together

15

Why do we need Machine Learning

Often state the obvious
- what is the best way to predict
- how well does it predict
Data explosion
- large increase of available data

16

Big Data- The 3 V's

Volume
Variety
Velocity
Outcome: Value- what can you do with the data, how does it add value

17

Volume

Can you find the information you are looking for

18

Variety

Is a picture worth a thousand words in 70 languages?
Is your information balanced

19

Velocity

Information gains momentum and crises and opportunities evolve in real time. How is the outlook for today?

20

Who does what

End User: Decision Making
Business Analyst: Data Presentation/ visualization
Data Analyst: Data mining (information discovery), Data Exploration (statistical analysis, querying, reporting)
Database Administrator: Data Warehouses/Data Marts (OLAP), Data sources (paper, files, data, OLTP)

21

Needed skills of a Data Analyst

- Statistical awareness
- Expertise with analytical tools
- Domain Knowledge
* know what results are useful
* know how to apply results

22

The Knowledge Discovery in Databases Process (KDD)

Data --> Selection
Target Data --> Preprocessing
Preprocessed Data --> Transformation
Transformed Data --> Data Mining
Patterns --> Interpretation and Evaluation
Knowledge

23

Potential Application Areas

Marketing
- Database marketing
- Target marketing
- Customer relationship Management (CRM)

Finance
- Credit Scoring
- Fraud Detection

Management
- Health informatics

24

Rexer Analytics Survey Results

survey of 1200 data scientists in 72 countries

from multiple sectors

three key words: Data, Scientist, Computer

1/3 of respondents have seen difficulties when do-it-yourself tools or services are used

25

What do Data Scientists do?

Improving understanding of customers
Retaining customers
Improving customer experiences
selling products/services to existing customers
market research/ survey analysis
acquiring customers
improving direct marketing programs
sales forecasting
fraud detection or prevention
risk management / credit scoring

26

How do Data Scientists do it?

Regression, Decision Trees, and Cluster Analysis remain the most commonly used algorithms

27

Biggest Difficulty for Data Scientists

Deployment

28

Challenges to Analytics

Deployment
- must be actionable, integrated within the company

Management-Communication
- Have to have trust in the model

Data
- Data must be accessible
- must be accurate

Modeling
- Risk of overfitting the model

29

What is SAS

A suite of business solutions and technologies to help organizations solve problems

Analytical software system used by many businesses worldwide
- regulation and resources

The base system for all other SAS products
- JMP
- Enterprise Miner

30

Why use SAS

- Access and manage data across multiple sources
- perform analyses and deliver information across your organization
- Access, Manage, Analyze, Present

31

Why not use Excel or SPSS

- Can use huge datasets
- saves a program of your steps

32

SAS, R, or Python

34% use SAS overall

Data Scientists: Those working primarily with unstructured or streaming data
Predictive Analytics Pros: Primarily work with structured data

33

SAS Programs

A SAS program is a sequence of one or more steps
- Data steps typically create SAS data sets
- Proc steps typically process SAS data sets to generate reports and graphs and to manage data

34

SAS Program Steps

A step is a sequence of SAS statements.

35

Step Boundaries

SAS steps begin with one of the following
- A data statement
- A proc statement

SAS detects the end of a step when it encounters one of the following:
- A run statement (for most steps)
- A quit statement (for some procedures)
- The beginning of another step (Data statement or Proc statement)

36

Data Steps

Used for reading, manipulating, and processing data
- any changes to the data file are made here


SAS data sets have the extension .sas7bdat
- with a data step you can read in other formats, e.g. Excel

37

Proc Steps

Use for analyzing the data
- Any interpretation of the data file is done here
- proc contents (shows contents of SAS file, metadata)
- proc sort (sorts sas datasets)
- proc print (prints sas datasets)
- proc import (imports data from excel)
- proc export (exports data to excel)

38

SAS interface

Three Primary tabs or windows

Editor: can enter, edit, submit, and save a SAS program
Log: Browse notes, warnings, and errors relating to a submitted SAS program
Results: browse output from reporting procedures

39

SAS Syntax Rules: Statements

Usually begin with an identifying keyword
always end with a semicolon

40

Recommended Formatting

Begin each statement on a new line
use white space to separate words and steps
indent statements within a step
indent continued lines in multiline statements

41

Add comments to the code

Can be used for:
documenting
taking notes
trial and error

42

Syntax Errors

A syntax error is an error in the spelling or grammar of a SAS statement. SAS finds syntax errors as it compiles each SAS statement before execution begins.

examples of syntax errors:
misspelled keywords
unmatched quotation marks
missing semicolons
invalid options

43

Syntax Errors pt 2

When SAS encounters a syntax problem, it writes a warning or error message to the log

you should always check the log to make sure that the program ran successfully even if output is generated

44

Data Mining Methodologies

CRISP-DM
SEMMA

viewed as implementations of KDD

45

CRISP-DM

Cross-Industry Standard Process for Data Mining
- developed in 1996 by DaimlerChrysler, SPSS, NCR (POS services)
- De facto industry standard
- 6 steps
- cyclical process

46

CRISP-DM 6 steps

Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment

47

Defining Business Objectives

- What core business objectives should be addressed
- How can they be quantified
- What data is available
- What methods can be used for this
- How can the model be assessed
- How can the models be deployed

48

Defining the target variable

- The variable to be estimated or predicted based on the business objectives
- Can't include the future when building a model
- must have data for all predictors
* may need 30-60 days in the past

49

Defining measures of Success

Classification
- Percent correct classification (PCC)
- how many errors are made
- Confusion Matrix
- how errors are made

Subset of the Population
- Lift
- ROC
- Area under the curve

Estimation
- R^2
- Average Error
- Mean Squared Error

Business Measures
- ROI
- Parsimony
- Explainability
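
As a rough illustration of the classification and estimation measures listed above, the sketch below computes a confusion matrix, percent correct classification (PCC), mean squared error, and R^2 on small made-up data. The function names and sample values are hypothetical, and Python is used only so the arithmetic can be run without a SAS session.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


def percent_correct(actual, predicted):
    """PCC: percentage of records whose predicted label matches the actual one."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return 100.0 * hits / len(actual)


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up classification labels (1 = responder, 0 = non-responder)
labels = [1, 0, 1, 1, 0, 0, 1, 0]
guesses = [1, 0, 0, 1, 0, 1, 1, 0]
cm = confusion_matrix(labels, guesses)
pcc = percent_correct(labels, guesses)

# Made-up estimation targets
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]
mse = mean_squared_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
```

PCC says how many errors are made; the confusion matrix separates them into false positives and false negatives, which is the "how errors are made" part of the card.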

50

Data Understanding

Examine the data
- initial collection of the data
- describe and explore the data

Identify Problems
- verify data quality

51

Defining data

Data must be 2D
- each row is a unit of analysis, a record
- Each column is a variable
Data is rectangular
- each record has the same number of columns
Normalize

52

Defining the Unit of Analysis

Including a column for each visit is complicated
- Not every customer has the same number of visits

should the unit of analysis be
- Each visit to a store
- A customer
- A household

All variables must be on the same level

53

Data Preparation

Fix problems in the data
- Data Cleaning
- Data Transformation

Create Derived Variables
- Data formatting

54

Modeling

Build predictive or descriptive models
- Regression
- Logistic Regression
- Decision Trees
- Artificial Neural Networks
- Cluster Analysis

55

Evaluation

Assess Models
- Do they meet the business needs (not just the stats)
- might result in identification of other needs
Report the expected effects of the models

56

Deployment

Plan for use of models
- Apply to business operations
- Monitor for changes in the operating conditions
- Documentation

57

Modeling out of Order

Caution
- Can misguide the Analysis
- Can rule out variables that are useful

Building Models First
- Get a sense for the variables
- Get a baseline for how the model might work
- Determine if the model predicts too well
* confounding variable

Early Deployment
- Usually know main predictors quickly
- Determine obstacles in real world

58

Modeling Process

1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment

59

SAS Modeling Method

Developed by SAS
- Logical organization of the functional toolset of SAS Enterprise Miner
- For carrying out core tasks of data mining
- Does not emphasize business understanding

Sample, Explore, Modify, Model, Assess

60

Sample

Input data
Partition Data
Create Multiple Data Sets
- Training: used for model fitting
- Validation: To assess the model
- Test: Determine how well model generalizes
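
A minimal sketch of the partitioning idea, assuming a 60/20/20 split and a fixed random seed (both are illustrative choices, not values from the course). The `partition` helper is hypothetical and written in Python rather than SAS Enterprise Miner so it can be run on its own.

```python
import random


def partition(records, train=0.6, validation=0.2, seed=42):
    """Shuffle records and split them into training, validation, and test lists.

    Whatever is left after the training and validation shares goes to test.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


rows = list(range(100))  # stand-in for 100 observations
train_set, valid_set, test_set = partition(rows)
```

Each observation lands in exactly one of the three sets, so the test set stays unseen while the model is fitted and assessed.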

61

Explore

Gain Understanding of the data
Look for patterns that exist
- Factor analysis
- clustering

62

Modify

Create, select, and transform variables
- Group customers
- Alter dates

look for outliers
reduce the number of variables

63

Model

Use tools to predict target variable
- Neural networks
- Decision Trees
- Regression

64

Assess

- Determine how useful
- Determine how reliable
- Use the test data

65

Modern Workflow

Assess and View
Interact
Analyze and Discover
Share
Promote and Govern

66

What is a SAS Data Set

A SAS data set is a specially structured data file that SAS creates and that only SAS can read. A SAS data set is a table that contains observations and variables

67

File Formats

- .sas7bdat
- SAS can also import other file formats
- .xlsx
- .csv

68

SAS Data Set Terminology

SAS Data Set --> Table
Observation --> Row
Variable --> Column

69

Browsing the Descriptor Portion of SAS

Use Proc Contents to display the descriptor portion of a SAS data set

Proc contents data = libname.dataset;
run;

70

Descriptor Portion

The descriptor portion contains the following metadata
- general properties (such as data set name and number of observations)
- Variable properties (such as name, type, and length)

71

Proc Contents

- Add a proc contents step to display the metadata

72

Data Portion

The data portion of a SAS data set contains the data values, which are either character or numeric

73

Browsing the Data Portion

Using Proc print to display the data portion of a SAS data set

proc print data = libname.dataset;
run;

74

SAS Variable Names

- Can be 1-32 characters long
- Must start with a letter or underscore. Subsequent characters must be letters, numbers, or underscores
- can be uppercase, lowercase, or mixed case
- are not case sensitive
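
The naming rules above can be sketched as a small checker. This is an illustration in Python, not a SAS feature: `is_valid_sas_name` and `same_variable` are hypothetical helpers that only encode the rules as stated on this card.

```python
import re

# One letter or underscore, then up to 31 more letters, digits, or underscores
# (1-32 characters in total).
SAS_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def is_valid_sas_name(name):
    """Return True when the whole name satisfies the rules on the card."""
    return SAS_NAME.fullmatch(name) is not None


def same_variable(a, b):
    """Names are not case sensitive, so compare them case-insensitively."""
    return a.lower() == b.lower()


checks = {
    name: is_valid_sas_name(name)
    for name in ["_temp1", "Salary", "5monthsdata", "data#5", "five months data"]
}
```

A name fails if it starts with a digit, contains a special character such as `#`, or contains a space.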

75

Invalid Variable Names

5monthsdata
data#5
five months data

76

Missing Data Values

Missing values are valid values in a SAS data set
- a blank represents a missing character value
- a period represents a missing numeric value
A value must exist for every variable in every observation

77

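Write a Program to Display the Current Date

data date; /* creates the temporary data set work.date */
currentdate = today();
format currentdate date9.; /* display the value as a date, not a day count */
run;

proc print data=work.date;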
run;

78

Any Date

data _NULL_;
date = 'DDMMMYY'd; /* replace DDMMMYY with a date; the constant is already numeric */
put date;
run;

79

SAS Libraries

SAS data sets are stored in SAS libraries. A SAS library is a collection of SAS files that are referenced and stored as a unit. Files can be stored in a temporary or permanent library

80

How SAS Libraries are Defined

When a SAS session starts, SAS creates one temporary and one permanent SAS library. These libraries are open and ready to be used. You refer to a SAS library by a logical name called a library reference name, or libref

81

Temporary Library

Work is a temporary library where you can store and access SAS data sets for the duration of the SAS session. It is the default library

SAS deletes the Work library and its contents when the session terminates

82

Permanent Libraries

SAShelp is a permanent library that contains sample SAS data sets you can access during your SAS session

83

Accessing SAS Data Sets

All SAS data sets have a two-level name that consists of the libref and the data set name, separated by a period
libref.datasetname
When a data set is in the temporary work library, you can use a one level name

84

User Defined Libraries

A user defined library
- is created by the user
- is permanent. Data sets are stored until the user deletes them
- is not automatically available in a SAS session
- is implemented within the operating environment's file system

85

Libname Statement

The SAS libname statement is a global statement
libname libref "sas-library" <options>;
- it is not required to be in a data step or a proc step
- it does not require a run statement
- it executes immediately
- it remains in effect until changed or cancelled, or until the session ends

86

Browsing a Library Programmatically

Use proc contents with the _ALL_ keyword to generate a list of all SAS files in a library
PROC CONTENTS DATA=libref._ALL_ NODS;
RUN;
- _ALL_ requests all of the files in the library
- The NODS option suppresses the individual data set descriptor information
- NODS can be used only with the keyword _ALL_

87

Importing an Excel Data File

- Multiple excel data file extensions
- specify the DBMS you want to use
* XLS
* XLSX
* EXCEL (reads all types of excel files)
- looks at less data when determining import strategy

proc import datafile="file-path" dbms=identifier out=libref.dataset;
run;

88

Print Procedure

By default, proc print displays all observations, all variables, and an observation column on the left side

Statements and options can be added to the print procedure to modify the default behavior

89

Variable Statements

The VAR statement selects variables to include in the report and specifies their order

proc print data=libname.dataset;
var variable(s);
run;

90

Proc print with OBS= option

- select a limited number of observations to display
- obs is not a variable

proc print data = libname.dataset (obs=#);
run;

91

Sum Statement

The sum statement calculates and displays report totals for the requested numeric variables

sum variables;

92

Where Statement

The where statement selects observations that meet the criteria specified in the where expression

proc print data=libname.dataset;
var variable(s);
where where-expression;
run;

93

Where Statement pt 2

The where expression defines the condition (or conditions) for selecting observations

operands:
- character constants
- numeric constants
- date constants
- character variables
- numeric variables

operators:
- symbols that represent a comparison, calculation, or logical operation
- +, (), >, <, -, *
- SAS functions
- special where operators

94

Suppressing the obs column

Use the NOOBS option in the proc print statement to suppress the observation column

proc print data = libname.dataset NOOBS;
run;

95

Operands

Constants are fixed
- characters are enclosed in quotation marks and are case sensitive
- numeric values do not use quotation marks or special characters
Variables must exist in the input data set

ex:
where sex = 'M';
where salary > 50000;

96

SAS Date Constant

A SAS date constant is a date written in the following form: 'ddmmm<yy>yy'd

SAS automatically converts a date constant to a SAS date value
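
A SAS date value is the number of days since January 1, 1960. The sketch below mimics that conversion in Python for the four-digit-year form only; `sas_date_value` is a hypothetical helper, and two-digit years are left out because their century depends on session settings.

```python
from datetime import date

SAS_EPOCH = date(1960, 1, 1)  # the day that corresponds to SAS date value 0
MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}


def sas_date_value(constant):
    """Convert text such as '01JAN2000' (ddmmmyyyy) to days since 01JAN1960."""
    text = constant.strip().upper()
    if len(text) != 9:
        raise ValueError("expected a date in ddmmmyyyy form")
    day = int(text[:2])
    month = MONTHS[text[2:5]]
    year = int(text[5:])
    return (date(year, month, day) - SAS_EPOCH).days


epoch_value = sas_date_value("01JAN1960")
y2k_value = sas_date_value("01jan2000")
before_epoch = sas_date_value("31DEC1959")
```

Dates before 1960 come out negative, which is why a date variable printed without a format shows up as a plain number.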

97

Comparison Operators

Comparison operators compare a variable with a value or with another variable

= Equal to (EQ)
^= ¬= ~= Not Equal to (NE)
> Greater Than (GT)
< Less Than (LT)
>= Greater than or equal to (GE)
<= less than or equal to (LE)
(' '), (' ' , ' ' ) equal to one of a list (IN)

98

Logical Operators

Logical operators combine or modify where expressions

WHERE WHERE-expression-1 AND | OR WHERE-expression-n;

99

Logical Operator Priority

The operators can be written as symbols or mnemonics, and parentheses can be added to modify the order of evaluation

^ ¬ ~ (NOT) priority: 1
& (AND) priority: 2
| (OR) priority: 3

The NOT operator modifies a condition by finding the complement of the specified criteria
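
To see the priority order in action, the sketch below uses Python, whose `not`, `and`, and `or` happen to follow the same relative order as the table above. It covers only these three operators on plain true/false values; other precedence rules (such as how NOT interacts with comparison operators) can differ between the two languages.

```python
a, b, c = True, False, True

# Without parentheses: NOT is applied first, then AND, then OR.
unparenthesized = a or b and not c

# The same expression with the implied grouping written out.
explicit = a or (b and (not c))

# Parentheses force OR to be evaluated before AND, changing the result.
regrouped = (a or b) and (not c)
```

Here `unparenthesized` and `explicit` are both true, while `regrouped` is false, which is why parentheses are worth adding whenever AND and OR are mixed.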

100

Special Where Operators

Special WHERE operators are operators that can be used only in WHERE expressions.

Contains: Includes a substring, can be used only with characters
Between-And: An inclusive range, can be used with characters and numbers
Is Null: A missing value, can be used with characters and numbers
Like: Matches a pattern, can be used with characters only
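
The behavior of the four operators can be approximated in Python as below. All function names and sample rows are hypothetical; the `like` helper assumes the usual LIKE wildcards, where `%` stands for any number of characters and `_` for exactly one.

```python
import re


def contains(value, substring):
    """CONTAINS: true when the character value includes the substring."""
    return substring in value


def between(value, low, high):
    """BETWEEN-AND: inclusive range check."""
    return low <= value <= high


def is_null(value):
    """IS NULL: a missing value, modeled here as None or a blank string."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def like(value, pattern):
    """LIKE: '%' matches any run of characters, '_' exactly one character."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


rows = [
    {"name": "Smith", "salary": 52000},
    {"name": "Smythe", "salary": 48000},
    {"name": "", "salary": None},
]
in_range = [
    r["name"] for r in rows
    if not is_null(r["salary"]) and between(r["salary"], 50000, 60000)
]
sm_names = [r["name"] for r in rows if like(r["name"], "Sm_th%")]
missing_names = [r for r in rows if is_null(r["name"])]
```

With these rows, `in_range` keeps only Smith, `sm_names` keeps Smith and Smythe, and `missing_names` holds the one row with a blank name.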

