BIA 484 Exam 1

Last updated 4:56 PM on 2/7/24

324 Terms

1

Business Intelligence (BI)

An umbrella term that includes the applications, infrastructure, tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance

2

What is Machine Learning

The process of solving practical problems by 1. gathering a dataset and 2. algorithmically building a statistical model based on the dataset

3

What is Machine Learning pt 2

The use and development of computer systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyze and draw inferences from patterns in data

4

What is Machine Learning pt 3

A branch of artificial intelligence and computer science which focuses on the use of data and algorithms to imitate the way humans learn, gradually improving its accuracy

5

Why use Machine Learning Now?

Big data: unprecedented amount of data
Competitive landscape: rapidly changing and competitive market
Storage: availability of processing and storage space

6

Data Mining

process of discovering interesting and meaningful patterns in data
- data driven process not user driven process

7

How is Predictive Analytics different

- BI is more user driven, PA is a subset
- Statistics - more model (theory) than data, taught not to data mine
- Data mining: more connotation about privacy

8

Machine Learning Definition

Emphasis on machines recognizing patterns
- less emphasis on reporting and visualization

9

Statisticians:

Rigor
- make sure everything follows every assumption

10

Machine Learning:

Performance
- Get the best outcome variables regardless of the context

11

Data Analyst:

Storytelling
- Explain something that makes sense

12

Two Types of Data Mining

Descriptive
Predictive
* Both use Present data

13

Descriptive Data Mining

Describes the data
- who were the best customers
- what did customers buy together?

14

Predictive Data Mining

Makes predictions based on past data
- who will be the best customers
- what will customers buy together

15

Why do we need Machine Learning

Often state the obvious
- what is the best way to predict
- how well does it predict
Data explosion
- large increase of available data

16

Big Data- The 3 V's

Volume
Variety
Velocity
Outcome: Value- what can you do with the data, how does it add value

17

Volume

Can you find the information you are looking for

18

Variety

Is a picture worth a thousand words in 70 languages?
Is your information balanced

19

Velocity

Information gains momentum and crises and opportunities evolve in real time. How is the outlook for today?

20

Who does what

End User: Decision Making
Business Analyst: Data Presentation/ visualization
Data Analyst: Data mining (information discovery), Data Exploration (statistical analysis, querying, reporting)
Database Administrator: Data Warehouses/Data Marts (OLAP), Data Sources (paper, files, data, OLTP)

21

Needed skills of a Data Analyst

- Statistical awareness
- Expertise with analytical tools
- Domain Knowledge
* know what results are useful
* know how to apply results

22

The Knowledge Discovery in Databases Process (KDD)

Data --> Selection
Target Data --> Processing
Processed Data --> Transformation
Transformed Data --> Data Mining
Patterns --> Interpretation and Evaluation
Knowledge

23

Potential Application Areas

Marketing
- Database marketing
- Target marketing
- Customer relationship Management (CRM)

Finance
- Credit Scoring
- Fraud Detection

Management
- Health informatics

24

Rexer Analytics Survey Results

Survey of 1200 data scientists in 72 countries

From multiple sectors

Three key words: Data, Scientist, Computer

1/3 of respondents have seen difficulties when do-it-yourself tools or services are used

25

What do Data Scientists do?

Improving understanding of customers
Retaining customers
Improving customer experiences
Selling products/services to existing customers
Market research / survey analysis
Acquiring customers
Improving direct marketing programs
Sales forecasting
Fraud detection or prevention
Risk management / credit scoring

26

How do Data Scientists do it?

Regression, Decision Trees, and Cluster Analysis remain the most commonly used algorithms

27

Biggest Difficulty for Data Scientists

Deployment

28

Challenges to Analytics

Deployment
- must be actionable, integrated within the company

Management-Communication
- Have to have trust in the model

Data
- Data must be accessible
- must be accurate

Modeling
- Risk of overfitting the model

29

What is SAS

A suite of business solutions and technologies to help organizations solve problems

Analytical software system used by many businesses worldwide
- regulation and resources

The base system for all other SAS products
- JMP
- Enterprise Miner

30

Why use SAS

- Access and manage data across multiple sources
- perform analyses and deliver information across your organization
- Access, Manage, Analyze, Present

31

Why not use Excel or SPSS

- Can use huge datasets
- saves a program of your steps

32

SAS, R, or Python

34% use SAS overall

Data Scientists: Those working primarily with unstructured or streaming data
Predictive Analytics Pros: Primarily work with structured data

33

SAS Programs

A SAS program is a sequence of one or more steps
- Data steps typically create SAS data sets
- Proc steps typically process SAS data sets to generate reports and graphs and to manage data

34

SAS Program Steps

A step is a sequence of SAS statements.

35

Step Boundaries

SAS steps begin with one of the following
- A data statement
- A proc statement

SAS detects the end of a step when it encounters one of the following:
- A run statement (for most steps)
- A quit statement (for some procedures)
- The beginning of another step (Data statement or Proc statement)
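
The step boundaries described on this card can be sketched as follows. This is a minimal illustration, not course code: it assumes the sample data set sashelp.class is available, and work.teens is a made-up output name.

```sas
/* DATA step: starts at the DATA statement, ends at RUN */
data work.teens;              /* work.teens is a hypothetical name */
    set sashelp.class;
    where age >= 13;
run;

/* PROC step: starts at the PROC statement, ends at RUN */
proc print data=work.teens;
run;

/* PROC SQL is one of the procedures that ends with QUIT */
proc sql;
    select count(*) as n_teens
    from work.teens;
quit;
```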

36

Data Steps

Used for reading, manipulating, and processing data
- any changes to the data file are made here


SAS datasets have the extension .sas7bdat
- with a data step you can read in other formats, e.g. Excel

37

Proc Steps

Used for analyzing the data
- Any interpretation of the data file is done here
- proc contents (shows contents of SAS file, metadata)
- proc sort (sorts sas datasets)
- proc print (prints sas datasets)
- proc import (imports data from excel)
- proc export (exports data to excel)
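
A short sketch of three of the procedures listed above, run against the sample data set sashelp.class. The output name work.class_sorted is an assumption chosen for illustration.

```sas
/* Show the metadata (descriptor portion) of the data set */
proc contents data=sashelp.class;
run;

/* Sort by age; OUT= writes a sorted copy instead of overwriting the input */
proc sort data=sashelp.class out=work.class_sorted;
    by age;
run;

/* Print the sorted copy */
proc print data=work.class_sorted;
run;
```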

38

SAS interface

Three Primary tabs or windows

Editor: can enter, edit, submit, and save a SAS program
Log: Browse notes, warnings, and errors relating to a submitted SAS program
Results: browse output from reporting procedures

39

SAS Syntax Rules: Statements

Usually begin with an identifying keyword
always end with a semicolon

40

Recommended Formatting

Begin each statement on a new line
Use white space to separate words and steps
Indent statements within a step
indent continued lines in multiline statements

41

Add comments to the code

Can be used for:
documenting
taking notes
trial and error

42

Syntax Errors

A syntax error is an error in the spelling or grammar of a SAS statement. SAS finds syntax errors as it compiles each SAS statement before execution begins

examples of syntax errors:
misspelled keywords
unmatched quotation marks
missing semicolons
invalid options

43

Syntax Errors pt 2

When SAS encounters a syntax problem, it writes a warning or error message to the log

you should always check the log to make sure that the program ran successfully even if output is generated

44

Data Mining Methodologies

CRISP-DM
SEMMA

Viewed as implementations of KDD

45

CRISP-DM

Cross-Industry Standard Process for Data Mining
- developed in 1996 by DaimlerChrysler, SPSS, NCR (POS services)
- De facto industry standard
- 6 steps
- cyclical process

46

CRISP-DM 6 steps

Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment

47

Defining Business Objectives

- What core business objectives should be addressed
- How can they be quantified
- What data is available
- What methods can be used for this
- How can the model be assessed
- How can the models be deployed

48

Defining the target variable

- The variable to be estimated or predicted based on the business objectives
- Can't include the future when building a model
- must have data for all predictors
* may need 30-60 days in the past

49

Defining measures of Success

Classification
- Percent correct classification (PCC)
- how many errors are made
- Confusion Matrix
- how errors are made

Subset of the Population
- Lift
- ROC
- Area under the curve

Estimation
- R^2
- Average Error
- Mean Squared Error

Business Measures
- ROI
- Parsimony
- Explainability
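
As a rough illustration of the two classification measures on this card, the sketch below builds a tiny made-up scored data set (the six rows and the names work.scored, actual, predicted, and correct are all invented for the example), then tabulates a confusion matrix and computes percent correct classification.

```sas
/* Six made-up scored records: actual class vs. predicted class */
data work.scored;
    input actual $ predicted $;
    correct = (actual = predicted);   /* 1 if classified correctly, else 0 */
    datalines;
yes yes
yes no
no no
no no
no yes
yes yes
;
run;

/* Confusion matrix: shows HOW the errors are made */
proc freq data=work.scored;
    tables actual*predicted / norow nocol nopercent;
run;

/* PCC: the mean of the 0/1 flag is the share classified correctly */
proc means data=work.scored mean;
    var correct;
run;
```

In this toy data 4 of the 6 rows match, so the PCC is about 0.67, while the confusion matrix shows one error in each direction.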

50

Data Understanding

Examine the data
- initial collection of the data
- describe and explore the data

Identify Problems
- verify data quality

51

Defining data

Data must be 2D
- each row is a unit of analysis, a record
- Each column is a variable
Data is rectangular
- each record has the same number of columns
Normalize

52

Defining the Unit of Analysis

Including a column for each visit is complicated
- Not every customer has the same number of visits

should the unit of analysis be
- Each visit to a store
- A customer
- A household

All variables must be on the same level

53

Data Preparation

Fix problems in the data
- Data Cleaning
- Data Transformation

Create Derived Variables
- Data formatting

54

Modeling

Build predictive or descriptive models
- Regression
- Logistic Regression
- Decision Trees
- Artificial Neural Networks
- Cluster Analysis

55

Evaluation

Assess Models
- Do they meet the business needs (not just the stats)
- might result in identification of other needs
Report the expected effects of the models

56

Deployment

Plan for use of models
- Apply to business operations
- Monitor for changes in the operating conditions
- Documentation

57

Modeling out of Order

Caution
- Can misguide the Analysis
- Can rule out variables that are useful

Building Models First
- Get a sense for the variables
- Get a baseline for how the model might work
- Determine if the model predicts too well
* confounding variable

Early Deployment
- Usually know main predictors quickly
- Determine obstacles in real world

58

Modeling Process

1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment

59

SAS Modeling Method

Developed by SAS
- Logical organization of the functional toolset of SAS Enterprise Miner
- For carrying out core tasks of data mining
- Does not emphasize business understanding

Sample, Explore, Modify, Model, Assess

60

Sample

Input data
Partition Data
Create Multiple Data Sets
- Training: used for model fitting
- Validation: To assess the model
- Test: Determine how well model generalizes
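
One way the partitioning above might look in a plain DATA step (Enterprise Miner has its own partition tooling, which this does not reproduce). The 60/20/20 split, the seed, and the output names are assumptions for illustration only.

```sas
/* Randomly assign each row of sashelp.cars to one of three data sets */
data work.train work.validate work.test;
    set sashelp.cars;
    if _n_ = 1 then call streaminit(484);   /* arbitrary seed for repeatability */
    u = rand('uniform');                     /* random number between 0 and 1 */
    if u < 0.6 then output work.train;           /* about 60%: model fitting */
    else if u < 0.8 then output work.validate;   /* about 20%: assess the model */
    else output work.test;                       /* about 20%: check generalization */
    drop u;
run;
```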

61

Explore

Gain Understanding of the data
Look for patterns that exist
- Factor analysis
- clustering

62

Modify

Create, select, and transform variables
- Group customers
- Alter dates

look for outliers
reduce the number of variables

63

Model

Use tools to predict target variable
- Neural networks
- Decision Trees
- Regression

64

Assess

- Determine how useful
- Determine how reliable
- Use the test data

65

Modern Workflow

Assess and View
Interact
Analyze and Discover
Share
Promote and Govern

66

What is a SAS Data Set

A SAS data set is a specially structured data file that SAS creates and that only SAS can read. A SAS data set is a table that contains observations and variables

67

File Formats

- .sas7bdat
- SAS can also import other file formats
- .xlsx
- .csv

68

SAS Data Set Terminology

SAS Data Set --> Table
Observation --> Row
Variable --> Column

69

Browsing the Descriptor Portion of SAS

Use Proc Contents to display the descriptor portion of a SAS data set

Proc contents data = libname.dataset;
run;

70

Descriptor Portion

The descriptor portion contains the following metadata
- general properties (such as data set name and number of observations)
- Variable properties (such as name, type, and length)

71

Proc Contents

- Add a proc contents step to display the metadata

72

Data Portion

The data portion of a SAS data set contains the data values, which are either character or numeric

73

Browsing the Data Portion

Use PROC PRINT to display the data portion of a SAS data set

proc print data = libname.dataset;
run;

74

SAS Variable Names

- Can be 1-32 characters long
- Must start with a letter or underscore. Subsequent characters can be letters, numbers, or underscores
- can be uppercase, lowercase, or mixed case
- are not case sensitive

75

Invalid Variable Names

5monthsdata
data#5
five months data

76

Missing Data Values

Missing values are valid values in a SAS data set
- a blank represents a missing character value
- a period represents a missing numeric value
A value must exist for every variable in every observation
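
A small sketch of how the two kinds of missing values appear in practice. The data set work.missing_demo and its three rows are made up; the second row has a missing character value (stored as a blank) and the third has a missing numeric value (shown as a period).

```sas
/* Comma-delimited made-up data; DSD treats an empty field as missing */
data work.missing_demo;
    infile datalines dsd;
    input name :$10. score;
    datalines;
Ann,90
,75
Cy,.
;
run;

/* The blank name and the period for score both appear in the listing */
proc print data=work.missing_demo;
run;
```

Every observation still has a value for every variable; the missing ones are simply stored as blank or period.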

77

Write a Program to Display the Current Date

data date;
currentdate = today();
format currentdate date9.; /* show the SAS date value as a readable date */
run;

proc print data=work.date;
run;

78

Any Date

data _NULL_;
date = 'DDMMMYY'd; /* date constant: replace DDMMMYY with an actual date */
put date; /* writes the numeric SAS date value to the log */
run;

79

SAS Libraries

SAS datasets are stored in SAS libraries. A SAS library is a collection of SAS files that are referenced and stored as a unit. Files can be stored in a temporary or permanent library

80

How SAS Libraries are Defined

When a SAS session starts, SAS creates one temporary and one permanent SAS library. These libraries are open and ready to be used. You refer to a SAS library by a logical name called a library reference name, or libref

81

Temporary Library

Work is a temporary library where you can store and access SAS data sets for the duration of the SAS session. It is the default library

SAS deletes the Work library and its contents when the session terminates

82

Permanent Libraries

SAShelp is a permanent library that contains sample SAS data sets you can access during your SAS session

83

Accessing SAS Data Sets

All SAS data sets have a two-level name that consists of the libref and the data set name, separated by a period
libref.datasetname
When a data set is in the temporary work library, you can use a one level name

84

User Defined Libraries

A user defined library
- is created by the user
- is permanent. Data sets are stored until the user deletes them
- is not automatically available in a SAS session
- is implemented within the operating environment's file system

85

Libname Statement

The SAS libname statement is a global statement
libname libref "sas-library" <options>;
- it is not required to be in a data step or a proc step
- it does not require a run statement
- it executes immediately
- it remains in effect until changed or cancelled, or until the session ends
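
A minimal sketch of the LIBNAME statement in use. The libref mylib and the folder path are hypothetical; the path must point to a folder that already exists on your system.

```sas
/* Global statement: no RUN needed, takes effect immediately */
libname mylib "/home/user/bia484";    /* hypothetical folder path */

/* Two-level name: the copy is stored permanently in that folder */
data mylib.class_copy;
    set sashelp.class;
run;

/* Cancel the libref; the stored data set itself is not deleted */
libname mylib clear;
```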

86

Browsing a Library Programmatically

Use PROC CONTENTS with the _ALL_ keyword to generate a list of all SAS files in a library
PROC CONTENTS DATA=libref._ALL_ NODS;
RUN;
- _ALL_ requests all of the files in the library
- The NODS option suppresses the individual data set descriptor information
- NODS can be used only with the keyword _ALL_

87

Importing an Excel Data File

- Multiple excel data file extensions
- specify the DBMS you want to use
* XLS
* XLSX
* EXCEL (reads all types of excel files)
- looks at less data when determining import strategy

proc import datafile =" " dbms = out = ;
run;

88

Print Procedure

By default, proc print displays all observations, all variables, and an observation column on the left side

Statements and options can be added to the print procedure to modify the default behavior

89

Variable Statements

The VAR statement selects variables to include in the report and specifies their order

proc print data=libname.dataset;
var variables;
run;

90

Proc print with the OBS= option

- select a limited number of observations to display
- obs is not a variable

proc print data = libname.dataset (obs=#);
run;

91

Sum Statement

The sum statement calculates and displays report totals for the requested numeric variables

sum variables;
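
For instance, a sketch using the sample data set sashelp.class, where height and weight are numeric variables:

```sas
/* Print three columns and add a totals row for height and weight */
proc print data=sashelp.class;
    var name height weight;
    sum height weight;
run;
```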

92

Where Statement

The where statement selects observations that meet the criteria specified in the where expression

proc print data=libname.dataset;
var ;
where where-expression;
run;

93

Where Statement pt 2

The where expression defines the condition (or conditions) for selecting observations

operands:
- character constants
- numeric constants
- date constants
- character variables
- numeric variables

operators:
- symbols that represent a comparison, calculation, or logical operation
- +, (), >, <, -, *
- SAS functions
- special where operators
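
A sketch that combines several of the operand and operator types above in one where expression, using the sample data set sashelp.class (height is recorded in inches there, so the multiplication converts it to centimeters; the 150 cm cutoff is arbitrary).

```sas
proc print data=sashelp.class;
    var name sex age height;
    /* character variable and constant, SAS function, calculation, comparison */
    where upcase(sex) = 'F' and height * 2.54 > 150;
run;
```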

94

Suppressing the Obs column

Use the NOOBS option in the proc print statement to suppress the observation column

proc print data = libname.dataset NOOBS;
run;

95

Operands

Constants are fixed
- characters are enclosed in quotation marks and are case sensitive
- numeric values do not use quotation marks or special characters
Variables must exist in the input data set

ex:
where sex = 'M';
where salary > 50000;

96

SAS Date Constant

A SAS date constant is a date written in the following form: 'ddmmm<yy>yy'd

SAS automatically converts a date constant to a SAS date value
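
As an example, a date constant can be compared directly with a date variable. This sketch assumes the sample data set sashelp.stocks, which has a Date variable; the cutoff date is arbitrary.

```sas
/* Keep only rows dated on or after 1 January 2005 */
proc print data=sashelp.stocks;
    var stock date close;
    where date >= '01JAN2005'd;
    format date date9.;
run;
```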

97

Comparison Operators

Comparison operators compare a variable with a value or with another variable

= Equal to (EQ)
^= ¬= ~= Not Equal to (NE)
> Greater Than (GT)
< Less Than (LT)
>= Greater than or equal to (GE)
<= less than or equal to (LE)
IN (' '), IN (' ', ' ') Equal to one of a list (IN)

98

Logical Operators

Logical operators combine or modify where expressions

WHERE WHERE-expression-1 AND | OR WHERE-expression-n;

99

Logical Operator Priority

The operators can be written as symbols or mnemonics, and parentheses can be added to modify the order of evaluation

^ ¬ ~ (NOT) priority: 1
& (AND) priority: 2
| (OR) priority: 3

The NOT operator modifies a condition by finding the complement of the specified criteria
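
The effect of operator priority can be seen by running the same conditions with and without parentheses. A sketch on the sample data set sashelp.class:

```sas
/* AND is evaluated before OR: all females, plus males aged 15 or older */
proc print data=sashelp.class;
    where sex = 'F' or sex = 'M' and age >= 15;
run;

/* Parentheses force the OR first: anyone of either sex aged 15 or older */
proc print data=sashelp.class;
    where (sex = 'F' or sex = 'M') and age >= 15;
run;
```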

100

Special Where Operators

Special WHERE operators are operators that can be used only in WHERE expressions.

Contains: Includes a substring, can be used only with characters
Between-And: An inclusive range, can be used with characters and numbers
Is Null: A missing value, can be used with characters and numbers
Like: Matches a pattern, can be used with characters only
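
Each of the four operators might be used as in the sketch below, which assumes the sample data set sashelp.cars. The search strings and price range are arbitrary choices for illustration.

```sas
/* CONTAINS: model names that include the substring '4dr' */
proc print data=sashelp.cars;
    var make model;
    where model contains '4dr';
run;

/* BETWEEN-AND: inclusive range on a numeric variable */
proc print data=sashelp.cars;
    var make model msrp;
    where msrp between 20000 and 25000;
run;

/* IS NULL: rows where the cylinder count is missing */
proc print data=sashelp.cars;
    var make model cylinders;
    where cylinders is null;
run;

/* LIKE: % matches any number of characters, so this finds makes starting with A */
proc print data=sashelp.cars;
    var make model;
    where make like 'A%';
run;
```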
