OMIS 324 Exam 3 Study Guide


Last updated 10:00 PM on 4/6/26

30 Terms

1

NumPy

Numerical Python

- Core library for numerical computing in Python

- Provides fast operations on large, multi-dimensional arrays

- Foundation of many data science libraries (pandas, scikit-learn, etc.)
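A minimal sketch of the "fast operations on arrays" idea, assuming NumPy is installed. The prices and quantities are made-up values for illustration only.

```python
import numpy as np

# Hypothetical prices and quantities, invented for illustration
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])

# Element-wise arithmetic on whole arrays, with no explicit loop
revenue = prices * quantities
total_revenue = revenue.sum()

# A 2-D array (2 rows, 3 columns) and the mean of each column
matrix = np.array([[1, 2, 3], [4, 5, 6]])
column_means = matrix.mean(axis=0)

print(revenue, total_revenue, column_means)
```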

2

Sampling

Analyzing a subset of the data rather than the entire dataset, which makes analysis faster and more manageable
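One possible way to draw a random sample in pandas is `DataFrame.sample()`. The sketch below assumes a made-up DataFrame with 1,000 rows; `random_state` is set only so the sample is repeatable.

```python
import pandas as pd

# Hypothetical dataset of 1,000 rows, invented for illustration
df = pd.DataFrame({"Income": range(1000)})

# Sample a fixed number of rows
sample_n = df.sample(n=100, random_state=42)

# Sample a fraction of the rows (10%)
sample_frac = df.sample(frac=0.1, random_state=42)

print(len(sample_n), len(sample_frac))
```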

3

Statistical summary

helps us understand data quickly and objectively without looking at every individual observation
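As a rough illustration, pandas can produce a statistical summary of a column with `describe()`. The income values are hypothetical.

```python
import pandas as pd

# Hypothetical incomes (in $1,000s), invented for illustration
df = pd.DataFrame({"Income": [40, 50, 60, 70, 80]})

# count, mean, std, min, quartiles, and max in one call
summary = df["Income"].describe()

print(summary)
```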

4

Measures of location

mean, median, mode

5

Measures of dispersion

variance, standard deviation

6

Measures of shape

skewness

7

Measures of association

correlation

8

Mean

Average of the values. Sensitive to outliers (extreme values).

- Ex: df["Income"].mean()

9

Median

Middle value when the data is arranged from least to greatest. Not sensitive to outliers.

- Ex: df["Income"].median()

10

Mode

Most frequent value in the dataset. Not sensitive to outliers.

  • Ex: df["Income"].mode()
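The three measures of location can be compared side by side. This sketch uses hypothetical incomes and adds one extreme value to show how the mean reacts much more strongly than the median or mode.

```python
import pandas as pd

# Hypothetical incomes (in $1,000s), invented for illustration
df = pd.DataFrame({"Income": [30, 40, 40, 50, 60]})
# Same data plus one extreme value (outlier)
df_outlier = pd.DataFrame({"Income": [30, 40, 40, 50, 60, 1000]})

mean_before = df["Income"].mean()
mean_after = df_outlier["Income"].mean()

median_before = df["Income"].median()
median_after = df_outlier["Income"].median()

# mode() returns a Series (there can be ties), so take the first value
mode_before = df["Income"].mode()[0]
mode_after = df_outlier["Income"].mode()[0]

print(mean_before, mean_after)      # the mean jumps
print(median_before, median_after)  # the median shifts only slightly
print(mode_before, mode_after)      # the mode is unchanged
```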

11

Mean, median, mode order

[Image: relative order of the mean, median, and mode]

12

Variance

Measures variability or risk (spread around the mean)

- Small variance -> data points are close to the mean

- Large variance -> data points are widely spread out

- Ex: df["Income"].var()

13

Standard deviation

Square root of the variance. Typical deviation from the mean (risk measure).

- Ex: df["Income"].std()
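A short sketch of how variance and standard deviation relate, using hypothetical incomes. Note that pandas `var()` and `std()` compute the sample versions by default (dividing by n - 1).

```python
import math
import pandas as pd

# Hypothetical incomes (in $1,000s), invented for illustration
df = pd.DataFrame({"Income": [40, 50, 60, 70, 80]})

variance = df["Income"].var()  # sample variance (ddof=1 by default)
std_dev = df["Income"].std()   # sample standard deviation

# The standard deviation is the square root of the variance
print(variance, std_dev, math.sqrt(variance))
```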

14

Variance / standard deviation is commonly used

to measure risk, because it quantifies how much actual outcomes deviate from the expected return

15

Skewness

Measures the asymmetry of a distribution. It tells us whether the data has a longer tail to the left (negative skew) or to the right (positive skew).

- Ex: df["Income"].skew()
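To see the sign of the skewness in practice, the sketch below builds three hypothetical series: one with a long right tail, its mirror image with a long left tail, and a symmetric one. In these particular examples the mean is pulled toward the long tail relative to the median.

```python
import pandas as pd

# Hypothetical data, invented for illustration
right_skewed = pd.Series([30, 40, 40, 50, 60, 1000])  # long right tail
left_skewed = 1000 - right_skewed                     # mirror image, long left tail
symmetric = pd.Series([10, 20, 30, 40, 50])

print(right_skewed.skew())  # positive
print(left_skewed.skew())   # negative
print(symmetric.skew())     # approximately zero

# The mean is pulled toward the long tail
print(right_skewed.mean(), right_skewed.median())
print(left_skewed.mean(), left_skewed.median())
```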

16

Correlation

(Two variables) Measures the strength and direction of the linear relationship between two variables.

- Syntax: df['col1'].corr(df['col2'])

- Ex: df["Income"].corr(df["YearsofExperience"])
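A small sketch of the correlation call with hypothetical data. The `LeisureHours` column is an invented name, added only to show a negative relationship.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "YearsofExperience": [1, 2, 3, 4, 5],
    "Income": [40, 50, 60, 70, 80],   # rises with experience
    "LeisureHours": [10, 8, 6, 4, 2], # falls with experience
})

# Close to +1: strong positive linear relationship
r_positive = df["Income"].corr(df["YearsofExperience"])

# Close to -1: strong negative linear relationship
r_negative = df["LeisureHours"].corr(df["YearsofExperience"])

print(r_positive, r_negative)
```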

17

pd.crosstab()

pandas function used to create a cross tabulation (contingency table). It shows the frequency (count) of combinations between two or more categorical variables

18

groupby()

pandas method used to:

  • Split the data into groups based on one or more categorical variables

  • Apply a function to each group

  • Combine the results

  • Ex: summarize data by groups (e.g., average by category): df.groupby('group_col')['value_col'].mean()
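The split, apply, combine steps above can be sketched as follows. The `Department` and `Income` columns and their values are hypothetical.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "Department": ["A", "A", "B", "B"],
    "Income": [40, 60, 70, 90],
})

# Split by Department, apply mean() to each group, combine into one Series
avg_income = df.groupby("Department")["Income"].mean()

print(avg_income)
```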

19

Reinforcement learning

 learn from the data environment using rewards and errors

20

Cross Tabulation

(Frequency table) Generates a count matrix for combinations of categorical variables

  • pd.crosstab(df['row_var'], df['col_var'])

  • pd.crosstab([df['row_var1'], df['row_var2']], df['col_var'])
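A minimal sketch of a cross tabulation with two categorical columns. The `Gender` and `Purchased` columns and their values are hypothetical.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "Gender": ["F", "M", "F", "M", "F"],
    "Purchased": ["Yes", "No", "Yes", "Yes", "No"],
})

# Rows are Gender values, columns are Purchased values, cells are counts
table = pd.crosstab(df["Gender"], df["Purchased"])

print(table)
```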

21

Artificial intelligence

any technique that enables computers to mimic human intelligence. It includes machine learning

22

Machine learning

 a subset of AI that includes techniques that enable machines to improve tasks with experience. It includes deep learning

23

Parts of machine learning 

  • Supervised learning - develop predictive models: output values are specified

    • Regression

    • Classification 

  • Unsupervised learning - group and interpret data based only on input data

    • Clustering

    • Association rule mining

    • Anomaly detection 

  • Reinforcement learning - learn from the data environment using rewards and errors

24

Deep learning

a subset of machine learning based on neural networks that permit a machine to train itself to perform a task

25

Supervised learning

 develop predictive models: output values are specified

  • Labeled data (dependent variable y is known)

  • Use existing data for prediction

26

Unsupervised learning

 group and interpret data based only on input data

  • Unlabeled data (there is no dependent variable y)

  • Describe existing data, find patterns

27

Linear regression

A tool for building mathematical and statistical models that characterize the relationship between a dependent variable y (ratio) and one or more independent (explanatory) variables x (ratio or categorical), all of which are represented numerically

28

Simple linear regression

involves a single independent variable 

Y = β0 + β1X1
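One way to estimate β0 and β1 from data is `numpy.polyfit` with degree 1, which fits a straight line by least squares. The data points are hypothetical, and this is only a sketch of the idea, not the course's required tool.

```python
import numpy as np

# Hypothetical data, invented for illustration
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

# polyfit returns coefficients from highest degree to lowest: [slope, intercept]
beta1, beta0 = np.polyfit(x, y, 1)

# Predict Y for a new X using Y = β0 + β1 * X
y_new = beta0 + beta1 * 5.0

print(beta0, beta1, y_new)
```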

29

Multiple linear regression

involves two or more independent variables

Y = β0 + β1X1 + β2X2 + … + βkXk
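With two or more independent variables, the coefficients can be estimated by least squares on a design matrix. The sketch below uses `numpy.linalg.lstsq` and hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2, so the fit should recover those coefficients.

```python
import numpy as np

# Hypothetical independent variables, invented for illustration
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# Hypothetical dependent variable built from known coefficients
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Design matrix: a column of ones for β0, then one column per X
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution for [β0, β1, β2]
coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
beta0, beta1, beta2 = coefficients

print(beta0, beta1, beta2)
```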

30


Market value

Example model: market value = a + b * square feet

  • Numerous possible lines could pass through the data points

  • We want to determine the best regression line
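A common definition of the "best" line, assumed here, is the least-squares line: the one that minimizes the sum of squared errors (SSE) between actual and predicted values. The sketch below computes a and b with the closed-form least-squares formulas and compares the SSE against another candidate line. The house data are hypothetical.

```python
import numpy as np

# Hypothetical houses, invented for illustration
sqft = np.array([1000.0, 1500.0, 2000.0, 2500.0])
value = np.array([200000.0, 250000.0, 310000.0, 360000.0])

def sse(a, b):
    """Sum of squared errors for the line: value = a + b * sqft."""
    predicted = a + b * sqft
    return float(np.sum((value - predicted) ** 2))

# Least-squares slope and intercept
b = np.sum((sqft - sqft.mean()) * (value - value.mean())) / np.sum((sqft - sqft.mean()) ** 2)
a = value.mean() - b * sqft.mean()

best_sse = sse(a, b)
other_sse = sse(a + 1000.0, b)  # a different candidate line through the data

print(a, b, best_sse, other_sse)
```

Any other choice of a and b gives a larger SSE on this data, which is the sense in which the least-squares line is "best".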