ML Final


Last updated 5:08 AM on 5/8/26

32 Terms

1
New cards

Neural Networks

These are modeled on the human brain. Just as neurons are networked to one another, artificial neural networks emphasize the relationships between attributes. They have proven especially effective at analyzing visual, audio, and written-language data. Like Principal Component Analysis, they are effective at identifying which features are most relevant.

2
New cards

Perceptron

This algorithm attempts to classify objects using a single line, plane, or hyperplane. It begins with a set of weights matching the number of dimensions under consideration, plus one additional weight for the bias. For example, if we’re looking at a two-dimensional field, we will have three weights: ω0, ω1, and ω2. Each weight is multiplied by the value of its corresponding dimension (with x0 fixed at 1 for the bias term) and the products are added together; this sum is ω transpose x:

  • ωᵀx = (ω0 · x0) + (ω1 · x1) + (ω2 · x2)
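
The weighted sum and sign-based classification can be sketched in a few lines of Python; the weights and points below are made-up values for illustration only:

```python
# A minimal sketch of the perceptron decision rule described above.
# The bias weight w0 is paired with a constant input x0 = 1, so the
# classification reduces to the sign of the weighted sum w^T x.

def predict(weights, point):
    """Classify a 2-D point with weights [w0, w1, w2]; x0 is fixed at 1."""
    x = [1.0] + list(point)                           # prepend the bias input x0 = 1
    total = sum(w * xi for w, xi in zip(weights, x))  # w^T x
    return 1 if total >= 0 else -1

# Hypothetical weights encoding the line x1 + x2 - 1 = 0
weights = [-1.0, 1.0, 1.0]
print(predict(weights, (2, 2)))   # point above the line -> 1
print(predict(weights, (0, 0)))   # point below the line -> -1
```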

3
New cards

Bayesian Nets

The probability of x is the number of times x occurs in the observations divided by the total number of observations, N:

  • P(x) = Count(x)/N

4
New cards

Joint Probability

The probability that two variables have particular values is the count of records where they have those values divided by the number of records:

  • P(x,y) = Count(x, y) / N
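
Both counting formulas can be sketched directly in Python; the (weather, play) observations below are made up for illustration:

```python
from collections import Counter

# Estimating P(x) and the joint P(x, y) by counting, per the two
# formulas above: Count(x)/N and Count(x, y)/N.
records = [("sun", "yes"), ("sun", "yes"), ("rain", "no"),
           ("sun", "no"), ("rain", "no")]
N = len(records)

# P(x) = Count(x) / N, counting only the first variable
p_sun = sum(1 for x, _ in records if x == "sun") / N        # 3/5 = 0.6

# P(x, y) = Count(x, y) / N, counting the pair
p_sun_yes = Counter(records)[("sun", "yes")] / N            # 2/5 = 0.4

print(p_sun, p_sun_yes)
```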

5
New cards

Sum Rule

Once you have a joint probability, you can use that to determine a single probability by ‘summing out’ the other variable:

  • P(+x) = P(+x, +y) + P(+x, -y)
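
As a quick sketch, summing out y from a joint table of two binary variables (the joint values below are made up but sum to 1):

```python
# 'Summing out' y to recover the marginal P(+x) from the joint.
joint = {("+x", "+y"): 0.3, ("+x", "-y"): 0.2,
         ("-x", "+y"): 0.4, ("-x", "-y"): 0.1}

# P(+x) = P(+x, +y) + P(+x, -y)
p_x = sum(p for (x, _), p in joint.items() if x == "+x")
print(p_x)  # 0.5
```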

6
New cards

Conditional Probability

Probability x given y is the probability of x and y divided by probability of y:

  • P(x|y) = P(x, y) / P(y)

7
New cards

Bayes Theorem

Since P(y|x) = P(x, y) / P(x), and P(x, y) = P(x|y) * P(y), it follows that:

  • P(y|x) = P(x|y)*P(y) / P(x)
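
A numeric sanity check of the theorem, using made-up values for two binary events:

```python
# Verifying Bayes' theorem numerically: computing P(y|x) directly from
# the joint gives the same answer as the P(x|y)*P(y)/P(x) route.
p_xy = 0.3   # P(x, y), assumed
p_x  = 0.5   # P(x), assumed
p_y  = 0.6   # P(y), assumed

p_y_given_x = p_xy / p_x                # definition: P(y|x) = P(x, y) / P(x)
p_x_given_y = p_xy / p_y                # definition: P(x|y) = P(x, y) / P(y)
via_bayes   = p_x_given_y * p_y / p_x   # Bayes' theorem, same result

print(p_y_given_x, via_bayes)
```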

8
New cards

Independence

Two variables are independent if:

  • P(x,y) = P(x)P(y); Or

  • P(x|y) = P(x)

9
New cards

Conditional Independence

  • P(x, y|z) = P(x|z)P(y|z); Or

  • P(x|z,y) = P(x|z)

10
New cards

Unsupervised learning

Used on datasets with no predefined classes. The machine will learn by observation instead of examples. The objectives can be the same as for supervised learning, and it is also useful for stand-alone applications and preprocessing.

11
New cards

Examples for unsupervised learning

  • Customer segmentation

  • Patient cohorts with similar characteristics

  • Topics covered in documents

  • Geological predictions

  • Economics: market research

  • Finding nearest neighbors

  • Compression

12
New cards

Types of unsupervised learning

  • Partitional: an object can belong to only one class.

  • Hierarchical: a class can have a subclass.

  • Overlapping: an object can belong to more than one class

  • Fuzzy Cluster: an object belongs to every class, but a weight is attached, usually between 0 and 1. This is similar to fuzzy sets in math.

13
New cards

Other Properties of Unsupervised Algorithms

  • Prototype: class is defined by a representative for that cluster, like a centroid.

  • Density: class is defined by membership in a tight cluster of similar objects.

  • Shared-Property: membership in a class is based on concepts held in common.

  • Graph: Connection to other objects defines class.

14
New cards

Most Popular Algorithms

  • K-Means: Distance-based partitioning

  • DBSCAN: Density-Based

  • Cobweb: model-based conceptual clustering

  • Expectation Maximization: Statistical modeling

  • Nearest Neighbor: Distance-based partitioning

15
New cards

Linking Clusters

  • Single: Distance between nearest objects

  • Complete: Distance between furthest objects

  • Average: Distance between centroids

16
New cards

Important Measures

  • Cohesion: Average distance of points within cluster (Similar to Sum of Square Error)

  • Separation: Average distance to points of nearest outside cluster

  • Silhouette: cohesion and separation combined into one normalized score

    1. S = (b - a)/max(a, b)

    2. Where b is Separation and a is Cohesion

    3. Silhouette will be between -1 and 1

      1. -1 is a poor clustering

      2. 1 is a good clustering
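
The silhouette formula above can be sketched for a single point; the 2-D clusters below are made-up points, with Euclidean distance assumed:

```python
# Silhouette for one point: S = (b - a) / max(a, b), where a is the
# average distance within its own cluster (cohesion) and b is the
# average distance to the nearest other cluster (separation).
def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def silhouette(point, own, others):
    a = sum(dist(point, p) for p in own) / len(own)                  # cohesion
    b = min(sum(dist(point, p) for p in c) / len(c) for c in others) # separation
    return (b - a) / max(a, b)

# The point sits tightly in its own cluster, far from the other,
# so its silhouette should be close to 1.
own_cluster   = [(0.1, 0.0), (0.0, 0.1)]
other_cluster = [(5.0, 5.0), (5.1, 5.0)]
print(round(silhouette((0.0, 0.0), own_cluster, [other_cluster]), 3))
```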

17
New cards

Association

Put simply, this analysis aims to find items that frequently appear together in data. The technique can be applied to many domains including retail sales, bioinformatics, scholarly authorships, parliamentary voting, and web data mining. The patterns discovered are useful for many purposes including promotions, medical discoveries, and classifications.

18
New cards

Market Basket

To perform association analysis, we need data that records transactions. You can think of this as items purchased in a market basket.

TID   Items Purchased
1     Beer, Nuts, Diaper
2     Beer, Coffee, Diaper
3     Beer, Diaper, Eggs
4     Nuts, Eggs, Milk
5     Nuts, Coffee, Diaper, Eggs

19
New cards

Support

  • We want to discover items that are frequently purchased together in the data. We could generate a list of all possible sets and count how many times those sets occur in the recorded transactions. The number of transactions that contain a set, divided by the total number of transactions, is the support for that itemset:

    1. Count(itemset) / Number of transactions = Support; Or

    2. σ(X)/N = S → Where X is the itemset and N is the number of transactions

  • In the data above, the support for the itemset {Beer, Diapers} is 3/5 or .6 because Beer and Diapers are purchased together 3 times and there are 5 transactions.
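
The support calculation over the market-basket table above can be sketched directly:

```python
# Support from the five-transaction market-basket table above:
# support(itemset) = count of transactions containing the itemset / N.
transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs"},
]

def support(itemset):
    hits = sum(1 for t in transactions if itemset <= t)  # itemset is a subset of t
    return hits / len(transactions)

print(support({"Beer", "Diaper"}))  # 3/5 = 0.6
```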

20
New cards

Association Rules

Association rules are stated Y given X, or X -> Y. X is the antecedent and Y is the consequent. We want to discover conditional relationships between subsets. For example, if someone has purchased Diapers, what is the likelihood they also purchased Beer? What is the probability of Beer given Diapers?

21
New cards

Confidence

  • To find the answer, we can find the count of {Beer, Diaper} and divide it by the count of diapers: 3 / 4 = .75. This is the confidence we have that beer will be purchased if diapers are purchased. 

    1. confidence(X -> Y) = σ(X∪Y) / σ(X); OR

    2. If X, then Y = count(X and Y) / count(X)
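
Confidence for the rule Diaper -> Beer can be computed from the same market-basket table:

```python
# Confidence over the five-transaction table above:
# confidence(X -> Y) = sigma(X union Y) / sigma(X).
transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs"},
]

def count(itemset):
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent):
    return count(antecedent | consequent) / count(antecedent)

print(confidence({"Diaper"}, {"Beer"}))  # 3/4 = 0.75
```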

22
New cards

Apriori Analysis

  • The computational cost to calculate the support and confidence increases exponentially as the number of distinct items grows. In the data above, there are 64 possible subsets of the six items and 602 possible association rules. If the data represented a real store, there would likely be thousands of items and millions of transactions. To calculate support and confidence, each of those transactions would be scanned for each of the 602 rules. Even for a computer, the task would take far too much time. For this reason, we need an algorithmic approach. We only want to consider possible subsets and rules that are most likely to yield meaningful results. Strategies for narrowing the relevant itemsets are often based on the apriori principle:

    1. Apriori Principle: if an itemset is frequent, then all of its subsets must also be frequent.

  • Some measures adhere to this principle based on the anti-monotone property:

    1. Anti-monotone Property: the support for every subset of a set must be greater than or equal to the support for the set itself. For example, support adheres to the anti-monotone property. If X is a subset of Y:

      1. s(X) ≥ s(Y)

23
New cards

Interestingness

Using confidence and support to judge association rules is often insufficient. Some uninteresting rules will have high confidence and support. For example, peanut butter and jelly likely have high confidence and support values, but the association is already well known. Confidence can also be misleading. To demonstrate this, consider contingency tables, which count occurrences of one item contingent on whether or not another item is present. For example, the first contingency table shows frequencies for people who drink coffee, tea, both, or neither. The first cell, 15, is the number of people who drink both tea and coffee. The cell with the value 5 shows the number of people who drink tea but not coffee (the symbol ~ means ‘not’). The far-right column and bottom row hold the totals for those categories.

24
New cards

Lift

  • Lift is the support of the itemset divided by the product of the supports of its parts, i.e., the support we would expect if the antecedent and consequent were independent. This is also called the Interest Factor.

    1. Lift = s(itemset) / (s(antecedent itemset) × s(consequent itemset))

  • For example, the lift for Tea -> Coffee is:

    1. s(Tea and Coffee) / (s(Tea) × s(Coffee))

    2. .15 / (.2 × .8)

    3. .9375

  • Lift greater than 1 shows the items are positively related; Less than 1 shows they are negatively related; and lift equal to 1 shows the items are independent. In the Coffee, Tea, and Honey example, Tea -> Coffee has a lift of .9375 and Tea -> Honey has a lift of 4.1667. Therefore, the rule Tea -> Honey is a more interesting rule than Tea -> Coffee.
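
A one-line sketch of the lift computation, using the Tea -> Coffee numbers quoted above:

```python
# Lift = s(X union Y) / (s(X) * s(Y)); below 1 means negatively related.
def lift(s_xy, s_x, s_y):
    return s_xy / (s_x * s_y)

print(round(lift(0.15, 0.2, 0.8), 4))  # 0.9375, so Tea and Coffee
                                       # are weakly negatively related
```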

25
New cards

Null invariance

If additional market baskets are added to a dataset that do not contain items A or B, then a measure of A and B is null invariant if it does not change.

26
New cards

Inversion invariance

In some datasets, the presence of an item is more meaningful than its absence. For example, we might be interested to know if people who buy batteries are also likely to buy pens, but we aren’t as interested in how many people did not buy batteries but did buy pens. The presence of batteries is more meaningful than their absence. But, when finding associations between peoples’ opinions, the presence of a “yes” answer is just as meaningful as the presence of a “no” answer. The binary options are equally weighted. Measures are invariant to inversion when the result does not change when all the binary values are switched (‘yes’ is changed to ‘no’, True is changed to False, 1 is changed to 0, etc).

27
New cards

Scaling invariance

If the results of a measure do not change as the distributions between classes change, then the measure is scaling invariant. For example, if a sample goes from having 100 dogs and 50 cats to 300 dogs and 50 cats, then the scaling has changed. If the measure does not change with the additional dogs, then it is scaling invariant.

28
New cards

Symmetry invariance

If the order of items in an itemset does not affect the measure, then the measure is symmetry invariant. For example, confidence can change if a rule is stated A->B rather than B->A, and is therefore not symmetry invariant.

29
New cards

Frequent Pattern Tree Representation

  • The apriori algorithm has its limitations. Its performance is hindered when the data include many items. When the data are wide, many candidate itemsets must be generated.

  • Another algorithm uses a depth-first approach by restructuring the data into an FP-tree. Each pattern is illustrated by a series of nodes connected by a solid line. Dotted lines connect nodes with the same value but different patterns. If a pattern is repeated, a count on the node records how many transactions start with that same prefix.

  • The advantage of this approach is, once the tree is constructed, it doesn’t need to revisit the transaction table. We can determine support based on the patterns and numbers alone.

  • This approach can compress the data if patterns frequently repeat. The amount of compression decreases, however, as the uniqueness of itemsets increases. The more a pattern is repeated, the greater the compression.

30
New cards

Uniqueness

Most association algorithms require that data be formatted in a matrix of zeros and ones. When pivoting a categorical field to many flag fields, the number of unique categorical values can affect the outcome and performance of your analysis. If the values are too specific, like ‘Coors Light Beer’, they likely won’t have enough frequency to pass a minimum support threshold. If the values are too generic, like ‘Drinks’, they might appear too frequently and lead to redundant or uninteresting association rules.

31
New cards

Taxonomies

One approach to decrease the uniqueness of a categorical field is to use a taxonomy. This way, items can be grouped together in a concept hierarchy. For example, using such a taxonomy, chicken, rabbit, game, and beef can all be converted to ‘meats’. Meats will have higher frequency, higher support, and will likely result in more association rules.
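
The roll-up step can be sketched as a simple lookup; the hierarchy below is a made-up example mapping specific items to their parent concept:

```python
# Rolling specific items up a concept hierarchy before association
# analysis; unknown items are left as-is.
taxonomy = {"chicken": "meats", "rabbit": "meats",
            "game": "meats", "beef": "meats",
            "apple": "fruits", "pear": "fruits"}

basket = ["chicken", "apple", "beef"]
generalized = [taxonomy.get(item, item) for item in basket]
print(generalized)  # ['meats', 'fruits', 'meats']
```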

32
New cards

Binary Fields

Sometimes, only the presence of an item is meaningful. For example, when analyzing grocery store purchases, only the presence of an item is considered. We don’t often want association rules like ‘not batteries -> milk’. Other times, like when considering answers to a survey, a no answer is just as meaningful as a yes answer. If that is the case, you may want to preserve both no and yes answers in your pivot.