Lecture 13: the role of negative information in distributional semantic learning


1

which model is a pre-cursor for large language models?

word2vec

2

why is word2vec the most plausible model?

because, compared to other models, it’s more similar to how people predict words

*prof doesn’t agree with this

3

define “negative sampling function”

  • for each positive training example, you also supply the network with unrelated words that shouldn’t be associated with the target

  • this makes the network predictive of related words (positive) and unpredictive of unrelated words (negative); see the sketch below
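
A minimal sketch of the idea in Python (the vocabulary and the target/context pairing are illustrative assumptions, not the lecture’s example):

```python
import random

# Toy vocabulary (an illustrative assumption, not the lecture's data).
vocab = ["dog", "cat", "bark", "meow", "car", "road", "engine"]

def negative_samples(target, positive_context, k=3):
    """Draw k unrelated words: anything outside the observed context."""
    candidates = [w for w in vocab if w != target and w not in positive_context]
    return random.sample(candidates, k)

# One positive training example: "dog" observed near "bark" and "cat".
target, context = "dog", {"bark", "cat"}
negatives = negative_samples(target, context, k=3)

# The network is then trained to predict the context words (positive)
# and to NOT predict the sampled words (negative) for this target.
print("positive:", context, "| negative:", negatives)
```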

4

what’s the issue with word2vec?

  • without negative sampling, it doesn’t work well: it performs worse than PMI

  • negative sampling is what gives the model the power that makes it better than other models

5

what’s a criticism towards neural network models?

  • you have some input, but you don’t know how the output was constructed

  • from a psychological perspective, this means it’s hard to understand how the output was produced

6

define “positive information”

learning of words that co-occur together

7

define “negative information”

learning of words that don’t occur together

8

what’s the difference between positive and negative information?

  • positive: learning of words that co-occur together

  • negative: learning of words that do not occur together

9

true or false: the use of negative sampling is based in prediction

true

10

true or false: word2vec only makes negative predictions, not positive ones

false: it predicts words that should and shouldn’t co-occur

11

explain the components of the subsampling equation (3)

  • P(wi): probability of word i being sampled

  • Z(wi): probability of word i occurring

  • t: a free parameter (see the reconstructed equation below)
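
The equation itself appeared as an image in the original deck. Assuming the lecture used the subsampling equation from Mikolov et al.’s word2vec paper (an assumption on my part), the published form is:

```latex
% word2vec subsampling, Mikolov et al. (2013) -- assumed to be the
% lecture's equation; P here is the probability of DISCARDING word w_i
P(w_i) = 1 - \sqrt{\frac{t}{Z(w_i)}}
```

Read against the card’s definitions, the probability of w_i being sampled would be the complement, sqrt(t / Z(w_i)) (capped at 1); t is a small threshold, commonly around 10^-5.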

12

define “window size”

number of words around the target word used for accumulating positive information
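
A minimal sketch of a context window, assuming a symmetric window around the target (the sentence is an illustrative assumption):

```python
# A toy sentence; positive pairs come from words near the target.
sentence = "the quick brown fox jumps over the lazy dog".split()

def context(words, i, window=2):
    """Words within `window` positions of the target at index i."""
    lo, hi = max(0, i - window), min(len(words), i + window + 1)
    return [w for j, w in enumerate(words[lo:hi], start=lo) if j != i]

# With window=2, the positive context of "fox" (index 3) is
# ["quick", "brown", "jumps", "over"].
print(context(sentence, 3, window=2))
```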

13

define “negative samples”

number of words sampled per positive example to accumulate negative information

14

explain this graph

  • the number of negative samples (x-axis) against the balance of positive and negative information (y-axis)

  • at 3 samples, you get a balance of positive and negative information

  • at 4 samples, you get more negative information

15

what can be manipulated in the sampling distribution? (4)

  • uniform: all words have equal probability of being sampled

  • unrelated corpus

  • related corpus

  • correct corpus (the training corpus itself; see the sketch below)
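
A minimal sketch of how the negative-sampling distribution can be manipulated (the corpus and helper are illustrative assumptions; word2vec itself draws negatives from a smoothed unigram distribution):

```python
from collections import Counter
import random

vocab = ["dog", "cat", "bark", "meow", "car", "road", "engine"]
corpus = "the dog chased the cat and the dog barked".split()

def sampling_distribution(corpus_tokens=None):
    """Return (words, weights) that negative samples are drawn from."""
    if corpus_tokens is None:
        # uniform: every vocabulary word is equally likely
        return vocab, [1.0] * len(vocab)
    # frequency-based: weights follow the chosen corpus's word counts
    counts = Counter(corpus_tokens)
    return list(counts), list(counts.values())

# Swap `corpus` for an unrelated, related, or the correct (training)
# corpus to realize the four manipulations from the card.
words, weights = sampling_distribution(corpus)
print(random.choices(words, weights=weights, k=3))
```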

16

explain this graph

  • with uniform sampling, there is no overlap between the positive and negative accumulations

  • with an unrelated corpus, there is some overlap from high-frequency words that are coherent across corpora

  • with a related corpus, there is some overlap between the positive and negative accumulations that is important

  • with the correct corpus, you get the biggest bump: it highlights unique word co-occurrences (co-occurrences that happen above base rate)

17

what does it mean for co-occurrences to happen above base rate?

co-occurrences happen more often than they would if the words were randomly connected to each other

*we use negative sampling to identify them
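
As a concrete rendering (my illustration; the lecture compares against PMI, but this exact formulation is an assumption): under independence, the chance-level expectation for a pair is the product of the individual word probabilities, so a co-occurrence is above base rate exactly when PMI is positive:

```latex
% "above base rate": the observed joint probability exceeds the
% chance level predicted by independence
P(w_1, w_2) > P(w_1)\,P(w_2)
\quad\Longleftrightarrow\quad
\mathrm{PMI}(w_1, w_2) = \log \frac{P(w_1, w_2)}{P(w_1)\,P(w_2)} > 0
```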

18

true or false: similar co-occurrence will appear above base rate

false: only the unique ones will appear above base rate

19

how does the corpus size affect the base rate?

  • if you increase the corpus size, the negative samples should increase and make the base rate higher

  • meaning that the bigger the corpus, the more impact negative sampling should have

*however, word2vec doesn’t work well with small corpora → negative sampling should have a bigger impact on small corpora

20

when researching, what are the differences between psychology and computer science?

  • psychology: wants a coherent theory with clear mechanisms that we understand

  • computer science: only wants something that works well; doesn’t need to understand the mechanisms

21

negative sampling allows for the highlighting of unique co-occurrences [above/below] base rate

above base rate

22

true or false: the advantage of negative sampling is due to prediction

false: it’s due to co-occurrences above base-rate frequency (ones that happen more often than if the words were randomly connected)

23

what are the analytic solutions that are parameter free? (2)

  • global negative: addition of equal proportion of positive and negative info

  • distribution of association: transformation of co-occurrence values to z-scores

24

define “global negative”

addition of equal proportion of positive and negative info

25

define “distribution of association”

transformation of co-occurrence values to z-scores
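
A minimal sketch of the DOA transformation in Python (the toy matrix and the per-row normalization are my assumptions; the lecture may normalize differently):

```python
import numpy as np

# Toy co-occurrence matrix: rows = target words, cols = context words.
cooc = np.array([[10.0, 2.0, 0.0],
                 [ 3.0, 8.0, 1.0],
                 [ 0.0, 1.0, 9.0]])

# Distribution of association: z-score each target word's co-occurrence
# values against that row's own mean and standard deviation.
row_mu = cooc.mean(axis=1, keepdims=True)
row_sd = cooc.std(axis=1, keepdims=True)
doa = (cooc - row_mu) / row_sd

print(doa.round(2))
```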

26

true or false: you can combine global negative and distribution of association

true

*global negative: addition of equal proportion of positive and negative info; distribution of association: transformation of co-occurrence values to z-scores

27

[DOA/global negative] performs better than [DOA/global negative]

DOA better than global negative

28

how can BEAGLE be made to be as good as word2vec?

with sparse BEAGLE: it will update word co-occurrences so that you can apply DOA and global negative

29

what’s the difference between a word matrix model and word2vec?

  • a matrix model requires fewer parameters than word2vec (therefore, better for explanations)

  • when compared with the same corpus, a matrix model could outperform word2vec (but on average, both were identical)
