Lecture 11: more data trumps over smarter algorithms

0.0(0)
Studied by 0 people
call kaiCall Kai
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
GameKnowt Play
Card Sorting

1/24

encourage image

There's no tags or description

Looks like no tags are added yet.

Last updated 1:35 AM on 3/10/26
Name
Mastery
Learn
Test
Matching
Spaced
Call with Kai

No analytics yet

Send a link to your students to track their progress

25 Terms

1
New cards
<p>what do “AIC” and “BIC” represent?</p>

what do “AIC” and “BIC” represent?

likelihood function: measure of fit that can be done with probabilistic data

*BIC is more frequent (it was the thing saw in the last set of card, a BIC > 2 means that something important is happening)

2
New cards

what are the different definitions of parameter? (3)

  • hyperparameter: how a neural network is operating

  • parameters as weights in a model

  • parameter as a way of controlling for a model’s behaviour (in psychology)

3
New cards

what does this article talk about?

the right way of modelling and what aspects you want to emphasize on

4
New cards

how are classical models compared?

with the number of parameters (ex: AIC, BIC)

5
New cards

what issue do co-occurence models face?

the amount of training material needed

6
New cards

why are books not the ideal source of training material?

because most people don’t get their language experience from books only

7
New cards

how many tokens/words do people consume per year?

12 millions

8
New cards

true or false: your linguistic environment can depend on your educational level

true

9
New cards

true or false: two people with high vocabulary might have different type of vocabulary

true: a physics professor will have high vocabulary, but different from an anthropology professor

10
New cards

define “computational complexity”

technique that dictates the maximum or minimum number of steps that an algorithm is going to take given an input size

11
New cards

true or false: all cognitive models fall under the same class

false: they can be separated into different classes

12
New cards

explain how we contrasted point wise mutual information (PMI) from latent semantic analysis (LSA)

  • trained both on small corpus (6 millions words from Wikipedia)

  • tested PMI on large Wikipedia corpus of 400 million words

  • tested on synonym test and word similarity data

  • LSA was better than PMI on smaller training data, but on bigger data PMI was better

  • suggests that overcoming limitation of distributional model is to increase training materials

<ul><li><p>trained both on small corpus (6 millions words from Wikipedia)</p></li><li><p>tested PMI on large Wikipedia corpus of 400 million words </p></li><li><p>tested on synonym test and word similarity data </p></li><li><p>LSA was better than PMI on smaller training data, but on bigger data PMI was better </p></li><li><p>suggests that overcoming limitation of distributional model is to increase training materials </p></li></ul><p></p>
13
New cards

why don’t you need complex learning mechanisms?

  • there is a trade off: complex learning mechanisms OR simpler models but trained with more data

  • should you have simplicity in the training model or with the number of training exemplars?

  • this is a competing model in psychology because we want both but they don’t work well together

14
New cards

define “productive vocabulary”

words you use when communicating

15
New cards

define “receptive vocabulary”

words you can understand

16
New cards

what’s the difference between productive and receptive vocabulary?

  • productive: words you use when communicating with others

  • receptive: words you can understand

17
New cards

we have larger [productive/receptive] vocabulary than [productive/receptive] vocabulary because […]

  • receptive larger than productive

  • because you don’t use all the words you know when communicating

18
New cards

why is BEAGLE more advantageous than Gaussian?

it’s less noisy

19
New cards

how can we understand the language variability between each user?

by understanding individual differences and the way they use language

20
New cards

define “average similarity of words” (ASW)

average similarity between two words across all users

21
New cards

define “total representation” (TR)

train a single model on all user corpora

22
New cards

what’s the difference between “average similarity of words” and “total representation”?

  • ASW: average the similarity between two words across all users

  • TR: train a single model on all user corpora

23
New cards

how could you increase the power in average similarity of words?

by taking into account the variability in word meaning

24
New cards

what do you need to stimulate human learning? (4)

  • learning mechanism

  • realistic training materials

  • representation

  • simplicity

25
New cards

what was the conclusion of the article?

  • humans have a lot of experience with the world

  • models need to scale

  • complexity of training algorithm needs to be understood and balanced