Data Science exam

57 Terms

1

Fallacies

reasoning that is logically incorrect 

  • Undermines the logical validity of an argument or is recognized as unsound

2

Fermi Estimations

back-of-the-envelope calculations and rough generalizations used to estimate values that would otherwise require extensive analysis
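
A minimal sketch of a Fermi estimate, using the classic "how many piano tuners work in a large city?" question. Every input below is an illustrative order-of-magnitude guess (an assumption, not data from this card set); the function name `fermi_piano_tuners` is hypothetical.

```python
def fermi_piano_tuners(population=3_000_000,           # assumed city size
                       people_per_household=2,         # assumed
                       share_with_piano=1 / 20,        # assumed: 1 household in 20
                       tunings_per_piano_per_year=1,   # assumed
                       tunings_per_tuner_per_year=2 * 5 * 50):  # 2/day, 5 days, 50 weeks
    """Chain rough guesses together to get an order-of-magnitude answer."""
    pianos = population / people_per_household * share_with_piano
    return pianos * tunings_per_piano_per_year / tunings_per_tuner_per_year


# The point is the order of magnitude (hundreds), not the exact figure.
print(round(fermi_piano_tuners()))
```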

3

Occam's razor

The principle of parsimony 

  • When presented with competing hypotheses that make the same prediction, one should select the solution with the fewest assumptions 

4

Metadata

 provides information about one or more data files and is intended to help ensure that the data can be correctly interpreted, by yourself at a later date or by others when sharing or publishing data

5

Retraction

removes a scientific publication from the record of scholarship. A mechanism for alerting readers to unreliable material and other problems in the publication 

6

Correlation

A statistical measure that expresses the extent to which two variables are linearly related (meaning they change together at a constant rate)
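
One common way to compute such a measure is Pearson's correlation coefficient r. The card does not name a specific coefficient, so treating it as Pearson's r is an assumption; `pearson_r` and the sample data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: covariation of x and y scaled by their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    co_dev = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return co_dev / (dev_x * dev_y)


# Perfectly linear data gives r close to +1; reversing y gives r close to -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```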

7

Slope

is a measure of the change in a response variable (y) as a function of the change in the explanatory variable (x)
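
As a sketch, the slope of a straight-line fit can be computed by ordinary least squares. The card does not say how the slope is estimated, so least squares is an assumption here; `least_squares_slope` and the data are hypothetical.

```python
def least_squares_slope(x, y):
    """Change in y per unit change in x for the best-fitting straight line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# y rises by 2 for every 1-unit increase in x.
print(least_squares_slope([1, 2, 3, 4], [3, 5, 7, 9]))
```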

8

Standardization

for each data point, subtract the mean of the data and divide by the variability (typically the standard deviation) of the data 
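
A minimal sketch of standardization, assuming "variability" means the standard deviation (here the population form, `statistics.pstdev`; the sample form `statistics.stdev` is an equally plausible choice). The helper name and data are hypothetical.

```python
import statistics


def standardize(data):
    """Return z-scores: (value - mean) / standard deviation."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)  # assumption: population standard deviation
    return [(value - mean) / sd for value in data]


z_scores = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(z_scores)  # standardized values have mean 0 and standard deviation 1
```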

9

Spurious correlation

two or more events or variables are associated but not causally related, due to either coincidence or the presence of a third, unseen factor

10

Post hoc ergo propter hoc

after this, therefore because of this

11

Selection bias

individual groups in a study differ systematically from the population of interest leading to a systematic error

12

Confounding variable

a factor associated with the outcome that is mixed up with the primary factor of interest, so that their effects cannot be separated

13

treatment

  • (1) All the levels of a type of manipulation (for example: the presence of a drug, etc.) 

  • (2) A specific manipulation or treatment level (for example: the drug administered)

14

First principles

the fundamental building blocks of science. Depending on the case, they can be formal axioms, theoretical postulates, basic propositions, or general principles. Reasoning from first principles starts directly at the level of established laws of physics, math, or chemistry 

15

Data Validation

a setting that checks the data being entered and returns a warning or prevents entry if the data do not satisfy a logical expression
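
The same idea can be sketched outside a spreadsheet: check each entry against a logical expression and return a warning when it fails. The function `validate_entry`, the age rule, and the limits 0 to 120 are all hypothetical illustrations.

```python
def validate_entry(value, rule, warning):
    """Return (True, None) if rule(value) holds, otherwise (False, warning)."""
    if rule(value):
        return True, None
    return False, warning


def is_plausible_age(value):
    """Logical expression for the check: a number between 0 and 120 (assumed limits)."""
    return isinstance(value, (int, float)) and 0 <= value <= 120


print(validate_entry(34, is_plausible_age, "age must be between 0 and 120"))
print(validate_entry(-5, is_plausible_age, "age must be between 0 and 120"))
```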

16

Population parameters

summary statistics that describe the entire population (for example, the true population mean) 

17

Sample estimate

An estimate of the population parameter from a sample 

18

Centrality (mean/median)

Centrality: A statistic that represents the middle of the data (means and medians) 

  • Mean: the expected value of a dataset 

  • Median: the middle value of an ordered dataset

19

Convenience sampling

non-random sampling with data collected from subjects that are the easiest to obtain 

20

Random sampling

sampling subjects from a population with equal probability
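
A minimal sketch of simple random sampling with Python's standard library, where `random.sample` draws without replacement and every subject is equally likely to be chosen. The subject IDs, sample size, and seed are hypothetical.

```python
import random

random.seed(42)  # fixed seed only so the example is repeatable

population_ids = list(range(1, 101))            # hypothetical subject IDs 1..100
sample_ids = random.sample(population_ids, 10)  # 10 subjects, no repeats

print(sample_ids)
```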

21

Bias

occurs when there is a systematic difference between your sample mean and the true mean

22

Weighted mean

a calculation in which each observation is weighted by the number of times it's observed 
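
As a small sketch, the weighted mean of distinct values and their observation counts equals the ordinary mean of the expanded data. The function name and numbers are hypothetical.

```python
def weighted_mean(values, counts):
    """Mean of values where each value is weighted by how often it was observed."""
    return sum(v * c for v, c in zip(values, counts)) / sum(counts)


# Value 1 was seen twice, values 2 and 3 once each: same as the mean of [1, 1, 2, 3].
print(weighted_mean([1, 2, 3], [2, 1, 1]))
```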

23

Dispersion (variability)

is the extent to which a distribution is stretched or squeezed

24

Variance

the average of the squared differences from the mean

25

Sample variance

measures the average squared deviation between observations and the sample mean (the sum of squared deviations is divided by n − 1 rather than n)

26

Standard deviation

the square root of the variance
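
The variance, sample variance, and standard deviation cards can be compared side by side with Python's `statistics` module. The data values are hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

population_variance = statistics.pvariance(data)  # divides by n
sample_variance = statistics.variance(data)       # divides by n - 1
population_sd = statistics.pstdev(data)           # square root of pvariance
sample_sd = statistics.stdev(data)                # square root of variance

print(population_variance, sample_variance, population_sd, sample_sd)
```

The sample variance is slightly larger than the population form because it divides by n − 1.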

27

Accuracy vs. Precision

  • Accuracy: proximity of measurement results to the true value (low bias)

  • Precision: the closeness of the measurements to each other (low variability)

28

The sampling distribution of an estimate

 is the probability distribution of all values for an estimate that might be obtained when we sample a population
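
A sampling distribution can be approximated by simulation: draw many samples, compute the estimate for each, and look at how the estimates vary. This sketch assumes a normally distributed population with mean 50 and standard deviation 10; those values, the sample size, and the seed are all hypothetical.

```python
import random
import statistics

random.seed(1)  # fixed seed only so the example is repeatable

MU, SIGMA = 50.0, 10.0    # assumed population mean and standard deviation
N, REPLICATES = 25, 2000  # sample size and number of repeated samples

# One sample mean per replicate; together they approximate the sampling distribution.
sample_means = [
    statistics.fmean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPLICATES)
]

spread_of_means = statistics.stdev(sample_means)
theoretical_spread = SIGMA / N ** 0.5  # sigma / sqrt(n) = 2.0

print(spread_of_means, theoretical_spread)
```

The spread of the simulated means should land close to sigma divided by the square root of n, which is the quantity the standard error estimates.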

29

Sampling error

Error in a statistical analysis occurring from a sample not being perfectly representative of the population 

30

Standard Error (of an estimate)

is the standard deviation of a sample divided by the square root of the sample size
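
Following the definition above, a sketch of the standard error of a sample mean. The function name and data are hypothetical.

```python
import statistics
from math import sqrt


def standard_error(sample):
    """Sample standard deviation divided by the square root of the sample size."""
    return statistics.stdev(sample) / sqrt(len(sample))


print(standard_error([2, 4, 4, 4, 5, 5, 7, 9]))
```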

31

Cohort effects

occur when changing environmental conditions over time results in different age groups having experienced different environmental effects on the trait values under observation

32

Random selection of individuals

 each member of the population has an equal chance of being selected 

33

Independence

samples are not related to or do not affect each other (through ecological interactions or shared evolutionary history) 

34

The principle of proportional ink

when a shaded region is used to represent a numerical value, the area of that shaded region should be directly proportional to the corresponding value

35

Duck, Chartjunk and glass slipper

  • Duck: the entire graphic has an “interior [with] a lot of ink that does not tell the reader anything new” - Edward Tufte 

  • Chartjunk: all visual elements in charts and graphs that are not necessary to comprehend the information represented on the graph 

  • Glass slippers: a data visualization in which the designer has taken a beautiful data design meant for a very specific situation and tried to shoehorn entirely inappropriate types of data into it 

36

Information deficit model

if you give people more information, it will correct their views

37

Average error

the mean percentage difference between a poll estimate and the true population vote
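
A minimal sketch of this calculation, assuming the "difference" is the absolute difference in percentage points between each poll and the true result. The function name and the poll numbers are hypothetical.

```python
def average_error(poll_estimates, true_vote):
    """Mean absolute difference (percentage points) between polls and the true vote."""
    return sum(abs(p - true_vote) for p in poll_estimates) / len(poll_estimates)


# Three hypothetical polls against a hypothetical true vote share of 50%.
print(average_error([48.0, 52.0, 47.0], 50.0))
```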

38

Pseudo-anonymity

occurs when (1) there is a small sample size and (2) you collect enough information that someone could figure out who the person is, even if the data is anonymous

39

Coverage error

results when some members of the population under study are not included in the sampling design

40

Measurement error (in a poll)

interviewees interpret questions differently than the researcher intended or dishonestly answer questions

41

Non-response bias

nonrespondents in the sample that researchers originally drew differ from respondents in ways that are germane to the objectives of the survey

42

The serial position effect

given a large number of choices, items at the beginning or the end of the list overly influence a person's perception

43

The central tendency

the tendency to score around the midpoint of the scale and not use the extremes 

  • Invariable answers: score all questions the same

44

Affirmation bias

the tendency to agree with a statement rather than disagree

45

push poll

a question that leads the reader to the answer the pollster wants the reader to give 

46

The gatekeeper effect

the person who answers the telephone or opens the letter may not represent the population you seek

47

test statistic

a statistical summary of the data that expresses an estimate (mean, median, proportion) relative to the uncertainty in that estimate (standard error)
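
One familiar instance is the one-sample t statistic: the difference between the sample mean and a hypothesized value, divided by the standard error. Choosing the t statistic is an assumption (the card covers test statistics in general); the function name, data, and hypothesized mean are hypothetical.

```python
import statistics
from math import sqrt


def one_sample_t(sample, hypothesized_mean):
    """(estimate - hypothesized value) / standard error of the estimate."""
    standard_error = statistics.stdev(sample) / sqrt(len(sample))
    return (statistics.fmean(sample) - hypothesized_mean) / standard_error


observed_t = one_sample_t([2, 4, 4, 4, 5, 5, 7, 9], hypothesized_mean=4.0)
print(observed_t)
```

The single value computed for a real dataset is the observed test statistic.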

48

observed test statistic

 a single summary calculated for your dataset

49

Effect size

the difference in the treatment means 

50

Null Hypothesis

the hypothesis that there is no significant difference in an estimate (mean, median, proportion) between specified populations, with any observed difference being due to sampling error

51

Null Distribution

the probability distribution of the test statistic when the null hypothesis is true. The probability distribution expected under sampling error for the null hypothesis 
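
One way to build a null distribution without a formula is an exact permutation test: if the null hypothesis is true, group labels are arbitrary, so every relabeling of the pooled data is equally likely. The permutation approach is an assumption (the card does not prescribe a method); the function names and the two small groups are hypothetical.

```python
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def permutation_null(group_a, group_b):
    """Difference in means (b - a) for every way of relabeling the pooled data."""
    pooled = group_a + group_b
    indices = range(len(pooled))
    null = []
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        relabeled_a = [pooled[i] for i in chosen]
        relabeled_b = [pooled[i] for i in indices if i not in chosen_set]
        null.append(mean(relabeled_b) - mean(relabeled_a))
    return null


control = [1.0, 2.0, 3.0]  # hypothetical data
treated = [4.0, 5.0, 6.0]  # hypothetical data

observed = mean(treated) - mean(control)
null = permutation_null(control, treated)

# One-sided p-value: share of relabelings at least as extreme as the observed difference.
p_one_sided = sum(d >= observed for d in null) / len(null)
print(len(null), observed, p_one_sided)
```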

52

Degrees of freedom

Number of data points that go into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself

53

Alternative Hypothesis

(1) A statement regarding what possibility(ies) will be considered as extreme or more extreme than that observed. (2) The hypothesis that sample observations are not expected by random chance

54

The area under the curve

the total probability (area) under a probability density function sums to 1.0
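
This property can be checked numerically. The sketch below integrates the standard normal density with the trapezoid rule; the choice of the normal distribution, the integration limits of ±8, and the function names are assumptions for illustration.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density of a normal distribution at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def trapezoid_area(f, lower, upper, steps=10_000):
    """Approximate the area under f between lower and upper."""
    width = (upper - lower) / steps
    total = 0.5 * (f(lower) + f(upper))
    for i in range(1, steps):
        total += f(lower + i * width)
    return total * width


# Nearly all of the probability lies within 8 standard deviations of the mean.
area = trapezoid_area(normal_pdf, -8.0, 8.0)
print(area)
```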

55

reject vs. reject without prejudice in peer review

  • Reject: the journal will not publish your manuscript 

  • Reject without prejudice:  the journal declines to publish your manuscript at this time but would consider it if you resubmitted in the future

56

Major revisions in peer review

Your manuscript is not suitable for publication, but may be suitable if the points from the peer reviewers are addressed

57

Accept with minor revisions

Your manuscript is suitable for publication after points from the peer reviewers are addressed