Statistics Wk 1

61 Terms

1

What is a random sample?

A sequence of independent and identically distributed (iid) random variables X1, ..., Xn

2

What property does the joint pdf have when X1, ..., Xn are iid?
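
The standard property, written here as an assumption about what this card intends: for iid variables with common pdf f, the joint pdf factorises into a product of the marginal pdfs.

```latex
\[
f_{X_1,\dots,X_n}(x_1,\dots,x_n) = \prod_{i=1}^{n} f(x_i)
\]
```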

3

What is the realisation or observed value of T given by?

4

What does a Statistic depend on? What is it not?

It is a function of the random sample only; it is not a function of any unknown parameter of the distribution

5

What is an estimator?

A statistic used to estimate a parameter

6

What is an estimate?

The observed value of the estimator, i.e. its realisation on the observed data
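
A minimal Python sketch of the estimator/estimate distinction. The function name `sample_mean` and the data values are hypothetical, chosen only for illustration.

```python
def sample_mean(sample):
    """Estimator: a statistic, i.e. a function of the sample only."""
    return sum(sample) / len(sample)

# Hypothetical observed data x1, ..., xn (made up for illustration).
observed = [2.0, 4.0, 6.0, 8.0]

# Estimate: the realisation of the estimator on the observed data.
estimate = sample_mean(observed)
print(estimate)  # 5.0
```

The function is the estimator; the number it returns for one particular dataset is the estimate.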

7

What are the categories of data collection?

Randomised controlled trial (RCT), observational/cohort study, case-control study, survey

8

What are some sampling methods for real populations?

Simple random, systematic, stratified and cluster sampling

9

What are the data types?

Categorical (nominal and ordinal) and numerical (discrete/continuous, interval/ratio)

10

What is the primary research interest of a Randomised Controlled Trial (RCT)?

Determining causality

11

What may complicate the analysis of a Randomised Controlled Trial? Define it

Confounding: multiple correlated factors affect the response of interest, making it hard to separate out their individual influences

12

What does simple random sampling involve?

Every item in the population has an equal probability of being selected in the sample

13

What does stratified random sampling involve?

The population is partitioned into groups (strata) that differ in a characteristic of interest, and a random sample is then taken from each group

14

What does cluster sampling involve?

The population is partitioned into groups (clusters) that each have similar characteristics to the overall population. A subset of the groups is chosen, and a random sample is then taken within each chosen group

15

What does systematic sampling involve?

List all the items in the population. Start at a random point on the list, then select further items at a regular interval. Interval = population size / desired sample size
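
The four sampling methods on the preceding cards can be sketched in Python. This is an illustration under assumptions: the function names, the toy population and the strata/cluster groupings are all hypothetical, and the systematic sampler takes its random start within the first interval.

```python
import random

def simple_random_sample(population, n, rng):
    # Every item has the same chance of being selected.
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    k = len(population) // n          # interval = population size / sample size
    start = rng.randrange(k)          # random start within the first interval
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, rng):
    # strata: dict mapping stratum label -> list of items; sample from every stratum.
    return {label: rng.sample(items, n_per_stratum) for label, items in strata.items()}

def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    # Choose a subset of clusters, then sample within each chosen cluster.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {label: rng.sample(clusters[label], n_per_cluster) for label in chosen}

rng = random.Random(0)                # seeded only to make the sketch repeatable
population = list(range(100))         # toy population of 100 labelled items

srs = simple_random_sample(population, 10, rng)
sys_s = systematic_sample(population, 10, rng)
strat = stratified_sample({"low": population[:50], "high": population[50:]}, 5, rng)
clus = cluster_sample({c: population[c * 10:(c + 1) * 10] for c in range(10)}, 3, 4, rng)
```

Stratified sampling draws from every group, whereas cluster sampling draws only from the chosen groups.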

16

What are the 2 measurement scales for numerical data?

Interval scale and ratio scale

17

What are the characteristics of interval vs ratio scale?

An interval scale has no absolute zero and can take negative values (e.g. temperature in °C), so differences between measurements are meaningful but ratios are not. A ratio scale has a true zero point, cannot take negative values (e.g. mass in kg), and ratios are meaningful

18

What is in the black box?

19

What is the Law of Large Numbers?
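
A standard statement (the weak form), given as an assumption about what the card shows: for iid X1, ..., Xn with finite mean μ, the sample mean converges in probability to μ.

```latex
\[
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \;\xrightarrow{\;P\;}\; \mu
\quad \text{as } n \to \infty
\]
```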

20

What is the Central Limit Theorem?
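
A standard statement, given as an assumption about what the card shows: for iid X1, ..., Xn with mean μ and finite variance σ² > 0, the standardised sample mean converges in distribution to a standard normal.

```latex
\[
\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \;\xrightarrow{\;d\;}\; N(0, 1)
\quad \text{as } n \to \infty
\]
```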

21

What are the five principles of good graphics?

Show the data clearly, use good alignment on a common scale for quantities to be compared, use simplicity in design, keep the visual encoding transparent, prefer standard forms of demonstration

22

What are some ways to show the data clearly?

Identify the source of the data, the purpose of the graphic should influence its construction, the graph should show the data, distractions and distortions should be avoided, labelling in the title, on axes and for data points should be well-chosen and informative

23

What is the most important guiding principle for the construction of quantitative graphics?

Align quantities to be compared on a common linear scale

24

What is the data:ink ratio? Write out the formula

It measures the density of information in a graphic. Data:ink ratio = (pixels used directly for data) / (total non-background pixels)

25

Would you prefer a low or a high data:ink ratio?

A high data:ink ratio

26

What does it mean to keep the visual encoding transparent?

Ensure that as soon as an individual looks at the graph, the results are obvious to see

27

What are the standard forms of visually representing data?

Time-series plot, bar chart, scatter plot, dot plot, histogram, boxplot

28

What are dot plots useful for?

Dot plots are best for small sample sizes. They show the detail and distribution of a numerical variable and can be extended to compare more than one group

29

What do you do when you obtain unusual values?

Unusual values need careful handling and warrant an investigation into the recording procedures. If no mistake is found, keep them in the analysis

30

Can you just remove outliers from data for analytical convenience? Why or why not?

No. Removing outliers purely for convenience is an error, and possibly a serious one

31

What is a mild outlier?

An observation that lies more than 1.5*IQR, but no more than 3*IQR, below Q1 or above Q3

32

What is an extreme outlier?

Lies more than 3*IQR below Q1 or above Q3
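
The 1.5*IQR and 3*IQR rules can be sketched in Python. The function names and data are hypothetical, and `statistics.quantiles` uses one particular quartile convention (its default "exclusive" method), which may differ from the one used in the course or in R.

```python
import statistics

def classify(x, q1, q3):
    """Label one value as 'none', 'mild' or 'extreme' given the quartiles."""
    iqr = q3 - q1
    distance = max(q1 - x, x - q3)    # how far x lies outside the box [q1, q3]
    if distance > 3 * iqr:
        return "extreme"
    if distance > 1.5 * iqr:
        return "mild"
    return "none"

def classify_all(data):
    # Quartiles from the standard library; other conventions give slightly different cut-offs.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {x: classify(x, q1, q3) for x in data}

# Made-up data with two unusually large values.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 40]
labels = classify_all(data)
```

Values inside the box have a negative distance and are therefore never flagged.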

33

What does R highlight by default: mild or extreme outliers?

Mild outliers: by default boxplot() marks every point lying more than 1.5*IQR beyond the box

34

Are outliers valuable or can we just ignore them?

They are valuable since they either contain information about what is being investigated or the data gathering and recording process.

35

A scatterplot shows the relationship between what kind of variables?

2 numerical variables

36

What is descriptive statistics?

The first step towards understanding our data, and investigating its location, spread and shape

37

What is used to investigate the location of the data?

Mean, median

38

What is used to investigate the spread of the data?

Standard deviation, interquartile range, sample range

39

What is used to investigate the shape of the data?

probability mass function, probability density function, cumulative distribution function

40

What is the symbol we use to denote the sample mean? What is the formula for the sample/empirical mean of a random variable?
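
The usual symbol and formula, given as an assumption about what the card shows:

```latex
\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i
\]
```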

41

What is the symbol we use to denote the sample variance? What is the formula for the sample/empirical variance of a random variable?
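
The usual symbol and formula, with the n-1 divisor discussed on the next card; the notation is an assumption and may differ from the course's:

```latex
\[
S^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2
\]
```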

42

When calculating the sample variance, why do we divide by n-1 and not n?

Dividing by n-1 makes the sample variance an unbiased estimator of the population variance. Dividing by n tends to underestimate the variance
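
A small exact check of this claim, sketched in Python. The two-point population and the helper name `variance` are hypothetical. The block enumerates every equally likely iid sample of size 2 and averages each version of the variance.

```python
from itertools import product

def variance(sample, ddof):
    """Sum of squared deviations divided by (n - ddof)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - ddof)

population = [0, 1]                             # each value equally likely
pop_variance = variance(population, 0)          # true variance: 0.25

# All equally likely iid samples of size 2: (0,0), (0,1), (1,0), (1,1).
samples = list(product(population, repeat=2))

avg_n_minus_1 = sum(variance(s, 1) for s in samples) / len(samples)
avg_n = sum(variance(s, 0) for s in samples) / len(samples)

print(pop_variance, avg_n_minus_1, avg_n)  # 0.25 0.25 0.125
```

On average the n-1 version recovers the population variance exactly, while the n version gives only half of it in this example.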

43

What is the formula to calculate the sample standard deviation?
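
The usual formula, the square root of the sample variance; the notation is an assumption:

```latex
\[
S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}
\]
```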

44

What is the formula for the empirical cdf?

Formula: $\hat{F}_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{x_i \le x\}$. Each of the n observations carries probability 1/n, and the sum counts how many observations satisfy xi <= x. For example, if only x1 and x2 are at most x, the empirical cdf at x is 2/n

45

What is the formula for the empirical pmf?

Formula: $\hat{p}_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{x_i = x\}$. The sum scans the whole dataset and counts how many xi’s are equal to x; dividing that count by n gives the empirical probability of x occurring
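
The counting rules for the empirical cdf and pmf can be sketched in Python. The function names `ecdf` and `epmf` and the data are hypothetical.

```python
def ecdf(sample, x):
    """Proportion of observations that are less than or equal to x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)

def epmf(sample, x):
    """Proportion of observations that are exactly equal to x."""
    return sum(1 for xi in sample if xi == x) / len(sample)

data = [1, 2, 2, 3, 5]     # made-up sample with n = 5

print(ecdf(data, 2))       # 0.6: three of the five values are <= 2
print(epmf(data, 2))       # 0.4: two of the five values equal 2
```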

46

If the underlying variable is continuous, you would prefer to obtain an approximation of the pdf. What is the formula for the pdf using the histogram approach?

47

If the underlying variable is continuous, you would prefer to obtain an approximation of the pdf. What is the formula for the pdf using the smoothed pdf approach?
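
The two approaches can be sketched in Python using their common textbook forms: the histogram estimate (count in the bin containing x) / (n h), and the kernel estimate (1 / (n h)) Σ K((x - xi) / h). These forms, the Gaussian kernel, the bin origin and the function names are all assumptions; the course may use different notation or a different kernel.

```python
import math

def histogram_density(sample, x, origin, h):
    """Histogram estimate: (count in the bin containing x) / (n * h)."""
    bin_index = math.floor((x - origin) / h)
    count = sum(1 for xi in sample if math.floor((xi - origin) / h) == bin_index)
    return count / (len(sample) * h)

def gaussian_kernel(u):
    # Standard normal density, used here as the smoothing kernel K.
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kernel_density(sample, x, h):
    """Smoothed estimate: average of kernels centred at each observation."""
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)

data = [0.1, 0.4, 0.6, 1.2, 1.8]                      # made-up sample
print(histogram_density(data, 0.5, origin=0.0, h=1.0))  # 0.6: three of five points in [0, 1)
print(kernel_density(data, 0.5, h=0.5))
```

The bin width (or bandwidth) h controls the smoothness in both cases: larger h gives a smoother, less detailed estimate.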

48

What is the definition of the pth population quantile?

49

What is the definition of the pth sample quantile?

50

Using Order statistics, how can we express the empirical cdf?

51

What is the symbol used to denote the sample median? What is the formula to obtain the sample median? When would the median be the preferred measure of location?

The median is preferred over the mean as a measure of location when the data are highly skewed

52

What is the formula for the Interquartile Range? When would IQR be a preferred measure of spread?

IQR = Q3 - Q1, the difference between the third and first sample quartiles. It is preferred as a measure of spread when the data are highly skewed

53

What does the sample IQR estimate?

The population interquartile range

54

What plot do we use to tell what kind of distribution a sample follows?

Quantile-Quantile plots

55

What does a Quantile-Quantile (QQ) plot involve?

Construct a typical sample from the hypothesised distribution and plot our ordered sample against it. If our sample comes from that distribution, the two will be similar and the points will lie close to the straight line y = x

56

Should you use empirical pdfs or cdfs to tell if a sample is from any given distribution? Why or why not?

No, since the empirical shapes vary quite a bit and it is hard to compare curved shapes

57

What does a QQ plot look like?

Plot our sample on the y-axis and the typical population sample on the x-axis

58

What is the difference between a QQ plot and a Probability plot?

The axes are opposite.

59

What is a typical population sample?

A set of n points that divides the distribution into n+1 intervals, each with probability 1/(n+1); that is, the quantiles at levels i/(n+1) for i = 1, ..., n
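
A minimal Python sketch of a typical sample and the resulting QQ-plot coordinates, assuming a normal reference distribution. The function names are hypothetical; `NormalDist.inv_cdf` from the standard library supplies the quantiles.

```python
from statistics import NormalDist

def typical_normal_sample(n, mu=0.0, sigma=1.0):
    """Quantiles at levels i/(n+1), i = 1..n, of a normal distribution."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

def qq_points(sample):
    """Pairs (typical value, ordered observation): x-axis first, y-axis second."""
    ordered = sorted(sample)
    theoretical = typical_normal_sample(len(ordered))
    return list(zip(theoretical, ordered))

typical = typical_normal_sample(3)                 # quantiles at 0.25, 0.5, 0.75
points = qq_points([0.3, -1.1, 0.8, -0.2, 1.5])    # made-up sample
```

Plotting `points` as a scatter plot gives the QQ plot; points near a straight line suggest the sample is consistent with the reference distribution.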

60

What assumption do many procedures and models make about the underlying population distribution?

It’s normally distributed

61

What type of plots are used to test normality of samples?

Normal quantile plots