L2 - Robust estimation of multivariate location and scatter

52 Terms

New cards

How do practitioners often check for outliers?

New cards

What is the problem with this approach?

Multivariate outliers may not be extreme in any variable. Such outliers are called correlation outliers.
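A small sketch may help make the notion of a correlation outlier concrete. The data below are hypothetical (11 points close to the line y = x), the helper names are made up for illustration, and everything is restricted to two variables so that the 2x2 covariance matrix can be inverted by hand. The candidate point has unremarkable z-scores in both variables, yet its squared Mahalanobis distance is very large because it contradicts the correlation structure.

```python
import math

def mean_cov_2d(points):
    """Sample mean and covariance (denominator n - 1) of bivariate points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

# Hypothetical data: 11 points scattered tightly around the line y = x.
clean = [(float(i), i + (0.2 if i % 2 == 0 else -0.2)) for i in range(-5, 6)]
center, cov = mean_cov_2d(clean)

candidate = (2.0, -2.0)  # well inside the range of x and of y, but off the diagonal
regular = (5.0, 4.8)     # the most extreme clean point along the diagonal

# Marginal z-scores of the candidate: neither variable looks extreme.
z_x = (candidate[0] - center[0]) / math.sqrt(cov[0])
z_y = (candidate[1] - center[1]) / math.sqrt(cov[2])

d2_candidate = mahalanobis_sq(candidate, center, cov)
d2_regular = mahalanobis_sq(regular, center, cov)

print("z-scores of candidate:", round(z_x, 2), round(z_y, 2))
print("squared distances:", round(d2_candidate, 1), round(d2_regular, 1))
```

Here the center and covariance are estimated from the clean points only, which sidesteps the masking problem that motivates robust estimators in the first place.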

New cards

What is the idea for detecting correlation outliers?

New cards

What kind of problem does this lead to?

New cards

What are those methods?

Minimum covariance determinant (MCD) estimator

One-step reweighted estimator

New cards

What is the idea of the MCD estimator?

New cards

What is the geometric meaning of the determinant of the covariance matrix?

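Geometrically, the determinant of the covariance matrix measures the size of the data cloud: its square root is proportional to the (hyper)volume of the tolerance ellipsoid, so a smaller determinant means more tightly concentrated points.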
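The relation between determinant and volume can be written out explicitly. The following is the standard formula for the volume of an ellipsoid, stated here as a reminder rather than taken from the lecture; the radius c and the dimension p are notation introduced for this note.

```latex
E_c = \{\, x \in \mathbb{R}^p : (x - \mu)^\top \Sigma^{-1} (x - \mu) \le c^2 \,\},
\qquad
\operatorname{Vol}(E_c) = \frac{\pi^{p/2}}{\Gamma(p/2 + 1)} \, c^p \, \sqrt{\det \Sigma}.
```

For fixed c and p, the volume depends on the data only through the square root of det Σ, which is why minimizing the determinant amounts to finding the most compact ellipsoid.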

New cards

How do we denote the sample mean that uses observations from subset H?

New cards

How do we denote the sample covariance matrix that uses observations from subset H?

New cards

What is the objective function for finding H_MCD?

New cards

What is the functional for the mean over a set A?

New cards

What is the functional for the covariance matrix over a set A?

New cards

What are the functionals of the MCD estimator of location and scatter?

New cards

Are the MCD estimators of location and scatter Fisher consistent under a normal distribution N(µ,Σ)?

New cards

What is the Fisher consistent, robust estimator of the covariance matrix then?

New cards

The subset size h can be seen as a conservative initial guess of the number of good data points. What is the problem with this?

New cards

A conservative choice of h trades off efficiency for robustness. How can we regain some efficiency?

We use the MCD estimates to detect outliers and exclude only those from the estimation of the mean and covariance matrix.
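The reweighting idea can be sketched as follows for bivariate data. This is a minimal illustration under stated assumptions, not the lecture's implementation: the raw center and scatter are hypothetical stand-in values rather than the output of an actual MCD fit, the function names are made up, the cutoff is the 0.975 quantile of the chi-square distribution with 2 degrees of freedom (which has the closed form -2 ln(1 - 0.975) only for p = 2), and consistency correction factors are omitted.

```python
import math

def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance in 2D; cov is (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def reweight(points, raw_center, raw_cov, alpha=0.975):
    """One reweighting step: weight 1 if the robust distance is below the cutoff, else 0."""
    cutoff = -2.0 * math.log(1.0 - alpha)  # chi-square quantile, valid for p = 2 only
    kept = [pt for pt in points
            if mahalanobis_sq(pt, raw_center, raw_cov) <= cutoff]
    m = len(kept)
    cx = sum(x for x, _ in kept) / m
    cy = sum(y for _, y in kept) / m
    # Covariance of the retained points, denominator (sum of weights - 1).
    sxx = sum((x - cx) ** 2 for x, _ in kept) / (m - 1)
    syy = sum((y - cy) ** 2 for _, y in kept) / (m - 1)
    sxy = sum((x - cx) * (y - cy) for x, y in kept) / (m - 1)
    return (cx, cy), (sxx, sxy, syy), kept

# Hypothetical data: 11 points near y = x plus two correlation outliers.
clean = [(float(i), i + (0.2 if i % 2 == 0 else -0.2)) for i in range(-5, 6)]
outliers = [(2.0, -2.0), (-3.0, 3.0)]
data = clean + outliers

# Stand-in raw robust estimates (assumed values, not computed by an MCD fit).
raw_center = (0.0, 0.0)
raw_cov = (11.0, 10.9, 11.0)

rw_center, rw_cov, kept = reweight(data, raw_center, raw_cov)
print("points kept:", len(kept), "of", len(data))
print("reweighted center:", rw_center)
```

Because only the flagged points are dropped, the reweighted estimates use all 11 regular observations instead of just a subset of size h, which is where the efficiency gain comes from.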

New cards

What is Mahalanobis distance? Give the formula and explain what it means geometrically.

New cards

What are the weights from outlier detection via the MCD estimator?

New cards

How do we compute the reweighted MCD location estimate, given the weights?

New cards

How do we compute the reweighted MCD covariance matrix estimate, given the weights?

New cards

What are muhat_rwtd and sigmahat_rwtd?

New cards

When people talk about the MCD, they typically refer to …

the reweighted estimators µ̂_rwgt and Σ̂_rwgt

New cards

the raw MCD estimators

New cards

What does affine equivariance imply for the data?

Data may be rotated, translated or rescaled without affecting the properties of T(X) and S(X)
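For reference, affine equivariance is usually formalized as below. This is the standard textbook definition, given here as a reminder; the symbols A (a nonsingular p x p matrix) and b (a p-vector) are notation introduced for this note.

```latex
T(A x_1 + b, \dots, A x_n + b) = A \, T(x_1, \dots, x_n) + b,
\qquad
S(A x_1 + b, \dots, A x_n + b) = A \, S(x_1, \dots, x_n) \, A^\top .
```

The sample mean and sample covariance matrix satisfy both identities, which is a quick way to check the definition on a familiar case.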

New cards

When is an estimator of location T(X) affine equivariant?

New cards

When is an estimator of scatter S(X) affine equivariant?

New cards

Are the reweighted MCD estimators affine equivariant? Give your reasoning.

New cards

Give the finite sample breakdown point of location estimator Tn in the multivariate setting.

New cards

Give the finite sample breakdown point of scatter estimator Sn in the multivariate setting.

New cards

What is the upper bound on the breakdown point of affine equivariant estimates of scatter?

New cards

What is the upper bound on the breakdown point of affine equivariant estimates of location?

New cards

What is the optimal subset size, that is, the one that gives the maximum possible breakdown point?

New cards

What is the distribution of the MCD estimator and its convergence rate?

The MCD estimator is asymptotically normally distributed with convergence rate √n.

New cards

What is the convergence rate of the reweighted estimator?

A reweighting step does not improve the rate of convergence of the initial estimator, so the reweighted estimator also converges at rate √n.

New cards

What is the really big problem with the MCD estimator in practice? And what is the solution?

New cards

What is the basic C-step algorithm? What is the problem with it?

If an elemental subset contains an outlier, it will influence all further iterations. In this case fully iterating C-steps until convergence is a waste of computation time.
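As a reminder of what a C-step (concentration step) does, here is a minimal bivariate sketch. It follows the usual description of the step (fit mean and covariance on the current subset, then keep the h points with the smallest Mahalanobis distances), but the data, the starting subset, and all function names are hypothetical and chosen only for illustration.

```python
def mean_cov_2d(points):
    """Sample mean and covariance (denominator n - 1) of bivariate points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def det2(cov):
    """Determinant of a 2x2 covariance matrix given as (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    return sxx * syy - sxy ** 2

def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det2(cov)

def c_step(points, subset_idx):
    """One C-step: refit on the subset, keep the h closest points overall."""
    h = len(subset_idx)
    center, cov = mean_cov_2d([points[i] for i in subset_idx])
    order = sorted(range(len(points)),
                   key=lambda i: mahalanobis_sq(points[i], center, cov))
    return sorted(order[:h])

# Hypothetical data: indices 0..10 lie near y = x, indices 11 and 12 are outliers.
points = [(float(i), i + (0.2 if i % 2 == 0 else -0.2)) for i in range(-5, 6)]
points += [(2.0, -2.0), (3.0, -3.0)]

subset = [0, 1, 2, 3, 4, 5, 6, 11]  # starting subset of size h = 8 with one outlier
dets = []
for _ in range(100):  # guard against endless loops in degenerate cases
    _, cov = mean_cov_2d([points[i] for i in subset])
    dets.append(det2(cov))
    new_subset = c_step(points, subset)
    if new_subset == subset:
        break
    subset = new_subset

print("determinants per iteration:", dets)
print("final subset:", subset)
```

The determinant never increases from one iteration to the next, which is the property that makes iterating C-steps worthwhile. In this small example the outlier is dropped quickly, but that is not guaranteed in general, which is the problem the card describes.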

New cards

What is the FAST-MCD Algorithm?

New cards

What is the idea of the Deterministic MCD (DetMCD) algorithm?

Instead of using many random starting values, use only a few good ones

New cards

Give the DetMCD Algorithm.

New cards

What is an alternative to MCD and all methods derived from it?

New cards

What to take away from this lecture?

New cards

If an estimator is non-parametric, does that make it robust?

No, not necessarily. The sample mean is non-parametric yet it is not robust
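A tiny numerical illustration of this point, using made-up numbers: replacing a single observation by a gross error moves the sample mean arbitrarily far, while the sample median stays put.

```python
from statistics import mean, median

# Hypothetical measurements around 10.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]

# Replace the last observation with a gross error.
contaminated = clean[:-1] + [1000.0]

print("mean:  ", mean(clean), "->", mean(contaminated))
print("median:", median(clean), "->", median(contaminated))
```

Neither estimator assumes a parametric model, so the difference in behavior comes from how each one reacts to contamination, not from being non-parametric.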

New cards

For a symmetric distribution, is the sample mean Fisher consistent for the true mean mu?

Yes

New cards

For a symmetric distribution, is the sample median Fisher consistent for the true mean mu?

Yes

New cards

For an asymmetric distribution, is the sample mean Fisher consistent for the true mean mu?

Yes

New cards

For an asymmetric distribution, is the sample median Fisher consistent for the true mean mu?

No (so we would need a correction)

New cards

For an asymmetric distribution, is the sample mean Fisher consistent for the median?

No (so we would need a correction)