Stat ST465/665, Assignment 4

1. (14 points) Read in the file “assignment4_data1.txt” to create a data matrix X.
The assignment is to use the matrix scatter plot, a plot of the statistical distances to
the sample mean, and the univariate q-q plots to detect outliers in the data set. Hint:
There are 3 or fewer outliers in the data.
(a) Compute and display the sample covariance matrix $S$ and the sample mean vector
$\bar{\mathbf{x}}$.
(b) Show a matrix scatter plot and univariate q-q plots.
(c) Compute the vector $\mathbf{D}$ of statistical distances between the data points and
the sample mean, where
$D_i = (\mathbf{x}_i - \bar{\mathbf{x}})^\top S^{-1} (\mathbf{x}_i - \bar{\mathbf{x}})$,
with $\mathbf{x}_i^\top$ denoting the ith row vector of X. Show a plot of the values
versus index.
(d) Use the graphs and $\mathbf{D}$ to identify outliers. Explain your choices. Each
outlier should have at least two indicators.
(e) Remove the outliers to get a new data set, then compute and display the sample
covariance matrix $S$ and the sample mean vector $\bar{\mathbf{x}}$ for the cleaned data
set. Describe the effect of removing outliers on the sample covariance and mean.
(f) Show a matrix scatter plot and univariate q-q plots for the cleaned data set. Is this
evidence consistent with a normal distribution? Explain.
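
The quantities in parts (a) and (c) can be sketched in Python with NumPy as follows. This
is only an illustration under stated assumptions, not a prescribed solution: the function
name statistical_distances is hypothetical, the data are synthetic with one planted
outlier (they are not the contents of “assignment4_data1.txt”), and a linear solve is
used in place of forming the inverse of S explicitly.

```python
import numpy as np


def statistical_distances(X):
    """Return the sample mean vector, sample covariance S, and distances D_i."""
    X = np.asarray(X, dtype=float)
    xbar = X.mean(axis=0)            # sample mean vector (column means)
    S = np.cov(X, rowvar=False)      # sample covariance, divisor n - 1
    centered = X - xbar              # rows are x_i - xbar
    # D_i = (x_i - xbar)^T S^{-1} (x_i - xbar); solve S z = (x_i - xbar)
    # for all rows at once instead of inverting S.
    solved = np.linalg.solve(S, centered.T).T
    D = np.einsum("ij,ij->i", centered, solved)
    return xbar, S, D


# Synthetic stand-in data (assumption): 50 rows, 3 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[7] = [8.0, -8.0, 8.0]              # planted outlier, for illustration only

xbar, S, D = statistical_distances(X)
print("mean vector:", xbar)
print("covariance matrix:\n", S)
print("index of largest distance:", int(np.argmax(D)))
```

With the real data, X would instead come from something like
np.loadtxt("assignment4_data1.txt"), assuming the file is whitespace-delimited and purely
numeric. Plotting D against the row index then gives the plot requested in part (c).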
