Mathematics Question

4. Simulation Study: Screening, Stepwise Selection, and ROC Curves
The following toy problem is motivated by the interesting (but very difficult)
problem of using genome-wide association studies (GWAS) to screen people
for rare diseases with genetic components.
(1) Load the data GWAS.csv, and split it into equal-sized testing and training datasets. The first column is the indicator function of a disease.
The remaining columns are indicator functions for a collection of alleles that have been studied. Fit a standard logistic regression model to the training set and apply it to the testing set. Summarize the model fit and the performance of the model on the test data.
(2) Repeat the previous step, but this time use Lasso or ridge regression.
Compare the results.
(3) Draw an estimate of the ROC curve for the best method you used.
Describe any interesting features.
(4) You wish to use your model for screening. This means measuring a small number of variables, then forwarding a small fraction of people for further testing. Based on funding considerations, you wish to forward roughly 1 percent of the population for further testing. Describe a decision procedure based on the model you used in the previous step.
Does the plotted ROC curve influence your choice?
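
As an illustration of parts (3) and (4), here is a minimal sketch of how an empirical ROC curve and a "forward roughly 1 percent" rule might be computed from predicted scores. It is not a solution to the assignment: GWAS.csv is not used, the scores come from a made-up Gaussian generator (`synthetic_scores`, with 200 cases among 10,000 people, both numbers assumed), and all function names are hypothetical. In practice the scores would be the fitted model's predicted probabilities, and the threshold would preferably be chosen on training or validation scores rather than on the test set.

```python
import random


def roc_points(labels, scores):
    """Empirical ROC curve as a list of (FPR, TPR) points, threshold swept high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for k, (s, y) in enumerate(pairs):
        tp += y
        fp += 1 - y
        # Emit a point only after the last item of a group of tied scores.
        if k + 1 == len(pairs) or pairs[k + 1][0] != s:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def screening_threshold(scores, fraction=0.01):
    """Score of the k-th highest-ranked person, where k is about fraction * n."""
    k = max(1, round(fraction * len(scores)))
    return sorted(scores, reverse=True)[k - 1]


def operating_point(labels, scores, threshold):
    """(FPR, TPR, number forwarded) when everyone with score >= threshold is forwarded."""
    flagged = [y for y, s in zip(labels, scores) if s >= threshold]
    pos = sum(labels)
    neg = len(labels) - pos
    tp = sum(flagged)
    fp = len(flagged) - tp
    return fp / neg, tp / pos, len(flagged)


def synthetic_scores(n=10000, n_cases=200, seed=0):
    """ASSUMPTION: stand-in for model scores; cases score higher on average."""
    rng = random.Random(seed)
    labels = [1] * n_cases + [0] * (n - n_cases)
    scores = [rng.gauss(1.5 if y else 0.0, 1.0) for y in labels]
    return labels, scores


if __name__ == "__main__":
    labels, scores = synthetic_scores()
    curve = roc_points(labels, scores)
    t = screening_threshold(scores, fraction=0.01)
    fpr, tpr, n_forwarded = operating_point(labels, scores, t)
    print(f"AUC = {auc(curve):.3f}")
    print(f"threshold = {t:.3f}, forwarded = {n_forwarded}, FPR = {fpr:.4f}, TPR = {tpr:.3f}")
```

Because only about 1 percent of people can be forwarded, the false positive rate at the chosen threshold cannot exceed roughly that fraction divided by the share of unaffected people. The relevant part of the ROC curve is therefore its far-left corner, and a method with a higher overall AUC is not necessarily the better screener at this operating point.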
