## PHC321: applied biostatistics

1. Can we predict patients’ baseline HbA1c (in mmol/mol) from their Total Cholesterol (in mmol/L)? Provide a complete statistical investigation using correlation and regression analyses (including the regression equation) with proper written interpretation and visual illustrations.

To predict patients’ baseline HbA1c (in mmol/mol) from their Total Cholesterol (in mmol/L), the assumptions of the linear regression model must be met (SPSS Statistics | IBM, n.d.):

• Our two variables should be measured at the continuous level
• There needs to be a linear relationship between the two variables
• There should be no significant outliers
• The residual term “e” should be Normally distributed with mean = 0 for each value of X
• The spread of the residual terms should be equal regardless of the value of X; e should not expand or contract as X increases

At first, because of some significant outliers, the two variables were not Normally distributed and showed no significant association on Spearman’s rank-order correlation test.

After removing the outliers, Pearson’s correlation analysis showed a statistically significant but weak positive linear relationship between the two variables (r = 0.11, p-value = 0.013), so we can build a linear regression model to predict patients’ baseline HbA1c from their Total Cholesterol.
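As a rough cross-check of the reported significance, the t statistic for a Pearson correlation can be recomputed from r and the sample size. The sketch below assumes n = 497 (inferred from the total df of 496 in the ANOVA table) and uses the rounded value r = 0.111 from the Model Summary table; the function name `t_from_r` is illustrative only.

```python
import math

def t_from_r(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# r is the rounded value from the Model Summary table;
# n = 497 is an assumption inferred from total df = 496.
t_stat = t_from_r(0.111, 497)
print(round(t_stat, 2))
```

Because r is rounded, the result agrees only approximately with the t = 2.487 that SPSS reports for the slope in the Coefficients table.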

Running a linear regression analysis in SPSS with baseline HbA1c as the dependent variable and Total Cholesterol as the independent variable gives the following result tables.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .111a | .012 | .010 | 17.08107 |

a. Predictors: (Constant), Total cholesterol (mmol/L) at baseline

ANOVAa

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 1804.988 | 1 | 1804.988 | 6.186 | .013b |
| | Residual | 144422.597 | 495 | 291.763 | | |
| | Total | 146227.585 | 496 | | | |

a. Dependent Variable: HbA1c (mmol/mol) at baseline
b. Predictors: (Constant), Total cholesterol (mmol/L) at baseline

Coefficientsa

| Model | | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 62.269 | 3.037 | | 20.501 | .000 |
| | Total cholesterol (mmol/L) at baseline | 1.619 | .651 | .111 | 2.487 | .013 |

a. Dependent Variable: HbA1c (mmol/mol) at baseline

The Model Summary table gives R = 0.111 and R squared = 0.012, meaning that only 1.2% of the total variation in the dependent variable, baseline HbA1c, is explained by the independent variable, Total Cholesterol.
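The Model Summary figures can be reproduced from the sums of squares in the ANOVA table, which is a useful sanity check on the reading above. This is a minimal sketch using only values copied from the SPSS output; n = 497 is an assumption inferred from the total df of 496, and the variable names are illustrative.

```python
import math

# Values copied from the ANOVA table
ss_regression = 1804.988
ss_total = 146227.585
ms_residual = 291.763

n = 497  # assumption: total df (496) + 1
k = 1    # number of predictors in the model

# Proportion of total variation explained by the model
r_squared = ss_regression / ss_total

# Adjusted R squared penalises for the number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Standard error of the estimate is the square root of the residual mean square
std_error_estimate = math.sqrt(ms_residual)

print(round(r_squared, 3), round(adj_r_squared, 3), round(std_error_estimate, 3))
```

The three results round to the R Square, Adjusted R Square and Std. Error of the Estimate values shown in the Model Summary table.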

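The ANOVA table indicates that the regression model is statistically significant (F(1, 495) = 6.186, p-value = 0.013): Total Cholesterol predicts baseline HbA1c better than a model with no predictor, although the proportion of variation explained is small.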
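The F statistic is the ratio of the regression mean square to the residual mean square, and with a single predictor it should equal the square of the slope's t statistic. The short check below uses only numbers from the SPSS tables; the variable names are illustrative.

```python
# Values copied from the ANOVA and Coefficients tables
ms_regression = 1804.988  # regression mean square
ms_residual = 291.763     # residual mean square
t_slope = 2.487           # t statistic for the Total Cholesterol slope

# F = MS(regression) / MS(residual)
f_stat = ms_regression / ms_residual

# With one predictor, F and t squared agree up to rounding
print(round(f_stat, 3), round(t_slope ** 2, 3))
```

The two printed values differ only in the third decimal place, which is expected because the t statistic in the table is itself rounded.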

From the Coefficients table we can build the following regression equation:

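Baseline HbA1c (mmol/mol) = 62.269 + 1.619 × Total Cholesterol (mmol/L)

That is, each 1 mmol/L increase in Total Cholesterol is associated with an average increase of 1.619 mmol/mol in baseline HbA1c.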
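To illustrate how the equation is used, the sketch below wraps it in a small function. The coefficients come from the Coefficients table; the function name `predict_hba1c` and the example input of 5.0 mmol/L are hypothetical and chosen only for illustration.

```python
INTERCEPT = 62.269  # constant from the Coefficients table (mmol/mol)
SLOPE = 1.619       # change in HbA1c (mmol/mol) per 1 mmol/L of Total Cholesterol

def predict_hba1c(total_cholesterol: float) -> float:
    """Predicted baseline HbA1c (mmol/mol) for a Total Cholesterol value (mmol/L)."""
    return INTERCEPT + SLOPE * total_cholesterol

# Hypothetical patient with Total Cholesterol of 5.0 mmol/L
example_prediction = predict_hba1c(5.0)
print(round(example_prediction, 3))
```

With R squared of only 1.2% and a standard error of the estimate of about 17 mmol/mol, a prediction for an individual patient from this equation will be imprecise.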

The following scatterplot shows a graphical representation of the model.

2. Drawing on your conclusion in the previous question, can we add other variables to the regression model to control for confounding? Provide at least two confounding variables with proper justification.

Several other variables are significantly correlated with baseline HbA1c and could be added to the regression model to control for confounding and make it more precise:

• Duration of oral antidiabetic drugs (r = -0.127, p-value = 0.005)
• Age in years (r = -0.144, p-value = 0.001)
• Duration of lipids drugs use (r = -0.097, p-value = 0.042)
• Duration of antihypertensive drugs (r = -0.137, p-value = 0.005)
• Diastolic BP (mmHg) at baseline (r = 0.094, p-value = 0.036)
• Alkaline Phosphatase (IU/L) at baseline (r = 0.11, p-value = 0.015)

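Because the above variables are significantly correlated with baseline HbA1c, they are candidates for inclusion in the regression model. To act as a confounder, a variable should also be associated with Total Cholesterol, so that association should be checked as well.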