1. A study examined whether pack-years of lifetime smoking (smokepy) can predict
the level of C-reactive protein (CRP), an inflammatory marker, after controlling for age,
socioeconomic status (SES), and education. SES and education were numerical
variables; CRP was normally distributed. The results of the analysis are shown below.
The significance criterion is set at p < 0.05.
R-squared = .43
Adjusted R-squared = .41
Parameter Estimates
Variable     DF   Estimate   Standard Error   t value   p-value
Intercept     1     169.39             7.92     21.39     <.001
Smokepy       1      -0.38             0.05     -7.49     <.001
Age           1       0.03             0.04      0.77       .44
SES           1       1.41             0.67      2.10       .04
Education     1      -1.66             0.87     -1.91       .06
A) What type of statistical procedure is this?
B) Describe the findings, including interpretation of all values in the column for parameter
estimates and whether or not they are significant.
C) After controlling for model complexity (i.e. number of independent variables), what is the
proportion of variability in CRP explained by this model?
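As a quick check on how the columns of the parameter table relate to one another, each t value is the estimate divided by its standard error. The sketch below recomputes those ratios from the rounded table entries; the names `params` and `t_ratio` are illustrative and not part of the original output, and the recomputed values can differ slightly from the reported ones because the table entries are rounded.

```python
# Estimates and standard errors copied from the parameter table in Question 1.
params = {
    "Intercept": (169.39, 7.92),
    "Smokepy": (-0.38, 0.05),
    "Age": (0.03, 0.04),
    "SES": (1.41, 0.67),
    "Education": (-1.66, 0.87),
}


def t_ratio(estimate, std_error):
    """Return the t statistic for testing H0: coefficient = 0."""
    return estimate / std_error


# Recompute the t value for every row of the table.
t_values = {name: t_ratio(est, se) for name, (est, se) in params.items()}

for name, t in t_values.items():
    print(f"{name:<10} t = {t:6.2f}")
```

Each recomputed ratio is then compared with the t distribution to obtain the p-value in the last column, which is judged against the 0.05 criterion.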
2. An investigator conducted a study to find the relationship between the number of
decayed, missing, or filled teeth (DMFT) and sugar consumption. The investigator
produced an estimate for the correlation coefficient and provided the following statement:
“The correlation between DMFT and sugar consumption is 0.7. There is a strong
correlation between DMFT and sugar consumption. Therefore, it is recommended that
patients be advised to reduce sugar consumption to prevent tooth decay.”
State why you are or you are not confident about this investigator’s conclusion. In other
words, explain if something is missing from this investigator’s analysis, or if all you need
is provided.
3. You are conducting a study to analyze gender differences in neurocognitive impairment (NCI)
within a sample of cocaine-dependent methadone-maintained patients. You found 3
demographic characteristics that produced significant effects on NCI. They are gender, race,
and age.
A) What statistical analysis would you use to see simultaneously the contributions of
socio-demographic variables (gender (male/female), race (White, Black, Latino, Asian), and age
(in years)) on self-reported NCI, a normal continuous outcome (higher scores indicate higher
impairment)?
B) How many independent variables will there be in your model? Describe (1) what they are, (2)
how you would create them, and (3) interpretation for each coefficient.
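One common way to create predictors from categorical variables is indicator (dummy) coding, where a variable with k categories becomes k - 1 indicators measured against a reference category. The sketch below shows this under stated assumptions: White is taken as the reference race and female as the reference gender (any category could serve), and the names `RACE_LEVELS`, `REFERENCE_RACE`, and `encode` are hypothetical.

```python
RACE_LEVELS = ["White", "Black", "Latino", "Asian"]
REFERENCE_RACE = "White"  # assumption: any level could serve as the reference


def encode(gender, race, age):
    """Return one design-matrix row as a dict of predictor name -> value."""
    # Assumption: female is the reference category for gender.
    row = {"male": 1 if gender == "male" else 0}
    # One indicator per non-reference race category.
    for level in RACE_LEVELS:
        if level != REFERENCE_RACE:
            row[f"race_{level}"] = 1 if race == level else 0
    # Age enters the model as a continuous predictor, unchanged.
    row["age"] = age
    return row


example = encode("male", "Latino", 42)  # hypothetical patient
print(example)
print(len(example), "predictors")
```

Under this coding, each indicator's coefficient is the adjusted mean difference in NCI between that category and the reference category, and the age coefficient is the change in NCI per additional year.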
4. We ran an inference test to study whether gender (0 = female; 1 = male) is associated with a
diagnosis of Type 2 Diabetes Mellitus (t2dm: 0 = absent; 1 = present) in a group of patients,
controlling for age. The results table is shown below:
Analysis of Maximum Likelihood Estimates
Parameter   Estimate   Standard Error   Test statistic   p-value   Exp(B)
Intercept    -12.77            1.9759          41.8176    <.0001        -
Gender         0.41            0.124           10.9799    0.0009     1.5
Age            0.0948          0.0305           9.6883    0.0019     1.09
A) What type of model is this, and why is this type of analysis appropriate in this case?
B) Describe the finding: is gender associated with diagnosis of t2dm, why or why not (provide
the test statistic and p-value)? Interpret the coefficient for gender and age.
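The Exp(B) column is the exponential of the estimate, which puts each coefficient on the odds-ratio scale. The sketch below recomputes it from the rounded table entries. It also recomputes the test statistic on the assumption that it is a Wald chi-square, (estimate / standard error) squared; the table does not name the statistic, although the reported values are close to this formula. The function and variable names are illustrative.

```python
import math

# Estimates and standard errors copied from the table in Question 4.
coefs = {
    "Gender": (0.41, 0.124),
    "Age": (0.0948, 0.0305),
}


def odds_ratio(estimate):
    """Exponentiate a log-odds coefficient to get an odds ratio."""
    return math.exp(estimate)


def wald_chi_square(estimate, std_error):
    """Assumed form of the test statistic: (estimate / SE) squared, 1 df."""
    return (estimate / std_error) ** 2


results = {
    name: (odds_ratio(b), wald_chi_square(b, se))
    for name, (b, se) in coefs.items()
}

for name, (or_value, chi2) in results.items():
    print(f"{name:<7} Exp(B) = {or_value:.2f}   chi-square = {chi2:.2f}")
```

Small differences from the printed table are expected because the inputs here are already rounded.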
5. In 1998, there was a major ice storm in Maine. Researchers wanted to know whether
there was an association between generator location (inside or outside) and CO poisoning
after an ice storm. Results from their case-control study are summarized in the table below
(cases are observations that have experienced the CO poisoning, controls are
observations that have not experienced the CO poisoning):
(A) What type of table is this?
(B) Name at least 2 tests you can perform to investigate the association between
generator location and CO poisoning after an ice storm.
(C) Calculate the odds ratio and risk ratio based on this table. Which one is more
appropriate for this type of study design?
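For part (C), the two measures can be written as functions of the four cell counts. The sketch below is a minimal illustration: the cell labels a, b, c, d follow a common textbook layout (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls), and the counts in the usage example are hypothetical placeholders, not the data from the study.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """Risk ratio: risk among the exposed divided by risk among the unexposed."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# Hypothetical counts for illustration only (NOT the study's table).
a, b, c, d = 20, 10, 5, 40
print("odds ratio:", odds_ratio(a, b, c, d))
print("risk ratio:", round(risk_ratio(a, b, c, d), 2))
```

Substituting the actual counts from the study's table gives the values the question asks for.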
