## PSTAT 105: Statistics

These questions are to be submitted via GradeScope.
1. Biologists counted the number of bacterial colonies in 400 cultures. The results were:

| Number of colonies | 0 | 1 | 2 | 3 | 4 | 5 | 6 or more |
|---|---|---|---|---|---|---|---|
| Frequency | 56 | 104 | 80 | 62 | 42 | 27 | 29 |

(a) Estimate as best you can the mean number of colonies per culture. Go ahead and count
every culture in the “6 or more” bin as exactly 6.
(b) Calculate the expected frequency for each cell if this data is from a Poisson distribution.
(c) Calculate the χ² test statistic.
(d) How many degrees of freedom does the distribution of this statistic have?
(e) Calculate a P-value for testing the null hypothesis that this data comes from a Poisson distribution.
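
The steps in Question 1 can be sketched in R as follows. This is only one possible way to organize the calculation, not an official solution: the variable names are made up for illustration, the last cell is assumed to absorb the entire Poisson upper tail, and the degrees of freedom follow the usual rule of subtracting one for each estimated parameter.

```r
# Observed frequencies from the table in Question 1 (400 cultures).
colonies <- 0:6                       # the last value stands for "6 or more"
observed <- c(56, 104, 80, 62, 42, 27, 29)
n <- sum(observed)

# (a) Mean estimate, treating the "6 or more" bin as exactly 6.
lambda_hat <- sum(colonies * observed) / n

# (b) Expected counts under Poisson(lambda_hat).
# Assumption: the last cell gets the whole upper tail P(X >= 6).
cell_probs <- c(dpois(0:5, lambda_hat),
                ppois(5, lambda_hat, lower.tail = FALSE))
expected <- n * cell_probs

# (c) Pearson chi-square statistic.
chi_sq <- sum((observed - expected)^2 / expected)

# (d) Degrees of freedom: cells - 1 - (number of estimated parameters).
dof <- length(observed) - 1 - 1

# (e) Upper-tail p-value from the chi-square distribution.
p_value <- pchisq(chi_sq, df = dof, lower.tail = FALSE)

print(data.frame(colonies, observed, expected = round(expected, 2)))
print(c(chi_sq = chi_sq, dof = dof, p_value = p_value))
```

Note that `chisq.test(observed, p = cell_probs)` would report the same statistic but would use 6 degrees of freedom, since it does not know that the mean was estimated from the data.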
For Questions 2 and 3, please analyze the data using R and type up your answers to these questions.
2. A well-known analysis in Malcolm Gladwell’s book Outliers argues that the best hockey players are
more likely to be born earlier in the year, presumably because this gives them advantages in the youth
hockey leagues. We are interested in checking whether there is a similar effect in basketball.
(a) The data set Basketball Ref BDays.txt contains information for a large sample of professional
or dplyr::count function to calculate how many players were born in each month. Draw an
appropriate plot.
(b) Perform a χ² test to see if the players are equally likely to be born in any month.
(c) In order to focus our attention on modern players, repeat this analysis with only those players
that were born after 1/1/1955. (Also use this smaller data set for the following questions.)
(d) To be more careful, we should realize that more people are probably born in January than February
just because there are more days in January. Perform a χ² test where the null hypothesis is that
the probability of each month is proportional to the average number of days in that month.
(e) Going even further, it seems that some months generally are favored over others for having babies
(summer births are more likely). We should probably compare our basketball player data to the
following probabilities from the CDC.
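
| Month | Jan | Feb | Mar | Apr | May | Jun |
|---|---|---|---|---|---|---|
| Prob. | 0.0815 | 0.0752 | 0.0837 | 0.0816 | 0.0860 | 0.0813 |

| Month | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|
| Prob. | 0.0883 | 0.0892 | 0.0866 | 0.0849 | 0.0787 | 0.0830 |
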
Perform a χ² test to see if the basketball player data has the same distribution.
(f) Interpret your results. Is there significant evidence at an α = 0.05 level that professional basketball
players are born earlier in the year than the normal population?
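
The three null hypotheses in Question 2, parts (b), (d), and (e), differ only in the probability vector passed to the goodness-of-fit test. The following is a rough R sketch of that idea, not a worked answer. The helper `month_gof` is a hypothetical name, the February length of 28.25 days is an assumed average over leap years, and the counts are simulated placeholders standing in for the real monthly counts from the data file.

```r
# Part (d) null: probabilities proportional to average month length.
# Assumption: February averages 28.25 days (one leap day every four years).
days_in_month <- c(31, 28.25, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
p_days <- days_in_month / sum(days_in_month)

# Part (e) null: the CDC probabilities listed in the question (Jan..Dec).
p_cdc <- c(0.0815, 0.0752, 0.0837, 0.0816, 0.0860, 0.0813,
           0.0883, 0.0892, 0.0866, 0.0849, 0.0787, 0.0830)

# Hypothetical helper: run all three chi-square goodness-of-fit tests
# on a vector of 12 monthly counts ordered January through December.
month_gof <- function(counts) {
  stopifnot(length(counts) == 12)
  list(
    uniform = chisq.test(counts),             # part (b): equal probabilities
    days    = chisq.test(counts, p = p_days), # part (d)
    cdc     = chisq.test(counts, p = p_cdc)   # part (e)
  )
}

# SIMULATED placeholder counts, not the real basketball data.
set.seed(105)
fake_counts <- as.vector(rmultinom(1, size = 1200, prob = p_cdc))

results <- month_gof(fake_counts)
print(sapply(results, function(res) res$p.value))
```

With the real data, `fake_counts` would be replaced by the monthly counts computed in part (a), ordered January through December so that they line up with the probability vectors.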