NIASRA Seminar Series

June 14 @ 10:00 am - 11:00 am


Professor Murray Aitkin, Department of Statistics, University of Melbourne


Statistical modelling education for Data Science


Many universities and national statistical societies are grappling with the need to change the statistics curriculum to reflect the current focus on large-scale data analysis. There is little agreement on how the first course should change. The following summarises one approach to the first of three courses:

  a) The research questions, the survey designs and the data (lots of them) come before anything else, with an early introduction to the deficiencies of observational and voluntary-response data;
  b) The essential roles of probability and Fisherian likelihood;
  c) Visualisation of data, with the empirical cdf playing the major role in probability model specification;
  d) Inference based on the likelihood: Bayesian analyses (with emphasis on flat or reference priors) and maximum likelihood (ML) should be taught together, with ML presented as the quadratic approximation to the full log-likelihood;
  e) Regression models up to GLMs and mixtures;
  f) Missing and incomplete data analysis by EM and Data Augmentation;
  g) Model assessment by credible regions.

A small data set of mobile phone lifetimes, taken from an industrial assessment of repair schedules, is used to illustrate these course emphases.

The second course covers multi-level designs and multivariate responses.

The third course covers n << p designs (far more variables than observations), together with machine learning and other computer science procedures for dealing with the dimensionality problem.



Building 39A Room 208
University of Wollongong
Wollongong, NSW Australia