NIASRA Seminar Series
June 14 @ 10:00 am - 11:00 am
Professor Murray Aitkin, Department of Statistics, University of Melbourne
Statistical modelling education for Data Science
Many universities and national statistical societies are grappling with the need to change the statistics curriculum to reflect the current focus on large-scale data analysis. There is little agreement over how the first course should change. Here is a summary of one approach to the first course (of three):
- a) The research questions, the survey designs and the data (lots of them) before anything else, with an early introduction to the deficiencies of observational and voluntary response data;
- b) The essential roles of probability and Fisherian likelihood;
- c) Visualisation of data with the empirical cdf playing the major role in probability model specification;
- d) Inference based on the likelihood: Bayesian analyses (with emphasis on flat or reference priors) and ML should be given together, with ML the quadratic approximation to the full log-likelihood;
- e) Regression models up to GLMs and mixtures;
- f) Missing and incomplete data analysis by EM and Data Augmentation;
- g) Model assessment by credible regions.
An example of a small data set of mobile phone lifetimes from an industrial assessment of repair schedules is used to illustrate the course emphases.
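As a rough illustration of items (c) and (d), the sketch below computes an empirical cdf and compares an exact log-likelihood with its quadratic approximation at the maximum likelihood estimate. It is not taken from the seminar: the lifetimes are made-up numbers, the exponential lifetime model is an assumption chosen for simplicity, and the names `ecdf`, `exp_loglik` and `quad_approx` are hypothetical.

```python
import math

# Hypothetical lifetimes (made-up numbers for illustration, not the seminar's data).
lifetimes = [0.5, 1.2, 1.9, 2.4, 3.1, 4.0, 5.6, 7.3]


def ecdf(data, x):
    """Empirical cdf: proportion of observations less than or equal to x."""
    return sum(1 for v in data if v <= x) / len(data)


def exp_loglik(rate, data):
    """Log-likelihood of an (assumed) exponential model with the given rate."""
    return len(data) * math.log(rate) - rate * sum(data)


n = len(lifetimes)
rate_hat = n / sum(lifetimes)   # maximum likelihood estimate of the rate
obs_info = n / rate_hat ** 2    # observed information: minus the second derivative at the MLE


def quad_approx(rate):
    """Quadratic approximation to the log-likelihood, expanded around the MLE."""
    return exp_loglik(rate_hat, lifetimes) - 0.5 * obs_info * (rate - rate_hat) ** 2


# Standard error implied by the quadratic approximation.
std_error = 1 / math.sqrt(obs_info)

# Compare the exact and approximate log-likelihoods on a small grid around the MLE.
for factor in (0.5, 0.8, 1.0, 1.2, 1.5):
    rate = factor * rate_hat
    print(f"rate={rate:.3f}  exact={exp_loglik(rate, lifetimes):.3f}  "
          f"quadratic={quad_approx(rate):.3f}")
```

Under a flat prior on the rate, the posterior is proportional to the likelihood, so the quadratic approximation corresponds to a normal approximation to the posterior centred at `rate_hat` with standard deviation `std_error`. The printed grid shows the two curves agreeing near the MLE and drifting apart further away.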
The second course covers multi-level designs and multivariate responses.
The third course covers n<<p designs, and machine learning and other computer science procedures for dealing with the dimensionality problem.