STA4026S - Analytics

18 credits at NQF level 8

Entry Requirements:

An undergraduate degree that included substantial training in quantitative subjects and programming, as assessed by the course convener.

Course Outline:

This course covers computationally intensive statistical methods for analysing datasets of various sizes. It is divided into three broad sections: (1) Parallel and high-performance computing in R, (2) Supervised Learning and (3) Unsupervised Learning. In the first section, students will learn how to use R to analyse large datasets across multiple computer processors and on UCT's own high-performance computing (HPC) cluster. The second section introduces machine learning techniques that infer a regression or classification rule from labelled training data, including regression and classification trees, bagging and random forests, boosting and neural networks. The last section covers statistical methods for classifying observations into groups when the group memberships of the training data are not known in advance, including self-organising maps, association rule mining and cluster analysis.