CSC5007Z - Databases for Data Scientists

12 credits at NQF level 9

Entry Requirements:

Acceptance into the Master's degree, specialising in Data Science.

Course Outline:

This course introduces students with little or no prior experience to the three cornerstone database technologies for big data: relational databases, NoSQL databases and the Hadoop ecosystem. It aims to give students an understanding of how data is organised and manipulated at large scale, and practical experience in designing and developing such databases using open source infrastructure. The relational part covers conceptual, logical and physical database design, including ER modelling and normalisation theory, as well as SQL coding and best practices for improving performance. NoSQL databases were developed for big data and semi-structured data applications for which relational systems are too inefficient; all four types of NoSQL architecture (key-value, document, column-family and graph) will be introduced. Distributed data processing is key to manipulating large data sets effectively, so the final section of the course teaches the popular Hadoop technologies for distributed data processing, including MapReduce programming and the execution model of Apache Spark.