COMPSCI 532: Systems for Data Science


In this course, students will learn the fundamentals behind large-scale systems used for data science. We will cover the issues involved in scaling up (to many processors) and out (to many nodes) parallelism in order to perform fast analyses on large datasets. These include locality and data representation, concurrency, distributed databases and systems, performance analysis and understanding. We will explore the details of existing and emerging data science platforms, including map-reduce and data analytics systems like Hadoop and Apache Spark, graph databases, stream processing systems, and systems for machine learning.

Direct Link