Advanced Machine Learning and Artificial Intelligence

Course Code:
MSCA 32017

Since the era of big data began, the challenges associated with data analysis have grown significantly in several directions. First, technological infrastructure had to be developed that can hold and process large amounts of data from different sources and in multiple, not always well-formalized, formats. Second, data analysis methods had to be reviewed, selected and modified to work in distributed computational environments, such as combinations of in-house server clusters and the cloud.

But the biggest challenge of all is learning to think differently in order to ask new types of questions that could not be answered by analyzing less complex data streams with less complex technological infrastructure. In recent years, significant progress has been made in creating technological ecosystems for big data analysis. Innovative technologies, including open source projects such as MapReduce, Hadoop, Spark, Storm, Kafka, TensorFlow and H2O, have allowed us to explore data at depths unseen before. There is now a growing number of resources and educational courses introducing these new tools.

It has proved somewhat more difficult to develop new data analysis methods appropriate for the new data ecosystems. There are some interesting new ideas, and there is a significant amount of empirical work. But the methods of the late 19th century and the first half of the 20th century, albeit adapted to run on 21st-century technology, still dominate our research. Traditional and new concepts and methods of big data analysis are typically, with few exceptions, covered in books and taught in courses that require a laptop environment. These often test the limits of the computing power of personal hardware, but give little flavor of the 21st-century combination of high-end technology and in-depth discussion of methods.

The goal of this course is to fill this gap and teach students to think about real problems by analyzing big data in new data analysis ecosystems. The course is project-based: we will take up several real-life projects and discuss different approaches to digging for insights, possible pitfalls and applications. In our project work we will use Python and modern cloud computing environments: Spark and TensorFlow.

Prerequisites:

  • MSCA 31009: Machine Learning & Predictive Analytics
  • MSCA 37011: Deep Learning & Image Recognition (Recommended)