GitHub Repository for the CAS Applied Data Science
Below is an overview of the modules that make up this CAS program.
Module 1: Data Acquisition and Management
- Understand different data sources and data types, and learn how to design data management models and plans
Module 2: Statistical Inference for Data Science
- Gain a basic understanding of the statistical models used for data analysis, as well as descriptive statistics
Module 3: Data Analysis and Machine Learning
- Overview of machine learning pipelines and their implementation with scikit-learn
- Regression and classification: linear models and logistic regression
- Decision trees & random forest models
- Principal component analysis (PCA) and non-linear embeddings (t-SNE and UMAP)
- Clustering with K-means and Gaussian mixtures
- Artificial neural networks as general function approximators: fully connected networks used to classify the Fashion-MNIST dataset
- Scikit-learn and clustering maps, Q&A
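As an illustration of the pipeline idea from Module 3, the following is a minimal sketch (not taken from the course material) that chains feature scaling and logistic regression in a scikit-learn `Pipeline`. The Iris dataset, the 75/25 split, and the variable names are assumptions chosen only to keep the example small and self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in dataset, used here purely for illustration.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chaining the scaler and the classifier means the scaler is fit on the
# training data only, and the same transformation is reused at test time.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)

# Mean accuracy on the held-out test set.
accuracy = pipe.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Swapping the `("clf", ...)` step for another estimator, such as a random forest, leaves the rest of the workflow unchanged, which is the main reason to build models as pipelines.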
Module 4: Ethics and Best Practices
- Create a GitHub repository for your CAS material and projects
- Document the repository and its subfolders with README files
Module 5: Peer Consulting and Selected Readings
- Peer knowledge exchange and consultation groups
- Discussion and collaboration with peers on key concepts and practical applications
Module 6: Deep Learning
- TensorFlow for deep learning applications
Final Project
- Contains the accumulated data, notebooks, and the final report