- Graduate School, Leavey School of Business
- Department of Information Systems & Analytics
- Class meeting dates:
  - Start: January 6, 2020
  - End: March 19, 2020
- Class hours:
  - Tuesday 5:45 PM - 7:20 PM
  - Thursday 5:45 PM - 7:20 PM
- Instructor: Mahmoud Parsian
- Classroom: Lucas Hall 307
- Office: 216AA, 2nd Floor, Lucas Hall
- Office Hours: by appointment
1. Hands-On Machine Learning with Scikit-Learn, 2nd Edition
2. PySpark Algorithms Book by Mahmoud Parsian
3. Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman
- Scikit-Learn - Machine Learning in Python
- Spark-Sklearn
- Apache Spark
- Apache Spark Machine Learning
- handson-ml2
- Final Exam Date: TBD, March 2020
- Final Exam Time: TBD, 5:30-7:30 PST
This course introduces participants to quantitative techniques and algorithms that are widely used in business today, whether they operate on big data (numerical and textual) or are theoretical models of large systems and optimization. It also covers topics that have traditionally been treated qualitatively but are now amenable to quantitative treatment. The course prepares participants for more rigorous analysis of large data sets and introduces machine learning models and data analytics for business intelligence.
The main focus of this class is to cover the following concepts:
- Basic concepts of Machine Learning
  - Supervised learning
  - Unsupervised learning
  - Reinforcement learning
- Linear Regression
  - scikit-learn
  - Spark ML
  - machine_learning_algorithms_from_scratch_SLR_sample_chapter.pdf
- Logistic Regression
  - scikit-learn
  - Spark ML
- Principal Component Analysis (PCA)
  - scikit-learn
  - Spark ML
- Clustering
  - K-means
  - Latent Dirichlet Allocation (LDA)
  - scikit-learn
  - Spark ML
- Frequent Pattern Mining
  - FP-Growth
  - PrefixSpan
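
As a small illustration of the Linear Regression topic listed above, here is a minimal sketch of simple linear regression (SLR) fitted from scratch with ordinary least squares. It is not taken from the course materials: the function names `fit_simple_linear_regression` and `predict` and the toy data are assumptions made up for this example.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations of x, and sum of cross-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Hypothetical toy data lying exactly on y = 2x + 1 (made up for illustration)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
print(f"slope={slope}, intercept={intercept}")
print(f"prediction at x=5: {predict(5.0, slope, intercept)}")
```

For a single feature, scikit-learn's `LinearRegression` should recover the same slope and intercept. The from-scratch version is only meant to make the underlying arithmetic visible.
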
PySpark Algorithms
Data Algorithms: Recipes for Scaling up with Hadoop and Spark