"It's tough to make predictions, especially about the future."
– Yogi Berra
In this course, you'll create end-to-end solutions to machine learning problems. This course will cover popular applied techniques in both supervised and unsupervised machine learning, such as regression, classification, and clustering. You'll learn how to properly engineer features, apply algorithms, and evaluate model performance. The focus of the course will be Python's scikit-learn library.
Instructor: Brian Spiering
Contact: Slack DM (more preferred) | [email protected] (less preferred)
Office hours: Thursdays 4:00-5:00 & By Appointment. Zoom link will be in Canvas and on Slack.
Grader: Matthew King
Contact: Slack DM | [email protected]
Office hours: By appointment
Website: github.com/brianspiering/ml_lab
Communication: Slack #msds699_ml_lab_2021
- Working knowledge of probability and statistics.
- Introductory knowledge of linear algebra (e.g., determinants and singular value decomposition).
- Intermediate level of Python (e.g., ability to create classes).
- No previous knowledge of machine learning required.
By the end of the course, you should be able to:
- Build end-to-end machine learning systems to answer meaningful Data Science questions.
- Write idiomatic code in Python's scikit-learn package to model data.
- Recognize when to and when not to apply machine learning techniques.
- Complete data science take-home challenges that you might encounter during job interviews.
The course will meet on Fridays from 2:20 to 4:20 on Zoom. The Zoom link is in Canvas. The classes will be recorded.
- 01/29
  - Welcome
  - Machine learning workflow
  - Scikit-learn API Overview
  - Estimators
  - Transformers
  - Pipelines
  - Linear models
- 02/05
  - Preprocessing
  - Feature extraction
- 02/12
  - Feature selection
  - Principal Component Analysis (PCA)
- 02/19
  - Model Selection
  - Resampling with SMOTE
- 02/26
  - Multiclass Classifiers
  - Classification Metrics
- 03/05
  - Ensembling
  - Feature Importance
  - Custom Classes
  - Student's choice
- 03/12
  - Clustering
There are no required textbooks.
The following books are optional (and highly recommended):
- Machine Learning with Python Cookbook: Practical Solutions from Preprocessing to Deep Learning by Chris Albon
- Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow 2nd Edition by Aurélien Géron
- Introduction to Machine Learning with Python: A Guide for Data Scientists by Andreas C. Müller and Sarah Guido
(Remember, as students of the University of San Francisco, you have access to these through O'Reilly Learning.)
Item | Weight | Due Date & Time |
---|---|---|
Professionalism | 10% | Throughout |
Assignment 1 | 10% | 02/05 at 2:20p |
Assignment 2 | 10% | 02/12 at 2:20p |
Assignment 3 | 10% | 02/19 at 2:20p |
Assignment 4 | 10% | 02/26 at 2:20p |
Assignment 5 | 10% | 03/05 at 2:20p |
Final Project (FP) | 40% | 03/12 at 2:20p* |
Total | 100% | |
* There will be small check-in assignments for the Final Project throughout the module to encourage you to not procrastinate.
Each item's contribution is capped at its respective percentage. The total course percentage is capped at 100%.
Currently, there is no extra credit. If there is any extra credit, it is entirely at the discretion of the instructor.
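The weighting and capping rules above can be sketched in a few lines of Python. This is only an illustration, not how Canvas or the instructional team actually computes grades: the function name `course_percentage` and the convention of passing each score as a fraction of full marks are assumptions made for the example. The weights are copied from the grading table.

```python
# Weights from the grading table, as percentage points of the final grade.
WEIGHTS = {
    "Professionalism": 10,
    "Assignment 1": 10,
    "Assignment 2": 10,
    "Assignment 3": 10,
    "Assignment 4": 10,
    "Assignment 5": 10,
    "Final Project": 40,
}


def course_percentage(scores):
    """Return the overall course percentage (hypothetical helper).

    `scores` maps an item name to the fraction of full marks earned
    (1.0 means full marks). Items missing from `scores` count as 0.
    """
    total = 0.0
    for item, weight in WEIGHTS.items():
        earned = scores.get(item, 0.0) * weight
        # Each item's contribution is capped at its own weight.
        total += min(earned, weight)
    # The total is capped at 100%. With the weights above this cap is
    # already implied by the per-item caps; it is kept to mirror the policy.
    return min(total, 100.0)


if __name__ == "__main__":
    example = {"Assignment 1": 0.85, "Final Project": 0.90}
    print(course_percentage(example))
```

Because of the per-item cap, a score above full marks on one item cannot compensate for points lost on another.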
We'll be using Canvas as the learning management system (LMS), aka the gradebook. The instructional team will do their best to have Canvas accurately reflect your current scores in the course. However, Canvas may not be completely accurate all the time. In other words, your actual grade may differ significantly from what appears on Canvas.
These deadlines are firm. I suggest turning in assignments early and often. Late assignments will only be accepted for medical emergencies. Asking for acceptance of a late assignment without a medical emergency, or submitting an assignment outside of Canvas, will result in a loss of professionalism points (and your assignment will still not be accepted).
I expect you to act professionally at all times: in person and electronically, during class and outside of class. Since people come from a variety of backgrounds, I want to be explicit about the elements of professionalism:
- Show up on time and prepared.
- Remain fully present.
- Contribute appropriately and meaningfully.
- Follow staff and faculty instructions appropriately.
- Show respect to all people.
Professionalism points are entirely at the instructor's discretion.
Violations of Academic Integrity are unprofessional, thus you'll automatically lose all Professionalism points for any violations of Academic Integrity.
Tardiness negatively impacts an active learning environment and thus will lower your professionalism grade.
You must show up to each session prepared. Each person is important to the dynamic of the class, and therefore students are required to participate in class activities. Expect to be "cold called". I call on students at random not to put you on the spot but to keep you engaged in the material at all times.
I expect you to be fully present and engaged in the classroom at all times. I strongly suggest taking notes on paper.
I expect you to follow the etiquette guidelines throughout the entire course. This is your warning. Every violation will result in a loss of participation points, negatively impacting your total grade.
The MSDS program considers a grade of "A" to represent exceptional work with respect to both the instructor's expectations and peer student achievements. I consider an "A" grade to be above and beyond what most students achieve. A grade of "B" represents the expected outcome, what is called "competence" in a business setting. A "C" grade represents achievements lower than the instructor's expectations for competence in the subject. A grade of "F" represents little or no work in the course.
I will "curve" the final numerical grades at the end of the course. The mapping from percentages to letter grades (e.g., [95, 100] is an A, [90,95) is an A-, etc.) will not be established until the end of the course. Roughly, the top 15% of students will receive grades of A or A-. Roughly, 60% of students will receive grades of B+, B, or B-. Roughly, 20% of students will receive grades of C+, C, or C-. Students can receive failing grades.
If you are a student with a disability or disabling condition, or if you think you may have a disability, please contact USF Student Disability Services (SDS) for information about accommodations.
All students are expected to behave in accordance with the Student Conduct Code and other University policies.
USF upholds the standards of honesty and integrity from all members of the academic community. All students are expected to know and adhere to the University's Honor Code.
You may not copy code from other current or previous students. All suspicious activity will be investigated and, if warranted, passed to the Dean of Sciences for action. Copying answers or code from other students or sources during a quiz, exam, or for an assignment is a violation of the university’s honor code and will be treated as such. Plagiarism consists of copying material from any source and passing off that material as your own original work. Plagiarism is plagiarism: it does not matter if the source being copied is on the Internet, from a book or textbook, or from quizzes or problem sets written up by other students. Giving code or showing code to another student is also considered a violation. You must also abide by the copyright laws of the United States.
The golden rule: You must never represent another person’s work as your own. Credit to Terence Parr.
I generously post all my materials to a public GitHub repo. However, you should not post any solutions to GitHub (or anywhere else on the Internet). Publicly posting any solutions to any problems for this course will result in a failing grade for this course.
If you ever have questions about what constitutes plagiarism, cheating, or academic dishonesty in my course, please feel free to ask me.
CAPS provides confidential, free counseling to student members of our community.
For information and resources regarding sexual misconduct or assault, visit the Title IX coordinator or USF's Callisto website.