Welcome to the most important course you’ll ever take: Data Science 🙄 Here is my overview of the structure and contents of this unique blend of stats, coding, and machine learning. The first few weeks will focus on statistical thinking, and I will lean heavily on the book Think Stats (as well as the DataCamp course Statistical Thinking in Python). From week 4 on, we will shift our attention to basic concepts from machine learning and rely more on the ISLR book and the sklearn library. If time permits, I plan to cover the basics of neural networks.
All lecture materials can be found on this GitHub page.
I am planning to teach in hybrid mode this semester, i.e. some lectures will be held online via BigBlueButton (BBB) and some in person.
The BBB live sessions will be recorded, and the recordings can be found via the corresponding link on Moodle. I will also attempt to record Zoom sessions, if there are any.
There will be a weekly homework assignment, which is not graded. I
strongly recommend that you give it your very best shot. I will provide
solutions, and if you upload your homework, I will try to look over
it.
There will be a final exam at the end of the semester, which counts for 70% of your grade. Because of the uncertainties around Corona, I cannot yet specify whether it will be taken online or at HWR. The remaining 30% of the total points are earned via the final project, which will be tackled in groups of 4-5 and is typically a Kaggle competition or a similar data analysis task. The deliverables are a report and a final presentation. We will start on this project in the middle of the semester.
As this course is taught entirely in Python, we will use Jupyter
notebooks frequently. At the same time, having worked with
RStudio for over a decade now, I feel
that the comfort of a proper IDE with its many powerful features is
vastly superior to the non-ASCII notebooks. So I will often share with
you .Rmd (“R Markdown”) files that contain embedded Python code. I
strongly encourage you to familiarize yourself with RStudio as early as
possible.