Machine Learning with R, Part 1
Introduction
This is a public course designed for master’s-level students at Umeå University, given by the Department of Statistics. Students need elementary programming knowledge and some understanding of basic statistics; nothing beyond regression analysis is assumed. In the first part of the course, we focus on familiarizing students with the basic concepts of machine learning through basic linear models, preparing them for the second part of the course.
Course Design
In this course, after introducing the basic concepts of machine learning, we start with useful tools: elementary probability and an introduction to the R language. In lectures 4, 5, and 7, we focus on linear solutions to machine learning problems: two linear classifiers, Linear Discriminant Analysis and logistic regression, and linear regression models. During this period, we also introduce a simple idea for nonlinear extension, which leads to an important concept and challenge in machine learning, namely the overfitting problem. We then treat avoiding overfitting as a model selection problem and discuss the corresponding methods.
Textbook
We use ‘An Introduction to Statistical Learning’ as our textbook. The book’s website is https://www.statlearning.com. There you can get an electronic copy of the book and find a lot of other useful information.
Teaching Methods
Since this is an online course, we naturally choose the flipped classroom teaching method. That is, students first read and study the materials and textbook we provide, and then work through laboratory lessons after discussions in a question-and-answer (Q&A) session.
Examination method
We use a combination of take-home exams and oral interviews to assess students’ mastery of the course material.
Tips for your readings (CA)
For each lecture, I offer multiple ways to read. You can choose the integrated notes, which let you scroll up and down with ease and use the side menu to jump between sections. However, if you find lengthy notes overwhelming, you might prefer the paginated version. It’s like how I divide a 2000-meter swim into four sessions: breaking it up makes it feel more achievable. Finally, I also provide a PDF version of the notes, making it convenient to print and read at your own pace.
In academia, people often use abbreviations, especially in technical writing. While this can be convenient for the author, it’s not always beginner-friendly. In my notes, I’ll do my best to implement a hover-over feature for annotating abbreviations. When you hover the cursor over an abbreviation for a second, its full term will appear on the screen. For example, move your cursor over IAT. BTW, if you see NE, it means “Not essential and you may skip”.
List of lectures
Lecture 5. Regression Models
Lecture 6. Model Validation and Selection
Lecture 7. Logistic Regression