INFO 490: Advanced Data Science

INFO 490: Advanced Data Science explores advanced data science concepts through a practical approach, covering machine learning; probabilistic programming; text, network, and graph analysis; and cloud computing.

Course Goals

Upon completing this course, students will be expected to understand advanced data science concepts. Students will learn the practical aspects of applying machine and statistical learning in a variety of contexts, as well as different aspects of cloud computing. Specific concepts covered include supervised and unsupervised learning, dimensionality reduction, clustering, probabilistic programming, text mining, graph analysis, network analysis, Hadoop, NoSQL data stores, Spark, and streaming data analysis.

Prerequisites

As a prerequisite for this course, you must have mastered the material in INFO 490: Foundations of Data Science. This is generally demonstrated by having taken that course; however, the instructor may also grant approval to students who can demonstrate proficiency in the required topics.

Note: At present, we are using the CS department's cluster to run a course JupyterHub server, on which each student runs a Dockerized version of the course software stack. This provides many advantages, including robustness against crashes, simpler deployment of software updates, reduced requirements for students (only a modern web browser is needed; we have used tablets and smartphones), and simpler assignment submission. You can also run a Docker container locally, as in previous courses, but this approach is not recommended. If you do work locally, remember that assignments are automatically collected from your cloud-based Docker container, so you must push your local changes to your course cloud Docker container before the deadline.

As part of the orientation week activities, we have provided a demo IPython Notebook that you should use to become familiar with working in an IPython Notebook on the course JupyterHub server. Each week there will be an index.ipynb IPython Notebook that gives you access to the course notebooks for that week; please use these on the course server to move through the course material.

Texts

There are no required textbooks for this course. Instead, we will use Internet-accessible websites, videos, and documentation as supplemental material to the lesson content. We will also include links, as relevant, to readings from books that are freely available to University of Illinois students, staff, and faculty via the University's Safari subscription.

Academic Integrity

Academic honesty is essential to this course and the University. Any instance of academic dishonesty (including but not limited to cheating, plagiarism, falsification of data, and alteration of grades) will be documented in the student's academic file. In addition, at a minimum the particular assessment, quiz, or assignment will be given a zero. Serious or repeated offenses may be punished more severely.

Guidelines for collaborative work: Discussing course material with your classmates is generally a good idea, but each student is expected to do his or her own work. On assignments, you may discuss the problems and the concepts behind them, but you are responsible for your own answers. Please do not post code in the forums! Finally, on assessments and quizzes, your answers must, of course, be your own. For further information, see the Student Code, Part 4. Academic Integrity.

Communication

The instructional staff will use the Announcement Forum on the course Moodle to communicate important course information. Do not unsubscribe from this forum or you risk missing important news!

The preferred method for student communication in this course is the Q&A Forum on the course Moodle. The instructional staff monitors this forum and will respond within 24 hours (in general we will respond even faster, especially during normal business hours). Furthermore, your fellow students may be able to help even sooner. We also encourage you to search this forum before making a new post, since your question may already have been answered. You can search a forum on Moodle by using the Search forums tool located in the upper-right corner of any Moodle forum.

If you have a question that is not answered in this syllabus or on the online course forums, you can email the instructional staff; however, this should be a last resort. If we feel the question is best answered on the Q&A Forum, we reserve the right to post your question and our answer on Moodle. Also, note the information contained in the Point Reductions section of this syllabus.

Finally, we have created a Gitter room for this course. This is a completely public, real-time communication channel that you can also use to ask questions.

Course Outline

Note: The following list of topics is tentative. We build the course during the semester for several reasons:

  1. This is a new course (in a new field)!
  2. The enrolled students span multiple colleges and even more departments across the University of Illinois.
  3. Both undergraduate and graduate students are enrolled.

As a result, we feel it is imperative to be able to change the planned pace and material to benefit the majority of enrolled students.

| Week | Topics |
|---|---|
| Orientation Week | Course Overview & Syllabus Review |
| Week 1 | Introduction to Machine Learning |
| Week 2 | General Linear Models |
| Week 3 | Introduction to Supervised Machine Learning |
| Week 4 | Supervised Machine Learning |
| Week 5 | Introduction to Unsupervised Machine Learning |
| Week 6 | Unsupervised Machine Learning |
| Week 7 | Introduction to Text Mining |
| Week 8 | Introduction to Social Media Analysis |
| Week 9 | Advanced Text Mining |
| Week 10 | Introduction to Network Analysis |
| Week 11 | Probabilistic Programming |
| Week 12 | Introduction to Cloud Computing |
| Week 13 | Introduction to NoSQL |
| Week 14 | Advanced Cloud Computing |
| Week 15 | Introduction to Deep Learning |

Weekly Format

Each week will provide learning objectives and an outline of the activities for that week with a list of all deadlines and corresponding point values for assignments.

Videos

Each week there will be at least one instructor-created video that offers a broader context for the new week, explains key concepts, and demonstrates important tasks. To view the instructor videos, you will need to log in to Illinois Mediaspace (links are embedded in the relevant weekly overview). You will be given twenty points for viewing each weekly instructor overview video. In case you are wondering, Illinois Mediaspace tracks the viewing of course videos, so we know not only whether you load a video but also how much of it you actually watched.

Readings

Readings will consist of articles and excerpts from books and websites, Internet-accessible videos demonstrating a concept, and, in some cases, IPython Notebooks that can be viewed statically on the GitHub website or (preferably) used interactively on the course JupyterHub server. You will be required to read and be familiar with the content of these documents. Readings are contextualized as part of the weekly lesson content and are located in the "Readings" section of each lesson.

Lessons

Lessons will expand upon or clarify key concepts in the reading assignments, or supplement the readings with additional material. All lessons for a given week must be completed by 6:00 PM Central on Thursday of that week.

Lesson Assessments

Each week will contain three lesson modules (except the last week, which will contain only one). Each lesson module will include a Moodle quiz designed to be taken after completing the readings and carefully reviewing the lesson material. Lesson quizzes will allow two attempts, to ensure that students have mastered the relevant material before advancing to the next lesson module. All lesson assessments must be completed by 6:00 PM Central on Thursday of that week.

Assignments

Note: This section is still being finalized.

Every week except the first and last (that is, Weeks 2 through 14) will contain an assignment involving one or more computational tasks related to that week's focus. Your assignment will be automatically collected from the course JupyterHub server at the deadline. Assignments will be automatically graded for your instructor grade and will also be randomly distributed for peer assessment. You will have up to five assignments to grade as part of peer assessment, and you will receive thirty points simply for grading your peers' assignments. The peer assessment of your own assignment will be worth a maximum of forty points: we will drop the highest and lowest scores and average the three remaining scores.
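
As a rough illustration of the peer-assessment rule above, the Python sketch below drops the single highest and lowest peer score and averages the rest. It assumes each peer score is already expressed on the 40-point peer-assessment scale; the function name `peer_assessment_score` is hypothetical and not part of any course tooling.

```python
from statistics import mean


def peer_assessment_score(scores: list[float], max_points: float = 40.0) -> float:
    """Drop the single highest and lowest peer score, then average the rest.

    Assumption: each score is already on the 0-40 peer-assessment scale.
    """
    if len(scores) < 3:
        raise ValueError("need at least three peer scores to drop the high and low")
    if any(s < 0 or s > max_points for s in scores):
        raise ValueError("peer scores must lie between 0 and max_points")
    trimmed = sorted(scores)[1:-1]  # remove one lowest and one highest score
    return mean(trimmed)


# Five hypothetical peer scores out of 40: 12 and 40 are dropped.
print(round(peer_assessment_score([40, 36, 34, 30, 12]), 2))  # → 33.33
```

Trimming the extremes limits the effect of a single unusually harsh or unusually generous reviewer.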

The full-credit assignment deadline is 6:00 PM Central on the Monday following the relevant week. To receive full credit from instructor grading, your assignment must be submitted before this deadline. There will be an 18-hour grace period (until 12:00 PM Central on Tuesday) during which an assignment can still be submitted, albeit with an automatic 50% reduction in the maximum possible score. After this grace period, no assignments will be accepted.
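
The deadline and grace-period rules can be sketched as follows. This is a minimal, hypothetical helper, not course software: it assumes the 50% reduction is applied by halving the instructor-graded score (the syllabus could also be read as capping the score at half the maximum), that the penalty applies only to the instructor-graded portion, and that times are naive datetimes in Central time.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(hours=18)  # Monday 6:00 PM -> Tuesday 12:00 PM
LATE_FACTOR = 0.5                   # assumption: late work earns half credit


def instructor_credit(raw_score: float, submitted: datetime, deadline: datetime) -> float:
    """Return instructor-graded credit for one assignment (hypothetical helper)."""
    if submitted <= deadline:
        return raw_score                  # on time: full credit
    if submitted <= deadline + GRACE_PERIOD:
        return raw_score * LATE_FACTOR    # inside the 18-hour grace period
    return 0.0                            # after the grace period: not accepted


deadline = datetime(2024, 1, 22, 18, 0)  # an arbitrary Monday, 6:00 PM
print(instructor_credit(72, datetime(2024, 1, 22, 17, 59), deadline))  # → 72
print(instructor_credit(72, datetime(2024, 1, 23, 9, 30), deadline))   # → 36.0
print(instructor_credit(72, datetime(2024, 1, 23, 12, 1), deadline))   # → 0.0
```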

Peer Review

Weekly assignments will be reviewed by your course peers as well as graded automatically by the instructor. Of the maximum 150 points for each weekly assignment, 80 points come from automated instructor review and 70 points derive from peer review: 40 points from your peers' assessments of your submission and 30 points for viewing and grading your peers' assignments. Note that you can (and should) still grade your peers even if you miss an assignment submission. Peer review of an assignment must be completed by 6:00 PM Central on Saturday of the following week (i.e., you submit your assignment on a Monday, and you must assess other students' assignments by the following Saturday). Assignments to grade will be distributed approximately one hour after the late assignment deadline, around 1:00 PM Central on Tuesday afternoon of each week.

| Item | Grade |
|---|---|
| Instructor Assessment | 80 points |
| Peer Grading | 30 points |
| Peer Assessments | 40 points |
| Total | 150 points |

Note that we will only review clearly erroneous peer assessments (this means there needs to be a major problem). Review requests that are deemed insignificant are subject to an instructor-determined point reduction.

Weekly Quizzes

In addition to the lesson quizzes, each week will conclude with a weekly quiz. The weekly quiz is designed to test your overall mastery of the content for each given week. Unlike the lesson quizzes, weekly quizzes will be timed and will not allow multiple attempts. The quiz must be completed by 6:00 PM Central on Friday of that week.

Exams

This course is project-based: its assignments build progressively on content mastery, application, and peer review. There are no exams in this course.

Grading

Grading Distribution

| Assignment | Points | Occurrences | Total Points |
|---|---|---|---|
| Pre-Class Activity: Introduce Yourself | 60 | 1 | 60 |
| Orientation Quiz | 70 | 1 | 70 |
| Lesson Assessments | 60 | 14 (plus Week 15, which is worth only 20 points) | 860 |
| Weekly Quizzes | 70 | 14 (no quiz in Week 15) | 980 |
| Weekly Overview Videos | 20 | 16 (including the Orientation Week video) | 320 |
| Assignments (Weeks 2-14) | 150 | 13 | 1950 |
| Total | | | 4240 |

Unlike past courses, we do not plan on dropping any weekly scores.
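
The totals in the grading distribution table can be checked with a few lines of Python. The sketch below re-derives each row from the table (the lesson-assessment row is 14 weeks at 60 points plus 20 points for Week 15) and converts a hypothetical point total into a percentage. Since no weekly scores are dropped, every row counts in full; any end-of-semester curve is not modeled.

```python
# Each row of the grading distribution table: points per occurrence x occurrences.
COMPONENTS = {
    "Pre-Class Activity: Introduce Yourself": 60 * 1,
    "Orientation Quiz": 70 * 1,
    "Lesson Assessments": 60 * 14 + 20,   # 14 weeks at 60 points, plus Week 15 at 20
    "Weekly Quizzes": 70 * 14,            # no quiz in Week 15
    "Weekly Overview Videos": 20 * 16,    # includes the Orientation Week video
    "Assignments (Weeks 2-14)": 150 * 13,
}
TOTAL_POINTS = sum(COMPONENTS.values())   # no weekly scores are dropped


def percentage(points_earned: float) -> float:
    """Convert a raw point total into a course percentage (before any curve)."""
    return 100 * points_earned / TOTAL_POINTS


print(TOTAL_POINTS)       # → 4240
print(percentage(3816))   # → 90.0
```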

Grading Scale

Final grades may be curved, if necessary. The letter-grade cutoffs will be set at the traditional 90%, 80%, 70%, and 60% limits, and a plus or minus will be added if you are within two percentage points of a traditional cutoff (so 98–100 is an A+ and 90–92 is an A-).

| Percentage | Letter Grade |
|---|---|
| 98-100 | A+ |
| 92-98 | A |
| 90-92 | A- |
| 88-90 | B+ |
| 82-88 | B |
| 80-82 | B- |
| 78-80 | C+ |
| 72-78 | C |
| 70-72 | C- |
| 68-70 | D+ |
| 62-68 | D |
| 60-62 | D- |
| Below 60 | F |
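
A possible lookup from a final percentage to a letter grade is sketched below. The table does not say which endpoint of each range is inclusive, so this sketch assumes each range includes its lower bound (for example, exactly 92 maps to an A); it also ignores any curve. `letter_grade` is a hypothetical name.

```python
from bisect import bisect_right

# Lower bounds of each grade band (assumed inclusive), in ascending order.
CUTOFFS = [60, 62, 68, 70, 72, 78, 80, 82, 88, 90, 92, 98]
# LETTERS[i] is the grade when exactly i cutoffs are <= the percentage.
LETTERS = ["F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]


def letter_grade(percent: float) -> str:
    """Map a course percentage (0-100) to a letter grade."""
    return LETTERS[bisect_right(CUTOFFS, percent)]


print(letter_grade(91.5))  # → A-
print(letter_grade(79))    # → C+
```

Using `bisect_right` on a sorted list of cutoffs avoids a long chain of `if` statements and keeps the bands in one place.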

Point Reductions

This is a large online course with only one instructor and one teaching assistant. We therefore require that students read the syllabus and search the online forums before either emailing us directly or posting a new question in the Moodle forums. Failure to do so may, at the sole discretion of the instructor, result in a loss of five points per obvious infraction. Please note that we are not trying to stifle questions (such as "Why is the FAA server down?"); we simply need to minimize the number of emails and questions whose answers are already available.

Extra Credit

There is a course wiki hosted on the course GitHub repository. If you encounter a problem and find a solution (either through your own efforts or with help from an instructor), consider writing up the problem and its solution as a FAQ post on the GitHub wiki. You get extra credit for doing this and also help your classmates!

To get credit for your wiki entry, you must contact the course assistant, Samantha Thrush. She will review your post, tell you how many points you will receive, and indicate whether she would be willing to review an edited post with additional information. You can submit multiple wiki entries.

Sample Weekly Schedule

The following table summarizes the typical weekly schedule, where the assignments are collected the Monday following the Friday when quizzes are due.

| Task | Days into Week | Date/Time |
|---|---|---|
| Week Opens | 0 | Monday, 12:00 AM |
| Lessons Completed | 3 | Thursday, 6:00 PM |
| Lesson Assessments | 3 | Thursday, 6:00 PM |
| Weekly Quiz | 4 | Friday, 6:00 PM |
| Assignment Collected | 7 | The following Monday, 6:00 PM |
| Late Assignments Collected | 8 | The following Tuesday, 12:00 PM |
| Assignments Distributed for Peer Assessment | 8 | The following Tuesday, 1:00 PM |
| Peer Assessment Deadline | 12 | The following Saturday, 6:00 PM |
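
Because every deadline is a fixed offset from the Monday a week opens, the sample schedule can be turned into concrete dates. The sketch below assumes naive datetimes interpreted as Central time; `weekly_deadlines` is a hypothetical helper, and the start date in the usage example is arbitrary.

```python
from datetime import datetime, timedelta

# (task, days into week, (hour, minute)) taken from the sample weekly schedule.
SCHEDULE = [
    ("Week Opens", 0, (0, 0)),
    ("Lessons Completed", 3, (18, 0)),
    ("Lesson Assessments", 3, (18, 0)),
    ("Weekly Quiz", 4, (18, 0)),
    ("Assignment Collected", 7, (18, 0)),
    ("Late Assignments Collected", 8, (12, 0)),
    ("Assignments Distributed for Peer Assessment", 8, (13, 0)),
    ("Peer Assessment Deadline", 12, (18, 0)),
]


def weekly_deadlines(week_opens: datetime) -> dict[str, datetime]:
    """Map each task to its date/time, given the Monday on which a week opens."""
    start = week_opens.replace(hour=0, minute=0, second=0, microsecond=0)
    return {
        task: start + timedelta(days=days, hours=hour, minutes=minute)
        for task, days, (hour, minute) in SCHEDULE
    }


# Usage: Monday, January 15, 2024 is an arbitrary example start date.
for task, when in weekly_deadlines(datetime(2024, 1, 15)).items():
    print(f"{task:45} {when:%A %b %d, %I:%M %p}")
```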