CS 320 Data Science Programming II Syllabus
Welcome to Data Science Programming II! In this course, we will learn object-oriented programming to create tree and graph data
structures to represent hierarchical data and implement algorithms for efficiently searching these structures.
We'll often create our own datasets, using techniques like logging, benchmarking, web scraping, and A/B testing.
In the last third of the semester we'll explore some basic machine learning techniques, including regression, classification, clustering,
and decomposition.
Course Instructor
Dr. Gurmail Singh (Teaching Faculty - Department of Computer Sciences) [Link]@[Link]
Lecture recordings will be provided, but this is subject to change based on in-person attendance. In-person attendance is expected.
Attendance will be recorded via TopHat (or another tool). Paper attendance may also be taken on random days. If attendance is
healthy and it feels like people are keeping up, I'll usually post recordings. If attendance drops, I will stop posting
recordings (a warning will be issued one lecture before this change).
Instructional Modality
LEC001 and LEC002: in-person
Communication
We message the class regularly via @[Link] email and/or Canvas announcements. We recommend setting the "Announcement" option
in your Canvas settings to "Notify immediately" so that you don't miss anything important. You are also expected to
check your @[Link] email regularly.
See the help page for details about how to contact us.
We have various forms for you to leave (optionally anonymous) feedback, report grading issues and exam conflicts, and thank
TAs/mentors.
Grading
Grading breakdown
Letter Grades
At the end of the semester, we will assign final grades based on these thresholds:
93% - 100%: A
88% - 92.99%: AB
80% - 87.99%: B
75% - 79.99%: BC
70% - 74.99%: C
60% - 69.99%: D
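As a rough illustration, the thresholds above can be read as a lookup from a final percentage to a letter. The sketch below is not the course's official calculation: the names GRADE_THRESHOLDS and letter_grade are made up for this example, and it assumes percentages are compared as-is, without rounding up to the next threshold.

```python
# Illustrative sketch only; not the official grade calculation.
# Each entry is (lower bound in percent, letter), taken from the list above.
GRADE_THRESHOLDS = [
    (93.0, "A"),
    (88.0, "AB"),
    (80.0, "B"),
    (75.0, "BC"),
    (70.0, "C"),
    (60.0, "D"),
]


def letter_grade(percent):
    """Return the letter for a final percentage (assumption: no rounding)."""
    for lower_bound, letter in GRADE_THRESHOLDS:
        if percent >= lower_bound:
            return letter
    # The list above gives no letter for scores below 60%.
    return None


print(letter_grade(92.99))  # AB
print(letter_grade(93))     # A
```

Under the no-rounding assumption, 92.99% stays an AB, because it falls just short of the 93% lower bound for an A.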
Lab attendance
We'll post a weekly lab activities document. You can work through it individually, or with your assigned study group. TAs and peer
mentors will walk around to answer questions and check your progress in finishing the lab activities. If you have extra time at the lab
after completing the lab document, you can work on projects with your assigned study group.
To obtain the point for a lab, you need to submit screenshots of the work (code and/or running results) you have done so far to
Canvas within five minutes after the lab ends. You don't have to finish every lab activity, but you do need to show sufficient
progress (as determined by the Lab TA).
Projects
Submission: Everybody will individually upload a .py, .ipynb, or zip file (as specified) for each project using the submission
tool.
Collaboration: Even though everybody will make their individual submission, every project will have (1) a group part to be optionally
done with your assigned study group and (2) an individual part. For the group part, any form of help from anybody in your group is
allowed; I recommend you find times for everybody in the group to work at the same time so you can help each other through
coding difficulties in this part. You're also welcome to do the "group" part individually, or with a subset of your assigned study group.
For the individual part, you may only receive help from course staff (instructors/TAs/peer mentors); you may not discuss this part
with anybody else (in the class or otherwise) or get help from them.
Late Policy:
Code Review: TAs will give you comments on specific parts of your assignment. This feedback process is called a "code review",
and is a common requirement in industry before a programmer is allowed to add their code changes to the main codebase. TAs will
also include reasons for deductions in the comments. Read your code reviews carefully; even if you receive 100% on your work, we'll
often give you tips to save effort in the future.
Project Grading: Grades will be largely based on automatic tests that we run. We'll share the tests with you before the due date, so
you should rarely be too surprised by your grade. Though it shouldn't be common, we may deduct points for serious hardcoding, not
following directions, or other issues. Some bugs (called non-deterministic bugs) don't show up every time code is run -- if you have
such an issue, the grade the tester gives you may differ from what you expected based on your own runs.
Finally, our tests aren't very good at evaluating whether plots and other visualizations look how they should (a human usually needs
to evaluate that).
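To make the idea of a non-deterministic bug concrete, here is a minimal sketch that is not taken from any course project. The function names are hypothetical. It relies on one fact about Python: string hashing is randomized between interpreter runs by default, so the iteration order of a set of strings can change from one run to the next.

```python
# Hypothetical helpers for illustration; not part of any project or tester.

def first_tag_buggy(tags):
    """Non-deterministic: the order of a set of strings can differ between
    separate runs of Python, so the 'first' element is not stable."""
    return list(set(tags))[0]


def first_tag_fixed(tags):
    """Deterministic: sorting imposes an order that does not depend on hashing."""
    return sorted(set(tags))[0]


tags = ["graph", "tree", "bfs", "dfs"]
print(first_tag_buggy(tags))  # may change from one run of the script to the next
print(first_tag_fixed(tags))  # bfs
```

Code like first_tag_buggy can pass a test on your machine and fail when the tester runs it, which is one way your grade can end up differing from what you saw locally.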
Auto-grader: The autograder will be run periodically during the 2 days prior to a project deadline (for example, from Tuesday night if the
deadline is on Thursday). Because of this, we expect you to try submitting your project early and make sure nothing
crashes. However, this should not be a substitute for running [Link] locally. You should only try submitting once you pass the tests
locally.
Clearing the auto-grader is a mandatory part of the project submission process. Regular project deadlines apply
to autograder failures as well. That is, your project submission must clear the auto-grader within the hard deadline for the project. If
not, we are unable to grade your project submission.
If your project fails the auto-grader, it is your responsibility to use office hours and make an appropriate resubmission. The
resubmission will also count towards late day usage.
Allowed Packages: anything that comes pre-installed with Python and any packages used during the lectures and listed in the
projects are allowed. Using unapproved packages may result in a score of zero when submitted for grading because the autograder
won't be able to run your code without those packages.
Quizzes
There will be a short Canvas quiz due at the end of most Tuesdays. Make sure you know the rules regarding what is allowed and what
is not. Each quiz may be taken twice with unlimited time (within the given number of days), but the quiz score will be the
average of the two attempts.
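As a small worked example of the averaging rule, the sketch below uses a hypothetical function name, quiz_score. It assumes both attempts were taken, since the syllabus does not say how a single attempt is scored.

```python
def quiz_score(first_attempt, second_attempt):
    """Average of the two attempts (assumes both attempts were taken)."""
    return (first_attempt + second_attempt) / 2


# Scoring 6 and then 10 on a 10-point quiz averages to 8.
print(quiz_score(6, 10))  # 8.0
```

Because both attempts count, a careless first attempt lowers the final score even if the second attempt is perfect.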
Allowed
however much time you need during the specified days
discussing answers with members of your assigned study group who are taking the quiz at the same time
referencing texts, notes, or provided course materials
searching online for general information
running code
NOT allowed
taking it more than twice
discussing answers with anybody outside of your group
discussing with members of your group who have already completed the quiz when you haven't completed it yourself yet
posting anything online about the quizzes
using such material potentially posted by other 320 students who broke the preceding rule
Readings
We'll sometimes assign readings from the following sources (all free):
Cheating
Yeah, of course you shouldn't cheat, but what is cheating? The most common form of academic misconduct in these classes
involves copying/sharing code for programming projects. Here's an overview of what you can and cannot do:
Acceptable
any collaboration with your assigned study group members on the group part of a project
using ChatGPT to ask simple questions. For example: how do I use "self" inside a class constructor?
copying code from online examples that are NOT specific to your project (if project solutions are leaked online, you
may not use them). If you copy code, you must cite it in your code with a comment (think of it like citing a quote in an essay --
without the citation, you're plagiarizing). Here are some code citation templates:
# copied/adapted from ... (website name) ... (link to the post) ...
e.g., # copied/adapted from Stackoverflow: [Link]
python (For details, see the Citing Code section below.)
# copied/adapted from ... (Large Language Models name) ... (prompt used) ...
e.g., # copied/adapted from GPT4: "write a Python function to find the median of a list."
NOT Acceptable
Citing Code: you can copy small snippets of code from stackoverflow (and other online references) if you cite them. For example,
suppose I need to write some code that gets the median number from a list of numbers. I might search for "how to get the median of
a list in python" and find a solution at [Link]
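I could (legitimately) paste code from that page into my code, as long as it has a citation comment as follows:
    # copied/adapted from Stackoverflow: [Link]
    def median(lst):
        sortedLst = sorted(lst)
        lstLen = len(lst)
        index = (lstLen - 1) // 2
        if lstLen % 2:
            return sortedLst[index]
        return (sortedLst[index] + sortedLst[index + 1]) / 2.0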
In contrast, copying from a nearly complete project (that accomplishes what you're trying to do for your project) is not OK. When in
doubt, ask us! The best way to stay out of trouble is to be completely transparent about what you're doing.
Similarity Detection: We will use automated tools to look for similarities across submissions. We take cheating detection seriously
to make the course fair to students who put in the honest effort.
Copyright © 2024 Department of Computer Sciences, UW-Madison