Practical Data Science - Description

The course 'Practical Data Science' focuses on applying Data Science concepts using Python, covering topics like data collection, processing, machine learning, and natural language processing. Students will gain hands-on experience through assignments and projects, with a strong emphasis on participation and attendance. No prior Python knowledge is required, but a basic understanding of programming is necessary, and all materials will be provided in class.

Practical Data Science

Panos Louridas, Associate Professor, AUEB, louridas@[Link]

Overview
The course covers a large area of Data Science, focusing on practical applications with the Python
programming language. Python is one of the most popular choices for handling big data. The clean
syntax of the language, the availability of a large number of mathematical and scientific libraries, and a
large programming community make it an ideal choice for tackling a large variety of real-world
problems. As a result, Python is an essential component of a Data Scientist’s tool chest.

The course treats tools, best practices, and practical applications of theoretical results in Data Science
that enable us to process efficiently and leverage various forms of data.

Key Outcomes
By completing the course the students will be able to use Python in order to:

• Collect, process, and store, using appropriate tools and mechanisms, a variety of different data
from the Internet.
• Carry out mathematical and scientific programming tasks using appropriate libraries.
• Use tools to interact with and process large volumes of data.
• Employ Machine Learning methods and models.
• Carry out Natural Language Processing tasks.
• Solve a problem starting from a general statement, developing an algorithmic solution, then a
fully-fledged implementation.
• Visualise their data and the results of their analyses.

Requirements and Prerequisites

This is a hands-on course. Students will spend a significant amount of time writing programs and
working with libraries and tools. We will use the Python programming language. No previous
knowledge of Python is assumed, but students are expected to have programming experience that they
can draw on to become proficient in Python quickly.

Basic knowledge of programming and computer science concepts is required.

Required Course Materials

There is no required textbook. All course materials will be provided in class and available for
downloading.

Students will need to bring their laptops to class so that they can try out the material interactively
as it is presented.

Books
There are many books on the subject, and a lot of free resources on the Internet; the following selection
provides a good foundation for those students who wish to delve deeper into the topics discussed in
class:

• Michael Bowles, Machine Learning in Python: Essential Techniques for Predictive Analysis,
Wiley, 2015.
• Joel Grus, Data Science from Scratch: First Principles with Python, O’Reilly, 2015.
• Wes McKinney, Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython,
O’Reilly, 2012.
• Ryan Mitchell, Web Scraping with Python: Collecting Data from the Modern Web, O’Reilly, 2015.
• Brett Slatkin, Effective Python: 59 Specific Ways to Write Better Python (Effective Software
Development Series), Addison-Wesley, 2015.
• Cyrille Rossant, Learning IPython for Interactive Computing and Data Visualization, 2nd ed., Packt
Publishing, 2015.

Software/Computing requirements
Students will be able to run the examples and work with the material presented in the course using
online tools and services. Instructions will be given at the relevant units during the course.

Students will also be able to run and work with most of the course material on their own computers. To
do so, they should download and install the Anaconda Python distribution, available for all platforms
at: [Link]

We will be using Python 3.5 in the course, so make sure you pick that version on the download page.
Instructions on how to use Anaconda will be provided in the course, but students should have
downloaded and installed it in time for the first unit.

Grading
This is a practical course; students will be graded on their attendance and participation and on their
ability to work with data on realistic problems and produce practical results. This ability
will be assessed through the course assignments.

Participation
In-class contribution is a significant part of your grade and an important part of our shared learning
experience. Your active participation helps me to evaluate your overall performance. You can excel in
this area if you come to class on time and contribute to the course by:

• Providing strong evidence of having thought through the material.
• Advancing the discussion by contributing insightful comments and questions.
• Listening attentively in class.
• Demonstrating interest in your peers' comments, questions, and presentations.
• Giving constructive feedback to your peers when appropriate.

Please arrive at class on time and stay to the end of the class period. Chronically arriving late or
leaving class early is unprofessional and disruptive to the entire class. Repeated tardiness will have
an impact on your grade.

Turn off all electronic devices prior to the start of class. Cell phones, tablets, and other electronic
devices are a distraction to everyone.

Assignments
There will be three course assignments at two-unit intervals, followed by a course project.

1. The first assignment will be announced after the first two units and will count 20% towards the
final grade.
2. The second assignment will be announced after the fourth unit and will count 20% towards the
final grade.
3. The third assignment will be announced after the sixth unit and will count 20% towards the final
grade.
4. The course project will be announced at the sixth unit; it will be a substantial undertaking in
which the student will have to analyse a problem, find data to solve it, write the necessary
programs, arrive at a conclusion, and fully document the solution and the results. The proposed
course project will count 40% towards the final grade.

Late assignments will either not be accepted or will incur a grade penalty unless due to documented
serious illness or family emergency. Exceptions to this policy for reasons of civic obligations will only be
made available when the assignment cannot reasonably be completed prior to the due date, you make
suitable arrangements, and give notice for late submission in advance.

Attendance Requirements
Class attendance is essential to success in this course and is part of your grade. An excused absence can
only be granted in cases of serious illness or grave family emergencies and must be documented. Job
interviews and incompatible travel plans are considered unexcused absences. Where possible, please
notify the instructor in advance of an excused absence.

Students are responsible for keeping up with the course material, including lectures, from the first day
of class onward. It is each student's obligation to catch up on any missed coursework.

Code of Ethics
Students may not work together on graded assignments unless the instructor gives express permission.

Exercise integrity in all aspects of one's academic work including, but not limited to, the preparation and
completion of all other course requirements by not engaging in any method or means that provides an
unfair advantage. In any case of doubt, students must be able to prove that they are the sole authors of
their work by demonstrating their knowledge to the instructor.

Clearly acknowledge the work and efforts of others when submitting written work as one’s own. Ideas,
data, direct quotations (which should be designated with quotation marks), paraphrasing, creative
expression, or any other incorporation of the work of others should be fully referenced. No plagiarism of
any sort will be tolerated. This includes any material found on the internet. Reuse of material found in
question and answer forums, code repositories, other lecture sites, etc., is unacceptable. You may use
online material to deepen your understanding of a concept, not for finding answers.

Please report observed violations of this policy. Any violation will result in a failing grade for the
course and will be reported to the senate for further disciplinary action.

Course Syllabus
The course comprises ten units of three hours each.

Unit 1: Introduction to Python

As the course assumes no prior knowledge of Python, we start with a fast-paced presentation of the
main features of the language. In particular, we will review syntax, data types, operators, control
structures, functions, classes and objects, and file handling. We will get a first view of the Python tools for
Data Science (IPython, Numpy, Scipy, Matplotlib, Pandas, Scikit-learn) by walking through a typical
example with real-world data.

Unit 2: Data Crawling

To carry out Data Science we need to collect data from various sources on the Internet. Popular
web services like Facebook, LinkedIn, and Twitter offer a wealth of data via specific Application
Programming Interfaces (APIs). We will see how we can interact with these services and collect data, the
most common data formats that we encounter on the Internet, and the ways we can read and parse
them.
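
As a small illustration of reading and parsing common data formats, the sketch below parses a JSON document and a CSV document using only the Python standard library. The payloads, field names, and values are hypothetical and stand in for what a web service might return; no real API is called.

```python
import csv
import io
import json

# Hypothetical JSON body, of the kind a web API might return.
json_payload = '{"user": "alice", "followers": 120, "tags": ["python", "data"]}'
record = json.loads(json_payload)  # JSON object -> Python dict

# Hypothetical CSV export with a header row.
csv_payload = "user,followers\nalice,120\nbob,85\n"
rows = list(csv.DictReader(io.StringIO(csv_payload)))  # one dict per data row

# CSV values arrive as strings, so convert explicitly before doing arithmetic.
total_followers = sum(int(row["followers"]) for row in rows)

print(record["tags"], total_followers)
```

JSON keeps nesting and basic types, whereas CSV is flat and untyped, which is why the explicit `int` conversion is needed.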

Unit 3: Data Storage and Retrieval; SQL and NoSQL

We will juxtapose the two basic data storage technologies: relational (SQL) and non-relational (NoSQL)
databases. Relational databases have been in wide use for several decades now, while non-relational
databases have gained in popularity in the last few years. We will work with the MySQL relational
database and the MongoDB NoSQL counterpart. We will examine how we can interact with them in
Python. As each technology has its own strengths for specific kinds of data, we will analyse their
comparative advantages and the most appropriate application areas.
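
The contrast between the two models can be sketched without installing a database server. In the example below, the standard-library sqlite3 module stands in for MySQL and a list of plain dictionaries stands in for a MongoDB collection; both substitutions are assumptions made to keep the example self-contained, and the table, fields, and values are hypothetical.

```python
import sqlite3

# Relational side: sqlite3 stands in for MySQL. The schema is fixed up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users (name, city) VALUES (?, ?)",
    [("Alice", "Athens"), ("Bob", "Patras"), ("Carol", "Athens")],
)
athens_sql = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM users WHERE city = ? ORDER BY name", ("Athens",)
    )
]
conn.close()

# Document side: plain dicts stand in for documents in a collection.
# Documents need not share the same fields (Bob has no "languages").
documents = [
    {"name": "Alice", "city": "Athens", "languages": ["Python", "SQL"]},
    {"name": "Bob", "city": "Patras"},
    {"name": "Carol", "city": "Athens", "languages": ["R"]},
]
athens_docs = sorted(doc["name"] for doc in documents if doc.get("city") == "Athens")

print(athens_sql, athens_docs)
```

Both queries return the same names. The difference lies in where the structure is enforced: in the table definition on the relational side, and in the application code on the document side.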

Unit 4: Numpy and Scipy

The Numpy and Scipy libraries provide the fundamental building blocks for performing mathematical
and scientific computations in Python. They offer optimised data types and efficient algorithm
implementations that we can either use directly or harness to implement our own specialised
solutions. We will review their underlying principles and structures and we will investigate how we
can draw on them to solve Data Science problems.

Unit 5: Pandas I
Often we need to examine data in different ways: filter them dynamically using various criteria, combine
and merge data based on common elements, group data using specific values, etc. The Pandas library
provides a rich set of such capabilities that enable us to manipulate and work interactively with data at
a high level of abstraction.

Unit 6: Pandas II
The Pandas library offers many facilities for working with time-varying data (time series analysis), with
special applications to analysis of financial models. We will use it to become acquainted with the basic
terms of time series processing and we will see how we can handle market data effectively.

Unit 7: Scikit-learn
Machine Learning (ML) is one of the most important branches of Data Science. It includes areas such as
classification, regression, clustering, and dimensionality reduction, which are essential in processing
data and solving many different problems. The Scikit-learn library offers a rich spectrum of Machine
Learning functionalities and we will leverage it to approach some typical applications.

Unit 8: Natural Language Processing

The Natural Language Toolkit (NLTK) is a platform for building programs, in Python, to work with human
language data. We will explore what natural language is about, applications of natural language
processing, and what NLTK has to offer us.
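
To give a flavour of the most basic text-processing steps, the sketch below tokenises a sentence and counts word frequencies. It deliberately uses only the Python standard library rather than NLTK, and the regular-expression tokeniser is a naive stand-in for the tokenisers a dedicated toolkit provides. The sample text is made up.

```python
import re
from collections import Counter

text = "The cat sat on the mat. The dog sat too."

# Naive tokenisation: lowercase the text and keep runs of letters only.
tokens = re.findall(r"[a-z]+", text.lower())

# Count how often each token occurs and pick the two most frequent.
frequencies = Counter(tokens)
top_two = frequencies.most_common(2)

print(top_two)
```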

Unit 9: Classification
Although there are many high-quality tools, libraries, and frameworks in Python for Data Science, there
are always situations where we must develop a solution ourselves. To do that we must know how to
proceed from a general problem description to its algorithmic solution, and then to an actual
implementation. In this unit we will examine how we can solve the classification problem by using a
classic algorithm for that task (ID3). We will start from the underlying principles of the algorithm and
gradually develop a fully implemented solution.
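
The core of ID3 is choosing, at each node of the decision tree, the attribute with the highest information gain. The sketch below shows that selection step only, not the full recursive tree construction, and it is not the implementation developed in class. The function names and the tiny data set are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` obtained by splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy data: "outlook" determines "play" completely, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

# ID3 splits on the attribute with the highest information gain.
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, a, "play"))
print(best)
```

A full ID3 implementation would apply this choice recursively to each subset until the labels in a subset are all the same or no attributes remain.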

Unit 10: Visualisation

To understand our data we need to be able to create graphical representations; visualisation is a rich
and rewarding field. An image can be worth a thousand words, if done properly; or it can be a clutter of visual
junk, or, worse, misleading, if we are not aware of some basic rules. After discussing the principles of
visualisation we will show how the Matplotlib framework can help us create effective visualisations,
both on screen and in print.
