
Introduction to Data Science Course

CSD

Uploaded by

Harshil Gupta
Copyright
© All Rights Reserved

Annexure 8

INTRODUCTION TO DATA SCIENCE

Course Code: CSD 101
Credit Units: 03
Total Hours: 45
Course Objective:
To provide a basic understanding of the field of Data Science and its implementation in various industries.

Course Contents:
Module I: Introduction (5 Hours)

Introduction to Data Science, definition and description of Data Science, history and development of Data Science,
terminology related to Data Science, basic framework and architecture, difference between Data Science and
business analytics, importance of Data Science in today’s business world, primary components of Data Science,
users of Data Science and their hierarchy.

Module II: Data Science Project Management (8 Hours)


Data Science project framework, stages in a Data Science project, execution flow of a Data Science project, various
components of Data Science projects, stakeholders of a Data Science project, challenges and scope of Data Science
project management, process evaluation model, comparison of Data Science project methods, improvement in
success of Data Science project models.

Module III: Mathematics behind Data Science (12 Hours)


Role of mathematics in Data Science, importance of probability and statistics in Data Science, important types of
statistical measures in Data Science: descriptive, predictive and prescriptive statistics, introduction to statistical
inference and its usage in Data Science, application of statistical techniques in Data Science, basics of probability,
permutation and combination, introduction to the linear regression model, mean, median, mode, outliers, leverage
points, business logic, feature engineering, bad data identification and correction.

Module IV: Computers in Data Science (9 Hours)


Role of computer science in Data Science, various components of computer science used in Data Science,
role of relational database systems in Data Science: SQL, NoSQL, role of data warehousing in Data Science, terms
related to data warehousing techniques, importance of operating system concepts and memory management, various
freely available software tools used in Data Science: R, Python, important proprietary software tools, different
business intelligence tools and their crucial role in Data Science project presentation.

Module V: Applications of Data Science (8 Hours)


Applications of Data Science in various fields, industry use cases of Data Science implementation. General use
cases of Data Science: Finance (defaulter detection), E-Commerce (recommendation systems), Banking (loan
credibility systems), Real Estate, and GIS systems (optimal route finding, e.g. Ola, Uber).

Course Outcome:
Students are well acquainted with Data Science and can carry out exploratory data analysis (EDA) projects.

Examination Scheme:

Components      A    CT    S/V/Q/HA    ESE
Weightage (%)   5    15    10          70

A: Attendance, CT: Class Test, S/V/Q/HA: Seminar/Viva/Quiz/Home Assignment, ESE: End Semester
Examination

Text & References:


Texts:

• Allen B. Downey. Think Python.
• Cathy O’Neil and Rachel Schutt. Doing Data Science: Straight Talk from the Frontline. O’Reilly. 2014.
• Avrim Blum, John Hopcroft and Ravindran Kannan. Foundations of Data Science.

References:
• Jure Leskovec, Anand Rajaraman and Jeffrey Ullman. Mining of Massive Datasets. v2.1, Cambridge
University Press. 2014. (free online)
• Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. ISBN 0262018020. 2013.
• Foster Provost and Tom Fawcett. Data Science for Business: What You Need to Know about Data Mining
and Data-analytic Thinking. ISBN 1449361323. 2013.
• Trevor Hastie, Robert Tibshirani and Jerome Friedman. Elements of Statistical Learning, Second Edition.
ISBN 0387952845. 2009. (free online)

Common questions


Data science and business analytics, while related, serve different purposes and use different methodologies. Data science is broader, encompassing data collection, cleaning, and preparation, as well as advanced analytics and predictive modeling using machine learning and artificial intelligence. Business analytics, on the other hand, focuses on applying statistical analysis to business operations to improve decision-making processes. While data science involves predictive and prescriptive analytics to simulate and forecast future outcomes, business analytics primarily deals with descriptive analytics to understand current and past performance trends and improve business performance through insights.

Computer science supports data science by providing the computational frameworks and tools necessary for data processing, storage, and analysis. Key technologies include relational databases (SQL) and non-relational databases (NoSQL), which manage and query large datasets efficiently. Data warehousing technologies support the storage and retrieval needs. Programming languages like R and Python are pivotal in writing scripts and developing models. Additionally, business intelligence tools facilitate data visualization and reporting, enabling data scientists to communicate insights effectively. These technologies create a robust infrastructure that supports complex data science tasks, from data management to advanced analytics.

The primary components of data science include data collection, data preparation, data analysis, data visualization, and data-driven decision making. Data collection involves gathering data from various sources which can then be cleaned and prepared for analysis. Data analysis involves examining the data to uncover patterns and insights, often using statistical models and algorithms. Data visualization helps present findings in an accessible way to support decision-making. These components collectively contribute to the importance of data science in the business world by enabling companies to leverage data for strategic insights and competitive advantage.

Data warehousing supports data science processes by providing a structured repository where large volumes of disparate data can be stored, retrieved, and managed efficiently. Key techniques in data warehousing include Extract, Transform, Load (ETL) processes, which prepare data for analysis by extracting it from various sources, transforming it into a suitable format, and loading it into the data warehouse. This centralized data storage facilitates sophisticated analyses and ensures data consistency, supporting the data-driven insights crucial for data science projects. This enhances the ability to perform complex queries and data evaluations rapidly.
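
The Extract, Transform, Load flow can be illustrated with a minimal sketch. It assumes an in-memory SQLite database standing in for a data warehouse; the CSV text, the `orders` table, and the column names are all hypothetical and chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or an API.
RAW_CSV = """order_id,customer,amount
1, alice ,120.50
2,BOB,80
3,alice,not_available
"""

def extract(text):
    """Extract: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalise customer names, convert amounts, drop bad rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # bad data: skip rows whose amount is not numeric
        customer = row["customer"].strip().lower()
        clean.append((int(row["order_id"]), customer, amount))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into a (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(transform(extract(RAW_CSV)), conn)

# Once loaded, the data can be queried consistently with SQL.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Keeping the three stages as separate functions is one possible design; it makes each stage easy to test or swap out on its own.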

Mathematics underpins data science projects through various stages such as data analysis, modeling, and evaluation. Probability and statistics are critical for making inferences from data and identifying patterns, which are essential for developing predictive models. Statistical measures like descriptive, predictive, and prescriptive statistics provide frameworks for understanding data characteristics and behaviors. Linear regression, an important mathematical model, helps in predicting continuous outcomes and interpreting relationships between variables. Permutation and combination are used in feature selection and optimization problems. Mathematical concepts ensure robust data analysis and inform decision-making through accurate model evaluations.
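
As a small illustration of the linear regression and counting ideas mentioned here, the sketch below fits a straight line by ordinary least squares using only the Python standard library. The function name `fit_line` and the sample data are assumptions made for this example.

```python
from math import comb, perm
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data generated from y = 2x + 1 with no noise.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)

# Counting: ways to choose 2 of 5 candidate features when order is ignored
# (combinations) and when order matters (permutations).
n_subsets = comb(5, 2)
n_orderings = perm(5, 2)
print(n_subsets, n_orderings)
```

With real, noisy data the fitted line will not pass through every point; the slope then describes the average change in y per unit change in x.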

Statistical inference contributes to data science by providing methodologies to draw conclusions about a population based on sample data. It includes hypothesis testing, estimation, and prediction, which are crucial for developing and validating models. Typical applications in data science involve making predictions, estimating trends, and quantifying uncertainty around model predictions. In practice, it enables data scientists to make data-driven decisions with confidence, validate assumptions, and enhance models by ensuring their applicability across different scenarios. Statistical inference thus underpins many predictive analytics tasks and model evaluations, guiding decision-making processes.
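
One common form of estimation is a confidence interval for a population mean. The sketch below uses a normal approximation, which is an assumption of this example (for small samples a t-distribution is usually preferred); the function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Uses the normal approximation: mean +/- z * (sample std dev / sqrt(n)).
    """
    n = len(sample)
    centre = mean(sample)
    std_err = stdev(sample) / sqrt(n)
    # z-value that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * std_err, centre + z * std_err

# Hypothetical sample, e.g. delivery times in minutes.
sample = [31.0, 28.5, 35.2, 30.1, 29.8, 33.4, 27.9, 32.6, 30.7, 31.9]
low, high = mean_confidence_interval(sample)
print(f"95% interval for the mean: ({low:.2f}, {high:.2f})")
```

The interval quantifies the uncertainty of the estimate: a wider interval signals that the sample says less about the population mean.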

Challenges in managing data science projects include dealing with large and complex datasets, integrating data from diverse sources, and aligning project goals with business objectives. Additionally, there is a need to effectively communicate technical results to non-technical stakeholders and manage the iterative nature of data science work. To improve success rates, projects can implement strategic planning phases, agile methodologies to allow flexibility and iterations, and robust project evaluation frameworks to ensure alignment with objectives. Improving team collaboration and stakeholder engagement, along with using process evaluation models, helps in identifying potential pitfalls early and increases project success rates.

Industry-specific applications of data science include finance, where it is used for defaulter detection by analyzing credit histories and transaction patterns to predict default risks. In e-commerce, data science is employed to build recommendation systems that analyze customer behavior to personalize product suggestions, thereby improving sales and customer retention. In banking, data science aids in evaluating loan credibility through credit scoring models. Real estate and GIS industries use data science to find optimal routes, enhancing logistics and reducing travel times. Each application leverages data to optimize operations, increase efficiency, and drive better decision-making processes.

Feature engineering plays a critical role in data science as it involves transforming raw data into informative features that better represent the underlying problem to predictive models, thereby improving their performance. It includes creating new variables from existing data, encoding categorical features, normalizing numerical features, and selecting the most relevant attributes. By improving model input quality, feature engineering enhances a model's ability to detect patterns and make accurate predictions, ultimately aiding in extracting meaningful insights from data.
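
Two of the steps named above, normalizing numerical features and encoding categorical features, can be sketched in plain Python. The helper names `min_max_scale` and `one_hot` and the sample values are assumptions for illustration; in practice a library such as pandas or scikit-learn would typically be used.

```python
def min_max_scale(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Encode categories as 0/1 indicator columns, one per distinct value."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories

# Hypothetical raw features.
ages = [10, 20, 30]
colours = ["red", "blue", "red"]

scaled_ages = min_max_scale(ages)
encoded_colours, colour_names = one_hot(colours)
print(scaled_ages)
print(colour_names, encoded_colours)
```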

Different types of statistical measures contribute significantly to data science by supporting various analytical tasks. Descriptive statistics summarize data characteristics, enabling understanding of basic patterns and distributions. Predictive statistics focus on making forecasts and identifying patterns to anticipate future events based on historical data. Prescriptive statistics complement these by providing recommendations based on predictive insights, often using optimization and simulation techniques. Together, these measures allow data scientists to gather a holistic view of data, predict outcomes, and propose data-driven strategies, enhancing the overall decision-making process.
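
Descriptive statistics such as the mean, median, and mode, together with a simple outlier check, can be computed with Python's standard `statistics` module. The data below is hypothetical, and the 1.5 × IQR rule is one common convention for flagging outliers, not the only one.

```python
from statistics import mean, median, mode, quantiles

# Hypothetical observations with one suspiciously large value.
data = [12, 15, 15, 16, 18, 19, 21, 95]

print("mean:", mean(data))      # pulled upwards by the extreme value
print("median:", median(data))  # robust to the extreme value
print("mode:", mode(data))      # most frequent value

# Quartiles split the sorted data into four parts; the IQR is Q3 - Q1.
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Flag points lying more than 1.5 * IQR outside the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print("outliers:", outliers)
```

Comparing the mean with the median is a quick first check for skew or bad data: a large gap between them, as here, suggests extreme values worth inspecting.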
