Data/Analytics
Career Paths
Eng. Ahmed Amr
ahmed.amr@rubikal.com
Road Map
● Defining Data Science.
● Data Science Marketplace.
● Required Skills for Data Science.
● Data Science Career Paths.
● A Day in the Life of a Data Scientist.
Data Science: Hype or Reality?
“Data Scientist: The Sexiest Job of the 21st Century” – Thomas H.
Davenport and D.J. Patil
“Analytics is defined as the scientific process of transforming
data into insight for making better decisions.” – The Institute for
Operations Research and the Management Sciences (INFORMS)
“With more and more companies using big data, the demand for
data analytic specialists,—sometimes called data scientists, who
know how to manage the tsunami of information, spot patterns
within it and draw conclusions and insights—is nearing a frenzy.”
– Chris Morris, CNBC
Data Scientist
● “A person who is better at statistics than any software engineer
and better at software engineering than any statistician.”
- Josh Wills, Director of Data Science at Cloudera
● “Data scientists are inquisitive: exploring, asking questions, doing
“what if” analysis, questioning existing assumptions and
processes. Armed with data and analytical results, a top-tier data
scientist will then communicate informed conclusions and
recommendations across an organization’s leadership structure.”
- Anjul Bhambhri, IBM
Defining
Data Science
History
Role of Computer Science
Empowering Statistics
Solving a wide range of practical problems by providing
number crunching and massive storage.
Inventions
Accelerating the pace of the marriage between
Statistics and Data Science.
1960s
Database Management Systems (DBMS)
1970s
Relational DBMS
Knowledge Discovery and Data Mining
Late 1980s
Terms like Knowledge Discovery and Data Mining
started to be used widely.
Big Data
Early 1990s
Explosion of business data.
1997
Official start of the term big data.
Data Science
Late 1990s
The phrase "data science" first appeared to inspire
professionals to harness the power of data by
analyzing it effectively and producing useful
intelligence.
The title "data scientist" began to replace "statistician".
Analytics
Mid 2000s
The word analytics was adopted by data scientists
to emphasize the fact that an increasing number
of companies started to heavily rely on the
statistical and quantitative analysis of data
as well as predictive modeling to make informed
decisions so that they can compete better with
other businesses.
Defining
Data Science
Enabling Technologies
1-Data Infrastructure Technologies
● Support how data is:
1) Shared.
2) Processed.
3) Consumed.
● Distributed Computing and Cloud Computing.
○ Virtualization and distributed file sharing.
Distributed Computing
● An approach to break down a task into smaller
pieces that are easier to process.
● Each element in the task is assigned to a
processor which could be geographically
dispersed.
● Software is necessary to manage all aspects of
distributed computing.
○ e.g. Hadoop
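The split, assign and combine idea above can be sketched in a few lines of Python. This is a minimal illustration, not how Hadoop itself works: local threads stand in for the geographically dispersed processors, and the function names and sample sentences are made up for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Break the task into roughly equal pieces."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_words(lines):
    """Work done independently by one worker on its own piece."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(lines, n_workers=3):
    """Assign each piece to a worker, then merge the partial results."""
    chunks = split_into_chunks(lines, n_workers)
    # Threads are a stand-in here; a real cluster would use separate machines.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Hypothetical input: three short "documents".
documents = [
    "big data needs big storage",
    "data science turns data into insight",
    "distributed computing splits big tasks",
]
word_counts = distributed_word_count(documents)
print(word_counts.most_common(2))
```

The management software mentioned above takes care of what this sketch ignores: moving data to the workers, retrying failed pieces, and collecting results across machines.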
Cloud Computing
● Platform to support distributed computing.
● A bunch of computers housed in data centers.
● Can serve as readily available hardware for distributed
computing.
2-Data Management Technologies
● Data Management is handled by DBMS.
● Data Science requires highly scalable, reliable,
efficient ways to store, manage and process data.
● Structured and unstructured data.
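As a small illustration of the difference, the sketch below stores structured records in a relational DBMS (SQLite, from Python's standard library) and handles loosely structured records as JSON text. The table, column names and sample records are assumptions made up for the example.

```python
import json
import sqlite3

# Structured data: fixed columns, managed and queried by a relational DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Mona", "Cairo"), (2, "Omar", "Alexandria"), (3, "Sara", "Cairo")],
)
cairo_customers = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Cairo",)
    )
]
conn.close()

# Loosely structured data: free text whose fields vary from record to record.
raw_posts = [
    '{"user": "Mona", "text": "Great service!", "tags": ["support"]}',
    '{"user": "Omar", "text": "Delivery was late"}',
]
posts = [json.loads(raw) for raw in raw_posts]
posts_with_tags = [post["user"] for post in posts if "tags" in post]

print(cairo_customers, posts_with_tags)
```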
3-Visualization Technologies
● Acquired insights need to be conveyed to
leadership of an organization.
● Effective communication with non-experts.
● Responsible for increasing the impact of the data
science project results.
Data Science
Marketplace
Fraud Detection
● Criminals commit fraud against the banking
sector.
● In the past:
○ Significant human intervention.
○ Improving accuracy was the desired outcome.
● Today:
○ Machine learning and big data analytics.
Social Media Analytics
● Huge amounts of data: millions of posts.
● Metadata is valuable.
○ Data about data, such as location information
and timestamps.
● IBM Personality Insights product.
○ Gives companies a deeper understanding of their
customers' personalities.
Data Science
Skills
Data Mining and Analytics Skill
1. Classification:
● Constructs a model from data with known labels.
● Data is assigned to discrete categories.
● Can categorize users of an online banking
system as trustworthy or untrustworthy.
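A minimal sketch of classification, assuming two made-up features per user (failed logins per week, new devices per month) and a nearest-centroid rule. This is one simple classifier chosen for illustration, not a recommended fraud model, and all numbers are invented.

```python
from math import dist
from statistics import mean


def fit_centroids(samples, labels):
    """Learn one average point (centroid) per known label."""
    grouped = {}
    for features, label in zip(samples, labels):
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest to the new sample."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training data: (failed logins per week, new devices per month).
training_samples = [(0, 1), (1, 0), (0, 0), (7, 4), (9, 5), (8, 6)]
training_labels = [
    "trustworthy", "trustworthy", "trustworthy",
    "not trustworthy", "not trustworthy", "not trustworthy",
]

centroids = fit_centroids(training_samples, training_labels)
print(predict(centroids, (1, 1)))
print(predict(centroids, (6, 5)))
```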
Data Mining and Analytics Skill
2. Prediction:
● Builds a model that predicts continuous or
ordered values.
● Such models can predict, for example, the mean
time to failure of computers.
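A minimal sketch of prediction, assuming a straight-line (least squares) model and invented measurements that relate a computer's average operating temperature to its mean time to failure. The data is deliberately simple and not taken from any real system.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for a single input variable."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: operating temperature (°C) vs. mean time to failure (hours).
temperatures = [30, 40, 50, 60]
hours_to_failure = [10000, 8000, 6000, 4000]

slope, intercept = fit_line(temperatures, hours_to_failure)

# Predict a continuous value for a temperature that was never observed.
predicted_hours = slope * 45 + intercept
print(round(predicted_hours))
```

Unlike classification, the output here is a number on a continuous scale rather than a label.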
Data Mining and Analytics Skill
3. Clustering:
● The process of grouping similar data objects
into a class.
● Helps reveal features that distinguish one class
of data objects from another, leading to new
discoveries about a dataset.
● As an example, clustering can reveal people
with similar purchasing behaviours.
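A minimal sketch of clustering, using a plain k-means loop on invented customer data (store visits per month, average basket value). K-means is one common clustering method picked for illustration. Starting from the first k points is a simplification that keeps the example deterministic.

```python
from math import dist
from statistics import mean


def k_means(points, k, max_iterations=100):
    """Group points into k clusters; no labels are given in advance."""
    centroids = [tuple(p) for p in points[:k]]  # simple deterministic start
    assignments = None
    for _ in range(max_iterations):
        # Step 1: attach every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: dist(point, centroids[c]))
            for point in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the grouping is stable
        assignments = new_assignments
        # Step 2: move each centroid to the average of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(mean(col) for col in zip(*members))
    return assignments, centroids


# Hypothetical customers: (visits per month, average basket value).
customers = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 78)]
assignments, centroids = k_means(customers, k=2)
print(assignments)
```

The two groups that emerge (occasional small-basket shoppers and frequent large-basket shoppers) were never labeled; the algorithm finds them from similarity alone.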
Machine Learning Skill
● Machine learning is based on self-learning or
self-improving algorithms.
● In machine learning, a computer starts with a
model, and continues to enhance it through
trial and error.
● It can then provide meaningful insight in the
form of classification, prediction, and
clustering.
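The trial-and-error loop can be illustrated with a perceptron, one of the simplest self-improving models: it guesses, checks the guess against the known answer, and adjusts its weights only when it is wrong. The toy task (learning the logical AND of two inputs) and all names are assumptions chosen to keep the sketch short.

```python
def predict(weights, bias, features):
    """Current model: a weighted sum followed by a yes/no threshold."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Start with an empty model and improve it through trial and error."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    mistakes_per_epoch = []
    for _ in range(epochs):
        mistakes = 0
        for features, label in zip(samples, labels):
            error = label - predict(weights, bias, features)
            if error != 0:  # wrong guess: nudge the model toward the answer
                mistakes += 1
                weights = [
                    w + learning_rate * error * x
                    for w, x in zip(weights, features)
                ]
                bias += learning_rate * error
        mistakes_per_epoch.append(mistakes)
        if mistakes == 0:
            break  # a full pass without errors: nothing left to fix
    return weights, bias, mistakes_per_epoch


# Toy task: output 1 only when both inputs are 1 (logical AND).
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias, mistakes_per_epoch = train_perceptron(samples, labels)
predictions = [predict(weights, bias, s) for s in samples]
print(mistakes_per_epoch, predictions)
```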
Machine Learning Skill
● A data scientist needs to be familiar with
models that are commonly used in data science, such as:
○ Logistic regression.
○ Support vector machines.
○ Bayesian methods.
Statistics Skill
● Lays a foundation for data science.
● The more you know about it, the better.
● At minimum, you need to know:
○ Probability.
○ Correlations.
○ Variables, distributions, and regression.
○ Null hypothesis significance tests.
○ Confidence intervals, ANOVA, t-tests, and chi-square tests.
○ Tools like:
■ R, Excel.
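Two of the items above, correlation and the t-test, can be sketched with Python's standard library alone. The study-hours data is invented for illustration. Turning the t statistic into a p-value requires the t distribution, which a statistics package such as R provides and this sketch leaves out.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_correlation(xs, ys):
    """Strength of the linear relationship between two variables (-1 to 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    co_variation = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    spread = sqrt(
        sum((x - x_bar) ** 2 for x in xs) * sum((y - y_bar) ** 2 for y in ys)
    )
    return co_variation / spread


def one_sample_t_statistic(sample, hypothesized_mean):
    """How many standard errors the sample mean lies from the null value."""
    standard_error = stdev(sample) / sqrt(len(sample))
    return (mean(sample) - hypothesized_mean) / standard_error


# Hypothetical data: hours studied and the resulting exam scores.
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 55, 61, 64, 68]

r = pearson_correlation(hours_studied, exam_scores)
# Null hypothesis for the example: the true mean score is 50.
t = one_sample_t_statistic(exam_scores, hypothesized_mean=50)
print(round(r, 3), round(t, 2))
```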
Visualization Skill
● Important skill to overcome the challenge of
effectively communicating the results of data
analytics to an audience.
● Tableau offers one of the most popular and
comprehensive visualization tools for data
scientists. It supports a variety of visualization
elements such as different types of charts,
graphs, and maps.
Programming Skill
● Ability to code in at least one programming
language, such as Python, Java, or Scala.
● Many languages have powerful libraries to
clean and process your data (e.g. pandas).
● Along with powerful libraries to build machine
learning models (e.g. scikit-learn).
Big-Data Analytics Skills
Data Team Skills Variety
Data Science
Roles and Career
Paths
A Day in the Life of a Data Scientist
Questions?
Thanks!