MODULE II
DATA EXPLORATION AND FEATURE ENGINEERING
DATA ANALYTICS
• Data analytics refers to the process and practice of
analyzing data to answer questions, extract insights,
and identify trends to enhance productivity and
business gain.
• Analytics is used to extract meaningful insights from
data that can drive decision-making and strategy
formulation.
• There are four types of analytics you can leverage
depending on the data you have and the type of
knowledge you’d like to gain.
STEPS IN DATA ANALYTICS
1. Identify Business Questions
The first step to turning data into insights is to define a clear set of goals and questions.
Examples of such questions include:
• What does the company need?
• What type of problem are we trying to solve?
• How can data help solve a problem or business question?
• What type of data is required?
• What programming languages and technologies will we use?
• What methodology or technique will we use in the data analysis
process?
• How will we measure results?
• How will the data tasks be shared among the team?
2. Collect and Store Data
A huge amount of data is generated every second.
The three main sources of data are:
• Company data. It is created by companies in their day-to-day activity. It can be
web events, customer data, financial transactions, or survey data. This data is
normally stored in relational databases.
• Machine data. With recent advances in sensor and IoT technologies, an
increasing number of electronic devices are generating data. They range from
cameras and smartwatches to smart houses and satellites.
• Open data. Given the potential of data to create value for economies,
governments and companies are releasing data that can be freely used. This can
be done via open data portals and APIs (Application Programming Interfaces).
We can then classify data into two types:
• Quantitative data
• Qualitative data
• Quantitative data. It’s information that can be
counted or measured with numerical values. It’s
normally structured in spreadsheets or
SQL databases.
• Qualitative data. The bulk of the data that is
generated today is qualitative. Some common
examples are text, audio, video, images, or
social media data. Qualitative data is often
unstructured, making it difficult to store and
process in standard spreadsheets or relational
databases.
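To illustrate how quantitative company data is typically stored and queried in a relational database, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table name `transactions`, its columns, the helper `monthly_revenue`, and the sample rows are all hypothetical.

```python
import sqlite3


def monthly_revenue(rows):
    """Store (month, amount) rows in an in-memory SQLite table and
    return total revenue per month, ordered by month."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema: one row per transaction.
        conn.execute(
            "CREATE TABLE transactions ("
            " id INTEGER PRIMARY KEY,"
            " month TEXT NOT NULL,"   # e.g. '2024-01'
            " amount REAL NOT NULL)"  # quantitative, numeric value
        )
        conn.executemany(
            "INSERT INTO transactions (month, amount) VALUES (?, ?)", rows
        )
        # Aggregate with SQL: total amount per month.
        cur = conn.execute(
            "SELECT month, SUM(amount) FROM transactions "
            "GROUP BY month ORDER BY month"
        )
        return cur.fetchall()
    finally:
        conn.close()


sales = [("2024-01", 120.0), ("2024-01", 80.0), ("2024-02", 150.0)]
print(monthly_revenue(sales))  # [('2024-01', 200.0), ('2024-02', 150.0)]
```

Numeric, fixed-shape records like these fit a table schema naturally; free text, images, or audio do not, which is why qualitative data is harder to handle this way.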
3. Clean and Prepare Data
• Raw data rarely comes in ready for analysis. Assessing data quality is essential to finding and
correcting errors in your data. This process involves tasks such as:
• Removing duplicate rows, columns, or cells.
• Removing rows and columns that won’t be needed during analysis. This is especially important if
you’re dealing with large datasets that consume a lot of memory.
• Dealing with missing values (empty cells), also known as null values.
• Managing anomalous and extreme values, also known as outliers.
• Standardizing data structure and types so that all data is expressed in the same way.
• Spotting errors and anomalies in data is itself a form of data analysis, commonly known as
exploratory data analysis (EDA).
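The cleaning tasks listed above can be sketched in plain Python on a small list of records. This is a minimal illustration, not a production pipeline: the column names (`id`, `region`, `amount`, `notes`), the choice to fill missing amounts with the median, and the use of Tukey's 1.5 * IQR rule for outliers are assumptions made for the example. In practice a library such as pandas is commonly used for these steps.

```python
import statistics


def clean(records, keep_cols):
    """Minimal cleaning pass over rows stored as dicts.

    Assumes the hypothetical columns 'region' (text) and 'amount'
    (numeric, possibly a string or empty) are listed in keep_cols.
    """
    # Keep only the columns needed for the analysis (saves memory).
    rows = [{c: r.get(c) for c in keep_cols} for r in records]

    # Standardize formats and types: trim and lower-case text, parse numbers.
    for r in rows:
        r["region"] = r["region"].strip().lower() if r["region"] else None
        value = r["amount"]
        r["amount"] = float(value) if value not in (None, "") else None

    # Remove exact duplicate rows. This runs after standardizing, so
    # " North" and "north" are treated as the same value.
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Fill missing (null) amounts with the median of the known amounts.
    known = [r["amount"] for r in unique if r["amount"] is not None]
    fill = statistics.median(known)
    for r in unique:
        if r["amount"] is None:
            r["amount"] = fill

    # Drop outliers outside Tukey's fences: 1.5 * IQR beyond the quartiles.
    q1, _, q3 = statistics.quantiles([r["amount"] for r in unique], n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [r for r in unique if low <= r["amount"] <= high]


raw_rows = [
    {"id": 1, "region": " North", "amount": "100", "notes": "first entry"},
    {"id": 1, "region": "north", "amount": "100", "notes": "re-entered"},
    {"id": 2, "region": "South", "amount": "110", "notes": ""},
    {"id": 3, "region": "south", "amount": "", "notes": "amount missing"},
    {"id": 4, "region": "East", "amount": "90", "notes": ""},
    {"id": 5, "region": "East", "amount": "105", "notes": ""},
    {"id": 6, "region": "West", "amount": "5000", "notes": "typo?"},
]
cleaned = clean(raw_rows, keep_cols=["id", "region", "amount"])
for row in cleaned:
    print(row)
```

Dropping the unneeded `notes` column first is what lets the two `id` 1 rows be recognized as duplicates; the missing amount becomes the median (105.0), and the 5000 entry is removed as an outlier.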
4. Analyze Data
• Depending on the goals of the analysis and the type of data, different techniques are
available.
• Over the years, new techniques and methodologies have appeared to deal with all kinds of data.
• They range from simple linear regressions to advanced techniques from cutting-edge fields, such
as machine learning, natural language processing (NLP), and computer vision.
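As an example of the simplest technique mentioned, a simple linear regression can be fitted by ordinary least squares in a few lines. The function name `fit_line` and the advertising-spend data are hypothetical; on Python 3.10+ the standard library's `statistics.linear_regression` computes the same slope and intercept.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared x deviations and sum of x-y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: advertising spend (x) vs. units sold (y).
spend = [1, 2, 3, 4, 5]
units = [3, 5, 7, 9, 11]
a, b = fit_line(spend, units)
print(a, b)  # 1.0 2.0
```

The fitted slope reads as "each extra unit of spend is associated with about two more units sold", which is the kind of relationship later steps interpret and communicate.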
5. Visualize and Communicate Results
• The last step of the data analytics workflow is visualizing and communicating the results of your
data analysis.
• To turn your insights into decisions, you must ensure your audience and key stakeholders
understand your work.
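Results are usually communicated with charts built in a plotting library such as matplotlib or in a BI tool. To stay dependency-free, the sketch below only shows the core idea behind a bar chart, scaling each value relative to the largest one; the function name `text_bar_chart` and the product data are hypothetical.

```python
def text_bar_chart(data, width=40):
    """Render a horizontal bar chart as a list of plain-text lines."""
    if not data:
        return []
    peak = max(data.values())
    label_w = max(len(k) for k in data)
    lines = []
    for label, value in data.items():
        # Bar length is proportional to the value; the largest value
        # gets the full width.
        bar_len = round(value / peak * width) if peak > 0 else 0
        lines.append(f"{label:<{label_w}} | {'#' * bar_len} {value}")
    return lines


sales_by_product = {"Laptops": 120, "Phones": 300, "Tablets": 60}
print("\n".join(text_bar_chart(sales_by_product, width=20)))
```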
Different Types of Data Analytics
DESCRIPTIVE ANALYTICS
Descriptive Analytics: This is the most basic form of data analysis. It focuses on summarizing data and
describing what it means in order to gain insights, often through data visualizations such as pie charts,
bar charts, and line graphs.
• Descriptive analytics answers the questions of who, what, when, and where.
• For example, a company might use descriptive analytics to track its sales figures over
time or to see which products are the most popular.
• Descriptive statistics are broken down into measures of central tendency and
measures of variability (spread).
• Measures of central tendency include the mean, median, and mode, while measures
of variability include standard deviation, variance, and the minimum and maximum values.
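The measures of central tendency and variability listed above can be computed with Python's standard `statistics` module. This sketch assumes a hypothetical list of weekly sales figures and a helper named `describe`; it uses the population variance and standard deviation (`pvariance`, `pstdev`), while `statistics.variance` and `statistics.stdev` give the sample versions.

```python
import statistics


def describe(values):
    """Summary statistics: central tendency and variability (population)."""
    return {
        # Central tendency
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Variability (spread)
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


weekly_sales = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical units sold per week
for name, value in describe(weekly_sales).items():
    print(f"{name:>8}: {value}")
```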
DIAGNOSTIC ANALYTICS
• Diagnostic Analytics: This type of analytics goes beyond simply describing
data and instead tries to identify the root causes of patterns or trends.
• Diagnostic analytics answers the question of why.
• For example, a company might use diagnostic analytics to figure out why sales
of a particular product are declining.
• It involves a deep-dive, detailed examination of data to understand why
something happened.
• It is characterized by techniques such as drill-down, data discovery,
data mining, and correlations.
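A drill-down breaks an aggregate figure into its components to see where a change comes from. The sketch below is a minimal illustration with hypothetical field names (`period`, `region`, `channel`, `amount`) and made-up data: it compares revenue before and after a decline, first by region and then by sales channel, listing the largest drops first.

```python
from collections import defaultdict


def drill_down(sales, dimension):
    """Change in revenue per value of `dimension` between two periods.

    Each row is a dict with hypothetical keys 'period' ('before' or
    'after'), 'amount', and the chosen dimension (e.g. 'region').
    Returns (value, change) pairs sorted from largest drop to largest gain.
    """
    totals = defaultdict(lambda: {"before": 0.0, "after": 0.0})
    for row in sales:
        totals[row[dimension]][row["period"]] += row["amount"]
    changes = [(key, t["after"] - t["before"]) for key, t in totals.items()]
    return sorted(changes, key=lambda pair: pair[1])


rows = [
    {"period": "before", "region": "north", "channel": "online", "amount": 100},
    {"period": "after", "region": "north", "channel": "online", "amount": 95},
    {"period": "before", "region": "south", "channel": "store", "amount": 120},
    {"period": "after", "region": "south", "channel": "store", "amount": 60},
    {"period": "before", "region": "east", "channel": "online", "amount": 80},
    {"period": "after", "region": "east", "channel": "online", "amount": 85},
]
print(drill_down(rows, "region"))   # south shows the largest drop
print(drill_down(rows, "channel"))  # store sales drop; online is flat
```

Here both views point to the same rows (store sales in the south region), which is the kind of lead a diagnostic analysis would then investigate further, for example with correlations against other variables.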
PREDICTIVE ANALYTICS
• Predictive analytics answers the question of what will happen.
• For example, a company might use predictive analytics to forecast demand for its
products or to identify customers who are at risk of churning.
• The term predictive analytics refers to the use of statistics and modeling techniques to
make predictions about future outcomes and performance.
• Predictive analytics looks at current and historical data patterns to determine whether those
patterns are likely to emerge again, and uses them to forecast patterns that may occur in the future.
• This allows businesses and investors to adjust where they use their resources to take
advantage of possible future events.
• Predictive analytics can also be used to improve operational efficiency and reduce risk.
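Forecasting demand, one of the examples above, can be illustrated with simple exponential smoothing. This technique is chosen here only for brevity; it is not the only, or necessarily the best, predictive method. Each new observation updates a smoothed level weighted by a smoothing factor `alpha` (an assumed parameter), and the final level serves as the forecast for the next period. The function name and demand figures are hypothetical.

```python
def exp_smoothing_forecast(history, alpha=0.5):
    """One-step-ahead forecast with simple exponential smoothing.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, starting from the
    first observation; the forecast for the next period is the final level.
    """
    if not history:
        raise ValueError("history must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


monthly_demand = [10, 12, 14]  # hypothetical units per month
print(exp_smoothing_forecast(monthly_demand, alpha=0.5))  # 12.5
```

Because this method has no trend component, it lags behind steadily rising or falling data (here it forecasts 12.5 although demand has been climbing by 2 per month); trend-aware approaches such as linear regression or Holt's method address this.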
PRESCRIPTIVE ANALYTICS
• Prescriptive analytics is an advanced form of data analytics that goes beyond
describing past performance or predicting future outcomes.
• It focuses on providing recommendations on what actions to take to achieve
desired objectives or outcomes.
• While predictive analytics forecasts what might happen, prescriptive analytics suggests
actions to influence those outcomes.
• The goal is to optimize decision-making for future opportunities or to mitigate
potential risks.
• It’s a powerful tool for strategic planning and operational efficiency. It uses techniques such as graph
analysis, simulation, complex event processing, neural networks, and
recommendation engines from machine learning.
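To make the idea of recommending an action concrete, the sketch below chooses how many units to order by evaluating the expected profit of each candidate quantity across demand scenarios, the kind of scenarios a predictive model might supply. The function name `recommend_order`, the prices, the probabilities, and the assumption that unsold stock is worthless are all hypothetical; real prescriptive systems typically rely on optimization solvers or simulation at much larger scale.

```python
def recommend_order(candidates, demand_probs, price, unit_cost):
    """Pick the order quantity with the highest expected profit.

    demand_probs maps each possible demand level to its probability.
    Simplifying assumption: unsold units have no salvage value.
    """
    def expected_profit(q):
        # Revenue comes only from units actually sold: min(order, demand).
        return sum(
            p * (price * min(q, d) - unit_cost * q)
            for d, p in demand_probs.items()
        )

    best = max(candidates, key=expected_profit)
    return best, expected_profit(best)


demand = {80: 0.3, 100: 0.5, 120: 0.2}  # hypothetical forecast scenarios
qty, profit = recommend_order([80, 100, 120], demand, price=10, unit_cost=6)
print(qty, round(profit, 2))  # 100 340.0
```

Ordering 80 units never leaves stock unsold but misses likely sales, while ordering 120 risks waste; the recommendation of 100 balances the two under these assumed numbers.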
Differences Between the Four Types of Advanced Analytics