Understanding Data Analytics Process
Data is everywhere, and people use data every day, whether they realize it or not. Data is also
crucial in a professional sense. Organizations that use data to drive business strategies often
find that they are more confident, proactive, and financially savvy. As a result, data analytics is
important across many industries. A sneaker manufacturer might look at sales data to determine
which designs to continue and which to retire, or a health care administrator may look at
inventory data to determine the medical supplies they should order.
In simple words, data analytics helps people and businesses learn from data: what worked in
the past, what is happening now, and what might happen in the future.
Here’s a breakdown of the major steps involved in the data analytics process:
Step                     Description
1. Data Collection       Gathering raw data from sources like websites, sensors, surveys, or databases
2. Data Cleansing        Removing errors, duplicates, and inconsistencies to ensure data quality
3. Data Analysis         Applying statistical and computational methods to find patterns and insights
4. Data Interpretation   Understanding what the results mean in context and drawing conclusions
5. Data Visualization    Presenting findings using charts, graphs, and dashboards for clarity
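The first four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sales records, the design names, and the helper functions cleanse, analyze, and interpret are hypothetical and do not come from any real dataset or library.

```python
from statistics import mean

# Step 1 (collection): hypothetical raw records of (design, units_sold).
# None marks a missing value.
raw_sales = [
    ("Runner", 120),
    ("Runner", 120),   # exact duplicate row
    ("Court", None),   # missing value
    ("Trail", 45),
    ("Court", 80),
]

def cleanse(records):
    """Step 2: drop rows with missing values and exact duplicates, keeping order."""
    seen = set()
    cleaned = []
    for record in records:
        if None in record or record in seen:
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned

def analyze(records):
    """Step 3: compute the average units sold across the cleaned records."""
    return mean(units for _, units in records)

def interpret(records, average):
    """Step 4: list the designs that sold below the average."""
    return [design for design, units in records if units < average]

cleaned = cleanse(raw_sales)
average = analyze(cleaned)
below_average = interpret(cleaned, average)
print(cleaned, round(average, 2), below_average)
```

The fifth step, visualization, would take these results and plot them as a chart or dashboard.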
Data analytics is used in many fields like banking, farming, shopping, government and
more. It helps in many ways:
Analytics reveals trends and chances for growth that may not be obvious.
Example: A farmer might use data to discover which crops grow best in certain
seasons.
✅ 6. Supports Innovation
✅ 7. Reduces Risks
Companies using analytics can act faster and smarter than competitors.
They can adapt to changes and stay ahead in the market.
Example: A business that tracks customer trends can launch new products before
others.
The data analytics lifecycle was designed to address Big Data problems and data science
projects. The process is iterative, which reflects how real projects work. To address the specific
demands of conducting analysis on Big Data, a step-by-step methodology is required to plan the
various tasks associated with the acquisition, processing, analysis, and reuse of data.
Phase 1: Discovery -
o The team studies data to discover the connections between variables. Later, it selects
the most significant variables as well as the most effective models.
o In this phase, the data science team creates data sets that can be used for training,
testing, and production purposes.
o The team builds and implements models based on the work completed in the
model planning phase.
o Some of the tools commonly used for this stage are MATLAB and STATISTICA.
o The team creates datasets for training, testing as well as production use.
o The team is also evaluating whether its current tools are sufficient to run the models
or if they require an even more robust environment to run models.
o Free or open-source tools - R and PL/R, Octave, WEKA.
o Commercial tools - MATLAB, STATISTICA.
o Following the execution of the model, team members will need to evaluate the
outcomes of the model to establish criteria for the success or failure of the model.
o The team considers how best to present findings and outcomes to the various
members of the team and other stakeholders, taking into consideration
caveats and assumptions.
o The team should determine the most important findings, quantify their value to the
business and create a narrative to present findings and summarize them to all
stakeholders.
Phase 6: Operationalize -
o The team distributes the benefits of the project to a wider audience. It sets up a pilot
project that will deploy the work in a controlled manner prior to expanding the project
to the entire enterprise of users.
o This technique allows the team to gain insight into the performance and constraints
related to the model within a production setting at a small scale and then make
necessary adjustments before full deployment.
o The team produces the final reports, presentations, and code.
o Open source or free tools such as WEKA, SQL, MADlib, and Octave.
3. Predictive Data Analytics: Predictive data analytics is used to guess what might
happen in the future. It looks at current and past data to find patterns and make
forecasts. Businesses use it to predict things like customer behavior, future sales or
possible risks.
Predictive analytics focuses on what is likely to happen in the future. It uses past and
current data, along with statistical models and machine learning, to forecast outcomes such
as customer behavior, market demand or risks.
Applications:
Linear Regression: Predicting numerical outcomes like sales or revenue growth.
Time Series Forecasting: Estimating future trends such as demand or stock prices.
Data Mining: Uncovering patterns that indicate future behavior.
Predictive Modeling: Creating models to predict customer churn, fraud or credit risk.
Decision Analysis & Optimization: Evaluating scenarios to determine the best
strategy.
Transaction Profiling: Detecting suspicious or unusual financial transactions.
Example: A bank applies predictive analytics to estimate the likelihood of customers
defaulting on loans, helping it decide whether to approve or reject applications.
Pros:
Anticipates future risks and opportunities.
Improves planning and resource allocation.
Cons:
Accuracy depends heavily on data quality.
Models may be complex and require advanced expertise.
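As an illustration of the linear regression application listed above, the following sketch fits a straight line to past sales by ordinary least squares and uses it to forecast the next month. The monthly figures and the helper name fit_line are hypothetical; a real project would use far more data and validate the model before trusting the forecast.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: month number and units sold in that month.
months = [1, 2, 3, 4, 5]
sales = [100, 120, 140, 160, 180]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept
print(slope, intercept, forecast_month_6)
```

With this perfectly linear sample data the fitted slope is 20 units per month, so the forecast for month 6 is 200 units. Real data is noisier, which is why accuracy depends so heavily on data quality.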
4. Prescriptive Data Analytics: Prescriptive data analytics helps to choose the best action
or solution. It looks at different options and suggests what should be done next.
Companies use it for things like loan approval, pricing decisions and managing
machines or schedules.
Prescriptive analytics focuses on what action should be taken. It doesn’t just predict
outcomes but also recommends the best steps to achieve goals or reduce risks. By
combining big data, business rules, optimization and AI, it suggests the most effective
decisions.
Applications:
Decision Support: Helping leaders choose the most effective action.
Healthcare Strategic Planning: Optimizing resources using operational, demographic
and economic data.
Risk Mitigation: Suggesting strategies to minimize exposure to risks.
Opportunity Optimization: Identifying actions to maximize benefits from upcoming
market trends.
What-if Analysis: Simulating different decision outcomes and their consequences.
Example: A logistics company uses prescriptive analytics to recommend the most efficient
delivery routes, reducing fuel costs and improving on-time delivery.
Pros:
Provides actionable recommendations along with predictions.
Helps optimize decisions for maximum benefits.
Cons:
Requires advanced technology and expertise.
Implementation can be costly and resource-intensive.
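The logistics example can be sketched as a tiny decision problem: apply a business rule to rule out unacceptable options, then recommend the cheapest remaining one. The route names, costs, on-time rates, and the function recommend_route are all hypothetical, and real prescriptive systems use much richer optimization models.

```python
# Hypothetical candidate routes with predicted fuel cost and on-time rate.
routes = [
    {"name": "Highway", "fuel_cost": 120.0, "on_time_rate": 0.97},
    {"name": "City", "fuel_cost": 95.0, "on_time_rate": 0.88},
    {"name": "Coastal", "fuel_cost": 105.0, "on_time_rate": 0.93},
]

def recommend_route(options, min_on_time_rate):
    """Return the cheapest route that meets the on-time business rule, or None."""
    feasible = [r for r in options if r["on_time_rate"] >= min_on_time_rate]
    if not feasible:
        return None
    return min(feasible, key=lambda r: r["fuel_cost"])

best = recommend_route(routes, min_on_time_rate=0.90)
print(best["name"])
```

Changing min_on_time_rate and re-running the function is a simple form of what-if analysis: each threshold produces a different recommended action.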
Data Analytics finds applications across various industries and domains. Here are some
examples-
Business and marketing industry- In the business and marketing industry, Data
Analytics is used for customer segmentation and targeting, market trend analysis,
pricing optimisation, sales forecasting and social media analytics.
Healthcare industry- Data Analytics in Healthcare industry is used for patient risk
assessment, disease outbreak prediction, clinical decision support systems, drug
effectiveness analysis and healthcare resource allocation.
Finance industry- In the finance industry, Data Analytics is used for fraud detection
and prevention, risk modelling and management, investment portfolio optimisation,
credit scoring and financial market analysis.
Excel plays a crucial role in data analytics by enabling users to organize, manipulate, and
analyze large datasets efficiently. It offers powerful features such as pivot tables, conditional
formatting, and advanced functions (like XLOOKUP, IFERROR, MATCH) that help identify
trends, clean data, and perform complex calculations. Excel also supports data visualization
through charts and graphs, making it easier to communicate insights. Additionally, tools like
Power Query and Data Analysis Toolpak enhance its capability to handle data transformation
and statistical analysis. Its accessibility and user-friendly interface make it ideal for both
beginners and experienced analysts to extract actionable insights and support decision-
making.
OR
✅ 5. Data Visualization
PivotTables are Excel’s most powerful feature for summarizing large datasets.
You can:
Group data by categories
Calculate totals, averages, percentages
Drill down into details interactively
Ideal for dashboards and reporting.
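What a PivotTable does when it groups data and calculates totals, averages, and percentages can be approximated in plain Python. This is a rough sketch, not how Excel works internally; the region names, amounts, and the function pivot are hypothetical.

```python
from collections import defaultdict

# Hypothetical source rows: (category, amount).
rows = [("North", 100), ("South", 50), ("North", 300), ("South", 150)]

def pivot(records):
    """Group amounts by category and summarize each group."""
    groups = defaultdict(list)
    for category, amount in records:
        groups[category].append(amount)
    grand_total = sum(amount for _, amount in records)
    summary = {}
    for category, amounts in groups.items():
        total = sum(amounts)
        summary[category] = {
            "total": total,
            "average": total / len(amounts),
            "percent_of_total": 100 * total / grand_total,
        }
    return summary

summary = pivot(rows)
print(summary)
```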
Excel supports:
What-If Analysis: Test different scenarios using Goal Seek or Data Tables.
Forecast Sheets: Predict future trends using historical data.
Solver Add-in: Optimize decisions based on constraints.
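The idea behind Goal Seek, finding the input value that makes a formula reach a target, can be sketched with a simple bisection search. This is an assumption-laden illustration and not Excel's actual algorithm: the profit model, its numbers, and the function goal_seek are hypothetical, and the search assumes the formula increases as the input increases.

```python
def goal_seek(func, target, low, high, tolerance=1e-6, max_iterations=200):
    """Find x in [low, high] where func(x) is close to target.

    Assumes func is increasing on the interval (bisection search).
    """
    for _ in range(max_iterations):
        mid = (low + high) / 2
        value = func(mid)
        if abs(value - target) <= tolerance:
            return mid
        if value < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

def profit(price):
    """Hypothetical profit model: 500 units, unit cost 12, fixed costs 1000."""
    return 500 * (price - 12.0) - 1000.0

# Which price gives a profit of 2000?
price = goal_seek(profit, target=2000.0, low=0.0, high=100.0)
print(round(price, 2))
```

Here the search settles on a price of about 18, because 500 × (18 − 12) − 1000 = 2000.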
Excel supports several data types that determine how data is stored, displayed, and processed:
1. Text (String)
Represents any combination of letters, numbers, and symbols treated as text.
Examples: Names, addresses, product codes.
Even if numbers are entered as text (e.g., phone numbers), Excel treats them as
strings.
2. Number-
Numeric values used for calculations.
Includes integers, decimals, percentages, and scientific notation.
5. Error Values
Indicate problems in formulas or data.
Examples: #DIV/0! (division by zero), #N/A (value not available), #REF! (invalid
cell reference).
Excel can import and connect to data from various sources, enabling dynamic and
comprehensive data analysis:
4. Databases-
Excel can connect to databases like SQL Server, Access, Oracle, and MySQL using
ODBC or other connectors.
Enables importing large datasets and refreshing data dynamically.
5. Web Data-
Data can be imported from web pages or APIs using Power Query or built-in web
query tools.
Power Query is a powerful tool within Excel that connects, combines, and transforms data from
multiple sources, including files, databases, online services, and more.
Excel supports connections to online services like SharePoint, Azure, and cloud storage
platforms.
Summary Table-
Here are common data types you’ll encounter when working in Excel:
Data Type           Examples              Description
Number              250, 3.14, -45        Used for calculations like totals or averages
Text (String)       "Apple", "HR Dept"    Labels, names, categories
Date & Time         19-Sep-2025, 18:04    For tracking events, deadlines, timestamps
Boolean (Logical)   TRUE, FALSE           Used in formulas and conditions
Error               #DIV/0!, #VALUE!      Appears when a formula fails or is invalid
Currency            ₹1,500.00, $99.99     Number formatted with currency symbol
Percentage          75%, 0.25             Useful for ratios, growth rates
Excel can pull data from many external sources. Here are some examples:
Source             Example                   Use
SharePoint List    Company Tasks List        Import task or project data from SharePoint
Power BI Dataset   Sales Dashboard Dataset   Use Power BI dataflows directly in Excel
Excel Interface-
An Excel spreadsheet, called a workbook, contains one or more worksheets, each a grid
of 1,048,576 rows and 16,384 columns for data management. Workbooks organize related
data across multiple worksheets in a single file.
1. Understanding Excel Workbooks and Worksheets-
Workbook: A single Excel file containing one or more worksheets.
Worksheet: A grid with over 17 billion cells (1,048,576 rows × 16,384 columns) for
entering and analyzing data.
Starting Point: Open a blank workbook or select a template via File > New.
2. Key Features of Excel Spreadsheets
2.1. Rows and columns
Rows (horizontal, numbered) and columns (vertical, lettered) intersect to form cells.
Example: Row 3 and Column B create cell B3.
Capacity: 1,048,576 rows and 16,384 columns per worksheet.
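The way a row number and a column letter combine into a cell address, and the total capacity of a worksheet, can be checked with a short sketch. The function names column_letters and cell_address are hypothetical helpers written for this illustration; only the row and column limits come from the text above.

```python
MAX_ROWS = 1_048_576
MAX_COLUMNS = 16_384

def column_letters(index):
    """Convert a 1-based column index to Excel-style letters (1 -> A, 27 -> AA)."""
    if not 1 <= index <= MAX_COLUMNS:
        raise ValueError("column index out of range")
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters

def cell_address(row, column):
    """Build a cell address such as B3 from 1-based row and column indexes."""
    if not 1 <= row <= MAX_ROWS:
        raise ValueError("row number out of range")
    return f"{column_letters(column)}{row}"

print(cell_address(3, 2))            # B3
print(column_letters(MAX_COLUMNS))   # XFD
print(MAX_ROWS * MAX_COLUMNS)        # 17179869184
```

The last column is XFD, and the product of the two limits is 17,179,869,184 cells per worksheet.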
For example, we can use a formula to find the average of the numbers in column D from
row 2 to row 7:
= AVERAGE(D2:D7)
The range of values to average is defined by D2:D7. The formula of the selected cell is
displayed in the formula bar, next to the Name Box.
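To make the calculation behind =AVERAGE(D2:D7) concrete, here is a minimal Python sketch that models a worksheet as a dictionary of cell addresses. The cell values and the helpers column_range and average are hypothetical and only mimic the idea of averaging a range.

```python
from statistics import mean

# Hypothetical worksheet contents: cell address -> value.
sheet = {"D2": 10, "D3": 20, "D4": 30, "D5": 40, "D6": 50, "D7": 60}

def column_range(column, first_row, last_row):
    """List the cell addresses in one column, e.g. D2:D7 -> D2, D3, ..., D7."""
    return [f"{column}{row}" for row in range(first_row, last_row + 1)]

def average(cells, addresses):
    """Arithmetic mean of the values stored at the given addresses."""
    return mean(cells[address] for address in addresses)

result = average(sheet, column_range("D", 2, 7))
print(result)  # 35
```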
Detailed Explanation-
Worksheet: The worksheet is the grid that appears after opening an Excel workbook; it is
where we perform any required operation.
Cell: The cell is the smallest unit of a worksheet. A cell is denoted by the combination of
its column letter and row number. Cell A1 is located in the first column and first
row. Each cell address is unique within a worksheet.
Active Cell: When we click on any cell, it becomes the active cell. The address of the active
cell is shown in the Name Box at the upper left corner of the sheet.
Row: Row is the horizontal collection of cells and is denoted by a number. On the left side of
the sheet, you can see the row bar that indicates all rows. Excel has 1,048,576 rows in total.
Column: The column is the vertical collection of cells and is denoted by alphabetic
characters. You will have a bar on the upper side of the worksheet consisting of alphabetic
characters starting from A, that is the column bar. Each character of this bar indicates
individual columns. Excel has 16,384 columns in total.
Title Bar: The Title bar is the horizontal bar that contains the name of the Excel file and is
located at the top of the workbook.
Quick Access Toolbar: The Quick Access Toolbar or QAT is a customized toolbar, located
at the left-upper side of the workbook. We gather all the frequently used commands here so
that there is no need to search for them.
Control Buttons: Control buttons are located at the upper-right side of the workbook and are
used for control purposes like minimizing, maximizing, and closing.
Ribbon: The Ribbon is the key interface in Excel that organizes and contains various
commands. It is divided into tabs, each housing groups of related commands. It was first
introduced in Excel 2007 and is available in all the latest versions including Excel 365.
Formula Bar: Formula bar is located below the ribbon. We can insert, modify, and delete
any value or formula in Excel from this bar. We can also see the formula of any cell in this
bar.
Name Box: The Name Box is on the left side of the Formula Bar. It shows the address of the
active cell or the name of a range. We can also go to a desired cell or select a range by
typing the cell reference or name in this box.
Scroll Bar: The scroll bar is used to navigate the Excel worksheet in 4 directions. There are
two scroll bars: the horizontal scroll bar for left and right, and the vertical scroll bar for up
and down directions.
Sheet Tab: The sheet tab contains the names of all available sheets on the workbook. We can
also create new sheets from there. It is also called the leaf bar. It is located at the bottom left
corner of a workbook above the Status Bar.
Status Bar: The status bar is a horizontal bar located at the bottom of the workbook. It
indicates the current status of the selected cell and other mathematical calculations like sum,
average, count, etc.
Zoom Slider: It refers to the zoom adjustment of Excel workbooks that ranges from 10% to
400%. It is located at the bottom-right corner of the Excel workbook.
View Buttons: These buttons switch between different ways to present the workbook in Excel.
There are three modes: Normal, Page Layout, and Page Break Preview.
Excel's structure is made of two pieces, the Ribbon and the Sheet.
Have a look at the picture below. The Ribbon is marked with a red rectangle and the Sheet is
marked with a yellow rectangle:
First, let's start with explaining the Ribbon.
The Ribbon provides shortcuts to Excel commands. A command is an action that allows you
to make something happen. This can for example be to: insert a table, change the font size, or
to change the color of a cell.
The Ribbon may look crowded and hard to understand at first. Don't be scared; it will
become easier to navigate and use as you learn more. Most of the time we tend to use the
same functionalities over and over again.
The Ribbon is made up of the App launcher, Tabs, Groups and Commands. In this
section we will explain the different parts of the Ribbon.
App launcher
The App launcher icon has nine dots and is called the Office 365 navigation bar. It allows
you to access the different parts of the Office 365 suite, such as Word, PowerPoint and
Outlook. App launcher can be used to switch seamlessly between the Office 365 applications.
Tabs
The tab is a menu with sub divisions sorted into groups. The tabs allow users to quickly
navigate between options of menus which display different groups of functionality.
Groups
The groups are sets of related commands. The groups are separated by a thin vertical
line.
Commands
Now, let's have a look at the Sheet. Soon you will be able to understand the relationship
between the Ribbon and the Sheet, and you can make things happen.
The Sheet is a set of rows and columns. It forms the same grid pattern as we have in math
exercise books. The rectangular boxes formed by the pattern are called cells.
Multiple Sheets
You start with one Sheet by default when you create a new workbook. You can have many
sheets in a workbook. New sheets can be added and removed. Sheets can be named to make
it easier to work with data sets.
First, click the plus icon, shown in the picture below, to create two new sheets:
Tip: You can use the hotkey Shift + F11 to create new sheets. Try it!
Second, right click with your mouse on the relevant sheet and click rename:
Third, enter useful names for the three sheets:
In this example we used the names Data Visualization, Data Structure and Raw Data. This
is a typical structure when you are working with data.
Two key terms, Formula and Function, describe the tools that work on a dataset and give
accurate results in less time.
Excel formulas are expressions that compute values, usually from data entered in
the worksheet.
Functions are built-in formulas in Microsoft Excel that solve complex mathematical and
statistical problems.
We have a dataset of vegetable costs from a week in a household. Using this, we explore
essential Excel functions to analyze the data effectively.
1. Sum Function
This function adds all values within a selected range, helping us calculate the total cost of
vegetables.
Syntax:
=SUM(number1, [number2], ...)
Where,
number1, [number2]: are the numbers, cell references (e.g., C3:C8), or ranges to add
together. Use this to calculate the total of values, like summing vegetable costs in
C3:C8
2. Max Function
We use this to identify the highest value in a range, such as finding the most expensive
vegetable.
Syntax:
=MAX(number1, [number2], ...)
Where,
number1, [number2]: are the numbers or range (e.g., C5:C12) to evaluate. Apply this to
find the highest value, such as the most expensive item in a list.
3. Min Function
This helps us find the lowest value in a range, useful for spotting the cheapest item.
Syntax:
=MIN(number1, [number2], ...)
Where,
number1, [number2]: are the numbers or range (e.g., C5:C10) to evaluate. Use this to
determine the lowest value, like the cheapest vegetable.
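For readers who know some Python, the SUM, MAX, and MIN functions above have close equivalents in the built-in sum, max, and min. The vegetable names and costs below are hypothetical sample data, not the dataset the text refers to.

```python
# Hypothetical weekly vegetable costs.
vegetable_costs = {"potato": 30, "tomato": 45, "brinjal": 25, "onion": 40}

total_cost = sum(vegetable_costs.values())    # like =SUM(range)
highest_cost = max(vegetable_costs.values())  # like =MAX(range)
lowest_cost = min(vegetable_costs.values())   # like =MIN(range)

# Looking up which vegetable has the highest or lowest cost.
most_expensive = max(vegetable_costs, key=vegetable_costs.get)
cheapest = min(vegetable_costs, key=vegetable_costs.get)

print(total_cost, highest_cost, lowest_cost, most_expensive, cheapest)
```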
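4. Average Function
This returns the arithmetic mean of the values in a range, such as the average cost of a
vegetable.
Syntax:
=AVERAGE(number1, [number2], ...)
5. Count Function
This counts the number of cells that contain numbers in a range, aiding in data tracking.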
Syntax:
=COUNT(number1, [number2], ...)
Where,
number1, [number2]: are the numbers or range (e.g., C5:C10) to count.
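One detail of COUNT is that, over a range, it counts only cells that hold numbers; text, blanks, and logical values are skipped. The sketch below mimics that behavior in Python. The function count_numbers and the sample cells are hypothetical.

```python
def count_numbers(cells):
    """Count numeric cells only, skipping text, blanks (None), and booleans."""
    return sum(
        1
        for value in cells
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    )

# Hypothetical range with a mix of numbers, text, a blank, and a logical value.
cells = [30, "n/a", None, 45.5, True, 25]
numeric_count = count_numbers(cells)
print(numeric_count)  # 3
```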
6. Len Function
We use this to determine the number of characters in a text string, useful for data
validation.
Syntax:
=LEN(text)
Where,
text is a cell reference (e.g., A5) containing the text string to measure. Apply this to
count characters, like the length of "brinjal".
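Python's built-in len behaves much like LEN for plain text, and a length check is a simple form of data validation. The helper has_expected_length and the sample product code are hypothetical.

```python
def has_expected_length(text, expected):
    """Validation helper: True when the text has exactly the expected length."""
    return len(text) == expected

name_length = len("brinjal")      # 7 characters
padded_length = len(" brinjal ")  # 9, because spaces count as characters

print(name_length, padded_length, has_expected_length("AB-1234", 7))
```

A length that is larger than expected is often a sign of stray leading or trailing spaces in the data.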
7. Sumif Function
This adds values in a range that meet a specific condition, enhancing selective calculations.
Syntax:
=SUMIF(range, criteria, [sum_range])
Where,
range is the cells to check (e.g., C5:C10), criteria is the condition (e.g., ">20"), and
[sum_range] (optional) is the cells to sum if different from range. Use this to add values
meeting a condition, like costs over 20.
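The logic of SUMIF can be sketched in Python as follows. This simplified version is an assumption-based illustration: the function sumif is hypothetical, it supports only numeric criteria such as ">20" or "<=40", and it ignores the text and wildcard criteria that Excel also accepts.

```python
import operator

# Two-character operators come first so ">=" is not mistaken for ">".
_OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "<>": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}

def sumif(check_range, criteria, sum_range=None):
    """Add the values whose matching check value satisfies a numeric criteria string."""
    if sum_range is None:
        sum_range = check_range
    for symbol, compare in _OPERATORS.items():
        if criteria.startswith(symbol):
            threshold = float(criteria[len(symbol):])
            break
    else:
        # No operator given: treat the criteria as "equal to this number".
        compare, threshold = operator.eq, float(criteria)
    return sum(s for c, s in zip(check_range, sum_range) if compare(c, threshold))

# Hypothetical costs and quantities.
costs = [30, 45, 25, 40, 15, 20]
quantities = [1, 2, 3, 4, 5, 6]

print(sumif(costs, ">20"))              # 140
print(sumif(costs, ">20", quantities))  # 10
```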
8. AverageIf Function
Excel is a powerful tool that allows users to manage and manipulate large amounts of data
quickly and efficiently. One of the key features of Excel is the ability to sort and filter data,
which can be especially useful for working with datasets that contain hundreds or even
thousands of rows of information. In this article, we'll explore how to use Excel's sorting and
filtering capabilities to organize and analyze data.
Sorting can be done in ascending or descending order based on a single column or multiple
columns. Filtering allows users to view specific data based on criteria, such as dates, text, or
numerical values. Excel also provides advanced filtering options, such as filtering by color or
by using complex criteria.
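Filtering by dates, text, or numerical values amounts to keeping only the rows that pass one or more tests. A minimal Python sketch of that idea follows; the order records and the function filter_rows are hypothetical.

```python
from datetime import date

# Hypothetical order records.
orders = [
    {"date": date(2025, 1, 5), "product": "Apple", "amount": 250},
    {"date": date(2025, 2, 11), "product": "Banana", "amount": 90},
    {"date": date(2025, 2, 20), "product": "Apple", "amount": 40},
]

def filter_rows(rows, **criteria):
    """Keep rows for which every column test returns True."""
    return [
        row
        for row in rows
        if all(test(row[column]) for column, test in criteria.items())
    ]

# Numeric filter: amounts above 50.
large_orders = filter_rows(orders, amount=lambda a: a > 50)

# Date and text filters combined: Apple orders placed in February.
february_apples = filter_rows(
    orders,
    date=lambda d: d.month == 2,
    product=lambda p: p == "Apple",
)

print(len(large_orders), february_apples)
```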
Sorting Data
Sorting Data by One Column
Sorting data by one column is a straightforward process that allows users to rearrange data
based on the values in a specific column. To sort data by one column, follow these steps:
Select the range of cells that you want to sort.
Click the "Data" tab on the Excel ribbon.
Click the "Sort A-Z" button to sort data in ascending order or the "Sort Z-A" button to
sort data in descending order. Excel will automatically sort the data based on the
values in the selected column, either in ascending or descending order.
Excel will now sort the data based on the values in the selected columns, in the order that
they were added. For example, if you sorted by column A first and then by column B, Excel
would sort the data based on the values in column A first, and then sort the data within each
group of values in column A based on the values in column B.
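The same sorting behavior can be sketched in Python with the built-in sorted function. Sorting by a tuple of columns reproduces the multi-column case described above: rows are ordered by the first column, and ties are ordered by the second. The sample rows and column names are hypothetical.

```python
# Hypothetical rows: "region" plays the role of column A, "rep" of column B.
rows = [
    {"region": "South", "rep": "Dev", "sales": 200},
    {"region": "North", "rep": "Asha", "sales": 150},
    {"region": "South", "rep": "Bina", "sales": 300},
    {"region": "North", "rep": "Chen", "sales": 150},
]

# Single-column sort, descending (like "Sort Z-A" on the sales column).
by_sales_desc = sorted(rows, key=lambda r: r["sales"], reverse=True)

# Multi-column sort: by region first, then by rep within each region.
by_region_then_rep = sorted(rows, key=lambda r: (r["region"], r["rep"]))

print([r["rep"] for r in by_sales_desc])
print([(r["region"], r["rep"]) for r in by_region_then_rep])
```

Python's sort is stable, so rows with equal sales keep their original relative order.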