0% found this document useful (0 votes)

5 views19 pages

OLAP in Descriptive Analytics

Uploaded by

adifa mahmud

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

5 views19 pages

OLAP in Descriptive Analytics

Uploaded by

adifa mahmud

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Nature of Data

Data is a collection of facts – usually obtained as the result of experiences, observations,

or experiments. Data are the main ingredient for any BI, data science, and business
analytics initiative. The raw material for what popular decision technologies produce—
information, insight, and knowledge. Data is the lowest level of abstraction (from which
information and knowledge are derived). Data quality and data integrity is critical to
analytics.

Metrics for Analytics Ready Data

▪ Data source reliability: The originality and appropriateness of the storage medium
where the data are obtained.
▪ Data content accuracy: Data are correct and are a good match for the analytics
problem.
▪ Data accessibility: The data are easily and readily obtainable.
▪ Data security and data privacy: Data security means that the data are secured to
allow only those people who have the authority and the need to access them and to
prevent anyone else from reaching them.
▪ Data richness: All required data elements are included in the data set. It also means
that the information content is complete (or near complete) to build a predictive
and/or prescriptive analytics model.
▪ Data consistency: The data are accurately collected and combined/merged.
▪ Data currency/data timeliness: The data should be up-to-date (or as recent/new as
they need to be) for a given analytics model.
▪ Data granularity: This requires that the variables and data values be defined at the
lowest (or as low as required) level of detail for the intended use of the data.
▪ Data validity: This is the term used to describe a match/mismatch between the
actual and expected data values of a given variable.
▪ Data relevancy: The variables in the data set are all relevant to the study being
conducted.

A Simple Taxonomy of Data

Data (datum in singular form) refers to a collection of facts usually obtained as the result of
experiments, observations, transactions, or experiences. Data can consist of numbers,
letters, words, images, voice recordings, and so on, as measurements of a set of variables
(characteristics of the subject or event that we are interested in studying). Data are often
viewed as the lowest level of abstraction from which information and then knowledge is
derived.
➢ At the highest level of abstraction, one can classify data as structured and
unstructured (or semi structured). Unstructured data/semi structured data are
composed of any combination of textual, imagery, voice, and Web content.
Structured data are what data mining algorithms use and can be classified as
categorical or numeric.
Unstructured/textual data – Targeted for humans to process/digest.
Structured data – Targeted for computers to process.
➢ The categorical data can be subdivided into nominal or ordinal data,
whereas numeric data can be subdivided into intervals or ratios.
➢ Nominal data contain measurements of simple codes assigned to objects as labels,
which are not measurements. For example, the variable marital status can be
generally categorized as (1) single, (2) married, and (3) divorced.
➢ Ordinal data contain codes assigned to objects or events as labels that also
represent the rank order among them. For example, the variable credit score can be
generally categorized as (1) low, (2) medium, or (3) high.
➢ Interval data are variables that can be measured on interval scales. A common
example of interval scale measurement is temperature on the Celsius scale.
➢ Ratio data include measurement variables commonly found in the physical sciences
and engineering. Mass, length, time, plane angle, energy, and electric charge are
examples of physical measures that are ratio scales.
Data Preprocessing Steps
A Summary of Data Preprocessing Tasks and Potential Methods

Statistical Modelling for Business Analytics

Descriptive analytics has two main branches: statistics and online analytics processing
(OLAP). OLAP is the term used for analyzing, characterizing, and summarizing structured
data stored in organizational databases (often stored in a data warehouse or in a data
mart) using cubes (i.e., multidimensional data structures that are created to extract a
subset of data values to answer a specific business question). The OLAP branch of
descriptive analytics has also been called business intelligence.
Statistics – A collection of mathematical techniques to characterize and interpret data.
Descriptive Statistics – Describing the data (as it is).
Inferential statistics – Drawing inferences about the population based on a sample data.
Relationship between Statistics and Descriptive Analytics

Descriptive Statistics for Descriptive Analytics

Descriptive statistics describes the basic characteristics of the data at hand, often one
variable at a time. Using formulas and numerical aggregations, descriptive statistics
summarizes the data in such a way that often meaningful and easily understandable
patterns emerge from the study. In business analytics, descriptive statistics plays a critical
role—it allows us to understand and explain/present our data in a meaningful manner
using aggregated numbers, data tables, or charts/graphs. In essence, descriptive statistics
helps us convert our numbers and symbols into meaningful representations for anyone to
understand and use. Such an understanding helps not only business users in their
decision-making processes but also analytics professionals and data scientists to
characterize and validate the data for other more sophisticated analytics tasks.
The methods in descriptive statistics can be classified as either measures for central
tendency or measures of dispersion.
Measures of Centrality Tendency (Also Called Measures
of Location or Centrality)
Measures of centrality are the mathematical methods by which we estimate or describe
central positioning of a given variable of interest. A measure of central tendency is a single
numerical value that aims to describe a set of data by simply identifying or estimating the
central position within the data.
Arithmetic Mean or Simple Average
The arithmetic mean (or simply mean or average) is the sum of all the values/observations
divided by the number of observations in the data set. It is by far the most popular and
most commonly used measure of central tendency. It is used with continuous or discrete
numeric data.

Median (The Number in the Middle)

The median is the measure of center value in a given data set. It is the number in the
middle of a given set of data that has been arranged/sorted in order of magnitude (either
ascending or descending). If the number of observations is an odd number, identifying the
median is very easy—just sort the observations based on their values and pick the value
right in the middle. If the number of observations is an even number, identify the two
middle values, and then take the simple average of these two values. The median is
meaningful and calculable for ratio, interval, and ordinal data types. Once determined, one-
half of the data points in the data is above and the other half is below the median. In
contrary to the mean, the median is not affected by outliers or skewed data.
Mode
The mode is the observation that occurs most frequently (the most frequent value in our
data set). The mode is most useful for data sets that contain a relatively small number of
unique values. Although it is a useful measure (especially for nominal data), mode is not a
very good representation of centrality, and therefore, it should not be used as the only
measure of central tendency for a given data set.
Which central tendency measure is the best?
— Use the mean when the data are not prone to outliers and there is no significant level of
skewness.
— Use the median when the data have outliers and/or it is ordinal in nature.
— Use the mode when the data are nominal.

Perhaps the best practice is to use all three together so that the central tendency of the
data set can be captured and represented from three perspectives.
It is a better practice to present descriptive statistics as a package—a combination of
centrality and dispersion measures—as opposed to a single measure such as mean.

Measures of Dispersion (Also Called Measures of Spread or

Decentrality)
Measures of dispersion are the mathematical methods used to estimate or describe the
degree of variation in a given variable of interest. They represent the numerical spread
(compactness or lack thereof) of a given data set.
Range
The range is perhaps the simplest measure of dispersion. It is the difference between the
largest and the smallest values in a given data set (i.e., variables). So, we calculate range
by simply identifying the smallest value in the data set (minimum), identifying the largest
value in the data set (maximum), and calculating the difference between them (range =
maximum- minimum).
Variance
A more comprehensive and sophisticated measure of dispersion is the variance. It is a
method used to calculate the deviation of all data points in a given data set from the
mean. To prevent the offsetting of negative and positive differences, the variance takes
into account the square of the distances from the mean. The formula for a data sample can
be written as
where n is the number of samples, x is the mean of the sample, and xi is the ith value in the
data set. The larger values of variance indicate more dispersion, whereas smaller values
indicate compression in the overall data set.
Standard Deviation
The standard deviation is also a measure of the spread of values within a set of data. The
standard deviation is calculated by simply taking the square root of the variations. The
following formula shows the calculation of standard deviation from a given sample of data
points.

Mean Absolute Deviation (MAD)

It is a simpler way to calculate the overall deviation from the mean. Specifically, the mean
absolute deviation is calculated by measuring the absolute values of the differences
between each data point and the mean and then summing them. This process provides a
measure of spread without being specific about the data point being lower or higher than
the mean. The following formula shows the calculation of the mean absolute deviation:

Quartiles and Interquartile Range

A quartile is a quarter of the number of data points given in a data set. Quartiles are
determined by first sorting the data and then splitting the sorted data into four disjoint
smaller data sets.
A common way of expressing quartiles is as an interquartile range, which describes the
difference between the third quartile (Q3) and the first quartile (Q1), telling us about the
range of the middle half of the scores in the distribution.
Box-and-Whiskers Plot
The box-and-whiskers plot (or simply a box plot) is a graphical illustration of several
descriptive statistics about a given data set. They can be either horizontal or vertical, but
vertical is the most common representation, especially in modern-day analytics software
products. The box plot shows the centrality (median and sometimes also mean) as well as
the dispersion (the density of the data within the middle half—drawn as a box between the
first and third quartiles), the minimum and maximum ranges (shown as extended lines from
the box, looking like whiskers, that are calculated as 1.5 times the upper or lower end of
the quartile box), and the outliers that are larger than the limits of the whiskers.
Understanding the Specifics about Box-and-Whiskers Plots

Shape of a Distribution
Distribution is the frequency of data points counted and plotted over a small number of
class labels or numerical ranges. A very well-known distribution is called normal
distribution, which is perfectly symmetric on both sides of the mean and has numerous
well-founded mathematical properties that make it a very useful tool for research and
practice.
There are two commonly used measures to calculate the shape characteristics of a
distribution: skewness and kurtosis.
A histogram (frequency plot) is often used to visually illustrate both skewness and kurtosis.
Skewness is a measure of asymmetry (sway) in a distribution of the data that portrays a
unimodal structure—only one peak exists in the distribution of the data. Because normal
distribution is a perfectly symmetric unimodal distribution, it does not have skewness; that
is, its skewness measure (i.e., the value of the coefficient of skewness) is equal to zero. The
skewness measure/value can be either positive or negative. If the distribution sways left
(i.e., the tail is on the right side and the mean is smaller than median), then it produces a
positive skewness measure; if the distribution sways right (i.e., the tail is on the left side
and the mean is larger than median), then it produces a negative skewness measure.

Kurtosis is another measure to use in characterizing the shape of a unimodal distribution.

As opposed to the sway in shape, kurtosis focuses more on characterizing the
peak/tall/skinny nature of the distribution. Specifically, kurtosis measures the degree to
which a distribution is more or less peaked than a normal distribution. Whereas a positive
kurtosis indicates a relatively peaked/tall distribution, a negative kurtosis indicates a
relatively flat/short distribution. As a reference point, a normal distribution has a kurtosis
of 3. The formula for kurtosis can be written as

Regression Modeling for Inferential Statistics

Regression, especially linear regression, is perhaps the most widely known and used
analytics technique in statistics. It is a part of inferential statistics. It is used to characterize
relationship between explanatory (input) and response (output) variable. It can be used
for—
Hypothesis testing (explanation)—investigating potential relationships between different
variables.
Forecasting (prediction)—estimating values of a response variable based on one or more
explanatory variables.
CORRELATION VERSUS REGRESSION
Correlation gives an estimate on the degree of association between the variables.
Regression attempts to describe the dependence of a response variable on one (or more)
explanatory variables.
Correlation is interested in the low-level relationships between two variables
Regression is concerned with the relationships between all explanatory variables and the
response variable.

SIMPLE VERSUS MULTIPLE REGRESSION

If the regression equation is built between one response variable and one explanatory
variable, then it is called simple regression.
Multiple regression is the extension of simple regression when the explanatory variables
are more than one.
In both cases, the relationship between the response variable and the explanatory
variable(s) is linear and additive in nature.

How do we develop linear regression models?

The simplest thing that one can do is to draw a
Scatter Plot where the y-axis represents the values of the response variable and the x-axis
represents the values of the explanatory variable. Simple regression analysis aims to find a
mathematical representation of this relationship.
Ordinary Least Squares (OLS) method aims to minimize the sum of squared residuals
(squared vertical distances between the observation and the regression point) and leads to
a mathematical expression for the estimated value of the regression line (which are known
as β parameters). For simple linear regression, the aforementioned relationship between
the response variable (y) and the explanatory variable(s) (x) can be shown as a simple
equation as follows:
y = β0 + β1x
In this equation, β0 is called the intercept, and β1 is called the slope.

How Do We Know If the Model Is Good Enough?

For the numerical assessment, three statistical measures are often used in evaluating the
fit of a regression model: R2(R- squared), the overall F-test, and the Root Mean Square
Error (RMSE). All three of these measures are based on the sums of the square errors (how
far the data are from the mean and how far the data are from the model’s predicted
values). Of the three, R2 has the most useful and understandable meaning because of its
intuitive scale.

A Process Flow for Developing Regression Models

Regression Modelling Assumptions

1. Linearity: This assumption states that the relationship between the response
variable and the explanatory variables are linear.
2. Independence (of errors): This assumption states that the errors of the response
variable are uncorrelated with each other.
3. Normality (of errors): This assumption states that the errors of the response variable
are normally distributed.
4. Constant variance (of errors): This assumption, also called homoscedasticity, states
that the response variables have the same variance in their error regardless of the
values of the explanatory variables.
5. Multicollinearity: This assumption states that the explanatory variables are not
correlated (i.e., do not replicate the same but provide a different perspective of the
information needed for the model).

Logistic Regression Modelling

A very popular, statistically sound, probability-based classification algorithm that
employs supervised learning. It is a complement to linear regression and linear
discriminant analysis methods. Logistic regression differs from linear regression with
one major point: its output (response variable) is a class as opposed to a numerical
variable.
If there is only one predictor variable and one predicted variable, the method is called
simple logistic regression (similar to calling linear regression models with only one
independent variable simple linear regression).
The Logistic Function

The logistic function, is the core of logistic regression, which can take values only
between 0 and 1. The following equation is a simple mathematical representation of
this function:
Time Series Forecasting
A time series is a sequence of data points of the variable of interest, measured and
represented at successive points in time spaced at uniform time intervals. Examples of time
series include monthly rain volumes in a geographic area, the daily closing value of the
stock market indexes.
Time-series forecasting is the use of mathematical modeling to predict future values of the
variable of interest based on previously observed values.
The time-series plots/charts look and feel very similar to simple linear regression.
In time series there are two variables: the response variable and the time variable
presented in a scatter plot. Beyond this appearance similarity, there is hardly any other
commonality between the two.

Business Reporting
Information is essentially the contextualization of data. Decision makers need information
to make accurate and timely decisions. The information is usually provided to decision
makers in the form of a written report (digital or on paper), or orally.
A report is any communication artifact prepared with the specific intention of conveying
information in a digestible form to whoever needs it whenever and wherever.
It might be organized in a narrative, graphic, and/or tabular form, prepared periodically
(recurring) or on an as-needed (ad hoc) basis, referring to specific time periods, events,
occurrences, or subjects.

Business Report: A written document that contains information regarding business matters.
➢ Purpose: To improve managerial decisions
➢ Source: Data from inside and outside the organization (via the use of ETL—
Extract, Transform, Load)
➢ Format: text, tables, graphs/charts
➢ Distribution: in-print, email, portal/intranet
Data acquisition → Information generation → Decision making → Process management
❖ Different functions of Business Report:
✓ To ensure proper departmental functioning
✓ To provide information
✓ To provide the results of an analysis
✓ To persuade others to act
✓ To create an organizational memory (as part of a knowledge management
✓ system).
Types of Business Reports
Metric Management Reports: Help manage business performance through metrics (SLAs—
Service Level Agreement for externals; KPIs—Key Performance Indicator for internals).
Can be used as part of Six Sigma and/or TQM.
Dashboard-Type Reports: Graphical presentation of several performance indicators in a
single page using dials/gauges.
Balanced Scorecard–Type Reports: Include financial, customer, business process, and
learning & growth indicators.

Data Visualization
The use of visual representations to explore, make sense of, and communicate data.
Since, what is portrayed in visualizations is the information (aggregation, summarization,
and contextualization of data), not the data, the two terms data visualization and
information visualization are used interchangeably and synonymously.
Related to information graphics, scientific visualization, and statistical graphics. Includes
charts, graphs, illustrations.

A Brief History of Data Visualization

Data visualization can date back to the second century AD. Most developments have
occurred in the last two and a half centuries. William Playfair is widely credited as the
inventor of the modern chart, having created the first line and pie charts.
Until recently it was not recognized as a discipline.
Today’s most popular visual forms date back a few centuries.

Types of Charts and Graphs

LINE CHART
The line chart is perhaps the most frequently used graphical visuals for time-series data.
Line charts (or line graphs) show the relationship between two variables.
BAR CHART
The bar chart is among the most basic visuals used for data representation. They are
effective when you have nominal data or numerical data that split nicely into different
categories so you can quickly see comparative results and trends within your data.
PIE CHART
The pie chart is visually appealing, as the name implies, pie-looking charts. Because they
are so visually attractive, they are often incorrectly used. Pie charts should be used only to
illustrate relative proportions of a specific measure.
SCATTER PLOT
The scatter plot is often used to explore the relationship between two or three variables (in
2D or 3D visuals). Because scatter plots are visual exploration tools, translating more than
three variables into more than three dimensions is not easily achievable.
BUBBLE CHART
The bubble chart is often an enhanced version of scatter plots. Bubble charts, though, are
not a new visualization type; instead, they should be viewed as a technique to enrich data
illustrated in scatter plots (or even geographic maps).

Specialized Charts and Graphs

HISTOGRAM
A histogram looks just like a bar chart. Histograms are used to show the frequency
distribution of one variable or several variables.
GANTT CHART
A Gantt chart is a special case of horizontal bar charts used to portray project timelines,
project tasks/activity durations, and overlap among the tasks/activities.
PERT CHART (Program Evaluation and Review Technique)
The PERT chart (also called a network diagram) is developed primarily to simplify the
planning and scheduling of large and complex projects. A PERT chart shows precedence
relationships among project activities/tasks. It is composed of nodes (represented as
circles or rectangles) and edges (represented with directed arrows).
GEOGRAPHIC MAP
When the data set includes any kind of location data (e.g., physical addresses, postal
codes, state names or abbreviations, country names, latitude/longitude, or some type of
custom geographic encoding), it is better and more informative to see the data on a map.
Maps usually are used in conjunction with other charts and graphs rather than by
themselves.
BULLET
A bullet graph is often used to show progress toward a goal. This graph is essentially a
variation of a bar chart.
HEAT MAP
The heat map is a great visual to illustrate the comparison of continuous values across two
categories using color. The goal is to help the user quickly see where the intersection of
the categories is strongest and weakest in terms of numerical values of the measure being
analyzed.
HIGHLIGHT TABLE
The highlight table is intended to take heat maps one step further. In addition to showing
how data intersect by using color, highlight tables add a number on top to provide
additional detail.
TREE MAP
A tree map displays hierarchical (tree-structured) data as a set of nested rectangles. Each
branch of the tree is given a rectangle, which is then tiled with smaller rectangles
representing subbranches. A leaf node’s rectangle has an area proportional to a specified
dimension on the data.

Which Chart or Graph Should You Use?

Visual Analytics
A recently coined term that is often used loosely to mean nothing more than information
visualization. What is meant by visual analytics is the combination of visualization and
predictive analytics. Whereas information visualization is aimed at answering “What
happened?” and “What is happening?” and is closely associated with BI (routine reports,
scorecards, and dashboards), visual analytics is aimed at answering “Why is it happening?”
and “What is more likely to happen?” and is usually associated with business analytics
(forecasting, segmentation, correlation analysis). There is a strong move toward visual
analytics.

INFORMATION DASHBOARDS/PERFORMANCE DASHBOARDS

Dashboards provide visual displays of important information that is consolidated and
arranged on a single screen so that the information can be digested at a single glance and
easily drilled in and further explored.
Information dashboards are common components of most, if not all, BI or business
analytics platforms, business performance management systems, and performance
measurement software suites.

Dashboard Design
The fundamental challenge of dashboard design is to display all the required information
on a single screen, clearly and without distraction, in a manner that can be assimilated
quickly. The most distinctive feature of a dashboard is its three layers of information:
1. Monitoring: Graphical, abstracted data to monitor key performance metrics.
2. Analysis: Summarized dimensional data to analyze the root cause of problems.
3. Management: Detailed operational data that identify what actions to take to resolve
a problem.

What to look for in a dashboard

– Use of visual components to highlight data and exceptions that require action
– Transparent to the user, meaning that they require minimal training and are extremely
easy to use
– Combine data from a variety of systems into a single, summarized, unified view of the
business
– Enable drill-down or drill-through to underlying data sources or reports
– Present a dynamic, real-world view with timely data– Require little coding to implement,
deploy, and maintain

Best Practices in Dashboard Design

o Benchmark KPIs with Industry Standards
o Wrap the Metrics with Contextual Metadata
o Validate the Design by a Usability Specialist
o Prioritize and Rank Alerts and Exceptions
o Enrich Dashboard with Business-User Comments
o Present Information in Three Different Levels
o Pick the Right Visual Constructs
o Provide for Guided Analytics

Understanding Descriptive Analytics
No ratings yet
Understanding Descriptive Analytics
6 pages
Business Analytics Essentials Guide
No ratings yet
Business Analytics Essentials Guide
6 pages
Quantitative Data Analysis Techniques
No ratings yet
Quantitative Data Analysis Techniques
25 pages
Data Analysis Basics and Techniques
No ratings yet
Data Analysis Basics and Techniques
12 pages
Measurement Scales and Data Analysis
No ratings yet
Measurement Scales and Data Analysis
3 pages
Understanding Descriptive Statistics
No ratings yet
Understanding Descriptive Statistics
6 pages
Understanding Predictive Analytics Basics
No ratings yet
Understanding Predictive Analytics Basics
37 pages
Data Management in Modern Mathematics
No ratings yet
Data Management in Modern Mathematics
13 pages
Descriptive Statistics:: Module 2: Data Exploration
No ratings yet
Descriptive Statistics:: Module 2: Data Exploration
261 pages
Document 1
No ratings yet
Document 1
11 pages
Understanding Descriptive Analytics
No ratings yet
Understanding Descriptive Analytics
18 pages
Data Analytics Techniques Overview
No ratings yet
Data Analytics Techniques Overview
35 pages
Statistics Lesson 1
No ratings yet
Statistics Lesson 1
7 pages
Understanding Mesokurtosis in Statistics
No ratings yet
Understanding Mesokurtosis in Statistics
43 pages
Foundations of Data Science Concepts
No ratings yet
Foundations of Data Science Concepts
14 pages
Descriptive Statistics Explained
No ratings yet
Descriptive Statistics Explained
29 pages
Module 1 Introduction
No ratings yet
Module 1 Introduction
9 pages
Understanding Data Attributes and Measures
No ratings yet
Understanding Data Attributes and Measures
7 pages
IDS Unit 2
No ratings yet
IDS Unit 2
7 pages
Understanding Descriptive Statistics
No ratings yet
Understanding Descriptive Statistics
29 pages
Exploring Summary Statistics in Data Analysis
No ratings yet
Exploring Summary Statistics in Data Analysis
2 pages
Descriptive Statistics in Data Science
No ratings yet
Descriptive Statistics in Data Science
9 pages
Info 561 Notes
No ratings yet
Info 561 Notes
8 pages
Statistics Fundamentals for Data Science
No ratings yet
Statistics Fundamentals for Data Science
2 pages
Understanding Descriptive Statistics
No ratings yet
Understanding Descriptive Statistics
14 pages
Understanding Data Processing and Analysis
No ratings yet
Understanding Data Processing and Analysis
132 pages
Basic Statistics
100% (10)
Basic Statistics
73 pages
Choosing Measurement Scales in Data Analysis
No ratings yet
Choosing Measurement Scales in Data Analysis
12 pages
Understanding Applied Statistics Basics
No ratings yet
Understanding Applied Statistics Basics
58 pages
Statistical Concepts for Data Science
No ratings yet
Statistical Concepts for Data Science
26 pages
Descriptive Statistics with Python Guide
No ratings yet
Descriptive Statistics with Python Guide
8 pages
Understanding Central Tendency Measures
No ratings yet
Understanding Central Tendency Measures
10 pages
Understanding Statistics and Data Types
No ratings yet
Understanding Statistics and Data Types
6 pages
Understanding Data and Statistics Basics
No ratings yet
Understanding Data and Statistics Basics
8 pages
Understanding Descriptive Analytics
No ratings yet
Understanding Descriptive Analytics
15 pages
Data Mining and Statistical Analysis
No ratings yet
Data Mining and Statistical Analysis
12 pages
Understanding Data Types in Statistics
No ratings yet
Understanding Data Types in Statistics
2 pages
Understanding Descriptive Statistics
No ratings yet
Understanding Descriptive Statistics
7 pages
Research Methodology Manual Unit 4
No ratings yet
Research Methodology Manual Unit 4
21 pages
Data Understanding and Preparation Guide
No ratings yet
Data Understanding and Preparation Guide
50 pages
Lean Six Sigma Measurement Techniques
No ratings yet
Lean Six Sigma Measurement Techniques
20 pages
Origin and Meaning of "Statista"
No ratings yet
Origin and Meaning of "Statista"
25 pages
Understanding Statistics and Data Analysis
No ratings yet
Understanding Statistics and Data Analysis
25 pages
Understanding Exploratory Data Analysis
No ratings yet
Understanding Exploratory Data Analysis
33 pages
Class Notes Descrpitives and Inferential Analysis
No ratings yet
Class Notes Descrpitives and Inferential Analysis
8 pages
Data Sets and Attribute Types Explained
No ratings yet
Data Sets and Attribute Types Explained
41 pages
Myrsu Sign-In and Statistics Overview
No ratings yet
Myrsu Sign-In and Statistics Overview
16 pages
Understanding Statistics and Data Analysis
No ratings yet
Understanding Statistics and Data Analysis
7 pages
Introduction to Statistics Concepts
No ratings yet
Introduction to Statistics Concepts
50 pages
Data Analysis: Types & Visualizations
No ratings yet
Data Analysis: Types & Visualizations
11 pages
Ilovepdf Merged
No ratings yet
Ilovepdf Merged
176 pages
Understanding Statistics Fundamentals
No ratings yet
Understanding Statistics Fundamentals
6 pages
Data Management in Statistics Overview
100% (1)
Data Management in Statistics Overview
51 pages
M1 & M2 Supplementaries
No ratings yet
M1 & M2 Supplementaries
52 pages
Essential Statistics for Data Science
No ratings yet
Essential Statistics for Data Science
30 pages
Understanding Data Management and Statistics
No ratings yet
Understanding Data Management and Statistics
2 pages
Understanding Data Objects and Attributes
No ratings yet
Understanding Data Objects and Attributes
42 pages
Applied Statistics for Business Students
No ratings yet
Applied Statistics for Business Students
14 pages
Descriptive Analytics Overview
No ratings yet
Descriptive Analytics Overview
25 pages
HRM Practices at BRAC Bank Bangladesh
No ratings yet
HRM Practices at BRAC Bank Bangladesh
73 pages
HSC ICT Course Overview and Topics
No ratings yet
HSC ICT Course Overview and Topics
3 pages
Communication Mediums and Models MCQs
No ratings yet
Communication Mediums and Models MCQs
64 pages
Tips for Reducing Belly Fat and BP
No ratings yet
Tips for Reducing Belly Fat and BP
1 page
Ansys Pressure Vessel Stress Analysis
No ratings yet
Ansys Pressure Vessel Stress Analysis
4 pages
Data Scientist Skills & Experience Overview
No ratings yet
Data Scientist Skills & Experience Overview
1 page
Exploring The Role of Visualization in Climate Change Communication An Audience Perspective
No ratings yet
Exploring The Role of Visualization in Climate Change Communication An Audience Perspective
130 pages
GILP Decision Scientist Internship 2025
No ratings yet
GILP Decision Scientist Internship 2025
3 pages
Data Analyst 3-Month Learning Plan
No ratings yet
Data Analyst 3-Month Learning Plan
3 pages
Salary Prediction in Data Science
No ratings yet
Salary Prediction in Data Science
6 pages
Introduction to Data Visualization Techniques
No ratings yet
Introduction to Data Visualization Techniques
46 pages
AI Predictive System Project Report
No ratings yet
AI Predictive System Project Report
2 pages
Road Traffic Accident Data Analysis
No ratings yet
Road Traffic Accident Data Analysis
5 pages
DMiner: Dashboard Design Automation
No ratings yet
DMiner: Dashboard Design Automation
14 pages
Data Analyst Internship Resume
No ratings yet
Data Analyst Internship Resume
1 page
Data Analyst Resume - Neeraj Kumar
No ratings yet
Data Analyst Resume - Neeraj Kumar
2 pages
OmniRate View 2 Operator Manual
No ratings yet
OmniRate View 2 Operator Manual
22 pages
NoSQL Databases Course Overview
No ratings yet
NoSQL Databases Course Overview
7 pages
Data Scientist Profile: Guillermo Mares
No ratings yet
Data Scientist Profile: Guillermo Mares
1 page
System Platform Getting Started
No ratings yet
System Platform Getting Started
58 pages
Essential Python Data Viz Libraries
No ratings yet
Essential Python Data Viz Libraries
16 pages
AI in Business Decision Making Guide
No ratings yet
AI in Business Decision Making Guide
10 pages
Business Analytics Lecture Notes MBA
No ratings yet
Business Analytics Lecture Notes MBA
56 pages
Introduction to Data Visualization with Matplotlib
No ratings yet
Introduction to Data Visualization with Matplotlib
11 pages
Amazon Data Architecture & Governance
No ratings yet
Amazon Data Architecture & Governance
6 pages
Emergingresearchareasbyraphaelebiefung PDF
No ratings yet
Emergingresearchareasbyraphaelebiefung PDF
17 pages
Palantir Gotham: Dynamic Ontology Insights
100% (1)
Palantir Gotham: Dynamic Ontology Insights
8 pages
Experienced Power BI Developer Resume
No ratings yet
Experienced Power BI Developer Resume
7 pages
B. Tech First Year Curriculum 2023-24
No ratings yet
B. Tech First Year Curriculum 2023-24
14 pages
Data Analytics Trends for 2021
No ratings yet
Data Analytics Trends for 2021
12 pages
Benefits of an Analytic Sandbox
No ratings yet
Benefits of an Analytic Sandbox
10 pages
Movie Dataset Analysis Insights
No ratings yet
Movie Dataset Analysis Insights
15 pages
Image Captioning System for Fashion
No ratings yet
Image Captioning System for Fashion
18 pages
Global Unicorn Metrics Internship Report
No ratings yet
Global Unicorn Metrics Internship Report
27 pages

OLAP in Descriptive Analytics

Uploaded by

OLAP in Descriptive Analytics

Uploaded by

Nature of Data

Data is a collection of facts – usually obtained as the result of experiences, observations,

Metrics for Analytics Ready Data

A Simple Taxonomy of Data

Statistical Modelling for Business Analytics

Descriptive Statistics for Descriptive Analytics

Median (The Number in the Middle)

Measures of Dispersion (Also Called Measures of Spread or

Mean Absolute Deviation (MAD)

Quartiles and Interquartile Range

Kurtosis is another measure to use in characterizing the shape of a unimodal distribution.

Regression Modeling for Inferential Statistics

SIMPLE VERSUS MULTIPLE REGRESSION

How do we develop linear regression models?

How Do We Know If the Model Is Good Enough?

A Process Flow for Developing Regression Models

Regression Modelling Assumptions

Logistic Regression Modelling

A Brief History of Data Visualization

Types of Charts and Graphs

Specialized Charts and Graphs

Which Chart or Graph Should You Use?

INFORMATION DASHBOARDS/PERFORMANCE DASHBOARDS

What to look for in a dashboard

Best Practices in Dashboard Design

You might also like