Dads302 - Exploratory Data Analysis

The document discusses the role of Data Science across various domains, emphasizing its importance in healthcare, finance, retail, and marketing. It also covers measures of dispersion in statistics, data visualization techniques, feature selection methods, and the concept of factor analysis. Each section provides insights into how data science and statistical methods can enhance decision-making and improve outcomes in different fields.

Uploaded by

dixitshetty49

NAME: TANISHQ SONKER

ROLL NO.: 2414503852

PROGRAM: MASTER OF BUSINESS ADMINISTRATION (MBA)
SEMESTER: III
COURSE CODE & NAME: DADS302- EXPLORATORY DATA ANALYSIS

Assignment Set – 1
Questions

1. What is Data Science? Discuss the role of Data Science in various Domains.

Data Science is an interdisciplinary field that combines statistical methods, mathematical algorithms, and
computational technology to extract meaningful insights and knowledge from structured and unstructured data.
At its core, it is the practice of using data to answer questions, predict future trends, and drive strategic decision-
making.

The field is often visualized as the intersection of three primary pillars: Mathematics/Statistics, Computer
Science/Programming, and Domain Expertise. Without domain expertise, data science is merely a technical
exercise; with it, data science becomes a powerful engine for innovation.

The Core Components of Data Science: -

a. Data Collection & Cleaning: Gathering raw data from various sources (databases, APIs, sensors) and
refining it by removing noise and inconsistencies.
b. Exploratory Data Analysis (EDA): Using visual and statistical tools to understand patterns, detect outliers,
and test hypotheses.
c. Modelling & Machine Learning: Applying algorithms to the data to create predictive models or identify
complex relationships.
d. Communication: Translating technical findings into actionable insights for non-technical stakeholders
through data visualization and storytelling.

The Role of Data Science Across Various Domains: -

Data science is not confined to a single industry; its applications are universal. By leveraging big data,
organizations can shift from reactive decision-making to proactive strategy.

i. Healthcare:
Data science has revolutionized patient care and medical research.
• Predictive Diagnostics: Algorithms analyze medical imaging (X-rays, MRIs) to detect anomalies like
tumors, in some tasks with accuracy comparable to that of trained specialists.
• Drug Discovery: Instead of years of trial and error, data science simulates chemical interactions to speed
up the development of new medications.
• Personalized Medicine: Genomic data analysis allows doctors to tailor treatments to a patient's specific
genetic profile.

ii. Finance and Banking:

In the financial sector, where risk and speed are paramount, data science is essential.
• Fraud Detection: Real-time monitoring of transactions uses machine learning to identify suspicious
patterns and prevent unauthorized activity instantly.
• Algorithmic Trading: High-frequency trading models analyze market data in milliseconds to execute
trades at the most profitable moments.
• Credit Scoring: Beyond traditional history, banks now use alternative data (utility payments, social media
behavior) to assess the creditworthiness of individuals.

iii. E-commerce and Retail:

Retailers use data to optimize the entire customer journey.
• Recommendation Engines: Platforms like Amazon or Netflix use collaborative filtering to suggest
products based on a user's browsing history and the behaviour of similar customers.
• Inventory Management: Predictive analytics forecast demand, ensuring that popular products are stocked
while reducing the waste of overstocking.
• Dynamic Pricing: Algorithms adjust prices in real-time based on competitor pricing, demand surges, and
inventory levels.

iv. Marketing and Social Media:

Data science allows for hyper-targeted communication.
• Sentiment Analysis: Brands analyse social media comments and reviews to gauge public opinion about
their products and adjust their messaging accordingly.
• Churn Prediction: Companies identify customers who are likely to stop using a service (like a
subscription) and offer targeted incentives to retain them.

Summary of Impact: -
| Domain | Primary Application | Key Benefit |
|---|---|---|
| Healthcare | Disease Prediction | Improved Patient Outcomes |
| Finance | Fraud Prevention | Risk Mitigation |
| Retail | Recommender Systems | Increased Sales/Engagement |
| Manufacturing | Predictive Maintenance | Reduced Downtime |
| Logistics | Route Optimization | Cost & Time Efficiency |

In conclusion, Data Science acts as a bridge between raw information and intelligent action. By transforming vast
quantities of data into strategic assets, it enables organizations across all domains to solve complex problems and
stay competitive in an increasingly data-driven world.

2. Explain various measures of dispersion in detail using specific examples.

In descriptive statistics, **measures of dispersion** (or variability) describe how spread out or scattered the data
points are within a dataset. While measures of central tendency like the mean tell us where the "center" of the
data is, dispersion tells us how much the individual values deviate from that center.

To illustrate these concepts, imagine two branches of **DOCA Wellness** in Bangalore. Both have an average
daily sales figure of ₹5,000, but their consistency differs:

Branch A: Sales are always between ₹4,800 and ₹5,200.

Branch B: Sales fluctuate between ₹1,000 and ₹9,000.

Dispersion helps us quantify this difference in risk and reliability.

i. Range: -
The simplest measure of dispersion, the Range, is the difference between the maximum and minimum values in
a dataset.

Formula: Range = Max - Min

Example: In Branch B, if the highest sales day is ₹9,000 and the lowest is ₹1,000, the range is ₹8,000.
Limitation: It is highly sensitive to outliers. A single exceptionally good or bad day can drastically change the
range without reflecting the typical spread.

ii. Interquartile Range (IQR):

The IQR focuses on the middle 50% of the data, making it far more robust against outliers than the range. It
measures the distance between the 25th percentile (Q1) and the 75th percentile (Q3).

Formula: IQR = Q3 - Q1
Example: If you rank the sales of Branch B for a month and find that Q1 is ₹3,000 and Q3 is ₹7,000, the IQR is ₹4,000.
This tells you where the "bulk" of your business happens.
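
As a rough illustration of the range and IQR calculations, the sketch below uses made-up daily sales figures for Branch B (the numbers are assumptions chosen to match the example, not real data) and Python's standard `statistics` module. The `method="inclusive"` setting treats the list as the complete dataset rather than a sample.

```python
import statistics

# Hypothetical daily sales (in rupees) for Branch B; values are illustrative only.
branch_b_sales = [1000, 2500, 3000, 4000, 5000, 6000, 7000, 7500, 9000]

# Range: distance between the largest and smallest observation.
sales_range = max(branch_b_sales) - min(branch_b_sales)

# statistics.quantiles with n=4 returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(branch_b_sales, n=4, method="inclusive")
iqr = q3 - q1

print(f"Range = {sales_range}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

A single extreme day changes the range immediately, whereas the IQR moves only if the middle half of the data shifts.
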
iii. Variance:
Variance measures the average squared deviation of each data point from the mean. Squaring the differences
ensures that positive and negative deviations don't cancel each other out.
Formula: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Example: If Branch A has sales very close to the mean, $(x_i - \mu)$ will be small, resulting in a low variance. Branch
B will have a high variance because many days are far from the average.
Note: Because the units are squared (e.g., "rupees squared"), variance is difficult to interpret intuitively in
business terms.

iv. Standard Deviation:

The Standard Deviation is the square root of the variance. It is the most widely used measure of dispersion because
it is expressed in the same units as the original data.

Formula: $\sigma = \sqrt{\sigma^2}$
Example: If the standard deviation for Branch A is ₹150 and sales are roughly normally distributed, about two-thirds of days will fall within ₹150
of the ₹5,000 average. If Branch B's standard deviation is ₹2,500, the business is much more volatile.
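
To make the link between variance and standard deviation concrete, here is a minimal sketch using invented sales figures for the two branches (both lists are assumptions with a mean of 5,000; `describe_spread` is a hypothetical helper name). It uses the population formulas, dividing by N, to match the variance formula given earlier.

```python
import statistics

def describe_spread(sales):
    """Return (population variance, population standard deviation) for a list of sales."""
    variance = statistics.pvariance(sales)   # average squared deviation, divides by N
    std_dev = statistics.pstdev(sales)       # square root of the population variance
    return variance, std_dev

# Hypothetical daily sales (rupees); both branches average 5,000.
branch_a = [4800, 4900, 5000, 5100, 5200]
branch_b = [1000, 3000, 5000, 7000, 9000]

for name, sales in (("Branch A", branch_a), ("Branch B", branch_b)):
    variance, std_dev = describe_spread(sales)
    print(f"{name}: variance = {variance:.0f} rupees^2, std dev = {std_dev:.1f} rupees")
```

The variance is reported in squared rupees, which is why the standard deviation is usually the figure quoted in business reports.
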

v. Coefficient of Variation (CV):

The CV is a relative measure of dispersion, expressed as a percentage. It is used to compare the spread of two
datasets that have different units or widely different means.

Formula: $CV = \left( \frac{\text{Standard Deviation}}{\text{Mean}} \right) \times 100\%$
Example: If you want to compare the variability of sales in Bangalore (INR) versus a partner clinic in London
(GBP), the CV allows for a direct comparison of which operation is "more stable" relative to its size.
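
The comparison described above can be sketched as follows. The sales figures for both locations are invented for illustration, and `coefficient_of_variation` is a hypothetical helper name, not a standard library function.

```python
import statistics

def coefficient_of_variation(values):
    """Return the population standard deviation as a percentage of the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / mean * 100

# Hypothetical daily sales; the figures are illustrative, not real data.
bangalore_inr = [4800, 4900, 5000, 5100, 5200]   # rupees
london_gbp = [300, 450, 500, 550, 700]           # pounds

print(f"Bangalore CV: {coefficient_of_variation(bangalore_inr):.1f}%")
print(f"London CV:    {coefficient_of_variation(london_gbp):.1f}%")
```

Because the currency units cancel in the ratio, the two percentages can be compared directly even though the raw standard deviations cannot.
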

Summary Table of Measures:

Measure Best Used For... Sensitivity to Outliers
Range Quick, rough estimate of spread. Very High
IQR Data with extreme values/skewed data. Low
Variance Mathematical modelling and volatility. High
Standard Deviation Standard reporting and risk assessment. High
CV Comparing variability across different scales. High

3. Discuss various techniques used for Data Visualization.

Data visualization is the graphical representation of information and data. By using visual elements like charts,
graphs, and maps, data visualization techniques provide an accessible way to see and understand trends, outliers,
and patterns in data. For an MBA student, mastering these techniques is essential for translating complex analytics
into persuasive business narratives.

i. Comparison Techniques:
Comparison charts are used to compare the magnitude of values across different categories or over a period of
time.
• Bar Charts: The most common tool for comparing discrete categories. Vertical bars are typically used for
time series, while horizontal bars are better for long category labels.
• Grouped/Stacked Bar Charts: These allow for a secondary level of comparison. For example, comparing
sales across different branches of DOCA Wellness while simultaneously showing the breakdown of
services (Grooming vs. Daycare) within each branch.

ii. Composition Techniques:

These techniques show how individual parts make up a whole.
• Pie and Donut Charts: Used to show proportions among a small number of categories.
• Waffle Charts: These use a 10x10 grid in which each cell represents one percent, offering better
precision than pie charts for comparing similar values.
• Treemaps: These use nested rectangles to show hierarchies and part-to-whole relationships. They are
excellent for visualizing complex budgets or large product portfolios where area represents value.
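
The 10x10 waffle layout can be sketched in plain text to show the idea. The function name `waffle_grid`, the 62/38 split, and the symbols are all assumptions made for this example; a real chart would use a plotting library.

```python
def waffle_grid(shares, symbols):
    """Build a 10x10 text grid in which each cell stands for one percentage point.

    shares  -- integer percentages that must sum to 100
    symbols -- one single-character symbol per category
    """
    if sum(shares) != 100:
        raise ValueError("shares must sum to 100")
    # Lay out all 100 cells in order, then cut them into rows of ten.
    cells = "".join(symbol * share for share, symbol in zip(shares, symbols))
    return [cells[row * 10:(row + 1) * 10] for row in range(10)]

# Hypothetical revenue split: 62% Grooming (G) and 38% Daycare (D).
for row in waffle_grid([62, 38], ["G", "D"]):
    print(row)
```
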

iii. Distribution Techniques:

Distribution visualizations help data scientists understand the spread, frequency, and skewness of a dataset.
• Histograms: Used to show the frequency distribution of a continuous variable by grouping data into
"bins." This is vital for understanding things like customer age demographics.
• Box Plots (Whisker Plots): These are essential for identifying outliers and understanding the quartiles and
median of a dataset. In your St. Joseph’s University assignments, a box plot could effectively show the
variation in exam scores across different sections.
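
Behind both charts sit simple calculations, sketched below with the standard library only. The ages are invented, the helper names are hypothetical, and the 1.5 x IQR rule for flagging outliers is the common box-plot convention rather than something stated in this document.

```python
import statistics

def histogram_counts(values, bin_width, start):
    """Count how many values fall into each half-open bin [lower, lower + bin_width)."""
    counts = {}
    for value in values:
        lower = start + ((value - start) // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

def box_plot_summary(values):
    """Return the quartiles plus any values outside the 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

# Hypothetical customer ages; 95 is included to show outlier detection.
ages = [22, 25, 27, 31, 33, 34, 38, 41, 45, 95]

print(histogram_counts(ages, bin_width=10, start=20))
print(box_plot_summary(ages))
```

The histogram counts give the bar heights, while the box-plot summary gives the box edges, the median line, and the points drawn individually as outliers.
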

iv. Relationship Techniques:

These techniques explore correlations and dependencies between two or more variables.
• Scatter Plots: The primary tool for identifying relationships. For instance, plotting "Marketing Spend"
against "Revenue" to see if there is a positive correlation.
• Bubble Charts: An extension of the scatter plot where a third dimension is added through the size of the
bubbles.
• Heatmaps: These use colour intensity to represent the magnitude of a variable across two dimensions.
They are frequently used in business to show peak store hours or website "click" activity.

v. Trend and Connection Techniques:

These are used to visualize data that changes over time or across locations.
• Line Charts: The gold standard for time-series analysis. They help in spotting seasonal trends or long-
term growth in business KPIs.
• Choropleth Maps: Thematic maps where areas are shaded in proportion to a statistical variable (e.g., a
map of Bangalore shaded by the density of pet owners for your business plan).

Technique Selection Guide: -

| Goal | Recommended Technique |
|---|---|
| Comparing Categories | Bar Chart, Clustered Bar |
| Showing Trends over Time | Line Chart, Area Chart |
| Analysing Distributions | Histogram, Box Plot |
| Finding Correlations | Scatter Plot, Heatmap |
| Part-to-Whole | Waffle Chart, Treemap |

In Exploratory Data Analysis (EDA), visualization is often the first step to forming a hypothesis. By choosing
the right technique, you ensure that the insights you discover are not only accurate but also easily communicable
to stakeholders.

Assignment Set – 2
Questions

4. What is feature selection? Discuss any two feature selection techniques used to get optimal feature
combinations.

Feature Selection is the process of selecting a subset of the most relevant features (variables, columns, or
predictors) from the original dataset to be used in building a predictive model.

In a real-world dataset, you might have hundreds of columns (e.g., a customer database might have age, income,
shoe size, favourite colour, and zodiac sign). However, not all these features contribute to predicting the target
variable (e.g., "Will they buy a car?"). Some features are irrelevant (zodiac sign), and some are redundant (date
of birth and age provide the same information).
Selecting only the useful features brings three main benefits:
• Improves Accuracy: Removing irrelevant or misleading features can improve the model's performance.
• Reduces Overfitting: Fewer redundant features mean less opportunity for the model to make decisions
based on noise.
• Reduces Training Time: Fewer columns mean faster computation.

Techniques for Feature Selection: -

There are three main categories of feature selection: Filter, Wrapper, and Embedded methods. Below are detailed
discussions on two of the most widely used techniques:

Technique A: Filter Methods (specifically "Correlation Matrix")

Concept: Filter methods select features based on their statistical scores, independent of any machine learning
algorithm. They act as a "screening" step before the model is even built. The most common filter technique is
analysing the Correlation Matrix.

How it Works:
a. Correlation Check: The algorithm calculates the correlation coefficient (usually Pearson’s r) between
every pair of input variables.
b. Multicollinearity Detection: It looks for features that are highly correlated with each other (e.g., a
correlation > 0.85).
c. Elimination: If two features are highly correlated (e.g., "Monthly Salary" and "Annual Salary"), they
essentially provide the same information to the model. The technique keeps one and drops the other to
remove redundancy.
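
The three steps above can be sketched as a small pure-Python filter. The feature values are invented, the helper names `pearson_r` and `drop_correlated` are hypothetical, and keeping whichever feature appears first is just one simple tie-breaking choice.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(features, threshold=0.85):
    """Keep a feature only if it is not highly correlated with one already kept."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson_r(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical customer data; annual salary is exactly 12 x monthly salary.
features = {
    "monthly_salary": [30, 45, 52, 61, 75, 90],
    "annual_salary": [360, 540, 624, 732, 900, 1080],
    "age": [41, 23, 35, 52, 29, 46],
}

print(drop_correlated(features))
```

Here `annual_salary` is dropped because it is perfectly correlated with `monthly_salary`, while `age` is kept because its correlation with salary is weak.
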

Pros & Cons:

• Pros: Very fast and computationally cheap. Good for high-dimensional datasets.
• Cons: It scores features without training a model, so it can miss interactions in which features are only useful together.

Technique B: Wrapper Methods (specifically "Recursive Feature Elimination - RFE")

Concept: Wrapper methods are more computationally intensive. They "wrap" a specific machine learning model
around the feature selection process. They treat feature selection as a search problem, evaluating different
combinations of features to see which specific combination produces the best model performance.
How RFE Works:
a. Build Initial Model: The algorithm builds a model (e.g., a Linear Regression or Random Forest) using all
available features.
b. Rank Features: Based on the model's output, it calculates an "importance score" for each feature.
c. Prune: It identifies the least important feature and removes it from the dataset.
d. Repeat (Recurse): It rebuilds the model with the remaining features and repeats the process.
e. Final Selection: This cycle continues until the desired number of features is reached or the model
performance peaks.
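
A hand-rolled version of this loop, assuming NumPy is available, might look like the sketch below. It uses ordinary least squares as the wrapped model and the absolute size of standardized coefficients as the importance score; both are assumptions made for illustration, and the data and feature names are synthetic.

```python
import numpy as np

def recursive_feature_elimination(X, y, names, n_keep):
    """Repeatedly fit least squares and drop the feature with the smallest |coefficient|."""
    # Standardize columns so coefficient magnitudes are comparable across features.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    y = y - y.mean()
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        coefs, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
        weakest = int(np.argmin(np.abs(coefs)))
        remaining.pop(weakest)          # prune, then refit on the next pass
    return [names[i] for i in remaining]

# Synthetic data (an assumption for illustration): only the first two columns drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
names = ["income", "age", "shoe_size", "zodiac_index"]

print(recursive_feature_elimination(X, y, names, n_keep=2))
```

Each pass retrains the model on the surviving columns, which is what makes wrapper methods slower than filter methods on wide datasets.
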

Pros & Cons:

• Pros: Highly accurate because it considers how features interact with the specific model being used.
• Cons: Very slow and computationally expensive, as the model must be retrained dozens or hundreds of
times.

Summary Comparison: -
| Feature | Filter Method (Correlation) | Wrapper Method (RFE) |
|---|---|---|
| Speed | Fast | Slow |
| Model Dependency | Independent (General) | Dependent on a specific model |
| Best For | Removing irrelevant/redundant data | Finding a strong feature combination for a given model |

5. Discuss in detail the concept of Factor Analysis

The Concept of Factor Analysis:

Factor Analysis is a statistical method used to describe variability among observed, correlated variables in terms
of a potentially lower number of unobserved variables called factors.
In simpler terms, it is a data reduction technique. It takes a massive dataset with many variables and reduces them
into a smaller, manageable number of "underlying themes" or "dimensions" without losing much critical
information.

The Core Logic: Imagine you conduct a survey asking people 100 different questions about their shopping habits.
It is difficult to analyse 100 separate answers. However, you might notice that people who answer "Yes" to
Question 5 also answer "Yes" to Questions 10, 15, and 20. Factor analysis identifies these patterns and groups
these related variables into a single Factor (e.g., "Price Sensitivity").

Key Terminology in Factor Analysis:

To understand how Factor Analysis works, one must understand its specific vocabulary:

A. Latent Variables (Factors):

These are the underlying, unobservable constructs. We cannot measure them directly, but we measure them
through observable variables.
Example: "Intelligence" is a latent factor. We cannot measure it directly, so we measure "Math Score," "Verbal
Score," and "Logic Score" (Observed Variables) to infer it.

B. Factor Loading:
This is the correlation coefficient between an observed variable and the factor. It ranges from -1 to 1.
A high loading (e.g., 0.8) means the variable is strongly linked to that factor.
A low loading (e.g., 0.1) means the variable is irrelevant to that factor.

C. Eigenvalues:
An Eigenvalue represents the amount of variance in the total data that is explained by a single factor.
The Rule of Thumb: usually, only factors with an Eigenvalue greater than 1.0 are retained for analysis (Kaiser
Criterion).
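
As a small illustration of the Kaiser criterion, the sketch below computes the eigenvalues of a made-up correlation matrix for six survey items, assuming NumPy is available. The correlation values (0.7 within each group of three items, 0.1 across groups) are assumptions chosen so that two clear groups exist.

```python
import numpy as np

# Hypothetical correlation matrix for six survey items: items 0-2 correlate
# strongly with each other (0.7), as do items 3-5, with weak links (0.1) across groups.
within, across = 0.7, 0.1
block = np.full((3, 3), within)
cross = np.full((3, 3), across)
corr = np.block([[block, cross], [cross, block]])
np.fill_diagonal(corr, 1.0)

# eigvalsh is meant for symmetric matrices and returns eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(corr)[::-1]
retained = int(np.sum(eigenvalues > 1.0))       # Kaiser criterion

print("Eigenvalues:", np.round(eigenvalues, 2))
print("Share of variance:", np.round(eigenvalues / eigenvalues.sum(), 2))
print("Factors retained:", retained)
```

The eigenvalues sum to the number of variables (six here), so dividing each by that total gives the share of variance it explains; only the first two exceed 1.0 and would be retained.
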

D. Factor Rotation:
After the computer extracts factors, the raw data is often hard to interpret (variables might load on multiple
factors). "Rotation" (e.g., Varimax Rotation) is a mathematical step that clarifies the picture, maximizing the
distinction between factors so that each variable loads clearly on only one factor.

Types of Factor Analysis: -

There are two primary approaches:

i. Exploratory Factor Analysis (EFA):

Used when you do not have a pre-defined idea of the structure or how many dimensions are in a set of variables.
You are "exploring" the data to see what patterns emerge.
Use Case: A marketer launching a new product does a survey to find out what features customers value.

ii. Confirmatory Factor Analysis (CFA):

Used to test a specific hypothesis. You already have a theory (e.g., "I believe there are exactly 3 personality
traits"), and you use CFA to see if the data supports this theory.

Use Case: Validating a psychological test to ensure it accurately measures "Anxiety" and "Depression" as separate
constructs.

iv. Practical Example: Customer Satisfaction:

Imagine a restaurant wants to analyse customer satisfaction. They ask customers to rate 6 items on a scale of 1-
10:

1. Taste of food
2. Speed of service
3. Cleanliness of plates
4. Friendliness of staff
5. Temperature of food
6. Décor and ambiance
Without Factor Analysis: The manager has to analyse 6 different scores.
With Factor Analysis: The algorithm calculates the correlations and groups them:
Items 1, 3, and 5 are highly correlated → Factor A
Items 2, 4, and 6 are highly correlated → Factor B

Interpretation:
The manager names Factor A "Product Quality."
The manager names Factor B "Service Experience."

Now, instead of tracking 6 messy variables, the manager simply tracks 2 strategic factors.

v. Importance of Factor Analysis

• Simplification: It makes complex datasets understandable.

• Multicollinearity Removal: In regression analysis, highly correlated variables make coefficient estimates unstable. Factor
analysis mitigates this by combining them into a single score.
• Survey Design: It helps researchers remove redundant questions from surveys (if two questions measure
the exact same factor, one can be deleted).

6. Differentiate between Principal Component Analysis and Linear Discriminant Analysis

Both PCA and LDA are linear transformation techniques used for dimensionality reduction (reducing the number
of variables in a dataset), but they operate with fundamentally different goals.
i. The Fundamental Goal (Unsupervised vs. Supervised)
• PCA (Principal Component Analysis):
o Type: It is an Unsupervised Learning algorithm.
o Goal: It ignores class labels (it doesn't care if a data point belongs to "Group A" or "Group B"). Its
only goal is to find the "directions" (Principal Components) that capture the maximum variance
(spread) in the data. It tries to preserve the overall shape of the dataset while compressing it.
• LDA (Linear Discriminant Analysis):
o Type: It is a Supervised Learning algorithm.
o Goal: It explicitly uses class labels. Its goal is to find the axes that maximize the separation
between different classes while minimizing the spread within each class. It tries to make the classes
as distinct as possible.
ii. The Mechanics (Variance vs. Separation):
• PCA: Focuses on the dataset's internal structure. It projects data onto new axes where the global data
variance is highest.
o Analogy: Imagine taking a photo of a flying saucer. PCA tries to find the angle that shows the
saucer's widest shape (biggest size).
• LDA: Focuses on the boundaries between classes. It computes the "Between-Class Variance" and "Within-
Class Variance" and tries to maximize the ratio of the two.
o Analogy: Imagine taking a photo of a flying saucer and an airplane. LDA tries to find the angle
where the two objects look most distinct from each other, even if that angle makes them look
small.
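
The contrast between the two goals can be sketched numerically, assuming NumPy is available. The data below are synthetic and deliberately constructed so that the direction of greatest spread (feature 0) differs from the direction that separates the classes (feature 1); the LDA part uses the two-class Fisher discriminant, which is one standard way to compute it.

```python
import numpy as np

# Synthetic two-class data (an assumption for illustration): wide spread along
# feature 0, but the classes differ only along feature 1.
rng = np.random.default_rng(0)
n = 200
class_0 = np.column_stack([rng.normal(0, 5, n), rng.normal(-1, 0.3, n)])
class_1 = np.column_stack([rng.normal(0, 5, n), rng.normal(+1, 0.3, n)])
X = np.vstack([class_0, class_1])

# PCA: leading eigenvector of the covariance matrix; class labels are never used.
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X, rowvar=False))
pc1 = eigenvectors[:, -1]                     # eigh sorts eigenvalues ascending

# LDA (two-class Fisher discriminant): w is proportional to S_w^{-1} (m1 - m0).
m0, m1 = class_0.mean(axis=0), class_1.mean(axis=0)
s_within = (class_0 - m0).T @ (class_0 - m0) + (class_1 - m1).T @ (class_1 - m1)
ld1 = np.linalg.solve(s_within, m1 - m0)
ld1 /= np.linalg.norm(ld1)

print("PC1 direction:", np.round(pc1, 3))
print("LD1 direction:", np.round(ld1, 3))
```

On this data PC1 points almost entirely along feature 0 (the widest spread), while LD1 points almost entirely along feature 1 (the axis that separates the classes), mirroring the flying saucer analogy.
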
iii. Component Construction:
• PCA: Computes Principal Components (PC1, PC2...) which are orthogonal (perpendicular) to each other.
The number of components you can find is at most the number of original features.
• LDA: Computes Linear Discriminants (LD1, LD2...). The number of discriminants is limited to $C - 1$,
where $C$ is the number of classes. For example, if you are classifying "Cats vs. Dogs" (2 classes), LDA
can produce only 1 axis of separation, regardless of how many features (weight, height, colour) you have.

| Feature | Principal Component Analysis (PCA) | Linear Discriminant Analysis (LDA) |
|---|---|---|
| Learning Type | Unsupervised (Ignores labels) | Supervised (Uses labels) |
| Primary Focus | Maximizing Variance | Maximizing Class Separation |
| Use Case | Data compression, Visualization, Noise reduction | Classification problems |
| Output Limit | Equal to the number of features | Equal to (Number of Classes - 1) |
| Best When | You have high-dimensional data with no labels. | You have labeled data and need to build a classifier. |
