Basic Statistical Descriptions Guide

data visualization notes

Uploaded by

hassanmansuri570
Copyright
© All Rights Reserved

Basic Statistical Descriptions of Data - Complete Study Notes

This comprehensive guide covers all essential concepts from Unit II on Basic Statistical
Descriptions of Data, providing easy-to-understand explanations with proper diagrams and
examples to help you score full marks in your examination. [1]

Overview of Basic Statistical Descriptions


Basic statistical descriptions are fundamental tools for data preprocessing and analysis. They
help identify properties of data and highlight which values should be treated as noise or
outliers. The chapter covers seven main areas: [1]
Measures of central tendency (mean, median, mode, midrange)
Measures of variation (range, variance, standard deviation, IQR)
Measures of position (quartiles, percentiles, deciles)
Five-number summary
Box plots for visualization
Correlation analysis for relationships between variables
Distribution shapes and positions of central tendency measures

Measures of Central Tendency


Measures of central tendency represent the center point or typical value of a dataset, helping
describe the overall pattern by identifying a single representative value. [1]

1. Mean (Arithmetic Average)


The mean is the sum of all values divided by the number of values. [1]
Formula: $ \bar{x} = \frac{\sum x}{n} $
Example: For data {10, 20, 30}, Mean = (10 + 20 + 30) / 3 = 20 [1]
Figure: cartoon chart showing a student's progressive improvement in exam scores from 50% to over 90%.
Pros: Uses all data points; good for symmetrical distributions
Cons: Sensitive to outliers [1]
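
As a minimal sketch of the mean and its sensitivity to outliers, the snippet below uses Python's standard `statistics` module on the values 10, 20, 30 from the example. The extra value 1000 is a hypothetical outlier added purely for illustration; it does not come from the notes.

```python
from statistics import mean

data = [10, 20, 30]              # values from the worked example
print(mean(data))                # (10 + 20 + 30) / 3 = 20

# Hypothetical outlier (an assumption, not from the notes):
# one extreme value pulls the mean far away from the typical values.
with_outlier = data + [1000]
print(mean(with_outlier))        # (10 + 20 + 30 + 1000) / 4 = 265
```

Three of the four values are at most 30, yet the mean jumps to 265, which is why the mean is a poor summary when outliers are present.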

Weighted Arithmetic Mean


The weighted arithmetic mean assigns different levels of importance (weights) to each data
point. [1]
Formula: $ \bar{x}_w = \frac{\sum(w \times x)}{\sum w} $
Example: Final Grade Calculation [1]
Homework: 80 × 0.2 = 16
Midterm: 70 × 0.3 = 21
Final: 90 × 0.5 = 45
Final Grade = (16 + 21 + 45) / (0.2 + 0.3 + 0.5) = 82 / 1 = 82
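
The grade calculation can be sketched in Python as follows. The function name `weighted_mean` is a hypothetical helper chosen for this illustration; it simply applies the formula sum(w × x) / sum(w) to the scores and weights from the example.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

scores = [80, 70, 90]          # homework, midterm, final
weights = [0.2, 0.3, 0.5]      # weights from the example; they sum to 1
final_grade = weighted_mean(scores, weights)
print(round(final_grade, 2))   # 82.0
```

Dividing by the sum of the weights keeps the function correct even when the weights do not add up to 1, for example when credit hours are used as weights.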

GPA Calculation
GPA is essentially a weighted average where each course grade is multiplied by its credit hours.
[1]

Formula: GPA = (Total Grade Points Earned) ÷ (Total Credit Hours Attempted) [1]

2. Median
The median is the middle value in a set of ordered data values, separating the higher half from
the lower half. [1]
Calculation:
If n is odd: Median = middle value
If n is even: Median = average of two middle values [1]
Example:
For {10, 20, 30}, Median = 20
For {10, 20, 30, 40}, Median = (20 + 30)/2 = 25 [1]
Pros: Not affected by outliers; good for skewed data
Cons: Doesn't use all data points [1]
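
A short sketch of both median cases using the standard `statistics.median` function. The datasets are assumptions chosen so that the results match the medians 20 and 25 quoted above, and the value 1000 is a hypothetical outlier.

```python
from statistics import median

odd = [30, 10, 20]             # unsorted on purpose; median() sorts internally
even = [10, 20, 30, 40]
print(median(odd))             # n is odd: middle value, 20
print(median(even))            # n is even: (20 + 30) / 2 = 25.0

# Hypothetical outlier: the median stays at 25.0 even though
# the largest value changes from 40 to 1000.
skewed = [10, 20, 30, 1000]
print(median(skewed))
```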

Median for Grouped Data


For grouped data, use the interpolation formula: [1]
$ Median = L_1 + \left(\frac{\frac{N}{2} - (\sum freq)_1}{freq_{median}}\right) \times width $
Where:
L₁ = lower boundary of median interval
N = total number of values
(∑freq)₁ = cumulative frequency before median interval
freq_median = frequency of median interval
width = width of median interval
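
The interpolation formula can be sketched as a small function. `grouped_median` is a hypothetical helper name; it locates the median interval as the first class whose cumulative frequency reaches N/2, then applies the formula. The class intervals and frequencies are the ones used in the grouped-data calculation example later in these notes.

```python
def grouped_median(intervals, freqs):
    """Interpolated median for grouped data.

    intervals: list of (lower, upper) class boundaries in ascending order
    freqs:     frequency of each class, in the same order
    """
    half = sum(freqs) / 2              # N / 2
    cumulative = 0                     # cumulative frequency before the current class
    for (lower, upper), f in zip(intervals, freqs):
        if f > 0 and cumulative + f >= half:
            # L1 + ((N/2 - cumulative) / freq_median) * width
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("frequencies must contain at least one positive count")

intervals = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
freqs = [6, 20, 37, 10, 7]
print(round(grouped_median(intervals, freqs), 2))   # 47.57
```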

3. Mode
The mode is the value that occurs most frequently in the dataset. [1]
Types:
Unimodal: One mode
Bimodal: Two modes
Multimodal: Multiple modes
No mode: Each value occurs only once [1]
Example: For a dataset such as {10, 20, 20, 30}, Mode = 20 [1]
Empirical Relationship: For moderately skewed unimodal data:
mean - mode ≈ 3 × (mean - median) [1]
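
The mode types can be illustrated with `statistics.multimode`, which returns every value tied for the highest count. All three datasets below are illustrative assumptions, not data from the notes.

```python
from statistics import multimode

print(multimode([10, 20, 20, 30]))       # unimodal: [20]
print(multimode([10, 10, 20, 20, 30]))   # bimodal: [10, 20]

# When every value occurs once, multimode() returns all of them,
# which corresponds to the "no mode" case described above.
print(multimode([10, 20, 30]))           # [10, 20, 30]
```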

4. Midrange
The midrange is the average of the largest and smallest values. [1]
Formula: Midrange = (Max + Min) / 2

When to Use Each Measure


| Scenario | Best Measure |
|---|---|
| Symmetrical data | Mean |
| Skewed data or outliers | Median |
| Categorical data | Mode |

[1]

Measures of Variation (Dispersion)


Measures of variation help understand how spread out or clustered data values are around the
center. [1]
Figure: step-by-step process for calculating measures of variation in grouped data.

1. Range
The simplest measure of data dispersion. [1]
Formula: Range = Maximum value - Minimum value
Example: For test scores with a maximum of 95 and a minimum of 62,
Range = 95 - 62 = 33 [1]
Coefficient of Range: $ \frac{Max - Min}{Max + Min} $
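
A minimal sketch of the range and the coefficient of range. Only the minimum (62) and maximum (95) come from the example; the scores in between are hypothetical filler values, and both function names are illustrative.

```python
def value_range(data):
    """Maximum value minus minimum value."""
    return max(data) - min(data)

def coefficient_of_range(data):
    """(Max - Min) / (Max + Min); undefined when Max + Min is zero."""
    highest, lowest = max(data), min(data)
    return (highest - lowest) / (highest + lowest)

# 62 and 95 are from the example; 70, 78 and 85 are assumed values.
scores = [62, 70, 78, 85, 95]
print(value_range(scores))                      # 33
print(round(coefficient_of_range(scores), 4))   # 33 / 157 = 0.2102
```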

2. Quartiles and Interquartile Range (IQR)


Quartiles divide data into four equal parts: [1]

| Quartile | Percentile | Meaning |
|---|---|---|
| Q1 | 25th | Lower quartile |
| Q2 | 50th | Median |
| Q3 | 75th | Upper quartile |

IQR = Q3 - Q1 (measures middle 50% spread) [1]


Outlier Detection: values below the lower bound or above the upper bound are flagged as potential outliers.
Lower bound: Q1 - 1.5 × IQR
Upper bound: Q3 + 1.5 × IQR [1]
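
The 1.5 × IQR rule can be sketched with `statistics.quantiles`. The dataset is hypothetical. Quartile conventions differ between textbooks and libraries; this sketch assumes the function's default "exclusive" method, which places the p-th quantile at position p(n + 1).

```python
from statistics import quantiles

# Hypothetical data with one suspiciously large value.
data = [10, 12, 14, 15, 16, 18, 20, 22, 24, 26, 60]

q1, q2, q3 = quantiles(data, n=4)   # default method="exclusive"
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_bound or x > upper_bound]

print(q1, q3, iqr)                  # 14.0 24.0 10.0
print(lower_bound, upper_bound)     # -1.0 39.0
print(outliers)                     # [60]
```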

3. Variance and Standard Deviation


Variance measures the average squared deviation from the mean. [1]
Formulas:
Population: $ \sigma^2 = \frac{\sum(x - \mu)^2}{N} $
Sample: $ s^2 = \frac{\sum(x - \bar{x})^2}{n-1} $
Standard Deviation: $ \sigma = \sqrt{variance} $ [1]
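
The population and sample formulas differ only in the divisor (N versus n - 1). The sketch below implements both directly and cross-checks them against the standard `statistics` module. The dataset is hypothetical, chosen so that the mean is 5 and the population variance is 4.

```python
from math import isclose, sqrt
from statistics import pvariance, variance

def population_variance(data):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1 (needs n >= 2)."""
    x_bar = sum(data) / len(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]             # hypothetical values, mean = 5
print(population_variance(data))            # 32 / 8 = 4.0
print(sqrt(population_variance(data)))      # standard deviation = 2.0
print(round(sample_variance(data), 4))      # 32 / 7 = 4.5714

# Cross-check against the library implementations.
print(isclose(population_variance(data), pvariance(data)))
print(isclose(sample_variance(data), variance(data)))
```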

4. Coefficient of Variation (CV)


Relative measure of dispersion. [1]
Formula: $ CV = \frac{\sigma}{\mu} \times 100\% $
Uses:
Comparing variability across different datasets
Unitless measure for comparison [1]
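
As a sketch of why a unitless measure is useful, the snippet below compares the variability of two hypothetical datasets recorded in different units. The data and the helper name `coefficient_of_variation` are assumptions; the population standard deviation is used to match the formula above.

```python
from statistics import mean, pstdev

def coefficient_of_variation(data):
    """Population standard deviation divided by the mean, as a percentage."""
    mu = mean(data)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(data) / mu * 100

heights_cm = [160, 170, 180]    # hypothetical heights
weights_kg = [50, 70, 90]       # hypothetical weights

print(round(coefficient_of_variation(heights_cm), 2))   # 4.8
print(round(coefficient_of_variation(weights_kg), 2))   # 23.33
```

The standard deviations (about 8.2 cm and 16.3 kg) cannot be compared directly because the units differ, but the CV values show that the weights vary far more relative to their mean.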

Measures of Position
Measures of position help understand the relative standing of a data point within a dataset. [1]

Percentiles
Percentiles divide data into 100 equal parts. [1]
Formula: Position = P(n+1)/100, where P is the percentile and n is the number of values in the sorted data
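
The position formula can be turned into a value lookup as sketched below. The notes give only the position; the linear interpolation between neighbouring values for fractional positions and the clamping at the ends are assumptions of this sketch, and the dataset is hypothetical.

```python
def percentile(data, p):
    """Value at the p-th percentile, using position = p * (n + 1) / 100."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)
    n = len(ordered)
    position = p * (n + 1) / 100             # 1-based, may be fractional
    position = min(max(position, 1), n)      # clamp to the valid range (assumption)
    lower = int(position)                    # 1-based rank of the value at or below
    fraction = position - lower
    if lower >= n:
        return ordered[-1]
    # Interpolate between the two neighbouring ordered values (assumption).
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

data = [15, 20, 35, 40, 50, 55, 60, 70, 80]  # hypothetical, n = 9
print(percentile(data, 25))   # position 2.5 -> 20 + 0.5 * (35 - 20) = 27.5
print(percentile(data, 50))   # position 5   -> 50
print(percentile(data, 90))   # position 9   -> 80
```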

Deciles and Quintiles


Deciles: Divide data into 10 equal parts
Quintiles: Divide data into 5 equal parts [1]

Z-Score
Standardizes values by expressing how many standard deviations a value is from the mean. [1]
Formula: $ Z = \frac{x - \mu}{\sigma} $
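
A minimal sketch of standardizing a dataset. The values are hypothetical (mean 5, population standard deviation 2), and the population standard deviation is used to match the σ in the formula.

```python
from statistics import mean, pstdev

def z_scores(data):
    """Return (x - mu) / sigma for every value in data."""
    mu = mean(data)
    sigma = pstdev(data)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mu) / sigma for x in data]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values
scores = z_scores(data)
print(scores[0])    # (2 - 5) / 2 = -1.5
print(scores[-1])   # (9 - 5) / 2 = 2.0
```

A z-score of 2.0 means the value 9 lies two standard deviations above the mean, while 2 lies one and a half standard deviations below it.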

Five-Number Summary and Box Plots
The five-number summary consists of: [1]
1. Minimum
2. Q1 (First Quartile)
3. Median (Q2)
4. Q3 (Third Quartile)
5. Maximum
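
The five values can be collected with a short helper. The function name and dataset are hypothetical, and the quartiles rely on the default "exclusive" method of `statistics.quantiles`; other quartile conventions can give slightly different Q1 and Q3.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(data)
    q1, q2, q3 = quantiles(ordered, n=4)   # default method="exclusive"
    return ordered[0], q1, q2, q3, ordered[-1]

data = [15, 20, 35, 40, 50, 55, 60, 70, 80]   # hypothetical values
print(five_number_summary(data))              # (15, 27.5, 50.0, 65.0, 80)
```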

Figure: five-number summary components and box plot construction.

Box Plot Components


Figure: box and whisker plot showing lower quartile, median, upper quartile, interquartile range, whiskers, minimum, maximum, and outliers with formulas.
Box plots incorporate the five-number summary:
Box: Extends from Q1 to Q3 (IQR)
Median line: Divides the box
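Whiskers: Extend to the minimum and maximum; when outliers are plotted separately, to the most extreme values within 1.5 × IQR of the quartiles
Outliers: Points more than 1.5 × IQR below Q1 or above Q3 [1]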

Graphic Displays

Histogram
Histograms show the frequency distribution of data. [1]
Key Features:
Height indicates frequency
Bars represent intervals (bins)
Used for numeric data [1]
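
The idea behind a histogram, counting how many values fall into each bin, can be sketched without a plotting library. The scores, the bin edges and the helper name `histogram_counts` are all assumptions made for this illustration.

```python
def histogram_counts(data, start, width, bins):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width).

    The last bin also includes its right edge; values outside the
    overall range are ignored.
    """
    counts = [0] * bins
    for x in data:
        k = int((x - start) // width)
        if x == start + bins * width:      # right edge goes into the last bin
            k = bins - 1
        if 0 <= k < bins:
            counts[k] += 1
    return counts

scores = [62, 65, 71, 74, 78, 80, 83, 85, 88, 95]   # hypothetical test scores
counts = histogram_counts(scores, start=60, width=10, bins=4)
print(counts)                                        # [2, 3, 4, 1]

# Text rendering: bar length stands in for bar height.
for k, count in enumerate(counts):
    low = 60 + 10 * k
    print(f"{low}-{low + 10}: " + "#" * count)
```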

Quantile Plots
Quantile plots display sorted data values against their corresponding quantiles, helping assess
distribution shape. [1]

Scatter Plots
Scatter plots help reveal relationships between two numeric attributes. [1]
Correlation Types:
Positive: Values increase together
Negative: One increases as other decreases
No correlation: No clear pattern [1]

Correlation Analysis
Correlation quantifies the strength and direction of relationship between variables. [1]
Pearson's Correlation Coefficient (r):
$ r = \frac{\sum(x-\bar{x})(y-\bar{y})}{\sqrt{\sum(x-\bar{x})^2 \sum(y-\bar{y})^2}} $
Interpretation:
r = +1: Perfect positive correlation
r = -1: Perfect negative correlation
r = 0: No linear correlation [1]

| r Value Range | Strength | Direction |
|---|---|---|
| 0.7 to 0.9 | Strong | Positive |
| 0.3 to 0.6 | Moderate | Positive |
| -0.3 to -0.6 | Moderate | Negative |
| -0.7 to -0.9 | Strong | Negative |

[1]
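
Pearson's formula can be implemented term by term as sketched below. The paired data (hours studied and scores) are hypothetical, and `pearson_r` is an illustrative helper rather than a library function.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]        # hypothetical x values
scores = [2, 4, 5, 4, 5]       # hypothetical y values
r = pearson_r(hours, scores)
print(round(r, 4))             # 6 / sqrt(10 * 6) = 0.7746

print(pearson_r(hours, [2, 4, 6, 8, 10]))   # perfectly linear, increasing
print(pearson_r(hours, [10, 8, 6, 4, 2]))   # perfectly linear, decreasing
```

With r ≈ 0.77 the hypothetical data fall in the strong positive band of the table above.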

Calculation Examples

Example 1: Grouped Data (Exclusive Series)


Given frequency distribution:

| Class Interval | Frequency |
|---|---|
| 0-20 | 6 |
| 20-40 | 20 |
| 40-60 | 37 |
| 60-80 | 10 |
| 80-100 | 7 |

Mean Calculation:
1. Find midpoints: 10, 30, 50, 70, 90
2. Calculate f×x: 60, 600, 1850, 700, 630
3. Mean = Σ(f×x)/Σf = 3840/80 = 48 [1]
Median Calculation:
1. N/2 = 40, so median class is 40-60
2. Median = 40 + [(40-26)/37] × 20 = 47.57 [1]
Mode Calculation:
1. Modal class: 40-60 (highest frequency = 37)
2. Mode = 40 + [(37-20)/(2×37-20-10)] × 20 = 47.73 [1]
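
The mean and mode steps above can be checked with a short script. The function names are hypothetical; the mode function applies the same grouped-data expression used in step 2, with f1 the modal class frequency and f0, f2 the frequencies of the classes before and after it.

```python
intervals = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
freqs = [6, 20, 37, 10, 7]

def grouped_mean(intervals, freqs):
    """Sum of (frequency x class midpoint) divided by total frequency."""
    midpoints = [(low + high) / 2 for low, high in intervals]
    return sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)

def grouped_mode(intervals, freqs):
    """L + (f1 - f0) / (2*f1 - f0 - f2) * width, using the modal class."""
    i = freqs.index(max(freqs))                     # first class with the highest frequency
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0               # class before the modal class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # class after the modal class
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("mode is undefined when neighbouring frequencies are equal to f1")
    low, high = intervals[i]
    return low + (f1 - f0) / denominator * (high - low)

print(grouped_mean(intervals, freqs))               # 3840 / 80 = 48.0
print(round(grouped_mode(intervals, freqs), 2))     # 40 + (17 / 44) * 20 = 47.73
```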

Key Formulas Reference

Practice Problems

Summary
Basic statistical descriptions provide valuable insight into the overall behavior of data. They
help identify noise and outliers and are essential for data cleaning. Key takeaways: [1]
1. Central tendency measures locate the center of data distribution
2. Variation measures show how spread out data is
3. Position measures indicate relative standing of values
4. Graphic displays provide visual insights into data patterns
5. Correlation analysis reveals relationships between variables
Understanding these concepts thoroughly with proper formulas, examples, and visualizations will
help you excel in your examination and practical data analysis tasks.

1. [Link]
