Chapter 7

Chapter 7 covers the fundamentals of statistics, including its definition, importance in data science, and various methods of data collection. It explains key concepts such as population vs. sample, measures of central tendency (mean, median, mode), and measures of variation (variance, standard deviation). Additionally, it discusses correlation, percentiles, quartiles, and normal distribution, emphasizing their significance in analyzing and interpreting data.

Uploaded by

balamurugan
Copyright
© All Rights Reserved

Chapter 7: Basic Statistics

1. Statistics – Meaning (Detailed Explanation)


Statistics is a branch of mathematics that deals with data, but it is not only about numbers. It
is about understanding what numbers are telling us.

Statistics involves four main steps:

1. Collecting data – gathering information
2. Organizing data – arranging data in tables or charts
3. Analyzing data – finding averages, spread, relationships
4. Interpreting data – drawing conclusions and decisions

In simple words:
Statistics helps us convert raw data into useful information.

Real-life example:
A teacher collects the students' marks (data), calculates the average and the pass percentage (analysis),
and decides whether the students understood the subject (interpretation).
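
The teacher example can be sketched in a few lines of Python. The marks, the pass mark of 40, and the decision rule are made-up values for illustration only; none of them come from the chapter.

```python
# Hypothetical marks for ten students (illustrative values only).
marks = [35, 48, 52, 61, 67, 70, 74, 78, 85, 90]
PASS_MARK = 40  # assumed pass mark

# Organize: arrange the raw marks in ascending order.
sorted_marks = sorted(marks)

# Analyze: compute the average and the pass percentage.
average = sum(marks) / len(marks)
passed = sum(1 for m in marks if m >= PASS_MARK)
pass_percentage = 100 * passed / len(marks)

# Interpret: an assumed rule of thumb, not a standard criterion.
understood = average >= 60 and pass_percentage >= 80

print(f"Sorted marks: {sorted_marks}")
print(f"Average: {average:.1f}, pass percentage: {pass_percentage:.0f}%")
print("Class understood the subject" if understood else "Topic needs revision")
```

Here the average is 66.0 and the pass percentage is 90%, so under the assumed rule the teacher would conclude that the class understood the subject.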

2. Importance of Statistics in Data Science (Detailed Explanation)

In data science, we work with huge amounts of data. Statistics helps us manage and
understand this data.

Statistics is important because it:

• Summarizes large data sets with a few simple numbers (mean, percentage)
• Helps compare groups (Class A vs Class B)
• Identifies patterns and trends
• Helps in prediction and decision-making

Example:
Netflix uses statistics to recommend movies based on user behavior.

Statistics is one of the foundations of data science.

3. Types of Data Collection (Detailed Explanation)


Data collection is the first step in statistics. Data can be collected in two major ways
depending on how much control we have.

(a) Observational Data

In observational data:

• We only observe what is happening
• We do not interfere with or control the conditions

Examples:

• Conducting surveys
• Census data
• Observing customer purchases

👉 Used when experiments are not possible.

(b) Experimental Data

In experimental data:

• Data is collected by conducting experiments
• The researcher controls the conditions (variables)

Examples:

• Giving different medicines to two groups (for example, to test a new medicine)
• Comparing two different teaching methods

Because the researcher controls the conditions, experiments give stronger evidence of cause and effect than observation alone.

4. Population and Sample (Detailed Explanation)


In statistics, studying the entire population is often difficult.

Population

• The complete group under study
• Often very large in size

Example:
All voters in a country

Sample

• A small part selected from the population
• Used to represent the population

Example:
1000 voters selected for a survey

A good sample is representative of the population, so results from the sample are close to the true values for the population.
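
A minimal sketch of this idea, assuming a synthetic population of voter ages (the ages are generated at random and are not real data): the mean of a random sample of 1000 voters should land close to the mean of the full population.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 100,000 voters (synthetic data).
population = [random.randint(18, 90) for _ in range(100_000)]

# Sample: 1000 voters chosen at random, without replacement.
sample = random.sample(population, k=1000)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(f"Population mean age: {population_mean:.2f}")
print(f"Sample mean age:     {sample_mean:.2f}")
```

The two printed means will usually differ by less than a year, even though the sample contains only 1% of the population.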

5. Sampling Methods (Detailed Explanation)


Sampling is the method of selecting individuals from a population.

Random Sampling

• Every individual has an equal chance of being selected
• Reduces selection bias

Example:
Lottery method

Unequal Probability Sampling

• Some individuals have a higher chance of being selected
• Used when groups are unequal

Example:
Selecting more people from cities than villages

Proper sampling gives reliable results.
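
Unequal probability sampling can be sketched with weights. The population of 30 city and 70 village residents and the weight of 3 for city residents are assumptions chosen for illustration. Note that `random.choices` draws with replacement, which keeps the sketch short but differs from a survey that never picks the same person twice.

```python
import random
from collections import Counter

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: 30 city residents and 70 village residents.
population = [("city", i) for i in range(30)] + [("village", i) for i in range(70)]

# Assumed weights: each city resident is 3 times as likely to be picked.
weights = [3 if area == "city" else 1 for area, _ in population]

# random.choices draws WITH replacement, so a person can appear more than once.
sample = random.choices(population, weights=weights, k=1000)

counts = Counter(area for area, _ in sample)
city_share = counts["city"] / len(sample)
print(counts)
print(f"City share in sample: {city_share:.2f} (city residents are 30% of the population)")
```

With these weights the expected city share is 90 / 160, about 56%, so city residents are deliberately over-represented compared with their 30% share of the population.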

6. Measures of Central Tendency (Detailed Explanation)


Measures of central tendency help us find a single value that represents the whole data set.

They help answer:


👉 What is the typical value?

Three measures are used:

• Mean
• Median
• Mode

7. Mean (Average) – Detailed Explanation

Mean is the most commonly used average.

Formula:
Mean = Sum of all values / Number of values

Example:
Marks = 60, 70, 80
Mean = (60+70+80)/3 = 70

Advantage:
Easy to calculate

Disadvantage:
Affected by extreme values

Example:
Salaries = 10k, 15k, 20k, 100k → Mean = 36.25k, which is higher than three of the four salaries, so the mean is misleading here
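
The formula translates directly into code. This short sketch reuses the marks and salaries from the examples above; the function name `mean` is just a convenient label.

```python
def mean(values):
    """Mean = sum of all values / number of values."""
    return sum(values) / len(values)

marks = [60, 70, 80]
salaries = [10_000, 15_000, 20_000, 100_000]

print(mean(marks))     # 70.0
print(mean(salaries))  # 36250.0, pulled up by the single 100,000 salary
```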

8. Median – Detailed Explanation


Median is the middle value when data is arranged in order.

Steps:

1. Arrange data in ascending order
2. Find the middle value (if the number of values is even, take the average of the two middle values)

Example:
Marks = 50, 60, 90
Median = 60

Advantage:
Not affected by extreme values

Best used for skewed data such as income or salary data
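
A minimal sketch of the two steps in Python. For an even number of values it follows the usual convention of averaging the two middle values. The salary list is the one from the mean section, reused to show that the median stays close to the typical salary.

```python
def median(values):
    ordered = sorted(values)  # step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle value.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([50, 60, 90]))                       # 60
print(median([10_000, 15_000, 20_000, 100_000]))  # 17500.0
```

The median salary of 17,500 is much closer to what most people in the list earn than the mean of 36,250.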

9. Mode – Detailed Explanation


Mode is the value that occurs most frequently.

Example:
Marks = 60, 70, 70, 80
Mode = 70

Useful when data is categorical

Example:
Most preferred mobile brand
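
A small sketch of finding the mode by counting. The brand names are hypothetical survey answers. The function returns a list because a data set can have more than one most frequent value.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

print(modes([60, 70, 70, 80]))  # [70]

# Hypothetical survey answers (categorical data).
brands = ["Alpha", "Beta", "Alpha", "Gamma", "Alpha", "Beta"]
print(modes(brands))  # ['Alpha']
```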

10. Measures of Variation (Detailed Explanation)


Measures of variation tell us how data values differ from each other.

They help answer:


Are values close together or spread out?

Main measures:

• Range (largest value minus smallest value)
• Variance
• Standard Deviation

11. Variance and Standard Deviation (Detailed Explanation)

Variance

Variance measures the average squared distance from the mean.

Higher variance means data is more spread out.

Standard Deviation (SD)

Standard deviation is the square root of variance.

Why SD is important:

• Same unit as data
• Easy to interpret

Example:
Low SD → consistent marks
High SD → inconsistent marks
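
The sketch below computes the range, variance, and standard deviation for two hypothetical sets of marks with the same mean of 70. It assumes the population form of variance (divide by the number of values), which matches the "average squared distance" wording. A sample variance would divide by one less than the number of values.

```python
import math

def variance(values):
    """Population variance: average squared distance from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def standard_deviation(values):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(values))

consistent = [68, 70, 72]     # hypothetical marks, close together
inconsistent = [40, 70, 100]  # hypothetical marks, spread out

for name, marks in [("consistent", consistent), ("inconsistent", inconsistent)]:
    value_range = max(marks) - min(marks)
    print(
        f"{name}: range={value_range}, "
        f"variance={variance(marks):.2f}, SD={standard_deviation(marks):.2f}"
    )
```

The consistent marks give an SD of about 1.63, while the inconsistent marks give about 24.49, even though both sets have the same mean.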

12. Correlation (Detailed Explanation)


Correlation measures the strength and direction of the relationship between two variables.

Types:

• Positive correlation: both increase together
• Negative correlation: one increases while the other decreases
• No correlation: no relationship

Example:
Temperature ↑ → Ice cream sales ↑ (positive)

Correlation does NOT mean causation.
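
One common way to put a number on correlation is the Pearson correlation coefficient, which ranges from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive). The chapter does not name a specific coefficient, so treating correlation as Pearson's r is an assumption here, and the temperature and sales figures are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_deviation / (spread_x * spread_y)

# Hypothetical daily observations.
temperature = [20, 24, 28, 32, 36]
ice_cream_sales = [110, 135, 160, 190, 205]
heater_sales = [200, 170, 150, 110, 90]

r_positive = pearson_r(temperature, ice_cream_sales)
r_negative = pearson_r(temperature, heater_sales)
print(f"Temperature vs ice cream sales: r = {r_positive:.3f}")
print(f"Temperature vs heater sales:    r = {r_negative:.3f}")
```

The first value is close to +1 and the second is close to -1. Neither number shows that temperature causes the change in sales; it only shows that the values move together.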

13. Percentiles (Detailed Explanation)


Percentiles show the relative position of a value in a dataset.

• Data is divided into 100 equal parts
• Used to compare performance

Example:
If a student is in the 90th percentile, it means the student scored better than 90% of students.

Used in competitive exams, rankings, and performance analysis.
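
A minimal sketch of a percentile rank, assuming the definition used in the example above (the percentage of scores strictly below a given score). Other definitions exist, for instance ones that count ties differently. The scores are hypothetical.

```python
def percentile_rank(scores, score):
    """Percentage of scores that are strictly below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

scores = [45, 50, 55, 60, 65, 70, 75, 80, 85, 95]  # hypothetical exam scores

print(percentile_rank(scores, 95))  # 90.0
print(percentile_rank(scores, 65))  # 40.0
```

The student with 95 marks scored better than 9 of the 10 students, which places that student at the 90th percentile under this definition.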

14. Quartiles (Detailed Explanation)


Quartiles divide data into four equal parts.

• Q1 (25%) – Lower quartile
• Q2 (50%) – Median
• Q3 (75%) – Upper quartile

Use:
Helps understand data distribution and detect outliers.
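
The sketch below computes quartiles with `statistics.quantiles` from the Python standard library (Python 3.8 or later) and then flags outliers. The chapter does not say how outliers are detected. The 1.5 × IQR rule used here is a common convention and is an assumption, as are the marks. Different quartile methods can give slightly different values; this sketch uses `method="inclusive"`.

```python
import statistics

marks = [52, 55, 58, 60, 62, 65, 68, 70, 98]  # hypothetical marks

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(marks, n=4, method="inclusive")

# Interquartile range: spread of the middle 50% of the data.
iqr = q3 - q1

# Assumed convention: values beyond 1.5 * IQR from the quartiles are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [m for m in marks if m < lower_fence or m > upper_fence]

print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
print(f"Outliers: {outliers}")
```

For these marks the quartiles are 58, 62, and 68, so the fences fall at 43 and 83 and only the mark of 98 is flagged.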

15. Normal Distribution & Empirical Rule (Detailed Explanation)

Normal Distribution

• Bell-shaped curve
• Most values are near the mean

Empirical Rule (68–95–99.7 Rule)

For normally distributed data:

• About 68% of data lies within 1 SD of the mean
• About 95% of data lies within 2 SD of the mean
• About 99.7% of data lies within 3 SD of the mean

Used to understand how data is spread around the mean.
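
The three percentages can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This sketch uses the standard normal distribution (mean 0, SD 1). The same proportions hold for any normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"Within {k} SD: {100 * share_within(k):.2f}%")
```

The printed values are 68.27%, 95.45%, and 99.73%, which the rule rounds to 68, 95, and 99.7.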
