Introduction to Statistics and Data Analysis

Uploaded by salahudinwarfaa
Copyright © All Rights Reserved

Chapter 1: Introduction to Statistics

1. What is Statistics?
Statistics is the science of collecting, analyzing, presenting, and interpreting data, and
making decisions based on that data.

2. Types of Statistics


1. Descriptive Statistics
➢ Organizes and describes data using tables, graphs, and summaries.
➢ Example: Showing the average marks of students in a class.
2. Inferential Statistics
➢ Uses information from a sample to make predictions or decisions about a
population.
➢ Example: Estimating the average income of a city from a small group of
people.

3. Population vs Sample

Term | Meaning | Example
Population | All elements being studied. | All people in Somalia.
Sample | A small part of the population chosen for study. | 200 people chosen from Somalia.
Census | Study of every member of the population. | National population count.
Sample Survey | Study of part of the population. | Survey of 1,000 families.
Representative Sample | A sample that accurately reflects the population’s characteristics. | Males and females in the same proportions as in the population.
Random Sample | Each member has an equal chance to be selected. | Drawing names from a box.
Simple Random Sample | Every possible sample of the same size has an equal chance to be chosen. | Using random numbers to pick students.

Prepared by: Abdibasid Mohamed


4. Sampling Methods
1. With Replacement:
➢ Each time you select an item, you put it back before choosing again.
2. Without Replacement:
➢ Once selected, the item is not put back into the population.
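The difference between the two methods can be seen in a short Python sketch using the standard-library `random` module: `choices` draws with replacement and `sample` draws without replacement. The five student IDs are a hypothetical population made up for illustration.

```python
import random

# Hypothetical population of five student IDs (illustration only)
population = ["S1", "S2", "S3", "S4", "S5"]

rng = random.Random(42)  # fixed seed so the run is repeatable

# With replacement: every draw is made from the full population,
# so the same member can be chosen more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: a chosen member is not put back,
# so each member can appear at most once.
without_replacement = rng.sample(population, k=5)

print("With replacement:   ", with_replacement)
print("Without replacement:", without_replacement)
```

Because five items are drawn from a population of five, the without-replacement result always contains every member exactly once, while the with-replacement result usually repeats some members.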

5. Basic Terms

Term | Definition
Element/Member | The specific item or person studied.
Variable | The characteristic being measured (e.g., age, income).
Constant | A fixed value that does not change.
Observation | The value of a variable for one element.
Data Set | A collection of observations.
Parameter | A summary measure of the population (e.g., population mean).
Statistic | A summary measure of the sample (e.g., sample mean).

6. Types of Variables

Type | Meaning | Example
1. Quantitative Variable | Can be measured numerically. | Age, income, weight
➢ Discrete Variable | Has countable values (whole numbers). | Number of students
➢ Continuous Variable | Can take any value in a range. | Height, time
2. Qualitative (Categorical) Variable | Describes categories, not numbers. | Gender, color, type of car



7. Sources of Data
1. Internal Sources: Company records or internal reports.
2. External Sources: Government publications, research reports.
3. Surveys and Experiments: Data collected directly from people or tests.

8. Summation Notation (Σ)

Symbol / Formula | Meaning | Example
Σx | Sum of all values of x | x₁ + x₂ + x₃ + … + xₙ
(Σx)² | Square of the total sum | (x₁ + x₂ + … + xₙ)²
Σx² | Sum of squares of each x | x₁² + x₂² + … + xₙ²
Σxy | Sum of products of x and y | x₁y₁ + x₂y₂ + … + xₙyₙ

9. Example 1 — Σ Notation
Given values: 75, 90, 125, 61

Find:
(a) Σx (b) (Σx)² (c) Σx²

Step | Formula | Solution
(a) | Σx = x₁ + x₂ + x₃ + x₄ | 75 + 90 + 125 + 61 = 351
(b) | (Σx)² | (351)² = 123,201
(c) | Σx² = x₁² + x₂² + x₃² + x₄² | 75² + 90² + 125² + 61² = 33,071
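The three results in Example 1 can be checked with a few lines of Python. The sketch also shows why (Σx)² and Σx² differ: the first squares the total, the second squares each value before adding.

```python
# Values from Example 1
x = [75, 90, 125, 61]

sum_x = sum(x)                           # Σx: add the values
square_of_sum = sum_x ** 2               # (Σx)²: square the total
sum_of_squares = sum(v ** 2 for v in x)  # Σx²: square each value, then add

print(sum_x, square_of_sum, sum_of_squares)  # 351 123201 33071
```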



Chapter 2: Organizing and Graphing Data

1. Introduction
• Large amounts of data are collected by many organizations and government agencies.
• These data are often too large to understand in their raw form.
• Descriptive statistics helps to organize, summarize, and display data using:
➢ Tables
➢ Graphs
➢ Summary measures

2. Raw Data


• Definition:
Data recorded in the sequence in which they are collected, before they are processed or ranked, are called raw data.
• Example:
Ages or student statuses of 50 students collected directly from a survey.

3. Organizing and Graphing Data


The methods used depend on the type of data. There are two main types:
1. Qualitative (Categorical) Data
2. Quantitative (Numerical) Data

A. QUALITATIVE DATA

4. Frequency Distribution

• Definition:
A frequency distribution lists all categories and shows how many items fall into
each category.



Example:
A sample of 30 persons who often consume donuts were asked what variety of donuts was
their favorite. The responses from these 30 persons were as follows:

glazed filled other plain glazed other

frosted filled filled glazed other frosted

glazed plain other glazed glazed filled

frosted plain other other frosted filled

filled other frosted glazed glazed filled
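Building the frequency distribution means tallying how many of the 30 responses fall into each category. A minimal Python sketch using the standard-library `collections.Counter` does the tally directly from the raw responses listed above.

```python
from collections import Counter

# The 30 donut responses, row by row, as listed in the example
responses = """
glazed filled other plain glazed other
frosted filled filled glazed other frosted
glazed plain other glazed glazed filled
frosted plain other other frosted filled
filled other frosted glazed glazed filled
""".split()

# Count how many responses fall into each category
frequency = Counter(responses)

for category, f in frequency.most_common():
    print(f"{category:<8} {f}")
print("Total   ", sum(frequency.values()))
```

The tally gives glazed 8, filled 7, other 7, frosted 5 and plain 3, which add up to the 30 persons in the sample.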



5. Relative and Percentage Distribution
Formula | Meaning
Relative Frequency = Frequency ÷ Total number of observations | Shows the proportion of data in each category
Percentage = Relative Frequency × 100% | Converts relative frequency to a percentage
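Applied to the donut example, the two formulas can be sketched in Python. The frequencies below are the counts tallied from the 30 responses listed in that example; the relative frequencies always add up to 1 and the percentages to 100%.

```python
# Frequencies tallied from the 30 donut responses in the example
frequency = {"glazed": 8, "filled": 7, "other": 7, "frosted": 5, "plain": 3}

n = sum(frequency.values())                              # total observations (30)
relative = {c: f / n for c, f in frequency.items()}      # Frequency ÷ n
percentage = {c: r * 100 for c, r in relative.items()}   # Relative Frequency × 100%

for c in frequency:
    print(f"{c:<8} f = {frequency[c]:<2} relative = {relative[c]:.3f}   "
          f"percentage = {percentage[c]:.2f}%")
```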

6. Graphical Presentation of Qualitative Data


1. Bar Graph
➢ Shows frequencies of categories using bars.
➢ Bars are separate from each other.
2. Pie Chart
➢ A circle divided into slices.
➢ Each slice represents a category’s percentage or relative frequency.
➢ Angle for each category:

Angle = Relative Frequency × 360°
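As an illustration, the pie-chart angles for the donut example can be computed in Python from the counts tallied from its 30 responses. The slices must fill the whole circle, so the angles add up to 360°.

```python
# Frequencies tallied from the 30 donut responses in the example
frequency = {"glazed": 8, "filled": 7, "other": 7, "frosted": 5, "plain": 3}
n = sum(frequency.values())

# Angle = Relative Frequency × 360°
angles = {c: f / n * 360 for c, f in frequency.items()}

for c, angle in angles.items():
    print(f"{c:<8} {angle:.1f}°")
print(f"Total    {sum(angles.values()):.1f}°")  # the slices fill the full circle
```

Glazed, for example, gets 8/30 × 360° = 96° and plain gets 3/30 × 360° = 36°.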

B. QUANTITATIVE DATA

7. Frequency Distribution for Quantitative Data


• Definition:
Lists all classes (intervals) and frequencies of values that fall into each class.
• Grouped Data:
Data shown in classes or intervals (e.g., 10–19, 20–29, etc.)



8. Key Terms and Formulas
Term | Definition / Formula
Class Boundary | Midpoint between upper limit of one class and lower limit of next class
Class Width | Class Width = Upper Boundary – Lower Boundary
Class Midpoint (Mark) | Class Midpoint = (Lower Limit + Upper Limit) ÷ 2

9. Steps to Construct a Frequency Distribution Table


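1. Decide the number of classes
➢ Usually between 5 and 20.
2. Determine class width
➢ Approximate class width = (Largest value – Smallest value) ÷ Number of classes.
➢ Round it to a convenient number (usually upward, so the classes cover every value).
3. Choose the lower limit of the first class
➢ Should be equal to or less than the smallest data value.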

Example 2-3 (iPods Sold)


• Given: 30 days of sales (values from 5 to 29)
• Steps:
1. Minimum = 5, Maximum = 29
2. Number of classes = 5
3. Class width ≈ (29–5)/5 = 4.8 ≈ 5
4. Classes: 5–9, 10–14, 15–19, 20–24, 25–29
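The steps of Example 2-3 can be sketched in Python. This sketch assumes whole-number data, so the boundaries sit 0.5 below each lower limit and 0.5 above each upper limit. The function name `build_classes` and the extra check that widens the class when the division comes out exact are illustrative choices, not part of the notes.

```python
import math

def build_classes(minimum, maximum, num_classes):
    """Class limits, boundaries and midpoints for whole-number data.

    Returns (width, classes), where each class is a tuple
    (lower limit, upper limit, lower boundary, upper boundary, midpoint).
    """
    # Approximate width = (largest - smallest) / number of classes, rounded up
    width = math.ceil((maximum - minimum) / num_classes)
    # If the division was exact, the last class would stop one short of the maximum
    if minimum + width * num_classes <= maximum:
        width += 1

    classes = []
    lower = minimum  # first lower limit = smallest value
    for _ in range(num_classes):
        upper = lower + width - 1  # whole numbers: e.g. 5–9 holds five values
        classes.append((lower, upper,
                        lower - 0.5, upper + 0.5,  # boundaries halfway between classes
                        (lower + upper) / 2))      # midpoint (class mark)
        lower += width
    return width, classes

width, classes = build_classes(5, 29, 5)
print("Class width:", width)
for low, high, low_b, high_b, mid in classes:
    print(f"{low}–{high}   boundaries {low_b}–{high_b}   midpoint {mid}")
```

For the iPod data this reproduces the classes 5–9 through 25–29, with boundaries 4.5–9.5, 9.5–14.5 and so on, and midpoints 7, 12, 17, 22 and 27. Each class width equals upper boundary minus lower boundary (9.5 – 4.5 = 5).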

11. Relative Frequency and Percentage Distribution

Formula | Explanation
Relative Frequency = f / n | f = frequency of the class, n = total number of data values
Percentage = Relative Frequency × 100% | Convert to percent



12. Graphing Quantitative Data
1. Histogram
➢ Bars are touching (no gaps).
➢ X-axis → classes, Y-axis → frequency or percentage.
2. Frequency Polygon
➢ Joins the midpoints of the tops of successive histogram bars with straight lines.
3. Frequency Curve
➢ A smooth version of a polygon (for large data sets).

13. Shape of Histograms

Type | Description
Symmetric | The left and right halves are mirror images around the center.
Skewed Right | Tail is longer on the right side.
Skewed Left | Tail is longer on the left side.
Uniform | All classes have similar frequencies.

C. CUMULATIVE FREQUENCY DISTRIBUTION

14. Cumulative Frequency


• Definition:
Total number of values below the upper boundary of each class.

15. Related Concepts

Term | Formula / Description
Cumulative Frequency (CF) | Sum of frequencies up to and including each class
Cumulative Relative Frequency (CRF) | CF ÷ Total number of observations
Cumulative Percentage | CRF × 100%

• Graph:
The graph for cumulative frequency is called an Ogive.
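A running total is all that CF needs, and CRF and the cumulative percentage follow from it. The Python sketch below uses the standard-library `itertools.accumulate`; the class frequencies are hypothetical numbers invented for illustration, not data from these notes.

```python
from itertools import accumulate

# Hypothetical class frequencies, invented for illustration (not data from these notes)
class_labels = ["5–9", "10–14", "15–19", "20–24", "25–29"]
frequencies = [4, 9, 6, 7, 4]

n = sum(frequencies)                   # total number of observations
cf = list(accumulate(frequencies))     # CF: running total up to each class
crf = [c / n for c in cf]              # CRF = CF ÷ n
cum_pct = [r * 100 for r in crf]       # Cumulative percentage = CRF × 100%

for label, c, r, p in zip(class_labels, cf, crf, cum_pct):
    print(f"{label:<6} CF = {c:<3} CRF = {r:.3f}   {p:.1f}%")
```

The last CF always equals n, so the last CRF is 1 and the last cumulative percentage is 100%. Plotting each CF against the upper boundary of its class gives the ogive.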



D. STEM-AND-LEAF DISPLAY

16. Definition
• A method to display data while keeping individual values visible.
• Each value is divided into:
➢ Stem: The first digit(s)
➢ Leaf: The last digit

17. Example
Data: 52, 55, 56, 63, 68, 71, 74, 82
Stem | Leaves
5 | 2 5 6
6 | 3 8
7 | 1 4
8 | 2
Advantages:
• Keeps original data visible.
• Makes the shape and spread of the data easy to see.
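Splitting each value into a stem and a leaf is the same as dividing by 10: the quotient is the stem (every digit except the last) and the remainder is the leaf (the last digit). A minimal Python sketch for non-negative whole numbers, with `stem_and_leaf` as an illustrative function name:

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Lines of a stem-and-leaf display for non-negative whole numbers.

    Stem = every digit except the last; leaf = the last digit.
    """
    leaves = defaultdict(list)
    for value in sorted(data):          # sorting keeps each row's leaves in order
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    stems = range(min(leaves), max(leaves) + 1)  # show empty stems too
    return [f"{stem} | " + " ".join(map(str, leaves[stem])) for stem in stems]

for line in stem_and_leaf([52, 55, 56, 63, 68, 71, 74, 82]):
    print(line)
```

For the example data this prints the rows 5 | 2 5 6, 6 | 3 8, 7 | 1 4 and 8 | 2, so every original value can still be read back from the display.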

18. Grouped Stem-and-Leaf Display


• When there are many stems, they can be grouped together for simplicity.

In Summary:
1. Qualitative Data: Use tables, bar graphs, pie charts.
2. Quantitative Data: Use frequency tables, histograms, polygons.
3. Cumulative Data: Use cumulative tables and ogives.
4. Detailed Data: Use stem-and-leaf displays to keep values visible.



Chapter Two – Data Collection and Sampling

1. Objectives
At the end of this chapter, you should be able to:
1. Define data
2. Explain methods of collecting data
3. Discuss sampling and sampling plans
4. Identify sampling and non-sampling errors

2. Introduction
• Statistical inference means drawing conclusions about a population based on a
sample.
• A parameter is a measurement about a population.
• A statistic is a measurement about a sample.
• Populations are often too large to measure completely, so we use samples to estimate
population characteristics.

3. Data
• Definition: Data is information collected in the form of numbers, words,
measurements, or observations.
Types of Data:
1. Primary (Active) Data: Collected directly by the researcher.
2. Secondary (Passive) Data: Already collected by others and available from other
sources.

4. Sources of Data


• Primary sources: Surveys, observations, experiments, questionnaires, interviews.
• Secondary sources: Government publications, websites, books, journals, newspapers,
internal company records.



5. Advantages & Disadvantages of Secondary Data
Advantages
• Saves time and cost
• Helps define research problems
• Provides comparison data
Disadvantages
• May not perfectly fit the problem
• May be inaccurate or outdated

6. Differences Between Primary and Secondary Data

Comparison | Primary Data | Secondary Data
Meaning | Collected first-hand by researcher | Collected earlier by others
Time | Real-time data | Past data
Cost | Expensive | Economical
Collection Time | Long | Short
Specificity | Specific to researcher’s needs | May not be specific
Accuracy | More accurate | Less accurate
Form | Raw form | Refined form

7. Methods of Collecting Data


1. Direct Observation – Watching and recording events directly.
➢ Types: Participant, Systematic, Mechanical observation.
2. Experiments – Changing variables to observe effects.
➢ Produces reliable and controlled data.
3. Surveys – Asking people questions to collect information.
➢ Types: Personal interview, Telephone interview, Self-administered
questionnaire.



8. Questionnaire Design
Good questionnaires should:
• Be short and simple.
• Use clear language.
• Begin with easy demographic questions.
• Contain:
➢ Closed-ended questions: Yes/No, True/False, etc.
➢ Multiple-choice questions: For quick answers.
➢ Open-ended questions: For detailed opinions.

9. Sampling and Sampling Plans


• Sampling: Selecting a small group (sample) from a large population to draw
conclusions.
• Why Sampling?
1. Saves time
2. Reduces cost
3. Needs fewer workers

10. Types of Sampling


A. Probability Sampling
Each member of the population has a known, nonzero chance of being selected.
1. Simple Random Sampling – Every member, and every possible sample of the same size, has an equal chance of being selected.
2. Systematic Sampling – Selecting every kth member from a list.
3. Stratified Sampling – Dividing population into subgroups (strata) and sampling from
each.
4. Cluster Sampling – Dividing population into clusters (e.g., villages) and selecting
some clusters randomly.
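The probability methods above can be sketched with Python's standard-library `random` module. The sampling frame of 100 numbered units, the two strata and the four "villages" are hypothetical, invented for illustration; the function names are also illustrative. In the systematic sketch, k is the sampling interval (population size ÷ sample size, here 100 ÷ 10).

```python
import random

def systematic_sample(frame, k, rng):
    """Every kth member of an ordered list, starting at a random position among the first k."""
    start = rng.randrange(k)
    return frame[start::k]

def stratified_sample(strata, per_stratum, rng):
    """A simple random sample of a fixed size drawn separately from each stratum."""
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}

def cluster_sample(clusters, num_clusters, rng):
    """Randomly chosen whole clusters; every member of a chosen cluster is kept."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical sampling frame: 100 units numbered 1 to 100
frame = list(range(1, 101))
print("Simple random:", sorted(rng.sample(frame, 10)))
print("Systematic (k = 10):", systematic_sample(frame, 10, rng))

# Hypothetical strata and clusters, invented for illustration only
strata = {"male": frame[:50], "female": frame[50:]}
print("Stratified:", stratified_sample(strata, 5, rng))

villages = {"A": frame[:20], "B": frame[20:45], "C": frame[45:70], "D": frame[70:]}
print("Cluster (villages chosen):", sorted(cluster_sample(villages, 2, rng)))
```

The key contrast is in the last two functions: stratified sampling takes some members from every subgroup, while cluster sampling takes all members from only some subgroups.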



B. Non-Probability Sampling
Members are not selected at random, so each member's chance of being chosen is unknown and may be zero.
1. Convenience Sampling – Using easily available people.
2. Quota Sampling – Selecting specific numbers from categories.
3. Judgment (Purposive) Sampling – Choosing based on researcher’s judgment.
4. Snowball Sampling – Existing participants refer others (useful for rare populations).

11. Sampling and Non-Sampling Errors


1. Sampling Error:
➢ Difference between a sample statistic and the corresponding population parameter (e.g., sample mean minus population mean).
➢ Happens by chance.
2. Non-Sampling Error:
➢ Caused by human mistakes during data collection, recording, or analysis.
➢ Examples: wrong recording, misunderstanding questions, bias.
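Sampling error can be illustrated by drawing several random samples from one population and comparing each sample mean (a statistic) with the population mean (a parameter). The Python sketch below uses the standard-library `random` and `statistics` modules; the population of 1,000 incomes is randomly generated and purely hypothetical, and `sampling_error` is an illustrative helper name. It shows only sampling error: non-sampling errors such as wrong recording cannot be simulated this way.

```python
import random
import statistics

def sampling_error(sample, population):
    """Sample mean minus population mean (statistic minus parameter)."""
    return statistics.mean(sample) - statistics.mean(population)

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population of 1,000 incomes, invented for illustration only
population = [rng.randint(100, 900) for _ in range(1000)]
mu = statistics.mean(population)  # parameter: population mean

# Different random samples give different sampling errors, purely by chance
for trial in range(1, 4):
    sample = rng.sample(population, 50)  # drawn without replacement
    error = sampling_error(sample, population)
    print(f"Sample {trial}: x̄ = {statistics.mean(sample):.2f}, "
          f"μ = {mu:.2f}, sampling error = {error:+.2f}")
```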

