Introduction to Descriptive Statistics

This document introduces the basic concepts of descriptive statistics. It explains that descriptive statistics analyzes data series to draw conclusions about the behavior of variables. Variables can be qualitative or quantitative, and unidimensional, bidimensional, or multidimensional. Quantitative variables are divided into discrete and continuous. The document also defines the concepts of individual, population, and sample, shows how to build frequency distributions, and introduces measures of central tendency such as the mean, median, and mode.

Descriptive statistics is a science that analyzes series of data
(for example, the age of a population, the height of the students of a
school, the temperature in the summer months, etc.) and tries to draw
conclusions about the behavior of these variables.
Variables can be of two types:
Qualitative variables or attributes: they cannot be measured
numerically (for example: nationality, skin color, sex).
Quantitative variables: they have a numerical value (for example: age,
price of a product, annual income).
Variables can also be classified into:
Unidimensional variables: they only collect information about one
characteristic (for example: age of the students in a class).
Bidimensional variables: they collect information about two
characteristics of the population (for example: age and height of the
students in a class).
Multidimensional variables: they collect information about three or more
characteristics (for example: age, height, and weight of the students in
a class).
Quantitative variables, in turn, can be classified as
discrete or continuous:
Discrete: they can only take integer values (1, 2, 8, -4, etc.). For
example: number of siblings (it can be 1, 2, 3, etc., but it can never
be 3.45).
Continuous: they can take any real value within an interval.
For example, the speed of a vehicle can be 80.3 km/h, 94.57
km/h, etc.
When studying the behavior of a variable, it is necessary to distinguish
the following concepts:
Individual: any element that carries information about the
phenomenon being studied. Thus, if we study the height of the children
in a class, each student is an individual; if we study the price of
housing, each housing unit is an individual.
Population: the set of all individuals (people, objects,
animals, etc.) that carry information about the phenomenon being
studied. For example, if we study the price of housing in a
city, the population is the full set of housing units in that city.
Sample: a subset that we select from the population. Thus, if we
study the price of housing in a city, we would not normally collect
information about every housing unit in the city (it would be a very
complex task); instead, we usually select a subgroup (sample) that
we consider sufficiently representative.
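The relationship between population and sample can be sketched in Python. This is only an illustration under stated assumptions: the housing prices are randomly generated (hypothetical, not real data), and the sample size of 50 and the fixed seed are arbitrary choices.

```python
import random

# Hypothetical population: prices (in thousands) of 1,000 housing units.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = [round(rng.uniform(80, 400), 1) for _ in range(1000)]

# Simple random sample of 50 units, drawn without replacement.
sample = rng.sample(population, k=50)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(f"population size: {len(population)}, mean price: {population_mean:.1f}")
print(f"sample size:     {len(sample)}, mean price: {sample_mean:.1f}")
```

If the sample is representative, its mean should be reasonably close to the population mean, although the two will rarely coincide exactly.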

The frequency distribution is the structured representation, in
table format, of all the information that has been collected about the
variable being studied.

| Value | Absolute frequency (simple) | Absolute frequency (accumulated) | Relative frequency (simple) | Relative frequency (accumulated) |
| --- | --- | --- | --- | --- |
| X_1 | n_1 | n_1 | f_1 = n_1 / n | f_1 |
| X_2 | n_2 | n_1 + n_2 | f_2 = n_2 / n | f_1 + f_2 |
| ... | ... | ... | ... | ... |
| X_{n-1} | n_{n-1} | n_1 + n_2 + ... + n_{n-1} | f_{n-1} = n_{n-1} / n | f_1 + f_2 + ... + f_{n-1} |
| X_n | n_n | Σn | f_n = n_n / n | Σf |

Where:
X denotes the different values that the variable can take.
n denotes the number of times each value is repeated.
f denotes the percentage of the total that the repetitions of each
value represent.
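The relationships between the columns can also be written compactly. The labels N_i and F_i for the accumulated absolute and relative frequencies are introduced here only for convenience; they are an assumption of this sketch and do not appear in the table itself.

```latex
f_i = \frac{n_i}{n}, \qquad
N_i = n_1 + n_2 + \dots + n_i, \qquad
F_i = f_1 + f_2 + \dots + f_i = \frac{N_i}{n}
```

In the last row, the accumulated absolute frequency equals the total number of observations n, and the accumulated relative frequency equals 1 (that is, 100%).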

Let's see an example:

We measure the height of the children in a class and obtain the following
results (m):

| Student | Height (m) | Student | Height (m) | Student | Height (m) |
| --- | --- | --- | --- | --- | --- |
| Student 1 | 1.25 | Student 11 | 1.23 | Student 21 | 1.21 |
| Student 2 | 1.28 | Student 12 | 1.26 | Student 22 | 1.29 |
| Student 3 | 1.27 | Student 13 | 1.30 | Student 23 | 1.26 |
| Student 4 | 1.21 | Student 14 | 1.21 | Student 24 | 1.22 |
| Student 5 | 1.22 | Student 15 | 1.28 | Student 25 | 1.28 |
| Student 6 | 1.29 | Student 16 | 1.30 | Student 26 | 1.27 |
| Student 7 | 1.30 | Student 17 | 1.22 | Student 27 | 1.26 |
| Student 8 | 1.24 | Student 18 | 1.25 | Student 28 | 1.23 |
| Student 9 | 1.27 | Student 19 | 1.20 | Student 29 | 1.22 |
| Student 10 | 1.29 | Student 20 | 1.28 | Student 30 | 1.21 |

If we present this information in structured form, we obtain the
following frequency table:
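| Variable (value) | Absolute frequency (simple) | Absolute frequency (accumulated) | Relative frequency (simple) | Relative frequency (accumulated) |
| --- | --- | --- | --- | --- |
| 1.20 | 1 | 1 | 3.3% | 3.3% |
| 1.21 | 4 | 5 | 13.3% | 16.6% |
| 1.22 | 4 | 9 | 13.3% | 30.0% |
| 1.23 | 2 | 11 | 6.6% | 36.6% |
| 1.24 | 1 | 12 | 3.3% | 40.0% |
| 1.25 | 2 | 14 | 6.6% | 46.6% |
| 1.26 | 3 | 17 | 10.0% | 56.6% |
| 1.27 | 3 | 20 | 10.0% | 66.6% |
| 1.28 | 4 | 24 | 13.3% | 80.0% |
| 1.29 | 3 | 27 | 10.0% | 90.0% |
| 1.30 | 3 | 30 | 10.0% | 100.0% |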
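The construction of such a table can be sketched in Python as follows. The function names `frequency_table` and `percent` are hypothetical helpers chosen for this illustration; percentages are truncated (not rounded) to one decimal, on the assumption that this is the convention the table follows (2/30 is shown as 6.6%).

```python
from collections import Counter

# Heights (m) of the 30 students, in the order listed in the example.
heights = [
    1.25, 1.28, 1.27, 1.21, 1.22, 1.29, 1.30, 1.24, 1.27, 1.29,
    1.23, 1.26, 1.30, 1.21, 1.28, 1.30, 1.22, 1.25, 1.20, 1.28,
    1.21, 1.29, 1.26, 1.22, 1.28, 1.27, 1.26, 1.23, 1.22, 1.21,
]

def frequency_table(values):
    """Return rows of (value, simple count, accumulated count), sorted by value."""
    counts = Counter(values)
    rows = []
    accumulated = 0
    for value in sorted(counts):
        accumulated += counts[value]
        rows.append((value, counts[value], accumulated))
    return rows

def percent(count, total):
    """Percentage of total, truncated to one decimal using integer arithmetic."""
    return f"{count * 1000 // total / 10:.1f}%"

table = frequency_table(heights)
n = len(heights)
for value, simple, accumulated in table:
    print(f"{value:.2f}  {simple:2d}  {accumulated:2d}  "
          f"{percent(simple, n):>6}  {percent(accumulated, n):>6}")
```

Each printed row holds the value, its simple and accumulated absolute frequencies, and the corresponding relative frequencies.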

If the values that the variable takes are very diverse and each of them
occurs only a few times, it is advisable to group them into intervals;
otherwise we would obtain a very long frequency table that would
contribute very little for synthesis purposes (as we will see in the
next lesson).

Grouped frequency distributions
Let's suppose we measure the height of the inhabitants of a household
and obtain the following results (m):

| Inhabitant | Height (m) | Inhabitant | Height (m) | Inhabitant | Height (m) |
| --- | --- | --- | --- | --- | --- |
| Inhabitant 1 | 1.15 | Inhabitant 11 | 1.53 | Inhabitant 21 | 1.21 |
| Inhabitant 2 | 1.48 | Inhabitant 12 | 1.16 | Inhabitant 22 | 1.59 |
| Inhabitant 3 | 1.57 | Inhabitant 13 | 1.60 | Inhabitant 23 | 1.86 |
| Inhabitant 4 | 1.71 | Inhabitant 14 | 1.81 | Inhabitant 24 | 1.52 |
| Inhabitant 5 | 1.92 | Inhabitant 15 | 1.98 | Inhabitant 25 | 1.48 |
| Inhabitant 6 | 1.39 | Inhabitant 16 | 1.20 | Inhabitant 26 | 1.37 |
| Inhabitant 7 | 1.40 | Inhabitant 17 | 1.42 | Inhabitant 27 | 1.16 |
| Inhabitant 8 | 1.64 | Inhabitant 18 | 1.45 | Inhabitant 28 | 1.73 |
| Inhabitant 9 | 1.77 | Inhabitant 19 | 1.20 | Inhabitant 29 | 1.62 |
| Inhabitant 10 | 1.49 | Inhabitant 20 | 1.98 | Inhabitant 30 | 1.01 |

If we presented this information in an ungrouped frequency table,
we would obtain a table with almost as many rows as observations
(only a few heights are repeated), nearly all of them with an absolute
frequency of 1 and a relative frequency of 3.3%. This table would
provide us with little information.

Instead, we prefer to group the data into intervals, so that
the information becomes more summarized (and therefore some
information is lost), but it is more manageable and informative:

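| Height (m) | Absolute frequency (simple) | Absolute frequency (accumulated) | Relative frequency (simple) | Relative frequency (accumulated) |
| --- | --- | --- | --- | --- |
| 1.01 - 1.10 | 1 | 1 | 3.3% | 3.3% |
| 1.11 - 1.20 | 5 | 6 | 16.6% | 20.0% |
| 1.21 - 1.30 | 1 | 7 | 3.3% | 23.3% |
| 1.31 - 1.40 | 3 | 10 | 10.0% | 33.3% |
| 1.41 - 1.50 | 5 | 15 | 16.6% | 50.0% |
| 1.51 - 1.60 | 5 | 20 | 16.6% | 66.6% |
| 1.61 - 1.70 | 2 | 22 | 6.6% | 73.3% |
| 1.71 - 1.80 | 3 | 25 | 10.0% | 83.3% |
| 1.81 - 1.90 | 2 | 27 | 6.6% | 90.0% |
| 1.91 - 2.00 | 3 | 30 | 10.0% | 100.0% |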
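Grouping into intervals can be sketched in Python as follows; the counts are tallied directly from the 30 listed heights. The helper name `interval_label` and its `width_cm` parameter are assumptions made for this illustration. Heights are converted to whole centimetres before grouping so that interval boundaries are not affected by floating-point rounding.

```python
from collections import Counter

# Heights (m) of the 30 inhabitants, in the order listed in the example.
heights = [
    1.15, 1.48, 1.57, 1.71, 1.92, 1.39, 1.40, 1.64, 1.77, 1.49,
    1.53, 1.16, 1.60, 1.81, 1.98, 1.20, 1.42, 1.45, 1.20, 1.98,
    1.21, 1.59, 1.86, 1.52, 1.48, 1.37, 1.16, 1.73, 1.62, 1.01,
]

def interval_label(height_m, width_cm=10):
    """Label of the interval (both ends inclusive) that contains a height."""
    cm = round(height_m * 100)  # work in whole centimetres
    lower = (cm - 1) // width_cm * width_cm + 1
    upper = lower + width_cm - 1
    return f"{lower / 100:.2f} - {upper / 100:.2f}"

counts = Counter(interval_label(h) for h in heights)

accumulated = 0
for label in sorted(counts):
    accumulated += counts[label]
    print(f"{label}  {counts[label]:2d}  {accumulated:2d}")
```

Changing `width_cm` shows the trade-off directly: wider intervals give a shorter table, narrower ones keep more detail.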

The number of intervals into which the information is grouped is a
decision that the analyst must make: the rule is that the more
intervals are used, the less information is lost, but the less
representative and informative the table may be.

Measures of central position: the mean, median and mode
Measures of position provide us with information about the data series
that we are analyzing. These measures allow us to understand various
characteristics of this data series.
The measures of position are of two types:
a) Measures of central tendency: they provide information about the
central values of the data series.
b) Measures of non-central position: they describe how the rest of the
values of the series are distributed.
a) Measures of central tendency
The main measures of central tendency are as follows:
1.- Mean: it is the weighted average of the data series. Several
types of mean can be calculated, the most commonly used being:
a) Arithmetic mean: it is calculated by multiplying each value by the number
of times it is repeated. The sum of all these products is divided by
the total number of data points in the sample:
$$X_m = \frac{(X_1 \cdot n_1) + (X_2 \cdot n_2) + (X_3 \cdot n_3) + \dots + (X_{n-1} \cdot n_{n-1}) + (X_n \cdot n_n)}{n}$$

b) Geometric mean: each value is raised to the number of times that it
is repeated. All these results are multiplied together and the "n"-th
root of the final product is calculated (where "n" is the total number
of data points in the sample).
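Written as a formula, with X_g used here as a label for the geometric mean (a notation assumed for this sketch), the definition reads:

```latex
X_g = \sqrt[n]{X_1^{\,n_1} \cdot X_2^{\,n_2} \cdots X_{n-1}^{\,n_{n-1}} \cdot X_n^{\,n_n}}
\qquad\Longleftrightarrow\qquad
\ln X_g = \frac{n_1 \ln X_1 + n_2 \ln X_2 + \dots + n_n \ln X_n}{n}
```

The logarithmic form shows that the geometric mean is the arithmetic mean taken on a logarithmic scale, which is why it requires all values to be positive.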

Depending on the type of data being analyzed, it will be more appropriate to use the
arithmetic mean or the geometric mean.
The geometric mean is commonly used in data series such as annual
interest rates, inflation, etc., where the value of each year has a
multiplicative effect on that of previous years. In any case, the
arithmetic mean is the most commonly used measure of central tendency.

The most positive aspect of the mean is that all the values of the
series are used in its calculation, so no information is lost.
However, it presents the problem that its value (both for the
arithmetic mean and for the geometric mean) can be greatly influenced by
extreme values, which deviate excessively from the rest of the series. These
anomalous values could greatly influence the value of the
mean, which would then lose representativeness.
2.- Median: it is the value of the data series that is located exactly
in the center of the sample (50% of the values are lower and the other
50% are higher).
It does not present the problem of being influenced by extreme values,
but in exchange it does not use all the information from the data
series in its calculation (it does not weight each value by the number
of times it is repeated).
3.- Mode: it is the value that occurs most frequently in the sample.
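To illustrate how an extreme value affects the mean but barely moves the median, here is a minimal sketch. The heights are hypothetical (not taken from the example data), and the anomalous value of 2.50 is an arbitrary choice.

```python
from statistics import mean, median

# Hypothetical heights in metres.
typical = [1.20, 1.22, 1.25, 1.26, 1.28, 1.30]
# Same series plus one anomalous value, for instance a recording error.
with_outlier = typical + [2.50]

mean_shift = mean(with_outlier) - mean(typical)
median_shift = median(with_outlier) - median(typical)

print(f"mean:   {mean(typical):.3f} -> {mean(with_outlier):.3f}")
print(f"median: {median(typical):.3f} -> {median(with_outlier):.3f}")
```

A single anomalous observation pulls the mean up noticeably, while the median changes only slightly.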
Example: we are going to use the frequency distribution table with the
data on the height of the students that we saw in lesson 2.

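| Variable (value) | Absolute frequency (simple) | Absolute frequency (accumulated) | Relative frequency (simple) | Relative frequency (accumulated) |
| --- | --- | --- | --- | --- |
| 1.20 | 1 | 1 | 3.3% | 3.3% |
| 1.21 | 4 | 5 | 13.3% | 16.6% |
| 1.22 | 4 | 9 | 13.3% | 30.0% |
| 1.23 | 2 | 11 | 6.6% | 36.6% |
| 1.24 | 1 | 12 | 3.3% | 40.0% |
| 1.25 | 2 | 14 | 6.6% | 46.6% |
| 1.26 | 3 | 17 | 10.0% | 56.6% |
| 1.27 | 3 | 20 | 10.0% | 66.6% |
| 1.28 | 4 | 24 | 13.3% | 80.0% |
| 1.29 | 3 | 27 | 10.0% | 90.0% |
| 1.30 | 3 | 30 | 10.0% | 100.0% |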

We are going to calculate the values of the different measures of central position:

1.- Arithmetic mean:

$$X_m = \frac{(1.20 \cdot 1) + (1.21 \cdot 4) + (1.22 \cdot 4) + (1.23 \cdot 2) + \dots + (1.29 \cdot 3) + (1.30 \cdot 3)}{30}$$

Then:

$$X_m \approx 1.253$$

Therefore, the average height of this group of students is 1.253 m.
2.- Geometric mean:

$$X_g = \left(1.20^{1} \cdot 1.21^{4} \cdot 1.22^{4} \cdots 1.29^{3} \cdot 1.30^{3}\right)^{1/30}$$

Then:

$$X_g \approx 1.253$$

In this example, the arithmetic mean and the geometric mean coincide to
three decimal places, but it doesn't always have to be this way.

3.- Median:
The median of this sample is 1.26 m, as 50% of the values are below it
and the other 50% are above it. This can be seen by analyzing the
column of accumulated relative frequencies.
In this example, since the value 1.26 is repeated 3 times, the median
is located exactly between the first and second values of this
group, since the division between the lower 50% and the upper 50% falls
between these two values.

4.- Mode:

There are 3 values that are repeated 4 times: 1.21, 1.22, and 1.28.
Therefore, this series has 3 modes.
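The four results can be checked with a short Python sketch that starts from the frequency table. Variable names are chosen for illustration only, and the geometric mean is computed through logarithms, which is mathematically equivalent to taking the 30th root of the product.

```python
import math
from statistics import median, multimode

# Frequency table of the students' heights (m): value -> times repeated.
frequencies = {
    1.20: 1, 1.21: 4, 1.22: 4, 1.23: 2, 1.24: 1, 1.25: 2,
    1.26: 3, 1.27: 3, 1.28: 4, 1.29: 3, 1.30: 3,
}

# Expand the table back into the individual observations.
observations = [x for x, n_i in frequencies.items() for _ in range(n_i)]
n = len(observations)

# Arithmetic mean: sum of value * frequency, divided by n.
arithmetic = sum(x * n_i for x, n_i in frequencies.items()) / n

# Geometric mean: exponential of the weighted mean of the logarithms.
geometric = math.exp(
    sum(n_i * math.log(x) for x, n_i in frequencies.items()) / n
)

# Median: with n even, the average of the two middle observations.
middle = median(observations)

# Mode(s): every value that reaches the highest frequency.
modes = sorted(multimode(observations))

print(f"n = {n}")
print(f"arithmetic mean = {arithmetic:.3f}")
print(f"geometric mean  = {geometric:.3f}")
print(f"median          = {middle:.2f}")
print(f"modes           = {modes}")
```

The geometric mean comes out marginally below the arithmetic mean, as it must for any series of positive values that are not all equal; at three decimals both round to 1.253.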
