Introduction to Descriptive Statistics
Descriptive statistics is the discipline that analyzes series of data
(for example, the age of a population, the height of the students of a
school, the temperature in the summer months, etc.) and tries to draw
conclusions about the behavior of these variables.
Variables can be of two types:
Qualitative variables or attributes: they cannot be measured
numerically (for example: nationality, skin color, sex).
Quantitative variables: they have a numerical value (for example: age,
price of a product, annual income).
Variables can also be classified into:
Unidimensional variables: they collect information about only one
characteristic (for example: age of the students in a class).
Bidimensional variables: they collect information about two
characteristics of the population (for example: age and height of the
students in a class).
Multidimensional variables: they collect information about three or
more characteristics (for example: age, height, and weight of the
students in a class).
Quantitative variables, in turn, can be classified as discrete or
continuous:
Discrete: they can only take integer values (1, 2, 8, -4, etc.). For
example: the number of siblings (it can be 1, 2, 3, etc., but it can
never be 3.45).
Continuous: they can take any real value within an interval.
For example, the speed of a vehicle can be 80.3 km/h, 94.57 km/h,
etc.
When studying the behavior of a variable, it is necessary to distinguish
the following concepts:
Individual: any element that carries information about the
phenomenon being studied. Thus, if we study the height of the children
in a class, each student is an individual; if we study the price of
housing, each housing unit is an individual.
Population: the set of all individuals (people, objects, animals,
etc.) that carry information about the phenomenon being studied. For
example, if we study the price of housing in a city, the population is
the set of all housing units in that city.
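Frequency table

Value     Absolute frequencies                 Relative frequencies
          Simple    Cumulative                 Simple                 Cumulative
X_1       n_1       n_1                        f_1 = n_1 / n          f_1
X_2       n_2       n_1 + n_2                  f_2 = n_2 / n          f_1 + f_2
...       ...       ...                        ...                    ...
X_{n-1}   n_{n-1}   n_1 + n_2 + .. + n_{n-1}   f_{n-1} = n_{n-1} / n  f_1 + f_2 + .. + f_{n-1}
X_n       n_n       Σn                         f_n = n_n / n          Σf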
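
As an illustration of how the columns of the frequency table relate to
each other, the following is a minimal sketch in Python. The function
name frequency_table and the sample of sibling counts are hypothetical
and are not part of the lesson; each row holds the value, its simple and
cumulative absolute frequency, and its simple and cumulative relative
frequency.

```python
from collections import Counter


def frequency_table(data):
    """Return rows (value, n_i, cumulative n_i, f_i, cumulative f_i)."""
    counts = Counter(data)   # simple absolute frequency of each value
    n = len(data)            # total number of observations
    rows = []
    cumulative = 0
    for value in sorted(counts):
        n_i = counts[value]
        cumulative += n_i
        rows.append((value, n_i, cumulative, n_i / n, cumulative / n))
    return rows


# Hypothetical sample: number of siblings of ten students.
siblings = [1, 2, 0, 1, 3, 1, 2, 0, 1, 2]
table = frequency_table(siblings)
for row in table:
    print(row)
```

In the last row the cumulative absolute frequency equals the sample size
and the cumulative relative frequency equals 1, which is a quick check
that the table is complete.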
If the values that the variable takes are very diverse and each of them
occurs very few times, it is advisable to group them into intervals;
otherwise we would obtain a very long frequency table that would add
very little value for synthesis purposes (as will be seen in the next
lesson).
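
The grouping into intervals can be sketched as follows. This is only an
illustration under stated assumptions: the function name
grouped_frequencies, the choice of equal-width intervals that include
the lower limit and exclude the upper one, and the sample heights are
all hypothetical and do not come from the lesson.

```python
def grouped_frequencies(data, start, width, n_bins):
    """Count how many values fall in each interval [lower, upper)."""
    counts = [0] * n_bins
    for x in data:
        index = int((x - start) // width)   # position of the interval
        if 0 <= index < n_bins:             # values outside are ignored
            counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, count)
        for i, count in enumerate(counts)
    ]


# Hypothetical heights in cm, grouped into four intervals of 10 cm.
heights_cm = [152, 168, 171, 174, 159, 181, 166, 177, 163, 185]
groups = grouped_frequencies(heights_cm, start=150, width=10, n_bins=4)
for lower, upper, count in groups:
    print(f"[{lower}, {upper}): {count}")
```

Ten distinct values collapse into four rows, which is the synthesis that
a table with one row per value would not provide.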
Grouped frequency distributions
Let's suppose we measure the height of the inhabitants of a household
and we obtain the following results (cm):
Depending on the type of data being analyzed, it will be more
appropriate to use the arithmetic mean or the geometric mean.
The geometric mean is commonly used for data series such as annual
interest rates, inflation, etc., where the value of each year has a
multiplicative effect on that of previous years. In any case, the
arithmetic mean is the most commonly used measure of central tendency.
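
To make the difference between the two means concrete, here is a small
sketch using Python's standard statistics module (geometric_mean
requires Python 3.8 or later). The yearly growth factors are
hypothetical and chosen only for illustration.

```python
from statistics import mean, geometric_mean

# Hypothetical yearly growth factors: +10%, +20%, -10%.
growth_factors = [1.10, 1.20, 0.90]

arithmetic = mean(growth_factors)
geometric = geometric_mean(growth_factors)

# Total growth over the three years is the product of the factors.
total_growth = 1.10 * 1.20 * 0.90

print(f"arithmetic mean: {arithmetic:.4f}")
print(f"geometric mean:  {geometric:.4f}")
# Compounding the geometric mean for three years reproduces the total
# growth; compounding the arithmetic mean overstates it.
print(f"geometric ** 3:  {geometric ** 3:.4f}")
print(f"arithmetic ** 3: {arithmetic ** 3:.4f}")
print(f"total growth:    {total_growth:.4f}")
```

This is why the geometric mean suits series with a multiplicative
effect: it is the constant yearly factor that yields the same
accumulated result.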
The most positive aspect of the mean is that all the values of the
series are used in its calculation, so no information is lost.
However, it has the drawback that its value (for both the arithmetic
and the geometric mean) can be greatly influenced by extreme values
that deviate excessively from the rest of the series. These anomalous
values can distort the mean to the point that it is no longer
representative.
2.- Median: it is the value of the data series that is located exactly
in the center of the sample (50% of the values are lower and the other
50% are higher).
It does not have the problem of being influenced by extreme values,
but in exchange it does not use all the information in the data series
in its calculation (it does not weight each value by the number of
times it is repeated).
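
The different sensitivity of the mean and the median to extreme values
can be checked with a short sketch. The data below are hypothetical
(they are not taken from the lesson), and the functions come from
Python's standard statistics module.

```python
from statistics import mean, median

# Hypothetical series without extreme values.
values = [20, 22, 23, 25, 27]
# The same series with one anomalous value added.
values_with_outlier = values + [500]

print("mean without outlier:  ", mean(values))
print("median without outlier:", median(values))
print("mean with outlier:     ", mean(values_with_outlier))
print("median with outlier:   ", median(values_with_outlier))
```

A single anomalous value moves the mean far away from the bulk of the
data, while the median barely changes.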
3.- Mode: it is the value that occurs most frequently in the sample.
Example: we are going to use the frequency distribution table with the
data on the height of the students that we saw in lesson 2.
1.- Arithmetic mean:
Xm = 1.253
2.- Geometric mean:
Xm = 1.253
In this example, the arithmetic mean and the geometric mean coincide,
but it doesn't always have to be this way.
3.- Median:
The median of this sample is 1.26 m, since 50% of the values are below
it and the other 50% are above it. This can be seen by analyzing the
column of cumulative relative frequencies.
In this example, since the value 1.26 is repeated 3 times, the median
would lie exactly between the first and second values of this group,
since the boundary between the lower 50% and the upper 50% falls
between these two values.
4.- Mode:
There are 3 values that are each repeated 4 times: 1.21, 1.22, and 1.28.
Therefore, this series has 3 modes.
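
A series with several modes can be handled in Python with
statistics.multimode (available from Python 3.8), which returns every
value tied for the highest frequency. The short sample below is
hypothetical and much smaller than the table from lesson 2; it only
reuses the three modal values mentioned above.

```python
from statistics import multimode

# Hypothetical sample in which three values share the highest frequency.
heights = [1.21, 1.21, 1.22, 1.22, 1.25, 1.28, 1.28]

# multimode lists the tied values in order of first appearance.
modes = multimode(heights)
print(modes)
```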