ANALYSIS AND INTERPRETATION
OF DATA
Descriptive Tools and Inferential Tools
Common Descriptive Techniques (Descriptive Statistics)
A. Percentage
• A percentage is a way of expressing a number as a fraction of 100 (per cent
meaning "per hundred", from the Latin per centum).
• It is often indicated by using the percent sign, "%". Percentages are used to
express how large or small one quantity is, relative to another quantity.
• The first quantity usually represents a part of, or a change in, the second
quantity, which should be greater than zero. Percentages serve two purposes in
data presentation.
• They simplify the data by reducing all numbers to a range of 0 to 100. This is
most useful when the research problem calls for a comparison of several
distributions of data.
• They show the relative relationships and shifts in the data.
Percentages are used by everyone who deals with numbers.
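The two uses described above (a part of a whole, and a change in a quantity) can be sketched in Python as follows. The function names and the figures are illustrative assumptions, not taken from the notes.

```python
def percentage(part, whole):
    """Express `part` as a percentage of `whole` (whole must be non-zero)."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return part / whole * 100


def percentage_change(old, new):
    """Express the change from `old` to `new` as a percentage of `old`."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100


# Hypothetical figures: 45 of 180 respondents chose an option,
# and a quantity rose from 200 to 250.
share = percentage(45, 180)
change = percentage_change(200, 250)
print(f"Share: {share}%  Change: {change}%")
```

Because both results are on the same 0 to 100 scale, distributions of different sizes can be compared directly.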
B. Frequency table
• A frequency table is a table that lists items and shows the number of times
the items occur.
• Eg:
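As an illustration only, a frequency table can be built in Python with the standard-library `collections.Counter`. The survey responses below are invented for the sketch.

```python
from collections import Counter

# Hypothetical survey responses (illustrative data only).
responses = ["Agree", "Disagree", "Agree", "Neutral",
             "Agree", "Disagree", "Agree", "Neutral"]

freq = Counter(responses)      # item -> number of occurrences
total = sum(freq.values())

print(f"{'Item':<10}{'Count':>6}{'Percent':>9}")
for item, count in freq.most_common():
    print(f"{item:<10}{count:>6}{count / total * 100:>8.1f}%")
```

The percentage column is optional; it simply combines the frequency table with the percentage technique described earlier.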
C. Contingency table
• A contingency table is a format for presenting the relationships among variables
as percentage distributions.
• It is essentially a display format used to analyze and record the relationship
between two or more categorical variables.
• It is the categorical equivalent of the scatter plot used to analyze the
relationship between two continuous variables.
• The term contingency table was first used by the statistician Karl Pearson in
1904.
• A contingency table normally has as many rows as there are categories of one
variable and as many columns as there are categories of the other.
• Example of contingency table.
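A minimal sketch of a two-way contingency table in Python, using only the standard library. The two categorical variables (gender and a yes/no answer) and all records are hypothetical.

```python
from collections import Counter

# Hypothetical records: (gender, answer) pairs, illustrative data only.
records = [("Male", "Yes"), ("Male", "No"), ("Female", "Yes"),
           ("Female", "Yes"), ("Male", "Yes"), ("Female", "No"),
           ("Female", "Yes"), ("Male", "No")]

cells = Counter(records)                      # (row, column) -> count
rows = sorted({r for r, _ in records})        # categories of variable 1
cols = sorted({c for _, c in records})        # categories of variable 2

# Counts laid out as rows x columns.
table = {r: {c: cells[(r, c)] for c in cols} for r in rows}

# The same table expressed as row percentages.
row_pct = {r: {c: table[r][c] / sum(table[r].values()) * 100 for c in cols}
           for r in rows}

for r in rows:
    print(r, table[r], row_pct[r])
```

Here the rows are the categories of one variable and the columns are the categories of the other; each cell counts the cases that fall in both.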
D. Measures of Central Tendency.
• The central tendency of a distribution is an estimate of the "center" of a
distribution of values.
• The various measures of central tendency are:
(1) Arithmetic mean
(2) Median
(3) Mode
(4) Geometric mean
(5) Harmonic mean
Arithmetic mean
The mean is also known as the arithmetic mean. It is the most common measure of
central tendency. Its value is obtained by adding all the items and dividing the
sum by the total number of items.
Median
The median is the middle value in the distribution when it is arranged in
ascending or descending order. It is a positional average. It may be defined as
the value of the item which divides the series into two equal parts: one half
contains values greater than it and the other half contains values less than it.
Therefore the series must first be arranged in ascending or descending order.
Mode
It is the most common item of a series. The mode is a positional average and it is
not affected by the values of extreme items. It is defined as the value of the
variable which occurs most frequently in a distribution. The chief feature of the
mode is that it is the size of the item which has the maximum frequency and is
also affected by the frequencies of the neighbouring items. It is particularly
useful in the study of popular sizes.
Geometric Mean
• Geometric mean is an average, which indicates the central tendency of a set of
numbers by using the product of their values. The geometric mean is defined as
the nth root of the product of n numbers.
Harmonic mean
• Harmonic mean is a type of average that is calculated by dividing the number
of values in a data set by the sum of the reciprocals of the values.
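The five measures defined above can be computed with Python's standard `statistics` module (Python 3.8 or later for `geometric_mean`). The data set is an invented example.

```python
import statistics

# Illustrative data set (positive values, as the geometric and
# harmonic means require).
values = [2, 4, 4, 8, 16]

mean = statistics.mean(values)              # sum of items / number of items
median = statistics.median(values)          # middle value of the sorted series
mode = statistics.mode(values)              # most frequent value
gm = statistics.geometric_mean(values)      # nth root of the product
hm = statistics.harmonic_mean(values)       # n / sum of reciprocals

print(mean, median, mode, round(gm, 3), round(hm, 3))
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, and the output of this sketch follows that ordering.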
Measures of Dispersion
• The measures of central tendency indicate the central tendency of a
frequency distribution in the form of an average.
• These averages tell something about the general level of magnitude of the
distribution, but they do not show anything further about the distribution.
• In particular, they fail to give an idea about the dispersion of the values.
To measure dispersion, statistical devices called measures of dispersion are
calculated.
• According to Spiegel, the degree to which numerical data tend to spread
about an average value is called the variation or dispersion of data.
• The variation measure is used to test the representative character of an
average.
• The following are the important measures of dispersion.
1. Range
2. Mean deviation
3. Standard deviation
Mean deviation
• The mean deviation is a measure of dispersion based on all the items in a
distribution.
• It is the arithmetic mean of the deviations of the items of a series, computed
from any measure of central tendency. It is the average of the absolute
differences of the values of the items from an average.
• Mean deviation can be calculated from any measure of central tendency. It is an
absolute figure.
• The relative mean deviation is known as the coefficient of mean deviation. The
coefficient of mean deviation is obtained by dividing the mean deviation by the
average used for calculating the mean deviation.
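A short sketch of the mean deviation and its coefficient, taking the arithmetic mean as the average. The helper name `mean_deviation` and the data are assumptions made for illustration.

```python
import statistics


def mean_deviation(values, center):
    """Average of the absolute deviations of `values` from `center`."""
    return sum(abs(x - center) for x in values) / len(values)


values = [2, 4, 6, 8, 10]                # illustrative data
mean = statistics.mean(values)           # average chosen for the deviations
md = mean_deviation(values, mean)        # absolute measure of dispersion
coefficient = md / mean                  # relative measure of dispersion

print(f"Mean deviation: {md:.2f}  Coefficient: {coefficient:.2f}")
```

Passing `statistics.median(values)` as `center` instead would give the mean deviation about the median.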
Standard Deviation
• Karl Pearson introduced the concept of standard deviation in 1893.
• It is the most important measure of dispersion and it is widely used in many
statistical formulae.
• It is commonly denoted by the symbol σ (sigma). It is defined as the positive
square root of the arithmetic mean of the squares of the deviations of the given
observations from their arithmetic mean. The following formula is applied
to find the standard deviation.
• σ = √( Σ(x − x̄)² / N ), where x̄ is the arithmetic mean of the observations and
N is the number of observations.
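The definition above (positive square root of the mean of the squared deviations from the arithmetic mean) can be sketched directly in Python. The function name `population_std` and the data are illustrative; the standard library's `statistics.pstdev` computes the same quantity.

```python
import math
import statistics


def population_std(values):
    """Positive square root of the mean squared deviation from the mean."""
    mean = sum(values) / len(values)
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / len(values))


values = [2, 4, 4, 4, 5, 5, 7, 9]        # illustrative data, mean = 5
sigma = population_std(values)

print(sigma, statistics.pstdev(values))  # both follow the same definition
```

Note that this divides by the number of observations, as in the definition given here; the sample standard deviation (`statistics.stdev`) divides by one less than that number.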
Range
• It is defined as the difference between the highest value and the lowest value
in a series.
• A related measure, the interquartile range, is the difference between the third
quartile (Q3) and the first quartile (Q1).
• The range gives an idea about the variability very quickly. Its limitation is
that it is affected very greatly by fluctuations of sampling. It is not suitable
for mathematical treatment, and it is not considered an appropriate measure in
research studies.
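A minimal sketch of the range, together with the quartile-based spread (Q3 minus Q1) mentioned above. The data are invented, and `statistics.quantiles` uses its default quartile method, which is only one of several conventions.

```python
import statistics

values = [12, 7, 3, 18, 9, 15, 21]       # illustrative data

# Range: highest value minus lowest value.
data_range = max(values) - min(values)

# Quartiles (default "exclusive" method of the statistics module).
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1                            # difference between Q3 and Q1

print(f"Range: {data_range}  Q1: {q1}  Q3: {q3}  Q3 - Q1: {iqr}")
```

The range depends only on the two extreme items, which is why a single unusual observation can change it sharply.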
Statistical Tests/Tools for Inferential Analysis (Inferential Statistics)
TESTING OF HYPOTHESIS
The purpose of research is to solve a problem. So, the researcher should propose a
set of suggested solutions or answers to the problem. Such tentative solutions,
formulated as propositions, are known as hypotheses. A proposition is a statement
about observable phenomena that may be judged as true or false. When a proposition
is made more specific, it is known as a hypothesis.
STEPS IN TESTING HYPOTHESIS
Testing of hypothesis is a process of examining whether the hypothesis formulated
by the researcher is valid or not. The main objective of hypothesis testing is
to decide whether to accept or reject the hypothesis.
The steps in testing a hypothesis are the following:
1. Set Up a Hypothesis:
• The first step in testing of hypothesis is to set up a hypothesis about
population parameter.
• Normally, the researcher has to fix two types of hypothesis.
• They are null hypothesis and alternative hypothesis.
Null Hypothesis: - The null hypothesis is the original hypothesis. It states that
there is no significant difference between the sample and the population regarding
a particular matter under consideration. The null hypothesis is denoted by H0.
E.g.: It states that there is no difference between two variables.
Alternative Hypothesis: - Any hypothesis other than null hypothesis is
called alternative hypothesis. When a null hypothesis is rejected, we accept
the other hypothesis, known as alternative hypothesis. Alternative
hypothesis is denoted by H1.
E.g.: It states that there is a difference between two variables.
2. Set up a suitable level of significance:
After setting up the hypothesis, the researcher has to set up a suitable level
of significance. The level of significance is the probability with which we
may reject a null hypothesis when it is true.
For example, if the level of significance is 5%, it means that in the long run, the
researcher rejects a true null hypothesis 5 times out of every 100 times.
The level of significance is denoted by α (alpha).
Generally, the level of significance is fixed at 1% or 5%.
3. Decide a test criterion:
The third step in testing of hypothesis is to select an appropriate test
criterion. Commonly used tests are the z-test, t-test, F-test, etc.
4. Calculation of test statistic:
The next step is to calculate the value of test statistic using appropriate formula.
The decision to accept or to reject a null hypothesis is made on the basis of a
statistic computed from the sample.
Such a statistic is called the test statistic.
There are different types of test statistics.
All these test statistics can be classified into two groups.
They are (a) Parametric Tests and (b) Non-Parametric Tests.
Parametric test
• Parametric tests are those that make assumptions about the parameters of the
population distribution from which the sample is drawn.
Main Assumptions
• The population distribution is normal.
• The level of measurement is interval or ratio.
• The data are taken as random samples.
• Parametric tests are used when the population data are normally distributed.
• Parametric tests are based on the mean and standard deviation.
Non-parametric test
• A non-parametric test does not make any assumptions about the parameters of the
population distribution.
• Non-parametric tests are distribution free and can be used for normal and
non-normal data.
• When data are not normally distributed or when they are on an ordinal level of
measurement, non-parametric tests are suitable for analysis.
5. Making Decision:
Finally, we may draw conclusions and take decisions.
The decision may be either to accept or reject the null hypothesis.
If the calculated value of the test statistic is more than the table (critical)
value, we reject the null hypothesis and accept the alternative hypothesis.
If the calculated value is less than the table value, we accept the null hypothesis.
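The five steps above can be sketched with a one-sample z-test. This is only an illustration under stated assumptions: the population standard deviation is taken as known, the test is two-sided, and the function name and all figures are hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_std, n, alpha=0.05):
    """Two-sided z-test of H0: population mean equals `pop_mean`."""
    # Step 4: calculate the test statistic.
    z = (sample_mean - pop_mean) / (pop_std / math.sqrt(n))
    # Table (critical) value for the chosen level of significance.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 5: reject H0 when the calculated value exceeds the table value.
    reject = abs(z) > critical
    return z, critical, reject


# Steps 1-3 (hypothetical): H0: mean = 50, H1: mean != 50,
# alpha = 5%, z-test chosen as the test criterion.
z, critical, reject = one_sample_z_test(sample_mean=52.5, pop_mean=50,
                                        pop_std=10, n=64)
print(f"z = {z:.2f}, critical = {critical:.2f}, reject H0: {reject}")
```

For a two-sided test the absolute value of the statistic is compared with the table value, since a large deviation in either direction counts against the null hypothesis.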
TYPES OF ERRORS IN TESTING OF HYPOTHESIS:
In any test of hypothesis, the decision is to accept or reject a null hypothesis.
The four possibilities of the decision are:
1. Accepting a null hypothesis when it is true.
2. Rejecting a null hypothesis when it is false.
3. Rejecting a null hypothesis when it is true.
4. Accepting a null hypothesis when it is false.
Out of the above 4 possibilities, 1 and 2 are correct, while 3 and 4 are errors.
The error in the 3rd possibility is called a Type I error, and
that in the 4th possibility is called a Type II error.
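The link between the level of significance and the Type I error can be illustrated by simulation. In the sketch below the null hypothesis is true in every trial, so every rejection is a Type I error; the sample size, number of trials and random seed are arbitrary choices for the illustration.

```python
import math
import random
from statistics import NormalDist

random.seed(0)                                   # reproducible illustration
ALPHA = 0.05
CRITICAL = NormalDist().inv_cdf(1 - ALPHA / 2)   # two-sided table value


def rejects_true_null(n=30, mu=0.0, sigma=1.0):
    """Draw a sample for which H0 (mean = mu) is true and test it."""
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_mean = sum(sample) / n
    z = (sample_mean - mu) / (sigma / math.sqrt(n))
    return abs(z) > CRITICAL                     # True means a Type I error


trials = 2000
type_i_rate = sum(rejects_true_null() for _ in range(trials)) / trials
print(f"Share of true null hypotheses rejected: {type_i_rate:.3f}")
```

Over many trials the share of wrongly rejected null hypotheses should be close to the chosen level of significance.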
Graphical Representation of data
Graphical presentation involves the use of graphs, charts and other pictorial
devices. These forms are very helpful in condensing a large quantity of statistical
data into a form that can be quickly understood at a glance. Properly drawn charts
and graphs help the general public to understand the data easily. Graphic
presentation should be done with care and diligence. The forms used should be
simple, clear, accurate and suitable for the data.
Types Of Graphs
1. Line Graphs
A line graph displays data that changes continuously over time. Each
line graph consists of points connected by lines to show a trend (continuous
change). Line graphs have an x-axis and a y-axis. In most cases, time is
plotted on the horizontal axis.
Uses of line graphs:
▪ When you want to show trends. For example, how house prices have
increased over time.
▪ When you want to make predictions based on a data history over time.
▪ When comparing two or more different variables, situations, and
information over a given period of time.
2. Bar Charts
Bar charts represent categorical data with rectangular bars. Bar graphs are among
the most popular types of graphs and charts in economics, statistics, marketing,
and visualization in digital customer experience. They are commonly used to compare
several categories of data.
Bar Charts Uses:
▪ When you want to display data that are grouped into nominal or ordinal
categories.
▪ To compare data among different categories.
▪ Bar charts can also show large data changes over time.
▪ Bar charts are ideal for visualizing the distribution of data when we have
more than three categories.
3. Pie Charts
When it comes to statistical types of graphs and charts, the pie chart (or the circle
chart) has a crucial place and meaning. It displays data and statistics in an easy-to-
understand ‘pie-slice’ format and illustrates numerical proportion.
Each pie slice is proportional to the size of a particular category relative to
the group as a whole. To put it another way, the pie chart breaks down a group into
smaller pieces. It shows part-whole relationships.
Pie Charts Uses:
▪ When you want to create and represent the composition of something.
▪ It is very useful for displaying nominal or ordinal categories of data.
▪ To show percentage or proportional data.
▪ When comparing areas of growth within a business such as profit.
▪ Pie charts work best for displaying data for 3 to 7 categories.
4. Histogram
A histogram shows continuous data in ordered rectangular columns. Usually, there
are no gaps between the columns.
The histogram displays the frequency distribution (shape) of a data set. At first
glance, histograms look similar to bar graphs. However, there is a key difference
between them: a bar chart represents categorical data, whereas a histogram
represents continuous data.
Histogram Uses:
▪ When the data is continuous.
▪ When you want to represent the shape of the data’s distribution.
▪ When you want to see whether the outputs of two or more processes are
different.
▪ To summarize large data sets graphically.
▪ To communicate the data distribution quickly to others.
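The frequency distribution behind a histogram can be sketched without any plotting library by grouping continuous values into equal-width class intervals. The function name, the class width and the height data are assumptions for illustration.

```python
def histogram_counts(values, bin_width, start):
    """Count how many values fall in each class [lower, lower + bin_width)."""
    counts = {}
    for x in values:
        index = int((x - start) // bin_width)    # which class x belongs to
        lower = start + index * bin_width        # lower limit of that class
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))          # classes with no values are omitted


# Hypothetical heights in centimetres.
heights = [151, 158, 162, 165, 167, 168, 171, 173, 174, 179, 183]
counts = histogram_counts(heights, bin_width=10, start=150)

# Text rendering: one column per class, with no gaps between classes.
for lower, count in counts.items():
    print(f"{lower}-{lower + 10}: {'#' * count}")
```

Each class interval plays the role of one column of the histogram, and the column height is the class frequency.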
5. Scatter plot
• The scatter plot is an X-Y diagram that shows a relationship between two
variables. It is used to plot data points on a vertical and a horizontal axis. The
purpose is to show how much one variable affects another.
• Usually, when there is a relationship between 2 variables, the first one is called
independent. The second variable is called dependent because its values depend
on the first variable.
• The scatter plot below presents data for 7 online stores: their monthly
e-commerce sales and online advertising costs for the last year.
6. Pictographs
• The pictograph or a pictogram is one of the more visually appealing types of
graphs and charts that display numerical information with the use of icons or
picture symbols to represent data sets.
• They are a very easy-to-read way of visualizing statistical data. A pictogram
shows the frequency of data as images or symbols. Each image/symbol may
represent one or more units of a given dataset.
• The following pictograph represents the number of computers sold by a
business company for the period from January to March.
INTERPRETATION OF DATA
• Analysis and interpretation are integral parts of research. The important
objective of the analysis of data is to provide answers to the questions raised
in the research.
• Interpretation refers to drawing inferences from the collected facts after
analytical study.
• According to C. William Emory, interpretation has two major aspects, namely
establishing continuity in research by linking the results of a given
study with those of another, and establishing some relationship with
the collected data.
• Analysis is not complete without interpretation, so the two are interdependent.
In the words of Jahoda and Cook, scientific interpretation seeks
relationships between the data of a study and between the study findings and
other scientific knowledge.
• It helps the researcher to understand the conceptual principles that work
underneath the findings. The researcher can connect these findings with those of
other studies related to other established concepts.
Techniques of Interpretation
• Interpretation is not an easy job. Technical skills are required to interpret
data. A number of methods are used to interpret data. The following are
some of the methods.
• 1. Relationship: It is one of the most fundamental bases for the interpretation
of analyzed data. There may be three types of relationship: symmetrical,
reciprocal and asymmetrical. The data
can be interpreted on the basis of these relationships.
• 2. Proportion: It is another method used for the interpretation of data. It is
generally ascertained to determine the nature and form of absolute changes
in the subject of study.
• 3. Percentages: It is used to make a comparison between two or more series
of data. Percentages are also used to describe the relationship between
variables.
• 4. Averages: Averages or other measures of comparison are used for the
interpretation of analyzed data. There are three common forms of average:
mean, median and mode. For analyzing and interpreting long statistical
tables, averages or other such measures are used to interpret the results.