Statistical Terms and Definitions Guide
Random:
Based on the intervention of chance, or occurring independently of other
events.
Profile analysis:
Method for analyzing data from the semantic differential, in which an
arithmetic mean or median is calculated for each set of polar opposites
and for each evaluated object.
Regression analysis:
Statistical procedure for analyzing the association between a metric
dependent variable and one or more independent variables.
Attributes:
Qualitative variables that only have categories.
Autocorrelation:
It is the correlation that exists between a variable lagged one or more
periods and the same variable.
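A minimal sketch of the idea, assuming the common sample formula in which the lagged cross-products are divided by the total sum of squared deviations. The function name `autocorrelation` and the data are illustrative only.

```python
def autocorrelation(series, lag):
    """Correlation between a series and the same series lagged `lag` periods."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    # Total squared deviation of the whole series from its mean.
    denominator = sum((value - mean) ** 2 for value in series)
    # Cross-products between each observation and the one `lag` periods earlier.
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator

trend = [1, 2, 3, 4, 5, 6, 7, 8]          # hypothetical trending series
lag1_autocorrelation = autocorrelation(trend, 1)
```

A steadily trending series such as this one gives a clearly positive lag-1 value, which is the pattern also described under serial correlation.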
Beta:
Probability of committing a type II error.
Census:
Complete count of the elements of a population or study objects.
Certainty:
Decision-making environment in which there is only one state of nature.
Coefficient of determination:
Measure of the proportion of variation in Y, the dependent variable, that
is explained by the regression line, that is, by the relationship of Y with
the independent variable.
Coefficient of variation:
Relative measure of dispersion, comparable across different distributions,
which expresses the standard deviation as a percentage of the mean.
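As an illustration, the coefficient of variation can be computed as below. This is a sketch that assumes the sample standard deviation is intended; the function name and both data sets are made up for the example.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean * 100

heights_cm = [160, 170, 180]   # hypothetical data
weights_kg = [60, 70, 80]      # hypothetical data
cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
```

Both data sets have the same standard deviation (10), but the weights vary more relative to their mean, which is what the percentage makes comparable.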
Rank correlation:
Method for conducting correlation analysis when the data are not
available in numerical form, but the information is sufficient
to rank the data.
Serial correlation:
It exists when successive observations over time are related to
each other.
Correlation:
It is a measure of the relationship between two or more variables. The
correlation coefficient can take values between -1 and +1. A value of -1
represents a perfect negative correlation, while a value of +1 represents a
perfect positive correlation. A value of 0 represents a lack of correlation.
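The following sketch computes the Pearson correlation coefficient, one common way of measuring correlation between two variables (the glossary does not name a specific coefficient, so this choice is an assumption). The function name and data are illustrative.

```python
import math

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (covariance up to a constant factor).
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    squares_x = sum((a - mean_x) ** 2 for a in x)
    squares_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(squares_x * squares_y)

r_positive = pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8])   # y rises with x
r_negative = pearson_correlation([1, 2, 3, 4], [8, 6, 4, 2])   # y falls as x rises
```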
Covariance:
Systematic relationship between two variables, in which the change in one implies
a corresponding change in the other.
Quartile:
Percentile whose value, indicating its proportion, is a multiple of 25. The
first quartile is the 25th percentile, the second quartile is the median,
and the third quartile is the 75th percentile.
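A short sketch using Python's standard `statistics.quantiles`. Several interpolation conventions exist for percentiles; the "inclusive" method used here is one assumption among them, and the scores are made up.

```python
import statistics

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical data

# n=4 returns the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# n=10 returns the nine cut points that split the data into tenths (deciles).
deciles = statistics.quantiles(scores, n=10, method="inclusive")
```

With this convention the second quartile coincides with the median of the data.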
Questionnaire:
Structured technique for collecting data, consisting of a series of
questions, written or oral, that an interviewee must answer.
Kurtosis:
The degree of peakedness (sharpness of the peak) of a distribution of points.
Continuous data:
Data that can move from one class to the next without interruption and that
can be expressed as whole or fractional numbers.
Discrete data:
Data that do not move from one class to the next without interruption;
that is, where the classes represent distinct values or counts that
can be represented by integers.
External data:
Data obtained from a source other than the organization for which the
research is being carried out.
Primary data:
Data generated by the researcher to be applied specifically to the
research problem.
Secondary data:
Data collected for a purpose other than the problem being addressed.
Raw data:
Information before being organized by statistical methods.
Data:
Collection of any number of related observations about one or more
variables.
Decile:
Percentile whose value, indicating its proportion, is a multiple of ten.
Percentile 10 is the first decile, percentile 20 is the second decile, etc.
Deflation of prices:
It is the process by which the terms of a series are expressed in constant
colones.
Data cleansing:
Extensive and thorough checks for consistency, and handling of
unanswered questions.
Standard deviation:
Positive square root of the variance; measure of dispersion expressed in the
same units as the original data, rather than in the squared units in which
the variance is expressed.
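The relationship between variance and standard deviation can be sketched with the standard library. The example assumes the data are a complete population (hence `pvariance` and `pstdev`); the measurements are made up.

```python
import statistics

measurements_cm = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical lengths in cm

# Mean squared deviation from the mean: expressed in squared units (cm^2).
variance_cm2 = statistics.pvariance(measurements_cm)

# Positive square root of the variance: back in the original units (cm).
std_dev_cm = statistics.pstdev(measurements_cm)
```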
Dispersion:
The spread or variability of a data set.
Asymmetric distribution:
It occurs when the distribution of a data set results in a mean, a median,
and a mode with different values; also known as a 'skewed' distribution.
Bimodal distribution:
Distribution of data points in which two values occur more frequently
than the other elements of the data set.
Binomial distribution:
Distribution that describes the results of an experiment known as a
Bernoulli process.
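As a sketch, the probability of exactly k successes in n independent Bernoulli trials, each with success probability p, can be computed as follows. The function name and the coin-toss example are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli trials."""
    if not 0 <= k <= n:
        return 0.0
    # Number of ways to place k successes among n trials, times the
    # probability of any one such arrangement.
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: exactly 5 heads in 10 tosses of a fair coin.
p_five_heads = binomial_pmf(5, 10, 0.5)

# The probabilities over all possible outcomes add up to 1.
total_probability = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
```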
Frequency distribution:
Mathematical distribution whose objective is to obtain a count of the number of
responses associated with the different values of a variable and express
these counts in terms of percentages.
Chi-square distribution:
Asymmetric distribution whose shape depends solely on the number of
degrees of freedom. As the number of degrees of freedom increases, the
chi-square distribution becomes more symmetrical.
Sampling distribution:
The distribution of the values of a sample statistic (calculated for each
possible sample) that can be drawn from the target population according to
a specific sampling plan.
Poisson distribution:
Discrete distribution in which the probability of occurrence of an event
in a very small interval is also a very small number, the probability
that two or more of these events occur within the same interval
is effectively equal to zero, and the probability of the occurrence of the
event within the given period is independent of where in that period it
occurs.
Probability distribution:
List of the results of an experiment with the probabilities that would be
expected to be associated with each result.
F distribution:
Family of distributions differentiated by two parameters (degrees of
freedom of the numerator, degrees of freedom of the denominator), used
mainly to test hypotheses regarding variances.
Hypergeometric distribution:
The correct distribution for calculating consumer risk; it is often
approximated using the binomial distribution.
Chi-square distribution:
Family of probability distributions, differentiated by their degrees of
freedom, which is used to test a certain number of different hypotheses
about variances, proportions, and goodness of fit of distributions.
Normal distribution:
Distribution of a continuous random variable that has a single-peaked,
bell-shaped curve. The mean falls at the center of the distribution, and
the curve is symmetrical with respect to a vertical line that passes through
the mean. The two tails extend indefinitely, never touching the horizontal
axis. It is the basis for classical statistical inference. All of its
measures of central tendency are identical.
Student's t distribution:
Family of probability distributions that are distinguished by their
individual degrees of freedom; it is similar in form to the normal
distribution and is used when the population standard deviation is unknown
and the sample size is relatively small (< 30).
Uniform distribution:
It is a frequency distribution on the set of non-negative integers.
The frequency assigned to any of the non-negative integers is 1, and the
measure of the frequency of any set A of non-negative integers is its
counting measure.
Frequency distributions:
Organized display of data showing the number of observations in the data
set that fall into each class of a set of mutually exclusive and
collectively exhaustive classes.
Division of total variation:
In one-way ANOVA, separation of the variation observed in the dependent
variable into the variation due to the independent variables plus the
variation due to error.
Domains:
They denote subclasses that have been specifically planned in the design of
the sample.
Market surveys:
A market survey can serve as a means to assess the impact that the product
is having in the market. Questions can be asked that provide feedback on
the design, the process, or the quality of the materials used.
Survey:
Systematic collection of information about a defined population in order
to study its characteristics, through a set of forms applied to a sample of
population units. The survey thus constitutes the basis of the statistical
information system, making it possible to provide complete and reliable
data.
Random error:
Error that arises from differences or random changes in the interviewees or
measuring situations.
Measurement error:
The variation between the information sought by the researcher and the
information generated by the measurement process used.
Sampling error:
Error or variation between sample statistics due to chance; that is,
differences between each sample and the population, and between several
samples, that are due solely to the elements chosen for the sample.
Standard error:
The standard deviation of the sampling distribution of a statistic.
Sampling error:
Difference between the observed statistic of the probabilistic sample and the
population parameter.
Sample space:
Set of all possible outcomes of an experiment.
Expectation:
The expectation (expected value or mean) of a discrete random variable is
the sum of the products of its values and their associated probabilities.
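The sum of products described here can be sketched as follows, using exact fractions to avoid rounding. The function name `expected_value` and the fair-die example are illustrative assumptions.

```python
from fractions import Fraction

def expected_value(distribution):
    """Sum of value * probability over a discrete distribution.

    `distribution` maps each possible value to its probability.
    """
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must add up to 1")
    return sum(value * probability for value, probability in distribution.items())

# Hypothetical example: a fair six-sided die, each face with probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
die_expectation = expected_value(fair_die)
```

For the fair die the result is 7/2, a value the die itself can never show, which illustrates that the expectation is a long-run average rather than a likely outcome.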
Statistics:
Branch of mathematics that deals with collecting, organizing, and analyzing
numerical data and that helps to solve problems such as experimental design
and decision-making.
Chi-square statistic:
Statistic used to test the statistical significance of the association
observed in a cross-tabulation. It helps us determine whether there is a
systematic association between the two variables.
Test statistic:
Measure of how close the sample is to the null hypothesis. Often,
it follows a well-known distribution, such as the normal, Student's t, or
chi-square.
Descriptive statistics:
It is the science that analyzes, organizes, compiles, and interprets
qualitative information in graphs or tables and is responsible for
establishing the parameters that define a population.
Inferential statistics:
It is the type of statistics that interprets information in such a way that
can lead us to draw valid conclusions from the study of a
sample.
F statistic:
Ratio of the variances of two samples.
t statistic:
Statistic that assumes the variable has a symmetric, bell-shaped
distribution, where the mean is known (or is presumed to be known) and
the population variance is estimated from the sample.
Statistics:
Science that deals with the development and application of efficient
methods for the collection, processing, presentation, analysis, and
interpretation of numerical data. Also, measurements that describe the
characteristics of a sample.
Statistic:
Summary description of a measure in the selected sample.
Event:
One or more of the possible outcomes of doing something, or one of the
possible outcomes of conducting an experiment.
Independent events:
Two events are independent if the knowledge that one will happen or has
already happened does not affect the probability of the other; more
Interval Scale:
Measurement scale that allows for calculating differences (in addition to assigning
names and order) among the data.
Nominal Scale:
Measurement scale that only allows assigning names to the data.
Ordinal Scale:
Measurement scale that allows assigning order (in addition to names) to the
data.
Experiment:
Process of manipulating or observing data from one or more independent
variables and measuring their effect on one or more dependent variables,
while controlling for extraneous variables.
Absolute frequency:
It is the number of times a certain event occurs, as distinct from the
proportion of times that the event occurs in relation to the number of
times it could have occurred.
Relative frequency:
Percentage of total elements that appear in a given
category.
Cumulative frequency:
In a frequency table, when the variable is quantitative and, therefore,
the different values in the table are displayed in ascending order, the
cumulative frequency of a value of the variable is the sum of its
frequency and the frequencies of the preceding values.
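Absolute, relative, and cumulative frequencies can be tabulated together, as in this sketch. The function name `frequency_table` and the household data are made up for illustration.

```python
from collections import Counter
from itertools import accumulate

def frequency_table(values):
    """Return rows of (value, absolute, relative, cumulative) frequencies."""
    counts = Counter(values)
    ordered_values = sorted(counts)                 # ascending order
    total = len(values)
    absolute = [counts[value] for value in ordered_values]
    cumulative = list(accumulate(absolute))         # running sum of counts
    return [
        (value, count, count / total, running)
        for value, count, running in zip(ordered_values, absolute, cumulative)
    ]

children_per_household = [0, 1, 1, 2, 2, 2, 3, 3, 4, 2]   # hypothetical data
table = frequency_table(children_per_household)
```

The last cumulative frequency always equals the number of observations, which is a quick check that no value was dropped.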
Degrees of freedom:
Number of values in a sample that we can specify freely,
after we already know something about that sample.
Line graph:
Graphic presentation of magnitude in the data set, shown by the slope of a
line (or lines) positioned with respect to a horizontal or vertical scale.
Pie chart:
Circle divided into sections in such a way that the size of each section
corresponds to a proportion of the total.
Bar chart:
Graphic presentation of magnitude in the data set, represented by
the length of different bars drawn with reference to a horizontal or
vertical scale.
Histogram graph:
Graphical representation of a data set formed by rectangles, built
from a frequency table whose variable is numerical, so that each
sample observation occupies the same area as the others.
Heteroscedasticity:
It occurs when the errors or residuals do not have a constant variance
across the full range of values.
Alternative hypothesis:
Assertion that some difference or effect is expected. Acceptance of the
alternative hypothesis will lead to changes in opinions or actions.
Null hypothesis:
Statement in which no difference or effect is expected. If the
null hypothesis is not rejected, no changes will be made.
Simple hypothesis:
It is the one that completely specifies the distribution of the main
population.
Hypothesis:
Unproven statement or proposition about a factor or phenomenon of
interest to the researcher. A statistical hypothesis is a statement
regarding a population, usually a statement regarding one or more
population parameters.
Indicator:
It is a number or an index (a value on a measurement scale) derived from
the observation of a set of phenomena. Variable that allows certain changes
over time to be evaluated.
Uncertainty:
Lack of complete knowledge about the possible outcomes of the
actions, with ignorance of the probabilities of the possible
results.
Statistical inference:
Process of generalizing the sample results to the results of the
population.
Classification information:
Socioeconomic and demographic characteristics that are used to
classify the interviewees.
Identification information:
Type of information obtained in a questionnaire that includes the
name, address, and phone number.
Research report:
Presentation of research results directed to a specific audience
to achieve a certain purpose.
Confidence interval:
Range of values that has been assigned a probability of including the
real value of the population parameter.
Sampling interval:
Size of the distance between the selected elements in systematic
sampling; the reciprocal of the sampling fraction.
Confidence limits:
Lower and upper limits of a confidence interval.
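The sketch below computes confidence limits for a mean. It assumes a normal approximation (mean plus or minus z standard errors), which is reasonable for larger samples; for small samples a t-based interval would normally be used instead. The function name and the data are illustrative.

```python
import math
import statistics

def mean_confidence_interval(values, confidence=0.95):
    """Lower and upper confidence limits for the mean (normal approximation)."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error: standard deviation of the sampling distribution of the mean.
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # z value leaving (1 - confidence) / 2 in each tail of the standard normal.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error

sample = list(range(1, 37))   # hypothetical sample of 36 observations
lower_limit, upper_limit = mean_confidence_interval(sample)
```

Raising the confidence level widens the interval, since a larger z value is needed to cover more of the sampling distribution.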
Regression line:
A line fitted to a group of points to estimate the relationship between two
variables.
Mean:
The average; a value obtained by adding all the elements in a
set and dividing the sum by the number of elements.
Median:
Measure of central tendency given by the value above which half of the
values fall and below which the other half fall.
Measures of dispersion:
Statistics that describe how the data are located relative to one another,
that is, how spread out they are.
Measures of location:
Statistics that describe general characteristics of the location of the
data within a set of possible values.
Distance measure:
Measure of dispersion in terms of the difference between two values of
the data set.
Measures of central tendency:
Statistics that describe a location within a set of data. Measures of
central tendency describe the center of the distribution.
Measures of variability:
Statistic that indicates the dispersion of the distribution.
Mode:
Measure of central tendency given by the value that occurs most frequently
in the distribution of a sample.
Sample:
It is a representative part that reflects the similarities and differences
of the population that are important for the research; it could be said
that it is the selected subset of the population. It must therefore be a
subgroup that is sufficiently representative and that provides data
suitable for drawing general conclusions.
Random sampling:
Random sampling techniques ensure that each element in the population of
interest has a (non-zero) probability of being included in the sample.
Multicollinearity:
Statistical problem that arises in multiple regression analysis, in which
the reliability of the regression coefficients is diminished by a high
level of correlation between the independent variables.
Confidence level:
Probability that statisticians associate with an interval estimate of
a population parameter. It indicates how certain they are that the
interval estimate will include the population parameter.
Significance level:
Value that indicates the percentage of sample values that fall outside
certain limits, assuming that the null hypothesis is correct; that is, the
probability of rejecting the null hypothesis when it is true.
Observation:
The systematic recording of behavioral patterns of individuals,
objects and events in order to obtain information about the phenomenon of
interest. The act of verifying, describing, measuring something, particularly a
phenomenon, by means of instruments.
Ogive:
Graph of a cumulative frequency distribution.
Parameter:
Variable element on the basis of which the essential characteristics of a
phenomenon are specified. It is an unknown quantitative measure (such as
total income, average income, total production, or the number of
unemployed) used by researchers to study an entire population or other
areas of interest. Values that describe the characteristics of a
population.
Slope:
Constant for any given line whose value represents how much a unit change
in the independent variable changes the dependent variable.
Survey population:
Represents the study population minus non-response and deficient
coverage.
Finite population:
Population that has a set or limited size.
Infinite population:
Population in which it is theoretically impossible to observe all the elements.
Target population:
Set of elements or objects that possess the information sought by the
researcher and about which inferences are to be made.
Population:
Set of all elements that share a common group of characteristics and that
make up the universe for the purpose of the problem of
Sample population:
Subset of the target population whose elements are eligible to be selected
for study. Usually referred to simply as the population.
Frequency polygon:
Line that connects the midpoints of each class of a data set,
drawn at the height corresponding to the frequency of the data.
Weighting:
Statistical adjustment to the data in which each case or interviewee in the
database is assigned a weight to reflect its importance relative to other
cases or interviewees.
Percentage:
Quotient of a current value over a base value whose result is
multiplied by one hundred.
Probability:
The possibility that something will happen.
Moving average:
It is obtained by finding the average of a specific set of values and
using it later to forecast the next period.
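A minimal sketch of a moving-average forecast, assuming a simple (unweighted) average of the most recent observations. The function name, window size, and sales figures are illustrative.

```python
def moving_average_forecast(series, window):
    """Forecast the next period as the mean of the last `window` values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    recent = series[-window:]          # the most recent observations
    return sum(recent) / window

monthly_sales = [120, 130, 125, 140, 135]        # hypothetical history
next_month_forecast = moving_average_forecast(monthly_sales, window=3)
```

A longer window smooths the forecast more but reacts more slowly to recent changes in the series.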
Average:
Measure of central tendency that is obtained by summing the data and
dividing the sum by the number of data values.
Weighted Average:
Average of data assigned different relative importance.
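For illustration, a weighted average can be computed by multiplying each value by its weight and dividing by the total weight. The function name, grades, and weights below are made up.

```python
def weighted_average(values, weights):
    """Average in which each value counts in proportion to its weight."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not add up to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

exam_grades = [80, 90, 70]     # hypothetical grades
exam_weights = [5, 3, 2]       # hypothetical relative importance
final_grade = weighted_average(exam_grades, exam_weights)
```

With equal weights the result reduces to the ordinary average.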
Quintile:
Percentile whose value, indicating its proportion, is a multiple of twenty.
The first quintile is the 20th percentile, the second is the 40th
percentile, etc.
Interquartile range:
Range of a distribution that covers the middle 50% of the observations.
Range:
Difference between the lowest and highest values of a distribution.
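The difference between the range and the interquartile range shows up clearly when a data set contains an extreme value. This sketch assumes the "inclusive" quantile convention of Python's `statistics.quantiles`; the function names and data are illustrative.

```python
import statistics

def value_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)

def interquartile_range(values):
    """Distance between the third and first quartiles (middle 50%)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 100]   # hypothetical data, one extreme value
full_range = value_range(with_outlier)
middle_spread = interquartile_range(with_outlier)
```

The single extreme value inflates the range, while the interquartile range is unaffected by it.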
Curvilinear regression:
Association between two variables described by a curved line.
Stepwise regression:
Regression procedure in which the predictor variables enter or leave the
regression equation one at a time.
Multiple regression:
Statistical technique that simultaneously develops a mathematical
relationship between two or more independent variables and an
interval-scaled dependent variable.
Regression:
General process that consists of predicting one variable based on another
through statistical means, using previous data.
Inverse relationship:
Relationship between two variables in which, when the independent variable
increases, the dependent variable decreases.
Linear relationship:
A particular type of association between two variables that can be described
mathematically through a straight line.
Residual:
Difference between the observed value of the dependent variable and the value
projected by the regression equation.
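The regression line, its slope, the residuals, and the coefficient of determination defined in this guide can be tied together in one small sketch. It assumes ordinary least squares with a single independent variable; the function name and data are illustrative.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

x = [1, 2, 3, 4, 5]        # hypothetical independent variable
y = [2, 4, 5, 4, 5]        # hypothetical dependent variable
slope, intercept = fit_line(x, y)

# Residual: observed value minus the value given by the regression equation.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

# Coefficient of determination: share of the variation in y explained by the line.
mean_y = sum(y) / len(y)
unexplained = sum(r ** 2 for r in residuals)
total = sum((b - mean_y) ** 2 for b in y)
r_squared = 1 - unexplained / total
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding), which is a convenient sanity check.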
Tracking signal:
Involves the calculation of some measure of error over time and the
establishment of limits, so that when the cumulative error exceeds
those limits, the forecaster is alerted.
Time series:
It consists of data collected, recorded, or observed over successive
increments of time.
Stationary series:
It is one whose average value does not change over time.
Time series:
Information accumulated at regular intervals, and statistical methods
used to determine patterns in such data.
Bias:
It is the human error, intentional or unintentional, that occurs when
carrying out the sampling and that is generally systematic. This error is
minimized through training, education, and motivation programs for the
inspectors and collectors of statistical information.
Frequency Tables:
Table showing the number of items in a data set that fall into each of the
classes of interest specified over the range of the data.
Tabulation:
It is the procedure by which the dataset is organized according to
the categories of a certain characteristic.
Sample size:
Number of units to be included in a study.
Fertility rate:
Number of births occurring in a given population during a period,
divided by the female population of childbearing age.
Bayes' theorem:
Formula for calculating conditional probability under conditions of
statistical dependence.
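As a sketch, Bayes' theorem updates a prior probability using the likelihood of the evidence: P(A|B) = P(B|A) P(A) / P(B). The function name, parameter names, and the screening-test numbers below are illustrative assumptions.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(A | B) given P(A), P(B | A), and P(B | not A)."""
    # Total probability of the evidence B across both cases (A and not A).
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical screening test: 1% of people have the condition, the test
# detects it 95% of the time, and 5% of healthy people test positive.
p_condition_given_positive = bayes_posterior(
    prior=0.01, likelihood=0.95, false_positive_rate=0.05
)
```

Even with an accurate test, the posterior stays modest when the condition is rare, because most positive results come from the much larger healthy group.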
Chebyshev's Theorem:
Regardless of the shape of the distribution, at least 75% of the values in
the population will fall within two standard deviations of the mean, and
at least 89% will fall within three standard deviations.
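The 75% and 89% figures come from the general bound 1 - 1/k^2 for k standard deviations. The sketch below computes that bound and checks it against a deliberately skewed data set; the function names and data are illustrative.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum share of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def share_within(values, k):
    """Observed share of values within k population standard deviations."""
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)
    inside = sum(1 for value in values if abs(value - mean) <= k * std_dev)
    return inside / len(values)

skewed = [1, 1, 1, 2, 2, 3, 4, 8, 15, 40]    # hypothetical, strongly skewed
bound_two = chebyshev_lower_bound(2)         # 0.75
bound_three = chebyshev_lower_bound(3)       # about 0.889
observed_two = share_within(skewed, 2)
```

The theorem gives a guaranteed minimum, so the observed share for any data set is at least the bound and is often much higher.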
Variable:
Property or trait of a fact or object (not constant) by which it can be
characterized or classified. Representation of a characteristic, of a
attribute, which possesses some reality.
Critical value:
Value of the standard statistic (z or t) beyond which we reject the
null hypothesis; the boundary between the regions of acceptance and rejection.
Sample value:
It is an estimate calculated from the n elements in the sample.
It is a random variable that depends on the sample design and on the
particular combination of elements selected.
Value of the population:
It is a numerical expression that synthesizes the values of one or several
characteristics of the N elements of a complete population; it is a
summary measure of a quality of the distribution of the variable or variables
in the defined population.
Expected value:
It is the average value of a random variable over many trials or
observations.
z value:
Number of standard errors by which a point lies away from the mean.
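A one-line computation illustrates the definition: the distance of a point from the mean, divided by the standard error. The function name and the numbers are made up for the example.

```python
def z_value(point, mean, standard_error):
    """Number of standard errors by which `point` lies from `mean`."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    return (point - mean) / standard_error

# Hypothetical values: a point of 130 with mean 100 and standard error 15.
z_above = z_value(130, mean=100, standard_error=15)
z_below = z_value(85, mean=100, standard_error=15)
```

A positive z value lies above the mean and a negative one below it.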
Qualitative variables:
They express different qualities, characteristics, or modalities (each
modality is referred to as a category or attribute). Measurement consists
of classifying these attributes. Qualitative variables are divided into:
Ordinal qualitative variable: When it takes different ordered values,
following an established scale. Ordinal variables can be dichotomous
(they can take only two possible values, such as "YES" or "NO", or "MAN"
or "WOMAN") or polytomous (they can take three or more values, for
example: mild, moderate, serious).
Nominal qualitative variable: When the values it takes cannot be subject
to a criterion of order (such as colors or place of residence).
Quantitative variables:
They are expressed through numerical quantities that result from measuring
or counting. They can be:
Discrete variable: It presents interruptions or gaps in the scale of
values that the variable can take, which indicate the absence of values
between the specific values that the variable can take. It can only take
integer values.
Continuous variable: It can take any real value within a specified
interval.
Random variable:
It is a real-valued function on a probability space: it assigns a real
number to each elementary event, namely the value of the random variable
at that elementary event.
Dependent variable:
The variable we are trying to predict in the regression analysis.
Dependent variables:
Variables that measure the effect of the independent variables on the
test units.
Independent variables:
Known variable(s) in regression analysis.
Variance:
Mean square deviation of all values from the mean.