0% found this document useful (0 votes)
8 views29 pages

Stochastic Frontier Analysis with SFAMB

Stochastic frontier analysis 2

Uploaded by

Aanshi
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views29 pages

Stochastic Frontier Analysis with SFAMB

Stochastic frontier analysis 2

Uploaded by

Aanshi
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

JSS Journal of Statistical Software

November 2017, Volume 81, Issue 6. doi: 10.18637/jss.v081.i06

Stochastic Frontier Analysis Using SFAMB for Ox

Jonathan Holtkamp Bernhard Brümmer


University of Goettingen University of Goettingen

Abstract
SFAMB is a flexible econometric tool designed for the estimation of stochastic frontier
models. Ox is a matrix language used in different modules, with a console version freely
available to academic users. This article provides a brief introduction to the field of
stochastic frontier analysis, with examples of code (input and output) as well as a technical
documentation of member functions. SFAMB provides frontier models for both cross-
sectional data and panel data (focusing on fixed effects models). Member functions can
be extended depending on the needs of the user.

Keywords: stochastic frontier analysis, panel data, Ox.

1. Introduction
SFAMB (stochastic frontier analysis using ‘Modelbase’) is a package for estimating stochastic
frontier production (as well as cost, distance, and profit) functions. It provides specifications
for both cross-sectional data and panel data. SFAMB is written in Ox (Doornik 2009) and is
operated by writing small programs (scripts).
The console versions of Ox are free for research and educational purposes. Ox Console uses
OxEdit to run programs, while the commercial version of the programming language, Ox
Professional, uses the graphical user environment OxMetrics instead.
The structure of the paper is as follows. In the next section, we briefly introduce the econo-
metric foundations and related literature. The first part focuses on the theory of models for
cross-sectional and panel data. A subsection presents the specifications that are available
in SFAMB and mentions some related other software. Section 3 explains the usage of the
package, which includes data structure, model formulation, and model output. Furthermore,
it provides a detailed list of class member functions. For illustration, we present practical
examples using real world data (Section 4) which are distributed with the package. We men-
tion some possible extensions of SFAMB in Section 5. Finally, a technical appendix provides
a brief overview of some underlying workings (Appendix A).
2 Stochastic Frontier Analysis Using SFAMB for Ox

2. Econometric methods of stochastic frontier analysis

2.1. Cross-sectional data

Basic approach – POOLED model


This section provides a brief introduction to stochastic frontier (SF) techniques. A more de-
tailed introduction can be found in Coelli, Rao, O’Donnell, and Battese (2005) or Bogetoft
and Otto (2010). More advanced material is covered in Kumbhakar and Lovell (2000). The
basic problem in efficiency analysis lies in the estimation of an unobservable frontier (produc-
tion, distance or cost) function from observable input and output data, together with price
data when necessary. Standard estimation techniques like OLS are inappropriate in this set-
ting as they aim to describe average relationships, which are not the focus of an efficiency
model.
The basic approach was simultaneously developed by Aigner, Lovell, and Schmidt (1977)
(ALS), and Meeusen and Van den Broeck (1977). The following example of a production
frontier highlights its most important characteristics. The basic production frontier model is
given by:
yi = α + β > xi + vi − ui . (1)
On the left hand side, yi is the output (or some transformation of the output) of observation
i (i = 1, 2, . . . , N ). On the right hand side, xi is a K × 1 vector of inputs that produces
output yi , and the vector β represents technology parameters to be estimated. The most
commonly used transformation of the variables is the natural logarithm. The crucial part of
this formulation is the composed error term given by i = vi −ui , where vi represents statistical
noise and ui represents inefficiency. Both error components are assumed to be independent of
each other. Estimation is possible by means of maximum likelihood estimation (MLE) where
distributional assumptions on the error components are required. The noise component is
iid
a conventional two-sided error, distributed as vi ∼ N (0, σv2 ). The inefficiency component is
a non-negative disturbance that can be modeled using several distributions.1 However, the
truncated normal and half-normal distributions are most frequently used and implemented in
SFAMB (see Table 2). In case of the normal-truncated normal SF model, the random variable
iid
ui is distributed as ui ∼ N + (µ, σu2 ). If µ is set to zero, the model becomes the normal-half
normal SF model.
Extensions of the basic SF approach allow us to model the location and scale of the ineffi-
ciency distribution in a more flexible way. The corresponding covariates are often labeled as
z-variables. Alvarez, Amsler, Orea, and Schmidt (2006) offer a comprehensive discussion of
this topic and the so-called “scaling property”.
Another useful overview is given by Lai and Huang (2010) who summarize and categorize
several well-known models.2 The so-called KGMHLBC3 model parameterizes µ and originally
assumes the following inefficiency distribution, ui ∼ N + (µ0 + θ > z i , σu2 ). If µ is set to zero
1
This text cannot provide a full overview of all relevant models. Kumbhakar and Lovell (2000) and more
recently Greene (2008) provide very detailed surveys on applied SF models.
2
The following abbreviations used by Lai and Huang (2010) were already used by Alvarez et al. (2006).
The abbreviation KGMHLBC was introduced by Wang and Schmidt (2002).
3
Kumbhakar, Gosh, and McGuckin (1991); Huang and Liu (1994); Battese and Coelli (1995).
Journal of Statistical Software 3

and the scale is modeled using an exponential form, it becomes the RSCFG4 model, where
ui ∼ N + (0, exp(2(δ0 + δ > z i ))). The combination of both models leads to the following form,
ui ∼ N + (µi = µ0 + θ > z i , σu,i
2 = exp(2(δ + δ > z ))), that according to Lai and Huang (2010)
0 i
could be labeled as a generalized linear mean (GLM) model.5
Jondrow, Lovell, Materov, and Schmidt (1982) present a point estimator of inefficiency, given
by E(ui |i ). Battese and Coelli (1988) show that if the dependent variable is in logarithms,
a more appropriate estimator is the point estimator of technical efficiency, given by TE i =
E(exp(−ui )|i ).

2.2. Panel data

Unobserved heterogeneity – LSDV model


Panel data provide additional information because each individual is observed over a certain
time period, where periods are indexed with t (t = 1, 2, . . . , T ). The respective production
function model, estimated by OLS, can be written as:

yit = αi + β > xit + vit . (2)

iid
This formulation includes the time dimension, a conventional two-sided error vit ∼ N (0, σv2 ),
and an individual intercept αi . The model’s virtue originates from the identification of these N
time-invariant parameters. These “fixed” effects absorb unmeasured time-invariant individual
attributes, and hence, the model accounts for unobserved heterogeneity. One commonly used
name of this approach is “least squares with dummy variables” (LSDV).
Instead of estimating the dummy variables, the conventional strategy is to apply within-
transformation to the dependent and independent variable(s), e.g., for an independent vari-
able:

x̃it = xit − x̄i .

The observation on xit is transformed by subtracting the respective individual mean x̄i . The
resulting variable x̃it is a deviation from the mean. This procedure eliminates the individual
effects because α̃i = αi − αi = 0. The transformed variables are used for model estimation.
After estimation, the individual effects are calculated as:
>
α̂i = ȳi − β̂ x̄i .

Schmidt and Sickles (1984) use the model in a frontier context. They interpret the individual
with the highest intercept as 100% technically efficient and determine the inefficiency of the
remaining individuals as ui = max(α̂) − α̂i . Accordingly, efficiency scores are time-invariant
and are given by TE i = E(exp(−ui )).
4
Reifschneider and Stevenson (1991); Caudill, Ford, and Gropper (1995).
5
Actually, they label models that include the KGMHLBC form as generalized exponential mean (GEM)
models. The reason is that they refer to the exponential form of the model that has been proposed by Alvarez
et al. (2006). Following the categorization of Lai and Huang (2010), the model implemented in SFAMB is a
GLM model. Furthermore, note that in SFAMB the respective scale parameter in the POOLED model is (the
2
natural logarithm of) σu,i , and not σu,i . While σu2 is often used, the original formulation of CFG involved σu .
4 Stochastic Frontier Analysis Using SFAMB for Ox

If (in)efficiency should be modeled as time-invariant or not, depends on the objective and


empirical application (see Greene 2008 for a comprehensive discussion). Nevertheless, in a
longer panel, the assumption of time-varying inefficiency will usually be attractive. This fact
has motivated extensions of the above model as well as the development of other approaches.
One famous example is the model of Battese and Coelli (1992) (BC92) that has been applied
in many empirical studies. This model specifies uit = exp(−η(t − T )) × ui , and nests the case
of persistent inefficiency (if η = 0). SFAMB does not support this model, but other packages
do (see Table 3).

Unobserved heterogeneity in SFA: Dummy variables – TFE model


The approach of Schmidt and Sickles (1984) is a reinterpretation of the well-known panel
data model. However, there is no differentiation between heterogeneity and inefficiency. A
complete panel SF model takes both components into account:

yit = αi + β > xit + vit − uit . (3)

This model is proposed by Greene (2005, p. 277) who labels it as the “true fixed effects
[TFE] stochastic frontier model”. Estimation of this model requires the inclusion of all N
dummy variables, i.e., the number of intercepts to be estimated corresponds to the number
of individuals. With fixed T , the estimate of the error variance is inconsistent (incidental
parameters problem). Furthermore, it is likely to be biased as pointed out by Chen, Schmidt,
and Wang (2014). This is a relevant issue since this estimate is required for the assessment
of inefficiency.

Elimination of dummies – WT model


Wang and Ho (2010) propose an extension to overcome the incidental parameters problem.
Their model is based on deviations from means (within-transformation; WT):6

ỹit = β > x̃it + ṽit − ũit . (4)

This represents either a normal-truncated normal or a normal-half normal SF model where


iid
the noise component is distributed as vit ∼ N (0, σv2 ). Let the vector of transformed vit be
>
denoted by ṽ i = (ṽi1 , . . . , ṽiT ) . This vector has a multivariate normal distribution, i.e.,
ṽ i ∼ MN (0, Π), where Π is a T × T covariance matrix.7
The specification of time-varying inefficiency (uit ) is more involved. Here, the (“basic”) ineffi-
iid
ciency component is assumed to be producer-specific, but time-invariant, i.e., u∗i ∼ N + (µ, σu2 ),
where µ is equal to zero in the case of a half-normal distribution. Inefficiency varies over time
by means of a scaling function:

uit = u∗i × hit = u∗i × f (δ > z it ) = u∗i × exp(δ > z it ),

where z it is a vector of time-varying, producer-specific covariates. The transformed ineffi-


ciency component results from the transformation of the scaling function:

ũit = u∗i × h̃it .


6
They also demonstrate how the model can be estimated by first-differencing.
7
The panel may be unbalanced.
Journal of Statistical Software 5

Wang and Ho (2010) present the conditional expectation of uit in their Equation 30; efficiency
estimates are given by TE it = E(exp(−uit |˜
it )). The individual effects are calculated as:
>
α̂i = ȳi − β̂ x̄i + ūi .

Consistent estimation with time-varying inefficiency – CFE model


Consistent estimation of the fixed effects SF model given in Equation 3 is demonstrated by
Chen et al. (2014). Their approach is also based on deviations from means (Equation 4), but
the CFE (consistent fixed effects) model allows inefficiency to vary over individuals and time,
without an auxiliary function.
The approach is characterized by two features, within-transformation and the T −1 deviations,
as well as by the use of a more general distributional theory. Firstly, within-transformation
removes the incidental parameters. Secondly, the model’s likelihood function is derived from
the first T − 1 deviations, i.e., from the vector ˜∗i = (˜
i1 , . . . , ˜i,T −1 )> . This strategy achieves
8
an implicit correction of the error variance. The approach is based on the closed skew normal
(CSN) distribution.9
The composed error,  = v − u, has a skewed distribution (to the left) due to the non-
negativeness of u. Accordingly, the standard (half-normal) SF model has a skew normal
distribution, with skewness parameter λ and density:

2   
f () = φ Φ −λ .
σ σ σ
While the skew normal distribution is a generalization of the normal distribution, it can be
generalized itself by using the CSN distribution. The composed error has a CSN distribution,
which is expressed by:

λ
it ∼ CSN 1,1 (0, σ 2 , − , 0, 1).
σ
The density of a CSN p,q -distribution includes a p-dimensional pdf and a q-dimensional cdf of a
normal distribution. The five associated parameters describe location, scale, and skewness, as
well as the mean vector and covariance matrix in the cdf. With panel data, the T -dimensional
vector i = (i1 , . . . , iT )> is distributed as:

λ
i ∼ CSN T,T (0T , σ 2 IT , − IT , 0T , IT ),
σ
where I is the identity matrix. Chen et al. (2014, p. 67) make use of the fact that the CSN
distribution is “closed under linear combination”, and partition the vector i into its mean ¯i
and its first T −1 deviations ˜∗i . The model’s likelihood function is derived from ˜∗i . Therefore,
it is free of incidental parameters, and the parameters to be estimated are β, λ, and σ 2 – as
8
With regards to the degrees of freedom, the correction accounts for the N individuals: df = N T −N −K =
N (T − 1) − K.
9
Chen et al. (2014) explain how the SF model is related to the CSN distribution and present the required
properties of CSN distributed random variables. Another plain introduction to the CSN distribution in the
SF context is provided by Brorsen and Kim (2013).
6 Stochastic Frontier Analysis Using SFAMB for Ox

SetMethod Example
SFA – cross section POOLED [Link]
Least squares with dummies LSDV [Link]
SFA – with dummies TFE [Link]
SFA – within-transformation WT [Link]
SFA – consistent fixed effects CFE [Link]

Table 1: Available estimators, with name of estimation method and sample files.

in the basic SF model.10 ¯i and ˜∗i are not independent, unless λ = 0. If λ = 0 the model
becomes the fixed effects model with normal error (LSDV model).
In order to assess the inefficiency, the composed error term must be recovered:
>
it = yit − ŷit = yit − β̂ xit − α̂i .

There are two ways to calculate α̂i . The one used here is labeled as the mean-adjusted
estimate by Chen et al. (2014):
r
M > 2
α̂i = ȳi − β̂ x̄i + σ̂u . (5)
π

2.3. Software
SFAMB provides frontier models of ALS (with extensions), Schmidt and Sickles (1984),
Greene (2005), Wang and Ho (2010) as well as Chen et al. (2014). The available estima-
tors are listed in Table 1.
There are several other software packages that incorporate (some of) these estimators. Ta-
bles 2 and 3 provide an overview of the different specifications that are available in SFAMB
and other software.11
LIMDEP (Econometric Software Inc. 2014) and Stata (StataCorp LP 2015) are comprehensive
commercial packages that implement frontier techniques in their standard distributions. In
case of Stata, there are additional third-party add-ons such as those of Wang (2012) or Belotti,
Daidone, Ilardi, and Atella (2013).
Hughes (2008) has written two free packages, sfa hetmod and sfa mod, that can be used with
gretl (Cottrell and Lucchetti 2014) and include variations of the standard model.
The recent package spfrontier (Pavlyuk 2016) is concerned with (the specific family of) spatial
SF models. It is implemented in R (R Core Team 2017) and allows for various specifications.
The first program to implement frontier techniques was Frontier (Coelli 1996). Later, the
original code was transferred to R in the package frontier by Coelli and Henningsen (2017).
This package provides the standard model (ALS), the extension of KGMHLBC as well as the
model of BC92. Its functionality is augmented by some additional options (e.g., for calculating
marginal effects).
10
The conventional parameterization is: λ = σu /σv and σ 2 = σu2 + σv2 .
11
The coverage of other software may be larger as indicated here. This overview makes no claim to be
complete.
Journal of Statistical Software 7

Cross-sectional data
POOLED
ui ∼ N + (0, σu2 ) N + (µ0 + θ > z i , σu2 ) N + (0, exp(2 δ > z i )) N + (µ0 + θ > z i , exp(2 δ > z i ))
[ALS] [KGMHLBC] [RSCFG] [GLM]

SFAMB∗ X X X X

frontier X X
LIMDEP X X X X
sfa hetmod X
sfa mod X
Stata X X X X
∗ Here, the respective scale parameter is σ , and not σ 2 .
u,i u,i

Table 2: Available model specifications for cross-sectional data.

Panel data
BC92 LSDV TFE WT CFE
ui ∼ N + (µ, σu2 ) uit ∼ N + (0, σu2 ) u∗i ∼ N + (0, σu2 ) uit ∼ N + (0, σu2 )
or u∗i ∼ N + (µ, σu2 )

SFAMB X X X X

frontier X
LIMDEP X X X
Stata X X X X

Table 3: Available model specifications for panel data.

Similarly, SFAMB offers specific member functions that can be extended by the user. To
date, it is the only package including the CFE model.

3. Using SFAMB

3.1. Data organization


Ox supports different data file formats (.xls, .dta, . . . ) that can be directly read into an
SFAMB object. Details can be found in printed documentation (Doornik and Ooms 2007;
Doornik 2009), or online at [Link]
The data have to be organized in columns, i.e., one column contains one variable. The first
row indicates the name. Ox interprets missing values as .NaN (Not a Number). In case of
panel data, the data have to be stacked by individual (i = 1, 2, . . . , N ), and within individuals
by time period (t = 1, 2, . . . , T ). The data set must include two variables that indicate the
individual and the time period (the panel may be unbalanced). An example is given in Table 4.

3.2. Model formulation


The sequence of model formulation is sketched out in Figure 1. A new object is created in each
8 Stochastic Frontier Analysis Using SFAMB for Ox

id time y x1
31 1 298384 24145
31 2 333522 27725
31 3 378768 38115
37 1 62473 3401
37 2 212442 12529
37 3 295142 16734
101 1 150037 10752
101 2 158909 10418
101 3 172744 10671

Table 4: Example data.

Figure 1: Model formulation.

input file (script with ending .ox). This object is an instance of the ‘SFAMB’ class and can use
the functionality of this class. The function Load loads the data file and creates the database.
The type of model is chosen with SetMethod, where the respective arguments are presented
in Table 1. In case of panel data, the panel structure must be specified using Ident. If the
original data are in levels, you can use PrepData for transformation (the classes ‘Database’
and ‘Modelbase’ provide additional functions). Variables are grouped using Select, where
variable names serve as arguments.12
To formulate the frontier function:

ˆ Use Select(Y_VAR, {".",.,.}) to select the dependent variable.


ˆ Use Select(X_VAR, {".",.,.}) to select the independent variable(s).

To include variables that affect the one-sided inefficiency distribution (for the WT model choose
only one of these groups):
12
In case of the panel models, a common constant is not identified. However, you can leave "Constant" in
the selection because it is ignored automatically.
Journal of Statistical Software 9

ˆ Use Select(U_VAR, {".",.,.}) to select variables that shift the individual location
parameter of the distribution, µi .

ˆ Use Select(Z_VAR, {".",.,.}) to select variables that affect the scale parameter of
2 .
the distribution, σu,i or σu,i

SetTranslog can be used to choose the functional form of the frontier function. In case
of the translog specification, we recommend to normalize the variables by the respective
sample means. Estimation of the model is executed via Estimate. For more details, see the
documentation of member functions in Section 3.4.

3.3. Model output


Besides standard results, some model-specific numbers are printed after estimation.

Cross-sectional data
Specific numbers for model POOLED:

gamma was defined Pby Battese and Corra (1977) and is given by γ = σu2 /σ 2 = σu2 /(σv2 + σu2 ),
where σu2 = n1 i exp(2 δ > z i ).

VAR(u)/VAR(total) describes the variance decomposition of the composed error term. The
share of the variance of u in the total variance of the composed error is given by
VAR[u]/VAR[] = [(π − 2)/π]σu2 /[(π − 2)/π]σu2 + σv2 , see Greene (2008, p. 118), where
σu2 = n1 i exp(2 δ > z i ).
P

Test of one-sided err provides a likelihood ratio test statistic for the presence of ineffi-
ciency, i.e., for the null hypothesis H0 : γ = 0. The critical value cannot be taken from
a conventional χ2 -table, see Kodde and Palm (1986).

Panel data
Specific numbers for model LSDV:

SSR
sigma_e describes σv , the square root of the corrected error variance, σv2 = N (T −1)−K . This
estimate is also used to compute the standard errors.

AIC1 (all obs) is given by AIC1 = P−2Pln L + 2 (K + 1); it uses the likelihood function
ṽ 2
ln L = − 2 ln(2π) − 2 ln(σ ) − i2σ2t it with the uncorrected σ 2 = SSR
NT NT 2
NT .

AIC2 uses a different formula for the criterion, AIC2 = ln( SSR K+N
N T ) + (2 N T ); that does not
require the likelihood function and considers the number of individuals in the penalty
term.

Specific number for models TFE, WT and CFE:

lambda given by λ = σu /σv .


10 Stochastic Frontier Analysis Using SFAMB for Ox

3.4. Class member functions


These functions (user interface) together with the data members build up the ‘SFAMB’ class.
Some internal functions are not listed here. The interested user may consult the package’s
header file and source code file. Note that the class derives from the Ox ‘Modelbase’ class,
and hence, all underlying functions may be used.13

AiHat AiHat();
Return value
Returns the calculated individual effects α̂i , N × 1 vector.
Description
– Only panel data – These values can be obtained after estimation, see Section 2 for the
respective formulas.

Different functions to extract data:


Return value
Different vectors or matrices.
Description
These functions can be used with convenient (‘Database’) functions such as Save, Renew
or savemat.

IDandPer(); is an N T ×2 matrix with the ID of the individual (e.g., 1, 1, 1, 2, . . . , N, N )


in the first column and the individual panel length Ti in the second column. – Only
panel data –
GetLLFi(); returns the individual log-likelihood values. It is an N T ×1 vector for both
POOLED model and LSDV model, but an N × 1 vector for the other models.
GetResiduals(); returns the (composed) residual of the respective observation, N T ×1
vector.
GetTldata(); returns the corresponding vectors of Y , X, square, and cross terms of
X. – Use with SetTranslog() –
GetMeans(); returns the means of Y - and X-variables, N × (K + 1) matrix. – Only
panel data –
GetWithins(); returns the within-transformed Y - and X-variables, N T × (K + 1)
matrix. – Only panel data –

DropGroupIf DropGroupIf(const mifr);


No return values
Description
– Only panel data – Allows the exclusion of a whole individual from the sample if the
condition in one (single) period is met. Call after function Ident.

mifr is the condition that specifies the observation to be dropped, see the documenta-
tion of selectifr.

Elast Elast(const sXname);


Return value
13
The ‘Modelbase’ class derives from the ‘Database’ class. Accordingly, the member functions of ‘Database’
are available in SFAMB.
Journal of Statistical Software 11

Returns the calculated output elasticities and the respective t-values for the specified
(single) input variable (for all sample observations).
Description
– Use with SetTranslog() – Only if a translog functional form is used, N T × 2 matrix.
The elasticity is calculated for each observation of the sample: the output elasticity of
input k is given by ∂ ln yi /∂ ln xki . The t-values are calculated using the delta method
(Greene 2012), extracting the required values from the data and covariance matrix of
the parameter estimates.
sXname is the name of the corresponding input variable (string).
GetResults GetResults(const ampar, const ameff, const avfct, const amv);
No return values
Description
– Only POOLED model – This function can be used to store the results of the estimation
procedure for further use. All four arguments should be addresses of variables.
mpar consists of a Npar × 3 matrix, where Npar is the number of parameters in the
model. The first column contains the coefficient estimates, the second column the
standard errors, and the third column the appropriate probabilities.
eff consists of a Nobs × 3 matrix, where Nobs is the number of total observations. The
first column holds the point estimate for technical efficiency, the second and third
columns contain the upper and lower bound of the (1 − α) confidence interval.
fct holds some likelihood function values (OLS and ML), as well as some information
on the correct variance decomposition of the composed error term.
v variance-covariance-matrix.
Ident Ident(const vID, const vPer);
No return values
Description
– Only panel data – Identifies the structure of the panel.
vID is an N T × 1 vector holding the identifier (integer) of the individual.
vPer is an N T × 1 vector holding the identifier (integer) of the time period.
Ineff Ineff();
Return value
Returns point estimates of technical inefficiency, N T × 1 vector.
Description
These predictions are given by the conditional expectation of u, see Section 2 for details.
PrepData PrepData(const mSel, iNorm);
Return value
Returns logarithms of the specified variables, either normalized or not.
Description
This function expects your data in levels and can do two things: It takes logarithms of
your specified variables (if iNorm = 0) or it normalizes your data (by the sample mean
if iNorm = 1), before taking logarithms. The transformed variable should receive a new
name.
12 Stochastic Frontier Analysis Using SFAMB for Ox

mSel is a N T × k matrix holding the respective Y - and X-variables.


iNorm is an integer: 0=no normalization; 1=normalization;

SetConfidenceLevel SetConfidenceLevel(const alpha);


No return values
Description
– Only POOLED model – This function expects a double constant, indicating the error
probability for the construction of confidence bounds (default 0.05).

SetPrintDetails SetPrintDetails(const bool);


No return values
Description
– Not for LSDV model – Prints starting values, warnings and elapsed time if bool 6= 0.

SetRobustStdErr SetRobustStdErr(const bool);


No return values
Description
– Only POOLED model – By default, robust standard errors are used for the cross-sectional
model. Use FALSE to switch off this setting.

SetStart SetStart(const vStart);


No return values
Description
This function expects a column vector of appropriate size (K + 2), containing starting
values for the maximum likelihood iteration. If the function is not called at all, OLS
values are used in conjunction with a grid search for the SFA-specific parameter(s)
σ 2 = σv2 + σu2 . If only (K) technology parameters are given, the grid search is also
applied.

SetTranslog SetTranslog(const iTl);


No return values
Description
This function expects an integer to control the construction of additional regressors
from the selected X-variables.

ˆ A value of zero indicates no further terms to be added, e.g., for a log-linear model,
this corresponds to the Cobb-Douglas form.
ˆ A value of one indicates that all square and cross terms of all independent variables
should be constructed, e.g., for a log-linear model, this corresponds to the full
translog form.
ˆ An integer value of k > 1 indicates that the square and cross terms should be
constructed for only the first k independent variables (useful when the regressor
matrix contains dummy variables).

TE TE();
Return value
Returns point estimates of technical efficiency, N T × 1 vector.
Description
Journal of Statistical Software 13

These predictions are given by the conditional expectation of exp(−u), see Section 2 for
details.

TEint TEint(const dAlpha);


Return value
Returns point estimates of technical efficiency as well as lower and upper bounds.
Description
– Only POOLED model – This function expects a double constant, indicating the error
probability for the construction of confidence bounds (default 0.05); for details see
Horrace and Schmidt (1996), for an application Brümmer (2001). It returns an N T × 3
matrix structured as (point estimate – lower bound – upper bound).

TestGraphicAnalysis TestGraphicAnalysis();
No return values
Description
Only useful in conjunction with the free Ox package GnuDraw (Bos 2014), which is an Ox
interface to gnuplot (gnuplot Team 2015). This function draws two (or three) graphs:
A histogram of the efficiency point estimates and a respective boxplot. It displays
an additional graph in case of the POOLED model, depicting the interval estimates of
technical efficiency at the specified significance.

4. Examples

4.1. Example: [Link]


The first example is a generalized linear mean (GLM) model, where ui ∼ N + (µ, σu,i =
exp(δ0 + δ > z i )). The original data are in levels and are transformed using member function
PrepData to accommodate the translog functional form. The data are a subset of FAO/USDA
data prepared by Fuglie (2012), including Sub-Saharan African countries and South Africa.
General usage and details of the Ox language are explained in Doornik and Ooms (2007). The
sample file [Link] is presented below. At the beginning of each program some header
files are linked in:

#include <oxstd.h>
#include <packages/gnudraw/gnudraw.h>
#import <packages/sfamb/sfamb>

The first file, the so-called standard header, ensures that all standard library functions can
be used. The second line includes the header file of GnuDraw (Bos 2014), an Ox interface to
gnuplot (gnuplot Team 2015). If it is not installed or you do not want to use this package,
delete this line. However, graphics output will then be disabled in the free Ox Console version
(in the commercial OxMetrics version, graphics would still be available). Alternatively, you
can comment it out via //:

// #include <packages/gnudraw/gnudraw.h>
14 Stochastic Frontier Analysis Using SFAMB for Ox

The third line imports the (compiled) source code of the package (you may also use #include
<packages/sfamb/[Link]>). Each Ox program is executed by the main() function that
contains the main loop of Ox.

main(){
...
}

The next steps outlined follow the structure of Figure 1. A new object of class ‘Sfa’ has to
be declared.

decl fob = new Sfa();

The data are loaded with a call to the member function Load. The argument of SetMethod
chooses the respective estimator (see Table 1). Here, the model for cross-sectional data is
specified. The function SetConstant creates a constant (intercept).

[Link]("[Link]");
[Link](POOLED);
[Link]();

Data are either used directly or prepared within the code. Here, the output variable, five
input variables, and a time variable are transformed where logarithms of the mean-normalized
inputs (output) are used.14 New names are assigned to the prepared variables. These names
are used for further instructions. The function Info is useful here because it prints summary
statistics, thereby allowing the transformed data to be checked. The program always stops
at an exit function (that is why it is commented out here).

decl inorm = 1;
[Link]([Link]([Link]("output"), inorm), "lny");
[Link]([Link]([Link]("labour"), inorm), "lnlab");
[Link]([Link]([Link]("land"), inorm), "lnland");
[Link]([Link]([Link]("machinery"), inorm), "lnmac");
[Link]([Link]([Link]("fertilizer"), inorm), "lnfert");
[Link]([Link]("time") - meanc([Link]("time")), "trend");
// [Link](); exit(1);

Selection of variables is carried out by Select where Y_VAR is the selection of the dependent
variable and X_VAR is the selection of the regressors. The function uses the new variable
names defined above (if your data file already includes transformed variables, you would use
the names from within the file). The intercept ("Constant") is available because SetConstant
is called above. Within the Select function there are arrays with three elements (variable
name, start lag, end lag). Here, the lags are set to zero. Note that there must not be a comma
before the closing curly brace of Select.
 
14 xjit
Normalization of inputs (and output): ln x̄j
; normalization of time trend: t − t̄ .
PrepData is a member function of this package (see Section 3.4). Both of the other functions are member
functions of the ‘Database’ class (see Doornik and Ooms 2007).
Journal of Statistical Software 15

[Link](Y_VAR, {"lny", 0, 0});


[Link](X_VAR, {
"Constant", 0, 0,
"lnlab", 0, 0,
"lnland", 0, 0,
"lnmac", 0, 0,
"lnfert", 0, 0,
"trend", 0, 0
});

The above selections define the production frontier. Additional covariates associated with
the underlying inefficiency distribution can be introduced (POOLED and WT model). Covariates
used to model the location parameter of the distribution are selected in the group U_VAR.
Here, only "Constant" is selected, meaning that µ 6= 0, but additional variables could be
included.

[Link](U_VAR, {
"Constant", 0, 0
});

Likewise, covariates used to model the scale of the distribution are selected in the group
2 ).
Z_VAR, i.e., these variables parameterize σu,i (in case of the WT model, it is σu,i

[Link](Z_VAR, {
"Constant", 0, 0,
"lnlab", 0, 0,
"lnland", 0, 0,
"lnmac", 0, 0,
"lnfert", 0, 0
});

The next three lines allow for different adjustments. SetSelSample is required and can be
used to choose a subset of the data (here: full sample). SetPrintSfa ensures that estimation
output is printed. MaxControl is an optional function that allows for documentation and
adjustments of the maximization procedure.

[Link](-1, 1, -1, 1);


[Link](TRUE);
MaxControl(1000, 10, TRUE);

The functional form of the production frontier is chosen by SetTranslog where the options
are either Cobb-Douglas or translog. Here, a translog form is specified. Estimation of the
model is invoked via Estimate.

[Link](1);
[Link]();

A number of results can be obtained after estimation. In the SF context, the efficiency scores
(TE i ) are of particular interest. Here, the point estimates are extracted, together with the
16 Stochastic Frontier Analysis Using SFAMB for Ox

lower and upper bounds of a 95% confidence band. The respective function is TEint. The
function Ineff extracts the point estimates of inefficiency, E(ui |i ). These results are labeled
and appended to the object using Renew. The original database together with the transformed
variables and results is saved to file via Save.

[Link]([Link](0.05), {"TE", "lower", "upper"});


[Link]([Link](), {"jlms"});
[Link]("[Link]");

There is a graphical functionality involving the package GnuDraw that allows for a visual
assessment of the efficiency scores. The function TestGraphicAnalysis displays the graphics
presented in Figure 2. The confidence band can be changed with SetConfidenceLevel, where
an error probability of 0.05 is the default.

[Link](0.05);
[Link]();

The output of this program appears as follows (omitting information on the maximization
procedure). Some general information:

SFAMB package version 1.1, object created on 19-02-2014


Constructing Squares and Cross-Products...done.
-Pooled model-

---- SFAMB ----


The estimation sample is: 1 - 2400
The dependent variable is: lny
The dataset is: [Link]

The transformed variables facilitate the interpretation of the estimated coefficients of the
translog functional form. Thus, the first order coefficients listed below can be interpreted as
output elasticities at the sample mean. These estimates are positive and meet the requirement
of monotonicity – except for the machinery input whose (insignificant) estimate violates the
regularity condition. The parameter associated with trend indicates the estimated average
rate of technical change per year.

Coefficient [Link] robust-SE t-value t-prob


Constant 0.418510 0.01734 0.01604 26.1 0.000
lnlab 0.128543 0.01338 0.01105 11.6 0.000
lnland 0.747665 0.01552 0.01301 57.5 0.000
lnmac -0.0103601 0.009487 0.008851 -1.17 0.242
lnfert 0.0753082 0.006573 0.006243 12.1 0.000
trend 0.0104214 0.0007006 0.0006763 15.4 0.000

Furthermore, the output shows the coefficients of the squared and cross terms that can be
used to calculate the individual output elasticities.

Coefficient [Link] robust-SE t-value t-prob


.5*lnlab^2 -0.0555284 0.02432 0.02387 -2.33 0.020
Journal of Statistical Software 17

.5*lnland^2 -0.170593 0.02547 0.02843 -6.00 0.000


.5*lnmac^2 -0.0152333 0.005151 0.004632 -3.29 0.001
.5*lnfert^2 0.0611977 0.003107 0.003063 20.0 0.000
.5*trend^2 0.000420193 6.481e-005 6.132e-005 6.85 0.000
lnlab*lnland 0.189011 0.02492 0.02557 7.39 0.000
lnlab*lnmac -0.125612 0.008138 0.007344 -17.1 0.000
lnlab*lnfert -0.0294981 0.006109 0.005248 -5.62 0.000
lnlab*trend -0.000443318 0.0007217 0.0006231 -0.711 0.477
lnland*lnmac 0.137893 0.008829 0.008381 16.5 0.000
lnland*lnfert -0.0633867 0.006748 0.006383 -9.93 0.000
lnland*trend -0.000495235 0.0007838 0.0007483 -0.662 0.508
lnmac*lnfert -0.0135746 0.002997 0.002857 -4.75 0.000
lnmac*trend 0.000810357 0.0002892 0.0002743 2.95 0.003
lnfert*trend 0.000898471 0.0002366 0.0002062 4.36 0.000

After the technology parameters, the estimates of σv and σu are listed in the form of their
natural logarithms. The next line refers to the noise component.

Coefficient [Link] robust-SE t-value t-prob


ln{\sigma_v} -2.64681 0.1459 0.1361 -19.4 0.000

Since ln (σu ) is parameterized using covariates, there are several


P4 related estimates. The order
of coefficients corresponds to the specification ln (σu ) = δ0 + l=1 δl × zl where l = 1 (labor),
2 (land), 3 (machinery), 4 (fertilizer); and the z ’s are in logarithms. Higher use of zl is
associated with a lower level of inefficiency (or higher technical efficiency) if the estimated
parameter has a negative sign.

Coefficient [Link] robust-SE t-value t-prob


Constant -1.04439 0.04104 0.04790 -21.8 0.000
lnlab 0.232702 0.04300 0.05044 4.61 0.000
lnland -0.146200 0.04176 0.05050 -2.90 0.004
lnmac -0.00976576 0.01491 0.01671 -0.584 0.559
lnfert -0.0149142 0.01372 0.01647 -0.905 0.365

Here, the inefficiency distribution is supposed to have a non-zero mean, ui ∼ N + (µ = µ0 , σu,i


2 ),

i.e., the location parameter is a constant (µ0 ) common to all individuals. Additional covariates
could be introduced. The omission of U_VAR in the model specification leads to µ = 0, and
hence, results in the normal half-normal model. Note that this estimate (here, the third
Constant) is always the last Constant term in the list (if a truncated-normal is specified).

Coefficient [Link] robust-SE t-value t-prob


Constant 0.454144 0.02926 0.03249 14.0 0.000

Some additional information is provided, for details see Section 3.3.

log-likelihood -458.928611
no. of observations 2400 no. of parameters 28
AIC.T 973.857222 AIC 0.405773842
18 Stochastic Frontier Analysis Using SFAMB for Ox

2.5 Technical efficiency


Technical efficiency
1.2
2 1.1
1
1.5 0.9
0.8
0.7
1 0.6
0.5
0.5 0.4
0.3
0 0.2
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.1

1
0.9
0.8
0.7
0.6
0.5
0.4
0.3 TE
0.2 Lower bound
Upper bound
0.1
0 500 1000 1500 2000 2500

Figure 2: TE scores of the POOLED model.

mean(lny) -1.14273 var(lny) 2.98932


\gamma: 0.9618 VAR(u)/VAR(total) 0.9016
Test of one-sided err 172.93 mixed Chi^2 !!
with p-value 6.246e-035

Finally, the graph created by the function TestGraphicAnalysis is depicted in Figure 2.

4.2. Example: [Link]


In this example, the CFE model of Chen et al. (2014) is specified using, again, the data set
[Link] and a translog functional form. You can immediately switch to the LSDV or
TFE model, respectively, by changing the argument of SetMethod. A large part of this example
corresponds to the code of the previous subsection. However, as panel data are involved here
some things are different.

#include <oxstd.h>
#include <packages/gnudraw/gnudraw.h>
#import <packages/sfamb/sfamb>

main(){
decl fob = new Sfa();
[Link]("[Link]");
[Link](CFE);
[Link]();
Journal of Statistical Software 19

CFE is the estimator selected. Here, the function SetConstant does not create a constant
because it is not required. However, this line can be kept for convenience. The function
Ident identifies the panel structure of the data. The required information includes the variable
names of the individuals ("ID") and the period ("time").

[Link]([Link]("ID"), [Link]("time"));

Data transformation and model specification correspond to the previous example. Note that
neither U_VAR nor Z_VAR are available here.

decl inorm = 1;
[Link]([Link]([Link]("output"), inorm), "lny");
[Link]([Link]([Link]("labour"), inorm), "lnlab");
[Link]([Link]([Link]("land"), inorm), "lnland");
[Link]([Link]([Link]("machinery"), inorm), "lnmac");
[Link]([Link]([Link]("fertilizer"), inorm), "lnfert");
[Link]([Link]("time") - meanc([Link]("time")), "trend");

[Link](Y_VAR, {"lny", 0, 0});

[Link](X_VAR, {
"Constant", 0, 0,
"lnlab", 0, 0,
"lnland", 0, 0,
"lnmac", 0, 0,
"lnfert", 0, 0,
"trend", 0, 0});

[Link](-1, 1, -1, 1);


[Link](TRUE);
MaxControl(1000, 10, TRUE);
[Link](1);
[Link]();

For this model, there is no calculation of the confidence bounds involved. The efficiency scores
can be extracted as point estimates using function TE.

[Link]([Link](), {"TE"});
[Link]([Link](), {"jlms"});
[Link]();

delete fob;
}

The output of this program appears as follows. Additional information on the panel structure
is printed.

SFAMB package version 1.1, object created on 19-02-2014


#groups: #periods(max): avg.T-i:
20 Stochastic Frontier Analysis Using SFAMB for Ox

48.000 50.000 50.000


Constructing Squares and Cross-Products...done.
-CFE model-

---- SFAMB ----


The estimation sample is: 1 - 2400
The dependent variable is: lny
The dataset is: [Link]

A common intercept is not identified, and hence, there is no Constant.

Coefficient [Link] t-value t-prob


lnlab 0.00883652 0.03048 0.290 0.772
lnland 0.677192 0.02304 29.4 0.000
lnmac 0.106177 0.009083 11.7 0.000
lnfert 0.0837343 0.007086 11.8 0.000
trend 0.00920993 0.0006800 13.5 0.000
.5*lnlab^2 0.138565 0.02083 6.65 0.000
.5*lnland^2 0.177254 0.02047 8.66 0.000
.5*lnmac^2 0.0121082 0.003350 3.61 0.000
.5*lnfert^2 0.0245012 0.002852 8.59 0.000
.5*trend^2 0.000407978 3.744e-005 10.9 0.000
lnlab*lnland -0.138300 0.02024 -6.83 0.000
lnlab*lnmac -0.0247345 0.007611 -3.25 0.001
lnlab*lnfert 0.00218990 0.005678 0.386 0.700
lnlab*trend -0.000134440 0.0005109 -0.263 0.792
lnland*lnmac 0.0243333 0.008190 2.97 0.003
lnland*lnfert -0.0319551 0.006178 -5.17 0.000
lnland*trend 0.000212194 0.0004843 0.438 0.661
lnmac*lnfert 0.00379000 0.001959 1.93 0.053
lnmac*trend 0.000346355 0.0001844 1.88 0.060
lnfert*trend -0.000171510 0.0001308 -1.31 0.190

This model is restricted to the normal half-normal case. Here, the estimates of (the natural
logarithms of) σv2 and σu2 are given.

ln{\sigma_v^2} -4.94563 0.1464 -33.8 0.000


ln{\sigma_u^2} -3.44008 0.1125 -30.6 0.000

log-likelihood 1476.81739
no. of observations 2400 no. of parameters 22
AIC.T -2909.63478 AIC -1.21234782
mean(lny) 7.55183e-018 var(lny) 0.127523
lambda 2.123

The function TestGraphicAnalysis is used to create the graph depicted in Figure 3.


Journal of Statistical Software 21

7
Technical efficiency
6

0
0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1

Technical efficiency
1.1

0.9

0.8

0.7

0.6

0.5

Figure 3: TE scores of the CFE model.

4.3. Example: Member functions SetTranslog and Elast


The member function SetTranslog allows for convenient specification of a translog functional
form. In the following excerpt, we refer to the current instance of the class as fob. Suppose
your selection of regressors looks like this:

[Link](X_VAR, {
"Constant", 0, 0,
"lnx1", 0, 0,
"lnx2", 0, 0,
"lnx3", 0, 0,
"trend", 0, 0});

The default specification is Cobb-Douglas, i.e., SetTranslog(0), changing the argument to


1 invokes construction of the respective square and cross terms of X_VAR. In general notation:
K K K
X 1 XX
ln yi = β0 + βj ln xji + βjl ln xji ln xli .
2
j=1 j=1 l=1

If your selection includes dummies, the variables should be ordered like this:

[Link](X_VAR, {
"Constant", 0, 0,
"lnx1", 0, 0,
"lnx2", 0, 0,
"lnx3", 0, 0,
"trend", 0, 0,
22 Stochastic Frontier Analysis Using SFAMB for Ox

"dummy1", 0, 0,
"dummy2", 0, 0});

Specification of a translog form is then possible by means of SetTranslog(4) because only


the first four regressors are used ("Constant" is ignored automatically).
After estimation, the member function Elast can be used to calculate the output elasticity
(ji ) of each input for each observation:
K
X
ji = βj + βjl ln xli .
l=1

The following example illustrates one possible way the function may be used. Here, results
are plotted as histograms (see Figure 4). Note that indexing starts at 0 in Ox (Elast returns
an N T × 2 matrix but only the first column is considered here). The first three arguments
of DrawDensity are the most important here: area (panel) index, variable, label. See the
documentations of Ox or GnuDraw for a full description.

decl vEps1 = [Link]("lnx1");


decl vEps2 = [Link]("lnx2");
decl vEps3 = [Link]("lnx3");
decl vEpst = [Link]("trend");

DrawDensity(0, vEps1[][0]', {"eps1"}, 1, 1, 0, 0, 0, 0, 1, 0, 1);


DrawDensity(1, vEps2[][0]', {"eps2"}, 1, 1, 0, 0, 0, 0, 1, 0, 1);
DrawDensity(2, vEps3[][0]', {"eps3"}, 1, 1, 0, 0, 0, 0, 1, 0, 1);
DrawDensity(3, vEpst[][0]', {"epst"}, 1, 1, 0, 0, 0, 0, 1, 0, 1);
ShowDrawWindow();

5. Future developments
The basic version of SFAMB dates back to the mid 1990s where the capability was restricted
to cross-sectional data. As the package now allows for panel data and the literature on SF
methods is considerably broader and still growing, there is scope for potential extensions.
Some related possibilities are mentioned here.
In the model framework of Chen et al. (2014) there are two ways to calculate the individual
effects. As an alternative to Equation 5, the individual “between estimator of αi ” can be
used. It could be implemented as an optional function, involving a second maximization. Its
availability would allow us to compare results and check the consequences for TE scores.
While the current focus of panel methods is on fixed effects estimation, a more comprehen-
sive supplement might involve random effects models. The most recent SF approach using
the CSN distribution is presented by Colombi, Kumbhakar, Martini, and Vittadini (2014).
Its specification is similar to Equation 3, but the time-invariant part is further decomposed
into two residuals (persistent inefficiency and time-invariant unobserved heterogeneity). Fil-
ippini and Greene (2016) introduce computational simplifications and label the model as the
“Generalized True Random Effects SF model ”.
Journal of Statistical Software 23

6 6
eps1 eps2
5 5

4 4

3 3

2 2

1 1

0 0
-0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2

25 50
eps3 epst
45
20 40
35
15 30
25
10 20
15
5 10
5
0 0
-0.02 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 -0.02 -0.01 0 0.01 0.02 0.03 0.04 0.05

Figure 4: Histograms of calculated elasticities (by observation).

Acknowledgments
All code is pure Ox code. However, code of the WT model is partially adapted from Stata code
by Hung-Jen Wang. We thank Hung-Jen Wang for providing data to check our CFE code.
Additional thanks are due for the referees for offering helpful comments that improved the
quality of the manuscript.

References

Aigner D, Lovell CAK, Schmidt P (1977). “Formulation and Estimation of Stochastic Frontier
Production Function Models.” Journal of Applied Econometrics, 6(1), 21–37. doi:10.1016/
0304-4076(77)90052-5.
Alvarez A, Amsler C, Orea L, Schmidt P (2006). “Interpreting and Testing the Scaling
Property in Models Where Inefficiency Depends on Firm Characteristics.” Journal of Pro-
ductivity Analysis, 25(3), 201–212. doi:10.1007/s11123-006-7639-3.
Battese GE, Coelli TJ (1988). “Prediction of Firm-Level Technical Efficiencies with a Gen-
eralized Frontier Production Function and Panel Data.” Journal of Econometrics, 38(3),
387–399. doi:10.1016/0304-4076(88)90053-x.
Battese GE, Coelli TJ (1992). “Frontier Production Functions, Technical Efficiency and Panel
Data: With Application to Paddy Farmers in India.” Journal of Productivity Analysis, 3(1–
2), 153–169. doi:10.1007/bf00158774.
24 Stochastic Frontier Analysis Using SFAMB for Ox

Battese GE, Coelli TJ (1995). “A Model for Technical Inefficiency Effects in a Stochastic
Frontier Production Function for Panel Data.” Empirical Economics, 20(2), 325–332. doi:
10.1007/bf01205442.

Battese GE, Corra GS (1977). “Estimation of a Production Function Model: With Application
to the Pastoral Zone of Eastern Australia.” Australian Journal of Agricultural Economics,
21(3), 169–179. doi:10.1111/j.1467-8489.1977.tb00204.x.

Belotti F, Daidone S, Ilardi G, Atella V (2013). “Stochastic Frontier Analysis Using Stata.”
Stata Journal, 13(4), 719–758. URL [Link]
article=st0315.

Bogetoft P, Otto L (2010). Benchmarking with DEA, SFA, and R, volume 157 of International
Series in Operations Research & Management Science. Springer-Verlag.

Bos CS (2014). GnuDraw – An Ox Package for Creating gnuplot Graphics. URL http:
//[Link]/[Link]/software/[Link].

Brorsen BW, Kim T (2013). “Data Aggregation in Stochastic Frontier Models: The Closed
Skew Normal Distribution.” Journal of Productivity Analysis, 39(1), 27–34. doi:10.1007/
s11123-012-0274-2.

Brümmer B (2001). “Estimating Confidence Intervals for Technical Efficiency: The Case of
Private Farms in Slovenia.” European Review of Agricultural Economics, 28(3), 285–306.
doi:10.1093/erae/28.3.285.

Caudill SB, Ford JM, Gropper DM (1995). “Frontier Estimation and Firm-Specific Inefficiency
Measures in the Presence of Heteroscedasticity.” Journal of Business & Economic Statistics,
13(1), 105–111. doi:10.2307/1392525.

Chen YY, Schmidt P, Wang HJ (2014). “Consistent Estimation of the Fixed Effects Stochastic
Frontier Model.” Journal of Econometrics, 181(2), 65–76. doi:10.1016/[Link].2013.
05.009.

Coelli TJ (1996). A Guide to FRONTIER 4.1: A Computer Program for Stochastic Frontier
Production and Cost Function Estimation. CEPA Working Papers, University of New
England, URL [Link]

Coelli TJ, Henningsen A (2017). frontier: Stochastic Frontier Analysis. R package version
1.1-2, URL [Link]

Coelli TJ, Rao PDS, O’Donnell CJ, Battese GE (2005). An Introduction to Efficiency and
Productivity Analysis. Springer-Verlag. doi:10.1007/978-1-4615-5493-6.

Colombi R, Kumbhakar SC, Martini G, Vittadini G (2014). “Closed-Skew Normality in


Stochastic Frontiers with Individual Effects and Long/Short-Run Efficiency.” Journal of
Productivity Analysis, 42(2), 123–136. doi:10.1007/s11123-014-0386-y.

Cottrell A, Lucchetti R (2014). gretl User’s Guide – Gnu Regression, Econometrics and
Time-Series Library. URL [Link]
Journal of Statistical Software 25

Doornik JA (2009). An Object-Oriented Matrix Language Ox 6. Timberlake Consultants


Press, London.

Doornik JA, Ooms M (2007). Introduction to Ox: An Object-Oriented Matrix Language. Tim-
berlake Consultants Press, London. Available at [Link]
pdf.

Econometric Software Inc (2014). LIMDEP, Version 10.0. ESI, New York. URL http:
//[Link]/.

Filippini M, Greene WH (2016). “Persistent and Transient Productive Inefficiency: A Max-


imum Simulated Likelihood Approach.” Journal of Productivity Analysis, 45(2), 187–196.
doi:10.1007/s11123-015-0446-y.

Fuglie KO (2012). “Productivity Growth and Technology Capital in the Global Agricultural
Economy.” In KO Fuglie, SL Wang, VE Ball (eds.), Productivity Growth in Agriculture:
An International Perspective. CABI.

gnuplot Team (2015). gnuplot 5.0 – An Interactive Plotting Program. URL http:
//[Link]/projects/gnuplot.

Greene WH (2005). “Reconsidering Heterogeneity in Panel Data Estimators of the Stochastic


Frontier Model.” Journal of Econometrics, 126(2), 269–303. doi:10.1016/[Link].
2004.05.003.

Greene WH (2008). “The Econometric Approach to Efficiency Analysis.” In HO Fried,


CAK Lovell, SS Schmidt (eds.), The Measurement of Productive Efficiency and Produc-
tivity Growth. Oxford University Press.

Greene WH (2012). Econometric Analysis. 7th edition. Pearson International Edition.

Horrace WC, Schmidt P (1996). “Confidence Statements for Efficiency Estimates from
Stochastic Frontier Models.” Journal of Productivity Analysis, 7(2–3), 257–282. doi:
10.1007/bf00157044.

Huang CJ, Liu JT (1994). “Estimation of a Non-Neutral Stochastic Frontier Production


Function.” Journal of Productivity Analysis, 5(2), 171–180. doi:10.1007/bf01073853.

Hughes G (2008). sfa hetmod and sfa mod. User-Contributed Function Pack-
ages for gretl. URL [Link]
available_user-contributed_function_packages.

Jondrow J, Lovell CAK, Materov IS, Schmidt P (1982). “On the Estimation of Technical In-
efficiency in the Stochastic Frontier Production Function Model.” Journal of Econometrics,
19(2–3), 233–238. doi:10.1016/0304-4076(82)90004-5.

Kodde DA, Palm FC (1986). “Wald Criteria for Jointly Testing Equality and Inequality
Restrictions.” Econometrica, 54(5), 1243–1248. doi:10.2307/1912331.

Kotz S, Balakrishnan N, Johnson NL (2000). Continuous Multivariate Distributions: Models


and Applications, volume 1. John Wiley & Sons. doi:10.1002/0471722065.
26 Stochastic Frontier Analysis Using SFAMB for Ox

Kumbhakar SC, Gosh S, McGuckin JT (1991). “A Generalized Production Frontier Approach


for Estimating Determinants of Inefficiency in U.S. Dairy Farms.” Journal of Business &
Economic Statistics, 9(3), 279–286. doi:10.2307/1391292.

Kumbhakar SC, Lovell CAK (2000). Stochastic Frontier Analysis. Cambridge University
Press, Cambridge.

Lai H, Huang CJ (2010). “Likelihood Ratio Tests for Model Selection of Stochas-
tic Frontier Models.” Journal of Productivity Analysis, 34(1), 3–13. doi:10.1007/
s11123-009-0160-8.

Meeusen W, Van den Broeck J (1977). “Efficiency Estimation from Cobb-Douglas Production
Functions with Composed Error.” International Economic Review, 18(2), 435–444. doi:
10.2307/2525757.

Pavlyuk D (2016). spfrontier: Spatial Stochastic Frontier Models Estimation. R package


version 0.2.3, URL [Link]

Piessens R, de Doncker-Kapenga E, Überhuber CW, Kahaner DK (1983). QUADPACK,


A Subroutine Package for Automatic Integration. Springer-Verlag.

R Core Team (2017). R: A Language and Environment for Statistical Computing. R Founda-
tion for Statistical Computing, Vienna, Austria. URL [Link]

Reifschneider D, Stevenson R (1991). “Systematic Departures from the Frontier: A Framework


for the Analysis of Firm Inefficiency.” International Economic Review, 32(3), 715–723.
doi:10.2307/2527115.

Schmidt P, Sickles RC (1984). “Production Frontiers and Panel Data.” Journal of Business
& Economic Statistics, 2(4), 367–374. doi:10.2307/1391278.

StataCorp LP (2015). Stata, Version 14. College Station. URL [Link]

Wang HJ (2012). Manual of Hung-Jen Wang’s Stata Codes. URL [Link]


[Link]/~wangh.

Wang HJ, Ho CW (2010). “Estimating Fixed-Effect Panel Stochastic Frontier Models by


Model Transformation.” Journal of Econometrics, 157(2), 286–296. doi:10.1016/j.
jeconom.2009.12.006.

Wang HJ, Schmidt P (2002). “One-Step and Two-Step Estimation of the Effects of Exogenous
Variables on Technical Efficiency Levels.” Journal of Productivity Analysis, 18(2), 129–144.
doi:10.1023/a:1016565719882.

White H (1980). “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct


Test for Heteroskedasticity.” Econometrica, 48(4), 817–838. doi:10.2307/1912934.
Journal of Statistical Software 27

A. Technical appendix

A.1. Starting values


OLS estimates are used as starting values for the technology parameters β, and a grid search
is applied to find an appropriate value for σ 2 = σv2 + σu2 . Battese and Corra (1977, p. 173)
point out that the OLS estimates of β are unbiased, except for the common constant β0 .
They show that β0 can be corrected as:
r
2
β̂0 = β̂0OLS + σ̂u .
π
Furthermore, they define γ = σu2 /(σv2 + σu2 ), and suggest to try different initial values. The
grid search evaluates the likelihood function over a range of values (γ [0.05, 0.98]) and chooses
the parameters associated with the highest likelihood value. Within this procedure, σv2 and
σu2 are parameterized as:15
 2
σv + σu2 σu2

σv2 = σ 2 × (1 − γ) = σv2 + σu2 ×

− ,
σv2 + σu2 σv2 + σu2
σ2
σu2 = σ 2 × γ = σv2 + σu2 × 2 u 2 .

σv + σu

The search for values for σv2 and σu2 involves partitioning the variance of the composed error
term. Aigner et al. (1977) show that VAR() = σv2 + ((π − 2)/π) σu2 , which can be expressed
as:
  
2 π−2 2 2 π−2
VAR() = σ (1 − γ) + σ γ =σ 1−γ 1− .
π π
Using the variance of the OLS residuals (m2 ) as an estimate for VAR():
 
2 2 m2
m2 = σ 1 − γ ←→ σ 2 = .
π 1 − γ π2

The grid search is called within member function DoEstimation, and runs over γ values with
step length 0.01; it passes the starting values for β and σ 2 back to DoEstimation.

A.2. Estimation and standard errors


After the starting values are obtained from the OLS regression and grid search, the log-
likelihood function is maximized using the BFGS (Broyden-Fletcher-Goldfarb-Shanno) algo-
rithm. The respective Ox routine is named MaxBFGS and documented in Doornik (2009) and
online at [Link] The log-likelihood function can be found as func-
tion fSfa in the source code.16 Analytical first derivatives are used in the case of the POOLED
15
In case of the pooled model, the parameters are σv and σu . In case of the panel models, there is no constant
(β0 ) to be corrected. If the model includes additional covariates (z-variables, related to inefficiency), a vector
of zeros is used for the respective parameters. Zeros are also used as starting values for the individual effects
in the TFE model.
16
Member functions of the source code are structured by case(s) where default = POOLED, 1 = WT, 2 =
LSDV, 3 = CFE, 4 = TFE.
28 Stochastic Frontier Analysis Using SFAMB for Ox

model, while the remaining models employ numerical first derivatives based on the Ox routine
Num1Derivative (finite difference approximation).
The CFE model’s “within-likelihood function” includes a T -dimensional cdf. In their Ap-
pendix C, Chen et al. (2014) show how the T -dimensional integral is reduced to a one-
dimensional integral, referring to Kotz, Balakrishnan, and Johnson (2000) (fKotzetal is the
respective function). The numerical quadrature is executed by function QNG which is part
of the Fortran package QuadPack (Piessens, de Doncker-Kapenga, Überhuber, and Kahaner
1983) and linked in via #include <quadpack.h>.
Standard errors are obtained from m_mCovar (covariance matrix). This data member is pro-
duced by function Num2Derivative (called within member function Covar) that uses a finite
difference approximation (Doornik 2009). Estimation output of the cross-sectional model re-
turns robust standard errors (by default) which are obtained by the method of White (1980).

A.3. Log-likelihood functions


In this section, φ and Φ denote the pdf and cdf of a standard normal distribution, respectively;
λ = σu /σv and σ 2 = σu2 + σv2 .
Kumbhakar and Lovell (2000) present the log-likelihood functions of the normal-truncated
iid
normal model (ui ∼ N + (µ, σu2 )) and the normal-half normal model (µ = 0) for cross-sectional
data. The log-likelihood function of the POOLED model for one observation, with i = yi −β > xi ,
is given by:

√ !
(i + µ)2 1 + λ2
 
1 1 µ λi
ln Li = ln √ − ln σ 2 − + ln Φ − − ln Φ µ . (6)
2π 2 2σ 2 σλ σ σλ

The log-likelihood function of the TFE model (Greene 2005) corresponds to Equation 6, but
with µ = 0 and it = yit − β > xit − αi in the place of i .
The log-likelihood function of the WT model (Wang and Ho 2010) for one individual, with
˜it = ỹit − β > x̃it , and ˜i = (˜
i1 , . . . , ˜iT )> :

1 1 1
ln Li = − (T − 1) ln(2π) − (T − 1) ln(σv2 ) − ˜> −
i Π ˜i
2 2  2
1 µ2∗∗ µ2
      
µ∗∗ µ
+ 2
− 2 + ln σ∗∗ Φ − ln σu Φ ,
2 σ∗∗ σu σ∗∗ σu

where

µ/σu2 − ˜> −
i Π h̃i
µ∗∗ = >
,
h̃i Π− h̃i + 1/σu2
2 1
σ∗∗ = >
.
h̃i Π− h̃i + 1/σu2

Log-likelihood function of the CFE model (Chen et al. 2014), with ˜it = ỹit − β > x̃it , ˜i =
Journal of Statistical Software 29

i1 , . . . , ˜iT )> , and ˜∗i = (˜


(˜ i1 , . . . , ˜i,T −1 )> :

X 1

ln LW = constant + ln ∗i ;
φT −1 (˜ 2
0, σ (IT −1 − ET −1 ))
T
i
X λ2

λ
+ ln ΦT (− ˜i ; 0T , IT + ET ) ,
σ T
i

where In is an n × n identity matrix and En is an n × n matrix of ones; or alternatively:

X  (T − 1)  1
 
1

∗> − ∗
ln LW = −N T ln Φ(0) + − ln (2π) − ln |Σ| − ˜i Σ ˜i
2 2 2
i
T
"Z #
∞  
X Y λ λ
+ ln φ(u0 ) Φ − ˜it − √ U0 du0 .
−∞ σ T
i t=1

Affiliation:
Jonathan Holtkamp
Department of Agricultural Economics and Rural Development
University of Goettingen
D-37073 Goettingen, Germany
E-mail: jonathan-holtkamp@[Link]

Bernhard Brümmer
Department of Agricultural Economics and Rural Development
& Centre of Biodiversity and Sustainable Land Use
University of Goettingen
D-37073 Goettingen, Germany
E-mail: bbruemm@[Link]
URL: [Link]

Journal of Statistical Software [Link]


published by the Foundation for Open Access Statistics [Link]
November 2017, Volume 81, Issue 6 Submitted: 2014-05-04
doi:10.18637/jss.v081.i06 Accepted: 2016-08-07

You might also like