Stochastic Frontier Analysis with SFAMB
Stochastic Frontier Analysis with SFAMB
Abstract
SFAMB is a flexible econometric tool designed for the estimation of stochastic frontier
models. Ox is a matrix language used in different modules, with a console version freely
available to academic users. This article provides a brief introduction to the field of
stochastic frontier analysis, with examples of code (input and output) as well as a technical
documentation of member functions. SFAMB provides frontier models for both cross-
sectional data and panel data (focusing on fixed effects models). Member functions can
be extended depending on the needs of the user.
1. Introduction
SFAMB (stochastic frontier analysis using ‘Modelbase’) is a package for estimating stochastic
frontier production (as well as cost, distance, and profit) functions. It provides specifications
for both cross-sectional data and panel data. SFAMB is written in Ox (Doornik 2009) and is
operated by writing small programs (scripts).
The console versions of Ox are free for research and educational purposes. Ox Console uses
OxEdit to run programs, while the commercial version of the programming language, Ox
Professional, uses the graphical user environment OxMetrics instead.
The structure of the paper is as follows. In the next section, we briefly introduce the econo-
metric foundations and related literature. The first part focuses on the theory of models for
cross-sectional and panel data. A subsection presents the specifications that are available
in SFAMB and mentions some related other software. Section 3 explains the usage of the
package, which includes data structure, model formulation, and model output. Furthermore,
it provides a detailed list of class member functions. For illustration, we present practical
examples using real world data (Section 4) which are distributed with the package. We men-
tion some possible extensions of SFAMB in Section 5. Finally, a technical appendix provides
a brief overview of some underlying workings (Appendix A).
2 Stochastic Frontier Analysis Using SFAMB for Ox
and the scale is modeled using an exponential form, it becomes the RSCFG4 model, where
ui ∼ N + (0, exp(2(δ0 + δ > z i ))). The combination of both models leads to the following form,
ui ∼ N + (µi = µ0 + θ > z i , σu,i
2 = exp(2(δ + δ > z ))), that according to Lai and Huang (2010)
0 i
could be labeled as a generalized linear mean (GLM) model.5
Jondrow, Lovell, Materov, and Schmidt (1982) present a point estimator of inefficiency, given
by E(ui |i ). Battese and Coelli (1988) show that if the dependent variable is in logarithms,
a more appropriate estimator is the point estimator of technical efficiency, given by TE i =
E(exp(−ui )|i ).
iid
This formulation includes the time dimension, a conventional two-sided error vit ∼ N (0, σv2 ),
and an individual intercept αi . The model’s virtue originates from the identification of these N
time-invariant parameters. These “fixed” effects absorb unmeasured time-invariant individual
attributes, and hence, the model accounts for unobserved heterogeneity. One commonly used
name of this approach is “least squares with dummy variables” (LSDV).
Instead of estimating the dummy variables, the conventional strategy is to apply within-
transformation to the dependent and independent variable(s), e.g., for an independent vari-
able:
The observation on xit is transformed by subtracting the respective individual mean x̄i . The
resulting variable x̃it is a deviation from the mean. This procedure eliminates the individual
effects because α̃i = αi − αi = 0. The transformed variables are used for model estimation.
After estimation, the individual effects are calculated as:
>
α̂i = ȳi − β̂ x̄i .
Schmidt and Sickles (1984) use the model in a frontier context. They interpret the individual
with the highest intercept as 100% technically efficient and determine the inefficiency of the
remaining individuals as ui = max(α̂) − α̂i . Accordingly, efficiency scores are time-invariant
and are given by TE i = E(exp(−ui )).
4
Reifschneider and Stevenson (1991); Caudill, Ford, and Gropper (1995).
5
Actually, they label models that include the KGMHLBC form as generalized exponential mean (GEM)
models. The reason is that they refer to the exponential form of the model that has been proposed by Alvarez
et al. (2006). Following the categorization of Lai and Huang (2010), the model implemented in SFAMB is a
GLM model. Furthermore, note that in SFAMB the respective scale parameter in the POOLED model is (the
2
natural logarithm of) σu,i , and not σu,i . While σu2 is often used, the original formulation of CFG involved σu .
4 Stochastic Frontier Analysis Using SFAMB for Ox
This model is proposed by Greene (2005, p. 277) who labels it as the “true fixed effects
[TFE] stochastic frontier model”. Estimation of this model requires the inclusion of all N
dummy variables, i.e., the number of intercepts to be estimated corresponds to the number
of individuals. With fixed T , the estimate of the error variance is inconsistent (incidental
parameters problem). Furthermore, it is likely to be biased as pointed out by Chen, Schmidt,
and Wang (2014). This is a relevant issue since this estimate is required for the assessment
of inefficiency.
Wang and Ho (2010) present the conditional expectation of uit in their Equation 30; efficiency
estimates are given by TE it = E(exp(−uit |˜
it )). The individual effects are calculated as:
>
α̂i = ȳi − β̂ x̄i + ūi .
2
f () = φ Φ −λ .
σ σ σ
While the skew normal distribution is a generalization of the normal distribution, it can be
generalized itself by using the CSN distribution. The composed error has a CSN distribution,
which is expressed by:
λ
it ∼ CSN 1,1 (0, σ 2 , − , 0, 1).
σ
The density of a CSN p,q -distribution includes a p-dimensional pdf and a q-dimensional cdf of a
normal distribution. The five associated parameters describe location, scale, and skewness, as
well as the mean vector and covariance matrix in the cdf. With panel data, the T -dimensional
vector i = (i1 , . . . , iT )> is distributed as:
λ
i ∼ CSN T,T (0T , σ 2 IT , − IT , 0T , IT ),
σ
where I is the identity matrix. Chen et al. (2014, p. 67) make use of the fact that the CSN
distribution is “closed under linear combination”, and partition the vector i into its mean ¯i
and its first T −1 deviations ˜∗i . The model’s likelihood function is derived from ˜∗i . Therefore,
it is free of incidental parameters, and the parameters to be estimated are β, λ, and σ 2 – as
8
With regards to the degrees of freedom, the correction accounts for the N individuals: df = N T −N −K =
N (T − 1) − K.
9
Chen et al. (2014) explain how the SF model is related to the CSN distribution and present the required
properties of CSN distributed random variables. Another plain introduction to the CSN distribution in the
SF context is provided by Brorsen and Kim (2013).
6 Stochastic Frontier Analysis Using SFAMB for Ox
SetMethod Example
SFA – cross section POOLED [Link]
Least squares with dummies LSDV [Link]
SFA – with dummies TFE [Link]
SFA – within-transformation WT [Link]
SFA – consistent fixed effects CFE [Link]
Table 1: Available estimators, with name of estimation method and sample files.
in the basic SF model.10 ¯i and ˜∗i are not independent, unless λ = 0. If λ = 0 the model
becomes the fixed effects model with normal error (LSDV model).
In order to assess the inefficiency, the composed error term must be recovered:
>
it = yit − ŷit = yit − β̂ xit − α̂i .
There are two ways to calculate α̂i . The one used here is labeled as the mean-adjusted
estimate by Chen et al. (2014):
r
M > 2
α̂i = ȳi − β̂ x̄i + σ̂u . (5)
π
2.3. Software
SFAMB provides frontier models of ALS (with extensions), Schmidt and Sickles (1984),
Greene (2005), Wang and Ho (2010) as well as Chen et al. (2014). The available estima-
tors are listed in Table 1.
There are several other software packages that incorporate (some of) these estimators. Ta-
bles 2 and 3 provide an overview of the different specifications that are available in SFAMB
and other software.11
LIMDEP (Econometric Software Inc. 2014) and Stata (StataCorp LP 2015) are comprehensive
commercial packages that implement frontier techniques in their standard distributions. In
case of Stata, there are additional third-party add-ons such as those of Wang (2012) or Belotti,
Daidone, Ilardi, and Atella (2013).
Hughes (2008) has written two free packages, sfa hetmod and sfa mod, that can be used with
gretl (Cottrell and Lucchetti 2014) and include variations of the standard model.
The recent package spfrontier (Pavlyuk 2016) is concerned with (the specific family of) spatial
SF models. It is implemented in R (R Core Team 2017) and allows for various specifications.
The first program to implement frontier techniques was Frontier (Coelli 1996). Later, the
original code was transferred to R in the package frontier by Coelli and Henningsen (2017).
This package provides the standard model (ALS), the extension of KGMHLBC as well as the
model of BC92. Its functionality is augmented by some additional options (e.g., for calculating
marginal effects).
10
The conventional parameterization is: λ = σu /σv and σ 2 = σu2 + σv2 .
11
The coverage of other software may be larger as indicated here. This overview makes no claim to be
complete.
Journal of Statistical Software 7
Cross-sectional data
POOLED
ui ∼ N + (0, σu2 ) N + (µ0 + θ > z i , σu2 ) N + (0, exp(2 δ > z i )) N + (µ0 + θ > z i , exp(2 δ > z i ))
[ALS] [KGMHLBC] [RSCFG] [GLM]
SFAMB∗ X X X X
frontier X X
LIMDEP X X X X
sfa hetmod X
sfa mod X
Stata X X X X
∗ Here, the respective scale parameter is σ , and not σ 2 .
u,i u,i
Panel data
BC92 LSDV TFE WT CFE
ui ∼ N + (µ, σu2 ) uit ∼ N + (0, σu2 ) u∗i ∼ N + (0, σu2 ) uit ∼ N + (0, σu2 )
or u∗i ∼ N + (µ, σu2 )
SFAMB X X X X
frontier X
LIMDEP X X X
Stata X X X X
Similarly, SFAMB offers specific member functions that can be extended by the user. To
date, it is the only package including the CFE model.
3. Using SFAMB
id time y x1
31 1 298384 24145
31 2 333522 27725
31 3 378768 38115
37 1 62473 3401
37 2 212442 12529
37 3 295142 16734
101 1 150037 10752
101 2 158909 10418
101 3 172744 10671
input file (script with ending .ox). This object is an instance of the ‘SFAMB’ class and can use
the functionality of this class. The function Load loads the data file and creates the database.
The type of model is chosen with SetMethod, where the respective arguments are presented
in Table 1. In case of panel data, the panel structure must be specified using Ident. If the
original data are in levels, you can use PrepData for transformation (the classes ‘Database’
and ‘Modelbase’ provide additional functions). Variables are grouped using Select, where
variable names serve as arguments.12
To formulate the frontier function:
To include variables that affect the one-sided inefficiency distribution (for the WT model choose
only one of these groups):
12
In case of the panel models, a common constant is not identified. However, you can leave "Constant" in
the selection because it is ignored automatically.
Journal of Statistical Software 9
Use Select(U_VAR, {".",.,.}) to select variables that shift the individual location
parameter of the distribution, µi .
Use Select(Z_VAR, {".",.,.}) to select variables that affect the scale parameter of
2 .
the distribution, σu,i or σu,i
SetTranslog can be used to choose the functional form of the frontier function. In case
of the translog specification, we recommend to normalize the variables by the respective
sample means. Estimation of the model is executed via Estimate. For more details, see the
documentation of member functions in Section 3.4.
Cross-sectional data
Specific numbers for model POOLED:
gamma was defined Pby Battese and Corra (1977) and is given by γ = σu2 /σ 2 = σu2 /(σv2 + σu2 ),
where σu2 = n1 i exp(2 δ > z i ).
VAR(u)/VAR(total) describes the variance decomposition of the composed error term. The
share of the variance of u in the total variance of the composed error is given by
VAR[u]/VAR[] = [(π − 2)/π]σu2 /[(π − 2)/π]σu2 + σv2 , see Greene (2008, p. 118), where
σu2 = n1 i exp(2 δ > z i ).
P
Test of one-sided err provides a likelihood ratio test statistic for the presence of ineffi-
ciency, i.e., for the null hypothesis H0 : γ = 0. The critical value cannot be taken from
a conventional χ2 -table, see Kodde and Palm (1986).
Panel data
Specific numbers for model LSDV:
SSR
sigma_e describes σv , the square root of the corrected error variance, σv2 = N (T −1)−K . This
estimate is also used to compute the standard errors.
AIC1 (all obs) is given by AIC1 = P−2Pln L + 2 (K + 1); it uses the likelihood function
ṽ 2
ln L = − 2 ln(2π) − 2 ln(σ ) − i2σ2t it with the uncorrected σ 2 = SSR
NT NT 2
NT .
AIC2 uses a different formula for the criterion, AIC2 = ln( SSR K+N
N T ) + (2 N T ); that does not
require the likelihood function and considers the number of individuals in the penalty
term.
AiHat AiHat();
Return value
Returns the calculated individual effects α̂i , N × 1 vector.
Description
– Only panel data – These values can be obtained after estimation, see Section 2 for the
respective formulas.
mifr is the condition that specifies the observation to be dropped, see the documenta-
tion of selectifr.
Returns the calculated output elasticities and the respective t-values for the specified
(single) input variable (for all sample observations).
Description
– Use with SetTranslog() – Only if a translog functional form is used, N T × 2 matrix.
The elasticity is calculated for each observation of the sample: the output elasticity of
input k is given by ∂ ln yi /∂ ln xki . The t-values are calculated using the delta method
(Greene 2012), extracting the required values from the data and covariance matrix of
the parameter estimates.
sXname is the name of the corresponding input variable (string).
GetResults GetResults(const ampar, const ameff, const avfct, const amv);
No return values
Description
– Only POOLED model – This function can be used to store the results of the estimation
procedure for further use. All four arguments should be addresses of variables.
mpar consists of a Npar × 3 matrix, where Npar is the number of parameters in the
model. The first column contains the coefficient estimates, the second column the
standard errors, and the third column the appropriate probabilities.
eff consists of a Nobs × 3 matrix, where Nobs is the number of total observations. The
first column holds the point estimate for technical efficiency, the second and third
columns contain the upper and lower bound of the (1 − α) confidence interval.
fct holds some likelihood function values (OLS and ML), as well as some information
on the correct variance decomposition of the composed error term.
v variance-covariance-matrix.
Ident Ident(const vID, const vPer);
No return values
Description
– Only panel data – Identifies the structure of the panel.
vID is an N T × 1 vector holding the identifier (integer) of the individual.
vPer is an N T × 1 vector holding the identifier (integer) of the time period.
Ineff Ineff();
Return value
Returns point estimates of technical inefficiency, N T × 1 vector.
Description
These predictions are given by the conditional expectation of u, see Section 2 for details.
PrepData PrepData(const mSel, iNorm);
Return value
Returns logarithms of the specified variables, either normalized or not.
Description
This function expects your data in levels and can do two things: It takes logarithms of
your specified variables (if iNorm = 0) or it normalizes your data (by the sample mean
if iNorm = 1), before taking logarithms. The transformed variable should receive a new
name.
12 Stochastic Frontier Analysis Using SFAMB for Ox
A value of zero indicates no further terms to be added, e.g., for a log-linear model,
this corresponds to the Cobb-Douglas form.
A value of one indicates that all square and cross terms of all independent variables
should be constructed, e.g., for a log-linear model, this corresponds to the full
translog form.
An integer value of k > 1 indicates that the square and cross terms should be
constructed for only the first k independent variables (useful when the regressor
matrix contains dummy variables).
TE TE();
Return value
Returns point estimates of technical efficiency, N T × 1 vector.
Description
Journal of Statistical Software 13
These predictions are given by the conditional expectation of exp(−u), see Section 2 for
details.
TestGraphicAnalysis TestGraphicAnalysis();
No return values
Description
Only useful in conjunction with the free Ox package GnuDraw (Bos 2014), which is an Ox
interface to gnuplot (gnuplot Team 2015). This function draws two (or three) graphs:
A histogram of the efficiency point estimates and a respective boxplot. It displays
an additional graph in case of the POOLED model, depicting the interval estimates of
technical efficiency at the specified significance.
4. Examples
#include <oxstd.h>
#include <packages/gnudraw/gnudraw.h>
#import <packages/sfamb/sfamb>
The first file, the so-called standard header, ensures that all standard library functions can
be used. The second line includes the header file of GnuDraw (Bos 2014), an Ox interface to
gnuplot (gnuplot Team 2015). If it is not installed or you do not want to use this package,
delete this line. However, graphics output will then be disabled in the free Ox Console version
(in the commercial OxMetrics version, graphics would still be available). Alternatively, you
can comment it out via //:
// #include <packages/gnudraw/gnudraw.h>
14 Stochastic Frontier Analysis Using SFAMB for Ox
The third line imports the (compiled) source code of the package (you may also use #include
<packages/sfamb/[Link]>). Each Ox program is executed by the main() function that
contains the main loop of Ox.
main(){
...
}
The next steps outlined follow the structure of Figure 1. A new object of class ‘Sfa’ has to
be declared.
The data are loaded with a call to the member function Load. The argument of SetMethod
chooses the respective estimator (see Table 1). Here, the model for cross-sectional data is
specified. The function SetConstant creates a constant (intercept).
[Link]("[Link]");
[Link](POOLED);
[Link]();
Data are either used directly or prepared within the code. Here, the output variable, five
input variables, and a time variable are transformed where logarithms of the mean-normalized
inputs (output) are used.14 New names are assigned to the prepared variables. These names
are used for further instructions. The function Info is useful here because it prints summary
statistics, thereby allowing the transformed data to be checked. The program always stops
at an exit function (that is why it is commented out here).
decl inorm = 1;
[Link]([Link]([Link]("output"), inorm), "lny");
[Link]([Link]([Link]("labour"), inorm), "lnlab");
[Link]([Link]([Link]("land"), inorm), "lnland");
[Link]([Link]([Link]("machinery"), inorm), "lnmac");
[Link]([Link]([Link]("fertilizer"), inorm), "lnfert");
[Link]([Link]("time") - meanc([Link]("time")), "trend");
// [Link](); exit(1);
Selection of variables is carried out by Select where Y_VAR is the selection of the dependent
variable and X_VAR is the selection of the regressors. The function uses the new variable
names defined above (if your data file already includes transformed variables, you would use
the names from within the file). The intercept ("Constant") is available because SetConstant
is called above. Within the Select function there are arrays with three elements (variable
name, start lag, end lag). Here, the lags are set to zero. Note that there must not be a comma
before the closing curly brace of Select.
14 xjit
Normalization of inputs (and output): ln x̄j
; normalization of time trend: t − t̄ .
PrepData is a member function of this package (see Section 3.4). Both of the other functions are member
functions of the ‘Database’ class (see Doornik and Ooms 2007).
Journal of Statistical Software 15
The above selections define the production frontier. Additional covariates associated with
the underlying inefficiency distribution can be introduced (POOLED and WT model). Covariates
used to model the location parameter of the distribution are selected in the group U_VAR.
Here, only "Constant" is selected, meaning that µ 6= 0, but additional variables could be
included.
[Link](U_VAR, {
"Constant", 0, 0
});
Likewise, covariates used to model the scale of the distribution are selected in the group
2 ).
Z_VAR, i.e., these variables parameterize σu,i (in case of the WT model, it is σu,i
[Link](Z_VAR, {
"Constant", 0, 0,
"lnlab", 0, 0,
"lnland", 0, 0,
"lnmac", 0, 0,
"lnfert", 0, 0
});
The next three lines allow for different adjustments. SetSelSample is required and can be
used to choose a subset of the data (here: full sample). SetPrintSfa ensures that estimation
output is printed. MaxControl is an optional function that allows for documentation and
adjustments of the maximization procedure.
The functional form of the production frontier is chosen by SetTranslog where the options
are either Cobb-Douglas or translog. Here, a translog form is specified. Estimation of the
model is invoked via Estimate.
[Link](1);
[Link]();
A number of results can be obtained after estimation. In the SF context, the efficiency scores
(TE i ) are of particular interest. Here, the point estimates are extracted, together with the
16 Stochastic Frontier Analysis Using SFAMB for Ox
lower and upper bounds of a 95% confidence band. The respective function is TEint. The
function Ineff extracts the point estimates of inefficiency, E(ui |i ). These results are labeled
and appended to the object using Renew. The original database together with the transformed
variables and results is saved to file via Save.
There is a graphical functionality involving the package GnuDraw that allows for a visual
assessment of the efficiency scores. The function TestGraphicAnalysis displays the graphics
presented in Figure 2. The confidence band can be changed with SetConfidenceLevel, where
an error probability of 0.05 is the default.
[Link](0.05);
[Link]();
The output of this program appears as follows (omitting information on the maximization
procedure). Some general information:
The transformed variables facilitate the interpretation of the estimated coefficients of the
translog functional form. Thus, the first order coefficients listed below can be interpreted as
output elasticities at the sample mean. These estimates are positive and meet the requirement
of monotonicity – except for the machinery input whose (insignificant) estimate violates the
regularity condition. The parameter associated with trend indicates the estimated average
rate of technical change per year.
Furthermore, the output shows the coefficients of the squared and cross terms that can be
used to calculate the individual output elasticities.
After the technology parameters, the estimates of σv and σu are listed in the form of their
natural logarithms. The next line refers to the noise component.
i.e., the location parameter is a constant (µ0 ) common to all individuals. Additional covariates
could be introduced. The omission of U_VAR in the model specification leads to µ = 0, and
hence, results in the normal half-normal model. Note that this estimate (here, the third
Constant) is always the last Constant term in the list (if a truncated-normal is specified).
log-likelihood -458.928611
no. of observations 2400 no. of parameters 28
AIC.T 973.857222 AIC 0.405773842
18 Stochastic Frontier Analysis Using SFAMB for Ox
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3 TE
0.2 Lower bound
Upper bound
0.1
0 500 1000 1500 2000 2500
#include <oxstd.h>
#include <packages/gnudraw/gnudraw.h>
#import <packages/sfamb/sfamb>
main(){
decl fob = new Sfa();
[Link]("[Link]");
[Link](CFE);
[Link]();
Journal of Statistical Software 19
CFE is the estimator selected. Here, the function SetConstant does not create a constant
because it is not required. However, this line can be kept for convenience. The function
Ident identifies the panel structure of the data. The required information includes the variable
names of the individuals ("ID") and the period ("time").
[Link]([Link]("ID"), [Link]("time"));
Data transformation and model specification correspond to the previous example. Note that
neither U_VAR nor Z_VAR are available here.
decl inorm = 1;
[Link]([Link]([Link]("output"), inorm), "lny");
[Link]([Link]([Link]("labour"), inorm), "lnlab");
[Link]([Link]([Link]("land"), inorm), "lnland");
[Link]([Link]([Link]("machinery"), inorm), "lnmac");
[Link]([Link]([Link]("fertilizer"), inorm), "lnfert");
[Link]([Link]("time") - meanc([Link]("time")), "trend");
[Link](X_VAR, {
"Constant", 0, 0,
"lnlab", 0, 0,
"lnland", 0, 0,
"lnmac", 0, 0,
"lnfert", 0, 0,
"trend", 0, 0});
For this model, there is no calculation of the confidence bounds involved. The efficiency scores
can be extracted as point estimates using function TE.
[Link]([Link](), {"TE"});
[Link]([Link](), {"jlms"});
[Link]();
delete fob;
}
The output of this program appears as follows. Additional information on the panel structure
is printed.
This model is restricted to the normal half-normal case. Here, the estimates of (the natural
logarithms of) σv2 and σu2 are given.
log-likelihood 1476.81739
no. of observations 2400 no. of parameters 22
AIC.T -2909.63478 AIC -1.21234782
mean(lny) 7.55183e-018 var(lny) 0.127523
lambda 2.123
7
Technical efficiency
6
0
0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1
Technical efficiency
1.1
0.9
0.8
0.7
0.6
0.5
[Link](X_VAR, {
"Constant", 0, 0,
"lnx1", 0, 0,
"lnx2", 0, 0,
"lnx3", 0, 0,
"trend", 0, 0});
If your selection includes dummies, the variables should be ordered like this:
[Link](X_VAR, {
"Constant", 0, 0,
"lnx1", 0, 0,
"lnx2", 0, 0,
"lnx3", 0, 0,
"trend", 0, 0,
22 Stochastic Frontier Analysis Using SFAMB for Ox
"dummy1", 0, 0,
"dummy2", 0, 0});
The following example illustrates one possible way the function may be used. Here, results
are plotted as histograms (see Figure 4). Note that indexing starts at 0 in Ox (Elast returns
an N T × 2 matrix but only the first column is considered here). The first three arguments
of DrawDensity are the most important here: area (panel) index, variable, label. See the
documentations of Ox or GnuDraw for a full description.
5. Future developments
The basic version of SFAMB dates back to the mid 1990s where the capability was restricted
to cross-sectional data. As the package now allows for panel data and the literature on SF
methods is considerably broader and still growing, there is scope for potential extensions.
Some related possibilities are mentioned here.
In the model framework of Chen et al. (2014) there are two ways to calculate the individual
effects. As an alternative to Equation 5, the individual “between estimator of αi ” can be
used. It could be implemented as an optional function, involving a second maximization. Its
availability would allow us to compare results and check the consequences for TE scores.
While the current focus of panel methods is on fixed effects estimation, a more comprehen-
sive supplement might involve random effects models. The most recent SF approach using
the CSN distribution is presented by Colombi, Kumbhakar, Martini, and Vittadini (2014).
Its specification is similar to Equation 3, but the time-invariant part is further decomposed
into two residuals (persistent inefficiency and time-invariant unobserved heterogeneity). Fil-
ippini and Greene (2016) introduce computational simplifications and label the model as the
“Generalized True Random Effects SF model ”.
Journal of Statistical Software 23
6 6
eps1 eps2
5 5
4 4
3 3
2 2
1 1
0 0
-0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2
25 50
eps3 epst
45
20 40
35
15 30
25
10 20
15
5 10
5
0 0
-0.02 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 -0.02 -0.01 0 0.01 0.02 0.03 0.04 0.05
Acknowledgments
All code is pure Ox code. However, code of the WT model is partially adapted from Stata code
by Hung-Jen Wang. We thank Hung-Jen Wang for providing data to check our CFE code.
Additional thanks are due for the referees for offering helpful comments that improved the
quality of the manuscript.
References
Aigner D, Lovell CAK, Schmidt P (1977). “Formulation and Estimation of Stochastic Frontier
Production Function Models.” Journal of Applied Econometrics, 6(1), 21–37. doi:10.1016/
0304-4076(77)90052-5.
Alvarez A, Amsler C, Orea L, Schmidt P (2006). “Interpreting and Testing the Scaling
Property in Models Where Inefficiency Depends on Firm Characteristics.” Journal of Pro-
ductivity Analysis, 25(3), 201–212. doi:10.1007/s11123-006-7639-3.
Battese GE, Coelli TJ (1988). “Prediction of Firm-Level Technical Efficiencies with a Gen-
eralized Frontier Production Function and Panel Data.” Journal of Econometrics, 38(3),
387–399. doi:10.1016/0304-4076(88)90053-x.
Battese GE, Coelli TJ (1992). “Frontier Production Functions, Technical Efficiency and Panel
Data: With Application to Paddy Farmers in India.” Journal of Productivity Analysis, 3(1–
2), 153–169. doi:10.1007/bf00158774.
24 Stochastic Frontier Analysis Using SFAMB for Ox
Battese GE, Coelli TJ (1995). “A Model for Technical Inefficiency Effects in a Stochastic
Frontier Production Function for Panel Data.” Empirical Economics, 20(2), 325–332. doi:
10.1007/bf01205442.
Battese GE, Corra GS (1977). “Estimation of a Production Function Model: With Application
to the Pastoral Zone of Eastern Australia.” Australian Journal of Agricultural Economics,
21(3), 169–179. doi:10.1111/j.1467-8489.1977.tb00204.x.
Belotti F, Daidone S, Ilardi G, Atella V (2013). “Stochastic Frontier Analysis Using Stata.”
Stata Journal, 13(4), 719–758. URL [Link]
article=st0315.
Bogetoft P, Otto L (2010). Benchmarking with DEA, SFA, and R, volume 157 of International
Series in Operations Research & Management Science. Springer-Verlag.
Bos CS (2014). GnuDraw – An Ox Package for Creating gnuplot Graphics. URL http:
//[Link]/[Link]/software/[Link].
Brorsen BW, Kim T (2013). “Data Aggregation in Stochastic Frontier Models: The Closed
Skew Normal Distribution.” Journal of Productivity Analysis, 39(1), 27–34. doi:10.1007/
s11123-012-0274-2.
Brümmer B (2001). “Estimating Confidence Intervals for Technical Efficiency: The Case of
Private Farms in Slovenia.” European Review of Agricultural Economics, 28(3), 285–306.
doi:10.1093/erae/28.3.285.
Caudill SB, Ford JM, Gropper DM (1995). “Frontier Estimation and Firm-Specific Inefficiency
Measures in the Presence of Heteroscedasticity.” Journal of Business & Economic Statistics,
13(1), 105–111. doi:10.2307/1392525.
Chen YY, Schmidt P, Wang HJ (2014). “Consistent Estimation of the Fixed Effects Stochastic
Frontier Model.” Journal of Econometrics, 181(2), 65–76. doi:10.1016/[Link].2013.
05.009.
Coelli TJ (1996). A Guide to FRONTIER 4.1: A Computer Program for Stochastic Frontier
Production and Cost Function Estimation. CEPA Working Papers, University of New
England, URL [Link]
Coelli TJ, Henningsen A (2017). frontier: Stochastic Frontier Analysis. R package version
1.1-2, URL [Link]
Coelli TJ, Rao PDS, O’Donnell CJ, Battese GE (2005). An Introduction to Efficiency and
Productivity Analysis. Springer-Verlag. doi:10.1007/978-1-4615-5493-6.
Cottrell A, Lucchetti R (2014). gretl User’s Guide – Gnu Regression, Econometrics and
Time-Series Library. URL [Link]
Journal of Statistical Software 25
Doornik JA, Ooms M (2007). Introduction to Ox: An Object-Oriented Matrix Language. Tim-
berlake Consultants Press, London. Available at [Link]
pdf.
Econometric Software Inc (2014). LIMDEP, Version 10.0. ESI, New York. URL http:
//[Link]/.
Fuglie KO (2012). “Productivity Growth and Technology Capital in the Global Agricultural
Economy.” In KO Fuglie, SL Wang, VE Ball (eds.), Productivity Growth in Agriculture:
An International Perspective. CABI.
gnuplot Team (2015). gnuplot 5.0 – An Interactive Plotting Program. URL http:
//[Link]/projects/gnuplot.
Horrace WC, Schmidt P (1996). “Confidence Statements for Efficiency Estimates from
Stochastic Frontier Models.” Journal of Productivity Analysis, 7(2–3), 257–282. doi:
10.1007/bf00157044.
Hughes G (2008). sfa hetmod and sfa mod. User-Contributed Function Pack-
ages for gretl. URL [Link]
available_user-contributed_function_packages.
Jondrow J, Lovell CAK, Materov IS, Schmidt P (1982). “On the Estimation of Technical In-
efficiency in the Stochastic Frontier Production Function Model.” Journal of Econometrics,
19(2–3), 233–238. doi:10.1016/0304-4076(82)90004-5.
Kodde DA, Palm FC (1986). “Wald Criteria for Jointly Testing Equality and Inequality
Restrictions.” Econometrica, 54(5), 1243–1248. doi:10.2307/1912331.
Kumbhakar SC, Lovell CAK (2000). Stochastic Frontier Analysis. Cambridge University
Press, Cambridge.
Lai H, Huang CJ (2010). “Likelihood Ratio Tests for Model Selection of Stochas-
tic Frontier Models.” Journal of Productivity Analysis, 34(1), 3–13. doi:10.1007/
s11123-009-0160-8.
Meeusen W, Van den Broeck J (1977). “Efficiency Estimation from Cobb-Douglas Production
Functions with Composed Error.” International Economic Review, 18(2), 435–444. doi:
10.2307/2525757.
R Core Team (2017). R: A Language and Environment for Statistical Computing. R Founda-
tion for Statistical Computing, Vienna, Austria. URL [Link]
Schmidt P, Sickles RC (1984). “Production Frontiers and Panel Data.” Journal of Business
& Economic Statistics, 2(4), 367–374. doi:10.2307/1391278.
Wang HJ, Schmidt P (2002). “One-Step and Two-Step Estimation of the Effects of Exogenous
Variables on Technical Efficiency Levels.” Journal of Productivity Analysis, 18(2), 129–144.
doi:10.1023/a:1016565719882.
A. Technical appendix
The search for values for σv2 and σu2 involves partitioning the variance of the composed error
term. Aigner et al. (1977) show that VAR() = σv2 + ((π − 2)/π) σu2 , which can be expressed
as:
2 π−2 2 2 π−2
VAR() = σ (1 − γ) + σ γ =σ 1−γ 1− .
π π
Using the variance of the OLS residuals (m2 ) as an estimate for VAR():
2 2 m2
m2 = σ 1 − γ ←→ σ 2 = .
π 1 − γ π2
The grid search is called within member function DoEstimation, and runs over γ values with
step length 0.01; it passes the starting values for β and σ 2 back to DoEstimation.
model, while the remaining models employ numerical first derivatives based on the Ox routine
Num1Derivative (finite difference approximation).
The CFE model’s “within-likelihood function” includes a T -dimensional cdf. In their Ap-
pendix C, Chen et al. (2014) show how the T -dimensional integral is reduced to a one-
dimensional integral, referring to Kotz, Balakrishnan, and Johnson (2000) (fKotzetal is the
respective function). The numerical quadrature is executed by function QNG which is part
of the Fortran package QuadPack (Piessens, de Doncker-Kapenga, Überhuber, and Kahaner
1983) and linked in via #include <quadpack.h>.
Standard errors are obtained from m_mCovar (covariance matrix). This data member is pro-
duced by function Num2Derivative (called within member function Covar) that uses a finite
difference approximation (Doornik 2009). Estimation output of the cross-sectional model re-
turns robust standard errors (by default) which are obtained by the method of White (1980).
√ !
(i + µ)2 1 + λ2
1 1 µ λi
ln Li = ln √ − ln σ 2 − + ln Φ − − ln Φ µ . (6)
2π 2 2σ 2 σλ σ σλ
The log-likelihood function of the TFE model (Greene 2005) corresponds to Equation 6, but
with µ = 0 and it = yit − β > xit − αi in the place of i .
The log-likelihood function of the WT model (Wang and Ho 2010) for one individual, with
˜it = ỹit − β > x̃it , and ˜i = (˜
i1 , . . . , ˜iT )> :
1 1 1
ln Li = − (T − 1) ln(2π) − (T − 1) ln(σv2 ) − ˜> −
i Π ˜i
2 2 2
1 µ2∗∗ µ2
µ∗∗ µ
+ 2
− 2 + ln σ∗∗ Φ − ln σu Φ ,
2 σ∗∗ σu σ∗∗ σu
where
µ/σu2 − ˜> −
i Π h̃i
µ∗∗ = >
,
h̃i Π− h̃i + 1/σu2
2 1
σ∗∗ = >
.
h̃i Π− h̃i + 1/σu2
Log-likelihood function of the CFE model (Chen et al. 2014), with ˜it = ỹit − β > x̃it , ˜i =
Journal of Statistical Software 29
X 1
ln LW = constant + ln ∗i ;
φT −1 (˜ 2
0, σ (IT −1 − ET −1 ))
T
i
X λ2
λ
+ ln ΦT (− ˜i ; 0T , IT + ET ) ,
σ T
i
X (T − 1) 1
1
∗> − ∗
ln LW = −N T ln Φ(0) + − ln (2π) − ln |Σ| − ˜i Σ ˜i
2 2 2
i
T
"Z #
∞
X Y λ λ
+ ln φ(u0 ) Φ − ˜it − √ U0 du0 .
−∞ σ T
i t=1
Affiliation:
Jonathan Holtkamp
Department of Agricultural Economics and Rural Development
University of Goettingen
D-37073 Goettingen, Germany
E-mail: jonathan-holtkamp@[Link]
Bernhard Brümmer
Department of Agricultural Economics and Rural Development
& Centre of Biodiversity and Sustainable Land Use
University of Goettingen
D-37073 Goettingen, Germany
E-mail: bbruemm@[Link]
URL: [Link]