Bayesian Test for Survival Distributions
To cite this article: Cachimo Combo Assane, Basilio de Bragança Pereira & Carlos
Alberto de Bragança Pereira (2017): Bayesian significance test for discriminating
between survival distributions, Communications in Statistics - Theory and Methods, DOI:
10.1080/03610926.2017.1406117
survival models is the main objective of the present paper. A survival distribution should be chosen among the three celebrated ones: lognormal, gamma, and Weibull. For this discrimination, a linear mixture of the three distributions is an important tool: the FBST is used to test the hypotheses defined on the mixture weights space. Another feature of the paper is that all three distributions are reparametrized so that all six parameters are written as functions of the mean and the variance of the population being studied. Some numerical results from simulations with some right-censored data are considered.

Accepted November

KEYWORDS
FBST; Mixture model; Model choice; Separate models; Significance test; Survival distributions.

MATHEMATICS SUBJECT CLASSIFICATION
F; F
1. Introduction
In many scientific disciplines, researchers are constantly faced with the fundamental problem
of choosing among alternative statistical models. The Neyman-Pearson theory of hypothesis
testing applies only if the models belong to the same family of distributions. Alternatively, spe-
cial procedures are required if the models belong to families that are separate (or non-nested)
in the sense that an arbitrary member of one family cannot be obtained as a limit of members
of the other. The set of separate families of probability distributions includes the ones used
here: lognormal, gamma, and Weibull models (Pereira 1981; Araujo and Pereira 2007; Pereira
and Pereira 2017) which have been used widely to describe survival data (Lawless 2002;
Lee and Wang 2003).
A considerable amount of research on separate families of hypotheses has been carried out since the fundamental work of Cox (1961, 1962), who first dealt with the problem. For reviews and references, see Araujo et al. (2005), Araujo and Pereira (2007), and Pereira and Pereira (2017).
The Fully Bayesian Significance Test (FBST), introduced by Pereira and Stern (1999), is an alternative to tests based on the Bayes factor or on the classical p-value, mostly for the case of precise hypotheses. The basis for the FBST is an index known as the e-value (e stands for evidence) that measures the inconsistency of the hypothesis. For this, it considers the tangent set, T: the set of all parameter values whose posterior density is greater than the posterior density at every point that satisfies the hypothesis. For
reviews and further references on FBST, see Pereira et al. (2008) and Stern and Pereira (2014).
For a few interesting applications illustrating the use of e-values and the FBST to practical
problems, see Diniz et al. (2012), Lauretto et al. (2003), Lauretto et al. (2007), and Pereira and
Stern (1999).
In the present work, we consider the FBST for discriminating between the lognormal, gamma and Weibull distributions. We formulate this problem in the context of a linear mixture model, as suggested by Cox (1961); that is, the models under comparison are considered as components of a finite mixture model. The FBST is used for testing hypotheses defined on the mixture weights space. The e-value is the complement of the posterior probability of the tangent set T: ev = 1 − Pr(T | Data).
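The relation ev = 1 − Pr(T | Data) can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a one-dimensional Beta(a, b) posterior and a single-point hypothesis p = p0, so the supremum of the posterior density under the hypothesis is simply the density at p0, and Pr(T | Data) is approximated by the share of posterior draws that fall in the tangent set. The function names and the Beta posterior are hypothetical choices made only for illustration.

```python
import math
import random


def beta_log_pdf(x, a, b):
    """Log density of a Beta(a, b) distribution at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)


def e_value_sharp(p0, a, b, n_draws=20000, seed=1):
    """Monte Carlo estimate of ev = 1 - Pr(T | Data) for the hypothesis p = p0.

    T (the tangent set) holds the parameter values whose posterior density
    exceeds the largest posterior density attained under the hypothesis.
    For a single-point hypothesis that maximum is the density at p0.
    """
    rng = random.Random(seed)
    log_sup_h = beta_log_pdf(p0, a, b)
    in_tangent = 0
    for _ in range(n_draws):
        draw = rng.betavariate(a, b)
        # Guard against draws on the boundary, where the log density is undefined.
        if 0.0 < draw < 1.0 and beta_log_pdf(draw, a, b) > log_sup_h:
            in_tangent += 1
    return 1.0 - in_tangent / n_draws


# Assumed toy posterior Beta(50, 10): its mode is 49/58.
ev_tail = e_value_sharp(0.5, 50.0, 10.0)          # hypothesis far in the tail
ev_mode = e_value_sharp(49.0 / 58.0, 50.0, 10.0)  # hypothesis at the mode
```

A hypothesis placed at the posterior mode leaves the tangent set essentially empty, so its e-value is close to one, whereas a hypothesis deep in the tail gives an e-value close to zero.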
Additionally, the density functions of the mixture components are reparametrized in terms of the mean μ and the variance σ² of the population. Hence, the models under discrimination share common parameters (Kamary et al. 2014; Pereira and Pereira 2017). A standard
Bayesian approach to finite mixture models is to consider different pairs of parameters for
Downloaded by [[Link]] at 15:50 05 December 2017
each of these models and to adopt independent prior distributions for each pair of param-
eters and a Dirichlet prior on the mixture weights (Lauretto and Stern 2005; Lauretto et al.
2007). However, since the comparison between the models is based on the same dataset and
on the same sample, we believe that it would be inappropriate to consider different means
and variances for these models. Note that when we try to define the prior distributions for the population mean and variance, our uncertainties about these default parameters are not related to the models under comparison. In this way, the parametrization can be applied to any distribution that can be reparametrized as was done here. This practical argument was the reason we decided to present our “mixture” model. We are not certain that mixture is the correct word because, in fact, we have a convex combination of density functions.
Moreover, this reparametrization reduces the number of parameters to be estimated: in our case, including the weights, from eight to only four. The reduction of the parameter space may lead to lower computational costs.
Note that the mean and the variance are parameters that can be thought of as existing but unobservable quantities, whereas the weights of the convex combination cannot. The vector of weights must be defined in a simplex, and it is an artifact that helps to discriminate, among the three models, those that best fit the observations. It can happen that the full combination itself, or the combination of a pair of the models, is the best model. To understand the role of Dirichlet distributions we refer to Pereira and Stern (2008) and Stern (2011). It is important to call attention to the fact that the posterior distribution of the weights is the “artifact” that induces the model choice (Cox 1961, 1962).
To illustrate the procedure, numerical results based on simulated right-censored survival times are considered. In addition, the lognormal-gamma-Weibull mixture model is applied to a real dataset of patients from Rio de Janeiro hospitals with end-stage chronic kidney failure who received hemodialysis.
Section 2 presents a brief review of basic concepts and notation for survival analysis. The
parametric distributions used in this paper are also described. Section 3 reviews the basic concepts of the FBST. Section 4 discusses the FBST formulation for discriminating between survival
distributions in the context of mixture models. Section 5 presents the results of the simulation
study. Section 6 is about the use of the lognormal-gamma-Weibull on the real dataset. Final
remarks are presented in Section 7.
COMMUNICATIONS IN STATISTICS—THEORY AND METHODS 3
2. Survival analysis
Therefore, when t = 0, S(t) = 1 and H(t) = 0; and when t = ∞, S(t) = 0 and H(t) = ∞.
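$$
S_{LN}(t \mid \alpha) = 1 - \Phi\left(\frac{\log t - \alpha_1}{\sqrt{\alpha_2}}\right);
\qquad
h_L(t \mid \alpha) = \frac{f_{LN}(t \mid \alpha)}{S_{LN}(t \mid \alpha)}.
$$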
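As a small numerical illustration of the lognormal survival and hazard functions, the sketch below evaluates them with the Python standard library only. It assumes that α1 and α2 are the mean and the variance of log t, in line with the lognormal density used in Section 4, and that the survival function is one minus the standard normal distribution function of the standardized log time; the function names are hypothetical.

```python
import math


def lognormal_pdf(t, alpha1, alpha2):
    """Lognormal density, with alpha1 the mean and alpha2 the variance of log t."""
    z = (math.log(t) - alpha1) / math.sqrt(alpha2)
    return math.exp(-0.5 * z * z) / (t * math.sqrt(2.0 * math.pi * alpha2))


def lognormal_survival(t, alpha1, alpha2):
    """S(t) = 1 - Phi((log t - alpha1) / sqrt(alpha2)), computed through erfc."""
    z = (math.log(t) - alpha1) / math.sqrt(alpha2)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def lognormal_hazard(t, alpha1, alpha2):
    """h(t) = f(t) / S(t)."""
    return lognormal_pdf(t, alpha1, alpha2) / lognormal_survival(t, alpha1, alpha2)


# At the median t = exp(alpha1) the survival probability is one half.
median_survival = lognormal_survival(1.0, 0.0, 1.0)
median_hazard = lognormal_hazard(1.0, 0.0, 1.0)
```

Using `math.erfc` instead of `1 - Phi(z)` keeps the survival function accurate in the right tail, where the hazard would otherwise suffer from cancellation.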
It is important to note that evidence that favors H is not evidence against the alternative, $\bar{H} = \Theta \setminus H$, because the alternative is not a sharp hypothesis. This interpretation also holds for p-values
in the frequentist paradigm. As in Pereira et al. (2008), we would like to point out that this
Bayesian significance index uses only the posterior distribution, with no need for additional
artifacts such as the inclusion of positive prior probabilities for the hypotheses or the elimi-
nation of nuisance parameters. The computation of the e-values does not require asymptotic
methods, and the only technical tools needed are numerical optimization and integration
methods.
The families of distributions considered include the lognormal, gamma and Weibull models. The relationship between the parameters of these models and μ and σ² is described as follows.
(i) Let y be lognormal(α1, α2), α1 ∈ R and α2 > 0, with probability density function
$$
f_L(y \mid \alpha_1, \alpha_2) = \frac{1}{y\sqrt{2\pi\alpha_2}} \exp\left\{-\frac{(\log y - \alpha_1)^2}{2\alpha_2}\right\}.
$$
6 C. C. ASSANE ET AL.
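We then have
$$
\begin{cases}
\mu = E(y \mid \alpha_1, \alpha_2) = e^{\alpha_1 + \alpha_2/2} \\
\sigma^2 = \mathrm{Var}(y \mid \alpha_1, \alpha_2) = (e^{\alpha_2} - 1)\, e^{2\alpha_1 + \alpha_2}
\end{cases}
\;\Rightarrow\;
\begin{cases}
\alpha_1 = \log\left(\dfrac{\mu^2}{\sqrt{\mu^2 + \sigma^2}}\right) \\
\alpha_2 = \log\left(\dfrac{\mu^2 + \sigma^2}{\mu^2}\right).
\end{cases}
\tag{4.3}
$$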
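(ii) Let y be gamma(γ1, γ2), γ1 > 0 and γ2 > 0, with probability density function
$$
f_G(y \mid \gamma_1, \gamma_2) = \frac{1}{\Gamma(\gamma_2)\,\gamma_1^{\gamma_2}}\, y^{\gamma_2 - 1} \exp\left\{-\frac{y}{\gamma_1}\right\}.
$$
Therefore
$$
\begin{cases}
\mu = E(y \mid \gamma_1, \gamma_2) = \gamma_1 \gamma_2 \\
\sigma^2 = \mathrm{Var}(y \mid \gamma_1, \gamma_2) = \gamma_2 \gamma_1^2
\end{cases}
\;\Rightarrow\;
\begin{cases}
\gamma_1 = \dfrac{\sigma^2}{\mu} \\
\gamma_2 = \dfrac{\mu^2}{\sigma^2}.
\end{cases}
\tag{4.4}
$$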
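(iii) When y ∼ Weibull(β1, β2), β1 > 0 and β2 > 0, with probability density function
$$
f_W(y \mid \beta_1, \beta_2) = \frac{\beta_2}{\beta_1^{\beta_2}}\, y^{\beta_2 - 1} \exp\left\{-\left(\frac{y}{\beta_1}\right)^{\beta_2}\right\},
$$
then
$$
\begin{cases}
\mu = E(y \mid \beta_1, \beta_2) = \beta_1 \Gamma(1 + 1/\beta_2) \\
\sigma^2 = \mathrm{Var}(y \mid \beta_1, \beta_2) = \beta_1^2\, \Gamma(1 + 2/\beta_2) - \beta_1^2\, \Gamma^2(1 + 1/\beta_2)
\end{cases}
\;\Rightarrow\;
\begin{cases}
\beta_1 = \dfrac{\mu}{\Gamma(1 + 1/\beta_2)} \\
2 \log \Gamma(1 + 1/\beta_2) - \log \Gamma(1 + 2/\beta_2) + \log\left(\dfrac{\mu^2 + \sigma^2}{\mu^2}\right) = 0.
\end{cases}
\tag{4.5}
$$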
In order to find β2, the Newton-Raphson method can be used to solve the nonlinear equation. Here, we use the nleqslv function in the R package of the same name.
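The three reparametrizations (4.3), (4.4) and (4.5) can be sketched in a few lines. The code below is only an illustration in Python, not the R code used in the paper: it swaps the Newton-Raphson solver of nleqslv for plain bisection on the Weibull equation, which is assumed to have a single root inside the search interval, and all function names and the default bracket are hypothetical.

```python
import math


def lognormal_params(mu, var):
    """(alpha1, alpha2) from the mean and variance, following (4.3)."""
    alpha1 = math.log(mu * mu / math.sqrt(mu * mu + var))
    alpha2 = math.log((mu * mu + var) / (mu * mu))
    return alpha1, alpha2


def gamma_params(mu, var):
    """(gamma1, gamma2) = (scale, shape) from the mean and variance, following (4.4)."""
    return var / mu, mu * mu / var


def weibull_params(mu, var, lo=0.05, hi=200.0):
    """(beta1, beta2) = (scale, shape) from the mean and variance, following (4.5).

    The shape beta2 is the root of
    2 log Gamma(1 + 1/b) - log Gamma(1 + 2/b) + log((mu^2 + var) / mu^2) = 0,
    located here by bisection on the assumed bracket [lo, hi].
    """
    target = math.log((mu * mu + var) / (mu * mu))

    def g(b):
        return 2.0 * math.lgamma(1.0 + 1.0 / b) - math.lgamma(1.0 + 2.0 / b) + target

    if g(lo) > 0.0 or g(hi) < 0.0:
        raise ValueError("root is not bracketed; widen [lo, hi]")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta2 = 0.5 * (lo + hi)
    beta1 = mu / math.gamma(1.0 + 1.0 / beta2)
    return beta1, beta2


# With mu = 2 and sigma^2 = 4 the gamma and Weibull models both reduce to an
# exponential distribution with mean 2, so both shapes should equal 1.
alpha1, alpha2 = lognormal_params(2.0, 4.0)
gamma1, gamma2 = gamma_params(2.0, 4.0)
beta1, beta2 = weibull_params(2.0, 4.0)
```

Bisection is slower than Newton-Raphson but needs no derivative and cannot diverge once the root is bracketed, which makes it a convenient choice for a self-contained sketch.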
A special feature of survival data is that survival times are frequently censored. The survival
time of an individual is said to be censored when the event of interest has not been observed
for that individual, but is known only to occur in a certain period of time. There are various
categories of censoring, such as right censoring, left censoring and interval censoring (see
Klein and Moeschberger (2003) for more details). In this paper, we restrict ourselves to data in
which the survival times are subject to right censoring, which is the most common censoring
mechanism in medical research.
In the model for right-censored data, it is convenient to consider the following notation. Each individual j is assumed to have an event time $T_j$ and a censoring time $C_j$. The observations consist of $(y_1, \delta_1), (y_2, \delta_2), \ldots, (y_n, \delta_n)$, where $y_j = \min\{T_j, C_j\}$ and $\delta_j = I(T_j \le C_j)$, indicating whether $T_j$ was observed ($\delta_j = 1$) or not ($\delta_j = 0$).
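Note that the likelihood function given by (4.2) is for uncensored (or exact) observations. Assuming noninformative censoring, i.e., independence between $T_j$ and $C_j$, the likelihood function for right-censored observations is
$$
\begin{aligned}
L(y, \theta) &= \prod_{j=1}^{n} f(y_j, \delta_j \mid \theta) \propto \prod_{j=1}^{n} [f(y_j \mid \theta)]^{\delta_j}\, [S(y_j \mid \theta)]^{1 - \delta_j} \\
&\propto \prod_{j=1}^{n} \left[\sum_{k=1}^{m} p_k f_k(y_j \mid \mu, \sigma)\right]^{\delta_j} \left[\sum_{k=1}^{m} p_k S_k(y_j \mid \mu, \sigma)\right]^{1 - \delta_j}.
\end{aligned}
\tag{4.6}
$$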
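The structure of the censored mixture likelihood (4.6) can be sketched as follows. This is a minimal illustration, not the LaplacesDemon model used in the paper: each component is represented by a hypothetical pair of callables (density, survival function), only lognormal and Weibull components are coded because the gamma survival function needs an incomplete gamma routine that the Python standard library does not provide, and the components are written in their natural parameters rather than in (μ, σ²).

```python
import math


def mixture_log_likelihood(data, weights, components):
    """Log of (4.6) up to an additive constant.

    data: iterable of (y_j, delta_j), with delta_j = 1 for an observed event
    and delta_j = 0 for a right-censored time.
    components: list of (pdf, survival) callables, one pair per mixture component.
    """
    total = 0.0
    for y, delta in data:
        if delta == 1:
            value = sum(p * pdf(y) for p, (pdf, _) in zip(weights, components))
        else:
            value = sum(p * surv(y) for p, (_, surv) in zip(weights, components))
        total += math.log(value)
    return total


def lognormal_component(alpha1, alpha2):
    """Density and survival function of lognormal(alpha1, alpha2)."""
    def pdf(y):
        z = (math.log(y) - alpha1) / math.sqrt(alpha2)
        return math.exp(-0.5 * z * z) / (y * math.sqrt(2.0 * math.pi * alpha2))

    def survival(y):
        z = (math.log(y) - alpha1) / math.sqrt(alpha2)
        return 0.5 * math.erfc(z / math.sqrt(2.0))

    return pdf, survival


def weibull_component(beta1, beta2):
    """Density and survival function of Weibull(beta1, beta2), scale beta1, shape beta2."""
    def pdf(y):
        return (beta2 / beta1 ** beta2) * y ** (beta2 - 1.0) * math.exp(-((y / beta1) ** beta2))

    def survival(y):
        return math.exp(-((y / beta1) ** beta2))

    return pdf, survival


# Illustrative data: two observed events and one censored time.
data = [(1.0, 1), (2.0, 0), (0.5, 1)]
components = [lognormal_component(0.0, 1.0), weibull_component(2.0, 1.0)]
loglik = mixture_log_likelihood(data, [0.4, 0.6], components)
```

Observed events contribute the mixture density and censored times contribute the mixture survival function, which is exactly how the exponents δ_j and 1 − δ_j act in (4.6).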
On the other hand, the hypothesis that y does not have the density $f_k(y \mid \psi_k)$ is equivalent to
$$
H : p_k = 0 \wedge \sum_{i \neq k} p_i = 1. \tag{4.9}
$$
The alternative hypotheses to (4.8) and (4.9) are $A_k : p_k < 1$ and $A_k : p_k > 0$, respectively, neither of which is sharp.
The FBST procedure is used to test Hk , k = 1, . . . , m, according to the expressions (3.1)
and (3.2). For the optimization step, we used the conjugate gradient method (Fletcher and
Reeves 1964). In order to perform the integration over the posterior measure, we used the Adaptive Metropolis Markov chain Monte Carlo (MCMC) algorithm of Haario et al. (2001).
In this paper, the implementation of the Bayesian models is carried out using the LaplacesDemon R package. LaplacesDemon is an open-source package that provides a complete environment for simulation in Bayesian inference (Statisticat, LCC 2016).
5. Simulations
In this section we present some numerical results based on simulated right-censored survival times in order to evaluate the performance of the FBST for discriminating between the survival distributions via the lognormal-gamma-Weibull (LGW) mixture model. The main purpose is to measure the convergence rate of correct decisions concerning the identification of the true model used to generate the survival times T.
The simulations of this paper were performed on an Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz computer.
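$$
\begin{aligned}
p_c &= \Pr(\delta = 0 \mid \lambda, \mu, \sigma^2) \\
&= \Pr(C \le T \le \infty,\ 0 \le C \le \infty) \\
&= 1 - \Pr(0 \le T \le C,\ 0 \le C \le \infty) \\
&= 1 - \int_0^\infty \int_0^c g(c \mid \lambda)\, f_L(t \mid \mu, \sigma)\, dt\, dc \\
&= 1 - \int_0^\infty g(c \mid \lambda)\, F_L(c \mid \mu, \sigma)\, dc,
\end{aligned}
$$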
where fL and FL are the lognormal probability density and distribution functions of survival
times, respectively.
For generating right-censored survival times from the gamma and Weibull distributions,
an analogous procedure to that used for the lognormal distribution is employed.
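The generation of right-censored lognormal survival times and the censoring probability p_c can be sketched as below. The form of the censoring density g(c | λ) is not shown in this part of the text, so the sketch assumes exponential censoring with rate λ; that choice, the midpoint integration rule and all function names are assumptions made only for illustration.

```python
import math
import random


def lognormal_params(mu, var):
    """(alpha1, alpha2) from the population mean and variance, as in (4.3)."""
    alpha1 = math.log(mu * mu / math.sqrt(mu * mu + var))
    alpha2 = math.log((mu * mu + var) / (mu * mu))
    return alpha1, alpha2


def lognormal_cdf(t, alpha1, alpha2):
    """F_L(t) = Phi((log t - alpha1) / sqrt(alpha2))."""
    z = (math.log(t) - alpha1) / math.sqrt(alpha2)
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def censoring_probability(lam, alpha1, alpha2, steps=100000):
    """p_c = 1 - int_0^inf g(c | lam) F_L(c) dc with g exponential (an assumption).

    The integral is truncated at 40 / lam, beyond which the exponential
    density is negligible, and evaluated with the midpoint rule.
    """
    upper = 40.0 / lam
    h = upper / steps
    integral = 0.0
    for i in range(steps):
        c = (i + 0.5) * h
        integral += lam * math.exp(-lam * c) * lognormal_cdf(c, alpha1, alpha2)
    return 1.0 - integral * h


def simulate_right_censored(n, lam, alpha1, alpha2, seed=1):
    """Draw (y_j, delta_j) with y_j = min(T_j, C_j) and delta_j = I(T_j <= C_j)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        t = rng.lognormvariate(alpha1, math.sqrt(alpha2))
        c = rng.expovariate(lam)
        sample.append((min(t, c), 1 if t <= c else 0))
    return sample


# Illustrative values: mean 10, variance 25, censoring rate 0.05.
alpha1, alpha2 = lognormal_params(10.0, 25.0)
p_c = censoring_probability(0.05, alpha1, alpha2)
sample = simulate_right_censored(20000, 0.05, alpha1, alpha2)
observed_rate = sum(1 - delta for _, delta in sample) / len(sample)
```

In practice one would solve for the λ that yields a target censoring percentage, for example by bisection on `censoring_probability`, before generating the samples.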
Table . Mean of estimates for LGW model parameters and percentages of correct decisions made by FBST
on selecting the true distribution used to generate the survival times, using samples with different right-
censoring percentages.
% of Rc† | Model | n | μ | σ² | pL | pG | pW | % of Cd†
6.1. Dataset
The dataset used in this paper refers to a cohort study of 473 patients with end-stage chronic
kidney failure who received hemodialysis (HD) in four centers in the State of Rio de Janeiro,
Brazil. The patients were followed up for 11 years. The observed time for each patient was the
number of months from admission to hemodialysis until death or the end of the observation
period (kidney transplant or end of the study) which indicates a right-censored survival time.
For a complete description of this dataset, see Alves et al. (2014).
In this paper, our main interest is to apply the LGW model to the survival data for HD
patients and use the FBST procedure to examine the mixture parameters in order to choose
the parametric distribution that best fits the observed data. But before that, we have performed
pairwise comparisons by fitting the lognormal-Weibull, lognormal-gamma, and gamma-
6.2. Results
The measures of evidence provided by HD data in favor of the three models concerning
the pairwise comparisons are presented in Table 2. For the comparison between the lognormal and Weibull distributions, the FBST favors the lognormal model, since the e-values are ev(HL) = 0.874 and ev(HW) = 0.043. For selecting between the lognormal and gamma distributions, the evidence measures indicate that both models provide a good fit to the dataset. Nevertheless, we would still prefer the lognormal model, which is the more plausible of the two. The results of the tests for comparison between the gamma and Weibull distributions indicate that the Weibull distribution does not provide a reasonable fit to the dataset.
Discrimination based on the LGW mixture model
In order to test the three hypotheses simultaneously, we have applied the LGW model to the HD data.
The estimates for the parameters of the model (6.1) are presented in Table 3. Here, SD, 2.5%
and 97.5% denote the standard deviation, the 2.5th and the 97.5th percentiles of the posterior
distribution of the LGW parameters, respectively. Both the classical and the Bayesian measures of evidence, presented in Table 4, indicate that neither the gamma nor the Weibull model should be considered, because the null hypotheses H : p2 = 0 and H : p3 = 0 are not rejected.
HL × HW    HL    .    .
           HW    .    .
HL × HG    HL    .    .
           HG    .    .
HG × HW    HG    .    .
           HW    .    .
∗ p-value calculated according to Diniz et al. ().
p1 = 0    .    .
p2 = 0    .    .
p3 = 0    .    .
∗ p-value calculated according to Diniz et al. ().
Consequently, among the three models, the lognormal model is the most appropriate for mod-
eling HD data.
Figure 1 displays the survival curves calculated using Bayesian estimates of the lognormal
model (Table 5), the LGW mixture model (Table 3) and a procedure called the piecewise exponential estimator (PEXE), introduced by Kim and Proschan (1991), representing the observed data. Unlike the well-known Kaplan-Meier estimator, the PEXE is a smooth and continuous estimator of the survival function.
It appears reasonable to disregard both the gamma and the Weibull models; the lognormal model by itself produces a good estimate of the survival function.
Figure 1. Survival curves based on the estimates of the lognormal model, the LGW model and the PEXE. [Plot titled "Time to Survival/Progression": survival probability against time, with curves for the PEXE, the lognormal model and the LGW mixture.]
Note that the preference for the lognormal model is evident in evaluating the LGW mixture
model more than in the comparison between the lognormal and gamma distributions, where
the evidence measures in favor of both models are very close. It means that the discrimination
power provided by LGW model is much higher than the power of the pairwise comparisons.
This finding is in agreement with the discussion of Sawyer (1984).
7. Final remarks
In this paper we considered the FBST for discriminating between survival distributions in the context of a linear mixture model. The mixture approach allows us to compare all alternative models at once by testing the hypotheses on the mixture weights space. The families of survival distributions considered include the lognormal, gamma and Weibull models. In this work, the density functions of the mixture components were reparametrized in terms of the mean μ and the variance σ² of the population so that all models under discrimination share common parameters (Kamary et al. 2014; Pereira and Pereira 2017).
From the simulation results, we observed that the FBST achieves good performance on
identifying the true distribution used to generate the survival times.
The application of the LGW mixture model to the survival data for HD patients allowed
us to identify the lognormal distribution as the most appropriate in modeling observed data.
Therefore, one can construct a regression model for the HD data considering the lognormal model as the distribution of the response variable.
It would be of interest to apply the proposed procedure to survival data subject to other censoring mechanisms.
Acknowledgements
The authors are grateful for the support of CNPq, COPPE/UFRJ and IME/USP.
References
Araujo, M. I., and B. B. Pereira. 2007. A comparison of Bayes factors for separated models: Some simulation results. Communications in Statistics - Simulation and Computation 36:297–309.
Araujo, M. I., B. B. Pereira, R. Cleroux, M. Fernandes, and A. Lazraq. 2005. Separate families of models:
Sir David Cox contributions and recent developments. Student 5:251–8.
Alves, M., N. A. Souza e Silva, L. H. A. Salis, B. B. Pereira, P. H. Godoy, E. M. Nascimento, and J. M.
F. Oliveira. 2014. Survival and predictive factors of lethality in hemodyalisis: D/I polymorphism of
the angiotensin I-Converting enzyme and of the angiotensinogen M235T genes. Arq Bras Cardiol
103:209–18.
Cox, D. R. 1961. Tests of separate families of hypotheses. Proceedings 4th Berkeley Symposium in Math-
ematical Statistics and Probability 1:105–23.
Cox, D. R. 1962. Further results on test of separate families of hypotheses. Journal of the Royal Statistical
Society 24:406–24.
Cox, D. R. 1977. The role of significance tests. Scand. J. Statist 4:49–70.
Diniz, M., C. A. B. Pereira, A. Polpo, J. M. Stern, and S. Wechsler. 2012. Relationship between Bayesian
and frequentist significance indices. International Journal for Uncertainty Quantification 2:161–72.
Fletcher, R., and C. M. Reeves. 1964. Function minimization by conjugate gradients. Computer Journal
7:148–54.
Haario, H., E. Saksman, and J. Tamminen. 2001. An adaptive Metropolis algorithm. Bernoulli 7:223–42.
Kamary, K., K. Mengersen, C. P. Robert, and J. Rousseau. 2014. Testing hypotheses via a mixture esti-
mation model. arXiv:1412.2044v2.
Kempthorne, O. 1976. Of what use are tests of significance and tests of hypothesis. Communications in Statistics - Theory and Methods 8:763–77.
Kim, J. S., and F. Proschan. 1991. Piecewise exponential estimator of the survivor function. IEEE Trans-
actions on Reliability 40:134–9.
Klein, J., and M. L. Moeschberger. 2003. Survival analysis: Techniques for censored and truncated data.
2nd ed. New York, USA: Springer.
Lauretto, M., C. A. B. Pereira, J. M. Stern, and S. Zacks. 2003. Comparing parameters of two bivariate
normal distributions using the invariant full Bayesian significance test. Brazilian Journal of Proba-
bility and Statistics 17:147–68.
Lauretto, M. S., and J. M. Stern. 2005. FBST for mixture model selection. AIP Conference Proceedings
803:121–8.
Lauretto, M. S., S. R. Faria Jr, C. A. B. Pereira, B. B. Pereira, and J. M. Stern. 2007. The problem of
separate hypotheses via mixture models. AIP Conference Proceedings 954:268–75.
Lawless, J. F. 2002. Statistical models and methods for lifetime data. 2nd ed. New York, USA: John Wiley
& Sons.
Lee, E. T., and J. W. Wang. 2003. Statistical methods for survival data analysis. 3rd ed. New Jersey, USA:
Wiley.
Pereira, B. B. 1981. Choice of a survival model for patients with a brain tumour. Metrika 28:53–61.
Pereira, B. B., and C. A. B. Pereira. 2017. Model choice in nonnested families. 1st ed. Berlin: Springer.
Pereira, C. A. B., and J. Stern. 1999. Evidence and credibility: Full Bayesian significance test for precise
hypotheses. Entropy 1:69–80.
Pereira, C. A. B., and J. Stern. 2008. Special characterization of standard discrete models. Revstat - The
Statistical Journal 6:199–230.
Pereira, C. A. B., J. Stern, and S. Wechsler. 2008. Can a significance test be genuinely Bayesian. Bayesian
Analysis 3:79–100.
Sawyer, K. R. 1984. Multiple hypotheses testing. Journal of the Royal Statistical Society 46:419–24.
Statisticat, LCC 2016. LaplacesDemon: A Complete Environment for Bayesian Inference within R.
R Package version 17.07.2016. [Link]
[Link].
Stern, J. 2011. Symmetry, invariance and ontology in physics and statistics. Symmetry 3:611–35.
Stern, J., and C. A. B. Pereira. 2014. Bayesian epistemic values: Focus on surprise, measure probability.
Logic Journal of The IGPL 22:236–54.
Wan, F. 2017. Simulating survival data with predefined censoring rates for proportional hazards models.
Statistics in Medicine 36:838–54.
Reparametrization facilitates comparison by ensuring that the models share common parameters, specifically the mean (μ) and variance (σ²), which are consistent across all candidate models. This allows for an unbiased comparison by making sure that differences in model fits are due to the model structures themselves rather than differences in parameter scales or units. For a dataset with censored observations, this uniformity in parameterization helps in evaluating and distinguishing among the survival models, such as the lognormal, gamma, and Weibull distributions.
The FBST is a Bayesian form of significance testing that evaluates precise hypotheses about model parameters. It is used to measure evidence in favor of specific survival models, such as the lognormal, gamma, and Weibull distributions. In the context of survival data modeled with linear mixtures, the FBST is applied to discriminate between different parametric distributions by simultaneously considering all candidate models rather than performing pairwise comparisons, thus providing an encompassing test of model suitability.
Using the FBST on the full mixture rather than on traditional pairwise comparisons evaluates all models jointly instead of in isolation. The hypotheses on the mixture weights are tested within a single model, which increases the power to discriminate between the candidate distributions and provides a consistent, unified basis for conclusions about model suitability.
The evidence includes e-values and posterior distributions from the FBST procedure, which assess the fit of the lognormal, gamma, and Weibull models. The lognormal distribution was preferred because it received higher support from the data (higher e-values), suggesting a better fit to hemodialysis patient survival times than the gamma and Weibull distributions. The FBST results pointed towards greater plausibility of the lognormal model within the LGW mixture, which effectively distinguished the performance of the models for this medical dataset.
A Weibull distribution might be inappropriate if evidence from the data, such as that provided by the FBST, suggests a poor fit compared to other models. In a study comparing distributions using the LGW mixture model for hemodialysis patient data, the Weibull model received lower e-values, indicating less evidence for its fit compared to the lognormal model. The results suggest that the lognormal model was more appropriate for the data, highlighting cases where the Weibull distribution does not align well with observed patterns.
Reparametrizing the mixture model components in terms of the mean and variance helps maintain consistency across the models being compared, providing a common ground for evaluation. This approach ensures that the models share parameters, thereby enabling an equitable comparison of model fits to censored survival data. It also simplifies estimation, because fewer parameters need to be estimated in the presence of right-censored data.
The choice of censoring mechanism directly affects how survival probabilities and hazard rates are estimated, which in turn influences the interpretation of model parameters and the conclusions drawn from the analysis. Under right censoring, for example, the true survival time is only known to exceed the observed time, which requires techniques such as the Kaplan-Meier estimator or parametric models that handle such partial information. The resulting survival curves and hazard functions are reliable only if the censoring is non-informative.
For right-censored survival data, the likelihood function considers both uncensored (exact event times) and censored observations. It is constructed by multiplying, over individuals, the contribution of each observed event time or censoring time, given the model parameters. An observed event contributes the mixture density and a censored time contributes the mixture survival function, under the assumption of noninformative censoring. The likelihood is expressed as L(y, θ) ∝ ∏_j [Σ_k p_k f_k(y_j | μ, σ)]^δ_j [Σ_k p_k S_k(y_j | μ, σ)]^(1−δ_j), where δ_j indicates whether the event was observed (δ_j = 1) or censored (δ_j = 0).
The survival function, denoted as S(t), is defined as the probability that an individual survives beyond time t: S(t) = P(T > t) = 1 − F(t), where F(t) is the distribution function of T. It is a nonincreasing continuous function with boundary conditions S(0) = 1 and S(∞) = 0. The hazard function, denoted by h(t), is the limiting probability of failure per unit time during a very small interval, conditioned on the individual having survived to the start of the interval. It is mathematically expressed as h(t) = f(t) / S(t), where f(t) is the probability density function. The hazard function provides the instantaneous failure rate at time t.
Right censoring is crucial in survival analysis because it accounts for cases where the event of interest, such as death or failure, has not been observed within the study period. This mechanism is particularly common in medical research, where patients may not experience the studied event before the study concludes or may be lost to follow-up. Understanding and properly handling right censoring is essential to ensure the reliability and accuracy of survival estimates.