Health Status Factors for Ambo Students
DEPARTMENT OF STATISTICS
ON
ASSESSING THE FACTORS THAT AFFECT HEALTH STATUS OF STUDENTS IN AMBO
UNIVERSITY: A CASE OF 3rd YEAR NATURAL AND COMPUTATIONAL SCIENCES IN 2015
Prepared by
Tafese Ashine
Approval sheet
This is to certify that the research entitled “Factors that Affect Health Status of Students in
Ambo University: The Case of the College of Natural and Computational Science” has been submitted
in partial fulfillment of the bachelor's degree in Statistics in the undergraduate program of the
Department of Statistics. The assistance and help received during the course of this investigation
have been properly and suitably acknowledged. Therefore, we recommend that it be accepted as
fulfilling the research project requirement.
Approved by:
Acknowledgement
First and foremost, I thank God for giving me the opportunity to pursue my study at the
Department of Statistics in Ambo University. I would like to thank Almighty God for His
generous and countless help throughout my life and for His support in the ups and downs I have
passed through; I fall short of words to express my gratitude for His unfailing love and for the
successful completion of this study.
I have many thanks and incomparable love in my heart for my advisor, Mr Jemal Ayalew, for
his invaluable comments, useful suggestions and materials such as notes, which contributed to the
successful realization of this study, and for all my instructors.
Next, I would also like to thank all my friends and family for their endless help, and my pastors
for believing in me.
ABBREVIATIONS AND ACRONYMS
AU ............ Ambo University
HC ............ Health Center
OR ............ Odds Ratio
PO ............ Proportional Odds
Abstract
This study was intended to investigate the factors that affect the health status of third-year
students of the College of Natural and Computational Science at Ambo University in 2015.
Self-assessed health was used as the indicator of health status in this study. The sample was
drawn from the third-year students of the College of Natural and Computational Science at Ambo
University, and the information was collected from the students by questionnaire. The dependent
variable was the health status of students. The independent variables were sex, income, lack of
hygiene of café, usage of medicine, smoking cigarette, alcohol consumption, physical exercise,
environmental factor and bad smelling around the café. The chi-square test and ordinal logistic
regression were applied to test the association and to identify the factors affecting the health
status of the students. The results revealed that sex, income, lack of hygiene of café, smoking
cigarette, bad smelling around the café, alcohol consumption and environmental factor are factors
that affect the health status of the students. In general, most of the variables included in the
regression model had significant effects: sex, low income (<200 birr), lack of hygiene of café,
alcohol consumption, smoking cigarette, bad smelling around the café and environmental factor
were found to be significantly associated with the health status of students.
Key Words: Factors that Affect Health Status of Students, Stratified Random Sampling, Ordinal
Logistic Regression, SPSS and Ambo University
CHAPTER ONE
1. INTRODUCTION
The WHO, an agency of the United Nations, works to promote better health throughout the world.
Health services include all services dealing with the diagnosis and treatment of disease, or the
promotion, maintenance and restoration of health (Grad, Frank P. 2002). Mental illness is
described as the spectrum of cognitive, emotional, and behavioral conditions that interfere with
social and emotional well-being and the lives and productivity of people. Having a mental illness
can seriously impair, temporarily or permanently, the mental functioning of a person. Other
terms include 'mental health problem', 'illness', 'disorder' and 'dysfunction' (Hungerford et al.
2012).
The health standard of Ethiopia is low compared with that of other countries for several reasons,
such as an insufficient economy and improper use of the materials in hand. Ethiopia faces the
difficult problem that improved health status is both vital for growth and development and
expensive as well; this and many other problems place a very severe burden on development in
general and on health resources in particular, in more ways than one. The absence of regulatory
systems to monitor the quality, safety, and efficacy of medicines can compromise the overall
effectiveness of health care services and endanger public health. A strong regulatory system is
considered an essential component of a health system. In Ethiopia, the Food, Medicine and Health
Care Administration and Control Authority (FMHACA), formerly known as the Drug Administration and
Control Authority, regulates the country's pharmaceutical sector in an environment vulnerable to
drug smuggling and the circulation of substandard medicines (Yehulu Denekew, 2014).
Poor health is a crucial issue in Ethiopia. Research on this topic has been conducted at Dilla
University using a logistic regression model (Melese Abebe, 2013/2014). However, health status
outcomes are ordinal in nature. Therefore, the determinant factors of health status should be
assessed with an ordinal model, and this study uses the ordinal logistic regression model to
examine the problem. In line with this, the following research questions were addressed in this
study:
• Is the ordinal logistic regression model suitable for health status data?
• What are the determinant factors of health status at university level, in particular at Ambo
University?
• Are the students satisfied with the service provided by the university clinic?
This study is expected to show clearly the possibility of obtaining better utilization of the
clinic by identifying the key factors that affect the health status of students. Besides, the
findings of this study are expected to help address the factors that affect the health of students
in this university.
This study targets all students of Ambo University who studied in the College of Natural and
Computational Science in 2015. A number of problems and difficulties were encountered during the
accomplishment of this paper.
• The students were not willing to give full information about their health status.
2. LITERATURE REVIEW
Most health determinants, such as lifestyle background and economic and social conditions, are
causes of illness among the people of Ambo. Diseases such as typhoid and those related to food
hygiene commonly affect the people of Ambo as well as Ambo University students, and factors like
weather change, food hygiene, sanitation problems and dust usually cause the common cold in most
students (Teka G. E, 2004; Baraki N, 2004). The model used in this study is ordinal logistic
regression, a type of logistic regression analysis used when the response variable has more than
two categories with a natural order or rank. We can rank the values, but the real distance between
categories is unknown. In this study, health status is graded on a scale from unhealthy to healthy.
CHAPTER THREE
3.2 Data
The data for this study were collected from a primary source, using a questionnaire as the
instrument of data collection. The questionnaire was distributed to the students and administered
under the direct supervision of the researcher.
In this study, stratified random sampling was applied to select the sample of students from the
whole population, with departments as strata. The total sample size n was calculated as follows:
College of Natural and Computational Sciences
Department | Biology  | Chemistry | Physics  | Statistics | Sport sciences | Mathematics
Year       | Year III | Year III  | Year III | Year III   | Year III       | Year III
Now, the sample size was determined by using the formula for sample size determination for the
population:
\[ n_0 = \frac{(Z_{\alpha/2})^2 \, p(1-p)}{d^2} \]
where $Z_{\alpha/2}$ is the critical value at the 95% confidence level (1.96), $p$ is the assumed
population proportion and $d$ is the margin of error.
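The formula above can be sketched in a few lines of Python. This is only an illustration: the report states the critical value 1.96 but not the proportion p or the margin of error d, so the values p = 0.5 and d = 0.05 below are assumptions chosen for the example and are not expected to reproduce the reported sample size of 139.

```python
import math


def cochran_sample_size(z: float, p: float, d: float) -> float:
    """Initial sample size n0 = z^2 * p * (1 - p) / d^2."""
    return (z ** 2) * p * (1.0 - p) / (d ** 2)


# z = 1.96 comes from the report; p and d are hypothetical inputs.
n0 = cochran_sample_size(z=1.96, p=0.5, d=0.05)

# A sample size is a whole number of respondents, so round up.
print(math.ceil(n0))
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the chosen d.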
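For stratum h (each department), the sample size was calculated by using proportional allocation:
\[ n_h = \frac{N_h}{N}\, n, \qquad h = 1, 2, 3, 4, 5, 6, \]
where $n_h$ is the sample size of the $h$th stratum (department), $N_h$ is the population size of
that stratum, $N = 267$ is the total population and $n = 139$ is the total sample size. Based on
this, the following results were obtained:
\[
\begin{aligned}
n_1 &= \frac{N_1}{N}\, n = \frac{50}{267} \times 139 \approx 26, &
n_2 &= \frac{N_2}{N}\, n = \frac{43}{267} \times 139 \approx 22, \\
n_3 &= \frac{N_3}{N}\, n = \frac{32}{267} \times 139 \approx 17, &
n_4 &= \frac{N_4}{N}\, n = \frac{44}{267} \times 139 \approx 23, \\
n_5 &= \frac{N_5}{N}\, n = \frac{41}{267} \times 139 \approx 21, &
n_6 &= \frac{N_6}{N}\, n = \frac{57}{267} \times 139 \approx 30.
\end{aligned}
\]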
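As a check on the arithmetic, the proportional allocation can be reproduced with a short Python sketch. The stratum sizes and the total sample size are taken from the text; the function name is only illustrative.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate a total sample of size n to strata in proportion to their sizes."""
    total = sum(stratum_sizes)
    # Each stratum receives n * N_h / N, rounded to the nearest whole student.
    return [round(n * size / total) for size in stratum_sizes]


# Stratum (department) population sizes N_1..N_6 and total sample size from the report.
stratum_sizes = [50, 43, 32, 44, 41, 57]
allocation = proportional_allocation(stratum_sizes, n=139)

print(allocation, sum(allocation))
```

Rounding each stratum independently does not guarantee in general that the allocations add up to n; here they do, so no adjustment step is needed.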
Table 3.1 Nature of the dependent variable with the corresponding code
Categories | How often did you visit Ambo University clinic due to sickness in the last 4 months? | Code
Unhealthy  | >= 5 times | 0
Moderate   | 1-4 times  | 1
Healthy    | None       | 2
The independent variables used in this study were sex, income, hygiene of café, usage of
medicine, smoking cigarette, alcohol consumption, physical exercise, environmental factor and bad
smelling around the café.
Table 3.2 List of independent variables with respective name and category codes
3.5.1 Descriptive Statistics
The statistical analysis in this study used descriptive statistics, such as tables and bar charts,
to describe the frequency distribution and percentages.
To test the null hypothesis, we compare $\chi^2_{cal}$ with $\chi^2_{tab}$, where
\[ \chi^2_{cal} = \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \]
$E_{ij}$ is the expected frequency corresponding to the $(i,j)$th cell and $O_{ij}$ is the
observed frequency. From the calculated and tabulated values with $(n-1)(m-1)$ degrees of freedom,
we decide whether to reject $H_0$, which states that there is no significant association between
the two variables.
We reject $H_0$ if $\chi^2_{cal}$ is greater than $\chi^2_{tab}$ or if the p-value is less than
the level of significance; otherwise we do not reject $H_0$.
Steps of chi-squares:-
Assumptions of chi-squares
1. The observations must be sampled randomly and independently of one another.
2. Each cell (each data entry) should have an expected frequency of at least five.
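To make the chi-square calculation concrete, the sketch below applies it to the sex by health status counts reported in Table 4.1 (males: 14, 35, 34; females: 26, 21, 9). It is an illustrative hand computation, not the SPSS output of the study; the critical value 5.991 is the standard 5% point of the chi-square distribution with 2 degrees of freedom.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            # Expected frequency under independence: row total * column total / N.
            e_ij = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o_ij - e_ij) ** 2 / e_ij

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Rows: male, female. Columns: unhealthy, moderate, healthy (counts from Table 4.1).
sex_by_health = [[14, 35, 34], [26, 21, 9]]
chi2, df = chi_square_statistic(sex_by_health)

CRITICAL_5PCT_DF2 = 5.991  # tabulated chi-square value for df = 2 at the 5% level
print(round(chi2, 2), df, chi2 > CRITICAL_5PCT_DF2)
```

The calculated statistic is well above the tabulated value, which agrees with the report's finding that sex is significantly associated with health status.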
In statistics, the ordered logit model (also called ordered logistic regression or the
proportional odds model) is a regression model for an ordinal dependent variable. It is a natural
extension of logistic regression to categorical responses having more than two possible values.
The most well-known of these ordinal logistic regression methods is the proportional odds model.
The basic idea underlying the proportional odds model is re-expressing the categorical variable in
terms of a number of binary variables based on internal cut-points in the ordinal scale.
We can consider the binary logistic models corresponding to regressing each of the cumulative
splits, one for each internal cut-point $j$,
\[ \operatorname{logit}[P(Y \ge j)] = \alpha_j + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k, \]
separately against the X's. Many variables of interest are ordinal; that is, we can rank the
values, but the real distance between categories is unknown. In this study, health status is
graded on a scale from unhealthy to healthy.
Assumptions of ordinal regression
• There is no multicollinearity.
• Each independent variable has an identical effect at each cumulative split of the
ordinal dependent variable.
• The effects of any explanatory variables are consistent or proportional across the
different thresholds.
This is the part we really want to find out. We could treat health status as a set of three
unordered responses, but there is a very clear and intentional ordering to these responses:
unhealthy, moderate, and healthy. If we know that a variable is ordinal, then there are special
models that tell us how the independent variables relate to a respondent being higher or lower on
the scale.
We use the ordered logit model so that the ordered categories can be used directly as our
dependent variable. Different link functions lead to proportional odds models or ordered probit
models. The model cannot be consistently estimated using ordinary least squares; it is usually
estimated using maximum likelihood.
The proportional odds model and the partial proportional odds model are special cases of the
cumulative logit model. The proportional odds model is a cumulative logit model that assumes that
the effect of the predictors on the odds of a response at or below a given level is the same
regardless of which level we pick. This model allows separate intercepts for the cumulative
logits, but restricts the parameters for the predictors to be the same across all logits. A model
that constrains some predictors to have common parameters and leaves other predictors free to have
separate parameters is called a partial proportional odds model.
\[ \operatorname{link}(\gamma_j) = \frac{\theta_j - [\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k]}{\exp(\tau_1 Z_1 + \tau_2 Z_2 + \dots + \tau_m Z_m)} \]
where $\gamma_j$ is the cumulative probability for the $j$th category, $\theta_j$ is the threshold
for the $j$th category, $\beta_1, \dots, \beta_k$ are the regression coefficients,
$X_1, \dots, X_k$ are the predictor variables, and $k$ is the number of predictors. The numerator
on the right side determines the location of the model. The denominator of the equation specifies
the scale. The $\tau_1, \dots, \tau_m$ are coefficients for the scale component and
$Z_1, \dots, Z_m$ are $m$ predictor variables for the scale component (chosen from the same set of
variables as the X's).
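The following Python sketch shows how the location-scale equation turns thresholds and coefficients into category probabilities under the logit link. The thresholds and the coefficient used in the example are hypothetical values chosen only for illustration; they are not estimates from this study.

```python
import math


def category_probabilities(thresholds, beta, x, tau=None, z=None):
    """Category probabilities from link(gamma_j) = (theta_j - beta.x) / exp(tau.z), logit link.

    thresholds must be in increasing order; there is one fewer threshold than categories.
    """
    location = sum(b * xi for b, xi in zip(beta, x))
    scale = 1.0  # no scale component unless tau and z are supplied
    if tau is not None and z is not None:
        scale = math.exp(sum(t * zi for t, zi in zip(tau, z)))

    # Cumulative probabilities gamma_j = P(Y <= j) via the inverse logit.
    cumulative = [1.0 / (1.0 + math.exp(-(theta - location) / scale)) for theta in thresholds]
    cumulative.append(1.0)  # the last category closes the distribution

    # Differences of consecutive cumulative probabilities give the category probabilities.
    return [cumulative[0]] + [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]


# Hypothetical thresholds and coefficient; categories are unhealthy, moderate, healthy.
reference = category_probabilities(thresholds=[-1.0, 1.0], beta=[-1.5], x=[0])
exposed = category_probabilities(thresholds=[-1.0, 1.0], beta=[-1.5], x=[1])
print([round(p, 3) for p in reference])
print([round(p, 3) for p in exposed])
```

With this parameterization a negative coefficient shifts probability towards the lowest category, which is why negative estimates are read as negative effects on health status.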
The Model fitting Information gives the -2 log-likelihood (-2LL) values for the baseline and the
final model and a chi-square to test the difference between the -2LL for the two models.
The goodness of fit statistics indicated that the model fits much better than the location only
model. From the observed and expected frequencies, we could compute the usual Pearson and
Deviance goodness-of-fit measures.
The Pearson goodness-of-fit statistic is
\[ \chi^2 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}}. \]
The deviance measure is
\[ D = 2 \sum_i \sum_j O_{ij} \ln\!\left(\frac{O_{ij}}{E_{ij}}\right). \]
Both of the goodness-of-fit statistics should be used only for models that have reasonably large
expected values in each cell. If we have a continuous independent variable or many categorical
predictors or some predictors with many values, we may have many cells with small expected
values. There are several $R^2$-like statistics that can be used to measure the strength of the
association between the dependent variable and the predictor variables. They are not as useful as
the $R^2$ statistic in regression, since their interpretation is not straightforward.
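1) Cox and Snell:
\[ R^2_{CS} = 1 - \left( \frac{L(\beta^{(0)})}{L(\beta)} \right)^{2/n} \]
2) Nagelkerke:
\[ R^2_{N} = \frac{R^2_{CS}}{1 - L(\beta^{(0)})^{2/n}} \]
3) McFadden:
\[ R^2_{M} = 1 - \frac{\ln L(\beta)}{\ln L(\beta^{(0)})} \]
where $L(\beta)$ is the likelihood function for the model with the estimated parameters,
$L(\beta^{(0)})$ is the likelihood with just the thresholds, and $n$ is the number of cases (sum
of all weights). McFadden's statistic is computed from the corresponding log-likelihoods.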
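The three pseudo R-square measures can be computed directly from the two log-likelihoods, as the sketch below illustrates. The log-likelihood values and the number of cases in the example are hypothetical and serve only to show the calculation; they are not the values fitted in this study.

```python
import math


def pseudo_r_squared(loglik_null, loglik_model, n):
    """Cox and Snell, Nagelkerke and McFadden pseudo R-square from log-likelihoods.

    loglik_null is the log-likelihood with thresholds only, loglik_model the
    log-likelihood of the fitted model, and n the number of cases.
    """
    # (L0 / L1)^(2/n) written on the log scale to avoid numerical underflow.
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    # Nagelkerke rescales Cox and Snell by its maximum attainable value.
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * loglik_null / n))
    mcfadden = 1.0 - loglik_model / loglik_null
    return {"cox_snell": cox_snell, "nagelkerke": nagelkerke, "mcfadden": mcfadden}


# Hypothetical inputs for illustration only.
r2 = pseudo_r_squared(loglik_null=-100.0, loglik_model=-80.0, n=100)
print({name: round(value, 3) for name, value in r2.items()})
```

Working with log-likelihoods rather than raw likelihoods is a design choice: the likelihood of even a modest sample is a very small number, so the ratio is more stable when formed on the log scale.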
The maximum likelihood estimate (MLE) of θ is the value of θ that maximizes lik(θ): it is the
value that makes the observed data the “most probable”. If the $X_i$ are iid, then the likelihood
simplifies to
\[ \operatorname{lik}(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta). \]
Rather than maximizing this product, which could be quite tedious, we often use the fact that the
logarithm is an increasing function, so it is equivalent to maximize the log-likelihood:
\[ \ell(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta). \]
The likelihood-ratio statistic for this test is the deviance of the model, which is reported
together with the Pearson statistic of the model. The Pearson statistic for testing goodness of
fit is
\[ \chi^2 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}}. \]
The corresponding likelihood-ratio (deviance) statistic is
\[ G^2 = 2 \sum_i \sum_j O_{ij} \ln\!\left(\frac{O_{ij}}{E_{ij}}\right). \]
Then $\chi^2$ and $G^2$ have approximate chi-squared null distributions. $G^2$ follows a $\chi^2$
distribution with $k$ degrees of freedom, where $k$ is the number of predictor variables.
If $G^2$ is large, that is, if the likelihood $L_1$ of the fitted model is much greater than the
likelihood $L_0$ of the null model, we reject $H_0$ and conclude that the model is a good fit,
i.e. that the variables significantly affect the dependent variable.
Pseudo-$R^2$ statistics are used in ordinal regression to estimate the variance explained by the
independent variables.
This is done through the following:
\[ \text{Nagelkerke's } R^2 = \frac{\text{Cox and Snell } R^2}{1 - L(\beta^{(0)})^{2/n}} \]
Testing of parallel lines
For a proportional odds model (POM) to be valid, the assumption that all the logit surfaces are
parallel must be tested. The test of parallelism makes use of the -2 log-likelihoods for the
constrained model (the model that assumes the planes or surfaces are parallel) and for the general
model (the model that allows the planes or surfaces to be separate). The chi-square statistic is
the difference between the -2 log-likelihoods of the two models. There is some debate about the
adequacy of the proportional odds test, as it may be too sensitive to sample size (Scott et al.,
1997). A non-significant test statistic is usually taken as sufficient evidence that the POM is
valid, whereas a significant test statistic might be misleading. If the test is significant, some
have proposed additional, often graphical, methods for determining whether the logit surfaces are
actually parallel. Other solutions include fitting a partial proportional odds model, or
dichotomizing the ordinal outcome.
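The parallel lines test can be sketched as a difference of -2 log-likelihoods referred to a chi-square distribution. In the example below, the -2 log-likelihood of the constrained model (201.745) and the chi-square value (15.214) are the figures reported in Chapter 4; the general-model value 186.531 is simply their difference, and the degrees of freedom (14) are an assumption, because they are not shown in the surviving table. The helper `chi2_sf` is a small series implementation written for this illustration so that no external library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


neg2ll_null = 201.745      # constrained (parallel lines) model, from the report
neg2ll_general = 186.531   # implied by the reported chi-square of 15.214
assumed_df = 14            # assumption: not given in the surviving table

chi_square = neg2ll_null - neg2ll_general
p_value = chi2_sf(chi_square, assumed_df)
print(round(chi_square, 3), round(p_value, 3))
```

Under the assumed degrees of freedom the p-value is close to the reported 0.364, so the null hypothesis of parallel lines is not rejected at the 5% level.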
CHAPTER FOUR
A total of 139 questionnaires were administered and data were collected from third-year students
of the College of Natural and Computational Sciences at Ambo University. All administered
questionnaires were completed, giving a 100% response rate. The frequency and percentage
distribution of the characteristics of respondents are summarized in Table 4.1.
Table 4.1 Frequency and percentage distribution of respondent characteristics by health status of
students (unhealthy, moderate and healthy); entries are % (n)
Variable             | Category          | Unhealthy | Moderate  | Healthy   | Total
Sex                  | Male              | 35 (14)   | 62.5 (35) | 79.1 (34) | 59.71 (83)
                     | Female            | 65 (26)   | 37.5 (21) | 20.9 (9)  | 40.29 (56)
Income               | <200              | 72.5 (29) | 46.4 (26) | 41.9 (18) | 52.52 (73)
                     | 200-600           | 17.5 (7)  | 41.1 (23) | 34.9 (15) | 32.37 (45)
                     | >600              | 10 (4)    | 12.5 (7)  | 23.2 (10) | 15.11 (21)
Hygiene of café      | Good              | 42.5 (17) | 25 (14)   | 20.9 (9)  | 28.78 (40)
                     | Fair              | 55 (22)   | 53.6 (30) | 37.2 (16) | 48.92 (68)
                     | Bad               | 2.5 (1)   | 21.4 (12) | 41.9 (18) | 22.30 (31)
Medicine             | No                | 40 (16)   | 41.1 (23) | 46.5 (20) | 42.45 (59)
                     | Yes               | 60 (24)   | 58.9 (33) | 53.5 (23) | 57.55 (80)
Smoking              | No                | 100 (40)  | 98.2 (55) | 81.4 (35) | 93.52 (130)
                     | Yes               | 0 (0)     | 1.8 (1)   | 18.6 (8)  | 6.48 (9)
Alcohol              | No                | 52.5 (21) | 82.1 (46) | 69.8 (30) | 69.78 (97)
                     | Yes               | 47.5 (19) | 17.9 (10) | 30.2 (13) | 30.22 (42)
Exercise             | Never             | 35 (14)   | 25 (14)   | 32.6 (14) | 30.22 (42)
                     | 1-3 days          | 45 (18)   | 53.6 (30) | 46.5 (20) | 48.92 (68)
                     | >3 days           | 20 (8)    | 21.4 (12) | 20.9 (9)  | 20.86 (29)
Environmental        | Have no influence | 0 (0)     | 33.9 (19) | 25.6 (11) | 21.58 (30)
                     | Not such much     | 85 (34)   | 33.9 (19) | 27.9 (12) | 46.76 (65)
                     | Strict factor     | 15 (6)    | 32.1 (18) | 46.5 (20) | 31.66 (44)
Smelling             | No                | 57.5 (23) | 39.3 (22) | 20.9 (9)  | 38.85 (54)
                     | Yes               | 42.5 (17) | 60.7 (34) | 79.1 (34) | 61.15 (85)
As per Table 4.1, out of 139 respondents 59.7% were male and 40.3% were female. Males made up
35 percent of the students with unhealthy status and 79.1 percent of the students with healthy
status, whereas females made up 65 percent of the students with unhealthy status and 20.9 percent
of the students with healthy status.
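The percentages quoted for sex are column percentages, i.e. shares within each health status group. A minimal Python sketch using the counts from Table 4.1 shows how they are obtained; the variable names are illustrative.

```python
# Counts by sex for the unhealthy, moderate and healthy groups (from Table 4.1).
counts = {"Male": [14, 35, 34], "Female": [26, 21, 9]}

# Number of students in each health status group.
column_totals = [sum(column) for column in zip(*counts.values())]

# Share of each sex within every health status group, in percent.
percentages = {
    sex: [round(100 * count / total, 2) for count, total in zip(row, column_totals)]
    for sex, row in counts.items()
}

print(column_totals)
print(percentages)
```

Reading the table by columns rather than rows matters for interpretation: 65% is the share of females among unhealthy students, not the share of female students who are unhealthy.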
In terms of monthly income, 52.52% (73) of the total respondents had less than 200 birr, 32.37%
(45) had 200-600 birr and 15.11% (21) had more than 600 birr. Most of the students with unhealthy
status, 72.5% (29), had an income of less than 200 birr. On the other hand, 41.1% (23) of the
students with moderate status were in the 200-600 birr income category, and 23.2% (10) of the
students with healthy status had an income greater than 600 birr.
In terms of environmental factor, 21.58% (30) of the total respondents reported that it has no
influence, 46.76% (65) that it has not much influence and 31.66% (44) that it is a strict factor.
Most of the students with unhealthy status, 85% (34), reported not much influence. The students
with moderate status were spread almost evenly over the three responses (33.9% (19), 33.9% (19)
and 32.1% (18)), and 46.5% (20) of the students with healthy status reported that the environment
is a strict factor.
The share of students who reported that bad smelling around the café affects their health
(61.15%) exceeds the share of students who reported that it does not (38.85%).
We can also illustrate the distribution of health status by sex using the bar chart below.
Figure 4.1 Bar chart of health status of students (unhealthy, moderate, healthy) by sex of
students; vertical axis: percent. Female: unhealthy 65.0%, moderate 37.5%, healthy 20.93%. Male:
unhealthy 35.0%, moderate 62.5%, healthy 79.07%.
From Figure 4.1 we can see that females make up a larger share than males of the students who are
exposed to health problems.
Bivariate analysis is one of the simplest forms of the quantitative (statistical) analysis. It involves
the analysis of two variables (often denoted as X, Y), for the purpose of determining the
empirical relation/association between them. Since the dependent variable in this study is ordinal
the type of relation is association. To determine the association in the bivariate analysis the
appropriate test statistic is Chi-Square Test. Chi-square is simply an extension of a cross-
tabulation that gives us more information about the association.
The bivariate analysis between the dependent variable, health status, and the candidate
independent variables using the chi-square test is presented in Table 4.2.
Table 4.2 shows that among the 9 independent variables included in the bivariate analysis, 7 have
a statistically significant association with the health status of the students, namely sex of
students, bad smelling around the café, income, lack of hygiene of café, smoking cigarette,
alcohol consumption and environmental factor. However, physical exercise and usage of medicine
have no significant association with the health status of the students.
A Pearson chi-square with a p-value < 0.05 means that there is a significant association between
health status and the explanatory variable concerned. Variables with modest p-values (less than
0.2) in the bivariate analysis (Agresti, 2002) were included in the final model (multiple ordinal
logistic regression model).
The logit link function is used in the analysis because it suits evenly distributed categories
and is a reasonable choice when the changes in the cumulative probabilities are gradual. The
cumulative logit involves all levels of the response and dichotomizes the response scale at each
cut-point.
The Model Fitting Information is displayed in Table 4.3. It gives the -2 log-likelihood for the
intercept-only and final models, which can be used in comparisons of nested models. The
statistically significant chi-square statistic (p < 0.05) indicates that the final model gives a
significant improvement over the baseline intercept-only model. This tells us that the model
gives better predictions than if we just guessed based on the marginal probabilities for the
outcome categories. Therefore, the full model (with the factors that affect health status as
predictors) is significantly better than the intercept-only model.
Model          | -2 Log Likelihood | Chi-Square | df | Sig.
Intercept Only | 295.439           |            |    |
Goodness-of-Fit
The goodness-of-fit test is displayed in Table 4.4. The result in Table 4.4 suggests that the
model fits the data well (p > 0.05), i.e. we fail to reject the null hypothesis that the model
fits the observed data. This means that the model fits the data adequately.
Chi-Square df Sig.
Pseudo R-Square
For logistic and ordinal regression models, it is not possible to compute the same $R^2$ statistic
as in linear regression, so three approximations are computed instead.
Nagelkerke .553
McFadden .310
What constitutes a “good” $R^2$ value depends upon the nature of the outcome and the explanatory
variables. Here, the pseudo $R^2$ values (e.g. Nagelkerke = 55.3%) indicate that the model
explains only part of the variation in health status between students. This is just as we would
expect, because there are numerous factors that affect the health status of students. The results
support the conclusion that the final model fits the data well.
One of the assumptions underlying ordinal logistic regression is that the relationship between
each pair of outcome groups is the same. The test of this assumption is commonly referred to as
the test of parallel lines, because the null hypothesis states that the slope coefficients in the
model are the same across response categories (and lines of the same slope are parallel). If we
fail to reject the null hypothesis, we conclude that the assumption holds. The parallel lines test
is displayed in Table 4.6. For our model, the proportional odds assumption appears to hold because
the significance of the chi-square statistic is .364 > .05.
Model           | -2 Log Likelihood | Chi-Square | df | Sig.
Null Hypothesis | 201.745           |            |    |
Table 4.6 shows the parallel lines test against the general model, with chi-square value 15.214
and p-value = 0.364, which is greater than the 5% level of significance, so we fail to reject the
null hypothesis. Therefore, there is not enough evidence to say that the location parameters
(slope coefficients) differ across response categories. This means that the parallel lines
assumption holds.
The model fitting information, goodness-of-fit and parallel lines tests were satisfactory for the
given data. The multiple ordinal logistic regression coefficients were then estimated using the
maximum likelihood estimation method implemented in the SPSS package. The results displayed in
Table 4.7 consist of the coefficients, their standard errors, the Wald test and associated
p-values (Sig.), the 95% confidence intervals of the coefficients and the odds ratios. The
thresholds are shown at the top of the parameter estimates output, and they indicate where the
latent variable is cut to make the three groups that we observe in our data. The interpretations
below are based on Table 4.7.
Out of the considered explanatory variables, environmental factor, sex, low income (<200 birr),
bad smelling around the café, lack of hygiene of café, smoking cigarette and alcohol consumption
have a significant relationship with the health status of students.
The threshold coefficients represent the intercepts, specifically the points (in terms of a
logit) at which health status is cut into the three categories.
A negative sign on a coefficient indicates that the variable has a negative effect on the health
status of students; these variables are sex, low income, lack of hygiene of café, smoking
cigarette and bad smelling around the café. In conclusion, the findings indicate that the health
status of a student is significantly associated with sex, low income, alcohol consumption, lack of
hygiene of café, bad smelling around the café, smoking cigarette and environmental factor. The
independent variables that have no significant association with the health status of a student are
usage of medicine and physical exercise.
Female: The log-odds of females being in the healthy category versus the moderate and unhealthy
categories are 1.560 lower than for males when the other variables in the model are held constant.
Likewise, the log-odds of females being in the combined healthy and moderate categories versus the
unhealthy category are 1.560 lower than for males when the other variables in the model are held
constant.
Alcohol consumption: The log-odds of students who do not consume alcohol being in the healthy
category versus the moderate and unhealthy categories are 0.967 lower than for students who
consume alcohol when the other variables in the model are held constant. Likewise, the log-odds of
students who do not consume alcohol being in the combined healthy and moderate categories versus
the unhealthy category are 0.967 lower than for students who consume alcohol when the other
variables in the model are held constant.
Bad smelling around café: The log-odds of students who report no bad smelling around the café
being in the healthy category versus the moderate and unhealthy categories are 1.166 lower than
for students who report bad smelling around the café when the other variables in the model are
held constant. Likewise, the log-odds of students who report no bad smelling being in the combined
healthy and moderate categories versus the unhealthy category are 1.166 lower than for students
who report bad smelling when the other variables in the model are held constant.
Lack of hygiene of café: The log-odds of students who report good hygiene of café being in the
healthy category versus the moderate and unhealthy categories are 1.728 lower than for students
who report bad hygiene of café when the other variables in the model are held constant. Likewise,
the log-odds of students who report good hygiene being in the combined healthy and moderate
categories versus the unhealthy category are 1.728 lower than for students who report bad hygiene
when the other variables in the model are held constant. The log-odds of students who report fair
hygiene of café being in the healthy category versus the moderate and unhealthy categories are
1.577 lower than for students who report bad hygiene of café when the other variables in the
model are held constant. Likewise, the log-odds of students who report fair hygiene being in the
combined healthy and moderate categories versus the unhealthy category are 1.577 lower than for
students who report bad hygiene when the other variables in the model are held constant.
By taking the exponent of the pooled estimate for a given predictor, i.e. $e^{\beta_j}$, we
obtain an estimate of the common odds ratio that describes the relative odds for $X_j$ differing
by one unit. With the coding used in the interpretations above, values greater than one indicate
that the variable in question increases the odds of a student being in a healthier category, and
values between 0 and 1 indicate a decrease in those odds, that is, an increase in the odds of
being an unhealthy student.
Thus, the table indicates that sex, low income (<200 birr), lack of hygiene of café, smoking
cigarette and alcohol consumption increase the odds of an individual student being unhealthy,
which is the complement of being healthy. $\exp(\hat{\beta})$ is the factor by which the odds
change when the $i$th independent variable increases by one unit (i.e. from 0 to 1). For instance,
from Table 4.7 the odds ratio is $\exp(\hat{\beta}) = 0.178$ for good hygiene of café and
$\exp(\hat{\beta}) = 0.2066$ for fair hygiene of café, which indicates that students reporting
good hygiene of café have 0.178 times, and students reporting fair hygiene 0.2066 times, the odds
of being in a healthier category compared with students reporting bad hygiene of café. Likewise,
$\exp(\hat{\beta}) = 0.2101$ for sex indicates that female students have 0.2101 times the odds of
male students of being in a healthier category.
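The link between the coefficients quoted in the interpretations and the odds ratios quoted above can be verified with a short Python sketch. The magnitudes 1.560, 1.728 and 1.577 come from the text; the negative signs are inferred from the statement that these variables carry negative coefficients, and the dictionary keys are only illustrative labels.

```python
import math

# Coefficient magnitudes from the text; the negative signs are an inference.
coefficients = {
    "sex_female": -1.560,
    "hygiene_good": -1.728,
    "hygiene_fair": -1.577,
}

# The odds ratio for a one-unit change in a predictor is exp(beta).
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(name, round(ratio, 4))
```

The exponentiated values agree with the odds ratios reported in the text (0.2101, 0.178 and 0.2066), which confirms that the figures 1.560, 1.728 and 1.577 are differences on the log-odds scale rather than differences in odds.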
CHAPTER FIVE
The current study shows that sex, low income (<200 birr), lack of hygiene of café, bad smelling
around the café, alcohol consumption, smoking cigarette and environmental factor are significantly
associated with the health status of students.
In this analysis we have looked at regression models that can be applied when our outcome is
represented by an ordinal variable. We have seen how to evaluate the ordinal proportional odds
model by completing a series of binary logistic regressions at each of the cumulative splits in
the data, and how this allows us to directly evaluate the consistency of the odds ratios across an
ordinal outcome. Where the proportional odds assumption is justified, ordinal regression models
can be a powerful means of summarizing relationships that utilizes all the information present in
the ordinal outcome.
The health status of a student is strongly associated with sex, low income (<200 birr), smoking
cigarette, lack of hygiene of café, environmental factor, alcohol consumption and bad smelling
around the café. Perhaps because of biological factors as well as the prevailing low
socio-economic status of females, particular attention to maintaining their health status is
needed.
The independent variables that have no significant association with the health status of a student
are usage of medicine and physical exercise.
Recommendations
Based on the results of this study, the following recommendations were made:
• The university should give emphasis to the factors that affect the health status of students.
• Female students differ from male students in health status. In order to address this
difference, the university should give special attention to female students.
• The concerned bodies should work on minimizing the lack of hygiene of the café and the bad
smell around the café in the university.
• The Dean of Ambo University should give attention to these problems and report them to the
concerned body so that they can be solved.
REFERENCES
[1]. Agresti, A (1996). An Introduction to categorical data Analysis. John Wiley and Sons,
Inc., New York.
[2]. Agresti, A. (2002) Categorical Data Analysis, Second Edition. Hoboken, New Jersey: John
Wiley & Sons, Inc.
[3]. Agresti, A. (2010) Analysis of Ordinal Categorical Data, Second Edition. New York: John
Wiley & Sons.
[4]. Alcohol Alert (National Institute on Alcohol Abuse and Alcoholism)(30 PH 359). October
1995. Retrieved 1 Nov 2013.
[5]. Applied Logistic Regression (Second Edition). David W. Hosmer and Stanley Lemeshow.
[6]. CAMH Healthy Aging Project. (2008). Improving Our Response to Older Adults with
Substance Use, Mental Health and Gambling Problems: A Guide for Supervisors, Managers, and
[8]. Greene, William H., Econometric Analysis (fifth edition), Prentice Hall, 2003, 736-740.
[11]. Long, J. S. and Freese, J. (2006) Regression Models for Categorical and Limited Dependent
Variables Using Stata, Second Edition. College Station, Texas: Stata Press.
[12]. Liao, T. F. (1994) Interpreting Probability Models: Logit, Probit, and Other Generalized
Linear Models. Thousand Oaks, CA: Sage Publications, Inc.
[13]. Ordinal Regression using SPSS for social scientists (second edition).
[14]. Powers, D. and Xie, Yu. Statistical Methods for Categorical Data Analysis. Bingley, UK:
Emerald Group Publishing Limited.
Appendix Ι
AMBO UNIVERSITY
DEPARTMENT OF STATISTICS
Questionnaire
Dear Respondents: This study is designed to gather information about the factors that
affect the health status of students in Ambo University 3rd Year Natural and Computational Science.
Your responses will be kept confidential and used only for research purposes.
Instruction: Please respond to each question by marking “X” in the box. You are kindly requested to fill in
the questionnaire clearly and neatly, and you are not required to write your name.
6. How do you see the cleanliness of your café? A) Good B) Fair C) Bad
7. What do you suggest about the safety of service given from the clinic?
A) Yes B) No
11. Do you think the bad smell around the café influences your health? A) Yes B) No
12. How much do you think that environmental factors affect your health status?
DECLARATION
This is to declare that the project report entitled “Assessing the factors that affect health status
of students” submitted by us in partial fulfillment of the requirements for the award of the degree
of Bachelor of Science in Statistics, Ambo University, is an honest record of the work carried
out by us under the advice and guidance of Jemal Ayalew.