Fetal health classification by machine learning algorithms
Saroj Rijal
Contents
1.0 Introduction
Section 1: Prototype Identification and Planning
Section 1.1 Literature Review on Prototype Identification
Section 1.2 Reflection on the Prototype Identification
Section 2: Development Section
2.1 Developed code and planning documents for prototype
Section 3: Evaluation
Section 3.1 Report on the Evaluation
Conclusion:
References
1.0 Introduction
The purpose of the project is to create a software-based prototype that can correctly
classify fetal health from features obtained in Cardiotocogram tests, which are recorded
using dedicated medical equipment. This study's mission statement, aim, and objectives
are as follows:
Mission statement: To create an automatic fetal health status detection system that, given
medical measurements as input, aids medical practitioners in their decision making and
diagnoses.
Aims: Development of a prototype system for automatic fetal health status
identification using machine learning techniques.
Objectives:
1) To gather an appropriate dataset for training the prototype's machine learning models for
fetal health status detection.
2) To select appropriate machine learning classifiers for the task so that model performance
sufficient for real-world use in the medical industry can be attained.
3) To assess model performance so that the best model(s) for the goal can be identified.
4) To describe how the best-fitted model(s) can be used for prediction, together with their
potential limitations.
The primary motivation for this research is to reduce child mortality in a scientific and
efficient manner. The death of infants during birth is a serious concern globally, and
reducing child mortality is therefore included in the United Nations' Sustainable
Development Goals. While the problem must ultimately be resolved by medical means,
recognizing the state of the fetus as normal, suspect, or pathological can greatly aid
doctors in making clinical judgments (Siddiqui, et al., 2020). Traditionally, the status of the
child during birth is determined through numerous medical procedures that take a lot of time
and expertise, and the mother's health condition can prevent those techniques from being
used. Furthermore, rural hospitals often lack sufficient equipment, so not every medical test
is always available to patients. Economic factors also play a role, as not all families can
afford to have all essential medical tests performed. There are also sociocultural
constraints that limit doctors' ability to conduct all essential medical procedures on the
woman prior to childbirth (for example, some patients may be reluctant to undergo all the
gynecological examinations). Even if the conventional method is preferable, it is almost
impossible to implement it in all hospitals throughout the world. As a result, machine
learning solutions are an efficient and effective way to support the care of such patients
while reducing the workload of medical staff. Machine learning models require only a few
features of the patient in order to determine the likely fetal health status, and these can be
obtained using basic tests such as Cardiotocograms, which are low cost and less complex
(Ramaravind Kommiya Mothilal, 2020). Cardiotocogram testing involves sending ultrasound
pulses through the body and recording responses such as fetal heart rate, fetal movement,
uterine contractions, and other pregnancy-related measures. While ML models cannot be
guaranteed to be completely correct in their predictions, estimates may be given with high
confidence when models are sufficiently trained with a large enough number of examples
acquired from various patient segments.
Section 1: Prototype Identification and Planning
Section 1.1 Literature Review on Prototype Identification
Global technological progress has brought AI, digital technologies, machine learning, and
data analysis to fetal health classification. In modern healthcare, artificial intelligence
(AI) aids clinicians in both patient care and administrative tasks. Machine learning is the
most prevalent form of AI used to enhance the core capabilities of healthcare technology,
and it is commonly used to build predictive models for treatment and wellness. In this
report, machine learning is used to classify fetal health. Health issues are fairly prevalent
during pregnancy, and machine learning can aid the diagnosis, treatment, and prognosis of
illness (Akbulut, et al., 2018). The report classifies fetal health status in order to help
save the lives of both mothers and children. One of the primary priorities for healthcare
promotion is the reduction of mother and child mortality rates. The study analyzes fetal
health concerns using multiple prediction models fitted to the dataset, and provides insight
into the identification of potential health concerns connected to prenatal outcomes for both
the mother and the child. Planning and execution are carried out in order to develop
preventative measures based on a study of the important components. The reduction of
child mortality is reflected in several of the United Nations' Sustainable Development Goals
and is recognized as a key indicator of human progress. The United Nations envisions that
by 2030 countries will end preventable deaths of newborns and children under the age of
five, with all countries aiming to reduce under-5 mortality to at least as low as 25 deaths
per 1000 live births. Closely related to child mortality is maternal mortality. According to
2017 estimates, over 295000 deaths occur worldwide each year as a result of complications
associated with childbirth. A considerable number of these deaths occur in low-resource
environments because of the lack of adequate organizational infrastructure, and many of
them could be prevented. Cardiotocograms are a straightforward and cost-effective method of
assessing fetal health, allowing physicians to take action to reduce maternal and
child mortality. The apparatus sends ultrasonic pulses, and the pulse responses are
recorded and analyzed. Fetal health is assessed from the
fetal heart rate (FHR), fetal movements, uterine contractions, and other factors
(Signorini, et al., 2019). The dataset used to build the models comprises 2126 records of
features extracted from Cardiotocogram tests, each of which is classed as Normal, Suspect,
or Pathological. The models used for fetal health classification are Random Forest, Support
Vector Machine (SVM), Gradient Boosting, KNN, and Logistic Regression. The first model is
Random Forest, a supervised machine learning approach built from decision trees. It is an
ensemble strategy that combines several classifiers to solve a wide range of challenging
problems (Subasia, et al., 2020). SVM is another approach; it finds a hyperplane in an
N-dimensional space, where N is the number of features, that separates the data points
into classes (NabillahRahmayanti, et al., 2021). Gradient boosting with decision trees builds
a prediction model by successively improving an ensemble of trees, and can be combined with
regularization techniques (Ricciardi, et al., 2020). K-Nearest Neighbour (KNN) computes the
distances from an unknown data point to all the supplied points and classifies it according
to the points at the shortest distances (Attallah, et al., 2018). The last model discussed is
logistic regression, a classification procedure used to assign observations to a set of
discrete classes (O'Sullivan, et al., 2021). It transforms its output with the logistic
sigmoid function, which returns a probability value.
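As a small illustration of the sigmoid mapping used by logistic regression, the sketch below
shows how a raw linear score is squashed into a value between 0 and 1 that can be read as a
probability. It is not taken from the prototype notebook; the function name sigmoid and the
example scores are chosen here purely for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid: maps any real-valued score z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# A score of 0 lies on the decision boundary; large positive scores approach 1
# and large negative scores approach 0.
for score in (-4.0, 0.0, 4.0):
    print(f"score={score:+.1f} -> probability={sigmoid(score):.3f}")
```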
Section 1.2 Reflection on the Prototype Identification
Machine learning is used to classify fetal health by analyzing a given dataset and
developing multiple models. The study presents an overview of AI in healthcare and of fetal
health concerns in illness identification and diagnosis. Machine learning is used to build
predictive models for the child's well-being. The report classifies fetal health in order to
prevent complications during the prenatal period. For analysis planning and implementation,
the essential factors for fetal health are examined. Cardiotocograms are simple exams from
which cases can be classed as normal, suspect, or pathological. They provide information
about fetal heart rate (FHR), fetal movements, and uterine contractions (Signorini, et al.,
2019). The data are analyzed by assessing records from cardiotocogram tests and testing the
data with various models. Random Forest, Support Vector Machine (SVM), Gradient Boosting,
KNN, and Logistic Regression are the models used in the classification of fetal health; all
of them are machine learning models (Molla, et al., 2021). Nations are attempting to reduce
the under-5 mortality rate, which is closely tied to the United Nations' Sustainable
Development Goals and is one of the fundamental indicators of human progress. Significant
numbers of deaths of newborns and mothers still occur as a result of the absence of a stable
organizational framework and the scarcity of resources (Ricciardi, et al., 2020). The study
gives an analysis for classifying the health status of fetuses, which bears on the health of
both newborns and mothers. The use of AI and machine learning is central to the scope of the
report and the proposed application prototype.
Section 2: Development Section
2.1 Developed code and planning documents for prototype
The developed prototype is a machine learning solution for predicting fetal health from
medical parameters. The whole classification system is built in Python in the Jupyter
notebook environment. Several machine learning models are constructed for the same
objective, and their performance is evaluated on a held-out portion of the data collected
from the Kaggle repository (Fetal Health Classification 2022). The target attribute
fetal_health contains three labels, normal, suspect, and pathological, which are encoded by
the numbers 1, 2, and 3 respectively. This is therefore a classification problem, and the
models to be constructed should be classification models. The project makes use of numerous
libraries for dataset loading, pre-processing, and visualization and, most importantly, for
model fitting and evaluation. The phases of the procedure are as follows: dataset loading,
dataset exploration and pre-processing, splitting feature and response variables into
train-test sets, fitting the selected machine learning classifiers, and evaluating the
classifiers on the test set. These procedures, as done in the Jupyter notebook, are
detailed below.
Used general purpose libraries:
As shown in the preceding screenshot, NumPy is used for pre-processing and manipulating
arrays, while the Pandas library is primarily used to manipulate data obtained in csv format
from the Kaggle repository. Matplotlib's pyplot module is used for plotting visuals, while
seaborn is used for advanced visualizations.
Dataset loading:
As seen above, the dataset in csv format is imported using Pandas' read_csv() function, and
as seen in the output, the resulting DataFrame has 2126 rows and 22 columns. The data
therefore contains a total of 2126 instances.
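A minimal sketch of this loading step is given below. It is a reconstruction under
assumptions, not the original notebook cell: the helper name load_dataset and the file name
fetal_health.csv are hypothetical, and a tiny in-memory CSV stands in for the real file so
that the example runs on its own.

```python
import io

import pandas as pd


def load_dataset(source) -> pd.DataFrame:
    """Read the CSV (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


# Tiny in-memory stand-in; with the real data one would call something like
# load_dataset("fetal_health.csv") (file name assumed).
demo_csv = io.StringIO(
    "baseline value,accelerations,fetal_health\n"
    "120.0,0.000,2.0\n"
    "132.0,0.006,1.0\n"
    "133.0,0.003,1.0\n"
)
df = load_dataset(demo_csv)
print(df.shape)  # (rows, columns) of the loaded frame
print(df.head())
```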
Data types of attributes and missing value exploration:
The preceding outputs indicate that all of the attributes loaded into the DataFrame are
numeric, including the class variable fetal_health, which is shown as an integer type. Because
the class labels are coded as numbers, a numeric to nominal conversion must be performed.
Furthermore, no missing values are found in any of the attributes, so no attribute filtering
or imputation is necessary.
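The two checks described here can be sketched as follows. The small DataFrame is synthetic
and only stands in for the real data; the variable names all_numeric and no_missing are
introduced for this illustration.

```python
import pandas as pd

# Small synthetic frame standing in for the real dataset.
df = pd.DataFrame({
    "baseline value": [120.0, 132.0, 133.0],
    "accelerations": [0.000, 0.006, 0.003],
    "fetal_health": [2, 1, 1],
})

print(df.dtypes)                         # data type of each attribute
missing_per_column = df.isnull().sum()   # number of missing values per attribute
print(missing_per_column)

all_numeric = all(pd.api.types.is_numeric_dtype(t) for t in df.dtypes)
no_missing = int(missing_per_column.sum()) == 0
print("all numeric:", all_numeric, "| no missing values:", no_missing)
```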
Converting the numerically encoded class to nominal labels according to their names:
As indicated before in the output, the replace function of the Pandas series is used to
convert integers to nominals.
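A sketch of this conversion with Series.replace is shown below. The mapping follows the
encoding stated earlier (1, 2, 3); the constant name CLASS_NAMES and the sample codes are
assumptions made for the example.

```python
import pandas as pd

# Mapping from the numeric codes to class names, following the encoding in the text.
CLASS_NAMES = {1: "normal", 2: "suspect", 3: "pathological"}

codes = pd.Series([1, 1, 2, 3, 1], name="fetal_health")  # synthetic sample of codes
labels = codes.replace(CLASS_NAMES)
print(labels.tolist())
```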
Visual of class distribution:
The Pandas bar chart above depicts the class distribution, and it is evident from the
chart that the dataset is imbalanced, since there are many more instances of the normal
class than of the suspect and pathological classes. The normal class has a frequency of over
1600, the suspect class has a frequency of approximately 300, and the pathological class has
a frequency of less than 200. As a result, machine learning models must be assessed not
only on overall accuracy but also on precision for the least frequent classes, which are
suspect and pathological.
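The class frequencies behind such a chart can be obtained with value_counts, as in the sketch
below. The label series is synthetic: its counts are illustrative values chosen to be
consistent with the approximate frequencies described above and to sum to 2126, and they are
not read from the data file.

```python
import pandas as pd

# Synthetic labels with illustrative counts (assumption, not read from the file).
labels = pd.Series(
    ["normal"] * 1655 + ["suspect"] * 295 + ["pathological"] * 176,
    name="fetal_health",
)

counts = labels.value_counts()               # sorted from most to least frequent
shares = (counts / counts.sum()).round(3)    # fraction of the dataset per class
print(pd.DataFrame({"count": counts, "share": shares}))

# A bar chart like the one described could then be drawn with counts.plot.bar(),
# which requires matplotlib to be installed.
```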
Exploring correlation among features:
Before fitting a machine learning model, it is good practice to examine the relationships
between the features so that suitable models can be chosen.
The above correlation heatmap explores the dependence among the features by coloring
high positive correlation with high intensity red, high negative correlation with high intensity
blue, and almost uncorrelated features with a color close to white. The majority of
features are largely uncorrelated, since the Pearson correlation coefficient is close to
zero; however, strong positive and strong negative correlations are seen between specific
features (Bisong, 2019). As a result, the features cannot all be assumed to be
independent of one another, and models that rely on feature independence to predict
the class (such as the naive Bayes classifier) are inappropriate for the dataset.
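The correlation analysis can be sketched as below. The data are synthetic (feature b is
deliberately constructed to correlate with a), and the helper strongly_correlated_pairs and
its 0.8 threshold are assumptions introduced for illustration rather than part of the
original notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
base = rng.normal(size=n)
# Synthetic features: "b" is built to correlate strongly with "a"; "c" is independent.
features = pd.DataFrame({
    "a": base,
    "b": base + 0.1 * rng.normal(size=n),
    "c": rng.normal(size=n),
})

corr = features.corr(method="pearson")  # pairwise Pearson correlation matrix


def strongly_correlated_pairs(corr: pd.DataFrame, threshold: float = 0.8):
    """Return (feature_i, feature_j, r) for each pair with |r| >= threshold."""
    pairs = []
    cols = list(corr.columns)
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = float(corr.iloc[i, j])
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], round(r, 3)))
    return pairs


pairs = strongly_correlated_pairs(corr)
print(pairs)
```

A red-white-blue heatmap of the same matrix could be produced with, for example,
seaborn.heatmap(corr, cmap="coolwarm"), assuming seaborn is installed.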
Train-test split in 70 : 30 ratio:
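A 70:30 split of this kind is commonly written with scikit-learn's train_test_split, as in
the sketch below. The data are synthetic, and the use of stratify and random_state is an
assumption of this example; the report does not state whether the original split was
stratified.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))                 # synthetic feature matrix
y = np.array([1] * 78 + [2] * 14 + [3] * 8)   # synthetic, imbalanced class codes

# test_size=0.3 gives the 70:30 ratio. stratify=y keeps the class proportions
# similar in both parts (assumed here, not confirmed by the report).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```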
Random forest, logistic regression, gradient boosting, k-nearest neighbor, and
support vector machine are the machine learning models that have been chosen to fit. The
random forest was chosen since it generally delivers high accuracy across many datasets,
because it averages many randomized decision trees during training and is less affected by
outliers in the data. Logistic regression is a suitable choice since it makes few assumptions
about the distribution of the features and is simple to train and deploy. The gradient
boosting model builds trees sequentially, each one fitted to the gradient of the loss of the
current ensemble, so high performance is expected. KNN performs better with
normalized data and can generate decent results when the value of K is suitably selected.
When there is a clear margin of separation between classes, the support vector machine is
useful for class separation, and it remains effective when the feature dimension is large.
Random forest classifier model fitting and evaluation after standardization and
RFECV feature selection:
The first model, random forest, is fitted in a pipeline: the features are standardized,
then recursive feature elimination with 4-fold cross validation (RFECV) is applied using an
Extra Trees classifier as the estimator, and finally the random forest is fitted on the
remaining features. Essentially, features are removed step by step based on the estimator's
importance scores, with 4-fold cross validation used to select the number of features that
gives the best cross-validated score. The random forest is then fitted with a maximum tree
depth of 10 and a Minimal Cost-Complexity Pruning parameter of 10^-3.
These parameters were determined after numerous runs to provide the best possible
performance on the test set.
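The pipeline described above might look roughly like the following sketch. Only cv=4,
max_depth=10, and ccp_alpha=1e-3 come from the text; the step names, the number of trees,
step=1, the random seeds, and the synthetic data are assumptions made so that the example is
self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_rf_pipeline(random_state: int = 0) -> Pipeline:
    """Standardize -> RFECV (Extra Trees, 4-fold CV) -> random forest."""
    selector = RFECV(
        estimator=ExtraTreesClassifier(n_estimators=50, random_state=random_state),
        step=1,  # drop one feature per iteration (assumed)
        cv=4,    # 4-fold cross validation, as stated in the text
    )
    forest = RandomForestClassifier(
        max_depth=10,    # maximum tree depth from the text
        ccp_alpha=1e-3,  # minimal cost-complexity pruning parameter from the text
        random_state=random_state,
    )
    return Pipeline([("scale", StandardScaler()), ("select", selector), ("clf", forest)])


# Synthetic three-class data standing in for the CTG features.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_classes=3, random_state=0
)
pipe = build_rf_pipeline().fit(X, y)
selector = pipe.named_steps["select"]
print("features kept:", selector.n_features_)
print("ranking (1 = kept):", selector.ranking_)
```

In RFECV the kept features all receive rank 1, and larger rank values indicate features that
were eliminated earlier.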
Number of features selected by recursive feature elimination with
CV=4: 9
List of selected features:
['accelerations', 'uterine_contractions', 'prolongued_decelerations',
'abnormal_short_term_variability', 'mean_value_of_short_term_variability',
'percentage_of_time_with_abnormal_long_term_variability',
'histogram_mode', 'histogram_mean', 'histogram_median']
Ranking of features that are not selected
It can be observed that the RFECV step selected a total of 9 features for model fitting,
which are listed above together with the ranking of the features that were removed
(Mustaqim, et al., 2021). The feature severe decelerations is ranked last (it is eliminated
first), whereas baseline value is ranked best among the removed features (it is eliminated
last).
After fitting, the random forest is assessed on the test set in terms of several
metrics: accuracy, precision, recall, f1 score, MCC, and the average area under the ROC
curves (AUC) across all classes, as shown above. These metrics are obtained from the
confusion matrix and the class probabilities (Wang, et al., 2020).
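The sketch below shows one way these metrics can be computed with scikit-learn. It is an
illustration on synthetic data, not the original evaluation code; the helper name evaluate
and the choice of one-vs-rest macro averaging for the AUC are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    matthews_corrcoef,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split


def evaluate(model, X_test, y_test) -> dict:
    """Compute the metrics named in the text for a fitted classifier."""
    y_pred = model.predict(X_test)
    y_proba = model.predict_proba(X_test)  # class probabilities, used for the AUC
    return {
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "accuracy": accuracy_score(y_test, y_pred),
        "mcc": matthews_corrcoef(y_test, y_pred),
        # One-vs-rest AUC averaged over the classes (macro averaging assumed).
        "auc_ovr": roc_auc_score(y_test, y_proba, multi_class="ovr", average="macro"),
        "report": classification_report(y_test, y_pred),
    }


# Synthetic three-class data standing in for the CTG features.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
model = RandomForestClassifier(max_depth=10, ccp_alpha=1e-3, random_state=0).fit(X_tr, y_tr)
results = evaluate(model, X_te, y_te)
print(results["report"])
print("MCC:", round(results["mcc"], 4), "| AUC:", round(results["auc_ovr"], 4))
```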
The evaluation method for the other models is the same as above; the fitting process
differs only in that the relevant classifier is placed in the last stage of the
pipeline, while the previous steps are kept exactly the same for a fair comparison
of models. The parameters of the models are picked after multiple trial-and-error runs with
different parameters, and the set that delivered the best result on the test set is chosen, as
shown below.
Logistic regression classifier model fitting:
Gradient boosting classifier model fitting:
K nearest neighbour classifier model fitting:
Support vector machine classifier model fitting:
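Swapping the final stage of the pipeline for each of the four remaining classifiers can be
sketched as follows. The hyperparameter values are placeholders rather than the tuned values
from the report, which are not given in the text, and the names build_pipeline and
classifiers are introduced for this example.

```python
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_pipeline(classifier, random_state: int = 0) -> Pipeline:
    """Same preprocessing for every model; only the last step changes."""
    return Pipeline([
        ("scale", StandardScaler()),
        ("select", RFECV(ExtraTreesClassifier(random_state=random_state), cv=4)),
        ("clf", classifier),
    ])


# Placeholder settings, not the tuned values used in the report.
classifiers = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
    # probability=True makes predict_proba available for ROC curves.
    "svm": SVC(probability=True, random_state=0),
}
pipelines = {name: build_pipeline(clf) for name, clf in classifiers.items()}
for name, pipe in pipelines.items():
    print(name, "->", type(pipe.named_steps["clf"]).__name__)
```

Each pipeline would then be fitted with pipe.fit(X_train, y_train) and evaluated on the same
test set as the random forest.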
Section 3: Evaluation
Section 3.1 Report on the Evaluation
The assessment results of the five fitted models in the defined pipeline are now
shown and compared in this part to choose the best model for fetal health prediction. The
previously mentioned assessment metrics are supplied in the form of a classification report,
receiver operating characteristic curves for each of the three classes, and the overall
average area under those curves. Furthermore, all of the reported assessment findings are
reproducible since particular random states are employed in models that use randomness in
the fitting process.
Evaluation results for random forest classifier:
Confusion matrix:
Classification report in test set:
Matthews correlation coefficient: 0.8120
ROC curves of all classes:
According to the random forest evaluation results, the model has an overall
accuracy of 94%, with 492, 38, and 67 correctly identified cases of the normal,
pathological, and suspect classes, respectively. Precision, recall, and f1 score are
higher for the normal class than for the other classes. The MCC is 0.812, which is reasonably
good, and the average AUC across all classes is 0.9882, indicating a high true positive
rate at low false positive rates (Noshad, et al., 2019). However, the area under the ROC
curve is greatest for the normal class and smallest for the pathological class, showing
that the model is somewhat better at predicting the normal class than the other classes.
The macro average scores of the metrics are also slightly lower than the weighted average
scores, because the weighted average weights each class by its number of instances and the
majority normal class scores highest.
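The difference between the two kinds of average in a classification report can be illustrated
with a few lines of arithmetic. The per-class scores and supports below are made-up values
for illustration only and are not taken from the report's classification reports.

```python
# Illustrative per-class f1 scores and test-set supports (made-up values).
f1 = {"normal": 0.97, "suspect": 0.80, "pathological": 0.88}
support = {"normal": 497, "suspect": 88, "pathological": 53}

# Macro average: every class counts equally.
macro_avg = sum(f1.values()) / len(f1)

# Weighted average: every class counts in proportion to its number of instances.
weighted_avg = sum(f1[c] * support[c] for c in f1) / sum(support.values())

print(f"macro average:    {macro_avg:.3f}")
print(f"weighted average: {weighted_avg:.3f}")
```

Because the majority class has both the largest support and the highest score in this
example, the weighted average comes out above the macro average.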
Evaluation results for logistic regression classifier:
Classification report in test set:
Matthews correlation coefficient: 0.6702
ROC curves of all classes:
According to the logistic regression evaluation results provided above, the model
has an overall accuracy of 88%, with 476, 32, and 56 correctly identified cases of the
normal, pathological, and suspect classes, respectively, which is regarded as reasonably
good performance. Precision, recall, and f1 score are much higher for the normal class than
for the other classes. The MCC is 0.6702, which is moderate, and the average AUC across all
classes is 0.9577, indicating a high true positive rate at low false positive rates (Chen, et
al., 2019). However, the area under the ROC curve is greatest for the normal class and
smallest for the pathological class, showing that the model is somewhat better at predicting
the normal class than the other classes. The macro average scores of the metrics are also
much lower than the weighted average scores, because the weighted average weights each
class by its number of instances and the majority normal class scores highest.
Evaluation results for gradient boosting classifier:
Classification report in test set:
Matthews correlation coefficient: 0.8580
The gradient boosting classifier evaluation results reveal that the model has an overall
accuracy of 95%, with 490, 40, and 76 correctly identified instances of the normal,
pathological, and suspect classes, respectively, which is regarded as very good
performance. Precision, recall, and f1 score are somewhat higher for the normal class than
for the other classes. The MCC is 0.858, which is good (showing a strong correlation between
predictions and actual class labels in the test set), and the average AUC across all classes
is 0.9879, indicating a high true positive rate at low false positive rates (Bynagari &
Ahmed, 2021). However, the area under the ROC curve is greatest for the normal class and
smallest for the pathological class, showing that the model is somewhat better at predicting
the normal class than the other classes. The macro average scores of the metrics are also
slightly lower than the weighted average scores, because the weighted average weights each
class by its number of instances and the majority normal class scores highest.
Evaluation results for K nearest neighbour classifier:
Classification report in test set:
Matthews correlation coefficient: 0.7823
According to the evaluation results of the KNN classifier, the model has an overall accuracy
of 92%, with 486, 34, and 69 correctly identified instances of the normal, pathological, and
suspect classes, respectively, which is regarded as good performance. Precision, recall, and
f1 score are much higher for the normal class than for the other classes. The MCC is 0.7823,
which is fairly good, and the average AUC across all classes is 0.9544, indicating a high
true positive rate at low false positive rates (Gou, et al., 2019). However, the area under
the ROC curve for the normal class is far greater than the areas for the other classes. The
macro average scores of the metrics are also slightly lower than the weighted average scores,
because the weighted average weights each class by its number of instances and the majority
normal class scores highest.
Evaluation results for support vector machine classifier:
Confusion matrix:
Classification report in test set:
Matthews correlation coefficient: 0.7896
According to the SVM classifier evaluation results, the model has an overall accuracy of
92%, with 482, 37, and 71 correctly identified cases of the normal, pathological, and
suspect classes, respectively, which is regarded as good performance. Precision, recall, and
f1 score are much higher for the normal class than for the other classes. The MCC is 0.7896,
which is fairly good, and the average AUC across all classes is 0.9787, indicating a high
true positive rate at low false positive rates (Alam, et al., 2021). However, the area under
the ROC curve is greatest for the normal class and smallest for the pathological class (the
ROC curves of the suspect and pathological classes are also observed to cross at high true
positive and false positive rates). The macro average scores of the metrics are slightly
lower than the weighted average scores, because the weighted average weights each class by
its number of instances and the majority normal class scores highest.
Hence, comparing the performances and taking all metrics into consideration, random
forest performs best overall on the test set: it attains the highest average AUC (0.9882)
and an accuracy (94%) close to the highest reported (95%, gradient boosting), and the
differences in metric scores between classes are comparatively lower than with the other
models. Hence, among all the models developed in the prototype, random forest is
recommended for fetal health prediction from the Cardiotocogram features of the fetus and
the mother.
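The headline figures quoted in the evaluation paragraphs above can be collected side by side
with a short script such as the following sketch. The numbers are copied from the text; the
dictionary layout and the variable names are chosen here for illustration.

```python
# (accuracy, MCC, average AUC) as quoted in the evaluation paragraphs.
reported = {
    "random forest": (0.94, 0.8120, 0.9882),
    "logistic regression": (0.88, 0.6702, 0.9577),
    "gradient boosting": (0.95, 0.8580, 0.9879),
    "k-nearest neighbour": (0.92, 0.7823, 0.9544),
    "support vector machine": (0.92, 0.7896, 0.9787),
}

# Print the models ordered by average AUC, highest first.
print(f"{'model':<24}{'accuracy':>10}{'MCC':>10}{'AUC':>10}")
for name, (acc, mcc, auc) in sorted(
    reported.items(), key=lambda item: item[1][2], reverse=True
):
    print(f"{name:<24}{acc:>10.2f}{mcc:>10.4f}{auc:>10.4f}")

best_by_auc = max(reported, key=lambda m: reported[m][2])
best_by_mcc = max(reported, key=lambda m: reported[m][1])
best_by_accuracy = max(reported, key=lambda m: reported[m][0])
```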
Conclusion:
In conclusion, a prototype for fetal health prediction from numeric features derived from
cardiotocogram tests has been constructed using multiple machine learning classifier
models. The performance of the models is validated on a held-out subset of the obtained
data, known as the test set, and random forest is found to perform best overall. After
initializing the prototype models with default settings, the model parameters are adjusted
repeatedly until the metric scores no longer improve significantly, so each version of the
prototype is developed from its previous iteration. The models are chosen based on machine
learning theory and on their use by earlier researchers to develop classification models for
various applications, and the results from the models are generally adequate. The parameters
to be adjusted are also chosen based on what is known to have a substantial influence on
performance. All proposed models, including pre-processing and feature selection, run in
computationally viable time and thus may all be used for fetal health detection in actual
systems; however, random forest is recommended. Models are often used with all features or
with simple methods such as a variance threshold or PCA; in this case, a more sophisticated
feature selection strategy, recursive feature elimination with cross validation, is applied
to choose the feature set. The advantage of this strategy is that the features most
associated with the target are chosen based on estimator fitting, and the cross validation
reduces the dependence of feature selection on the particular samples used for training.
Although the models' performances are shown to be good (particularly for random
forest), the models may not be fully optimized due to the huge parameter space of each of
them and the infeasibility of optimizing all of them by brute force. Future work could
therefore develop an intelligent algorithm for optimizing models with large parameter
spaces; nevertheless, performance is not expected to improve considerably as a result.
Furthermore, because the dataset on which the models are fitted is small, sampling error
can be severe when predicting for large amounts of unlabelled data. By gathering
Cardiotocogram features from more mothers and fetuses, the models could be retrained and
validated on bigger datasets. In addition, information about regional location, hospital,
specific instruments used, and age group is not provided in the data source, so using the
models to predict fetal health in different regions can result in inaccurate predictions,
and it is not recommended to apply the fitted models to arbitrary fetal health data
available on the web. Other, more complex machine learning models, such as various neural
networks, could also be employed to increase performance; however, those models have a high
computational cost and require substantial parameter tuning to produce adequate results, so
they are not used here.
References
Akbulut, A., Ertugrul, E. & Topcu, V., 2018. Fetal health status prediction based on
maternal clinical history using machine learning techniques. Computer Methods and Programs
in Biomedicine, Volume 163, pp. 87-100.
Alam, S., Sonbhadra, S. K., Agarwal, S. & [Link], 2021. One-class support vector
classifiers: A survey. Knowledge-Based Systems, Volume 196.
Attallah, O., Gadelkarim, H. & Sharkas, M. A., 2018. Detecting and Classifying Fetal Brain
Abnormalities Using Machine Learning Techniques. s.l., IEEE.
Bisong, E., 2019. Building Machine Learning and Deep Learning Models on Google Cloud
Platform. s.l.:Apress.
Bynagari, N. B. & Ahmed, A. A. A., 2021. Anti-Money Laundering Recognition through
the Gradient Boosting Classifier. Academy of Accounting and Financial, Volume 25, pp. 1-11.
Chen, W., Shahabi, H., Shirzadi, A. & Hong, H., 2019. Novel hybrid artificial intelligence
approach of bivariate statistical-methods-based kernel logistic regression classifier for landslide
susceptibility modeling. Bulletin of Engineering Geology and the Environment, Volume 6, p. 78.
Gou, J. et al., 2019. A generalized mean distance-based k-nearest neighbor classifier.
Expert Systems with Applications, Volume 115, pp. 356-372.
Molla, M. M. I. et al., 2021. Cardiotocogram Data Classification Using Random Forest
Based Machine Learning Algorithm. s.l., Proceedings of the 11th National Technical Seminar on
Unmanned System Technology 2019.
Mustaqim, A. Z., Adi, S., Pristyanto, Y. & Astuti, Y., 2021. The Effect of Recursive Feature
Elimination with Cross-Validation (RFECV) Feature Selection Algorithm toward Classifier
Performance on Credit Card Fraud Detection. International Conference on Artificial Intelligence
and Computer Science Technology (ICAICST), pp. 270-275.
NabillahRahmayanti, HumairaPradani, MuhammadPahlawan & RetnoVinarti, 2021.
Comparison of machine learning algorithms to classify fetal health using cardiotocogram data.
s.l., Procedia Computer Science.
Noshad, Z. et al., 2019. Fault Detection in Wireless Sensor Networks through the
Random Forest Classifier. Sensor, 19(7).
O'Sullivan, M. et al., 2021. Classification of fetal compromise during labour: signal
processing and feature. s.l., 2021 29th European Signal Processing Conference (EUSIPCO).
Ramaravind Kommiya Mothilal, A. S. C. T., 2020. Explaining Machine Learning Classifiers
through Diverse Counterfactual Explanations. Conference on Fairness, Accountability, and
Transparency, pp. 607-617.
Ricciardi, C., Cesarelli, G., Ponsiglione, A. M. & Amato, F., 2020. Classifying the type of
delivery from cardiotocographic signals: A machine learning approach. s.l., Computer Methods
and Programs in Biomedicine.
Siddiqui, M. K., Morales-Menendez, R., Huang, X. & Hussain, N., 2020. A review of
epileptic seizure detection using machine learning classifiers. Brain informatics, Volume 7, pp.
1-18.
Signorini, M., Pini, N., Malovini, A. & Bellazzi, R., 2019. Integrating Machine Learning
Techniques and Physiology Based Heart Rate Features for Antepartum Fetal Monitoring. s.l.,
Computer Methods and Programs in Biomedicine.
Subasia, A., Kadasaa, B. & Kremicb, E., 2020. Classification of the Cardiotocogram Data
for Anticipation of Fetal Risks using Bagging Ensemble Classifier. Procedia Computer Science,
Volume 168, pp. 34-39.
Wang, D. et al., 2020. Classification, experimental assessment, modeling methods and
evaluation metrics of Trombe walls. Renewable and Sustainable Energy Reviews, Volume 124.