Breast Cancer Detection via Data Mining
All content following this page was uploaded by Adnan Mohsin Abdulazeez on 19 May 2021.
Authors’ contributions
This work was carried out in collaboration among all authors. Author SFK managed the literature
searches related to breast cancer classification and wrote the first draft of the manuscript. Author
AMA proposed the idea and designed the study. Author ABS performed the statistical analysis of the
data and discussed the results. All authors read and approved the final manuscript.
Article Information
DOI: 10.9734/AJRCOS/2021/v8i430209
Editor(s):
(1) Dr. G. Sudheer, GVP College of Engineering for Women, India.
Reviewers:
(1) S. Rajasekaran, University of Technology and Applied Sciences-Ibri, Oman.
(2) D. Mallikarjuna Reddy, VIT University, India.
(3) Tesfay Gidey Hailu, Addis Ababa Science and Technology University, Ethiopia.
Complete Peer review History: [Link]
ABSTRACT
Breast cancer is one of the most common diseases among women, accounting for many deaths
each year. Even though cancer can be treated and cured in its early stages, many patients are
diagnosed at a late stage. Data mining is the process of finding or extracting information from
massive databases or datasets, and it is a field of computer science with a lot of potential. It
covers a wide range of areas, one of which is classification. Classification can be
accomplished using a variety of methods or algorithms. With the aid of MATLAB, five classification
algorithms were compared. This paper presents a performance comparison among the classifiers:
Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbors (K-NN), Weighted
K-Nearest Neighbors (Weighted K-NN), and Gaussian Naïve Bayes (Gaussian NB). The dataset
was taken from the UCI Machine Learning Repository. The main objective of this study is to classify
breast cancer cases using machine learning algorithms and to compare the algorithms based on their
accuracy. The results reveal that Weighted K-NN (96.7%) has the highest accuracy among
all the classifiers.
_____________________________________________________________________________________________________
Keywords: Breast cancer; data mining; SVM; logistic regression; weighted K-NN; Gaussian Naïve
Bayes.
Khorshid et al.; AJRCOS, 8(4): 45-59, 2021; Article no. [Link].68450
3) DM methods are used in web education

In the field of web education, DM techniques are used to upgrade courseware. The connections are discovered by looking at the consumption data collected during students' sessions. This knowledge is extremely beneficial to the course's instructor or author, who can determine which changes are most necessary to increase the course's effectiveness. In the twenty-first century, beginners use DM techniques, which are among the most powerful learning methods available. This allows learners to become more conscious of their surroundings. The application of DM techniques to educational chats is feasible and can improve learning environments in the twenty-first century [40].

4) In agriculture

Scientists and researchers around the world are working on how to make agriculture safe and resilient in the face of continuing conditions and environmental change. Transitional and multidisciplinary approaches are needed in the agricultural system. To raise production and efficiency while working with the same limited resources, intelligent and precision agricultural approaches were prioritized [43]. The strategy requires the collection of data from a variety of sources and the effective application of that data in the appropriate area. As a result of this need, there has been an increase in interest in extracting information from large troves of data resulting from various research and survey projects. DM techniques advanced the concept of knowledge generation and pattern recognition when they first appeared. Even though DM is a new science, it has a wide range of applications in agriculture and related industries, and it has a bright future [44].

5. RELATED WORK

Singh et al. [45] compared the performance of different classifiers: DT classifiers (J4.8, Simple CART) and Bayes classifiers (NB, Bayesian LR), which were the most popular DM algorithms for BC classification. The aim of the paper was to determine which classifier produces the most reliable results for the Wisconsin Breast Cancer (original) dataset (WBCO). The BC dataset was taken from the UCI ML repository and analyzed using the WEKA tool. The experimental results show that the DT classifier, i.e., Simple CART (98.13 percent), has the highest accuracy of all the classifiers.

Bataineh et al. [46] compared five nonlinear algorithms for BC detection: K-NN, Multi-Layer Perceptron (MLP), Classification and Regression Tree (CART), Gaussian NB, and SVM. The author's main goal was to compare the performance and efficacy of BC detection algorithms. The accuracy of each algorithm was also calculated separately. The Wisconsin BC diagnostic dataset (WBCD) was used in the study. To calculate the accuracy of each algorithm, the author used the K-fold validation process. MLP outperformed the K-NN, CART, and NB algorithms with an accuracy of 96.70 percent.

Sinha et al. [47] introduced attribute filtering strategies, such as frequent itemset mining, to identify the most important and applicable attributes from the Wisconsin BC dataset using a classification algorithm such as SVM. Attribute filtering was also used to compare NB, K-NN, and DT. With attribute filtering, SVM generated the highest area under the curve compared to the other classification techniques, resulting in better accuracy and a better ROC curve.

Bharati et al. [48] presented the capability of the NB, Random Forest (RF), LR, MLP, and K-NN classifiers in evaluating the BC dataset from the UCI repository to predict the existence of BC. Kappa statistics, TP rate, FP rate, and other metrics were all investigated, and the efficiency of the K-NN classifier was observed.

Ghani et al. [49] used anthropometric data and parameters obtained during routine blood processing to predict BC. Using the recursive feature elimination process, they first identified the most relevant attributes in the dataset that could be used as biomarkers. They found that the best biomarkers for BC are age, BMI, glucose, HOMA, and resistance. K-NN, ANN, DT, and NB classification techniques were used for classification. ANN was found to be the most accurate, with an accuracy of 80.00 percent.

Basunia et al. [50] proposed a stacking classifier ensemble approach that effectively classifies benign and malignant tumors by combining multiple classification techniques. Their experiment used the “Wisconsin Diagnosis BC”
dataset from the UC Irvine Machine Learning Repository. They chose the top 20 features for BC prediction using the Univariate Feature Selection process. Jupyter Notebook was used with some Python open-source libraries to implement various classification techniques such as CART, LR, K-NN, SVM, RF, and the Stacking Classifier. The overall outcomes indicate that the Stacking Classifier has the highest accuracy, 97.20%.

Saoud et al. [51] used feature selection techniques to enhance the accuracy of six algorithms for BC classification and diagnosis: Bayes Network (BN), SVM, K-NN, ANN, DT (C4.5), and LR. They used both the WBC and WBCD databases. The feature selection technique increased the accuracy of some classifiers, such as BN, in both WBCD and WBC. However, some classifiers, such as SVM, had their accuracy reduced by the feature selection technique. BN with feature selection is the best model for classifying BC in WBC, while SVM without feature selection is the best for WBCD.

Kumar et al. [52] used two BC datasets taken from the UCI Machine Learning repository. Seven algorithms were applied to both datasets: Bayes network, NB, SVM, K-NN, DT, RF, and MLP. The two datasets have 11 and 32 features, respectively. Each dataset was split into two parts: the training data accounts for 65 percent of the overall dataset, while the evaluation data accounts for the remaining 35 percent. The accuracy of the Bayesian Network technique on the BCDW 11 dataset was 97.13 percent, while that of the SVM technique on the WBCD dataset was 97.89 percent.

Sakri et al. [53] proposed integrating a feature selection algorithm with classification algorithms in BC prognosis. They claimed that using feature selection techniques to reduce the number of features can improve most classification algorithms, because some features are more significant and have a greater impact on the classification results than others. They presented the results of their experiments with and without the feature selection algorithm, particle swarm optimization (PSO), on three common classification algorithms, namely NB, K-NN, and REP Tree. NB obtained better results than the other two techniques both with and without PSO, while the other two techniques performed better with PSO than without it.

Sudha et al. [54] suggested an improved lion optimisation algorithm (ILOA) technique that can identify small feature subsets quickly and accurately to classify the BC dataset. A total of 500 mammogram images (288 benign and 212 malignant) were used as a case sample in the study. After segmentation using a region growing algorithm, each mass was represented with 123 features, including 96 texture features, 9 histogram features, 11 shape features, and 7 radial distance features. The feature selection technique was evaluated with a minimum distance classifier, a K-NN classifier, and an SVM classifier. Compared to the other algorithms, ILOA with the K-NN classifier performed well for BC classification [55].

6. MATERIALS AND METHODS

In the proposed model, several classifiers (LR, fine K-NN, linear SVM, weighted K-NN, and Gaussian NB) were used to classify breast cancer tumors with high accuracy and efficiency using nine features. The dataset was taken from the UCI machine learning repository. The proposed model goes through five main stages (Data Processing, Validation Choosing, Classification, and Evaluating the Results), as demonstrated in Fig. 2, which shows the flowchart of the proposed model. Preprocessing is applied to missing feature values: in the Single Epithelial Cell Size feature, there are 16 instances in Groups 1 to 6 of the breast cancer dataset that contain a single missing (i.e., unavailable) attribute value, denoted by “?”. To test the predictive accuracy of the fitted models, 10-fold cross-validation was used, with MATLAB as the classification tool. After classification with all five algorithms, performance was measured by the confusion matrix and ROC area.

6.1 Dataset

The data for this study were obtained from the UCI Machine Learning repository, in the BC Wisconsin sub-directory, with 699 examples, two classes (malignant and benign), and 9 integer-valued attributes (as shown in Table 1). In the UCI breast cancer dataset (dataset link: [Link]+cancer), the class distribution is as follows:

1- Benign class: 458 instances (65.5%).
2- Malignant class: 241 instances (34.5%).
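The handling of “?” values and the class distribution described in Sections 6 and 6.1 can be sketched in Python. This is only an illustration under stated assumptions: the study itself used MATLAB, the text does not say how the missing values were treated (dropping incomplete records is one assumed option), the column names are hypothetical identifiers derived from Table 1, and the three sample records are made up.

```python
import csv
import io
from collections import Counter

# Hypothetical column names following the attribute order of Table 1.
COLUMNS = [
    "sample_code_number", "clump_thickness", "uniformity_of_cell_size",
    "uniformity_of_cell_shape", "marginal_adhesion",
    "single_epithelial_cell_size", "bare_nuclei", "bland_chromatin",
    "normal_nucleoli", "mitoses", "class",
]
CLASS_NAMES = {2: "benign", 4: "malignant"}  # class coding from Table 1


def parse_rows(text):
    """Parse comma-separated records; a '?' field becomes None (missing)."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        values = [None if f.strip() == "?" else int(f) for f in fields]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def drop_incomplete(rows):
    """Keep only records without missing values (an assumed strategy)."""
    return [r for r in rows if None not in r.values()]


def class_distribution(counts):
    """Return {class name: percentage}, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Made-up records in the Table 1 layout, for illustration only.
SAMPLE = """\
1000001,5,1,1,1,2,1,3,1,1,2
1000002,8,10,10,8,7,10,9,7,1,4
1000003,6,8,8,1,3,?,3,7,1,2
"""

rows = parse_rows(SAMPLE)
complete = drop_incomplete(rows)
labels = Counter(CLASS_NAMES[r["class"]] for r in complete)

# Class counts reported in Section 6.1 (458 benign, 241 malignant).
reported = class_distribution({"benign": 458, "malignant": 241})
print(len(rows), len(complete), dict(labels), reported)
```

Replacing a missing value with a column statistic instead of dropping the record would be an equally plausible choice; the sketch only shows where such a decision enters the pipeline.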
Attribute                         Domain
1) Sample code number             ID number
2) Clump Thickness                1-10
3) Uniformity of cell size        1-10
4) Uniformity of cell shape       1-10
5) Marginal Adhesion              1-10
6) Single Epithelial cell size    1-10
7) Bare Nuclei                    1-10
8) Bland Chromatin                1-10
9) Normal Nucleoli                1-10
10) Mitoses                       1-10
11) Class                         2 for benign, 4 for malignant
6.2.2 Logistic Regression (LR)

LR is one of the most widely used generalized linear models in DM [62]. LR predicts the probability of an outcome that can take two values from a collection of predictor variables. It is primarily used for predicting and calculating performance probabilities [63].

6.2.3 K-Nearest Neighbors (K-NN)

K-NN is a simple instance-based learning algorithm that classifies objects in the feature space according to their closest training examples [64], [65]. An object is assigned to the class that is most common among its K nearest neighbors. To find the closest neighbors, the K-NN algorithm uses the Euclidean distance metric [66], [67]. The equation below measures the Euclidean distance d(x, y) between two points x and y, where n denotes the number of features, with x = (x1, x2, x3, ..., xn) and y = (y1, y2, y3, ..., yn) [68].

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (2)$$

6.2.4 Weighted K-Nearest-Neighbors (Weighted K-NN)

This extension is based on the idea that observations in the learning set that are especially similar to the new observation (y, x) should be given more weight in the decision than observations that are far away from the new observation (y, x). This is not the case with K-NN: only the k closest neighbors affect the prediction, but this influence is the same for each of these neighbors, regardless of their individual similarity to (y, x). To give closer neighbors more weight, the distances used in the first stage of the search for closest neighbors must be transformed into similarity measurements that can be used as weights [69]. Weighted K-NN assigns weights to each calculated distance, then computes the nearest neighbors, and finally assigns the class to the processed instance [70], [71], [72].

6.2.5 Gaussian Naïve Bayes (Gaussian NB)

The Gaussian NB algorithm, a special form of the NB algorithm, is used for classification [73]. This method is particularly useful when the features have continuous values or follow a Gaussian (normal) distribution. The likelihood of the features is assumed to be Gaussian [74]:

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right) \qquad (3)$$

In equation (3), x_i is a continuous data variable, and the parameters \mu_y and \sigma_y are calculated using maximum likelihood estimation. After the data has been segmented by class, the mean and variance of each feature are measured.

6.3 The Evaluation Metrics of the Classifiers' Performance

6.3.1 Confusion matrix

The confusion matrix (also called the "contingency matrix") provides a good overview of a classifier's performance. Table 2 shows a standard confusion matrix.
                          Predicted Class
                          Positive    Negative
Actual Class   Positive   TP          FN
               Negative   FP          TN
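The four cells of the confusion matrix are enough to derive the usual summary metrics. The following Python sketch shows the standard definitions; the function name and the example counts are illustrative assumptions and are not results from this study.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Derive common metrics from the four confusion-matrix cells."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,     # share of all correct predictions
        "sensitivity": tp / (tp + fn),     # true positive rate (recall)
        "specificity": tn / (tn + fp),     # true negative rate
        "precision": tp / (tp + fp),       # share of positive calls that are right
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = confusion_metrics(tp=90, fn=10, fp=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are also the two quantities traced by the ROC curve as the decision threshold varies, which is why the confusion matrix and the ROC area are reported together.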
6.3.2 Receiver operating characteristics area or ROC area
benign. We also have 229 true malignant and 12 false malignant patients. Fig. 3 (b) illustrates the ROC curve plot. The area under the curve (AUC) is 0.99.

Fig. 8. Comparison of accuracies between all the classifiers

The overall outcomes displayed in Table 3 indicate that the Weighted K-NN classifier has the highest accuracy, 96.7%, whereas Fine K-NN has the lowest accuracy, 95.4%. The results also show that the Weighted K-NN classifier has the best training time (0.5096 sec) and an ROC area of 0.99. According to these findings, the Weighted K-NN method is the best of the five proposed classifiers for classifying a tumor as benign or malignant.

8. COMPARATIVE STUDY

The comparison summary of the related works is shown in Table 4. The researchers in the related papers used various feature selection techniques and classification methods, as well as different datasets with different numbers of
Model no.   Model Type            Accuracy   Training time (sec)   Area under ROC curve
1           Logistic regression   96.1%      5.0739                0.99
2           Linear SVM            96.6%      2.4899                1.00
3           Fine K-NN             95.4%      1.1064                0.95
4           Weighted K-NN         96.7%      0.5096                0.99
5           Gaussian NB           96.1%      1.2643                0.98
R# | Classifier | Tool | Dataset | Number of attributes | Data type | Data processing method | Evaluation method | Validation technique | Accuracy (same order as classifiers)
[45] | Naive Bayes; Bayesian Logistic Regression; Simple CART; J48 | WEKA | UCI repository | 11 | Numeric (discrete value) | Kappa statistics, mean absolute error | Performance classifiers | - | 95.26%; 65.42%; 98.13%; 97.27%
[46] | Multilayer Perceptron; K-Nearest Neighbours; CART; Gaussian Naive Bayes; Support Vector Machine | MATLAB | UCI repository, WDBC dataset | 32 | Images | Standardize rescaling method | Binary classification accuracy method | Cross validation | 99.12%; 95.61%; 93.85%; 94.73%; 98.24%
[47] | Support Vector Machine; Naïve Bayes; k-Nearest Neighbours; Decision Tree | PYTHON | UCI repository, WBC | 31 | Numeric (binary value) | z-score normalization | Confusion matrix | - | 96.61%; 96.46%; 91.74%; 90.27%
[48] | K-Nearest Neighbors; Naïve Bayes; Random Forest; Logistic Regression; Multilayer Perceptron | WEKA | UCI repository | 10 | Numeric | Kappa statistics | Binary classification accuracy method | - | 72.37%; 71.67%; 69.58%; 68.8%; 64.68%
[49] | K-Nearest Neighbors; Artificial Neural Networks; Decision Tree; Naive Bayesian | WEKA | UCI repository | 9 | Numeric | Recursive feature elimination for feature selection | Confusion matrix | - | 77.14%; 80.00%; 71.43%; 73.91%
[50] | CART; Logistic Regression; K-Nearest Neighbors; Support Vector Machine; Random Forest; Stacking Classifier | PYTHON | UCI repository | 32 | Numeric | Feature selection | Confusion matrix | Cross validation | 94.74%; 97.08%; 95.91%; 95.91%; 97.08%; 97.20%
[51] | Bayes Network; Support Vector Machine; K-Nearest Neighbors; Artificial Neural Networks; Decision Tree (C4.5); Logistic Regression | WEKA | UCI repository, WBC and WBCD | 9 (WBC), 32 (WBCD) | Numeric | Feature selection | Confusion matrix | Cross validation | With WBC (BN): 97.42%; with WBCD (SVM): 97.36%
[52] | Bayesian Network; Naïve Bayes; SVM; Multilayer Perceptron; K-NN; Decision Tree (J48); Random Forest | WEKA | UCI repository, BCWD and WBCD | 11 (BCWD), 32 (WBCD) | Numeric | Data statistics | Performance classifiers | Cross validation | With BCDW (Bayesian Network): 97.13%; with WBCD (SVM): 97.89%
[53] | Naïve Bayes; K-Nearest Neighbors; Fast decision tree learner (REPTree) | WEKA | UCI repository | 35 | Numeric | Feature selection and extraction | Confusion matrix | Cross validation | 81.3%; 75.0%; 93.6%
[54] | Support Vector Machine; K-Nearest Neighbours | MATLAB | Digital Database for Screening Mammography (DDSM) | -30 | Images | Feature selection and extraction | Performance classifiers | Cross validation | 98.92%; 99.31%
Proposed work | Logistic Regression; Support Vector Machine; Weighted K-NN; K-Nearest Neighbours; Gaussian Naïve Bayes | MATLAB | UCI repository | 9 | Numeric | - | Confusion matrix, ROC area | Cross validation | 96.1%; 96.6%; 96.7%; 95.4%; 96.1%
features. Compared with the previous works, the proposed method achieves high accuracy in the classification of breast cancer. Researchers in [47] and [45] used the WBC (original) dataset to train and test different DM algorithms. They registered accuracies of 96.61% (SVM) and 98.13% (CART), respectively, despite the high execution time of CART. Researchers in [46] used the WDBC dataset with the standardization method to reach 99.12% for MLP. In [48], researchers used fewer attributes and obtained an accuracy of 72.37% for K-NN, 71.67% for NB, 69.58% for RF, and 64.68% for LR. Researchers in [49] obtained 80% for ANN by using the feature selection method. In [50], researchers used the feature selection method to reach 97.20% for the Stacking Classifier. Researchers in [51] used two datasets with a feature selection technique to reach 97.42% from BN for WBC and 97.36% from SVM for WBCD. Researchers in [52] used two different datasets to reach a high accuracy rate of 97.13% from BN for the BCDW dataset and 97.89% from SVM for WBCD. In [53], researchers used many features but achieved a lower accuracy rate (93.6%) for DT. Lastly, researchers in [54] obtained a good accuracy result (98.92%) for K-NN. The proposed work utilized five DM classifiers (Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbors (K-NN), Weighted K-Nearest Neighbors (Weighted K-NN), and Gaussian Naïve Bayes (Gaussian NB)), and the best classifier was Weighted K-NN with 96.7% accuracy.

9. CONCLUSION

This paper attempted to improve the accuracy of breast cancer classification using data mining techniques. In this study, the UCI breast cancer dataset and five data mining algorithms were used for the classification (Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbors (K-NN), Weighted K-Nearest Neighbors (Weighted K-NN), and Gaussian Naïve Bayes (Gaussian NB)). All the experiments were done using MATLAB 2021a. The primary goal is to assess how well each algorithm performs in terms of classification test accuracy. The results were evaluated in terms of the confusion matrix and ROC curve. Experimental results show that the Weighted K-NN classifier has the highest accuracy, 96.7%, whereas Fine K-NN has the lowest accuracy, 95.4%. The remaining four classifiers are, respectively, Linear SVM, LR, Gaussian NB, and Fine K-NN, with accuracies of 96.6%, 96.1%, 96.1%, and 95.4%.

COMPETING INTERESTS

Authors have declared that no competing interests exist.

REFERENCES

1. Abdulqader DM, Abdulazeez AM, Zeebaree DQJML. Machine learning supervised algorithms of gene selection: A Review. 2020;62(03).
2. Ahmed O, Brifcani A. Gene expression classification based on deep learning. In 2019 4th Scientific International Conference Najaf (SICN). 2019;145-149:IEEE.
3. Zeebaree DQ, Haron H, Abdulazeez AM. Gene selection and classification of microarray data using convolutional neural network. In 2018 International Conference on Advanced Science and Engineering (ICOASE). 2018;145-150:IEEE.
4. Eesa AS, Abdulazeez AM, Orman ZJSJoUoZ. A DIDS based on the combination of cuttlefish algorithm and decision tree. 2017;5(4):313-318.
5. Taher KI, Abdulazeez AM, Zebari DAJAJoRiCS. Data mining classification algorithms for analyzing soil data. 2021;17-28.
6. Oskouei RJ, Kor NM, Maleki SAJAjocr. Data mining and medical world: breast cancers’ diagnosis, treatment, prognosis and challenges. 2017;7(3):610.
7. Ibrahim I, Abdulazeez AJJoAS, Trends T. The Role of Machine Learning Algorithms for Diagnosing Diseases. 2021;2(01):10-19.
8. Zebari R, Abdulazeez A, Zeebaree D, Zebari D, Saeed JJJoAS, Trends T. A comprehensive review of dimensionality reduction techniques for feature selection and feature extraction. 2020;1(2):56-70.
9. Charbuty B, Abdulazeez AJJoAS, Trends T. Classification Based on Decision Tree Algorithm for Machine Learning. 2021;2(01):20-28.
10. Sagar M, Vivekkumar G, Reddy M, Devendiran S, Amarnath M. Research on intelligent fault diagnosis of gears using EMD, spectral features and data mining techniques. In IOP Conference Series: Materials Science and Engineering, 2017;263(6):062047: IOP Publishing.
Peer-review history:
The peer review history for this paper can be accessed here:
[Link]
Support Vector Machine (SVM) often reports higher accuracy than Naive Bayes in the studies compared here, since it can fit complex decision boundaries and handle high-dimensional data. Naive Bayes relies on a feature independence assumption, which makes it simple and efficient but may limit it when breast cancer datasets contain interacting features.
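The independence assumption is easiest to see in code: a Gaussian Naive Bayes classifier scores each class by adding one log-likelihood term per feature, using the Gaussian likelihood of Section 6.2.5. The Python sketch below is a minimal illustration and not the MATLAB implementation used in the study; the function names, the small variance floor, and the toy data are all assumptions.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors plus per-class, per-feature mean and variance."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Maximum likelihood variance (divide by n). The 1e-9 floor is an
        # assumption added here to avoid division by zero.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Logarithm of the Gaussian density with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, features):
    """Pick the class with the largest log prior plus summed log-likelihoods."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        scores[label] = math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(features, means, variances)
        )
    return max(scores, key=scores.get)


# Toy two-feature data (made up); labels follow the 2 = benign, 4 = malignant coding.
X = [[1, 1], [2, 1], [1, 2], [8, 9], [9, 8], [10, 10]]
y = [2, 2, 2, 4, 4, 4]
model = fit_gaussian_nb(X, y)
print(predict(model, [2, 2]), predict(model, [9, 9]))
```

Because the per-feature terms are simply summed, any interaction between features is ignored, which is the limitation the paragraph above refers to.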
Anthropometric data and routine blood parameters such as age, BMI, glucose, HOMA, and resistance serve as inputs for predictive models like K-NN and ANN. By highlighting significant biomarkers, these parameters help the models form more precise predictions about breast cancer [49].
Feature selection can affect SVM performance in breast cancer classification both positively and negatively. In [51], feature selection reduced SVM accuracy on the WBCD dataset, which suggests that the technique may remove features that are influential for SVM's predictive capability.
Combining recursive feature elimination with classifiers like K-NN, ANN, DT, and NB helps identify biomarkers such as age, BMI, glucose, HOMA, and resistance by systematically eliminating less significant features. This narrows the model's focus and improves classification accuracy to some extent; ANN, for example, reached 80.00% accuracy [49].
Cross-validation, particularly 10-fold cross-validation, is widely used to check the robustness and accuracy of breast cancer classification models. The data are divided into subsets that take turns as the validation set while the remaining subsets are used for training, which reduces the risk of an overfitted accuracy estimate and gives a more reliable estimate of model performance across different segments of the data.
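The fold bookkeeping behind 10-fold cross-validation can be sketched in a few lines of Python. This is an illustration only: the study used MATLAB's built-in cross-validation, the function names and the fixed shuffle seed are assumptions, and `evaluate` stands for any user-supplied function that trains on one index list and returns a score on the other.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, evaluate, k=10, seed=0):
    """Call evaluate(train_idx, test_idx) once per fold and average the scores."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k


# With the 699 records of the dataset, each record is tested exactly once.
folds = k_fold_indices(699, k=10)
fold_sizes = sorted(len(fold) for fold in folds)
print(fold_sizes)
```

Each record appears in exactly one test fold, so the averaged score uses every record for validation once and for training nine times.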
Among the classifiers compared by Ghani et al. [49], ANN achieved the highest accuracy, 80.00%, which can be attributed to its ability to model complex patterns and interactions within data.
K-Nearest Neighbors (K-NN) performance can vary with dataset characteristics such as size, feature selection method, and noise level. For breast cancer datasets in particular, differences in attribute relevance and preprocessing affect the algorithm's distance-based calculations, which leads to inconsistent accuracy across studies.
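The distance-based calculation, and the difference between plain and weighted voting described in Sections 6.2.3 and 6.2.4, can be sketched as follows. The inverse-distance weight used here is an assumption chosen for simplicity (the weighting used by the MATLAB classifier in the study is not specified in the text), and the function names and toy points are made up.

```python
import math
from collections import defaultdict


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(train_X, train_y, query, k=3, weighted=False):
    """Classify query by a vote among its k nearest training points."""
    neighbors = sorted(
        (euclidean(x, query), label) for x, label in zip(train_X, train_y)
    )[:k]
    votes = defaultdict(float)
    for dist, label in neighbors:
        # Plain K-NN: every neighbor counts once.
        # Weighted K-NN (assumed scheme): closer neighbors count more.
        votes[label] += 1.0 / (dist + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


# Toy data: one class-4 point very close to the query, two class-2 points far away.
train_X = [[1, 1], [4, 4], [4, 5]]
train_y = [4, 2, 2]
query = [1.5, 1.5]

plain = knn_predict(train_X, train_y, query, k=3)
weighted = knn_predict(train_X, train_y, query, k=3, weighted=True)
print(plain, weighted)
```

In this toy case the plain vote follows the two distant class-2 points, while the weighted vote follows the single nearby class-4 point, which is the behavior the weighted extension is designed to produce.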
The improved lion optimization algorithm (ILOA) identifies small, relevant feature subsets quickly and accurately, which improves classification efficiency. For breast cancer, ILOA combined with a K-NN classifier performed well compared with the other classifiers tested [54].
Preprocessing techniques like z-score normalization rescale each attribute to zero mean and unit variance. This keeps breast cancer classification models from being skewed by scale differences between attributes, which makes the outcomes more robust and can improve accuracy.
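A minimal Python sketch of column-wise z-score normalization is shown below. It uses the population standard deviation, which is one common convention and an assumption here; the function name and the sample rows are illustrative only.

```python
import statistics


def z_score_columns(rows):
    """Rescale every column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        # A constant column has zero spread; map it to 0.0 instead of dividing.
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Two made-up attributes on very different scales.
rows = [[1, 10], [2, 20], [3, 30]]
scaled = z_score_columns(rows)
for row in scaled:
    print([round(v, 3) for v in row])
```

After scaling, both columns contain the same values even though the raw attributes differed by a factor of ten, so neither dominates a distance computation.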
The Stacking Classifier approach combines multiple individual classifiers to improve overall performance, achieving the highest accuracy in the study by Basunia et al. [50], 97.20%. The technique draws on the strengths of different algorithms and therefore performed better than the single methods it was compared with.