Predictive Analytics in Cyber Security
The ICT industry has grown to be the backbone of today's society over the past half-century. This integration has elevated the relevance of cyber security in defending ICT systems against different types of cyber threats. Protecting an organization's data from unauthorized access is vital and is commonly known as cyber security. Measures such as network, application, and operational security, including antivirus software, firewalls, and intrusion detection systems, are employed to address threats like unauthorized access and malware. However, there are existing gaps in the literature that this research aims to address, particularly in integrating predictive analytics into real-world cyber security frameworks.

Risk assessment in cyber security is a major shift from simple remedial methods to proactive methods of operation. Through the application of statistics and machine learning, predictive analytics helps to identify possible future threats and risks to a company's cyber security, enabling preventive

them is too slow. This overworks the resources and also decreases the effectiveness of the cyber security response team. Scalability is another concern because the volume of data that has to be monitored increases as organizations grow and their digital architecture becomes more complex. Most of today's detectors lack escalation mechanisms and may fail to monitor all points of vulnerability to breaches as the network grows [4].

The incorporation of big data analytics into cyber security models is both critical and challenging. Predictive analytics has the potential to turn cyber security into a proactive discipline by predicting threats before they arise. However, these systems are not easy to implement, since they require large capital investments in data collection, analysis, model development, training, and continuous model updates to keep pace with emerging threats [5]. Despite their foundational role, real-time cyber-attack detection systems still suffer from being largely reactive, from false alarms, and from limited scalability. These limitations imply the need to accelerate and redesign present and prospective technologies and methods in cyber security.

Muhammad Danish is with the University of New Mexico, Albuquerque, NM 87106 USA (e-mail: mdanish@[Link]).
[Link], [Link], and [Link] files are available online at [Link]
This study addresses several key questions: How effective is predictive analytics in identifying and responding to different types of cyber-attacks in real time? What patterns and anomalies can predictive models detect that traditional security measures often miss? How can predictive analytics enhance the decision-making process in cyber security operations centers? This paper aims to fill existing gaps in the literature by providing empirical evidence on these questions and highlighting the practical implications of the findings.

To answer these questions, this study aims to assess the effectiveness of predictive analytics in real-time detection of and response to cyber-attacks, identify key patterns and anomalies detectable by predictive models, and propose a model that improves decision-making in cyber security operations centers by integrating predictive analytics. The implications of research on predictive analytics for real-time threat detection and response are significant for both current and future cyber security environments. The study aims to improve the understanding and application of predictive analytics in cyber security, with the ultimate goal of shifting from an over-reliance on reactive approaches towards a more proactive and preventive stance. This transition is vital today, when threats are evolving quickly and growing in complexity. The research is relevant because it addresses the current limitations of existing detection mechanisms: for example, the inability to detect and prevent other threats, the issues of extensibility, and high rates of false positives. Improving these areas with predictive analytics would make overall protection against cyber threats considerably stronger, as it allows adverse scenarios to be spotted earlier, thus minimizing their impact [6].

Furthermore, the research aims to provide a way of using cyber security resources efficiently. High false positive rates create noise that distracts security teams and forces them to spend time investigating non-threatening issues, wasting the organization's time and resources. Integrating predictive analytics will help align security measures with organizational risk management strategies by providing realistic threat assessments. This research contributes to advancing technology's influence on strategic business planning, crisis response, data security, and regulatory compliance. By pioneering more advanced predictive models, the study contributes to setting new standards in cyber security, ensuring that businesses can protect their assets and maintain trust with clients and stakeholders in an increasingly interconnected world [7].

II. LITERATURE REVIEW

Analytics in the context of cyber security is an advanced concept that moves security practice from a reactive to a proactive model. This approach uses several statistical and machine learning models to examine enormous volumes of data from sources such as network traffic, user activities, and security logs, in order to build a system that raises an alarm at an early sign of a threat. Predictive analytics gives organizations signals and alerts of risk before they turn into actual breaches. The definition and usage of predictive analytics in cyber security have changed over time due to advancements in data science and artificial intelligence. Initially, the field was limited to basic data monitoring and anomaly detection; today, it incorporates highly developed algorithms and advanced techniques such as predictive threat modeling and risk assessment [8]. This change marks an evolution from conventional security methods like firewalls and antivirus software towards intelligence-driven security.

The fundamental principles of predictive analytics in cyber security hinge on several core elements, including data quality, the efficiency of the algorithms used, and timely threat intelligence. Viable predictive systems require first-rate, pertinent data to train the models used for prediction and attack prevention. Furthermore, incorporating real-time threat intelligence keeps the models accurate against present threat vectors. The integration of big data analytics in security operations improves not only threat detection effectiveness but also the organization's agility. Security teams can therefore prioritize and spend resources effectively, lessening the damage that comes with cyber threats and enhancing the organization's security posture. In addition, predictive analytics fosters compliance with laws by providing proof that the organization is actively pursuing security measures, which is helpful to industries subject to strict data protection laws [9]. Such an approach is becoming indispensable given the constantly changing nature of cyber threats, which are growing more complex and cannot be addressed using conventional methods.

New technologies for detecting cyber-attacks have advanced to integrate artificial intelligence (AI) and its subfield, machine learning (ML). In cyber security, AI and ML provide the ability to automatically detect and respond to threats by analyzing huge datasets to identify patterns that may point to threats [10]. These technologies are useful for identifying indicators of compromise that analysts may not easily identify because of the huge volume and complexity of the data that has to be scanned. Machine learning algorithms, both supervised and unsupervised, are extremely useful in such cases. Supervised learning models are trained on labeled datasets to differentiate between benign and malicious activity. Unsupervised learning finds outliers within the system without any labeled data, which helps in the identification of new and unknown threats. AI improves threat identification because data is processed and analyzed at a scale and speed far beyond human capacity. It automates responses to threats, shortening the time needed to counter them once they have been identified [11]. AI-powered systems also include predictive analytical components that assess threat trends or patterns to predict future threats, improving the threat-hunting process [12].
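
The distinction between supervised and unsupervised learning described above can be illustrated with a minimal sketch that uses only the Python standard library. The toy feature values, the labels, the nearest-centroid classifier, and the z-score threshold are all illustrative assumptions; they are not taken from this study or its dataset, and a real deployment would use far richer features and models.

```python
from statistics import mean, pstdev

# Hypothetical toy data: (packet_length, failed_logins) per connection.
labeled = [
    ((500.0, 0.0), "benign"),
    ((520.0, 1.0), "benign"),
    ((480.0, 0.0), "benign"),
    ((1500.0, 8.0), "malicious"),
    ((1450.0, 9.0), "malicious"),
]

def train_centroids(samples):
    """Supervised step: average the feature vectors of each labeled class."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }

def classify(centroids, features):
    """Assign the label whose centroid is nearest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))

def zscore_outliers(values, threshold=2.0):
    """Unsupervised step: flag values far from the mean, using no labels."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

centroids = train_centroids(labeled)
print(classify(centroids, (1400.0, 7.0)))  # prints "malicious"
print(zscore_outliers([500, 510, 495, 505, 490, 500, 515, 5000]))  # prints [5000]
```

The classifier can only recognize the two classes it was shown, whereas the outlier check needs no labels and can therefore flag a previously unseen pattern, at the cost of not saying what kind of threat it is.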
AI and ML play a major role in lowering false positives in threat detection. They improve the filtering of false alarms by distinguishing real threats from potential ones and by gauging the degree of difference between merely unusual behavior and deliberate malicious actions, which supports threat prioritization and thus decreases the workload of security teams. However, implementing AI in cyber security has its own set of challenges, including the quality of the data required for training algorithms, the transparency of AI decision-making, and the integration of AI systems into existing cyber security infrastructure. Moreover, threats in cyberspace are not static, and hence AI models must be updated on a regular basis [13].

AI and ML hold an important position in cyber-attack detection, providing not only better detection mechanisms but also proper and timely handling of cyber security threats in a world where digital threats evolve frequently. AI and ML are reshaping cyber security; they allow threats to be detected quickly, often on a large scale, and support predictions. These technologies help to automatically detect and counter cyber threats, increasing the organization's security responsiveness. It is, however, crucial to address data quality and the ability of the models to adapt when integrated into existing systems, in order to cope effectively with constantly emerging forms of cyber threats [14].

Modern cyber security processes face many difficulties that hinder their operation, including false positives, scalability, and the identification of previously unknown vulnerabilities [15].
1) False Positives
One major issue in cyber security is the problem of false positives, that is, alarms indicating that a threat exists when it does not. This wastes resources, since security analysts have to go through these alerts to verify them and determine whether they are actually a threat. The issue is further compounded by the fluidity and heterogeneity of today's networks, in which typical activity patterns are easily mistaken for threats by security solutions [16].
2) Scalability Issues
As organizations grow and their networks expand, the cyber security measures in place may at some point be strained. The scalability problems are evident from the amount and number of endpoints that such systems must monitor and analyze. The result in such a case is areas of weakness and slow response to real threats [17].
3) Zero-Day Vulnerabilities
Perhaps the most daunting challenge is the detection and management of zero-day vulnerabilities, which are flaws in software that the software maker does not know about and for which no patch exists at the time of discovery. These vulnerabilities are highly valuable to attackers because they can be exploited to gain unauthorized access to systems before they are identified and mitigated. The very nature of zero-day attacks makes them difficult to predict and detect using conventional methods that rely on known signatures or patterns. Security systems often require updates to their threat intelligence to handle such vulnerabilities, but even then, the rapid pace at which new zero-days are discovered leaves organizations at constant risk [18].

Addressing these challenges requires a multifaceted approach involving enhanced detection algorithms that reduce false positives, scalable security solutions that can grow with the organization, and proactive threat hunting that can detect anomalies indicative of zero-day exploits. One direction is the integration of the latest developments in big data and machine learning into cyber security practices, as these can help analyze patterns, anticipate risks and attacks, and respond to them automatically to enhance organizational security [19].

Predictive analytics in cyber security incorporates various sophisticated models and techniques to predict and mitigate potential threats before they can impact systems. The core of this approach is the use of machine learning, with a variety of supervised and unsupervised learning algorithms. In supervised learning, labeled data is used to train a model to identify known illicit behaviors. Unsupervised learning, on the other hand, identifies abnormal behaviors that, if present, may indicate a threat. The ability of a system to detect suspicious activities is essential for the timely prevention of threats and for strengthening the security posture of any firm.

Another important component of predictive analytics is the use of statistical algorithms. These algorithms draw on past trends and behaviors to foresee future incidents. This method contributes not only to the prediction of possible threats but also to a more accurate representation of risks, which helps organizations prepare better. User behavior analysis adds further value to predictive analytics because it examines user activities to identify suspicious events that might originate from insider threats or stolen credentials, anomalies that basic security measures may not easily detect. Furthermore, anomaly detection systems are used to identify deviations from normal behavioral patterns in network traffic and access logs before an actual attack takes place [21].

Despite advantages such as early threat identification, better resource management, and faster response to threats, predictive analytics also faces challenges in real-world applications. Forecasting models are only as good as the data they are applied to, as is often said in statistics. Data lacking in quality or scope can produce erroneous predictions, while the continuously evolving nature of cyber threats requires constant updates of the models. Sustaining and periodically updating the application is necessary to maintain its effectiveness. Further, incorporating predictive analytics into other infrastructures that are already
in place in cyber security can prove challenging and time-consuming, and may require regular monitoring over a considerable time to overcome the possible ethical risks and privacy issues that may come with their implementation. Cyber security is already being reshaped by third-generation predictive analytics that are proactive instead of reactive. However, this success depends on rigorous execution, constant modification, and comprehensive data management in order to counter the continuously emerging threats in cyberspace [22].

In the context of cyber security within organizations, there is a clear differentiation between reactive and predictive systems:
1) Reactive Systems
Such systems mainly target threats as they emerge and hence primarily involve remediation. The reactive approach waits for the attack to happen, which is a disadvantage because reacting to such threats takes a long time. This method bases its operations on previous knowledge and is ill-equipped to deal with new threats, since it directly targets known types of attacks and may not be very effective against novel attack vectors that are not typical of previous cases. While reactive systems are needed to cope with a threat immediately and are less complicated to design, they may be more costly in the long run, since the damage incurred during detection delays can be substantial [23].
2) Predictive Systems
Predictive systems, by contrast, use analytical techniques such as machine learning and statistics to predict and avert harmful events. Because this approach focuses on analyzing patterns and trends in large amounts of data to plan future actions, it helps organizations allocate resources effectively and repel attacks promptly. Risk-predictive systems greatly improve an organization's capacity to contain and prevent cyber risk by giving insights into potential risks. Nevertheless, they rely on high-quality and detailed data for their operation, and they face challenges concerning the constant training of the models and their connection to existing security systems [24].

Research and implementations have established that predictive systems can significantly decrease the time and cost effects of threats by mitigating them before they occur [25]. Organizations that integrate predictive analytics into their cyber security strategies often experience improved risk management, reduced incident response times, and enhanced compliance with regulatory requirements. The proactive approach, unlike reactive methodologies, not only helps in safeguarding against imminent threats but also prepares organizations for emerging cyber threats by constantly updating defense mechanisms in line with the evolving digital landscape [26].

While reactive cyber security is necessary for dealing with immediate threats, the integration of predictive analytics into cyber security frameworks provides a more robust defense by preventing attacks before they occur. This shift from a purely reactive to a proactive stance is increasingly regarded as essential in a world where cyber threats are becoming more complicated and pervasive [27].

The current body of research in cyber security predictive analytics is expansive and rich with theoretical developments and proposed models. However, a significant gap remains in the literature concerning the practical integration of these advanced predictive models into real-world cyber security frameworks. Although such models can serve as good references, it is one thing to prove a strategy or model effective in an academic environment or a simulation, and quite another to observe its effectiveness in realistic, dynamic cyber security settings [28]. This mismatch indicates that, although there is rich theoretical research on these models, the lack of empirical data and of practical planning for integrating the models with operational concerns and then scaling up the overall system is a large gap that has not been well covered in the literature. Most current research centers on improving existing algorithms but says little about how these algorithms can actually be deployed in real-world applications, which involve factors such as hardware constraints, real-time constraints, and fit with the existing infrastructure of the system to be secured.

More studies are still required that link state-of-the-art predictive analytics methods with real-world cyber security operations, including design features that allow solutions to be implemented in live technical environments with minimal modification. Closing this gap is a relevant and necessary step in the development of modern cyber security work, as well as in transferring theoretical achievements into concrete improvements in the methods for detecting and responding to cyber threats [29].

III. METHODOLOGY

Quantitative research was used to conduct the study, with the aim of understanding the use and outcomes of enhanced predictive modeling in real-time CTR. This method is appropriate for this study because it permits strength and significance testing of hypotheses about the functionality and results of predictive analytics in cyber security frameworks [30]. The proposed strategy is a systematic experimental method in which the researchers deploy specified predictive models in a realistic IT security environment that mimics the actual setting in organizations. This environment includes network traffic flows, user behavior data sets, and typical cyber threat scenarios, in order to evaluate how well the models recognize cyber threats.

The main objective is to evaluate the effectiveness of these predictive models relative to conventional firewalls or reactive security measures in terms of the rate of occurrence of threats, the rate of detection, and the flexibility of the measures in handling new types of threats. The data sources for this study will
be open-source data sets, plus newly generated data sets that represent new and upcoming cyber security threats. The approach includes the application of inter-model combinations with the aim of bringing out various scenarios and attack vectors that realistically test the capability of the predictive models. This is highly significant since it allows the models to be validated under heterogeneous conditions. Measurable factors including the threat detection rate, the system's false positives and false negatives, and the system's response time will also be included [31].

For the data analysis, the study will use techniques like regression analysis to determine the connection between the systems' responses and the success of threat countermeasures. Measures that are used regularly in machine learning will be used to assess the accuracy of the models' predictions, among them precision, recall, and the F1-score. A sub-analysis with statistical tools like logistic regression or ROC curve analysis may be used to identify other significant differences between predictive and reactive systems. This research design was adopted because it supports the theoretical potential of predictive analytics with quantitative data. Thus, the present work endeavors to complement the literature by providing actionable knowledge on how these models can be employed effectively, given that the comparison was performed in a purposefully controlled academic environment. This is important for the progression of cyber security as well as for the creation of stronger, preventative defense strategies against cyber warfare [32].

As has been highlighted, the essence of this study is to analyze accurately the importance of predictive analytics and its models in the cyber security domain; therefore, the selection and collection of high-quality data is vital. To ensure that the data acquired by the study is varied and comprehensive, the study uses data from Kaggle, a platform that offers a wide array of datasets contributed by users from all over the world. This platform provides large and diverse data on cyber threat scenarios, which is very useful for this research [33].

To start with, a dataset related to security threats is selected on Kaggle. The chosen dataset consists of over 4,000 records, where each record corresponds to one instance of network traffic or log data that could be related to a cyber security threat. This dataset was chosen because it is complex and up to date, so that the results of the study reflect today's security threats. Every row in the dataset contains features including source IP address, packet length, destination IP address, date and time of the data, the type of traffic, and a threat bit. These attributes are important because they provide the raw data for the learning systems and for testing. Both normal and anomalous patterns are thus present in the data set, giving a model the variety of data it needs to be exposed to before it can form an appropriate and reliable automatic guard [34].

In this paper, data cleaning is a critical step before data can be fed to model development and analysis. This phase involves dealing with missing values, removing duplicate records, and converting categorical data into a form that is understandable to the machine. Because the data set is large and diverse, attention is also given to normalization methods, in which features are scaled to enhance the performance of the learning algorithms [35]. For training and validation of the predictive models, the dataset is partitioned into training, validation, and test partitions. Typically, the split is organized so that the training dataset is the largest, constituting about 70 percent, while the validation and test datasets are about 15 percent each. This segmentation makes it possible to train the models thoroughly while also giving a sound basis for choosing the model's parameters and for examining the final model's performance on unseen data [36].

Since the data collected may contain confidential details of an individual or a group, all relevant measures are taken to conceal the identity of the subjects. The research follows guidelines concerning the use of data, and measures are taken to avoid the abuse of information. For the Kaggle data used in the study, the authors were bound to adhere to the Kaggle data usage policies, which are in harmony with general data protection regulations and ethical considerations. With the dataset and its preparation established, several techniques can be used in predictive analysis, including decision trees, logistic regression, and neural networks. Some of these techniques are adopted for their efficiency in dealing with big data, while others are chosen for their efficiency on classification problems in cyber security. The performance of these models is checked periodically on the validation set so that it can be adjusted and optimized before the final check on the test set.

In this study, SPSS software is crucial for processing the data retrieved from Kaggle to determine the efficiency of the predictive analytics models for cyber security [38]. This section presents the approach to the statistical analysis using SPSS, which includes data handling, analysis methods, and results. Before data is imported into SPSS, it needs to undergo certain preparations so that its analysis is accurate and meets the standards. This entails data cleansing, that is, the elimination of unwanted data such as inconsistent or flawed records that may distort the results. Data transformation techniques are also used to convert categorical data into numerical formats that are more convenient for analysis; whether a given encoding method is appropriate depends on the specifics of the algorithms used in the predictive modeling, as illustrated in [38].

Descriptive analysis establishes the basic statistical character of the data, namely its distribution, mean, and spread, before more intricate analytical examinations are undertaken. In SPSS, these basic measures can be obtained by using the descriptive menu and include mean, median, mode, range, variance, and standard deviation. This step is critical for managing the data and for finding any outliers or similar points that need further data munging or normalization [39]. To test the theories set out at the beginning of the study, inferential statistical analysis methods are used to assess hypotheses. Based on the kind of research questions and hypotheses, a set of tests including t-tests, ANOVA, and chi-squared tests, among others, is carried out to test the
differences and associations between the variables in the collected data.

To understand how distinct factors predict threat identification and the effectiveness of mitigation in cyber security, regression analysis is applied. Continuous dependent variables were analyzed using linear regression, whereas binary dependent variables were analyzed using logistic regression. The key analysis method employed in this study is logistic regression, as the response variable is categorical, for example threat detected or not detected. The analysis includes the identification of possible predictor variables grounded in conceptual knowledge and the prior literature review, checking for multicollinearity, and fine-tuning the model with regard to complexity and accuracy of prediction [40].

To establish the goodness of fit of the models, several tests are run in SPSS and Anker, including the R-squared test for linear regression models and the Hosmer–Lemeshow chi-squared test for logistic models. Some of these measures reflect the degree to which the model explains the variation in the response variable, and one assesses the overall fit of the model. Furthermore, the significance of individual predictors is assessed using their p-values, with the commonly used significance level of 0.05 [41].

If the cyber security data is large, which is often the case due to the nature of threats and attacks, further analysis may involve more sophisticated methods, for instance cluster analysis or principal component analysis (PCA), to find other underlying patterns within the data or to reduce its dimensionality. These methods are useful in identifying underlying relationships that often would not be easily detected through regression models. Various parameters such as mean absolute error, root mean square error, correlation coefficient, and coefficient of variation are used to judge the models and improve their efficiency.

The k-fold cross-validation technique is used, in which the data set is divided into k subsets that are then used to create multiple train and test sets for the model. The performance of the trained predictive models is tested using accuracy-related measures such as the area under the curve, sensitivity (true positive rate), and specificity (true negative rate), since these values are important in evaluating the efficiency of the predictive analytics systems in an operational environment with cyber

Several statistical and practical considerations underpin the selection of a sample size of 2000 rows for this study on predictive analytics in cybersecurity, ensuring that the analysis is both reliable and generalizable. One of the primary reasons for choosing this sample size is to achieve sufficient statistical power. In quantitative research, power is the probability that the study will detect an effect when there is an effect to be detected [45]. A larger sample size reduces the risk of Type II errors (failing to reject a false null hypothesis) and increases the likelihood that the study can detect a smaller effect size, making the findings more robust and persuasive.

Cyber security data encompasses a wide variety of features, from IP addresses and timestamps to types of attacks and their outcomes. A substantial sample size ensures that the dataset contains a comprehensive range of these features, including less common but potentially significant occurrences. This diversity is important for developing accurate models that generalize well from existing to new data sets rather than merely replicating the data on which they were trained [46].

The dataset is designed to contain a broad spectrum of problem cases, and having 2000 rows allows more complex problems to be captured in the results [47]. Representativeness is crucial as it influences the external validity and applicability of the study results in other settings or subpopulations of the cyber security domain, especially in real-world applications.

In machine learning, especially in a relatively new and rapidly developing field such as cyber security, there is always a risk of over-training the model, that is, achieving good results on the training set but low scores on a new dataset [48]. This risk is less of a concern with larger sample sizes, because the researcher then has enough data to train even more complicated models. A larger sample also helps avoid under-fitting, in which the model is not complex enough to fit the pattern in the data, and thus ensures that the predictive models developed are sufficiently complex. Larger datasets could yield even more confident conclusions, but they also require more computational power, and managing increasingly complicated data may become an issue. A dataset of 2000 rows strikes a balance between comprehensiveness and manageability,
security threats [42]. allowing for detailed analysis without overwhelming the
The final phase encompasses the extraction and interpretation computational and analytical resources available for the study
of meaningful insights that would affect cyber security practices. [49].
On its own, SPSS offers complete output that comes with The chosen sample size of 2000 rows from the original dataset
estimates of coefficients (B), odds ratios, and confidence intervals is justified based on its ability to provide sufficient statistical
that are valuable in arriving at conclusions regarding the effects of power, represent the diverse and complex nature of cyber security
various predictors. These results are then discussed in relation to threats, ensure the representativeness of the findings, balance the
the existing body of knowledge within the cyber security domain risks of overfitting and underfitting, and remain feasible for
and present generalizable findings, research limitations, and comprehensive analysis within the resource constraints of this
future studies' implications [43]. Through meticulous data study [50]. This sample size is pivotal in achieving the research
analysis using SPSS, this study aims to contribute significantly to objectives while ensuring the validity and reliability of the results
the field by providing empirical evidence to support the [51].
hypothesis. The structured approach ensures that the findings are
robust, reproducible, and relevant to enhancing cyber security
measures in various organizational contexts [44].
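As an illustration of the k-fold cross-validation procedure and of the sensitivity, specificity, and AUC measures described in this section, the following is a minimal Python sketch. It is not the SPSS workflow used in the study: the gradient-descent logistic regression, all function names (`fit_logistic`, `k_fold_indices`, `cross_validate`, and so on), the 0.5 decision threshold, and the synthetic data are assumptions introduced only for demonstration.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=300):
    # Batch gradient descent on the log-loss; returns [bias, w1, ..., wn].
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, X):
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]


def k_fold_indices(n, k, seed=0):
    # Shuffle row indices once, then deal them into k near-equal folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def sensitivity_specificity(y_true, scores, threshold=0.5):
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def auc(y_true, scores):
    # Probability that a random positive outscores a random negative (ties count 0.5).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(X, y, k=5):
    # Each fold is held out once; the model is refit on the remaining rows.
    results = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores = predict_proba(w, [X[i] for i in fold])
        y_test = [y[i] for i in fold]
        sens, spec = sensitivity_specificity(y_test, scores)
        results.append({"auc": auc(y_test, scores), "sensitivity": sens, "specificity": spec})
    return results


if __name__ == "__main__":
    # Synthetic stand-in data (assumption): the label depends on the first feature plus noise.
    rng = random.Random(42)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    y = [1 if x[0] + 0.3 * rng.gauss(0, 1) > 0 else 0 for x in X]
    for i, r in enumerate(cross_validate(X, y, k=5), start=1):
        print(f"fold {i}: AUC={r['auc']:.3f} sens={r['sensitivity']:.3f} spec={r['specificity']:.3f}")
```

Reporting the three measures per fold, rather than a single pooled figure, makes it easier to see whether performance is stable across the different train and test partitions.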
This affirmation of our research questions provides a fundamental appreciation of the importance of predictive analytics in defining the future of cyber security practices [67].
VI. CONCLUSION
This research assesses the role of predictive analytics in improving the identification of, and countermeasures to, cyber threats in real-time systems. The research relied on quantitative methods, based on a large and consistent dataset of network traffic and security events. In addition, cross-sectional analysis was performed on the collected data, using advanced statistical modeling such as logistic regression and cluster analysis to show how predictive analytics affects cyber security work.
The key findings highlight that predictive analytics significantly enhances a system’s ability to identify and respond to various types of cyber-attacks in real time, offering an advantage over conventional reactive methods. The study highlighted the feasibility and importance of predictive analytics, including improved threat detection, reduced response times, and better overall security management.
Despite these advantages, the study also discusses challenges and limitations in the use of predictive analytics. Future research should focus on real-time data integration and adaptive learning algorithms that aim to improve the accuracy and timeliness of threat detection. Emerging technologies and statistical methods could further advance predictive analytics, providing more precise and context-aware cyber security tools.
ACKNOWLEDGMENT
The author would like to thank the contributors of the datasets used in this study, which were obtained from Kaggle.
REFERENCES
[1] Fatima, A., Maurya, R., Dutta, M.K., Burget, R., & Masek, J. (2019). Android malware detection using genetic algorithm based optimized feature selection and machine learning. 42nd International Conference on Telecommunications and Signal Processing (TSP), 220-223. IEEE.
[2] Bazuhair, W., & Lee, W. (2020). Detecting malign encrypted network traffic using perlin noise and convolutional neural network. 2020 10th Annual Computing and Communication Workshop and Conference (CCWC), 0200-0206. IEEE.
[3] Alani, M.M. (2021). Big data in cybersecurity: a survey of applications and future trends. Journal of Reliable Intelligent Environments, 7(2), 85-114.
[4] Molina-Coronado, B., Mori, U., Mendiburu, A., & Miguel-Alonso, J. (2020). Survey of network intrusion detection methods from the perspective of the knowledge discovery in databases process. IEEE Transactions on Network and Service Management, 17(4), 2451-2479.
[5] Chalapathy, R., & Chawla, S. (2019). Deep learning for anomaly detection: A survey. arXiv preprint arXiv:1901.03407.
[6] Mammen, A.L., Allenbach, Y., Stenzel, W., Benveniste, O., De Bleecker, J., Boyer, O., Casciola-Rosen, L., Christopher-Stine, L., Damoiseaux, J., Gitiaux, C., & Fujimoto, M. (2020). 239th ENMC international workshop: classification of dermatomyositis, Amsterdam, the Netherlands, 14–16 December 2018. Neuromuscular Disorders, 30(1), 70-92.
[7] Olowononi, F.O., Rawat, D.B., & Liu, C. (2020). Resilient machine learning for networked cyber physical systems: A survey for machine learning security to securing machine learning for CPS. IEEE Communications Surveys & Tutorials, 23(1), 524-552.
[8] Gupta, B.B., & Sheng, M. (2019). Machine Learning for Computer and Cyber Security. CRC Press.
[9] Díaz-Verdejo, J.E., Alonso, R.E., Alonso, A.E., & Madinabeitia, G. (2023). A critical review of the techniques used for anomaly detection of HTTP-based attacks: taxonomy, limitations and open challenges. Computers & Security, 124, 102997.
[10] Vinayakumar, R., Alazab, M., Soman, K.P., Poornachandran, P., Al-Nemrat, A., & Venkatraman, S. (2019). Deep learning approach for intelligent intrusion detection system. IEEE Access, 7, 41525-41550.
[11] Alrowais, F., Althahabi, S., Alotaibi, S.S., Mohamed, A., Hamza, M.A., & Marzouk, R. (2023). Automated Machine Learning Enabled Cybersecurity Threat Detection in Internet of Things Environment. Computer Systems Science & Engineering, 45(1).
[12] Sharma, S., & Nebhnani, M. (n.d.). Securing the Digital Frontier: Data Science Applications in Cyber security and Anomaly Detection.
[13] Mohammed, T.M., Nataraj, L., Chikkagoudar, S., Chandrasekaran, S., & Manjunath, B.S. (2021). Malware detection using frequency domain-based image visualization and deep learning. arXiv preprint arXiv:2101.10578.
[14] Jmila, H., Blanc, G., Shahid, M.R., & Lazrag, M. (2022). A survey of smart home IoT device classification using machine learning-based network traffic analysis. IEEE Access, 10, 97117-97141.
[15] Gottwalt, F., Chang, E., & Dillon, T. (2019). CorrCorr: A feature selection method for multivariate correlation network anomaly detection techniques. Computers & Security, 83, 234-245.
[16] Rupa Devi, T., & Badugu, S. (2019). A review on network intrusion detection system using machine learning. International Conference on E-Business and Telecommunications, 598-607. Cham: Springer International Publishing.
[17] Srinivasan, S., Ravi, V., Alazab, M., Ketha, S., Al-Zoubi, A.M., & Padannayil, S.K. (2021). Spam emails detection based on distributed word embedding with deep learning. Machine Intelligence and Big Data Analytics for Cybersecurity Applications, 161-189.
[18] Abdallah, E.E., & Otoom, A.F. (2022). Intrusion detection systems using supervised machine learning techniques: a survey. Procedia Computer Science, 201, 205-212.
[19] Muwardi, R., Gao, H., Ghifarsyam, H.U., Yunita, M., Arrizki, A., & Andika, J. (2021). Network security monitoring system via notification alert. Journal of Integrated and Advanced Engineering (JIAE), 1(2), 113-122.
[20] Sharma, R., Kumar, V.R., & Sharma, R. (2019). AI based intrusion detection system. Think India Journal, 22(3), 8119-8129.
[21] Berman, D.S., Buczak, A.L., Chavis, J.S., & Corbett, C.L. (2019). A survey of deep learning methods for cyber security. Information, 10(4), 122.
[22] KOŞAN, M., Yildiz, O., & Karacan, H. (2018). Comparative analysis of machine learning algorithms in detection of phishing websites. Pamukkale University Journal of Engineering Sciences-Pamukkale Universitesi Muhendislik Bilimleri Dergisi, 24(2), 234-241.
[23] Dupont, B., Shearing, C., Bernier, M., & Leukfeldt, R. (2023). The tensions of cyber-resilience: From sensemaking to practice. Computers & Security, 132, 103372.
[24] Nathiya, T., & Suseendran, G. (2019). An effective hybrid intrusion detection system for use in security monitoring in the virtual network layer of cloud computing technology. Data Management, Analytics and Innovation: Proceedings of ICDMAI 2018, Volume 2, 483-497. Springer Singapore.
[25] Sajovic, I., & Boh Podgornik, B. (2022). Bibliometric analysis of visualizations in computer graphics: a study. Sage Open, 12(1), 21582440211071105.
[26] Ferrag, M.A., Maglaras, L., Moschoyiannis, S., & Janicke, H. (2020). Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study. Journal of Information Security and Applications, 50, 102419.
[27] Heaton, J. (2018). Ian Goodfellow, Yoshua Bengio, and Aaron Courville: Deep learning: The MIT Press, 2016, 800 pp, ISBN: 0262035618. Genetic Programming and Evolvable Machines, 19(1), 305-307.
[28] Ongun, T., Spohngellert, O., Miller, B., Boboila, S., Oprea, A., Eliassi-Rad, T., Hiser, J., Nottingham, A., Davidson, J., & Veeraraghavan, M. (2021). PORTFILER: port-level network profiling for self-propagating malware detection. 2021 IEEE Conference on Communications and Network Security (CNS), 182-190. IEEE.
[29] Hasan, S.S., & Eesa, A.S. (2020). Optimization algorithms for intrusion detection system: A review. International Journal of Research-GRANTHAALAYAH, 8(08), 217-225.
[30] Injadat, M., Salo, F., Nassif, A.B., Essex, A., & Shami, A. (2018). Bayesian optimization with machine learning algorithms towards anomaly detection. 2018 IEEE Global Communications Conference (GLOBECOM), 1-6. IEEE.
[31] Dong, Y., Wang, R., & He, J. (2019). Real-time network intrusion detection system based on deep learning. 2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS), 1-4. IEEE.
[32] Kushal, S., Shanmugam, B., Sundaram, J., & Thennadil, S. (2024). Self-healing hybrid intrusion detection system: an ensemble machine learning approach. Discover Artificial Intelligence, 4(1), 28.
[33] Chew, Y.J., Lee, N., Ooi, S.Y., Wong, K.S., & Pang, Y.H. (2022). Benchmarking full version of GureKDDCup, UNSW-NB15, and CIDDS-001 NIDS datasets using rolling-origin resampling. Information Security Journal: A Global Perspective, 31(5), 544-565.
[34] Asjad, S. (n.d.). Intrusion Detection and Cyber Attack Classification for Encrypted DDS Communication Middleware in OT Networks using Machine Learning (Master's thesis, University of South-Eastern Norway).
[35] Fernandes, G., Rodrigues, J.J., Carvalho, L.F., Al-Muhtadi, J.F., & Proença, M.L. (2019). A comprehensive survey on network anomaly detection. Telecommunication Systems, 70, 447-489.
[36] Zhang, J., Ling, Y., Fu, X., Yang, X., Xiong, G., & Zhang, R. (2020). Model of the intrusion detection system based on the integration of spatial-temporal features. Computers & Security, 89, 101681.
[37] Rupa Devi, T., & Badugu, S. (2019). A review on network intrusion detection system using machine learning. International Conference on E-Business and Telecommunications, 598-607. Cham: Springer International Publishing.
[38] Ahmad, Z., Shahid Khan, A., Wai Shiang, C., Abdullah, J., & Ahmad, F. (2021). Network intrusion detection system: A systematic study of machine learning and deep learning approaches. Transactions on Emerging Telecommunications Technologies, 32(1), e4150.
[39] Al-Imran, M., & Ripon, S.H. (2021). Network intrusion detection: an analytical assessment using deep learning and state-of-the-art machine learning models. International Journal of Computational Intelligence Systems, 14(1), 200.
[40] Al-Saeed, I.A., Selamat, A., Rohani, M.F., Krejcar, O., & Chaudhry, J.A. (2020). A systematic state-of-the-art analysis of multi-agent intrusion detection. IEEE Access, 8, 180184-180209.
[41] Soriano-Valdez, D., Pelaez-Ballestas, I., Manrique de Lara, A., & Gastelum-Strozzi, A. (2021). The basics of data, big data, and machine learning in clinical practice. Clinical Rheumatology, 40(1), 11-23.
[42] Naoui, M.A., Lejdel, B., Ayad, M., Amamra, A., & Kazar, O. (2021). Using a distributed deep learning algorithm for analyzing big data in smart cities. Smart and Sustainable Built Environment, 10(1), 90-105.
[43] Qu, Z., Liu, H., Wang, Z., Xu, J., Zhang, P., & Zeng, H. (2021). A combined genetic optimization with AdaBoost ensemble model for anomaly detection in buildings electricity consumption. Energy and Buildings, 248, 111193.
[44] Or-Meir, O., Nissim, N., Elovici, Y., & Rokach, L. (2019). Dynamic malware analysis in the modern era: A state of the art survey. ACM Computing Surveys (CSUR), 52(5), 1-48.
[45] Alomari, E.S., Nuiaa, R.R., Alyasseri, Z.A., Mohammed, H.J., Sani, N.S., Esa, M.I., & Musawi, B.A. (2023). Malware detection using deep learning and correlation-based feature selection. Symmetry, 15(1), 123.
[46] Li, Y., Wen, Y., Tao, D., & Guan, K. (2019). Transforming cooling optimization for green data center via deep reinforcement learning. IEEE Transactions on Cybernetics, 50(5), 2002-2013.
[47] Fernandes, G., Rodrigues, J.J., Carvalho, L.F., Al-Muhtadi, J.F., & Proença, M.L. (2019). A comprehensive survey on network anomaly detection. Telecommunication Systems, 70, 447-489.
[48] Alhasan, S., Abdul-Salaam, G., Bayor, L., & Oliver, K. (2021). Intrusion detection system based on artificial immune system: a review. 2021 International Conference on Cyber Security and Internet of Things (ICS IoT), 7-14. IEEE.
[49] Mansouri, N., Javidi, M.M., & Mohammad Hasani Zade, B. (2021). A CSO-based approach for secure data replication in cloud computing environment. The Journal of Supercomputing, 77(6), 5882-5933.
[50] Alamleh, A., Albahri, O.S., Zaidan, A.A., Alamoodi, A.H., Albahri, A.S., Zaidan, B.B., Qahtan, S., Binti Ismail, A.R., Malik, R.Q., Baqer, M.J., & Jasim, A.N. (2023). Multi-attribute decision-making for intrusion detection systems: A systematic review. International Journal of Information Technology & Decision Making, 22(01), 589-636.
[51] Khraisat, A., Gondal, I., Vamplew, P., & Kamruzzaman, J. (2019). Survey of intrusion detection systems: techniques, datasets and challenges. Cybersecurity, 2(1), 1-22.
[52] Rosli, N.A., Yassin, W., Faizal, M.A., & Selamat, S.R. (2019). Clustering analysis for malware behavior detection using registry data. International Journal of Advanced Computer Science and Applications (IJACSA), 10, 12.
[53] Beslin Pajila, P.J., Golden Julie, E., & Harold Robinson, Y. (2023). ABAP: Anchor node based DDoS attack detection using adaptive neuro-fuzzy inference system. Wireless Personal Communications, 128(2), 875-899.
[54] Saleh, A.I., Talaat, F.M., & Labib, L.M. (2019). A hybrid intrusion detection system (HIDS) based on prioritized k-nearest neighbors and optimized SVM classifiers. Artificial Intelligence Review, 51, 403-443.
[55] Kharche, D., & Patil, R. (2020). Use of genetic algorithm with fuzzy class association rule mining for intrusion detection. International Journal of Computer Science and Information Technologies.
[56] Mehdi, M., & Khan, S. (2019). A novel intrusion detection system for detection of black hole attacks in MANET using fuzzy logic. International Journal of Computer Applications, 178(8), 12-17.
[57] Latif, Z., Sharif, K., Li, F., Karim, M.M., Biswas, S., & Wang, Y. (2020). A comprehensive survey of interface protocols for software defined networks. Journal of Network and Computer Applications, 156, 102563.
[58] Khraisat, A., Gondal, I., Vamplew, P., & Kamruzzaman, J. (2019). Survey of intrusion detection systems: techniques, datasets and challenges. Cybersecurity, 2(1), 1-22.
[59] Ozkan-Okay, M., Samet, R., Aslan, Ö., & Gupta, D. (2021). A comprehensive systematic literature review on intrusion detection systems. IEEE Access, 9, 157727-157760.
[60] Nweke, L.O. (n.d.). A survey of specification-based intrusion detection techniques for cyber-physical systems.
[61] Garg, S., Kaur, K., Kumar, N., Kaddoum, G., Zomaya, A.Y., & Ranjan, R. (2019). A hybrid deep learning-based model for anomaly detection in cloud datacenter networks. IEEE Transactions on Network and Service Management, 16(3), 924-935.
[62] Amma, N.G., & Subramanian, S. (2019). Feature correlation map based statistical approach for denial of service attacks detection. 2019 5th International Conference on Computing Engineering and Design (ICCED), 1-6. IEEE.
[63] Siddique, K., Akhtar, Z., Khan, F.A., & Kim, Y. (2019). KDD cup 99 data sets: A perspective on the role of data sets in network intrusion detection research. Computer, 52(2), 41-51.
[64] Mokbal, F.M., Dan, W., Imran, A., Jiuchuan, L., Akhtar, F., & Xiaoxi, W. (2019). MLPXSS: an integrated XSS-based attack detection scheme in web applications using multilayer perceptron technique. IEEE Access, 7, 100567-100580.
[65] Aslan, Ö., Ozkan-Okay, M., & Gupta, D. (2021). Intelligent behavior-based malware detection system on cloud computing environment. IEEE Access, 9, 83252-83271.
[66] Vinayakumar, R., Alazab, M., Soman, K.P., Poornachandran, P., & Venkatraman, S. (2019). Robust intelligent malware detection using deep learning. IEEE Access, 7, 46717-46738.
[67] Abusnaina, A., Abuhamad, M., Alasmary, H., Anwar, A., Jang, R., Salem, S., Nyang, D., & Mohaisen, D. (2021). Dl-fhmc: Deep learning-based fine-grained hierarchical learning approach for robust malware classification. IEEE Transactions on Dependable and Secure Computing, 19(5), 3432-3447.