A Survey of Anomaly Detection Methods in Networks

Weiyu Zhang, Qingbo Yang, Yushui Geng


Modern Educational Technology Center
Shandong Institute of Light Industry
Jinan, China
zwy@[Link]

Abstract--Despite the advances of the last 20 years, anomaly detection in networks is still an immature technology. Nevertheless, considerable benefits could be obtained from a better understanding of the problem itself and from the improvement of these methods. Therefore, in this paper we present a survey of anomaly detection in networks. To distinguish between the different approaches in a structured way, we classify the methods into four categories: statistical anomaly detection, classifier-based anomaly detection, anomaly detection using machine learning, and finite state machine anomaly detection. We describe each method in detail and give examples of its application in networks.

Keywords-Anomaly detection; Machine learning; Intrusion detection; Network security

I. INTRODUCTION
Communication networks make physical distances meaningless. While we enjoy the ease of being connected, it is also recognized that an intrusion by malicious users from one place can cause severe damage over wide areas. Computer network security has become a critical issue, and it is important to develop mechanisms to defend against intrusions.

Existing intrusion detection methods fall into two major categories: signature recognition and anomaly detection [1]. In signature recognition techniques, signatures of known attacks are stored and monitored events are matched against them. These techniques signal an intrusion when there is a match. An obvious limitation is that they cannot detect new attacks whose signatures are unknown. Anomaly detection, on the other hand, builds models of normal data and detects any deviation from the normal model in the observed data. Given a set of normal data to train from and a new piece of test data, the goal is to determine whether the test data belong to "normal" or to anomalous behavior. Anomaly detection techniques have the advantage that they can detect new types of intrusions as deviations from normal usage [2]. However, their weakness is a high false alarm rate.

The remainder of this article is organized as follows. In Section II, we introduce the problem through the presentation of theoretical considerations. In Section III, we provide detailed discussions of the various techniques used in anomaly detection. Finally, in Section IV we conclude this paper.

II. PROBLEM STATEMENT
To pose the problem of anomaly detection in any system implies the existence of an underlying concept of normality. The notion of 'normal' is usually provided by a formal model that expresses relations between the fundamental variables involved in the system dynamics. Consequently, an event is classified as anomalous when its degree of deviation from the profile of characteristic behavior of the system, as specified by the model of normality, is high enough.

Formally, an anomaly detection system S can be defined as a pair S = (M, D), where M is the model of normal behavior of the system and D is a similarity measure that, given an activity record, yields the degree of deviation of that activity from the model M. The core of the system therefore consists of two main modules: the modeling subsystem and the detection subsystem. The first works during a training stage and processes events in order to obtain the model M of the normal behavior of the system. The obtained model is subsequently used by the detection engine to evaluate new events. As stated before, this evaluation measures the degree of deviation of such events from the model of the system. These two modes of operation are usually carried out separately. However, it is important to note that systems evolve and, therefore, the model should be rebuilt periodically so that it adapts to the new environment.

III. ANOMALY DETECTION METHODS

A. Anomaly detection using statistics
In statistical methods for anomaly detection, the system observes the activity of subjects and generates profiles to represent their behavior. Typically, two profiles are maintained for each subject: the current profile and the stored profile. As network events are processed, the system updates the current profile and periodically calculates an anomaly score by comparing the current profile with the stored profile, using a function of the abnormality of all measures within the profile. If the anomaly score is higher than a certain threshold, the system generates an alert.
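As a rough illustration of the pair S = (M, D) from Section II applied to the profile comparison of Section III.A, the sketch below learns a stored profile from normal records and scores a new record against it. The per-measure mean and standard deviation, the averaged z-score, the measure names and the threshold of 3.0 are all assumptions made for this example and are not taken from any of the surveyed systems.

```python
from statistics import mean, pstdev


def build_model(training_records):
    """Modeling subsystem: learn M as a per-measure (mean, std) profile.

    training_records is a list of dicts that share the same numeric
    measures (hypothetical names such as "logins" or "bytes").
    """
    model = {}
    for name in training_records[0]:
        values = [record[name] for record in training_records]
        # Fall back to 1.0 so a constant measure cannot cause division by zero.
        sigma = pstdev(values) or 1.0
        model[name] = (mean(values), sigma)
    return model


def deviation(model, record):
    """D: average absolute z-score of the record over all measures."""
    z_scores = [abs(record[name] - mu) / sigma
                for name, (mu, sigma) in model.items()]
    return sum(z_scores) / len(z_scores)


def is_anomalous(model, record, threshold=3.0):
    """Detection subsystem: alert when the score exceeds the threshold."""
    return deviation(model, record) > threshold


normal_activity = [
    {"logins": 4, "bytes": 100},
    {"logins": 6, "bytes": 120},
    {"logins": 5, "bytes": 110},
]
stored_profile = build_model(normal_activity)
print(is_anomalous(stored_profile, {"logins": 5, "bytes": 110}))
print(is_anomalous(stored_profile, {"logins": 50, "bytes": 900}))
```

Rebuilding the stored profile periodically with `build_model` on recent normal records would correspond to the model reconstruction mentioned in Section II; the choice of threshold controls the balance between false positives and false negatives.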

978-1-4244-5273-6/09/$26.00 ©2009 IEEE


Authorized licensed use limited to: UNIVERSIDAD VERACRUZANA. Downloaded on July 01,2021 at [Link] UTC from IEEE Xplore. Restrictions apply.
Statistical anomaly detection has a number of advantages. First, these systems do not require prior knowledge of security flaws or of the attacks themselves. In addition, statistical approaches can provide accurate notification of malicious activities that typically occur over extended periods of time. However, statistical anomaly detection schemes also have drawbacks. First, it can be difficult to determine thresholds that balance the likelihood of false positives against the likelihood of false negatives. In addition, statistical methods need accurate statistical distributions, but not all behaviors can be modeled using purely statistical methods.

Haystack [3] is one of the earliest examples of a statistical anomaly-based intrusion detection system. It used both user-based and group-based anomaly detection strategies, and modeled system parameters as independent Gaussian random variables. Haystack defined a range of values that were considered normal for each feature. If a feature fell outside the normal range during a session, the score for the subject was raised. It was designed to detect six types of intrusions. However, one drawback of Haystack was that it was designed to work offline.

Statistical Packet Anomaly Detection Engine (SPADE) [4] is a statistical anomaly detection system. SPADE was one of the first systems to propose using the concept of an anomaly score to detect port scans, instead of the traditional approach of looking at p attempts over q seconds. In [4], the authors used a simple frequency-based approach to calculate the 'anomaly score' of a packet. The fewer times a given packet was seen, the higher its anomaly score. Once the anomaly score crossed a threshold, the packets were forwarded to a correlation engine that was designed to detect port scans. However, the one major drawback of SPADE is its very high false alarm rate. This is because SPADE classifies all unseen packets as attacks, regardless of whether they are actually intrusions or not.

B. Anomaly detection using a classifier
In this section we focus on anomaly detection using a classifier. Anomaly detection depends on the idea that normal characteristic behavior can be distinguished from abnormal behavior. A classifier can be used to predict the normal incoming event given the current event. If during the monitoring phase the next event is not the one predicted by the classifier, it is considered an anomaly. The classification process typically involves the following steps: 1. Identify class attributes and classes from training data. 2. Identify attributes for classification. 3. Learn a model using the training data. 4. Use the learned model to classify the unknown data samples. A variety of classification techniques have been proposed in the literature. These include inductive rule generation techniques, fuzzy logic, and techniques based on genetic algorithms.

Inductive rule generation algorithms typically involve the application of a set of association rules and frequent episode patterns to classify the audit data. The advantage of using rules is that they tend to be simple and intuitive, unstructured and less rigid. Their drawbacks are that they are difficult to maintain and, in some cases, inadequate to represent many types of information. A number of inductive rule generation algorithms have been proposed in the literature. Some of them first construct a decision tree and then extract a set of classification rules from the decision tree. Other algorithms directly induce rules from the data by employing a divide-and-conquer approach.

Fuzzy logic techniques have been in use in the area of network security since the late 1990s [5]. Dickerson et al. [6] developed the Fuzzy Intrusion Recognition Engine (FIRE) using fuzzy sets and fuzzy rules. FIRE uses simple data mining techniques to process the network input data and generate fuzzy sets for every observed feature. The fuzzy sets are then used to define fuzzy rules to detect individual attacks. FIRE does not establish any sort of model representing the current state of the system, but instead relies on attack-specific rules for detection.

Genetic algorithms, a search technique used to find approximate solutions to optimization and search problems, have also been extensively employed in the domain of intrusion detection to differentiate normal network traffic from anomalous connections. The major advantage of genetic algorithms is their flexibility and robustness as a global search method. The earliest attempt to apply genetic algorithms to the problem of intrusion detection was made by Crosbie and Spafford [7] in 1995, when they applied multi-agent technology to detect network-based anomalies.

C. Anomaly detection using machine learning
Machine learning aims to answer many of the same questions as statistics. However, unlike statistical approaches, which tend to focus on understanding the process that generated the data, machine learning techniques focus on building a system that improves its performance based on previous results. In other words, systems based on the machine learning paradigm have the ability to change their execution strategy on the basis of newly acquired information.

A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, Bayesian networks have several advantages for data analysis [8]. Several researchers have adapted ideas from Bayesian statistics to create models for anomaly detection. Valdes et al. [9] developed an anomaly detection system that employed naive Bayesian networks to perform intrusion detection on traffic bursts.

Bayesian techniques have also been frequently used for classification and for the suppression of false alarms. Kruegel et al. [10] proposed a multi-sensor fusion approach in which the outputs of different IDS sensors were aggregated to produce a single alarm. This approach is based on the assumption that no single anomaly detection technique can classify a set of events as an intrusion with sufficient confidence. Although using Bayesian networks for intrusion detection can be effective in certain applications, their limitations should be considered in the actual implementation. Since the accuracy of this method depends on certain assumptions that are typically based on the behavioral model of the target system, selecting an accurate model is the most important step towards solving the problem. Unfortunately, selecting an accurate behavioral model is a difficult task, as typical networks are complex.
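To make the naive Bayesian idea more concrete, the following sketch trains a small naive Bayes classifier on categorical descriptions of traffic bursts and picks the most probable class for a new burst. It is only an illustration under stated assumptions: the feature names (`proto`, `flag`, `rate`), the labels, the availability of labelled training bursts and the Laplace smoothing constant are all hypothetical, and the code does not reproduce the system of Valdes et al. [9].

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Naive Bayes over categorical features with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                           # smoothing constant (assumed)
        self.class_counts = Counter()                # label -> number of records
        self.feature_counts = defaultdict(Counter)   # (label, feature) -> value counts
        self.feature_values = defaultdict(set)       # feature -> values seen in training

    def fit(self, records, labels):
        for record, label in zip(records, labels):
            self.class_counts[label] += 1
            for feature, value in record.items():
                self.feature_counts[(label, feature)][value] += 1
                self.feature_values[feature].add(value)
        return self

    def log_posteriors(self, record):
        """Unnormalized log P(label | record), features assumed independent."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            log_p = math.log(n / total)
            for feature, value in record.items():
                count = self.feature_counts[(label, feature)][value]
                # One extra slot keeps some probability for unseen values.
                k = len(self.feature_values[feature]) + 1
                log_p += math.log((count + self.alpha) / (n + self.alpha * k))
            scores[label] = log_p
        return scores

    def predict(self, record):
        scores = self.log_posteriors(record)
        return max(scores, key=scores.get)


bursts = [
    {"proto": "tcp", "flag": "SYN_ACK", "rate": "low"},
    {"proto": "tcp", "flag": "SYN_ACK", "rate": "low"},
    {"proto": "udp", "flag": "none", "rate": "low"},
    {"proto": "tcp", "flag": "SYN", "rate": "high"},
    {"proto": "tcp", "flag": "SYN", "rate": "high"},
]
labels = ["normal", "normal", "normal", "attack", "attack"]
classifier = NaiveBayes().fit(bursts, labels)
print(classifier.predict({"proto": "tcp", "flag": "SYN", "rate": "high"}))
print(classifier.predict({"proto": "udp", "flag": "none", "rate": "low"}))
```

The independence assumption between features is exactly the kind of modeling assumption the text warns about: if the features are strongly correlated in real traffic, the computed posteriors can be badly calibrated.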

Typical datasets for intrusion detection are very large and multidimensional. To tackle the problem of high-dimensional datasets, researchers have developed a dimensionality reduction technique known as principal component analysis (PCA). PCA is a technique in which n correlated random variables are transformed into d < n uncorrelated variables. The uncorrelated variables are linear combinations of the original variables and can be used to express the data in a reduced form. Shyu et al. [11] proposed an anomaly detection scheme in which PCA was used as an outlier detection scheme and was applied to reduce the dimensionality of the audit data and arrive at a classifier that is a function of the principal components.

Mahoney et al. [12–14] presented several methods that address the problem of detecting anomalies in the usage of network protocols by inspecting packet headers. The common denominator of all of them is the systematic application of learning techniques to automatically obtain profiles of normal behavior for protocols at different layers. Packet Header Anomaly Detector (PHAD) [12], LEarning Rules for Anomaly Detection (LERAD) [13] and Application Layer Anomaly Detector (ALAD) [14] use time-based models in which the probability of an event depends on the time. For each attribute, they collect a set of allowed values and flag novel values as anomalous. PHAD, ALAD, and LERAD differ in the attributes that they monitor. PHAD monitors 33 attributes from the Ethernet, IP and transport layer packet headers. ALAD models incoming server TCP requests: source and destination IP addresses and ports, opening and closing TCP flags, and the list of commands in the application payload. Depending on the attribute, it builds separate models for each target host, port number (service), or host/port combination. LERAD also models TCP connections. The authors break down the multivariate problem into a set of univariate problems and sum the weighted results from range matching along each dimension. The advantage of this approach is that it makes the technique more computationally efficient and effective at detecting network intrusions.

D. Anomaly detection using finite state machines
A finite state machine (FSM) is a model of behavior composed of states, transitions and actions. In this model, a state stores information about the past, and a transition indicates a state change and is described by a condition that must be fulfilled to enable the transition. An action is a description of an activity that is to be performed at a given moment.

The finite state machine has been used to detect attacks on the DSR protocol in [15]. First, an algorithm for selecting monitors for the distributed monitoring of all nodes in the network was proposed, and then the correct behaviors of the nodes according to DSR were manually abstracted. This method has the advantage of detecting intrusions without the need for training data or signatures; unknown intrusions can also be detected with few false alarms. As a result, a distributed network monitor architecture that traces the data flow on each node by means of a finite state machine was proposed. In Ref. [16], Sekar et al. present a specification-based model as well as a prototype with excellent detection performance. The model proposed by the authors consists of developing protocol specifications by using Extended Finite State Automata (EFSA).

IV. CONCLUSIONS
Networks are becoming increasingly complex at the same time that security concerns continue to grow and require more and more attention. Hence, there is a strong need for anomaly detection as a frontline research area in network security. In order to give a clear view of the use of this technique, we present in this paper a classified survey of the methods that are used for anomaly detection in networks. We believe that deeper knowledge is required before this technology achieves solid maturity.

REFERENCES
[1] H.S. Javitz, A. Valdes, The SRI Statistical Anomaly Detector, in: Proceedings of the 1991 IEEE Symposium on Research in Security and Privacy, May 1991.
[2] D.E. Denning, An Intrusion Detection Model, IEEE Transactions on Software Engineering, SE-13, 1987, pp. 222–232.
[3] S.E. Smaha, Haystack: An intrusion detection system, in: Proceedings of the IEEE Fourth Aerospace Computer Security Applications Conference, Orlando, FL, 1988, pp. 37–44.
[4] S. Staniford, J.A. Hoagland, J.M. McAlerney, Practical automated detection of stealthy portscans, Journal of Computer Security 10, 2002, pp. 105–136.
[5] H.H. Hosmer, Security is fuzzy!: applying the fuzzy logic paradigm to the multipolicy paradigm, in: Proceedings of the 1992-1993 Workshop on New Security Paradigms, Little Compton, RI, United States, 1993.
[6] J.E. Dickerson, J.A. Dickerson, Fuzzy network profiling for intrusion detection, in: Proceedings of the 19th International Conference of the North American Fuzzy Information Processing Society (NAFIPS), Atlanta, GA, 2000, pp. 301–306.
[7] M. Crosbie, G. Spafford, Applying genetic programming to intrusion detection, in: Working Notes for the AAAI Symposium on Genetic Programming, Cambridge, MA, 1995, pp. 1–8.
[8] D. Heckerman, A Tutorial on Learning With Bayesian Networks, Microsoft Research, Technical Report MSR-TR-95-06, March 1995.
[9] A. Valdes, K. Skinner, Adaptive model-based monitoring for cyber attack detection, in: Recent Advances in Intrusion Detection, Toulouse, France, 2000, pp. 80–92.
[10] C. Kruegel, D. Mutz, W. Robertson, F. Valeur, Bayesian event classification for intrusion detection, in: Proceedings of the 19th Annual Computer Security Applications Conference, Las Vegas, NV, 2003.
[11] M.-L. Shyu, S.-C. Chen, K. Sarinnapakorn, L. Chang, A novel anomaly detection scheme based on principal component classifier, in: Proceedings of the IEEE Foundations and New Directions of Data Mining Workshop, Melbourne, FL, USA, 2003, pp. 172–179.
[12] M.V. Mahoney, P.K. Chan, PHAD: Packet Header Anomaly Detection for Identifying Hostile Network Traffic, Department of Computer Sciences, Florida Institute of Technology, Melbourne, FL, USA, Technical Report CS-2001-4, April 2001.
[13] M.V. Mahoney, P.K. Chan, Learning Models of Network Traffic for Detecting Novel Attacks, Computer Science Department, Florida Institute of Technology, Technical Report CS-2002-8, August 2002.
[14] M.V. Mahoney, P.K. Chan, Learning nonstationary models of normal network traffic for detecting novel attacks, in: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Canada, 2002, pp. 376–385.
[15] P. Yi, Y. Jiang, Y. Zhong, S. Zhang, Distributed Intrusion Detection for Mobile Ad hoc Networks, in: Proceedings of the 2005 Symposium on Applications and the Internet Workshops (SAINTW'05), pp. 94–97.
[16] R. Sekar, A. Gupta, J. Frullo, T. Shanbhag, A. Tiwari, H. Yang, S. Zhou, Specification-based anomaly detection: a new approach for detecting network intrusions, in: Proceedings of the Ninth ACM Conference on Computer and Communications Security, Washington, DC, USA, November 18–22, 2002, pp. 265–274.
