Android IDS for CAN Bus Security
Abstract—Modern automobiles depend heavily on electronics, controlled by the vehicle’s internal network. The controller area network (CAN bus) is the predominant protocol, known for its

Foster et al. [6] demonstrated remote attacks via compromise of telematic devices often used by insurance companies or
GLOBECOM 2022 - 2022 IEEE Global Communications Conference | 978-1-6654-3540-6/22/$31.00 ©2022 IEEE | DOI: 10.1109/GLOBECOM48099.2022.10001536
Authorized licensed use limited to: Hellenic Mediterranean University. Downloaded on February 19,2024 at [Link] UTC from IEEE Xplore. Restrictions apply.
2022 IEEE Global Communications Conference: Communication & Information Systems Security
the CAN bus, these messages would appear legitimate, even expected. An intrusion detection system (IDS) might recognize an increase in traffic, as the legitimate messages compete with the injected messages. An IDS might also observe a change in the timing or identifier patterns of the messages. However, the CAN bus will not. For example, an ECU sends the vehicle speed to the speedometer on the dashboard. A flood of messages asserting a different speed will drown out the messages containing the real speed, and, in most vehicles, the speedometer will display the spoofed speed because it tries to display the most recently reported speed. Before it can display the real speed, the next spoofed message will arrive, so with a sufficient volume of spoofed messages, the real speed is never displayed [4], [5]. An IDS would detect the substantial increase in speed-reporting messages, but the CAN bus and the ECU controlling the speedometer will not.

c) Contributions: In order to enhance security in automotive networks, many researchers have designed IDS-based mechanisms to defend against attacks. However, most existing solutions are lacking in practicality and may also entail high implementation cost due to required hardware or complex algorithms. Motivated by the above observations, we aim to develop a practical IDS for CAN bus security with low cost.

Our contributions can be summarized as follows:
• We develop IDS for CAN, a practical ID sequence-based IDS, as a security enhancement to the existing security mechanisms on modern automobiles. It is deployed as an Android application to identify malicious activities (ID sequences) on the CAN bus system.
• In the evaluation, we investigate the performance of IDS for CAN with both CAN datasets and real vehicles. The experimental results demonstrate the expected detection performance with low false positives.

The rest of this paper is organized as follows. Section II introduces our proposed IDS for CAN, including the selection of detection methods and the implementation details. In Section III, we provide a performance evaluation based on CAN datasets. Section IV presents our evaluation with real vehicles. Section V introduces some related work on IDS in the CAN system and Section VI discusses limitations and future work. Finally, we conclude the work in Section VII.

II. OUR PROPOSED APPROACH

In this work, we develop IDS for CAN as a practical security enhancement that would be attainable for the average consumer. The proposed enhancement is both low-cost and low-effort for the consumer, in order to encourage adoption.

We have devised and implemented a practical IDS that will alert consumers if any suspicious communication occurs in the CAN bus system. The IDS comes in the form of an Android smartphone application and interfaces with the vehicle via an inexpensive, standard piece of hardware. The On-Board Diagnostics II (OBD-II) port is a diagnostics port that has been mandatory in U.S.-made cars and light trucks since 1996 and became mandatory in the European Union (EU) in 2001. It lies within arm’s reach of the driver’s seat and is accessible without tools [10]. An inexpensive piece of standard hardware, an ELM 327-type device [11], connects the OBD-II port to the IDS application on the consumer’s smartphone. This Android-based IDS application is called IDS for CAN.

Fig. 1: Our Setup: An ELM 327-type device (foreground) that inserts into the OBD-II port (background). The 2015 Chevrolet Silverado is pictured.

Our application alerts the consumer if suspicious communication is detected. When the application’s IDS identifies traffic that does not match the vehicle’s profile, IDS for CAN generates an Android notification to warn the user that the vehicle may be compromised. The user should set the application’s notification tone to a sound that he or she will immediately associate with the automotive security application. If the alert occurs while the user is driving, then the user should hear the alert tone and find a safe place to stop as soon as possible. The user can then inspect the vehicle for additional indicators of compromise, read the text of the alert to understand which type of alert has occurred, and potentially restart the vehicle to see if the alert persists. The user may take additional action he or she deems appropriate.

A. ID Sequence-based Detection

There are two main approaches to an IDS [12]: signature-based and anomaly-based. A signature-based IDS relies on the signatures that identify known attacks. An anomaly-based IDS has some form of baseline for normal behavior, and it detects deviations from normal behavior that are significant enough to warrant an alert. An anomaly-based IDS can develop its baseline via a number of methodologies, such as specifications of expected behavior, a training phase in a machine-learning process, or the tracking of statistical data [13].

When selecting an IDS type, we had to consider the fact that, because we are creating an application that is affordable to the average consumer and utilizes standard hardware, it is going to be limited by the specifications of that hardware. ELM 327-type hardware, especially with the latencies introduced by Bluetooth, is known to occasionally drop packets. Furthermore, on low-end devices, the messaging rate on the CAN bus will exceed the sniffing rate of the device. As such,
the IDS needs to be robust in the event of occasional packet loss or inexact message timings.

Ultimately, an ID sequence-based IDS was selected and implemented in this work. The IDS should track the sequences of arbitration identifiers that appear in attack-free CAN traffic, and develop a profile of the vehicle, in the form of a matrix, which contains every transition from one identifier to another. The matrix is initialized to the Boolean “False” value, and the transitions observed in healthy CAN traffic are entered as “True”, while all others remain false. During monitoring mode, if a transition is “True” in the matrix, then it is normal traffic. If a transition is “False” in the matrix, then it is considered an anomaly [14], [15]. Algorithm 1 describes the matrix generation process. The matrix serves as a profile for ID sequence-based detection. The rows and columns of the matrix correspond to the arbitration identifiers observed in the CAN data. The matrix will be square, and its size will be equivalent to the number of unique identifiers.

TABLE I: Boolean matrix
       013   0AA   0BD
013     F     T     F
0AA     F     F     T
0BD     T     F     F

identifiers or unusual patterns of identifiers. If the IDS detects unknown IDs, it will generate the following alert: “Invalid messages have been detected. This may indicate a bus error or an attack.” If, instead, the IDS detects a suspicious traffic sequence, the following alert will be raised: “Unusual patterns of messages have been detected. This may be the result of unusual activity, or it may indicate an attack.” The second type of alert is shown in Fig. 2.
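
As an illustration of the profile matrix and the monitoring check described above, the following is a minimal Python sketch. It is not the paper's Algorithm 1 or the Android implementation; the function names (build_profile, as_matrix, classify) and the return labels are hypothetical, and the True cells of the matrix are stored as a set of (previous ID, current ID) pairs.

```python
from typing import Iterable, List, Optional, Set, Tuple

Transition = Tuple[str, str]


def build_profile(ids: Iterable[str]) -> Tuple[Set[str], Set[Transition]]:
    """Training: record every identifier and every ID-to-ID transition
    seen in attack-free traffic (the cells that become True)."""
    known: Set[str] = set()
    transitions: Set[Transition] = set()
    prev: Optional[str] = None
    for cur in ids:
        known.add(cur)
        if prev is not None:
            transitions.add((prev, cur))
        prev = cur
    return known, transitions


def as_matrix(known: Set[str], transitions: Set[Transition]):
    """Render the profile as a square Boolean matrix (rows = previous ID,
    columns = next ID), one row and column per unique identifier."""
    order: List[str] = sorted(known)
    matrix = [[(row, col) in transitions for col in order] for row in order]
    return order, matrix


def classify(prev: Optional[str], cur: str,
             known: Set[str], transitions: Set[Transition]) -> str:
    """Monitoring: an identifier never seen in training is reported as an
    unknown ID; a known identifier reached through a False cell is
    reported as a suspicious sequence."""
    if cur not in known:
        return "unknown_id"
    if prev is not None and (prev, cur) not in transitions:
        return "bad_sequence"
    return "ok"


# Training on the sequence 013 -> 0AA -> 0BD -> 013 yields the matrix of Table I.
known, transitions = build_profile(["013", "0AA", "0BD", "013"])
order, matrix = as_matrix(known, transitions)
```

Storing only the True cells keeps the profile small when most transitions never occur; a dense two-dimensional array indexed by identifier would behave the same way.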
packets in order to overwhelm the packets from the legitimate ECU. As such, an occasional anomaly is more likely to be a less common legitimate packet, not a symptom of an attack. We then crafted two criteria to allow the matrix to update in the event of an occasional anomaly that is probably a false positive.
• One criterion is a streak of healthy messages following the initial anomaly. For example, if an anomaly occurs, and then 5,000 healthy messages follow, the anomaly is perhaps a false positive. If an anomaly occurs, and then, 5 messages later, a second anomaly occurs, then it is more prudent to think in terms of an attack.
• The second path to a matrix update is identification of a high ratio of healthy messages to unhealthy (anomalous) messages. The ratio is evaluated once a certain number of messages have been received since the last evaluation. For example, if 2,000 messages have been received, and, out of the total number of messages, the percent of healthy messages is higher than 90%, then the matrix should be updated. If the percent of healthy messages is lower, perhaps 60%, then an attack might be ongoing and an update will not occur.

III. EVALUATION WITH DATASETS

In the evaluation, we adopted the datasets produced by Lee et al. [16], [17]. They had developed an IDS for the CAN bus that tracked and analyzed the offset ratio and time interval between the CAN request and the CAN response messages. In particular, they developed four CAN datasets: a control set, a denial of service (DoS) attack, a fuzzy attack, and an impersonation attack. The data was extracted from a Kia Soul. In order to replay the datasets, we had to convert the datasets to the format expected by canplayer [18].

A. Experimental Setup

The experiments were conducted in a stand-alone Java application we set up specifically for IDS development and evaluation. Our machine was running the Linux Mint operating system and utilizing SocketCAN to replay the datasets. The first 1,000,000 messages in the attack-free dataset were used for training, while the remaining 1,369,398 messages were used for testing.

B. Discussion of Results

In the ideal case, the attack-free dataset should not raise any alerts, while all other datasets should raise at least one alert. We restarted and retrained the IDS for each experiment, so that matrix updates in one experiment did not affect a subsequent experiment. Note that even with a virtual replay of CAN frames, packet loss does occur, so an experiment run multiple times may produce different results every time. That said, we observed that if an experiment raised at least one alert the first time, it would raise at least one alert the second time. If no alerts were generated the first time, no alerts would be generated the second time. For some of the datasets that produced an extreme number of alerts, such as the DoS attack dataset, which frequently produced upwards of 500,000 alerts, differences of one or two digits in the number of alerts were common.

1) Attack-free (Normal) State: Alerts raised during replay of the attack-free dataset are false positives. Raising the alert threshold can reduce these false positives, while lowering the threshold increases them. Because users will ignore the application’s notifications if they occur too frequently, reducing false positives is paramount.

2) Denial of Service Attack: A Denial of Service (DoS) attack delivers a high volume of CAN packets with the highest priority arbitration identifiers in the hopes of crippling the CAN system. Because these packets have the highest priority, they will supersede legitimate packets of lower priority, such that legitimate packets cannot use the system. The DoS attack dataset is highly detectable, as the traffic is extremely abnormal. Each variation of the experiment produced over 500,000 alerts for invalid IDs.

3) Fuzzy Attack: A fuzzy attack, in this particular dataset, involves sending CAN packets with random values for the ID and data fields. The fuzzy attack raises fewer alerts, but it is consistently detected under all experimental variations.

4) Impersonation Attack: In an impersonation attack, the attacker masquerades as a particular ECU and sends messages as though they came from that particular ECU. Of the three attack datasets, the impersonation attack proved most difficult for the IDS to detect. If the alert threshold is high to reduce false positives, then the impersonation attack goes undetected. There is a narrow range of thresholds that are both low enough to detect the impersonation attack and high enough to avoid false positives in the attack-free dataset.

In our experiments, we determined that the alert threshold could be as low as 3 without generating false positives in the attack-free dataset, and it could also be as high as 7 without generating false negatives in the impersonation dataset, provided that the remaining parameters were set appropriately. Hence an alert threshold of 5 seemed to be an appropriate balance between the two, and to reduce the danger of false negatives, the matrix update parameters were set to somewhat stricter values. If the matrix were to update such that attack traffic was treated as legitimate traffic, there would be a serious problem. As such, the streak of healthy traffic required to update the matrix is set to 2,500, and the percent healthy traffic out of total traffic cannot be checked until 5,000 messages are received, such that the percent is more stable when it is checked.

IV. EVALUATION WITH REAL VEHICLES

We further collected our own data to validate the performance of our IDS for CAN application. We tested some hardware and eventually found a CAN-to-USB device that connected our Linux machine to our vehicle and allowed us to collect traces and conduct attacks. After testing our application on all the vehicles available (a Chevrolet Impala, Traverse, and Silverado), we finally selected the 2011 Chevrolet Impala for our experimental evaluation. This is because the Impala had fewer electronic systems than the others, which meant its
CAN traffic was easier to comprehend. In addition, because it was more mechanical and less electronic, we had less risk of sending a message that could be damaging or dangerous.

A. Crafting the Attack

When tracing the Impala, we were able to determine that the arbitration identifier “0C9” is associated with the Revolutions per Minute (RPM) gauge. By observing the corresponding data field, we were able to construct CAN messages to spoof the RPM gauge. We could set the gauge to zero while accelerating, and we could set it to 2000 RPM when the vehicle was parked with the engine off. The settings are summarized as follows:

The following CAN packet sets the gauge to zero RPM:
0C9#00000000004008

The following CAN packet sets the gauge to 2000 RPM:
0C9#8021C0071B101000

While manipulating the RPM gauge, we also found that the arbitration identifier “0C9” indicates whether the vehicle is in gear (not in “park”) or not in gear (in “park”). The second to last byte is “40” when in park, and it is “41” when in drive, neutral, reverse, etc. In the Impala, telling the vehicle it was in park while driving had no discernible effect. However, when we tried this attack on a 2011 Chevrolet Traverse, telling the vehicle it was in park while driving, the vehicle would behave as though in neutral: the accelerator did not work, but the vehicle continued to roll until it ran out of momentum. By checking the behavioral discrepancy between the two vehicles, we discovered that the Traverse has an electronic transmission, so the transmission was reacting to the spoofed CAN message. The Impala’s transmission is not fully electronic, so it did not react to our attack messages.

B. Assembling the Datasets

Ultimately, we collected traces of normal traffic, both driving and idling, from the 2011 Chevrolet Impala, followed by traces of attack traffic. The descriptions of the datasets, as well as the number of messages in each, are as follows (see footnote 3):

Attack-free data, driving - 1 (1,027,968 messages)
Attack-free data, driving - 2 (392,921 messages)
Attack-free data, idle - 1 (267,884 messages)
Attack, driving - 1 (168,415 messages)
Attack, driving - 2 (36,498 messages)
Attack, driving - 3 (189,554 messages)
Attack, idle - 1 (936,256 messages)
Attack, idle - 2 (123,963 messages)

3 Our collected datasets are available at: [Link] ids for can/tree/master/datasets.

C. Interpreting the Results

Similar to our evaluation using the datasets from Lee et al. [16], [17], we used the first 1,000,000 attack-free CAN messages as training data. To assess false positives, we replayed the remaining collections of normal traffic. There were two false positives in the second attack-free driving dataset, and there was one false positive in the attack-free idle dataset, as summarized in Table II. Here, “alert” refers to an Android notification warning the user that a potential attack has been detected. An alert will not be generated unless the anomaly threshold has been reached. An “anomaly” refers to an unusual event, such as an unknown arbitration identifier or a suspicious ID sequence. In this experiment, the anomaly threshold for ID sequences was set to five, meaning that the IDS will raise an alert if it detects five or more suspicious ID sequences in a short period. Table II shows that all of the attack datasets triggered alerts, even when the attacks were extremely short. This is a highly promising result.

V. RELATED WORK

In the literature, many research studies have been devoted to addressing security concerns in automotive networks, including multiple variations of the ID sequence-based IDS [19]. Miller and Valasek [5] developed an intrusion prevention system (IPS) for the CAN bus. They implemented the IPS on a board computer, though they pointed out that it could be implemented as an ECU or used to augment an existing ECU. However, to our knowledge, the technical details of this IPS have never been published, including hardware specifications, algorithms, and source code. When the IPS shorts the CAN bus, the attack stops, and so does all CAN communication. Abbott-McCune and Shay [20] outlined an IDS approach that can be extended into intrusion prevention. They suggested adding an additional CAN transceiver which will, in the event of a malicious CAN packet, send several dominant bits (“0” bits), causing a bus error and invalidating the malicious packet. This approach is more sophisticated and safer than shorting the CAN bus, but the system involves a lot of added hardware and wiring, which might increase the cost.

Olufowobi et al. [21] proposed an IPS based on reboot-recovery, in which an IDS is required to be placed in the CAN system to observe all traffic. When it detects suspicious traffic, it transmits an error frame that increments the error counter of the anomalous node, ultimately leading to reboot.
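
Returning to the anomaly threshold of Section IV-C, the rule "alert once five or more suspicious ID sequences occur in a short period" can be sketched as below. This is an illustration under stated assumptions, not the application's code: the text does not define the "short period", so the sketch assumes a sliding window over the most recent messages, and the class name AnomalyThreshold and the default window of 1,000 messages are hypothetical.

```python
from collections import deque


class AnomalyThreshold:
    """Counts anomalies among the most recent `window` messages and
    signals an alert once the count reaches `threshold`."""

    def __init__(self, threshold: int = 5, window: int = 1000):
        self.threshold = threshold
        # 1 marks an anomalous message, 0 a healthy one.
        # The window size is an assumed stand-in for "a short period".
        self.recent = deque(maxlen=window)
        self.count = 0

    def observe(self, is_anomaly: bool) -> bool:
        """Record one message; return True if an alert should be raised."""
        if len(self.recent) == self.recent.maxlen:
            # The oldest entry is about to fall out of the window.
            self.count -= self.recent[0]
        flag = 1 if is_anomaly else 0
        self.recent.append(flag)
        self.count += flag
        return self.count >= self.threshold


# Five anomalies in close succession reach the threshold on the fifth one.
burst = AnomalyThreshold(threshold=5, window=100)
burst_results = [burst.observe(True) for _ in range(5)]

# A single anomaly followed by healthy traffic never raises an alert.
isolated = AnomalyThreshold(threshold=5, window=100)
first = isolated.observe(True)
later = [isolated.observe(False) for _ in range(200)]
```

With a window of this kind, an isolated anomaly ages out after enough healthy messages, which matches the observation in Section III that an occasional anomaly is more likely a rare legitimate packet than an attack.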