Fault-Based Attacks on ECC Security
Fault-Based Attacks on ECC Security
Countermeasures for
Elliptic Curve Cryptosystems
by
A thesis
presented to the University of Waterloo
in fulfilment of the
thesis requirement for the degree of
Doctor of Philosophy
in
Electrical and Computer Engineering
iii
Abstract
For some applications, elliptic curve cryptography (ECC) is an attractive choice be-
cause it achieves the same level of security with a much smaller key size in comparison
with other schemes such as those that are based on integer factorization or discrete log-
arithm. Unfortunately, cryptosystems including those based on elliptic curves have been
subject to attacks. For example, fault-based attacks have been shown to be a real threat
tacks and countermeasures for ECC. We propose a new fault-based attack against the
Montgomery ladder elliptic curve scalar multiplication (ECSM) algorithm. For security
to fault attacks against ECSM at two levels: module and algorithm. For protections at
the module level, where the underlying scalar multiplication algorithm is not changed, a
parallel computation. It is shown that these structures can be used for detecting errors
with a very high probability during the computation of ECSM. For protections at the
algorithm level, we use the concepts of point verification (PV) and coherency check (CC).
We investigate the error detection coverage of PV and CC for the Montgomery ladder
always method that are resistant to the safe error (SE) attack. We demonstrate that one
of these algorithms also resists the sign change fault (SCF) attack.
v
Acknowledgements
vii
Malena.
I also want to thank CONACYT, and especially the ITESM Campus Querétaro
México for sponsoring my studies at the University of Waterloo, allowing me to
achieve this goal in my life and reinforcing my commitment to the development
and progress of my country.
viii
To Lila, my beloved wife
ix
Contents
1 Introduction 1
1.1 Thesis organization . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 Summary of research contributions . . . . . . . . . . . . . . . . . . 7
2 Background 9
2.1 Finite fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Elliptic curve cryptography (ECC) . . . . . . . . . . . . . . . . . . 15
2.2.1 Non-supersingular elliptic curves of characteristic two . . . . 16
2.2.2 Elliptic curves of characteristic p > 3 . . . . . . . . . . . . . 17
2.2.3 Group law . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.4 Group order and structure . . . . . . . . . . . . . . . . . . . 19
2.2.5 Coordinate systems . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 Elliptic curve scalar multiplication (ECSM) . . . . . . . . . . . . . 21
2.3.1 Montgomery’s ECSM method . . . . . . . . . . . . . . . . . 25
2.3.2 Elliptic curve discrete logarithm problem (ECDLP) . . . . . 30
2.4 ECC fault-based attacks . . . . . . . . . . . . . . . . . . . . . . . . 33
2.5 Error detection strategies . . . . . . . . . . . . . . . . . . . . . . . . 40
xi
2.5.1 Point verification (PV) . . . . . . . . . . . . . . . . . . . . . 41
2.5.2 Re-computation and parallel computation . . . . . . . . . . 43
2.5.3 Coherency check (CC) . . . . . . . . . . . . . . . . . . . . . 45
2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
xii
4.4 Experimental results . . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.4.1 Parameters, fault model, and process . . . . . . . . . . . . . 113
4.4.2 Results obtained . . . . . . . . . . . . . . . . . . . . . . . . 116
4.4.3 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
xiii
B Example of Undetected Errors with Point Negation 169
Bibliography 173
xiv
List of Tables
xv
4.7 Average Hamming weight of the Z-coordinate of the result . . . . . 119
5.1 Operation counts for the Montgomery ladder ECSM method . . . . 140
5.2 Operation counts double-and-add-always ECSM method F2m . . . . 150
5.3 Operation counts double-and-add-always ECSM method t = 163 . . 151
5.4 Operation counts double-and-add-always ECSM method Fp . . . . . 152
5.5 Operation counts double-and-add-always ECSM method t = 192 . . 153
xvi
List of Figures
xvii
A.1 Values of the random variable w in Tables A0 and A1 . . . . . . . . 164
B.1 Stuck-at-1 fault at the gate that gets bit i of the y-coordinate . . . 170
xviii
List of Acronyms
xix
ISO International Standards Organization
ISO-IEC International Standards Organization - International
Electrotechnical Commission
lcm Least Common Multiple
NIST National Institute of Standards and Technology
PC Parallel Computation
PRC Parallel and Re-Computation
PV Point Verification
RC Re-Computation
RC partial Partial Re-Computation
RSA Rivest Shamir Adleman
SCF Sign Change Fault
SE Safe Error
SPA Simple Power Analysis
TMR Triple Modular Redundancy
xx
Chapter 1
Introduction
Cryptography refers to design principles, means and methods for rendering plain
information unintelligible to unauthorized parties. In the past, cryptography was
used only for secret communications between powerful security entities such as
military and intelligence agencies. Today, with the widespread use of computers
and the Internet, secure communications are more than a privilege; they are a
priority requirement even for the general public. In 1976, with the introduction of
public key cryptography by Diffie and Hellman [25], secure communications were
made practical. E-commerce and smart cards are examples of how cryptographic
applications have become a part of everyday life.
Elliptic curve cryptography (ECC) was independently proposed by both Miller
[67] and Koblitz [50] in 1985. Since then, ECC has been a subject of extensive re-
search and standardization efforts that have led it to be widely known and accepted.
Some of the ECC standards include: FIPS 186 [33], IEEE P1363 [42], ANSI X9.62
[5], and ISO 15946 [43]. Today ECC is an attractive choice because it achieves
1
2 Introduction
the same level of security with a much smaller key size in comparison with other
methods such as integer factorization or discrete logarithm based cryptosystems.
Smaller key sizes generate smaller signatures, require less memory for storage, and
use less bandwidth for communications. Because of this, for devices such as smart
cards, personal digital assistants (PDAs), and mobile telephones, ECC appears to
be an attractive choice for providing the public key services required for secrecy,
authentication and non-repudiation purposes. In secrecy terms, ECC could provide
the key distribution privacy needed for symmetric algorithms. For authentication
and non-repudiation, ECC could be used to provide digital signatures.
Unfortunately, cryptosystems including those based on elliptic curves have been
subject to attacks. Cryptoanalytic attacks may reveal system vulnerabilities, which
then need to be addressed with countermeasures. Various researchers have empha-
sized the significance of cryptographic applications being resistant to side-channel
analysis (e.g., reconstruction of a secret key from analysis of timing [52], power
consumption signals [53], and electromagnetic emanations [2] during cryptographic
operations).
Another type of attacks that has received considerable attention is the fault
analysis attack. Introduced by Boneh et al. [15], this attack is based on produc-
ing malfunctions in cryptosystems to leak sensitive information (i.e., secret keys).
They have shown how some cryptographic schemes, such as the RSA and Rabin
digital signatures, are vulnerable to induced computational errors. Particularly,
for an implementation of RSA based on the Chinese Remainder Theorem (CRT),
they have demonstrated that with only two signatures of the same message (one
computed correctly and one produced after some fault), it is possible to efficiently
factor the modulus used. In order to avoid such attacks, they suggest verifying
3
add-always method (e.g., [23]). On the other hand, the SCF attack is applicable
to elliptic curves over prime fields, where a sign change in a point implies only
a change of sign of its y-coordinate. An interesting aspect of this attack is that
the elliptic curve operations do not need to leave the original group E(Fp ). They
assume that the attacker can induce a fault that produces a sign change into an
intermediate point during the ECSM operation. After having a set of erroneous
results due to SCF attacks and the correct result Q, it is possible to recover the
scalar k for a input pair (k, P ).
As described above, fault-based attacks against cryptosystems are a real threat
and should be taken into account. Accordingly, the design of cryptosystems should
include some countermeasures against fault-based attacks. In this thesis we present
our work on fault-based attacks and countermeasures for ECC.
In general, fault attacks take advantage of errors that occur while a crypto-
graphic device is performing a private-key operation. Such errors may be induced
by a malicious adversary who has physical access to the device or may occur be-
cause of hardware failure. An adversary may derive sensitive information from the
incorrect output. Thus, error detection is an essential process from a security point
of view. In the case of ECC, PV has been shown to be an important countermeasure
against fault attacks. However, since there exist attacks where PV is not sufficient,
it is necessary to include other protections.
While error detection is a sufficient countermeasure for preventing fault-based
attacks, fault-tolerant characteristics enable a system to perform its normal opera-
tion in the presence of some faults. This will result in more reliable systems where
faults may occur due to natural causes such as, abnormal temperature, electro-
magnetic interference (EMI) or power supply changes. Error detection plays an
6 Introduction
Background
9
10 Background
4. There exists an identity element e ∈ G called identity such that a∗e = e∗a = a
for any a ∈ G.
5. For every a ∈ G , there exists an inverse element a−1 ∈ G such that a∗a−1 = e.
Definition 2.2 A finite group G is said to be cyclic if all elements of the group
can be generated by applying the group operation repeatedly to an element α ∈ G
which is denoted as a generator of the group G.
Definition 2.3 For a finite group (G, ∗, e), the order of an element a (denoted
ord(a) = b) is the smallest positive integer b such that |a ∗ a ∗{z· · · ∗ a} = e.
b times
2.1. Finite fields 11
Theorem 2.1 Let G be a cyclic group of order n and d|n. Then G has φ(d)
elements of order d and φ(n) generators, where the function φ is called the Euler
phi function 1 [66].
Definition 2.4 A field F is a set of elements with two binary operators, denoted
as + and ·, which exhibits the following properties:
3. The distributive laws apply (i.e., a·(b+c) = a·b+a·c and (b+c)·a = b·a+c·a
for all a, b, c ∈ F ).
Example 2.1 (a) The set of real numbers under multiplication and addition is a
field (infinite field). (b) Let p be a prime. The set {0, 1, . . . , p − 1} forms a finite
1
φ(n) corresponds to the number of positive integers < n and relatively prime to n.
12 Background
Theorem 2.2 For p prime and m ≥ 2, there is a unique finite field of order pm ,
denoted as Fpm . It is called the extension field of the prime field. A proof of this
theorem is presented by Golomb and Gong [38].
Definition 2.5 Let m ≥ 2. The field F2m is called characteristic-two finite field or
binary finite field. It can be seen as a vector space of dimension m over the field F2
which has only the elements 0 and 1. There are m elements (αm−1 , αm−2 , . . . , α1 , α0 )
in F2m such that each element a ∈ F2m can be represented in the following form:
The set {αm−1 , αm−2 , . . . , α1 , α0 } is called a basis of F2m over F2 . Every element
in the field can be represented as a bit string of the form (am−1 am−2 . . . a1 a0 ). The
field addition is simply the bit-wise XOR operation, and the field multiplication
depends on the field basis chosen. The selection of a particular basis may depend
on the used platform (e.g., hardware or software). This choice frequently influences
implementation cost and the complexity of finite field computations. The two most
common types of bases used in conventional software and hardware applications are
polynomial and normal bases. Others, like dual, redundant and triangular bases,
are less commonly used but have some advantages in specific implementations. The
work discussed in this thesis uses polynomial basis.
In polynomial basis the elements of F2m are represented as linear combinations
of the set {αm−1 , αm−2 , . . . , α2 , α, 1}, where α is a root of an irreducible polynomial
f (z) of degree m over F2 . Polynomial basis is in many cases referred to as canonical
2.1. Finite fields 13
Table 2.1: NIST-recommended binary finite fields and their reduction polynomials
or standard basis. For U.S. Federal Government usages, the National Institute of
Standards and Technology (NIST) has recommended five binary fields along with
their corresponding reduction polynomials (see Table 2.1 [32]). It is important to
note two aspects of these polynomials. First, these polynomials are trinomial or
pentanomial. Second, the degree of the second leading term is a small number in
comparison with the extension degree of the binary field (i.e., m). Both aspects are
important in terms of efficiency of finite field operations.
Binary finite fields are very attractive to implementers due to their “carry-
free” arithmetic, and the availability of different representations of the field (i.e.,
basis and/or polynomial selection) which can be suited to and optimized for the
computational environment [11]. Some cryptographic applications, such as elliptic
curve cryptosystems, permit the use of either a prime field Fp or a binary finite
field F2m . For hardware implementation, in order to reduce the complexity of the
design, a binary finite field may be selected [1] [78].
equation from a given x ∈ F2m . Also, it plays a crucial step in the context of
ECSM utilizing point halving [49]. In this thesis, some results from this subsection
are used in Chapters 3 and 5. For more details on this topic the reader is referred
to [8] and [18].
First, let us define two functions that are important for solving quadratic equa-
tions over the binary field, the trace and half-trace functions:
Definition 2.6 The trace function of an element β ∈ F2m , denoted by Tr(β), can
be defined as
m−1
X i
Tr(β) = β2 .
i=0
The trace and the half-trace functions are linear, i.e., Tr(α + β) = Tr(α) + Tr(β)
and Ht(α + β) = Ht(α) + Ht(β), for all α, β ∈ F2m . Also it can be shown that
Tr(β) = Tr(β 2 ) = Tr2 (β), Ht(β 2 ) = Ht2 (β), and Ht(β 2 ) + Ht(β) = Tr(β) + β, for
all β ∈ F2m .
Let us solve an equation over F2m of the form M 2 + uM + v = 0. First consider
√ m−1
the trivial case where u = 0 in which case the solution is M = v = v 2 . Then
for u 6= 0, we can perform a change of variables M ← M u that yields the following
simplified equation
M 2 + M + w = 0, where w = v/u2 . (2.1)
2.2. Elliptic curve cryptography (ECC) 15
Theorem 2.3 Let m be an odd integer. Equation (2.1) has a solution over F2m if
and only if Tr(w) = 0. In such a case one solution corresponds to z = Ht(w) and
the other to Ht(w) + 1.
Proof Let z be a solution of Equation (2.1). Obtaining the trace function of this
equation, we have
This will be valid if and only if Tr(w) = 0. To show that z = Ht(w) is a solution
we can obtain that
in which case z is a solution as claimed. It can be easily verified that Ht(w) + 1 also
satisfies Equation (2.1).
E : y 2 + a1 xy + a3 y = x3 + a2 x2 + a4 x + a6 , ai ∈ Fq . (2.2)
Equation (2.2) is referred to as the affine form of the Weierstrass equation. The
resulting set of points, plus an additive identity defined as the point at infinity
(O), together with a particular operation, denoted as point addition (]), form an
abelian group.
16 Background
Definition 2.9 Two elliptic curves are isomorphic over Fq if they have defining
equations which are the same under some admissible changes of variables.
a3 a21 a4 + a23
x = a21 x1 + y = a31 y1 + ,
a1 a31
where a02 and a06 are appropriate functions of a1 , a2 , a3 , a4 , and a6 . Now we can write
a simplified affine Weierstrass equation for binary finite fields for non-supersingular
elliptic curves as follows,
y 2 + xy = x3 + ax2 + b. (2.3)
Theorem 2.4 Let E and E be non-supersingular elliptic curves defined over F2m .
E and E given by the equations
E : y 2 + xy = x3 + ax2 + b
E : y 2 + xy = x3 + āx2 + b̄
are isomorphic over F2m if and only if Tr(a) = Tr(ā) and b = b̄. If the last conditions
are met, then there is an admissible change of variables (x, y) → (x, y + tx) that
converts E into E for some t ∈ F∗2m that satisfies ā = t2 + t + a.
2.2. Elliptic curve cryptography (ECC) 17
From Equation (2.2) we can complete the square in the left side,
2
a1 x a3 2 3 a21 2
a1 a3 a3
y+ + = x + a2 + x + a4 + x+ + a6 ,
2 2 4 2 4
a1 x a3
where y1 = y + 2
+ 2
; and a02 , a04 , and a06 are appropriate functions of a1 , a2 , a3 ,
a02
a4 , and a6 . Now we can perform a change of variable in the right side as x1 = x + 3
obtaining,
for some constants a and b. Now we can write a simplified affine Weierstrass
equation for prime finite fields (p > 3) as follows,
y 2 = x3 + ax + b, (2.4)
Let P and Q be any two distinct points on E, which is defined by either Equation
(2.3) or (2.4). The rules for point addition are given as follows:
2. The negative of the point P = (x, y), denoted as −P , such that P ](−P ) = O
is,
(x, x + y) if E is defined over F2m ,
−P =
(x, −y) if E is defined over Fp .
Definition 2.11 The order of a point P ∈ E(Fq ), denoted as ord(P ), is the small-
est positive integer e such that eP = O.
Using a result of Lagrange’s theorem from group theory [89], we can affirm that
the order of any point always divides the order of the group. As a result, if #E(Fq )
is prime, then the order of any point is #E(Fq ) and it can be used as a generator.
For cryptographic applications, the order of a selected point P should be divisible
by a sufficiently large prime [42].
In order to learn more about the structure of the group E(Fq ), it is important
to know the value of #E(Fq ). The next well-known theorem gives us a bound for
this parameter.
The exact value of #E(Fq ) can be efficiently obtained using some point counting
algorithms such as the Schoof-Elkies-Atkin algorithm [11] for curves over prime
fields, or the Satoh-Skjernaa-Taguchi algorithm [77] for curves over the binary field.
Let Zn be a cyclic group of order n. The following theorem is about the group
structure of E(Fq ).
Theorem 2.6 (Rück Theorem [75]). Let E be an elliptic curve over Fq . Then
E(Fq ) is isomorphic to Zn1 × Zn2 where n2 |n1 and n2 |(q − 1).
20 Background
Based on Theorem 2.6, E(Fq ) can be either a cyclic group or the direct sum of
two cyclic groups. With #E(Fq ) = n1 n2 , if n2 = 1, then E(Fq ) is a cyclic group. If
n2 > 1, then E(Fq ) is said to have rank 2 [40].
The elliptic curves obtained from Equations (2.3) and (2.4) with their respective
point addition rules are for affine coordinates (i.e., (x, y)). In order to reduce
computational cost, a number of projective coordinate systems (i.e., (X, Y, Z))
have been suggested in the literature. In Table 2.2 some of these systems with their
correspondence with the affine system are presented.
For each coordinate system, the total number of field operations is different
resulting in different time cost for elliptic curve point addition and doubling. Pre-
viously in Subsection 2.2.3, the EC group law using affine coordinates has been
given. Using relations between some other systems with the affine, it is possible to
obtain the point addition and doubling equations for those systems.
Standard projective (P) [64], Jacobian (J ) [42], Chudnovsky Jacobian (J c )
2.3. Elliptic curve scalar multiplication (ECSM) 21
[19], and López and Dahab’s (LD) [57] coordinate systems avoid the necessity to
compute one multiplicative inverse in each point operation. This is achieved at
the expense of more multiplications and storage space. The decision regarding
whether to use affine coordinate system or other is based primarily on implemen-
tation aspects such as the availability of memory for storing temporary values and
the relative performance of the field inversion and multiplication algorithms used
to implement the EC group operations. Table 2.3 shows the number of finite field
operations needed to perform point addition and doubling for each system [40] [22].
The symbols M , S, and I denote, respectively, the cost of finite field multiplication,
squaring, and inversion. These symbols are used for the remainder of this thesis.
{z· · · ] P}.
kP = |P ] P ]
k times
Table 2.3: Field operations count for point addition and doubling using various
coordinate systems
Q = kt−1 2t−1 P ] kt−2 2t−2 P ] · · · ] (k1 2P ) ] (k0 P ) . (2.5)
The operations in Equations (2.5) and (2.6) can be performed utilizing the well-
known double-and-add method. To that end, Algorithm 2.1 implements the opera-
tions of Equation (2.6). This algorithm scans bits of scalar k from left to right (i.e.,
from the most significant bit to the least significant bit), one bit at a time. In every
2.3. Elliptic curve scalar multiplication (ECSM) 23
1. Q ← O. 1. Q ← O.
3. Return(Q). 3. Return(Q).
For ECC applications that use a projective coordinate system for point repre-
sentation, Algorithm 2.1 is usually preferred over Algorithm 2.2. The main reason
is that the point addition required in Algorithm 2.1 uses a fixed operand (i.e., P ).
This aspect permits the use of mixed coordinates for point addition, i.e., one point
to be added is given in some projective coordinate system, and point P in the affine
24 Background
system. As illustrated in Table 2.3, the use of mixed coordinates saves some finite
field multiplications.
Coron [23] has shown that algorithms with a non-homogeneous operation flow,
such as Algorithms 2.1 and 2.2, are vulnerable to a simple power analysis (SPA)
attack. As a countermeasure he proposed a method called double-and-add-always.
The idea is to add a dummy point addition operation whenever the bit scalar is
equal to zero during the main loop. The corresponding method is presented in
Algorithms 2.3 and 2.4 for the left-to-right and right-to-left versions, respectively.
The included dummy operation permits to have a uniform execution flow, i.e., one
point addition and one point doubling are executed in every iteration during the
loop. As a result, there is a performance penalty since the required point operations
are t doublings and t additions. Moreover, Yen and Joye [91] have observed that
algorithms with dummy operations might be susceptible to a special fault attack
1. Q0 ← O. 1. Q0 ← O.
3. Return(Q0 ). 3. Return(Q0 ).
2.3. Elliptic curve scalar multiplication (ECSM) 25
called safe-error (SE) attack. This is an example that in some cases a countermea-
sure against one attack may benefit another attack.
2
Joye and Yen described this idea in the context of modular exponentiation.
26 Background
Now, let Q0,i = Li P and Q1,i = Mi P be points for i ∈ {0, t − 1}. Utilizing Equation
(2.7), we can obtain the pair (Q0,i , Q1,i ) as follows
(2Q0,i+1 , Q0,i+1 ] Q1,i+1 ) if ki = 0,
(Q0,i , Q1,i ) = (2.8)
(Q ]Q , 2Q ) if ki = 1.
0,i+1 1,i+1 1,i+1
Note that if Lt−1 = 1 and Mt−1 = 2 (i.e., Q0,t−1 = P and Q1,t−1 = 2P ), and we use
Equation (2.8) repeatedly for i from t − 2 to 0, then Q0,0 and Q1,0 will be kP and
(k + 1)P , respectively. The complete procedure that uses this idea is presented as
Algorithm 2.5. This algorithm is referred to as basic Montgomery’s ladder ECSM.
Let us use the word “basic” to distinguish this algorithm among others that do not
utilize the y-coordinate of the intermediate points Q0 and Q1 during the ECSM
computation.
1. Q0 ← P , Q1 ← 2P.
2. For i = t − 2 downto 0 do
3. Return(Q0 ).
2.3. Elliptic curve scalar multiplication (ECSM) 27
This algorithm keeps the difference between Q1 and Q0 equal to P at any value of
i during the loop. Also, one point doubling and one point addition are performed in
every iteration which make this algorithm attractive against attacks such as timing
[52] and SPA [53]. Additionally, since it does not have dummy instructions, the SE
fault attack does not apply. Furthermore, due to the usage of all point coordinates,
it is possible to have an improved error detection using coherency check among
involved variables. We discuss this in Chapter 5.
An extension of the ECSM Montgomery idea for non-supersingular elliptic
curves over the binary finite field was presented by López and Dahab [58]. They
showed algorithms for both affine and projective coordinate systems. Let us present
some resulting expressions from lemmas given by López and Dahab [58]. Let
P0 = (x0 , y0 ) and P1 = (x1 , y1 ) be points that belong to the elliptic curve defined
by Equation (2.3). The x-coordinate of P0 ] P1 , x2 , can be obtained as follows:
x0 y1 + x1 y0 + x0 x21 + x20 x1
x2 = . (2.9)
(x0 + x1 )2
Additionally the y-coordinate of P0 , y0 , can be obtained from P = (x, y), and the
28 Background
Based on Algorithm 2.5 and Equations (2.10) and (2.11), Algorithm 2.6 implements
the affine version of Montgomery’s ECSM. During each interaction of the algorithm,
a point doubling and a point addition are performed without y-coordinate. This is
possible due to the difference of the two intermediate points, namely Q0 and Q1 ,
being known (i.e., = P ). After the final interaction the x-coordinates of Q0 = Q =
kP and Q1 = (k + 1)P are obtained, i.e., Q0x and Q1x . Using these values and P ,
the y-coordinate of the result is computed in Step 3. Note that in Algorithm 2.6
x(·) corresponds to the x-coordinate of the point given in the argument.
2. For i = t − 2 downto 0 do
4. Return(Q0x , Q0y ).
2.3. Elliptic curve scalar multiplication (ECSM) 29
In Algorithm 2.6 the intermediate points and their related operations are in
the affine coordinate system. In applications where the multiplicative inverse is
relatively expensive, it might be more attractive to use a projective coordinate
system. To this end, López and Dahab [58] presented the projective version of
Algorithm 2.6. This particular algorithm represents an attractive option because
it gives a computational advantage over other algorithms that do not use pre-
computation such as that the binary [64] and addition-subtraction [42] methods.
Let P0 = (X0 , Y0 , Z0 ) and P1 = (X1 , Y1 , Z1 ) be points represented in the López
and Dahab projective coordinates system. Suppose that P = (x, y) is the difference
in affine coordinates between P1 and P0 . Then, the Z- and X-coordinates of the
point doubling and addition can be obtained by the following functions:
X 2Z 2 if P0 = P1 ,
0 0
Z(P0 ] P1 ) = Z2 = (2.12)
(X Z + X Z )2 if P0 6= P1 .
0 1 1 0
X 4 + bZ 4 if P0 = P1 ,
0 0
X(P0 ] P1 ) = X2 = (2.13)
xZ + X X Z Z if P 6= P .
2 0 1 0 1 0 1
requirements and a faster method since some finite field multiplications are saved.
2. For i = t − 2 downto 0 do
5. Else
The ECSM is the base operation used for ECC. In fact, the elliptic curve discrete
logarithm problem (ECDLP) is based on the difficulty of obtaining k given P and
2.3. Elliptic curve scalar multiplication (ECSM) 31
Q(= kP ) for some integer k and P , Q ∈ E(Fq ). This principle has led to schemes
equivalent to DLP-based cryptosystems, such as: Diffie-Hellman key exchange [25],
ElGamal public key encryption [30], ElGamal digital signatures [30], and DSA [33].
In practice for the ECDLP to be intractable, it is important to select appropriate
domain parameters such as the finite field Fq where the curve E is defined, the curve
E itself, and the base point P . When the order n of the base point P is a large prime,
the fastest known algorithms to solve the ECDLP, namely the baby-step giant-step
√
[82] and the Pollard’s rho [73] algorithms, need O( n) steps. Consequently, for
security purposes it is necessary that the size of the underlying finite field be at
least the double of the security level in bits. Security level of L bits is referred to
as the best algorithm for breaking the system that takes approximately 2L steps
[40]. For example, for achieving an 80-bit security level, the cryptosystem would
require an elliptic curve defined over a finite field Fq , where q ≈ 2160 . With respect
to the selection of the elliptic curve E, some types of curves should be avoided for
cryptographic applications since the ECDLP can be reduced. These curves include
supersingular curves [65], anomalous curves [76] [80], and curves over F2m for some
non-prime values of m [34] [36] [61].
If the order of the base point P does not contain at least a large prime fac-
tor, then it is possible to use an extension for ECC of the Silver-Pohlig-Hellman
algorithm [72] to solve the ECDLP as presented in Algorithm 2.8. This algorithm
reduces the problem to subgroups of prime order. Let n be the order of the base
Q
point P with a prime factorization n = j−1 ei
i=0 pi , where pi < pi+1 . Suppose that
Q = lP , where P, Q ∈ E(Fq ) and l ∈ [0, n − 1]. This algorithm obtains during the
outer loop, the value of l mod pi ei for each 0 ≤ i ≤ j − 1. With these values l mod n
can be uniquely computed using the CRT. It is important to note that at Step
32 Background
1. For i = 0 to j − 1 do
1.1 Q0 ← O, li ← 0.
1.2 Pi ← (n/pi )P.
1.3 For t = 0 to (ei − 1) do
1.3.1 Qt,i ← (n/pt+1 0
i )(Q ] Q ).
2. Use the CRT to solve the system of congruences l ≡ li (mod pi ei ). This gives
us l mod n.
3. Return(l).
Example 2.2 Let E be the curve y 2 + xy = x3 + 1 over the field F211 given by the
polynomial f (z) = z 11 +z 2 +1. Let us represent the elements of F211 in hexadecimal
form. Consider the point P = (0x10F,0x27A) whose order is n = 92 = 22 · 23. Let
Q = (0x1FB,0x2C6). We can use Algorithm 2.8 to obtain l = logP Q as follows:
2.4. ECC fault-based attacks 33
• During the first loop for i = 0 we can obtain l0 = l mod 22 . We can find that
l0 = W0,0 + 2W2,0 = 1 + 2 · 0 = 1.
• For the second loop for i = 1 we determine l1 = l mod 23. It can be shown
that l1 = W1,0 = 18.
To resist the Silver-Pohlig-Hellman attack one can simply select an elliptic curve
E such that its group order, #E(F2m ), is prime or almost prime, i.e., #E(F2m ) =
hn, where n is a prime and h is small [40] (e.g., h ∈ [1, 4]). However, in Chapter
3 we will show how Algorithm 2.8 could be useful under an invalid-curve attack to
the ECSM algorithm based on Montgomery’s ladder.
A basic assumption for this attack is that the parameter a6 from Equation (2.2)
is not involved for point operations formulas. In this way, the computation could
e 1 , a2 , a3 , a4, , e
be performed in a cryptographically less secure elliptic curve E(a a6 ),
34 Background
which differs from the original elliptic curve E(a1 , a2 , a3 , a4 , a6 ) only in the curve
a6 and Pe = (e
parameter a6 . The relation between e x, ye) can be obtained from
Equation (2.2) as follows:
a6 = ye2 + a1 x
e e3 − a2 x
eye + a3 ye − x e2 − a4 x
e. (2.14)
e 1 , a2 , a3 , a4 , e
the operation was performed, defined as E(a a6 ), where e
a6 is obtained
as follows
a6 = yek2 + a1 x
e e3k − a2 x
ek yek + a3 yek − x e2k − a4 x
ek .
e we obtain
Using Equation (2.2) for E
e : y 2 + a1 xy + a3 y = x3 + a2 x2 + a4 x + e
E a6 . (2.15)
Since Pe ∈ E(F
e q ), it needs to follow Equation (2.15). Substituting its coordinates
(e
x, y) and grouping the terms with x
e we have the following expression
e3 + a2 x
x e2 + (a4 − a1 y)e a6 − y 2 − a3 y) = 0.
x + (e
for obtaining k can be followed. Additionally, the authors consider other attacks
where the underlying finite field or the elliptic curve parameters are disturbed
by a fault. They assume that it is possible to inject a transient fault into these
parameters. For example, in a smart card system parameters are usually stored
in non-volatile memory (e.g., EEPROM) and they are transferred into RAM for
executing a specific algorithm. It is possible to induce an error in such data before
or during this transfer.
Antipa et al. [6] state the importance of checking whether the received point for
any EC key agreement and public-key encryption protocols are on the proper elliptic
36 Background
curve. Specifically, they show the vulnerability under this scenario of the one-pass
EC Diffie-Hellman protocol, the EC integrated encryption scheme (ECIES), the
one-pass ECMQV protocol and the EC digital signature algorithm. Some of these
protocols or algorithms are defined in standards such as ANSI X9.62, ANSI X9.63,
IEEE 1363-2000, ISO/IEC 15946-3, FIPS 186-2, and ISO 15946-2. They mention
the significance for implementers of taking into account point verification, even
when some standards do not mandate that. A possible countermeasure considered
by Antipa et al. [6] is to use other formulas for the addition law that use both
elliptic curve parameters, a and b. In fact, some standards such as IEEE 1363
[42] include group formulas for curves over F2m that involve both parameters (i.e.,
point doubling formula uses b while point addition uses a). In this scenario, the
invalid-curve attacks described by Biehl et al. [9] and Ciet and Joye [21] might not
constitute a threat.
It is important to note that the invalid-curve attacks presented by Biehl et
al. [9] and Ciet and Joye [21] apply to applications where parameter b is not used
for the group formulas. However for Algorithms 2.6 and 2.7, it is not the case since
parameter b is utilized. In Chapter 3 we present an attack that takes advantage of
parameter a not being used in these algorithms.
Biehl et al. [9], in addition to the invalid-curve attack, proposed an ECC dif-
ferential fault analysis (DFA) attack. Their work was an extension of the attack
presented by Boneh et al. [15] for RSA cryptosystems. They assume that the
adversary knows the details of the implementation, such as the parameters, the
utilized algorithm, and the representation of internal variables. They consider that
the attacker can precisely flip a bit in a register that holds a partial result during
the ECSM operation. They also assume that the attacker can determine that the
2.4. ECC fault-based attacks 37
fault was induced v iterations before the ECSM computation finished. Consider
a left-to-right algorithm where the scalar bits are processed from i = t − 1 to 0
(e.g., Algorithm 2.1). If v is small enough for doing an exhaustive search, then the
attacker can obtain the v least significant bits as follows:
2. Repeat the operation inducing a single bit-flip fault into a variable that holds
e
a partial result (e.g., Step 2.1 of Algorithm 2.1) at i = v − 1 to obtain Q.
e − dP.
3. For all v-bit integers d compute Q − dP and Q
4. Look for pairs of results that differ only in one bit. Biehl et al. [9] show that
the probability that more than one pair satisfies this condition is low. If only
one pair exists, then they are the original and the faulty intermediate results,
respectively. The associated value of d gives the v least significant bits of k.
The full value of k can be recovered by repeating Steps 1-4 for different (and
ascending) values of v and utilizing the known bits of the scalar.
One countermeasure against the attacks presented by Biehl et al. [9], Ciet and
Joye [21], and Antipa et al. [6] is to verify whether or not the output point lies on
the original elliptic curve. This process is often called point verification (PV). This
basic countermeasure is not sufficient for all cases. Blömer et al. [14] presented an
ECC fault-based attack called sign change fault (SCF) attack in which PV is not a
sufficient protection. This attack is based on changing the sign of an intermediate
point during the ECSM. It is important to note that this attack only applies to
cryptosystems that use elliptic curves over prime fields where the change of a point
involves only a change of the sign of the y-coordinate.
38 Background
An interesting aspect of the SCF attack is that the elliptic curve operations do
not leave the original group E(Fp ) in contrast with those attacks described by Biehl
et al. [9], Ciet and Joye [21], and Antipa et al. [6]. Blömer et al. assume that the
attacker can induce a fault that produces a sign change into an intermediate point
Q0 , such that Q0 ← −Q0 during the ECSM operation. They show the applicability
of this attack for applications utilizing conventional crypto co-processors. The SCF
attack can be mounted in some ECSM algorithms such as the non-adjacent form
(NAF) based ECSM by double-and-add (e.g., Algorithm 1 of [14]). In this case
the attacker needs to induce a SCF in the point Q0 during the execution of specific
steps (e.g., Steps 3 and 4 of Algorithm 1 of [14]). The error can be induced for
some unknown values of i during the loop. They show that having a number of
c = (t/m) log(2t) attacks on the same input pair (k, P ) and the correct result Q,
it is possible to recover the scalar k with a probability of at least 1/2, where t
is the length of NAF(k) and m is a parameter related with the amount of offline
work. Otto [70] has extended the SCF attack to other algorithms such that the
ECSM by double-and-add (Algorithm 2.1) and the basic Montgomery ladder ECSM
(Algorithm 2.5).
A straightforward countermeasure against an SCF attack is to use a version
of Montgomery’s ladder algorithm that does not use the y-coordinate for comput-
ing ECSM (e.g., [17]). Another countermeasure presented by Blömer et al. uses
a second elliptic curve whose order is a small prime number. This curve is uti-
lized to define a named “combined curve” that is used to verify the final result
to avoid this SCF attack. One disadvantage of this approach is that elliptic curve
crypto-processors might have fixed parameters (e.g., those recommended by NIST).
Hence, it might not be possible to redefine the curve parameters for having such
2.4. ECC fault-based attacks 39
last example was for a memory-safe error fault attack. However, it can be easily
extended to a computational-safe attack if the attacker changes the result of the
operation performed in Step 2.1 (Q1 = Q0 ] P ) and follows in similar way the
procedure described above. Note that SE attack depends only on the availability
of injecting a fault during loop iterations where a dummy instruction is computed.
One countermeasure against SE attack is to utilize randomization. In fact,
randomization of the data processed is essential in resisting other attacks such as
SPA and differential power analysis (DPA) attacks [28]. Another way to prevent
the SE attack is to use an algorithm that does not have dummy operations (e.g.,
the Montgomery ladder algorithm). Another countermeasure is to verify, even in
dummy operations, that an error has occurred. This approach will be utilized in
Chapter 5 for detecting errors during the ECSM using coherency check among the
involved variables.
Q
P ECSM
module Output
k PV PV(Q)
module
Figure 2.1: Point verification (PV) module after the ECSM module
For ECC, PV can be used to detect errors during the ECSM. This is a basic
countermeasure to prevent some fault attacks on ECC and it has been considered
by Biehl et al. [9], Ciet and Joye [21], Antipa et al. [6], Blömer et al. [14], and
Domı́nguez and Hasan [27]. A PV module takes a point Q as its input and checks
whether the point is in E(Fq ) (see Figure 2.1). This checking is done simply by
verifying whether the coordinates of Q satisfy the governing elliptic curve equation,
e.g., Equation (2.3) for curves defined over binary fields and represented with affine
coordinates. If they do, only then an affirmative signal (say ‘ok’) is generated as
output, i.e.,
ok = 1 if Q ∈ E(Fq ),
PV(Q) =
0 otherwise.
The PV process requires only a few finite field operations, and hence, its imple-
mentation is relatively easy. For example, for curves defined over the binary field
using affine coordinates it takes two finite field multiplications, two squarings, and
three additions3 . If implemented in hardware as a separate module, assuming a
3
Assume a = 1. If a = 0, one less addition is required. If a ∈
/ {0, 1}, one more multiplication
is needed.
42 Background
• SCF attack : The SCF attack is based on changing the sign into an targeted
intermediate point during the ECSM operation. Then, the faulty points never
leave the original elliptic curve E and the PV module alone cannot provide
resistance against this attack.
• SE attack : Let us assume that we have a PV module after the ECSM module
as illustrated in Figure 2.1. Consider that the ECSM algorithm uses dummy
instructions for having a uniform execution (e.g., Algorithm 2.4). If an SE
attack is mounted as described in previous section the attacker can perform
a similar oracle attack, i.e., kl = 1 if the final result is not given and kl = 0 if
the correct result is released by the PV module. Consequently, PV does not
provide the sufficient protection.
4
Utilizing the addition formulas given by Avanzi et al. [22].
2.5. Error detection strategies 43
Since some attacks are not prevented with PV module alone, it is necessary to add
other protections.
For applications where the time required for error detection is not critical, it is
possible to compute the result twice (i.e., once at time t0 and then again at time t1 )
and then compare the two results to detect a possible error. This technique and its
variants are used in various computing applications (e.g., integer multiplication);
however, to the best of our knowledge there has been no precedence of our extension
of these techniques to ECSM. Under this technique, a transient error occurring in
any one of the two ECSM computations (i.e., at time t0 or t1 ), but not in both will
be detected. However, if the computation module is permanently in a faulty state,
the errors produced will remain undetected. Because of this, the re-computation
at time t1 is better done with a different method.
A general re-computation based scheme that uses only one computing module is
shown in Figure 2.2. The compute module is multiplexed between the upper and the
lower data paths at time t0 and t1 , respectively. The result of the upper data path
is stored in a register. The encoder in the bottom data path of the figure produces a
different set of inputs to the compute module and the decoder transforms the output
of the module back to the original domain for the final comparison. The encoding
and the decoding processes enable us to detect errors caused by a permanent fault in
the compute module. Encoding and decoding can be carried out through a number
of approaches. For conventional number systems, the most popular ones include:
44 Background
Output
Position Position
at t0 at t0
Register
Input Compa-
Compute rator
Encoder Decoder
Position Position
at t1 at t1
Output
Register/
Compute
Delay unit
Input Compa-
rator
Encoder Compute Decoder
alternating logic [74], re-compute with shifted operands [71] and re-compute with
swapped operands [45]. In Chapter 4, we provide encoding and decoding processes
that are suitable for ECSM.
Similar to re-computation, where we have one module that performs the compu-
tation twice, we can have two independent modules working in parallel. A general
parallel computation based scheme is given in Figure 2.3, where the register or the
delay unit synchronize the inputs to the comparator.
2.5. Error detection strategies 45
Consistency or coherency check (CC) process verifies the intermediate or final re-
sults with respect to a valid pattern. In fact, this technique is widely used in
areas such as telecommunications, computers, and media storage. For example,
some computers use address and op-code coherence checking to avoid accessing an
invalid memory register or executing an invalid instruction.
In the context of public key cryptography, this technique has been used for de-
tecting errors during the RSA signature generation. As shown by Boneh et al. [15],
the RSA digital signature scheme using the CRT is particularly vulnerable to fault
attacks, i.e., with only one faulty signature and its corresponding error-free version
it is possible to efficiently factor the modulus used. A natural countermeasure for
this attack is to verify the signature utilizing the public exponent. However, in
some applications this value might not be available. Additionally, if the public
exponent is not small, then the verification could be costly. For these reasons, a
number of countermeasures that include protections inside the modular exponenti-
ation algorithm have been proposed in the literature (e.g., [81], [94], [13], [20], [37],
and [16]5 ). The countermeasures proposed by Giraud [37] and Boscher et al. [16]
use CC for detecting errors during the modular exponentiation operation. Let us
describe both approaches.
Giraud [37] has shown how the SPA-and-SE resistant algorithm published by
Joye and Yen [47] can be utilized to prevent fault analysis (FA) attacks. This
modular exponentiation algorithm is based on the Montgomery ladder method.
Let N be the product of two large primes. Let (dn−1 · · · d1 d0 )2 be the binary
5
Many of them have been shown vulnerable when they are analyzed under a different fault
model (e.g., [7], [88], [12], and [48]).
46 Background
a0 = m d mod N,
n −d−1
a1 = m 2 mod N,
n
A = m2 mod N.
n
a0 · a1 · m = m2 = A.
In this way, errors can be detected at the end of the modular exponentiation if
2.6. Conclusion 47
the above expression does not hold as illustrated in Step 3 of Algorithm 2.10. In
Chapter 5 we extend some of the concepts utilized by Giraud [37] and Boscher
et al. [16] for detecting errors in ECSM. For example, for the Montgomery ladder
ECSM we investigate the difference in terms of error detection between PV and
CC.
1. a0 ← 1, a1 ← m. 1. a0 ← 1, a1 ← 1, A ← m.
3. t ← a0 · m mod N. 3. t ← a0 · a1 · m mod N.
2.6 Conclusion
In this chapter, we have presented background related to ECC. This includes the
fundamental operation for elliptic curve cryptosystems, i.e., the scalar multiplica-
tion (ECSM). We have given some algorithms that can be utilized for computing
48 Background
Fault-Based Attack on
Montgomery’s Ladder Algorithm
49
50 Fault-Based Attack on Montgomery’s Ladder Algorithm
3.1 Preliminaries
By Theorem 2.4 we can state that the number of isomorphism classes for elliptic
curves defined by Equation (2.3) is 2m+1 − 2. The latter comes from the number of
possible values for parameter b (i.e., 2m − 1) times the possible values of the trace
function of parameter a (i.e., 2). With the last observation, for a fixed value of
parameter b there are only two isomorphic classes of curves, one for each value of
γ ∈ {0, 1}, where Tr(a) = γ. Let us define two representative elliptic curves, E0
and E1 , one for each of these isomorphic classes:
E0 : y 2 + xy = x3 + b (a = 0), (3.1)
E1 : y 2 + xy = x3 + x2 + b (a = 1). (3.2)
Lemma 3.1 Let E0 and E1 be two elliptic curves over F2m defined by Equations
(3.1) and (3.2), respectively.
√
(i) The only points that E0 (F2m ) and E1 (F2m ) share are O and (0, b).
(ii) Let (u, v) ∈ Ej (F2m ), where u ∈ F∗2m , v ∈ F2m , and j ∈ {0, 1}. Then, there
does not exist any point in E j (F2m ) of the form (u, w) for any w ∈ F2m , where
j = 1 − j.
(iii) There exist two points of the form (u, v) and (u, u + v) in either E0 (F2m ) or
E1 (F2m ) for each u ∈ F∗2m and some v ∈ F2m .
Proof First, if we solve the quadratic expressions resulting from Equations (3.1)
√
and (3.2) with x = 0, we obtain a unique solution y = b. For x 6= 0, by Theorem
2.3 Equation (2.3) has a solution for y if and only if
b
Tr(x) + Tr(a) + Tr = 0. (3.4)
x2
Since the only difference between Equations (3.1) and (3.2) is the value of parameter
a, we can conclude from Equation (3.4) that if any value of x ∈ F∗2m does not have
a solution with a = j, then it does with a = j̄ for j = 0 or 1. Also this equation
shows that it is not possible to have a solution for both E0 and E1 with the same
x 6= 0.
Additionally, by Theorem 2.3 we know that for a given value of x 6= 0 we have
two distinct solutions that represent two elliptic curve points (i.e., a point and its
negative). To this end, for x 6= 0, #E0 (F2m ) + #E1 (F2m ) consider exactly 2m+1 − 2
√
points on both curves. In addition, the points O and (0, b) are common and are
counted twice in the sum of both orders, bringing the total up to 2m+1 +2 as shown in
Equation (3.3).
{(0x00,0x01),(0x01,0x00),(0x01,0x01),(0x02,0x1F),(0x02,0x1D),(0x03,0x0C),
(0x03,0x0F),(0x04,0x12),(0x04,0x16),(0x05,0x1A),(0x05,0x1F),(0x07,0x1F),
(0x07,0x18),(0x09,0x1D),(0x09,0x14),(0x0B,0x16),(0x0B,0x1D),(0x0C,0x05),
(0x0C,0x09),(0x0D,0x0B),(0x0D,0x06),(0x0F,0x19),(0x0F,0x16),(0x10,0x09),
(0x10,0x19),(0x11,0x03),(0x11,0x12),(0x12,0x14),(0x12,0x06),(0x15,0x12),
(0x15,0x07),(0x17,0x0B),(0x17,0x1C),(0x18,0x0F),(0x18,0x17),(0x1A,0x11),
52 Fault-Based Attack on Montgomery’s Ladder Algorithm
(0x1A,0x0B),(0x1B,0x0F),(0x1B,0x14),(0x1C,0x09),(0x1C,0x15),(0x1F,0x06),
(0x1F,0x19),O}.
On the other hand, E1 (F25 ) has an order of 22 with the following set of points:
{(0x00,0x01),(0x06,0x10),(0x06,0x16),(0x08,0x17),(0x08,0x1F),(0x0A,0x18),
(0x0A,0x12),(0x0E,0x07),(0x0E,0x09),(0x13,0x1C),(0x13,0x0F),(0x14,0x0D),
(0x14,0x19),(0x16,0x02),(0x16,0x14),(0x19,0x04),(0x19,0x1D),(0x1D,0x1B),
(0x1D,0x06),(0x1E,0x15),(0x1E,0x0B),O}
b 2m ) is
We notice that for each listed NIST-recommended curve E, the group E(F
b 2m ) are smaller than the
cryptographically weaker, i.e., all the prime factors of #E(F
larger prime factor of #E(F2m ), with only one exception for the case of m = 283 for
b 2m ) are almost prime. In
Koblitz curves, where the orders of both E(F2m ) and E(F
Table 3.3, the size of each prime factor of the group orders of these elliptic curves is
b 2m ),
presented. Additionally, it can be shown by Theorem 2.6 that E(F2m ) and E(F
where m ∈ {163, 233, 283, 409, 571}, are cyclic groups for all the curves in Tables
3.1 and 3.2.
3.1. Preliminaries 53
Table 3.1: Examples for NIST-recommended randomly chosen curves over F2m
54 Fault-Based Attack on Montgomery’s Ladder Algorithm
Table 3.1: (Contd.) Examples for NIST-recommended randomly chosen curves over
F2m
3.1. Preliminaries 55
Table 3.2: (Contd.) Examples for NIST-recommended Koblitz curves over F2m
3.1. Preliminaries 57
Fault model. Let us assume that the adversary can inject a flip-fault (single or
multiple bit) into the x-coordinate of the input point P = (Px , Py ) ∈ E(F2m ) of a
device computing the ECSM utilizing either Algorithm 2.6 or 2.7. Suppose that
the resulting finite field pair after the fault injection is known and is Pe = (Pex , Py ).
3.2. Invalid-curve attacks on Montgomery’s ladder algorithm 59
e = k Pe = (Q
Consider that the result Q ex , Q
ey ) is released.
Having Pb, Q
b ∈ E(F
b 2m ) one can obtain l = k or #E(F
b 2m ) − k mod n using
Algorithm 2.8, where n = ord(Pb). This would be possible because the computation
b 2m ) and not in the original group E(F2m ).
is performed in the weaker group E(F
One can then exhaustively search for an integer k 0 that satisfies (i) l = k 0 mod n or
b 2m ) − k 0 mod n and (ii) Q
#E(F e = k 0 Pe. Thus, the idea of the basic attack is that
search will be able to retrieve the secret scalar k with a probability of success ρ.
Let e be a parameter such that 2e is the maximum acceptable amount of exhaustive
search space. The complete attack procedure is presented as Algorithm 3.1.
b 2m ) − k mod n is obtained. The value
In Step 8 of Algorithm 3.1, l = k or #E(F
of l has only partial information about k. The remaining part of the scalar might
be obtained using an exhaustive search. The latter involves two main steps: (i)
solve a system congruences with a test candidate and the known part of the scalar
(Step 11.2.1), and (ii) perform a scalar multiplication to verify if the solution of the
system of congruences is the desired scalar (Step 11.2.2).
b 2m ).
Let r be the exhaustive search space. This value depends on n and #E(F
b 2m ) it is necessary that
In Step 11.2.1, for having a unique solution mod #E(F
b 2m ).
lcm(n, r) = #E(F (3.5)
60 Fault-Based Attack on Montgomery’s Ladder Algorithm
e = k Pe = (Q
2. Compute Q ex , Q
ey ) using either Algorithm 2.6 or 2.7.
ex + b/Q
3. T ← Q e2x + b
a.
4. If (Tr(T ) = 0) then
bx ← Q
4.1 Q ex , Q
by ← Q
ex · Ht(T );
5. Else
5.1 Go to Step 1.
7. Obtain n = ord(Pb).
10. If (r = 1) then
3.2. Invalid-curve attacks on Montgomery’s ladder algorithm 61
11.1 k 0 ← 0.
11.2 While (k 0 < r) do
11.2.1 Solve the system of congruences k 00 ≡ k 0 (mod r) and k 00 ≡ l (mod n).
11.2.2 Compute R = k 00 Pe using either Algorithm 2.6 or 2.7.
e then return(k 00 );
11.2.3 If (R = Q)
e then return(#E(F
11.2.4 Else if (R = −Q) b 2m ) − k 00 );
11.2.5 Else k 0 ← k 0 + 1.
For efficiency r should be selected as the minimum value that satisfies Equation
b 2m ) = 2e0 pe11 pe22 · · · pu−1
u−1 e b 2m ),
(3.5). Let #E(F be the prime factorization of #E(F
f
where ej ≥ 1 for j ∈ [0, u − 1]. Let n = 2f0 pf11 pf22 · · · pu−1
u−1
be the prime fac-
torization of n = ord(Pb), where 0 ≤ fj ≤ ej for j ∈ [0, u − 1]. Similarly, let
g
r = 2g0 pg11 pg22 · · · pu−1
u−1
be the prime factorization of r. Using notations similar to
those utilized by Menezes et al. [66] with regard to lcm, we can express Equation
(3.5) as
The exponents of the minimum value of r that satisfies Equation (3.6) are
0 if ej = fj ,
gj = (3.7)
e
j otherwise,
62 Fault-Based Attack on Montgomery’s Ladder Algorithm
uniquely the value of k (i.e., r) is more than the maximum admissible exhaustive
b 2m ) from the NIST-
search space (i.e., 2e ). For example for a weaker group E(F
recommended curves, as we show below, the probability of failure is quite low even
for small values of e. Moreover, in the case of not success with a particular pair
(Pe, Q),
e the attacker can repeat the attack procedure until an inevitable success.
The probability of success of Algorithm 3.1 (i.e., ρ), depends on the maximum
acceptable amount of exhaustive search 2e and the order of point Pb. Assume that
point Pb is taken randomly from group E(F
b 2m ). In a cyclic group, it is well known
b 2m ) is not prime, and
that the number of elements of order d is φ(d). Here #E(F
b 2m ) have an order #E(F
consequently not all the points in E(F b 2m ). Moreover, if
b 2m ) has several prime factors (i.e., it is expected since E(F
#E(F b 2m ) is assumed to
be a weaker group), the order of the points could have any combination of those
prime factors or their respective prime powers. For example the number of points
b 2m ) is φ(#E(F
with the full order #E(F b 2m )). In contrast, there is only one point of
√
order two which corresponds to (0, b).
Here we will obtain the probability of success ρ, first for specific values and then
for an arbitrary value of e.
#E(F b 2m ) is
b 2m ) of order #E(F
b 2m ). The number of points in E(F
u−1
Y
b 2m )) = 2n0 −1
φ(#E(F pni i (1 − 1
),
pi
j=1
Clearly this value is bounded to 1/2. If p1 >> 1, then ρe=0 would be close to
1/2 (e.g., all the Koblitz curves in Example 4.2).
ρe=1 = (3.9)
u−1
Y
1 1
2 (1 −
pj
), otherwise.
j=1
ρe=2 = (3.10)
u−1
Y
1 1
2 (1 −
pj
), otherwise.
j=1
u−1
Y
5
(1 − 1
), if n0 = 1 or 2, and n1 = 1,
6 pj
j=2
u−1
Y
2
(1 − 1
), if n0 = 1 or 2, and n1 ≥ 2,
3 j=2 pj
ρe=2 = (3.11)
u−1
Y
1
(1 − 1
), if n0 ≥ 3, and n1 = 1,
6 pj
j=2
u−1
Y
1
(1 − 1
), otherwise.
3 pj
j=2
n
b 2m ) = 2n0 pn1 1 pn2 2 · · · pt−1
t−1 nt n
t+1 n
u−1
#E(F pt pt+1 · · · pu−1 .
n
log2 (2n0 pn1 1 · · · pt−1
t−1
) ≤ e and log2 (pt ) > e.
If these conditions are satisfied, then the number of points whose order divides
n n
pnt t pt+1
t+1 u−1
· · · pu−1 is
g−1
X
j0 (i) j1 (i) j2 (i) jt−1 (i) nt nt+1 nu−1
s= φ 2 p1 p2 · · · pt−1 pt pt+1 · · · pu−1 , (3.12)
i=0
where
i
j1 (i) = mod (n1 + 1),
n0 + 1
i
j2 (i) = mod (n2 + 1),
(n0 + 1)(n1 + 1)
..
.
i
jt−1 (i) = mod (nt−1 + 1).
(n0 + 1)(n1 + 1) · · · (nt−2 + 1)
n n −1 n −1
s = 2n0 pn1 1 · · · pt−1
t−1 nt −1 t+1
pt (pt − 1)pt+1 u−1
(pt+1 − 1) · · · pu−1 (pu−1 − 1).
complete procedure for this case is stated in Algorithm 3.2. This algorithm
also includes the computation of ρ for Cases 1-4.
5. Else
5.1 Search for the smallest prime factor such that log2 (pi ) > e. Set t with
this index.
n n
5.2 d ← pnt t pt+1
t+1 u−1
· · · pu−1 .
5.3 ρ ← 0.
5.4 For jt−1 = 0 to nt−1 do
For jt−2 = 0 to nt−2 do
..
.
For j0 = 0 to n0 do
3.2. Invalid-curve attacks on Montgomery’s ladder algorithm 67
j j
h ← 2j0 pj11 · · · pt−2
t−2 t−1
pt−1 .
b 2m ).
Find the smallest value of r for lcm(d · h, r) = #E(F
If (r ≤ 2e ) then
ρ ← ρ + φ(h).
n
5.5 ρ ← ρ(pt − 1)(pt+1 − 1) · · · (pu−1 − 1)/(2n0 pn1 1 pn2 2 · · · pt−1
t−1
pt pt+1 · · · pu−1 ).
5.6 Return(ρ).
ρ
Case m e=0 e=1 e=2 e=5 e = 10
163 0.48333745 0.48333745 0.96667491 0.98278616 0.99943089
Randomly 233 0.39784981 0.39784981 0.79569963 0.99462453 0.99677211
chosen 283 0.40601504 0.40601504 0.81203008 0.94736842 0.96992481
curves 409 0.44966230 0.44966230 0.89932460 0.93679646 0.99732494
571 0.42819973 0.42819973 0.85639945 0.99913270 0.99913270
163 0.49915775 0.49915775 0.99831549 0.99831549 0.99908107
Koblitz 233 0.49999457 0.99998915 0.99998915 0.99998915 0.99998915
curves 409 0.49999991 0.99999982 0.99999982 0.99999982 0.99999982
571 0.49999999 0.99999999 0.99999999 0.99999999 0.99999999
b 2m )
Table 3.4: Probability of success ρ of obtaining k with Algorithm 3.1 for E(F
3
from the NIST-recommended curves for a given parameter e
Cost of Algorithm 3.1. Most of the computational cost of Algorithm 3.1 is in-
volved in phases 2 and 3, i.e., obtaining k partially using the Silver-Pohlig-Hellman
algorithm (Algorithm 2.8) and the exhaustive search with verification process, re-
spectively. The cost of both phases depends on the order of Pb, i.e., n, and the order
b 2m ). Let us consider the cost of each phase:
#E(F
163 7 10 17
Randomly 233 5 12 20
chosen 283 11 14 14
curves 409 8 12 23
571 5 5 15
163 2 10 15
Koblitz 233 1 1 18
curves 409 1 1 1
571 1 1 1
Table 3.5: Minimum value of parameter e for obtaining a probability ρ smaller than
b 2m ) from the NIST-recommended curves
some given values for E(F
running time can be further reduced using a parallelized version of the Pol-
p
lard’s rho algorithm [86] to about ( πpt−1 /2)/M point operations, where M
is the number of processors used for solving the ECDLP instance. Addition-
ally, as shown by Gallant et al. [35] if a Koblitz curve over F2m is utilized,
then the parallelized version of the Pollard’s rho algorithm can take about
p
( πpt−1 /m)/(2M ) point operations.
tion (3.5) (see Step 9 of Algorithm 3.1). Thus, assuming t ≈ m the phase 3
of Algorithm 3.1 will require r scalar multiplications in the worst case which
represents at most (3mr)/2 point operations if a binary method is utilized
(e.g., Algorithm 2.1).
70 Fault-Based Attack on Montgomery’s Ladder Algorithm
Example 3.3 Let us consider the cost of phases 2 and 3 of Algorithm 3.1 for
b 2m ) from the NIST-recommended curve K-163. For a single processor, the
E(F
√
cost of phase 2 is of about 3 p4 ≈ 243.6 point operations, where p4 is the largest
b 2m ) (see Table 3.2). Now, assume that we have M = 10, 000
prime factor of #E(F
computers for solving the instance of the ECDLP. In this case the expected number
p
of point operations for each processor is approximately ( πp4 /163)/20000 ≈ 224.9 .
For the phase 3 cost, from Tables 3.1 and 3.5 we can notice that with a probability
999
greater than 1000
the exhaustive search space will be less than 210 , which implies a
number of point operations < 3(163)(210 )/2 ≈ 217.9 .
Fault model. Let us assume that the adversary can inject a single bit-flip fault
into the x-coordinate of the input point Pi = (Pi,x , Pi,y ) ∈ E(F2m ) of a device
computing the ECSM utilizing either Algorithm 2.6 or 2.7 for some i. Suppose that
the resulting finite field pair after the fault injection Pei = (Pei,x , Pi,y ) is unknown.
Also, consider that the fault location is at a random position of the x-coordinate.
ei = k Pei = (Q
Consider that the result Q ei,x , Q
ei,y ) is realized.
Under this scenario the attacker might retrieve the secret scalar as follows. First,
ei = k Pei = (Q
it is necessary to collect some faulty outputs of the form Q ei,x , Q
ei,y )
bi ∈ E(F
for which there exists a point Q bi = (Q
b 2m ) such that Q ei,x , Q
bi,y ) for some
Q bi ∈ E(F
bi,y ∈ F2m . In fact, with two different points Q b 2m ), where i ∈ {0, 1}, and
Table A0 Table A1
Figure 3.1: Tables A0 and A1 with the output of the Silver-Pohlig-Hellman algo-
bi , ni,j ), where i ∈ {0, 1}, j ∈ [0, ci − 1], and ni,j = ord(Ri,j )
rithm for each (Ri,j , Q
Figure 3.1. Thus, having li,j mod ni,j in each entry of Tables A0 and A1 , we could
b 2m ) − k. The
distinguish those that are likely to be equivalent to either k or #E(F
idea is to search entry pairs v and w that satisfy either
1. i ← 0.
3.2. Invalid-curve attacks on Montgomery’s ladder algorithm 73
2. While (i < 2) do
2.1 Inject a fault in Pi = (Pi,x , Pi,y ) for obtaining Pei = (Pei,x , Pi,y ).
ei = k Pei = (Q
2.2 Compute Q ei,x , Q
ei,y ) using either Algorithm 2.6 or 2.7.
ei,x + b/Q
2.3 T1 ← Q e2i,x + b
a.
2.4 If (Tr(T1 ) = 0) then
bi,x ← Q
2.4.1 Q ei,x , Q
bi,y ← Q
ei,x · Ht(T1 ), i ← i + 1.
3. For i = 0 to 1 do
4. T2 ← 1.
4.1 For j = 0 to m − 1 do
4.1.1 Rx ← Pi,x + T2 .
4.1.2 T3 ← Rx + b/Rx 2 + b
a.
4.1.3 If (Tr(T3 ) = 0) then
(a) Ry ← Rx · Ht(T3 ).
(b) Obtain n = ord(R).
bi )|n) then
(c) If (ord(Q
bi , n) to obtain l mod n.
(i) Utilize Algorithm 2.8 with (R, Q
(ii) Store (l, n) in Table Ai .
4.1.4 T2 = T2 1.
5. For some entries v and w in tables Tables A0 and A1 , respectively, search for
b 2m ) −
candidate pairs that satisfy lv ≡ lw (mod gcd(nv , nw )) or lv ≡ #E(F
lw (mod gcd(nv , nw )).
74 Fault-Based Attack on Montgomery’s Ladder Algorithm
8. Return(“failure”).
3.2. Invalid-curve attacks on Montgomery’s ladder algorithm 75
ηmax ≤ ηi ≤ 1,
1
Qu−1 1
where ηmax = 2 j=1 (1 − pj ). The lower bound of the above expression correspond
bi ) = #E(F
for the case when ord(Q b 2m ). In this case the reduction factor is maximum
(i.e., ηmax ), and consequently the number of entries of Table Ai is minimum (i.e.,
ηmax m
cmin ≈ 2
).
On the other hand, theoretically the upper bound of ηi holds only
√
bi ) is the point of order two (0, b). However, for the cases where p1 >>
when ord(Q
2 (e.g., E(F bi ) = #E(F
b 2m ) for the Koblitz curves of Table 3.2) if ord(Q b 2m )/2e0 , then
the reduction factor is close to unity. For these cases the number of entries of Table
m
Ai is maximum (i.e., cmax ≈ 2
). In Table 3.6 the values of ηmax , cmin , and cmax are
b 2m ) from the NIST-recommended curves. Also, this table shows
given for each E(F
the average cases for ηi and ci (i.e., η and c, respectively).
Algorithm 3.3 needs to compute in total c0 + c1 EC discrete logarithms using
the Silver-Pohlig-Hellman algorithm. This number is fixed since the search for
candidate pairs and the exhaustive search phases are performed after the tables’s
construction. If we merge these three phases, a speedup on average can be achieved.
Let us describe two approaches one could take to combine these phases:
ηmax m ηm
Case m ηmax η 2
≈ cmin m
2
≈ cmax 2
≈c
163 0.483 0.665 39.4 81.5 54.2
Randomly 233 0.398 0.574 46.3 116.5 66.9
chosen 283 0.406 0.573 57.5 141.5 81.1
curves 409 0.450 0.623 92.0 204.5 127.3
571 0.428 0.603 122.2 285.5 172.1
163 0.499 0.686 40.7 81.5 55.9
Koblitz 233 0.499 0.749 58.2 116.5 87.4
curves 409 0.499 0.749 102.2 204.5 153.4
571 0.499 0.749 142.7 285.5 214.1
Table 3.6: Minimum, maximum and average number of entries of Tables Ai for
b 2m ) from the NIST-recommended curves
E(F
in (3.14) or (3.15) with any entry of A0 . For each candidate pair found (if
any) we proceed with the exhaustive search and verification process. If the
verification fails, then we continue to obtain the next entry of Table A1 and
repeat the process until the scalar is obtained. Even when using this approach
the number of EC discrete logarithms in the worst case is the same as that
using Algorithm 3.3 (i.e., c0 + c1 ), on average it is roughly c0 + 21 c1 .
The first two terms represent the probability that for a given e we could obtain
the scalar from at least one of the two pairs. The third term, λ, is the probability
that the “combination” does succeed in obtaining the scalar with exhaustive search
when neither pair individually does so for a given value of e. Equation (3.16) gives
b 2m )
an explicit lower bound for σ, i.e., σ ≥ 2ρ − ρ2 . In fact, for the cases of E(F
from the NIST-recommended curves we notice that σ ≈ 2ρ − ρ2 for e ≥ 2.
For obtaining a more precise value of σ one can check, from all the possible
order values of two points (i.e., Pb0 and Pb1 ), which ones provide sufficient scalar
information for obtaining the rest using exhaustive search for a given parameter e.
78 Fault-Based Attack on Montgomery’s Ladder Algorithm
1. σ = 0
N ← φ(D)
For ju−1 = 0 to nu−1 do
For ju−2 = 0 to nu−2 do
..
.
For j0 = 0 to n0 do
j
d ← 2j0 pj11 · · · pu−1
u−1
n ← lcm(D, d)
b 2m ).
Find the smallest value of r for lcm(n, r) = #E(F
If (r ≤ 2e ) then
σ ← σ + N · φ(d).
b 2m ))2
3. σ = σ/(#E(F
4. Return(σ).
3.2. Invalid-curve attacks on Montgomery’s ladder algorithm 79
σ
Case m e=0 e=1 e=2 e=5 e = 10
163 0.74921865 0.74921865 0.99895820 0.99973864 0.99999970
Randomly 233 0.71998855 0.71998855 0.95998473 0.99998410 0.99999555
chosen 283 0.73265871 0.73265871 0.97687829 0.99722992 0.99926508
curves 409 0.74515657 0.74515657 0.99354209 0.99797754 0.99999814
571 0.73469332 0.73469332 0.97959110 0.99999925 0.99999925
163 0.74999822 0.74999822 0.99999763 0.99999763 0.99999939
Koblitz 233 0.74999999 0.99999999 0.99999999 0.99999999 0.99999999
curves 409 0.74999999 0.99999999 0.99999999 0.99999999 0.99999999
571 0.74999999 0.99999999 0.99999999 0.99999999 0.99999999
b 2m )
Table 3.7: Probability of success σ of obtaining k with Algorithm 3.3 for E(F
from the NIST-recommended curves for a given parameter e
163 2 5 10
Randomly 233 5 5 12
chosen 283 3 9 14
curves 409 2 6 12
571 3 5 5
163 2 2 10
Koblitz 233 1 1 1
curves 409 1 1 1
571 1 1 1
search with verification process, respectively. Let us consider the cost of each
phase:
candidate points for Pbi . As discussed earlier, the bounds for ci are approxi-
ηmax m m
mately 2
≤ ci ≤ 2
, where ηmax is the maximum reduction factor which
b 2m ).
depends on #E(F
Example 3.4 Let us consider the cost of phases 2 and 4 of Algorithm 3.3 for
b 2m ) from the NIST-recommended curve K-163. Let us use the minimum and
E(F
maximum values of ci form Table 3.6 to give an interval for each cost. For a single
√ √
processor, the cost of phase 2 is approximately in the interval [6cmin p4 ,6cmax p4 ] ≈
b 2m ) (see
[249.9 , 250.9 ] point operations, where p4 is the largest prime factor of #E(F
Table 3.2). Now, assume that we have M = 10, 000 computers for solving the
instances of the ECDLP. In this case the expected number of point operations
√ √
cmin ( πp4 /163) cmax ( πp4 /163)
for each processor is approximately in the interval [ 10000
, 10000
]≈
[231.2 , 232.2 ]. For the phase 4 cost, from Tables 3.1 and 3.5 we can notice that with
999
a probability greater than 1000
the exhaustive search space will be r ≤ 4. Here the
cost of phase 4 is negligible.
82 Fault-Based Attack on Montgomery’s Ladder Algorithm
3.3 Countermeasures
The attacks presented in the previous section only need one or two faulty outputs
to break the given instance of ECSM with a high probability of success. Hence, this
may constitute a threat to cryptosystems using the Montgomery ladder ECSM for
elliptic curves over the binary field. Therefore, some countermeasures are needed.
In the following, we will describe possible protections against the attacks presented
in this chapter.
b 2m ) is a
Curve selection. The attacks presented in this chapter assume that E(F
cryptographically weaker group where the ECDLP could be solved in a reasonable
period of time for a given E(F2m ). However, this assumption is not true if both
b 2m ) are almost prime. From the NIST-recommended curves,
#E(F2m ) and #E(F
the only curve that satisfies this condition is referred to as K-283. Although, this
curve selection criteria is an effective countermeasure against the fault-based attacks
presented in this chapter, it might be too restrictive from the practical point of view.
Moreover, the following two countermeasures represent a possible solution without
b 2m ) is not
limiting the use of particular group E(F2m ) even when the order of E(F
an almost prime number.
3.3. Countermeasures 83
3.4 Conclusion
In this chapter we have presented two invalid-curve attacks that apply to the Mont-
gomery ladder ECSM algorithms proposed by López and Dahab [58]. These attacks
exploit the fact that parameter a is not used in the group formulas for these partic-
b 2m ) is a weaker group with the same parameters
ular algorithms. In this way, if E(F
than the original group E(F2m ) except for parameter a and we are able to inject
a fault in the input point as described in Algorithms 3.1 and 3.3, then we would
retrieve the scalar k with a high probability of success. For the purpose of the
NIST-recommended curves, we have shown that there exists a weaker group for
nine of the ten cases that include the randomly chosen and Koblitz curves. The
b 2m ) are almost
only exception is the curve K-283 for which #E(F2m ) and #E(F
prime. Also, we have obtained the theoretical probability of success for each of the
presented attacks. Additionally, we have determined numerical values of the prob-
b 2m ) from the NIST-recommended curves. And finally,
abilities of success for E(F
we have presented some countermeasures to prevent the attack described in this
chapter.
Chapter 4
In this chapter we present some structures that permit detection of errors in ECSM
without modifying the curve parameters. These are based on re-computation and
parallel computation. We use a number of encoding techniques that rely on the
properties of elliptic curves and provide a high probability of detection of errors
caused by faults that occurred naturally or injected deliberately by an attacker.
In addition, we consider fault-tolerant ECSM. While error detection is a suffi-
cient countermeasure for preventing fault-based attacks, fault-tolerant characteris-
tic enables a system to perform its normal operation in spite of faults. For certain
fault models, we propose structures that can perform correct ECSM operations
in the presence of faults that may occur in a limited number of ECSM modules,
primarily due to natural causes such as abnormal temperature, electromagnetic in-
terference, or power supply changes. An attacker who is not able to inject faults at
85
86 Robust ECSM Using Repeated and Parallel Computations
precise time and locations (i.e., “less sophisticated” attacker) is assumed to have
an effect similar to that of natural causes. On the other hand, a “sophisticated”
attacker is able to inject faults at precise locations in arbitrary number of ECSM
modules. For provide resistance against a sophisticated attacker, the encoding tech-
niques used with the inputs can essentially prevent such structures from outputting
incorrect results due to faults. Most of the work presented in this chapter has been
presented by Domı́nguez and Hasan [26] [27].
The organization of the remainder of this chapter is given as follows: In Section
4.1, we present encoding schemes for error detection for ECSM and probability of
undetected errors. In Section 4.2, we give fault-tolerant ECSM structures. We
present overhead costs and experimental results for the probability of undetected
errors in Sections 4.3 and 4.4, respectively. Finally, we make some concluding
remarks in Section 4.5.
ECSM
In this section, we propose error-detecting structures for ECSM. Here, we consider
a high-level design, where the ECSM module is the main block implemented in
hardware to accelerate some ECC applications and may become faulty either by
natural causes or by deliberate attacks from an adversary. Other modules used in
our structures are much less complex1 than the ECSM module and are assumed to
be implemented in a secure environment – either in software or hardware.
1
In Section 4.3, area and time complexities of these modules are given for F2163 .
4.1. Encoding/decoding and error detection for ECSM 87
• The input point P is verified by the cryptosystem which uses the ECSM
module to be on the valid elliptic curve before each ECSM computation.
This validation is especially important for preventing the attacks described
in Chapter 3 and those proposed by Biehl et al. [9] and Antipa et al. [6].
• An appropriate elliptic curve has been selected (e.g., using the guidelines of
a recognized standard such as FIPS 186-2 [32]).
• The input and output of the ECSM are given in projective coordinates.
The encoding/decoding process in Figures 2.2 and 2.3 plays an important role
in the detection of errors caused by faults in the compute (i.e., ECSM) module.
For example, without the encoding/decoding the re-computation based scheme in
Figure 2.2 would fail to detect errors produced in two cases: (i) the errors are
produced by permanent errors and (ii) the same transient fault is present for both
‘runs.’ In both cases, the erroneous results at times t0 and t1 will be the same and
the comparator would not detect such errors. Similarly, for the parallel computation
based scheme in Figure 2.3, if the two ECSM modules have the same permanent
and/or transient faults, in the absence of the encoding/decoding the two modules
generate erroneous but the same results. Again, such errors are not detected by
the comparator.
4.1. Encoding/decoding and error detection for ECSM 89
(x, x + y) x, y ∈ F2m ,
−P = −(x, y) = (4.1)
(x, −y) x, y ∈ Fp .
For projective coordinates, namely the López and Dahab system for curve E over
F2m and the Jacobian system for E over Fp , the point negation is
(X, XZ + Y, Z) X, Y, Z ∈ F2m ,
−P = −(X, Y, Z) = (4.2)
(X, −Y, Z) X, Y, Z ∈ Fp .
In Figure 2.2, if the input is point P , then the encoder performs the point negation
in accordance with Equation (4.1) or (4.2). For the encoded input, the output of
90 Robust ECSM Using Repeated and Parallel Computations
the ECSM module is k(−P ) = −kP. Hence, the decoder in Figure 2.2 also performs
a point negation to generate the expected output kP.
Although, the encoding (and decoding) using point negation is simple – for pro-
jective coordinates, one multiplication and one addition for E over F2m , and only
one addition for E over Fp – it is important to note that such an encoding scheme
changes only the y-coordinate of the input point, while the x- and, if applicable, the
z-coordinate remain unchanged. As a result, even in the presence of some faults in
the ECSM module, it is possible that the comparator in Figure 2.2 gets two equal
but incorrect points at its input and generates an ‘ok’ signal. This is illustrated in
Appendix B.
To change all the coordinate values of the input point, we can use a property of
projective coordinate systems, which consists of having multiple representations for
a given point. This principle is known as point randomization [23] and is applicable
to all the projective coordinate systems [46]. For the López and Dahab and the
Jacobian projective systems, Equations (2.3) and (2.4) become
Y 2 + XY Z = X 3 Z + aX 2 Z 2 + bZ 4 (4.3)
and
Y 2 = X 3 + aXZ 4 + bZ 6 , (4.4)
respectively. It is easy to verify that trios (γX, γ 2 Y, γZ), where γ ∈ F∗2m , satisfy
Equation (4.3) and have the same affine representation as that of (X, Y, Z). Thus,
these trios can give different projective representations of a single point on the
curve defined by Equation (2.3). Similarly trios (γ 2 X, γ 3 Y, γZ), where γ ∈ F∗p ,
4.1. Encoding/decoding and error detection for ECSM 91
One implication of (4.5) is that if the encoder in Figure 2.2 or 2.3 is only for
the mapping
(X, Y, Z) 7→ (γ c X, γ d Y, γZ), (4.6)
It is well known that the order of any point P divides the order of the group #E(Fq ),
i.e., #E(Fq )P = O. Let k 0 = #E(Fq ) − k. Then
Thus, k can be simply encoded to k 0 , which can be viewed as some kind of scalar
negation operation, and the corresponding decoding process involves a point nega-
92 Robust ECSM Using Repeated and Parallel Computations
k 7→ k 00 = k + j#E(Fq ), (4.7)
where j is a random integer of at least 20 bits long2 . Such randomization can also
be used as an encoding scheme, since
tected error
As stated before, for the purpose of encoding, one can map input pair k and
P = (X, Y, Z) to k + j#E(Fq ) and (γ c X, γ d Y, γZ), respectively. These encod-
ing schemes produce different representations of inputs that are equivalent to the
original input and do not require the decoding process. In addition, as shown in
Section 4.4 with experimental results, these two encoding schemes lead to a lower
probability of undetected errors when compared with other encoding schemes dis-
cussed earlier, namely P 7→ −P alone or combined with k 7→ k 0 = #E(Fq ) − k.
As a result, in this work, we use these two encoding schemes for error detection in
ECSM. In particular, for re-computation based error detection, at t0 we compute
the ECSM with (k, j0 , P, γ0 ) as inputs, and then at t1 compute another ECSM with
(k, j1 , P, γ1 ), where j0 and j1 are two random integers of appropriate length (say 20
bits for a 160 bits long k), and γ0 and γ1 are two random non-zero elements of the
underlying finite field.
In the comparator unit, the outputs of the two ECSM operations that are in
projective representation need to be compared. For this matter, below we apply an
idea originally presented by Meloni [63] in a different context, namely to speed up
ECC scalar multiplication. Assume that Q0 = (X0 , Y0 , Z0 ) and Q1 = (X1 , Y1 , Z1 )
are the two input points of the comparator. Then, similar to (4.5), we can transform
each Q0 and Q1 with γ = Z1 and Z0 , respectively, i.e.,
where for the López and Dahab system c = 1 and d = 2, and for the Jacobian system
c = 2 and d = 3. If Q0 and Q1 map to the same affine representation, then the new
X- and Y -coordinates of Q0 and Q1 must be equal since the new Z-coordinates
are the same. Thus, the cost of comparison of these two points is four finite field
multiplications and two squarings for the López and Dahab system, and six finite
field multiplications and two squarings for the Jacobian system. In addition, two
dlog2 qe-bit comparators are needed. The main advantage of this comparison scheme
is that an explicit transformation from projective to affine coordinates is not needed
resulting in the elimination of expensive field inversion operations.
Finally, if the compared points are the same, one of them is produced as the
final output. Otherwise, no final output is given. This scheme is shown in Figure
4.1, and we refer to it as full re-computation based scheme or RC for short.
Because of random γi and ji for i = 0 and 1, one can assume that in the presence
of faults – whether identical or not – the ECSM module’s output Qi is a random trio
(Xi , Yi , Zi ) of finite field elements. If the fault makes the ECSM module produce an
incorrect result, then Qi 6= kP . Since the number of elements in the finite field is q
and Qi has q − 1 different projective representations, the probability that Q0 = Q1
and Qi 6= kP (i.e., undetected error) can be expressed as follows:
q−1 1
Pr(undetected error)RC ≈ 3
≈ 2. (4.8)
q q
For many of today’s security applications , where q ≈ 2160 , the above probability
of undetected error is quite small. The counterpart of RC that uses parallel com-
4.1. Encoding/decoding and error detection for ECSM 95
γ0 γ1
Position Position Output
at t0 at t1
Position
at t0
Register
Q0
Point ECSM Compa-
P
encoding module rator
Q1
Position
Scalar at t1
k
encoding
Position Position
at t0 at t1
j0 j1
Figure 4.1: ECSM using full re-computation with point and scalar randomization
(RC)
putation is shown in Figure 4.2. This parallel computation based scheme (or PC
for short) can detect any error confined in one module. Additionally, if both ECSM
modules have errors, since they use different input representations, the probability
of having equal erroneous outputs from these modules is low as given in Equation
(4.8).
The main penalty of using full re-computation (i.e., using RC) is that it doubles
the running time of ECSM. In applications where the ECSM module is subject to
only permanent faults injected by an attacker or caused naturally, the running time
96 Robust ECSM Using Repeated and Parallel Computations
k Scalar
j0
γ0 encoding Output
Point ECSM Q0
encoding module
Compa-
P
rator
Point ECSM Q1
encoding module
γ1 Scalar
k encoding j1
Figure 4.2: Parallel computation based ECSM with point and scalar randomization
(PC)
cated attacker
1) ECC differential fault attack: Suppose that the attacker can inject a fault
at a random state in a register that holds partial results during the ECSM operation
as described by Biehl et al. [9]. In such a case, the point operations result in normal
additions before the fault and pseudo-additions [9] after the fault. The latter, in
turn, represents an operation that leaves the original group structure. Thus, after
the fault a random finite field trio is obtained at the ECSM output and the PV
process can detect such errors with a probability of
(#E(Fq ) − 1)(q − 1) + 1 q2 1
Pr(Qi ∈ E(Fq )) = ≈ 3 = . (4.9)
Number of finite field element trios q q
Therefore, for large q, PV after the ECSM (Figure 2.1) might represent a counter-
measure against this attack. However, that is not the case for all ECC fault attacks
as shown in the following scenario.
scalar, if we consider that the output is now a random point on the curve, then the
probability of undetected error is about 1/q. The same probability applies for the
PC scheme if the attacker is able to inject an SCF to each ECSM module. Both
schemes, RC and PC, can be utilized as a countermeasure to the SCF attack. The
extra cost for these schemes in terms of time and area will be presented in Section
4.3.
its characteristic expressed by the probability that it will perform its function [85].
Finally, we provide a reliability comparison among these schemes.
Traditional TMR utilizes three elements performing the same operation [87]. In
the context of the work presented here, these elements would correspond to ECSM
modules as shown in Figure 4.3. In this TMR scheme, as long as two or all three
ECSM modules yield correct results, the majority voter produces a correct final
output. When two or more ECSM modules become faulty by natural causes or by
some simple malicious action by a less sophisticated attacker, then the faulty mod-
ules are likely to produce different but incorrect results, and the majority voter will
produce no final output. However, when a sophisticated attacker can deliberately
inject faults that may cause two or more ECSM modules to generate the same but
incorrect result, then the majority voter will produce an erroneous output and the
TMR scheme will fail.
In an attempt to reduce the chance of TMR producing an erroneous result, we
proceed as follows: Scalar k and point P , which are inputs to the ECSM modules,
are encoded using the randomization techniques discussed in Section 4.1, as illus-
trated in Figure 4.4. The ECSM outputs are connected to a secure majority voter
implemented in either software or hardware.
Like the comparator unit, the majority voter needs to process their inputs in
projective coordinates. Assume that Q0 = (X0 , Y0 , Z0 ), Q1 = (X1 , Y1 , Z1 ), and
Q2 = (X2 , Y2 , Z2 ) are the three input points of the majority voter. Then, similar
to (4.5), we can transform each Q0 , Q1 , and Q2 with γ = Z1 Z2 , Z0 Z2 , and Z0 Z1 ,
4.2. Fault-tolerant structures for ECSM 101
ECSM
module
QTMR
P
ECSM Majority
module voter
ECSM
module
respectively, i.e.,
where c and d are as defined earlier. With this new transformed points, a normal
voting process can be performed that will produce a final result QT M R , which is the
majority of these points. If there is no majority, then no final output is generated.
Clearly, if errors occur in only one of the ECSM modules, then their effects can be
masked and a correct final output can be obtained.
Assume that an attacker can inject the same fault, transient or permanent, in
two or all three modules in an attempt to make the faulty modules produce the
same but incorrect result. In such circumstances, the point and scalar encodings
102 Robust ECSM Using Repeated and Parallel Computations
γ0
Point Q0
ECSM
encoding module
Scalar
j0
encoding
γ1
Q1 QTMR
Point ECSM Majority
P encoding module voter
Scalar
j1
encoding
γ2
Q2
Point ECSM
encoding module
Scalar
k j2
encoding
help to keep the risk of giving an incorrect result to a low level. This is explained
as follows: Because of the encoding of inputs, in the presence of faults, each ECSM
module is expected to produce a random projective representation, which maps
to one of the q 2 affine coordinates representations. An erroneous final result is
produced by the majority voter if two or all three outputs represent the same but
incorrect points. Thus, the probability that the final result has an error is
3 q 2 (q 2 − 1) q2
Pr(QT M R 6= kP ) ≈ × 2 +
2 q × q2 × q2 q2 × q2 × q2
3 1 3
= 2 + 4 ≈ 2 (for large q). (4.10)
q q q
4.2. Fault-tolerant structures for ECSM 103
Compa-
rator
F
Point ECSM Q1
encoding module
γ1 PV
Scalar
k j1
encoding
binary finite field F2m . For elliptic curves over the prime field Fp , the PV module
is not sufficient for detecting errors produced by the SCF attack. However, for the
case of elliptic curves over F2m , where the SCF attack does not apply, we can use
the PV module to validate the ECSM operations.
As shown in Figure 4.5, the two ECSM modules operate in parallel and their
outputs are verified by PV modules. The F block is used to stop the system output
if Q0 , Q1 ∈
/ E(F2m ); or when Q0 6= Q1 , and Q0 , Q1 ∈ E(F2m ). The final output of
the DMR PV based system is given as follows:
Q1 (or Q0 )
if PV(Q0 or Q1 ) = ok and Q0 = Q1 ,
QDM R P V = Qi / E(F2m ),
if Qi ∈ E(F2m ) and Qi ∈
no output otherwise,
where i = 1 − i.
The DMR PV scheme can clearly tolerate any faults confined in only one of the
two modules and can produce the correct final output. Additionally, the scheme
can detect some situations where errors exist in the outputs of both modules and
hence helps to avoid producing erroneous results at the final output. However,
there are two cases of reasonably low probability when this scheme fails and gives
an incorrect result: first, if Q0 = Q1 and Q0 , Q1 ∈ E(F2m ) \ kP ; secondly, when
Q0 6= Q1 , Qi ∈ E(F2m ) \ kP , and Qi ∈
/ E(F2m ) for i = 0 or 1. Then, the probability
of giving an incorrect result when both ECSM modules are in error is
1 1 1 1 2
Pr(QDM R P V 6 kP ) ≈ 3 + 2 1 − 2
= 1− ≈ (for large q). (4.11)
q q q q q
4.2. Fault-tolerant structures for ECSM 105
ECSM
The above DMR PV scheme reduces the silicon area requirement but increases the
probability of an incorrect result. It is possible to have a fault-tolerant ECSM sys-
tem that is as area efficient as DMR PV and has the same low probability of incor-
rect result as TMR. To this end, one can use a parallel and re-computation3 (PRC)
based scheme shown in Figure 4.6. In PRC, at time t0 , with input (k, j0,t0 , P, γ0,t0 )
and (k, j1,t0 , P, γ1,t0 ) the two ECSM modules produce Q0,t0 and Q1,t0 , respectively.
These two points are then compared using the technique on page 94. If the points
are the same, then one of them is produced as the final output QP RC . Otherwise, the
ECSM modules perform re-computations with (k, j0,t1 , P, γ0,t1 ) and (k, j1,t1 , P, γ1,t1 )
and produce Q0,t1 and Q1,t1 . If errors are confined in only one of the ECSM mod-
ules, then for only one of the two values of i, i.e., either i = 0 or i = 1, Qi,t0 and
Qi,t1 are the same after their Z-coordinates are made equal and one of them can
be produced as the final output QP RC .
An erroneous final QP RC may be produced in the following two cases: (i) the
ECSM modules produce the same but incorrect result at t0 , and (ii) any of the
two ECSM modules gives an incorrect but same result at both t0 and t1 . For the
PRC operation described above, the probability that an erroneous final result is
produced is
1 1 1 3
Pr(QP RC 6 kP ) ≈ 2 + 2 1 − 2
= 2
≈ 2 (for large q) (4.12)
q q q q
3
The combination of time and hardware redundancy for fault-tolerant system design has been
considered in other contexts, e.g., Lima et al. [55] have used this combination for having fault-
tolerance on FPGAs.
106 Robust ECSM Using Repeated and Parallel Computations
j0,t0 j0,t1
Position Position
at t0 at t1
γ 0,t0 γ 0, t1
Scalar k Position
Position Position
at t0 at t1 encoding at t0
Register
Q0,t0
Point ECSM
encoding module Q0, t1
Position Compa-
at t1 rator QPRC
P Position and
at t0 switches
Register
Q1,t0
Point ECSM
encoding module
Q1,t1
Position Position Position
at t0 at t1 Scalar at t1
k
γ1, t0 γ1, t1 encoding
Position Position
at t0 at t1
j1,t0 j1,t1
The reliability comparison between several fault-tolerant systems has been consid-
ered in the literature [83] [54]. In the context of this work, note that compared to
the ECSM module, each of the other modules in Figures 4.4, 4.5, and 4.6, namely
the majority voter, the point and the scalar randomization modules, PV modules,
comparators, registers, and multiplexors would require a lot less hardware in prac-
tice. Thus, these small components can be implemented such that each of those
will have a reliability factor that is close to unity. Then, for the case of TMR
based ECSM, i.e., a correct result is obtained when at least two out of three ECSM
modules are performing their functions, the reliability is
3 2 2 3
RT M R ≈ rECSM + 3rECSM (1 − rECSM ) = 3rECSM − 2rECSM , (4.13)
where rECSM is the reliability factor of a single ECSM module. On the other hand,
for both DMR PV and PRC based ECSM schemes their reliability RDM R P V /P RC is
related to the probability that at least one of the two ECSM modules work without
errors. The resulting expression for RDM R P V /P RC is:
2 2
RDM R P V /P RC ≈ rECSM + 2rECSM (1 − rECSM ) = 2rECSM − rECSM . (4.14)
108 Robust ECSM Using Repeated and Parallel Computations
RDM R P V /P RC − RT M R
2 2 3
= 2rECSM − rECSM − (3rECSM − 2rECSM )
= 2rECSM (1 − rECSM )2 .
Since 0 < rECSM < 1, RDM R P V /P RC − RT M R is always positive, i.e., we can state
that the reliabilities of DMR PV and PRC based schemes are greater than that
of the TMR based system (see Figure 4.7). The improved reliability of DMR PV
and PRC schemes is primarily due to their ability to mask out errors confined in
one module with fewer number of ECSM modules compared to the TMR. The use
of the reduced number of ECSM modules is possible due to the PV modules in
DMR PV and the re-computation in PRC.
1.0
0.9
2rECSM
2
- rECSM
0.8
(DMR PV,PRC)
_
0.7
rECSM
0.6
(Stand-alone module)
r 0.5
S
0.4
0.3
3r 2 ECSM
- 2r 3
ECSM
0.2 (TMR)
0.1
0.0
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
r
ECSM
Figure 4.7: Reliability comparison among TMR, DMR PV and PRC schemes
a NIST-recommended elliptic curve over F2163 has been used. The performance
results are based on a Xilinx Virtex 2000E FPGA implementation, and the ECSM
operation uses a finite field multiplier, a squaring unit, and an adder. The area and
timing results of these arithmetic units are:
The entire module, which performs ECSM, with input and output points in pro-
jective coordinates, requires 5009 slices and 17160 clock cycles for each scalar mul-
tiplication.
For implementing the extra modules (i.e., PV module, point and scalar random-
110 Robust ECSM Using Repeated and Parallel Computations
Table 4.1: Cost and performance for the ECSM and other extra modules used for
error-detecting and fault-tolerant structures
Table 4.2: Performance and cost for error-checking systems over F2163
Table 4.3: Performance and cost for fault-tolerant systems over F2163
At this speed, the ECSM with the PV module, as shown in the second row of
Table 4.2, will require 17984/(66 × 106 ) ≈ 272.48 µsec. On the other hand, if all
modules are implemented in software, then using the results reported by Hankerson
et al. [39] for finite field arithmetic units, an ECSM unit with the PV module will
require approximately 5.38 msec. These results may vary depending on the target
platform and the level of optimization used; nevertheless, they show the speed
advantage of using hardware over software.
pendability of the results for a specific input pair. Clearly, these exhaustive fashion
experiments limit the maximum feasible finite field size that the system can han-
dle. However, the main factor for this limitation is the number of ECSM operations
needed for finding errors in structures where the probabilities of undetected errors
are expected to be very low (e.g., 1/q, 1/q 2 ).
In order to have experiments with the above description in a reasonable amount
of time, we have selected the prototype to be based on the finite field F211 . For
this small prototype, the simplest experiment that involves only the PV module,
the number of ECSM operations is approximately 0.5 billion4 requiring roughly
2.5 hours to complete in the actual hardware. Other experiments involve much
more operations, for example in each parallel computing scheme the number of
ECSM operations is approximately 35 billions5 and about seven days to complete.
Although this ECSM prototype is small for a real application, it allows us to perform
experiments in an exhaustive way, which in turn permits us to illustrate the error
detection coverage for the schemes presented earlier.
implementing the finite field operations (117, 6 and 11 for finite field multiplication,
squaring and addition, respectively) over F211 . The fault model used is a permanent
stuck-at-0 or stuck-at-1 fault in these gates, and only one gate can be faulty at a
time.
Faults have been injected in a similar way as described by Zarandi et al. [95].
The idea is to add a multiplexor to each gate to be tested, such that we can select
if the gate will work normally or with a stuck-at-0 or stuck-at-1 fault at its output
(see Figure 4.8). Presence or absence of a fault is controlled by the fault injection
signal (FIS). The fault selector signal (FSS) chooses a stuck-at-0 or stuck-at-1
fault for the experiment.
Table 4.4: Methods utilized for finite field operations over F211
The elliptic curve utilized has 1980 points on it that are of order either 1982
or 991. In our experiments, these 1980 points6 are input for each value of scalar k
in the range of 2 to 990. For each fixed set of input point and scalar, a fault-free
6
The remaining two points include the point at infinite and (0x00,0x01). The latter has the
order of two. Both of these points are not considered suitable for cryptographic applications.
4.4. Experimental results 115
Gate to be 0 Output
Inputs
tested MUX
1
FSS FIS
Encoding schemes
Pr(undetected error)
Encoding Stuck-at-0 Stuck-at-1
fault fault
1 1 1
(k, −P ) 117 12,692
2 1 1
(k 0 , −P ) 128 1,395,753
3 1 1
(k, P 0 ) 421,190 2,029,025
3, 4 1 1
(k 00 , P 0 ) 1,097,914 7,497,185
1 2 0
Point negation is needed for decoding. k = #E(F211 ) − k.
3
P 0 = (γX, γ 2 Y, γZ), γ ∈ F∗2m . 4 k 00 = k + j#E(F211 ).
For each error-detecting scheme presented in Section 4.1, the probability of unde-
tected error was obtained. Table 4.6 shows these probabilities for the stuck-at-0
and stuck-at-1 fault models. The PV alone gives the worst error detection coverage
1
for these schemes (i.e., 217
for the stuck-at-0 fault model).
Our experiment for scheme RC shows values that are close of the value from
Equation (4.8) if an average of the stuck-at-0 and stuck-at-1 results is considered
(i.e., q 2 = 4, 194, 304 vs. 4,297,550). For the PC scheme, if the faults are confined in
only one of the two ECSM modules, then there are no undetected errors as shown
in the fourth row of Table 4.6. The results for the case where both ECSM modules
had a fault each are shown in the last row.
Table 4.6: Probabilities of undetected errors for our experiment over F211
118 Robust ECSM Using Repeated and Parallel Computations
4.4.3 Comments
We have noticed that for both types of faults and especially for the stuck-at-0 fault
model, the probability of obtaining two special points, namely O and P is relatively
high. The following two scenarios contribute to these higher probabilities. First,
if the projective Z-coordinate QZ of the ECSM result Q is zero, then irrespective
of the values of X- and Y -coordinates the result is the O point. Second, consider
the case where the Z-coordinate of the variable representing Q is zero in the last
iteration of the loop in the double-and-add algorithm for ECSM. In such a case,
assuming a left-to-right version of the algorithm, the final result will be O or P
depending of the value of the least significant bit of k.
With the above observations, it is useful that the PV module does not consider
O or P as a valid output of ECSM. In fact, from the cryptographic point of view, if
P and k are selected as a non trivial value (i.e., k 6= {0, 1, ord(P )}), a valid result
will not be either O or P . If we assume that the PV module is modified such that
O and P are not considered as a valid ECSM output, the resulting probabilities are
1982 1
very close to the value obtained using Equation (4.9), i.e., 211 211
≈ 2116
as shown
in the second row of Table 4.6.
Another interesting observation of this experiment is that it is more likely to
obtain a result equal to O with faults that are stuck-at-0 than those with stuck-
at-1. For our experiments, stuck-at-0 faults tend to reduce the Hamming weight
of the Z-coordinate of the output as shown in Table 4.7. In contrast, on average
stuck-at-1 faults produce results with more binary 1s. For this reason, the stuck-at-
0 faults have more cases with QZ = 0, and consequently, the resulting point on the
curve corresponds to Q = O. This fact implies that the probabilities of undetected
4.5. Conclusion 119
Case e Z ))
Average(H(Q
Fault free 5.5
Stuck-at-0 5.159
Stuck-at-1 5.534
Table 4.7: Average Hamming weight of the Z-coordinate of the result for our
experiment over F211
1
error for stuck-at-1 cases are less than their stuck-at-0 counterparts (e.g., 7,497,185
1
vs. 1,097,914
for RC).
As stated earlier, the error detection capability depends on various factors in-
cluding the actual inputs, the faults and implementation architecture. For our
experiments, the architecture might have considerable impact on the numerical re-
sults, since the finite field arithmetic unit was invoked several times during each
ECSM operation. In addition, the fault type (stuck-at or bit-flip), fault location,
etc. might have affected our experiments.
4.5 Conclusion
In this chapter, we have presented error-detecting and fault-tolerant structures for
ECSM. For the purpose of error detection, the concepts of re-computation and
parallel computation have been used. In order to have a higher probability of er-
ror detection during the ECSM operation, we have presented encoding/decoding
schemes suitable for ECSM computation. Schemes are based on the concepts of
scalar and point randomization. These schemes provide resistance to attacks where
fault-induced operations do not leave the original elliptic curve (e.g., the SCF at-
120 Robust ECSM Using Repeated and Parallel Computations
tack). By generating single stuck-at faults exhaustively for a small ECSM prototype
we have given experimental results that show the probabilities of undetected errors
for the proposed error-detecting schemes.
Traditionally, the concept of having only two modules working in parallel has
been associated with error-detecting systems only. However, for ECSM, we have
shown that with only two ECSM modules along with either PV (i.e., the proposed
DMR PV based scheme) or re-computation (i.e., PRC based scheme), it is possible
not only to detect but also correct errors due to faults. These fault-tolerant schemes
are more efficient, i.e., it uses one less ECSM module than the TMR based scheme.
Chapter 5
In Chapter 4 we have presented error detection and fault tolerance in ECSM at the
module level. That is, we have added external elements to the ECSM module in
order to detect errors and/or tolerate faults, where the underlying scalar multiplica-
tion algorithm is not modified. In contrast, this chapter presents error detection at
the algorithm level. Here, we add protections inside the ECSM algorithm in order
to detect errors caused either by natural causes or deliberately by faults injected by
an attacker. For this purpose, we use point verification (PV) and coherency check
(CC) among selected variables utilized for the ECSM. The CC functions that we
define in this chapter are algorithm specific. On the contrary, PV can be applied
to any ECSM method.
In this chapter we investigate the error detection capability of different methods
in ECSM. For the remainder of this chapter, the following assumptions are made:
121
122 Algorithm-level Error Detection for ECSM
• Any variable utilized in the ECSM can be a target of natural faults or faults
injected by an attacker. For simplicity in the analysis we assume that variables
such as the loop counter i and scalar k can be checked for integrity in order
to prevent disturbance of their values.
• Faults might occur directly in the registers containing the variables using a
flip-bit fault model or at a finite field arithmetic level. In the latter, any error
might spread into one or more variables.
• In Sections 5.1 and 5.2 we assume that decisional tests (e.g., “if (PV(Q) = 1)
then”) are not susceptible to faults. This might be the case for secure mi-
crocontrollers utilized in today’s smart cards where hardware protections are
added for not permitting fault injection into sensitive registers (e.g., CPU’s
status register). In Section 5.3 we relax this assumption considering the dou-
ble-fault attack proposed by Yen et al. [94] and refined by Kim and Quisquater
[48] for RSA cryptosystems.
Let us define a vector (V0 , V1 , . . . Vj−1 ) which is composed of the variables utilized
in the ECSM algorithm. As a consequence of faults produced naturally or injected
by an attacker the vector might be changed to (Ve0 , Ve1 , . . . Vej−1 ). Depending on
the resultant vector, the error-detecting scheme may detect the presence of an
error caused by the fault. Whether or not the error is detected depends on the
rules utilized for error detection (e.g., PV(Q)) and the actual values of the specific
variables to be tested. We can see this by analogy as a binary code of coding
5.1. Error detection in the Montgomery ladder algorithm 123
theory. For this case the length of the code n is the size (in bits) of the vector
(V0 , V1 , . . . Vj−1 ). The codewords are those cases where no error is detected, i.e., the
error-detecting scheme cannot distinguish between an error-free computation and
a faulty one. For comparing the error-detecting schemes presented in this chapter
we use the following definition [56]:
Definition 5.1 The ratio cR = log2 (r)/n is called the code rate, where r and n are
the number of codewords and length of the code, respectively.
for error detection. We give a comparison between both approaches that suggests
the use of an integrity check (IC) with either a PV or CC process.
Let k = (kt−1 · · · k1 k0 )2 and Q = kP be the scalar and the ECSM result,
respectively, where P = (x, y) ∈ E(F2m ) and n = ord(P ). First, let us define the
√
“exceptional” cases for the ECSM result be Q = ±P , Q = O, and Q = (0, b). In
the Montgomery ladder algorithm (Algorithm 2.5), for fault-free computations the
exceptional cases arise from the following values of k:
n+1 (i.e., Q0 = P , Q1 = 2P ),
n−1 (i.e., Q0 = −P , Q1 = O),
k=
n (i.e., Q0 = O, Q1 = P ),
√
n/2 (i.e., for n even Q0 = (0, b)).
In the algorithms presented in this section let us restrict these exceptional cases for
error-free computations, i.e., simply by restricting the input k being n ± 1 and n,
and n/2 if n is even.
For Algorithm 5.1 let us obtain the cases where PV(Q0 ) = 1 in Step 3.2, i.e.,
where Q0 is released as the ECSM output. Clearly this occurs always in an error-free
computation. However, this is not likely to be the case when errors are produced
by faults occurring naturally or injected deliberately by an attacker. Let us assume
that an adversary can induce fault(s) during the execution of the ECSM. Consider
e =
that this produces an incorrect result Q 6 Q which will be checked by the PV
process.
2. For i = t − 2 downto 0 do
Let us consider that any variable utilized by the ECSM algorithm can be affected
by errors due to faults. For the case of Algorithm 5.1 instead of having an error-
free vector (Q0x , Q1x , x, y) we have the erroneous vector (Q e1x , x
e0x , Q e, ye). From now
on, let us refer to Q0x , Q1x , x, and y as the final value of these variables after
the main loop. Assume that the adversary is able to inject faults during the main
loop, i.e., where the sensitive information (i.e., scalar k) is utilized. With the above
e0 ) = 1. We can substitute the
considerations let us obtain the cases where PV(Q
e0x , Q
point (Q e0y ) in the governing elliptic curve equation (Equation (2.3)) to obtain:
e20 + Q
Q e30 + aQ
e0y = Q
e0x Q e20 + b. (5.2)
y x x
The above equation has two solutions if and only if Tr(w1 ) = 0, where
2
e20 + ye + x
w1 = Q
ye
e2 + + x
e+a+
b
. (5.3)
x
x
e 2 x
e e20
Q x
e1x are
In such a case the two solutions for Q
x
eQe0x
e1x (a) =
Q Ht(w1 ), (5.4)
e2 + Q
x e20
x
x
eQe0x
e1x (b) =
Q (Ht(w1 ) + 1). (5.5)
e2 + Q
x e20
x
5.1. Error detection in the Montgomery ladder algorithm 127
Using Lemma 5.1 we can obtain the code rate cR for this case as
In Step 3.1 of Algorithm 5.1 Q0y is obtained as a function of Q0x , Q1x , x, and
y. This computation assumes that the difference between Q1 and Q0 is P . If due
e0 will become
to a fault this difference is lost, then presumably the corresponding Q
a finite field pair that does not rely on E(F2m ). Then PV process at the end of
ECSM will detect such errors. This is not the case for the combinations obtained
in Lemma 5.1.
O P 2P
(n — 1)P = — P 3P
(n — k + 1)P
— Q0 = (n — k)P Q0 = kP
— Q1 = (n — k — 1)P Q1 = (k + 1)P
ing ECSM output is released. For checking the coherency among a given vec-
tor (Q0x , Q1x , x, y) we can proceed as follows. First, we can search for a point
Q b0y ∈ E(F2m ). This step involves
b0y ) for some Q
b0 ∈ E(F2m ) of the form (Q0x , Q
b0y from the elliptic curve equation. Let
the solution of a quadratic equation for Q
b0 will always exist. In fact, for
us consider an error-free computation for which Q
b0y to one
Q0x 6= 0 there are two solutions of the quadratic equation. Let us set Q
b0 will be either Q0 or −Q0 , depending on which
of these solutions. In this way, Q
b0 , we can perform Q
solution is selected. With Q b0 ] P which will result in either
purposes, since the only information available about Q1 is its x-coordinate, we can
b0 ] P ) and x(−Q
perform only x(Q b0 ] P ) and compare the results with Q1x . Let us
define the CC function that defines the error detection rules for this scheme as:
ok = 1 b0 ] P ) = Q1x or x(−Q
if x(Q b0 ] P ) = Q1x ,
CC1(Q0x , Q1x , x, y) =
0 otherwise.
5.1. Error detection in the Montgomery ladder algorithm 129
The Montgomery ladder ECSM algorithm that uses function CC1 at the end for
error detection is presented as Algorithm 5.3. Now let us examine the cases where
Algorithm 5.3 releases Q0 , i.e., CC1(Q0x , Q1x , x, y) = 1. This is always the case for
130 Algorithm-level Error Detection for ECSM
e0x + b + a.
w2 = Q (5.7)
Qe20
x
2. For i = t − 2 downto 0 do
2.2 Else
2.2.1 Q0x ← x(Q0 ] Q1 ), Q1x ← x(2Q1 ).
In the following lemmas we show some similarities and differences in terms of error
detection between PV(Q0 ) and CC1(Q0x , Q1x , x, y). In particular, the next lemma
shows that if one of these error-detecting approaches fails to detect a vector with a
e0x , then the other approach also fails to detect some vectors with
specific value of Q
e0x .
the same Q
(i) If PV((r, l)) = 1, then there exists a vector (r, s0 , u0 , v 0 ) for which CC1(r, s0 ,
u0 , v 0 ) = 1 for some s0 , v 0 ∈ F2m , and u0 ∈ F∗2m .
132 Algorithm-level Error Detection for ECSM
(ii) If CC1(r, s, u, v) = 1, then there exists a pair (r, l0 ) for which PV((r, l0 )) = 1,
for some l0 ∈ F2m .
e0 ), to obtain solutions to Q
Proof For PV(Q e1x using Equations (5.4) and (5.5)
which corresponds to Tr(w2 ), i.e., Tr(w1 ) = Tr(w2 ). Then (i) and (ii) are true since
Tr(w1 ) (and Tr(w2 )) depends on Q e1x , x
e0x and not on Q e, or ye.
In some cases, these error-detecting approaches fail to detect the same vectors.
As illustrated in Figure 5.2, this happens in about 22m+1 of the possible combina-
tions of arbitrary (Q e1x , x
e0x , Q e, ye). This is explained in the following lemma.
(i) There exists a vector (r, s, u, v) for which PV((r, l)) = 1 and CC1(r, s, u, v) =
1, i.e., cases where both Algorithms 5.1 and 5.3 fail in detecting the same
vector (Q e1x , x
e0x , Q e, ye) (the overlapping area in Figure 5.2).
∼ ∼ ∼ ∼∼
PV(Q0 ) = 1 CC1( Q0x , Q1x , x, y ) = 1
≈2 3 m ≈2 3 m
≈ 2 2 m +1
Error coverage 24 m
e1x (a)PV = Q
Q e1x (a)CC1 or Q
e1x (a)PV = Q
e1x (b)CC1 .
e1x (b)PV = Q
Clearly, if one of these conditions is satisfied, then either Q e1x (a)CC1 or
e1x (b)PV = Q
Q e1x (b)CC . If we equate Equations (5.4) and (5.8), and Equations (5.4)
e e ye
x
eQ0x Tr Q0x + + x e = εP , (5.10)
x
e
e e ye
eQ0x Tr Q0x + + x
x e + 1 = εP , (5.11)
x
e
e0x +
respectively. Depending on the value of Tr(Q ye
+x
e), Equations (5.10) and
x
e
which shows that (i) and (ii) are true. In summary, for a given vector (Q e1x , x
e0x , Q e, ye),
e0x , Q
if Q e1x , x
e, and ye satisfy either Equation (5.4) or (5.5), and either Equation
e0 ) and CC1(Q
(5.12) or (5.13), then PV(Q e1x , x
e0x , Q e, ye) do not detect such an erro-
neous vector. Now let us obtain the number of combinations that satisfies (i). Let
us consider separately the cases where εP = 0 from those with εP = x
eQe0x :
• Case 2: εP = x
eQe0x . From the definition of εP we have:
ye2 + x e3 + ae
eye + x x2 + b + x
eQe0x = 0. (5.14)
The resultant quadratic expression for ye will have a solution if and only if:
!
b e0x
Q
Tr x
e+a+ 2 = Tr . (5.15)
x
e x
e
be satisfied in about half of the combinations for which two solutions for ye
e0x and x
are obtained for arbitrary Q e. Accordingly, for each possible value of
e0x we can have about 2m pairs of (e
Q x, ye) that satisfy Equation (5.14). Then
the number of combinations for which both Algorithms 5.1 and 5.3 fail when
εP = x
eQe0x is (#E(F2m ) − 2)2m ≈ 22m .
From Lemma 5.3 we can deduce that if we combine PV(Q0 ) and CC1(Q0x , Q1x ,
x, y) we can obtain a code rate of
2m + 1 1 1
cR ≈ = + . (5.16)
4m 2 4m
However, from this lemma we can also see that if εP = 0 both error detection
approaches have the same error detection coverage. This means that if we add
a point verification process to P (i.e., PV(P )), we can use either PV(Q0 ) or
1
CC1(Q0x , Q1x , x, y) and have an improved cR of about ≈ 2
. This code rate of
about 0.5 means that the number of redundant bits utilized for error detection is
about half of the length of the code.
In the previous two subsections we have provided an analysis of two different ap-
proaches to detect errors in the Montgomery ladder ECSM. The first consists of
only verifying whether or not the result Q0 lies on E(F2m ) which is a basic coun-
termeasure against fault-based attacks and has been considered by Biehl et al. [9],
136 Algorithm-level Error Detection for ECSM
Q0x Q1x x y
g ( Q0x , Q1x , x, y ) IC
Q0y IC(P )
Output
PV Q0
PV(Q0 )
Ciet and Joye [21], Antipa et al. [6], Blömer et al. [14], and Domı́nguez and Hasan
[27]. The second approach consists of checking the coherency between the involved
variables which is an extension of the work presented by Giraud [37] in the context
of RSA cryptosystems. Even when both approaches (and their combination) give
a very good code rate, it is possible to have a further improvement with practically
no cost. The idea is to have an integrity check (IC) of P . That is, a verification
after the main loop to check whether or not the register containing P = (x, y)
corresponds to the original input point. Hence, IC does not involve computations
with other variables as required for CC. This idea is illustrated in Figure 5.3. The
IC process can be implemented using duplication and/or the well-known cyclic re-
dundancy check. Let us assume that this mechanism permits the detection of any
alteration on the register containing P .
Let us obtain the cases where the error-detecting scheme presented in Figure 5.3
5.1. Error detection in the Montgomery ladder algorithm 137
e0 =
will not detect an erroneous output Q e0y is computed
6 kP . Using function g, Q
as a function of Q e0y in Equation (5.2)
e1x , x, and y. Replacing this value of Q
e0x , Q
The above equation has two solutions if and only if Tr(w3 ) = 0, where
e20 + x2 + b + b .
w3 = Q (5.17)
x
x2 Qe20
x
e1x are
In such a case the two solutions for Q
e0x
xQ
e1x (a) =
Q Ht(w3 ), (5.18)
e20
x2 + Q x
e0x
xQ
e1x (b) =
Q (Ht(w3 ) + 1) . (5.19)
e20
x2 + Q x
Utilizing these values, Equations (5.18) and (5.19) can be rewritten as follows
e0x h
xQ y i
e1x (a) =
Q Ht(w30 ) + + x + Tr(x) , (5.20)
2 e
x + Q0x2 x
e0x h
xQ y i
e1x (b)
Q = Ht(w30 ) + + x + Tr(x) + 1 . (5.21)
2 e
x + Q0x2 x
log2 (#E(F2m ) − 4) 1
cR = ≈ , (5.22)
4m 4
which represents the best value of cR obtained so far. This is mainly based on
the simple observation of checking the integrity of P . The overhead of this error-
detecting mechanism is quite low in comparison with the main ECSM procedure.
Assuming a random vector (Q e1x , x
e0x , Q e, ye), a code rate of 0.25 is equivalent of having
a theoretical probability of undetected error of 1/23m which is zero for any practical
scenario. However, the assumption of having a random vector of (Q e1x , x
e0x , Q e, ye)
might not be true for faults injected by a sophisticated attacker (e.g., SCF attack
for applications using elliptic curves defined over Fp ).
ok = 1 if Q0 ] P = Q1 ,
CC2(Q0 , Q1 , P ) =
0 otherwise.
Since Q0 , Q1 , P ∈ E(F2m ), for each pair (Q0 , P ) there is only one value of Q1 for
e0 , Q
which CC2(Q0 , Q1 , P ) = 1. Then, for an arbitrary vector (Q e1 , Pe) the value for
cR is
m 1
cR ≈ = . (5.23)
6m 6
Even when this approach theoretically has an improved error detection capabil-
ity, it has an important drawback in terms of performance as illustrated in Table
5.1. This table compares the operation counts for ECSM among the Montgomery
ladder algorithms proposed by López and Dahab [58] (Algorithms 2.6 and 2.7),
the Montgomery ladder with PV at the end, the basic Montgomery ladder (Algo-
rithm 2.5), and the basic Montgomery ladder (Algorithm 2.5) with PV(Q0 ) and
CC2(Q0 , Q1 , P ). This table shows that the basic Montgomery ladder with PV(Q0 )
and CC2(Q0 , Q1 , P ) requires about double the number of finite field multiplications
in comparison with the Montgomery ladder with PV at the end for the algorithms
that use the affine system, i.e., 4tM vs. (2t + 4)M . For the case of algorithms
that use the projective coordinate system, the basic Montgomery ladder algorithm
with PV(Q0 ) and CC2(Q0 , Q1 , P ) requires about the triple more finite field mul-
tiplications than the Montgomery ladder with PV at the end, i.e., (18t − 6)M vs.
(6t + 6)M .
Table 5.1: Operation counts for computing the ECSM utilizing error detection at
the algorithm level for Montgomery’s ladder method
attack proposed by Blömer et al. [14] only applies to applications using curves over
Fp , and not for those using curves defined over F2m . Additionally, because of its
uniformity this method is resistant to attacks based on timing [52] and simple power
analysis [53].
For invalid-curve attacks, including those discussed in Chapter 3, verifying if
the output is in E(F2m ) is crucially important. For all the approaches presented
in this section that utilize PV and/or CC the output always relies on the original
elliptic curve. In DFA, the attacker needs to inject faults in a given location during
a number of runs during the ECSM algorithm and obtain erroneous results. How-
ever, as we have shown in this section the error detection coverage of the methods
presented is quite high. This makes DFA impractical. However, “a security sys-
5.2. Error detection in ECSM by double-and-add-always 141
tem is only as strong as its weakest link. It doesn’t matter how strong the other
parts are” [31]. For applications utilizing elliptic curves over F2m , from the attacker
point of view, PV (or CC1 or CC2) can be seen as a strong protection that does
not permit producing any faulty results as output. Then, the question is how this
type of protections can be bypassed. And, in such a case what protections should
be added against these strong adversaries. In Section 5.3 we consider the scenario
where the attacker can inject a fault in the main algorithm and then bypass an
error detection mechanism based on decisional tests (e.g., “if (PV(Q) = 1) then”).
only SE and DFA attacks but also the SCF attack. This is an interesting result,
since to the best of our knowledge the only countermeasures proposed against SCF
attack are to use a called combined curve [14], utilizing the Montgomery ladder
algorithm without the y-coordinates, and those that use scalar randomization [27].
For this section, let us consider elliptic curves defined over either F2m or Fp , i.e.,
Fq .
Compared to Algorithm 2.1, Algorithm 2.3 adds a dummy instruction in Step 2.3
whenever the scanned bit of the scalar is 0, i.e., if kl = 0 Step 2.3 becomes Q0 ← Q0
for any l ∈ [0, t − 1]. Since the value of Q1 is not utilized when kl = 0 and it is
overwritten at i = l + 1 with Q1 ← Q0 ] P , it can be target of the SE attack.
The idea is to modify this algorithm in such a way that even during dummy
instructions the alteration of the related registers could be detected. The resultant
method is presented as Algorithm 5.4. Here, Q0 follows the same sequence of oper-
ations as the original double and add method (Algorithm 2.1), i.e., each interaction
Q0 is doubled and if ki = 1 then it is added with P . On the other hand, Q1 will
change only during the dummy instructions, i.e., whenever ki = 0, Q1 will be added
to P . In this way at the end of the loop Q1 = H(k)P , where H(k) denotes the
Hamming weight of the binary complement of k, i.e., k = 2t − k − 1. Note that the
output Q0 does not depend on Q1 . However, for defending against an SE attack
we can check any alteration on Q1 utilizing the following CC function:
ok = 1 if Q1 = H(k)P,
CC3(k, P, Q1 ) =
0 otherwise.
5.2. Error detection in ECSM by double-and-add-always 143
Since H(k) ∈ [0, t − 1], in the worst case the scalar multiplication H(k)P takes
2dlog2 (t − 1)e point operations. For example, for an elliptic curve defined over F2163
and t = m, H(k)P takes at the most 16 point operations. If P is known a priori,
we can precompute jP for j ∈ [2, t − 1] and store those results in a table. Since
k and P are verified for integrity, and assuming that the computation of H(k)P is
error free, CC3 function will detect any alteration in Q1 resulting in an effective
defense against the SE attack.
1. Q0 ← O, Q1 ← O.
2. For i = t − 1 downto 0 do
2.1 Q0 ← 2Q0 .
2.2 Qki ← Qki ] P.
3.1 Return(Q0 );
equivalent to the basic Montgomery ladder ECSM, i.e., from Equation (5.23) cR ≈
1
6
. Even when this code rate might be quite good for some cases (e.g., random
errors), for curves defined over Fp Algorithm 5.4 is insecure against the SCF attack.
144 Algorithm-level Error Detection for ECSM
In a similar way as described by Blömer et al. [14], if the attacker can change the
sign of the register containing the intermediate value of Q0 at random values of
i, then the attacker can retrieve the scalar. The problem of Algorithm 5.4 is that
PV is not sufficient to avoid the SCF attack, i.e., faulty points never leave E(Fp ).
Additionally, the CC3 function does not verify alterations in Q0 , i.e., CC3 verifies
alterations in Q1 that permits to resist the SE attack. In the next subsection we
show that using another CC function it is possible to resist the SCF attack for the
right-to-left version of ECSM by double-and-add-always.
Q0 = kP,
Q1 = kP,
Q2 = 2t P,
Q0 ] Q1 ] P = kP ] (2t − k − 1)P ] P = 2t P = Q2 .
5.2. Error detection in ECSM by double-and-add-always 145
Hence, for verifying the coherency among these points we can use the following CC
function
ok = 1 if Q2 = Q0 ] Q1 ] P,
CC4(Q0 , Q1 , Q2 , P ) =
0 otherwise.
In addition to CC, Algorithm 5.5 includes PV and IC processes. This permits for
e0 , Q
an arbitrary vector (Q e1 , Q
e2 , Pe) to have a code rate cR ≈ 1 .
4
1. Q0 ← O, Q1 ← O, Q2 ← P.
2. For i = 0 to t − 1 do
3.1 Return(Q0 );
SE Attack: Let us assume that the attacker can inject one fault, single or
multiple-bit, in one variable during the ECSM. In fact, since the output Q0 does
not depend on intermediate values of Q1 , the register holding the latter point might
be exploited for mounting the SE attack. Suppose that the attacker injects a fault
e1,i . This error will propagate
in Q1 at a specific iteration i, i.e., Q1,i becomes Q
for subsequent values of the variable Q1 . At the end the main loop we will have
e1 6= kP . Here Algorithm 5.5 provides a two-level protection. The first is with
Q
PV of variable Q1 . For a random error in the corresponding coordinates of Q1,i ,
PV(Q1 ) might be sufficient to make SE impractical. However, suppose the attacker
e1,i ∈ E(Fq ) (e.g., SCF attack). For this case
can inject a fault in such a way that Q
e1 ) = 1 and CC4 provides a second protection. Here, CC4 will compute
PV(Q
e1 ] P = (k + 1)P + Q
Q0 ] Q e1 .
e1 = (2t − k − 1 + jn)P,
Q (5.24)
where j is an integer and n = ord(P ). For these exceptional cases PV and CC4
will fail detecting such errors. However in our opinion this is unlikely to happen in
e1 ∈ E(Fq )
practice for the following reasons. First, the only known attack where Q
is the SCF attack. Secondly, even if an SCF is injected on Q1,i , the attacker is
e1 that satisfies Equation
unlikely to have the precision for obtaining a final result Q
(5.24) since he/she does not know k. As a result, Algorithm 5.5 can be considered
as SE resistant.
5.2. Error detection in ECSM by double-and-add-always 147
SCF Attack: Let us assume that the attacker can inject one SCF into an in-
termediate point of Algorithm 5.5 at some random and unknown loop iteration
i ∈ [0, t − 1]. This fault model is similar to the assumed by Blömer et al. [14]. Since
the intermediate values of Q0 and Q2 are used for computing the final result of the
ECSM, these two points might be susceptible to an SCF attack. From Equation
(2.6), we can obtain the output of Algorithm 5.5 Q0 = kP as
where Q0,i is the intermediate value of Q0 during some loop iteration i ∈ [0, t −
Pi j
1], i.e., Q0,i = j=0 kj 2 P. Now, let us consider separately an SCF attack on
• SCF attack targeting Q0 in Step 2.1. Assume that the attacker mounts an
e0,i = −Q0,i .
SCF attack on Q0 at some loop iteration i ∈ [0, t − 1], such that Q
From Equation (5.25) we have
t−1
X t−1
X i
X
e0 =
Q j
kj 2 P − Q0,i = j
kj 2 P − kj 2j P .
j=i+1 j=i+1 j=0
k. This can be repeated for different and ascending values of i utilizing the
known bits of k until the retrieval of the complete scalar. However, CC4 does
e0 =
not permit to release any Q 6 Q0 since
i
X
e
Q0 ] Q1 ] P = −2 kj 2j P ] 2t P,
j=0
Pi
which does not match with Q2 = 2t P for j=0 kj 6= 0. Hence, CC4 protects
Algorithm 5.5 for the SCF attack in Q0,i .
• SCF attack targeting Q2 in Step 2.2. For this case the SCF attack is on Q2 at
e2,i = −Q2,i . Since Q2,i = 2i+1 P,
some loop iteration i ∈ [0, t − 1], such that Q
e2,i = −2i+1 P, and at the end of the loop we have
Q
e2 = −2t P .
Q
t−1
X i
X t−1
X
e0 = Q0,i −
Q kj 2j P = kj 2j P − kj 2j P ,
j=i+1 j=0 j=i+1
i
X
e
Q0 = 2 kj 2j P − kP .
j=0
e1 is
Similarly, it can be shown that the final value of Q
i
X
e
Q1 = 2 k j 2j P − kP .
j=0
5.2. Error detection in ECSM by double-and-add-always 149
i
X i
X
e e j
Q0 ] Q1 ] P = 2 kj 2 P ] k j 2j P − kP − kP,
j=0 j=0
i
X
= 2 2j P ] 2P − 2t P,
j=0
= 2i+2 P − 2t P,
2i+2 P = O.
The latter cannot be true if the order of P is prime. As a result, CC4 will
detect any SCF attack in Q2,i .
Under the same fault model as the utilized by Blömer et al. [14], Algorithm 5.5 is
SCF attack resistant. This is an interesting result because this algorithm does not
use a combined curve [14] or randomization as RC and PC schemes presented in
Chapter 4. This protection is based on CC among the involved variables and it
resists both the SE and the SCF attacks.
Table 5.2: Operation counts for the double-and-add-always ECSM method for
curves defined over F2m
the cost in terms of finite field operations for the double-and-add-always algorithms
utilizing elliptic curves defined over F2m . The first four rows present the cost of
Algorithms 2.3 and 2.4 which do not include any error-detecting processes. In
contrast, the last four rows shows their counterparts that includes PV and CC (Al-
gorithms 5.4 and 5.5). Table 5.3 shows estimates of these costs for a specific value
of t = m = 163. In this table we can notice an overhead in terms of finite field oper-
ations for the algorithms that include PV and CC for error detection in comparison
with the algorithms without error detection. For affine coordinates, these overheads
are approximately 4.91%I + 5.06%M + 5.32%S and 0.61%I + 0.92%M + 1.23%S for
Algorithms 5.4 and 5.5, respectively. On the other hand, for the projective coordi-
nates case, the overheads are of about 4.95%M +5.08%S and 1.00%M +1.07%S for
Algorithms 5.4 and 5.5, respectively. Additionally, we can notice a speed advantage
5.2. Error detection in ECSM by double-and-add-always 151
2.4. 4 Using Algorithm 5.4 to obtain kP and Algorithm 2.3 for H(k)P, assuming H(k) =
Table 5.3: Number of finite field operations for the double-and-add-always ECSM
method for t = m = 163 for curves over F2163
of using the left-to-right versions over a their right-to-left counterparts when using
projective coordinates, e.g., 1I + 1959M + 1142S vs. 1I + 2611M + 1305S for Algo-
rithms 2.3 and 2.4, respectively; and 1I + 2056M + 1200S vs. 1I + 2637M + 1319S
for Algorithms 5.4 and 5.5, respectively. This is mainly for the use of mixed coor-
dinates.
Similarly, Table 5.4 shows the count of finite field operations for the double-and-
add-always algorithms utilizing elliptic curves defined over Fp . In addition, Table
5.5 gives estimates of these costs for a specific value of t, i.e., t = 192. For the
case of affine coordinates, in this table we can notice an overhead of approximately
4.17%I +4.30%M +4.51%S and 0.52%I +0.78%M +1.04%S for Algorithms 5.4 and
5.5, respectively. For the projective coordinates case, the overheads are of about
152 Algorithm-level Error Detection for ECSM
4.20%M + 4.31%S and 0.85%M + 0.91%S for Algorithms 5.4 and 5.5, respectively.
Additionally, we have shown that the right-to-left version of the double-and-add-
always ECSM can be equipped with PV and CC in order to resist the SCF attack.
The overhead if the ECSM is computed in affine coordinates is of about 0.53%1 .
For the case of Algorithm 5.5 utilizing projective coordinates, let us compare it
with the left-to-right version without error detection. The main reason is due to
the penalty for not using mixed coordinates. With this consideration, the overhead
of Algorithm 5.5 utilizing projective coordinates is about 27.4%1 . This contrasts
with the 30 − 40% reported by Blömer et al. [14] or about the 100% for RC or PC
schemes.
Table 5.4: Operation counts for the double-and-add-always ECSM method for
curves defined over Fp
1
We assume that I = 80M and S = 0.85M as [40].
5.3. Double-fault attack resistant ECSM 153
2.4. 4 Using Algorithm 5.4 to obtain kP and Algorithm 2.3 for H(k)P, assuming H(k) =
Table 5.5: Number of finite field operations for the double-and-add-always ECSM
method for t = 192 for curves over Fp192
Fault Model. The attacker can mount two attacks during one run of the ECSM
algorithm. The first to the main algorithm in order to corrupt operations that
depend on sensitive information (i.e., scalar k). The second to any decisional test
used even on those with error detection procedures (e.g., “if (PV(Q) = 1) then”).
Let us base our countermeasure on Montgomery’s ladder ECSM for non-super-
singular elliptic curves over the binary finite field proposed by López and Dahab
[58] (Algorithm 2.6). However, these concepts can generally be extended to the
corresponding Montgomery’s ladder ECSM for prime fields presented by Brier and
Joye [17]. Since SCF attack does not apply to Montgomery’s ladder ECSM methods
that do not use the y-coordinate for computing the ECSM, under this double-fault
model the most dangerous threat is the DFA attack. An effective defensive measure
against this attack can be achieved adding randomness to the computation in such
a way that the attacker cannot obtain useful information from a faulty output. In
fact, if a projective coordinate system is utilized one can simply apply base point
randomization [23] for making DFA impractical. However, for the affine system we
can use the concept of point “blinding” presented by Coron [23]. The idea is to add
a random point R to the initial values of Q0 and Q1 . Let r be defined as r = logP R,
i.e., R = rP . Using Equation (2.7), the original Montgomery ladder algorithm sets
Lt−1 = 1 and Mt−1 = 2 (i.e., Q0 = P and Q1 = 2P ), and we use Equation (2.8)
repeatedly for i from t − 2 to 0 to obtain Q0 = kP and Q1 = (k + 1)P . Now, we
can set Lt−1 = 1 + r and Mt−1 = 2 + r (i.e., Q0 = P ] R and Q1 = 2P ] R). Note
that since R is added initially to both Q0 and Q1 , at each iteration during the loop
5.3. Double-fault attack resistant ECSM 155
their corresponding part dependent on R is doubled. Thus, after the main loop the
following is obtained:
Q0 = kP ] 2t−1 R,
Q1 = (k + 1)P ] 2t−1 R.
O P 2P
3P
Q1 = (k + 1)P 2 t −1 R
Q0 = kP 2t −1 R (r + 1)P = P R
(r + 2)P = 2P R
kP = Q0 — 2t —1 R
2. Q2 ← −R.
3. T ← (y + Ry )/(x + Rx ).
4. Q0x ← T 2 + T + x + Rx + a.
5. T ← x/(x + Q0x ).
6. Q1x ← T 2 + T + Rx
7. For i = t − 2 downto 0 do
9. Q0 ← Q0 ] Q2 .
10.1 Return(Q0 );
5.4 Conclusion
In this chapter we have presented error-detecting schemes at the algorithm level
for ECSM. We have used PV in conjunction with CC functions that are algorithm
dependent. In the Montgomery ladder algorithm, we have considered the use of PV
of the output and the concept of CC. In fact, we have shown that if we verify the
integrity of the input point P (i.e., IC(P )), these two approaches are equivalent with
respect to their error detection coverage. In this way, we can use PV of the output
along with IC of P to have an improved error detection coverage with negligible
cost. The double-and-add-always ECSM method presented by Coron [23] have
been shown to be susceptible to the SE attack [91] [92]. In this chapter, we have
presented the left-to-right and the right-to-left methods which provide resistance
to the SE attack by utilizing CC among selected variables. Additionally, we have
shown that for the right-to-left ECSM by double-and-add-always it is possible to
resist the SCF attack presented by Blömer et al. [14]. This result is interesting since
this algorithm do not use the combined curve [14], or the use randomization as RC
and PC schemes. For applications utilizing affine coordinates it has a negligible cost
158 Algorithm-level Error Detection for ECSM
in terms of additional finite field operations needed (i.e., less that 1% for t ≥ 192).
And even, for the case of projective coordinates, we have noted that the cost is of
about 27.4% for t = 192. This value is less than the 30 − 40% reported by Blömer
et al. [14], or about 100% for schemes like RC or PC. Finally, we have considered
the case where an attacker could mount a double-fault attack. Even with this
strong fault model, it is possible to avoid fault attacks by utilizing some type of
randomization. In this case we have utilized the concept of point blinding on the
Montgomery ladder ECSM method.
Chapter 6
6.1 Conclusions
In this thesis, several aspects of fault-based attacks and countermeasures for ECC
have been studied. We have presented a new fault-based attack against a popular
algorithm, namely the Montgomery ladder ECSM for curves over the binary field.
In addition, we have presented error-detecting schemes for ECSM at both module
and algorithm levels.
In Chapter 3, we have introduced an invalid-curve attack on the Montgomery
ladder ECSM algorithm proposed by López and Dahab. This attack is based on
the fact that curve parameter a is not utilized for point operation formulas in this
ECSM method. From an original group E(F2m ), we have assumed that there exists
a weaker group with the same parameters except for a. We have shown that this
assumption is true for nine of the ten NIST-recommended elliptic curves over the
binary field. Under a specific fault model, we have presented two versions of this
159
160 Conclusions and Future Work
attack with their respective probabilities of success. This attack underlines the
importance of verifying the correctness of the ECSM operation.
Error detection is an essential task for protecting against fault-based attacks.
Applying some properties of elliptic curves, it is possible to detect errors during
the ECSM operation utilizing different techniques. The most obvious is to verify
if the output relies on the original elliptic curve. However, it has been shown that
in some cases this protection is not sufficient. Other techniques utilized to detect
errors during the ECSM computation have been presented in Chapters 4 and 5.
In Chapter 4, we have used the concepts of re-computation and parallel com-
putation to achieve error detection and fault tolerance on ECSM. By means of
this we have introduced encoding/decoding schemes suitable for ECSM which are
based on scalar and point randomization. The proposed structures permit us to
detect errors with high probability. Additionally they resist attacks such as the
SCF attack without modifying the curve parameters. This might be important for
crypto-processors that have fixed parameters (e.g., those recommended by NIST).
Also, utilizing a small ECSM prototype, we have presented experimental results for
the probability of undetected errors for the error-detecting structures presented.
We have also shown that, with two ECSM modules working in parallel, it is possi-
ble to have fault-tolerant schemes that will not only detect but also correct errors.
This contrasts with the three modules needed for the TMR based systems. Using
an FPGA as hardware target with an NIST-recommended elliptic curve over F2163 ,
we have shown that the savings in terms of area with respect to TMR based ECSM
is about 24.8% for DMR PV and 23.6% for PRC.
In Chapter 5, we have presented error detection at the algorithm level for ECSM.
We have utilized the concepts of PV, CC, and IC. The idea of CC has been carried
6.2. Future work 161
out by algorithm specific functions CC1-CC4. For the Montgomery ladder ECSM
algorithm for elliptic curves over the binary field, we have demonstrated that PV has
the same error detection coverage as CC1 if an IC of the input point is performed.
We have shown that using PV and IC it is possible to have improved error detection
without degrading the performance. This contrasts with the notable error detec-
tion coverage of the basic Montgomery ladder ECSM algorithm where the penalty
in performance might be unacceptable. Utilizing CC as error-detecting technique,
we have proposed two versions of the double-and-add-always ECSM method (Algo-
rithms 5.4 and 5.5). We have shown that both algorithms resist the SE attack. In
addition, we have proved that the right-to-left version (Algorithm 5.5) resists the
SCF attack. This is the first countermeasure against the SCF attack reported in
the literature that does not use a combined curve [14] or randomization as RC and
PC schemes presented in Chapter 4. For today’s applications, where t ≈ 192, we
have shown that our countermeasure has a overhead in terms of finite field opera-
tions of about 0.8% when using the affine coordinate system and about 27.4% for
applications utilizing the projective coordinate system. These values contrast with
the overhead utilizing a combined curve (i.e., 30 − 40% [14]) or re-computation or
parallel computation (i.e., about 100% for RC and PC schemes).
163
164 Average Number of EC Discrete Logarithms for Algorithm 3.3
Table A0 Table A1
1 2
3 4
c0 4 5
entries M M
2c0 - 3 2c0 - 2 c1
2c0 - 1 2c0 entries
2c0 + 1
2c0 + 2
M
c0 + c1
Figure A.1: Values of the random variable w according to entries of Tables A0 and
A1 considering c0 < c1
(w2 − 1)/(4c2 ) w odd, and 1 ≤ w ≤ 2c0 − 1,
0
F (w) =
w2 /(4c2 ) w even, and 2 ≤ w ≤ 2c0 .
0
165
(w − 1)/(2c2 ) w odd, and 1 ≤ w ≤ 2c0 − 1,
0
f (w) =
w/(2c2 ) w even, and 2 ≤ w ≤ 2c0 .
0
2c0
X
µ= wf (w).
w=1
w−1 w
After performing a change of variables (i.e., y = 2
and y = 2
for the odd and
even number cases, respectively) µ can be re-written as
c−1
X c0
X
y(2y + 1) 2y 2 8c20 + 3c0 + 1 4
µ= + = ≈ c0 (for c0 >> 1). (A.1)
y=0
c20 y=1
c20 6c0 3
Case: c0 < c1 . Similar to the previous case we can write F (w) for some given
values as follows
We express F (w) as
(x2 − 1)/(4c0 c1 ) x odd, and 1 ≤ x ≤ 2c0 − 1,
F (x) = x2 /(4c0 c1 ) x even, and 2 ≤ x ≤ 2c0 ,
(x − c )/c
0 1 2c0 + 1 ≤ x ≤ c0 + c1.
(x − 1)/(2c0 c1 ) x odd, and 1 ≤ x ≤ 2c0 − 1,
f (x) = x/(2c0 c1 ) x even, and 2 ≤ x ≤ 2c0 ,
1/c
1 2c0 + 1 ≤ x ≤ c0 + c1.
cX
0 +c1
µ= xf (x)
x=1
x−1 x
After performing a change of variables (i.e., y = 2
and y = 2
for the odd and
even number cases, respectively, where 1 ≤ x ≤ 2c0 ) µ can be expressed as
167
cX
0 −1 c0 cX
0 +c1
y(2y + 1) X 2y 2 x
µ= + + ,
y=0
c0 c1 cc
y=1 0 1
c
x=2c +1 1 0
Case: c0 > c1 . This case is vary similar to the previous case. In fact, from
Equation (A.2) we can perform the changes of variables c0 ← c1 and c1 ← c0 to
obtain the mean for this case:
Suppose we compute the ECSM using re-computation (Figure 2.2) and point nega-
tion as encoding and decoding process using the affine coordinate system. Let the
elliptic curve utilized be E(a, b) and be defined over F2m . Assume that this scheme
is implemented in hardware with one fault in the ECSM module. For simplicity
assume that we have a stuck-at-1 fault, and it is located at the gate that gets bit i
of the y-coordinate ypi as shown in Figure B.1. Let the elliptic curve points that are
input to the ECSM module at time t0 and t1 be P = (xp , yp ) and −P = (xp , yp ),
respectively, where xp = xp and yp = xp + yp . The input y-coordinates passing
169
170 Example of Undetected Errors with Point Negation
xp0
xp 1
xp
xp m−2
xpm −1
P = ( xp , y p ) Q
ECSM
yp0 module
yp 1
yp yp i
stuck-at-1
gate
yp m −2
ypm −1
Figure B.1: Stuck-at-1 fault at the gate that gets bit i of the y-coordinate
where the vector (00 . . . 010...00) has 0s in all its bits with the exception of bit
position i, and the operator | is a bit-wise OR.
If ypi = 1 and xpi + ypi = 1, this fault does not produce any error in either of the
two ECSM runs (see Table B.1). This is because the i-th bit of both y-coordinates
is 1 and the fault does not change this bit’s value. Any other combination of ypi and
xpi + ypi gives a different value as shown in Table B.1. For the second (respectively
third) case we can see that an original input point, P (respectively −P ), is present
at time t0 (respectively t1 ). For these two cases the results will be different. This
is because it is necessary to have complementary points at both input modules in
171
order to have the two outputs to match. For the fourth case, the input points are
different from P and −P , say P1 and P2 . The resultant finite field element pairs
are the negative of each other, say P1 = −P2 . Furthermore, they are not part of
the original elliptic curve E(a, b), but they are part of another elliptic curve, say
e eb). Because the point addition and doubling might not use the parameter b,
E(a,
e 2m ). When the computations are
the two ECSM runs will be performed over E(F
complete, the two results will be the same but incorrect. In this particular case the
system will fail by accepting an erroneous result. Similar analysis and results can
be obtained for a stuck-at-0 fault at the same bit position of the y-coordinate (i.e.,
ypi ), or if the fault is located at the gate that gets the i-th bit of the x-coordinate
(i.e., xpi ).
Bibliography
[1] “The elliptic curve cryptosystem for smart cards,” Certicom Corp., 1998.
[Online]. Available: [Link] 13
[3] R. Anderson and M. Kuhn, “Low cost attacks on tamper resistant devices,” in
Security Protocols, 5th International Workshop Proceedings, ser. LNCS 1361.
Springer-Verlag, 1997, pp. 125–136. 3
[5] ANSI X9.62, Public Key Cryptography for the Financial Services Industry:
The Elliptic Curve Digital Signature Algorithm (ECDSA). American National
Standards Institute, 1999. 1
173
174 BIBLIOGRAPHY
ser. LNCS 2567. Springer-Verlag, 2003, pp. 211–223. 35, 36, 37, 38, 41, 83,
88, 136
[7] C. Aumüller, P. Bier, W. Fischer, P. Hofreiter, and J.-P. Seifert, “Fault at-
tacks on RSA with CRT: Concrete results and practical countermeasures,” in
CHES 2002: Cryptographic Hardware and Embedded Systems, ser. LNCS 2523.
Springer-Verlag, 2002, pp. 260–275. 45
[9] I. Biehl, B. Meyer, and V. Müller, “Differential fault attacks on elliptic curve
cryptosystems,” in CRYPTO 2000: Advances in Cryptology, ser. LNCS 1880.
Springer-Verlag, 2000, pp. 131–146. 4, 33, 35, 36, 37, 38, 41, 49, 82, 83, 88,
97, 98, 135, 155
[10] E. Biham and A. Shamir, “Differential fault analysis of secret key cryptosys-
tems,” in CRYPTO 1997: Advances in Cryptology, ser. LNCS 1294. Springer-
Verlag, 1997, pp. 513–525. 3
[13] J. Blömer, M. Otto, and J.-P. Seifert, “A new CRT-RSA algorithm secure
against Bellcore attacks,” in ACM Conference on Computer and Communica-
tions Security. ACM, 2003, pp. 311–320. 45
[14] ——, “Sign change attacks on elliptic curve cryptosystems,” in FDTC 2005:
Fault Diagnosis and Tolerance in Cryptography, ser. LNCS 4236. Springer-
Verlag, 2006, pp. 36–42. 4, 37, 38, 41, 97, 98, 136, 140, 142, 144, 147, 149, 152,
157, 158, 161
[16] A. Boscher, R. Naciri, and E. Prouff, “CRT RSA algorithm protected against
fault attacks,” in WISTP 2007: Information Security Theory and Practices.
Smart Cards, Mobile and Ubiquitous Computing Systems, International Work-
shop, ser. LNCS 4462. Springer-Verlag, 2007, pp. 229–243. 45, 46, 47, 141,
144
[17] E. Brier and M. Joye, “Weierstraß elliptic curves and side-channel attacks,”
in PKC 2002: Public Key Cryptography, ser. LNCS 2274. Springer-Verlag,
2002, pp. 335–345. 25, 38, 154
[18] C.-L. Chen, “Formulas for the solutions of quadratic equations over GF (2m ),”
IEEE Transactions on Information Theory, vol. 28, no. 5, pp. 792–794, 1982.
14
addition in formal groups and new primality and factorization tests,” Advances
in Applied Mathematics, vol. 7, no. 4, pp. 385–434, 1986. 21
[20] M. Ciet and M. Joye, “Practical fault countermeasures for Chinese remainder-
ing based RSA,” in FDTC 2005: Workshop on Fault Diagnosis and Tolerance
in Cryptography, 2005, pp. 124–132. 45
[21] ——, “Elliptic curve cryptosystems in the presence of permanent and transient
faults,” Designs, Codes and Cryptography, vol. 36, no. 1, pp. 33–43, 2005. 34,
36, 37, 38, 41, 49, 83, 136
[22] H. Cohen and G. Frey, Eds., Handbook of elliptic and hyperelliptic curve cryp-
tography. CRC Press, 2005. 9, 21, 23, 42
[23] J.-S. Coron, “Resistance against differential power analysis for elliptic curve
cryptosystems,” in CHES 1999: Cryptographic Hardware and Embedded Sys-
tems, ser. LNCS 1717. Springer-Verlag, 1999, pp. 292–302. 5, 24, 90, 92, 141,
154, 157
[24] R. E. Crandall, “Method and apparatus for public key exchange in a crypto-
graphic system,” United States Patent 5,159,632, October 1992. 25
[27] ——, “Improved error-detection and fault-tolerance in ECSM using input ran-
domization,” CACR Technical Reports CACR 2006-41, University of Water-
loo, Tech. Rep., 2006, a revised version to appear in IEEE Transactions on
Dependable and Secure Computing. 41, 86, 136, 142
[29] N. Ebeid and M. A. Hasan, “On randomizing private keys to counteract DPA
attacks,” in SAC 2003: Selected Areas in Cryptography, ser. LNCS 3006.
Springer-Verlag, 2003, pp. 58–722. 92
[32] FIPS 186-2 Digital Signature Standard (DSS), Federal Information Processing
Standards Publication 186-2. National Institute for Standards and Technology,
2000. 13, 88
[33] FIPS 186 Digital Signature Standard (DSS), Federal Information Processing
Standards Publication 186. National Institute for Standards and Technology,
1994. 1, 31
[37] C. Giraud, “An RSA implementation resistant to fault attacks and to simple
power analysis,” IEEE Transactions on Computers, vol. 55, no. 9, pp. 1116–
1120, 2006. 45, 47, 136, 153
[38] S. W. Golomb and G. Gong, Signal Design for Good Correlation: For Wire-
less Communication, Cryptography, and Radar. Cambridge University Press,
2005. 12
[42] IEEE P1363, IEEE Standard Specifications for Public Key Cryptography.
IEEE, 2000. 1, 19, 20, 29, 36
[44] T. Itoh and S. Tsujii, “Effective recursive algorithm for computing multiplica-
tive inverses in GF (2m ),” Electronics Letters, vol. 24, no. 6, pp. 334–335, 1988.
114
[46] M. Joye and C. Tymen, “Protections against differential analysis for elliptic
curve cryptography — an algebraic approach,” in CHES 2001: Cryptographic
Hardware and Embedded Systems, ser. LNCS 2162. Springer-Verlag, 2001, pp.
377–390. 90
[47] M. Joye and S.-M. Yen, “The Montgomery powering ladder,” in CHES 2002:
Cryptographic Hardware and Embedded Systems, ser. LNCS 2523. Springer-
Verlag, 2002, pp. 291–302. 25, 45
[48] C. H. Kim and J.-J. Quisquater, “Fault attacks for CRT based RSA: New at-
tacks, new results, and new countermeasures,” in WISTP 2007: Information
Security Theory and Practices. Smart Cards, Mobile and Ubiquitous Comput-
180 BIBLIOGRAPHY
[55] F. Lima, L. Carro, and R. A. da Luz Reis, “Designing fault tolerant systems
into SRAM-based FPGAs,” in DAC. ACM, 2003, pp. 650–655. 105
[56] S. Lin and D. J. Costello, Error Control Coding. Prentice-Hall, 2004. 123
BIBLIOGRAPHY 181
[57] J. López and R. Dahab, “Improved algorithms for elliptic curve arithmetic
in GF (2n ),” in SAC 1998: Selected Areas in Cryptography, ser. LNCS 1556.
Springer-Verlag, 1998, pp. 201–212. 21
[58] ——, “Fast multiplication on elliptic curves over GF (2m ) without precompu-
tation,” in CHES 1999: Cryptographic Hardware and Embedded Systems, ser.
LNCS 1717. Springer-Verlag, 1999, pp. 316–327. 6, 25, 27, 29, 49, 84, 123,
124, 139, 154
[60] J. Lutz and M. A. Hasan, “High performance FPGA based elliptic curve cryp-
tographic co-processor,” in ITCC 2004: Information Technology Coding and
Computing, vol. 2. IEEE Computer Society, 2004, pp. 486–492. 181
[61] M. Maurer, A. Menezes, and E. Teske, “Analysis of the GHS Weil descent
attack on the ECDLP over characteristic two finite fields of composite degree,”
LMS Journal of Computation and Mathematics, vol. 5, pp. 127–174, 2002. 31
[62] R. J. McEliece, Finite Fields for Computer Scientists and Engineers. Kluwer
Academic Publishers, 1987. 9
[63] N. Meloni, “Fast and secure elliptic curve scalar multiplication over prime fields
using special addition chains,” Cryptology ePrint Archive, Report 2006/216,
2006. 93
182 BIBLIOGRAPHY
[68] P. L. Montgomery, “Speeding the Pollard and elliptic curve methods of factor-
ization,” Mathematics of Computation, vol. 48, pp. 243–264, 1987. 25
[69] K. Okeya and K. Sakurai, “Efficient elliptic curve cryptosystems from a scalar
multiplication algorithm with recovery of the y-coordinate on a Montgomery-
form elliptic curve,” in CHES 2001: Cryptographic Hardware and Embedded
Systems, ser. LNCS 2162. Springer-Verlag, 2001, pp. 126–141. 25
[72] S. Pohlig and M. Hellman, “An improved algorithm for computing logarithms
over GF (p) and its cryptographic significance,” IEEE Transactions on Infor-
mation Theory, vol. 24, pp. 106–110, 1978. 31
[73] J. M. Pollard, “Monte Carlo methods for index computation (mod p ),” Math-
ematics of Computation, vol. 32, pp. 918–924, 1978. 31, 68
[75] H.-G. Rück, “A note on elliptic curves over finite fields,” Mathematics of Com-
putation, vol. 49, no. 179, pp. 301–304, 1987. 19
[76] T. Satoh and K. Araki, “Fermat quotients and the polynomial time discrete
log algorithm for anomalous elliptic curves,” Commentarii Mathematici Uni-
versitatis Sancti Pauli, vol. 47, pp. 81–92, 1998. 31
[79] R. Schoof, “Elliptic curves over finite fields and the computation of square
roots mod p,” Mathematics of Computation, vol. 44, no. 170, pp. 483–494,
1985. 58
184 BIBLIOGRAPHY
[81] A. Shamir, “Method and apparatus for protecting public key schemes from
timing and fault attacks,” United States Patent 5,991,415, November 1999. 45
[85] U.S. Department of Defense, The Reliability Design Handbook. The Reliability
Analysis Center, Grifiss Air Force Base, New York, 1976. 100
[87] J. von Neumann, Probabilistic Logics and the Synthesis of Reliable Organisms
from Unreliable Components. Princeton University Press, 1956. 100
BIBLIOGRAPHY 185
[90] R. B. Wells, Applied Coding and Information Theory for Engineers. Prentice-
Hall, 1999. 123
[91] S.-M. Yen and M. Joye, “Checking before output may not be enough against
fault-based cryptanalysis,” IEEE Transactions on Computers, vol. 49, no. 9,
pp. 967–970, 2000. 4, 24, 39, 141, 157
[92] S.-M. Yen, S. Kim, S. Lim, and S. Moon, “A countermeasure against one phys-
ical cryptanalysis may benefit another attack,” in ICISC 2001: International
Conference Seoul on Information Security and Cryptology, ser. LNCS 2288.
Springer-Verlag, 2001, pp. 414–427. 4, 39, 141, 157
[93] ——, “RSA speedup with residue number system immune against hardware
fault cryptanalysis,” in ICISC 2001: International Conference on Information
Security and Cryptology, ser. LNCS 2288. Springer-Verlag, 2001, pp. 397–413.
185
[94] ——, “RSA speedup with residue number system immune against hardware
fault cryptanalysis,” IEEE Transactions on Computers, vol. 52, no. 4, pp.
461–472, 2003, an earlier version appears in [93]. 45, 122, 153
186 BIBLIOGRAPHY
The use of dummy instructions in the double-and-add-always method aims to mitigate certain side-channel attacks by making the number of operations constant during scalar multiplication. However, this technique introduces other vulnerabilities, such as susceptibility to simple power analysis (SPA) and differential fault analysis (DFA) attacks. By consistently altering registers during these dummy operations, adversaries may exploit potential discrepancies to compromise cryptographic systems, thus highlighting the trade-off between offsetting timing attacks and susceptibility to other forms of analysis .
A balanced execution path in the Montgomery ladder algorithm is significant for elliptic curve cryptography as it mitigates the risk of side-channel attacks, particularly those exploiting timing variations. By maintaining a uniform execution pattern regardless of the input data, the Montgomery ladder obscures potential leaks in timing or power consumption that could reveal sensitive information. This consistent operational behavior is crucial in protecting cryptographic implementations from attackers who might exploit observable discrepancies to deduce secret keys or operational states .
Point verification (PV) and coherency check (CC) techniques are integrated into ECSM algorithms to detect errors from natural faults or malicious faults injected by attackers. PV ensures that operations remain within the valid elliptic curve, preventing computation faults from yielding incorrect outputs. CC ensures internal consistency among variables during the ECSM operation, adding a second layer of fault detection that can verify the correctness of computations and mitigate attempts to bypass error detection through faults .
The concept of a Combined Curve involves using multiple curve parameterizations to obscure the mathematical structure, thus complicating an attacker's ability to reliably exploit elliptic curve vulnerabilities. This technique acts as a countermeasure by diversifying the operations across different potential curve equations, decreasing the chance that a fault can result in a single, predictable outcome. Thus, it increases the robustness of cryptographic implementations against fault-based attacks, especially those targeting specific curve assumptions .
Affine and projective coordinate systems are methods of representing points on an elliptic curve. Affine coordinates use two values (x, y) which are straightforward but can be computationally expensive during operations due to the required inversion calculations. Projective coordinates avoid division, instead using a third coordinate (X:Y:Z) to simplify operations on the curve. As a result, projective coordinates can lead to higher implementation overhead due to increased calculation requirements but enhance cryptographic efficiency as inversions are avoided. Consequently, choosing between these systems involves balancing overhead against computational speed and security needs .
Fault-based attack strategies can involve injecting a flip-fault into the x-coordinate of input points in an elliptic cryptographic system. This attack allows adversaries to potentially manipulate computations so that they incorrectly operate in a weaker elliptic curve group. If executed successfully, attackers can retrieve secret scalar values with some probability by inducing incorrect results, which raises significant security concerns for systems relying on elliptic curve cryptography, as it undermines the algorithm's reliability and correctness .
The Montgomery ladder algorithm is advantageous for ECSM in cryptographic applications due to its balanced timing and resistance to certain side-channel attacks, such as timing and simple power analysis. Its uniform execution path makes it harder for attackers to infer key information through side-channel measures. Furthermore, it supports fault tolerance by ensuring that operations can be carried out consistently in either of the elliptic curve groups, E(F2m) or bE(F2m), without relying on specific curve parameters susceptible to faults .
The success probability of fault-based attacks in elliptic curve cryptosystems is influenced by the prime factorization of # bE(F2m). Groups with multiple small prime factors allow adversaries to focus attacks on smaller prime order subgroups, maximizing the effectiveness of exhaustive search algorithms like the Silver-Pohlig-Hellman and achieving higher success probabilities. With fewer large prime factors, the subgroup size aligns more strongly with predictable cyclic orders, where adversaries can exploit easier computation of discrete logarithms .
Input randomization enhances the robustness of elliptic curve scalar multiplication by minimizing the predictability of input data and, thus, obstructing systematic fault induction attacks. By varying input variables used in cryptographic operations, attackers cannot easily manipulate inputs to achieve desired faulty outputs. This randomization disrupts the reliability of exploiting deterministic operations and forces attackers to address a more complex, less predictable problem space, thereby enhancing the cryptographic system's fault tolerance .
The Silver-Pohlig-Hellman algorithm is utilized to compute the elliptic curve discrete logarithm in cyclic groups derived from weaker elliptic curves, such as those defined over F2m with parameters that make them vulnerable. In these situations, if an attacker has sufficient computational resources, they can exploit the reduced security to extract secret keys by focusing on these weaker curve parameterizations .