The Language’s Impact on the Enigma Machine
Daniel Matyas Perendi and Prosanta Gope, Senior Member, IEEE
Abstract—The infamous Enigma machine was believed to be unbreakable before 1932 simply because of its variable settings and incredible complexity. However, codebreakers realised that there was a known pattern in the German messages, which significantly reduced the number of possible settings and made the codebreaker’s job easier. Modern cryptanalysis techniques provide a far more powerful way to break the Enigma cipher using letter frequencies and a concept called the index of coincidence. This technique, however, only works well for the English language (using the characters of the English alphabet), but what if we encountered an Enigma machine designed for the Hungarian language, where the alphabet consists of more than 26 characters? Experiments on the Enigma cipher with different languages have not been done to date, hence in this article we show the language’s impact on both the machine and the cipher. Not only Hungarian but, in fact, any language using more characters than English could have a significant effect on the Enigma machine and its complexity if such a machine existed. By a broad comparative analysis, it is proven that the size of the alphabet has a significant impact on the complexity and therefore on the cryptanalysis.
Index Terms—Hungarian; Enigma; cryptanalysis; cipher; complexity
Contact: Daniel Matyas Perendi, Email: perendi.matyas98@[Link]
I. INTRODUCTION
This investigation revolves around the famous Enigma machine, which was used by the German military to encrypt and protect commercial, diplomatic and military communication during World War II. Especially during the war, the need for secrecy was greater than ever, hence the Enigma was a lifeline for the German army as it provided highly complicated and secure encryption. This machine contained a series of interchangeable rotors, which rotated every time a key was pressed to keep the cipher changing continuously. This was combined with a plugboard on the front of the machine, where pairs of letters were transposed; these two systems combined offered approximately 107,458,687,327,250,619,360,000 (107 sextillion) possible settings to choose from, which back then seemed unbreakable. This “unbreakable” machine was broken by the gigantic effort of the British in Bletchley Park with Alan Turing’s involvement in January 1940. The Enigma machine relied on one default alphabet, which raises the question: what if this machine existed for different alphabets of different languages? The aim of this project is to experiment with different languages and alphabets to observe the effect they have on the machine and the cipher.
A. General history of the Enigma
Fig. 1: The Enigma machine (Rijmenants 2004)
As Figure 1 shows, the Enigma machine looked very much like an old typewriter with some extra elements. This amazing piece of engineering was mainly used in World War II by the German forces to securely transmit sensitive information across the battlefield. The first cipher machine, Enigma A, appeared in 1923. Its successor, Enigma B, was introduced soon after Enigma A; however, these machines were extremely heavy (around 50 kg) and quite big in size, so they were not suitable for military usage where portability was key. A few years later Enigma C was equipped with the reflector, and the lampboard replaced the “printer” part. This solution was a lot more compact and hence more suitable for military use. Enigma D was introduced in 1927 and appeared in several different versions with different rotor configurations across Europe. This machine had three rotors, each of which could be set in one of 26 positions (one for each character of the alphabet). In 1932, the Wehrmacht Enigma (also known as Enigma I) replaced the commercial Enigma D and extended it with the plugboard, which was attached to the front of the machine. The plugboard created a huge number of extra setting possibilities, hence this component became the target of cryptographic attacks. This version was introduced on a larger scale in the Heer (Army) and the Luftwaffe (Air Force). The German Navy also adopted this machine in their communication protocols; however, they extended the set of rotors to 8 and named it “M3”. Although they thought this machine was unbreakable, an
admiral called Karl Dönitz insisted on adding an extra rotor for greater security (Figure 2). This version was named M4 (Rijmenants 2004).
Fig. 2: The Enigma M4 with open cover (Rijmenants 2004)
II. RELATED WORKS
This section introduces some of the most important Enigma-breaking approaches, which have evolved significantly over the years. We go all the way back to the very first successful attempt, followed by the famous Bletchley Park effort, and finish with the most recent technique.
A. Polish mathematicians
According to Gaj & Orłowski (2003) and Rijmenants (2004), the very first successful effort was made by a group of Polish mathematicians (Marian Rejewski, Jerzy Różycki, Henryk Zygalski), who focused on the beginnings of the German messages, because back in 1932 they all started with a 6-letter sequence, which essentially contained the key to the system. This 6-letter sequence was constructed from two successive encryptions of the same three letters. These letters were unique for each message; however, they were all encrypted with the actual daily settings of the Enigma machine. This led to significant information leakage, as the letters in positions 1 and 4, 2 and 5, and 3 and 6 were the same before the encryption took place. Using this information as a basis, they could recover the missing permutations (Borowska & Rzeszutko 2014). This group also invented an electro-mechanical machine (the Bomba), which sped up the breaking process. This method only worked until 1939, when the cipher design changed and the 6-letter sequence was eliminated.
B. Bletchley Park
“Codebreakers: The Inside Story of Bletchley Park” presents the inside story of Bletchley Park and its importance in winning World War II. Numerous different signals (German, Japanese and Italian) were successfully intercepted and broken here, which provided an enormous amount of help to the Allied commanders on different fronts of the battlefield. Many of these messages were encrypted with different “versions” of the Enigma machine, which made the code-breakers’ job a lot harder. Bletchley Park was divided into smaller teams (Huts), where each team focused on a different area of the deciphering procedure. Hut 6 carried out the breaking of the three-wheel Enigma and Hut 8 dealt with the naval four-rotor Enigma, while Hut 3 and Hut 4 respectively were in charge of producing and transmitting valuable intelligence, sourced from the deciphered messages produced by Huts 6 and 8, to the competent authorities. Various great minds contributed to Bletchley Park’s success: Alan Turing, Hugh Alexander, Gordon Welchman, Dilly Knox and many others. Different machines required different code-breaking techniques (Section II-C): Bombe machines, and the use of cribs and menus.
C. The Bombe
Fig. 3: The Bombe machine (Gladwin 1997, p. 211)
The previously mentioned Polish Bomba machine lost its usefulness due to German procedural changes (1939-1940), which prompted Alan Turing to design his own version of this great code-breaking machine (Copeland 2020). Turing started working on his machine in 1939 with mixed success, due to the complexity of the Enigma machine, and finished it in 1940 with an important change, the diagonal board, proposed by Gordon Welchman. The basic principle of this electro-mechanical instrument is to discover the daily Enigma key by testing all the possible settings; however, the time required to exhaust all possibilities at first exceeded the 24 hours available. In order to break the code within the 24-hour window, some changes had to be made. Cribs and menus combined with the Bombe resulted in a short enough running time for the search. Cribs are known pieces of a message that were repeatedly used by the Germans during the war, such as “Wettervorhersage”, meaning weather forecast, and “Keine besonderen Ereignisse”, meaning nothing special to report. Since no letter can be encrypted to itself, one could align the crib with the ciphertext in the way shown in Figure 4. After a valid alignment, a menu could be created from the letter connections (Figure 5), which was then plugged into the back of the Bombe machine as an electric circuit.
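The crib alignment rule described above can be sketched in a few lines of Python. This is only an illustration of the "no letter encrypts to itself" test, not a model of the Bombe; the function name `valid_crib_offsets` and the sample ciphertext are hypothetical and do not come from the cited sources.

```python
def valid_crib_offsets(ciphertext: str, crib: str) -> list[int]:
    """Return every offset at which the crib may be aligned with the ciphertext."""
    ciphertext = ciphertext.upper()
    crib = crib.upper()
    offsets = []
    for start in range(len(ciphertext) - len(crib) + 1):
        window = ciphertext[start:start + len(crib)]
        # Enigma never maps a letter to itself, so a position where the crib
        # letter equals the ciphertext letter rules this alignment out.
        if all(c != p for c, p in zip(window, crib)):
            offsets.append(start)
    return offsets


if __name__ == "__main__":
    # Made-up ciphertext, used purely to exercise the function.
    sample_ciphertext = "QFZWRWIVTYRESXBFOGKUHQBAISEZ"
    print(valid_crib_offsets(sample_ciphertext, "WETTERVORHERSAGE"))
```

Each surviving offset is a candidate alignment from which a menu could then be built; longer cribs eliminate more offsets because every extra letter is another chance for a clash.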
Fig. 4: A crib
Fig. 5: A menu (Gladwin 1997, p. 211)
The Bombe can be thought of as 36 interconnected Enigma machines, where one drum represented one rotor and every drum rotated synchronously through all the $26^3$ possibilities (Figure 6). The front of the machine was responsible for working through all the 17,576 different rotor positions and stopping upon finding the correct settings (rotor order, rotor positions and plugboard connections) (Carter 2010). This “Stop” was the moment of relief, because the code-breakers knew that at that moment they had found the daily key for the Enigma machine, so they could decipher all of that day’s intercepted messages.
Fig. 6: The drums (CryptoMuseum 2009)
D. Modern approaches
The most recent effort (Ostwald & Weierud 2017) makes use of Friedman’s idea of the index of coincidence, which in simple terms is a measure of letter distributions in the candidate text (Friedman 1922). The index of coincidence (also referred to as IoC or IC) is the probability of selecting two matching letters at random from a given text. This is useful because letter distributions in natural languages are not even, hence the basic idea is to match the decryption attempt’s IC to the language’s IC. The index of coincidence can be calculated by using the following equation (Index of Coincidence n.d.):
$$IC = \frac{1}{N(N-1)} \sum_{i=1}^{n} F_i (F_i - 1) \tag{1}$$
where
$N$ is the length of the text
$n$ is the length of the alphabet
$F_i$ is the number of occurrences of the $i$th letter of the alphabet
If $N$ is large enough, i.e. $N$ approaches infinity, we can calculate the expected IC for the language itself:
$$IC_{language} \approx \sum_{i=1}^{n} p_i^2 \tag{2}$$
where $p_i = F_i / N$. Using IC calculations as “validation” methods with hill climbing (“Hillclimbing the Enigma Machine”, Sullivan & Weierud 2006) simplifies the brute-force technique significantly. Note that for a new wire, the IC is calculated for every possible rotor setting. This method works well for the first few correct wires, but unfortunately it fails to find the rest, as described in “Modern breaking of Enigma ciphertexts” (Ostwald & Weierud 2017, pp. 403-409). Bigram and trigram scores have proven useful in finding the remaining wires. Impressive as it is, this method is not entirely robust either, as its efficiency depends on the length of the text (this number is used to calculate the IC score): the shorter the text, the more difficult and error-prone this approach becomes.
III. MOTIVATION
As discussed in Section II-D, there exist some very efficient modern code-breaking techniques for breaking messages written in the 26-letter English alphabet, but how would these techniques be affected if there existed an Enigma machine suited to another language with a larger letter set? Such experiments have not been done to date, therefore it is the perfect opportunity to investigate the impact of a different alphabet on the complexity and the structure of the Enigma. The authors have a strong Hungarian background, hence it is worth starting the analysis with the Hungarian language, which could give a strong starting point for further, more general analysis. The main objective is to observe the alphabet’s influence on the Enigma cipher. All the techniques that have been used previously in the breaking process heavily rely on the complexity of this brilliant machine, hence observing the patterns in the increasing complexity would prove the cipher’s greater security. In order to start this process, we have to go back to the fundamental complexity calculations and convert them into a parametric format, which will apply to any language and its alphabet.
IV. EVALUATION PROCESS
The above-mentioned variants of the Enigma machine have been broken by either mathematicians or cryptanalysts thanks to their great efforts. However, all of their solutions are built upon the base complexity of the original machine. Nowadays people use numerous languages all over the world, which fuelled the idea of tailoring the Enigma machine to different languages. The base case of this experiment is the Wehrmacht machine, which will be used as the basis of comparisons. It is suspected that the size of the letter set (alphabet) is directly proportional to the complexity, therefore this is the main hypothesis that requires further evidence. The Hungarian language will be tested first, followed by a general solution that can be applied to any language that uses a different alphabet.
V. ANALYSIS AND RESULT
A. The Hungarian language
Fig. 7: The Hungarian 44-letter alphabet
The Hungarian language uses a unique alphabet consisting of 44 letters (Figure 7); furthermore, like any other language, it has its own characteristics (special letter connections, words). Using this information it is possible to construct an Enigma machine using 44-letter rotors, an extended plugboard and a fixed 22-pair reflector. The complexity of this machine (assuming it uses 3 out of 5 rotors) can now be calculated in the following way:
• Rotors: $(5 \times 4 \times 3) \times 44^3 \times 44^2 = 9{,}894{,}973{,}440$
• Plugboard using 18 wires: 39,282,388,067,747,317,859,706,965,625 (Figure 8)
• Total: 388,698,186,590,132,630,853,038,071,042,368,000,000
Fig. 8: Possible plugboard setting combinations for the 44-letter alphabet
The calculations suggest that this machine is approximately 3,617,187,183,818,893 (3.6 quadrillion) times more complex than the Wehrmacht machine. The difference is colossal. There is, however, a major issue with using this alphabet in the same way as the 26-letter English alphabet: this letter set contains double as well as triple-character letters (CS, DZ, DZS, GY, LY, NY, SZ, TY, ZS). This is a huge problem when it comes to the decryption process.
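The complexity figures quoted in this subsection for the 44-letter machine can be reproduced with a short Python sketch. It assumes, as the text does, that 3 of 5 rotors are used and that rotor positions and ring settings contribute the cube and the square of the alphabet size; the function names are illustrative and are not taken from the authors' implementation.

```python
from math import factorial


def rotor_settings(alphabet_size: int, rotors_available: int = 5) -> int:
    """Ordered choice of 3 rotors, times rotor positions and ring settings."""
    ordered_choices = (
        rotors_available * (rotors_available - 1) * (rotors_available - 2)
    )
    return ordered_choices * alphabet_size ** 3 * alphabet_size ** 2


def plugboard_settings(alphabet_size: int, wires: int) -> int:
    """Number of ways to connect `wires` disjoint letter pairs."""
    if 2 * wires > alphabet_size:
        raise ValueError("each wire needs two free letters")
    return factorial(alphabet_size) // (
        factorial(wires) * factorial(alphabet_size - 2 * wires) * 2 ** wires
    )


def total_settings(alphabet_size: int, wires: int) -> int:
    return rotor_settings(alphabet_size) * plugboard_settings(alphabet_size, wires)


if __name__ == "__main__":
    wehrmacht = total_settings(26, 10)   # 26 letters, 10 wires
    hungarian = total_settings(44, 18)   # 44 letters, 18 wires
    print(f"Rotors (44 letters):     {rotor_settings(44):,}")
    print(f"Plugboard (44, 18):      {plugboard_settings(44, 18):,}")
    print(f"Total (44 letters):      {hungarian:,}")
    print(f"Ratio to Wehrmacht:      {hungarian / wehrmacht:,.0f}")
```

Integer arithmetic is used throughout so that the large totals are exact; only the final ratio is converted to a floating-point number for display.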
B. The problem
During the encryption/decryption process the input text is read character by character, so it is possible that a single letter encrypts to a triple-character letter (e.g. A → DZS). This not only causes a difference in the output length, but also affects the decryption process (e.g. DZS → ?): there is no way to tell whether these characters follow each other by coincidence or are meant to form the triple-character letter in this specific order. In the conventional process “D” will be pressed first on the machine, followed by “Z” and finally “S”, which in the simplest case will produce an output of length 3 instead of the expected “A”. This issue is demonstrated in Figure 9.
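The ambiguity can be made concrete by listing every way a character sequence could be split into Hungarian letters. The sketch below is a hypothetical illustration: it only knows the nine multi-character letters listed earlier, treats everything else as a single-character letter, and the name `segmentations` is not taken from the paper.

```python
# The nine multi-character letters of the Hungarian alphabet named in the text.
MULTI_CHAR_LETTERS = ("DZS", "CS", "DZ", "GY", "LY", "NY", "SZ", "TY", "ZS")


def segmentations(text: str) -> list[list[str]]:
    """List every way to split `text` into single or multi-character letters."""
    if not text:
        return [[]]
    # The first character can always stand alone as a single-character letter...
    candidates = [text[0]]
    # ...or it can be the start of any multi-character letter that fits here.
    candidates += [m for m in MULTI_CHAR_LETTERS if text.startswith(m)]
    results = []
    for letter in candidates:
        for rest in segmentations(text[len(letter):]):
            results.append([letter] + rest)
    return results


if __name__ == "__main__":
    for option in segmentations("DZS"):
        print(" + ".join(option))
```

For the input "DZS" this yields four readings (D + Z + S, D + ZS, DZ + S and DZS), and the machine has no way of knowing which one the sender intended.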
Fig. 9: The double and triple-character problem
C. The solution
Fig. 10: The Hungarian 35-letter alphabet
The Hungarian alphabet contains 9 letters built up from either two or three characters. All of these letters can be constructed from the single characters of the alphabet, hence we are allowed to use the alphabet without the multi-character letters, which results in an alphabet of size 35 (Figure 10). This change also has a great effect on the complexity of the machine and consequently on the cryptanalysis as well. Redoing the calculations with the new table of possible plugboard combinations (Figure 11) and the changed rotors results in a machine that is 42,094,345.5 (42 million) times more complex than the original version. This number is still extremely large; however, it is nowhere near 3.6 quadrillion. Judging from these calculations a pattern seems to emerge: the larger the alphabet is, the more complex the Enigma machine gets, thus the longer it takes to break the code with either old or modern techniques. This matter is the subject of further investigation, which is conducted in the following section.
Fig. 11: Possible plugboard setting combinations for the 35-letter alphabet
Figure 12 summarises the key differences between the English and Hungarian languages.
D. Deriving a general formula
There are two main factors in the Enigma machine’s complexity calculation: the number of possible rotor settings and the full range of combinations that the plugboard yields. The aim is to derive a parametric formula that shows the effect of the parameter (the number of extra letters in the alphabet) on the newly “constructed” Enigma machine’s complexity. Before we dig into the calculations, it is important to lay down two ground rules to ensure that the cipher and the machine work correctly:
• The alphabet can only contain single characters
• The number of wires used for the plugboard is at most half of the alphabet’s size
1) The Rotors: For the purpose of this experiment a 3-rotor Enigma is considered, where 3 rotors are chosen from a total set of 5 rotors. The selection process yields $5 \times 4 \times 3 = 60$ combinations, which will remain constant in our formula. The variable part includes the rotor positions and the notch (ring) settings; both of these depend on the size of the alphabet. In the original machine’s case these equal $26^3$ and $26^2$ respectively, so merging these two terms results in $26^5$. As these depend on the alphabet, we add the parameter into the equation: $(26 + x)^3 \times (26 + x)^2$, where “x” is the number of extra letters compared to the English alphabet. Following mathematical transformations the final result is:
$$26^5 + \left[x^5 + 130x^4 + 6760x^3 + (10 \times 26^3)x^2 + (5 \times 26^4)x\right]$$
The expression in square brackets calculates the number of added possibilities. For example, 1 extra letter will add $1 + 130 + 6760 + (10 \times 26^3) + (5 \times 26^4) = 2{,}467{,}531$ rotor settings.
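The transformation behind the rotor formula is a binomial expansion. The worked steps below are a sketch that uses only the symbols already defined in this section, with $x$ the number of extra letters, and then checks the single-extra-letter example directly:

```latex
\begin{align*}
(26+x)^3 \times (26+x)^2 &= (26+x)^5 = \sum_{k=0}^{5} \binom{5}{k}\, 26^{5-k}\, x^{k} \\
&= 26^5 + (5 \times 26^4)x + (10 \times 26^3)x^2 + (10 \times 26^2)x^3 + (5 \times 26)x^4 + x^5 \\
&= 26^5 + \left[x^5 + 130x^4 + 6760x^3 + (10 \times 26^3)x^2 + (5 \times 26^4)x\right] \\[6pt]
x = 1:\quad 27^5 - 26^5 &= 14{,}348{,}907 - 11{,}881{,}376 = 2{,}467{,}531
\end{align*}
```

The coefficients 130 and 6760 are simply $5 \times 26$ and $10 \times 26^2$, so the bracketed polynomial is exactly $(26+x)^5 - 26^5$.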
Fig. 12: Summative comparison of the two languages
2) The Plugboard: The plugboard provides an incredible number of setting possibilities, therefore it is important to examine the equation and derive a formula where the number of wires and the number of extra letters in the alphabet are the parameters. As previously mentioned, it is essential to keep the number of wires at or below half of the alphabet’s size, because each wire connects two letter-slots on the plugboard, hence the maximum number of wires the Enigma machine can handle is $\lfloor \text{alphabet size} / 2 \rfloor$. The Wehrmacht’s plugboard settings are calculated in the following way in relation to the number of wires (“n”):
$$\frac{26!}{n! \times (26 - 2n)! \times 2^n}$$
The next step is to introduce the second parameter, the number of extra letters in the alphabet (“x”):
$$\frac{(26 + x)!}{n! \times (26 + x - 2n)! \times 2^n}$$
This formula is a lot more complicated than the one for calculating the added rotor complexity, therefore we consider the numerator and the denominator separately. The numerator can be broken down into a multiplication:
$$26! \times \frac{(26 + x)!}{26!}$$
and similarly the denominator:
$$n! \times (26 - 2n)! \times \frac{(26 + x - 2n)!}{(26 - 2n)!} \times 2^n$$
These two expressions are very similar to the original formula, so with some rearrangement we get the following result:
$$\frac{\dfrac{(26+x)!}{26!}}{\dfrac{(26+x-2n)!}{(26-2n)!}} \times \frac{26!}{n! \times (26 - 2n)! \times 2^n} = \left[\frac{(26 + x)! \times (26 - 2n)!}{26! \times (26 + x - 2n)!}\right] \times \frac{26!}{n! \times (26 - 2n)! \times 2^n}$$
where the expression within square brackets is the plugboard’s complexity multiplier for a given number of wires (“n”) and a given number of extra letters (“x”).
E. The Final Formula
In order to calculate the total difference in terms of complexity, the added rotor complexity and the plugboard multiplier have to be plugged into the general formula (Rotor combinations (60) × Rotor settings × Plugboard settings):
$$60 \times \left(26^5 + \left[x^5 + 130x^4 + 6760x^3 + (10 \times 26^3)x^2 + (5 \times 26^4)x\right]\right) \times \left[\frac{(26 + x)! \times (26 - 2n)!}{26! \times (26 + x - 2n)!}\right] \times \frac{26!}{n! \times (26 - 2n)! \times 2^n}$$
Although this formula looks unwieldy, it can be simplified. The previously derived formulas can be represented as symbols: the additional rotor settings are denoted as “RA” and the plugboard multiplier is indicated by “PM”:
$$60 \times (26^5 + RA) \times \left(\frac{26!}{n! \times (26 - 2n)! \times 2^n} \times PM\right)$$
It can be rearranged further in such a way that the original Enigma machine’s formula is multiplied by a number:
$$60 \times 26^5 \times \frac{26!}{n! \times (26 - 2n)! \times 2^n} \times \left[PM + \frac{PM \times RA}{26^5}\right] = \text{Original formula} \times \left[PM + \frac{PM \times RA}{26^5}\right]$$
Taking everything into account, the expression in square brackets is what determines “how many times more settings the new machine has”, or in other words, how many times more complicated the second machine is in relation to the number of wires used and the number of extra letters in the alphabet. As an example, an Enigma machine designed for one extra letter in the alphabet while using the default 10 wires has ∼4.66 times more setting possibilities than the Wehrmacht Enigma. However, this formula only yields the multiplier for “n” values between 0 and 13, because for any larger “n” the term $(26 - 2n)!$ in the original formula would be the factorial of a negative number, which is undefined. For larger “n” we can still use the following formula to calculate the new number of possible settings, although the result cannot be compared to the Wehrmacht Enigma for the reason just mentioned:
$$60 \times (26 + x)^5 \times \frac{(26 + x)!}{n! \times (26 + x - 2n)! \times 2^n}$$
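The multiplier in square brackets can be evaluated exactly with rational arithmetic. The following Python sketch mirrors the symbols RA and PM from this section; the function names are illustrative only, and the standard-library `Fraction` type is used so that no rounding occurs until the final display.

```python
from fractions import Fraction
from math import factorial

BASE = 26  # size of the original (English) alphabet


def rotor_additional(x: int) -> int:
    """RA: rotor settings added by x extra letters, i.e. (26 + x)^5 - 26^5."""
    return (BASE + x) ** 5 - BASE ** 5


def plugboard_multiplier(x: int, n: int) -> Fraction:
    """PM: plugboard multiplier for x extra letters and n wires (0 <= n <= 13)."""
    # factorial() raises ValueError for n > 13, matching the limit in the text.
    return Fraction(
        factorial(BASE + x) * factorial(BASE - 2 * n),
        factorial(BASE) * factorial(BASE + x - 2 * n),
    )


def complexity_multiplier(x: int, n: int) -> Fraction:
    """The bracketed term: PM + PM * RA / 26^5."""
    pm = plugboard_multiplier(x, n)
    return pm + pm * rotor_additional(x) / BASE ** 5


if __name__ == "__main__":
    m = complexity_multiplier(1, 10)  # one extra letter, default 10 wires
    print(f"PM = {float(plugboard_multiplier(1, 10)):.3f}")
    print(f"RA = {rotor_additional(1):,}")
    print(f"multiplier = {float(m):.2f}")
```

For one extra letter and 10 wires the plugboard multiplier reduces to 27/7, and the overall multiplier rounds to the 4.66 quoted above.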
VI. CONCLUSION
Based on the Wehrmacht Enigma machine’s complexity calculations, a general formula has been derived for the purpose of proving the alphabet’s influence on both the machine and the cipher. Since both known-plaintext attacks and ciphertext-only attacks depend on the entire key-space, the alphabet affects the machine’s cryptanalysis likewise. It has also been proven that a single extra letter alone increases the number of rotor settings by around 20 percent, the plugboard combinations by around 3.86 times and the total machine complexity by approximately 4.66 times. Expressed in numbers, this difference yields a new total of 500,757,482,821,242,167,040,000 possible settings.
REFERENCES
Borowska, A. & Rzeszutko, E. (2014), ‘The cryptanalysis of the Enigma cipher. The plugboard and the cryptologic bomb.’, Computer Science 15.
Carter, F. (2010), ‘The Turing Bombe’, The Rutherford Journal 3.
Copeland, B. J. (2020), ‘Alan Turing’, Encyclopedia Britannica.
CryptoMuseum (2009), ‘Crypto and cipher machines’, https://[Link]/crypto/[Link] (last accessed: 30.06.2021).
Friedman, W. (1922), ‘The index of coincidence and its applications in cryptanalysis’.
Gaj, K. & Orłowski, A. (2003), Facts and myths of Enigma: Breaking stereotypes, Vol. 2656, pp. 106–122.
Gladwin, L. A. (1997), ‘Alan Turing, Enigma, and the breaking of German machine ciphers in World War II’, Prologue.
Index of Coincidence (n.d.), [Link]NSF-4/Tutorial/VIG/[Link] (last accessed: 25.04.2021).
Ostwald, O. & Weierud, F. (2017), ‘Modern breaking of Enigma ciphertexts’, Cryptologia 41(5), 395–421.
Rijmenants, D. (2004), ‘Cipher machines and cryptology’, [Link] (last accessed: 30.06.2021).
Sullivan, G. & Weierud, F. (2006), ‘Hillclimbing the Enigma machine’.