18 Finite Automata
The theory of automata provides efficient and convenient tools for the
representation of linguistic phenomena.
The theory of automata plays a significant role in providing solutions of
many problems in natural language processing. For example ,speech
recognition, spelling correction, information retrieval etc.
Finite state methods are useful in processing natural language as the
modeling of information using rules has many advantages for language
modeling.
Finite state automaton has a mathematical model which allows data to be
represented in a compacted form using finite state automaton.
19 Formal languages
20 Automata and Languages
An automaton is an abstract model of a computer
which reads an input string, and changes its internal
state depending on the current input symbol.
It can either accept or reject the input string.
Every automaton defines a language (the set of
strings it accepts).
Different automata define different language classes:
Finite-state automata define regular languages
Pushdown automata define context-free languages
Turing machines define recursively enumerable
languages
21
22 Finite State Automata (FSAs)
23 State Transition Diagram
24
Finite-State Methods
for Morphology
25 Finite-State Morphological
Parsing
Finite State Automata for
26
Morphology
27 Union: Merging Automata
Stem Changes
28
Finite Automata: Language
29
Recognizer
Fig : NFA for the words Boy and Bat
30
31 FSAs for derivational morphology
32 Recognition vs. Analysis
FSAs can recognize (accept) a string, but they
don’t provide its internal structure.
Transducer :is a machine that maps
(transduces) the input string into an output
string that encodes its structure:
33
Applications
FSTs are used for a variety of different applications:
Word Inflections.
For example, pluralizing words (cat -> cats, dog ->
dogs, goose -> geese, etc.).
Morphological Parsing
i.e., extracting the “properties” of a word
(e.g., computers -> computer + [Noun] + [Plural])
Simple Word Translation,
e.g., translating US English to UK English.
Simple commands made to a computer
34
35
36 Finite-State Morphological
Parsing
We need at least the following to build a
morphological parser:
1. Lexicon: the list of stems and affixes, together with basic
information about them (Noun stem or Verb stem, etc.)
2. Morphotactics: the model of morpheme ordering that explains which
classes of morphemes can follow other classes of morphemes inside a
word. E.g., the rule that English plural morpheme follows the noun rather
than preceding it.
3. Orthographic rules: these spelling rules are used to model the
changes that occur in a word, usually when two morphemes combine
(e.g., the y→ie spelling rule changes city + -s to cities).
37
Construction of a Finite state Lexicon
A lexicon is a repository for words.
The simplest one would consist of an explicit list of every word of the language.
Incovenient or impossible!
Computational lexicons are usually structured with
a list of each of the stems
Affixes of the language together with a representation of morphotactics telling us how they can fit
together.
The most common way of modeling morphotactics is the finite-state automaton
Reg-noun Irreg-pl-noun Irreg-sg-noun plural
fox geese goose -s
fat sheep sheep
fog Mice mouse
fardvark
38 Construction of a Finite state Lexicon
Reg-verb-stem Irreg-verb-stem Irreg-past-verb past Past-part Pres-part 3sg
walk cut caught -ed -ed -ing -s
fry speak ate
talk sing eaten
impeach sang
spoken
39 Construction of a Finite state Lexicon
40 Construction of a Finite state Lexicon
41 Finite-State Morphological Parsing
An FSA for another fragment of English derivational morphology
42 Finite-State Morphological Parsing
43 Finite - State Transducers
44 Finite State Transducers(FST)
It is like an FSA but defines regular
relations, not regular languages
It has two alphabet sets
It has a transition function relating
input to states
It has an output function relating state
and input to output
It can be used to recognize, generate,
translate or relate sets.
45 Finite - State Transducers
46 Morphological parsing with FST
The objective of the morphological parsing is to produce output lexicons
for a single input lexicon
47
Morphological parsing with FST
Lexical: Representing a simple concatenation of morphemes making
up a word
Surface: Representing the actual spelling of the final word.
Morphological Parsing
48
with FST
The FST is a multi-function device, and can be viewed in
the following ways:
Translator: It reads one string on one tape and outputs
another string,
Recognizer: It takes a pair of strings as two tapes and
accepts/rejects based on their matching.
Generator: It outputs a pair of strings on two tapes along
with yes/no result based on whether they are matching
or not.
Relater: It compares the relation between two sets of
strings available on two tapes.
Morphological parsing with
49
FST
The composition is useful to convert a FST
as parser to FST as a generator.
Morphological Parsing with
50
FST
51
52
53