0% found this document useful (0 votes)
7 views36 pages

Finite Automata in Natural Language Processing

The document discusses the theory of finite automata and its applications in natural language processing, including tasks like speech recognition and morphological parsing. It explains the role of finite-state automata in defining regular languages and highlights the construction of finite-state lexicons for effective word processing. Additionally, it covers finite-state transducers and their functions in recognizing, generating, and translating linguistic data.
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views36 pages

Finite Automata in Natural Language Processing

The document discusses the theory of finite automata and its applications in natural language processing, including tasks like speech recognition and morphological parsing. It explains the role of finite-state automata in defining regular languages and highlights the construction of finite-state lexicons for effective word processing. Additionally, it covers finite-state transducers and their functions in recognizing, generating, and translating linguistic data.
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

18 Finite Automata

 The theory of automata provides efficient and convenient tools for the
representation of linguistic phenomena.
 The theory of automata plays a significant role in providing solutions of
many problems in natural language processing. For example ,speech
recognition, spelling correction, information retrieval etc.
 Finite state methods are useful in processing natural language as the
modeling of information using rules has many advantages for language
modeling.
 Finite state automaton has a mathematical model which allows data to be
represented in a compacted form using finite state automaton.
19 Formal languages
20 Automata and Languages
 An automaton is an abstract model of a computer
which reads an input string, and changes its internal
state depending on the current input symbol.
 It can either accept or reject the input string.
 Every automaton defines a language (the set of
strings it accepts).
 Different automata define different language classes:
 Finite-state automata define regular languages
 Pushdown automata define context-free languages
 Turing machines define recursively enumerable
languages
21
22 Finite State Automata (FSAs)
23 State Transition Diagram
24

Finite-State Methods
for Morphology
25 Finite-State Morphological
Parsing
Finite State Automata for
26
Morphology
27 Union: Merging Automata
Stem Changes
28
Finite Automata: Language
29
Recognizer

Fig : NFA for the words Boy and Bat


30
31 FSAs for derivational morphology
32 Recognition vs. Analysis
FSAs can recognize (accept) a string, but they
don’t provide its internal structure.
Transducer :is a machine that maps
(transduces) the input string into an output
string that encodes its structure:
33
Applications
FSTs are used for a variety of different applications:
 Word Inflections.
For example, pluralizing words (cat -> cats, dog ->
dogs, goose -> geese, etc.).
 Morphological Parsing
i.e., extracting the “properties” of a word
(e.g., computers -> computer + [Noun] + [Plural])
 Simple Word Translation,
e.g., translating US English to UK English.
 Simple commands made to a computer
34
35
36 Finite-State Morphological
Parsing

 We need at least the following to build a


morphological parser:
1. Lexicon: the list of stems and affixes, together with basic
information about them (Noun stem or Verb stem, etc.)
2. Morphotactics: the model of morpheme ordering that explains which
classes of morphemes can follow other classes of morphemes inside a
word. E.g., the rule that English plural morpheme follows the noun rather
than preceding it.
3. Orthographic rules: these spelling rules are used to model the
changes that occur in a word, usually when two morphemes combine
(e.g., the y→ie spelling rule changes city + -s to cities).
37
Construction of a Finite state Lexicon

A lexicon is a repository for words.


 The simplest one would consist of an explicit list of every word of the language.
 Incovenient or impossible!
 Computational lexicons are usually structured with
 a list of each of the stems
 Affixes of the language together with a representation of morphotactics telling us how they can fit
together.
 The most common way of modeling morphotactics is the finite-state automaton

Reg-noun Irreg-pl-noun Irreg-sg-noun plural


fox geese goose -s
fat sheep sheep
fog Mice mouse
fardvark
38 Construction of a Finite state Lexicon

Reg-verb-stem Irreg-verb-stem Irreg-past-verb past Past-part Pres-part 3sg


walk cut caught -ed -ed -ing -s
fry speak ate
talk sing eaten
impeach sang
spoken
39 Construction of a Finite state Lexicon
40 Construction of a Finite state Lexicon
41 Finite-State Morphological Parsing

An FSA for another fragment of English derivational morphology


42 Finite-State Morphological Parsing
43 Finite - State Transducers
44 Finite State Transducers(FST)
It is like an FSA but defines regular
relations, not regular languages
It has two alphabet sets
It has a transition function relating
input to states
It has an output function relating state
and input to output
It can be used to recognize, generate,
translate or relate sets.
45 Finite - State Transducers
46 Morphological parsing with FST
 The objective of the morphological parsing is to produce output lexicons
for a single input lexicon
47
Morphological parsing with FST

 Lexical: Representing a simple concatenation of morphemes making


up a word
 Surface: Representing the actual spelling of the final word.
Morphological Parsing
48
with FST
 The FST is a multi-function device, and can be viewed in
the following ways:
 Translator: It reads one string on one tape and outputs
another string,
 Recognizer: It takes a pair of strings as two tapes and
accepts/rejects based on their matching.
 Generator: It outputs a pair of strings on two tapes along
with yes/no result based on whether they are matching
or not.
 Relater: It compares the relation between two sets of
strings available on two tapes.
Morphological parsing with
49
FST

The composition is useful to convert a FST


as parser to FST as a generator.
Morphological Parsing with
50
FST
51
52
53

You might also like