NLP Slides Module2

Chapter 3 discusses morphology, the study of word formation from morphemes, and finite-state transducers (FSTs) used for morphological parsing. It outlines different types of morphemes, including stems and affixes, and explains inflection and derivation in English morphology. The chapter also details the structure of lexicons, morphotactics, and the role of FSTs in recognizing and generating morphological forms.

Uploaded by

rabbies777

Chapter 3. Morphology and Finite-State Transducers
From: Chapter 3 of Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky and James H. Martin
Background
• Morphology — knowledge of the meaningful components of words

• The problem of recognizing that foxes breaks down into the two
morphemes fox and -es is called morphological parsing.

• Stemming: the technique of retrieving the stem or root of a word by stripping its prefixes and suffixes.
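A crude stemmer can be sketched as affix stripping. The prefix and suffix lists and the three-letter minimum stem below are illustrative assumptions, not the rule set of any published stemmer. Note that the output is a truncated string rather than a parse, so it need not be a real word.

```python
# Illustrative affix lists (assumptions, not a standard stemmer's rules).
PREFIXES = ["un", "re"]
SUFFIXES = ["ation", "ness", "ing", "es", "ed", "ly", "s"]
MIN_STEM = 3  # never strip an affix if fewer than 3 letters would remain


def naive_stem(word: str) -> str:
    """Strip at most one prefix, then at most one suffix."""
    w = word.lower()
    for p in PREFIXES:
        if w.startswith(p) and len(w) - len(p) >= MIN_STEM:
            w = w[len(p):]
            break
    # Try longer suffixes first so that "-es" wins over "-s".
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if w.endswith(s) and len(w) - len(s) >= MIN_STEM:
            return w[: -len(s)]
    return w


for w in ["foxes", "walking", "cats", "unbuckled", "fuzziness"]:
    print(w, "->", naive_stem(w))  # fox, walk, cat, buckl, fuzzi
```

Outputs like buckl and fuzzi show why stemming is cruder than morphological parsing, which would recover buckle and fuzzy.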

Morphology and FSTs 2

3.1 Survey of (Mostly) English Morphology

• Morphology is the study of the way words are built up from smaller
meaning-bearing units, morphemes.
• Two broad classes of morphemes:
– Stems: the "main" morpheme of the word, supplying the main meaning.
– Affixes: add "additional" meanings of various kinds.
• Affixes are further divided into prefixes, suffixes, infixes, and
circumfixes.
– Suffix: eat-s
– Prefix: un-buckle
– Circumfix: ge-sag-t (said), the past participle of sagen (to say), in German
– Infix: hingi (borrow) → humingi (the agent of an action), with -um- inserted into the stem, in the Philippine language Tagalog
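The four affix types differ only in where the affix material lands relative to the stem. The helper names below are hypothetical. The infix rule (insert after the first letter) is a simplifying assumption that happens to fit hingi; it is not a full account of Tagalog -um- placement.

```python
def add_prefix(prefix: str, stem: str) -> str:
    return prefix + stem


def add_suffix(stem: str, suffix: str) -> str:
    return stem + suffix


def add_circumfix(front: str, stem: str, back: str) -> str:
    # The two parts of a circumfix attach together, around the stem.
    return front + stem + back


def add_infix(stem: str, infix: str) -> str:
    # Assumption: the infix goes right after a single initial consonant.
    return stem[:1] + infix + stem[1:]


print(add_suffix("eat", "s"))           # eats
print(add_prefix("un", "buckle"))       # unbuckle
print(add_circumfix("ge", "sag", "t"))  # gesagt
print(add_infix("hingi", "um"))         # humingi
```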


3.1 Survey of (Mostly) English Morphology

• Circumfixes: affixes that attach to both the beginning and the end of a base word to create a new word. True circumfixes are rare in English, but some linguists argue that certain combinations of prefixes and suffixes function similarly. For example:
– "un-" + "-ed", as in "unraveled" or "untangled".
• Infixes: Infixes are affixes that are inserted into the middle of a base
word to create a new word.


3.1 Survey of (Mostly) English Morphology

• Morphology built from prefixes and suffixes is often called concatenative morphology, since a word is composed of a number of morphemes concatenated together.

• A number of languages have extensive non-concatenative
morphology
– Ablaut: A process involving vowel alternation within a root to indicate
grammatical information or create related words.
• Example: "sing" (base form), "sang" (past tense), "sung" (past participle).
– Reduplication: The repetition of all or part of a word to indicate plurality,
intensification, or other grammatical features.
• Example: Tagalog "lakad" (walk) → "lakad-lakad" (stroll).
– Suppletion: The use of entirely different morphemes to express related
meanings.
• Example: English "to be" → "am," "is," "are," "was," "were."
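Non-concatenative processes cannot be expressed as stem-plus-affix concatenation. The following is a minimal sketch under two assumptions. Full reduplication is written with a hyphen. Ablaut and suppletion come from a hand-written table whose entries, like the regular -ed fallback, are illustrative only.

```python
def reduplicate(stem: str) -> str:
    """Full reduplication, written with a hyphen as in Tagalog lakad-lakad."""
    return f"{stem}-{stem}"


# Ablaut and suppletion change or replace the root itself, so the forms are listed.
IRREGULAR_PAST = {
    "sing": "sang",  # ablaut: vowel alternation i -> a
    "be": "was",     # suppletion (first/third person singular only)
}


def past_tense(verb: str) -> str:
    # Fall back to regular -ed suffixation; spelling rules are ignored here.
    return IRREGULAR_PAST.get(verb, verb + "ed")


print(reduplicate("lakad"))  # lakad-lakad
print(past_tense("sing"), past_tense("be"), past_tense("walk"))  # sang was walked
```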


3.1 Survey of (Mostly) English Morphology

• Two broad classes of ways to form words from morphemes:

– Inflection: Inflection involves adding affixes to a word to indicate
grammatical information such as tense, number, case, gender, or mood.

– Derivation: Derivation involves adding affixes to a word to create a new word with a different meaning or part of speech.
– Inflectional morphemes typically do not change the part of speech or core meaning of the word, while derivational morphemes often do.
– Example: In English, "-er" can be inflectional, as in "bigger" (comparative), or derivational, as in "teacher" (noun derived from "teach").


3.1 Survey of (Mostly) English Morphology
Inflectional Morphology
• In English, only nouns, verbs, and sometimes adjectives can be
inflected, and the number of affixes is quite small.
• Inflections of nouns in English:
– An affix marking plural,
• Suffix "-s": The most common plural marker in
English, added to the end of most nouns.
– "cat" → "cats", "dog" → "dogs", "book" → "books"
• Suffix "-es": Used for nouns ending in sibilant
sounds (s, sh, ch, x, z).
– "bus" → "buses", "box" → "boxes", "watch" → "watches"
• Suffix "-en": rare, found on only a few nouns
– "child" → "children", "ox" → "oxen", "brother" → "brethren"
– An affix marking possessive
• llama’s, children’s, llamas’, Euripides’ comedies
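The plural and possessive rules above can be sketched as follows. The irregular table is a small illustrative sample. The possessive rule (any noun already ending in -s takes a bare apostrophe) is an assumption, because style guides differ for singular nouns. Spelling changes such as y → ie (city → cities) are deliberately ignored.

```python
SIBILANT_ENDINGS = ("s", "sh", "ch", "x", "z")

# Irregular plurals cannot be predicted and must be listed (illustrative entries).
IRREGULAR_PLURALS = {"child": "children", "ox": "oxen"}


def pluralize(noun: str) -> str:
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(SIBILANT_ENDINGS):
        return noun + "es"   # bus -> buses, watch -> watches
    return noun + "s"        # cat -> cats


def possessive(noun: str) -> str:
    # Assumed rule: a noun already ending in -s takes a bare apostrophe
    # (llamas', Euripides'); all others take 's (llama's, children's).
    return noun + "'" if noun.endswith("s") else noun + "'s"


for n in ["cat", "bus", "box", "watch", "child", "ox"]:
    print(n, "->", pluralize(n))
```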
3.1 Survey of (Mostly) English Morphology
Inflectional Morphology
• Verbal inflection is more complicated than
nominal inflection.
– English has three kinds of verbs:
• Main verbs: Express the main action or state in a sentence,
– eat, sleep, impeach
• Modal verbs: Auxiliary verbs that express necessity,
possibility, permission, or ability
– can, will, should
• Primary verbs: act as both main and auxiliary verbs
– be, have, do


3.1 Survey of (Mostly) English Morphology
Inflectional Morphology
– Morphological forms of regular verbs

stem                          walk     merge    try      map
-s form                       walks    merges   tries    maps
-ing participle               walking  merging  trying   mapping
Past form or -ed participle   walked   merged   tried    mapped

– Regular verbs are significant in English morphology because they make up the majority of verbs and because the class is productive: a newly coined verb such as fax is inflected regularly (faxes, faxing, faxed).
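The regular table can be generated from the stem with three suffix functions and a few spelling adjustments. The following is a minimal sketch. The spelling conditions are simplified assumptions: consonant doubling is triggered purely by a consonant-vowel-consonant ending, so it would wrongly double stems like visit. For regular verbs, one -ed form serves as both the past and the participle.

```python
VOWELS = set("aeiou")
SIBILANT_ENDINGS = ("s", "sh", "ch", "x", "z")


def _consonant_y(stem: str) -> bool:
    # True for stems like "try" (consonant + y) but not "play".
    return len(stem) > 1 and stem.endswith("y") and stem[-2] not in VOWELS


def _doubles_final(stem: str) -> bool:
    # Assumed doubling condition: consonant-vowel-consonant ending, final not w/x/y.
    if len(stem) < 3:
        return False
    c1, v, c2 = stem[-3], stem[-2], stem[-1]
    return c1 not in VOWELS and v in VOWELS and c2 not in VOWELS and c2 not in "wxy"


def s_form(stem: str) -> str:
    if _consonant_y(stem):
        return stem[:-1] + "ies"          # try -> tries
    if stem.endswith(SIBILANT_ENDINGS):
        return stem + "es"
    return stem + "s"


def ing_form(stem: str) -> str:
    if stem.endswith("e") and not stem.endswith("ee"):
        return stem[:-1] + "ing"          # merge -> merging
    if _doubles_final(stem):
        return stem + stem[-1] + "ing"    # map -> mapping
    return stem + "ing"                   # try -> trying (y is kept)


def ed_form(stem: str) -> str:
    if stem.endswith("e"):
        return stem + "d"                 # merge -> merged
    if _consonant_y(stem):
        return stem[:-1] + "ied"          # try -> tried
    if _doubles_final(stem):
        return stem + stem[-1] + "ed"     # map -> mapped
    return stem + "ed"


for stem in ["walk", "merge", "try", "map"]:
    print(stem, s_form(stem), ing_form(stem), ed_form(stem))
```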


3.1 Survey of (Mostly) English Morphology
Inflectional Morphology

– Morphological forms of irregular verbs

stem              eat      catch     cut
-s form           eats     catches   cuts
-ing participle   eating   catching  cutting
Past form         ate      caught    cut
-ed participle    eaten    caught    cut
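Because irregular forms cannot be produced by rule, a parser typically stores them. It then needs to map each surface form back to every slot it fills. The following is a minimal sketch. The slot names and the `build_form_index` helper are hypothetical. The index makes the ambiguity of caught and cut explicit.

```python
from collections import defaultdict

# Full paradigms for the irregular verbs in the table (slot names are assumptions).
PARADIGMS = {
    "eat": {"stem": "eat", "s_form": "eats", "ing_part": "eating",
            "past": "ate", "ed_part": "eaten"},
    "catch": {"stem": "catch", "s_form": "catches", "ing_part": "catching",
              "past": "caught", "ed_part": "caught"},
    "cut": {"stem": "cut", "s_form": "cuts", "ing_part": "cutting",
            "past": "cut", "ed_part": "cut"},
}


def build_form_index(paradigms):
    """Map each surface form to every (lemma, slot) pair it can realize."""
    index = defaultdict(list)
    for lemma, slots in paradigms.items():
        for slot, form in slots.items():
            index[form].append((lemma, slot))
    return dict(index)


FORM_INDEX = build_form_index(PARADIGMS)
print(FORM_INDEX["caught"])  # [('catch', 'past'), ('catch', 'ed_part')]
print(FORM_INDEX["cut"])     # [('cut', 'stem'), ('cut', 'past'), ('cut', 'ed_part')]
```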


3.1 Survey of (Mostly) English Morphology
Derivational Morphology
• Nominalization in English:
– The formation of new nouns, often from verbs or adjectives

Suffix    Base Verb/Adjective    Derived Noun
-ation    computerize (V)        computerization
-ee       appoint (V)            appointee
-er       kill (V)               killer
-ness     fuzzy (A)              fuzziness

– Adjectives derived from nouns or verbs

Suffix    Base Noun/Verb         Derived Adjective
-al       computation (N)        computational
-able     embrace (V)            embraceable
-less     clue (N)               clueless
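The two tables can be read as a mapping from each suffix to the part of speech it attaches to and the part of speech it produces. The following is a minimal sketch with a hypothetical helper `derived_pos`. It checks only the category of the base, so it would also license unattested forms such as *eatation, and it ignores spelling changes.

```python
# Suffix -> (category of the base, category of the result), following the tables.
DERIVATIONAL_SUFFIXES = {
    "-ation": ("V", "N"),
    "-ee": ("V", "N"),
    "-er": ("V", "N"),
    "-ness": ("A", "N"),
    "-al": ("N", "A"),
    "-able": ("V", "A"),
    "-less": ("N", "A"),
}


def derived_pos(base_pos: str, suffix: str) -> str:
    """Return the category of base+suffix, or raise if the suffix cannot attach."""
    source, target = DERIVATIONAL_SUFFIXES[suffix]
    if base_pos != source:
        raise ValueError(f"{suffix} attaches to {source}, not {base_pos}")
    return target


print(derived_pos("V", "-ation"))  # N  (computerize -> computerization)
print(derived_pos("N", "-less"))   # A  (clue -> clueless)
```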
3.1 Survey of (Mostly) English Morphology
Derivational Morphology
• Derivation in English is more complex than inflection because:
– It is generally less productive.
• A nominalizing affix like -ation cannot be added to absolutely every verb (*eatation).
– There are subtle and complex meaning differences among nominalizing suffixes. For example, sincerity has a subtle difference in meaning from sincereness.


Terminology
• Parsing English morphology

Tag    Meaning                   Example
------ ------------------------- --------------------
SG     Singular (N)              fox
PL     Plural (N)                foxes
1SG    First-person singular     I walk
2SG    Second-person singular    You walk
3SG    Third-person singular     He / She / It walks
1PL    First-person plural       We walk
2PL    Second-person plural      You walk
3PL    Third-person plural       They walk


Morphological Parsing
• Parsing English morphology

Input      Morphologically parsed output (stems and morphological features)

cats       cat +N +PL
cat        cat +N +SG
cities     city +N +PL
geese      goose +N +PL
goose      (goose +N +SG) or (goose +V)
gooses     goose +V +3SG (as in "She gooses the engine to make it start")
merging    merge +V +PRES-PART
caught     (catch +V +PAST-PART) or (catch +V +PAST)
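A toy parser that reproduces the table can combine a stem lexicon, a list of irregular forms, and a few suffix-stripping guesses. The following is a minimal sketch. The lexicon class "IRREG-N" (a noun whose plural is listed separately, so gooses is not read as a plural) and the two undone spelling changes are assumptions. A real parser would use the finite-state machinery introduced next.

```python
# Stem lexicon: stem -> classes. "IRREG-N" marks a noun whose plural is listed separately.
LEXICON = {"cat": {"N"}, "city": {"N"}, "goose": {"IRREG-N", "V"}, "merge": {"V"}}

# Forms whose analyses cannot be found by stripping a suffix.
IRREGULAR_FORMS = {
    "geese": ["goose +N +PL"],
    "caught": ["catch +V +PAST-PART", "catch +V +PAST"],
}

# (stem class, suffix) -> features; "" means no suffix.
FEATURES = {
    ("N", ""): "+N +SG",
    ("N", "s"): "+N +PL",
    ("IRREG-N", ""): "+N +SG",
    ("V", ""): "+V",
    ("V", "s"): "+V +3SG",
    ("V", "ing"): "+V +PRES-PART",
}


def candidate_splits(word):
    """Yield (stem, suffix) guesses, undoing two spelling changes (assumed rules)."""
    yield word, ""
    if word.endswith("ies"):
        yield word[:-3] + "y", "s"        # cities -> city + s
    if word.endswith("s"):
        yield word[:-1], "s"
    if word.endswith("ing"):
        yield word[:-3], "ing"
        yield word[:-3] + "e", "ing"      # merging -> merge + ing


def parse(word):
    analyses = list(IRREGULAR_FORMS.get(word, []))
    for stem, suffix in candidate_splits(word):
        for cls in sorted(LEXICON.get(stem, ())):
            features = FEATURES.get((cls, suffix))
            if features:
                analyses.append(f"{stem} {features}")
    return analyses


for w in ["cats", "cities", "geese", "goose", "gooses", "merging", "caught"]:
    print(w, parse(w))
```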
Morphological Parsing

• We need at least the following to build a morphological parser:

1. Lexicon: the list of stems and affixes, together with basic information
about them (Noun stem or Verb stem, etc.)
2. Morphotactics: the model of morpheme ordering, i.e., which classes of morphemes can follow which other classes inside a word. For example, the English plural morpheme follows the noun rather than preceding it. Morphotactics covers both the ordering of morphemes and co-occurrence restrictions between them.
3. Orthographic rules: These spelling rules are used to model the changes
that occur in a word, usually when two morphemes combine (e.g., the
y→ie spelling rule changes city + -s to cities).
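The three components can be chained in a small generator. The following is a minimal sketch. The lexicon and the allowed-suffix table are illustrative assumptions. Morphotactics builds an intermediate form with "^" as a morpheme boundary, and two orthographic rules are then applied as ordered regular-expression rewrites before the boundaries are erased.

```python
import re

# Lexicon: stem -> stem class (illustrative entries).
LEXICON = {"city": "N", "fox": "N", "cat": "N", "walk": "V"}

# Morphotactics: which suffixes may follow a stem of each class ("" = no suffix).
ALLOWED_SUFFIXES = {"N": {"", "s"}, "V": {"", "s", "ing", "ed"}}

# Orthographic rules, applied in order to the intermediate form.
SPELLING_RULES = [
    (re.compile(r"([^aeiou])y\^s$"), r"\1ies"),   # y -> ie: city^s -> cities
    (re.compile(r"(s|z|x|sh|ch)\^s$"), r"\1es"),  # e-insertion: fox^s -> foxes
]


def generate(stem: str, suffix: str) -> str:
    cls = LEXICON.get(stem)
    if cls is None:
        raise ValueError(f"{stem!r} is not in the lexicon")
    if suffix not in ALLOWED_SUFFIXES[cls]:
        raise ValueError(f"-{suffix} cannot follow a {cls} stem")
    form = f"{stem}^{suffix}" if suffix else stem
    for pattern, replacement in SPELLING_RULES:
        form = pattern.sub(replacement, form)
    return form.replace("^", "")  # erase remaining morpheme boundaries


print(generate("city", "s"), generate("fox", "s"), generate("cat", "s"))  # cities foxes cats
```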


Morphological Parsing
• Lexicon: stores basic information about a word
– Is the word a stem or an affix?
– If it is a stem, is it a noun stem or a verb stem?
– If it is an affix, is it a prefix, suffix, circumfix, or infix?
• Morphotactics: a set of rules that decide whether a morpheme can appear before, after, or in between other morphemes
– use + -ful + -ness → usefulness, but not *ful-use-ness
• Orthographic rules: a set of rules that decide the changes to the spelling
– city → cities, not *citys
– knife → knives, not *knifes


Finite-State Automata for Morphology


3.2 Finite-State Morphological Parsing
The Lexicon and Morphotactics
• A lexicon is a repository for words.
– The simplest one would consist of an explicit list of every word of the language.
– Computational lexicons are usually structured with a list of each of the stems and affixes of the language, together with a representation of morphotactics telling us how they can fit together.
– The most common way of modeling morphotactics is the finite-state automaton.

Reg-noun    Irreg-pl-noun    Irreg-sg-noun    Plural
fox         geese            goose            -s
fat         sheep            sheep
fog         mice             mouse

An FSA for English nominal inflection
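The nominal-inflection automaton can be run directly over morpheme sequences, with arcs labeled by sublexicon names. The following is a minimal sketch. The arc layout is assumed, because the figure did not survive extraction: a regular noun leads to a final state that may take plural -s, and irregular singular or plural nouns lead straight to a final state. Because sheep belongs to two sublexicons, the run tracks a set of states.

```python
# Sublexicons from the table above (sheep is in both irregular classes).
SUBLEXICON = {
    "reg-noun": {"fox", "fat", "fog"},
    "irreg-pl-noun": {"geese", "sheep", "mice"},
    "irreg-sg-noun": {"goose", "sheep", "mouse"},
    "plural": {"-s"},
}

# Assumed arc layout: (state, sublexicon label) -> next state.
TRANSITIONS = {
    ("q0", "reg-noun"): "q1",
    ("q1", "plural"): "q2",
    ("q0", "irreg-pl-noun"): "q2",
    ("q0", "irreg-sg-noun"): "q2",
}
START, FINAL = "q0", {"q1", "q2"}


def accepts(morphemes):
    """Run the automaton over a morpheme sequence, tracking every reachable state."""
    states = {START}
    for m in morphemes:
        states = {
            TRANSITIONS[(s, label)]
            for s in states
            for label, members in SUBLEXICON.items()
            if m in members and (s, label) in TRANSITIONS
        }
        if not states:
            return False
    return bool(states & FINAL)


print(accepts(["fox", "-s"]), accepts(["sheep"]), accepts(["geese", "-s"]))  # True True False
```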

3.2 Finite-State Morphological Parsing
The Lexicon and Morphotactics
Reg-noun    Irreg-pl-noun    Irreg-sg-noun    Plural
dog         geese            goose            -s
cat


3.2 Finite-State Morphological Parsing
The Lexicon and Morphotactics

An FSA for English verbal inflection

Reg-verb-stem   Irreg-verb-stem   Irreg-past-verb   Past   Past-part   Pres-part   3sg
walk            cut               caught            -ed    -ed         -ing        -s
fry             speak             ate
talk            sing              eaten
impeach                           sang
                                  spoken
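The verbal sublexicons can be wired into an automaton in the same way. The following is a minimal sketch. The arc layout is an assumption, since the figure is missing:

- Regular stems may take any of the four suffixes.
- Irregular stems may take only -ing or -s.
- Irregular past forms stand alone.

The suffix -ed belongs to both the past and past-participle sublexicons, so the run again tracks a set of states. Spelling (cut + -ing → cutting) is not modeled.

```python
# Sublexicons from the table above; "-ed" is in both the past and past-part classes.
SUBLEXICON = {
    "reg-verb-stem": {"walk", "fry", "talk", "impeach"},
    "irreg-verb-stem": {"cut", "speak", "sing"},
    "irreg-past-verb": {"caught", "ate", "eaten", "sang", "spoken"},
    "past": {"-ed"},
    "past-part": {"-ed"},
    "pres-part": {"-ing"},
    "3sg": {"-s"},
}

# Assumed arc layout (the figure itself did not survive extraction).
TRANSITIONS = {
    ("q0", "reg-verb-stem"): "q1",
    ("q1", "past"): "q3",
    ("q1", "past-part"): "q3",
    ("q1", "pres-part"): "q3",
    ("q1", "3sg"): "q3",
    ("q0", "irreg-verb-stem"): "q2",
    ("q2", "pres-part"): "q3",
    ("q2", "3sg"): "q3",
    ("q0", "irreg-past-verb"): "q3",
}
START, FINAL = "q0", {"q1", "q2", "q3"}


def accepts(morphemes):
    """Run the automaton over a morpheme sequence, tracking every reachable state."""
    states = {START}
    for m in morphemes:
        states = {
            TRANSITIONS[(s, label)]
            for s in states
            for label, members in SUBLEXICON.items()
            if m in members and (s, label) in TRANSITIONS
        }
        if not states:
            return False
    return bool(states & FINAL)


print(accepts(["walk", "-ed"]), accepts(["cut", "-ing"]), accepts(["cut", "-ed"]))  # True True False
```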


3.2 Finite-State Morphological Parsing
The Lexicon and Morphotactics

• English derivational morphology is more complex than English inflectional morphology, so automata modeling English derivation tend to be quite complex.

big, bigger, biggest
cool, cooler, coolest, coolly
red, redder, reddest
clear, clearer, clearest, clearly, unclear, unclearly
happy, happier, happiest, happily
unhappy, unhappier, unhappiest, unhappily
real, unreal, really

An FSA for a fragment of English adjective morphology (#1)


3.2 Finite-State Morphological Parsing
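• FSA #1 recognizes all the listed adjectives, but it also accepts ungrammatical forms like *unbig, *redly, and *realest.
• FSA #1 is therefore revised into FSA #2, which divides the adjective roots into classes according to the affixes each class can take.
• This kind of complexity is to be expected of English derivation.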
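The overgeneration of FSA #1 can be reproduced at the morpheme level. The arc layout is an assumption, since the figure did not survive extraction: an optional un- arc, then an adjective root, then an optional -er, -est, or -ly. The ε-arc that makes un- optional is simulated by an ε-closure step.

```python
ADJ_ROOTS = {"big", "cool", "red", "clear", "happy", "real"}
SUFFIXES = {"-er", "-est", "-ly"}

# Assumed layout of FSA #1: q0 -un-> q1, q0 -ε-> q1, q1 -adj-root-> q2, q2 -suffix-> q3.
TRANSITIONS = {("q0", "un-"): "q1", ("q1", "adj-root"): "q2", ("q2", "suffix"): "q3"}
EPSILON = {"q0": {"q1"}}
FINAL = {"q2", "q3"}


def label_of(morpheme):
    """Map a morpheme to the arc label (sublexicon) it belongs to, or None."""
    if morpheme == "un-":
        return "un-"
    if morpheme in ADJ_ROOTS:
        return "adj-root"
    if morpheme in SUFFIXES:
        return "suffix"
    return None


def eps_closure(states):
    # One step suffices here because no ε-arcs are chained.
    closed = set(states)
    for s in states:
        closed |= EPSILON.get(s, set())
    return closed


def accepts(morphemes):
    states = eps_closure({"q0"})
    for m in morphemes:
        label = label_of(m)
        states = eps_closure({TRANSITIONS[(s, label)] for s in states if (s, label) in TRANSITIONS})
        if not states:
            return False
    return bool(states & FINAL)


# All three ungrammatical forms are accepted, which is the flaw FSA #2 addresses.
for word in (["un-", "big"], ["red", "-ly"], ["real", "-est"]):
    print(word, accepts(word))  # True for each
```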

An FSA for a fragment of English adjective morphology (#2)
3.2 Finite-State Morphological Parsing

An FSA for another fragment of English derivational morphology


3.2 Finite-State Morphological Parsing
• We can now use these FSAs to solve the problem of morphological recognition:
– Determining whether an input string of letters makes up a legitimate English word or not.
– We do this by taking the morphotactic FSAs and plugging each "sub-lexicon" into the FSA, i.e., expanding each arc (such as the reg-noun arc) with all the morphemes in that sub-lexicon.
– The resulting FSA can then be defined at the level of the individual letter.
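Plugging the sublexicons into the nominal-inflection FSA can be sketched by expanding every sublexicon arc into a letter-by-letter path through fresh intermediate states. The arc layout and the reg-noun entries (fox, cat, dog, drawn from the two lexicon slides) are assumptions. No orthographic rules are applied, so the sketch accepts *foxs and rejects foxes.

```python
from itertools import count

SUBLEXICON = {
    "reg-noun": ["fox", "cat", "dog"],
    "irreg-pl-noun": ["geese", "sheep", "mice"],
    "irreg-sg-noun": ["goose", "sheep", "mouse"],
    "plural": ["s"],
}
# Assumed morpheme-level arcs: (source, sublexicon label, destination).
MORPHOTACTICS = [("q0", "reg-noun", "q1"), ("q1", "plural", "q2"),
                 ("q0", "irreg-pl-noun", "q2"), ("q0", "irreg-sg-noun", "q2")]
FINAL = {"q1", "q2"}


def expand(morphotactics, sublexicon):
    """Replace each sublexicon arc with a letter-by-letter path through fresh states."""
    fresh = count()
    delta = {}  # (state, letter) -> set of next states
    for src, label, dst in morphotactics:
        for word in sublexicon[label]:
            state = src
            for i, ch in enumerate(word):
                nxt = dst if i == len(word) - 1 else f"t{next(fresh)}"
                delta.setdefault((state, ch), set()).add(nxt)
                state = nxt
    return delta


DELTA = expand(MORPHOTACTICS, SUBLEXICON)


def recognize(word):
    """Letter-level recognition, tracking all reachable states."""
    states = {"q0"}
    for ch in word:
        states = set().union(*(DELTA.get((s, ch), set()) for s in states))
        if not states:
            return False
    return bool(states & FINAL)


for w in ["cats", "geese", "sheeps", "foxs", "foxes"]:
    print(w, recognize(w))  # True, True, False, True, False
```

Accepting *foxs while rejecting foxes is exactly the gap that the orthographic rules must close.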