0% found this document useful (0 votes)

8 views64 pages

Computer Engineering Morphology Guide

The document outlines the vision and mission of a Computer Engineering department, emphasizing the development of competent citizens and fostering research and innovation. It covers various aspects of English morphology, including word formation, inflectional and derivational morphology, and techniques like stemming and lemmatization. Additionally, it discusses regular expressions and finite state automata, highlighting their roles in computational linguistics and morphological parsing.

Uploaded by

22301011ankit

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

8 views64 pages

Computer Engineering Morphology Guide

Uploaded by

22301011ankit

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Vision and Mission (Department)

Vision
To develop competent citizens who will be valuable contributors in the field of
Computer Engineering.

Mission
To create an environment which will stimulate research and innovation.
To provide students with comprehensive knowledge of the latest developments in
the field of Computer Engineering.
Module # 2

Word Level Analysis

CO : Students will be able to analyze the functioning of a language model for word-level text analysis.
English Morphology

► Morphology is the study of the way words are built up from smaller
meaning bearing units, morphemes.
► A morpheme is often defined as the minimal meaning-bearing unit in
a language.
► So for example the word fox consists of a single morpheme (the
morpheme fox) while the word cats consists of two: the morpheme
cat and the morpheme -s.

► There are two broad classes of Morphemes: Stems and Affixes.

English Morphology (Contd…)

► Affixes are further divided into prefixes, suffixes, infixes, and

circumfixes.
► Prefixes precede the stem, suffixes follow the stem, circumfixes do
both, and infixes are inserted inside the stem.
► For example, the word eats is composed of a stem eat and the suffix
-s.
► The word unbuckle is composed of a stem buckle and the prefix un-.
► English doesn’t have any good examples of circumfixes and infixes,
but many other languages do.
English Morphology (Contd…)

► A word can have more than one affix.

► For example, the word rewrites has the prefix re-, the stem write,
and the suffix -s.
► The word unbelievably has a stem (believe) plus three affixes (un-,
-able, and -ly).
► While English doesn’t tend to stack more than 4 or 5 affixes,
languages like Turkish can have words with 9 or 10 affixes, as we saw
above.
► Languages that tend to string affixes together like Turkish does are
called agglutinative languages.
► There are two broad (and partially overlapping) classes of ways to
form words from morphemes: inflection and derivation.
Inflectional Morphology

► Inflection is the combination of a word stem with a grammatical

morpheme, usually resulting in a word of the same class as the
original stem, and usually filling some syntactic function like
agreement.
► For example, English has the inflectional morpheme -s for marking
the plural on nouns, and the inflectional morpheme -ed for marking
the past tense on verbs.
► English has a relatively simple inflectional system; only nouns, verbs,
and sometimes adjectives can be inflected, and the number of
possible inflectional affixes is quite small.
Inflectional Morphology (Contd…)

► English nouns have only two kinds of inflection: an affix that marks
plural and an affix that marks possessive.

► While the regular plural is spelled -s after most nouns, it is spelled

–es after words ending in -s (ibis/ibises) , -z, (waltz/waltzes) -sh,
(thrush/thrushes) -ch, (finch/finches) and sometimes -x (box/boxes).
► Nouns ending in -y preceded by a consonant change the -y to -i
(butterfly/butterflies).
Inflectional Morphology (Contd…)

► The possessive suffix is realized by apostrophe + -s for regular

singular nouns (llama’s) and plural nouns not ending in -s (children’s)
and often by a lone apostrophe after regular plural nouns (llamas’)
and some names ending in -s or -z (Euripides’ comedies).
► English verbal inflection is more complicated than nominal inflection.
► First, English has three kinds of verbs; main verbs, (eat, sleep,
impeach), modal verbs (can, will, should), and primary verbs (be,
have, do)
► We will mostly be concerned with the main and primary verbs,
because it is these that have inflectional endings.
► Of these verbs a large class are regular, that is to say all verbs of this
class have the same endings marking the same functions.
Inflectional Morphology (Contd…)

► These regular verbs (e.g. walk, or inspect), have four morphological

forms, as follows:

► The irregular verbs are those that have some more or less
idiosyncratic (individual) forms of inflection
Derivational Morphology

► Derivation is the combination of a word stem with a grammatical

morpheme, usually resulting in a word of a different class, often with
a meaning hard to predict exactly.
► A very common kind of derivation in English is the formation of new
nouns, often from verbs or adjectives. This process is called
nominalization.
Derivational Morphology (Contd…)

► Adjectives can also be derived from nouns and verbs.

► Here are examples of a few suffixes deriving adjectives from nouns or
verbs.
Stemming and Lemmatization

► Stemming algorithms work by cutting off the end or the beginning of

the word, taking into account a list of common prefixes and suffixes
that can be found in an inflected word.
► This indiscriminate cutting can be successful in some occasions, but
not always, and that is why we affirm that this approach presents
some limitations.
Stemming and Lemmatization (Contd…)

► Lemmatization, on the other hand, takes into consideration the

morphological analysis of the words.
► To do so, it is necessary to have detailed dictionaries which the
algorithm can look through to link the form back to its lemma
Porter Stemmer

► STEMMING a simplified form of morphological analysis – simply find

the stem.
► Porter Stemmer is a simple rule-based algorithm for stemming

► Based on rules like:

► ATIONAL -> ATE (e.g., relational -> relate)

► The algorithm consists of five sets of rules, applied in order.

Porter Stemmer (Contd…)

► Definitions:
► CONSONANT: a letter other than A, E, I, O, U, and Y preceded by
consonant
► VOWEL: any other letter
► With this definition, all words are of the form:
► (C)(VC)m(V)
► C=string of one or more consonants (con+)
► V=string of one or more vowels
► M=measure of word or word part, when represented in the form (VC)
Porter Stemmer (Contd…)
Porter Stemmer (Contd…)

► The rules are of the form: (condition) S1 -> S2

► Where S1 and S2 are suffixes

► Conditions:
Porter Stemmer (Contd…)
Porter Stemmer (Contd…)
if the second or third rule of step 1b was used
Porter Stemmer (Contd…)
Porter Stemmer (Contd…)
Porter Stemmer (Contd…)
Porter Stemmer (Contd…)
Porter Stemmer (Contd…)

► Drawbacks:
► gas (noun) → ga
► gases (plural) → gase
► gasses (verb, present tense) → gass
► gassing (verb, present continuous) → gass
► gaseous (adjective) → gaseou
Regular Expressions

► A regular expression is a formula in a special language that is used for

specifying simple classes of strings. A string is a sequence of symbols;
for the purpose of most text-based search techniques, a string is any
sequence of alphanumeric characters (letters, numbers, spaces, tabs,
and punctuation).
► For these purposes a space is just a character like any other, and we
represent it with the symbol .
► Formally, a regular expression is an algebraic notation for
characterizing a set of strings.
► Thus they can be used to specify search strings as well as to define a
language in a formal way.
Examples of Regular Expressions
Regular Expressions Regular Set

(0 + 10*) {0, 1, 10, 100, 1000, 10000, … }

(010) {1, 01, 10, 010, 0010, …}

(0 + ε)(1 + ε) {ε, 0, 1, 01}

It would be set of strings of a’s and b’s of any length which also
(a+b)*
includes the null string i.e. {ε, a, b, aa , ab , bb , ba, aaa…….}

It would be set of strings of a’s and b’s ending with the string abb i.e.
(a+b)*abb
{abb, aabb, babb, aaabb, ababb, …………..}

It would be set consisting of even number of 1’s which also includes an

(11)*
empty string i.e. {ε, 11, 1111, 111111, ……….}

It would be set of strings consisting of even number of a’s followed by

(aa)*(bb)*b odd number of b’s i.e. {b, aab, aabbb, aabbbbb, aaaab, aaaabbb,
…………..}

It would be string of a’s and b’s of even length that can be obtained by
(aa + ab + ba + bb)* concatenating any combination of the strings aa, ab, ba and bb
including null i.e. {aa, ab, ba, bb, aaab, aaba, …………..}
Properties of regular sets
• If we do the union of two regular sets then the resulting set would also
be regular.
• If we do the intersection of two regular sets then the resulting set
would also be regular.
• If we do the complement of regular sets, then the resulting set would
also be regular.
• If we do the difference of two regular sets, then the resulting set would
also be regular.
• If we do the reversal of regular sets, then the resulting set would also
be regular.
• If we take the closure of regular sets, then the resulting set would also
be regular.
• If we do the concatenation of two regular sets, then the resulting set
would also be regular.
Regular Expressions (Contd…)

► The simplest kind of regular expression is a sequence of simple

characters.

► Regular expressions are case sensitive.

► This means that the pattern /woodchucks/ will not match the string
Woodchucks.
Regular Expressions (Contd…)

► The brackets can be used with the dash ( - ) to specify any one character in
a range.

► The square brackets can also be used to specify what a single character
cannot be, by use of the caret ^.
► If the caret (^) is the first symbol after the open square brace, the
resulting pattern is negated.
► For example, the pattern /[^a]/ matches any single character (including
special characters) except a.
Regular Expressions (Contd…)

► The use of square brackets solves our capitalization problem for

woodchucks.
► But we still haven’t answered our original question; how do we
specify both woodchuck and woodchucks?
► We can’t use the square brackets, because while they allow us to say
‘s or S’, they don’t allow us to say ‘s or nothing’.
► For this we use the question-mark, which means ‘the preceding
character or nothing’.
Regular Expressions (Contd…)
► The set of operators that allow us to say things like “some number of ‘a’s”
are based on the asterisk or *, commonly called the Kleene * (pronounced
“cleany star”).
► The Kleene star means ‘zero or more occurrences of the immediately
previous character or regular expression’.
► Recall that the regular expression for an individual digit was /[0-9]/.
► So the regular expression for an integer (a string of digits) is /[0-9] [0-9]*/.
(Why isn’t it just ) /[0-9]*/?
► Sometimes it’s annoying to have to write the regular expression for digits
twice, so there is a shorter way to specify ‘at least one’ of some character.
► This is the Kleene +, which means ‘one or more of the previous character’.
► Thus the expression is /[0-9]+/ the normal way to specify ‘a sequence of
digits’.
Regular Expressions (Contd…)
► Anchors are special characters that anchor regular expressions to
particular places in a string.
► The most common anchors are the caret ^ and the dollar-sign $.
► The caret ^ matches the start of a line. The pattern /^The/ matches
the word ‘The’ only at the start of a line.
► Thus there are three uses of the caret ^:
► to match the start of a line,
► as a negation inside of square brackets, and
► just to mean a caret.
► The dollar sign $ matches the end of a line.
► So the pattern is a useful pattern (␣$) for matching a space at the
end of a line.
Regular Expressions (Contd…)
► If we want to search for either the string cat or the string dog.
► Since we can’t use the square-brackets to search for ‘cat or dog’ (why
not?) we need a new operator, the disjunction operator, also called the
pipe symbol |.
► The pattern matches /cat | dog/ either the string cat or the string dog.
► Sometimes we need to use this disjunction operator in the midst of a larger
sequence.
► How can we specify both guppy and guppies?
► We cannot simply say /guppy | ies/ , because that would match only the
strings guppy and ies. This is because sequences like take precedence over
the disjunction operator .
► In order to make the disjunction operator apply only to a specific pattern,
we need to use the parenthesis operators (and).
Regular Expressions (Contd…)

► Operator precedence
Finite State Automata

► The regular expression is more than just a convenient metalanguage

for text searching.
► A regular expression is one way of describing a finite-state automaton
(FSA).
► Finite-state automata are the theoretical foundation of a good deal
of the computational work.
► Any regular expression can be implemented as a finite-state
automaton.
Finite State Automata (Contd…)
Finite State Automata (Contd…)
Finite State Transducers

► A transducer maps between one representation and another: a

finite-state transducer, or FST, is a type of finite automaton which
maps between two sets of symbols.
► The FST thus has a more general function than an FSA; where an FSA
defines a formal language by defining a set of strings, an FST defines
a relation between sets of strings.
► Another way of looking at an FST is as a machine that reads one
string and generates another.
Finite State Transducers (Contd…)
► Here’s a summary of this fourfold way of thinking about transducers:
• FST as recognizer: a transducer that takes a pair of strings as input
and outputs; accept if the string-pair is in the string-pair language, and
reject if it is not.
• FST as generator: a machine that outputs pairs of strings of the
language. Thus, the output is a yes or no, and a pair of output strings.
• FST as translator: a machine that reads a string and outputs another
string.
• FST as set relater: a machine that computes relations between sets.
Finite State Transducers (Contd…)

► Let’s now turn to the task of morphological parsing.

► Given the input cats, for instance, we’d like to output cat +N +PI,
telling us that cat is a plural noun.
► In the finite-state morphology paradigm that we use, we represent a
word as a correspondence between a lexical level, which represents a
concatenation of morphemes making up a word,
► and the surface level, which represents the concatenation of letters
making up the actual spelling of the word.
Finite State Transducers (Contd…)
► Transducers are similar to finite state machines. They have states ,
inputs and transition between states. They also have output.
Expand previous FST so that it performs the following
transformations:
Finite State Transducers


WFST
We have FST below, which accept the strings {cat,cap}
WFST
► A weighted finite state transducer takes probabilities into account
► WFST are used in many NLP applications, such as speech recognition.
► WFST includes weight for each transition. this weight for each transition
tells you how frequent this transition is.
N-Gram

► In the fields of computational linguistics and probability, an n-gram is

a contiguous sequence of n items from a given sample of text or
speech. The items can be phonemes, syllables, letters, words or base
pairs according to the application.
► The word N-gram approach to spelling error detection and correction
was proposed by Mays et al. (1991).
► The idea is to generate every possible misspelling of each word in a
sentence either just by typographical modifications (letter insertion,
deletion, substitution), or by including homophones as well, (and
presumably including the correct spelling), and then choosing the
spelling that gives the sentence the highest prior probability.
N-gram Sensitivity to the Training Corpus
► In speech recognition, input may be noisy and this can lead to wrong
speech-to-text conversion.
► N-gram models are used in machine translation to produce more natural
sentences in the target language.
► N-gram models are generally at word-level. It has also been used at
character level to do stemming, i.e. separate the root word from the
suffix.
N-gram Sensitivity to the Training Corpus
The N-gram model is statistical model, like other statistical model its dependent on
training corpus.
There are two implications
the probabilities often encode specific facts about a given training corpus
N-grams do better and better job of modeling the training corpus as we increase the
value of N
N-gram Sensitivity to the Training Corpus
• The longer the context on which we train the model, the more coherent
the sentences. In the unigram sentences, there is no coherent relation
between words or any sentence-final punctuation.

• The bigram sentences have some local word-to-word coherence

(especially if we consider that punctuation counts as a word).

• The trigram and quadrigram sentences are beginning to look a lot like
Shakespeare. Indeed, a careful investigation of the quadrigram sentences
shows that they look a little too much like Shakespeare.
N-gram
• We will use corpora to calculate probabilities
• We will calculate probabilities of

unigram 1 word units {salt, table, lock}

1 character units {t, p, s}

bigram 2 word units {salt and, table and, lock and}

2 character units {th, he, ho}

trigram 3 word units {salt and pepper, table and chairs, lock and
key}
3 character units {the, tho, thp}
Unknown Words: Open Versus Closed Vocabulary Tasks
Closed Vocabulary
Sometimes we have a language task in which we know all the words that can occur,
Closed vocabulary and hence we know the vocabulary size V in advance.
The closed vocabulary assumption is the assumption that we have such a lexicon and that
the test set can only contain words from this lexicon. The closed vocabulary task thus
assumes there are no unknown words.
The number of unseen words grows constantly, so we can’t possibly know in advance
exactly how many there are, and we’d like our model to do something reasonable with
them. We call these oov unseen events unknown words, or out of vocabulary (OOV)
words.
The percentage of OOV words that appear in the test set is called the OOV rate.
Unknown Words: Open Versus Closed Vocabulary Tasks
Open Vocabulary
An open vocabulary system is one in which we model these potential unknown words
in the test set by adding a pseudo-word called <UNK>.
We can train the probabilities of the unknown word model <UNK> as follows:
1. Choose a vocabulary (word list) that is fixed in advance.
2. Convert in the training set any word that is not in this set (any OOV word) to the
unknown word token <UNK> in a text normalization step.
3. Estimate the probabilities for <UNK> from its counts just like any other regular
word in the training set.

An alternative that doesn’t require choosing a vocabulary is to replace the first

occurrence of every word type in the training data by <UNK>.
Evaluating n-Grams: Perplexity
❏ The best way to evaluate the performance of a language model is to embed it in an
application and measure the total performance of the application.
❏ Such end-to-end evaluation is called extrinsic evaluation, and also sometimes called in
vivo evaluation
❏ Extrinsic evaluation is the only way to know if a particular improvement in a component is
really going to help the task at hand.
❏ Thus, for speech recognition, we can compare the performance of two language models by
running the speech recognizer twice, once with each language model, and seeing which
gives the more accurate transcription.
Evaluating n-Grams: Perplexity
❏ End-to-end evaluation is often very expensive; evaluating a large speech recognition test set,
for example, takes hours or even days.
❏ Thus, we would like a metric that can be used to quickly evaluate potential improvements in a
language model.
❏ An intrinsic evaluation metric is one that measures the quality of a model independent of any
application. Perplexity is the most common intrinsic evaluation metric for N-gram language
models.
❏ While an (intrinsic) improvement in perplexity does not guarantee an (extrinsic) improvement
in speech recognition performance (or any other end-to-end metric), it often correlates with
such improvements.
❏ Thus, it is commonly used as a quick check on an algorithm, and an improvement in perplexity
can then be confirmed by an end-to-end evaluation.
Evaluating n-Grams: Perplexity
The intuition of perplexity is that given two probabilistic models, the better model is the one
that has a tighter fit to the test data or that better predicts the details of the test data.
We can measure better prediction by looking at the probability the model assigns to the test
data; the better model will assign a higher probability to the test data.
the perplexity (sometimes called PP for short) of a language model on a test set is a function of
the probability that the language model assigns to that test set.
For a test set W = w1,W2 -..wn, the perplexity is the probability of the test set, normalized by
the number of words:

Note : Inverse in Eq. The higher the conditional

probability of the word sequence, the lower the
perplexity.
Thus, minimizing perplexity is equivalent to
maximizing the test set probability according to
the language model.
Smoothing
There is a major problem with the maximum likelihood estimation process we
have seen for training the parameters of an n-gram model.
This is the problem of sparse data data caused by the fact that our maximum
likelihood estimate was based on a particular set of training data.
For any n-gram that occurred a sufficient number of times, we might have a
good estimate of its probability.
But because any corpus is limited, some perfectly acceptable English word
sequences are bound to be missing from it.
This missing data means that the n-gram matrix for any given training corpus is
bound to have a very large number of cases of putative “zero probability
n-grams” that should really have some non-zero probability.
Furthermore, the MLE method also produces poor estimates when the counts
are non-zero but still small.
Smoothing
❏ We need a method which can help get better estimates for these zero or low frequency counts.
Zero counts turn out to cause another huge problem.
❏ The perplexity metric defined above requires that we compute the probability of each test
sentence. But if a test sentence has an n-gram that never appeared in the training set, the
maximum likelihood estimate of the probability for this N-gram, and hence for the whole test
sentence, will be zero!
❏ This means that in order to evaluate our language models, we need to modify the MLE method
to assign some non-zero probability to any n-gram, even one that was never observed in
training.
❏ For these reasons, we'll want to modify the maximum likelihood estimates for computing
n-gram probabilities, focusing on the n-gram events that we incorrectly assumed had zero
probability.
❏ We use the term smoothing for such modifications that address the poor estimates that are due
to variability in small data sets.
❏ The name comes from the fact that (looking ahead a bit) we will be shaving a little bit of
probability mass from the higher counts and piling it instead on the zero counts, making the
distribution a little less jagged.
Smoothing Techniques
Smoothing techniques
• Laplace Smoothing
• Good-Turing Discounting
• Backoff
Laplace Smoothing
● One simple way to do smoothing might be just to take our matrix of bigram
counts, before we normalize them into probabilities, and add 1 to all the
counts. This algorithm is called Laplace smoothing, or Laplace’s Law.
● For Unigram
○ Add 1 to every word (type) count to ge an adjusted count *c
○ Normalize by N(#tokens) + V (#types)
○ Original unigram probability P(Wi) = Ci/N
○ New unigram probability P(Wi) = C+1/N+V

Word Level Analysis in NLP
No ratings yet
Word Level Analysis in NLP
18 pages
Understanding Morphology and Morphemes
No ratings yet
Understanding Morphology and Morphemes
118 pages
Word Level Analysis and Morphology
No ratings yet
Word Level Analysis and Morphology
78 pages
Understanding Morphology in NLP
No ratings yet
Understanding Morphology in NLP
39 pages
Word Level Analysis in NLP
No ratings yet
Word Level Analysis in NLP
97 pages
Understanding Morphology in NLP
No ratings yet
Understanding Morphology in NLP
8 pages
Inflectional vs Derivational Morphology
No ratings yet
Inflectional vs Derivational Morphology
25 pages
Word Level Analysis and Regex Techniques
No ratings yet
Word Level Analysis and Regex Techniques
114 pages
Word Structure and Morphology Analysis
No ratings yet
Word Structure and Morphology Analysis
126 pages
Morphological Analysis in NLP Techniques
No ratings yet
Morphological Analysis in NLP Techniques
47 pages
Regular Expression Basics and Patterns
No ratings yet
Regular Expression Basics and Patterns
29 pages
Finite State Automata and Morphology Guide
No ratings yet
Finite State Automata and Morphology Guide
125 pages
Morphology Stemming
No ratings yet
Morphology Stemming
82 pages
Understanding Morphology Basics
No ratings yet
Understanding Morphology Basics
6 pages
1.5 Specification of Tokens
No ratings yet
1.5 Specification of Tokens
24 pages
Understanding Text Processing and Morphology
No ratings yet
Understanding Text Processing and Morphology
56 pages
Inflectional vs. Derivational Morphology
No ratings yet
Inflectional vs. Derivational Morphology
25 pages
Lecture 2a
No ratings yet
Lecture 2a
35 pages
Word Processing Techniques in Computing
No ratings yet
Word Processing Techniques in Computing
92 pages
Understanding Morphological Analysis
No ratings yet
Understanding Morphological Analysis
29 pages
Understanding Morphology in Writing
No ratings yet
Understanding Morphology in Writing
7 pages
Word Level Morphological Analysis in NLP
No ratings yet
Word Level Morphological Analysis in NLP
49 pages
Understanding Morphological Elements
100% (1)
Understanding Morphological Elements
6 pages
Introduction to Morphology Concepts
No ratings yet
Introduction to Morphology Concepts
33 pages
Understanding Morphology in Linguistics
No ratings yet
Understanding Morphology in Linguistics
33 pages
Morphological Analysis in NLP
No ratings yet
Morphological Analysis in NLP
103 pages
Morphology, Syntax, and Semantics Overview
No ratings yet
Morphology, Syntax, and Semantics Overview
33 pages
Understanding Morphology in Linguistics
No ratings yet
Understanding Morphology in Linguistics
27 pages
Key Concepts in Language Modeling
No ratings yet
Key Concepts in Language Modeling
48 pages
Types of Morphology Explained
No ratings yet
Types of Morphology Explained
22 pages
An Introduction To English Morphology
No ratings yet
An Introduction To English Morphology
22 pages
Understanding Morphemes and Morphology
No ratings yet
Understanding Morphemes and Morphology
5 pages
NLP Unit 2 GPT-1
No ratings yet
NLP Unit 2 GPT-1
62 pages
Understanding Morphology in Linguistics
No ratings yet
Understanding Morphology in Linguistics
19 pages
Understanding Linguistic Morphology
No ratings yet
Understanding Linguistic Morphology
27 pages
MORPHOLOGY
No ratings yet
MORPHOLOGY
51 pages
Understanding Morphemes in Language
No ratings yet
Understanding Morphemes in Language
22 pages
Types of Morphemes in Morphology
No ratings yet
Types of Morphemes in Morphology
37 pages
Understanding Morphology in Linguistics
No ratings yet
Understanding Morphology in Linguistics
3 pages
Understanding Linguistic Morphology
No ratings yet
Understanding Linguistic Morphology
61 pages
Understanding Morphology in NLP
No ratings yet
Understanding Morphology in NLP
52 pages
Understanding Morphology in Linguistics
No ratings yet
Understanding Morphology in Linguistics
5 pages
Understanding Morphemes and Their Types
No ratings yet
Understanding Morphemes and Their Types
71 pages
Understanding Morphemes and Their Types
No ratings yet
Understanding Morphemes and Their Types
22 pages
Understanding Morphemes and Word Formation
No ratings yet
Understanding Morphemes and Word Formation
7 pages
Understanding Morphemes and Their Types
No ratings yet
Understanding Morphemes and Their Types
13 pages
Understanding Computational Morphology
100% (1)
Understanding Computational Morphology
12 pages
Understanding English Morphology Basics
No ratings yet
Understanding English Morphology Basics
24 pages
Understanding Morphemes and Word Structure
No ratings yet
Understanding Morphemes and Word Structure
5 pages
Understanding Morphemes in Morphology
No ratings yet
Understanding Morphemes in Morphology
12 pages
Trisyllabic Laxing in English Morphology
No ratings yet
Trisyllabic Laxing in English Morphology
23 pages
Finite-State Morphology in NLP
100% (1)
Finite-State Morphology in NLP
55 pages
Understanding Zero Morphemes in English
No ratings yet
Understanding Zero Morphemes in English
11 pages
Morphemes in the Word "Monster"
No ratings yet
Morphemes in the Word "Monster"
13 pages
Stemming and Lemmatization in NLP
No ratings yet
Stemming and Lemmatization in NLP
20 pages
Free vs. Bound Morphemes Explained
No ratings yet
Free vs. Bound Morphemes Explained
16 pages
Word Level Analysis and Morphology
No ratings yet
Word Level Analysis and Morphology
54 pages
Understanding Morphology in Linguistics
No ratings yet
Understanding Morphology in Linguistics
12 pages
Theory of Morphology
No ratings yet
Theory of Morphology
25 pages
CH 2. HADOOP-Mapreduce.
No ratings yet
CH 2. HADOOP-Mapreduce.
102 pages
CH 1
No ratings yet
CH 1
40 pages
Chap 6
No ratings yet
Chap 6
25 pages
DL Practice Q IA1
No ratings yet
DL Practice Q IA1
1 page
Chapter # 1
No ratings yet
Chapter # 1
112 pages
Chapter # 2
No ratings yet
Chapter # 2
127 pages
IoT Practice Questions and Concepts
No ratings yet
IoT Practice Questions and Concepts
1 page
Overview of Information Retrieval Systems
No ratings yet
Overview of Information Retrieval Systems
7 pages
Chapter # 4
No ratings yet
Chapter # 4
56 pages
Discourse and Pragmatic Analysis Guide
No ratings yet
Discourse and Pragmatic Analysis Guide
58 pages
Chapter # 3
No ratings yet
Chapter # 3
18 pages
Computer Engineering POS Tagging Guide
No ratings yet
Computer Engineering POS Tagging Guide
71 pages
Understanding IP Mobility and Mobile IPv6
No ratings yet
Understanding IP Mobility and Mobile IPv6
56 pages
Understanding Loaders and Linkers
100% (1)
Understanding Loaders and Linkers
64 pages
Data Structures in Assemblers Explained
No ratings yet
Data Structures in Assemblers Explained
64 pages
Knowledge-Based Agents and Logic
No ratings yet
Knowledge-Based Agents and Logic
80 pages
Problem Solving in AI: Search Methods
No ratings yet
Problem Solving in AI: Search Methods
72 pages
Understanding Percept Sequences in AI
No ratings yet
Understanding Percept Sequences in AI
29 pages
DBSCAN Online Calculator Overview
No ratings yet
DBSCAN Online Calculator Overview
50 pages
Introduction to Linguistics and Phonetics
100% (1)
Introduction to Linguistics and Phonetics
127 pages
Tunisian Arabic Verbs Dictionary
50% (2)
Tunisian Arabic Verbs Dictionary
47 pages
Understanding Noun Suffixes
100% (1)
Understanding Noun Suffixes
94 pages
Text Mining for AI Lecture Summary
No ratings yet
Text Mining for AI Lecture Summary
17 pages
Morphological Awareness in English Teaching
No ratings yet
Morphological Awareness in English Teaching
7 pages
Understanding Morphemes and Their Types
No ratings yet
Understanding Morphemes and Their Types
9 pages
Exercises
No ratings yet
Exercises
65 pages
Quranic Grammar Handouts Overview
No ratings yet
Quranic Grammar Handouts Overview
51 pages
Inflectional vs. Derivational Suffixes
No ratings yet
Inflectional vs. Derivational Suffixes
10 pages
Begg172 BOOK by Perplexity. Ai
No ratings yet
Begg172 BOOK by Perplexity. Ai
11 pages
EU Legal Translation Manual: Ukrainian Guide
No ratings yet
EU Legal Translation Manual: Ukrainian Guide
580 pages
English Grammar Terms Glossary
No ratings yet
English Grammar Terms Glossary
8 pages
Introduction to Language and Linguistics
100% (1)
Introduction to Language and Linguistics
33 pages
Text Book of Balochi
No ratings yet
Text Book of Balochi
234 pages
Kuwaiti Arabic
100% (1)
Kuwaiti Arabic
305 pages
K-5 Common Core ELA Standards
No ratings yet
K-5 Common Core ELA Standards
6 pages
Middle English Grammar Overview
No ratings yet
Middle English Grammar Overview
6 pages
Tagalog Morphology's Impact on PhilE
No ratings yet
Tagalog Morphology's Impact on PhilE
8 pages
Georgia Kindergarten ELA Standards
No ratings yet
Georgia Kindergarten ELA Standards
6 pages
Understanding Word Classes: Nouns Explained
No ratings yet
Understanding Word Classes: Nouns Explained
34 pages
Meaning of Muhammad Abdullah
No ratings yet
Meaning of Muhammad Abdullah
18 pages
Etymology and Lexicology of English
No ratings yet
Etymology and Lexicology of English
33 pages
Understanding Language: Key Concepts
No ratings yet
Understanding Language: Key Concepts
45 pages
Harvard Linguistics 110 - Class 14 Morphology I
No ratings yet
Harvard Linguistics 110 - Class 14 Morphology I
6 pages
Oxford Learner's Pocket Dictionary PDF
No ratings yet
Oxford Learner's Pocket Dictionary PDF
2 pages
Summary of Language Study Concepts
No ratings yet
Summary of Language Study Concepts
72 pages
New Branch Proposal for Cariban Languages
No ratings yet
New Branch Proposal for Cariban Languages
26 pages
Overview of English Word Classes
No ratings yet
Overview of English Word Classes
2 pages
Slavic Resultatives and Aspect Analysis
No ratings yet
Slavic Resultatives and Aspect Analysis
45 pages
Class II Monthly Syllabus 2024-25
No ratings yet
Class II Monthly Syllabus 2024-25
10 pages

Computer Engineering Morphology Guide

Uploaded by

Computer Engineering Morphology Guide

Uploaded by

Vision and Mission (Department)

Word Level Analysis

► There are two broad classes of Morphemes: Stems and Affixes.

► Affixes are further divided into prefixes, suffixes, infixes, and

► A word can have more than one affix.

► Inflection is the combination of a word stem with a grammatical

► While the regular plural is spelled -s after most nouns, it is spelled

► The possessive suffix is realized by apostrophe + -s for regular

► These regular verbs (e.g. walk, or inspect), have four morphological

► Derivation is the combination of a word stem with a grammatical

► Adjectives can also be derived from nouns and verbs.

► Stemming algorithms work by cutting off the end or the beginning of

► Lemmatization, on the other hand, takes into consideration the

► STEMMING a simplified form of morphological analysis – simply find

► Based on rules like:

► The algorithm consists of five sets of rules, applied in order.

► The rules are of the form: (condition) S1 -> S2

► A regular expression is a formula in a special language that is used for

(0 + 10*) {0, 1, 10, 100, 1000, 10000, … }

(0*10*) {1, 01, 10, 010, 0010, …}

(0 + ε)(1 + ε) {ε, 0, 1, 01}

It would be set consisting of even number of 1’s which also includes an

It would be set of strings consisting of even number of a’s followed by

► The simplest kind of regular expression is a sequence of simple

► Regular expressions are case sensitive.

► The use of square brackets solves our capitalization problem for

► The regular expression is more than just a convenient metalanguage

► A transducer maps between one representation and another: a

► Let’s now turn to the task of morphological parsing.

► In the fields of computational linguistics and probability, an n-gram is

• The bigram sentences have some local word-to-word coherence

unigram 1 word units {salt, table, lock}

bigram 2 word units {salt and, table and, lock and}

An alternative that doesn’t require choosing a vocabulary is to replace the first

Note : Inverse in Eq. The higher the conditional

You might also like

(010) {1, 01, 10, 010, 0010, …}