Understanding English Morphology

Uploaded by

Kranti Gajmal

Word Level Processing

• In linguistics, morphology is the study of the

internal structure of words, focusing on how
smaller units of meaning, known as morphemes,
combine to form words.
• Think of it as dissecting words to understand
their building blocks and how they connect.
Morphemes

• The smallest units of meaning within a word.

Example: In "unbreakable," "un-," "break," and "-able"
are morphemes.
– Free vs. Bound:
• Free morphemes: Can stand alone as words
(e.g., "book," "run").
• Bound morphemes: Must attach to another
morpheme to form a word (e.g., "un-," "-able").
Morphemes : Types
• Prefixes:
– Added to the beginning of a word (e.g., "un-,"
"re-").
• Suffixes:
– Added to the end of a word (e.g., "-able," "-ly").
• Infixes:
– Inserted inside a word stem. English has no regular
infixes; Tagalog "-um-" in "sumulat" (from "sulat,"
to write) is a standard example.
• Roots:
– The core meaning-carrying morpheme of a word
(e.g., "break" in "unbreakable").
Morphological Processes
• Inflection: Modifying a word to express
grammatical information like tense, number, or
case (e.g., "sing," "sings," "sung").
• Derivation: Creating new words from existing ones
by adding affixes (e.g., "happy" -> "unhappy").
• Compounding: Combining two or more words to
form a new word (e.g., "blackboard," "sunflower").
Morphology Types: Process

• Inflection:
– Modifying a word to express grammatical
information like tense, number, case, or
mood. Examples: "sing," "sings," "sung,"
"book," "books."
• Derivation:
– Creating new words from existing ones by
adding affixes (prefixes, suffixes, infixes).
Examples: "happy" -> "unhappy," "teach" ->
"teacher," "kind" -> "kindness."
Morphology Types: Process

• Compounding:
– Combining two or more words to form a new
word. Examples: "blackboard," "sunflower,"
"bookstore."
• Conversion:
– Changing the part of speech of a word
without adding affixes. Examples: "run"
(verb) -> "run" (noun), "fast" (adjective) ->
"fast" (adverb).
Morphology Types: Affix

• Prefixation:
– Adding an affix at the beginning of a word.
Examples: "un-," "re-," "non-."
• Suffixation:
– Adding an affix at the end of a word.
Examples: "-able," "-ly," "-ness."
Morphology Types: Affix

• Infixation:
– Adding an affix within a word stem.
Example: Tagalog "-um-" in "sumulat" (from "sulat,"
to write).
• Circumfixation:
– Adding affixes both at the beginning and end
of a word. Examples: "ge-" and "-t" in German
"gearbeitet" (worked).
Morphology Types: Word Form

• Agglutinative:
– Words built by adding single, transparent
morphemes ("glue") with clear meanings.
Examples: Turkish, Finnish.
• Fusional:
– Multiple grammatical features combined
within a single complex morpheme, making
analysis more intricate. Examples: Latin,
Sanskrit.
Morphology Types: Word Form

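• Ablaut:
– Internal vowel changes to alter word
meaning or grammatical information.
Examples: English "sing" / "sang" / "sung," Old English
strong verbs. Arabic root-and-pattern morphology
is a related non-concatenative process.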
• Reduplication:
– Repeating part of a word for emphasis or
grammatical function. Examples: Malay,
Tagalog.
Morphology Types: Inflection

• English:
– Primarily uses suffixes to mark grammatical
information like tense, number, and case.
Examples: "walk," "walks," "walked," "book,"
"books."
Morphology Types: Inflection

• Agglutinative:
– Many languages like Hindi, Tamil, and Kannada use
"gluing" suffixes to build up complex words with
specific meanings. Example: "kitaab" (book) + "on"
(oblique plural) = "kitaabon" (books, oblique case).
• Fusional:
– Some languages like Marathi and Malayalam
combine multiple grammatical features within a
single suffix, making analysis more complex.
Example: "chala" (he went) encompasses both past
tense and singular person.
Morphology Types: Derivation

• English:
– Primarily uses prefixes and suffixes to change
the meaning or part of speech of a word.
Examples: "unhappy," "playable,"
"conversion."
Morphology Types: Derivation

• Indian Languages:
– Reduplication: Many languages like Telugu and
Oriya repeat parts of words for emphasis or to
denote grammatical changes. Example: "chala-
chala" (going repeatedly).
– Internal changes: Some languages like Punjabi
alter vowel sounds or consonants within words
for derivation. Example: "padhna" (to read) vs.
"parh" (reading).
Morphology Types: Compounding

• English:
– Combines two or more words to form new
ones. Examples: "blackboard," "sunflower,"
"bookstore."
Morphology Types: Compounding

• Indian Languages:
– Tātkriya: Certain Indian languages like Sanskrit
and Hindi form compound verbs by linking nouns
or adjectives with verbs. Example: "jal-pīna" (to
drink water) from "jal" (water) and "pīna" (to
drink).
– Bahuvrīhi: An exocentric compound that describes
something possessing the quality named by its parts.
Example: Sanskrit "pītāmbara" (one who wears yellow
garments) from "pīta" (yellow) and "ambara" (garment).
Finite State Transducer
• A Finite State Transducer (FST) is a computational
model used in various areas like Natural Language
Processing (NLP) and formal language theory.
• It's essentially a state machine that takes an input
sequence (usually text or strings) and produces an
output sequence based on its internal rules and
transitions.
Components
• States:
– The machine exists in specific states at any given
time, representing its processing point.
• Transitions:
– Connections between states labeled with
input/output pairs. These pairs define how the
machine moves between states and what it
generates on each transition.
• Start and End states:
– Special states marking the beginning and end of the
processing sequence.
• An FST starts in the specified start state and reads
the input sequence one symbol at a time.
• Based on the current state and the read symbol, it
checks the defined transitions and moves to the
next state while generating the corresponding
output symbol.
• This process continues until the machine reaches
the end state and the entire input sequence is
processed.
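The processing loop described above can be sketched in a few lines of Python. This is a minimal illustration of a deterministic transducer, not a production FST library; the function name `run_fst`, the state names, and the toy transition table are all assumptions made for this example.

```python
# Transition table: (state, input symbol) -> (next state, output symbol).
# States q0..q4 and the symbols are invented for illustration.
TOY_TRANSITIONS = {
    ("q0", "c"): ("q1", "c"),
    ("q1", "a"): ("q2", "a"),
    ("q2", "t"): ("q3", "t"),
    ("q3", "+PL"): ("q4", "s"),  # lexical plural tag surfaces as "s"
}
TOY_START = "q0"
TOY_FINALS = {"q3", "q4"}


def run_fst(transitions, start, finals, symbols):
    """Return the output symbols, or None if the input is rejected."""
    state = start
    output = []
    for symbol in symbols:
        key = (state, symbol)
        if key not in transitions:
            return None  # no transition defined for this symbol
        state, out = transitions[key]
        output.append(out)
    # Accept only if the whole input was read and we stopped in an end state.
    return output if state in finals else None


print(run_fst(TOY_TRANSITIONS, TOY_START, TOY_FINALS, ["c", "a", "t", "+PL"]))
```

Because each (state, symbol) pair maps to exactly one transition, this sketch is deterministic; a non-deterministic version would store a list of possible transitions per key and explore each of them.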
Types

• Deterministic FSTs (DFSTs):

– For each state and input symbol, there's only one
possible transition and output. This offers
efficient and predictable processing.
• Non-deterministic FSTs (NFSTs):
– Allow multiple transitions and outputs from a
single state for a given input symbol. This
provides flexibility for handling ambiguity and
complex rules.
Applications

• Morphological parsing:
– Analyzes the structure of words into their
meaningful components (morphemes).
• Text-to-speech synthesis:
– Converts text into spoken language, accounting
for pronunciation rules and intonations.
Applications

• Machine translation:
– Translates text from one language to another,
considering grammatical and semantic
differences.
• Spell checking and correction:
– Identifies and corrects misspelled words based
on known patterns and rules.
Parsing

• We need at least the following to build a morphological parser:

– Lexicon: the list of stems and affixes, together with basic
information about them (Noun stem or Verb stem, etc.)
– Morphotactics: the model of morpheme ordering that
explains which classes of morphemes can follow other
classes of morphemes inside a word. E.g., the rule that
English plural morpheme follows the noun rather than
preceding it.
– Orthographic rules: these spelling rules are used to model
the changes that occur in a word, usually when two
morphemes combine (e.g., the y→ie spelling rule changes
city + -s to cities).
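The three ingredients listed above can be illustrated with a small generator for English -s forms. This is a sketch under stated assumptions: the lexicon entries, the tag names `+PL` and `+3SG`, and the helper names are hypothetical, and only two spelling rules (y→ie and e-insertion after sibilants) are modeled.

```python
import re

# Lexicon: stem -> stem class (toy entries chosen for illustration).
LEXICON = {"city": "N", "fox": "N", "cat": "N", "walk": "V"}

# Morphotactics: which suffix tags may follow which stem class.
MORPHOTACTICS = {"N": {"+PL"}, "V": {"+3SG"}}


def apply_orthography(stem):
    """Attach the -s suffix, applying two English spelling rules."""
    if re.search(r"[^aeiou]y$", stem):      # consonant + y: y -> ie
        return stem[:-1] + "ies"
    if re.search(r"(s|x|z|ch|sh)$", stem):  # sibilant ending: insert e
        return stem + "es"
    return stem + "s"


def generate(stem, tag):
    """Build a surface form, or raise ValueError for an illegal combination."""
    stem_class = LEXICON.get(stem)
    if stem_class is None:
        raise ValueError(f"unknown stem: {stem}")
    if tag not in MORPHOTACTICS[stem_class]:
        raise ValueError(f"{tag} cannot follow a {stem_class} stem")
    return apply_orthography(stem)


for stem, tag in [("city", "+PL"), ("fox", "+PL"), ("cat", "+PL"), ("walk", "+3SG")]:
    print(stem, tag, "->", generate(stem, tag))
```

Keeping the lexicon, the ordering constraints, and the spelling rules in separate structures mirrors the separation described above, so each can be extended without touching the others.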
Finite-State Lexicon
• A finite-state lexicon is a powerful tool in Natural
Language Processing (NLP) for representing and
analyzing the morphological structure of words.
• It leverages finite-state transducers (FSTs) to model
the processes by which morphemes (meaningful
units) combine to form words in a specific
language.
Finite-State Lexicon: Components

• Lexical entries: These represent individual morphemes

and their properties like meaning, part-of-speech, and
possible combinations with other morphemes.
• FST network: This network of interconnected states
and transitions encodes the morphological rules
defining how morphemes interact and combine to
form words.
• Input and output: The lexicon takes a word as input
and uses the FST network to analyze its morphemic
composition and generate an output representing its
morphological structure.
Combining FST Lexicon and Rules

• The power of FSTs is that the exact same cascade,
with the same state sequences, is used
– when the machine is generating the surface form
from the lexical tape, or
– when it is parsing the surface tape back into the
lexical tape.
• Parsing can be slightly more complicated than
generation, because of the problem of ambiguity.
– For example, foxes could be fox +V +3SG as
well as fox +N +PL.
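The ambiguity of "foxes" can be reproduced with a toy parser that returns every analysis the lexicon licenses. This is only a sketch: the stem entries, the tag strings, and the function names are assumptions, and the suffix stripping is deliberately loose (it does not check that e-insertion was actually required).

```python
# Toy lexicon: a stem may belong to more than one class (illustrative entries).
STEMS = {"fox": {"N", "V"}, "cat": {"N"}, "walk": {"N", "V"}}

# Lexical tags produced when an -s suffix follows a stem of each class.
SUFFIX_TAGS = {"N": "+N +PL", "V": "+V +3SG"}


def candidate_stems(surface):
    """Undo e-insertion ("-es") and plain "-s" suffixation."""
    stems = []
    if surface.endswith("es"):
        stems.append(surface[:-2])
    if surface.endswith("s"):
        stems.append(surface[:-1])
    return stems


def parse(surface):
    """Return all lexical analyses of a suffixed surface form."""
    analyses = []
    for stem in candidate_stems(surface):
        for stem_class in sorted(STEMS.get(stem, ())):
            analyses.append(f"{stem} {SUFFIX_TAGS[stem_class]}")
    return analyses


print(parse("foxes"))  # two analyses: noun plural and verb 3SG
print(parse("cats"))   # one analysis
```

A real system would pass both analyses on to a later stage, such as a part-of-speech tagger, which uses sentence context to choose between them.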
• Orthographic Rules: The Foundation of Written
Language
– Definition: Orthographic rules govern the
consistent and correct spelling, punctuation, and
formatting of written text within a language.
– Purpose: Ensure clarity, readability, and
adherence to conventions, facilitating effective
communication and understanding.
Orthographic Rules: Key Components

• Spelling:
– Guides the correct formation of words based on
accepted patterns and conventions.
– Examples: "receive" not "recieve," "necessary"
not "neccessary."
• Capitalization:
– Specifies when to use uppercase letters, such as
at the beginning of sentences and for proper
nouns.
Orthographic Rules: Importance

• Punctuation:
– Dictates the use of commas, periods, semicolons,
question marks, and other symbols to clarify
meaning and structure sentences.
– Clarify sentence structure:
Let’s eat, grandma. vs. Let’s eat grandma.
– Indicate sentence type:
Statement → She is here.
Question → Is she here?
Exclamation → She is here!
– Mark possession or omission:
It’s raining. → It is
John’s hat → hat belonging to John
– Separate clauses:
When I arrived, he was leaving.
I wanted to go; however, I stayed.
• Hyphenation:
– Determines when to break words across lines for
visual clarity and readability, and when to join
words or prefixes with a hyphen.
– "A high-speed chase" vs. "The chase was high
speed."
– re-enter (not reenter), anti-inflammatory, ex-president
Hyphenation in Morphology and NLP

In computational linguistics, hyphenation is:

• Handled during tokenization and morphological analysis

• Sometimes ambiguous (e.g., "re-cover" vs "recover")

FST-based analysers may include hyphen rules to preserve or normalize form.

Rule | Example
Use for compound adjectives before nouns | "long-term plan"
Avoid after -ly adverbs | "highly skilled worker" (no hyphen)
Use in ages as adjectives | "a five-year-old child"
Don’t hyphenate familiar compound nouns | "school bus", "software engineer"
Avoid multiple hyphens when not needed | Prefer “nonlinear” over “non-linear” if style allows

• Word Breaks:
– Guides how to divide text into individual words,
especially in languages without clear word
boundaries (e.g., Chinese, Japanese).
– Word breaks are points where one word ends
and another begins, or where a word can be split
(typically at the end of a line) in writing or
typesetting.
Word Breaks in NLP and Morphology

In Natural Language Processing, detecting word breaks is part of tokenization.

Examples:

• Input: "foxes"
• Output Tokens: fox + plural suffix -es (morphological break)

For languages without spaces:

• Input: "我喜欢吃饭" (Chinese)

• Output Tokens: 我 | 喜欢 | 吃饭 ("I | like | eating")
Orthographic Rules: Importance

• Text Preprocessing:
• Orthographic rules are essential for cleaning and
normalizing text data before further analysis.
• This includes tasks like:
– Correcting misspellings
– Converting text to lowercase or uppercase
– Removing extra spaces or punctuation
– Resolving ambiguities in word boundaries
Orthographic Rules: Importance

• Text Segmentation:
– Acquiring accurate segmentation into words,
sentences, or paragraphs relies on orthographic
rules.
• Lexical Analysis:
– Identifying and understanding individual words
depends on correct spelling and word formation
rules.
• Grammar Checking:
– Detecting grammatical errors often involves
recognizing violations of orthographic conventions.
Orthographic Rules: Importance

• Machine Translation:
– Generating grammatically correct and well-
formatted output in the target language
requires adherence to orthographic rules.
• Text-to-Speech Systems:
– Producing natural-sounding speech relies on
appropriate punctuation and pronunciation of
written text, guided by orthographic principles.
Tokenization

• Tokenization is the fundamental process of

splitting textual data into smaller, more
manageable units known as tokens.
• These tokens can be words, characters, sentences,
or any other meaningful element, depending on the
specific task and desired level of granularity.
Tokenization : Why?

• Improves computational efficiency:

– Breaking down larger text into smaller units
makes it easier and faster for NLP algorithms to
analyze and process the data.
• Reduces ambiguity:
– Tokenization can help clarify context and
eliminate ambiguities present in continuous text.
Tokenization : Why?

• Facilitates feature extraction:

– Tokens serve as the building blocks for various
NLP features like n-grams, word embeddings,
and part-of-speech tags.
• Prepares data for downstream tasks:
– Tokenized data is the initial input for many NLP
tasks such as machine translation, sentiment
analysis, text summarization, and question
answering.
Tokenization : Types
• Word tokenization: The most common type, splitting
text into individual words.
• Sentence tokenization: Divides text into individual
sentences.
• Character tokenization: Breaks down text into
individual characters, useful for certain language
models and analyses.
• Subword tokenization: Splits words into smaller
meaningful units like prefixes, suffixes, or
morphemes, particularly helpful for languages with
complex morphology or out-of-vocabulary words.
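Three of the tokenization types above can be sketched with the standard `re` module. The function names and the regular expressions are assumptions chosen for brevity; the sentence splitter in particular is naive and would break on abbreviations such as "Dr.". Subword tokenization usually needs a trained vocabulary and is not shown.

```python
import re


def sentence_tokenize(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(text):
    """Words (with an optional internal apostrophe) or single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


def char_tokenize(text):
    """Every character, including spaces, becomes a token."""
    return list(text)


text = "She is here. Is she here?"
print(sentence_tokenize(text))
print(word_tokenize("Let's eat, grandma."))
print(char_tokenize("fox"))
```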
Stemming

• Stemming refers to the process of transforming

words to their stems, which are the base forms that
carry the core meaning.
• It essentially aims to reduce words to their most
basic forms by removing prefixes, suffixes, or
inflections, while still preserving their inherent
meaning.
Stemming: Why?

• Improves performance of NLP tasks:

– By reducing word variations, stemming can
improve the accuracy and efficiency of
algorithms used for tasks like text classification,
information retrieval, and sentiment analysis.
• Reduces data sparsity:
– When dealing with massive amounts of text
data, stemming can help overcome the issue of
data sparsity by grouping words with the same
stem, leading to more robust statistical models.
Stemming: Types
• Rule-based stemming:
– Algorithms rely on handcrafted rules to identify
and remove affixes based on specific patterns.
Examples include the Porter, Snowball (Porter2), and
Lancaster stemmers.
• Dictionary-based and statistical stemming:
– Dictionary-based stemmers such as the Krovetz
Stemmer check candidate stems against a lexicon.
Statistical stemmers instead learn likely stems from
observed word frequencies and morphological
patterns in a corpus.
Stemming: Applications

• Search engines:
– Stemming can help search engines match queries
to relevant documents even if the exact search
terms are not present.
• Document clustering:
– Stemming can group similar documents together
by identifying their shared semantic core.
Stemming: Applications

• Spam filtering:
– Stemming can identify patterns in spam
messages by stripping away variations of
common spam keywords.
• Machine translation:
– Stemming can improve the accuracy of machine
translation by reducing word variations and
focusing on semantic similarities.
• The Porter Stemmer was developed by Martin
Porter in 1980 and is one of the most popular
stemming algorithms.
• It operates on the English language and goes
through a series of rules to strip off common
suffixes from words.
• The rules are designed to be simple and heuristic-
based, making the algorithm efficient and easy to
implement.
Porter Stemmer: How it works?

• Identifying Suffixes:
– The algorithm first classifies each letter in a
word as a consonant (C) or a vowel (V).
– Runs of consonants and runs of vowels are then
collapsed, so every word has the form
[C](VC)^m[V].
– The count m, called the measure, approximates the
number of syllables and gates many of the rules.
Porter Stemmer: How it works?

• Applying Rules:
– The stemmer then iterates through a set of pre-
defined rules, each targeting specific suffix
patterns.
– For example, one rule says "remove '-ing' if the
remaining stem contains a vowel," so "motoring"
becomes "motor" but "sing" is left unchanged.
– If a matching rule is found, the corresponding
suffix is removed from the word.
Porter Stemmer: How it works?

• Handling Exceptions and Ordering:

– The algorithm includes rules for handling
irregular cases and ensuring consistent
application.
– For example, after "-ed" or "-ing" is removed, a
stem ending in "at" gets its final "e" back
("conflated" -> "conflate").
– The rules generally run in a specific order,
ensuring the most relevant transformations are
applied first.
Porter Stemmer: How it works?

• Resulting Stem:
– After applying all relevant rules, the remaining
word form is considered the "stem."
– It's worth noting that the resulting stem might
not always be an actual word found in the
dictionary.
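To make the consonant/vowel pattern and the rule conditions concrete, here is a partial sketch of one Porter step. It covers only the first three rules of Step 1b (for "-eed", "-ed", and "-ing") and omits the clean-up rules that follow them in the full algorithm, so it is not a complete stemmer. The function names are assumptions made for this example.

```python
def is_consonant(word, i):
    """Porter's definition: 'y' counts as a vowel only after a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count VC sequences, i.e. the m in [C](VC)^m[V]."""
    pattern = "".join("C" if is_consonant(stem, i) else "V" for i in range(len(stem)))
    return sum(1 for a, b in zip(pattern, pattern[1:]) if a == "V" and b == "C")


def contains_vowel(stem):
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def step_1b_partial(word):
    """(m>0) EED -> EE; (*v*) ED -> removed; (*v*) ING -> removed."""
    if word.endswith("eed"):
        stem = word[:-3]
        return stem + "ee" if measure(stem) > 0 else word
    for suffix in ("ed", "ing"):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            return stem if contains_vowel(stem) else word
    return word


for w in ["agreed", "feed", "plastered", "bled", "motoring", "sing"]:
    print(w, "->", step_1b_partial(w))
```

For real use, an existing implementation such as the Porter stemmer shipped with NLTK is a safer choice than a hand-written one.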
Spelling Errors

• Dictionary-Based Methods:
– Lookup: Compare each word in the text against a
comprehensive dictionary. Flag words not found
as potential errors.
– Suggestions: Offer a list of correctly spelled
words that are similar to the flagged word,
based on:
• Edit distance (number of changes needed to
transform one word into another)
• Phonetic similarity (how words sound alike)
Spelling Errors: Rule Based
• Common Errors:
– Identify typical spelling mistakes (e.g., "teh" for "the",
"accomodate" for "accommodate") and correct them
based on predefined rules.
• Grammar Rules:
– Detect errors that violate grammatical rules (e.g.,
subject-verb agreement, plural forms).
• Contextual Clues:
– Utilize surrounding words and sentence structure to
infer correct spellings (e.g., "I went to the stare"
might be corrected to "I went to the store").
Spelling Errors: Statistical
• N-gram Models:
– Analyze word patterns and probabilities of letter
sequences to identify unusual combinations that
might indicate errors.
• Machine Learning:
– Train algorithms on large text corpora to learn
patterns and relationships between words,
enabling error detection and correction.
Spelling Errors: Hybrid
• Combining Methods: Often, multiple techniques
are combined to enhance accuracy.
– Dictionary-based lookup for quick identification
of straightforward errors.
– Rule-based methods for specific language rules
and patterns.
– Statistical techniques for handling contextual
nuances and complex errors.
Minimum Edit Distance

• The minimum edit distance between two strings

refers to the smallest number of edit operations
needed to transform one string into the other.
• These edit operations typically involve:
– Insertion: Adding a character to the string.
– Deletion: Removing a character from the string.
– Substitution: Replacing one character with
another in the string.
Minimum Edit Distance

• By calculating the minimum edit distance, we

essentially measure the similarity between two strings
based on the minimal number of changes required to
make them identical.
• This concept has numerous applications in various
fields, including:
– Spell checking
– Grouping different forms of the same word together
– Machine translation
– DNA sequencing
Minimum Edit Distance

• Calculating the minimum edit distance can be done

through various algorithms, with the most common
being dynamic programming.
• This method uses a table to store the minimum edit
distances for all possible sub-sequences of the two
strings, allowing for efficient computation and
reducing redundant calculations.
Minimum Edit Distance

• Here's an example to illustrate the concept:

String 1: "cat"
String 2: "cart"
• To transform "cat" into "cart", we need only one
edit operation: inserting the character 'r'.
• Therefore, the minimum edit distance between
these two strings is 1.
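The dynamic-programming table can be sketched as follows. This version assumes a cost of 1 for each insertion, deletion, and substitution (the Levenshtein distance); other cost schemes, such as charging 2 for a substitution, would change the results. The function name is an assumption.

```python
def min_edit_distance(source, target):
    """Levenshtein distance with unit costs, computed bottom-up."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = distance between source[:i] and target[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i characters
    for j in range(cols):
        dist[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[-1][-1]


print(min_edit_distance("cat", "cart"))        # 1
print(min_edit_distance("kitten", "sitting"))  # 3
```

The table has (len(source) + 1) × (len(target) + 1) cells and each cell is filled in constant time, so the running time is proportional to the product of the two string lengths.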
Human Morphological Processing

• Human morphological processing delves into the

fascinating inner workings of how our brains
analyze and understand the structure of words.
• It explores how we break down words into their
smaller meaningful units called morphemes and
combine them to create new or complex words.
Stages

• Morpheme identification: Recognizing morphemes

and their boundaries within a word. This stage
considers:
– Orthographic cues: Letter patterns and spelling
conventions (e.g., "-able" as a suffix).
– Morphological knowledge: Stored mental
dictionary of known morphemes and their
meanings.
– Contextual clues: Surrounding words and
sentence structure.
Stages

• Morpheme access:
– Retrieving the meaning and function of each
identified morpheme from the mental lexicon.
Stages

• Morpheme integration: Combining the meanings of

individual morphemes to form the overall meaning
of the complex word. This involves considering:
– Morpheme order: The order in which morphemes
are combined matters (e.g., "unbreakable" vs.
"break-up").
– Morpheme interactions: Some morphemes can
modify the meaning of others (e.g., "un-"
negates meaning).
Neurolinguistic Aspects
• Studies suggest specific brain regions are involved
in morphological processing, particularly in the left
hemisphere.
• Different areas handle different processing stages,
with some regions dedicated to morpheme
identification and others to meaning access and
integration.
Factors Influencing Processing
• Language complexity:
– Languages with richer morphology (more
complex words) might require more
sophisticated processing mechanisms.
• Frequency:
– More frequent words tend to be processed
faster and more efficiently.
• Individual differences:
– Age, education, and language familiarity can
affect processing speed and accuracy.
n-gram

• An n-gram is a contiguous sequence of n items from

a given sample of text or speech.
• These items can be characters, words, or other
units, depending on the context.
• N-grams are used in various natural language
processing (NLP) tasks, including language
modeling, machine translation, and text generation.
n-gram

• Words: In this case, an n-gram would be a sequence

of n consecutive words within a text.
– For example, "the quick brown fox" has 2-grams
like "the quick", "quick brown", and "brown fox".
• Letters: Here, an n-gram would be a sequence of n
consecutive letters.
– For example, "hello" has 2-grams like "he", "el",
and "ll".
n-gram

• Phonemes:
– These are the basic sound units in spoken
language. So, an n-gram would be a sequence of
n consecutive phonemes. For example, the word
"cat" (/kæt/) has the 2-grams "/kæ/" and "/æt/".
• Other elements:
– Depending on the application, n-grams can even
involve things like punctuation marks, syllables,
or base pairs in DNA sequences.
n-gram

• The value of n determines the type of n-gram:

– Unigrams: n = 1 (individual symbols like words or
letters)
– Bigrams: n = 2 (sequences of two adjacent
symbols)
– Trigrams: n = 3 (sequences of three adjacent
symbols)
– Higher-order n-grams: n > 3 (sequences of four
or more symbols)
n-gram

• Here's an example using words:

• Original Text: "The cat sat on the mat."
– Unigrams: "The", "cat", "sat", "on", "the", "mat."
– Bigrams: "The cat", "cat sat", "sat on", "on the",
"the mat."
– Trigrams: "The cat sat", "cat sat on", "sat on the",
"on the mat."
– 4-grams: "The cat sat on", "cat sat on the", "sat
on the mat."
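The lists above can be reproduced with a short sliding-window helper. The function name `ngrams` is an assumption, and whitespace splitting is used as a stand-in for proper tokenization, which is why the final token keeps its period.

```python
def ngrams(items, n):
    """Return all contiguous n-item windows as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


words = "The cat sat on the mat.".split()
for n in (1, 2, 3, 4):
    print(n, [" ".join(gram) for gram in ngrams(words, n)])

# The same helper works on characters.
print(["".join(gram) for gram in ngrams(list("hello"), 2)])
```

A sequence of k items yields k - n + 1 n-grams, so the six-word sentence gives 6 unigrams, 5 bigrams, 4 trigrams, and 3 4-grams.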
n-gram

• N-grams are commonly used in language modeling

to capture patterns and dependencies in a
sequence of words.
• They are also used in machine learning models for
tasks like text classification and sentiment analysis.
• By considering the context of surrounding words,
n-grams help capture the structure and meaning of
language in a more nuanced way than individual
words alone.
N-gram for spelling corrections

• Building an N-gram model for spelling corrections

involves using statistical information from a given
corpus to identify and correct spelling errors in
text.
N-gram for spelling corrections

• Corpus Collection:
– Collect a large corpus of text data. This corpus
should represent the language and context for
which you want to perform spelling corrections.
• Preprocessing:
– Clean and preprocess the corpus. Remove any
irrelevant characters, punctuation, and special
symbols.
– Convert all text to lowercase to ensure case-
insensitive matching.
N-gram for spelling corrections

• N-gram Extraction:
– Choose an appropriate value for 'n' (e.g.,
unigrams, bigrams, trigrams) based on the
context and the expected length of spelling
errors.
– Extract n-grams from the preprocessed corpus.
For each n-gram, keep track of its frequency in
the corpus.
N-gram for spelling corrections

• Building the Model:

– Create a model that stores the n-grams and their
frequencies.
– This model could be a dictionary, where the keys
are n-grams, and the values are their
corresponding frequencies.
N-gram for spelling corrections

• Identifying Spelling Errors:

– When a new text is input, break it into n-grams
using the same method applied to the training
corpus.
– Compare the n-grams from the input text with
the n-grams stored in your model.
– Identify n-grams in the input text that deviate
significantly from the expected frequencies
based on your model.
N-gram for spelling corrections

• Correction Suggestions:
– Suggest corrections based on the identified
errors. You can consider various methods, such
as:
• Recommending the most frequent n-gram in
the corpus for a given set of n-grams.
• Using edit distance metrics to find closest
matches.
• Implementing language-specific rules for
common misspellings.
N-gram for spelling corrections

• Implementation:
– Implement the spelling correction algorithm
using your N-gram model and suggestions.
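The steps above can be combined into a small end-to-end sketch. It makes several simplifying assumptions that are not in the slides: a word is flagged only when it is absent from the corpus vocabulary, candidates are limited to one edit (insertion, deletion, or substitution), and ties are broken by bigram count with the previous word, then unigram count. All function names and the toy corpus are hypothetical.

```python
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text and keep only alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_model(corpus):
    """Return unigram and bigram frequency tables."""
    words = preprocess(corpus)
    return Counter(words), Counter(zip(words, words[1:]))


def edits1(word):
    """All strings one insertion, deletion, or substitution away."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + replaces + inserts)


def correct(prev_word, word, unigrams, bigrams):
    """Keep known words; otherwise pick the candidate that best fits the context."""
    if word in unigrams:
        return word
    candidates = [w for w in edits1(word) if w in unigrams]
    if not candidates:
        return word
    return max(candidates, key=lambda w: (bigrams[(prev_word, w)], unigrams[w], w))


CORPUS = "I went to the store. The store was open. She looked at the stars."
UNIGRAMS, BIGRAMS = build_model(CORPUS)
print(correct("the", "stors", UNIGRAMS, BIGRAMS))
```

Here "stors" is one edit away from both "store" and "stars"; the bigram counts for "the store" and "the stars" decide between them. A realistic model would need a far larger corpus and would also have to handle real-word errors, which this vocabulary check cannot detect.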
N-gram for language model

• N-grams play a crucial role in building language

models, enabling them to predict and generate text
that resembles natural language.
N-gram for language model

• Collecting Data and Building N-gram Model:

– Corpus Selection: Gather a large corpus of text that
reflects the language you want to model (e.g., news
articles, books, social media posts).
– Tokenization: Break the text into individual words or
tokens.
– N-gram Generation: Extract N-grams of different
lengths (e.g., bigrams, trigrams, etc.) and count their
occurrences in the corpus.
– Storage: Store the N-gram counts in a data structure
like a dictionary or a trie for efficient retrieval.
N-gram for language model

• Language Modeling with N-grams:

– Probability Calculation: Estimate the probability of a
word or sequence of words occurring based on the
N-gram counts:
• P(w) = Number of times w appears in the corpus /
Total number of words in the corpus (unigram
probability)
• P(w_i | w_(i-1)) = Number of times w_i appears
after w_(i-1) / Number of times w_(i-1) appears
(bigram probability)
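The two estimates above translate directly into code. This sketch uses plain relative frequencies with no smoothing, so unseen bigrams get probability zero; the function names and the toy sentence are assumptions.

```python
from collections import Counter


def train_counts(tokens):
    """Count unigrams and adjacent word pairs."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))


def unigram_prob(word, unigrams):
    """P(w) = count(w) / total number of tokens."""
    return unigrams[word] / sum(unigrams.values())


def bigram_prob(prev, word, unigrams, bigrams):
    """P(w_i | w_(i-1)) = count(w_(i-1), w_i) / count(w_(i-1))."""
    if unigrams[prev] == 0:
        return 0.0  # the history word was never seen
    return bigrams[(prev, word)] / unigrams[prev]


tokens = "the cat sat on the mat".split()
unigrams, bigrams = train_counts(tokens)
print(unigram_prob("the", unigrams))               # 2 of 6 tokens
print(bigram_prob("the", "cat", unigrams, bigrams))  # 1 of 2 occurrences of "the"
print(bigram_prob("the", "dog", unigrams, bigrams))  # unseen pair
```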
N-gram for language model

• Language Modeling with N-grams:

– Prediction: Use N-gram probabilities to predict
the next word in a sequence, given previous
words.
– Generation: Generate new text by repeatedly
choosing the most probable word based on the
previous words, forming coherent sentences.
N-gram for language model

• Smoothing:
– Addressing Zero Probabilities: Handle N-grams
not seen in the training corpus by using
smoothing techniques like:
• Laplace smoothing: Add one to every count (add-k
smoothing uses a smaller constant k) to avoid zero
probabilities.
• Backoff smoothing: Fallback to lower-order N-
gram probabilities if higher-order counts are
zero.
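Add-k smoothing for bigrams can be sketched as below, with k = 1 giving Laplace smoothing. The function name, the parameter `k`, and the toy sentence are assumptions for this example; backoff is not shown.

```python
from collections import Counter


def smoothed_bigram_prob(prev, word, unigrams, bigrams, vocab_size, k=1.0):
    """(count(prev, word) + k) / (count(prev) + k * V)."""
    return (bigrams[(prev, word)] + k) / (unigrams[prev] + k * vocab_size)


tokens = "the cat sat on the mat".split()
unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
vocab = sorted(unigram_counts)  # 5 distinct words

seen = smoothed_bigram_prob("the", "cat", unigram_counts, bigram_counts, len(vocab))
unseen = smoothed_bigram_prob("the", "sat", unigram_counts, bigram_counts, len(vocab))
print(seen, unseen)  # (1 + 1) / (2 + 5) and (0 + 1) / (2 + 5)

# The smoothed probabilities over the vocabulary still sum to 1.
total = sum(
    smoothed_bigram_prob("the", w, unigram_counts, bigram_counts, len(vocab))
    for w in vocab
)
print(total)
```

The unseen pair "the sat" now receives a small non-zero probability, at the cost of lowering the estimate for pairs that were actually observed.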
• Spelling correction: Suggest corrections based on likely
word sequences.
• Machine translation: Help translate text by identifying
likely word combinations in the target language.
• Speech recognition: Improve accuracy by considering
word context and sequences.
• Text generation: Produce new text by repeatedly choosing
a likely next word given the preceding words.
• Question answering: Help answer questions based on
understanding of language patterns.
@mituskillologies

Thank you

20 pages
Introduction to Automata Theory
100% (1)
Introduction to Automata Theory
26 pages
Updated 5th and 6th Sem 2021 Scheme and Syllabus
No ratings yet
Updated 5th and 6th Sem 2021 Scheme and Syllabus
71 pages
B.Tech CSE Study & Evaluation Scheme
No ratings yet
B.Tech CSE Study & Evaluation Scheme
50 pages
Automata Theory: DFA and NFA Explained
No ratings yet
Automata Theory: DFA and NFA Explained
16 pages
Understanding Pushdown Automata
No ratings yet
Understanding Pushdown Automata
8 pages
MCA Third Semester Syllabus Overview
No ratings yet
MCA Third Semester Syllabus Overview
31 pages
God of Fury: A Dark MM College Romance (Legacy of Gods Book 5) Rina Kent Ebook Optimized PDF Reading
100% (1)
God of Fury: A Dark MM College Romance (Legacy of Gods Book 5) Rina Kent Ebook Optimized PDF Reading
39 pages
Verilog Smart Parking System Design
No ratings yet
Verilog Smart Parking System Design
7 pages
Finite Automata: Language Acceptance and Translation
No ratings yet
Finite Automata: Language Acceptance and Translation
4 pages
Sequence Detector Design Guide
No ratings yet
Sequence Detector Design Guide
25 pages
Finite Automata Overview for CSC312
No ratings yet
Finite Automata Overview for CSC312
11 pages
Finite Automata and Regular Languages
No ratings yet
Finite Automata and Regular Languages
158 pages
Introduction to Theoretical Computer Science
No ratings yet
Introduction to Theoretical Computer Science
90 pages
JNTUK R23 B.Tech CSE Syllabus Overview
No ratings yet
JNTUK R23 B.Tech CSE Syllabus Overview
29 pages
Mealy Machine for Pattern Recognition
No ratings yet
Mealy Machine for Pattern Recognition
54 pages
BCA 2nd Semester Syllabus Overview
No ratings yet
BCA 2nd Semester Syllabus Overview
34 pages
Understanding State Machines and Diagrams
No ratings yet
Understanding State Machines and Diagrams
44 pages
Theory of Computation Exam Questions
No ratings yet
Theory of Computation Exam Questions
11 pages
Compiler Design Overview - MCS 232
No ratings yet
Compiler Design Overview - MCS 232
5 pages
Introduction to Theory of Computation
No ratings yet
Introduction to Theory of Computation
29 pages
JFLAP Experiments in Automata Theory
No ratings yet
JFLAP Experiments in Automata Theory
30 pages
Bit-Stream Detection System Design
No ratings yet
Bit-Stream Detection System Design
20 pages
Data Flow Diagram Essentials
No ratings yet
Data Flow Diagram Essentials
2 pages
Digital Logic Design Lab Manual
No ratings yet
Digital Logic Design Lab Manual
44 pages
M.Sc. Computer Science Syllabus 2020-21
No ratings yet
M.Sc. Computer Science Syllabus 2020-21
23 pages
Lucene Internals: Data Structures Explained
No ratings yet
Lucene Internals: Data Structures Explained
32 pages
UOttawa CSI 3104: Formal Languages
100% (1)
UOttawa CSI 3104: Formal Languages
42 pages
Sequential Logic Circuits Overview
No ratings yet
Sequential Logic Circuits Overview
129 pages
Structure Charts in Software Development
No ratings yet
Structure Charts in Software Development
11 pages
Theory of Computation Course Overview
No ratings yet
Theory of Computation Course Overview
2 pages


