
Understanding English Morphology

Uploaded by Kranti Gajmal
Copyright © All Rights Reserved

Word Level Processing

• In linguistics, morphology is the study of the
internal structure of words, focusing on how
smaller units of meaning, known as morphemes,
combine to form words.
• Think of it as dissecting words to understand
their building blocks and how they connect.
Morphemes

• The smallest units of meaning within a word.
Example: In "unbreakable," "un-," "break," and
"-able" are morphemes.
– Free vs. Bound:
• Free morphemes: Can stand alone as words
(e.g., "book," "run").
• Bound morphemes: Must attach to another
morpheme to form a word (e.g., "un-," "-able").
Morphemes : Types
• Prefixes:
– Added to the beginning of a word (e.g., "un-,"
"re-").
• Suffixes:
– Added to the end of a word (e.g., "-able," "-ly").
• Infixes:
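– Inserted inside a word stem (e.g., Tagalog "-um-"
in "sumulat," from "sulat," "write"; English has
only informal cases such as "abso-bloody-lutely").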
• Roots:
– The core meaning-carrying morpheme of a word
(e.g., "break" in "unbreakable").
Morphological Processes
• Inflection: Modifying a word to express
grammatical information like tense, number, or
case (e.g., "sing," "sings," "sung").
• Derivation: Creating new words from existing ones
by adding affixes (e.g., "happy" -> "unhappy").
• Compounding: Combining two or more words to
form a new word (e.g., "blackboard," "sunflower").
Morphology Types: Process

• Inflection:
– Modifying a word to express grammatical
information like tense, number, case, or
mood. Examples: "sing," "sings," "sung,"
"book," "books."
• Derivation:
– Creating new words from existing ones by
adding affixes (prefixes, suffixes, infixes).
Examples: "happy" -> "unhappy," "teach" ->
"teacher," "kind" -> "kindness."
Morphology Types: Process

• Compounding:
– Combining two or more words to form a new
word. Examples: "blackboard," "sunflower,"
"bookstore."
• Conversion:
– Changing the part of speech of a word
without adding affixes. Examples: "run"
(verb) -> "run" (noun), "fast" (adjective) ->
"fast" (adverb).
Morphology Types: Affix

• Prefixation:
– Adding an affix at the beginning of a word.
Examples: "un-," "re-," "non-."
• Suffixation:
– Adding an affix at the end of a word.
Examples: "-able," "-ly," "-ness."
Morphology Types: Affix

• Infixation:
– Adding an affix within a word stem.
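Examples: Tagalog "-um-" in "sumulat" (from
"sulat"), informal English "abso-bloody-lutely."
(German umlaut is an internal vowel change, not
an infix.)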
• Circumfixation:
– Adding affixes both at the beginning and end
of a word. Examples: "ge-" and "-t" in German
"gearbeitet" (worked).
Morphology Types: Word Form

• Agglutinative:
– Words built by adding single, transparent
morphemes ("glue") with clear meanings.
Examples: Turkish, Finnish.
• Fusional:
– Multiple grammatical features combined
within a single complex morpheme, making
analysis more intricate. Examples: Latin,
Sanskrit.
Morphology Types: Word Form

• Ablaut:
– Internal vowel changes to alter word
meaning or grammatical information.
Examples: Arabic, Old English (cf. modern English
"sing"/"sang"/"sung").
• Reduplication:
– Repeating part of a word for emphasis or
grammatical function. Examples: Malay,
Tagalog.
Morphology Types: Inflection

• English:
– Primarily uses suffixes to mark grammatical
information like tense, number, and case.
Examples: "sing," "sings," "sung," "book,"
"books."
Morphology Types: Inflection

• Agglutinative:
– Many languages like Hindi, Tamil, and Kannada use
"gluing" suffixes to build up complex words with
specific meanings. Example: "kitaab" (book) + "-on"
(oblique plural) = "kitaabon" ("books," as in
"kitaabon mein," "in the books").
• Fusional:
– Some languages like Marathi combine multiple
grammatical features within a single suffix,
making analysis more complex. (Malayalam, a
Dravidian language, is agglutinative.)
Example: in "chala" (he went), a single ending
marks past tense together with masculine gender
and singular number.
Morphology Types: Derivation

• English:
– Primarily uses prefixes and suffixes to change
the meaning or part of speech of a word.
Examples: "unhappy," "playable,"
"conversion."
Morphology Types: Derivation

• Indian Languages:
– Reduplication: Many languages like Telugu and
Oriya repeat parts of words for emphasis or to
denote grammatical changes. Example: "chala-
chala" (going repeatedly).
– Internal changes: Some languages like Punjabi
alter vowel sounds or consonants within words
for derivation. Example: "marna" (to die) vs.
"maarna" (to kill), where lengthening the vowel
derives the transitive verb.
Morphology Types: Compounding

• English:
– Combines two or more words to form new
ones. Examples: "blackboard," "sunflower,"
"bookstore."
Morphology Types: Compounding

• Indian Languages:
– Tatpuruṣa: In Sanskrit and Hindi, the first
member of a compound often qualifies the second,
typically standing in a case relation to it.
Example: "jal-pān" (drinking of water;
refreshment) from "jal" (water) and "pān"
(drinking).
– Bahuvrīhi: Two words combine into an exocentric
compound that describes something outside the
compound itself. Example: "pītāmbar" (one whose
garment is yellow, an epithet of Krishna) from
"pīt" (yellow) and "ambar" (garment).
Finite State Transducer
• A Finite State Transducer (FST) is a computational
model used in various areas like Natural Language
Processing (NLP) and formal language theory.
• It's essentially a state machine that takes an input
sequence (usually text or strings) and produces an
output sequence based on its internal rules and
transitions.
Components
• States:
– The machine exists in specific states at any given
time, representing its processing point.
• Transitions:
– Connections between states labeled with
input/output pairs. These pairs define how the
machine moves between states and what it
generates on each transition.
• Start and final states:
– A designated start state where processing begins,
and one or more final (accepting) states in which
processing may successfully end.
• An FST starts in the specified start state and reads
the input sequence one symbol at a time.
• Based on the current state and the read symbol, it
checks the defined transitions and moves to the
next state while generating the corresponding
output symbol.
• This process continues until the entire input
sequence has been read; the input is accepted, and
the accumulated output returned, only if the
machine is then in a final state.
Types

• Deterministic FSTs (DFSTs):
– For each state and input symbol, there is at most
one possible transition and output. This makes
processing efficient and predictable.
• Non-deterministic FSTs (NFSTs):
– Allow multiple transitions and outputs from a
single state for a given input symbol. This
provides flexibility for handling ambiguity and
complex rules.
Examples
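The start-state, transition, and final-state behaviour described above can be illustrated with a minimal sketch of a deterministic FST. The machine, its states, and the tag symbols `+N`, `+SG`, `+PL` are hypothetical choices for illustration; it maps a lexical tape such as `c a t +N +PL` to the surface form "cats" and has no spelling rules yet.

```python
import string

# Toy deterministic FST (hypothetical). Each (state, input symbol) pair has at
# most one transition, given as (next_state, output_string).
TRANSITIONS = {(0, ch): (0, ch) for ch in string.ascii_lowercase}  # copy stem letters
TRANSITIONS[(0, "+N")] = (1, "")    # noun tag: no surface output
TRANSITIONS[(1, "+SG")] = (2, "")   # singular: no suffix
TRANSITIONS[(1, "+PL")] = (2, "s")  # plural: emit "s"
START_STATE = 0
FINAL_STATES = {2}


def transduce(symbols):
    """Read symbols left to right; return the output, or None if the input is rejected."""
    state, output = START_STATE, []
    for symbol in symbols:
        if (state, symbol) not in TRANSITIONS:
            return None  # no transition defined: reject
        state, emitted = TRANSITIONS[(state, symbol)]
        output.append(emitted)
    # Accept only if the whole input was read and the machine is in a final state.
    return "".join(output) if state in FINAL_STATES else None


print(transduce(["c", "a", "t", "+N", "+PL"]))  # cats
print(transduce(list("dog") + ["+N", "+SG"]))   # dog
print(transduce(list("cat")))                   # None
```

Because this toy machine only copies letters and appends "s," an input like `f o x +N +PL` would come out as "foxs"; fixing that is the job of the orthographic rules discussed under Parsing.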
Applications

• Morphological parsing:
– Analyzes the structure of words into their
meaningful components (morphemes).
• Text-to-speech synthesis:
– Converts text into spoken language, accounting
for pronunciation rules and intonations.
Applications

• Machine translation:
– Translates text from one language to another,
considering grammatical and semantic
differences.
• Spell checking and correction:
– Identifies and corrects misspelled words based
on known patterns and rules.
Parsing

• We need at least the following to build a morphological parser:
– Lexicon: the list of stems and affixes, together with basic
information about them (Noun stem or Verb stem, etc.)
– Morphotactics: the model of morpheme ordering that
explains which classes of morphemes can follow other
classes of morphemes inside a word. E.g., the rule that
English plural morpheme follows the noun rather than
preceding it.
– Orthographic rules: these spelling rules are used to model
the changes that occur in a word, usually when two
morphemes combine (e.g., the y→ie spelling rule changes
city + -s to cities).
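The three components above can be sketched as a tiny parser for English noun plurals and third-person verb forms. The lexicon entries, tag names, and helper functions are hypothetical; besides the y→ie rule, the sketch assumes an e-insertion rule after s, x, z, ch, sh (fox + -s = foxes). It undoes the spelling rules on the surface string, then checks the candidate stem against the lexicon and the morphotactics.

```python
# Lexicon (hypothetical toy entries): stem -> stem class (N = noun, V = verb).
LEXICON = {"cat": "N", "city": "N", "fox": "N", "walk": "V"}

# Morphotactics: a bare stem, or a stem followed by -s; the tag depends on the class.
BARE_TAG = {"N": "+SG", "V": "+INF"}
S_SUFFIX_TAG = {"N": "+PL", "V": "+3SG"}

SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh")


def ends_in_consonant_y(stem):
    return len(stem) >= 2 and stem[-1] == "y" and stem[-2] not in "aeiou"


def undo_spelling(surface):
    """Invert the orthographic rules: yield (candidate stem, has -s suffix) pairs."""
    yield surface, False                                  # no suffix at all
    if surface.endswith("ies") and ends_in_consonant_y(surface[:-3] + "y"):
        yield surface[:-3] + "y", True                    # y -> ie: city + -s = cities
    if surface.endswith("es") and surface[:-2].endswith(SIBILANT_ENDINGS):
        yield surface[:-2], True                          # e-insertion: fox + -s = foxes
    stem = surface[:-1]
    if surface.endswith("s") and not stem.endswith(SIBILANT_ENDINGS) and not ends_in_consonant_y(stem):
        yield stem, True                                  # plain -s: cat + -s = cats


def parse(surface):
    """Return every lexical analysis that the lexicon and rules allow."""
    analyses = []
    for stem, has_s in undo_spelling(surface):
        pos = LEXICON.get(stem)
        if pos is not None:
            tag = S_SUFFIX_TAG[pos] if has_s else BARE_TAG[pos]
            analyses.append(f"{stem} +{pos} {tag}")
    return analyses


print(parse("cities"))  # ['city +N +PL']
print(parse("foxes"))   # ['fox +N +PL']
print(parse("foxs"))    # []
```

Misspellings such as "foxs" or "citys" get no analysis because the plain -s path is blocked exactly where a spelling rule should have applied.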
Finite-State Lexicon
• A finite-state lexicon is a powerful tool in Natural
Language Processing (NLP) for representing and
analyzing the morphological structure of words.
• It leverages finite-state transducers (FSTs) to model
the processes by which morphemes (meaningful
units) combine to form words in a specific
language.
Finite-State Lexicon: Components

• Lexical entries: These represent individual morphemes
and their properties like meaning, part-of-speech, and
possible combinations with other morphemes.
• FST network: This network of interconnected states
and transitions encodes the morphological rules
defining how morphemes interact and combine to
form words.
• Input and output: The lexicon takes a word as input
and uses the FST network to analyze its morphemic
composition and generate an output representing its
morphological structure.
Combining FST Lexicon and Rules

• The power of FSTs is that the exact same cascade
with the same state sequences is used
– when the machine is generating the surface form
from the lexical tape, or
– when it is parsing the lexical tape from the
surface tape.
• Parsing can be slightly more complicated than
generation, because of the problem of ambiguity.
– For example, foxes could be fox +V +3SG as
well as fox +N +PL.
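The point that one rule set serves both directions, and that parsing can return several answers, can be shown with a small sketch. This is not a state machine: `generate` applies the rules (with an assumed e-insertion rule after sibilants), and `parse` simply runs them in reverse by enumerating every lexical form in a hypothetical toy lexicon and keeping those that generate the given surface string.

```python
# Hypothetical toy lexicon: "fox" is both a noun and a verb ("to fox", to baffle).
LEXICON = {"fox": ["N", "V"], "cat": ["N"], "walk": ["V"]}
TAGS = {"N": ["+SG", "+PL"], "V": ["+INF", "+3SG"]}
SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh")


def generate(stem, pos, tag):
    """Lexical -> surface: +PL and +3SG add -s, with e-insertion after sibilants."""
    if tag in ("+SG", "+INF"):
        return stem
    return stem + ("es" if stem.endswith(SIBILANT_ENDINGS) else "s")


def parse(surface):
    """Surface -> lexical: keep every lexical form that generate() maps to surface."""
    return [
        f"{stem} +{pos} {tag}"
        for stem, classes in LEXICON.items()
        for pos in classes
        for tag in TAGS[pos]
        if generate(stem, pos, tag) == surface
    ]


print(generate("fox", "N", "+PL"))   # foxes
print(generate("fox", "V", "+3SG"))  # foxes
print(parse("foxes"))                # ['fox +N +PL', 'fox +V +3SG']
```

Generation yields exactly one surface form per lexical input, while parsing "foxes" returns two analyses; choosing between them needs context beyond the word itself.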
• Orthographic Rules: The Foundation of Written
Language
– Definition: Orthographic rules govern the
consistent and correct spelling, punctuation, and
formatting of written text within a language.
– Purpose: Ensure clarity, readability, and
adherence to conventions, facilitating effective
communication and understanding.
Orthographic Rules: Key Components

• Spelling:
– Guides the correct formation of words based on
accepted patterns and conventions.
– Examples: "receive" not "recieve," "necessary"
not "neccessary."
• Capitalization:
– Specifies when to use uppercase letters, such as
at the beginning of sentences and for proper
nouns.
Orthographic Rules: Importance

• Punctuation:
– Dictates the use of commas, periods, semicolons,
question marks, and other symbols to clarify
meaning and structure sentences.
– Clarify sentence structure: "Let’s eat, grandma." vs.
"Let’s eat grandma."
– Indicate sentence type:
• Statement → She is here.
• Question → Is she here?
• Exclamation → She is here!
Orthographic Rules: Importance
– Mark possession or omission:
• It’s raining. → It is raining.
• John’s hat → the hat belonging to John
– Separate clauses:
• When I arrived, he was leaving.
• I wanted to go; however, I stayed.
• Hyphenation:
– Determines when to break words across lines, and
when to join words with a hyphen, for visual
clarity and readability.
– "A high-speed chase" vs. "The chase was high speed."
– "re-enter" (also spelled "reenter"), "anti-inflammatory,"
"ex-president"
Hyphenation in Morphology and NLP

In computational linguistics, hyphenation is:
• Handled during tokenization and morphological analysis.
• Sometimes ambiguous (e.g., "re-cover," to cover again, vs. "recover," to regain).

FST-based analysers may include hyphen rules to preserve or normalize the form.

Hyphenation rules and examples:
• Use for compound adjectives before nouns: "long-term plan"
• Avoid after -ly adverbs: "highly skilled worker" (no hyphen)
• Use in ages as adjectives: "a five-year-old child"
• Don’t hyphenate familiar compound nouns: "school bus," "software engineer"
• Avoid multiple hyphens when not needed: prefer "nonlinear" over "non-linear" if style allows

• Word Breaks:
– Guides how to divide text into individual words,
especially in languages without clear word
boundaries (e.g., Chinese, Japanese).
– Word breaks are points where one word ends
and another begins, or where a word can be split
(typically at the end of a line) in writing or
typesetting.
Word Breaks in NLP and Morphology

In Natural Language Processing, detecting word breaks is part of tokenization.

Examples:
• Input: "foxes"
– Output tokens: fox + plural suffix -es (morphological break)

For languages without spaces:
• Input: "我喜欢吃饭" (Chinese)
– Output tokens: 我 | 喜欢 | 吃饭 ("I | like | eating")
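One simple way to find word breaks in unspaced text is greedy forward maximum matching against a dictionary; the sketch below uses that technique with a hypothetical five-entry dictionary, falling back to single characters when nothing matches. Practical Chinese segmenters are usually statistical, and greedy matching can pick wrong boundaries when a longer dictionary word overlaps the true split.

```python
def max_match(text, dictionary):
    """Greedy forward maximum matching: at each position take the longest dictionary word."""
    max_len = max(len(word) for word in dictionary)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


DICTIONARY = {"我", "喜欢", "吃饭", "吃", "饭"}  # hypothetical toy dictionary
print(" | ".join(max_match("我喜欢吃饭", DICTIONARY)))  # 我 | 喜欢 | 吃饭
```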
Orthographic Rules: Importance

• Text Preprocessing:
– Orthographic rules are essential for cleaning and
normalizing text data before further analysis.
– This includes tasks like:
• Correcting misspellings
• Converting text to lowercase or uppercase
• Removing extra spaces or punctuation
• Resolving ambiguities in word boundaries
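Two of the preprocessing tasks listed above, case conversion and removal of extra spaces and punctuation, fit in a few lines. The function name `normalize` is hypothetical, and replacing every ASCII punctuation mark with a space is a simplifying assumption; spelling correction and word-boundary disambiguation are not attempted here.

```python
import re
import string

# Map every ASCII punctuation character to a space. Simplifying assumption:
# apostrophes and hyphens inside words are split too ("Let's" -> "let s").
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(text):
    text = text.lower()                       # case folding
    text = text.translate(_PUNCT_TO_SPACE)    # remove punctuation
    return re.sub(r"\s+", " ", text).strip()  # collapse extra whitespace


print(normalize("  Let's eat,   Grandma!! "))  # let s eat grandma
```

Note that this normalization erases the comma in "Let's eat, grandma," which is exactly the kind of distinction the punctuation rules above protect; how aggressively to normalize depends on the downstream task.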
Orthographic Rules: Importance

• Text Segmentation:
– Accurate segmentation into words, sentences,
or paragraphs relies on orthographic rules.
• Lexical Analysis:
– Identifying and understanding individual words
depends on correct spelling and word formation
rules.
• Grammar Checking:
– Detecting grammatical errors often involves
recognizing violations of orthographic conventions.
Orthographic Rules: Importance

• Machine Translation:
– Generating grammatically correct and well-
formatted output in the target language
requires adherence to orthographic rules.
• Text-to-Speech Systems:
– Producing natural-sounding speech relies on
appropriate punctuation and pronunciation of
written text, guided by orthographic principles.
Tokenization

• Tokenization is the fundamental process of
splitting textual data into smaller, more
manageable units known as tokens.
• These tokens can be words, characters, sentences,
or any other meaningful element, depending on the
specific task and desired level of granularity.
Tokenization
Tokenization : Why?

• Improves computational efficiency:
– Breaking down larger text into smaller units
makes it easier and faster for NLP algorithms to
analyze and process the data.
• Reduces ambiguity:
– Tokenization can help clarify context and
reduce some of the ambiguity present in
continuous text.
Tokenization : Why?

• Facilitates feature extraction:
– Tokens serve as the building blocks for various
NLP features like n-grams, word embeddings,
and part-of-speech tags.
• Prepares data for downstream tasks:
– Tokenized data is the initial input for many NLP
tasks such as machine translation, sentiment
analysis, text summarization, and question
answering.
Tokenization : Types
• Word tokenization: The most common type, splitting
text into individual words.
• Sentence tokenization: Divides text into individual
sentences.
• Character tokenization: Breaks down text into
individual characters, useful for certain language
models and analyses.
• Subword tokenization: Splits words into smaller
meaningful units like prefixes, suffixes, or
morphemes, particularly helpful for languages with
complex morphology or out-of-vocabulary words.
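The four tokenization types above can be sketched with the standard `re` module. All function names are hypothetical; the sentence splitter is a naive rule that would wrongly split after abbreviations such as "Dr.", and the subword splitter is a simplified greedy longest-match scheme in the spirit of WordPiece (without its "##" continuation markers) using a toy vocabulary.

```python
import re


def word_tokens(text):
    # Words are runs of letters/digits; each punctuation mark is its own token.
    return re.findall(r"\w+|[^\w\s]", text)


def sentence_tokens(text):
    # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def char_tokens(text):
    return list(text)


def subword_tokens(word, vocab):
    """Greedy longest-match-first split into pieces from vocab."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:  # no vocabulary entry matches at this position
            return ["[UNK]"]
        pieces.append(word[start:end])
        start = end
    return pieces


text = "Is she here? She is here!"
print(word_tokens(text))      # ['Is', 'she', 'here', '?', 'She', 'is', 'here', '!']
print(sentence_tokens(text))  # ['Is she here?', 'She is here!']
print(char_tokens("cat"))     # ['c', 'a', 't']
print(subword_tokens("unbreakable", {"un", "break", "able"}))  # ['un', 'break', 'able']
```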
Stemming

• Stemming refers to the process of transforming
words to their stems, which are the base forms that
carry the core meaning.
• It essentially aims to reduce words to their most
basic forms by removing prefixes, suffixes, or
inflections, while still preserving their inherent
meaning.
Stemming
Stemming: Why?

• Improves performance of NLP tasks:
– By reducing word variations, stemming can
improve the accuracy and efficiency of
algorithms used for tasks like text classification,
information retrieval, and sentiment analysis.
• Reduces data sparsity:
– When dealing with massive amounts of text
data, stemming can help overcome the issue of
data sparsity by grouping words with the same
stem, leading to more robust statistical models.
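The sparsity argument can be made concrete with a deliberately naive suffix stripper. The suffix list and the three-letter minimum stem are hypothetical and much cruder than the Porter rules; the sketch shows both the grouping effect (six word forms collapse to three types) and a typical weakness (the "study" forms are not grouped, and "studi" is not a dictionary word).

```python
# Hypothetical suffix list, ordered longest first; NOT the Porter rule set.
SUFFIXES = ("ing", "ed", "es", "ly", "s")
MIN_STEM = 3  # never strip a word down to fewer than 3 letters


def naive_stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


words = ["connect", "connected", "connecting", "connects", "studies", "studying"]
stems = [naive_stem(w) for w in words]
print(stems)  # ['connect', 'connect', 'connect', 'connect', 'studi', 'study']
print(len(set(words)), "->", len(set(stems)))  # 6 -> 3
```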
Stemming: Types
• Rule-based stemming:
– Algorithms rely on handcrafted rules to identify
and remove affixes based on specific patterns.
Examples include the Porter, Lancaster, and
Snowball stemmers; the Krovetz stemmer combines
rules with a dictionary lookup.
• Statistical stemming:
– Algorithms utilize statistical models to determine
the most likely stem based on observed word
frequencies and morphological patterns. Examples
include corpus-based clustering stemmers such as
YASS.
Stemming: Applications

• Search engines:
– Stemming can help search engines match queries
to relevant documents even if the exact search
terms are not present.
• Document clustering:
– Stemming can group similar documents together
by identifying their shared semantic core.
Stemming: Applications

• Spam filtering:
– Stemming can identify patterns in spam
messages by stripping away variations of
common spam keywords.
• Machine translation:
– Stemming can improve the accuracy of machine
translation by reducing word variations and
focusing on semantic similarities.
Porter Stemmer
• The Porter Stemmer was developed by Martin
Porter in 1980 and is one of the most popular
stemming algorithms.
• It operates on the English language and goes
through a series of rules to strip off common
suffixes from words.
• The rules are designed to be simple and heuristic-
based, making the algorithm efficient and easy to
implement.
Porter Stemmer: How it works?

• Measuring the Word:
– The algorithm first classifies each letter in a
word as a consonant (C) or a vowel (V); "y"
counts as a vowel when it follows a consonant.
– Runs of consecutive consonants are collapsed into
a single C and runs of vowels into a single V, so
every word has the form [C](VC)^m[V].
– The number m, called the measure, summarizes
the word's structure and is used in the rule
conditions.
Porter Stemmer: How it works?

• Applying Rules:
– The stemmer then iterates through a set of pre-
defined rules, each targeting specific suffix
patterns.
– For example, one rule says "remove '-ing' if the
remaining stem contains a vowel" (so "motoring"
becomes "motor," but "sing" is left unchanged).
– If a matching rule is found, the corresponding
suffix is removed from the word.
Porter Stemmer: How it works?

• Handling Exceptions and Ordering:
– Many rules carry conditions on the remaining
stem to avoid over-stripping short or irregular
words.
– For example, "-ational" is rewritten as "-ate"
only when the stem's measure m > 0, so
"relational" becomes "relate" but "rational" is
left unchanged.
– The rules are grouped into five steps applied in
a fixed order; within a step, only the rule with
the longest matching suffix is applied.
Porter Stemmer: How it works?

• Resulting Stem:
– After applying all relevant rules, the remaining
word form is considered the "stem."
– It's worth noting that the resulting stem might
not always be an actual word found in the
dictionary.
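The measure and the first suffix-stripping steps described above can be sketched as follows. This is only a fragment: it covers the measure m, Step 1a, and the main rules of Step 1b, while Porter's Step 1b clean-up rules and Steps 1c to 5 are omitted, so words like "hopping" would come out as "hopp". The function names are hypothetical.

```python
import re

VOWELS = "aeiou"


def is_consonant(word, i):
    """Porter's definition: not a, e, i, o, u, and not a 'y' that follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Return m in [C](VC)^m[V]."""
    pattern = "".join("C" if is_consonant(stem, i) else "V" for i in range(len(stem)))
    collapsed = re.sub(r"(.)\1+", r"\1", pattern)  # collapse runs, e.g. CCVVC -> CVC
    return collapsed.count("VC")


def contains_vowel(stem):
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def step1a(word):
    # Longest matching suffix wins; only that one rule is applied.
    for suffix, replacement in (("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")):
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word


def step1b_first_pass(word):
    """Main rules of Step 1b only; the follow-up clean-up rules are omitted."""
    if word.endswith("eed"):
        stem = word[:-3]
        # (m>0) EED -> EE. If the condition fails, no other Step 1b rule is tried.
        return stem + "ee" if measure(stem) > 0 else word
    for suffix in ("ed", "ing"):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # (*v*) ED -> "" and (*v*) ING -> "": strip only if the stem has a vowel.
            return stem if contains_vowel(stem) else word
    return word


print(measure("trouble"), measure("troubles"))                   # 1 2
print(step1a("caresses"), step1a("ponies"), step1a("cats"))      # caress poni cat
print(step1b_first_pass("motoring"), step1b_first_pass("sing"))  # motor sing
print(step1b_first_pass("agreed"), step1b_first_pass("feed"))    # agree feed
```

"ponies" becoming "poni" illustrates the earlier point that a stem need not be a dictionary word.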
Porter Stemmer
Spelling Errors

• Dictionary-Based Methods:
– Lookup: Compare each word in the text against a
comprehensive dictionary. Flag words not found
as potential errors.
– Suggestions: Offer a list of correctly spelled
words that are similar to the flagged word,
based on:
• Edit distance (number of changes needed to
transform one word into another)
• Phonetic similarity (how words sound alike)
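The lookup-and-suggest approach can be sketched with Python's standard `difflib.get_close_matches`. Two assumptions to note: the eight-word dictionary is a hypothetical stand-in for a real word list, and `difflib` ranks candidates by its own similarity ratio (based on matching subsequences) rather than by edit distance or phonetic similarity.

```python
import difflib

# Hypothetical toy dictionary; a real checker would load a full word list.
WORDS = {"the", "cat", "sat", "on", "mat", "store", "accommodate", "receive"}
SORTED_WORDS = sorted(WORDS)


def check_spelling(text, max_suggestions=3):
    """Flag words missing from the dictionary and suggest similar dictionary words."""
    report = {}
    for word in text.lower().split():
        if word not in WORDS:
            report[word] = difflib.get_close_matches(word, SORTED_WORDS, n=max_suggestions, cutoff=0.6)
    return report


print(check_spelling("teh cat sat on the mat"))  # {'teh': ['the']}
print(check_spelling("accomodate"))               # {'accomodate': ['accommodate']}
```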
Spelling Errors: Rule Based
• Common Errors:
– Identify typical spelling mistakes (e.g., "teh" for "the",
"accomodate" for "accommodate") and correct them
based on predefined rules.
• Grammar Rules:
– Detect errors that violate grammatical rules (e.g.,
subject-verb agreement, plural forms).
• Contextual Clues:
– Utilize surrounding words and sentence structure to
infer correct spellings (e.g., "I went to the stare"
might be corrected to "I went to the store").
Spelling Errors: Statistical
• N-gram Models:
– Analyze word patterns and probabilities of letter
sequences to identify unusual combinations that
might indicate errors.
• Machine Learning:
– Train algorithms on large text corpora to learn
patterns and relationships between words,
enabling error detection and correction.
Spelling Errors: Hybrid
• Combining Methods: Often, multiple techniques
are combined to enhance accuracy.
– Dictionary-based lookup for quick identification
of straightforward errors.
– Rule-based methods for specific language rules
and patterns.
– Statistical techniques for handling contextual
nuances and complex errors.
Minimum Edit Distance

• The minimum edit distance between two strings
refers to the smallest number of edit operations
needed to transform one string into the other.
• These edit operations typically involve:
– Insertion: Adding a character to the string.
– Deletion: Removing a character from the string.
– Substitution: Replacing one character with
another in the string.
Minimum Edit Distance

• By calculating the minimum edit distance, we
essentially measure the similarity between two strings
based on the minimal number of changes required to
make them identical.
• This concept has numerous applications in various
fields, including:
– Spell checking
– Grouping different forms of the same word together
– Machine translation
– DNA sequence alignment
Minimum Edit Distance

• Calculating the minimum edit distance can be done
through various algorithms, the most common
being dynamic programming.
• This method fills a table with the minimum edit
distances between all pairs of prefixes of the two
strings, so each subproblem is solved only once
instead of being recomputed.
Minimum Edit Distance

• Here's an example to illustrate the concept:
– String 1: "cat"
– String 2: "cart"
• To transform "cat" into "cart", we need only one
edit operation: inserting the character 'r'.
• Therefore, the minimum edit distance between
these two strings is 1.
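The dynamic-programming table described above can be sketched directly. This version assumes every insertion, deletion, and substitution costs 1 (the Levenshtein convention, matching the "cat"/"cart" example); some textbook variants charge 2 for a substitution, which would change the numbers for words that differ by substitutions.

```python
def min_edit_distance(source, target):
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    n, m = len(source), len(target)
    # dist[i][j] = distance between the prefixes source[:i] and target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i characters
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # substitution or match
            )
    return dist[n][m]


print(min_edit_distance("cat", "cart"))        # 1
print(min_edit_distance("kitten", "sitting"))  # 3
```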
Minimum Edit Distance
Human Morphological Processing

• Human morphological processing delves into the
fascinating inner workings of how our brains
analyze and understand the structure of words.
• It explores how we break down words into their
smaller meaningful units called morphemes and
combine them to create new or complex words.
Stages

• Morpheme identification: Recognizing morphemes
and their boundaries within a word. This stage
considers:
– Orthographic cues: Letter patterns and spelling
conventions (e.g., "-able" as a suffix).
– Morphological knowledge: Stored mental
dictionary of known morphemes and their
meanings.
– Contextual clues: Surrounding words and
sentence structure.
Stages

• Morpheme access:
– Retrieving the meaning and function of each
identified morpheme from the mental lexicon.
Stages

• Morpheme integration: Combining the meanings of
individual morphemes to form the overall meaning
of the complex word. This involves considering:
– Morpheme order: The order in which morphemes
are combined matters (e.g., "hope-less-ness" is a
word, but "*hope-ness-less" is not).
– Morpheme interactions: Some morphemes can
modify the meaning of others (e.g., "un-"
negates meaning).
Neurolinguistic Aspects
• Studies suggest specific brain regions are involved
in morphological processing, particularly in the left
hemisphere.
• Different areas handle different processing stages,
with some regions dedicated to morpheme
identification and others to meaning access and
integration.
Factors Influencing Processing
• Language complexity:
– Languages with richer morphology (more
complex words) might require more
sophisticated processing mechanisms.
• Frequency:
– More frequent words tend to be processed
faster and more efficiently.
• Individual differences:
– Age, education, and language familiarity can
affect processing speed and accuracy.
n-gram

• An n-gram is a contiguous sequence of n items from
a given sample of text or speech.
• These items can be characters, words, or other
units, depending on the context.
• N-grams are used in various natural language
processing (NLP) tasks, including language
modeling, machine translation, and text generation.
n-gram

• Words: In this case, an n-gram would be a sequence
of n consecutive words within a text.
– For example, "the quick brown fox" has the 2-grams
"the quick", "quick brown", and "brown fox", and
the 3-grams "the quick brown" and "quick brown fox".
• Letters: Here, an n-gram would be a sequence of n
consecutive letters.
– For example, "hello" has the 2-grams "he", "el",
"ll", and "lo".
n-gram

• Phonemes:
– These are the basic sound units in spoken
language. So, an n-gram would be a sequence of
n consecutive phonemes. For example, the word
"cat" (/kæt/) has the phoneme 2-grams /kæ/ and
/æt/ and the single 3-gram /kæt/.
• Other elements:
– Depending on the application, n-grams can even
involve things like punctuation marks, syllables,
or base pairs in DNA sequences.
n-gram

• The value of n determines the type of n-gram:
– Unigrams: n = 1 (individual symbols like words or
letters)
– Bigrams: n = 2 (sequences of two adjacent
symbols)
– Trigrams: n = 3 (sequences of three adjacent
symbols)
– Higher-order n-grams: n > 3 (sequences of four
or more symbols)
n-gram

• Here's an example using words:
• Original Text: "The cat sat on the mat."
– Unigrams: "The", "cat", "sat", "on", "the", "mat."
– Bigrams: "The cat", "cat sat", "sat on", "on the",
"the mat."
– Trigrams: "The cat sat", "cat sat on", "sat on the",
"on the mat."
– 4-grams: "The cat sat on", "cat sat on the", "sat
on the mat."
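Extracting n-grams is a sliding window over a token list; the sketch below (the function name `ngrams` is hypothetical) reproduces the word lists above, with naive whitespace tokenization so the final token keeps its period, and the letter 2-grams of "hello" when the separator is empty.

```python
def ngrams(tokens, n, sep=" "):
    """All contiguous n-item windows of tokens, joined with sep."""
    return [sep.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


words = "The cat sat on the mat.".split()
print(ngrams(words, 2))  # ['The cat', 'cat sat', 'sat on', 'on the', 'the mat.']
print(ngrams(words, 4))  # ['The cat sat on', 'cat sat on the', 'sat on the mat.']
print(ngrams(list("hello"), 2, sep=""))  # ['he', 'el', 'll', 'lo']
```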
n-gram

• N-grams are commonly used in language modeling
to capture patterns and dependencies in a
sequence of words.
• They are also used in machine learning models for
tasks like text classification and sentiment analysis.
• By considering the context of surrounding words,
n-grams help capture the structure and meaning of
language in a more nuanced way than individual
words alone.
N-gram for spelling corrections

• Building an N-gram model for spelling corrections
involves using statistical information from a given
corpus to identify and correct spelling errors in
text.
N-gram for spelling corrections

• Corpus Collection:
– Collect a large corpus of text data. This corpus
should represent the language and context for
which you want to perform spelling corrections.
• Preprocessing:
– Clean and preprocess the corpus. Remove any
irrelevant characters, punctuation, and special
symbols.
– Convert all text to lowercase to ensure case-
insensitive matching.
N-gram for spelling corrections

• N-gram Extraction:
– Choose an appropriate value for 'n' (e.g.,
unigrams, bigrams, trigrams) based on the
context and the expected length of spelling
errors.
– Extract n-grams from the preprocessed corpus.
For each n-gram, keep track of its frequency in
the corpus.
N-gram for spelling corrections

• Building the Model:
– Create a model that stores the n-grams and their
frequencies.
– This model could be a dictionary, where the keys
are n-grams, and the values are their
corresponding frequencies.
N-gram for spelling corrections

• Identifying Spelling Errors:
– When a new text is input, break it into n-grams
using the same method applied to the training
corpus.
– Compare the n-grams from the input text with
the n-grams stored in your model.
– Identify n-grams in the input text that deviate
significantly from the expected frequencies
based on your model.
N-gram for spelling corrections

• Correction Suggestions:
– Suggest corrections based on the identified
errors. You can consider various methods, such
as:
• Recommending the candidate word that forms
the most frequent n-gram with the surrounding
words in the corpus.
• Using edit distance metrics to find closest
matches.
• Implementing language-specific rules for
common misspellings.
N-gram for spelling corrections

• Implementation:
– Implement the spelling correction algorithm
using your N-gram model and suggestions.
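The steps above can be combined into a small bigram-based corrector. This is a sketch under several assumptions: the corpus is a hypothetical toy string with pre-separated periods, candidates come from a common one-edit generator (insertions, deletions, substitutions, transpositions) filtered to known words, a word is flagged only when its bigram with the previous word was never seen, and the first word of a sentence is never checked. Output is lowercased.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Every string one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    substitutes = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + substitutes + inserts)


class BigramSpeller:
    def __init__(self, corpus_text):
        tokens = corpus_text.lower().split()
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))

    def candidates(self, word):
        # The word itself plus known words one edit away.
        return {word} | {w for w in edits1(word) if w in self.unigrams}

    def correct(self, sentence):
        words = sentence.lower().split()
        for i in range(1, len(words)):
            prev, word = words[i - 1], words[i]
            if self.bigrams[(prev, word)] > 0:
                continue  # bigram attested in the corpus: assume it is correct
            best = max(
                sorted(self.candidates(word)),
                key=lambda w: (self.bigrams[(prev, w)], self.unigrams[w]),
            )
            if self.bigrams[(prev, best)] > 0:  # replace only when context supports it
                words[i] = best
        return " ".join(words)


corpus = ("we went to the store . they walked to the store . "
          "the store was closed . she gave him a cold stare .")
speller = BigramSpeller(corpus)
print(speller.correct("I went to the stare"))  # i went to the store
```

Because "stare" is a real word, a dictionary lookup alone would miss this error; the unseen bigram "the stare" is what flags it, echoing the "stare"/"store" example under rule-based contextual clues.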
N-gram for language model

• N-grams play a crucial role in building language
models, enabling them to predict and generate text
that resembles natural language.
N-gram for language model

• Collecting Data and Building N-gram Model:
– Corpus Selection: Gather a large corpus of text that
reflects the language you want to model (e.g., news
articles, books, social media posts).
– Tokenization: Break the text into individual words or
tokens.
– N-gram Generation: Extract N-grams of different
lengths (e.g., bigrams, trigrams, etc.) and count their
occurrences in the corpus.
– Storage: Store the N-gram counts in a data structure
like a dictionary or a trie for efficient retrieval.
N-gram for language model

• Language Modeling with N-grams:
– Probability Calculation: Estimate the probability of a
word or sequence of words occurring based on the
N-gram counts, where C(·) is a count in the corpus
and N is the total number of words in the corpus:
• Unigram probability: P(w) = C(w) / N
• Bigram probability: P(w_i | w_(i-1)) = C(w_(i-1) w_i) / C(w_(i-1)),
the number of times w_i appears after w_(i-1) divided by
the number of times w_(i-1) appears
N-gram for language model

• Language Modeling with N-grams:
– Prediction: Use N-gram probabilities to predict
the next word in a sequence, given previous
words.
– Generation: Generate new text by repeatedly
choosing (or sampling) the next word according to
its probability given the previous words, forming
sentences that locally resemble the training text.
N-gram for language model

• Smoothing:
– Addressing Zero Probabilities: Handle N-grams
not seen in the training corpus by using
smoothing techniques like:
• Laplace smoothing: Add one to every count (or,
in add-k smoothing, a small constant k) to avoid
zero probabilities.
• Backoff smoothing: Fallback to lower-order N-
gram probabilities if higher-order counts are
zero.
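The bigram estimate, next-word prediction, and Laplace smoothing described above can be sketched in one small class. The class and method names are hypothetical, the `<s>`/`</s>` boundary markers are an assumed (common) convention, and V is taken as the number of distinct token types seen in training, so the smoothed probabilities sum to 1 over that vocabulary. Backoff is not implemented.

```python
from collections import Counter


class BigramLM:
    """Bigram model with maximum-likelihood and add-one (Laplace) estimates."""

    def __init__(self, sentences):
        self.unigrams, self.bigrams = Counter(), Counter()
        for sentence in sentences:
            # <s> and </s> mark sentence boundaries (an assumed convention).
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)  # V: number of distinct token types

    def p_mle(self, word, prev):
        """P(word | prev) = C(prev word) / C(prev); zero for unseen bigrams."""
        if self.unigrams[prev] == 0:
            return 0.0
        return self.bigrams[(prev, word)] / self.unigrams[prev]

    def p_laplace(self, word, prev):
        """Add-one smoothing: (C(prev word) + 1) / (C(prev) + V)."""
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.vocab_size)

    def predict_next(self, prev):
        """Most frequent continuation of prev seen in training (ties: alphabetical)."""
        followers = sorted(w for (p, w) in self.bigrams if p == prev)
        if not followers:
            return None
        return max(followers, key=lambda w: self.bigrams[(prev, w)])


lm = BigramLM(["the cat sat", "the cat ran", "the dog sat"])
print(round(lm.p_mle("cat", "the"), 3))  # 0.667
print(lm.p_mle("ran", "dog"))            # 0.0
print(lm.p_laplace("ran", "dog"))        # 0.125
print(lm.predict_next("the"))            # cat
```

The unseen bigram "dog ran" gets probability 0 under maximum likelihood but 1/8 after add-one smoothing, because one extra count is spread over each of the V = 7 token types.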
Applications
• Spelling correction: Suggest corrections based on likely
word sequences.
• Machine translation: Help translate text by identifying
likely word combinations in the target language.
• Speech recognition: Improve accuracy by considering
word context and sequences.
• Text generation: Generate text that imitates the
word patterns of the training corpus.
• Question answering: Help answer questions based on
understanding of language patterns.
@mituskillologies

Thank you
