Generative AI is a branch of AI that enables software applications to
generate new content: often natural language dialogs, but also images,
video, code, and other formats.
The ability to generate content is based on a language model, which
has been trained with huge volumes of data - often documents from
the Internet or other public sources of information.
Generative AI models encapsulate semantic relationships between
language elements (that's a fancy way of saying that the models
"know" how words relate to one another), and that's what enables
them to generate a meaningful sequence of text.
There are large language models (LLMs) and small language
models (SLMs) - the difference is based on the volume of data and the
number of variables in the model. LLMs are very powerful and
generalize well, but can be more costly to train and use. SLMs tend to
work well in scenarios that are more focused on specific topic areas,
and usually cost less.
Generative AI scenarios
Common uses of generative AI include:
Implementing chatbots and AI agents that assist human users.
Creating new documents or other content (often as a starting point for
further iterative development).
Automated translation of text between languages.
Summarizing or explaining complex documents.
Key points to understand about computer vision include:
Computer vision is accomplished by using large numbers of images to
train a model.
Image classification is a form of computer vision in which a model is
trained with images that are labeled with the main subject of the
image (in other words, what it's an image of) so that it can analyze
unlabeled images and predict the most appropriate label - identifying
the subject of the image.
Object detection is a form of computer vision in which the model is
trained to identify the location of specific objects in an image.
There are more advanced forms of computer vision. For
example, semantic segmentation is an advanced form of object
detection where, rather than indicating an object's location by drawing a
box around it, the model identifies the individual pixels in the image
that belong to a particular object.
You can combine computer vision and language models to create
a multi-modal model that has both computer vision and generative AI
capabilities.
Computer vision scenarios
Common uses of computer vision include:
Auto-captioning or tag-generation for photographs.
Visual search.
Monitoring stock levels or identifying items for checkout in retail
scenarios.
Security video monitoring.
Authentication through facial recognition.
Robotics and self-driving vehicles.
Key points to understand about speech include:
Speech recognition is the ability of AI to "hear" and interpret speech.
Usually this capability takes the form of speech-to-text (where the
audio signal for the speech is transcribed into text).
Speech synthesis is the ability of AI to vocalize words as spoken
language. Usually this capability takes the form of text-to-speech in
which information in text format is converted into an audible signal.
AI speech technology is evolving rapidly to handle challenges like
ignoring background noise, detecting interruptions, and generating
increasingly expressive and human-like voices.
AI speech scenarios
Common uses of AI speech technologies include:
Personal AI assistants in phones, computers, or household devices with
which you interact by talking.
Automated transcription of calls or meetings.
Automating audio descriptions of video or text.
Automated speech translation between languages.
Key points to understand about natural language processing (NLP) include:
NLP capabilities are based on models that are trained to do particular
types of text analysis.
While many natural language processing scenarios are handled by
generative AI models today, there are many common text analytics
use cases where simpler NLP models can be more cost-effective.
Common NLP tasks include:
o Entity extraction - identifying mentions of entities such as people,
places, and organizations in a document.
o Text classification - assigning a document to a specific category.
o Sentiment analysis - determining whether a body of text is
positive, negative, or neutral and inferring opinions.
o Language detection - identifying the language in which text is
written.
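To make sentiment analysis concrete, here's a deliberately simple Python
sketch that scores text by counting words from two small word lists. The
word lists and the sentiment function are hypothetical; real NLP models
learn from labeled data rather than fixed lists, but the input (a body of
text) and the output (positive, negative, or neutral) match the task
described above.

```python
# Hypothetical word lists for illustration only; trained sentiment models
# learn these associations from labeled examples instead.
POSITIVE_WORDS = {"good", "great", "excellent", "love", "helpful"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "hate", "slow"}


def sentiment(text: str) -> str:
    """Classify text as positive, negative, or neutral by counting lexicon hits."""
    # Lowercase and strip simple punctuation so "Great!" matches "great".
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("The staff were great and really helpful!"))  # positive
print(sentiment("Delivery was slow and the packaging was poor."))  # negative
print(sentiment("The parcel arrived on Tuesday."))  # neutral
```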
Note
In this module, we've used the term natural language processing (NLP) to
describe AI capabilities that derive meaning from "ordinary" human
language. You might also see this area of AI referred to as natural language
understanding (NLU).
Natural language processing scenarios
Common uses of NLP technologies include:
Analyzing documents or transcripts of calls and meetings to determine
key subjects and identify specific mentions of people, places,
organizations, products, or other entities.
Analyzing social media posts, product reviews, or articles to evaluate
sentiment and opinion.
Implementing chatbots that can answer frequently asked questions or
orchestrate predictable conversational dialogs that don't require the
complexity of generative AI.
Key points to understand about using AI to extract data and insights include:
The basis for most document analysis solutions is a computer vision
technology called optical character recognition (OCR).
While an OCR model can identify the location of text in an image, more
advanced models can also interpret individual values in the document -
and so extract specific fields.
While most data extraction models have historically focused on
extracting fields from text-based forms, more advanced models that
can extract information from audio recordings, images, and videos are
becoming more readily available.
Data and insight extraction scenarios
Common uses of AI to extract data and insights include:
Automated processing of forms and other documents in a business
process - for example, processing an expense claim.
Large-scale digitization of data from paper forms. For example,
scanning and archiving census records.
Indexing documents for search.
Identifying key points and follow-up actions from meeting transcripts or
recordings.
Key points to understand about responsible AI include:
Fairness: AI models are trained using data, which is generally sourced
and selected by humans. There's substantial risk that the data
selection criteria, or the data itself, reflects unconscious bias that may
cause a model to produce discriminatory outputs. AI developers need
to take care to minimize bias in training data and test AI systems for
fairness.
Reliability and safety: AI is based on probabilistic models, so it isn't
infallible. AI-powered applications need to take this into account and
mitigate risks accordingly.
Privacy and security: Models are trained using data, which may
include personal information. AI developers have a responsibility to
ensure that the training data is kept secure, and that the trained
models themselves can't be used to reveal private personal or
organizational details.
Inclusiveness: The potential of AI to improve lives and drive success
should be open to everyone. AI developers should strive to ensure that
their solutions don't exclude some users.
Transparency: AI can sometimes seem like "magic", but it's important
to make users aware of how the system works and any potential
limitations it may have.
Accountability: Ultimately, the people and organizations that develop
and distribute AI solutions are accountable for their actions. It's
important for organizations developing AI models and applications to
define and apply a framework of governance to help ensure that they
apply responsible AI principles to their work.
Responsible AI examples
Some examples of scenarios where responsible AI practices should be applied
include:
An AI-powered college admissions system should be tested to ensure it
evaluates all applications fairly, taking into account relevant academic
criteria but avoiding unfounded discrimination based on irrelevant
demographic factors.
An AI-powered robotic solution that uses computer vision to detect
objects should avoid unintentional harm or damage. One way to
accomplish this goal is to use probability values to determine
"confidence" in object identification before interacting with physical
objects, and avoid any action if the confidence level is below a specific
threshold.
A facial identification system used in an airport or other secure area
should delete personal images that are used for temporary access as
soon as they're no longer required. Additionally, safeguards should
prevent the images from being made accessible to operators or users who
have no need to view them.
A web-based chatbot that offers speech-based interaction should also
generate text captions to avoid making the system unusable for users
with a hearing impairment.
A bank that uses an AI-based loan-approval application should disclose
the use of AI, and describe features of the data on which it was trained
(without revealing confidential information).
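The confidence check in the robotics example above can be sketched in a
few lines of Python. The should_interact function, the 0.9 threshold, and
the detection values are all hypothetical, assuming a model that returns a
label and a probability for each detected object.

```python
# Hypothetical threshold; a real system would choose it based on the cost
# of a wrong physical action in its environment.
CONFIDENCE_THRESHOLD = 0.9


def should_interact(label: str, confidence: float, target: str) -> bool:
    """Act only if the detected object is the target and the model is confident enough."""
    return label == target and confidence >= CONFIDENCE_THRESHOLD


# Each detection is a (label, probability) pair, as an object detection model might return.
detections = [("cup", 0.97), ("cup", 0.62), ("phone", 0.95)]
for label, confidence in detections:
    action = "grasp" if should_interact(label, confidence, target="cup") else "skip"
    print(f"{label} ({confidence:.2f}): {action}")
# cup (0.97): grasp
# cup (0.62): skip
# phone (0.95): skip
```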
Machine learning is in many ways the intersection of two disciplines -
data science and software engineering. The goal of machine learning is
to use data to create a predictive model that can be incorporated into
a software application or service. Achieving this goal requires
collaboration between data scientists, who explore and prepare the
data before using it to train a machine learning model, and software
developers, who integrate the models into applications where they're
used to predict new data values (a process known as inferencing).
In this module, you'll explore some of the core concepts on which
machine learning is based, learn how to identify different kinds of
machine learning models, and examine the ways in which machine
learning models are trained and evaluated. Finally, you'll learn how to
use Microsoft Azure Machine Learning to train and deploy a machine
learning model, without needing to write any code.
Note
Machine learning is based on mathematical and statistical techniques,
some of which are described at a high level in this module. Don't worry
if you're not a mathematical expert though! The goal of the module is
to help you gain an intuition of how machine learning works - we'll
keep the mathematics to the minimum required to understand the
core concepts.
What is machine learning?
Machine learning has its origins in statistics and mathematical modeling of
data. The fundamental idea of machine learning is to use data from past
observations to predict unknown outcomes or values. For example:
The proprietor of an ice cream store might use an app that combines
historical sales and weather records to predict how many ice creams
they're likely to sell on a given day, based on the weather forecast.
A doctor might use clinical data from past patients to run automated
tests that predict whether a new patient is at risk of diabetes based
on factors like weight, blood glucose level, and other measurements.
A researcher in the Antarctic might use past observations to automate
the identification of different penguin species (such as Adelie, Gentoo,
or Chinstrap) based on measurements of a bird's flippers, bill, and
other physical attributes.
Machine learning as a function
Because machine learning is based on mathematics and statistics, it's
common to think about machine learning models in mathematical terms.
Fundamentally, a machine learning model is a software application that
encapsulates a function to calculate an output value based on one or more
input values. The process of defining that function is known as training. After
the function has been defined, you can use it to predict new values in a
process called inferencing.
Let's explore the steps involved in training and inferencing.
1. The training data consists of past observations. In most cases, the
observations include the observed attributes or features of the thing
being observed, and the known value of the thing you want to train a
model to predict (known as the label).
In mathematical terms, you'll often see the features referred to using the
shorthand variable name x, and the label referred to as y. Usually, an
observation consists of multiple feature values, so x is actually a vector (an
array with multiple values), like this: $[x_1, x_2, x_3, \ldots]$.
To make this clearer, let's consider the examples described previously:
In the ice cream sales scenario, our goal is to train a model that
can predict the number of ice cream sales based on the weather.
The weather measurements for the day (temperature, rainfall,
windspeed, and so on) would be the features (x), and the
number of ice creams sold on each day would be the label (y).
In the medical scenario, the goal is to predict whether or not a
patient is at risk of diabetes based on their clinical
measurements. The patient's measurements (weight, blood
glucose level, and so on) are the features (x), and whether the
patient is at risk of diabetes (for example, 1 for at risk, 0 for
not at risk) is the label (y).
In the Antarctic research scenario, we want to predict the species
of a penguin based on its physical attributes. The key
measurements of the penguin (length of its flippers, width of its
bill, and so on) are the features (x), and the species (for
example, 0 for Adelie, 1 for Gentoo, or 2 for Chinstrap) is
the label (y).
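As a minimal sketch, assuming plain Python lists as the data structure, the
three scenarios above might represent their observations like this. All
feature values are made up for illustration; the point is that each x is a
vector of feature values and each y is the single label paired with it.

```python
# Hypothetical observations; the values are not real measurements.

# Ice cream sales: x = [temperature, rainfall, windspeed]; y = ice creams sold.
ice_cream_x = [[28.0, 0.0, 12.0], [19.0, 4.5, 20.0]]
ice_cream_y = [145, 62]

# Diabetes risk: x = [weight, blood glucose]; y = 1 (at risk) or 0 (not at risk).
diabetes_x = [[92.0, 148.0], [61.0, 85.0]]
diabetes_y = [1, 0]

# Penguins: x = [flipper length, bill width]; y = 0 (Adelie), 1 (Gentoo), or 2 (Chinstrap).
penguin_x = [[190.0, 18.5], [217.0, 15.0]]
penguin_y = [0, 1]

# Every observation pairs one feature vector x with one label y.
for name, xs, ys in [("ice cream", ice_cream_x, ice_cream_y),
                     ("diabetes", diabetes_x, diabetes_y),
                     ("penguins", penguin_x, penguin_y)]:
    assert len(xs) == len(ys)
    for x, y in zip(xs, ys):
        print(name, x, "->", y)
```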
2. An algorithm is applied to the data to try to determine a relationship
between the features and the label, and generalize that relationship as
a calculation that can be performed on x to calculate y. The specific
algorithm used depends on the kind of predictive problem you're trying
to solve (more about this later), but the basic principle is to try
to fit the data to a function in which the values of the features can be
used to calculate the label.
3. The result of the algorithm is a model that encapsulates the calculation
derived by the algorithm as a function - let's call it f. In mathematical
notation:
y = f(x)
4. Now that the training phase is complete, the trained model can be
used for inferencing. The model is essentially a software program that
encapsulates the function produced by the training process. You can
input a set of feature values, and receive as an output a prediction of
the corresponding label. Because the output from the model is a
prediction that was calculated by the function, and not an observed
value, you'll often see the output from the function shown as ŷ (which
is rather delightfully verbalized as "y-hat").
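The training and inferencing steps above can be sketched in Python for the
simplest possible case: one feature and a straight-line function f fitted by
ordinary least squares. The temperature and sales values are hypothetical,
and train is an illustrative helper, not a library API.

```python
def train(xs, ys):
    """Training: fit y = slope * x + intercept by ordinary least squares and return f."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the intercept makes the line pass through the means.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x

    def f(x: float) -> float:
        return slope * x + intercept

    return f


# Hypothetical training data: daily temperature (x) and ice creams sold (y).
temperatures = [51, 52, 67, 65, 70, 69]
sales = [1, 0, 14, 14, 23, 20]

f = train(temperatures, sales)  # training produces the function f
y_hat = f(60)                   # inferencing: predict ŷ for a new temperature
print(round(y_hat, 1))
```

The output ŷ is a calculated prediction, not an observed value, which is why
it can be a fractional number of ice creams.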
Supervised machine learning
Supervised machine learning is a general term for machine learning
algorithms in which the training data includes both feature values and
known label values. Supervised machine learning is used to train models by
determining a relationship between the features and labels in past
observations, so that unknown labels can be predicted for features in future
cases.
Regression
Regression is a form of supervised machine learning in which the label
predicted by the model is a numeric value. For example:
The number of ice creams sold on a given day, based on the
temperature, rainfall, and windspeed.
The selling price of a property based on its size in square feet, the
number of bedrooms it contains, and socio-economic metrics for its
location.
The fuel efficiency (in miles-per-gallon) of a car based on its engine
size, weight, width, height, and length.
Classification
Classification is a form of supervised machine learning in which the label
represents a categorization, or class. There are two common classification
scenarios.
Binary classification
In binary classification, the label determines whether the observed
item is (or isn't) an instance of a specific class. Or put another way, binary
classification models predict one of two mutually exclusive outcomes. For
example:
Whether a patient is at risk for diabetes based on clinical metrics like
weight, age, blood glucose level, and so on.
Whether a bank customer will default on a loan based on income,
credit history, age, and other factors.
Whether a mailing list customer will respond positively to a marketing
offer based on demographic attributes and past purchases.
In all of these examples, the model predicts a
binary true/false or positive/negative outcome for a single possible class.
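As a sketch of binary classification, assuming logistic regression as the
algorithm (the text doesn't name one), the Python below learns the
probability that a patient is at risk from a single hypothetical blood
glucose feature, then applies a 0.5 threshold to turn that probability into a
1 or 0 prediction. The data and helper names are illustrative only.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.5, epochs=1000):
    """Fit p(at risk | x) = sigmoid(w * z + b), where z is the standardized feature."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    zs = [(x - mean) / std for x in xs]
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Batch gradient descent on the mean log loss.
        errors = [sigmoid(w * z + b) - y for z, y in zip(zs, ys)]
        w -= learning_rate * sum(e * z for e, z in zip(errors, zs)) / len(zs)
        b -= learning_rate * sum(errors) / len(zs)

    def predict(x: float, threshold: float = 0.5) -> int:
        """Return 1 (at risk) if the predicted probability reaches the threshold, else 0."""
        return int(sigmoid(w * (x - mean) / std + b) >= threshold)

    return predict


# Hypothetical data: blood glucose readings and whether each patient was at risk.
glucose = [85, 95, 105, 155, 165, 175]
at_risk = [0, 0, 0, 1, 1, 1]

predict = train_logistic(glucose, at_risk)
print(predict(100), predict(160))  # 0 1
```

Standardizing the feature keeps gradient descent well behaved; changing the
threshold trades missed at-risk patients against false alarms.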
Multiclass classification
Multiclass classification extends binary classification to predict a label that
represents one of multiple possible classes. For example:
The species of a penguin (Adelie, Gentoo, or Chinstrap) based on its
physical measurements.
The genre of a movie (comedy, horror, romance, adventure, or science
fiction) based on its cast, director, and budget.
In most scenarios that involve a known set of multiple classes, multiclass
classification is used to predict mutually exclusive labels. For example, a
penguin can't be both a Gentoo and an Adelie. However, there are also some
algorithms that you can use to train multilabel classification models, in which
there may be more than one valid label for a single observation. For
example, a movie could potentially be categorized as both science
fiction and comedy.
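One simple way to sketch multiclass classification is a nearest-centroid
classifier: average the feature vectors for each species, then assign a new
penguin to the species whose average is closest. This is a swapped-in
illustrative algorithm rather than one the text prescribes, and the
measurements are hypothetical.

```python
import math


def train_nearest_centroid(features, labels):
    """Compute the mean feature vector (centroid) for each class label."""
    centroids = {}
    for label in set(labels):
        members = [x for x, y in zip(features, labels) if y == label]
        centroids[label] = [sum(column) / len(members) for column in zip(*members)]
    return centroids


def predict(centroids, x):
    """Predict the class whose centroid is nearest to x by Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Hypothetical measurements: [flipper length, bill width].
features = [[188, 18.4], [192, 18.0], [215, 14.8], [219, 15.2], [200, 19.6], [204, 20.0]]
species = ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"]

centroids = train_nearest_centroid(features, species)
print(predict(centroids, [191, 18.1]))  # Adelie
```

Because each prediction picks exactly one centroid, the labels are mutually
exclusive, as in the penguin example.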
Unsupervised machine learning
Unsupervised machine learning involves training models using data that
consists only of feature values without any known labels. Unsupervised
machine learning algorithms determine relationships between the features of
the observations in the training data.
Clustering
The most common form of unsupervised machine learning is clustering. A
clustering algorithm identifies similarities between observations based on
their features, and groups them into discrete clusters. For example:
Group similar flowers based on their size, number of leaves, and
number of petals.
Identify groups of similar customers based on demographic attributes
and purchasing behavior.
In some ways, clustering is similar to multiclass classification, in that it
categorizes observations into discrete groups. The difference is that when
using classification, you already know the classes to which the observations
in the training data belong, so the algorithm works by determining the
relationship between the features and the known classification label. In
clustering, there's no previously known cluster label, and the algorithm
groups the data observations based purely on similarity of features.
In some cases, clustering is used to determine the set of classes that exist
before training a classification model. For example, you might use clustering
to segment your customers into groups, and then analyze those groups to
identify and categorize different classes of customer (high value - low
volume, frequent small purchaser, and so on). You could then use your
categorizations to label the observations in your clustering results and use
the labeled data to train a classification model that predicts to which
customer category a new customer might belong.
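Clustering can be sketched with k-means, one widely used clustering
algorithm (the text doesn't specify one). The Python below assumes two
clusters and hypothetical two-feature customer data. Note that no labels are
passed in, and the cluster numbers it returns are arbitrary identifiers rather
than known classes.

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Cluster points with Lloyd's k-means; returns (assignments, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # leave a centroid where it is if no points were assigned to it
                centroids[i] = [sum(column) / len(members) for column in zip(*members)]
    return assignments, centroids


# Hypothetical customer features: [visits per month, average basket size (scaled)].
customers = [[1.0, 1.2], [1.5, 0.8], [1.2, 1.0], [8.0, 8.5], [8.5, 9.0], [7.8, 8.8]]

# No labels are supplied; the first and fourth customers seed the two clusters.
labels, centers = k_means(customers, initial_centroids=[customers[0], customers[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```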
Regression evaluation metrics
Based on the differences between the predicted and actual values, you can
calculate some common metrics that are used to evaluate a regression
model.
Mean Absolute Error (MAE)
In the ice cream example, the difference between each predicted and actual
value indicates by how many ice creams each prediction was wrong. It doesn't
matter if the prediction was over or under the actual value (so for example,
-3 and +3 both indicate an error of 3). This metric is known as the absolute
error for each prediction, and can be summarized for the whole validation set
as the mean absolute error (MAE).
In the ice cream example, the mean (average) of the absolute errors (2, 3, 3,
1, 2, and 3) is 2.33.
Mean Squared Error (MSE)
The mean absolute error metric takes all discrepancies between predicted
and actual labels into account equally. However, it may be more desirable to
have a model that is consistently wrong by a small amount than one that
makes fewer, but larger, errors. One way to produce a metric that "amplifies"
larger errors is to square the individual errors and calculate the mean of
the squared values. This metric is known as the mean squared
error (MSE).
In our ice cream example, the mean of the squared errors (which
are 4, 9, 9, 1, 4, and 9) is 6.
Root Mean Squared Error (RMSE)
The mean squared error helps take the magnitude of errors into account, but
because it squares the error values, the resulting metric no longer represents
the quantity measured by the label. In other words, we can say that the MSE
of our model is 6, but that doesn't measure its accuracy in terms of the
number of ice creams that were mispredicted; 6 is just a numeric score that
indicates the level of error in the validation predictions.
If we want to measure the error in terms of the number of ice creams, we
need to calculate the square root of the MSE, which produces a metric called,
unsurprisingly, Root Mean Squared Error (RMSE). In this case, that's √6, which
is approximately 2.45 (ice creams).
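The MAE, MSE, and RMSE figures for the ice cream example can be checked with
a few lines of Python, using the six absolute errors listed above (2, 3, 3, 1,
2, and 3). Squaring makes the sign irrelevant, so absolute errors give the
same MSE as signed ones.

```python
import math

# Absolute errors for the six validation predictions in the ice cream example.
absolute_errors = [2, 3, 3, 1, 2, 3]

mae = sum(absolute_errors) / len(absolute_errors)
mse = sum(e ** 2 for e in absolute_errors) / len(absolute_errors)
rmse = math.sqrt(mse)  # back in units of "ice creams"

print(f"MAE: {mae:.2f}, MSE: {mse:.0f}, RMSE: {rmse:.2f}")
# MAE: 2.33, MSE: 6, RMSE: 2.45
```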
Coefficient of determination (R²)
All of the metrics so far compare the discrepancy between the predicted and
actual values in order to evaluate the model. However, in reality, there's
some natural random variance in the daily sales of ice cream that the model
takes into account. In a linear regression model, the training algorithm fits a
straight line that minimizes the mean squared difference between the function
and the known label values. The coefficient of determination (more commonly
referred to as R² or R-squared) is a metric that measures the proportion of
variance in the validation results that can be explained by the model, as
opposed to some anomalous aspect of the validation data (for example, a
day with a highly unusual number of ice cream sales because of a local
festival).
The calculation for R² is more complex than for the previous metrics. It
compares the sum of squared differences between predicted and actual
labels with the sum of squared differences between the actual label values
and the mean of actual label values, like this:
$$R^2 = 1 - \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}$$
Don't worry too much if that looks complicated; most machine learning tools
can calculate the metric for you. The important point is that the result is
usually a value between 0 and 1 that describes the proportion of variance
explained by the model (on validation data it can fall below 0 if the model fits
worse than simply predicting the mean). In simple terms, the closer to 1 this
value is, the better the model is fitting the validation data. In the case of the
ice cream regression model, the R² calculated from the validation data is 0.95.
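The R² formula translates directly into Python. The r_squared helper is
illustrative, and the actual and predicted values below are hypothetical
(they aren't the validation data behind the ice cream figure).

```python
def r_squared(actual, predicted):
    """R² = 1 - Σ(y - ŷ)² / Σ(y - ȳ)²."""
    mean_actual = sum(actual) / len(actual)
    residual_ss = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    total_ss = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - residual_ss / total_ss


# Hypothetical validation labels and model predictions.
actual = [10, 20, 30, 40]
predicted = [12, 18, 31, 39]
print(round(r_squared(actual, predicted), 3))  # 0.98
```

A model that always predicts the mean of the actual values scores exactly 0,
because its squared differences equal the total squared differences, while a
perfect model scores 1.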
Iterative training
The metrics described above are commonly used to evaluate a regression
model. In most real-world scenarios, a data scientist will use an iterative
process to repeatedly train and evaluate a model, varying:
Feature selection and preparation (choosing which features to include
in the model, and calculations applied to them to help ensure a better
fit).
Algorithm selection (we explored linear regression in the previous
example, but there are many other regression algorithms).
Algorithm parameters (numeric settings to control algorithm behavior,
more accurately called hyperparameters to differentiate them from
the x and y parameters).
After multiple iterations, the model that results in the best evaluation metric
that's acceptable for the specific scenario is selected.
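The iterative loop above can be sketched in Python as a search over one
hyperparameter. As an assumption, the sketch swaps in k-nearest-neighbors
regression, where the number of neighbors k is the hyperparameter, and uses
MAE on a hypothetical validation set to pick the best value.

```python
def knn_regress(train_x, train_y, x, k):
    """Predict y for x as the mean label of the k nearest training observations."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data: temperature (x) and ice creams sold (y), split into training and validation sets.
train_x = [50, 55, 60, 65, 70, 75, 80]
train_y = [2, 6, 9, 14, 19, 24, 27]
val_x = [57, 68, 78]
val_y = [7, 17, 26]

# Iterate: train and evaluate with each candidate hyperparameter value, then keep the best.
results = {}
for k in (1, 3, 5):
    predictions = [knn_regress(train_x, train_y, x, k) for x in val_x]
    results[k] = mean_absolute_error(val_y, predictions)

best_k = min(results, key=results.get)
print(results, "best k:", best_k)
```

In practice the same loop would also vary feature selection and the algorithm
itself, not just one hyperparameter.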
Evaluating a clustering model
Since there's no known label with which to compare the predicted cluster
assignments, evaluation of a clustering model is based on how well the
resulting clusters are separated from one another.
There are multiple metrics that you can use to evaluate cluster separation,
including:
Average distance to cluster center: How close, on average, each
point in the cluster is to the centroid of the cluster.
Average distance to other center: How close, on average, each
point in the cluster is to the centroid of all other clusters.
Maximum distance to cluster center: The furthest distance
between a point in the cluster and its centroid.
Silhouette: A value between -1 and 1 that summarizes the ratio of
distance between points in the same cluster and points in different
clusters (the closer to 1, the better the cluster separation).
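The silhouette metric can be computed from its usual definition: for each
point, a is the mean distance to the other points in its own cluster, b is the
mean distance to the points in the nearest other cluster, and the point's
score is (b - a) / max(a, b). The Python sketch below averages that score
over all points; the data are hypothetical, and libraries such as
scikit-learn provide optimized implementations.

```python
import math


def silhouette(points, labels):
    """Average silhouette coefficient over all points (needs at least two clusters)."""
    scores = []
    for i, p in enumerate(points):
        # a: mean distance from p to the other points in its own cluster.
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # common convention for a single-point cluster
            continue
        a = sum(same) / len(same)
        # b: mean distance from p to the points of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q, label in zip(points, labels) if label == other)
            / labels.count(other)
            for other in set(labels)
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical 2-D points assigned to two well-separated clusters.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = [0, 0, 1, 1]
score = silhouette(points, labels)
print(f"{score:.2f}")  # 0.90
```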
Thoughtfully designed machine learning solutions form the foundation of
today's AI applications. From predictive analytics to personalized
recommendations and beyond, machine learning solutions support the latest
technological advances in society by using existing data to produce new
insights.
Data scientists make decisions to tackle machine learning problems in
different ways. The decisions they make affect the cost, speed, quality, and
longevity of the solution.
In this module, you learn how to design an end-to-end machine learning
solution with Microsoft Azure that can be used in an enterprise setting. Using
the following six steps as a framework, we explore how to plan, train, deploy,
and monitor machine learning solutions.
1. Define the problem: Decide on what the model should predict and
when it's successful.
2. Get the data: Find data sources and get access.
3. Prepare the data: Explore the data. Clean and transform the data
based on the model's requirements.
4. Train the model: Choose an algorithm and hyperparameter values
based on trial and error.
5. Integrate the model: Deploy the model to an endpoint to generate
predictions.
6. Monitor the model: Track the model's performance.
Note
The diagram is a simplified representation of the machine learning process.
Typically, the process is iterative and continuous. For example, when
monitoring the model you may decide to go back and retrain the model.
Next, let's look at how we can get started on a machine learning solution by
defining the problem.
1. Classification: Predict a categorical value.
2. Regression: Predict a numerical value.
3. Time-series forecasting: Predict future numerical values based on
time-series data.
4. Computer vision: Classify images or detect objects in images.
5. Natural language processing (NLP): Extract insights from text.
1. Setup: Create all necessary Azure resources for the solution.
2. Model development (inner loop): Explore and process the data to
train and evaluate the model.
3. Continuous integration: Package and register the model.
4. Model deployment (outer loop): Deploy the model.
5. Continuous deployment: Test the model and promote it to the
production environment.
6. Monitoring: Monitor model and endpoint performance.