Evaluating Machine Learning Metrics

The document discusses the importance of evaluating machine learning models, highlighting different evaluation metrics for classification and regression models. Key classification metrics include accuracy, precision, recall, F1 score, and ROC curve, while regression metrics include R^2, mean absolute error, and mean squared error. It also provides guidance on which metrics to prioritize based on the specific characteristics of the data and the model's performance.

Uploaded by

Renato

Machine Learning Model Evaluation

Evaluating the results of a machine learning model is as important as building
one.

But just as different problems call for different machine learning models,
different machine learning models call for different evaluation metrics.

Below are some of the most important evaluation metrics you'll want to look
into for classification and regression models.

Classification Model Evaluation Metrics/Techniques


• Accuracy - The proportion of predictions the model got right, in decimal
form. Perfect accuracy is equal to 1.0.
• Precision - The proportion of positive identifications (model predicted
class 1) which were actually correct. A model which produces no false
positives has a precision of 1.0.
• Recall - The proportion of actual positives which were correctly
classified. A model which produces no false negatives has a recall of 1.0.
• F1 score - The harmonic mean of precision and recall. A perfect model
achieves an F1 score of 1.0.
• Confusion matrix - Compares the predicted values with the true values
in a tabular way. If the model is 100% correct, all non-zero values in the
matrix lie on the diagonal running from top left to bottom right.
• Cross-validation - Splits your dataset into multiple parts, then trains
your model on all but one part and tests it on the held-out part, repeating
until each part has been used for testing once. Performance is reported as
the average across all rounds.
• Classification report - Scikit-Learn has a built-in function
called classification_report() which returns some of the main classification
metrics such as precision, recall and F1 score.
• ROC Curve - The receiver operating characteristic curve is a plot of the
true positive rate versus the false positive rate at different
classification thresholds.
• Area Under Curve (AUC) Score - The area underneath the ROC curve.
A perfect model achieves an AUC score of 1.0.
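
The definitions above can be sketched in a few lines of plain Python. This is a minimal illustration, not Scikit-Learn's implementation: the labels and scores are made-up example data, the helper names (confusion_counts, classification_metrics, roc_auc) are hypothetical, and the 0.5 threshold is an assumption. In practice you would normally reach for the equivalents in sklearn.metrics (accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, roc_auc_score).

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 is the positive class."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Convention assumed here: report 0.0 when a ratio is undefined (0 / 0).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, y_score):
    """AUC computed as the chance that a random positive sample is scored
    higher than a random negative sample (ties count as half a win).
    This equals the area under the ROC curve."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 4 positive samples, 6 negative samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]  # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed threshold of 0.5

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
matrix = [[tn, fp], [fn, tp]]  # rows = true class, columns = predicted class
metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, y_score)

print(matrix)
print(metrics)
print(auc)
```

On this data the model makes one false positive and one false negative, so accuracy is 0.8 while precision, recall and F1 are all 0.75. The AUC is 23/24 (about 0.958) because only one positive/negative pair is ranked the wrong way round.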

Which classification metric should you use?


• Accuracy is a good measure to start with if all classes are balanced (e.g.
the same number of samples are labelled 0 as are labelled 1).
• Precision and recall become more important when classes are
imbalanced.
• If false-positive predictions are worse than false-negatives, aim for higher
precision.
• If false-negative predictions are worse than false-positives, aim for higher
recall.
• F1 score combines precision and recall into a single number, which is
useful when you care about both.
• A confusion matrix is always a good way to visualize how a classification
model is performing.
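
To see why accuracy can mislead on imbalanced classes, here is a small sketch with made-up data: 5 positive samples out of 100. The two "models" are hypothetical prediction lists, not trained models, and the summarize helper is an illustrative name.

```python
def summarize(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Convention assumed here: report 0.0 when a ratio is undefined (0 / 0).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Imbalanced data: 5 positives followed by 95 negatives.
y_true = [1] * 5 + [0] * 95

# Hypothetical model A: always predicts the majority class (0).
always_negative = [0] * 100

# Hypothetical model B: finds 4 of the 5 positives, but also raises
# 10 false alarms among the 95 negatives.
screening = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85

acc_a, prec_a, rec_a = summarize(y_true, always_negative)
acc_b, prec_b, rec_b = summarize(y_true, screening)

print(f"A: accuracy={acc_a:.2f} precision={prec_a:.2f} recall={rec_a:.2f}")
print(f"B: accuracy={acc_b:.2f} precision={prec_b:.2f} recall={rec_b:.2f}")
```

Model A scores 0.95 accuracy without finding a single positive (recall 0.0). Model B has lower accuracy (0.89) but a recall of 0.8. Which one is "better" depends on whether false negatives or false positives cost more, as described in the list above.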

Regression Model Evaluation Metrics/Techniques


• R^2 (pronounced r-squared) or the coefficient of determination -
Compares your model's predictions to the mean of the targets. Values
can range from negative infinity (a very poor model) to 1. For example, if
all your model does is predict the mean of the targets, its R^2 value
would be 0. And if your model perfectly predicts the targets, its
R^2 value would be 1.
• Mean absolute error (MAE) - The average of the absolute differences
between predictions and actual values. It gives you an idea of how wrong
your predictions were.
• Mean squared error (MSE) - The average of the squared differences between
predictions and actual values. Squaring the errors makes every error
positive. It also amplifies larger errors (such as those from outliers).
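
The three regression metrics can be written out directly from their definitions. This is a minimal sketch in plain Python with made-up target values; the function names mae, mse and r2 are illustrative, and Scikit-Learn provides equivalents in sklearn.metrics (mean_absolute_error, mean_squared_error, r2_score).

```python
def mae(y_true, y_pred):
    """Mean absolute error: the average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: the average of (actual - predicted) squared."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R^2 = 1 - (squared error of the model / squared error of the mean)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up targets and three hypothetical sets of predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_model = [2.0, 5.0, 8.0, 12.0]      # a reasonable but imperfect model
y_mean = [6.0, 6.0, 6.0, 6.0]        # always predicts the mean of the targets
y_reversed = [9.0, 7.0, 5.0, 3.0]    # a very poor model

print(mae(y_true, y_model), mse(y_true, y_model), r2(y_true, y_model))
print(r2(y_true, y_true))      # perfect predictions
print(r2(y_true, y_mean))      # predicting the mean
print(r2(y_true, y_reversed))  # worse than predicting the mean
```

For y_model the absolute errors are 1, 0, 1 and 3, giving an MAE of 1.25 and an MSE of 2.75. The R^2 values come out as 0.45 for y_model, 1.0 for perfect predictions, 0.0 for the mean predictor and -3.0 for the reversed predictions, which shows how R^2 goes negative when a model does worse than the mean.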

Which regression metric should you use?
• R^2 is similar to accuracy. It gives you a quick indication of how well your
model might be doing. Generally, the closer your R^2 value is to 1.0, the
better the model. But it doesn't tell you exactly how wrong your model
is in terms of how far off each prediction is.
• MAE gives a better indication of how far off each of your model's
predictions is on average.
• As for MAE or MSE, because MSE squares the differences between predicted
values and actual values, it amplifies larger differences. Let's say we're
predicting the value of houses (which we are).
    • Pay more attention to MAE: When being $10,000 off is twice as bad
    as being $5,000 off.
    • Pay more attention to MSE: When being $10,000 off is more than
    twice as bad as being $5,000 off.
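
The house-price rule of thumb can be checked with a little arithmetic. In this sketch the error lists are made-up dollar amounts for two hypothetical models, and the helper names are illustrative.

```python
def mae_from_errors(errors):
    """Mean absolute error, given a list of (prediction - actual) errors."""
    return sum(abs(e) for e in errors) / len(errors)


def mse_from_errors(errors):
    """Mean squared error, given a list of (prediction - actual) errors."""
    return sum(e ** 2 for e in errors) / len(errors)


# A single $10,000 miss versus a single $5,000 miss.
mae_ratio = mae_from_errors([10_000]) / mae_from_errors([5_000])
mse_ratio = mse_from_errors([10_000]) / mse_from_errors([5_000])
print(mae_ratio, mse_ratio)

# Two hypothetical models with the same total dollar error:
steady = [5_000, 5_000, 5_000, 5_000]   # always off by $5,000
erratic = [0, 0, 0, 20_000]             # usually exact, once off by $20,000

print(mae_from_errors(steady), mae_from_errors(erratic))
print(mse_from_errors(steady), mse_from_errors(erratic))
```

Under MAE the $10,000 miss counts exactly twice as much as the $5,000 miss, while under MSE it counts four times as much. The two models have the same MAE ($5,000), but the erratic model's MSE is four times larger, so MSE is the metric to watch when one large miss is worse than several small ones.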

For more on evaluating a machine learning model, be sure to check
out the following resources:

• Scikit-Learn documentation for metrics and scoring (quantifying the
quality of predictions)
• Beyond Accuracy: Precision and Recall by Will Koehrsen
• Stack Overflow answer describing MSE (mean squared error) and
RMSE (root mean squared error)
