UNIT-2
DESIGN AND ANALYSIS OF MACHINE LEARNING ALGORITHMS
Contents
• Guidelines for machine learning experiments
• Cross Validation (CV) and resampling
• K-fold CV, bootstrapping, measuring classifier performance, assessing a
single classification algorithm and comparing two classification
algorithms
• t test, McNemar’s test, K-fold CV paired t test
• Performance metrics: MSE, accuracy, confusion matrix, precision, recall,
F1-Score
• Linear Regression with multiple variables
• Logistic Regression
• Spam filtering with logistic regression
t-test
• The t-test is a statistical hypothesis test used to determine whether
there is a significant difference between the means of two groups.
• In the context of machine learning, the t-test can be used to compare
the performance of two different models, two algorithms, or the same
model under different conditions.
• Types of t-Tests
1. Independent t-test (Two-sample t-test): Used when comparing the means
of two independent groups.
2. Paired t-test: Used when comparing the means of the same group under two
different conditions.
• Step-by-Step Explanation of the Independent t-Test
• Let's go through the steps of performing an independent t-test with an
example.
• Scenario
• Assume we have two models, Model A and Model B, and we want to
compare their performances based on accuracy scores obtained from
10-fold cross-validation.
• Step 1: Formulate Hypotheses
• Null Hypothesis (H0): The mean accuracies of Model A and Model B are
equal; any observed difference is due to chance.
• Alternative Hypothesis (H1): The mean accuracies of Model A and
Model B differ.
• Step 2: Collect Data
• Suppose we have the following accuracy scores for each fold:
• Model A: [0.85, 0.87, 0.88, 0.86, 0.89, 0.87, 0.88, 0.86, 0.89, 0.88]
• Model B: [0.83, 0.82, 0.84, 0.85, 0.84, 0.83, 0.85, 0.82, 0.83, 0.84]
• Step 3: Calculate Means and Standard Deviations
• Calculate the mean and standard deviation for both sets of scores.
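As a minimal sketch of Step 3, the summary statistics can be computed with Python's standard `statistics` module. The variable names (`model_a`, `model_b`, and so on) are chosen here for illustration; the scores are the ones listed in Step 2.

```python
import statistics

# Accuracy scores per fold, copied from Step 2.
model_a = [0.85, 0.87, 0.88, 0.86, 0.89, 0.87, 0.88, 0.86, 0.89, 0.88]
model_b = [0.83, 0.82, 0.84, 0.85, 0.84, 0.83, 0.85, 0.82, 0.83, 0.84]

# statistics.stdev is the sample standard deviation (n - 1 denominator),
# which is the version the t-test relies on.
mean_a, std_a = statistics.mean(model_a), statistics.stdev(model_a)
mean_b, std_b = statistics.mean(model_b), statistics.stdev(model_b)

print(f"Model A: mean={mean_a:.3f}, std={std_a:.4f}")
print(f"Model B: mean={mean_b:.3f}, std={std_b:.4f}")
```

Using the sample (rather than population) standard deviation matters here because the ten folds are treated as a sample of the model's possible performance.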
• Step 4: Perform the Independent t-Test
• Use the ttest_ind function from the scipy.stats library to perform the
t-test.
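A short sketch of Step 4, assuming SciPy is installed. It passes the two lists of fold scores from Step 2 to `scipy.stats.ttest_ind`; the variable names are illustrative.

```python
from scipy import stats

# Accuracy scores per fold, copied from Step 2.
model_a = [0.85, 0.87, 0.88, 0.86, 0.89, 0.87, 0.88, 0.86, 0.89, 0.88]
model_b = [0.83, 0.82, 0.84, 0.85, 0.84, 0.83, 0.85, 0.82, 0.83, 0.84]

# equal_var=True (the default) runs the classic pooled-variance
# Student's t-test; equal_var=False would run Welch's t-test instead.
t_statistic, p_value = stats.ttest_ind(model_a, model_b, equal_var=True)

print(f"t-statistic: {t_statistic:.2f}")
print(f"p-value: {p_value:.2g}")
```

If both models were scored on the same ten folds, the scores are paired, and the paired t-test described earlier (`scipy.stats.ttest_rel`) may be the better fit.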
• Step 5: Interpret the Results
• t-statistic: The t-statistic measures the size of the difference relative to
the variation in the sample data.
• p-value: The p-value is the probability of observing results at least as
extreme as those obtained, assuming the null hypothesis is true.
• Let's assume the following illustrative results were obtained (the values
computed from the scores above will differ):
• t_statistic, p_value = 5.57, 0.0001
• Interpretation:
• If the p-value is less than the chosen significance level (typically
0.05), we reject the null hypothesis. In this case, since the p-value is
0.0001, which is much less than 0.05, we reject the null hypothesis.
• This means there is a significant difference between the performance
of Model A and Model B.
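To make "the size of the difference relative to the variation" concrete, the pooled-variance t-statistic can be written out by hand. This is a sketch under the equal-variance assumption; the helper names `pooled_t_statistic` and `decide` are hypothetical, not part of any library. The statistic it computes from the Step 2 scores need not match the illustrative value quoted above.

```python
import math
import statistics

# Accuracy scores per fold, copied from Step 2.
model_a = [0.85, 0.87, 0.88, 0.86, 0.89, 0.87, 0.88, 0.86, 0.89, 0.88]
model_b = [0.83, 0.82, 0.84, 0.85, 0.84, 0.83, 0.85, 0.82, 0.83, 0.84]


def pooled_t_statistic(a, b):
    """Two-sample Student's t-statistic, assuming equal variances."""
    n_a, n_b = len(a), len(b)
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    # Standard error of the difference between the two means.
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(a) - statistics.mean(b)) / std_error


def decide(p_value, alpha=0.05):
    """Apply the decision rule from Step 5 at significance level alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


t_stat = pooled_t_statistic(model_a, model_b)
print(f"t = {t_stat:.2f}")
print(decide(0.0001))  # p-value taken from the illustrative results above
```

The numerator is the difference in mean accuracy and the denominator is its standard error, so a large absolute t-statistic means the gap is large compared with fold-to-fold noise.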
McNemar’s Test: Explanation and Example
• McNemar’s test is a statistical test used on paired nominal data to
determine whether there are differences in the proportions of two
related groups.
• It is commonly used in machine learning to compare the performance
of two classifiers on the same dataset, with their correct and incorrect
predictions summarized in a 2×2 contingency table.
When to Use McNemar’s Test?
• Binary classification problems: When you have two classifiers and
want to compare their predictions.
• Paired samples: The same instances are classified by both classifiers,
so their predictions can be directly compared.
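The paired setup can be sketched as follows. The labels and predictions are hypothetical, invented only for illustration, and the helper names are not from any library. The sketch builds the 2×2 contingency table and computes the continuity-corrected McNemar statistic, with the p-value taken from a chi-square distribution with one degree of freedom.

```python
import math

# Hypothetical labels and predictions, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
pred_a = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0]
pred_b = [1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1]


def contingency_table(y, a, b):
    """Rows: classifier A correct/wrong. Columns: classifier B correct/wrong."""
    table = [[0, 0], [0, 0]]
    for truth, pa, pb in zip(y, a, b):
        row = 0 if pa == truth else 1
        col = 0 if pb == truth else 1
        table[row][col] += 1
    return table


def mcnemar(table):
    """Continuity-corrected McNemar statistic and its approximate p-value."""
    b = table[0][1]  # A correct, B wrong
    c = table[1][0]  # A wrong, B correct
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


table = contingency_table(y_true, pred_a, pred_b)
statistic, p_value = mcnemar(table)
print("table:", table)
print(f"statistic = {statistic:.2f}, p-value = {p_value:.3f}")
```

Only the two off-diagonal cells, where exactly one classifier is correct, enter the statistic. With so few disagreements the chi-square approximation is rough, and an exact binomial version of the test is usually preferred for small counts.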