Key Machine Learning Questions & Answers
Accuracy can be misleading on imbalanced datasets because it ignores the class distribution. A model can achieve high accuracy simply by always predicting the majority class, even though it never identifies the minority class, which is often the class of interest. In such cases, precision, recall, and the F1-score give a better picture of per-class performance, particularly on the minority class.
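As a rough illustration of the point above, the following sketch computes accuracy, precision, recall, and F1 by hand. The function name classification_metrics and the 5-versus-95 dataset are hypothetical, chosen only to make the imbalance visible.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical data: 5 positives (the minority class) among 100 samples.
y_true = [1] * 5 + [0] * 95
majority_pred = [0] * 100  # a "model" that always predicts the majority class

majority_metrics = classification_metrics(y_true, majority_pred)
print(majority_metrics)
```

Here the always-negative predictor reaches 95% accuracy while its recall and F1 on the minority class are both zero, which is exactly the failure that accuracy alone hides.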
Decision trees overfit easily because they can keep splitting until they capture noise and small fluctuations in the training data. Pruning mitigates this. Pre-pruning stops growth early, before the tree becomes too complex (for example, by limiting its depth or requiring a minimum number of samples per split), while post-pruning grows the full tree and then removes branches that contribute little to predictive performance. Both simplify the model and improve its generalization to new data.
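To make pre-pruning concrete, here is a toy sketch of a tree grown on a single numeric feature. Everything in it is an assumption for illustration: the names build_tree, max_depth, and min_samples_split, the use of misclassification count as the split criterion, and the eight-point dataset. It shows pre-pruning only; post-pruning is not implemented.

```python
from collections import Counter


def majority_label(points):
    """Most common label among (x, label) pairs."""
    return Counter(label for _, label in points).most_common(1)[0][0]


def misclassified(points):
    """Number of points that a majority-vote leaf would get wrong."""
    if not points:
        return 0
    counts = Counter(label for _, label in points)
    return len(points) - max(counts.values())


def build_tree(points, max_depth=None, min_samples_split=2, depth=0):
    """Grow a tree on one feature; max_depth and min_samples_split pre-prune it."""
    labels = {label for _, label in points}
    too_deep = max_depth is not None and depth >= max_depth
    if len(labels) == 1 or len(points) < min_samples_split or too_deep:
        return {"leaf": majority_label(points)}

    # Try a threshold halfway between each pair of neighbouring x values.
    xs = sorted({x for x, _ in points})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [p for p in points if p[0] <= threshold]
        right = [p for p in points if p[0] > threshold]
        errors = misclassified(left) + misclassified(right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left, right)

    if best is None:  # all x values identical, nothing to split on
        return {"leaf": majority_label(points)}

    _, threshold, left, right = best
    return {
        "threshold": threshold,
        "left": build_tree(left, max_depth, min_samples_split, depth + 1),
        "right": build_tree(right, max_depth, min_samples_split, depth + 1),
    }


def count_leaves(tree):
    if "leaf" in tree:
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])


# Hypothetical noisy data: the labels do not separate cleanly along x.
points = [(1, "A"), (2, "A"), (3, "B"), (4, "A"), (5, "B"), (6, "B"), (7, "A"), (8, "B")]
full_tree = build_tree(points)                 # grows until every leaf is pure
pruned_tree = build_tree(points, max_depth=1)  # pre-pruned to a single split
print(count_leaves(full_tree), count_leaves(pruned_tree))
```

The unrestricted tree keeps splitting until it memorizes every noisy point, while the depth-limited tree stops at two leaves and accepts some training error in exchange for a simpler model.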
The kernel trick lets an SVM operate in a higher-dimensional feature space without explicitly computing coordinates in that space: a kernel function returns the inner product of two points in that space directly from the original inputs. This is particularly useful for data that is not linearly separable. Common kernels include the linear, polynomial, and radial basis function (RBF) kernels. The linear kernel keeps the original space and yields a linear boundary, while the polynomial and RBF kernels correspond to different implicit feature maps and enable non-linear decision boundaries.
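The idea can be checked numerically with a small sketch. It assumes 2-D inputs and a homogeneous degree-2 polynomial kernel, k(x, z) = (x . z)^2, whose explicit feature map is (x1^2, sqrt(2)*x1*x2, x2^2). The function names and the gamma value are illustrative choices, not part of any library.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel computed in the original 2-D space."""
    return dot(x, z) ** 2


def poly2_feature_map(x):
    """Explicit 3-D feature map whose inner product equals poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel; its implicit feature space is infinite-dimensional."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


x, z = (1.0, 2.0), (3.0, -1.0)
kernel_value = poly2_kernel(x, z)
explicit_value = dot(poly2_feature_map(x), poly2_feature_map(z))
print(kernel_value, explicit_value)  # equal up to floating-point rounding
```

Both routes give the same number, but the kernel never builds the 3-D vectors. For the RBF kernel no finite explicit map exists, so the kernel form is the only practical way to compute it.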
Entropy measures the impurity (disorder) of a set of labels; higher values indicate a more even mix of classes. In decision tree construction, information gain evaluates how well an attribute separates the data. It is the reduction in entropy produced by a split: the entropy of the parent node minus the size-weighted average entropy of the child nodes. The best split is the one with the highest information gain, since it produces the most homogeneous child nodes.
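A minimal sketch of these two quantities, using base-2 logarithms so that entropy is measured in bits. The function names and the yes/no labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 5 + ["no"] * 5

# A split that separates the classes completely.
perfect_split = [["yes"] * 5, ["no"] * 5]
# A split whose children are just as mixed as the parent.
uninformative_split = [["yes", "yes", "no", "no"], ["yes", "yes", "yes", "no", "no", "no"]]

gain_perfect = information_gain(parent, perfect_split)
gain_uninformative = information_gain(parent, uninformative_split)
print(entropy(parent), gain_perfect, gain_uninformative)
```

The evenly mixed parent has an entropy of 1 bit. The perfect split recovers all of it as information gain, while the uninformative split gains nothing, so a tree builder using this criterion would choose the first.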
Mean squared error (MSE) is a poor fit for logistic regression because passing the linear score through the sigmoid makes the squared-error cost non-convex in the parameters, so gradient-based optimization can stall in flat regions or local minima. Instead, logistic regression is fit by Maximum Likelihood Estimation (MLE): its cost function is the negative log-likelihood (log loss, or cross-entropy), which is convex in the parameters and can be minimized efficiently.
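The convexity difference can be spot-checked with a one-parameter sketch. It assumes a single training example with x = 1 and y = 1 and tests the midpoint inequality f((a+b)/2) <= (f(a)+f(b))/2, which every convex function must satisfy. The function names and the chosen interval are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def squared_error(w, x=1.0, y=1.0):
    """Squared error of a sigmoid prediction for one example."""
    return (y - sigmoid(w * x)) ** 2


def log_loss(w, x=1.0, y=1.0, eps=1e-12):
    """Negative log-likelihood (cross-entropy) for one example."""
    p = min(max(sigmoid(w * x), eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def violates_midpoint_convexity(f, a, b):
    """True if f lies above its chord at the midpoint of [a, b]."""
    return f((a + b) / 2.0) > (f(a) + f(b)) / 2.0


mse_violation = violates_midpoint_convexity(squared_error, -10.0, 0.0)
log_loss_violation = violates_midpoint_convexity(log_loss, -10.0, 0.0)
print(mse_violation, log_loss_violation)
```

On this interval the squared error lies above its chord, so it cannot be convex in w, while the log loss passes the check. A single passing check does not prove convexity; it is only consistent with the known result that the negative log-likelihood of logistic regression is convex.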
The hyperplane in a Support Vector Machine (SVM) is the decision boundary that separates the classes in feature space. An SVM seeks the hyperplane that maximizes the margin, that is, the distance between the boundary and the closest training points of each class (the support vectors). A larger margin makes the classifier more robust, reducing its susceptibility to overfitting and improving its ability to generalize to unseen data.
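For a linear SVM the margin has a simple closed form. The sketch below assumes the usual canonical scaling, in which the support vectors satisfy w . x + b = +1 or -1, so the margin width is 2 / ||w||. The weight vector, bias, and sample points are made-up values for illustration.

```python
import math


def norm(w):
    return math.sqrt(sum(wi * wi for wi in w))


def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w . x + b = 0."""
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm(w)


def margin_width(w):
    """Distance between the hyperplanes w . x + b = +1 and w . x + b = -1."""
    return 2.0 / norm(w)


# Hypothetical hyperplane: 3*x1 + 4*x2 - 5 = 0, so ||w|| = 5.
w, b = (3.0, 4.0), -5.0
width = margin_width(w)

positive_support = (2.0, 0.0)  # w . x + b = +1
negative_support = (0.0, 1.0)  # w . x + b = -1
print(width, signed_distance(w, b, positive_support), signed_distance(w, b, negative_support))
```

Each support vector sits half a margin width from the boundary, on opposite sides. Because the width is 2 / ||w||, maximizing the margin is equivalent to minimizing ||w|| subject to the classification constraints.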
Support Vector Machines (SVMs) are often less sensitive to outliers than logistic regression, because the decision boundary is determined only by the support vectors: correctly classified points that lie beyond the margin have no influence on its position. (Points that are misclassified or fall inside the margin do still affect a soft-margin SVM.) Logistic regression, by contrast, fits its parameters using every data point, so outliers can pull the decision boundary away from its optimal position.
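One way to see this difference is to compare the per-point losses the two models minimize. The sketch assumes labels y in {-1, +1} and writes each loss as a function of the margin score m = y * f(x); the function names and sample scores are illustrative.

```python
import math


def hinge_loss(margin_score):
    """SVM hinge loss: exactly zero once a point is beyond the margin (m >= 1)."""
    return max(0.0, 1.0 - margin_score)


def logistic_loss(margin_score):
    """Logistic regression loss: small for large m, but never exactly zero."""
    return math.log(1.0 + math.exp(-margin_score))


for m in (-2.0, 0.5, 1.0, 3.0, 10.0):
    print(m, hinge_loss(m), logistic_loss(m))
```

A correctly classified point far from the boundary contributes nothing to the hinge loss, so moving it further away does not change the SVM solution, whereas it always contributes a small amount to the logistic loss. Points on the wrong side of the boundary are penalized by both losses, so neither model is immune to mislabeled or extreme points there.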
Multiclass classification with logistic regression can use 'one vs. all', in which a separate binary classifier is trained for each class to decide whether a sample belongs to that class or not. Alternatively, 'one vs. one' trains a classifier for every pair of classes. 'One vs. all' needs fewer classifiers (K for K classes, versus K(K-1)/2 for 'one vs. one'), while 'one vs. one' can be more accurate in some cases because each classifier separates only two classes at a time, which can lead to simpler decision boundaries.
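The difference in cost is easy to see by enumerating the binary problems each scheme creates. This sketch only lists the tasks and trains no models; the helper names and the four class labels are hypothetical.

```python
from itertools import combinations


def one_vs_all_tasks(classes):
    """One binary task per class: that class against all the others."""
    return [(c, [other for other in classes if other != c]) for c in classes]


def one_vs_one_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))


classes = ["cat", "dog", "bird", "fish"]
ova = one_vs_all_tasks(classes)
ovo = one_vs_one_tasks(classes)

print(len(ova), "one-vs-all classifiers")
print(len(ovo), "one-vs-one classifiers")
```

With K classes, 'one vs. all' needs K classifiers and 'one vs. one' needs K(K-1)/2, so the gap widens quickly as K grows. Each 'one vs. one' classifier is trained only on the samples of its two classes, which partly offsets the larger count.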
Supervised learning trains a model on labeled data, where the outcomes are known, so that the model learns the mapping from inputs to outputs. Unsupervised learning, in contrast, works with unlabeled data: the model tries to identify patterns and structure, for example through clustering or association rules. The distinction matters because it determines the choice of algorithms and the overall approach to the problem. Supervised learning suits tasks where labeled outcomes are available and accurate prediction is the goal, while unsupervised learning is used for exploratory data analysis and for discovering hidden patterns.
Linear regression predicts continuous outcome variables and assumes a linear relationship between the features and the target. Logistic regression, on the other hand, predicts binary outcomes: it applies the logistic (sigmoid) function to a linear combination of the features to estimate a probability, so its outputs are bounded between 0 and 1. Linear regression therefore returns direct numerical predictions, while logistic regression returns class probabilities, which makes it suited to classification tasks.




