CS221 Machine Learning Overview
CS221 Machine Learning Overview
Binary classification involves predicting a discrete output, typically represented as positive (+1) or negative (-1), whereas regression involves predicting a continuous output, often called the response or target . Typical applications of binary classification include fraud detection (predicting if a transaction is fraudulent), moderating content (predicting if an online comment is toxic), and scientific experiments like identifying Higgs boson decay events . Applications of regression include predicting asset wealth indices from satellite images, housing prices based on property information, and arrival times of services .
The introduction of nonlinear features in neural networks significantly enhances model development and performance by enabling the capture of complex patterns and interactions within data that linear models cannot represent . The automatic learning of these features during training allows neural networks to flexibly adapt to various input characteristics, often resulting in superior predictive accuracy, especially for tasks involving significant nonlinearity like image recognition or language processing .
Stochastic gradient descent (SGD) improves upon standard gradient descent by significantly reducing the computation required per iteration, which results in faster convergence, especially for large datasets . Instead of using the entire dataset to calculate the gradient, SGD uses a randomly selected subset (a single data point or mini-batch), which reduces the time complexity per update and often leads to better generalization on unseen data .
The purpose of reflex-based models in machine learning is to perform fast, fixed feedforward operations to produce predictions, making them suitable for real-time decision-making tasks . These models function by taking an input (x), like an image or sentence, and applying a predefined function (f) to produce an output (y), like a binary class label or a continuous value .
Feature templates support the development of nonlinear predictors by organizing and transforming raw data into a structured input space that linear models can manipulate, effectively allowing the models to capture complex patterns . This approach leverages the simplicity and efficiency of linear models while extending their capability to handle nonlinearities, which can be beneficial for both interpretability and computational speed compared to deeper architectures like neural networks .
Unsupervised learning techniques like K-means are preferred over supervised learning approaches when labeled data is scarce or unavailable, making it impractical to train models with explicit input-output pairs . Such techniques are useful for discovering inherent structures, grouping similar data points, and dimensionality reduction, which can help in exploratory data analysis and pre-processing steps in larger pipelines .
Backpropagation plays a pivotal role in training neural networks by facilitating the efficient computation of the gradient of the loss function with respect to each weight in the network . It simplifies gradient computation by propagating the error terms backward through the network layers, using the chain rule for derivatives, thus avoiding manual gradient calculations and allowing for scalable training of deep models .
Structured prediction tasks are fundamentally different because they involve producing complex outputs, such as sentences or segmented images, which encompass a vast space of possible outputs compared to binary classification and regression, which have fixed or continuous outputs . Challenges associated with structured prediction include handling the potentially enormous space of outputs and managing error cascades, where initial errors can lead to further errors in predictions .
Errors in machine learning can be unevenly distributed across different demographic or categorical groups, potentially leading to biases in decision-making processes . Group DRO (Distributionally Robust Optimization) is a technique designed to address this issue by ensuring that errors do not disproportionately impact any particular group within the population, thus promoting fairness and equity in model predictions .
Ensuring that a machine learning model generalizes well to new test inputs is crucial for its applicability to real-world scenarios, where it must make accurate predictions on unseen data . Factors influencing a model's ability to generalize include the complexity of the model (to avoid overfitting), the quality and diversity of training data, and the choice of algorithms and hyperparameters, all of which can affect the balance between bias and variance in the model .