ML Python Spring2018 Part1

The document provides an overview of machine learning, defining it as a field that enables computers to learn from data without explicit programming. It outlines the workflow of machine learning, including data collection, preparation, model selection, evaluation, and optimization, while distinguishing between supervised, unsupervised, and semi-supervised learning. Additionally, it discusses key concepts such as feature scaling, model evaluation metrics, and various algorithms used in machine learning.

Uploaded by

anh nguyen

Machine Learning in Python

Rohith Mohan
GradQuant
Spring 2018
What is Machine Learning?

• Getting computers to program themselves
• Coding is the bottleneck, let data dictate programming

Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
Formal Definitions
• Arthur Samuel (1959)
• “Machine Learning: Field of study that gives computers the ability to learn
without being explicitly programmed.”
• At IBM, created a checkers program that learned by playing tens of thousands of games against itself

• Tom Mitchell (1998)

• “Well-posed Learning Problem: A computer program is said to learn from
experience E with respect to some task T and some performance measure P, if
its performance on T, as measured by P, improves with experience E.”

Source: Andrew Ng, Machine Learning (Coursera)

Machine Learning
• Developed out of initial work in Artificial Intelligence (AI)
• Wider availability of large datasets and advances in computing architecture have boosted its usage in recent years
Usage
• Natural Language Processing + Computer Vision
• Mining and clustering gene expression data to identify individuals
• Reproducing human behavior (True AI)
• Recommendation algorithms

[Link]
breakthroughs-of-2017-that-might-just-change-the-world-1222695/
Common steps in ML workflow
• Collect data (various sources: UCI Machine Learning Repository, news orgs, Kaggle)
• Prepare data (exploratory analysis, feature selection, regularization)
• Select and train a model (train and test datasets, which model?)
• Evaluate the model (accuracy, precision, ROC curves, F1 score)
• Optimize performance (change model, # of features, scaling)
scikit-learn

Preprocessing
• Clean data and deal with missing values, etc.
• Feature scaling - rescaling features so they are on comparable scales
• Square footage of a house (100s of sq ft) vs # of rooms (1-5)
• Normalization (min-max scaling) - rescaling each feature to a fixed range (e.g. 0 to 1 or -1 to 1)
• Standardization - subtracting the mean and dividing by the SD, so each feature has mean 0 and SD 1
• Many others (regularization, imputation, generating polynomial features, etc.)
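
The two scaling approaches above can be sketched in a few lines of plain Python. This is a minimal illustration, not the workshop's own code: the function names and the house data are hypothetical, and in practice scikit-learn's `StandardScaler` and `MinMaxScaler` would normally be used instead.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: subtract the mean, divide by the population SD."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_scale(values, low=0.0, high=1.0):
    """Linearly rescale values into the range [low, high]."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        return [low for _ in values]
    span = largest - smallest
    return [low + (v - smallest) * (high - low) / span for v in values]


# Hypothetical data: two features on very different scales.
square_feet = [800, 1200, 1500, 2000, 2500]
rooms = [1, 2, 3, 4, 5]

print(standardize(square_feet))          # mean 0, SD 1
print(min_max_scale(rooms, -1.0, 1.0))   # squeezed into -1 to 1
```

After either transformation, square footage and room count contribute on a similar scale, which matters for distance-based and gradient-based models.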

Importance of feature scaling

Comparison of scaling: StandardScaler
Comparison of scaling: RobustScaler
Train Test (Cross Validate?)
• Why do we need to split up our datasets?
• Overfitting
• Split dataset
• Train - the data the model is fit on
• Test - held-out data used to evaluate the model's performance
• Holding out roughly 20-40% of the data for testing is usually enough
• Validation set?
• Cross-validation
• Split the training set into subsets (folds); train on all but one fold, evaluate on the held-out fold, and rotate (more computationally expensive, but conserves data)
• Hyper-parameter tuning
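
As a rough sketch of the splitting ideas above, the following plain-Python helpers produce index sets for a train/test split and for k-fold cross-validation. The function names and the 40% test fraction are illustrative assumptions; scikit-learn's `train_test_split` and `KFold` cover the same ground in practice.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.25, seed=0):
    """Shuffle sample indices and split them into (train, test) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (training, validation) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Hold out 40% of 20 samples for testing, then cross-validate on the rest.
train_idx, test_idx = train_test_split_indices(20, test_fraction=0.4, seed=1)
print(len(train_idx), len(test_idx))

for training, validation in k_fold_indices(train_idx, k=4):
    print(len(training), len(validation))
```

The test indices are never touched during cross-validation, so hyper-parameters can be tuned on the folds while the test set stays a clean estimate of final performance.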
Bias-variance tradeoff

• Underfitting: high bias
• Overfitting: high variance

[Link] [Link]
plot-underfitting-overfitting-py
How to select a model?
Supervised vs Unsupervised Learning
• Supervised
• Regression, classification
• Given input variables and a labeled output variable, learn the mapping from input to output

• Unsupervised
• Clustering, association, etc.
• No labels: no correct answers and no teacher

• Semi-supervised
• e.g. a partially labeled dataset of images
• Real-world problems often mix both techniques
Regression
• Linear regression (OLS)

• Prediction
• Multiple variables/features?
• Feature selection

Feature Selection
Regression
• Feature selection
• e.g. length and width of a house (combine into area?)
• Regularization
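
To make the link between ordinary least squares and regularization concrete, here is a minimal single-feature sketch. It is an illustration under stated assumptions rather than the slides' method: `fit_line` and `ridge_lambda` are hypothetical names, the penalty is an L2 (ridge) penalty applied to the slope only, and the data are made up.

```python
def fit_line(xs, ys, ridge_lambda=0.0):
    """Fit y ~ intercept + slope * x by least squares.

    ridge_lambda = 0 gives ordinary least squares (OLS).
    ridge_lambda > 0 adds an L2 penalty on the slope (the intercept is
    not penalized), which shrinks the slope toward zero.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / (s_xx + ridge_lambda)
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data lying exactly on y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

print(fit_line(xs, ys))                     # OLS recovers the line
print(fit_line(xs, ys, ridge_lambda=10.0))  # penalized fit has a flatter slope
```

The penalized fit trades a worse match to the training data for smaller coefficients, which is the mechanism regularization uses to limit overfitting.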
Regularization
[Link]
plot-train-error-vs-test-error-py
Classification – Logistic Regression
Classification – SVM
Evaluating Performance
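• Accuracy – how many predictions are correct out of the entire dataset?
• Can be a flawed metric, e.g. with imbalanced classes a model that always predicts the majority class still scores high accuracy

• Precision and Recall
• Precision - of the examples predicted positive, what fraction are actually positive?
• Recall - of the actual positives, what fraction did the model find?

• ROC curves - true positive rate vs false positive rate as the decision threshold varies
• F1 score - harmonic mean of precision and recall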
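
The metrics above can be computed directly from counts of true and false positives and negatives. The sketch below is a plain-Python illustration with made-up labels; the function name is hypothetical, and defining precision, recall and F1 as 0.0 when their denominator is zero is a convention chosen here, not something the slides specify.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)

    accuracy = (tp + tn) / len(y_true)
    # Convention assumed here: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Imbalanced, made-up data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100  # a "model" that never predicts the positive class

print(classification_metrics(y_true, always_negative))
```

The always-negative model scores high accuracy while finding none of the positives, which is why recall and F1 are worth reporting alongside accuracy.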
Classification - K-Nearest Neighbors
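• Robust to noisy training data (for a large enough K)
• More effective with larger datasets

• Need to determine parameter K (number of nearest neighbors)

• What type of distance metric? (e.g. Euclidean, Manhattan)
• High computation cost at prediction time: distances to every training point must be computed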
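
A minimal sketch of K-nearest-neighbors classification follows, assuming Euclidean distance and a simple majority vote. The function name and the toy points are hypothetical; scikit-learn's `KNeighborsClassifier` is the usual tool in practice.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Distance from the query to every training point (the costly step).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy data: two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (2, 2), k=3))
print(knn_predict(points, labels, (7, 8), k=3))
```

Every prediction scans the full training set, which illustrates the computation cost noted above; an odd K avoids tied votes when there are two classes.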
Clustering
• Unsupervised learning
• Can help you understand structure of your data

• Various types of clustering: K-means, hierarchical (e.g. Ward linkage)

K-means
• Randomly choose k centroids
• Assign each point to its nearest centroid to form clusters
• Take the mean of each cluster as its new centroid
• Repeat until convergence (the centroids stop changing)
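
The K-means steps above can be sketched as follows. This is a bare-bones illustration, assuming Euclidean distance, initial centroids sampled from the data, and made-up points; the function name is hypothetical, and scikit-learn's `KMeans` adds smarter initialization and multiple restarts.

```python
import math
import random


def k_means(points, k, seed=0, max_iter=100):
    """Cluster `points` (tuples of numbers) into k groups; return (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 1: randomly choose k centroids
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: the mean of each cluster becomes its new centroid.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep the centroid of an empty cluster
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up data: two well-separated groups of three points.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(data, k=2, seed=0)
print(sorted(centroids))
```

Because the starting centroids are random, different seeds can give different results on less cleanly separated data, so running several restarts and keeping the best is common.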
