Alteryx Inspire 2018: Linear Regression Insights

The document discusses techniques for exploring, preparing, and modeling time series and other types of data using Alteryx tools. Key points include using field summaries, scatter plots, and other exploratory techniques to understand data; imputing missing values; performing regression analysis and assessing significance of predictors; creating classification models like decision trees; and evaluating different model performance metrics.

Uploaded by

Ishan Sane

Alteryx Inspire Conference

- Field summary used to investigate data types & statistical distributions
- Scatter plots & plots of means can be used for exploratory data analysis
- Impute tool (handles missing or zero values), with the mean as one replacement option
- Are 0 values included in the mean calculation?
- P-value analysis on the target variable (the lower the p-value, the more statistically significant the result)
- Association measure (analysis only relevant for linear/logistic regression)
- Create Samples tool: creates training and testing sets
- Linear regression (the interactive tool provides a breakdown of results). Look especially for the lowest p-values, which indicate the most statistically significant predictors
- Intercept value (the predicted value when every predictor is zero)
- OLS analysis (the spread of the errors, i.e. residuals, will reveal model bias)
- Stepwise regression (re-selects predictor variables depending on their significance)
- Oversample tool (selects a sample biased towards a certain value)
- Log normalisation (for dealing with skewed data): Log([value]+1). Regression copes more easily with linearised data
- Confusion matrix gives the counts of false positives and false negatives
- The false positives can be oversampled to a 50% split to train the model
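
The Log([value]+1) note above can be sketched in plain Python. This is a minimal illustration, not the Alteryx formula itself: the function names and the `sales` sample are hypothetical, and it assumes non-negative inputs. Adding 1 before taking the log keeps zero values defined, since log(0) is not.

```python
import math

def log_normalise(values):
    """Apply log(value + 1) to each non-negative value."""
    return [math.log1p(v) for v in values]

def undo_log_normalise(logged):
    """Invert the transform with exp(x) - 1 to get back to the original scale."""
    return [math.expm1(x) for x in logged]

# Hypothetical right-skewed sample: mostly small values and one very large one.
sales = [0, 9, 99, 999, 99999]

logged = log_normalise(sales)          # compresses the long right tail
restored = undo_log_normalise(logged)  # predictions made on the log scale need this step
```

A model fitted to the logged target predicts on the log scale, so its predictions would need the inverse transform before being compared with the original values.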

- Decision Tree (in the Tree Classification browse tool, green marks the path to failure and orange represents success; at each split, yes goes to the left, otherwise right). Accuracy at each node can be shown
- Union tool can also combine model objects together
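
On the earlier question of whether 0 values are included in the mean calculation: the notes do not record the answer, and the sketch below does not claim to reproduce the Impute tool. It only shows, on hypothetical data and with a hypothetical `treat_zero_as_missing` flag, how much the fill value can change depending on that choice.

```python
from statistics import mean

def impute_with_mean(values, treat_zero_as_missing=False):
    """Replace missing entries with the mean of the observed entries.

    None always counts as missing; 0 counts as missing only when
    treat_zero_as_missing is True (a hypothetical switch for illustration).
    """
    def is_missing(v):
        return v is None or (treat_zero_as_missing and v == 0)

    observed = [v for v in values if not is_missing(v)]
    fill = mean(observed)
    return [fill if is_missing(v) else v for v in values], fill

# Hypothetical field with one null and one zero.
readings = [10, 0, None, 20, 30]

filled_keep_zero, mean_keep_zero = impute_with_mean(readings)
filled_zero_missing, mean_zero_missing = impute_with_mean(
    readings, treat_zero_as_missing=True
)
```

With these numbers the fill value is 15 when the zero is kept as a real observation and 20 when it is treated as missing, so it is worth checking which behaviour is in effect before trusting an imputed field.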

Understanding Time Series

- Always start with a field summary (describe())
- Find any missing periods
- MUST have consecutive periods between beginning and ending periods
- TS Filler fills missing gaps
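
The requirement for consecutive periods can be checked by hand. The following is a sketch of the idea only, assuming monthly data keyed by month-start dates; the helper names and the `observed` series are hypothetical, and this is not how TS Filler is implemented.

```python
from datetime import date

def month_range(start, end):
    """Yield the first day of every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1

def fill_missing_periods(series):
    """Return (filled, missing) for a dict mapping month-start date -> value.

    Missing months are added with a value of None so that every period
    between the first and last observation is present.
    """
    periods = sorted(series)
    full = list(month_range(periods[0], periods[-1]))
    missing = [p for p in full if p not in series]
    filled = {p: series.get(p) for p in full}
    return filled, missing

# Hypothetical monthly series with March and April 2018 absent.
observed = {
    date(2018, 1, 1): 120,
    date(2018, 2, 1): 135,
    date(2018, 5, 1): 150,
}

filled, missing = fill_missing_periods(observed)
```

The inserted periods carry no value yet, so they would still need imputing before a time series model is fitted.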


- Green bar shows how much of the field is populated with numeric vs. null values
- TS Plot allows you to analyse time series data in terms of decomposition, auto-correlation and partial auto-correlation
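
As an illustration of what an auto-correlation plot measures, the sketch below computes the sample auto-correlation at several lags in plain Python. The `autocorrelation` helper and the `seasonal` series are hypothetical and are not taken from the TS Plot tool.

```python
from statistics import mean

def autocorrelation(series, lag):
    """Sample auto-correlation of `series` at a positive integer `lag`."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mu = mean(series)
    # Total variation around the mean (the lag-0 term).
    denom = sum((x - mu) ** 2 for x in series)
    # Co-variation between each point and the point `lag` steps later.
    numer = sum((series[t] - mu) * (series[t + lag] - mu) for t in range(n - lag))
    return numer / denom

# Hypothetical series that repeats every 4 periods.
seasonal = [10, 20, 30, 20] * 6

acf = {lag: autocorrelation(seasonal, lag) for lag in range(1, 9)}
```

For this series the largest value falls at lag 4, which is the kind of spike that points to a seasonal period when reading such a plot.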

- Take the log of the frequency/sample to look at changes on a relative basis over time
- Clustering is an unsupervised learning technique
- Udacity (predictive analytics course). Can do

Cache & run workflow (caching up till a certain point in a workflow)


Insights tool – has a built-in visualisation platform

Putler’s Predictive Analytics Pyramid

- Determine information needed to address problem/issue
- Find & engineer appropriate and meaningful predictors
- Relationship between predictors & target
- Determine type of models needed

Meaningful metrics for prediction

Decision makers tend to jump to a solution too soon rather than determining what information
is really needed to inform the problem/solution.

Comparing metrics from different types of models

Is a predictor providing signal or creating noise in the model?

Which predictor matters the most when making a prediction?

Different modelling methods use different measures of effect size

How does the predicted value change as the level of a numeric predictor increases, or as the category changes
for a categorical predictor?

For classification models – predicted probability for each possible target class

Regression models (predicted numeric value of target)

Metrics - Regression

1. MAPE (%)
2. RMSE
3. Correlation between actual & predicted values
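
The three regression metrics above can be computed as follows. This is a minimal sketch using standard textbook definitions; the function names and the `actual`/`predicted` values are hypothetical.

```python
import math
from statistics import mean

def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Undefined if any actual value is 0."""
    return 100 * mean([abs((a - p) / a) for a, p in zip(actual, predicted)])

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mean([(a - p) ** 2 for a, p in zip(actual, predicted)]))

def pearson(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    ma, mp = mean(actual), mean(predicted)
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Hypothetical hold-out results.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 380]

scores = {
    "MAPE (%)": mape(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "Correlation": pearson(actual, predicted),
}
```

MAPE is relative and unit-free, RMSE is in the units of the target and weights large errors more heavily, and the correlation only measures how well predictions track the actual values, not whether they are on the right scale.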

Metrics - Binary or Multi-Class Models

- Area under the receiver operating characteristic curve (AUC): only for binary models, though multi-class extensions exist
- Confusion matrix
- Log-loss (penalises based on the predicted probability assigned to the true class)
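
For a binary model, the three metrics above can be sketched in plain Python. The function names, the 0.5 threshold and the sample labels and probabilities are assumptions made for illustration; AUC is computed here through its pairwise-ranking interpretation rather than by tracing the curve.

```python
import math

def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) for 0/1 labels at the given probability threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log of the probability assigned to the true class."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -math.log(p) if y == 1 else -math.log(1 - p)
    return total / len(y_true)

def auc(y_true, y_prob):
    """Chance that a random positive is scored above a random negative (ties count half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0 for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities of class 1.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.6, 0.8, 0.4, 0.2, 0.1]

tp, fp, tn, fn = confusion_counts(y_true, y_prob)
model_auc = auc(y_true, y_prob)
model_log_loss = log_loss(y_true, y_prob)
```

The confusion matrix depends on the chosen threshold, whereas AUC and log-loss are computed from the probabilities directly, which is one reason the three can rank models differently.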

Partial dependency plot (fitted values across range of a focal predictor)

Multi-collinearity only starts affecting the model when the number of records is large

Reverse-causality

Efficiency

- Performance
- Memory
- Hard drive space
- Load on servers during production

Develop Efficiency

Caching

- Right-click & cache to avoid re-running the workflow

Reduce by sampling

Ctrl+f (in all caps, can search for values within tools)

Can load games (in ‘about’ section)

HIPPO (Highest Paid Person’s Opinion)
