vecstack

Python package for stacking (machine learning technique)

Easy to use. Perform stacking in a single line
Use any sklearn-like models
Perform classification and regression tasks
Apply any user-defined transformations for target and prediction
Python 2, Python 3
Win, Linux, Mac
MIT license
Depends on numpy, scipy, scikit-learn

Below you can find

Installation guide
Brief example
Complete examples for regression and classification
Explanation of stacking concept with pictures

You can also look at detailed parameter description or just type >>>help(stacking)

Installation

1-st way

From the command line run pip install vecstack

2-nd way

Download package file from PyPI
Unpack archive
From the command line go to unpacked directory
Run python setup.py install

3-rd way

Download package file from PyPI
From the command line go to the directory where you've downloaded package file
Run python -m pip install vecstack-0.1.zip

Brief example

from vecstack import stacking

# Get your data

# Initialize 1-st level models

# Get your stacking features in a single line
S_train, S_test = stacking(models, X_train, y_train, X_test, regression = True, verbose = 2)

# Use 2-nd level model with stacking features

Complete examples

Regression

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
from vecstack import stacking

# Load demo data
boston = load_boston()
X, y = boston.data, boston.target

# Make train/test split
# As usual in machine learning task we have X_train, y_train, and X_test
X_train, X_test, y_train, y_test = train_test_split(X, y, 
    test_size = 0.2, random_state = 0)

# Caution! All models and parameter values are just 
# demonstrational and shouldn't be considered as recommended.
# Initialize 1-st level models.
models = [
    ExtraTreesRegressor(random_state = 0, n_jobs = -1, 
        n_estimators = 100, max_depth = 3),
        
    RandomForestRegressor(random_state = 0, n_jobs = -1, 
        n_estimators = 100, max_depth = 3),
        
    XGBRegressor(seed = 0, n_jobs = -1, learning_rate = 0.1, 
        n_estimators = 100, max_depth = 3)]
    
# Compute stacking features
S_train, S_test = stacking(models, X_train, y_train, X_test, 
    regression = True, metric = mean_absolute_error, n_folds = 4, 
    shuffle = True, random_state = 0, verbose = 2)

# Initialize 2-nd level model
model = XGBRegressor(seed = 0, n_jobs = -1, learning_rate = 0.1, 
    n_estimators = 100, max_depth = 3)
    
# Fit 2-nd level model
model = model.fit(S_train, y_train)

# Predict
y_pred = model.predict(S_test)

# Final prediction score
print('Final prediction score: [%.8f]' % mean_absolute_error(y_test, y_pred))

task:   [regression]
metric: [mean_absolute_error]

model 0: [ExtraTreesRegressor]
    fold 0: [3.20733439]
    fold 1: [2.87943130]
    fold 2: [2.53026486]
    fold 3: [2.83618694]
    ----
    MEAN:   [2.86330437]

model 1: [RandomForestRegressor]
    fold 0: [3.11110485]
    fold 1: [2.78404210]
    fold 2: [2.55707729]
    fold 3: [2.32209992]
    ----
    MEAN:   [2.69358104]

model 2: [XGBRegressor]
    fold 0: [2.40318939]
    fold 1: [2.37286982]
    fold 2: [1.89121530]
    fold 3: [1.95382831]
    ----
    MEAN:   [2.15527571]
    
Final prediction score: [2.78409065]

Classification

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from vecstack import stacking

# Load demo data
iris = load_iris()
X, y = iris.data, iris.target

# Make train/test split
# As usual in machine learning task we have X_train, y_train, and X_test
X_train, X_test, y_train, y_test = train_test_split(X, y, 
    test_size = 0.2, random_state = 0)

# Caution! All models and parameter values are just 
# demonstrational and shouldn't be considered as recommended.
# Initialize 1-st level models.
models = [
    ExtraTreesClassifier(random_state = 0, n_jobs = -1, 
        n_estimators = 100, max_depth = 3),
        
    RandomForestClassifier(random_state = 0, n_jobs = -1, 
        n_estimators = 100, max_depth = 3),
        
    XGBClassifier(seed = 0, n_jobs = -1, learning_rate = 0.1, 
        n_estimators = 100, max_depth = 3)]
    
# Compute stacking features
S_train, S_test = stacking(models, X_train, y_train, X_test, 
    regression = False, metric = accuracy_score, n_folds = 4, 
    stratified = True, shuffle = True, random_state = 0, verbose = 2)

# Initialize 2-nd level model
model = XGBClassifier(seed = 0, n_jobs = -1, learning_rate = 0.1, 
    n_estimators = 100, max_depth = 3)
    
# Fit 2-nd level model
model = model.fit(S_train, y_train)

# Predict
y_pred = model.predict(S_test)

# Final prediction score
print('Final prediction score: [%.8f]' % accuracy_score(y_test, y_pred))

task:   [classification]
metric: [accuracy_score]

model 0: [ExtraTreesClassifier]
    fold 0: [0.93548387]
    fold 1: [0.96666667]
    fold 2: [1.00000000]
    fold 3: [0.89655172]
    ----
    MEAN:   [0.95000000]

model 1: [RandomForestClassifier]
    fold 0: [0.87096774]
    fold 1: [0.96666667]
    fold 2: [1.00000000]
    fold 3: [0.93103448]
    ----
    MEAN:   [0.94166667]

model 2: [XGBClassifier]
    fold 0: [0.83870968]
    fold 1: [0.93333333]
    fold 2: [1.00000000]
    fold 3: [0.93103448]
    ----
    MEAN:   [0.92500000]
    
Final prediction score: [0.96666667]

Stacking concept

We want to predict train and test sets with some 1-st level model(s), and then use this predictions as features for 2-nd level model.
Any model can be used as 1-st level model or 2-nd level model.
To avoid overfitting (for train set) we use cross-validation technique and in each fold we predict out-of-fold part of train set.
The common practice is to use from 3 to 10 folds.
In each fold we predict full test set, so after completion of all folds we need to find mean (mode) of all test set predictions made in each fold.
As an example we look at stacking implemented with single 1-st level model and 3-fold cross-validation.
Tree pictures below describe three folds of cross-validation. After completion of all three folds we get single train feature and single test feature to use with 2-nd level model.
We can repeat this cycle using other 1-st level models to get more features for 2-nd level model.
At the bottom you can see GIF animation.

Name		Name	Last commit message	Last commit date
Latest commit History 119 Commits
pic		pic
vecstack		vecstack
.gitignore		.gitignore
LICENSE.txt		LICENSE.txt
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

vecstack

Installation

Brief example

Complete examples

Regression

Classification

Stacking concept

Animation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

vecstack

Installation

Brief example

Complete examples

Regression

Classification

Stacking concept

Animation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages