v3.1.2 — Now with Transformers & AutoML

SuperML Java Framework

A modular machine learning library for Java with 21 specialized components, 20+ algorithms, and 400,000+ predictions per second. Inspired by scikit-learn, built for enterprise Java.

Why SuperML Java?

Everything Java Developers Need for ML

All 21 modules compile and 172+ tests pass: production-ready from day one

20+ Algorithms

Linear models, tree ensembles (XGBoost, Random Forest, Gradient Boosting), neural networks, BERT/GPT transformers, and clustering — all in one framework.

AutoML & Hyperparameter Tuning

Automated model selection and hyperparameter optimization so you spend less time tuning and more time building.

400K+ Predictions / Second

XGBoost batch inference at 400,000+ predictions/sec with 6.88µs single-prediction latency. Pipeline throughput of 35,714 predictions/sec.
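The three figures above describe different measurements, and a little arithmetic shows how they relate. The sketch below only converts between latency and throughput. The class and method names are illustrative and are not part of SuperML.

```java
// Illustrative arithmetic only; ThroughputMath is a hypothetical helper, not a SuperML class.
final class ThroughputMath {

    /** Predictions per second implied by a per-prediction latency in microseconds. */
    static double perSecond(double latencyMicros) {
        return 1_000_000.0 / latencyMicros;
    }

    /** Average microseconds per prediction implied by a throughput figure. */
    static double microsPerPrediction(double predictionsPerSecond) {
        return 1_000_000.0 / predictionsPerSecond;
    }

    public static void main(String[] args) {
        // One prediction at a time at 6.88 microseconds each.
        System.out.printf("single-call loop: %.0f predictions/sec%n", perSecond(6.88));
        // Batch inference at 400,000 predictions/sec.
        System.out.printf("batch: %.2f microseconds/prediction%n", microsPerPrediction(400_000));
        // Full pipeline at 35,714 predictions/sec.
        System.out.printf("pipeline: %.2f microseconds/prediction%n", microsPerPrediction(35_714));
    }
}
```

Read this way, calling the model one row at a time would top out near 145,000 predictions/sec, so the 400K+ figure depends on batching, and the pipeline figure works out to roughly 28 microseconds per row once preprocessing is included.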

Pipeline System

Chain preprocessing, feature engineering, and model training into reproducible, serializable pipelines — just like scikit-learn.

PMML Export

Export trained models to PMML for cross-platform deployment and interoperability with other ML systems.

Dual Visualization

GUI and ASCII terminal visualization modes for model evaluation, training curves, and data exploration without leaving Java.

Model Persistence & Drift Detection

Save and load trained models with built-in serialization. Monitor production models for data drift automatically.

Enterprise Ready

Thread-safe, 21/21 modules compile successfully, 172+ passing tests. Java 8+ compatible including Java 11, 17, and 21.

Get Started in 5 Minutes

Add SuperML Java to your project and run your first model

Step 1: Add Maven Dependency

Add SuperML Java to your pom.xml:

<dependency>
  <groupId>org.superml</groupId>
  <artifactId>superml-core</artifactId>
  <version>3.1.2</version>
</dependency>

Gradle:
implementation 'org.superml:superml-core:3.1.2'

Step 2: Load Your Dataset

Use built-in datasets or load your own:

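// `var` needs Java 10+. Labels are converted to double[], as in the full example.
var dataset = Datasets.loadIris();
var split = DataLoaders.trainTestSplit(
  dataset.X,
  Arrays.stream(dataset.y).asDoubleStream().toArray(),
  0.2, 42);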
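For readers new to hold-out splits, here is a minimal plain-Java sketch of what a seeded split typically does. It assumes, by analogy with scikit-learn, that 0.2 is the test fraction and 42 is the random seed; the class below is hypothetical and is not SuperML's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of SuperML: a seeded shuffle-and-cut split.
final class SplitSketch {

    /** Returns {trainIndices, testIndices} for a dataset with n rows. */
    static int[][] splitIndices(int n, double testFraction, long seed) {
        if (n < 1 || testFraction < 0.0 || testFraction > 1.0) {
            throw new IllegalArgumentException("need n >= 1 and 0 <= testFraction <= 1");
        }
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            order.add(i);
        }
        // A fixed seed makes the shuffle, and therefore the split, repeatable.
        Collections.shuffle(order, new Random(seed));

        int nTest = (int) Math.round(n * testFraction);
        int[] test = new int[nTest];
        int[] train = new int[n - nTest];
        for (int i = 0; i < n; i++) {
            if (i < nTest) {
                test[i] = order.get(i);
            } else {
                train[i - nTest] = order.get(i);
            }
        }
        return new int[][] {train, test};
    }

    public static void main(String[] args) {
        // Iris has 150 rows, so a 0.2 test fraction holds out 30 of them.
        int[][] parts = splitIndices(150, 0.2, 42L);
        System.out.println("train rows: " + parts[0].length + ", test rows: " + parts[1].length);
    }
}
```

Because the seed is fixed, running the split twice yields the same rows in each partition, which keeps accuracy numbers comparable between runs.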

Step 3: Train a Model

Fit any algorithm with a consistent API:

var forest = new RandomForest(100, 10);
forest.fit(split.XTrain, split.yTrain);
double[] preds = forest.predict(split.XTest);

Step 4: Evaluate & Export

Measure accuracy on the held-out test set before you export the model:

double accuracy = Metrics.accuracy(
  split.yTest, preds);
System.out.printf("Accuracy: %.2f%%\n",
  accuracy * 100);
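Accuracy here is the fraction of test rows whose predicted label matches the true label. The plain-Java sketch below shows that calculation, assuming class labels are stored as whole-number doubles as in the snippets on this page. AccuracySketch is a hypothetical name and does not reproduce Metrics.accuracy itself.

```java
// Hypothetical helper, not part of SuperML: accuracy as matches divided by total.
final class AccuracySketch {

    /** Fraction of positions where the predicted label equals the true label. */
    static double accuracy(double[] yTrue, double[] yPred) {
        if (yTrue.length != yPred.length || yTrue.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < yTrue.length; i++) {
            // Exact comparison is safe because labels are whole numbers (0.0, 1.0, 2.0, ...).
            if (yTrue[i] == yPred[i]) {
                correct++;
            }
        }
        return (double) correct / yTrue.length;
    }

    public static void main(String[] args) {
        double[] truth = {0, 1, 2, 1};
        double[] preds = {0, 1, 1, 1};
        // Three of four labels match.
        System.out.printf("Accuracy: %.2f%%%n", accuracy(truth, preds) * 100);
    }
}
```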

Clean API

Familiar Patterns, Java Native

If you know scikit-learn, you already know SuperML Java. Consistent fit/predict API across all 20+ algorithms.

Random Forest — Full Example

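import org.superml.datasets.Datasets;
import org.superml.datasets.DataLoaders;
import org.superml.ensemble.RandomForest;
import org.superml.metrics.Metrics;
import java.util.Arrays;

public class RandomForestExample {
    public static void main(String[] args) {
        // Load the built-in Iris dataset (`var` requires Java 10+).
        var dataset = Datasets.loadIris();

        // Convert the labels to double[] and hold out a test set (arguments 0.2 and 42).
        var split = DataLoaders.trainTestSplit(
            dataset.X,
            Arrays.stream(dataset.y).asDoubleStream().toArray(),
            0.2, 42);

        // Train on the training split, then predict on the held-out rows.
        var forest = new RandomForest(100, 10);
        forest.fit(split.XTrain, split.yTrain);
        double[] predictions = forest.predict(split.XTest);

        // Report test-set accuracy as a percentage.
        double accuracy = Metrics.accuracy(split.yTest, predictions);
        System.out.printf("Accuracy: %.2f%%\n", accuracy * 100);
    }
}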

Logistic Regression

// Reuses `split` from the Random Forest example; LogisticRegression must be imported as well.
var classifier = new LogisticRegression()
    .setMaxIter(1000);
classifier.fit(split.XTrain, split.yTrain);
double[] preds = classifier.predict(split.XTest);
System.out.printf("Accuracy: %.2f%%\n",
    Metrics.accuracy(split.yTest, preds) * 100);

20+ Algorithms

Comprehensive Algorithm Library

From classic linear models to modern transformer architectures — all with a consistent Java API

Linear Models

Logistic Regression, Linear Regression, Ridge, Lasso, SGD Classifier & Regressor — 6 algorithms with regularization support.

Tree Ensembles

Decision Trees, Random Forest, Gradient Boosting, XGBoost. XGBoost reaches 89%+ accuracy at 400K+ predictions/sec; Random Forest trains in 164ms.

Neural Networks

MLP, CNN, RNN architectures with 95%+ accuracy on benchmark tasks. Backpropagation with multiple activation functions.

Transformer Models

BERT-style encoder, GPT-style decoder, and seq2seq models for NLP tasks — modern deep learning in pure Java.

Clustering

K-Means with k-means++ initialization for smarter centroid selection and faster convergence.
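To show why k-means++ seeding tends to pick better starting centroids, here is a minimal plain-Java sketch of the standard procedure: choose the first centroid uniformly at random, then choose each further centroid with probability proportional to its squared distance from the nearest centroid already chosen. The class and method names are hypothetical, and this is not SuperML's code.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of k-means++ seeding; not SuperML's implementation.
final class KMeansPlusPlusSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            sum += d * d;
        }
        return sum;
    }

    /** Picks k initial centroids from the rows of x. */
    static double[][] initCentroids(double[][] x, int k, long seed) {
        if (k < 1 || k > x.length) {
            throw new IllegalArgumentException("need 1 <= k <= number of rows");
        }
        Random rng = new Random(seed);
        double[][] centroids = new double[k][];
        centroids[0] = x[rng.nextInt(x.length)].clone();

        // minDist[i] = squared distance from row i to its nearest chosen centroid.
        double[] minDist = new double[x.length];
        Arrays.fill(minDist, Double.POSITIVE_INFINITY);

        for (int c = 1; c < k; c++) {
            double total = 0.0;
            for (int i = 0; i < x.length; i++) {
                minDist[i] = Math.min(minDist[i], squaredDistance(x[i], centroids[c - 1]));
                total += minDist[i];
            }
            // Sample a row with probability minDist[i] / total.
            double r = rng.nextDouble() * total;
            double cumulative = 0.0;
            int chosen = -1;
            for (int i = 0; i < x.length; i++) {
                if (minDist[i] <= 0.0) {
                    continue; // row coincides with an existing centroid
                }
                cumulative += minDist[i];
                chosen = i;
                if (cumulative > r) {
                    break;
                }
            }
            if (chosen < 0) {
                chosen = rng.nextInt(x.length); // every row coincides with a centroid
            }
            centroids[c] = x[chosen].clone();
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        System.out.println(Arrays.deepToString(initCentroids(points, 2, 42L)));
    }
}
```

Far-away points carry most of the sampling weight, so the second centroid usually lands in a different cluster from the first, which is what reduces the number of iterations needed afterwards.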

AutoML

Automated hyperparameter optimization and model selection. Let the framework find the best configuration for your data.
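The simplest form of hyperparameter optimization is an exhaustive search over candidate values, keeping whichever scores best. The sketch below shows that idea in plain Java with a stand-in scoring function. The names are hypothetical, and SuperML's AutoML may well use a different search strategy.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch of a one-parameter grid search; not SuperML's AutoML.
final class GridSearchSketch {

    /** Returns the candidate with the highest score; ties keep the earlier candidate. */
    static double bestParam(double[] candidates, DoubleUnaryOperator score) {
        if (candidates.length == 0) {
            throw new IllegalArgumentException("need at least one candidate");
        }
        double best = candidates[0];
        double bestScore = score.applyAsDouble(best);
        for (int i = 1; i < candidates.length; i++) {
            double s = score.applyAsDouble(candidates[i]);
            if (s > bestScore) {
                bestScore = s;
                best = candidates[i];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] learningRates = {0.01, 0.1, 1.0, 10.0};
        // Stand-in for "train a model and return its validation score".
        DoubleUnaryOperator validationScore = rate -> -Math.abs(rate - 1.0);
        System.out.println("best candidate: " + bestParam(learningRates, validationScore));
    }
}
```

In practice the scoring function would train a model with the candidate value and return a cross-validated metric, which is where most of the run time goes.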

Preprocessing & Pipelines

Scalers, encoders, imputers, and feature selectors — chainable into reproducible Pipeline objects.
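The key property of a pipeline step is that it learns its parameters from the training data only and then applies them unchanged to new data. The following plain-Java sketch shows that fit/transform split for a standard scaler. It is a hypothetical illustration, not SuperML's scaler class.

```java
import java.util.Arrays;

// Hypothetical sketch of a fit/transform scaler; not SuperML's implementation.
final class StandardScalerSketch {
    private double[] mean;
    private double[] std;

    /** Learns per-column mean and (population) standard deviation from x. */
    StandardScalerSketch fit(double[][] x) {
        int rows = x.length;
        int cols = x[0].length;
        mean = new double[cols];
        std = new double[cols];
        for (double[] row : x) {
            for (int j = 0; j < cols; j++) {
                mean[j] += row[j] / rows;
            }
        }
        for (double[] row : x) {
            for (int j = 0; j < cols; j++) {
                double d = row[j] - mean[j];
                std[j] += d * d / rows;
            }
        }
        for (int j = 0; j < cols; j++) {
            std[j] = Math.sqrt(std[j]);
            if (std[j] == 0.0) {
                std[j] = 1.0; // constant column: avoid division by zero
            }
        }
        return this;
    }

    /** Applies the learned parameters; works on training or unseen rows alike. */
    double[][] transform(double[][] x) {
        double[][] out = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            out[i] = new double[x[i].length];
            for (int j = 0; j < x[i].length; j++) {
                out[i][j] = (x[i][j] - mean[j]) / std[j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] train = {{1, 10}, {3, 10}};
        double[][] unseen = {{5, 10}};
        StandardScalerSketch scaler = new StandardScalerSketch().fit(train);
        System.out.println(Arrays.deepToString(scaler.transform(train)));
        System.out.println(Arrays.deepToString(scaler.transform(unseen)));
    }
}
```

Fitting on the training rows and reusing the same parameters for test rows is what keeps information from the test set out of training.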

Cross-Validation

K-fold, stratified, and time-series cross-validation with built-in scoring metrics.
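As a reminder of what plain k-fold does, the sketch below partitions row indices into k contiguous folds whose sizes differ by at most one; each fold serves once as the test set while the remaining rows are used for training. It is a hypothetical plain-Java illustration without shuffling or stratification, not SuperML's cross-validation API.

```java
import java.util.Arrays;

// Hypothetical sketch of k-fold index generation; not SuperML's API.
final class KFoldSketch {

    /** Returns folds[f] = test indices for fold f, covering 0..n-1 exactly once. */
    static int[][] testFolds(int n, int k) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        int[][] folds = new int[k][];
        int start = 0;
        for (int f = 0; f < k; f++) {
            // The first (n % k) folds take one extra row.
            int size = n / k + (f < n % k ? 1 : 0);
            folds[f] = new int[size];
            for (int i = 0; i < size; i++) {
                folds[f][i] = start + i;
            }
            start += size;
        }
        return folds;
    }

    public static void main(String[] args) {
        for (int[] fold : testFolds(10, 3)) {
            System.out.println("test fold: " + Arrays.toString(fold));
        }
    }
}
```

Stratified variants additionally balance class proportions within each fold, and time-series variants only ever test on rows that come after the training rows.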

PMML Export

Export any trained model to PMML format for deployment in scoring engines and other ML platforms.

Benchmarks

Built for Production Performance

Measured on standard hardware — real numbers from the test suite

400,000+ predictions/sec

XGBoost batch inference throughput. 6.88 microseconds per single prediction.

89%+ accuracy (XGBoost)

On standard classification benchmarks with 2.5 second training time.

95%+ accuracy (Neural Networks)

MLP/CNN on benchmark tasks. Random Forest trains in 164ms.

172+ tests passing

All 21/21 modules compile and pass their test suites. ~4 minute full build.

Learn & Explore

Comprehensive Documentation

Everything you need to go from zero to production

Quick Start Guide

5-minute setup walkthrough covering installation, your first model, and evaluation.

API Reference

Complete JavaDoc for all 21 modules with examples for every class and method.

Algorithm Guides

Deep-dive guides for each algorithm covering parameters, use cases, and tuning tips.

Advanced Topics

Inference optimization, model persistence, drift detection, Kaggle integration, and transformer fine-tuning.

FAQs

Frequently Asked Questions

Common questions about SuperML Java v3.1.2

What Java versions are supported?

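SuperML Java supports Java 8+ and is tested on Java 8, 11, 17, and 21. The code samples on this page use var, which requires Java 10 or later; on Java 8, declare the variable types explicitly. It also works with Kotlin and Scala on the JVM.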

What is the Maven artifact ID?

The core artifact is org.superml:superml-core:3.1.2, available on Maven Central. Individual modules can be added separately for a lighter dependency footprint.

How does it compare to Weka or DL4J?

SuperML Java offers a scikit-learn–style API that is more consistent and easier to use than Weka, with broader algorithm coverage including modern transformers. It is lighter than DL4J while still supporting production-grade neural networks.

Can I use it in production?

Yes. All 21 modules are production-ready with thread-safe operations, model drift detection, PMML export, and 172+ passing tests. The framework has been benchmarked at 400,000+ predictions per second.

Does it support AutoML?

Yes. v3.1.2 includes AutoML with automated hyperparameter optimization so you can find the best model configuration without manual tuning.

How do I contribute?

SuperML Java is open source on GitHub. Submit issues, feature requests, or pull requests — contributions are welcome from the community.

Start Building ML Applications in Java

21 modules, 20+ algorithms, 400K+ predictions/sec — everything you need, in Java.