v3.1.2 — Now with Transformers & AutoML
SuperML Java Framework
A modular machine learning library for Java with 21 specialized components, 20+ algorithms, and 400,000+ predictions per second. Inspired by scikit-learn, built for enterprise Java.

Why SuperML Java?
Everything Java Developers Need for ML
21 modules that all compile and 172+ passing tests: production-ready from day one
20+ Algorithms
Linear models, tree ensembles (XGBoost, Random Forest, Gradient Boosting), neural networks, BERT/GPT transformers, and clustering — all in one framework.
AutoML & Hyperparameter Tuning
Automated model selection and hyperparameter optimization so you spend less time tuning and more time building.
400K+ Predictions / Second
XGBoost batch inference at 400,000+ predictions/sec with 6.88µs single-prediction latency. Pipeline throughput of 35,714 predictions/sec.
Pipeline System
Chain preprocessing, feature engineering, and model training into reproducible, serializable pipelines — just like scikit-learn.
PMML Export
Export trained models to PMML for cross-platform deployment and interoperability with other ML systems.
Dual Visualization
GUI and ASCII terminal visualization modes for model evaluation, training curves, and data exploration without leaving Java.
Model Persistence & Drift Detection
Save and load trained models with built-in serialization. Monitor production models for data drift automatically.
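The page does not say which drift statistic SuperML uses, so the following is only a minimal sketch of one common idea: compare the mean of a live feature window against the training-time reference, measured in reference standard deviations. The class name `DriftSketch`, its methods, and the threshold are all hypothetical and not part of the SuperML API.

```java
/** Hypothetical drift check; not SuperML's drift detection API. */
class DriftSketch {

    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Population standard deviation around a precomputed mean. */
    static double std(double[] values, double mean) {
        double sumSquares = 0;
        for (double v : values) {
            sumSquares += (v - mean) * (v - mean);
        }
        return Math.sqrt(sumSquares / values.length);
    }

    /** Shift of the live mean, expressed in reference standard deviations. */
    static double meanShiftScore(double[] reference, double[] live) {
        double refMean = mean(reference);
        double refStd = std(reference, refMean);
        double shift = Math.abs(mean(live) - refMean);
        if (refStd == 0) {
            // A constant reference feature: any change at all counts as an unbounded shift.
            return shift == 0 ? 0 : Double.POSITIVE_INFINITY;
        }
        return shift / refStd;
    }

    /** The threshold is an assumed tuning knob, chosen per feature. */
    static boolean hasDrifted(double[] reference, double[] live, double threshold) {
        return meanShiftScore(reference, live) > threshold;
    }
}
```

A check like this only catches shifts in the mean of a single feature. Changes in spread or shape need a distribution-level test.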
Enterprise Ready
Thread-safe, with all 21 modules compiling and 172+ passing tests. Compatible with Java 8 and later, including Java 11, 17, and 21.
Get Started in 5 Minutes
Add SuperML Java to your project and run your first model
Step 1: Add Maven Dependency
Add SuperML Java to your pom.xml:
<dependency>
    <groupId>org.superml</groupId>
    <artifactId>superml-core</artifactId>
    <version>3.1.2</version>
</dependency>
Gradle:
implementation 'org.superml:superml-core:3.1.2'
Step 2: Load Your Dataset
Use built-in datasets or load your own:
var dataset = Datasets.loadIris();
var split = DataLoaders.trainTestSplit(
    dataset.X, dataset.y, 0.2, 42);
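For readers curious what a seeded split does, here is a plain-Java sketch. It is not SuperML's implementation. It assumes the third argument is the test fraction and the fourth is a random seed, as in scikit-learn. The class `TrainTestSplitSketch` and its `Split` holder are hypothetical stand-ins that borrow the field names from the snippet above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative stand-in; not SuperML's DataLoaders implementation. */
class TrainTestSplitSketch {

    /** Result holder that mirrors the field names used in the snippet above. */
    static final class Split {
        final double[][] XTrain;
        final double[][] XTest;
        final double[] yTrain;
        final double[] yTest;

        Split(double[][] XTrain, double[][] XTest, double[] yTrain, double[] yTest) {
            this.XTrain = XTrain;
            this.XTest = XTest;
            this.yTrain = yTrain;
            this.yTest = yTest;
        }
    }

    /** Assumes testFraction is the share of rows held out and seed fixes the shuffle. */
    static Split trainTestSplit(double[][] X, double[] y, double testFraction, long seed) {
        if (X.length != y.length) {
            throw new IllegalArgumentException("X and y must have the same number of rows");
        }
        if (testFraction < 0 || testFraction > 1) {
            throw new IllegalArgumentException("testFraction must be between 0 and 1");
        }
        // Shuffle row indices with a seeded generator so the split is reproducible.
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < X.length; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));

        int testSize = (int) Math.round(X.length * testFraction);
        int trainSize = X.length - testSize;
        double[][] XTrain = new double[trainSize][];
        double[][] XTest = new double[testSize][];
        double[] yTrain = new double[trainSize];
        double[] yTest = new double[testSize];

        // The first trainSize shuffled rows go to training, the rest to testing.
        for (int i = 0; i < X.length; i++) {
            int row = indices.get(i);
            if (i < trainSize) {
                XTrain[i] = X[row];
                yTrain[i] = y[row];
            } else {
                XTest[i - trainSize] = X[row];
                yTest[i - trainSize] = y[row];
            }
        }
        return new Split(XTrain, XTest, yTrain, yTest);
    }
}
```

Reusing the same seed reproduces the same split, which keeps accuracy numbers comparable between runs.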
Step 3: Train a Model
Fit any algorithm with a consistent API:
var forest = new RandomForest(100, 10);
forest.fit(split.XTrain, split.yTrain);
double[] preds = forest.predict(split.XTest);
Step 4: Evaluate & Export
Measure accuracy and export for deployment:
double accuracy = Metrics.accuracy(
    split.yTest, preds);
System.out.printf("Accuracy: %.2f%%\n",
    accuracy * 100);
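Accuracy is the fraction of predictions that match the true label. The sketch below shows that computation in plain Java, with labels stored as doubles as in the snippet above. `AccuracySketch` is a hypothetical name, and this is not the source of `Metrics.accuracy`.

```java
/** Illustrative accuracy computation; not SuperML's Metrics class. */
class AccuracySketch {

    /** Fraction of predictions that exactly equal the true label. */
    static double accuracy(double[] yTrue, double[] yPred) {
        if (yTrue.length != yPred.length || yTrue.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < yTrue.length; i++) {
            // Class labels are whole numbers stored as doubles, so exact comparison is intended.
            if (yTrue[i] == yPred[i]) {
                correct++;
            }
        }
        return (double) correct / yTrue.length;
    }

    public static void main(String[] args) {
        double[] yTrue = {0, 1, 2, 2, 1};
        double[] yPred = {0, 1, 2, 1, 1};
        // Four of the five predictions match, so the accuracy is 0.8.
        System.out.println(accuracy(yTrue, yPred));
    }
}
```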
Clean API
Familiar Patterns, Java Native
If you know scikit-learn, you already know SuperML Java. Consistent fit/predict API across all 20+ algorithms.
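To make the fit/predict contract concrete, here is a minimal sketch of what such an interface could look like, with a trivial baseline model behind it. `EstimatorSketch` and `MajorityClassSketch` are hypothetical names invented for this illustration. SuperML's actual base types may differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical contract; SuperML's real interfaces may be named and shaped differently. */
interface EstimatorSketch {
    void fit(double[][] X, double[] y);

    double[] predict(double[][] X);
}

/** Baseline that ignores the features and always predicts the most frequent training label. */
class MajorityClassSketch implements EstimatorSketch {
    private double majority = Double.NaN;

    @Override
    public void fit(double[][] X, double[] y) {
        majority = Double.NaN;
        Map<Double, Integer> counts = new HashMap<>();
        int best = 0;
        for (double label : y) {
            // Track the running leader while counting, so one pass is enough.
            int count = counts.merge(label, 1, Integer::sum);
            if (count > best) {
                best = count;
                majority = label;
            }
        }
    }

    @Override
    public double[] predict(double[][] X) {
        double[] predictions = new double[X.length];
        Arrays.fill(predictions, majority);
        return predictions;
    }
}
```

Because every model exposes the same two methods, code that trains and evaluates one estimator can be reused for another by changing only the constructor call.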
Random Forest — Full Example
import org.superml.datasets.Datasets;
import org.superml.datasets.DataLoaders;
import org.superml.ensemble.RandomForest;
import org.superml.metrics.Metrics;
import java.util.Arrays;

public class RandomForestExample {
    public static void main(String[] args) {
        // Load the built-in Iris dataset
        var dataset = Datasets.loadIris();

        // Convert the labels to double[] and split into train and test sets
        var split = DataLoaders.trainTestSplit(
            dataset.X,
            Arrays.stream(dataset.y).asDoubleStream().toArray(),
            0.2, 42);

        // Train the forest and predict on the test split
        var forest = new RandomForest(100, 10);
        forest.fit(split.XTrain, split.yTrain);
        double[] predictions = forest.predict(split.XTest);

        // Report accuracy as a percentage
        double accuracy = Metrics.accuracy(split.yTest, predictions);
        System.out.printf("Accuracy: %.2f%%\n", accuracy * 100);
    }
}
Logistic Regression
var classifier = new LogisticRegression()
    .setMaxIter(1000);
classifier.fit(split.XTrain, split.yTrain);
double[] preds = classifier.predict(split.XTest);
System.out.printf("Accuracy: %.2f%%\n",
    Metrics.accuracy(split.yTest, preds) * 100);
20+ Algorithms
Comprehensive Algorithm Library
From classic linear models to modern transformer architectures — all with a consistent Java API
Linear Models
Logistic Regression, Linear Regression, Ridge, Lasso, SGD Classifier & Regressor — 6 algorithms with regularization support.
Tree Ensembles
Decision Trees, Random Forest, Gradient Boosting, and XGBoost. XGBoost reaches 89%+ accuracy and 400K+ predictions/sec; Random Forest trains in 164ms.
Neural Networks
MLP, CNN, RNN architectures with 95%+ accuracy on benchmark tasks. Backpropagation with multiple activation functions.
Transformer Models
BERT-style encoder, GPT-style decoder, and seq2seq models for NLP tasks — modern deep learning in pure Java.
Clustering
K-Means with k-means++ initialization for smarter centroid selection and faster convergence.
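The k-means++ idea is to pick the first centroid uniformly at random, then pick each later centroid with probability proportional to its squared distance from the nearest centroid chosen so far. The following is a rough plain-Java sketch of that seeding step only. `KMeansPlusPlusSketch` and its method names are hypothetical, not SuperML's API.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative k-means++ seeding; not SuperML's KMeans implementation. */
class KMeansPlusPlusSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Picks k initial centroids from the rows of points. */
    static double[][] initCentroids(double[][] points, int k, long seed) {
        if (k < 1 || k > points.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        Random rng = new Random(seed);
        double[][] centroids = new double[k][];
        // First centroid: a uniformly random point.
        centroids[0] = points[rng.nextInt(points.length)].clone();

        // nearest[i] holds the squared distance from point i to its closest chosen centroid.
        double[] nearest = new double[points.length];
        Arrays.fill(nearest, Double.POSITIVE_INFINITY);

        for (int c = 1; c < k; c++) {
            double total = 0;
            for (int i = 0; i < points.length; i++) {
                nearest[i] = Math.min(nearest[i], squaredDistance(points[i], centroids[c - 1]));
                total += nearest[i];
            }
            // Sample an index with probability nearest[i] / total via a cumulative sum.
            double target = rng.nextDouble() * total;
            int chosen = points.length - 1; // fallback when every remaining distance is zero
            double cumulative = 0;
            for (int i = 0; i < points.length; i++) {
                cumulative += nearest[i];
                if (cumulative > target) {
                    chosen = i;
                    break;
                }
            }
            centroids[c] = points[chosen].clone();
        }
        return centroids;
    }
}
```

Points that coincide with an already chosen centroid have weight zero, so the seeds tend to spread across the data instead of clustering in one dense region.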
AutoML
Automated hyperparameter optimization and model selection. Let the framework find the best configuration for your data.
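The page does not describe the search strategy behind AutoML, so the following sketch shows only the simplest instance of hyperparameter optimization: an exhaustive grid search over one regularization strength, scored on a validation set. The model is a one-feature ridge regression without intercept, chosen because it has a closed form. `GridSearchSketch` and its methods are hypothetical names.

```java
/** Illustrative grid search; SuperML's AutoML strategy may be entirely different. */
class GridSearchSketch {

    /** Closed-form 1-D ridge regression without intercept: w = sum(x*y) / (sum(x*x) + lambda). */
    static double fitRidge(double[] x, double[] y, double lambda) {
        double sumXY = 0;
        double sumXX = 0;
        for (int i = 0; i < x.length; i++) {
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
        }
        return sumXY / (sumXX + lambda);
    }

    static double meanSquaredError(double weight, double[] x, double[] y) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            double error = y[i] - weight * x[i];
            sum += error * error;
        }
        return sum / x.length;
    }

    /** Tries every candidate and returns the lambda with the lowest validation error. */
    static double bestLambda(double[] xTrain, double[] yTrain,
                             double[] xVal, double[] yVal, double[] candidates) {
        double best = candidates[0];
        double bestError = Double.POSITIVE_INFINITY;
        for (double lambda : candidates) {
            double weight = fitRidge(xTrain, yTrain, lambda);
            double error = meanSquaredError(weight, xVal, yVal);
            if (error < bestError) {
                bestError = error;
                best = lambda;
            }
        }
        return best;
    }
}
```

Scoring on held-out data rather than the training data is the important part: it keeps the search from rewarding the configuration that merely memorizes the training rows.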
Preprocessing & Pipelines
Scalers, encoders, imputers, and feature selectors — chainable into reproducible Pipeline objects.
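As an illustration of what one pipeline step does, here is a sketch of a standard scaler: it learns each column's mean and standard deviation from the training data, then applies the same shift and scale to any later data. `StandardScalerSketch` is a hypothetical class, not SuperML's scaler.

```java
/** Illustrative standard scaler; not SuperML's preprocessing API. */
class StandardScalerSketch {
    private double[] mean;
    private double[] std;

    /** Learns the per-column mean and population standard deviation. */
    StandardScalerSketch fit(double[][] X) {
        int columns = X[0].length;
        mean = new double[columns];
        std = new double[columns];
        for (double[] row : X) {
            for (int j = 0; j < columns; j++) {
                mean[j] += row[j];
            }
        }
        for (int j = 0; j < columns; j++) {
            mean[j] /= X.length;
        }
        for (double[] row : X) {
            for (int j = 0; j < columns; j++) {
                double diff = row[j] - mean[j];
                std[j] += diff * diff;
            }
        }
        for (int j = 0; j < columns; j++) {
            std[j] = Math.sqrt(std[j] / X.length);
            if (std[j] == 0) {
                std[j] = 1; // constant column: avoid dividing by zero
            }
        }
        return this;
    }

    /** Applies the statistics learned in fit; the input is left unmodified. */
    double[][] transform(double[][] X) {
        if (mean == null) {
            throw new IllegalStateException("call fit before transform");
        }
        double[][] scaled = new double[X.length][];
        for (int i = 0; i < X.length; i++) {
            scaled[i] = new double[X[i].length];
            for (int j = 0; j < X[i].length; j++) {
                scaled[i][j] = (X[i][j] - mean[j]) / std[j];
            }
        }
        return scaled;
    }
}
```

Fitting on the training rows and reusing those statistics on the test rows is the bookkeeping a pipeline object automates, and it prevents test data from leaking into preprocessing.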
Cross-Validation
K-fold, stratified, and time-series cross-validation with built-in scoring metrics.
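Plain k-fold cross-validation partitions the row indices into k folds, then uses each fold once as the test set while training on the rest. A minimal sketch of the index bookkeeping follows. It does no shuffling or stratification, and `KFoldSketch` is a hypothetical name unrelated to SuperML's classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative k-fold index generator; not SuperML's cross-validation API. */
class KFoldSketch {

    /** Returns k contiguous test folds that together cover 0..n-1 exactly once. */
    static List<int[]> testFolds(int n, int k) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("k must be between 2 and n");
        }
        List<int[]> folds = new ArrayList<>();
        int start = 0;
        for (int f = 0; f < k; f++) {
            // The first n % k folds get one extra row, so fold sizes differ by at most one.
            int size = n / k + (f < n % k ? 1 : 0);
            int[] fold = new int[size];
            for (int i = 0; i < size; i++) {
                fold[i] = start + i;
            }
            folds.add(fold);
            start += size;
        }
        return folds;
    }

    /** Training rows are every index outside the contiguous test fold. */
    static int[] trainIndices(int n, int[] testFold) {
        int first = testFold[0];
        int last = testFold[testFold.length - 1];
        int[] train = new int[n - testFold.length];
        int next = 0;
        for (int i = 0; i < n; i++) {
            if (i < first || i > last) {
                train[next++] = i;
            }
        }
        return train;
    }
}
```

Stratified and time-series variants change only how the folds are drawn: the former balances class proportions per fold, the latter keeps test rows later in time than training rows.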
PMML Export
Export any trained model to PMML format for deployment in scoring engines and other ML platforms.
Benchmarks
Built for Production Performance
Measured on standard hardware — real numbers from the test suite
400,000+ predictions/sec
XGBoost batch inference throughput. 6.88 microseconds per single prediction.
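Throughput and latency are reciprocal views of the same quantity, which makes the two figures easy to compare. The small sketch below does the conversion; `ThroughputSketch` is a hypothetical helper, and the inputs are the numbers quoted on this page.

```java
/** Converts between throughput and per-prediction time; a worked-arithmetic helper only. */
class ThroughputSketch {

    /** Amortized time per prediction, in microseconds, at a given throughput. */
    static double microsPerPrediction(double predictionsPerSecond) {
        return 1_000_000.0 / predictionsPerSecond;
    }

    /** Throughput achieved by issuing predictions one at a time at a given latency. */
    static double predictionsPerSecond(double latencyMicros) {
        return 1_000_000.0 / latencyMicros;
    }

    public static void main(String[] args) {
        // 400,000 predictions/sec in batch mode is 2.5 microseconds per prediction.
        System.out.println(microsPerPrediction(400_000));
        // A 6.88 microsecond single prediction corresponds to roughly 145,000 predictions/sec.
        System.out.println(predictionsPerSecond(6.88));
    }
}
```

The batch figure works out to less time per prediction than the single-prediction latency, which is what one would expect if batching amortizes per-call overhead.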
89%+ accuracy (XGBoost)
On standard classification benchmarks with 2.5 second training time.
95%+ accuracy (Neural Networks)
MLP/CNN on benchmark tasks. Random Forest trains in 164ms.
172+ tests passing
All 21 modules compile and pass their test suites. A full build takes about 4 minutes.
Learn & Explore
Comprehensive Documentation
Everything you need to go from zero to production
Quick Start Guide
5-minute setup walkthrough covering installation, your first model, and evaluation.
API Reference
Complete JavaDoc for all 21 modules with examples for every class and method.
Algorithm Guides
Deep-dive guides for each algorithm covering parameters, use cases, and tuning tips.
Advanced Topics
Inference optimization, model persistence, drift detection, Kaggle integration, and transformer fine-tuning.
FAQs
Frequently Asked Questions
Common questions about SuperML Java v3.1.2
What Java versions are supported?
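SuperML Java supports Java 8+ and is tested on Java 8, 11, 17, and 21. The code samples on this page use var, which requires Java 10 or later; on Java 8, declare the variable types explicitly. The library also works with Kotlin and Scala on the JVM.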
What is the Maven artifact ID?
The core artifact is org.superml:superml-core:3.1.2, available on Maven Central. Individual modules can be added separately for a lighter dependency footprint.
How does it compare to Weka or DL4J?
SuperML Java offers a scikit-learn–style API that is more consistent and easier to use than Weka, with broader algorithm coverage including modern transformers. It is lighter than DL4J while still supporting production-grade neural networks.
Can I use it in production?
Yes. All 21 modules are production-ready with thread-safe operations, model drift detection, PMML export, and 172+ passing tests. The framework has been benchmarked at 400,000+ predictions per second.
Does it support AutoML?
Yes. v3.1.2 includes AutoML with automated hyperparameter optimization so you can find the best model configuration without manual tuning.
How do I contribute?
SuperML Java is open source on GitHub. Submit issues, feature requests, or pull requests — contributions are welcome from the community.
Start Building ML Applications in Java
21 modules, 20+ algorithms, 400K+ predictions/sec — everything you need, in Java.