ML Kit provides object detection and tracking in images and videos, identifying up to five objects per image and assigning unique IDs for tracking in video streams.
Setting up ML Kit requires Android API level 21 or higher and including the object-detection dependency from Google's Maven repository.
Developers can configure detection mode, multiple object detection, and classification using ObjectDetectorOptions and process images from various sources like files, byte buffers, or bitmaps.
Best practices include optimizing for real-time performance by limiting detection frequency and disabling unnecessary features like classification if not needed.
User experience can be enhanced by guiding users on capturing suitable images for object detection and implementing handling for unknown objects when using classification.
You can use ML Kit to detect and track objects in successive video frames.
When you pass an image to ML Kit, it detects up to five objects in the image along with the position of each object in the image. When detecting objects in video streams, each object has a unique ID that you can use to track the object from frame to frame. You can also optionally enable coarse object classification, which labels objects with broad category descriptions.
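In your project-level build.gradle file, make sure to include
Google's Maven repository in both your buildscript and
allprojects sections. Then add the ML Kit object detection dependency to your module's app-level Gradle file, which is usually app/build.gradle: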

    dependencies {
      // ...
      implementation 'com.google.mlkit:object-detection:17.0.2'
    }

To detect and track objects, first create an instance of ObjectDetector and
optionally specify any detector settings that you want to change from the
default.
Configure the object detector for your use case with an
ObjectDetectorOptions object. You can change the following
settings:
| Object Detector Settings | |
|---|---|
| Detection mode | STREAM_MODE (default) or SINGLE_IMAGE_MODE. In STREAM_MODE, the detector assigns tracking IDs that you can use to follow objects across frames, which suits live video. In SINGLE_IMAGE_MODE, tracking IDs are not assigned, which suits static images. |
| Detect and track multiple objects | false (default) or true. Whether to detect and track up to five objects or only the most prominent object (default). |
| Classify objects | false (default) or true. Whether or not to classify detected objects into coarse categories. When enabled, the object detector classifies objects into the following categories: fashion goods, food, home goods, places, and plants. |
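The object detection and tracking API is optimized for two core use cases: live detection and tracking of the most prominent object, and detection of multiple objects in a static image.
To configure the API for these use cases: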
Kotlin

    // Live detection and tracking
    val options = ObjectDetectorOptions.Builder()
            .setDetectorMode(ObjectDetectorOptions.STREAM_MODE)
            .enableClassification()  // Optional
            .build()

    // Multiple object detection in static images
    val options = ObjectDetectorOptions.Builder()
            .setDetectorMode(ObjectDetectorOptions.SINGLE_IMAGE_MODE)
            .enableMultipleObjects()
            .enableClassification()  // Optional
            .build()

Java

    // Live detection and tracking
    ObjectDetectorOptions options =
            new ObjectDetectorOptions.Builder()
                    .setDetectorMode(ObjectDetectorOptions.STREAM_MODE)
                    .enableClassification()  // Optional
                    .build();

    // Multiple object detection in static images
    ObjectDetectorOptions options =
            new ObjectDetectorOptions.Builder()
                    .setDetectorMode(ObjectDetectorOptions.SINGLE_IMAGE_MODE)
                    .enableMultipleObjects()
                    .enableClassification()  // Optional
                    .build();

Get an instance of ObjectDetector:
Kotlin

    val objectDetector = ObjectDetection.getClient(options)

Java

    ObjectDetector objectDetector = ObjectDetection.getClient(options);

To detect and track objects, pass images to the ObjectDetector
instance's process() method.
The object detector runs directly from a Bitmap, an NV21 ByteBuffer, or a
YUV_420_888 media.Image. Constructing an InputImage from one of those sources
is recommended if you have direct access to it. If you construct
an InputImage from other sources, ML Kit handles the conversion
internally, which might be less efficient.
For each frame of video or image in a sequence, do the following:
You can create an InputImage
object from different sources, each of which is explained below.
media.Image
To create an InputImage
object from a media.Image object, such as when you capture an image from a
device's camera, pass the media.Image object and the image's
rotation to InputImage.fromMediaImage().
If you use the
CameraX library, the OnImageCapturedListener and
ImageAnalysis.Analyzer classes calculate the rotation value
for you.
Kotlin

    private class YourImageAnalyzer : ImageAnalysis.Analyzer {

        override fun analyze(imageProxy: ImageProxy) {
            val mediaImage = imageProxy.image
            if (mediaImage != null) {
                val image = InputImage.fromMediaImage(mediaImage, imageProxy.imageInfo.rotationDegrees)
                // Pass image to an ML Kit Vision API
                // ...
            }
        }
    }

Java

    private class YourAnalyzer implements ImageAnalysis.Analyzer {

        @Override
        public void analyze(ImageProxy imageProxy) {
            Image mediaImage = imageProxy.getImage();
            if (mediaImage != null) {
                InputImage image =
                        InputImage.fromMediaImage(mediaImage, imageProxy.getImageInfo().getRotationDegrees());
                // Pass image to an ML Kit Vision API
                // ...
            }
        }
    }

If you don't use a camera library that gives you the image's rotation degree, you can calculate it from the device's rotation degree and the orientation of the camera sensor in the device:
Kotlin

    private val ORIENTATIONS = SparseIntArray()

    init {
        ORIENTATIONS.append(Surface.ROTATION_0, 0)
        ORIENTATIONS.append(Surface.ROTATION_90, 90)
        ORIENTATIONS.append(Surface.ROTATION_180, 180)
        ORIENTATIONS.append(Surface.ROTATION_270, 270)
    }

    /**
     * Get the angle by which an image must be rotated given the device's current
     * orientation.
     */
    @RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
    @Throws(CameraAccessException::class)
    private fun getRotationCompensation(cameraId: String, activity: Activity, isFrontFacing: Boolean): Int {
        // Get the device's current rotation relative to its "native" orientation.
        // Then, from the ORIENTATIONS table, look up the angle the image must be
        // rotated to compensate for the device's rotation.
        val deviceRotation = activity.windowManager.defaultDisplay.rotation
        var rotationCompensation = ORIENTATIONS.get(deviceRotation)

        // Get the device's sensor orientation.
        val cameraManager = activity.getSystemService(CAMERA_SERVICE) as CameraManager
        val sensorOrientation = cameraManager
                .getCameraCharacteristics(cameraId)
                .get(CameraCharacteristics.SENSOR_ORIENTATION)!!

        if (isFrontFacing) {
            rotationCompensation = (sensorOrientation + rotationCompensation) % 360
        } else { // back-facing
            rotationCompensation = (sensorOrientation - rotationCompensation + 360) % 360
        }
        return rotationCompensation
    }

Java

    private static final SparseIntArray ORIENTATIONS = new SparseIntArray();
    static {
        ORIENTATIONS.append(Surface.ROTATION_0, 0);
        ORIENTATIONS.append(Surface.ROTATION_90, 90);
        ORIENTATIONS.append(Surface.ROTATION_180, 180);
        ORIENTATIONS.append(Surface.ROTATION_270, 270);
    }

    /**
     * Get the angle by which an image must be rotated given the device's current
     * orientation.
     */
    @RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
    private int getRotationCompensation(String cameraId, Activity activity, boolean isFrontFacing)
            throws CameraAccessException {
        // Get the device's current rotation relative to its "native" orientation.
        // Then, from the ORIENTATIONS table, look up the angle the image must be
        // rotated to compensate for the device's rotation.
        int deviceRotation = activity.getWindowManager().getDefaultDisplay().getRotation();
        int rotationCompensation = ORIENTATIONS.get(deviceRotation);

        // Get the device's sensor orientation.
        CameraManager cameraManager = (CameraManager) activity.getSystemService(CAMERA_SERVICE);
        int sensorOrientation = cameraManager
                .getCameraCharacteristics(cameraId)
                .get(CameraCharacteristics.SENSOR_ORIENTATION);

        if (isFrontFacing) {
            rotationCompensation = (sensorOrientation + rotationCompensation) % 360;
        } else { // back-facing
            rotationCompensation = (sensorOrientation - rotationCompensation + 360) % 360;
        }
        return rotationCompensation;
    }

Then, pass the media.Image object and the
rotation degree value to InputImage.fromMediaImage():
Kotlin

    val image = InputImage.fromMediaImage(mediaImage, rotation)

Java

    InputImage image = InputImage.fromMediaImage(mediaImage, rotation);

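The rotation compensation arithmetic above can be checked in isolation. The following is a minimal sketch in plain Java with no Android types; the class name RotationMath and the sample sensor orientations (90 and 270 degrees) are illustrative assumptions, not values taken from any particular device.

```java
final class RotationMath {
    /**
     * Mirrors the arithmetic of getRotationCompensation() without Android types.
     *
     * @param sensorOrientation     camera sensor orientation in degrees
     * @param deviceRotationDegrees device rotation in degrees (0, 90, 180, or 270),
     *                              i.e. the value looked up from the ORIENTATIONS table
     * @param isFrontFacing         whether the camera faces the user
     */
    static int rotationCompensation(int sensorOrientation,
                                    int deviceRotationDegrees,
                                    boolean isFrontFacing) {
        if (isFrontFacing) {
            return (sensorOrientation + deviceRotationDegrees) % 360;
        }
        // Back-facing: add 360 before the modulo so the result is never negative.
        return (sensorOrientation - deviceRotationDegrees + 360) % 360;
    }

    public static void main(String[] args) {
        // Back-facing camera, hypothetical sensor at 90 degrees, device not rotated.
        System.out.println(rotationCompensation(90, 0, false));   // 90
        // Same camera, device rotated by 90 degrees.
        System.out.println(rotationCompensation(90, 90, false));  // 0
        // Front-facing camera, hypothetical sensor at 270 degrees, device rotated by 90.
        System.out.println(rotationCompensation(270, 90, true));  // 0
    }
}
```

Keeping the arithmetic in a pure function like this makes it easy to unit test separately from the camera code.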
To create an InputImage
object from a file URI, pass the app context and file URI to
InputImage.fromFilePath(). This is useful when you
use an ACTION_GET_CONTENT intent to prompt the user to select
an image from their gallery app.
Kotlin

    val image: InputImage
    try {
        image = InputImage.fromFilePath(context, uri)
    } catch (e: IOException) {
        e.printStackTrace()
    }

Java

    InputImage image;
    try {
        image = InputImage.fromFilePath(context, uri);
    } catch (IOException e) {
        e.printStackTrace();
    }

ByteBuffer or ByteArray
To create an InputImage
object from a ByteBuffer or a ByteArray, first calculate the image
rotation degree as previously described for media.Image input.
Then, create the InputImage object with the buffer or array, together with the image's
height, width, color encoding format, and rotation degree:
Kotlin

    val image = InputImage.fromByteBuffer(
            byteBuffer,
            /* image width */ 480,
            /* image height */ 360,
            rotationDegrees,
            InputImage.IMAGE_FORMAT_NV21 // or IMAGE_FORMAT_YV12
    )
    // Or:
    val image = InputImage.fromByteArray(
            byteArray,
            /* image width */ 480,
            /* image height */ 360,
            rotationDegrees,
            InputImage.IMAGE_FORMAT_NV21 // or IMAGE_FORMAT_YV12
    )

Java

    InputImage image = InputImage.fromByteBuffer(
            byteBuffer,
            /* image width */ 480,
            /* image height */ 360,
            rotationDegrees,
            InputImage.IMAGE_FORMAT_NV21 // or IMAGE_FORMAT_YV12
    );
    // Or:
    InputImage image = InputImage.fromByteArray(
            byteArray,
            /* image width */ 480,
            /* image height */ 360,
            rotationDegrees,
            InputImage.IMAGE_FORMAT_NV21 // or IMAGE_FORMAT_YV12
    );

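Because the width, height, and format are supplied by hand here, it can help to verify that the buffer is the size those values imply before building the InputImage. The sketch below is plain Java and assumes a tightly packed NV21 frame (a full-resolution Y plane followed by an interleaved VU plane, with no row padding); the class and method names are hypothetical helpers, not part of ML Kit.

```java
import java.nio.ByteBuffer;

final class Nv21Check {
    /**
     * Bytes needed for a tightly packed NV21 frame: one luma byte per pixel,
     * plus two chroma bytes (V and U) for every 2x2 block of pixels.
     */
    static int expectedNv21Size(int width, int height) {
        int lumaSize = width * height;
        int chromaSize = 2 * ((width + 1) / 2) * ((height + 1) / 2);
        return lumaSize + chromaSize;
    }

    /** Returns true when the buffer's remaining bytes match the expected NV21 size. */
    static boolean looksLikeNv21(ByteBuffer buffer, int width, int height) {
        return buffer != null && buffer.remaining() == expectedNv21Size(width, height);
    }

    public static void main(String[] args) {
        // The 480x360 frame used in the samples above needs 480 * 360 * 3 / 2 bytes.
        System.out.println(expectedNv21Size(480, 360)); // 259200
        System.out.println(looksLikeNv21(ByteBuffer.allocate(259200), 480, 360)); // true
    }
}
```

A mismatch usually points to swapped width and height or to a buffer in a different format, which is cheaper to catch here than to debug from poor detection results.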
Bitmap
To create an InputImage
object from a Bitmap object, make the following declaration:
Kotlin

    val image = InputImage.fromBitmap(bitmap, 0)

Java

    InputImage image = InputImage.fromBitmap(bitmap, rotationDegree);

The image is represented by a Bitmap object together with rotation degrees.
Pass the image to the process() method:
Kotlin

    objectDetector.process(image)
        .addOnSuccessListener { detectedObjects ->
            // Task completed successfully
            // ...
        }
        .addOnFailureListener { e ->
            // Task failed with an exception
            // ...
        }

Java

    objectDetector.process(image)
        .addOnSuccessListener(
            new OnSuccessListener<List<DetectedObject>>() {
                @Override
                public void onSuccess(List<DetectedObject> detectedObjects) {
                    // Task completed successfully
                    // ...
                }
            })
        .addOnFailureListener(
            new OnFailureListener() {
                @Override
                public void onFailure(@NonNull Exception e) {
                    // Task failed with an exception
                    // ...
                }
            });

If the call to process() succeeds, a list of DetectedObjects is passed to
the success listener.
Each DetectedObject contains the following properties:
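| Property | Description |
|---|---|
| Bounding box | A Rect that indicates the position of the object in the image. |
| Tracking ID | An integer that identifies the object across images. Null in SINGLE_IMAGE_MODE. |
| Labels | The labels assigned to the object when classification is enabled. Each label has a text description, an index, and a confidence value; the text and index can be compared against the constants in PredefinedCategory. |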
Kotlin

    for (detectedObject in detectedObjects) {
        val boundingBox = detectedObject.boundingBox
        val trackingId = detectedObject.trackingId
        for (label in detectedObject.labels) {
            val text = label.text
            if (PredefinedCategory.FOOD == text) { ... }
            val index = label.index
            if (PredefinedCategory.FOOD_INDEX == index) { ... }
            val confidence = label.confidence
        }
    }

Java

    // The list of detected objects contains one item if multiple
    // object detection wasn't enabled.
    for (DetectedObject detectedObject : detectedObjects) {
        Rect boundingBox = detectedObject.getBoundingBox();
        Integer trackingId = detectedObject.getTrackingId();
        for (Label label : detectedObject.getLabels()) {
            String text = label.getText();
            if (PredefinedCategory.FOOD.equals(text)) { ... }
            int index = label.getIndex();
            if (PredefinedCategory.FOOD_INDEX == index) { ... }
            float confidence = label.getConfidence();
        }
    }

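Since the tracking ID is a nullable Integer, code that follows objects across frames needs to handle the null case. The following is a minimal sketch in plain Java of one way to do that; TrackRegistry is a hypothetical helper and not part of ML Kit, and it only counts how many frames each ID has appeared in.

```java
import java.util.HashMap;
import java.util.Map;

final class TrackRegistry {
    // Maps a tracking ID to the number of frames in which it has been seen.
    private final Map<Integer, Integer> framesSeen = new HashMap<>();

    /**
     * Records one detection.
     *
     * @param trackingId the object's tracking ID; may be null (for example in
     *                   SINGLE_IMAGE_MODE), in which case nothing is recorded
     * @return true if this is the first time the ID has been seen
     */
    boolean record(Integer trackingId) {
        if (trackingId == null) {
            return false;
        }
        int count = framesSeen.merge(trackingId, 1, Integer::sum);
        return count == 1;
    }

    /** Returns how many frames the given ID has appeared in, or 0 if never seen. */
    int framesSeen(int trackingId) {
        return framesSeen.getOrDefault(trackingId, 0);
    }

    public static void main(String[] args) {
        TrackRegistry registry = new TrackRegistry();
        System.out.println(registry.record(7));     // true: first sighting of ID 7
        System.out.println(registry.record(7));     // false: already tracked
        System.out.println(registry.record(null));  // false: no ID to track
        System.out.println(registry.framesSeen(7)); // 2
    }
}
```

In a success listener, this could be called as registry.record(detectedObject.getTrackingId()) for each detected object, for example to trigger a one-time UI update when a new object first appears.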
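For the best user experience, guide users on capturing images that are suitable for object detection and, if you use classification, add handling for unknown objects.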
Also, check out the ML Kit Material Design showcase app and the Material Design Patterns for machine learning-powered features collection.
If you want to use object detection in a real-time application, follow these guidelines to achieve the best framerates:
When you use streaming mode in a real-time application, don't use multiple object detection, as most devices won't be able to produce adequate framerates.
Disable classification if you don't need it.
If you use the Camera or
camera2 API,
throttle calls to the detector. If a new video
frame becomes available while the detector is running, drop the frame. See the
VisionProcessorBase class in the quickstart sample app for an example.
If you use the CameraX API,
be sure that the backpressure strategy is set to its default value,
ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST.
This guarantees that only one image is delivered for analysis at a time. If more images are
produced while the analyzer is busy, they are dropped automatically and not queued for
delivery. Once the image being analyzed is closed by calling
ImageProxy.close(), the next latest image is delivered.
If you overlay graphics from the detector output on the input image, see the CameraSourcePreview and
GraphicOverlay classes in the quickstart sample app for an example.
If you use the Camera2 API, capture images in
ImageFormat.YUV_420_888 format. If you use the older Camera API, capture images in
ImageFormat.NV21 format.
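The frame-dropping guideline for the Camera and camera2 APIs can be sketched with a simple busy flag. This is a minimal illustration in plain Java, not the quickstart's VisionProcessorBase implementation; FrameGate is a hypothetical helper name.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class FrameGate {
    // True while a frame is being processed by the detector.
    private final AtomicBoolean busy = new AtomicBoolean(false);

    /**
     * Tries to claim the detector for a new frame.
     *
     * @return true if the caller may process this frame; false if the detector
     *         is still busy and the frame should be dropped
     */
    boolean tryAcquire() {
        return busy.compareAndSet(false, true);
    }

    /** Marks the detector as idle; call this when processing finishes or fails. */
    void release() {
        busy.set(false);
    }

    public static void main(String[] args) {
        FrameGate gate = new FrameGate();
        System.out.println(gate.tryAcquire()); // true: first frame is processed
        System.out.println(gate.tryAcquire()); // false: second frame is dropped
        gate.release();                        // detector finished
        System.out.println(gate.tryAcquire()); // true: next frame is processed
    }
}
```

In a camera callback, you would call tryAcquire() before objectDetector.process(image) and skip the frame when it returns false, then call release() from a completion listener on the returned task so that both success and failure free the gate.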
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-06-11 UTC.