Get started with managed collection
This document describes how to set up Google Cloud Managed Service for Prometheus
with managed collection. The setup is a minimal example of working ingestion,
using a Prometheus deployment that monitors an example application
and stores collected metrics in Monarch.
This document shows you how to do the following:
Set up your environment and command-line tools.
Set up managed collection for your cluster.
Configure a resource for target scraping and metric ingestion.
We recommend that you use managed collection; it reduces the complexity of
deploying, scaling, sharding, configuring, and maintaining the collectors.
Managed collection is supported for GKE and
all other Kubernetes environments.
Managed collection runs Prometheus-based collectors as a DaemonSet and scales
by having each collector scrape only the targets on its own node. You configure the
collectors with lightweight custom resources to scrape exporters using pull
collection, then the collectors push the scraped data to
the central datastore Monarch. Google Cloud never directly accesses
your cluster to pull or scrape metric data; your collectors push data to
Google Cloud. For more information about managed and self-deployed data
collection, see Data collection with
Managed Service for Prometheus and Ingestion and querying with
managed and self-deployed collection.
Before you begin
This section describes the configuration needed for the tasks described
in this document.
Set up projects and tools
To use Google Cloud Managed Service for Prometheus, you need the following resources:
A Google Cloud project with the Cloud Monitoring API enabled.
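If you don't have a Google Cloud project, then create one and enable billing for it.
Make sure that the Cloud Monitoring API is enabled: in the Google Cloud console, go to APIs & services, select your project, click Enable APIs and Services, and search for "Monitoring".
In the search results, click through to "Cloud Monitoring API".
If "API enabled" is not displayed, then click the Enable button.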
A Kubernetes cluster. If you do not have a Kubernetes cluster,
then follow the instructions in the Quickstart for
GKE.
You also need the following command-line tools:
gcloud
kubectl
The gcloud and kubectl tools are part of the
Google Cloud CLI. For information about installing
them, see Managing Google Cloud CLI components. To see the
gcloud CLI components you have installed, run the following command:
gcloud components list
Configure your environment
To avoid repeatedly entering your project ID or cluster name,
configure the command-line tools as follows:
Configure the gcloud CLI to refer to the ID of your
Google Cloud project:
gcloud config set project PROJECT_ID
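If running on GKE, use the gcloud CLI to set your cluster, where LOCATION is
the zone or region of your cluster:
gcloud container clusters get-credentials CLUSTER_NAME --location LOCATION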
Create the NAMESPACE_NAME Kubernetes namespace for resources you create
as part of the example application. We recommend using the namespace name
gmp-test when using this documentation to configure an example Prometheus
setup.
Create the namespace by running the following:
kubectl create ns NAMESPACE_NAME
Set up managed collection
You can use managed collection on both
GKE and non-GKE Kubernetes clusters.
After managed collection is enabled, the in-cluster components are running,
but no metrics are collected yet. The components need PodMonitoring or
ClusterPodMonitoring resources to know which metrics endpoints to scrape. You
must either deploy these resources with valid metrics endpoints or enable one
of the managed metrics packages built into GKE, for example,
Kube state metrics. For troubleshooting information, see
Ingestion-side problems.
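Enabling managed collection installs the following components in your cluster:
The gmp-operator Deployment, which deploys the Kubernetes operator for
Managed Service for Prometheus.
The collector DaemonSet, which runs the Prometheus-based collectors; each
collector scrapes only the targets on its own node.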
If you are running in a GKE environment that does not enable
managed collection by default, then see
Enable managed collection manually.
Managed collection on GKE is automatically upgraded when new
in-cluster component versions are released.
Managed collection on GKE uses permissions granted to the default
Compute Engine service account. If you have a policy that modifies the
standard permissions on the default node service account, you might need to
grant that service account the Monitoring Metric Writer role
(roles/monitoring.metricWriter) so that the collectors can keep writing metrics.
Enable managed collection manually
If you are running in a GKE environment that does not enable
managed collection by default,
then you can enable managed collection by using the following:
The Managed Prometheus Bulk Cluster Enablement dashboard in Cloud Monitoring.
The Kubernetes Engine page in the Google Cloud console.
The Google Cloud CLI. To use the gcloud CLI, you must be
running GKE version 1.21.4-gke.300 or newer.
Terraform for Google Kubernetes Engine. To use Terraform to enable Managed Service for Prometheus,
you must be running GKE version 1.21.4-gke.300 or newer.
Dashboard
To enable managed collection on one or more GKE clusters by
using the Managed Prometheus Bulk Cluster Enablement dashboard, do the following:
In the Google Cloud console, go to the Dashboards page.
If you use the search bar to find this page, then select the result whose subheading is
Monitoring.
Use the filter bar to search for the Managed Prometheus Bulk Cluster Enablement entry, then select it.
Select the checkbox for each GKE cluster on which you want
to enable managed collection.
Select Enable Selected.
Note: This dashboard shows only GKE clusters in the current
project, even if the project has multiple projects in its metrics scope. For more
information, see
Overview of viewing metrics for multiple projects.
Kubernetes Engine UI
You can do the following by using the Google Cloud console:
Enable managed collection on an existing GKE cluster.
Create a new GKE cluster with managed collection enabled.
To update an existing cluster, do the following:
In the Google Cloud console, go to the Kubernetes clusters page:
If you use the search bar to find this page, then select the result whose subheading is
Kubernetes Engine.
Click the name of the cluster.
In the Features list, locate the Managed Service for Prometheus
option. If it is listed as disabled, click Edit,
and then select Enable Managed Service for Prometheus.
Click Save changes.
To create a cluster with managed collection enabled, do the following:
In the Google Cloud console, go to the Kubernetes clusters page:
If you use the search bar to find this page, then select the result whose subheading is
Kubernetes Engine.
Click Create.
Click Configure for the Standard option.
In the navigation panel, click Features.
In the Operations section, select Enable Managed Service for
Prometheus.
Click Save.
gcloud CLI
You can do the following by using the gcloud CLI:
Enable managed collection on an existing GKE cluster.
Create a new GKE cluster with managed collection enabled.
These commands might take up to 5 minutes to complete.
First, set your project:
gcloud config set project PROJECT_ID
To update an existing cluster, run one of the following
update commands based on whether your cluster is zonal or regional:
gcloud container clusters update CLUSTER_NAME --enable-managed-prometheus --zone ZONE
gcloud container clusters update CLUSTER_NAME --enable-managed-prometheus --region REGION
To create a cluster with managed collection enabled, run the following command:
gcloud container clusters create CLUSTER_NAME --zone ZONE --enable-managed-prometheus
GKE Autopilot
Managed collection is on by default in
GKE Autopilot clusters
running GKE version 1.25 or greater. You can't turn off managed collection.
If your cluster fails to enable managed collection automatically when upgrading
to 1.25, you can manually enable it by running the update command in the
gcloud CLI section.
Deploy the example application
The example application emits
the example_requests_total counter metric and the example_random_numbers
histogram metric (among others) on its metrics port. The manifest for
the application defines three replicas.
To deploy the example application, run the following command:
Configure a PodMonitoring resource
To ingest the metric data emitted by the example application, Managed Service for Prometheus
uses target scraping. Target scraping and metrics ingestion are configured using
Kubernetes custom resources. The managed service uses
PodMonitoring custom resources (CRs).
A PodMonitoring CR scrapes targets only in the namespace the CR is deployed in.
To scrape targets in multiple namespaces, deploy the same PodMonitoring CR in
each namespace. You can verify the PodMonitoring resource is installed in the
intended namespace by running kubectl get podmonitoring -A.
The following manifest defines a PodMonitoring resource,
prom-example, in the NAMESPACE_NAME namespace. The resource
uses a Kubernetes label selector to find all
pods in the namespace that have the label app.kubernetes.io/name with the value prom-example.
The matching pods are scraped on a port named metrics, every 30 seconds, on
the /metrics HTTP path.
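The description above implies a PodMonitoring manifest roughly like the following sketch. It assumes that the example application's pods expose a container port named metrics. Replace NAMESPACE_NAME as elsewhere in this document.

```yaml
# PodMonitoring that scrapes the example application (sketch).
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: prom-example
  namespace: NAMESPACE_NAME    # only pods in this namespace are discovered
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: prom-example   # label the pods must carry
  endpoints:
  - port: metrics              # named container port to scrape
    path: /metrics             # HTTP path to scrape
    interval: 30s              # scrape every 30 seconds
```

To use it, save the manifest to a file and run kubectl apply -f FILENAME. Then run kubectl get podmonitoring -A to confirm that the resource is in the intended namespace.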
Your managed collector is now scraping the matching pods. You can view the
status of your scrape target by
enabling the target status feature.
To configure horizontal collection that applies to a range of pods across
all namespaces, use the ClusterPodMonitoring
resource. The ClusterPodMonitoring resource provides the same interface as the
PodMonitoring resource but does not limit discovered pods to a given namespace.
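A ClusterPodMonitoring version of the same scrape configuration might look like the following sketch. It assumes the example application's label and port name. The resource is cluster-scoped, so it has no namespace, and its selector matches pods in every namespace.

```yaml
# ClusterPodMonitoring that scrapes matching pods in all namespaces (sketch).
apiVersion: monitoring.googleapis.com/v1
kind: ClusterPodMonitoring
metadata:
  name: prom-example           # cluster-scoped: no namespace field
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: prom-example
  endpoints:
  - port: metrics
    interval: 30s
```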
If you are running on GKE, then you can do the following:
To query the metrics ingested by the example application using PromQL in
Cloud Monitoring, see Query using Cloud Monitoring.
If you are running outside of GKE, then you need to
create a service account and authorize it to write your metric data,
as described in the following section.
Provide credentials explicitly
When running on GKE, the collecting Prometheus server
automatically retrieves credentials from the environment based on the
node's service account.
In non-GKE Kubernetes clusters, credentials must be explicitly
provided through the OperatorConfig resource in the
gmp-public namespace.
Set the context to your target project:
gcloud config set project PROJECT_ID
Create a service account:
gcloud iam service-accounts create gmp-test-sa
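Grant the required permissions to the service account:
gcloud projects add-iam-policy-binding PROJECT_ID --member=serviceAccount:gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com --role=roles/monitoring.metricWriter
Create and download a key for the service account:
gcloud iam service-accounts keys create gmp-test-sa-key.json --iam-account=gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com
Add the key file as a secret to your non-GKE cluster:
kubectl -n gmp-public create secret generic gmp-test-sa --from-file=key.json=gmp-test-sa-key.json
Open the OperatorConfig resource for editing:
kubectl -n gmp-public edit operatorconfig config
In the collection section of the resource, add a credentials entry that references the key.json key of the gmp-test-sa secret.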
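The credentials entry in the OperatorConfig can be sketched as follows. This sketch assumes that the service account key is stored under the key key.json in a Secret named gmp-test-sa in the gmp-public namespace. Adjust the names if yours differ.

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  namespace: gmp-public
  name: config
collection:
  credentials:
    name: gmp-test-sa   # Secret in gmp-public that holds the key
    key: key.json       # key within that Secret
```

If you also use managed rule evaluation, you typically add the same reference under a rules.credentials entry as well.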
Save the file and close the editor. After the change is applied, the
pods are re-created and start authenticating to the metric
backend with the given service account.
Additional topics for managed collection
This section describes how to do the following:
Enable the target status feature for easier debugging.
Configure target scraping using Terraform.
Filter the data you export to the managed service.
Scrape Kubelet and cAdvisor metrics.
Convert your existing prom-operator resources for use with the managed
service.
Run managed collection outside of GKE.
Enabling the target status feature
Managed Service for Prometheus provides a way to check whether your targets
are being properly discovered and scraped by the collectors. This target status
report is meant to be a tool for debugging acute problems. We strongly
recommend enabling this feature only to investigate immediate issues. Leaving
target status reporting on in large clusters might cause the operator to run
out of memory and crash loop.
You can check the status of your targets in your PodMonitoring or
ClusterPodMonitoring resources by setting the features.targetStatus.enabled
value within the OperatorConfig resource to true, as shown in the following:
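The following minimal sketch shows that change. It assumes the default OperatorConfig named config in the gmp-public namespace.

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  namespace: gmp-public
  name: config
features:
  targetStatus:
    enabled: true   # set back to false once you finish debugging
```

You can also make this change without opening an editor. By analogy with the VPA patch command used later in this document, a merge patch such as kubectl -n gmp-public patch operatorconfig/config -p '{"features":{"targetStatus":{"enabled":true}}}' --type=merge should have the same effect.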
After a few seconds, the Status.Endpoint Statuses field appears on every
valid PodMonitoring or ClusterPodMonitoring resource.
If you have a PodMonitoring resource with the name prom-example
in the NAMESPACE_NAME namespace, then you can check the status by running
the following command:
kubectl describe podmonitorings/prom-example -n NAMESPACE_NAME
Status.Conditions.Status is true when Managed Service for Prometheus
acknowledges and processes the PodMonitoring or ClusterPodMonitoring.
Status.Endpoint Statuses.Active Targets shows the number of scrape targets
that Managed Service for Prometheus counts on all collectors for this
PodMonitoring resource. In the example application, the prom-example
deployment has three replicas with a single metric target, so the value is 3.
If there are unhealthy targets, the Status.Endpoint Statuses.Unhealthy
Targets field appears.
Status.Endpoint Statuses.Collectors Fraction shows a value of 1
(meaning 100%) if all of the managed collectors are reachable by
Managed Service for Prometheus.
Status.Endpoint Statuses.Last Update Time shows when the status was last
updated. If the time since the last update is significantly longer than your
desired scrape interval, the difference might indicate issues with your target or
cluster.
Status.Endpoint Statuses.Sample Groups field shows sample targets grouped
by common target labels injected by the collector. This value is useful for
debugging situations where your targets are not discovered. If all targets
are healthy and being collected, then the expected value for the Health
field is up and the value for the Last Scrape Duration Seconds field is
the usual duration for a typical target.
mTLS
mTLS is commonly configured within zero trust environments, such as Istio
service mesh or Cloud Service Mesh.
To enable scraping endpoints secured using mTLS, set the
Spec.Endpoints[].Scheme field in your PodMonitoring resource to https. While
not recommended, you can set the Spec.Endpoints[].tls.insecureSkipVerify field
in your PodMonitoring resource to true to skip verifying the certificate
authority. Alternatively, you can configure Managed Service for Prometheus to
load certificates and keys from secret resources.
For example, the following Secret resource contains keys for the client
(cert), private key (key), and certificate authority (ca) certificates:
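The following sketch shows such a Secret together with a PodMonitoring endpoint that uses it. It rests on three assumptions:

- The Secret values are placeholders for PEM-encoded data.
- The Secret is in the same namespace as the PodMonitoring.
- The field names under tls are our reading of the API reference. Verify them there before use.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: secret-example
  namespace: NAMESPACE_NAME       # assumption: same namespace as the PodMonitoring
type: Opaque
stringData:
  cert: CLIENT_CERT_PEM           # placeholder: client certificate
  key: CLIENT_KEY_PEM             # placeholder: client private key
  ca: CA_CERT_PEM                 # placeholder: CA used to verify the target
---
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: prom-example
  namespace: NAMESPACE_NAME
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: prom-example
  endpoints:
  - port: metrics
    interval: 30s
    scheme: https                 # scrape over TLS
    tls:
      # Assumed field names; confirm them in the API reference.
      ca:
        secret:
          name: secret-example
          key: ca
      cert:
        secret:
          name: secret-example
          key: cert
      key:
        secret:
          name: secret-example
          key: key
```

Depending on your setup, the collectors might also need Kubernetes RBAC permission to read this Secret.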
For reference documentation about all the Managed Service for Prometheus
mTLS options, see the
API reference documentation.
BasicAuth
To enable scraping endpoints secured using BasicAuth, set the
Spec.Endpoints[].BasicAuth field in your PodMonitoring resource with your
username and password. For other HTTP Authorization Header types, see
HTTP Authorization Header.
For example, the following Secret resource contains a key to store the password:
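The following sketch shows that Secret and a PodMonitoring endpoint that references it. USERNAME and PASSWORD are placeholders, and the Secret is assumed to be in the same namespace as the PodMonitoring.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: secret-example
  namespace: NAMESPACE_NAME
type: Opaque
stringData:
  password: PASSWORD              # placeholder
---
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: prom-example
  namespace: NAMESPACE_NAME
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: prom-example
  endpoints:
  - port: metrics
    interval: 30s
    basicAuth:
      username: USERNAME          # placeholder; stored in plain text in the CR
      password:
        secret:                   # password is read from the Secret above
          name: secret-example
          key: password
```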
For reference documentation about all the Managed Service for Prometheus
BasicAuth options, see the
API reference documentation.
HTTP Authorization Header
To enable scraping endpoints secured using HTTP Authorization Headers, set the
Spec.Endpoints[].Authorization field in your PodMonitoring resource with the
type and credentials. For BasicAuth endpoints, use the
BasicAuth configuration instead.
For example, the following Secret resource contains a key to store the
credentials:
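The following sketch shows such a Secret and the matching authorization section. The Bearer type and the TOKEN placeholder are illustrative assumptions.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: secret-example
  namespace: NAMESPACE_NAME
type: Opaque
stringData:
  credentials: TOKEN              # placeholder for the header credentials
---
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: prom-example
  namespace: NAMESPACE_NAME
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: prom-example
  endpoints:
  - port: metrics
    interval: 30s
    authorization:
      type: Bearer                # sent as "Authorization: Bearer <credentials>"
      credentials:
        secret:
          name: secret-example
          key: credentials
```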
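OAuth 2
To enable scraping endpoints secured using OAuth 2, set the
Spec.Endpoints[].OAuth2 field in your PodMonitoring resource. To configure a
PodMonitoring resource that reads the client secret from a Secret resource, with a
client ID of foo and a token URL of example.com/token, modify your resource to
add an oauth2 section: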
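The following sketch shows the oauth2 section. It assumes a Secret named secret-example that stores the client secret under the key clientSecret; both names are illustrative. The token URL is written as a full https:// URL, which we assume the scraper requires.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: secret-example
  namespace: NAMESPACE_NAME
type: Opaque
stringData:
  clientSecret: CLIENT_SECRET     # placeholder
---
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: prom-example
  namespace: NAMESPACE_NAME
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: prom-example
  endpoints:
  - port: metrics
    interval: 30s
    oauth2:
      clientID: foo
      clientSecret:
        secret:
          name: secret-example
          key: clientSecret
      tokenURL: https://example.com/token
```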
Filter exported metrics
If you collect a lot of data, you might want to prevent some time series from
being sent to Managed Service for Prometheus to keep down costs. You can do
this by using Prometheus relabeling rules
with a keep action for an allowlist or a drop action for a denylist. For
managed collection, this rule goes in the metricRelabeling section of your
PodMonitoring or ClusterPodMonitoring
resource.
For example, the following metric relabeling rule will filter out any metric
that begins with foo_bar_, foo_baz_, or foo_qux_:
metricRelabeling:
- action: drop
  regex: foo_(bar|baz|qux)_.+
  sourceLabels: [__name__]
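The metricRelabeling rules belong to each entry under endpoints. The following sketch applies an allowlist (keep) instead of a denylist to the example application's PodMonitoring, assuming that you want to keep only the two metrics described earlier. The regex matches the histogram through the _bucket, _sum, and _count series that Prometheus histograms expose.

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: prom-example
  namespace: NAMESPACE_NAME
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: prom-example
  endpoints:
  - port: metrics
    interval: 30s
    # Applied to scraped samples before they are sent to the managed service.
    metricRelabeling:
    # Allowlist: keep only series whose metric name matches the regex.
    - action: keep
      regex: 'example_requests_total|example_random_numbers_(bucket|sum|count)'
      sourceLabels: [__name__]
```

Prometheus anchors relabeling regexes at both ends, so the regex must match the whole metric name, not just a prefix.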
The Cloud Monitoring Metrics Management page provides information
that can help you control the amount you spend on billable metrics
without affecting observability. The Metrics Management page reports the
following information:
Ingestion volumes for both byte- and sample-based billing, across metric
domains and for individual metrics.
Data about labels and cardinality of metrics.
Number of reads for each metric.
Use of metrics in alerting policies and custom dashboards.
Scrape Kubelet and cAdvisor metrics
The Kubelet exposes metrics about itself as well as cAdvisor metrics about
containers running on its node. You can configure managed collection to
scrape Kubelet and cAdvisor metrics by editing the OperatorConfig resource.
For instructions, see the exporter documentation for
Kubelet and cAdvisor.
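The following sketch shows the kind of change involved. It assumes the collection.kubeletScraping field described in that exporter documentation; the 30s interval is illustrative.

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  namespace: gmp-public
  name: config
collection:
  kubeletScraping:
    interval: 30s   # how often to scrape Kubelet and cAdvisor endpoints
```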
Convert existing prometheus-operator resources
You can usually convert your existing prometheus-operator resources to
Managed Service for Prometheus managed collection PodMonitoring and
ClusterPodMonitoring resources.
For example, the ServiceMonitor resource
defines monitoring for a set of
services. The PodMonitoring resource serves a subset
of the fields served
by the ServiceMonitor resource. You can convert a ServiceMonitor CR to a
PodMonitoring CR by mapping the fields as described in the following table:
| ServiceMonitor field (monitoring.coreos.com/v1) | Compatibility | PodMonitoring field (monitoring.googleapis.com/v1) |
|---|---|---|
| .ServiceMonitorSpec.Selector | Identical | .PodMonitoringSpec.Selector |
| .ServiceMonitorSpec.Endpoints[] | .TargetPort maps to .Port; .Path, .Interval, and .Timeout are compatible | .PodMonitoringSpec.Endpoints[] |
| .ServiceMonitorSpec.TargetLabels | PodMonitoring must specify .FromPod[].From (pod label) and .FromPod[].To (target label) | .PodMonitoringSpec.TargetLabels |
The following is a sample ServiceMonitor CR. In the conversion, apiVersion,
kind, targetPort, and targetLabels are replaced; selector, path, and interval map directly:
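The following sketch infers a matching ServiceMonitor from the PodMonitoring conversion and the field mapping in this section. The foo target label is illustrative.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
spec:
  selector:
    matchLabels:
      app: example-app   # maps directly to the PodMonitoring selector
  endpoints:
  - targetPort: web      # becomes port in PodMonitoring
    path: /stats         # maps directly
    interval: 30s        # maps directly
  targetLabels:          # becomes targetLabels.fromPod in PodMonitoring
  - foo
```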
The following is the analogous PodMonitoring CR, assuming that your service
and its pods are labeled with app=example-app. If this assumption
does not apply, then you need to use the label selectors of the underlying
Service resource.
In this CR, apiVersion, kind, port, and targetLabels.fromPod replace their ServiceMonitor counterparts:
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: example-app
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web
    path: /stats
    interval: 30s
  targetLabels:
    fromPod:
    - from: foo # pod label from example-app Service pods.
      to: foo
You can always continue to use your existing prometheus-operator resources and
deployment configs by using self-deployed collectors instead of
managed collectors. You can query metrics sent from both collector types, so you
might want to use self-deployed collectors for your existing Prometheus
deployments while using managed collectors for new Prometheus deployments.
Reserved labels
Managed Service for Prometheus automatically adds the following labels
to all metrics collected. These labels are used to uniquely identify a resource
in Monarch:
project_id: The identifier of the Google Cloud project associated with your
metric.
location: The physical location (Google Cloud region) where the
data is stored. This value is typically the region of your
GKE cluster. If data is
collected from an AWS or on-premises deployment, then the value might be
the closest Google Cloud region.
cluster: The name of the Kubernetes cluster associated with your metric.
namespace: The name of the Kubernetes namespace associated with your metric.
job: The job label of the Prometheus target, if known;
might be empty for rule-evaluation results.
instance: The instance label of the Prometheus target, if known;
might be empty for rule-evaluation results.
While not recommended when running on Google Kubernetes Engine, you can override the
project_id, location, and cluster labels by adding
them as args to the Deployment resource within
operator.yaml. If you use any reserved labels as metric labels,
Managed Service for Prometheus automatically relabels them by adding the
prefix exported_. This behavior matches how upstream Prometheus handles
conflicts with reserved labels.
Enable vertical pod autoscaling (VPA) for managed collection
If you are encountering Out of Memory (OOM) errors for the collector pods in
your cluster or if the default resource requests and limits for the collectors
otherwise don't meet your needs, then you can use vertical pod autoscaling to
dynamically allocate resources.
When you set the field scaling.vpa.enabled: true on the OperatorConfig
resource, the operator deploys a VerticalPodAutoscaler manifest in the cluster
that allows the
resource requests and limits
of the collector pods to be set automatically, based on usage.
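To enable VPA for collector pods in Managed Service for Prometheus, run the
following command:
kubectl -n gmp-public patch operatorconfig/config -p '{"scaling":{"vpa":{"enabled":true}}}' --type=merge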
If the command completes successfully, then the operator sets up vertical pod
autoscaling for the collector pods. OOM errors result in an immediate
increase to the resource limits. If there are no OOM errors, then the first
adjustment to the resource requests and limits of the collector pods typically
occurs within 24 hours.
You might receive this error when attempting to enable VPA:
vertical pod autoscaling is not available - install vpa support and restart the
operator
To resolve this error, you need to first enable vertical pod autoscaling at the
cluster level:
In the Google Cloud console, go to the Kubernetes clusters page.
If you use the search bar to find this page, then select the result whose subheading is
Kubernetes Engine.
Select the cluster you want to modify.
In the Automation section, edit the value of the Vertical Pod
Autoscaling option.
Select the Enable Vertical Pod Autoscaling checkbox, and then click
Save changes. This change restarts your cluster. The operator
restarts as a part of this process.
Retry the following command to enable VPA for Managed Service for Prometheus:
kubectl -n gmp-public patch operatorconfig/config -p '{"scaling":{"vpa":{"enabled":true}}}' --type=merge
To confirm that the OperatorConfig resource was edited successfully, open it
by using the command kubectl -n gmp-public edit operatorconfig config. If the
edit succeeded, your OperatorConfig contains a scaling section in which
vpa.enabled is set to true.
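In YAML, the section that the merge patch adds looks like the following:

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  namespace: gmp-public
  name: config
scaling:
  vpa:
    enabled: true   # operator deploys a VerticalPodAutoscaler for the collectors
```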
If you have already enabled vertical pod autoscaling at the cluster level and
are still seeing the vertical pod autoscaling is not available - install vpa
support and restart the operator error, then the gmp-operator pod might need
to re-evaluate the cluster configuration. Do one of the following:
If you are running a Standard
cluster, run the following command to recreate the pod:
kubectl -n gmp-system rollout restart deployment/gmp-operator
After the gmp-operator pod has restarted, follow the steps above to patch the
OperatorConfig once again.
If you are running an Autopilot cluster, then you can't restart
the gmp-operator pod manually. When you enable VPA for the managed
collectors in an Autopilot cluster, VPA automatically evicts
and recreates the collector pods to apply the new resource requests; no
cluster restart is required. If you see the vertical pod autoscaling is
not available error after enabling VPA, or encounter other issues with
VPA activation for Managed Service for Prometheus, then
contact support.
Vertical pod autoscaling works best when ingesting steady numbers of samples,
divided equally across nodes. If the metrics load is irregular or spiky, or if
metrics load varies greatly between nodes, VPA might not be an efficient
solution.
Configure statsd_exporter and other exporters that report metrics centrally
If you use the statsd_exporter for Prometheus, Envoy for Istio, the SNMP
exporter, the Prometheus Pushgateway, kube-state-metrics, or you otherwise
have a similar exporter that intermediates and reports metrics on behalf of
other resources running in your environment, then you need to make some small
changes for your exporter to work with Managed Service for Prometheus.
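Disable managed collection
To disable managed collection by using the Google Cloud console, do the following:
Select Kubernetes Engine in the Google Cloud console, then select
Clusters.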
Locate the cluster for which you want to disable managed collection and
click its name.
On the Details tab, scroll down to Features and change the state to
Disabled by using the edit button.
To disable managed collection deployed by using Terraform, specify
enabled = false in the managed_prometheus section of the
google_container_cluster resource.
To disable managed collection deployed by using kubectl, run the following
command:
Disabling managed collection causes your cluster to stop sending new data to
Managed Service for Prometheus. Taking this action does not delete any
existing metrics data already stored in the system.
Disabling managed collection also deletes the gmp-public namespace and
any resources within it, including any exporters
installed in that namespace.
Run managed collection outside of GKE
In GKE environments, you can run managed collection without
further configuration. In other Kubernetes environments,
you need to explicitly provide credentials, a project-id value to contain your
metrics, a location value (Google Cloud region) where your metrics will be
stored, and a cluster value to save the name of the cluster in which the
collector is running.
Because the gcloud CLI can enable managed collection only on GKE clusters, you need to
deploy by using kubectl instead. Unlike with gcloud,
deploying managed collection by using kubectl does not automatically upgrade the
managed collection components when a new version is available. Watch the releases
page for new versions, and upgrade manually
by re-running the kubectl commands with the new version.
You can provide a service account key by modifying the
OperatorConfig resource within operator.yaml as described in Provide
credentials explicitly. You can provide
project-id, location, and cluster values by adding them as args to the
Deployment resource within operator.yaml.
We recommend choosing project-id based on your planned tenancy model for
reads. Pick a project to store metrics in based on how you plan to organize
reads later with metrics scopes. If you don't care, you
can put everything into one project.
For location, we recommend choosing the nearest Google Cloud region to your
deployment. The farther the chosen Google Cloud region is from your deployment,
the more write latency you'll have and the more you'll be affected by potential
networking issues. You might want to consult this list of regions across
multiple clouds. If
you don't care, you can put everything into one Google Cloud region. You can't
use global as your location.
For cluster, we recommend choosing the name of the cluster in which the
operator is deployed.
When properly configured, your OperatorConfig should look like this:
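The following sketch shows such a configuration. It assumes two things:

- The gmp-test-sa Secret from Provide credentials explicitly is in place.
- The operator accepts the project-id, location, and cluster values as --flag=value args.

The Deployment part is an excerpt that shows only the args; the rest of operator.yaml is omitted.

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  namespace: gmp-public
  name: config
collection:
  credentials:
    name: gmp-test-sa   # Secret holding the service account key
    key: key.json
rules:
  credentials:
    name: gmp-test-sa   # same key, used for managed rule evaluation
    key: key.json
---
# Excerpt of the gmp-operator Deployment in operator.yaml; only the operator
# container's args are shown. Keep the existing args and append these.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gmp-operator
  namespace: gmp-system
spec:
  template:
    spec:
      containers:
      - name: operator            # hypothetical name; match your operator.yaml
        args:
        - --project-id=PROJECT_ID
        - --location=REGION       # a Google Cloud region; global is not allowed
        - --cluster=CLUSTER_NAME
```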
This example assumes that the REGION variable is set to a region such as
us-central1.
Running Managed Service for Prometheus outside of Google Cloud incurs
data transfer fees. There are fees to transfer data into Google Cloud, and
you might incur fees to transfer data out of another cloud. You can minimize
these costs by enabling gzip compression over the wire through the
OperatorConfig, by setting the collection.compression field of the resource to gzip.
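The following minimal sketch shows only the field involved. Keep any credentials entries that you already have in the collection section.

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  namespace: gmp-public
  name: config
collection:
  compression: gzip   # compress data sent to Managed Service for Prometheus
```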
Last updated 2026-06-09 UTC.