Well-Architected Framework: Cost optimization pillar

Last reviewed 2025-02-14 UTC
This page provides a one-page view of all of the pages in the cost optimization pillar of the Well-Architected Framework. You can print this page or save it in PDF format by using your browser's print function. This page doesn't have a table of contents. You can't use the links on this page to navigate within the page.

The cost optimization pillar in the Google Cloud Well-Architected Framework describes principles and recommendations to optimize the cost of your workloads in Google Cloud.

The intended audience includes the following:

The cost models for on-premises and cloud workloads differ significantly. On-premises IT costs include capital expenditure (CapEx) and operational expenditure (OpEx). On-premises hardware and software assets are acquired and the acquisition costs are depreciated over the operating life of the assets. In the cloud, the costs for most cloud resources are treated as OpEx, where costs are incurred when the cloud resources are consumed. This fundamental difference underscores the importance of the following core principles of cost optimization.

For cost optimization principles and recommendations that are specific to AI and ML workloads, see AI and ML perspective: Cost optimization in the Well-Architected Framework.

Core principles

The recommendations in the cost optimization pillar of the Well-Architected Framework are mapped to the following core principles:

These principles are closely aligned with the core tenets of cloud FinOps. FinOps is relevant to any organization, regardless of its size or maturity in the cloud. By adopting these principles and following the related recommendations, you can control and optimize costs throughout your journey in the cloud.

Contributors

Author: Nicolas Pintaux | Customer Engineer, Application Modernization Specialist

Other contributors:

Align cloud spending with business value

This principle in the cost optimization pillar of the Google Cloud Well-Architected Framework provides recommendations to align your use of Google Cloud resources with your organization's business goals.

Principle overview

To effectively manage cloud costs, you need to maximize the business value that the cloud resources provide and minimize the total cost of ownership (TCO). When you evaluate the resource options for your cloud workloads, consider not only the cost of provisioning and using the resources, but also the cost of managing them. For example, virtual machines (VMs) on Compute Engine might be a cost-effective option for hosting applications. However, when you consider the overhead to maintain, patch, and scale the VMs, the TCO can increase. On the other hand, serverless services like Cloud Run can offer greater business value. The lower operational overhead lets your team focus on core activities and helps to increase agility.

To ensure that your cloud resources deliver optimal value, evaluate the following factors:

By aligning cloud spending with business value, you get the following benefits:

Recommendations

To align cloud spending with business objectives, consider the following recommendations.

Prioritize managed services and serverless products

Whenever possible, choose managed services and serverless products to reduce operational overhead and maintenance costs. This choice lets your teams concentrate on their core business activities. They can accelerate the delivery of new features and functionalities, and help drive innovation and value.

The following are examples of how you can implement this recommendation:

Balance cost efficiency with business agility

Controlling costs and optimizing resource utilization are important goals. However, you must balance these goals with the need for flexible infrastructure that lets you innovate rapidly, respond quickly to changes, and deliver value faster. The following are examples of how you can achieve this balance:

Enable self-service optimization

Encourage a culture of experimentation and exploration by providing your teams with self-service cost optimization tools, observability tools, and resource management platforms. Enable them to provision, manage, and optimize their cloud resources autonomously. This approach helps to foster a sense of ownership, accelerate innovation, and ensure that teams can respond quickly to changing needs while being mindful of cost efficiency.

Adopt and implement FinOps

Adopt FinOps to establish a collaborative environment where everyone is empowered to make informed decisions that balance cost and value. FinOps fosters financial accountability and promotes effective cost optimization in the cloud.

Promote a value-driven and TCO-aware mindset

Encourage your team members to adopt a holistic attitude toward cloud spending, with an emphasis on TCO and not just upfront costs. Use techniques like value stream mapping to visualize and analyze the flow of value through your software delivery process and to identify areas for improvement. Implement unit costing for your applications and services to gain a granular understanding of cost drivers and discover opportunities for cost optimization. For more information, see Maximize business value with cloud FinOps.

Foster a culture of cost awareness

This principle in the cost optimization pillar of the Google Cloud Well-Architected Framework provides recommendations to promote cost awareness across your organization and ensure that team members have the cost information that they need to make informed decisions.

Conventionally, the responsibility for cost management might be centralized to a few select stakeholders and primarily focused on initial project architecture decisions. However, team members across all cloud user roles (analyst, architect, developer, or administrator) can help to reduce the cost of your resources in Google Cloud. By sharing cost data appropriately, you can empower team members to make cost-effective decisions throughout their development and deployment processes.

Principle overview

Stakeholders across various roles – product owners, developers, deployment engineers, administrators, and financial analysts – need visibility into relevant cost data and its relationship to business value. When provisioning and managing cloud resources, they need the following data:

Every individual might not need access to raw cost data. However, promoting cost awareness across all roles is crucial because individual decisions can affect costs.

By promoting cost visibility and ensuring clear ownership of cost management practices, you ensure that everyone is aware of the financial implications of their choices and everyone actively contributes to the organization's cost optimization goals. Whether through a centralized FinOps team or a distributed model, establishing accountability is crucial for effective cost optimization efforts.

Recommendations

To promote cost awareness and ensure that your team members have the cost information that they need to make informed decisions, consider the following recommendations.

Provide organization-wide cost visibility

To achieve organization-wide cost visibility, the teams that are responsible for cost management can take the following actions:

Understand how cloud resources are billed

Pricing for Google Cloud resources might vary across regions. Some resources are billed monthly at a fixed price, and others might be billed based on usage. To understand how Google Cloud resources are billed, use the Google Cloud pricing calculator and product-specific pricing information (for example, Google Kubernetes Engine (GKE) pricing).

Understand resource-based cost optimization options

For each type of cloud resource that you plan to use, explore strategies to optimize utilization and efficiency. The strategies include rightsizing, autoscaling, and adopting serverless technologies where appropriate. The following are examples of cost optimization options for a few Google Cloud products:

Understand discount-based cost optimization options

Familiarize yourself with the discount programs that Google Cloud offers, such as the following examples:

Evaluate and choose the discounts options that align with your workload characteristics and usage patterns.

Incorporate cost estimates into architecture blueprints

Encourage teams to develop architecture blueprints that include cost estimates for different deployment options and configurations. This practice empowers teams to compare costs proactively and make informed decisions that align with both technical and financial objectives.

Use a consistent and standard set of labels for all your resources

You can use labels to track costs and to identify and classify resources. Specifically, you can use labels to allocate costs to different projects, departments, or cost centers. Defining a formal labeling policy that aligns with the needs of the main stakeholders in your organization helps to make costs visible more widely. You can also use labels to filter resource cost and usage data based on target audience.

Use automation tools like Terraform to enforce labeling on every resource that is created. To enhance cost visibility and attribution further, you can use the tools provided by the open-source cost attribution solution.

Share cost reports with team members

By sharing cost reports with your team members, you empower them to take ownership of their cloud spending. This practice enables cost-effective decision making, continuous cost optimization, and systematic improvements to your cost allocation model.

Cost reports can be of several types, including the following:

Optimize resource usage

This principle in the cost optimization pillar of the Google Cloud Well-Architected Framework provides recommendations to help you plan and provision resources to match the requirements and consumption patterns of your cloud workloads.

Principle overview

To optimize the cost of your cloud resources, you need to thoroughly understand your workloads resource requirements and load patterns. This understanding is the basis for a well defined cost model that lets you forecast the total cost of ownership (TCO) and identify cost drivers throughout your cloud adoption journey. By proactively analyzing and forecasting cloud spending, you can make informed choices about resource provisioning, utilization, and cost optimization. This approach lets you control cloud spending, avoid overprovisioning, and ensure that cloud resources are aligned with the dynamic needs of your workloads and environments.

Recommendations

To effectively optimize cloud resource usage, consider the following recommendations.

Choose environment-specific resources

Each deployment environment has different requirements for availability, reliability and scalability. For example, developers might prefer an environment that lets them rapidly deploy and run applications for short durations, but might not need high availability. On the other hand, a production environment typically needs high availability. To maximize the utilization of your resources, define environment-specific requirements based on your business needs. The following table lists examples of environment-specific requirements.

Environment Requirements
Production
  • High availability
  • Predictable performance
  • Operational stability
  • Security with robust resources
Development and testing
  • Cost efficiency
  • Flexible infrastructure with burstable capacity
  • Ephemeral infrastructure when data persistence is not necessary
Other environments (like staging and QA)
  • Tailored resource allocation based on environment-specific requirements

Choose workload-specific resources

Each of your cloud workloads might have different requirements for availability, scalability, security, and performance. To optimize costs, you need to align resource choices with the specific requirements of each workload. For example, a stateless application might not require the same level of availability or reliability as a stateful backend. The following table lists more examples of workload-specific requirements.

Workload type Workload requirements Resource options
Mission-critical Continuous availability, robust security, and high performance Premium resources and managed services like Spanner for high availability and global consistency of data.
Non-critical Cost-efficient and autoscaling infrastructure Resources with basic features and ephemeral resources like Spot VMs.
Event-driven Dynamic scaling based on the current demand for capacity and performance Serverless services like Cloud Run and Cloud Run functions.
Experimental workloads Low cost and flexible environment for rapid development, iteration, testing, and innovation Resources with basic features, ephemeral resources like Spot VMs, and sandbox environments with defined spending limits.

A benefit of the cloud is the opportunity to take advantage of the most appropriate computing power for a given workload. Some workloads are developed to take advantage of processor instruction sets, and others might not be designed in this way. Benchmark and profile your workloads accordingly. Categorize your workloads and make workload-specific resource choices (for example, choose appropriate machine families for Compute Engine VMs). This practice helps to optimize costs, enable innovation, and maintain the level of availability and performance that your workloads need.

The following are examples of how you can implement this recommendation:

Select regions based on cost requirements

For your cloud workloads, carefully evaluate the available Google Cloud regions and choose regions that align with your cost objectives. The region with lowest cost might not offer optimal latency or it might not meet your sustainability requirements. Make informed decisions about where to deploy your workloads to achieve the desired balance. You can use the Google Cloud Region Picker to understand the trade-offs between cost, sustainability, latency, and other factors.

Use built-in cost optimization options

Google Cloud products provide built-in features to help you optimize resource usage and control costs. The following table lists examples of cost optimization features that you can use in some Google Cloud products:

Product Cost optimization feature
Compute Engine
  • Automatically add or remove VMs based on the current load by using autoscaling.
  • Avoid overprovisioning by creating and using custom machine types
  • that match your workload's requirements.
  • For non-critical or fault-tolerant workloads, reduce costs by using Spot VMs.
  • In development environments, reduce costs by limiting the run time of VMs or by suspending or stopping VMs when you don't need them.
GKE
  • Automatically adjust the size of GKE clusters based on the current load by using cluster autoscaler.
  • Automatically create and manage node pools based on workload requirements and ensure optimal resource utilization by using node auto-provisioning.
Cloud Storage
  • Automatically transition data to lower-cost storage classes based on the age of data or based on access patterns by using Object Lifecycle Management.
  • Dynamically move data to the most cost-effective storage class based on usage patterns by using Autoclass.
BigQuery
  • Reduce query processing costs for steady-state workloads by using capacity-based pricing.
  • Optimize query performance and costs by using partitioning and clustering techniques.
Google Cloud VMware Engine

Optimize resource sharing

To maximize the utilization of cloud resources, you can deploy multiple applications or services on the same infrastructure, while still meeting the security and other requirements of the applications. For example, in development and testing environments, you can use the same cloud infrastructure to test all the components of an application. For the production environment, you can deploy each component on a separate set of resources to limit the extent of impact in case of incidents.

The following are examples of how you can implement this recommendation:

Develop and maintain reference architectures

Create and maintain a repository of reference architectures that are tailored to meet the requirements of different deployment environments and workload types. To streamline the design and implementation process for individual projects, the blueprints can be centrally managed by a team like a Cloud Center of Excellence (CCoE). Project teams can choose suitable blueprints based on clearly defined criteria, to ensure architectural consistency and adoption of best practices. For requirements that are unique to a project, the project team and the central architecture team should collaborate to design new reference architectures. You can share the reference architectures across the organization to foster knowledge sharing and expand the repository of available solutions. This approach ensures consistency, accelerates development, simplifies decision-making, and promotes efficient resource utilization.

Review the reference architectures provided by Google for various use cases and technologies. These reference architectures incorporate best practices for resource selection, sizing, configuration, and deployment. By using these reference architectures, you can accelerate your development process and achieve cost savings from the start.

Enforce cost discipline by using organization policies

Consider using organization policies to limit the available Google Cloud locations and products that team members can use. These policies help to ensure that teams adhere to cost-effective solutions and provision resources in locations that are aligned with your cost optimization goals.

Estimate realistic budgets and set financial boundaries

Develop detailed budgets for each project, workload, and deployment environment. Make sure that the budgets cover all aspects of cloud operations, including infrastructure costs, software licenses, personnel, and anticipated growth. To prevent overspending and ensure alignment with your financial goals, establish clear spending limits or thresholds for projects, services, or specific resources. Monitor cloud spending regularly against these limits. You can use proactive quota alerts to identify potential cost overruns early and take timely corrective action.

In addition to setting budgets, you can use quotas and limits to help enforce cost discipline and prevent unexpected spikes in spending. You can exercise granular control over resource consumption by setting quotas at various levels, including projects, services, and even specific resource types.

The following are examples of how you can implement this recommendation:

By using quotas and limits in conjunction with budgeting and monitoring, you can create a proactive and multi-layered approach to cost control. This approach helps to ensure that your cloud spending remains within defined boundaries and aligns with your business objectives. Remember, these cost controls are not permanent or rigid. To ensure that the cost controls remain aligned with current industry standards and reflect your evolving business needs, you must review the controls regularly and adjust them to include new technologies and best practices.

Optimize continuously

This principle in the cost optimization pillar of the Google Cloud Well-Architected Framework provides recommendations to help you optimize the cost of your cloud deployments based on constantly changing and evolving business goals.

As your business grows and evolves, your cloud workloads need to adapt to changes in resource requirements and usage patterns. To derive maximum value from your cloud spending, you must maintain cost-efficiency while continuing to support business objectives. This requires a proactive and adaptive approach that focuses on continuous improvement and optimization.

Principle overview

To optimize cost continuously, you must proactively monitor and analyze your cloud environment and make suitable adjustments to meet current requirements. Focus your monitoring efforts on key performance indicators (KPIs) that directly affect your end users' experience, align with your business goals, and provide insights for continuous improvement. This approach lets you identify and address inefficiencies, adapt to changing needs, and continuously align cloud spending with strategic business goals. To balance comprehensive observability with cost effectiveness, understand the costs and benefits of monitoring resource usage and use appropriate process-improvement and optimization strategies.

Recommendations

To effectively monitor your Google Cloud environment and optimize cost continuously, consider the following recommendations.

Focus on business-relevant metrics

Effective monitoring starts with identifying the metrics that are most important for your business and customers. These metrics include the following:

Use observability for resource optimization

The following are recommendations to use observability to identify resource bottlenecks and underutilized resources in your cloud deployments:

Balance troubleshooting needs with cost

Detailed observability data can help with diagnosing and troubleshooting issues. However, storing excessive amounts of observability data or exporting unnecessary data to external monitoring tools can lead to unnecessary costs. For efficient troubleshooting, consider the following recommendations:

Tailor data collection to roles and set role-specific retention policies

Consider the specific data needs of different roles. For example, developers might primarily need access to traces and application-level logs, whereas IT administrators might focus on system logs and infrastructure metrics. By tailoring data collection, you can reduce unnecessary storage costs and avoid overwhelming users with irrelevant information.

Additionally, you can define retention policies based on the needs of each role and any regulatory requirements. For example, developers might need access to detailed logs for a shorter period, while financial analysts might require longer-term data.

Consider regulatory and compliance requirements

In certain industries, regulatory requirements mandate data retention. To avoid legal and financial risks, you need to ensure that your monitoring and data retention practices help you adhere to relevant regulations. At the same time, you need to maintain cost efficiency. Consider the following recommendations:

Implement smart alerting

Alerting helps to detect and resolve issues in a timely manner. However, a balance is necessary between an approach that keeps you informed, and one that overwhelms you with notifications. By designing intelligent alerting systems, you can prioritize critical issues that have higher business impact. Consider the following recommendations: