Well-Architected Framework: Financial services (FS) perspective

Last reviewed 2025-07-28 UTC
This page provides a one-page view of all of the pages in the FS perspective of the Well-Architected Framework.

This document in the Google Cloud Well-Architected Framework describes principles and recommendations to help you to design, build, and manage financial services (FS) applications in Google Cloud that meet your operational, security, reliability, cost, and performance goals.

The target audience for this document includes decision makers, architects, administrators, developers, and operators who design, build, deploy, and maintain FS workloads in Google Cloud. Examples of FS organizations that could benefit from this guidance include banks, payment infrastructure players, insurance providers, and capital market operators.

FS organizations have specific considerations, particularly for architecture and resilience. These considerations are primarily driven by regulatory, risk, and performance requirements. This document provides high-level guidance that's based on design considerations that we've observed across a wide range of FS customers globally. Whether your workloads are fully in the cloud or transitioning to hybrid or multi-cloud deployments, the guidance in this document helps you design workloads on Google Cloud to meet your regulatory requirements and diverse risk perspectives. The guidance might not address the unique challenges of every organization. It provides a foundation that addresses many of the primary regulatory requirements of FS organizations.

A primary challenge in designing cloud workloads involves aligning cloud deployments with on-premises environments, especially when you aim for consistent approaches to security, reliability, and resilience. Cloud services create opportunities to fundamentally rethink your architecture in order to reduce management overhead, optimize cost, enhance security, and improve reliability and resilience.

The following pages describe principles and recommendations that are specific to FS workloads for each pillar of the Well-Architected Framework:

Financial services perspective: Operational excellence

This document in the Google Cloud Well-Architected Framework: Financial services (FS) perspective provides an overview of the principles and recommendations to build, deploy, and operate robust FS workloads in Google Cloud. These recommendations help you set up foundational elements like observability, automation, and scalability. The recommendations in this document align with the operational excellence pillar of the Well-Architected Framework.

Operational excellence is critical for FS workloads in Google Cloud due to the highly regulated and sensitive nature of such workloads. Operational excellence helps to ensure that cloud solutions can adapt to evolving needs and meet your requirements for value, performance, security, and reliability. Failures in these areas could result in significant financial losses, regulatory penalties, and reputational damage.

Operational excellence provides the following benefits for FS workloads:

The operational excellence recommendations in this document are mapped to the following core principles:

Define SLAs and corresponding SLOs and SLIs

Across many FS organizations, the availability of applications is typically classified based on recovery time objective (RTO) and recovery point objective (RPO) metrics. For business-critical applications that serve external customers, a service level agreement (SLA) might also be defined.

SLAs need a framework of metrics that represents the behavior of the system from the user-satisfaction perspective. Site reliability engineering (SRE) practices offer a way to achieve the level of system reliability that you want. Creating a framework of metrics involves defining and monitoring key numerical indicators to understand system health from the user's perspective. For example, metrics like latency and error rates quantify how well a service is performing. These metrics are called service level indicators (SLIs). Developing effective SLIs is crucial, because they provide the raw data that's necessary to objectively assess reliability.

To define meaningful SLAs, SLIs, and SLOs, consider the following recommendations:

Examples of service levels

The following table provides examples of SLIs, SLOs, and SLAs for a payment platform:

| Business metric | SLI | SLO | SLA |
| --- | --- | --- | --- |
| Payment transaction success | A quantitative measure of the percentage of all initiated payment transactions that are successfully processed and confirmed. Example: (number of successful transactions ÷ total number of valid transactions) × 100, measured over a rolling 5-minute window. | An internal target to maintain a high percentage of successful payment transactions over a specific period. Example: Maintain a 99.98% payment transaction success rate over a rolling 30-day window, excluding invalid requests and planned maintenance. | A contractual guarantee for the success rate and speed of payment transaction processing. Example: The service provider guarantees that 99.0% of payment transactions initiated by the client will be successfully processed and confirmed within one second. |
| Payment processing latency | The average time taken for a payment transaction to be processed from initiation by the client to final confirmation. Example: Average response time in milliseconds for transaction confirmation, measured over a rolling 5-minute window. | An internal target for the speed at which payment transactions are processed. Example: Ensure that 99.5% of payment transactions are processed within 400 milliseconds over a rolling 30-day window. | A contractual commitment to resolve critical payment processing issues within a specified timeframe. Example: For critical payment processing issues (defined as an outage that affects more than 1% of transactions), the service provider commits to a resolution time of within two hours from the time when the issue is reported or detected. |
| Platform availability | The percentage of time when the core payment processing API and user interface are operational and accessible to clients. Example: (total operational time − downtime) ÷ total operational time × 100, measured per minute. | An internal target for the uptime of the core payment platform. Example: Achieve 99.995% platform availability per calendar month, excluding scheduled maintenance windows. | A formal, legally binding commitment to clients regarding the minimum uptime of the payment platform, including consequences for failure to meet. Example: The platform will maintain a minimum of 99.9% availability per calendar month, excluding scheduled maintenance windows. If the availability falls below the minimum level, the client will receive a service credit of 5% of the monthly service fee for each 0.1% drop. |
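The payment transaction success SLI in the preceding table can be sketched in a few lines of Python. This is a minimal illustration, not a monitoring implementation: the `WindowCounts` type, the sample counts, and the decision to treat an empty window as fully successful are assumptions made for this example. The two targets are the SLO and SLA percentages from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowCounts:
    """Transaction counts for one measurement window (hypothetical shape)."""
    successful: int
    valid_total: int


def success_rate_sli(counts: WindowCounts) -> float:
    """Return (successful / valid) * 100 for the window."""
    if counts.valid_total == 0:
        # Assumption: a window with no valid traffic counts as fully successful.
        return 100.0
    return counts.successful / counts.valid_total * 100


def meets_target(sli_percent: float, target_percent: float) -> bool:
    """Return True when the measured SLI is at or above the target."""
    return sli_percent >= target_percent


SLO_TARGET = 99.98  # internal target from the table
SLA_TARGET = 99.0   # contractual target from the table

# Illustrative counts only; real values would come from your telemetry.
window = WindowCounts(successful=99_950, valid_total=100_000)
sli = success_rate_sli(window)
print(f"SLI={sli:.2f}%  within SLO: {meets_target(sli, SLO_TARGET)}  "
      f"within SLA: {meets_target(sli, SLA_TARGET)}")
```

With these sample counts, the service is inside the contractual SLA but outside the stricter internal SLO. That gap is the early warning that an SLO is meant to provide.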

Use SLI data to monitor whether systems are within the defined SLOs and to ensure that the SLAs are met. By using a set of well-defined SLIs, engineers and developers can monitor FS applications at the following levels:

OpenTelemetry provides an open source standard and a set of technologies to capture all types of telemetry, including metrics, traces, and logs. Google Cloud Managed Service for Prometheus provides a fully managed, highly scalable backend for metrics and for operating Prometheus at scale.

For more information about SLIs, SLOs, and error budgets, see the SRE handbook.
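As a rough illustration of how an error budget follows from an SLO, the following Python sketch converts an SLO percentage into an allowed failure fraction, allowed downtime, and the share of budget that remains. The function names and the sample event counts are hypothetical. The 30-day period and the 99.98% and 99.995% targets mirror the examples earlier in this section.

```python
def error_budget_fraction(slo_percent: float) -> float:
    """Fraction of events (or time) that may fail while still meeting the SLO."""
    if not 0.0 < slo_percent < 100.0:
        raise ValueError("slo_percent must be between 0 and 100, exclusive")
    return (100.0 - slo_percent) / 100.0


def allowed_downtime_minutes(slo_percent: float, period_days: int = 30) -> float:
    """Downtime budget, in minutes, for an availability SLO over the period."""
    return error_budget_fraction(slo_percent) * period_days * 24 * 60


def budget_remaining(slo_percent: float, total_events: int, failed_events: int) -> float:
    """Share of the error budget still unspent; negative means overspent."""
    allowed_failures = error_budget_fraction(slo_percent) * total_events
    if allowed_failures == 0:
        raise ValueError("total_events must be greater than zero")
    return 1.0 - failed_events / allowed_failures


# A 99.995% availability SLO over 30 days leaves roughly 2.16 minutes of downtime.
print(f"{allowed_downtime_minutes(99.995):.2f} minutes")

# Illustrative counts: 10 million valid transactions, 500 failures, 99.98% SLO.
print(f"{budget_remaining(99.98, 10_000_000, 500):.0%} of the budget remains")
```

Teams often use the remaining budget to decide whether to ship further changes or to pause and prioritize reliability work.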

To develop effective alerting and monitoring dashboards and mechanisms, use Google Cloud Observability tools, including Cloud Monitoring. For information about security-specific monitoring and detection capabilities, see the security pillar.

Define and test incident management processes

Well-defined and regularly tested incident management processes contribute directly to the value, performance, security, and reliability of the FS workloads in Google Cloud. These processes help financial institutions meet their stringent regulatory requirements, protect sensitive data, maintain business continuity, and uphold customer trust.

Regular testing of incident management processes provides the following benefits:

To define and test your incident management processes, consider the following recommendations.

Establish clear incident response procedures

A well-established set of incident response procedures involves the following elements:

Implement performance and load testing regularly

Regular performance and load testing helps to ensure that cloud-based applications and infrastructure can handle peak loads and maintain optimal performance. Load testing simulates realistic traffic patterns. Stress testing exercises the system to its limits to identify potential bottlenecks and performance limitations. You can use products like Cloud Load Balancing and load testing services to simulate real-world traffic. Based on the test results, you can adjust your cloud infrastructure and applications for optimal performance and scalability. For example, you can adjust resource allocation or tune application configurations.
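The following Python sketch shows the general shape of a load test: issue concurrent requests, record per-request latency, and summarize the results with percentiles. It uses only the standard library. The `fake_payment_request` function is a hypothetical stand-in for a call to your system under test, and the request count and concurrency are illustrative values, not recommendations.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def fake_payment_request() -> None:
    """Hypothetical stand-in for a request to the system under test."""
    time.sleep(0.002)


def timed_call(request: Callable[[], None]) -> float:
    """Run one request and return its latency in milliseconds."""
    start = time.perf_counter()
    request()
    return (time.perf_counter() - start) * 1000.0


def run_load(request: Callable[[], None], total_requests: int, concurrency: int) -> List[float]:
    """Issue total_requests calls with up to `concurrency` in flight at once."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_call, request) for _ in range(total_requests)]
        return [future.result() for future in futures]


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of the samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


latencies = run_load(fake_payment_request, total_requests=200, concurrency=20)
print(f"p50={percentile(latencies, 50):.1f} ms  p99={percentile(latencies, 99):.1f} ms")
```

Reporting tail percentiles rather than only the average makes it easier to compare test results against percentile-based latency targets.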

Automate testing within CI/CD pipelines

Incorporating automated testing into your CI/CD pipelines helps to ensure the quality and reliability of cloud applications by validating changes before deployment. This approach significantly reduces the risk of errors and regressions, and it helps you to build a more stable and robust software system. You can incorporate different types of testing in your CI/CD pipelines, including unit testing, integration testing, and end-to-end testing. Use products like Cloud Build and Cloud Deploy to create and manage your CI/CD pipelines.

Continuously improve and innovate

For financial services workloads, migrating to the cloud is merely the initial step. Ongoing enhancement and innovation are essential for the following reasons:

To ensure continuous improvement and innovation, consider the following recommendations.

Conduct regular retrospectives

Retrospectives are vital for continuously improving incident response procedures, and for optimizing testing strategies based on the outcomes of regular performance and load testing. To ensure that retrospectives are effective, do the following:

Foster a culture of learning

A culture of learning facilitates safe exploration of new technologies in Google Cloud, such as AI and ML capabilities to enhance services like fraud detection and personalized financial advice. To promote a culture of learning, do the following:

Stay up-to-date with cloud technologies

Continuous learning is essential for understanding and implementing new security measures, leveraging advanced data analytics for better insights, and adopting innovative solutions that are relevant to financial services.

Financial services perspective: Security, privacy, and compliance

This document in the Google Cloud Well-Architected Framework: Financial services (FS) perspective provides an overview of the principles and recommendations to address the security, privacy, and compliance requirements of financial services (FS) workloads in Google Cloud. The recommendations help you build resilient and compliant infrastructure, safeguard sensitive data, maintain customer trust, navigate the complex landscape of regulatory requirements, and effectively manage cyber threats. The recommendations in this document align with the security pillar of the Well-Architected Framework.

Security in cloud computing is a critical concern for FS organizations, which are highly attractive to cybercriminals due to the vast amounts of sensitive data that they manage, including customer details and financial records. The consequences of a security breach are exceptionally severe, including significant financial losses, long-term reputational damage, and substantial regulatory fines. Therefore, FS workloads need stringent security controls.

To help ensure comprehensive security and compliance, you need to understand the shared responsibilities between you (FS organizations) and Google Cloud. Google Cloud is responsible for securing the underlying infrastructure, including physical security and network security. You are responsible for securing data and applications, configuring access control, and configuring and managing security services. To support you in your security efforts, the Google Cloud partner ecosystem offers security integration and managed services.

The security recommendations in this document are mapped to the following core principles:

Implement security by design

Financial regulations like the Payment Card Industry Data Security Standard (PCI DSS), the Gramm-Leach-Bliley Act (GLBA) in the United States, and various national financial data protection laws mandate that security is integrated into systems from the outset. The security-by-design principle emphasizes the integration of security throughout the development lifecycle to help minimize vulnerabilities.

To apply the security-by-design principle for your FS workloads in Google Cloud, consider the following recommendations:

Implement zero trust

Modern financial regulations increasingly emphasize the need for stringent access controls and continuous verification. These requirements reflect the principle of zero trust, which aims to protect workloads against both internal and external threats and bad actors. The zero-trust principle advocates for continuous verification of every user and device, which eliminates implicit trust and mitigates lateral movement.

To implement zero trust, consider the following recommendations:

Implement shift-left security

Financial regulators encourage proactive security measures. Identifying and addressing vulnerabilities early in the development lifecycle helps to reduce the risk of security incidents and the potential for non-compliance penalties. The principle of shift-left security promotes early security testing and integration, which helps to reduce the cost and complexity of remediation.

To implement shift-left security, consider the following recommendations:

Implement preemptive cyber defense

Financial institutions are prime targets for sophisticated cyberattacks. Regulations often require robust threat intelligence and proactive defense mechanisms. Preemptive cyber defense focuses on proactive threat detection and response by using advanced analytics and automation.

Consider the following recommendations:

Use AI securely and responsibly, and use AI for security

AI and ML are increasingly used for financial services use cases such as fraud detection and algorithmic trading. Regulations require that these technologies be used ethically, transparently, and securely. AI can also help to enhance your security capabilities. Consider the following recommendations for using AI:

Meet regulatory, compliance, and privacy needs

Financial services are subject to a vast array of regulations, including data residency requirements, specific audit trails, and data protection standards. To ensure that sensitive data is properly identified, protected, and managed, FS organizations need robust data governance policies and data classification schemes. Consider the following recommendations to help you meet regulatory requirements:

Prioritize security initiatives

Given the breadth of security requirements, financial institutions must prioritize initiatives that are based on risk assessment and regulatory mandates. We recommend the following phased approach:

  1. Establish a strong security foundation: Focus on the core areas of security, including identity and access management, network security, and data protection. This focus helps to build a robust security posture and helps to ensure comprehensive defense against evolving threats.
  2. Address critical regulations: Prioritize compliance with key regulations like PCI DSS, GDPR, and relevant national laws. Doing so helps to ensure data protection, mitigates legal risks, and builds trust with customers.
  3. Implement advanced security: Gradually adopt advanced security practices like zero trust, AI-driven security solutions, and proactive threat hunting.

Financial services perspective: Reliability

This document in the Google Cloud Well-Architected Framework: Financial services (FS) perspective provides an overview of the principles and recommendations to design, deploy, and operate reliable FS workloads in Google Cloud. The document explores how to integrate advanced reliability practices and observability into your architectural blueprints. The recommendations in this document align with the reliability pillar of the Well-Architected Framework.

For financial institutions, reliable and resilient infrastructure is both a business need and a regulatory imperative. To ensure that FS workloads in Google Cloud are reliable, you must understand and mitigate potential failure points, deploy resources redundantly, and plan for recovery. Operational resilience is an outcome of reliability. It's the ability to absorb, adapt to, and recover from disruptions. Operational resilience helps FS organizations meet strict regulatory requirements. It also helps avoid intolerable harm to customers.

The key building blocks of reliability in Google Cloud are regions, zones, and the various location scopes of cloud resources: zonal, regional, multi-regional, global. You can improve availability by using managed services, distributing resources, implementing high-availability patterns, and automating processes.

Regulatory requirements

FS organizations operate under strict reliability mandates by regulatory agencies such as the Federal Reserve System in the US, the European Banking Authority in the EU, and the Prudential Regulation Authority in the UK. Globally, regulators emphasize operational resilience, which is vital for financial stability and consumer protection. Operational resilience is the ability to withstand disruptions, recover effectively, and maintain critical services. This requires a harmonized approach for managing technological risks and dependencies on third parties.

The regulatory requirements across most jurisdictions have the following common themes:

The reliability recommendations in this document are mapped to the following core principles:

Prioritize multi-zone and multi-region deployments

For critical financial services applications, we recommend that you use a multi-region topology that's distributed across at least two regions and across three zones within each region. This approach is important for resilience against zone and region outages. Regulations often prescribe this approach, because if a failure occurs in one zone or region, most jurisdictions consider a severe disruption to a second zone to be a plausible consequence. The rationale is that when one location fails, the other location might receive an exceptionally high amount of additional traffic.
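As a small illustration, the recommended minimum topology (at least two regions, with three zones in each) can be expressed as an automated check. This is a sketch under stated assumptions: the `topology_gaps` function and the `planned` deployment map are hypothetical, and the region and zone names are used only as sample values.

```python
from typing import Dict, List, Set

MIN_REGIONS = 2           # from the recommendation above
MIN_ZONES_PER_REGION = 3  # from the recommendation above


def topology_gaps(deployment: Dict[str, Set[str]]) -> List[str]:
    """Return human-readable gaps between a deployment map and the minimum topology.

    `deployment` maps a region name to the set of zones that host the workload.
    """
    gaps: List[str] = []
    if len(deployment) < MIN_REGIONS:
        gaps.append(f"only {len(deployment)} region(s); need at least {MIN_REGIONS}")
    for region in sorted(deployment):
        zone_count = len(deployment[region])
        if zone_count < MIN_ZONES_PER_REGION:
            gaps.append(
                f"{region}: only {zone_count} zone(s); need at least {MIN_ZONES_PER_REGION}"
            )
    return gaps


# Illustrative deployment map; the second region is one zone short.
planned = {
    "europe-west1": {"europe-west1-b", "europe-west1-c", "europe-west1-d"},
    "europe-west4": {"europe-west4-a", "europe-west4-b"},
}
for gap in topology_gaps(planned):
    print(gap)
```

A check like this could run as part of infrastructure review or a deployment pipeline, so that topology regressions are caught before they reach production.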

Consider the following recommendations to build resilience against zone and region outages:

Eliminate single points of failure

Distribute resources across different locations and use redundant resources to prevent any single point of failure (SPOF) from affecting the entire application stack.

Consider the following recommendations to avoid SPOFs:

For more information, see Design reliable infrastructure for your workloads in Google Cloud.

Understand and manage aggregate availability

Be aware that the overall or aggregate availability of a system is affected by the availability of each tier or component of the system. The number of tiers in an application stack has an inverse relationship with the aggregate availability of the stack. Consider the following recommendations for managing aggregate availability:
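The inverse relationship between tier count and aggregate availability can be shown with a short calculation. The sketch below assumes that every tier must be up for the stack to serve requests and that tiers fail independently; real systems might not satisfy either assumption. The function names and the 99.9% figures are illustrative.

```python
import math
from typing import Sequence


def serial_availability(tiers: Sequence[float]) -> float:
    """Availability of a stack in which every tier must be up.

    Assumes independent failures, so the result is the product of the tiers.
    """
    return math.prod(tiers)


def redundant_availability(replica: float, replicas: int) -> float:
    """Availability of one tier served by independent, redundant replicas.

    The tier is down only when all replicas are down at the same time.
    """
    return 1.0 - (1.0 - replica) ** replicas


# Three tiers at 99.9% each yield roughly 99.70% for the whole stack.
three_tiers = serial_availability([0.999, 0.999, 0.999])
print(f"three tiers in series: {three_tiers:.4%}")

# Two independent replicas at 99.9% each yield 99.9999% for that tier.
print(f"two redundant replicas: {redundant_availability(0.999, 2):.4%}")
```

Each added tier lowers the aggregate, and redundancy within a tier raises it. This is why reducing the number of dependent tiers and deploying each tier redundantly both improve the availability of the stack.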

Implement a robust DR strategy

Create well-defined plans for different disaster scenarios, including zonal and regional outages. A well-defined disaster recovery (DR) strategy lets you recover from a disruption and resume normal operations with minimal impact.

DR and high availability (HA) are different concepts. With cloud deployments, in general, DR applies to multi-region deployments and HA applies to regional deployments. These deployment archetypes support different replication mechanisms.

For financial institutions, your choice of failover region might be limited by regulations about data sovereignty and data residency. If you need an active-active topology across two regions, we recommend that you choose managed multi-regional services, like Spanner and Cloud Storage, especially when data replication is critical.

Consider the following recommendations:

For more information, see Architecting disaster recovery for cloud infrastructure outages.

Leverage managed services

Whenever possible, use managed services to take advantage of the built-in features for backups, HA, and scalability. Consider the following recommendations for using managed services:

Automate the infrastructure provisioning and recovery processes

Automation helps to minimize human errors and helps to reduce the time and resources that are necessary to respond to incidents. The use of automation can help to ensure faster recovery from failures and more consistent results. Consider the following recommendations to automate how you provision and recover resources:

Financial services perspective: Cost optimization

This document in the Google Cloud Well-Architected Framework: Financial services (FS) perspective provides an overview of principles and recommendations to optimize the cost of your FS workloads in Google Cloud. The recommendations in this document align with the cost optimization pillar of the Well-Architected Framework.

Robust cost optimization for financial services workloads requires the following fundamental elements:

To optimize cost, you need a comprehensive understanding of the cost drivers and resource needs across your organization. In some large organizations, especially those that are early in the cloud journey, a single team is often responsible for optimizing spend across a large number of domains. This approach assumes that a central team is best placed to identify high-value opportunities to improve efficiency.

The centralized approach might yield some success during the initial stages of cloud adoption or for non-critical workloads. However, a single team can't drive cost optimization across an entire organization. When the resource usage or the level of regulatory scrutiny increases, the centralized approach isn't sustainable. Centralized teams face scalability challenges particularly when dealing with a large number of financial products and services. The project teams that own the products and services might resist changes that are made by an external team.

For effective cost optimization, spend-related data must be highly visible, and engineers and other cloud users who are close to the workloads must be motivated to take action to optimize cost. From an organizational standpoint, the challenge for cost optimization is to identify what areas should be optimized, identify the engineers who are responsible for those areas, and then convince them to take the required optimization action. This document provides recommendations to address this challenge.

The cost optimization recommendations in this document are mapped to the following core principles:

Identify waste by using Google Cloud tools

Google Cloud provides several products, tools, and features to help you identify waste. Consider the following recommendations.

Use automation and AI to systematically identify what to optimize

Active Assist provides intelligent recommendations across services such as Cloud Run for microservices, BigQuery for data analytics, Compute Engine for core applications, and Cloud SQL for relational databases. Active Assist recommendations are provided at no cost and without any configuration by you. The recommendations help you to identify idle resources and underutilized commitments.

Centralize FinOps monitoring and control through a unified interface

Cloud Billing reports and the FinOps hub let you implement comprehensive cost monitoring. This comprehensive view is vital for financial auditors and internal finance teams to track cloud spend, assess the financial posture, evaluate FinOps maturity across various business units or cost centers, and provide a consistent financial narrative.

Identify value by analyzing and enriching spend data

Active Assist is effective at identifying obvious waste. However, pinpointing value can be more challenging, particularly when workloads are on unsuitable products or when the workloads lack clear alignment with business value. For FS workloads, business value extends beyond cost reduction. The value includes risk mitigation, regulatory adherence, and gaining competitive advantages.

To understand cloud spend and value holistically, you need a complete understanding at multiple levels: where the spend is coming from, what business function the spend is driving, and the technical feasibility of refactoring or optimizing the workload in question.

The following diagram shows how you can apply the data-information-knowledge-wisdom (DIKW) pyramid and Google Cloud tools to get a holistic understanding of cloud costs and value.

The data-information-knowledge-wisdom (DIKW) pyramid shows how to use cloud spending data to inform decisions.

The preceding diagram shows how you can use the DIKW approach to refine raw cloud spending data into actionable insights and decisions that drive business value.

Consider the following recommendations for analyzing cloud spend data.

Analyze spend data that's provided by Google Cloud

Start with detailed Cloud Billing data that's exported to BigQuery and data that's available in Monitoring logs. To derive actionable insights and make decisions, you need to structure this data and enrich it with business context.

Visualize data through available tooling

Augment the built-in Google Cloud dashboards with custom reporting by using tools like Data Studio on top of BigQuery exports. Finance teams can build custom dashboards that contextualize cloud spend against financial metrics, regulatory reporting requirements, and business unit profitability. They can then provide a clear financial narrative for analysis and decision making by executive stakeholders.

Allocate spend to drive accountability

After you understand what's driving the cloud spend, you need to identify who is spending money and why. This level of understanding requires a robust cost-allocation practice, which involves attaching business-relevant metadata to cloud resources. For example, if a particular resource is used by the Banking-AppDev team, you can attach a tag like team=banking_appdev to the resource to track the cost that the team incurs on that resource. Ideally, you should allocate 100% of your cloud costs to the source of the spending. In practice, you might start with a lower target because building a metadata structure to support 100% cost allocation is a complex effort.
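To make the allocation target concrete, the following Python sketch groups cost rows by a tag and reports what percentage of total spend is allocated to an owner. The row shape, the resource costs, and the `allocate_by_tag` function are hypothetical and aren't the schema of any billing export. The `team` tag key and the `banking_appdev` value come from the example above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def allocate_by_tag(
    rows: Iterable[Mapping], tag_key: str = "team"
) -> Tuple[Dict[str, float], float]:
    """Return (cost per tag value, percentage of total cost that is allocated)."""
    by_owner: Dict[str, float] = defaultdict(float)
    unallocated = 0.0
    for row in rows:
        owner = row["tags"].get(tag_key)
        if owner:
            by_owner[owner] += row["cost"]
        else:
            # Rows without the tag can't be attributed to a team.
            unallocated += row["cost"]
    allocated = sum(by_owner.values())
    total = allocated + unallocated
    coverage = 100.0 * allocated / total if total else 0.0
    return dict(by_owner), coverage


# Illustrative rows only; real rows would come from your billing data.
rows = [
    {"cost": 120.0, "tags": {"team": "banking_appdev"}},
    {"cost": 80.0, "tags": {"team": "banking_appdev"}},
    {"cost": 50.0, "tags": {}},
]
costs, coverage = allocate_by_tag(rows)
print(costs, f"{coverage:.0f}% allocated")
```

Tracking the coverage percentage over time gives you a simple measure of progress toward the 100% allocation goal.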

Consider the following recommendations to develop a metadata strategy to support cost allocation:

After you define an allocation strategy by using tags, you need to decide the level of granularity at which the strategy should be implemented. The required granularity depends on your business needs. For example, some organizations might need to track cost at the product level, some might need cost data for each cost center, and others might need cost data per environment (development, staging, and production).

Consider the following approaches to achieve the appropriate level of cost-allocation granularity for your organization:

Often, you might need to use the project hierarchy combined with tagging and labeling for effective cost allocation. Regardless of the cost-allocation approach that you choose, follow the recommendations that were described earlier for developing a robust metadata strategy: validation, automation, and simplicity.

Drive accountability and motivate engineers to take action

The cloud FinOps team is responsible for driving an organization to be conscious of costs and value. The individual product teams and engineering teams must take the required actions for cost optimization. These teams are also accountable for the cost behavior of the financial services workloads and for ensuring that their workloads provide the required business value.

Consider the following recommendations to drive accountability and motivate teams to optimize cost.

Establish a centralized FinOps team for governance

Cloud FinOps practices don't grow organically. A dedicated FinOps team must define and establish FinOps practices by doing the following:

Get executive sponsorship and mandates

Senior leadership, including the CTO, CFO, and CIO, must actively champion an organization-wide shift to a FinOps culture. Their support is crucial for prioritizing cost accountability, allocating resources for the FinOps program, ensuring cross-functional participation, and driving compliance with FinOps requirements.

Incentivize teams to optimize cost

Engineers and engineering teams might not be self-motivated to focus on cost optimization. It's important to align team and individual goals with cost efficiency by implementing incentives such as the following:

Implement showback and chargeback techniques

Ensure that teams have clear visibility into the cloud resources and costs that they own. Assign financial responsibility to the appropriate individuals within the teams. Use formal mechanisms to enforce rigorous tagging and implement transparent rules for allocating shared costs.
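One possible rule for allocating shared costs is to split them in proportion to each team's direct spend. The Python sketch below illustrates that rule. It is an assumption chosen for the example, not a rule that this document prescribes, and the team names and amounts are hypothetical.

```python
from typing import Dict


def showback(direct_costs: Dict[str, float], shared_cost: float) -> Dict[str, float]:
    """Return each team's direct cost plus its share of the shared cost.

    Shared cost is split in proportion to direct spend. If no team has any
    direct spend, the shared cost is split evenly.
    """
    if not direct_costs:
        raise ValueError("direct_costs must not be empty")
    total_direct = sum(direct_costs.values())
    if total_direct == 0:
        even_share = shared_cost / len(direct_costs)
        return {team: even_share for team in direct_costs}
    return {
        team: cost + shared_cost * cost / total_direct
        for team, cost in direct_costs.items()
    }


# Illustrative figures: 200 of shared platform cost spread over three teams.
report = showback({"payments": 600.0, "lending": 300.0, "treasury": 100.0}, shared_cost=200.0)
for team, cost in report.items():
    print(f"{team}: {cost:.2f}")
```

Whatever rule you choose, publishing it alongside the report keeps the allocation transparent to the teams that are charged.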

Focus on value and TCO rather than cost

When you evaluate cloud solutions, consider the long-term total cost of ownership (TCO). For example, self-hosting a database for an application might seem to be cheaper than using a managed database service like Cloud SQL. However, to assess the long-term value and TCO, you must consider the hidden costs that are associated with self-hosted databases. Such costs include the dedicated engineering effort for patching, scaling, security hardening, and disaster recovery, which are critical requirements for FS workloads. Managed services can provide significantly higher long-term value, which can offset the infrastructure costs. Managed services provide robust compliance capabilities, have built-in reliability features, and can help to reduce your operational overhead.

Consider the following recommendations to focus on value and TCO.

Use product-specific techniques and tools for resource optimization

Leverage cost-optimization tools and features that are provided by Google Cloud products, such as the following:

Take advantage of discounts

Ensure that the billing rate for your cloud resources is as low as possible by using discounts that Google offers. The individual product and engineering teams typically manage resource optimization. The central FinOps team is responsible for optimizing billing rates because they have visibility into resource requirements across the entire organization. Therefore, they can aggregate the requirements and maximize the commitment-based discounts.

You can take advantage of the following types of discounts for Google Cloud resources:

You can achieve significant savings by using committed use discounts (CUDs) on top of enterprise discounts.

In addition to CUDs, use the following approaches to reduce billing rates:

Financial services perspective: Performance optimization

This document in the Google Cloud Well-Architected Framework: Financial services (FS) perspective provides an overview of principles and recommendations to optimize the performance of your FS workloads in Google Cloud. The recommendations in this document align with the performance optimization pillar of the Well-Architected Framework.

Performance optimization has a long history in financial services. It has helped FS organizations surpass technical challenges and it's nearly always been an enabler or accelerator for the creation of new business models. For example, ATMs (introduced in 1967) automated the cash dispensation process and they helped banks to decrease the cost of their core business. Techniques like bypassing the OS kernel and pinning application threads to compute cores helped to achieve deterministic and low latency for trading applications. The reduction in latency facilitated higher and firmer liquidity with tighter spreads in the financial markets.

The cloud creates new opportunities for performance optimization. It also challenges some of the historically accepted optimization patterns. Specifically, the following trade-offs are more transparent and controllable in the cloud:

For example, adapting hardware and IT resources to specific skill requirements is a trivial task in the cloud. To support GPU programming, you can create GPU-based VMs. You can scale capacity in the cloud to accommodate demand spikes without over-provisioning resources. This capability helps to ensure that your workloads can handle peak loads, such as on nonfarm payroll days and when trading volumes are significantly greater than historical levels. Instead of investing in highly optimized code at the level of individual servers (like finely tuned C code) or in code for conventional high performance computing (HPC) environments, you can scale out optimally by using a well-architected Kubernetes-based distributed system.

The performance optimization recommendations in this document are mapped to the following core principles:

Align technology performance metrics with key business indicators

You can map performance optimization to business-value outcomes in several ways. For example, in a buy-side research desk, a business objective could be to optimize the output per research hour or to prioritize experiments from teams that have a proven track record, such as higher Sharpe ratios. On the sell side, you can use analytics to track client interest and accordingly prioritize the throughput to AI models that support the most interesting research.

Connecting performance goals to business key performance indicators (KPIs) is also important for funding performance improvements. Business innovation and transformation initiatives (sometimes called change-the-bank efforts) have different budgets and they have potentially different degrees of access to resources when compared to business-as-usual (BAU) or run-the-bank operations. For example, Google Cloud helped the risk management and technology teams of a G-SIFI to collaborate with the front-office quantitative analysts on a solution to perform risk analytics calculations (such as XVA) in minutes instead of hours or days. This solution helped the organization to meet relevant compliance requirements. It also enabled the traders to have higher quality conversations with their clients, potentially offering tighter spreads, firmer liquidity, and more cost-effective hedging.

When you align your performance metrics with business indicators, consider the following recommendations:

Prioritize security without sacrificing performance for unproven risks

Security and regulatory compliance in FS organizations must unequivocally meet a high standard. Maintaining this standard is essential to avoid losing clients and to prevent irreparable damage to an organization's brand. Often, the highest value is derived through technology innovations such as generative AI and unique, managed services like Spanner. Don't automatically discard such technology options due to a blanket misconception about prohibitive operational risk or inadequate regulatory compliance posture.

Google Cloud has worked closely with G-SIFIs to make sure that an AI-based approach for Anti-Money Laundering (AML) can be used across the jurisdictions where the institutions serve customers. For example, HSBC significantly enhanced the performance of its financial crime (Fincrime) unit with the following results:

Consider the following recommendations:

Rethink your architecture to adapt to new opportunities and requirements

Augmenting your current architectures with cloud-based capabilities can provide significant value. To achieve more transformative outcomes, you need to periodically rethink your architecture by using a cloud-first approach.

Consider the following recommendations to periodically rethink the architecture of your workloads to further optimize performance.

Use cloud-based alternatives to on-premises HPC systems and schedulers

To take advantage of higher elasticity, improved security posture, and extensive monitoring and governance capabilities, you can run HPC workloads in the cloud or burst on-premises workloads to the cloud. However, for certain numerical modeling use cases like simulation of investment strategies or XVA modeling, combining Kubernetes with Kueue might offer a more powerful solution.

Switch to graph-based programming for simulations

Monte Carlo simulations might be much more performant in a graph-based execution system such as Dataflow. For example, HSBC uses Dataflow to run risk calculations 16 times faster compared to their previous approach.
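The following sketch illustrates the partition-and-combine structure that a graph-based execution system can exploit. It uses only the Python standard library and isn't Dataflow or Apache Beam code. The function names, the European call option payoff, and all of the parameter values are assumptions that were chosen for illustration.

```python
import math
import random


def simulate_partition(seed, paths, spot=100.0, strike=105.0,
                       rate=0.02, vol=0.2, maturity=1.0):
    """Graph node: sum of call payoffs for one independent partition."""
    rng = random.Random(seed)  # a per-partition seed keeps runs reproducible
    drift = (rate - 0.5 * vol ** 2) * maturity
    diffusion = vol * math.sqrt(maturity)
    total = 0.0
    for _ in range(paths):
        # Terminal price under geometric Brownian motion (an assumed model).
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total += max(terminal - strike, 0.0)
    return total, paths


def combine(partials, rate=0.02, maturity=1.0):
    """Combine node: merge partial sums into one discounted estimate."""
    payoff_sum = sum(p[0] for p in partials)
    path_count = sum(p[1] for p in partials)
    return math.exp(-rate * maturity) * payoff_sum / path_count


def price(partitions=8, paths_per_partition=20_000):
    # The partitions run sequentially here. They share no state, so a
    # distributed runner could schedule each one on a different worker.
    partials = [simulate_partition(seed, paths_per_partition)
                for seed in range(partitions)]
    return combine(partials)


estimate = price()
print(f"Estimated call price: {estimate:.2f}")
```

Because each partition depends only on its seed and its parameters, and the combine step needs only the partial sums, the work can be expressed as a graph of independent nodes followed by a single merge.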

Run cloud-based exchanges and trading platforms

Conversations with Google Cloud customers reveal that the 80/20 Pareto principle applies to the performance requirements of markets and trading applications.

Future-proof your technology to meet present and future business needs

Historically, many FS organizations built proprietary technologies to gain a competitive edge. For example, in the early 2000s, successful investment banks and trading firms had their own implementations of foundational technologies such as pub-sub systems and message brokers. With the evolution of open source technologies and the cloud, such technologies have become commodities and don't offer incremental business value.

Consider the following recommendations to future-proof your technology.

Adopt a data-as-a-service (DaaS) approach for faster time to market and cost transparency

FS organizations often evolve through a combination of organic growth and mergers and acquisitions (M&A). As a result, the organizations need to integrate disparate technologies. They also need to manage duplicate resources, such as data vendors, data licenses, and integration points. Google Cloud provides opportunities to create differentiated value in post-merger integrations.

For example, you can use services like BigQuery sharing to build an analysis-ready data-as-a-service (DaaS) platform. The platform can provide both market data and inputs from alternative sources. This approach eliminates the need to build redundant data pipelines and it lets you focus on more valuable initiatives. Further, the merged or acquired companies can quickly and efficiently rationalize their post-merger data licensing and infrastructure needs. Instead of spending effort on adapting and merging legacy data estates and operations, the combined business can focus on new business opportunities.

Build an abstraction layer to isolate existing systems and address emerging business models

Increasingly, the competitive advantage for banks isn't the core banking system but their customer experience layer. However, legacy banking systems often use monolithic applications that were developed in languages like COBOL and are integrated across the entire banking value chain. This integration makes it difficult to separate the layers of the value chain, so it's nearly impossible to upgrade and modernize such systems.

One solution to address this challenge is to use an isolation layer such as an API management system or a staging layer like Spanner that duplicates the book of record and facilitates the modernization of services with advanced analytics and AI. For example, Deutsche Bank used Spanner to isolate their legacy core banking estate and start their innovation journey.
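The isolation pattern can be sketched as follows. All of the class names are hypothetical, and the staging layer is an in-memory stand-in rather than a Spanner client. Replication is simplified to a full copy so that the example stays self-contained.

```python
class LegacyCoreBanking:
    """Hypothetical stand-in for the monolithic book of record."""

    def __init__(self):
        self._ledger = {"ACC-1": 1_000, "ACC-2": 250}

    def export_balances(self):
        # Return a copy so that callers can't mutate the book of record.
        return dict(self._ledger)

    def post(self, account, amount):
        self._ledger[account] = self._ledger.get(account, 0) + amount


class StagingLayer:
    """In-memory stand-in for the duplicated copy of the book of record."""

    def __init__(self):
        self._balances = {}

    def refresh_from(self, core):
        # Simplified replication: copy the full set of balances.
        self._balances = core.export_balances()

    def balance(self, account):
        return self._balances[account]


class AccountsApi:
    """Isolation layer: new services call this API and never the core."""

    def __init__(self, staging):
        self._staging = staging

    def get_balance(self, account):
        return {"account": account, "balance": self._staging.balance(account)}


core = LegacyCoreBanking()
staging = StagingLayer()
staging.refresh_from(core)
api = AccountsApi(staging)

core.post("ACC-1", -200)    # the core keeps processing transactions
staging.refresh_from(core)  # replication step
print(api.get_balance("ACC-1"))
```

In this sketch, analytics and new customer-facing services depend only on `AccountsApi`, so the legacy core can be modernized or replaced behind the staging layer without changes to the services that consume the API.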