Streaming Pub/Sub messages over WebSockets

This tutorial illustrates a way for a frontend app—in this case, a web page—to handle high volumes of incoming data when you use Google Cloud. The tutorial describes some of the challenges of high-volume streams. An example app is provided with this tutorial that illustrates how to use WebSockets to visualize a dense stream of messages published to a Pub/Sub topic, processing them in a timely manner that maintains a performant frontend.

This tutorial is for developers who are familiar with browser-to-server communication over HTTP and with writing frontend apps using HTML, CSS, and JavaScript. The tutorial assumes that you have some experience with Google Cloud and are familiar with Linux command-line tools.

Objectives

Create and configure a virtual machine (VM) instance with the necessary components to stream the payloads of a Pub/Sub subscription to browser clients.
Configure a process on the VM to subscribe to a Pub/Sub topic and output the individual messages to a log.
Install a web server to serve static content and to stream shell command output to WebSocket clients.
Visualize the WebSocket stream aggregations and individual message samples in a browser using HTML, CSS, and JavaScript.

Costs

In this document, you use the following billable components of Google Cloud:

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Open Cloud Shell to execute the commands listed in this tutorial.
Go to Cloud Shell

You run all the terminal commands in this tutorial from Cloud Shell.
Enable the Compute Engine API and Pub/Sub API:
```
gcloud services enable compute pubsub
```

When you finish this tutorial, you can avoid continued billing by deleting the resources you created. See Cleaning up for more detail.

Introduction

As more apps embrace event-driven models, it's important that frontend apps are able to make simple, low-friction connections to the messaging services that form the cornerstone of these architectures.

There are several options for streaming data to web browser clients; the most common of these is WebSockets. This tutorial takes you through installing a process that subscribes to a stream of messages published to a Pub/Sub topic and routes those messages through the web server en route to clients connected over WebSockets.

For this tutorial, you work with the publicly available Pub/Sub topic used in the NYC Taxi Tycoon Google Dataflow CodeLab. This topic provides you with a real-time stream of simulated taxi telemetry based on historical ride data taken in New York City from the Taxi & Limousine Commission's trip record datasets.

Architecture

The following diagram shows the architecture that you build in this tutorial.

The diagram shows a message publisher that's outside the project that contains the Compute Engine resource; the publisher sends messages to a Pub/Sub topic. The Compute Engine instance makes the messages available over WebSockets to a browser that runs a dashboard based on HTML5 and JavaScript.

By combining pulltop and websocketd, you can have messages that are received from the Pub/Sub topic streamed to a browser using WebSockets.

Adjusting Pub/Sub topic throughput

The NYC Taxi Tycoon public Pub/Sub topic generates 2000 to 2500 simulated taxi ride updates per second, up to 8 Mb or more per second. The built-in flow control in Pub/Sub slows down a subscriber's message rate automatically if Pub/Sub detects a growing queue of unacknowledged messages. Therefore, you might see high message-rate variability across different workstations, network connections, and frontend processing code.

Effective browser message processing

Given the high volume of messages coming over the WebSocket stream, you need to be thoughtful in writing the frontend code that processes this stream. For example, you might dynamically create HTML elements for each message. But at the expected message rate, updating the page for each message could lock up the browser window. Frequent memory allocations that result from dynamically creating HTML elements also extend garbage collection durations, degrading the user experience. In short, you don't want to call document.createElement() for each of the approximately 2000 messages arriving each second.
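One way to avoid per-message DOM work is to accumulate statistics in plain variables and update the page on a timer. The following is a minimal sketch of that idea, not the tutorial's actual code; the function name createBatchedStats, the element ID, and the one-second interval are assumptions made for illustration.

```javascript
// Sketch: record each message cheaply, and render at most once per interval.
// createBatchedStats is a hypothetical helper, not part of the tutorial code.
function createBatchedStats(render) {
  let count = 0;          // messages seen since the last flush
  let lastMessage = null; // most recent message payload

  return {
    // Called once per WebSocket message: no DOM work here.
    record(message) {
      count += 1;
      lastMessage = message;
    },
    // Called on a timer: one render call per interval,
    // however many messages arrived in between.
    flush() {
      render({ count, lastMessage });
      count = 0;
    },
  };
}

// Browser usage (the element ID "rate" is an assumption):
// const stats = createBatchedStats(({ count }) => {
//   document.getElementById("rate").textContent = String(count);
// });
// socket.onmessage = (event) => stats.record(event.data);
// setInterval(() => stats.flush(), 1000);
```

With this shape, the cost of each incoming message is two assignments, and the page is touched once per second regardless of the message rate.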

The approach taken by this tutorial for managing this dense stream of messages is as follows:

The following figure shows the dashboard that's created as part of this tutorial.

The figure depicts a last-message latency of 24 milliseconds at a rate of nearly 2100 messages per second. If the critical code paths for processing each individual message don't complete in time, the number of observed messages per second decreases as the last-message latency increases. The ride sampling is done using the JavaScript setInterval API set to cycle once every three seconds, which prevents the frontend from creating an enormous number of DOM elements over its lifetime. (The overwhelming majority of those are practically unobservable at rates higher than 10 per second anyway.)
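The three-second sampling described above can be sketched as follows. This is an illustration under stated assumptions rather than the dashboard's real implementation: createRideSampler and renderRideCard are hypothetical names, and the sampler simply keeps the most recent ride seen since the last tick.

```javascript
// Sketch: remember only the latest ride, and display one sample per tick.
// createRideSampler is a hypothetical helper name.
function createRideSampler(showSample) {
  let latest = null; // most recent ride offered since the last sample

  return {
    // Called for every message; overwrites the previous candidate.
    offer(ride) {
      latest = ride;
    },
    // Called by the timer; shows at most one ride, then clears the slot.
    sample() {
      if (latest === null) return false; // nothing arrived since the last tick
      showSample(latest);
      latest = null;
      return true;
    },
  };
}

// Browser usage (renderRideCard is a hypothetical rendering function):
// const sampler = createRideSampler(renderRideCard);
// setInterval(() => sampler.sample(), 3000);
```

Because only one ride is rendered per tick, the number of DOM elements created grows with elapsed time rather than with message volume.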

The dashboard starts processing events in the middle of the stream, so rides already in progress are recognized as new by the dashboard unless they've been seen before. The code uses an associative array to store each observed ride, indexed by the ride_id value, and removes the reference to a particular ride when the passenger has been dropped off. Rides in an "enroute" or "pickup" state add a reference to that array unless (for the case of "enroute") the ride has been previously observed.
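The ride-tracking logic in the preceding paragraph might look like the following sketch. It uses a Map in place of the associative array, and it assumes that each message carries a ride_status field whose drop-off value is "dropoff"; those names, and the helper createRideTracker, are assumptions and not confirmed by the tutorial text.

```javascript
// Sketch: track rides in progress, keyed by ride_id.
// The ride_status field name and the "dropoff" value are assumptions.
function createRideTracker() {
  const activeRides = new Map(); // ride_id -> last stored message

  return {
    // Returns "new", "known", or "ended" for the observed message.
    observe(ride) {
      const id = ride.ride_id;
      const known = activeRides.has(id);

      if (ride.ride_status === "dropoff") {
        activeRides.delete(id); // the passenger has been dropped off
        return "ended";
      }
      // "pickup" always stores a reference; "enroute" stores one only
      // when the ride hasn't been observed before.
      if (ride.ride_status === "pickup" || !known) {
        activeRides.set(id, ride);
      }
      return known ? "known" : "new";
    },
    get activeCount() {
      return activeRides.size;
    },
  };
}
```

A ride that is first seen mid-journey in the "enroute" state is reported as "new" once and as "known" afterward, which matches the behavior of a dashboard that joins the stream partway through.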

Install and configure the WebSocket server

To begin, you create a Compute Engine instance that you'll use as the WebSocket server. After you create the instance, you install tools on it that you need later.

Install Node.js and the tutorial code

Test that pulltop can read messages

Establish message flow to websocketd

Now that you've established that pulltop can read the Pub/Sub topic, you can start the websocketd process to begin sending messages to the browser.
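On the browser side, websocketd delivers each line of the command's output as a separate WebSocket message. The following sketch shows one way a client could consume that stream, assuming that each line is a JSON-encoded ride message; attachRideStream is a hypothetical helper, and the address in the usage comment is a placeholder.

```javascript
// Sketch: parse each WebSocket message as JSON and hand it to a callback.
// attachRideStream is a hypothetical helper; it assumes one JSON object per message.
function attachRideStream(socket, onRide, onError = () => {}) {
  socket.onmessage = (event) => {
    let ride;
    try {
      ride = JSON.parse(event.data);
    } catch (err) {
      // Skip malformed lines instead of breaking the stream.
      onError(err, event.data);
      return;
    }
    onRide(ride);
  };
  return socket;
}

// Browser usage (replace EXTERNAL_IP with your server's address):
// const socket = new WebSocket("ws://EXTERNAL_IP:8000");
// attachRideStream(socket, (ride) => console.log(ride.ride_id));
```

Keeping the parsing in one small handler makes it easier to hand the resulting objects to whatever aggregation and sampling code the page uses.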

Capture topic messages to a local file

For this tutorial, you capture the message stream that you get from pulltop and write it to a local file. Capturing message traffic to a local file adds a storage requirement, but it also decouples the operation of the websocketd process from the streaming Pub/Sub topic messages. Capturing the information locally allows scenarios where you might want to temporarily halt Pub/Sub streaming (perhaps to adjust flow control parameters) but not force a reset of currently connected WebSocket clients. When the message stream is reestablished, websocketd automatically resumes message streaming to clients.

Visualizing messages

Individual ride messages published to the Pub/Sub topic have a structure like this:

Based on these values, you calculate several metrics for the dashboard's header. The calculations are executed once per inbound ride event. The values include the following:

In addition to the metrics and individual ride samples, when a passenger is picked up or dropped off, the dashboard shows an alert notification above the grid of ride samples.

Performance

The following screenshot shows the Chrome Developer Tools performance monitor while the browser tab is processing around 2100 messages per second.

With message dispatch happening at a latency of approximately 30 ms, CPU utilization averages around 80%. Memory utilization is shown at a minimum of 29 MB, with 57 MB allocated in total, and it grows and shrinks freely.

Clean up

Remove firewall rules

If you used an existing project for this tutorial, you can remove the firewall rules you created. It's good practice to minimize open ports.

Delete the firewall rule you created to allow TCP on port 8000:
```
gcloud compute firewall-rules delete websocket
```
If you also created a firewall rule to allow SSH connectivity, delete the firewall rule to allow TCP on port 22:
```
gcloud compute firewall-rules delete wss-ssh
```

Delete the project

If you don't want to use this project again, you can delete the project.

In the Google Cloud console, go to the Manage resources page.
Go to Manage resources
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.
