This tutorial illustrates a way for a frontend app—in this case, a web page—to handle high volumes of incoming data when you use Google Cloud. The tutorial describes some of the challenges of high-volume streams. An example app is provided with this tutorial that illustrates how to use WebSockets to visualize a dense stream of messages published to a Pub/Sub topic, processing them in a timely manner that maintains a performant frontend.
This tutorial is for developers who are familiar with browser-to-server communication over HTTP and with writing frontend apps using HTML, CSS, and JavaScript. The tutorial assumes that you have some experience with Google Cloud and are familiar with Linux command-line tools.
In this document, you use the following billable components of Google Cloud:
To generate a cost estimate based on your projected usage,
use the pricing calculator.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
roles/resourcemanager.projectCreator), which contains the
resourcemanager.projects.create permission. Learn how to grant
roles.
Verify that billing is enabled for your Google Cloud project.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
roles/resourcemanager.projectCreator), which contains the
resourcemanager.projects.create permission. Learn how to grant
roles.
Verify that billing is enabled for your Google Cloud project.
You run all the terminal commands in this tutorial from Cloud Shell.
gcloud services enable compute pubsub
When you finish this tutorial, you can avoid continued billing by deleting the resources you created. See Cleaning up for more detail.
As more apps embrace event-driven models, it's important that frontend apps are able to make simple, low-friction connections to the messaging services that form the cornerstone of these architectures.
There are several options for streaming data to web browser clients; the most common of these is WebSockets. This tutorial takes you through installing a process that subscribes to a stream of messages being published to a Pub/Sub topic, and route those messages to through the web server enroute to clients connected over WebSockets.
For this tutorial, you work with the publicly available Pub/Sub topic used in the NYC Taxi Tycoon Google Dataflow CodeLab. This topic provides you with a real-time stream of simulated taxi telemetry based on historical ride data taken in New York City from the Taxi & Limousine Commission's trip record datasets.
The following diagram shows the architecture of the tutorial that you build in this tutorial.
The diagram shows a message publisher that's outside the project that contains the Compute Engine resource; the publisher sends messages to a Pub/Sub topic. The Compute Engine instance makes the messages available over WebSockets to a browser that runs a dashboard based on HTML5 and JavaScript.
This tutorial uses a combination of tools to bridge Pub/Sub and Websockets:
pulltop is a Node.js program that you install as part of this tutorial.
The tool subscribes to a Pub/Sub topic and streams received
messages to standard output.websocketd
is a small command-line tool that wraps an existing command-line interface
program and allows it to be accessed using a WebSocket.By combining pulltop and websocketd, you can have messages that are received
from the Pub/Sub topic streamed to a browser using WebSockets.
The NYC Taxi Tycoon public Pub/Sub topic generates 2000 to 2500 simulated taxi ride updates per second—up to 8 Mb or more per second. The built-in flow control in Pub/Sub slows down a subscriber's message rate automatically if Pub/Sub detects a growing queue of unacknowledged messages. Therefore, you might see high message-rate variability across different workstations, network connections, and front-end processing code.
Given the high volume of messages coming over the WebSocket stream, you
need to be thoughtful in writing the frontend code that processes this stream.
For example, you might dynamically create HTML elements for each message. But at
the expected message rate, updating the page for each message could lock up the
browser window. Frequent memory allocations that result from dynamically
creating HTML elements also extend garbage collection durations, degrading the
user experience. In short, you don't want to call document.createElement() for
each of the approximately 2000 messages arriving each second.
The approach taken by this tutorial for managing this dense stream of messages is as follows:
The following figure shows the dashboard that's created as part of this tutorial.

The figure depicts a last-message latency of 24 milliseconds at a rate
of nearly 2100 messages per second. If the critical code paths for processing
each individual message don't complete in time, the number of observed messages
per second decrease as the last message latency increases.
The ride sampling is done using the JavaScript setInterval API set to cycle
once every three seconds, which prevents the frontend from creating an enormous
number of DOM elements over its lifetime. (The overwhelming majority of those
are practically unobservable at rates higher than 10 per second anyway.)
The dashboard starts processing events in the middle of the stream, so rides
already in progress are recognized as new by the dashboard unless they've been
seen before. The code uses an associative array to store each observed ride,
indexed by the ride_id value, and removes the reference to a particular ride
when the passenger has been dropped off. Rides in an "enroute" or "pickup" state
add a reference to that array unless (for the case of "enroute") the ride has
been previously observed.
To begin, you create a Compute Engine instance that you'll use as the WebSocket server. After you create the instance, you install tools on it that you need later.
In Cloud Shell, set the default Compute Engine zone.
The following example shows us-central1-a, but you can use any zone that
you want.
gcloud config set compute/zone us-central1-a
Create a Compute Engine instance named websocket-server in
the default zone:
gcloud compute instances create websocket-server --tags wss
Add a firewall rule that allows TCP traffic on port 8000 to any
instance tagged as wss:
gcloud compute firewall-rules create websocket \
--direction=IN \
--allow=tcp:8000 \
--target-tags=wss
If you're using an existing project, make sure that TCP port 22 is
open to allow SSH connectivity to the instance.
By default, the default-allow-ssh firewall rule is enabled in the default
network. However, if you or your administrator removed the default
rule in an existing project, TCP port 22 might not be open. (If you
created a new project for this tutorial, the rule is enabled by default,
and you don't need to do anything.)
Add a firewall rule that allows TCP traffic on port 22 to any
instance tagged as wss:
gcloud compute firewall-rules create wss-ssh \
--direction=IN \
--allow=tcp:22 \
--target-tags=wss
Connect to the instance using SSH:
gcloud compute ssh websocket-server
At the terminal command of the instance, switch accounts to root so
that you can install software:
sudo -s
Install the git and unzip tools:
apt-get install -y unzip git
Install the websocketd binary on the instance:
cd /var/tmp/
wget \
https://github.com/joewalnes/websocketd/releases/download/v0.3.0/websocketd-0.3.0-linux_386.zip
unzip websocketd-0.3.0-linux_386.zip
mv websocketd /usr/bin
In a terminal on the instance, install Node.js:
curl -sL https://deb.nodesource.com/setup_10.x | bash -
apt-get install -y nodejs
Download the tutorial source repository:
exit
cd ~
git clone https://github.com/GoogleCloudPlatform/solutions-pubsub-websockets.git
Change permissions on pulltop to allow execution:
cd solutions-pubsub-websockets
chmod 755 pulltop/pulltop.js
Install pulltop dependencies:
cd pulltop
npm install
sudo npm link
On the instance, run pulltop against the public topic:
pulltop projects/pubsub-public-data/topics/taxirides-realtime
If pulltop is working, you see a stream of results like the following:
{"ride_id":"9729a68d-fcde-484b-bc32-bf29f5188628","point_idx":328,"latitude"
:40.757360000000006,"longitude":-73.98228,"timestamp":"2019-03-22T20:03:51.6
593-04:00","meter_reading":11.069151,"meter_increment":0.033747412,"ride_stat
us":"enroute","passenger_count":1}Press Ctrl+C to stop the stream.
Now that you've established that pulltop can read the Pub/Sub
topic, you can start the websocketd process to begin sending messages to the
browser.
For this tutorial, you capture the message stream that you get from pulltop
and write it to a local file. Capturing message traffic to a local file adds a
storage requirement, but it also decouples the operation of the websocketd
process from the streaming Pub/Sub topic messages. Capturing
the information locally allows scenarios where you might want to temporarily
halt Pub/Sub streaming (perhaps to adjust flow control parameters)
but not force a reset of
currently connected WebSocket clients. When the message stream is reestablished,
websocketd automatically resumes message streaming to clients.
On the instance, run pulltop against the public topic, and redirect
message output to the local taxi.json file. The nohup command instructs
the OS to keep the pulltop process running if you log out or close the
terminal.
nohup pulltop \
projects/pubsub-public-data/topics/taxirides-realtime > \
/var/tmp/taxi.json &
Verify that JSON messages are being written to the file:
tail /var/tmp/taxi.json
If the messages are being written to the taxi.json file, the output is
similar to the following:
{"ride_id":"9729a68d-fcde-484b-bc32-bf29f5188628","point_idx":328,"latitude"
:40.757360000000006,"longitude":-73.98228,"timestamp":"2019-03-22T20:03:51.6
593-04:00","meter_reading":11.069151,"meter_increment":0.033747412,"ride_sta
tus":"enroute","passenger_count":1}Change to the web folder of your app:
cd ../web
Start websocketd to begin streaming the contents of the local file
using WebSockets:
nohup websocketd --port=8000 --staticdir=. tail -f /var/tmp/taxi.json &
This runs the websocketd command in the background. The websocketd tool
consumes the output of the tail command and streams each element as a
WebSocket message.
Check the contents of nohup.out to verify that the server started
correctly:
tail nohup.out
If everything is working properly, the output is similar to the following:
Mon, 25 Mar 2019 14:03:53 -0400 | INFO | server | | Serving using application : /usr/bin/tail -f /var/tmp/taxi.json Mon, 25 Mar 2019 14:03:53 -0400 | INFO | server | | Serving static content from : .
Individual ride messages published to the Pub/Sub topic have a structure like this:
{
"ride_id": "562127d7-acc4-4af9-8fdd-4eedd92b6e69",
"point_idx": 248,
"latitude": 40.74644000000001,
"longitude": -73.97144,
"timestamp": "2019-03-24T00:46:08.49094-04:00",
"meter_reading": 8.40615,
"meter_increment": 0.033895764,
"ride_status": "enroute",
"passenger_count": 1
}
Based on these values, you calculate several metrics for the dashboard's header. The calculations are executed once per inbound ride event. The values include the following:
ride_status value of
dropoff is observed.In addition to the metrics and individual ride samples, when a passenger is picked up or dropped off, the dashboard shows an alert notification above the grid of ride samples.
Get the external IP address of the current instance:
curl -H "Metadata-Flavor: Google" http://metadata/computeMetadata/v1/instance/network-interfaces/0/access-configs/0/external-ip; echo
Copy the IP address.
On your local machine, open a new web browser and enter the URL:
http://$ip-address:8000.You see a page that shows the dashboard for this tutorial:

Click the taxi icon at the top to open a connection to the stream and begin processing messages.
Individual rides are visualized with a sample of nine active rides being rendered every three seconds:

You can click the taxi icon at any time to start or stop the WebSocket stream. If the WebSocket connection is severed, the icon turns red, and updates to metrics and individual rides are halted. To reconnect, click the taxi icon again.
The following screenshot shows the Chrome Developer Tools performance monitor while the browser tab is processing around 2100 messages per second.

With message dispatch happening at a latency of approximately 30ms, the CPU utilization averages at around 80%. Memory utilization is shown at a minimum of 29 MB, with 57 MB in total being allocated, and growing and shrinking freely.
If you used an existing project for this tutorial, you can remove the firewall rules you created. It's good practice to minimize open ports.
Delete the firewall rule you created to allow TCP on port 8000:
gcloud compute firewall-rules delete websocket
If you also created a firewall rule to allow SSH connectivity, delete the
firewall rule to allow TCP on port 22:
gcloud compute firewall-rules delete wss-ssh
If you don't want to use this project again, you can delete the project.
cabdash.js to geo-locate picked up and dropped off rides.Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-06-09 UTC.