Deploying a DBOS App on Google Cloud Run

This guide covers deploying a DBOS application to Google Cloud Run with a Cloud SQL for PostgreSQL database. It includes best practices for security, availability, and scalability. This guide assumes DBOS Conductor is hosted separately.

Choosing a Cloud Run Execution Mode

Cloud Run offers three execution modes, each mapping differently to DBOS workloads:

  • Service handles HTTP requests and auto-scales based on traffic and CPU usage. Best for synchronous workflows.
  • Worker Pool runs always-on instances with no HTTP listener. Best for queue-heavy applications that need all DBOS background services online at all times.
  • Job runs a container to completion and exits. Useful for periodic batch work with no always-on requirement.

Service

A Cloud Run service listens for HTTP requests and scales automatically based on traffic and CPU usage.

DBOS runs background services—the scheduler, queue runner, recovery service, and Conductor connection—that operate independently of HTTP requests. These require CPU at all times, so you should use instance-based billing and set --min-instances=1 to keep one instance always on. This is similar to the requirements for using sidecars on Cloud Run.

Database connection exhaustion

In Service mode, Cloud Run can scale to hundreds of instances under load, which may exhaust your database's maximum connections. Put a connection pooler like PgBouncer in front of your Cloud SQL instance. PgBouncer must run in session mode because DBOS uses LISTEN/NOTIFY, which is incompatible with transaction mode.
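As a sketch, a minimal pgbouncer.ini for session-mode pooling might look like the following. The hostname, pool sizes, and auth settings are placeholders to adapt to your environment:

```ini
[databases]
; Route the app's database through the pooler (host/dbname are placeholders)
myappdb = host=10.0.0.5 port=5432 dbname=myappdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
; Session mode is required: DBOS relies on LISTEN/NOTIFY,
; which does not work with transaction pooling
pool_mode = session
max_client_conn = 1000
default_pool_size = 20
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
```

Note that in session mode each connected client holds a server connection for the life of its session, so size default_pool_size to the peak number of concurrent clients you expect rather than relying on transaction-level multiplexing.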

Worker Pool

A Cloud Run worker pool runs always-on containers without an HTTP listener. Because instances never scale to zero, all DBOS background services stay online.

Worker pools suit DBOS applications that rely heavily on queues. Every instance actively dequeues and processes workflows, and the pool can be resized via the Cloud Run REST API.

Worker pools don't auto-scale, but you can implement an external scaler from within the pool. Use a DBOS scheduled workflow that periodically checks queue length with ListWorkflows and calls the Cloud Run Admin API to resize the pool based on load. This works even from within the pool because DBOS guarantees only one process runs a scheduled function at a time, even across multiple instances. This prevents a thundering herd of conflicting resize requests.

See Scaling a worker pool below for a full walkthrough.

Job

A Cloud Run job runs a container to completion and exits without listening for HTTP requests.

Because DBOS has a built-in scheduler, you typically don't need Cloud Run Jobs. However, Jobs suit applications that consist entirely of periodic work with no always-on requirement—the job starts, runs workflows to completion, and shuts down, so you only pay for the time it runs.

Deploying to Cloud Run

Deploying a DBOS application to Cloud Run is no different from deploying any other containerized application. You need a Dockerfile, a database, and the standard Cloud Run deployment commands.

The one DBOS-specific detail is the database connection string: it must be provided in key=value format (e.g., user=postgres password=secret database=myappdb host=/cloudsql/...). On Cloud Run, use the --add-cloudsql-instances flag to mount the Cloud SQL Auth Proxy Unix socket, then pass the socket path as the host parameter. This gives your app a private, encrypted path to the database with no public IP.
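As a minimal sketch, the connection string can be assembled from environment variables at startup. The variable names below (DB_USER, DB_PASSWORD, DB_NAME, INSTANCE_UNIX_SOCKET) follow the sample app's conventions, and the key names match the document's example format:

```go
package main

import (
	"fmt"
	"os"
)

// buildDSN assembles the key=value connection string DBOS expects.
// On Cloud Run, host is the Cloud SQL Auth Proxy Unix socket path
// mounted by --add-cloudsql-instances (e.g. /cloudsql/PROJECT:REGION:INSTANCE).
func buildDSN(user, password, dbname, host string) string {
	return fmt.Sprintf("user=%s password=%s database=%s host=%s",
		user, password, dbname, host)
}

func main() {
	dsn := buildDSN(
		os.Getenv("DB_USER"),
		os.Getenv("DB_PASSWORD"),
		os.Getenv("DB_NAME"),
		os.Getenv("INSTANCE_UNIX_SOCKET"),
	)
	fmt.Println(dsn)
}
```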

Schema migration

By default, DBOS creates its system tables on startup. If your Cloud Run service account doesn't have DDL privileges, run dbos migrate with a privileged user before deploying.

Walkthrough: deploying a DBOS app

This walkthrough deploys a sample DBOS Go application (source code) to Cloud Run with a Cloud SQL PostgreSQL database. It covers project setup, infrastructure, and deployment in both Service and Worker Pool modes.


Google Cloud project setup

You need a Google Cloud project with billing enabled and the required APIs turned on.

Install the Google Cloud SDK, then:

gcloud auth login
gcloud projects create [YOUR_PROJECT_ID]
gcloud config set project [YOUR_PROJECT_ID]

gcloud beta billing projects link [YOUR_PROJECT_ID] \
--billing-account=[YOUR_BILLING_ACCOUNT_ID]

gcloud services enable \
run.googleapis.com \
sqladmin.googleapis.com \
compute.googleapis.com \
servicenetworking.googleapis.com \
secretmanager.googleapis.com \
artifactregistry.googleapis.com \
cloudbuild.googleapis.com

VPC networking for Cloud SQL

Create a VPC with a subnet for Cloud Run, allocate an IP range for VPC peering, and establish the peering connection:

# Create VPC
gcloud compute networks create main-vpc --subnet-mode=custom

# Create subnet for Cloud Run
gcloud compute networks subnets create run-subnet \
--network=main-vpc \
--region=us-central1 \
--range=10.0.0.0/24

# Allocate IP range for Cloud SQL peering
gcloud compute addresses create google-managed-services-default \
--global \
--purpose=VPC_PEERING \
--prefix-length=16 \
--description="Peering for Google Cloud SQL" \
--network=main-vpc

# Establish VPC peering
gcloud services vpc-peerings connect \
--service=servicenetworking.googleapis.com \
--ranges=google-managed-services-default \
--network=main-vpc

Cloud SQL PostgreSQL instance

Create a private-IP-only PostgreSQL instance and an application database:

# Create the Cloud SQL instance (private IP only)
gcloud sql instances create my-postgres-instance \
--database-version=POSTGRES_17 \
--tier=db-perf-optimized-N-2 \
--region=us-central1 \
--root-password="[YOUR_STRONG_PASSWORD]" \
--network=projects/[YOUR_PROJECT_ID]/global/networks/main-vpc \
--no-assign-ip

# Create the application database
gcloud sql databases create myappdb --instance=my-postgres-instance

Store the database password in Secret Manager:

echo -n "[YOUR_STRONG_PASSWORD]" | gcloud secrets create db-password \
--data-file=- \
--replication-policy="automatic"

Store the DBOS Conductor API key:

echo -n "[YOUR_CONDUCTOR_API_KEY]" | gcloud secrets create conductor-api-key \
--data-file=- \
--replication-policy="automatic"
Note: For production, consider creating a dedicated database user instead of using the postgres superuser. Grant it only the permissions your application needs.


IAM service account and permissions

Create a service account for Cloud Run and grant it access to the secrets and Cloud SQL:

# Create service account
gcloud iam service-accounts create run-identity \
--display-name="Cloud Run Service Account"

# Grant access to the database password secret
gcloud secrets add-iam-policy-binding db-password \
--member="serviceAccount:run-identity@[YOUR_PROJECT_ID].iam.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"

# Grant access to the Conductor API key secret (if using Conductor)
gcloud secrets add-iam-policy-binding conductor-api-key \
--member="serviceAccount:run-identity@[YOUR_PROJECT_ID].iam.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"

# Grant Cloud SQL client role
gcloud projects add-iam-policy-binding [YOUR_PROJECT_ID] \
--member="serviceAccount:run-identity@[YOUR_PROJECT_ID].iam.gserviceaccount.com" \
--role="roles/cloudsql.client"

When deploying with --source, Cloud Build runs under the project's default Compute Engine service account, not run-identity. This account needs the cloudbuild.builds.builder role to build and push container images:

# [YOUR_PROJECT_NUMBER] is the numeric project number (not the project ID)
# Find it in the Google Cloud Console under project settings
gcloud projects add-iam-policy-binding [YOUR_PROJECT_ID] \
--member="serviceAccount:[YOUR_PROJECT_NUMBER]-compute@developer.gserviceaccount.com" \
--role="roles/cloudbuild.builds.builder"

Sample Dockerfile

Multi-stage build with a distroless runtime image:

Dockerfile
# --- Build Stage ---
FROM golang:1.24 AS builder

WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o main .

# --- Run Stage ---
FROM gcr.io/distroless/static-debian12
COPY --from=builder /app/main /
EXPOSE 8080
CMD ["/main"]

You can test the build locally against a local PostgreSQL instance before deploying:

# Build the image
docker build -t dbos-go-starter-image .

# Run with a local Postgres
docker run --rm -p 8080:8080 \
-e DB_USER=postgres \
-e DB_PASSWORD="your_local_db_password" \
-e DB_NAME=myappdb \
-e INSTANCE_UNIX_SOCKET=host.docker.internal \
-e DBOS_CONDUCTOR_KEY="your_conductor_api_key" \
dbos-go-starter-image

Then hit http://localhost:8080/workflow/1 to start a DBOS workflow.


Deploy

Deploy from source—Cloud Build automatically builds your container and pushes it to Artifact Registry.

gcloud run deploy my-app \
--source . \
--region us-central1 \
--no-cpu-throttling \
--min-instances=1 \
--service-account run-identity@[YOUR_PROJECT_ID].iam.gserviceaccount.com \
--network main-vpc \
--subnet run-subnet \
--vpc-egress private-ranges-only \
--add-cloudsql-instances [YOUR_PROJECT_ID]:us-central1:my-postgres-instance \
--set-secrets DB_PASSWORD=db-password:latest,DBOS_CONDUCTOR_KEY=conductor-api-key:latest \
--set-env-vars DB_USER=postgres,DB_NAME=myappdb,INSTANCE_UNIX_SOCKET=/cloudsql/[YOUR_PROJECT_ID]:us-central1:my-postgres-instance \
--allow-unauthenticated

Key flags:

  • --no-cpu-throttling Enables instance-based billing so DBOS background services keep running between requests.
  • --min-instances=1 Keeps one instance always on so background services never stop.
  • --set-secrets Injects secrets from Secret Manager as environment variables.
  • --add-cloudsql-instances Mounts the Cloud SQL Auth Proxy socket, letting the app connect via INSTANCE_UNIX_SOCKET.
  • --source . Builds your Dockerfile remotely via Cloud Build.
  • --allow-unauthenticated Makes the service publicly accessible.

Build logs

During deployment, gcloud streams Cloud Build output to your terminal. You can also view logs in the Cloud Build console or with:

gcloud builds list --limit=5 --region=us-central1
gcloud builds log [BUILD_ID] --region=us-central1

Service URL (service mode only)

On successful deployment, gcloud prints the service URL:

Service URL: https://my-app-XXXXXXXXXX.us-central1.run.app

Retrieve it later with:

gcloud run services describe my-app --region us-central1 --format='value(status.url)'

Application logs

For a service:

gcloud logging read \
'resource.type=cloud_run_revision AND resource.labels.service_name=my-app' \
--limit 100 --format='text'

For a worker pool:

gcloud logging read \
'resource.type=cloud_run_worker_pool AND resource.labels.worker_pool_name=my-app' \
--limit 100 --format='text'

Or view logs in the Cloud Run console under the Logs tab.

Test the deployment (service mode only)

Start a DBOS workflow:

curl -X GET https://my-app-XXXXXXXXXX.us-central1.run.app/workflow/1

This runs the three-step ExampleWorkflow with task ID 1. Each step takes 5 seconds. Poll progress with:

curl -X GET https://my-app-XXXXXXXXXX.us-central1.run.app/last_step/1

Returns 1, 2, or 3 depending on how many steps have completed.

Scaling a Worker Pool

Worker pools don't auto-scale, but you can build an external scaler inside the pool using a DBOS scheduled workflow. DBOS guarantees only one instance runs a scheduled function at a time—even across a multi-instance pool—preventing a thundering herd of conflicting resize requests.

The complete implementation is in the cloud-run demo app.

IAM permissions

The worker pool's service account needs permission to manage Cloud Run resources and to act as itself when creating new revisions.

IAM commands
# Grant Cloud Run admin role
gcloud projects add-iam-policy-binding [YOUR_PROJECT_ID] \
--member="serviceAccount:run-identity@[YOUR_PROJECT_ID].iam.gserviceaccount.com" \
--role="roles/run.admin"

# Grant actAs permission on the service account itself
gcloud iam service-accounts add-iam-policy-binding \
run-identity@[YOUR_PROJECT_ID].iam.gserviceaccount.com \
--member="serviceAccount:run-identity@[YOUR_PROJECT_ID].iam.gserviceaccount.com" \
--role="roles/iam.serviceAccountUser"

The scheduled workflow periodically checks the queue depth and resizes the pool to match by calling the Cloud Run Admin API. It authenticates with a short-lived access token from the GCE metadata server, reads the current instance count with a GET, and updates it with a PATCH.

Here's an example in Go (the same approach works in any DBOS-supported language):

Scaling workflow snippet
main.go
func ScalingWorkflow(ctx dbos.DBOSContext, scheduledTime time.Time) (string, error) {
	// 1. Read queue length by listing all enqueued/pending workflows
	workflows, err := dbos.ListWorkflows(ctx, dbos.WithQueuesOnly(), dbos.WithQueueName(taskQueue.Name))
	if err != nil {
		return "", fmt.Errorf("failed to list workflows: %w", err)
	}
	qlen := len(workflows)

	// 2. Get current instance count from the Cloud Run Admin API
	currentInstances, err := dbos.RunAsStep(ctx, func(stepCtx context.Context) (int, error) {
		return getWorkerPoolInstances(stepCtx)
	})
	if err != nil {
		return "", fmt.Errorf("failed to get current instances: %w", err)
	}

	// 3. Compute desired instances: ceil(queue_depth / worker_concurrency)
	desiredInstances := int(math.Ceil(float64(qlen) / float64(WORKER_CONCURRENCY)))
	if desiredInstances < 1 {
		desiredInstances = 1
	}

	// 4. Resize the pool if needed
	if desiredInstances != currentInstances {
		_, err := dbos.RunAsStep(ctx, func(stepCtx context.Context) (string, error) {
			return setWorkerPoolInstances(stepCtx, desiredInstances)
		})
		if err != nil {
			return "", fmt.Errorf("failed to set instances: %w", err)
		}
	}

	return fmt.Sprintf("qlen=%d, instances=%d", qlen, desiredInstances), nil
}
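The workflow above calls two helpers that aren't shown. Here is a standard-library sketch of what they might look like. The v2 resource path and the scaling.manualInstanceCount field are taken from the Cloud Run Admin API's WorkerPool resource, and projectID, region, and poolName are placeholder constants; verify the field names against the current API reference before relying on them:

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// Placeholder identifiers for illustration; set these for your project.
const (
	projectID = "my-project"
	region    = "us-central1"
	poolName  = "my-app"
)

// workerPoolURL builds the Cloud Run Admin API (v2) resource URL for a worker pool.
func workerPoolURL(project, region, pool string) string {
	return fmt.Sprintf("https://run.googleapis.com/v2/projects/%s/locations/%s/workerPools/%s",
		project, region, pool)
}

// metadataToken fetches a short-lived access token from the GCE metadata server.
func metadataToken(ctx context.Context) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet,
		"http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Metadata-Flavor", "Google")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var tok struct {
		AccessToken string `json:"access_token"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&tok); err != nil {
		return "", err
	}
	return tok.AccessToken, nil
}

// getWorkerPoolInstances reads the pool's manual instance count with a GET.
func getWorkerPoolInstances(ctx context.Context) (int, error) {
	token, err := metadataToken(ctx)
	if err != nil {
		return 0, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet,
		workerPoolURL(projectID, region, poolName), nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	var pool struct {
		Scaling struct {
			ManualInstanceCount int `json:"manualInstanceCount"`
		} `json:"scaling"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&pool); err != nil {
		return 0, err
	}
	return pool.Scaling.ManualInstanceCount, nil
}

// setWorkerPoolInstances PATCHes the pool's manual instance count.
func setWorkerPoolInstances(ctx context.Context, n int) (string, error) {
	token, err := metadataToken(ctx)
	if err != nil {
		return "", err
	}
	body, _ := json.Marshal(map[string]any{
		"scaling": map[string]any{"manualInstanceCount": n},
	})
	url := workerPoolURL(projectID, region, poolName) + "?updateMask=scaling.manualInstanceCount"
	req, err := http.NewRequestWithContext(ctx, http.MethodPatch, url, bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		msg, _ := io.ReadAll(resp.Body)
		return "", fmt.Errorf("resize failed: %s: %s", resp.Status, msg)
	}
	return fmt.Sprintf("resized to %d", n), nil
}

func main() {
	fmt.Println(workerPoolURL(projectID, region, poolName))
}
```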

Upgrading Workflow Code

Deploying new code to Cloud Run creates a new revision. By default, Cloud Run routes all traffic to the latest revision immediately. Understanding how revisions interact with DBOS code upgrades is key to deploying changes safely without disrupting in-progress workflows.

DBOS supports two strategies for deploying breaking changes: versioning and patching. Each maps differently to Cloud Run's revision model depending on whether you run a Service or a Worker Pool.

Cloud Run revisions

Every gcloud run deploy or gcloud beta run worker-pools deploy creates a new revision (e.g., my-app-00001-abc). Cloud Run injects the revision name into every container as the K_REVISION environment variable.

For services, you can split traffic between revisions, enabling blue-green or canary deployments. By default, 100% of traffic goes to the latest revision.
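For example, a canary rollout that keeps 90% of traffic on the previous revision might look like this (the revision names are illustrative):

```shell
gcloud run services update-traffic my-app \
  --region us-central1 \
  --to-revisions=my-app-00001-abc=90,my-app-00002-def=10
```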

For worker pools, a new deployment replaces all running instances. Old instances are shut down regardless of what they were processing.

Service mode

Versioning

Set ApplicationVersion to K_REVISION so each Cloud Run revision gets a distinct DBOS version. Workflows started on a revision are tagged with that revision's version. A DBOS process only recovers workflows matching its own version, so old workflows won't be replayed with new code. To drain old workflows, keep the previous revision active (with a share of traffic or --min-instances=1) until all its workflows complete. You can check with ListWorkflows.

Patching

With patching, fix the application version to a constant and enable patching in the DBOS configuration. Since all revisions share the same version, new containers automatically recover in-progress workflows from previous deployments. Cloud Run routes traffic to the latest revision by default, so new requests go to the new code while the patching logic in your workflow handles the transition for recovered workflows.

Worker pool mode

When you deploy a new worker pool revision, Cloud Run replaces all running instances. Old instances shut down, and new instances start with the new code.

Versioning

If ApplicationVersion is set to K_REVISION, the new instances have a different version than workflows started by the old instances. Those in-progress workflows won't be automatically recovered because the version doesn't match.

To migrate them, fork the old workflows to the new version using ForkWorkflow with the new ApplicationVersion. The new workers will then execute the forked workflows. You can automate this as part of a post-deployment step or a startup routine that lists old-version workflows and forks them.
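A startup routine along these lines could perform the migration. This is a hypothetical sketch: the option and input names (WithAppVersion, WithStatus, ForkWorkflowInput) are illustrative, not confirmed SDK signatures, so consult the DBOS Go reference for the exact API:

```go
// Hypothetical migration sketch; verify every dbos.* name against the SDK docs.
func migrateOldWorkflows(ctx dbos.DBOSContext, oldVersion, newVersion string) error {
	// List PENDING workflows tagged with the previous revision's version.
	workflows, err := dbos.ListWorkflows(ctx,
		dbos.WithAppVersion(oldVersion),
		dbos.WithStatus([]dbos.WorkflowStatusType{dbos.WorkflowStatusPending}),
	)
	if err != nil {
		return err
	}
	for _, wf := range workflows {
		// Fork each workflow onto the new version so the new instances pick it up.
		if _, err := dbos.ForkWorkflow(ctx, dbos.ForkWorkflowInput{
			OriginalWorkflowID: wf.ID,
			ApplicationVersion: newVersion,
		}); err != nil {
			return err
		}
	}
	return nil
}
```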

Patching

With a fixed application version and patching enabled, the new worker pool instances automatically recover workflows from the previous deployment. Conductor detects that the old instances went down and that new instances with the same version are available, triggering recovery without any manual intervention.

Advanced scenarios

More complex deployment strategies are possible. You can combine versioning and patching—for example, using versioning for major changes and patching for hotfixes within a version. In Service mode, you can use Cloud Run revision tags to route a subset of traffic to a tagged revision, letting you test new workflow code in production before shifting all traffic.