Workflows
Workflows provide durable execution so you can write programs that are resilient to any failure.
Workflows are composed of steps, which are ordinary Python functions annotated with @DBOS.step().
If a workflow is interrupted for any reason (e.g., an executor restarts or crashes), when your program restarts the workflow automatically resumes execution from the last completed step.
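To make "resumes from the last completed step" concrete, here is a toy model of durable execution. It is not DBOS's implementation or schema. It assumes each completed step's output is checkpointed in SQLite, keyed by a workflow ID and step index, and that recovery simply re-runs the workflow function. All names (`open_store`, `run_step`, `checkout_workflow`, the `steps` table) are hypothetical.

```python
import sqlite3

# Toy model of durable execution (hypothetical, NOT DBOS's implementation):
# each completed step's output is checkpointed, keyed by (workflow_id, step_id).
# Re-running the workflow after a crash replays recorded outputs instead of
# re-executing steps that already completed.

def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE steps (workflow_id TEXT, step_id INTEGER, output TEXT,"
        " PRIMARY KEY (workflow_id, step_id))"
    )
    return conn

def run_step(conn, workflow_id, step_id, fn, arg):
    row = conn.execute(
        "SELECT output FROM steps WHERE workflow_id = ? AND step_id = ?",
        (workflow_id, step_id),
    ).fetchone()
    if row is not None:
        return row[0]  # step already completed: replay its recorded output
    output = fn(arg)
    with conn:  # commit the checkpoint before the workflow moves on
        conn.execute(
            "INSERT INTO steps VALUES (?, ?, ?)", (workflow_id, step_id, output)
        )
    return output

calls = []  # records every real execution of a step

def fetch(order):
    calls.append("fetch")
    return order + ":fetched"

def charge(order):
    calls.append("charge")
    return order + ":charged"

def ship(order):
    calls.append("ship")
    return order + ":shipped"

def checkout_workflow(conn, workflow_id, order, crash=False):
    order = run_step(conn, workflow_id, 0, fetch, order)
    order = run_step(conn, workflow_id, 1, charge, order)
    if crash:
        raise RuntimeError("simulated crash between steps")
    return run_step(conn, workflow_id, 2, ship, order)

conn = open_store()
try:
    checkout_workflow(conn, "wf-1", "order42", crash=True)
except RuntimeError:
    pass  # the "process" died after two steps; recover below
result = checkout_workflow(conn, "wf-1", "order42")  # recovery run
print(result)  # order42:fetched:charged:shipped
print(calls)   # ['fetch', 'charge', 'ship']
```

Note that `fetch` and `charge` each ran only once even though the workflow function ran twice: the recovery run replayed their checkpointed outputs and executed only `ship`.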
Here's an example workflow that sends a confirmation email, sleeps for a while, then sends a reminder email. Using a workflow guarantees that even if the sleep duration is weeks or months, even if your program crashes or restarts many times, the reminder email is always sent on schedule (and the confirmation email is never re-sent).
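```python
from dbos import DBOS

@DBOS.workflow()
def reminder_workflow(email: str, time_to_sleep: int):
    # send_confirmation_email and send_reminder_email are assumed to be @DBOS.step() functions defined elsewhere
    send_confirmation_email(email)
    # Durable sleep (in seconds): the wake-up time is recorded, so a restart does not restart the wait
    DBOS.sleep(time_to_sleep)
    send_reminder_email(email)
```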
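The on-schedule guarantee implies the sleep itself must survive restarts. The following toy model shows one way that can work, assuming the wake-up deadline is checkpointed the first time the sleep runs and reused on recovery. It is not DBOS's code; `durable_sleep`, `FakeClock`, `Crash`, and the `checkpoints` dict are hypothetical, and the fake clock stands in for real time so the behavior is easy to check.

```python
import time

class Crash(Exception):
    """Stands in for the process dying mid-sleep."""

def durable_sleep(checkpoints, key, seconds, now=time.time, sleep=time.sleep):
    deadline = checkpoints.get(key)
    if deadline is None:
        # First execution: compute and checkpoint the wake-up time.
        deadline = now() + seconds
        checkpoints[key] = deadline  # a real system would persist this durably
    # On recovery the stored deadline is reused, so only the remainder is slept.
    remaining = max(0.0, deadline - now())
    sleep(remaining)
    return remaining

class FakeClock:
    def __init__(self, start):
        self.t = start

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds

clock = FakeClock(1000.0)
checkpoints = {}

def crashing_sleep(seconds):
    clock.t += 1000.0  # slept for a while...
    raise Crash()      # ...then the process died

try:
    durable_sleep(checkpoints, "wf-1/sleep-0", 3600.0,
                  now=clock.now, sleep=crashing_sleep)
except Crash:
    pass
clock.t += 500.0  # downtime before the restart
remaining = durable_sleep(checkpoints, "wf-1/sleep-0", 3600.0,
                          now=clock.now, sleep=clock.sleep)
print(remaining)  # 2100.0
```

The one-hour sleep started at t=1000, so the recorded deadline is t=4600. After crashing at t=2000 and restarting at t=2500, the workflow waits only the remaining 2100 seconds and still wakes up at t=4600.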
Here are some example apps demonstrating what workflows can do:
- Fault-Tolerant Checkout: No matter how many times you crash this online storefront, it always correctly processes your orders.
- Scheduled Reminders: Send a reminder email to yourself on any day in the future—even if it's months away.
- Document Ingestion Pipeline: Use workflows and queues to reliably process thousands of documents concurrently.
Reliability Guarantees
Workflows provide the following reliability guarantees. These guarantees assume that the application and database may crash and go offline at any point, but are always eventually restarted and brought back online.
- Workflows always run to completion. If a DBOS process is interrupted while executing a workflow and restarts, it resumes the workflow from the last completed step.
- Steps are tried at least once but are never re-executed after they complete. If a failure occurs inside a step, the step may be retried, but once a step has completed, it will never be re-executed.
- Transactions commit exactly once. Once a workflow commits a transaction, it will never retry that transaction.
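One way to get the exactly-once guarantee for transactions, sketched here under that assumption rather than as DBOS's actual implementation, is to write the transaction's checkpoint in the same database transaction as its business writes. Either both commit or neither does, so recovery can check the checkpoint to decide whether the transaction already happened. The `accounts` and `tx_outputs` tables and `debit_transaction` are hypothetical.

```python
import sqlite3

# Hypothetical sketch, NOT DBOS's implementation: the checkpoint row and the
# business write share one atomic transaction, so a committed transaction
# always has a checkpoint and is never retried.

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER);
CREATE TABLE tx_outputs (workflow_id TEXT, step_id INTEGER, output INTEGER,
                         PRIMARY KEY (workflow_id, step_id));
INSERT INTO accounts VALUES ('alice', 100);
""")

def debit_transaction(workflow_id, step_id, name, amount, crash_before_commit=False):
    row = conn.execute(
        "SELECT output FROM tx_outputs WHERE workflow_id = ? AND step_id = ?",
        (workflow_id, step_id),
    ).fetchone()
    if row is not None:
        return row[0]  # already committed: return the recorded output, do not rerun
    with conn:  # one atomic transaction: commit on success, roll back on exception
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, name)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO tx_outputs VALUES (?, ?, ?)", (workflow_id, step_id, balance)
        )
        if crash_before_commit:
            raise RuntimeError("simulated crash before commit")
    return balance

try:
    debit_transaction("wf-1", 0, "alice", 30, crash_before_commit=True)
except RuntimeError:
    pass  # rolled back: neither the debit nor the checkpoint was saved
first = debit_transaction("wf-1", 0, "alice", 30)  # executes and commits
again = debit_transaction("wf-1", 0, "alice", 30)  # recovery replay: skipped
(final,) = conn.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
print(first, again, final)  # 70 70 70
```

The failed attempt left no trace, so it could safely run again; once it committed, the replay returned the recorded balance without debiting Alice a second time.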
If a workflow raises an uncaught exception, the workflow terminates: DBOS records the exception, sets the workflow status to ERROR, and does not recover the workflow.
This is because uncaught exceptions are assumed to be nonrecoverable.
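The distinction between an interrupted workflow (resumed on restart) and a failed one (left in ERROR) can be sketched with a status table. This is a toy model, not DBOS's schema or recovery code; `workflow_status`, `run_workflow`, and `pending_workflows` are hypothetical names. It assumes recovery resumes only workflows still marked PENDING.

```python
import sqlite3

# Toy model (hypothetical, NOT DBOS's schema): an uncaught exception marks the
# workflow ERROR and stores the exception; recovery only picks up workflows
# that are still PENDING because a crash interrupted them.

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workflow_status (workflow_id TEXT PRIMARY KEY, status TEXT, error TEXT)"
)

def run_workflow(workflow_id, fn, *args):
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO workflow_status VALUES (?, 'PENDING', NULL)",
            (workflow_id,),
        )
    try:
        result = fn(*args)
    except Exception as exc:
        with conn:  # record the failure; this workflow will not be recovered
            conn.execute(
                "UPDATE workflow_status SET status = 'ERROR', error = ? WHERE workflow_id = ?",
                (repr(exc), workflow_id),
            )
        raise
    with conn:
        conn.execute(
            "UPDATE workflow_status SET status = 'SUCCESS' WHERE workflow_id = ?",
            (workflow_id,),
        )
    return result

def pending_workflows():
    # What a recovery pass would resume: ERROR and SUCCESS rows are excluded.
    rows = conn.execute("SELECT workflow_id FROM workflow_status WHERE status = 'PENDING'")
    return [r[0] for r in rows]

def bad_workflow(x):
    raise ValueError(f"invalid input: {x}")

try:
    run_workflow("wf-error", bad_workflow, 42)
except ValueError:
    pass

# Simulate a workflow interrupted by a crash: its row is still PENDING.
with conn:
    conn.execute("INSERT INTO workflow_status VALUES ('wf-crashed', 'PENDING', NULL)")

print(pending_workflows())  # ['wf-crashed']
failed = conn.execute(
    "SELECT status, error FROM workflow_status WHERE workflow_id = 'wf-error'"
).fetchone()
print(failed)  # ('ERROR', "ValueError('invalid input: 42')")
```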
If your workflow performs operations that may transiently fail (for example, sending HTTP requests to unreliable services), those should be performed in steps with configured retries.
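DBOS lets you configure retries on steps (check the DBOS API reference for the exact decorator parameters). As a standalone illustration of what such a retry policy does, here is a plain-Python exponential-backoff helper. It is not DBOS's code, and `call_with_retries` with its parameters `max_attempts`, `interval_seconds`, and `backoff_rate` is hypothetical; the `sleep` argument is injectable so the delays can be observed without waiting.

```python
import time

# Standalone sketch of retries with exponential backoff (hypothetical helper,
# NOT DBOS's implementation). Transient failures are retried; once attempts
# run out, the exception propagates to the caller (e.g. the workflow).

def call_with_retries(fn, *args, max_attempts=3, interval_seconds=1.0,
                      backoff_rate=2.0, sleep=time.sleep):
    delay = interval_seconds
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error
            sleep(delay)
            delay *= backoff_rate

attempts = []

def flaky_http_request(url):
    # Hypothetical unreliable service: fails twice, then succeeds.
    attempts.append(url)
    if len(attempts) < 3:
        raise ConnectionError("transient network failure")
    return f"200 OK from {url}"

delays = []
result = call_with_retries(flaky_http_request, "https://example.com",
                           max_attempts=5, sleep=delays.append)
print(result)  # 200 OK from https://example.com
print(delays)  # [1.0, 2.0]
```

Doubling the wait between attempts gives a struggling service time to recover, and capping the number of attempts ensures a persistent failure eventually surfaces as an exception.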
DBOS provides tooling to help you identify failed workflows and examine the specific uncaught exceptions.