Durable Execution
How would you change the way you code if your app couldn't fail? What if you could opt into crash-proof execution?
Durable Execution keeps your apps running, even under the worst scenarios. It records the progress and state of your workflows, so disruptions won't lose or corrupt your work. Whether your app is facing a service outage or unexpected shutdown, Durable Execution makes sure it picks up where it left off and you don't repeat work that was already done. This reliability lets your app handle disruptions and deliver results as if the issue didn't happen in the first place.
What's Durable Execution?
Durable Execution lets systems keep running and making forward progress even when things go wrong. It uses state persistence and automatic task retries to create a fault-tolerant environment that ensures reliable execution. Most commonly used for long-running and distributed systems, Durable Execution separates application state and progress from an application's hardware or cloud-based execution. If one of your computers suddenly dies, Durable Execution can transfer its running application workflow to another computer or processing center and pick up where it left off with little or no data loss.
Durable Execution platforms are resilient and support high levels of data integrity. They're built to run jobs that are as short as moments or as long as years. They'll keep running even if the underlying infrastructure changes over time. Adopting Durable Execution makes your code simpler and your deployments more observable.
Business logic focus
Durable Execution shrinks your code by letting you move the handling of external dependency failures out of your apps. With Durable Execution, you can focus on your workflows and business logic, not on handling errors.
Adopting the Durable Execution paradigm produces streamlined code:
-
Cleaner code. Move abnormal condition handling out of your logic.
-
Run forever. Don’t worry about crashes or system outages, even over years or decades.
-
Runs under every condition. Durable Execution separates oversight, like progress tracking, from your running code instances. When things go wrong, you can wait for them to resolve or move processing to other systems, regions, or processing centers.
-
Deploy and run at the same time. Durable Execution makes sure that each time your code runs, it follows the original logic and pathway. Ship updates and patches without changing outcomes for your existing long-running processes.
You gain these advantages by adopting Durable Execution into your applications.
Temporal and Durable Execution
When using Temporal, Durable Execution separates your work's state and progress (called your "Event History") from its code. This abstracted oversight (called "orchestration") takes place on a central server. It uses a persistent state and progress data store, so if your computing breaks, your workflows won't.
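The idea of keeping progress in a persistent history, apart from the running code, can be illustrated with a minimal sketch. This is not Temporal's implementation or API. The `EventHistory` class, the JSON-lines file, and the `reimbursement_workflow` function are all hypothetical names chosen for illustration, and a local file stands in for the server's persistent store.

```python
import json
import os
import tempfile


class EventHistory:
    """Hypothetical append-only record of completed steps (not a Temporal API)."""

    def __init__(self, path):
        self.path = path
        self.results = {}
        # Reload whatever an earlier run already recorded.
        if os.path.exists(path):
            with open(path) as f:
                for line in f:
                    event = json.loads(line)
                    self.results[event["step"]] = event["result"]

    def run_step(self, name, fn):
        # A step that already completed is not executed again.
        if name in self.results:
            return self.results[name]
        result = fn()
        # Persist the result before reporting the step as done.
        with open(self.path, "a") as f:
            f.write(json.dumps({"step": name, "result": result}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.results[name] = result
        return result


def reimbursement_workflow(history, calls, crash_before_pay=False):
    def approve():
        calls.append("approve")
        return "approved"

    def pay():
        if crash_before_pay:
            raise RuntimeError("simulated crash")
        calls.append("pay")
        return "paid"

    status = history.run_step("approve", approve)
    payment = history.run_step("pay", pay)
    return status, payment


# First run crashes after approval; the second run resumes from the history.
path = os.path.join(tempfile.mkdtemp(), "history.jsonl")
calls = []
try:
    reimbursement_workflow(EventHistory(path), calls, crash_before_pay=True)
except RuntimeError:
    pass
outcome = reimbursement_workflow(EventHistory(path), calls)
print(outcome, calls)  # ('approved', 'paid') ['approve', 'pay']
```

The second run builds a fresh `EventHistory` from the same file, as a replacement process would after a crash, so the approval step runs only once.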
Temporal's approach offers specific advantages:
-
Separation of management and execution. The Temporal Service isn't tied to specific task workers or computing platforms.
-
Scale as needed. Durable Execution scales with your business. Each execution is a unique progress abstraction. Add more computing resources to match your needs. This lets you manage additional work without affecting the consistency or reliability of your execution process.
-
Reduce latency. Durable Execution is fast and reliable. It processes tasks quickly and efficiently, ensuring short and predictable response times.
These features combine to provide responsive and reliable services.
Self-healing and catastrophes
Imagine developing a system to handle reimbursements for your employees. Now, consider ways your process might get blocked -- and resolved. For example:
-
Your finance manager goes on vacation and can't approve a reimbursement. What do you do? You can set a time-out policy ("it's been more than 3 business days") and use alternate routing (redirect the approval to another coworker) or messaging ("Hey, I'll be out of the office until date") so every reimbursement gets addressed in time or delayed with full clarity.
-
Your direct deposit with the reimbursed funds failed. For example, there might be an outage at the recipient's bank. After setting a retry policy that won't overload the API provider’s capacity, your process can keep trying until the deposit works. After giving the provider time to recover, you can run your code again and succeed.
-
A printer for paper checks is jammed or out of paper. Not every employee opts into direct deposit. You may need someone to manually walk over and take care of the printer issue before the check can be cut and sent. Once resolved, they can sign off to confirm the check printing task was completed.
These examples cover both hybrid human-technology situations (the approval and the printer) and fully automated ones (the bank).
With Durable Execution, any problem that recovers over time isn’t really a problem. You have a built-in way to retry your task later. Durable Execution keeps your tasks alive and moving, whether they're fully automated or integrated with human actions. It doesn't matter if your problems originate with computing, API calls, machinery, or personnel. Durable Execution is built to keep processes moving forwards, regardless.
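A retry policy like the one described for the failed direct deposit might look like the following sketch. It is a plain-Python illustration of retrying with exponential backoff, not Temporal's retry policy API. The names `retry_with_backoff`, `TransientError`, and `make_flaky_deposit`, along with the default delays, are assumptions made for this example.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures expected to heal over time."""


def retry_with_backoff(task, max_attempts=5, initial_delay=1.0,
                       backoff=2.0, max_delay=60.0, sleep=time.sleep):
    """Run task until it succeeds, waiting longer after each failure."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the failure.
            sleep(delay)
            # Grow the wait so a struggling provider is not overloaded.
            delay = min(delay * backoff, max_delay)


def make_flaky_deposit(failures):
    """Build a stand-in deposit call that fails a set number of times."""
    state = {"remaining": failures}

    def deposit():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise TransientError("bank unavailable")
        return "deposited"

    return deposit


# Record the waits instead of sleeping so the example runs instantly.
waits = []
result = retry_with_backoff(make_flaky_deposit(2), sleep=waits.append)
print(result, waits)  # deposited [1.0, 2.0]
```

Capping the delay with `max_delay` keeps long outages from producing unbounded waits, and limiting `max_attempts` leaves room to escalate to a person when a problem does not heal on its own.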
To be clear, not all tasks heal over time. For example, one of your service providers might go out of business. Retrying your API calls won't get you anywhere if that happens. That's why Durable Execution is designed to handle catastrophes as well as intermittent issues.
When you run into outlier cases where something is truly broken, you need a solution like Temporal. With Temporal, you can patch your code to use a new provider and safely deploy your fixes. You can "replay" your flow's execution history to pick up real-world changes. This allows it to complete your process without losing or repeating work.
Temporal capably handles both the self-healing and catastrophic scenarios. To opt in, you need to be aware of the restrictions that allow Temporal to work its magic.
Temporal requirements
Temporal's use of Durable Execution depends on a few critical factors to ensure you won’t lose or repeat work. Temporal uses a technique known as History Replay, which depends on the following:
-
A durable store: Event History must be saved durably using your server's persistent store. A workflow run, or its abstract execution, must persist forever or until you explicitly decide you no longer need it.
-
Idempotency: Idempotency means you design tasks so that running them more than once has the same effect as running them exactly once. An idempotent approach prevents duplicated effects, like withdrawing money twice or accidentally shipping extra orders. Run-once effects maintain data integrity and prevent costly errors. When a task is retried or repeated by accident, idempotency keeps it from producing additional effects, which keeps execution reliable.
-
Determinism: Durable Execution stores and tracks every workflow as an abstract entity. If you need to restart the process under extreme circumstances, that process must align with the original run. You can't change a random number or a real measurement (like temperature, time, or location) from the first run. If you do, you can't just pick up from where you left off because the work no longer matches the earlier history.
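To make the idempotency requirement from the list above concrete, here is a minimal sketch of one common technique, an idempotency key. The `PaymentService` class and its `withdraw` method are hypothetical and do not represent any real provider's or Temporal's API. The assumption is that the caller sends the same key every time it retries the same logical request.

```python
class PaymentService:
    """Hypothetical provider that deduplicates requests by idempotency key."""

    def __init__(self, balance):
        self.balance = balance
        self.processed = {}  # idempotency key -> receipt already issued

    def withdraw(self, idempotency_key, amount):
        # A repeated request returns the original receipt with no new effect.
        if idempotency_key in self.processed:
            return self.processed[idempotency_key]
        self.balance -= amount
        receipt = {"key": idempotency_key, "amount": amount,
                   "balance_after": self.balance}
        self.processed[idempotency_key] = receipt
        return receipt


service = PaymentService(balance=100)
first = service.withdraw("reimbursement-42", 25)
retry = service.withdraw("reimbursement-42", 25)  # e.g. a retried task
print(service.balance, first == retry)  # 75 True
```

Because the retried call reuses the key, the money is withdrawn once even though the request arrives twice.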
Durable Execution requires your workflow code to be deterministic. Every time it runs or is replayed, the outcomes must be the same. This is the only way centralized control can provide all of Durable Execution's features.
Does this mean you can’t use random numbers or run your work on different days or in different environments? Of course not. It means your code must reliably pick up from where it left off without changing the past in any logical way. This is called determinism. It ensures that given the same starting conditions, your workflows behave identically during each execution. Your results are reliable and assured.
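One way to reconcile random numbers and clock readings with determinism is to record each non-deterministic value the first time it is produced and return the recorded value on replay. The sketch below illustrates that idea under stated assumptions. `ReplayContext`, `side_effect`, and `discount_workflow` are hypothetical names for this example, and this is not how Temporal's SDKs expose the feature.

```python
import random
import time


class ReplayContext:
    """Hypothetical recorder for non-deterministic values (not a Temporal API)."""

    def __init__(self, history=None):
        self.history = list(history or [])
        self.position = 0

    def side_effect(self, fn):
        if self.position < len(self.history):
            # Replaying: reuse the value recorded by the original run.
            value = self.history[self.position]
        else:
            # First execution: compute the value and record it.
            value = fn()
            self.history.append(value)
        self.position += 1
        return value


def discount_workflow(ctx):
    discount = ctx.side_effect(lambda: random.randint(1, 20))
    started_at = ctx.side_effect(time.time)
    return {"discount": discount, "started_at": started_at}


original = ReplayContext()
first_result = discount_workflow(original)

# A later replay, perhaps on another machine, sees the same values.
replayed = ReplayContext(history=original.history)
replay_result = discount_workflow(replayed)
print(first_result == replay_result)  # True
```

The workflow logic stays the same on every run. Only the source of the values changes, so the replay matches the recorded past and can continue from where the original left off.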
With Temporal's prerequisites in place, you're ready to adopt Durable Execution into your applications.
Temporal and Durable Execution
Durable Execution offers a powerful solution for building reliable and scalable applications. It ensures that your workflows continue seamlessly, even when facing failures or disruptions. Durable Execution is:
-
Stateful and persistent: Durable Execution tracks progress and maintains state even when your service restarts or experiences failures. It stores checkpoints in external databases and logs, ensuring your system handles outages or crashes without losing progress.
-
Fault tolerant: Durable Execution handles failures automatically, keeping tasks running even when parts of your system go down. When a failure occurs, it recovers tasks without interrupting your entire application.
-
Designed to separate concerns: Durable Execution splits oversight (task orchestration) from infrastructure management. Focus your app's logic on business processes and application-level concerns, like managing fraud alerts or insufficient funds in a banking app, and not on status recovery. Durable Execution handles state and errors related to platform issues, such as network outages or infrastructure failures, so you don't have to.
-
Won't repeat work: Durable Execution ensures tasks are not repeated unnecessarily. When a task fails, it retries it using policies designed to ensure success without duplicating work. This keeps the process consistent, eliminating redundant work even when errors arise. You won't be sending out seven pizzas when the customer ordered just one.
-
Naturally recoverable: Even in worst-case scenarios, Durable Execution recovers execution without losing progress. Moving to new hardware or service center deployments won't interrupt your workflows.
-
Inherently observable: Durable Execution makes the state, health, and progress of your app fully visible. It tracks tasks in real time, so you see progress, failures, and retries as they happen.
These features work together to make sure your process will keep moving forward and complete successfully. Temporal's implementation of Durable Execution, whether you're self-hosting or using our world-class Temporal Cloud service, provides the solution.
Durable Execution helps you build reliable and scalable applications. It keeps your workflows running smoothly, even through system failures or disruptions. By separating your application logic from task orchestration, Durable Execution ensures that your processes are consistent, reliable, and error-free.
With automatic recovery, Durable Execution guarantees that tasks complete without losing or repeating work. It simplifies your code, lets you scale easily, and ensures that your app can handle any challenges along the way. Durable Execution makes sure your critical processes keep moving forward, no matter what.
Getting started with Temporal helps ensure your work is reliable, efficient, and scalable.