PydanticAI + Temporal: Building Durable AI Agents That Survive Crashes
Most AI agents crash and lose state. PydanticAI's Temporal integration changes that. Here's how to build production agents that survive failures and recover gracefully.
December 16, 2025 · 7 min read
Your AI agent just processed 40 minutes of a complex workflow. Then your API rate limit hit. The agent crashed. All state lost. Start over from scratch.
This happens every day to teams running AI agents in production. The difference between a demo and a deployment isn't intelligence. It's durability.
PydanticAI v1, released in September 2025, solved this with native Temporal integration. Your agents can now crash, restart, and pick up exactly where they left off.
Why Most AI Agents Can't Handle Production
AI agents fail differently than traditional applications. They make dozens of LLM calls, each taking 2-30 seconds. They orchestrate external APIs. They run for minutes or hours, not milliseconds.
Traditional error handling breaks down at this scale.
Standard chatbot errors:
User message times out: Retry once
API fails: Return error message
Context window exceeded: Truncate history
AI agent errors that kill workflows:
20 minutes into a workflow, OpenAI rate limit hits
Agent crashes mid-execution, losing all accumulated state
External API times out during step 8 of 15
Server deploys during long-running agent task
The difference is state. Chatbots are stateless. Each message is independent. Agents accumulate context, make decisions, and build on previous steps.
When an agent crashes, you don't just lose the current message. You lose the entire workflow.
Durable execution means your agent's progress is automatically persisted. If anything fails, the agent resumes from the last successful step.
Without durability:
Step 1 ✓ → Step 2 ✓ → Step 3 ✗ (crash) → restart from Step 1
With durability:
Step 1 ✓ (persisted) → Step 2 ✓ (persisted) → Step 3 ✗ (crash) → resume at Step 3
PydanticAI's Temporal integration provides this out of the box. Every agent action gets logged to Temporal's durable execution engine. Crashes become resume points, not restart points.
This matters when you're running agents that:
Process multi-step workflows taking 5+ minutes
Make expensive LLM calls you don't want to repeat
Interact with external systems that can't be safely retried
Need to maintain conversation context across failures
The PydanticAI Approach to Agent Durability
PydanticAI was built for production from day one. Type safety with Pydantic. Performance with pydantic-core and PyO3. And durability with Temporal.
Most Python AI frameworks bolt on persistence as an afterthought. PydanticAI made it foundational.
Core durability features:
Native Temporal workflow integration
Automatic state persistence between agent steps
Declarative retry policies per tool
Type-safe serialization of agent context
Rollback support for failed multi-step operations
The framework handles the complexity. You write agent logic as if failures don't exist. Under the hood, every decision point gets persisted.
Simple PydanticAI agent with durability:
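A minimal sketch of such an agent, in the `@temporal_agent` decorator style this post uses throughout. The decorator and the `run_tool` helper are stubbed here so the flow is self-contained and illustrative, not a verified PydanticAI API:

```python
import asyncio

def temporal_agent(fn):
    # Stand-in for the durable-workflow decorator described in this post:
    # in a real deployment, every awaited step below would be persisted
    # by Temporal and replayed (not re-run) after a crash.
    return fn

async def run_tool(name: str, payload):
    # Stubbed tools so the sketch runs end to end.
    if name == "find_sources":
        return [f"source for: {payload}"]
    if name == "summarize_sources":
        return f"summary of {len(payload)} source(s)"
    raise ValueError(f"unknown tool: {name}")

@temporal_agent
async def research_agent(question: str) -> str:
    # Step 1: search. A crash after this step would not trigger a
    # re-search; Temporal would replay the persisted result.
    sources = await run_tool("find_sources", question)
    # Step 2: summarize, resuming here after a crash.
    return await run_tool("summarize_sources", sources)

print(asyncio.run(research_agent("What is durable execution?")))
# → summary of 1 source(s)
```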
If this agent crashes after finding sources, it doesn't re-search. It resumes at the summary step with the sources already found.
Temporal Integration: How It Works Under the Hood
Temporal provides the durability layer. PydanticAI provides the agent abstraction. Together, they create crash-resistant workflows.
What Temporal does:
Persists workflow state after every step
Replays workflow history on restart
Handles retries with exponential backoff
Manages timeouts and compensation logic
Provides visibility into long-running workflows
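Exponential backoff is easy to see concretely. Assuming a 5-second initial interval and a backoff coefficient of 2.0 (typical values for this kind of retry policy), the wait between attempts doubles each time. A quick sketch of the formula, not Temporal's internals:

```python
def backoff_delays(initial: float, coefficient: float, max_attempts: int) -> list[float]:
    # Delay before retry n is initial * coefficient**(n - 1):
    # the standard exponential-backoff formula retry policies build on.
    return [initial * coefficient ** n for n in range(max_attempts - 1)]

print(backoff_delays(5.0, 2.0, 4))  # → [5.0, 10.0, 20.0]
```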
What PydanticAI does:
Wraps agent logic in Temporal workflows
Serializes agent state with Pydantic models
Maps agent tools to Temporal activities
Handles LLM-specific retry logic
Maintains type safety across crashes
When you decorate an agent with @temporal_agent, PydanticAI generates a Temporal workflow. Each agent action becomes an activity. Temporal persists every activity result.
The durability guarantee:
Agent starts task
Temporal logs intention
Agent completes step 1
Temporal persists result
Agent crashes
Temporal detects failure
Workflow replays from last persisted state
Agent resumes at step 2 with step 1 results intact
This isn't eventual consistency. It's deterministic replay. Your agent sees the exact same state it would have seen if it never crashed.
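The replay idea can be shown with a toy history log. This is a deliberate simplification of what Temporal does, but it captures the guarantee: on restart, completed steps are read back from the log, not re-executed:

```python
history: dict[str, str] = {}   # stands in for Temporal's persisted workflow history
executed: list[str] = []       # tracks which steps actually ran

def run_step(name: str, compute) -> str:
    if name in history:        # replay path: reuse the persisted result
        return history[name]
    result = compute()         # first execution: do the work...
    history[name] = result     # ...and persist it before moving on
    return result

def workflow() -> str:
    a = run_step("step1", lambda: executed.append("step1") or "A")
    b = run_step("step2", lambda: executed.append("step2") or "B")
    return a + b

workflow()                     # first run: both steps execute
print(workflow())              # "crash and restart": pure replay → AB
print(executed)                # → ['step1', 'step2'] — nothing ran twice
```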
Crash Recovery Patterns That Actually Work
Durability enables recovery patterns impossible with stateless agents.
Pattern 1: Expensive Call Memoization
LLM calls cost money and time. Don't repeat them after crashes.
Pattern 2: Human-in-the-Loop Workflows
Agents that wait for human approval need to survive server restarts.
Pattern 3: Graceful Degradation
When external APIs fail, save progress and retry with backoff.
Production Reliability: 60% to 90% Success Rates
The pattern is clear: well-designed AI systems cut error rates from 40% to 10%. That's the difference between a failed pilot and a production deployment.
Durability is foundational to those improvements.
Failure modes durability solves:
Transient API failures: Retry without losing context
Rate limiting: Backoff and resume, don't restart
Timeouts on long operations: Resume after timeout
Server deployments: Workflows survive restarts
Cascading LLM errors: Rollback to last good state
Failure modes durability doesn't solve:
Bad prompts: Still need prompt engineering
Insufficient context: Still need RAG or fine-tuning
Logic errors in agent code: Still need testing
Model hallucinations: Still need validation
Durability gets you from 60% to 90% reliability. The last 10% requires better agent design, not just better infrastructure.
But that first 30% improvement? That's what makes agents deployable. Going from "works most of the time" to "recovers from failures gracefully" is the difference between prototype and product.
When You Actually Need Durable Agents
Not every AI feature needs Temporal. Durability adds complexity. Only add it when failure is expensive.
You need durable agents when:
Workflows run longer than 2 minutes
Individual steps cost more than $0.10
Failures force users to restart from scratch
External systems can't safely retry operations
Agents wait for external events (webhooks, human approval)
You don't need durable agents when:
Simple chatbot interactions under 30 seconds
Stateless Q&A with no workflow context
Rapid prototyping where you're still validating the UX of your MVP
Budget or timeline can't support infrastructure complexity
For teams building production AI agents that orchestrate multi-step workflows, durability isn't optional. The question is when to add it, not whether.
Most teams discover they need it after their first production incident. An agent fails mid-workflow. Users lose 20 minutes of progress. Support tickets flood in. This infrastructure complexity is often underestimated in initial AI development budgets.
PydanticAI lets you build durability in from the start. Temporal handles the hard parts. Your agents survive failures your users never see.
Implementation Checklist for Durable Agents
Getting started with PydanticAI and Temporal requires infrastructure setup, but the framework handles most complexity.
Infrastructure requirements:
Temporal server (cloud or self-hosted)
Temporal Python SDK installed
PydanticAI v1+ (September 2025 release)
Persistent storage for Temporal state (PostgreSQL recommended)
Development workflow:
Design agent workflow as discrete steps
Identify which steps need persistence
Wrap agent in @temporal_agent decorator
Define retry policies per tool
Test crash recovery in staging
Monitor workflow execution in Temporal UI
Cost considerations:
Temporal Cloud: $200-500/month for small deployments
Self-hosted Temporal: Free, but requires DevOps
Increased DB storage for workflow history
Reduced LLM costs from avoided retries
The ROI calculation is straightforward. If you're spending $500/month on unnecessary LLM retries due to crashes, durability pays for itself immediately. Use our MVP calculator to estimate the full infrastructure costs for durable AI agents.
Monitoring and Debugging Durable Workflows
Temporal's UI shows exactly where agents crash and why. No more mystery failures.
What you can see:
Complete workflow history with timestamps
Exact step where failure occurred
Input/output of every agent action
Retry attempts and backoff timing
Pending workflows waiting for external events
Debugging workflow:
Agent fails in production
Open Temporal UI
Find workflow by ID
See full execution history
Identify failing step
Check input/output at crash point
Fix and replay workflow from failure
This visibility transforms debugging. Traditional logs show "agent crashed." Temporal shows "agent crashed at step 8, processing document 42, after 3 retry attempts, with this exact input."
PydanticAI's type safety makes this even better. Serialized state is fully typed. No runtime surprises about what data exists at each step.
The Future of Durable AI Agents
PydanticAI's Temporal integration is just the beginning. The pattern will spread to other frameworks.
What's coming:
Deeper integration with observability tools
Automatic cost tracking per workflow step
Cross-framework durability standards
Simplified local development with Temporal
The teams winning with AI agents in production all solve durability. Some build custom solutions. Some use Temporal directly. PydanticAI makes it a framework primitive.
If you're building AI agents that matter, build them to survive failures. Your users won't notice the crashes that never interrupt their workflows.
Ready to Build Production-Grade AI Agents?
Durability is table stakes for production AI. But it's only one piece of the puzzle.
NextBuild helps startups ship AI features that work in production, not just demos. We build with frameworks like PydanticAI, infrastructure like Temporal, and patterns proven across dozens of deployments.
We'll help you build agents that survive the real world.
Pattern 1 (expensive call memoization):

```python
@temporal_agent
async def analysis_agent(documents: List[str]):
    # These embeddings are expensive;
    # Temporal ensures we only compute them once.
    embeddings = await agent.run_tool("generate_embeddings", documents)

    # If a crash happens here, the embeddings are already saved.
    clusters = await agent.run_tool("cluster_documents", embeddings)
    return clusters
```
Pattern 2 (human-in-the-loop):

```python
@temporal_agent
async def approval_workflow(request: Request):
    analysis = await agent.run("Analyze request")

    # Wait for human approval (could take hours).
    # The server can restart during the wait;
    # the workflow resumes after approval.
    approval = await wait_for_approval(analysis)

    if approval.approved:
        await agent.run("Execute approved action")
```
Pattern 3 (graceful degradation):

```python
@temporal_agent(
    retry_policy=RetryPolicy(
        maximum_attempts=3,
        initial_interval=timedelta(seconds=5),
        backoff_coefficient=2.0,
    )
)
async def api_integration_agent(data: InputData):
    # Each API call has its own retry policy.
    enriched = await agent.run_tool("enrich_data", data)
    validated = await agent.run_tool("validate_data", enriched)

    # If validation fails 3 times, the workflow fails gracefully,
    # but the enrichment step doesn't re-run.
    return validated
```