The AI Agent Backend Checklist: 7 Infrastructure Requirements Your Chatbot Didn't Need
Chatbots are stateless. Agents accumulate state, make decisions, and run for minutes. Here are the 7 backend requirements that make or break production agents.
January 5, 2026 · 8 min read

You built a chatbot. It worked great. Now you're building an AI agent. You're using the same infrastructure.
This will fail.
Chatbots are request/response. User message in, AI message out. Stateless. Fast. Simple.
Agents are different. They hold state across multiple LLM calls. They execute multi-step workflows. They run for minutes or hours. They make autonomous decisions.
Your chatbot backend can't support this. Here are the 7 infrastructure requirements agents need that chatbots don't.
Requirement 1: Persistent State Management

Chatbots can store conversation history in memory or simple session storage. Agents need durable state that survives crashes.
Why chatbots don't need this:
- Each message is independent
- Context fits in a single prompt
- Failures just mean retry the message
- State rarely exceeds a few KB

Why agents need this:
- Multi-step workflows accumulate state
- Intermediate results must survive server restarts
- Failures require resuming from last checkpoint
- State can grow to MBs (documents, embeddings, analysis results)

What to implement:
Database-backed state storage with versioning:
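```typescript
interface AgentState {
  sessionId: string;
  status: "idle" | "running" | "complete" | "failed";
  currentStep: number;
  context: {
    userInput: any;
    intermediateResults: any[];
    toolCalls: ToolCall[];
    decisions: Decision[];
  };
  createdAt: Date;
  updatedAt: Date;
}
```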
Storage options:
- PostgreSQL: Best for complex queries and relational data
- Redis: Fast access, good for hot state
- Convex: Real-time sync, great for collaborative agents
- MongoDB: Flexible schema, good for variable state shape

Avoid:

- In-memory storage (lost on restart)
- Local files (doesn't scale, hard to query)
- Session cookies (size limits, security issues)

Agents crash. Infrastructure fails. State must persist.
Requirement 2: Durable Execution and Crash Recovery

Chatbots retry failed requests. Agents resume failed workflows.
Why chatbots don't need this:
- Single request/response
- Retry from scratch is cheap (1-2 seconds)
- No accumulated work to lose

Why agents need this:

- Multi-step workflows take minutes
- Retrying from scratch wastes money (repeated LLM calls)
- Partial progress is valuable
- Users lose trust if crashes mean starting over

What to implement:

Workflow orchestration with checkpointing:
Option 1: Temporal for durable workflows
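```typescript
import { proxyActivities } from "@temporalio/workflow";

export async function researchWorkflow(topic: string): Promise<Report> {
  const activities = proxyActivities({ startToCloseTimeout: "5m" });

  // Each step is persisted automatically
  const sources = await activities.findSources(topic);
  const analysis = await activities.analyzeSources(sources);
  const report = await activities.generateReport(analysis);
  return report;
}
```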
If this crashes after findSources, Temporal resumes at analyzeSources with sources already available.
Option 2: Manual checkpointing
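```typescript
async function runAgentWithCheckpoints(sessionId: string) {
  const state = await loadState(sessionId);

  if (state.currentStep < 1) {
    state.results.step1 = await executeStep1(state.input);
    state.currentStep = 1;
    await saveState(sessionId, state);
  }

  if (state.currentStep < 2) {
    state.results.step2 = await executeStep2(state.results.step1);
    state.currentStep = 2;
    await saveState(sessionId, state);
  }

  // Crash anywhere, resume from the last checkpoint
}
```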
Without checkpointing:

- Agent fails at step 3 of 5
- Retry from scratch
- Steps 1-2 re-executed (wasted LLM calls)
- If step 1 cost $0.50, you pay it twice

With checkpointing:

- Agent fails at step 3
- Resume from step 3
- Steps 1-2 results loaded from state
- No wasted LLM calls

At scale, this saves thousands in unnecessary API costs.
Requirement 3: Tool Authorization and Sandboxing

Chatbots don't call external APIs. Agents do. They need permission management.
Why chatbots don't need this:
- Just LLM calls, no external actions
- Can't modify data or trigger side effects
- Risk is limited to bad responses

Why agents need this:

- Call external APIs (Stripe, SendGrid, databases)
- Perform actions with side effects (send email, charge card, delete data)
- Risk includes financial loss, data corruption, security breaches

What to implement:

Tool-level permission system:
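```typescript
interface ToolDefinition {
  name: string;
  description: string;
  requiresAuth: boolean;
  permissions: Permission[];
  riskLevel: "low" | "medium" | "high";
  execute: (args: any) => Promise<any>;
}

const tools: ToolDefinition[] = [
  {
    name: "search_docs",
    description: "Search internal documentation",
    requiresAuth: true,
    permissions: ["docs:read"],
    riskLevel: "low",
    execute: async (args) => searchDocs(args.query),
  },
  {
    name: "send_email",
    description: "Send email to user",
    requiresAuth: true,
    permissions: ["email:send"],
    riskLevel: "high", // Requires approval
    execute: async (args) => {
      // High risk tools require confirmation
      const approved = await requestUserApproval({
        action: "send_email",
        params: args,
      });
      if (!approved) throw new Error("Action not approved");
      return sendEmail(args);
    },
  },
];
```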
- Read-only tools: No approval needed
- Write operations: Require user confirmation
- Financial actions: Always require approval + audit log
- Destructive operations: Block entirely or require multi-factor auth

Don't give agents access to production databases directly. Use sandboxed APIs:
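```typescript
// Bad: Direct database access
const unsafeTool = {
  name: "update_user",
  execute: async (args) => {
    await db.users.update(args.userId, args.data); // Dangerous
  },
};

// Good: Sandboxed API
const safeTool = {
  name: "update_user_preferences",
  execute: async (args) => {
    // Only specific fields allowed
    const allowedFields = ["theme", "notifications"];
    const sanitized = pick(args.data, allowedFields);
    await userPreferencesAPI.update(args.userId, sanitized);
  },
};
```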
Chatbots can't break things. Agents can. Build guardrails.
Requirement 4: Comprehensive Observability

Chatbots need basic logging. Agents need full workflow tracing.
Why chatbots don't need this:
- Single LLM call to debug
- Input and output visible in logs
- Errors are straightforward

Why agents need this:

- Multi-step workflows with branching paths
- Need to see why agent chose specific tools
- Errors can occur at any step
- Debugging requires understanding full execution history

What to implement:

Structured tracing for agent executions:
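```typescript
import { trace, SpanStatusCode } from "@opentelemetry/api";

async function executeAgentStep(
  sessionId: string,
  stepName: string,
  fn: () => Promise<any>,
) {
  const tracer = trace.getTracer("agent");
  return tracer.startActiveSpan(stepName, async (span) => {
    span.setAttribute("session_id", sessionId);
    span.setAttribute("step_name", stepName);
    try {
      const result = await fn();
      span.setStatus({ code: SpanStatusCode.OK });
      span.setAttribute("result_size", JSON.stringify(result).length);
      return result;
    } catch (error) {
      span.setStatus({ code: SpanStatusCode.ERROR });
      span.setAttribute("error", error.message);
      throw error;
    } finally {
      span.end();
    }
  });
}
```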
Trace every part of the workflow:

- Each LLM call (model, tokens, latency, cost)
- Tool invocations (which tools, parameters, results)
- Decision points (why agent chose path A over B)
- State transitions (idle → running → complete)
- Errors and retry attempts

Tools to consider:

- OpenTelemetry: Standard tracing protocol
- LangSmith: Purpose-built for LLM applications
- Datadog/New Relic: General APM with custom metrics
- Temporal UI: Built-in workflow visibility

A typical debugging flow:

1. Agent fails in production
2. Look up session ID in traces
3. See full execution history
4. Identify which step failed and why
5. See exact inputs that triggered failure
6. Replay workflow with fixes

Without observability, debugging agents is guesswork.
Requirement 5: Rate Limiting and Cost Controls

Chatbots have predictable costs. Agents can spiral.
Why chatbots don't need this:
- One LLM call per message
- Cost is roughly constant
- Usage tied directly to user messages

Why agents need this:

- Variable number of LLM calls per request
- Agents can loop indefinitely if poorly designed
- Tool calls add unpredictable costs
- Single user request might trigger dozens of API calls

What to implement:

Multi-level rate limiting:
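```typescript
interface AgentRateLimits {
  maxLLMCallsPerSession: number;
  maxToolCallsPerSession: number;
  maxExecutionTimeSeconds: number;
  maxCostPerSession: number; // in dollars
}

const limits: AgentRateLimits = {
  maxLLMCallsPerSession: 20,
  maxToolCallsPerSession: 10,
  maxExecutionTimeSeconds: 60,
  maxCostPerSession: 1.0,
};

async function executeWithLimits(sessionId: string, fn: () => Promise<any>) {
  const usage = await getSessionUsage(sessionId);
  if (usage.llmCalls >= limits.maxLLMCallsPerSession) {
    throw new Error("LLM call limit exceeded");
  }
  if (usage.totalCost >= limits.maxCostPerSession) {
    throw new Error("Cost limit exceeded");
  }
  if (usage.executionTime >= limits.maxExecutionTimeSeconds) {
    throw new Error("Execution time limit exceeded");
  }
  return fn();
}
```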
Track costs in real-time:
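```typescript
interface CostTracker {
  sessionId: string;
  llmCalls: Array<{
    model: string;
    inputTokens: number;
    outputTokens: number;
    cost: number;
  }>;
  toolCalls: Array<{
    tool: string;
    cost: number;
  }>;
  totalCost: number;
}

async function trackLLMCall(
  sessionId: string,
  model: string,
  inputTokens: number,
  outputTokens: number,
) {
  const cost = calculateCost(model, inputTokens, outputTokens);
  await db.costTracking.create({
    sessionId,
    model,
    inputTokens,
    outputTokens,
    cost,
    timestamp: new Date(),
  });

  const total = await db.costTracking.sum({ sessionId });
  if (total > limits.maxCostPerSession) {
    throw new Error(`Cost limit exceeded: $${total}`);
  }
}
```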
A poorly designed agent might:
- Loop infinitely trying to solve an impossible task
- Make 50+ LLM calls for a simple request
- Call expensive tools repeatedly
- Cost $10 per user request instead of $0.10

Rate limits prevent runaway costs.
Requirement 6: Context Window Management

Chatbots have simple context. Agents accumulate massive context.
Why chatbots don't need this:
- Conversation history fits in context window
- Context is mostly user messages and bot responses
- Rarely exceeds 10K tokens

Why agents need this:

- Multi-step workflows accumulate state
- Tool call results add to context
- Documents and retrieved data pile up
- Can easily hit 100K+ tokens

What to implement:

- Summarize old messages
- Remove less relevant tool call results
- Keep only recent context + critical info
- Use vector search to retrieve relevant context dynamically
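A simple pruning pass keeps the system message and the most recent turns, and summarizes everything in between:

```typescript
interface ContextManager {
  maxTokens: number;
  preserve: string[]; // Always keep these
  prune: (context: Message[]) => Message[];
}

async function pruneContext(messages: Message[], maxTokens: number): Promise<Message[]> {
  const totalTokens = countTokens(messages);
  if (totalTokens <= maxTokens) return messages;

  // Summarize everything between the system message and the last 3 messages
  const middle = messages.slice(1, -3);
  const summary = await summarizeMessages(middle);

  // Keep the system message, a summary of the middle, and the most recent messages
  return [
    messages[0],
    { role: "system", content: `Previous context: ${summary}` },
    ...messages.slice(-3),
  ];
}
```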
Context window sizes directly impact cost:

- 10K tokens input: $0.01 per call
- 100K tokens input: $0.10 per call

If your agent makes 10 calls, that's $0.10 vs $1.00 per session.
Managing context isn't just about staying under limits. It's about cost control.
Requirement 7: Multi-Step Error Recovery

Chatbots retry failed requests. Agents need sophisticated recovery strategies.
Why chatbots don't need this:
- Single step to retry
- User can just resend message
- No accumulated state to recover

Why agents need this:

- Failures can happen at any workflow step
- Some steps are expensive to retry
- Partial progress is valuable
- Recovery strategy depends on failure type

What to implement:

Failure classification and recovery:
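```typescript
enum FailureType {
  TRANSIENT = "transient", // Retry will likely succeed
  RATE_LIMIT = "rate_limit", // Retry after backoff
  INVALID_INPUT = "invalid_input", // Don't retry, ask user
  TOOL_FAILURE = "tool_failure", // Try alternative tool
  UNRECOVERABLE = "unrecoverable", // Fail gracefully
}

interface RecoveryStrategy {
  maxRetries: number;
  backoffMs: number;
  fallbackAction?: () => Promise<any>;
}

async function executeWithRecovery(
  fn: () => Promise<any>,
  recovery: RecoveryStrategy,
): Promise<any> {
  let attempt = 0;
  while (attempt < recovery.maxRetries) {
    try {
      return await fn();
    } catch (error) {
      const failureType = classifyError(error);
      switch (failureType) {
        case FailureType.TRANSIENT:
          attempt++;
          await sleep(recovery.backoffMs * Math.pow(2, attempt));
          break;
        case FailureType.RATE_LIMIT:
          await sleep(60000); // Wait 1 minute
          attempt++;
          break;
        case FailureType.TOOL_FAILURE:
          if (recovery.fallbackAction) {
            return await recovery.fallbackAction();
          }
          throw error;
        case FailureType.INVALID_INPUT:
        case FailureType.UNRECOVERABLE:
          throw error;
      }
    }
  }
  throw new Error("Max retries exceeded");
}
```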
Pattern 1: Graceful degradation
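```typescript
try {
  return await callPrimaryTool(args);
} catch (error) {
  // Fall back to simpler approach
  return await callFallbackTool(args);
}
```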
Pattern 2: Partial results
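```typescript
try {
  return await completeWorkflow(sessionId);
} catch (error) {
  // Return what we have so far
  const partial = await getPartialResults(sessionId);
  return {
    success: false,
    partialResults: partial,
    error: error.message,
  };
}
```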
Pattern 3: Human escalation
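```typescript
try {
  return await autonomousExecution(task);
} catch (error) {
  if (error.type === "COMPLEX_DECISION") {
    // Can't automate this, ask human
    return await requestHumanInput(task, error.context);
  }
  throw error;
}
```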
Chatbots fail fast. Agents fail smart.
Infrastructure Checklist Summary

Before deploying agents to production, ensure you have:
1. Persistent State Management
- [ ] Database for durable state storage
- [ ] State versioning
- [ ] Fast read/write access
- [ ] Cleanup policy for old sessions

2. Durable Execution and Crash Recovery

- [ ] Workflow orchestration (Temporal or similar)
- [ ] Checkpoint saving after expensive steps
- [ ] Resume logic for crashed workflows
- [ ] Idempotent operations

3. Tool Authorization and Sandboxing

- [ ] Permission system for tool access
- [ ] Approval flow for high-risk actions
- [ ] Audit logs for all tool calls
- [ ] Sandboxed APIs, not direct database access

4. Comprehensive Observability

- [ ] Distributed tracing (OpenTelemetry)
- [ ] LLM call tracking (model, tokens, cost)
- [ ] Tool call logging
- [ ] Error tracking with full context

5. Rate Limiting and Cost Controls

- [ ] Max LLM calls per session
- [ ] Max tool calls per session
- [ ] Execution time limits
- [ ] Cost limits per session
- [ ] Real-time cost tracking

6. Context Window Management

- [ ] Token counting
- [ ] Context pruning strategies
- [ ] Summarization for old context
- [ ] Relevant context retrieval

7. Multi-Step Error Recovery

- [ ] Failure classification
- [ ] Retry logic with backoff
- [ ] Fallback strategies
- [ ] Graceful degradation
- [ ] Human escalation paths
Cost Implications

Building this infrastructure isn't free.
Development costs:

- Basic chatbot backend: $5K-10K
- Full agent backend: $20K-50K

Monthly infrastructure costs:
- Chatbot: $100-300/month (basic hosting + database)
- Agent: $500-2K/month (orchestration, state storage, observability)

Get a detailed breakdown using our MVP calculator for your specific infrastructure needs.
Agents need more sophisticated infrastructure:

- State storage grows with sessions
- Observability tools cost more at scale
- Orchestration platforms have licensing costs

But the ROI is clear. Without proper infrastructure:
- Agents crash and lose state (poor UX)
- Runaway costs from uncontrolled execution
- Debugging takes hours instead of minutes
- Security incidents from unauthorized tool calls

The infrastructure pays for itself by preventing these failures.
When You Can Skip Components

Not every agent needs all seven requirements.
Simple agents (< 3 steps, < 30 seconds):
- Can skip: Durable execution (restart is cheap)
- Keep: State management, observability, rate limiting

This is often the right starting point for MVP development, adding complexity only as needed.
Read-only agents (no side effects):
- Can skip: Tool authorization (no risk)
- Keep: Everything else

Internal tools (trusted users):
- Can simplify: Authorization (trusted context)
- Keep: Observability, cost controls
- Can defer: Durable execution, advanced error recovery
- Keep: Basic state, observability, rate limits

Start simple. Add infrastructure as agents grow in complexity and importance. Understanding what you can defer helps you budget accurately.
Ready to Build Production-Grade AI Agents?

The difference between a demo and a deployed agent is infrastructure. Chatbot backends don't cut it.
NextBuild helps startups build AI agents with the right infrastructure from day one. We know which components you need now and which can wait. We build agents that work in production, not just in demos.