The rise of AI coding assistants like Cursor, GitHub Copilot, and Claude has transformed software development. But what if your AI agents could go beyond code suggestions? What if they could autonomously write, test, debug, and deploy complete applications in secure, isolated environments?
In this tutorial, we'll build a production-ready autonomous coding agent that uses Cognitora's secure code execution sandboxes to safely run and test code while maintaining enterprise-grade isolation and security.
Why Cognitora for Coding Agents?
Before diving into the implementation, let's understand why Cognitora is ideal for building coding agents:
Security First
- Complete Isolation: Every execution runs in a dedicated Firecracker VM
- No Cross-Contamination: Agent experiments can't affect other users or the host system
- Resource Limits: Built-in CPU, memory, and timeout controls
Production Ready
- Sub-Second Cold Starts: Session pooling enables <100ms response times
- Multi-Language Support: Python, Node.js, and Bash out of the box
- Network Control: Granular control over internet access per execution
Developer Experience
- Simple API: RESTful API with official Python and JavaScript SDKs
- File Persistence: Upload files, persist state across executions
- Real-Time Output: Stream stdout/stderr as code executes
What We'll Build
Our autonomous coding agent will have three core capabilities:
- 🏗️ Project Scaffolding - Generate complete working applications from natural language prompts
- 🐛 Bug Fixing & Refactoring - Analyze existing code, identify issues, and apply fixes
- ✅ Testing & Validation - Run tests, check linting, and verify functionality
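Architecture Overview
A user prompt enters an AgentKit network, which runs a single Claude-backed agent in a loop. On each iteration the agent calls one of four tools (write files, read files, execute code, run terminal commands), and every tool operates on the same persistent Cognitora session. The loop ends when the agent reports completion or the iteration limit is reached.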
Implementation: Building the Agent
We'll build our agent using AgentKit (for orchestration) and Cognitora (for code execution). You can adapt this to LangChain, CrewAI, or any other framework.
Step 1: Setting Up the Environment
First, install the required dependencies:
npm install @inngest/agent-kit @cognitora/sdk zod
# or
pip install agentkit cognitora anthropic
Set your API keys:
export ANTHROPIC_API_KEY="your-anthropic-key"
export COGNITORA_API_KEY="your-cognitora-key"
Step 2: Initialize Cognitora Client
import { Cognitora } from '@cognitora/sdk';
const cognitora = new Cognitora({
apiKey: process.env.COGNITORA_API_KEY,
});
// Create a persistent session for our agent's workspace
const session = await cognitora.sessions.create({
language: 'bash',
name: 'coding-agent-workspace',
});
console.log(`Session created: ${session.id}`);
The session acts as our agent's persistent workspace, maintaining file state across multiple executions.
Step 3: Design the Agent Tools
Following best practices from Anthropic, we'll create focused, purpose-driven tools rather than generic ones.
Tool 1: Create or Update Files
import { createTool } from '@inngest/agent-kit';
import { z } from 'zod';
const createOrUpdateFilesTool = createTool({
name: 'createOrUpdateFiles',
description: 'Create or update multiple files in the workspace. Use this for writing code files, configuration, tests, etc.',
parameters: z.object({
files: z.array(
z.object({
path: z.string().describe('Relative file path (e.g., "src/app.ts")'),
content: z.string().describe('Complete file content'),
})
),
}),
handler: async ({ files }) => {
try {
// Write files to Cognitora session
await Promise.all(
files.map(async (file) => {
await cognitora.sessions.files.write(session.id, {
path: file.path,
content: file.content,
});
})
);
return {
success: true,
message: `Successfully created/updated ${files.length} file(s): ${files.map(f => f.path).join(', ')}`,
};
} catch (error) {
return {
success: false,
error: `Failed to write files: ${error.message}`,
};
}
},
});
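The handler above writes whatever path the model supplies. A minimal sketch of a guard that could run before the write, assuming the workspace expects POSIX-style relative paths. `isSafeRelativePath` is a hypothetical helper for illustration, not part of the Cognitora SDK:

```javascript
// Hypothetical helper: accept only relative paths that stay inside the workspace.
function isSafeRelativePath(filePath) {
  if (typeof filePath !== 'string' || filePath.length === 0) return false;
  // Reject absolute paths (POSIX root or Windows drive letter) and backslashes.
  if (filePath.startsWith('/') || /^[A-Za-z]:/.test(filePath) || filePath.includes('\\')) {
    return false;
  }
  // Reject ".." segments and empty segments (e.g. "a//b" or a trailing slash).
  return filePath.split('/').every((segment) => segment !== '..' && segment !== '');
}

console.log(isSafeRelativePath('src/app.ts'));     // true
console.log(isSafeRelativePath('../secrets.env')); // false
console.log(isSafeRelativePath('/etc/passwd'));    // false
```

One way to use it is to filter `files` before the `Promise.all` call and return the rejected paths in the error message, so the agent can correct them on the next iteration.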
Tool 2: Read Files
const readFilesTool = createTool({
name: 'readFiles',
description: 'Read contents of one or more files from the workspace',
parameters: z.object({
paths: z.array(z.string()).describe('Array of file paths to read'),
}),
handler: async ({ paths }) => {
try {
const contents = await Promise.all(
paths.map(async (path) => {
const content = await cognitora.sessions.files.read(session.id, path);
return { path, content };
})
);
return {
success: true,
files: contents,
};
} catch (error) {
return {
success: false,
error: `Failed to read files: ${error.message}`,
};
}
},
});
Tool 3: Execute Code
const executeCodeTool = createTool({
name: 'executeCode',
description: 'Execute code in the workspace (Python, Node.js, or Bash). Use for running the application, checking if code works, or quick tests.',
parameters: z.object({
language: z.enum(['python', 'javascript', 'bash']).describe('Programming language'),
code: z.string().describe('Code to execute'),
networking: z.boolean().optional().describe('Enable network access (default: false)'),
}),
handler: async ({ language, code, networking = false }) => {
try {
const result = await cognitora.codeInterpreter.execute({
sessionId: session.id,
code,
language,
networking,
});
return {
success: !result.error,
stdout: result.data?.outputs?.find(o => o.type === 'stdout')?.data || '',
stderr: result.data?.outputs?.find(o => o.type === 'stderr')?.data || '',
error: result.error,
executionTime: result.data?.executionTime,
};
} catch (error) {
return {
success: false,
error: `Execution failed: ${error.message}`,
};
}
},
});
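The handler above uses `find`, which returns only the first entry of each stream. If an execution can emit several `stdout` or `stderr` entries (an assumption about the response shape, not something confirmed here), a small helper can join them in order. `collectOutput` is a hypothetical name:

```javascript
// Join every chunk of one stream ('stdout' or 'stderr') in arrival order.
// Assumes each entry looks like { type: string, data: string }, as in the handler above.
function collectOutput(outputs, type) {
  if (!Array.isArray(outputs)) return '';
  return outputs
    .filter((entry) => entry && entry.type === type && typeof entry.data === 'string')
    .map((entry) => entry.data)
    .join('');
}

const sampleOutputs = [
  { type: 'stdout', data: 'first\n' },
  { type: 'stderr', data: 'warning\n' },
  { type: 'stdout', data: 'second\n' },
];
console.log(JSON.stringify(collectOutput(sampleOutputs, 'stdout'))); // "first\nsecond\n"
```

Inside the handler this would replace the `find(...)?.data || ''` expressions with `collectOutput(result.data?.outputs, 'stdout')`.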
Tool 4: Run Terminal Commands
const runTerminalTool = createTool({
name: 'runTerminal',
description: 'Run terminal commands in the workspace. Use for npm install, running tests, linting, git operations, etc. Remember to use non-interactive flags (-y, --yes).',
parameters: z.object({
command: z.string().describe('Shell command to execute'),
}),
handler: async ({ command }) => {
try {
const result = await cognitora.codeInterpreter.execute({
sessionId: session.id,
code: command,
language: 'bash',
networking: true, // Often needed for package installs
});
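return {
success: !result.error,
output: result.data?.outputs?.find(o => o.type === 'stdout')?.data || '',
// stderr is reported separately: many commands print warnings there even on success
stderr: result.data?.outputs?.find(o => o.type === 'stderr')?.data || '',
error: result.error,
};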
} catch (error) {
return {
success: false,
error: `Command failed: ${error.message}`,
};
}
},
});
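Commands such as `npm install` can print thousands of lines, and all of it is fed back into the model's context. A minimal sketch of trimming long output before returning it, keeping the start and end where errors usually appear. `truncateOutput` and its 4000-character default are illustrative choices, not values from the SDK:

```javascript
// Keep the first and last parts of long output and drop the middle.
function truncateOutput(text, maxChars = 4000) {
  if (text.length <= maxChars) return text;
  const half = Math.floor(maxChars / 2);
  const omitted = text.length - half * 2;
  const head = text.slice(0, half);
  const tail = text.slice(text.length - half);
  return `${head}\n... [${omitted} characters omitted] ...\n${tail}`;
}

console.log(truncateOutput('abcdefghij', 4));
// ab
// ... [6 characters omitted] ...
// ij
```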
Step 4: Create the Agent
Now we'll assemble our agent with a carefully crafted system prompt:
import { createAgent, createNetwork, anthropic } from '@inngest/agent-kit';
const codingAgent = createAgent({
name: 'Cognitora Coding Agent',
description: 'An autonomous coding agent powered by Cognitora sandboxes',
system: `You are an expert coding agent that helps users build, debug, and improve software projects.
You have access to a secure, isolated Cognitora sandbox where you can:
- Create and modify files
- Execute code (Python, JavaScript/Node.js, Bash)
- Run terminal commands (npm, pip, git, etc.)
IMPORTANT GUIDELINES:
1. Always think step-by-step before acting
2. When running terminal commands, use non-interactive flags (-y, --yes, --no-input)
3. Test your code after writing it to ensure it works
4. If you encounter errors, analyze them carefully and fix the issues
5. Create comprehensive solutions, not just partial implementations
When you complete the task, respond with:
<task_complete>
Brief summary of what was accomplished
</task_complete>`,
model: anthropic({
model: 'claude-3-5-sonnet-latest',
max_tokens: 8192,
}),
tools: [
createOrUpdateFilesTool,
readFilesTool,
executeCodeTool,
runTerminalTool,
],
});
Step 5: Add Autonomous Loop
We'll use AgentKit's Network to enable autonomous iteration:
const network = createNetwork({
name: 'coding-agent-network',
agents: [codingAgent],
maxIter: 15, // Maximum iterations before stopping
// Router determines when the agent should stop
defaultRouter: ({ network }) => {
const isComplete = network?.state.kv.has('task_complete');
if (isComplete) {
console.log('✅ Task completed!');
return undefined; // Stop iteration
}
return codingAgent; // Continue with agent
},
});
// Add lifecycle hook to detect completion
codingAgent.lifecycle = {
onResponse: async ({ result, network }) => {
const lastMessage = result.content[result.content.length - 1];
if (lastMessage?.type === 'text' && lastMessage.text.includes('<task_complete>')) {
// Extract summary
const summary = lastMessage.text.match(/<task_complete>([\s\S]*?)<\/task_complete>/)?.[1]?.trim();
network?.state.kv.set('task_complete', summary);
}
return result;
},
};
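The hook above stores `undefined` when the model emits the opening tag without its closing tag, because the regular expression then fails to match. A small sketch that pulls the extraction into its own function and returns `null` in that case, so the caller can decide whether to treat the task as finished. `extractTaskSummary` is a hypothetical name:

```javascript
// Return the trimmed text inside <task_complete>...</task_complete>, or null if
// the marker is absent or unterminated.
function extractTaskSummary(text) {
  const match = /<task_complete>([\s\S]*?)<\/task_complete>/.exec(text);
  return match ? match[1].trim() : null;
}

const reply = 'All tests pass.\n<task_complete>\nBuilt the API\n</task_complete>';
console.log(extractTaskSummary(reply));                         // Built the API
console.log(extractTaskSummary('<task_complete>unterminated')); // null
```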
Step 6: Run the Agent
// These imports belong at the top of the file with the others
import fs from 'node:fs/promises';
import path from 'node:path';
async function main() {
  const userPrompt = process.argv.slice(2).join(' ') ||
    'Create a React Todo app with TypeScript, add unit tests, and run them';
  console.log(`🤖 Starting agent with prompt: "${userPrompt}"\n`);
  try {
    const result = await network.run(userPrompt);
    // summary is undefined if the agent reached maxIter without reporting completion
    const summary = result.state.kv.get('task_complete');
    console.log('\n' + '='.repeat(60));
    console.log('TASK SUMMARY');
    console.log('='.repeat(60));
    console.log(summary ?? '(no summary: the agent stopped before reporting completion)');
    console.log('='.repeat(60));
    // Download the generated project
    console.log('\n📦 Downloading project files...');
    const files = await cognitora.sessions.files.list(session.id);
    for (const file of files) {
      const content = await cognitora.sessions.files.read(session.id, file.path);
      // Create parent directories first so nested paths like "src/app.ts" can be written
      await fs.mkdir(path.dirname(file.path), { recursive: true });
      await fs.writeFile(file.path, content);
    }
    console.log(`✅ Downloaded ${files.length} files to current directory`);
  } catch (error) {
    console.error('❌ Agent failed:', error);
  } finally {
    // Cleanup: delete session
    await cognitora.sessions.delete(session.id);
    console.log('\n🧹 Session cleaned up');
  }
}
main();
Demo: Building a Real Application
Let's see our agent in action with a realistic prompt:
node agent.js "Create a FastAPI REST API for a todo list with SQLite database. \
Include CRUD endpoints, data validation with Pydantic, and pytest unit tests. \
After creating, run the tests to verify everything works."
Agent Execution Flow
🤖 Starting agent with prompt...
--- Iteration 1 ---
🔧 Tool: createOrUpdateFiles
Creating: main.py, models.py, database.py, test_api.py, requirements.txt
--- Iteration 2 ---
🔧 Tool: runTerminal
Command: pip install -r requirements.txt
✅ Installed fastapi, uvicorn, sqlalchemy, pytest
--- Iteration 3 ---
🔧 Tool: executeCode (Python)
Running: python main.py (health check)
✅ Server initialization successful
--- Iteration 4 ---
🔧 Tool: runTerminal
Command: pytest test_api.py -v --cov
✅ 12 tests passed, 95% coverage
--- Iteration 5 ---
✅ Task Complete!
=============================================================
TASK SUMMARY
=============================================================
Created a production-ready FastAPI Todo application with:
✅ RESTful API with CRUD endpoints
- POST /todos - Create new todo
- GET /todos - List all todos
- GET /todos/{id} - Get specific todo
- PUT /todos/{id} - Update todo
- DELETE /todos/{id} - Delete todo
✅ SQLite database with SQLAlchemy ORM
✅ Pydantic models for request/response validation
✅ Comprehensive pytest test suite (12 tests, 95% coverage)
✅ All tests passing
Files created:
- main.py (185 lines)
- models.py (42 lines)
- database.py (28 lines)
- test_api.py (156 lines)
- requirements.txt (5 packages)
The API is ready for deployment!
=============================================================
Advanced Use Cases
1. Bug Fixing Agent
Feed existing code to the agent and ask it to fix issues:
// Upload buggy codebase
await cognitora.sessions.files.upload(session.id, {
files: [
{ path: 'app.py', content: buggyCode },
{ path: 'test_app.py', content: testCode },
],
});
// Run the agent
await network.run(`
The tests are failing. Debug the code, identify the issues,
fix them, and verify all tests pass.
`);
2. Code Refactoring
await network.run(`
Refactor this codebase to:
1. Follow PEP 8 style guidelines
2. Add type hints throughout
3. Extract duplicate code into reusable functions
4. Add docstrings to all public functions
5. Ensure all existing tests still pass
`);
3. Documentation Generator
await network.run(`
Generate comprehensive documentation:
1. Add inline comments to complex logic
2. Create a README.md with setup instructions and API documentation
3. Generate API documentation using Swagger/OpenAPI
4. Add usage examples for each endpoint
`);
Best Practices
1. Tool Design
✅ DO:
- Create focused, single-purpose tools
- Provide clear, descriptive tool names and descriptions
- Return both success and error information to the agent
- Use Zod schemas for parameter validation
❌ DON'T:
- Create overly generic tools (e.g., a single "do anything" tool)
- Swallow errors - always return them to the agent
- Mix multiple unrelated operations in one tool
2. Prompt Engineering
✅ DO:
- Provide clear, step-by-step instructions
- Include examples of expected output format
- Set explicit completion criteria
- Remind the agent about non-interactive terminals
❌ DON'T:
- Use vague instructions like "make it better"
- Forget to mention important constraints
- Assume the agent knows your preferences
3. Error Handling
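// Good: Let the agent learn from errors
handler: async ({ code, language }) => {
  try {
    // Same call as in the executeCode tool from Step 3
    const result = await cognitora.codeInterpreter.execute({
      sessionId: session.id,
      code,
      language,
    });
    return {
      success: !result.error,
      output: result.data?.outputs?.find(o => o.type === 'stdout')?.data || '',
      error: result.error, // Execution errors go back to the agent too
    };
  } catch (error) {
    return {
      success: false,
      error: error.message, // Return to agent for correction
    };
  }
}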
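Every tool in this tutorial repeats the same try/catch. A minimal sketch of factoring it into a wrapper, so a handler can throw freely and the agent still receives a structured error. `withErrorCapture` is a hypothetical helper, not an AgentKit function:

```javascript
// Wrap a tool handler so thrown errors become a result object the agent can read.
function withErrorCapture(handler) {
  return async (input) => {
    try {
      return await handler(input);
    } catch (error) {
      // Non-Error values (e.g. a thrown string) are stringified.
      const message = error instanceof Error ? error.message : String(error);
      return { success: false, error: message };
    }
  };
}

// Usage: the inner handler throws, the wrapped handler returns the failure instead.
const safeHandler = withErrorCapture(async ({ code }) => {
  if (!code) throw new Error('code must not be empty');
  return { success: true, output: `ran ${code.length} characters` };
});
safeHandler({ code: '' }).then((result) => console.log(result));
// { success: false, error: 'code must not be empty' }
```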
4. Resource Management
// Clean up sessions when done
try {
await network.run(prompt);
} finally {
await cognitora.sessions.delete(session.id);
console.log('Session cleaned up');
}
Comparison: Cognitora vs. Alternatives
| Feature | Cognitora | E2B | Local Execution |
|---|---|---|---|
| Isolation | Firecracker VMs | Containers | None |
| Cold Start | <100ms (pooled) | ~500ms | 0ms |
| Multi-tenancy | Complete isolation | Shared resources | N/A |
| Security | Enterprise-grade | Good | Poor |
| Network Control | Per-execution | Per-sandbox | System-wide |
| File Persistence | Session-based | Sandbox-based | Filesystem |
| Languages | Python, Node.js, Bash | Multiple | Any |
| Pricing | Pay-per-execution | Subscription | Free |
Production Considerations
1. Rate Limiting
import { RateLimiter } from 'limiter';
const limiter = new RateLimiter({
tokensPerInterval: 10,
interval: 'minute',
});
// Before each agent run
await limiter.removeTokens(1);
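For readers who want to see what a token-bucket limiter does underneath, here is a minimal sketch of the idea. It is not the `limiter` package's implementation: `createTokenBucket` is a hypothetical helper, and unlike `removeTokens` above it returns immediately instead of waiting for a token.

```javascript
// Minimal token bucket: `capacity` tokens refill evenly over `intervalMs`.
// `now` is injectable so the behaviour can be checked without waiting.
function createTokenBucket(capacity, intervalMs, now = Date.now) {
  let tokens = capacity;
  let lastRefill = now();
  return {
    // Returns true and consumes one token if available, otherwise false.
    tryRemove() {
      const current = now();
      const refill = ((current - lastRefill) / intervalMs) * capacity;
      tokens = Math.min(capacity, tokens + refill);
      lastRefill = current;
      if (tokens < 1) return false;
      tokens -= 1;
      return true;
    },
  };
}

// Usage: allow 10 agent runs per minute, rejecting the rest.
const runBucket = createTokenBucket(10, 60_000);
if (!runBucket.tryRemove()) {
  console.log('Rate limit reached, try again later');
}
```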
2. Timeout Configuration
const result = await cognitora.codeInterpreter.execute({
code,
language,
timeout: 30000, // 30 second maximum
});
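The `timeout` option above bounds the execution on the service side. A client-side guard can additionally bound how long the agent waits for the HTTP call itself. The sketch below uses `Promise.race`; `withTimeout` is a hypothetical helper, and it only stops waiting, it does not cancel the remote execution.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage with a stand-in for a slow call:
const slowCall = new Promise((resolve) => setTimeout(() => resolve('finished'), 100));
withTimeout(slowCall, 20, 'execute')
  .then((value) => console.log(value))
  .catch((error) => console.log(error.message)); // execute timed out after 20ms
```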
3. Cost Tracking
let totalCredits = 0;
const executeCodeTool = createTool({
  // ... tool definition
  handler: async ({ code, language }) => {
    // Same call as in the executeCode tool from Step 3
    const result = await cognitora.codeInterpreter.execute({ sessionId: session.id, code, language });
    // Track usage; default to 0 so a missing field is not logged as "undefined"
    const creditsUsed = result.data?.creditsUsed ?? 0;
    totalCredits += creditsUsed;
    console.log(`Credits used: ${creditsUsed}, Total: ${totalCredits}`);
    return result;
  },
});
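The counter above only logs usage. A minimal sketch of turning it into a hard ceiling, so a runaway loop stops spending once a budget is used up. `createCreditBudget` is a hypothetical helper, and it assumes `creditsUsed` is a plain number:

```javascript
// Track credits and report when a ceiling has been reached.
function createCreditBudget(maxCredits) {
  let used = 0;
  return {
    // Add credits from one execution; missing or non-numeric values count as 0.
    record(credits) {
      used += Number(credits) || 0;
      return used;
    },
    get used() {
      return used;
    },
    get exhausted() {
      return used >= maxCredits;
    },
  };
}

// Usage inside a handler: check before executing, record afterwards.
const budget = createCreditBudget(100);
budget.record(40);
budget.record(60);
console.log(budget.used, budget.exhausted); // 100 true
```

When `budget.exhausted` is true, the handler can return `{ success: false, error: 'Credit budget exhausted' }` instead of calling the sandbox.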
4. Logging and Monitoring
import winston from 'winston';
const logger = winston.createLogger({
  level: 'info',
  format: winston.format.json(),
  transports: [
    new winston.transports.File({ filename: 'agent.log' }),
  ],
});
// Log all tool calls. Spread the existing hooks so this assignment does not
// replace the onResponse completion hook registered in Step 5.
codingAgent.lifecycle = {
  ...codingAgent.lifecycle,
  onToolCall: async ({ tool, parameters }) => {
    logger.info('Tool called', {
      tool: tool.name,
      parameters,
      timestamp: new Date().toISOString(),
    });
  },
};
Next Steps
Now that you've built a basic coding agent, here are some ways to extend it:
1. Add MCP Server Support
Integrate with Anthropic's Model Context Protocol to give your agent access to external tools:
const codingAgent = createAgent({
// ... existing config
mcpServers: [
{
name: 'github',
transport: {
type: 'sse',
url: 'http://localhost:3000/mcp/github',
},
},
],
});
2. Multi-Agent Collaboration
Create specialized agents that work together:
const backendAgent = createAgent({ /* backend specialist */ });
const frontendAgent = createAgent({ /* frontend specialist */ });
const testingAgent = createAgent({ /* testing specialist */ });
const network = createNetwork({
agents: [backendAgent, frontendAgent, testingAgent],
router: intelligentRouter, // Route tasks to appropriate agent
});
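The snippet above leaves `intelligentRouter` undefined. As one possible starting point, the decision it has to make can be sketched as a plain keyword match from task text to a specialist name. Everything here is hypothetical (the names, the keyword lists, and the fallback), and wiring the result into an AgentKit router is left out because that signature is not shown in this tutorial.

```javascript
// Hypothetical keyword routing: map a task description to a specialist name.
// Order matters: the first list with a matching keyword wins.
const SPECIALIST_KEYWORDS = [
  ['testing', ['test', 'tests', 'coverage', 'pytest', 'jest']],
  ['frontend', ['react', 'css', 'component', 'ui']],
  ['backend', ['api', 'database', 'endpoint', 'server']],
];

function pickSpecialist(task, fallback = 'backend') {
  // Split on anything that is not a letter or digit to compare whole words.
  const words = task.toLowerCase().split(/[^a-z0-9]+/);
  for (const [name, keywords] of SPECIALIST_KEYWORDS) {
    if (keywords.some((keyword) => words.includes(keyword))) return name;
  }
  return fallback;
}

console.log(pickSpecialist('Add a React component for the todo list')); // frontend
console.log(pickSpecialist('Write pytest cases for the API'));          // testing
```

A production router would more likely ask a model to classify the task; the keyword version is only meant to make the routing contract concrete.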
3. Human-in-the-Loop
Add approval steps for critical operations:
const deployTool = createTool({
name: 'deploy',
handler: async ({ target }) => {
// Request human approval
const approved = await requestApproval(`Deploy to ${target}?`);
if (!approved) {
return { success: false, error: 'Deployment cancelled by user' };
}
// Proceed with deployment
return await deployApplication(target);
},
});
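`requestApproval` is not defined in the snippet above. A minimal sketch of one way to build it, with the actual prompt mechanism injected so the same logic works for a CLI prompt, a chat message, or a web form. `createRequestApproval` and `ask` are hypothetical names:

```javascript
// Hypothetical approval gate. `ask` is any async function that shows a question
// and resolves with the operator's raw answer.
function createRequestApproval(ask) {
  return async (question) => {
    const answer = await ask(`${question} [y/N] `);
    // Only an explicit "y" or "yes" approves; anything else is a refusal.
    return /^(y|yes)$/i.test(String(answer).trim());
  };
}

// Usage with a stand-in that always answers "yes":
const requestApproval = createRequestApproval(async () => 'yes');
requestApproval('Deploy to staging?').then((approved) => console.log(approved)); // true
```

Defaulting to refusal means an empty answer or a typo never triggers a deployment.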
Conclusion
We've built a production-ready autonomous coding agent that can generate, test, and debug complete applications using Cognitora's secure execution sandboxes. Three pieces make it work:
- Cognitora's isolation and security - Enterprise-grade sandboxing
- AgentKit's orchestration - Autonomous decision-making
- Proper tool design - Focused, effective agent capabilities
Together they create a powerful foundation for AI-assisted software development.
Resources
Ready to build your own coding agents? Start with Cognitora's free tier.