Building Production-Ready Coding Agents with Cognitora

The rise of AI coding assistants like Cursor, GitHub Copilot, and Claude has transformed software development. But what if your AI agents could go beyond code suggestions? What if they could autonomously write, test, debug, and deploy complete applications in secure, isolated environments?

In this tutorial, we'll build a production-ready autonomous coding agent that uses Cognitora's secure code execution sandboxes to safely run and test code while maintaining enterprise-grade isolation and security.

Why Cognitora for Coding Agents?

Before diving into the implementation, let's understand why Cognitora is ideal for building coding agents:

Security First

Complete Isolation: Every execution runs in a dedicated Firecracker VM
No Cross-Contamination: Agent experiments can't affect other users or the host system
Resource Limits: Built-in CPU, memory, and timeout controls

Production Ready

Sub-Second Cold Starts: Session pooling enables <100ms response times
Multi-Language Support: Python, Node.js, and Bash out of the box
Network Control: Granular control over internet access per execution

Developer Experience

Simple API: RESTful API with official Python and JavaScript SDKs
File Persistence: Upload files, persist state across executions
Real-Time Output: Stream stdout/stderr as code executes

What We'll Build

Our autonomous coding agent will have three core capabilities:

🏗️ Project Scaffolding - Generate complete working applications from natural language prompts
🐛 Bug Fixing & Refactoring - Analyze existing code, identify issues, and apply fixes
✅ Testing & Validation - Run tests, check linting, and verify functionality

Architecture Overview

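(Diagram not available in this version. In outline: a user prompt goes to the AgentKit network, whose agent calls four tools (write files, read files, execute code, run terminal commands) against a single Cognitora session.)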

Implementation: Building the Agent

We'll build our agent using AgentKit (for orchestration) and Cognitora (for code execution). You can adapt this to LangChain, CrewAI, or any other framework.

Step 1: Setting Up the Environment

First, install the required dependencies:

bash
npm install @inngest/agent-kit @cognitora/sdk zod
# or
pip install agentkit cognitora anthropic

Set your API keys:

bash
export ANTHROPIC_API_KEY="your-anthropic-key"
export COGNITORA_API_KEY="your-cognitora-key"
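
A missing key tends to surface later as an opaque authentication error, so it can help to fail fast at startup. The sketch below shows one way to do that. `requireEnv` is a hypothetical helper, not part of either SDK, and it takes the environment as an argument so it can be exercised without touching the real process environment.

```typescript
// Hypothetical helper (not part of the Cognitora or AgentKit SDKs):
// return a required variable, or throw with a clear message.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example with an in-memory environment; the agent would pass process.env instead.
const exampleEnv: Env = { COGNITORA_API_KEY: 'your-cognitora-key' };
const cognitoraKey = requireEnv(exampleEnv, 'COGNITORA_API_KEY');
```

In the agent itself, calling `requireEnv(process.env, 'ANTHROPIC_API_KEY')` before creating any client would stop the run before a session is allocated.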

Step 2: Initialize Cognitora Client

typescript
import { Cognitora } from '@cognitora/sdk';

const cognitora = new Cognitora({
  apiKey: process.env.COGNITORA_API_KEY,
});

// Create a persistent session for our agent's workspace
const session = await cognitora.sessions.create({
  language: 'bash',
  name: 'coding-agent-workspace',
});

console.log(`Session created: ${session.id}`);

The session acts as our agent's persistent workspace, maintaining file state across multiple executions.

Step 3: Design the Agent Tools

Following best practices from Anthropic, we'll create focused, purpose-driven tools rather than generic ones.

Tool 1: Create or Update Files

typescript
import { createTool } from '@inngest/agent-kit';
import { z } from 'zod';

const createOrUpdateFilesTool = createTool({
  name: 'createOrUpdateFiles',
  description: 'Create or update multiple files in the workspace. Use this for writing code files, configuration, tests, etc.',
  parameters: z.object({
    files: z.array(
      z.object({
        path: z.string().describe('Relative file path (e.g., "src/app.ts")'),
        content: z.string().describe('Complete file content'),
      })
    ),
  }),
  handler: async ({ files }) => {
    try {
      // Write files to Cognitora session
      await Promise.all(
        files.map(async (file) => {
          await cognitora.sessions.files.write(session.id, {
            path: file.path,
            content: file.content,
          });
        })
      );

      return {
        success: true,
        message: `Successfully created/updated ${files.length} file(s): ${files.map(f => f.path).join(', ')}`,
      };
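    } catch (error) {
      // `error` is `unknown` under strict TypeScript, so narrow it before reading `.message`
      const message = error instanceof Error ? error.message : String(error);
      return {
        success: false,
        error: `Failed to write files: ${message}`,
      };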
    }
  },
});
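
The tool accepts whatever path the model produces, and Step 6 later writes those same paths to the local filesystem. A path check before writing is a cheap safeguard. The sketch below is one possible guard; `isSafeRelativePath` and `partitionPaths` are hypothetical names, and the rules are an assumption about what a workspace should allow.

```typescript
// Hypothetical guard: accept only relative paths that stay inside the workspace.
function isSafeRelativePath(path: string): boolean {
  if (path.length === 0 || path.indexOf('\0') !== -1) return false;
  // Reject absolute POSIX paths, UNC paths, and Windows drive paths.
  if (path.startsWith('/') || path.startsWith('\\') || /^[A-Za-z]:/.test(path)) return false;
  // Reject any ".." segment, whichever separator is used.
  const segments = path.split(/[\\/]+/);
  return segments.indexOf('..') === -1;
}

// Split a batch into paths that may be written and paths to report back to the agent.
function partitionPaths(paths: string[]): { safe: string[]; rejected: string[] } {
  const safe: string[] = [];
  const rejected: string[] = [];
  for (const p of paths) {
    (isSafeRelativePath(p) ? safe : rejected).push(p);
  }
  return { safe, rejected };
}
```

Returning the rejected paths in the tool result, rather than silently dropping them, lets the agent correct itself on the next iteration.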

Tool 2: Read Files

typescript
const readFilesTool = createTool({
  name: 'readFiles',
  description: 'Read contents of one or more files from the workspace',
  parameters: z.object({
    paths: z.array(z.string()).describe('Array of file paths to read'),
  }),
  handler: async ({ paths }) => {
    try {
      const contents = await Promise.all(
        paths.map(async (path) => {
          const content = await cognitora.sessions.files.read(session.id, path);
          return { path, content };
        })
      );

      return {
        success: true,
        files: contents,
      };
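    } catch (error) {
      // `error` is `unknown` under strict TypeScript, so narrow it before reading `.message`
      const message = error instanceof Error ? error.message : String(error);
      return {
        success: false,
        error: `Failed to read files: ${message}`,
      };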
    }
  },
});
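
Because the handler wraps everything in one `Promise.all`, a single missing file fails the whole call and the agent learns nothing about the files that did read successfully. A per-file result is one alternative, sketched below. `ReadFn` stands in for the SDK's read call, and the in-memory workspace exists only to make the example runnable.

```typescript
type ReadFn = (path: string) => Promise<string>;
type FileResult =
  | { path: string; ok: true; content: string }
  | { path: string; ok: false; error: string };

// Read every path, recording a per-file outcome instead of failing the batch.
async function readEach(paths: string[], read: ReadFn): Promise<FileResult[]> {
  return Promise.all(
    paths.map(async (path): Promise<FileResult> => {
      try {
        return { path, ok: true, content: await read(path) };
      } catch (error) {
        const message = error instanceof Error ? error.message : String(error);
        return { path, ok: false, error: message };
      }
    })
  );
}

// In-memory stand-in for the workspace, for illustration only.
const fakeWorkspace: Record<string, string> = { 'main.py': 'print("hi")' };
const fakeRead: ReadFn = async (path) => {
  const content = fakeWorkspace[path];
  if (content === undefined) throw new Error(`No such file: ${path}`);
  return content;
};
```

With this shape, a request for `main.py` and a mistyped path returns the first file's content alongside a specific error for the second.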

Tool 3: Execute Code

typescript
const executeCodeTool = createTool({
  name: 'executeCode',
  description: 'Execute code in the workspace (Python, Node.js, or Bash). Use for running the application, checking if code works, or quick tests.',
  parameters: z.object({
    language: z.enum(['python', 'javascript', 'bash']).describe('Programming language'),
    code: z.string().describe('Code to execute'),
    networking: z.boolean().optional().describe('Enable network access (default: false)'),
  }),
  handler: async ({ language, code, networking = false }) => {
    try {
      const result = await cognitora.codeInterpreter.execute({
        sessionId: session.id,
        code,
        language,
        networking,
      });

      return {
        success: !result.error,
        stdout: result.data?.outputs?.find(o => o.type === 'stdout')?.data || '',
        stderr: result.data?.outputs?.find(o => o.type === 'stderr')?.data || '',
        error: result.error,
        executionTime: result.data?.executionTime,
      };
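    } catch (error) {
      // `error` is `unknown` under strict TypeScript, so narrow it before reading `.message`
      const message = error instanceof Error ? error.message : String(error);
      return {
        success: false,
        error: `Execution failed: ${message}`,
      };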
    }
  },
});
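
Note that `find` returns only the first matching entry. If a response can carry several chunks of the same type (an assumption; the output shape here is inferred from the handler above, not from SDK documentation), joining them and capping the length keeps the agent's context manageable. A minimal sketch, with `collectOutput` as a hypothetical helper:

```typescript
// Output shape inferred from the handler above; treat it as an assumption.
interface OutputChunk {
  type: string;
  data: string;
}

// Join all chunks of one type and keep at most `maxChars` characters.
function collectOutput(outputs: OutputChunk[] | undefined, type: string, maxChars = 4000): string {
  const joined = (outputs ?? [])
    .filter((o) => o.type === type)
    .map((o) => o.data)
    .join('');
  if (joined.length <= maxChars) return joined;
  // Keep the tail: errors and test summaries usually appear at the end.
  const omitted = joined.length - maxChars;
  return `[... ${omitted} characters omitted ...]\n` + joined.slice(joined.length - maxChars);
}
```

Keeping the tail rather than the head is a design choice: a failing test run typically prints its summary last, and that is the part the agent needs in order to fix the problem.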

Tool 4: Run Terminal Commands

typescript
const runTerminalTool = createTool({
  name: 'runTerminal',
  description: 'Run terminal commands in the workspace. Use for npm install, running tests, linting, git operations, etc. Remember to use non-interactive flags (-y, --yes).',
  parameters: z.object({
    command: z.string().describe('Shell command to execute'),
  }),
  handler: async ({ command }) => {
    try {
      const result = await cognitora.codeInterpreter.execute({
        sessionId: session.id,
        code: command,
        language: 'bash',
        networking: true, // Often needed for package installs
      });

      return {
        success: !result.error,
        output: result.data?.outputs?.find(o => o.type === 'stdout')?.data || '',
        error: result.data?.outputs?.find(o => o.type === 'stderr')?.data || '',
      };
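    } catch (error) {
      // `error` is `unknown` under strict TypeScript, so narrow it before reading `.message`
      const message = error instanceof Error ? error.message : String(error);
      return {
        success: false,
        error: `Command failed: ${message}`,
      };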
    }
  },
});
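
The handler above enables networking for every command. Since network access is controlled per execution, one option is to turn it on only for commands that plausibly need it. The sketch below uses a simple pattern list; `needsNetwork` and its patterns are illustrative assumptions, and a heuristic like this will miss some cases (for example `python -m pip install`).

```typescript
// Illustrative patterns only; extend them for the tools your agent actually uses.
const NETWORK_COMMANDS: RegExp[] = [
  /^(npm|pnpm|yarn)\s+(install|i|add|ci)\b/,
  /^pip3?\s+install\b/,
  /^git\s+(clone|fetch|pull|push)\b/,
  /^(curl|wget)\b/,
];

function needsNetwork(command: string): boolean {
  // Check each part of a compound command such as "cd app && npm install".
  return command
    .split(/&&|\|\||;|\|/)
    .map((part) => part.trim())
    .some((part) => NETWORK_COMMANDS.some((pattern) => pattern.test(part)));
}
```

In the tool, this would replace the hard-coded flag with `networking: needsNetwork(command)`, so a plain `pytest` run executes without internet access.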

Step 4: Create the Agent

Now we'll assemble our agent with a carefully crafted system prompt:

typescript
import { createAgent, createNetwork, anthropic } from '@inngest/agent-kit';

const codingAgent = createAgent({
  name: 'Cognitora Coding Agent',
  description: 'An autonomous coding agent powered by Cognitora sandboxes',
  
  system: `You are an expert coding agent that helps users build, debug, and improve software projects.

You have access to a secure, isolated Cognitora sandbox where you can:
- Create and modify files
- Execute code (Python, JavaScript/Node.js, Bash)
- Run terminal commands (npm, pip, git, etc.)

IMPORTANT GUIDELINES:
1. Always think step-by-step before acting
2. When running terminal commands, use non-interactive flags (-y, --yes, --no-input)
3. Test your code after writing it to ensure it works
4. If you encounter errors, analyze them carefully and fix the issues
5. Create comprehensive solutions, not just partial implementations

When you complete the task, respond with:
<task_complete>
Brief summary of what was accomplished
</task_complete>`,

  model: anthropic({
    model: 'claude-3-5-sonnet-latest',
    max_tokens: 8192,
  }),

  tools: [
    createOrUpdateFilesTool,
    readFilesTool,
    executeCodeTool,
    runTerminalTool,
  ],
});

Step 5: Add Autonomous Loop

We'll use AgentKit's Network to enable autonomous iteration:

typescript
const network = createNetwork({
  name: 'coding-agent-network',
  agents: [codingAgent],
  maxIter: 15, // Maximum iterations before stopping

  // Router determines when the agent should stop
  defaultRouter: ({ network }) => {
    const isComplete = network?.state.kv.has('task_complete');
    
    if (isComplete) {
      console.log('✅ Task completed!');
      return undefined; // Stop iteration
    }

    return codingAgent; // Continue with agent
  },
});

// Add lifecycle hook to detect completion
codingAgent.lifecycle = {
  onResponse: async ({ result, network }) => {
    const lastMessage = result.content[result.content.length - 1];
    
    if (lastMessage?.type === 'text' && lastMessage.text.includes('<task_complete>')) {
      // Extract summary
      const summary = lastMessage.text.match(/<task_complete>([\s\S]*?)<\/task_complete>/)?.[1]?.trim();
      network?.state.kv.set('task_complete', summary);
    }

    return result;
  },
};
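
The stopping logic is easier to reason about in isolation. The sketch below reproduces it without AgentKit: the same regular expression extracts the summary, and a plain loop stands in for the network. `runUntilComplete` and the scripted responses are illustrative assumptions, not AgentKit APIs.

```typescript
// Pull the summary out of a response, or return undefined if the marker is absent.
function extractSummary(text: string): string | undefined {
  const match = text.match(/<task_complete>([\s\S]*?)<\/task_complete>/);
  return match ? match[1].trim() : undefined;
}

// Simplified stand-in for the network loop: call `step` until it reports
// completion or the iteration budget runs out.
function runUntilComplete(step: (iteration: number) => string, maxIter: number) {
  for (let i = 1; i <= maxIter; i++) {
    const summary = extractSummary(step(i));
    if (summary !== undefined) {
      return { completed: true, iterations: i, summary };
    }
  }
  return { completed: false, iterations: maxIter, summary: undefined };
}

// Scripted responses standing in for model output.
const scripted = [
  'Writing files...',
  'Running tests...',
  '<task_complete>\nAll tests pass\n</task_complete>',
];
const outcome = runUntilComplete((i) => scripted[i - 1] ?? '', 15);
```

The second return path matters in practice: if the agent hits the iteration limit without emitting the marker, no summary is ever stored, so callers should handle a missing `task_complete` value.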

Step 6: Run the Agent

typescript
import { mkdir, writeFile } from 'node:fs/promises';
import { dirname } from 'node:path';

async function main() {
  const userPrompt = process.argv.slice(2).join(' ') ||
    'Create a React Todo app with TypeScript, add unit tests, and run them';

  console.log(`🤖 Starting agent with prompt: "${userPrompt}"\n`);

  try {
    const result = await network.run(userPrompt);
    // Only set if the agent emitted <task_complete> before maxIter was reached
    const summary = result.state.kv.get('task_complete');

    console.log('\n' + '='.repeat(60));
    console.log('TASK SUMMARY');
    console.log('='.repeat(60));
    console.log(summary ?? 'No summary: the agent stopped before reporting completion.');
    console.log('='.repeat(60));

    // Download the generated project
    console.log('\n📦 Downloading project files...');
    const files = await cognitora.sessions.files.list(session.id);

    for (const file of files) {
      const content = await cognitora.sessions.files.read(session.id, file.path);
      // Save to local filesystem, creating parent directories (e.g. "src/") first
      await mkdir(dirname(file.path), { recursive: true });
      await writeFile(file.path, content);
    }

    console.log(`✅ Downloaded ${files.length} files to current directory`);
  } catch (error) {
    console.error('❌ Agent failed:', error);
  } finally {
    // Cleanup: delete session
    await cognitora.sessions.delete(session.id);
    console.log('\n🧹 Session cleaned up');
  }
}

main();

Demo: Building a Real Application

Let's see our agent in action with a realistic prompt:

bash
node agent.js "Create a FastAPI REST API for a todo list with SQLite database. \
Include CRUD endpoints, data validation with Pydantic, and pytest unit tests. \
After creating, run the tests to verify everything works."

Agent Execution Flow

text
🤖 Starting agent with prompt...

--- Iteration 1 ---
🔧 Tool: createOrUpdateFiles
   Creating: main.py, models.py, database.py, test_api.py, requirements.txt

--- Iteration 2 ---
🔧 Tool: runTerminal
   Command: pip install -r requirements.txt
   ✅ Installed fastapi, uvicorn, sqlalchemy, pytest

--- Iteration 3 ---
🔧 Tool: executeCode (Python)
   Running: python main.py (health check)
   ✅ Server initialization successful

--- Iteration 4 ---
🔧 Tool: runTerminal
   Command: pytest test_api.py -v --cov
   ✅ 12 tests passed, 95% coverage

--- Iteration 5 ---
✅ Task Complete!

=============================================================
TASK SUMMARY
=============================================================
Created a production-ready FastAPI Todo application with:

✅ RESTful API with CRUD endpoints
  - POST /todos - Create new todo
  - GET /todos - List all todos
  - GET /todos/{id} - Get specific todo
  - PUT /todos/{id} - Update todo
  - DELETE /todos/{id} - Delete todo

✅ SQLite database with SQLAlchemy ORM
✅ Pydantic models for request/response validation
✅ Comprehensive pytest test suite (12 tests, 95% coverage)
✅ All tests passing

Files created:
- main.py (185 lines)
- models.py (42 lines)
- database.py (28 lines)
- test_api.py (156 lines)
- requirements.txt (5 packages)

The API is ready for deployment!
=============================================================

Advanced Use Cases

1. Bug Fixing Agent

Feed existing code to the agent and ask it to fix issues:

typescript
// Upload buggy codebase
await cognitora.sessions.files.upload(session.id, {
  files: [
    { path: 'app.py', content: buggyCode },
    { path: 'test_app.py', content: testCode },
  ],
});

// Run the agent
await network.run(`
  The tests are failing. Debug the code, identify the issues, 
  fix them, and verify all tests pass.
`);

2. Code Refactoring

typescript
await network.run(`
  Refactor this codebase to:
  1. Follow PEP 8 style guidelines
  2. Add type hints throughout
  3. Extract duplicate code into reusable functions
  4. Add docstrings to all public functions
  5. Ensure all existing tests still pass
`);

3. Documentation Generator

typescript
await network.run(`
  Generate comprehensive documentation:
  1. Add inline comments to complex logic
  2. Create a README.md with setup instructions and API documentation
  3. Generate API documentation using Swagger/OpenAPI
  4. Add usage examples for each endpoint
`);

Best Practices

1. Tool Design

✅ DO:

Create focused, single-purpose tools
Provide clear, descriptive tool names and descriptions
Return both success and error information to the agent
Use Zod schemas for parameter validation

❌ DON'T:

Create overly generic tools (e.g., a single "do anything" tool)
Swallow errors (return them to the agent instead)
Mix multiple unrelated operations in one tool

2. Prompt Engineering

✅ DO:

Provide clear, step-by-step instructions
Include examples of expected output format
Set explicit completion criteria
Remind the agent about non-interactive terminals

❌ DON'T:

Use vague instructions like "make it better"
Forget to mention important constraints
Assume the agent knows your preferences
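
One way to apply these points consistently is to assemble task prompts from a small structure instead of free text, so that constraints and completion criteria are never left out. The sketch below is illustrative; `TaskSpec` and `buildTaskPrompt` are hypothetical names, not part of AgentKit.

```typescript
interface TaskSpec {
  goal: string;
  constraints: string[];
  doneWhen: string[];
}

// Render a task as a goal, numbered constraints, and explicit completion criteria.
function buildTaskPrompt(spec: TaskSpec): string {
  const numbered = (items: string[]) =>
    items.map((item, i) => `${i + 1}. ${item}`).join('\n');
  return [
    `Goal: ${spec.goal}`,
    '',
    'Constraints:',
    numbered(spec.constraints),
    '',
    'The task is complete only when:',
    numbered(spec.doneWhen),
  ].join('\n');
}

const refactorPrompt = buildTaskPrompt({
  goal: 'Refactor app.py',
  constraints: ['Follow PEP 8', 'Keep the public API unchanged'],
  doneWhen: ['All existing tests pass'],
});
```

The resulting string can be passed straight to `network.run(...)`.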

3. Error Handling

typescript
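// Good: Let the agent learn from errors
handler: async ({ code, language }) => {
  try {
    // Same call and response shape as the executeCode tool in Step 3
    const result = await cognitora.codeInterpreter.execute({
      sessionId: session.id,
      code,
      language,
    });
    return {
      success: !result.error,
      output: result.data?.outputs?.find(o => o.type === 'stdout')?.data || '',
      error: result.error,
    };
  } catch (error) {
    return {
      success: false,
      // Return to agent for correction
      error: error instanceof Error ? error.message : String(error),
    };
  }
}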
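
Every handler in this tutorial turns a thrown value into an error string. Under strict TypeScript the `catch` variable is typed `unknown`, and JavaScript allows throwing non-`Error` values, so a shared helper keeps that conversion consistent. `toToolFailure` below is a hypothetical helper, sketched as one way to do it.

```typescript
interface ToolFailure {
  success: false;
  error: string;
}

// Convert whatever was thrown into a message the model can act on.
function toToolFailure(context: string, error: unknown): ToolFailure {
  let detail: string;
  if (error instanceof Error) {
    detail = error.message;
  } else if (typeof error === 'string') {
    detail = error;
  } else {
    try {
      // JSON.stringify returns undefined for some inputs, so fall back to String().
      detail = JSON.stringify(error) ?? String(error);
    } catch (e) {
      detail = String(error);
    }
  }
  return { success: false, error: `${context}: ${detail}` };
}
```

A handler's `catch` block then reduces to `return toToolFailure('Execution failed', error);`.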

4. Resource Management

typescript
// Clean up sessions when done
try {
  await network.run(prompt);
} finally {
  await cognitora.sessions.delete(session.id);
  console.log('Session cleaned up');
}
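
The same try/finally pattern can be wrapped in a helper so that no call site forgets the cleanup. The sketch below is written against a minimal interface rather than the real SDK; `SessionApi` and `withSession` are hypothetical, and the in-memory fake only demonstrates the control flow.

```typescript
// Minimal interface standing in for the session calls used in this tutorial.
interface SessionApi<S> {
  create: () => Promise<S>;
  destroy: (session: S) => Promise<void>;
}

// Run `work` with a fresh session and always attempt cleanup afterwards.
async function withSession<S, T>(
  api: SessionApi<S>,
  work: (session: S) => Promise<T>
): Promise<T> {
  const session = await api.create();
  try {
    return await work(session);
  } finally {
    try {
      await api.destroy(session);
    } catch (cleanupError) {
      // A failed cleanup should not mask the result or error from `work`.
      console.warn('Session cleanup failed:', cleanupError);
    }
  }
}

// In-memory fake used only to demonstrate the control flow.
const lifecycleLog: string[] = [];
const fakeApi: SessionApi<{ id: string }> = {
  create: async () => {
    lifecycleLog.push('create');
    return { id: 'session-1' };
  },
  destroy: async (s) => {
    lifecycleLog.push(`destroy ${s.id}`);
  },
};
```

Swallowing the cleanup error is deliberate here: if the agent run itself failed, that is the error the caller needs to see.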

Comparison: Cognitora vs. Alternatives

Feature	Cognitora	E2B	Local Execution
Isolation	Firecracker VMs	Containers	None
Cold Start	<100ms (pooled)	~500ms	0ms
Multi-tenancy	Complete isolation	Shared resources	N/A
Security	Enterprise-grade	Good	Poor
Network Control	Per-execution	Per-sandbox	System-wide
File Persistence	Session-based	Sandbox-based	Filesystem
Languages	Python, Node.js, Bash	Multiple	Any
Pricing	Pay-per-execution	Subscription	Free

Production Considerations

1. Rate Limiting

typescript
import { RateLimiter } from 'limiter';

const limiter = new RateLimiter({
  tokensPerInterval: 10,
  interval: 'minute',
});

// Before each agent run
await limiter.removeTokens(1);

2. Timeout Configuration

typescript
const result = await cognitora.codeInterpreter.execute({
  code,
  language,
  timeout: 30000, // 30 second maximum
});

3. Cost Tracking

typescript
let totalCredits = 0;

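const executeCodeTool = createTool({
  // ... tool definition
  handler: async ({ code, language }) => {
    // Same call as the executeCode tool in Step 3
    const result = await cognitora.codeInterpreter.execute({
      sessionId: session.id,
      code,
      language,
    });

    // Track usage; treat a missing value as zero so the log never prints "undefined"
    const creditsUsed = result.data?.creditsUsed ?? 0;
    totalCredits += creditsUsed;
    console.log(`Credits used: ${creditsUsed}, Total: ${totalCredits}`);

    return result;
  },
});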
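
Tracking spend is more useful when paired with a ceiling, since an autonomous loop can otherwise keep executing until it hits the iteration limit. The sketch below shows a simple budget object. `CreditBudget` is a hypothetical class, and it assumes each execution reports a numeric credit count, as the `creditsUsed` field above suggests.

```typescript
class CreditBudget {
  private used = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Record usage; throw once the budget is exceeded so the run stops early.
  record(credits: number): void {
    this.used += credits;
    if (this.used > this.limit) {
      throw new Error(`Credit budget exceeded: ${this.used} used, limit ${this.limit}`);
    }
  }

  get remaining(): number {
    return Math.max(0, this.limit - this.used);
  }
}

const budget = new CreditBudget(10);
budget.record(4);
budget.record(5);
```

Calling `budget.record(...)` inside the tool handler means the thrown error flows through the handler's normal error path and reaches the agent as a failed tool call.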

4. Logging and Monitoring

typescript
import winston from 'winston';

const logger = winston.createLogger({
  level: 'info',
  format: winston.format.json(),
  transports: [
    new winston.transports.File({ filename: 'agent.log' }),
  ],
});

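// Log all tool calls. Spread the existing hooks so the onResponse
// completion check from Step 5 is not overwritten.
codingAgent.lifecycle = {
  ...codingAgent.lifecycle,
  onToolCall: async ({ tool, parameters }) => {
    logger.info('Tool called', {
      tool: tool.name,
      parameters,
      timestamp: new Date().toISOString(),
    });
  },
};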
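
Tool parameters can contain entire file contents and, occasionally, credentials, so logging them verbatim bloats the log and risks leaking secrets. A summarizing step before logging is one mitigation. `summarizeForLog` below is a hypothetical helper, and the key pattern is an illustrative assumption rather than an exhaustive list.

```typescript
// Keys whose values should never reach the log (illustrative, not exhaustive).
const SENSITIVE_KEY = /key|token|secret|password/i;

// Recursively shorten long strings and redact sensitive fields.
function summarizeForLog(value: unknown, maxChars = 200): unknown {
  if (typeof value === 'string') {
    return value.length > maxChars
      ? `${value.slice(0, maxChars)}... (${value.length} chars)`
      : value;
  }
  if (Array.isArray(value)) {
    return value.map((item) => summarizeForLog(item, maxChars));
  }
  if (value !== null && typeof value === 'object') {
    const source = value as Record<string, unknown>;
    const summary: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      summary[key] = SENSITIVE_KEY.test(key)
        ? '[redacted]'
        : summarizeForLog(source[key], maxChars);
    }
    return summary;
  }
  return value;
}
```

In the logging hook, this would mean passing `summarizeForLog(parameters)` instead of the raw `parameters` object.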

Next Steps

Now that you've built a basic coding agent, here are some ways to extend it:

1. Add MCP Server Support

Integrate with Anthropic's Model Context Protocol to give your agent access to external tools:

typescript
const codingAgent = createAgent({
  // ... existing config
  mcpServers: [
    {
      name: 'github',
      transport: {
        type: 'sse',
        url: 'http://localhost:3000/mcp/github',
      },
    },
  ],
});

2. Multi-Agent Collaboration

Create specialized agents that work together:

typescript
const backendAgent = createAgent({ /* backend specialist */ });
const frontendAgent = createAgent({ /* frontend specialist */ });
const testingAgent = createAgent({ /* testing specialist */ });

const network = createNetwork({
  agents: [backendAgent, frontendAgent, testingAgent],
  router: intelligentRouter, // Route tasks to appropriate agent
});
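
`intelligentRouter` is left undefined above. As a starting point, the routing decision can be a plain function from task text to a specialist. The sketch below uses keyword rules and does not model AgentKit's router signature; `pickSpecialist` and the rule list are illustrative assumptions.

```typescript
type Specialist = 'backend' | 'frontend' | 'testing';

// Illustrative keyword rules, checked in order; a production router might ask a model instead.
const ROUTING_RULES: Array<{ agent: Specialist; pattern: RegExp }> = [
  { agent: 'testing', pattern: /\b(tests?|pytest|jest|coverage)\b/i },
  { agent: 'frontend', pattern: /\b(react|css|component|ui)\b/i },
  { agent: 'backend', pattern: /\b(api|database|endpoint|sql)\b/i },
];

// Return the first specialist whose rule matches, or the fallback.
function pickSpecialist(task: string, fallback: Specialist = 'backend'): Specialist {
  for (const rule of ROUTING_RULES) {
    if (rule.pattern.test(task)) return rule.agent;
  }
  return fallback;
}
```

Rule order encodes priority: a task that mentions both tests and an API goes to the testing specialist because that rule is checked first.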

3. Human-in-the-Loop

Add approval steps for critical operations:

typescript
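const deployTool = createTool({
  name: 'deploy',
  description: 'Deploy the application to a target environment. Requires human approval.',
  parameters: z.object({
    target: z.string().describe('Deployment target (e.g., "staging")'),
  }),
  handler: async ({ target }) => {
    // Request human approval (requestApproval is a placeholder you implement)
    const approved = await requestApproval(`Deploy to ${target}?`);
    
    if (!approved) {
      return { success: false, error: 'Deployment cancelled by user' };
    }
    
    // Proceed with deployment (deployApplication is likewise a placeholder)
    return await deployApplication(target);
  },
});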
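
The approval step itself can stay small. The sketch below shows one possible shape, with the question-asking function injected so that the same gate works with a terminal prompt, a chat message, or a test double. `AskFn`, `askForApproval`, and `withApproval` are hypothetical names, not AgentKit APIs.

```typescript
type AskFn = (question: string) => Promise<string>;

// Treat anything other than an explicit "y" or "yes" as a refusal.
async function askForApproval(question: string, ask: AskFn): Promise<boolean> {
  const answer = (await ask(`${question} [y/N] `)).trim().toLowerCase();
  return answer === 'y' || answer === 'yes';
}

// Run `action` only after approval; report a refusal without running it.
async function withApproval<T>(
  question: string,
  ask: AskFn,
  action: () => Promise<T>
): Promise<{ approved: true; value: T } | { approved: false }> {
  if (!(await askForApproval(question, ask))) {
    return { approved: false };
  }
  return { approved: true, value: await action() };
}
```

Defaulting to refusal is the safer choice for deployment-style actions: an empty or unrecognized answer cancels the operation.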

Conclusion

We've built a production-ready autonomous coding agent that can generate, test, and debug complete applications using Cognitora's secure execution sandboxes. Three pieces make this work:

Cognitora's isolation and security - Enterprise-grade sandboxing
AgentKit's orchestration - Autonomous decision-making
Proper tool design - Focused, effective agent capabilities

Together they form a solid foundation for AI-assisted software development.

Resources

Ready to build your own coding agents? Start with Cognitora's free tier.