Building a production-grade code execution platform that's both secure and fast is no small feat. At Cognitora, we've architected a system that delivers sub-second cold starts, complete workload isolation, and horizontal scalability—all while maintaining enterprise-level security.
This article takes you inside our architecture, explaining the technical decisions, trade-offs, and infrastructure patterns that power Cognitora.
Table of Contents
- Architecture Overview
- Infrastructure Layer
- Orchestration with Nomad
- API Services
- Runtime Images & Execution
- Networking & Security
- Data Layer
- Client SDKs
- Scaling Strategy
- Observability & Monitoring
- Performance & Efficiency
- Future Architecture
Architecture Overview
Cognitora is built on a microservices architecture running on Google Cloud Platform (GCP), with HashiCorp Nomad as the workload orchestrator. The platform handles two primary workload types:
- Code Interpreter - Stateful, session-based code execution (Bash, Python, Node.js/JavaScript)
- Containers - Custom Docker images with full resource control
High-Level Architecture
Infrastructure Layer
Google Cloud Platform Foundation
Our infrastructure is defined entirely in Terraform, ensuring reproducibility and version control. Key GCP components:
Virtual Private Cloud (VPC)
# Custom VPC with private subnets
resource "google_compute_network" "vpc" {
name = "cognitora-network"
auto_create_subnetworks = false
}
# Worker subnet (private) - 10.2.0.0/16
resource "google_compute_subnetwork" "worker_subnet" {
name = "cognitora-worker-subnet"
ip_cidr_range = "10.2.0.0/16"
region = var.region
network = google_compute_network.vpc.id
private_ip_google_access = true
}
Design Decision: We use private IP ranges with Cloud NAT for outbound internet access. This ensures:
- Worker nodes never expose public IPs
- All ingress traffic flows through load balancers
- Complete network isolation between tenants
Cloud NAT Gateway
For workloads that need outbound internet (API calls, package downloads), we use Cloud NAT:
resource "google_compute_router_nat" "vpc_nat" {
name = "cognitora-nat"
router = google_compute_router.vpc_router.name
region = var.region
nat_ip_allocate_option = "AUTO_ONLY"
source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}
Why This Matters: Users can opt-in to networking for their executions. When enabled, traffic flows through NAT with controlled egress—no direct internet exposure for execution sandboxes.
Firewall Architecture
Security is enforced at the network level with defense-in-depth:
- Default Deny - All traffic blocked by default
- Explicit Allow - Only specific ports/protocols opened
- Tag-Based Rules - Firewalls target specific instance groups
- Internal-Only - Most services communicate via private IPs
# Example: Internal Nomad communication
resource "google_compute_firewall" "nomad_internal_communication" {
name = "cognitora-nomad-internal"
network = google_compute_network.vpc.name
source_tags = ["nomad-cluster"]
target_tags = ["nomad-cluster"]
allow {
protocol = "tcp"
ports = ["0-65535"] # Full internal trust within cluster
}
}
Orchestration with Nomad
Why Nomad Over Kubernetes?
We chose HashiCorp Nomad over Kubernetes for several reasons:
Advantages:
- ✅ Simplicity - Single 30MB binary vs K8s complexity
- ✅ Lower Overhead - Runs on smaller instances efficiently
- ✅ Fast Scheduling - Sub-second job placement
- ✅ Multi-Workload - Containers, VMs, binaries in one system
- ✅ Cost - Significantly lower operational overhead
Trade-offs:
- ❌ Smaller ecosystem compared to K8s
- ❌ Fewer third-party integrations
For our use case (short-lived, isolated workloads), Nomad's simplicity and speed win.
Nomad Cluster Architecture
Server Nodes (3):
- Run Raft consensus for state management
- Schedule jobs across client nodes
- Handle API requests from Public API service
- Automatically fail over if leader dies
Client Nodes (Auto-scaled):
- Execute workloads in isolated containers
- Report resource availability to servers
- Auto-scale based on pending job queue
- Drain and terminate when idle (cost optimization)
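As a rough illustration of the "drain and terminate when idle" step, here is a hedged sketch that checks a client for running allocations via Nomad's HTTP API and drains it when empty. The server address, threshold, and surrounding lifecycle logic are placeholders, not our actual autoscaler code.

import requests

NOMAD_ADDR = "http://nomad-server.internal:4646"  # placeholder address

def drain_if_idle(node_id: str) -> None:
    # List allocations on the node and keep only the ones still running.
    allocs = requests.get(f"{NOMAD_ADDR}/v1/node/{node_id}/allocations", timeout=5).json()
    running = [a for a in allocs if a["ClientStatus"] == "running"]
    if not running:
        # Drain the node so Nomad stops placing work on it before the VM is removed.
        requests.post(
            f"{NOMAD_ADDR}/v1/node/{node_id}/drain",
            json={"DrainSpec": {"Deadline": 60 * 1_000_000_000}},  # 60s, in nanoseconds
            timeout=5,
        )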
Job Specification Example
Here's how a code execution job looks in Nomad:
job "code-execution" {
datacenters = ["dc1"]
type = "batch"
group "interpreter" {
count = 1
# Restart policy for transient failures
restart {
attempts = 2
delay = "15s"
mode = "fail"
}
task "execute" {
driver = "docker"
config {
image = "cognitora-runtime:python3.11"
# User code injected here
args = [
"python3", "-c",
"${user_code}"
]
# Resource limits
cpu_hard_limit = true
memory_hard_limit = 512
# Networking control
network_mode = "${networking_enabled ? "bridge" : "none"}"
}
resources {
cpu = 1000 # 1 CPU core
memory = 512 # 512 MB
}
# Timeout enforcement
kill_timeout = "30s"
}
}
}
Key Features:
- Resource Isolation - CPU/memory hard limits enforced
- Network Control - Enable/disable per-job
- Time Limits - Automatic termination after timeout
- Restart Policy - Handle transient failures
API Services
Public API (Go)
The Public API is our main user-facing service, written in Go for performance and concurrency.
Architecture:
// Simplified service structure
type PublicAPI struct {
nomadClient *nomad.Client
redisCache *redis.Client
supabaseDB *supabase.Client
sessionPool *SessionPool
}
// Request flow
func (api *PublicAPI) ExecuteCode(req ExecuteRequest) (*ExecutionResult, error) {
// 1. Authentication & Authorization
user, err := api.authenticateAPIKey(req.APIKey)
if err != nil {
return nil, ErrUnauthorized
}
// 2. Cost Estimation
cost := api.calculateCost(req.Resources)
if user.Credits < cost {
return nil, ErrInsufficientCredits
}
// 3. Session Management
session := api.sessionPool.GetOrCreate(req.SessionID, req.Language)
// 4. Job Submission to Nomad
job := api.buildNomadJob(req, session)
allocation, err := api.nomadClient.SubmitJob(job)
if err != nil {
return nil, err
}
// 5. Result Polling & Streaming
result := api.waitForResult(allocation.ID)
// 6. Deduct Credits
api.deductCredits(user.ID, cost)
return result, nil
}
Key Responsibilities:
- Authentication - Validate API keys against Supabase
- Rate Limiting - Redis-based rate limiting per account
- Cost Calculation - Compute credits based on resources (illustrative sketch after this list)
- Job Orchestration - Submit jobs to Nomad, track status
- Session Pooling - Reuse warm sessions for performance
- Billing Integration - Track usage, deduct credits
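For a sense of how cost estimation could work, here is an illustrative sketch. The rates and formula are invented for the example and are not Cognitora's actual pricing; the real calculateCost lives in the Go service above.

# Illustrative only: estimate credits from requested resources.
# The per-second rates below are made-up placeholders, not real pricing.
def estimate_credits(cpu_cores: float, memory_mb: int, timeout_s: int) -> float:
    CPU_RATE = 0.01    # credits per core-second (assumed)
    MEM_RATE = 0.002   # credits per GB-second (assumed)
    cpu_cost = cpu_cores * timeout_s * CPU_RATE
    mem_cost = (memory_mb / 1024) * timeout_s * MEM_RATE
    return round(cpu_cost + mem_cost, 4)

print(estimate_credits(cpu_cores=1.0, memory_mb=512, timeout_s=60))  # -> 0.66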
Performance Optimizations:
- Connection Pooling - Reuse Nomad/Redis connections
- Request Coalescing - Batch similar requests
- Caching - Cache user data, API key validation results
- Async Processing - Background jobs for non-critical paths
Why Serverless Edge Services?
The API layer is deployed on managed, serverless infrastructure:
- ✅ Zero Management - Automatic scaling, health checks
- ✅ Pay-Per-Request - Efficient cost model
- ✅ Global Edge - Low latency worldwide
- ✅ Auto-Scaling - Instant scaling to handle traffic spikes
Web Application (Next.js)
Our user dashboard is a Next.js 15 application, deployed on a managed platform:
Features:
- Authentication - Supabase Auth integration
- Dashboard - Real-time execution monitoring
- API Key Management - Generate, rotate, revoke keys
- Billing - Usage tracking, Stripe integration
- Analytics - Execution history, cost breakdown
Tech Stack:
- Next.js 15 - React 19, Server Components
- Supabase - PostgreSQL with Row-Level Security
- Tailwind CSS - Modern, responsive UI
- Stripe - Payment processing
- Google Analytics - User behavior tracking
Runtime Images & Execution
Custom Runtime Images
We maintain several custom Docker images for different use cases:
1. Code Interpreter Runtime
FROM python:3.11-slim
# Install common packages
RUN pip install --no-cache-dir \
pandas numpy scipy scikit-learn \
requests beautifulsoup4 matplotlib \
sqlalchemy psycopg2-binary
# Security: Non-root user
RUN useradd -m -u 1001 coderunner
USER coderunner
# Disable networking by default (enabled per-request)
ENV NETWORK_ENABLED=false
CMD ["python3"]
Optimizations:
- Layer Caching - Common layers shared across runs
- Minimal Base - Alpine/slim variants for fast pulls
- Pre-warmed - Common packages pre-installed
- Multi-Language - Python, Node.js, Go, Rust, R variants
2. Code Server Runtime (Interactive IDE)
For interactive coding experiences, we provide VS Code in the browser:
FROM codercom/code-server:latest
# Cognitora extensions
COPY extensions/ /extensions/
RUN code-server --install-extension /extensions/cognitora-runner.vsix
# Workspace setup
COPY workspace/ /home/coder/workspace/
EXPOSE 8080
CMD ["code-server", "--bind-addr", "0.0.0.0:8080"]
Use Cases:
- Persistent development environments
- Team collaboration spaces
- Educational platforms (coding bootcamps)
3. Agent Runtime (AI Agents)
For AI agent workloads (like our OpenAI Agents SDK integration):
FROM python:3.11-slim
# Agent dependencies
RUN pip install --no-cache-dir \
openai anthropic \
cognitora \
agents-sdk
# Agent tools
COPY agent_tools/ /app/tools/
CMD ["python3", "-m", "agents"]
Execution Flow
Timing Breakdown:
- Image Pull: ~200ms (cached) / ~2s (cold)
- Container Start: ~150ms
- Code Execution: Variable (user code)
- Result Collection: ~50ms
- Total Overhead: ~400-500ms
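If you want to see this overhead from the outside, a quick way is to time a trivial execution with the Python SDK; wall-clock numbers include network round trips, so expect slightly more than the server-side figures above.

import time
from cognitora import Cognitora

client = Cognitora(api_key="cgk_...")

# Time a trivial execution end to end; includes network latency on top of
# the platform overhead listed above.
start = time.perf_counter()
client.code_interpreter.execute(code="print('ok')", language="python")
print(f"end-to-end: {(time.perf_counter() - start) * 1000:.0f} ms")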
Networking & Security
Multi-Layer Security Model
Networking Control
Users have granular control over networking:
Code Interpreter:
# Networking ENABLED (default for interpreter)
result = client.code_interpreter.execute(
code="import requests; print(requests.get('https://api.github.com').status_code)",
language="python",
networking=True # Can make external API calls
)
Containers:
# Networking DISABLED (default for containers)
execution = client.containers.create_container(
image="python:3.11-slim",
command=["python", "-c", "print('Hello')"],
networking=False # Completely isolated
)
Security Rationale:
- Code Interpreter: Default networking ON (common use case: data fetching)
- Containers: Default networking OFF (principle of least privilege)
Reverse Proxy for Container Access
The Reverse Proxy is a separate service that provides external access to internal container services (like web apps running in containers) via friendly subdomain URLs.
How It Works:
Container Service (10.2.0.24:25001)
↓
Generate Token: "green-dew-15389ucymd"
↓
Public URL: https://green-dew-15389ucymd.cgn.my
↓
User accesses container via friendly URL
Key Features:
- Token-Based Routing - Encodes IP:Port into subdomain tokens (sketched below)
- Example: 10.2.0.24:25001 → green-dew-15389ucymd.cgn.my
- Zero-Latency Lookups - No database queries required
- Port Security - Only allows Nomad port range (20000-32000)
- Private Network Only - Routes only to internal VPC addresses
- Heroku-Style URLs - Human-readable adjective-noun-token format
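To make the token idea concrete, here is a purely illustrative sketch of packing a private IP:port pair into a base-36 token and decoding it on the way back in. The word lists and packing scheme are assumptions for the example, not the proxy's actual algorithm.

import random

ADJECTIVES = ["green", "quiet", "brisk"]
NOUNS = ["dew", "fox", "lake"]
BASE36 = "0123456789abcdefghijklmnopqrstuvwxyz"

def encode(ip: str, port: int) -> str:
    # Pack four IPv4 octets plus a 16-bit port into one integer, then base-36 encode it.
    a, b, c, d = (int(x) for x in ip.split("."))
    packed = (a << 40) | (b << 32) | (c << 24) | (d << 16) | port
    token = ""
    while packed:
        packed, rem = divmod(packed, 36)
        token = BASE36[rem] + token
    return f"{random.choice(ADJECTIVES)}-{random.choice(NOUNS)}-{token}"

def decode(subdomain: str) -> tuple[str, int]:
    # Reverse the packing: no database lookup needed, just arithmetic.
    packed = int(subdomain.rsplit("-", 1)[1], 36)
    port = packed & 0xFFFF
    d = (packed >> 16) & 0xFF
    c = (packed >> 24) & 0xFF
    b = (packed >> 32) & 0xFF
    a = (packed >> 40) & 0xFF
    return f"{a}.{b}.{c}.{d}", port

sub = encode("10.2.0.24", 25001)
print(sub, decode(sub))  # round-trips back to ('10.2.0.24', 25001)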
Use Case Example:
# User deploys a web server container
container = client.containers.create_container(
image="nginx:latest",
port_mapping={8080: "http"}, # Expose port 8080
)
# Platform generates friendly URL
# Container internal: http://10.2.0.24:25001
# Public URL: https://green-dew-15389ucymd.cgn.my
Architecture Integration:
Internet → Load Balancer → Reverse Proxy → VPC Connector → Container (10.2.0.24:25001)
This service is completely separate from the Public API and is specifically designed for exposing containerized web services securely.
Secret Management
Sensitive credentials never touch our code:
# Example: Supabase service key
resource "google_secret_manager_secret" "supabase_key" {
secret_id = "supabase-service-role-key"
replication {
automatic = true
}
}
# Accessed via environment variables
env {
name = "SUPABASE_KEY"
value_from {
secret_key_ref {
name = "supabase-service-role-key"
key = "latest"
}
}
}
Best Practices:
- ✅ Secrets in Google Secret Manager
- ✅ Automatic rotation where possible
- ✅ Audit logs for secret access
- ✅ Never logged or exposed in responses
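At runtime the value arrives via the environment variable shown above; for completeness, a service could also fetch it directly with the official Secret Manager client library (the project ID below is a placeholder):

from google.cloud import secretmanager

def get_secret(project_id: str, secret_id: str, version: str = "latest") -> str:
    # Read the secret payload from Google Secret Manager.
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("utf-8")

supabase_key = get_secret("your-gcp-project", "supabase-service-role-key")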
Data Layer
Supabase (PostgreSQL)
We use Supabase as our primary database for:
- User Accounts - Authentication, profiles, settings
- API Keys - Key management with Row-Level Security (RLS)
- Execution History - Past executions, logs, results
- Billing Data - Credits, subscriptions, transactions
- Usage Analytics - Aggregated metrics per account
Why Supabase?
- ✅ PostgreSQL - Full SQL power
- ✅ Row-Level Security - Database-enforced multi-tenancy
- ✅ Real-time Subscriptions - Live dashboard updates
- ✅ Built-in Auth - OAuth, magic links, etc.
- ✅ Managed - Automatic backups, high availability
Schema Example:
-- API Keys table with RLS
CREATE TABLE api_keys (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
account_id UUID REFERENCES accounts(id),
key_hash TEXT NOT NULL,
name TEXT,
permissions TEXT[],
created_at TIMESTAMPTZ DEFAULT now(),
last_used_at TIMESTAMPTZ,
expires_at TIMESTAMPTZ
);
-- RLS Policy: Users can only see their own keys
ALTER TABLE api_keys ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users can view own keys"
ON api_keys FOR SELECT
USING (account_id = auth.uid());
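Validation against key_hash could then look like the following sketch; the SHA-256 scheme is an assumption for illustration, not necessarily the exact hashing we use in production.

import hashlib
import hmac

# Assumes API keys are stored as SHA-256 hex digests (illustrative assumption).
def hash_api_key(raw_key: str) -> str:
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()

def is_valid(raw_key: str, stored_hash: str) -> bool:
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(hash_api_key(raw_key), stored_hash)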
Redis Cache (Memorystore)
Redis handles high-velocity, ephemeral data:
Use Cases:
- Session Pooling - Prewarmed session state
- Rate Limiting - Per-user request counters (minimal sketch below)
- API Key Cache - Avoid DB hits on every request
- Job Queue - Background task processing
- Real-time Metrics - Execution counts, uptime
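As an example of the rate-limiting pattern, a minimal fixed-window limiter on Redis looks roughly like this; the key names, limits, and host address are illustrative, not our production values.

import redis

r = redis.Redis(host="10.0.0.5", port=6379)  # Memorystore private IP (placeholder)

def allow_request(account_id: str, limit: int = 100, window_s: int = 60) -> bool:
    key = f"ratelimit:{account_id}"
    count = r.incr(key)          # atomic per-window counter
    if count == 1:
        r.expire(key, window_s)  # start the window on the first request
    return count <= limit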
Configuration:
resource "google_redis_instance" "cognitora_redis" {
name = "cognitora-redis"
tier = "BASIC"
memory_size_gb = 1
region = var.region
redis_version = "REDIS_7_0"
redis_configs = {
maxmemory-policy = "allkeys-lru" # Evict least-recently-used
timeout = "300" # Close idle connections
}
}
Performance:
- Low latency - Sub-millisecond to low single-digit-millisecond reads/writes
- High throughput - 10,000+ ops/sec on BASIC tier
- Persistence - RDB snapshots for recovery
Cloud Storage
Google Cloud Storage for:
- Runtime Images - Docker image layers
- Execution Logs - Long-term log retention
- User Files - Uploaded files for code execution
- Backups - Database and configuration backups
Lifecycle Policies:
resource "google_storage_bucket" "execution_logs" {
name = "cognitora-execution-logs"
location = "US"
lifecycle_rule {
condition {
age = 90 # Days
}
action {
type = "Delete" # Auto-delete old logs
}
}
}
Client SDKs
We provide first-class SDKs for Python and JavaScript/TypeScript with full feature parity.
SDK Architecture
Python SDK
# Installation
pip install cognitora
# Usage
from cognitora import Cognitora
client = Cognitora(api_key="cgk_...")
# Execute code
result = client.code_interpreter.execute(
code="""
import pandas as pd
data = pd.DataFrame({'a': [1, 2, 3]})
print(data.describe())
""",
language="python",
networking=True
)
print(result.data.outputs[0].data)
JavaScript/TypeScript SDK
// Installation
npm install @cognitora/sdk
// Usage
import { Cognitora } from '@cognitora/sdk';
const client = new Cognitora({ apiKey: 'cgk_...' });
// Execute code
const result = await client.codeInterpreter.execute({
code: `
const data = [1, 2, 3, 4, 5];
console.log(data.reduce((a, b) => a + b, 0));
`,
language: 'javascript',
networking: true
});
console.log(result.data.outputs[0].data);
SDK Features
Both SDKs provide:
- ✅ Type Safety - TypeScript definitions / Python type hints
- ✅ Error Handling - Custom exception classes
- ✅ Retry Logic - Automatic retries with backoff (pattern sketched below)
- ✅ File Uploads - Multipart form data handling
- ✅ Async Support - Promise/async-await patterns
- ✅ Session Management - Stateful execution contexts
- ✅ Streaming - Real-time output streaming (coming soon)
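The SDKs handle retries internally, but if you want to layer your own policy on top, the backoff pattern looks like this sketch (attempt counts and delays are illustrative):

import time
from cognitora import Cognitora

client = Cognitora(api_key="cgk_...")

def execute_with_retry(code: str, attempts: int = 3):
    # Exponential backoff: wait 1s, 2s, 4s ... between attempts.
    for attempt in range(attempts):
        try:
            return client.code_interpreter.execute(code=code, language="python")
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)

print(execute_with_retry("print(40 + 2)").data.outputs[0].data)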
Scaling Strategy
Horizontal Scaling
Load Increases → Auto-Scaler Adds Nomad Clients → More Capacity
Load Decreases → Auto-Scaler Drains & Removes Nodes → Cost Savings
Auto-Scaling Configuration:
# Autoscaler for the Nomad client managed instance group
resource "google_compute_region_autoscaler" "nomad_clients" {
name = "nomad-client-autoscaler"
target = google_compute_region_instance_group_manager.nomad_clients.id
autoscaling_policy {
min_replicas = 3
max_replicas = 50
cpu_utilization {
target = 0.7 # Scale up at 70% CPU
}
scale_in_control {
max_scaled_in_replicas {
fixed = 5 # Remove max 5 nodes at once
}
time_window_sec = 300 # 5-minute trailing window for scale-in decisions
}
}
}
Scaling Metrics:
- CPU Utilization - Average across all clients
- Pending Jobs - Queue depth in Nomad (sketch below)
- Memory Pressure - Available memory per node
- Active Allocations - Running containers per node
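The pending-jobs signal can be read straight from Nomad's HTTP API. A simplified sketch follows; the server address and threshold are placeholders, and the real decision logic lives in the autoscaler.

import requests

NOMAD_ADDR = "http://nomad-server.internal:4646"  # placeholder

def pending_job_count() -> int:
    # /v1/jobs returns job stubs whose Status is pending, running, or dead.
    jobs = requests.get(f"{NOMAD_ADDR}/v1/jobs", timeout=5).json()
    return sum(1 for job in jobs if job["Status"] == "pending")

if pending_job_count() > 10:
    print("job queue is backing up -> add Nomad clients")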
Vertical Scaling
For resource-intensive workloads, we support custom instance types:
# Example: High-memory workload
execution = client.containers.create_container(
image="cognitora/ml-runtime:latest",
command=["python", "train.py"],
cpu_cores=8.0,
memory_mb=32768,
max_cost_credits=1000
)
Session Pooling
To serve most requests without paying container startup costs, we maintain a pool of prewarmed sessions (conceptual sketch after the list below):
Benefits:
- ⚡ <100ms response time - No container startup
- 🔥 Preloaded packages - pandas, requests, etc.
- 🔄 Auto-replenishment - Pool refills in background
- 💰 Cost optimization - Reuse instead of recreate
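Conceptually, the pool behaves like the sketch below. The real implementation lives server-side in the Go API and Redis, so treat this as an illustration of the acquire/release pattern rather than production code.

from collections import deque

class SessionPool:
    """Illustrative prewarmed-session pool: acquire fast, release for reuse."""

    def __init__(self, create_session, size: int = 5):
        self._create = create_session
        self._warm = deque(create_session() for _ in range(size))  # prewarm upfront

    def acquire(self):
        # Hand out a warm session instantly; fall back to a cold create if empty.
        return self._warm.popleft() if self._warm else self._create()

    def release(self, session) -> None:
        self._warm.append(session)  # reuse instead of recreating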
Observability & Monitoring
Metrics & Logging
Google Cloud Operations (formerly Stackdriver) provides:
Metrics:
- Request latency (p50, p95, p99)
- Error rates by endpoint
- Resource utilization (CPU, memory, disk)
- Cost per execution
- Active users
Logging:
- Application logs (structured JSON)
- Audit logs (who did what, when)
- Execution logs (user code output)
- Error logs with stack traces
Tracing:
- End-to-end request tracing
- Nomad job lifecycle
- Database query performance
Dashboard Example:
Alerting
Proactive monitoring catches issues before users notice:
# Example alert policy
alert:
name: "High Error Rate"
condition: error_rate > 1% for 5 minutes
notification:
- email: ops@cognitora.dev
- slack: #alerts
- pagerduty: on-call
actions:
- auto_scale_up: true
- trigger_incident: true
Alert Categories:
- 🚨 Critical - Service down, data loss risk
- ⚠️ Warning - High latency, resource saturation
- ℹ️ Info - Deployments, configuration changes
Performance & Efficiency
Resource Optimization
Our infrastructure is designed for maximum efficiency:
Optimization Techniques:
- Preemptible VMs - Significant cost reduction on worker nodes
- Committed Use Discounts - Long-term capacity planning
- Idle Node Termination - Auto-remove unused workers after 10 minutes
- Image Layer Caching - Reuse common base layers across executions
- Session Pooling - Amortize cold start costs with prewarmed sessions
- Egress Optimization - Cache external API responses
- Auto-Scaling - Dynamic capacity adjustment based on real-time demand
- Resource Packing - Efficient bin-packing algorithm for container placement
Performance Metrics:
- Cold Start: <500ms (with caching)
- Warm Start: <100ms (from session pool)
- Throughput: 10,000+ requests/minute
- Availability: 99.9%+ uptime
Future Architecture
Roadmap
Q2 2025:
- 🔲 WebSocket support for real-time streaming
- 🔲 Multi-region deployment (US, EU, Asia)
- 🔲 GPU support for ML workloads
Q3 2025:
- 🔲 Kubernetes option (alongside Nomad)
- 🔲 Spot instance support (90% cost reduction)
- 🔲 Custom runtime images (user-provided Dockerfiles)
Q4 2025:
- 🔲 Edge execution (Cloudflare Workers integration)
- 🔲 FaaS-style deployment (serverless containers)
- 🔲 Workflow orchestration (DAG-based pipelines)
Challenges Ahead
Technical Challenges:
- Global Low Latency - Edge execution in <50ms worldwide
- State Management - Distributed sessions across regions
- Cost at Scale - Maintaining low costs as volume grows
- Security - Advanced isolation (VMs, microVMs)
Business Challenges:
- Compliance - SOC2, ISO 27001, HIPAA
- Enterprise Features - SSO, audit logs, VPC peering
- Reliability - 99.99% uptime SLA
Conclusion
Building Cognitora has been a journey in balancing security, performance, and efficiency. Our architecture choices reflect real-world trade-offs:
- Nomad over Kubernetes - Simplicity and speed over ecosystem size
- Serverless edge services - Managed simplicity with automatic scaling
- Custom runtime images - Performance optimization for common use cases
- GCP foundation - Leveraging managed services for operational efficiency
The result is a platform that delivers:
- ⚡ Sub-second cold starts
- 🔒 Enterprise-grade security
- 📊 99.9%+ uptime
- 🚀 Horizontal scalability
Want to Learn More?
- 🚀 Try Cognitora: cognitora.dev
- 📚 API Documentation: cognitora.dev/docs
- 🐍 Python SDK: pip install cognitora
- 📦 JavaScript SDK: npm install @cognitora/sdk
- 💬 GitHub: github.com/Cognitora
Questions? Feedback? We'd love to hear from you: hello@cognitora.dev
Built with ❤️ by the Cognitora team
Last updated: January 2025