Building a production-grade code execution platform that's both secure and fast is no small feat. At Cognitora, we've architected a system that delivers sub-second cold starts, complete workload isolation, and horizontal scalability—all while maintaining enterprise-level security.
This article takes you inside our architecture, explaining the technical decisions, trade-offs, and infrastructure patterns that power Cognitora.
Table of Contents
- Architecture Overview
- Infrastructure Layer
- Orchestration with Nomad
- API Services
- Runtime Images & Execution
- Networking & Security
- Data Layer
- Client SDKs
- Scaling Strategy
- Observability & Monitoring
- Performance & Efficiency
- Future Architecture
Architecture Overview
Cognitora is built on a microservices architecture running on Google Cloud Platform (GCP), with HashiCorp Nomad as the workload orchestrator. The platform handles two primary workload types:
- Code Interpreter - Stateful, session-based code execution (Bash, Python, Node.js/JavaScript)
- Containers - Custom Docker images with full resource control
High-Level Architecture
Infrastructure Layer
Google Cloud Platform Foundation
Our infrastructure is defined entirely in Terraform, ensuring reproducibility and version control. Key GCP components:
Virtual Private Cloud (VPC)
# Custom VPC with private subnets
resource "google_compute_network" "vpc" {
name = "cognitora-network"
auto_create_subnetworks = false
}
# Worker subnet (private) - 10.2.0.0/16
resource "google_compute_subnetwork" "worker_subnet" {
name = "cognitora-worker-subnet"
ip_cidr_range = "10.2.0.0/16"
region = var.region
network = google_compute_network.vpc.id
private_ip_google_access = true
}
Design Decision: We use private IP ranges with Cloud NAT for outbound internet access. This ensures:
- Worker nodes never expose public IPs
- All ingress traffic flows through load balancers
- Complete network isolation between tenants
Cloud NAT Gateway
For workloads that need outbound internet (API calls, package downloads), we use Cloud NAT:
resource "google_compute_router_nat" "vpc_nat" {
name = "cognitora-nat"
router = google_compute_router.vpc_router.name
region = var.region
nat_ip_allocate_option = "AUTO_ONLY"
source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}
Why This Matters: Users can opt-in to networking for their executions. When enabled, traffic flows through NAT with controlled egress—no direct internet exposure for execution sandboxes.
Firewall Architecture
Security is enforced at the network level with defense-in-depth:
- Default Deny - All traffic blocked by default
- Explicit Allow - Only specific ports/protocols opened
- Tag-Based Rules - Firewalls target specific instance groups
- Internal-Only - Most services communicate via private IPs
# Example: Internal Nomad communication
resource "google_compute_firewall" "nomad_internal_communication" {
name = "cognitora-nomad-internal"
network = google_compute_network.vpc.name
source_tags = ["nomad-cluster"]
target_tags = ["nomad-cluster"]
allow {
protocol = "tcp"
ports = ["0-65535"] # Full internal trust within cluster
}
}
Orchestration with Nomad
Why Nomad Over Kubernetes?
We chose HashiCorp Nomad over Kubernetes for several reasons:
Advantages:
- ✅ Simplicity - Single 30MB binary vs K8s complexity
- ✅ Lower Overhead - Runs on smaller instances efficiently
- ✅ Fast Scheduling - Sub-second job placement
- ✅ Multi-Workload - Containers, VMs, binaries in one system
- ✅ Cost - Significantly lower operational overhead
Trade-offs:
- ❌ Smaller ecosystem compared to K8s
- ❌ Fewer third-party integrations
For our use case (short-lived, isolated workloads), Nomad's simplicity and speed win.
Nomad Cluster Architecture
Server Nodes (3):
- Run Raft consensus for state management
- Schedule jobs across client nodes
- Handle API requests from Public API service
- Automatically fail over if leader dies
Client Nodes (Auto-scaled):
- Execute workloads in isolated containers
- Report resource availability to servers
- Auto-scale based on pending job queue
- Drain and terminate when idle (cost optimization)
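As a rough illustration of the "drain and terminate when idle" step, here is a hedged sketch that checks a client for running allocations via Nomad's HTTP API and drains it when empty. The server address, threshold, and surrounding lifecycle logic are placeholders, not our actual autoscaler code.

import requests

NOMAD_ADDR = "http://nomad-server.internal:4646"  # placeholder address

def drain_if_idle(node_id: str) -> None:
    # List allocations on the node and keep only the ones still running.
    allocs = requests.get(f"{NOMAD_ADDR}/v1/node/{node_id}/allocations", timeout=5).json()
    running = [a for a in allocs if a["ClientStatus"] == "running"]
    if not running:
        # Drain the node so Nomad stops placing work on it before the VM is removed.
        requests.post(
            f"{NOMAD_ADDR}/v1/node/{node_id}/drain",
            json={"DrainSpec": {"Deadline": 60 * 1_000_000_000}},  # 60s, in nanoseconds
            timeout=5,
        )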
Job Specification Example
Here's how a code execution job looks in Nomad:
job "code-execution" {
datacenters = ["dc1"]
type = "batch"
group "interpreter" {
count = 1
# Restart policy for transient failures
restart {
attempts = 2
delay = "15s"
mode = "fail"
}
task "execute" {
driver = "docker"
config {
image = "cognitora-runtime:python3.11"
# User code injected here
args = [
"python3", "-c",
"${user_code}"
]
# Resource limits
cpu_hard_limit = true
memory_hard_limit = 512
# Networking control
network_mode = "${networking_enabled ? "bridge" : "none"}"
}
resources {
cpu = 1000 # 1 CPU core
memory = 512 # 512 MB
}
# Timeout enforcement
kill_timeout = "30s"
}
}
}
Key Features:
- Resource Isolation - CPU/memory hard limits enforced
- Network Control - Enable/disable per-job
- Time Limits - Automatic termination after timeout
- Restart Policy - Handle transient failures
API Services
Public API (Go)
The Public API is our main user-facing service, written in Go for performance and concurrency.
Architecture:
// Simplified service structure
type PublicAPI struct {
nomadClient *nomad.Client
redisCache *redis.Client
supabaseDB *supabase.Client
sessionPool *SessionPool
}
// Request flow
func (api *PublicAPI) ExecuteCode(req ExecuteRequest) (*ExecutionResult, error) {
// 1. Authentication & Authorization
user, err := api.authenticateAPIKey(req.APIKey)
if err != nil {
return nil, ErrUnauthorized
}
// 2. Cost Estimation
cost := api.calculateCost(req.Resources)
if user.Credits < cost {
return nil, ErrInsufficientCredits
}
// 3. Session Management
session := api.sessionPool.GetOrCreate(req.SessionID, req.Language)
// 4. Job Submission to Nomad
job := api.buildNomadJob(req, session)
allocation, err := api.nomadClient.SubmitJob(job)
if err != nil {
return nil, err
}
// 5. Result Polling & Streaming
result := api.waitForResult(allocation.ID)
// 6. Deduct Credits
api.deductCredits(user.ID, cost)
return result, nil
}
Key Responsibilities:
- Authentication - Validate API keys against Supabase
- Rate Limiting - Redis-based rate limiting per account
- Cost Calculation - Compute credits based on resources (illustrative sketch after this list)
- Job Orchestration - Submit jobs to Nomad, track status
- Session Pooling - Reuse warm sessions for performance
- Billing Integration - Track usage, deduct credits
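For a sense of how cost estimation could work, here is an illustrative sketch. The rates and formula are invented for the example and are not Cognitora's actual pricing; the real calculateCost lives in the Go service above.

# Illustrative only: estimate credits from requested resources.
# The per-second rates below are made-up placeholders, not real pricing.
def estimate_credits(cpu_cores: float, memory_mb: int, timeout_s: int) -> float:
    CPU_RATE = 0.01    # credits per core-second (assumed)
    MEM_RATE = 0.002   # credits per GB-second (assumed)
    cpu_cost = cpu_cores * timeout_s * CPU_RATE
    mem_cost = (memory_mb / 1024) * timeout_s * MEM_RATE
    return round(cpu_cost + mem_cost, 4)

print(estimate_credits(cpu_cores=1.0, memory_mb=512, timeout_s=60))  # -> 0.66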
Performance Optimizations:
- Connection Pooling - Reuse Nomad/Redis connections
- Request Coalescing - Batch similar requests
- Caching - Cache user data, API key validation results
- Async Processing - Background jobs for non-critical paths
Why Serverless Edge Services?
The API layer is deployed on managed, serverless infrastructure:
- ✅ Zero Management - Automatic scaling, health checks
- ✅ Pay-Per-Request - Efficient cost model
- ✅ Global Edge - Low latency worldwide
- ✅ Auto-Scaling - Instant scaling to handle traffic spikes
Web Application (Next.js)
Our user dashboard is a Next.js 15 application, deployed on a managed platform:
Features:
- Authentication - Supabase Auth integration
- Dashboard - Real-time execution monitoring
- API Key Management - Generate, rotate, revoke keys
- Billing - Usage tracking, Stripe integration
- Analytics - Execution history, cost breakdown
Tech Stack:
- Next.js 15 - React 19, Server Components
- Supabase - PostgreSQL with Row-Level Security
- Tailwind CSS - Modern, responsive UI
- Stripe - Payment processing
- Google Analytics - User behavior tracking
Runtime Images & Execution
Custom Runtime Images
We maintain several custom Docker images for different use cases:
1. Code Interpreter Runtime
FROM python:3.11-slim
# Install common packages
RUN pip install --no-cache-dir \
pandas numpy scipy scikit-learn \
requests beautifulsoup4 matplotlib \
sqlalchemy psycopg2-binary
# Security: Non-root user
RUN useradd -m -u 1001 coderunner
USER coderunner
# Disable networking by default (enabled per-request)
ENV NETWORK_ENABLED=false
CMD ["python3"]
Optimizations:
- Layer Caching - Common layers shared across runs
- Minimal Base - Alpine/slim variants for fast pulls
- Pre-warmed - Common packages pre-installed
- Multi-Language - Python, Node.js, Go, Rust, R variants
2. Code Server Runtime (Interactive IDE)
For interactive coding experiences, we provide VS Code in the browser:
FROM codercom/code-server:latest
# Cognitora extensions
COPY extensions/ /extensions/
RUN code-server --install-extension /extensions/cognitora-runner.vsix
# Workspace setup
COPY workspace/ /home/coder/workspace/
EXPOSE 8080
CMD ["code-server", "--bind-addr", "0.0.0.0:8080"]
Use Cases:
- Persistent development environments
- Team collaboration spaces
- Educational platforms (coding bootcamps)
3. Agent Runtime (AI Agents)
For AI agent workloads (like our OpenAI Agents SDK integration):
FROM python:3.11-slim
# Agent dependencies
RUN pip install --no-cache-dir \
openai anthropic \
cognitora \
agents-sdk
# Agent tools
COPY agent_tools/ /app/tools/
CMD ["python3", "-m", "agents"]
Execution Flow
Timing Breakdown:
- Image Pull: ~200ms (cached) / ~2s (cold)
- Container Start: ~150ms
- Code Execution: Variable (user code)
- Result Collection: ~50ms
- Total Overhead: ~400-500ms
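If you want to see this overhead from the outside, a quick way is to time a trivial execution with the Python SDK; wall-clock numbers include network round trips, so expect slightly more than the server-side figures above.

import time
from cognitora import Cognitora

client = Cognitora(api_key="cgk_...")

# Time a trivial execution end to end; includes network latency on top of
# the platform overhead listed above.
start = time.perf_counter()
client.code_interpreter.execute(code="print('ok')", language="python")
print(f"end-to-end: {(time.perf_counter() - start) * 1000:.0f} ms")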
Networking & Security
Multi-Layer Security Model
Networking Control
Users have granular control over networking:
Code Interpreter:
# Networking ENABLED (default for interpreter)
result = client.code_interpreter.execute(
code="import requests; print(requests.get('https://api.github.com').status_code)",
language="python",
networking=True # Can make external API calls
)
Containers:
# Networking DISABLED (default for containers)
execution = client.containers.create_container(
image="python:3.11-slim",
command=["python", "-c", "print('Hello')"],
networking=False # Completely isolated
)
Security Rationale:
- Code Interpreter: Default networking ON (common use case: data fetching)
- Containers: Default networking OFF (principle of least privilege)
Reverse Proxy for Container Access
The Reverse Proxy is a separate service that provides external access to internal container services (like web apps running in containers) via friendly subdomain URLs.
How It Works:
Container Service (10.2.0.24:25001)
↓
Generate Token: "green-dew-15389ucymd"
↓
Public URL: https://green-dew-15389ucymd.cgn.my
↓
User accesses container via friendly URL
Key Features:
- Token-Based Routing - Encodes IP:Port into subdomain tokens (sketched below)
- Example: 10.2.0.24:25001 → green-dew-15389ucymd.cgn.my
- Zero-Latency Lookups - No database queries required
- Port Security - Only allows Nomad port range (20000-32000)
- Private Network Only - Routes only to internal VPC addresses
- Heroku-Style URLs - Human-readable adjective-noun-token format
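To make the token idea concrete, here is a purely illustrative sketch of packing a private IP:port pair into a base-36 token and decoding it on the way back in. The word lists and packing scheme are assumptions for the example, not the proxy's actual algorithm.

import random

ADJECTIVES = ["green", "quiet", "brisk"]
NOUNS = ["dew", "fox", "lake"]
BASE36 = "0123456789abcdefghijklmnopqrstuvwxyz"

def encode(ip: str, port: int) -> str:
    # Pack four IPv4 octets plus a 16-bit port into one integer, then base-36 encode it.
    a, b, c, d = (int(x) for x in ip.split("."))
    packed = (a << 40) | (b << 32) | (c << 24) | (d << 16) | port
    token = ""
    while packed:
        packed, rem = divmod(packed, 36)
        token = BASE36[rem] + token
    return f"{random.choice(ADJECTIVES)}-{random.choice(NOUNS)}-{token}"

def decode(subdomain: str) -> tuple[str, int]:
    # Reverse the packing: no database lookup needed, just arithmetic.
    packed = int(subdomain.rsplit("-", 1)[1], 36)
    port = packed & 0xFFFF
    d = (packed >> 16) & 0xFF
    c = (packed >> 24) & 0xFF
    b = (packed >> 32) & 0xFF
    a = (packed >> 40) & 0xFF
    return f"{a}.{b}.{c}.{d}", port

sub = encode("10.2.0.24", 25001)
print(sub, decode(sub))  # round-trips back to ('10.2.0.24', 25001)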
Use Case Example:
# User deploys a web server container
container = client.containers.create_container(
image="nginx:latest",
port_mapping={8080: "http"}, # Expose port 8080
)
# Platform generates friendly URL
# Container internal: http://10.2.0.24:25001
# Public URL: https://green-dew-15389ucymd.cgn.my
Architecture Integration:
Internet → Load Balancer → Reverse Proxy → VPC Connector → Container (10.2.0.24:25001)
This service is completely separate from the Public API and is specifically designed for exposing containerized web services securely.
Secret Management
Sensitive credentials never touch our code:
# Example: Supabase service key
resource "google_secret_manager_secret" "supabase_key" {
secret_id = "supabase-service-role-key"
replication {
automatic = true
}
}
# Accessed via environment variables
env {
name = "SUPABASE_KEY"
value_from {
secret_key_ref {
name = "supabase-service-role-key"
key = "latest"
}
}
}
Best Practices:
- ✅ Secrets in Google Secret Manager
- ✅ Automatic rotation where possible
- ✅ Audit logs for secret access
- ✅ Never logged or exposed in responses
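At runtime the value arrives via the environment variable shown above; for completeness, a service could also fetch it directly with the official Secret Manager client library (the project ID below is a placeholder):

from google.cloud import secretmanager

def get_secret(project_id: str, secret_id: str, version: str = "latest") -> str:
    # Read the secret payload from Google Secret Manager.
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("utf-8")

supabase_key = get_secret("your-gcp-project", "supabase-service-role-key")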
Data Layer
Supabase (PostgreSQL)
We use Supabase as our primary database for:
- User Accounts - Authentication, profiles, settings
- API Keys - Key management with Row-Level Security (RLS)
- Execution History - Past executions, logs, results
- Billing Data - Credits, subscriptions, transactions
- Usage Analytics - Aggregated metrics per account
Why Supabase?
- ✅ PostgreSQL - Full SQL power
- ✅ Row-Level Security - Database-enforced multi-tenancy
- ✅ Real-time Subscriptions - Live dashboard updates
- ✅ Built-in Auth - OAuth, magic links, etc.
- ✅ Managed - Automatic backups, high availability
Schema Example:
-- API Keys table with RLS
CREATE TABLE api_keys (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
account_id UUID REFERENCES accounts(id),
key_hash TEXT NOT NULL,
name TEXT,
permissions TEXT[],
created_at TIMESTAMPTZ DEFAULT now(),
last_used_at TIMESTAMPTZ,
expires_at TIMESTAMPTZ
);
-- RLS Policy: Users can only see their own keys
ALTER TABLE api_keys ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users can view own keys"
ON api_keys FOR SELECT
USING (account_id = auth.uid());
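Validation against key_hash could then look like the following sketch; the SHA-256 scheme is an assumption for illustration, not necessarily the exact hashing we use in production.

import hashlib
import hmac

# Assumes API keys are stored as SHA-256 hex digests (illustrative assumption).
def hash_api_key(raw_key: str) -> str:
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()

def is_valid(raw_key: str, stored_hash: str) -> bool:
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(hash_api_key(raw_key), stored_hash)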
Redis Cache (Memorystore)
Redis handles high-velocity, ephemeral data:
Use Cases:
- Session Pooling - Prewarmed session state
- Rate Limiting - Per-user request counters (minimal sketch below)
- API Key Cache - Avoid DB hits on every request
- Job Queue - Background task processing
- Real-time Metrics - Execution counts, uptime
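As an example of the rate-limiting pattern, a minimal fixed-window limiter on Redis looks roughly like this; the key names, limits, and host address are illustrative, not our production values.

import redis

r = redis.Redis(host="10.0.0.5", port=6379)  # Memorystore private IP (placeholder)

def allow_request(account_id: str, limit: int = 100, window_s: int = 60) -> bool:
    key = f"ratelimit:{account_id}"
    count = r.incr(key)          # atomic per-window counter
    if count == 1:
        r.expire(key, window_s)  # start the window on the first request
    return count <= limit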
Configuration:
resource "google_redis_instance" "cognitora_redis" {
name = "cognitora-redis"
tier = "BASIC"
memory_size_gb = 1
region = var.region
redis_version = "REDIS_7_0"
redis_configs = {
maxmemory-policy = "allkeys-lru" # Evict least-recently-used
timeout = "300" # Close idle connections
}
}
Performance:
- Low latency - Sub-millisecond to low single-digit-millisecond reads/writes
- High throughput - 10,000+ ops/sec on BASIC tier
- Persistence - RDB snapshots for recovery
Cloud Storage
Google Cloud Storage for:
- Runtime Images - Docker image layers
- Execution Logs - Long-term log retention
- User Files - Uploaded files for code execution
- Backups - Database and configuration backups
Lifecycle Policies:
resource "google_storage_bucket" "execution_logs" {
name = "cognitora-execution-logs"
location = "US"
lifecycle_rule {
condition {
age = 90 # Days
}
action {
type = "Delete" # Auto-delete old logs
}
}
}
Client SDKs
We provide first-class SDKs for Python and JavaScript/TypeScript with full feature parity.
SDK Architecture
Python SDK
# Installation
pip install cognitora
# Usage
from cognitora import Cognitora
client = Cognitora(api_key="cgk_...")
# Execute code
result = client.code_interpreter.execute(
code="""
import pandas as pd
data = pd.DataFrame({'a': [1, 2, 3]})
print(data.describe())
""",
language="python",
networking=True
)
print(result.data.outputs[0].data)
JavaScript/TypeScript SDK
// Installation
npm install @cognitora/sdk
// Usage
import { Cognitora } from '@cognitora/sdk';
const client = new Cognitora({ apiKey: 'cgk_...' });
// Execute code
const result = await client.codeInterpreter.execute({
code: `
const data = [1, 2, 3, 4, 5];
console.log(data.reduce((a, b) => a + b, 0));
`,
language: 'javascript',
networking: true
});
console.log(result.data.outputs[0].data);
SDK Features
Both SDKs provide:
- ✅ Type Safety - TypeScript definitions / Python type hints
- ✅ Error Handling - Custom exception classes
- ✅ Retry Logic - Automatic retries with backoff (pattern sketched below)
- ✅ File Uploads - Multipart form data handling
- ✅ Async Support - Promise/async-await patterns
- ✅ Session Management - Stateful execution contexts
- ✅ Streaming - Real-time output streaming (coming soon)
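The SDKs handle retries internally, but if you want to layer your own policy on top, the backoff pattern looks like this sketch (attempt counts and delays are illustrative):

import time
from cognitora import Cognitora

client = Cognitora(api_key="cgk_...")

def execute_with_retry(code: str, attempts: int = 3):
    # Exponential backoff: wait 1s, 2s, 4s ... between attempts.
    for attempt in range(attempts):
        try:
            return client.code_interpreter.execute(code=code, language="python")
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)

print(execute_with_retry("print(40 + 2)").data.outputs[0].data)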
Scaling Strategy
Horizontal Scaling
Load Increases → Auto-Scaler Adds Nomad Clients → More Capacity
Load Decreases → Auto-Scaler Drains & Removes Nodes → Cost Savings
Auto-Scaling Configuration:
# Autoscaler for the Nomad client managed instance group
resource "google_compute_region_autoscaler" "nomad_clients" {
name = "nomad-client-autoscaler"
target = google_compute_region_instance_group_manager.nomad_clients.id
autoscaling_policy {
min_replicas = 3
max_replicas = 50
cpu_utilization {
target = 0.7 # Scale up at 70% CPU
}
scale_in_control {
max_scaled_in_replicas {
fixed = 5 # Remove max 5 nodes at once
}
time_window_sec = 300 # 5-minute trailing window for scale-in decisions
}
}
}
Scaling Metrics:
- CPU Utilization - Average across all clients
- Pending Jobs - Queue depth in Nomad (sketch below)
- Memory Pressure - Available memory per node
- Active Allocations - Running containers per node
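The pending-jobs signal can be read straight from Nomad's HTTP API. A simplified sketch follows; the server address and threshold are placeholders, and the real decision logic lives in the autoscaler.

import requests

NOMAD_ADDR = "http://nomad-server.internal:4646"  # placeholder

def pending_job_count() -> int:
    # /v1/jobs returns job stubs whose Status is pending, running, or dead.
    jobs = requests.get(f"{NOMAD_ADDR}/v1/jobs", timeout=5).json()
    return sum(1 for job in jobs if job["Status"] == "pending")

if pending_job_count() > 10:
    print("job queue is backing up -> add Nomad clients")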
Vertical Scaling
For resource-intensive workloads, we support custom instance types:
# Example: High-memory workload
execution = client.containers.create_container(
image="cognitora/ml-runtime:latest",
command=["python", "train.py"],
cpu_cores=8.0,
memory_mb=32768,
max_cost_credits=1000
)
Session Pooling
To serve most requests without paying container startup costs, we maintain a pool of prewarmed sessions (conceptual sketch after the list below):
Benefits:
- ⚡ <100ms response time - No container startup
- 🔥 Preloaded packages - pandas, requests, etc.
- 🔄 Auto-replenishment - Pool refills in background
- 💰 Cost optimization - Reuse instead of recreate
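Conceptually, the pool behaves like the sketch below. The real implementation lives server-side in the Go API and Redis, so treat this as an illustration of the acquire/release pattern rather than production code.

from collections import deque

class SessionPool:
    """Illustrative prewarmed-session pool: acquire fast, release for reuse."""

    def __init__(self, create_session, size: int = 5):
        self._create = create_session
        self._warm = deque(create_session() for _ in range(size))  # prewarm upfront

    def acquire(self):
        # Hand out a warm session instantly; fall back to a cold create if empty.
        return self._warm.popleft() if self._warm else self._create()

    def release(self, session) -> None:
        self._warm.append(session)  # reuse instead of recreating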
Observability & Monitoring
Metrics & Logging
Google Cloud Operations (formerly Stackdriver) provides:
Metrics:
- Request latency (p50, p95, p99)
- Error rates by endpoint
- Resource utilization (CPU, memory, disk)
- Cost per execution
- Active users
Logging:
- Application logs (structured JSON)
- Audit logs (who did what, when)
- Execution logs (user code output)
- Error logs with stack traces
Tracing:
- End-to-end request tracing
- Nomad job lifecycle
- Database query performance
Dashboard Example:
Alerting
Proactive monitoring catches issues before users notice:
# Example alert policy
alert:
name: "High Error Rate"
condition: error_rate > 1% for 5 minutes
notification:
- email: ops@cognitora.dev
- slack: #alerts
- pagerduty: on-call
actions:
- auto_scale_up: true
- trigger_incident: true
Alert Categories:
- 🚨 Critical - Service down, data loss risk
- ⚠️ Warning - High latency, resource saturation
- ℹ️ Info - Deployments, configuration changes
Performance & Efficiency
Resource Optimization
Our infrastructure is designed for maximum efficiency:
Optimization Techniques:
- Preemptible VMs - Significant cost reduction on worker nodes
- Committed Use Discounts - Long-term capacity planning
- Idle Node Termination - Auto-remove unused workers after 10 minutes
- Image Layer Caching - Reuse common base layers across executions
- Session Pooling - Amortize cold start costs with prewarmed sessions
- Egress Optimization - Cache external API responses
- Auto-Scaling - Dynamic capacity adjustment based on real-time demand
- Resource Packing - Efficient bin-packing algorithm for container placement
Performance Metrics:
- Cold Start: <500ms (with caching)
- Warm Start: <100ms (from session pool)
- Throughput: 10,000+ requests/minute
- Availability: 99.9%+ uptime
Future Architecture
Roadmap
Q2 2025:
- 🔲 WebSocket support for real-time streaming
- 🔲 Multi-region deployment (US, EU, Asia)
- 🔲 GPU support for ML workloads
Q3 2025:
- 🔲 Kubernetes option (alongside Nomad)
- 🔲 Spot instance support (90% cost reduction)
- 🔲 Custom runtime images (user-provided Dockerfiles)
Q4 2025:
- 🔲 Edge execution (Cloudflare Workers integration)
- 🔲 FaaS-style deployment (serverless containers)
- 🔲 Workflow orchestration (DAG-based pipelines)
Challenges Ahead
Technical Challenges:
- Global Low Latency - Edge execution in <50ms worldwide
- State Management - Distributed sessions across regions
- Cost at Scale - Maintaining low costs as volume grows
- Security - Advanced isolation (VMs, microVMs)
Business Challenges:
- Compliance - SOC2, ISO 27001, HIPAA
- Enterprise Features - SSO, audit logs, VPC peering
- Reliability - 99.99% uptime SLA
Conclusion
Building Cognitora has been a journey in balancing security, performance, and efficiency. Our architecture choices reflect real-world trade-offs:
- Nomad over Kubernetes - Simplicity and speed over ecosystem size
- Serverless edge services - Managed simplicity with automatic scaling
- Custom runtime images - Performance optimization for common use cases
- GCP foundation - Leveraging managed services for operational efficiency
The result is a platform that delivers:
- ⚡ Sub-second cold starts
- 🔒 Enterprise-grade security
- 📊 99.9%+ uptime
- 🚀 Horizontal scalability
Want to Learn More?
- 🚀 Try Cognitora: cognitora.dev
- 📚 API Documentation: cognitora.dev/docs
- 🐍 Python SDK: pip install cognitora
- 📦 JavaScript SDK: npm install @cognitora/sdk
- 💬 GitHub: github.com/Cognitora
Questions? Feedback? We'd love to hear from you: hello@cognitora.dev
Built with ❤️ by the Cognitora team
Last updated: January 2025