Deployment

Deploy Gremia Labs across managed cloud services (recommended) or self-hosted with Docker Compose.

Architecture Overview

graph TB
    subgraph Managed Services
        V[Vercel<br/>Next.js Web App]
        R[Railway<br/>FastAPI Orchestrator]
        SB[Supabase<br/>PostgreSQL + Auth]
        UP[Upstash<br/>Redis]
        ST[Stripe<br/>Billing]
    end

    subgraph CI/CD
        GH[GitHub Actions]
    end

    subgraph Clients
        B[Browser] -->|HTTPS| V
        S[Shell Desktop App] -->|WSS + mTLS| R
    end

    V -->|REST| R
    R --> SB
    R --> UP
    R --> ST
    GH -->|deploy/dev| R
    GH -->|deploy/dev| V

Production uses managed services: Railway (backend), Vercel (frontend), Supabase (database), and Upstash (Redis). A Docker Compose option exists for fully self-hosted deployments.


1. Managed Deployment (Railway + Vercel)

This is the primary deployment method used by Gremia Labs.

1.1 Railway (Backend Orchestrator)

Railway hosts the FastAPI orchestrator (LangGraph agent execution, MCP coordination, billing webhooks).

| Setting | Production | Development |
|---|---|---|
| Branch | main | deploy/dev |
| Domain | gremia-labs-production.up.railway.app | gremia-labs-development.up.railway.app |
| Region | europe-west4 | europe-west4 |
| Dockerfile | infra/docker/Dockerfile.orchestrator | infra/docker/Dockerfile.orchestrator |

Railway auto-deploys when it detects a push to the configured branch. The production environment deploys directly from main; the development environment deploys from deploy/dev, which is updated by CI after all quality gates pass.

Dockerfile

The orchestrator builds from infra/docker/Dockerfile.orchestrator (path relative to the repository root). Railway must be configured with dockerfilePath: "infra/docker/Dockerfile.orchestrator" to avoid Railpack auto-detection.
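One way to pin the Dockerfile path is a railway.json config file at the repository root (an illustrative sketch; the repository may instead set this in the Railway dashboard):

```json
{
  "$schema": "https://railway.app/railway.schema.json",
  "build": {
    "builder": "DOCKERFILE",
    "dockerfilePath": "infra/docker/Dockerfile.orchestrator"
  }
}
```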

Key Railway environment variables (set via Railway dashboard or CLI):

  • All variables from the Environment Variables table below
  • UVICORN_WORKERS=4 (adjust per plan)
  • LOG_LEVEL=INFO
  • PYTHONUNBUFFERED=1

1.2 Vercel (Frontend Web App)

Vercel hosts the Next.js Builder app (apps/web/).

| Setting | Production | Preview |
|---|---|---|
| Branch | main | deploy/dev |
| Framework | Next.js (auto-detected) | Next.js |
| Root Directory | apps/web | apps/web |

Vercel environment variables are split by target:

  • Production target: points to Supabase PROD and Railway PROD URLs
  • Preview + Development target: points to Supabase DEV and Railway DEV URLs

Required NEXT_PUBLIC_* variables:

| Variable | Description |
|---|---|
| NEXT_PUBLIC_SUPABASE_URL | Supabase project URL |
| NEXT_PUBLIC_SUPABASE_ANON_KEY | Supabase anonymous key |
| NEXT_PUBLIC_ORCHESTRATOR_URL | Railway backend URL (e.g., https://gremia-labs-production.up.railway.app) |

Ignored Build Step

Configure Vercel's "Ignored Build Step" to skip builds triggered by direct pushes to the development branch. Only deploy/dev (pushed by CI after quality gates pass) should trigger preview deployments.
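A sketch of the Ignored Build Step logic (Vercel exposes the branch being built as VERCEL_GIT_COMMIT_REF; an exit code of 0 tells Vercel to skip the build, non-zero proceeds — the function name is illustrative):

```shell
# Build only main and deploy/dev; skip everything else.
should_build() {
  case "$1" in
    main|deploy/dev) return 1 ;;  # non-zero: proceed with the build
    *)               return 0 ;;  # zero: skip the build
  esac
}
should_build "${VERCEL_GIT_COMMIT_REF:-}"
```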

1.3 Supabase (Database)

Supabase provides managed PostgreSQL with Row-Level Security, Auth, and real-time subscriptions.

| Environment | Project Ref | Region |
|---|---|---|
| Production | zscxhmfhrqrdzusrfqfl | EU Frankfurt |
| Development | vrsjiyufvqktzazyiynb | EU Central |

All tables live under the gremia schema. Migrations are in infra/supabase/migrations/ (001 through 020). Apply via the Supabase SQL Editor or MCP tool, not the Supabase CLI.

Schema separation

The public schema belongs to a legacy project (CVAI). Never create or modify tables in public. All Gremia tables use the gremia schema exclusively.

1.4 Redis (Upstash)

Upstash provides serverless Redis with TLS.

  • Connection: REDIS_URL uses rediss:// protocol (TLS-enabled)
  • Usage: Rate limiting, execution cache, audit buffer, WebSocket relay
  • Fallback: The application functions without Redis using in-memory fallbacks (suitable for development only)

2. CI/CD Pipeline

GitHub Actions runs quality gates on every push to development and orchestrates deployment to the development (staging) environment.

Pipeline Flow

graph LR
    A[Push to development] --> B[CI Workflow]
    B --> C{Quality Gates}
    C -->|Pass| D[Push to deploy/dev]
    C -->|Fail| E[Block - Fix Required]
    D --> F[Railway Dev Deploy]
    D --> G[Vercel Preview Deploy]

    H[Merge to main] --> I[Railway Prod Deploy]
    H --> J[Vercel Prod Deploy]

Quality Gate Jobs

The CI workflow (.github/workflows/ci.yml) runs these jobs in parallel:

| Job | What it checks |
|---|---|
| quality-js | turbo run lint typecheck test, i18n completeness |
| quality-python | ruff check, ruff format --check, mypy app, pytest with coverage |
| quality-rust | cargo check, cargo clippy -- -D warnings, cargo test --lib |
| audit-npm | npm audit --omit=dev --audit-level=high |
| audit-python | pip-audit --skip-editable |
| audit-rust | cargo audit |
| build | turbo run build (depends on quality jobs) |
| e2e | Playwright tests with Chromium |

Deploy Trigger

After all quality and build jobs pass, the deploy-dev job force-pushes the development branch to deploy/dev:

- name: Push to deploy/dev (triggers Railway + Vercel)
  run: git push origin HEAD:deploy/dev --force

This uses a DEPLOY_TOKEN (GitHub PAT) to ensure the push triggers Railway and Vercel webhooks.
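Put together, the job might look like this (a sketch; the exact job names in the needs list are assumptions based on the table above):

```yaml
deploy-dev:
  needs: [quality-js, quality-python, quality-rust, build]
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0                       # full history, so the force-push has all commits
        token: ${{ secrets.DEPLOY_TOKEN }}   # PAT, so the push fires deploy webhooks
    - name: Push to deploy/dev (triggers Railway + Vercel)
      run: git push origin HEAD:deploy/dev --force
```

A PAT is needed because pushes made with the default GITHUB_TOKEN do not trigger downstream workflows or some external webhooks.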

Production deploys

Pushes and merges to main trigger Railway and Vercel production deployments directly; there is no CI gate on main. Ensure all changes reach main via reviewed pull requests.


3. Self-Hosted Deployment (Docker Compose)

For environments where managed services are not an option, a full Docker Compose stack is provided.

Prerequisites

| Tool | Requirement |
|---|---|
| Docker | 24.0+ |
| Docker Compose | 2.20+ |
| Domain name | With DNS A record pointing to your server |
| SSL certificate | Let's Encrypt or commercial (for Nginx) |

Quick Start

1. Clone the repository

git clone https://github.com/JdPG23/gremia-labs.git
cd gremia-labs

2. Configure environment

cp .env.example .env.production
# Edit .env.production with your values (see Environment Variables below)

Security

Never commit .env.production to version control. Use a secrets manager (Vault, AWS Secrets Manager, etc.) in production.

3. Set up SSL certificates

Place your TLS certificates in a location accessible to the certs Docker volume, or use Let's Encrypt:

apt install certbot python3-certbot-nginx
certbot certonly --standalone -d your-domain.com

4. Build and start

docker compose -f docker-compose.prod.yml up -d --build

This starts 6 services by default:

| Service | Image / Build | Internal Port | Description |
|---|---|---|---|
| redis | redis:7-alpine | 6379 | Caching, rate limiting, pub/sub |
| nginx | nginx:1.27-alpine | 80, 443 (exposed) | TLS termination, reverse proxy, mTLS |
| orchestrator | Built from infra/docker/Dockerfile.orchestrator | 8000 | FastAPI + LangGraph backend |
| web | Built from infra/docker/Dockerfile.web | 3000 | Next.js frontend |
| prometheus | prom/prometheus:v2.51.0 | 9090 | Metrics collection (30-day retention) |
| grafana | grafana/grafana:11.4.0 | 3001 | Dashboards and alerting |

An optional 7th service is available:

| Service | Profile | Description |
|---|---|---|
| supabase-db | self-hosted | Self-hosted PostgreSQL (replaces Supabase Cloud) |

To include the self-hosted database:

docker compose -f docker-compose.prod.yml --profile self-hosted up -d --build

5. Verify

curl https://your-domain.com/health
# Expected: {"status": "ok"}

Docker Compose Configuration

The compose file is at the repository root: docker-compose.prod.yml

# Key architectural decisions:
# - Only ports 80/443 exposed externally (via nginx)
# - All services on a dedicated 'gremia' bridge network
# - Healthchecks on every service
# - env_file: .env.production
# - Redis URL override: redis://redis:6379/0 (internal network)

# Scale the orchestrator horizontally:
docker compose -f docker-compose.prod.yml up -d --scale orchestrator=2

Nginx Configuration

The Nginx config at infra/nginx/nginx.conf provides:

  • TLS 1.3 with modern cipher configuration
  • Optional mTLS for Shell desktop client connections
  • Rate limiting per endpoint type:
    • General API: 30 req/s per IP (burst 60)
    • Auth endpoints: 5 req/s per IP (burst 10)
    • WebSocket handshakes: 2/s per IP (burst 5)
  • Security headers: HSTS, X-Content-Type-Options, X-Frame-Options, CSP
  • WebSocket upgrade for the Shell tunnel (/api/v1/tunnel/ws, 24h timeout)
  • Metrics endpoint restricted to internal Docker network IPs

Key proxy locations:

# WebSocket: Shell tunnel (long-lived, 24h timeout)
location /api/v1/tunnel/ws {
    proxy_pass http://orchestrator;
    proxy_http_version 1.1;
    proxy_set_header Upgrade    $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    proxy_set_header X-Client-Cert $ssl_client_cert;
    proxy_read_timeout  86400;
    proxy_send_timeout  86400;
}

# REST API (rate-limited)
location /api/ {
    proxy_pass http://orchestrator;
    proxy_set_header Host              $host;
    proxy_set_header X-Real-IP         $remote_addr;
    proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}

# Next.js frontend (default)
location / {
    proxy_pass http://web;
}
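The $connection_upgrade variable used in the tunnel location is conventionally defined with an nginx map block; the repository's nginx.conf presumably contains something equivalent to this standard idiom:

```nginx
# Maps the client's Upgrade header to the Connection header value:
# "upgrade" during a WebSocket handshake, "close" for plain requests.
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}
```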

SSL with Let's Encrypt

# Install certbot
apt install certbot python3-certbot-nginx

# Generate certificate
certbot --nginx -d your-domain.com

# Auto-renewal (crontab)
0 0 * * * certbot renew --quiet

4. Environment Variables

Backend (Orchestrator)

These variables are required for the FastAPI orchestrator, whether deployed on Railway or self-hosted.

| Variable | Required | Description |
|---|---|---|
| SUPABASE_URL | Yes | Supabase project URL |
| SUPABASE_KEY | Yes | Supabase anonymous (anon) key |
| SUPABASE_SERVICE_ROLE_KEY | Yes | Supabase service role key (bypasses RLS) |
| JWT_SECRET | Yes | Secret for JWT signing (min 32 chars) |
| SUPABASE_JWT_SECRET | Yes | Supabase JWT secret (for token verification) |
| ANTHROPIC_API_KEY | Yes | Anthropic API key for AI model calls |
| ENCRYPTION_KEY | Yes | AES-256 encryption key (hex-encoded, 64 chars) |
| STRIPE_SECRET_KEY | Yes | Stripe secret key |
| STRIPE_WEBHOOK_SECRET | Yes | Stripe webhook signing secret |
| STRIPE_PRO_PRICE_ID | Yes | Stripe price ID for Pro monthly plan |
| STRIPE_PRO_ANNUAL_PRICE_ID | Yes | Stripe price ID for Pro annual plan |
| STRIPE_ENTERPRISE_PRICE_ID | Yes | Stripe price ID for Enterprise monthly plan |
| STRIPE_ENTERPRISE_ANNUAL_PRICE_ID | Yes | Stripe price ID for Enterprise annual plan |
| REDIS_URL | No | Redis/Upstash connection URL (rediss:// for TLS). Falls back to in-memory if absent |
| VOYAGE_API_KEY | No | Voyage AI key for RAG embeddings |
| ALLOWED_ORIGINS | No | JSON array of allowed CORS origins |
| LOG_LEVEL | No | Logging level (default: INFO) |
| UVICORN_WORKERS | No | Number of Uvicorn workers (default: 4) |
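Secrets such as ENCRYPTION_KEY (64 hex chars = 32 random bytes for AES-256) and JWT_SECRET can be generated locally with openssl, for example:

```shell
# Generate secrets locally; store them in a vault, never in the repository.
openssl rand -hex 32     # ENCRYPTION_KEY: 64 hex characters
openssl rand -base64 48  # JWT_SECRET: well above the 32-char minimum
```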

Frontend (Web App)

These are set in Vercel or in the web app's environment for self-hosted deployments.

| Variable | Required | Description |
|---|---|---|
| NEXT_PUBLIC_SUPABASE_URL | Yes | Supabase project URL (client-side) |
| NEXT_PUBLIC_SUPABASE_ANON_KEY | Yes | Supabase anonymous key (client-side) |
| NEXT_PUBLIC_ORCHESTRATOR_URL | Yes | Backend API URL |

Self-Hosted Only

| Variable | Required | Description |
|---|---|---|
| POSTGRES_PASSWORD | If using supabase-db | PostgreSQL password for self-hosted DB |
| GRAFANA_ADMIN_USER | No | Grafana admin username (default: admin) |
| GRAFANA_ADMIN_PASSWORD | No | Grafana admin password (default: changeme) |

Secrets management

Never commit secrets to version control. Use environment-specific secrets in Railway/Vercel dashboards, or a dedicated secrets manager (HashiCorp Vault, AWS Secrets Manager) for self-hosted deployments.

Railway environment variables

When setting ALLOWED_ORIGINS in Railway, use a single-line JSON array. Railway inserts newlines if the value contains line breaks, which breaks JSON parsing.
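For example, a valid value looks like this (domains are placeholders); piping through python3 -m json.tool is a quick local sanity check before pasting into Railway:

```shell
# Single-line JSON array; no embedded newlines or trailing commas.
ALLOWED_ORIGINS='["https://app.example.com","https://www.example.com"]'

# Verify it parses as JSON before saving it in the Railway dashboard.
printf '%s' "$ALLOWED_ORIGINS" | python3 -m json.tool
```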


5. Monitoring

Health Checks

All services expose health endpoints:

# Orchestrator (Railway or self-hosted)
curl https://your-backend-url/health

# Web app
curl https://your-frontend-url/

Prometheus Metrics

The orchestrator exposes a /metrics endpoint with the following metrics:

| Metric | Type | Description |
|---|---|---|
| gremia_http_requests_total | Counter | Total HTTP requests by method, path, status |
| gremia_http_request_duration_seconds | Histogram | Request latency distribution |
| gremia_ws_connections_active | Gauge | Currently active WebSocket connections |
| gremia_ws_connections_total | Counter | Total WebSocket connections over time |
| gremia_executions_total | Counter | Total task executions by status |

Managed deployment: Configure an external Prometheus instance to scrape the Railway backend URL's /metrics endpoint. Ensure the endpoint is protected (e.g., via a bearer token or IP allowlist on Railway).
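A minimal scrape-config sketch for an external Prometheus, assuming bearer-token protection (the token and job name are placeholders you provision yourself):

```yaml
# prometheus.yml fragment
scrape_configs:
  - job_name: "gremia-orchestrator"
    scheme: https
    metrics_path: /metrics
    authorization:
      credentials: "<metrics-bearer-token>"   # placeholder; store outside version control
    static_configs:
      - targets: ["gremia-labs-production.up.railway.app"]
```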

Self-hosted deployment: Prometheus and Grafana are included in the Docker Compose stack. Prometheus scrapes the orchestrator on the internal gremia network. Grafana dashboards are provisioned automatically from infra/grafana/dashboards/.

Grafana

Access Grafana at port 3001 (internal in Docker Compose, proxied via Nginx in production).

  • Default credentials: admin / changeme (change immediately)
  • Pre-provisioned data source: Prometheus (http://prometheus:9090)
  • Dashboard provisioning: infra/grafana/provisioning/

Logs

Railway: View logs in the Railway dashboard or via the Railway CLI:

railway logs --environment production

Self-hosted:

# Follow orchestrator logs
docker compose -f docker-compose.prod.yml logs -f orchestrator

# Last 100 lines of a specific service
docker compose -f docker-compose.prod.yml logs --tail=100 orchestrator

6. Scaling

Managed (Railway)

Railway supports vertical scaling (adjust CPU/RAM per service) and horizontal scaling (multiple instances). Configure in the Railway dashboard or via the Railway CLI.

WebSocket affinity

If running multiple orchestrator instances, WebSocket connections require sticky sessions. Configure your load balancer or Railway service to use source IP affinity for the /api/v1/tunnel/ws path.

Self-Hosted

Horizontal scaling

The orchestrator is stateless (state lives in Supabase + Redis). Run multiple instances behind Nginx:

docker compose -f docker-compose.prod.yml up -d --scale orchestrator=3

The Nginx upstream uses ip_hash for WebSocket session affinity.
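In nginx.conf this typically looks like the sketch below. Note one caveat: when scaling via docker compose --scale, Docker's embedded DNS resolves the service name across replicas, so the repository's actual upstream definition may differ from this single-server form:

```nginx
upstream orchestrator {
    ip_hash;  # pin each client IP to the same replica for WebSocket affinity
    server orchestrator:8000;
}
```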

Resource limits

services:
  orchestrator:
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 2G
        reservations:
          cpus: "0.5"
          memory: 512M

Redis

The default Redis configuration uses maxmemory 256mb with allkeys-lru eviction. Increase for high-traffic deployments:

# In docker-compose.prod.yml, update the redis command:
command: redis-server --appendonly yes --maxmemory 512mb --maxmemory-policy allkeys-lru

7. Backup and Recovery

Database (Supabase)

Managed: Supabase provides automated daily backups on Pro plans and above. Point-in-time recovery is available on Team and Enterprise plans.

Manual backup:

pg_dump $DATABASE_URL > backup_$(date +%Y%m%d).sql

Redis

Redis data is ephemeral (cache, rate-limit counters). No backup is required. On Upstash, persistence is handled automatically.

For self-hosted Redis, the appendonly flag is enabled. Data persists in the redis-data Docker volume.

Encryption Keys

Store the ENCRYPTION_KEY in a secure vault. If this key is lost, encrypted credential data cannot be recovered. Use the key rotation API to rotate keys periodically.

Key loss

There is no recovery mechanism for data encrypted with a lost ENCRYPTION_KEY. Store it in at least two independent secure locations.

Disaster Recovery Checklist

  1. Database: Restore from Supabase backup or manual pg_dump
  2. Environment variables: Re-apply from secrets manager
  3. SSL certificates: Re-issue via Let's Encrypt or restore from backup
  4. Docker volumes (self-hosted): prometheus-data, grafana-data, supabase-data should be on persistent storage with regular snapshots

Updating

Managed

  • Backend: Push to main (or merge a PR). Railway auto-deploys.
  • Frontend: Push to main. Vercel auto-deploys.
  • Database: Apply new migrations via Supabase SQL Editor.

Self-Hosted

cd gremia-labs
git pull origin main
docker compose -f docker-compose.prod.yml build orchestrator web
docker compose -f docker-compose.prod.yml up -d orchestrator web

Zero-downtime updates are supported when running multiple orchestrator replicas behind Nginx.