
How We Built an Insurance API That Scales to 10,000 TPS

David Park · November 20, 2024

When we set out to build CoverKit, we knew we needed to handle massive scale. E-commerce checkouts happen in milliseconds, and our API needed to keep up. Here is how we built the infrastructure that handles 10,000+ transactions per second.

The Challenge

Insurance APIs are not like typical CRUD applications. Every request involves:

  • Complex risk calculations with actuarial models
  • Real-time fraud detection
  • Underwriting rule evaluation
  • Multi-party coordination (carriers, reinsurers)
  • Strict audit requirements

And it all needs to happen in under 200ms at p95. No pressure.

Architecture Overview

We built CoverKit on Google Cloud Platform using a microservices architecture:

ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│                    Cloud CDN + Armor                     │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
                          │
ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā–¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│                  Cloud Endpoints Gateway                 │
│           (Auth, Rate Limiting, Routing)                │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
                          │
        ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
        │                 │                 │
ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā–¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā–¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā–¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│ Quote Engine  │ │ Policy Admin  │ │    Claims     │
│   (Node.js)   │ │   (Node.js)   │ │   (Node.js)   │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
        │                 │                 │
        ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
                          │
        ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
        │                 │                 │
ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā–¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā–¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā–¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│   Redis HA    │ │  Cloud SQL    │ │ Cloud Storage │
│  (Caching)    │ │ (PostgreSQL)  │ │  (Documents)  │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜

Key Decisions

1. Multi-Tier Caching

We implemented a three-tier caching strategy:

  • L1 (In-Memory): Node.js LRU cache for hot data (<1ms)
  • L2 (Redis): Distributed cache for shared data (<5ms)
  • L3 (Computed): Real-time calculation on a cache miss (<200ms)

// Our caching strategy achieves an 85%+ cache hit rate.
// (lru-cache and ioredis shown here are representative library choices;
// Quote, QuoteParams, computeKey, and computeQuote are defined elsewhere.)
import { LRUCache } from 'lru-cache';
import Redis from 'ioredis';

const cacheConfig = {
  l1: {
    maxSize: 10000,
    ttl: 60, // seconds (1 minute)
  },
  l2: {
    cluster: 'redis-ha',
    ttl: 3600, // seconds (1 hour)
    compression: true, // applied at the serialization layer (not shown)
  },
};

// L1: in-process LRU cache (lru-cache takes its TTL in milliseconds)
const memoryCache = new LRUCache<string, Quote>({
  max: cacheConfig.l1.maxSize,
  ttl: cacheConfig.l1.ttl * 1000,
});

// L2: shared Redis HA cluster
const redis = new Redis({ host: cacheConfig.l2.cluster });

async function getQuote(params: QuoteParams): Promise<Quote> {
  const key = computeKey(params);

  // L1 check: in-process, sub-millisecond
  const l1 = memoryCache.get(key);
  if (l1) return l1;

  // L2 check: Redis stores serialized JSON, so parse on the way out
  const cached = await redis.get(key);
  if (cached) {
    const quote: Quote = JSON.parse(cached);
    memoryCache.set(key, quote); // promote to L1
    return quote;
  }

  // L3: full real-time computation on a miss in both tiers
  const quote = await computeQuote(params);
  memoryCache.set(key, quote);
  await redis.setex(key, cacheConfig.l2.ttl, JSON.stringify(quote));

  return quote;
}

2. Horizontal Scaling with GKE Autopilot

We use GKE Autopilot for automatic scaling based on demand. During peak hours (Black Friday, Cyber Monday), we scale from 10 to 200+ pods automatically.

Key configurations:

  • CPU-based HPA with custom metrics for queue depth (sketched after this list)
  • Pod Disruption Budgets for zero-downtime deployments
  • Regional clusters for high availability
  • Workload Identity for secure service-to-service auth
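
To make the scaling setup concrete, here is a minimal sketch of the quote-engine HPA, written as a TypeScript object that mirrors the Kubernetes autoscaling/v2 manifest. The external metric name and both targets are illustrative assumptions; only the 10–200 replica range comes from the text above.

// Sketch of an autoscaling/v2 HPA combining CPU and a queue-depth signal
const quoteEngineHpa = {
  apiVersion: 'autoscaling/v2',
  kind: 'HorizontalPodAutoscaler',
  metadata: { name: 'quote-engine' },
  spec: {
    scaleTargetRef: { apiVersion: 'apps/v1', kind: 'Deployment', name: 'quote-engine' },
    minReplicas: 10,
    maxReplicas: 200,
    metrics: [
      // Baseline: scale on CPU utilization
      {
        type: 'Resource',
        resource: { name: 'cpu', target: { type: 'Utilization', averageUtilization: 70 } },
      },
      // Custom signal: scale on queue depth exported to the metrics pipeline
      {
        type: 'External',
        external: {
          metric: { name: 'task_queue_depth' }, // illustrative metric name
          target: { type: 'AverageValue', averageValue: '100' },
        },
      },
    ],
  },
};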

3. Database Optimization

PostgreSQL is our source of truth, but we optimize heavily:

  • Read replicas for reporting and analytics queries (see the sketch after this list)
  • Connection pooling with PgBouncer (6000+ connections)
  • Partitioned tables for time-series data
  • Careful index design (we have a dedicated DBA)
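
As a minimal sketch of how the replica split looks from the application side: a Node.js service can hold separate pg pools for the primary and a read replica, both fronted by PgBouncer. Host names, pool sizes, and the quotes table here are illustrative assumptions, not our production schema.

import { Pool } from 'pg';

// Separate pools: primary for writes, replica for reporting/analytics.
// In production both connect through PgBouncer, not Postgres directly.
const primary = new Pool({ host: 'pgbouncer-primary', database: 'coverkit', max: 20 });
const replica = new Pool({ host: 'pgbouncer-replica', database: 'coverkit', max: 20 });

// Writes always go to the primary
export async function recordQuote(quoteId: string, premiumCents: number): Promise<void> {
  await primary.query(
    'INSERT INTO quotes (id, premium_cents) VALUES ($1, $2)',
    [quoteId, premiumCents],
  );
}

// Reporting queries tolerate replica lag, so they run on the replica
export async function dailyQuoteCounts(since: Date) {
  const { rows } = await replica.query(
    `SELECT created_at::date AS day, count(*) AS quotes
       FROM quotes
      WHERE created_at >= $1
      GROUP BY 1 ORDER BY 1`,
    [since],
  );
  return rows;
}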

4. Async Processing with Cloud Tasks

Not everything needs to be synchronous. We offload non-critical work:

  • Webhook delivery (with automatic retries; see the sketch after this list)
  • Document generation (policies, certificates)
  • Analytics events
  • Email notifications
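
For example, webhook delivery is enqueued rather than sent inline. A minimal sketch using the @google-cloud/tasks client follows; the project, region, queue name, and payload shape are illustrative assumptions.

import { CloudTasksClient } from '@google-cloud/tasks';

const client = new CloudTasksClient();

// Enqueue a webhook delivery; Cloud Tasks makes the HTTP call and retries
// failed deliveries according to the queue's retry configuration.
export async function enqueueWebhook(targetUrl: string, payload: object): Promise<void> {
  // Project, region, and queue names are illustrative
  const parent = client.queuePath('coverkit-prod', 'us-central1', 'webhook-delivery');
  await client.createTask({
    parent,
    task: {
      httpRequest: {
        httpMethod: 'POST',
        url: targetUrl,
        headers: { 'Content-Type': 'application/json' },
        body: Buffer.from(JSON.stringify(payload)).toString('base64'),
      },
    },
  });
}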

Performance Results

  • Quote API latency: 45ms (p50)
  • Peak throughput: 12,847 requests/second
  • Uptime (2024): 99.99% (SLA)
  • Cache hit rate: 87% (average)

Lessons Learned

  1. Cache invalidation is hard: We spent months getting this right. Event-driven invalidation with version keys was the solution (sketched after this list).
  2. Observability from day one: We built comprehensive tracing and metrics before scaling. This saved us countless hours.
  3. Graceful degradation matters: When Redis is slow, we fall back to computed values. Never fail the customer.
  4. Load test regularly: We run load tests weekly to catch regressions before they hit production.
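
A minimal sketch of the version-key pattern (key names are illustrative): each cacheable family gets a version counter in Redis, cache keys embed the current version, and invalidation is a single atomic increment. Stale entries are never deleted; they become unreachable and age out via TTL.

import Redis from 'ioredis';

const redis = new Redis();

// Reads fetch the family's current version and build the key from it.
// A counter key like 'ver:quote' is an illustrative naming choice.
async function versionedKey(family: string, id: string): Promise<string> {
  const version = (await redis.get(`ver:${family}`)) ?? '0';
  return `${family}:v${version}:${id}`;
}

// Invalidation is one atomic increment: every existing key for the family
// becomes unreachable, and the orphaned entries expire via their TTL.
async function invalidateFamily(family: string): Promise<void> {
  await redis.incr(`ver:${family}`);
}

When a rate table or underwriting rule changes, the event consumer bumps the relevant family's version; the next read recomputes under the new key while the old entries expire on their own.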

What's Next

We are working on edge computing to bring quote generation even closer to customers. Stay tuned for our next engineering deep dive.

Interested in joining our engineering team? Check our open positions.

David Park
Co-founder & CTO