
How We Built an Insurance API That Scales to 10,000 TPS

David Park · November 20, 2024

When we set out to build CoverKit, we knew we needed to handle massive scale. E-commerce checkouts happen in milliseconds, and our API needed to keep up. Here is how we built the infrastructure that handles 10,000+ transactions per second.

The Challenge

Insurance APIs are not like typical CRUD applications. Every request involves:

  • Complex risk calculations with actuarial models
  • Real-time fraud detection
  • Underwriting rule evaluation
  • Multi-party coordination (carriers, reinsurers)
  • Strict audit requirements

And it all needs to happen in under 200ms at p95. No pressure.

Architecture Overview

We built CoverKit on Google Cloud Platform using a microservices architecture:

ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│                    Cloud CDN + Armor                     │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
                          │
ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā–¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│                  Cloud Endpoints Gateway                 │
│           (Auth, Rate Limiting, Routing)                │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
                          │
        ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
        │                 │                 │
ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā–¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā–¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā–¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│ Quote Engine  │ │ Policy Admin  │ │    Claims     │
│   (Node.js)   │ │   (Node.js)   │ │   (Node.js)   │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
        │                 │                 │
        ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
                          │
        ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
        │                 │                 │
ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā–¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā–¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā–¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│   Redis HA    │ │  Cloud SQL    │ │ Cloud Storage │
│  (Caching)    │ │ (PostgreSQL)  │ │  (Documents)  │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜

Key Decisions

1. Multi-Tier Caching

We implemented a three-tier caching strategy:

  • L1 (In-Memory): Node.js LRU cache for hot data (<1ms)
  • L2 (Redis): Distributed cache for shared data (<5ms)
  • L3 (Computed): Real-time calculation on a cache miss (<200ms)

// Our caching strategy achieves an 85%+ cache hit rate.
// (lru-cache and ioredis shown here are representative library choices;
// Quote, QuoteParams, computeKey, and computeQuote are defined elsewhere.)
import { LRUCache } from 'lru-cache';
import Redis from 'ioredis';

const cacheConfig = {
  l1: {
    maxSize: 10000,
    ttl: 60, // seconds (1 minute)
  },
  l2: {
    cluster: 'redis-ha',
    ttl: 3600, // seconds (1 hour)
    compression: true, // applied at the serialization layer (not shown)
  },
};

// L1: in-process LRU cache (lru-cache takes its TTL in milliseconds)
const memoryCache = new LRUCache<string, Quote>({
  max: cacheConfig.l1.maxSize,
  ttl: cacheConfig.l1.ttl * 1000,
});

// L2: shared Redis HA cluster
const redis = new Redis({ host: cacheConfig.l2.cluster });

async function getQuote(params: QuoteParams): Promise<Quote> {
  const key = computeKey(params);

  // L1 check: in-process, sub-millisecond
  const l1 = memoryCache.get(key);
  if (l1) return l1;

  // L2 check: Redis stores serialized JSON, so parse on the way out
  const cached = await redis.get(key);
  if (cached) {
    const quote: Quote = JSON.parse(cached);
    memoryCache.set(key, quote); // promote to L1
    return quote;
  }

  // L3: full real-time computation on a miss in both tiers
  const quote = await computeQuote(params);
  memoryCache.set(key, quote);
  await redis.setex(key, cacheConfig.l2.ttl, JSON.stringify(quote));

  return quote;
}

2. Horizontal Scaling with GKE Autopilot

We use GKE Autopilot for automatic scaling based on demand. During peak hours (Black Friday, Cyber Monday), we scale from 10 to 200+ pods automatically.

Key configurations:

  • CPU-based HPA with custom metrics for queue depth (sketched after this list)
  • Pod Disruption Budgets for zero-downtime deployments
  • Regional clusters for high availability
  • Workload Identity for secure service-to-service auth
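
To make the scaling setup concrete, here is a minimal sketch of the quote-engine HPA, written as a TypeScript object that mirrors the Kubernetes autoscaling/v2 manifest. The external metric name and both targets are illustrative assumptions; only the 10–200 replica range comes from the text above.

// Sketch of an autoscaling/v2 HPA combining CPU and a queue-depth signal
const quoteEngineHpa = {
  apiVersion: 'autoscaling/v2',
  kind: 'HorizontalPodAutoscaler',
  metadata: { name: 'quote-engine' },
  spec: {
    scaleTargetRef: { apiVersion: 'apps/v1', kind: 'Deployment', name: 'quote-engine' },
    minReplicas: 10,
    maxReplicas: 200,
    metrics: [
      // Baseline: scale on CPU utilization
      {
        type: 'Resource',
        resource: { name: 'cpu', target: { type: 'Utilization', averageUtilization: 70 } },
      },
      // Custom signal: scale on queue depth exported to the metrics pipeline
      {
        type: 'External',
        external: {
          metric: { name: 'task_queue_depth' }, // illustrative metric name
          target: { type: 'AverageValue', averageValue: '100' },
        },
      },
    ],
  },
};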

3. Database Optimization

PostgreSQL is our source of truth, but we optimize heavily:

  • Read replicas for reporting and analytics queries (see the sketch after this list)
  • Connection pooling with PgBouncer (6000+ connections)
  • Partitioned tables for time-series data
  • Careful index design (we have a dedicated DBA)
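
As a minimal sketch of how the replica split looks from the application side: a Node.js service can hold separate pg pools for the primary and a read replica, both fronted by PgBouncer. Host names, pool sizes, and the quotes table here are illustrative assumptions, not our production schema.

import { Pool } from 'pg';

// Separate pools: primary for writes, replica for reporting/analytics.
// In production both connect through PgBouncer, not Postgres directly.
const primary = new Pool({ host: 'pgbouncer-primary', database: 'coverkit', max: 20 });
const replica = new Pool({ host: 'pgbouncer-replica', database: 'coverkit', max: 20 });

// Writes always go to the primary
export async function recordQuote(quoteId: string, premiumCents: number): Promise<void> {
  await primary.query(
    'INSERT INTO quotes (id, premium_cents) VALUES ($1, $2)',
    [quoteId, premiumCents],
  );
}

// Reporting queries tolerate replica lag, so they run on the replica
export async function dailyQuoteCounts(since: Date) {
  const { rows } = await replica.query(
    `SELECT created_at::date AS day, count(*) AS quotes
       FROM quotes
      WHERE created_at >= $1
      GROUP BY 1 ORDER BY 1`,
    [since],
  );
  return rows;
}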

4. Async Processing with Cloud Tasks

Not everything needs to be synchronous. We offload non-critical work:

  • Webhook delivery (with automatic retries; see the sketch after this list)
  • Document generation (policies, certificates)
  • Analytics events
  • Email notifications
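
For example, webhook delivery is enqueued rather than sent inline. A minimal sketch using the @google-cloud/tasks client follows; the project, region, queue name, and payload shape are illustrative assumptions.

import { CloudTasksClient } from '@google-cloud/tasks';

const client = new CloudTasksClient();

// Enqueue a webhook delivery; Cloud Tasks makes the HTTP call and retries
// failed deliveries according to the queue's retry configuration.
export async function enqueueWebhook(targetUrl: string, payload: object): Promise<void> {
  // Project, region, and queue names are illustrative
  const parent = client.queuePath('coverkit-prod', 'us-central1', 'webhook-delivery');
  await client.createTask({
    parent,
    task: {
      httpRequest: {
        httpMethod: 'POST',
        url: targetUrl,
        headers: { 'Content-Type': 'application/json' },
        body: Buffer.from(JSON.stringify(payload)).toString('base64'),
      },
    },
  });
}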

Performance Results

  • Quote API latency: 45ms (p50)
  • Peak throughput: 12,847 requests/second
  • Uptime (2024): 99.99% (SLA)
  • Cache hit rate: 87% (average)

Lessons Learned

  1. Cache invalidation is hard: We spent months getting this right. Event-driven invalidation with version keys was the solution (sketched after this list).
  2. Observability from day one: We built comprehensive tracing and metrics before scaling. This saved us countless hours.
  3. Graceful degradation matters: When Redis is slow, we fall back to computed values. Never fail the customer.
  4. Load test regularly: We run load tests weekly to catch regressions before they hit production.
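
A minimal sketch of the version-key pattern (key names are illustrative): each cacheable family gets a version counter in Redis, cache keys embed the current version, and invalidation is a single atomic increment. Stale entries are never deleted; they become unreachable and age out via TTL.

import Redis from 'ioredis';

const redis = new Redis();

// Reads fetch the family's current version and build the key from it.
// A counter key like 'ver:quote' is an illustrative naming choice.
async function versionedKey(family: string, id: string): Promise<string> {
  const version = (await redis.get(`ver:${family}`)) ?? '0';
  return `${family}:v${version}:${id}`;
}

// Invalidation is one atomic increment: every existing key for the family
// becomes unreachable, and the orphaned entries expire via their TTL.
async function invalidateFamily(family: string): Promise<void> {
  await redis.incr(`ver:${family}`);
}

When a rate table or underwriting rule changes, the event consumer bumps the relevant family's version; the next read recomputes under the new key while the old entries expire on their own.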

What's Next

We are working on edge computing to bring quote generation even closer to customers. Stay tuned for our next engineering deep dive.

Interested in joining our engineering team? Check our open positions.

David Park
Co-founder & CTO