Webcite API: Rate Limiting, Caching, and Batching

Optimize Webcite API performance with rate limiting, caching, and batch processing. Reduce credit usage by 40-60% with practical JavaScript code examples.

Figure: Performance optimization layers for API calls, showing rate limiting, caching, and batch processing.
Teja Thota

Building Webcite, the fact-checking and citation API for AI applications.

Enterprise API traffic grew 60 percent year over year in 2025, according to the Postman State of the API Report, 2025. For teams running verification at scale, that growth means rate limits, redundant requests, and credit waste become real engineering problems. This guide covers three patterns that solve them: rate limiting with exponential backoff, hash-based caching for deduplication, and queue-based batch processing. Each pattern includes working JavaScript code you can deploy against the Webcite REST API today.

Key Takeaways
  • Exponential backoff with jitter prevents 429 errors and ensures retries succeed without overwhelming the API.
  • Hash-based caching eliminates duplicate verification calls and reduces credit usage by 40-60% in production.
  • Concurrency-limited batch processing verifies hundreds of claims per minute while respecting rate limits.
  • Each full Webcite verification uses 4 credits; caching a single repeated claim saves $0.16 on the Builder plan.
  • Monitoring API calls with structured logging catches inefficiencies before they become cost problems.

API Rate Limiting: A mechanism that controls how many requests a client can send to an API within a given time window. Rate limits protect the API from overload and ensure fair resource distribution across all users. Exceeding the limit returns an HTTP 429 (Too Many Requests) status code.

Rate Limiting: Exponential Backoff and Retry Logic

Rate limiting is the first constraint every developer hits at scale. Seventy-three percent of SaaS outages are linked to API overuse or poor traffic management, according to Gartner, 2024. The Webcite API returns HTTP 429 when you exceed your plan’s rate limit. The correct response is not to retry immediately. It is to back off, wait, and retry with increasing delays.

Exponential backoff works by doubling the wait time after each failed attempt. Adding jitter (randomness) prevents multiple clients from retrying at the same instant, which would create a thundering herd problem. AWS, Google Cloud, and Stripe all recommend this pattern in their official documentation, according to AWS Prescriptive Guidance, 2025. In 2025, 82 percent of enterprises adopted an API-first approach, according to Orbilontech, 2026, which means more systems are competing for the same API endpoints and rate limiting becomes a shared concern.

Here is a production-ready retry wrapper for the Webcite API:

async function verifyWithRetry(claim, maxRetries = 3) {
  const baseDelay = 1000

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch("https://api.webcite.co/api/v1/verify", {
      method: "POST",
      headers: {
        "x-api-key": process.env.WEBCITE_API_KEY,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({
        claim: claim,
        include_stance: true,
        include_verdict: true
      })
    })

    if (response.ok) {
      return response.json()
    }

    // Retry only on 429 (rate limited); other errors fall through to the throw below
    if (response.status === 429 && attempt < maxRetries) {
      const delay = baseDelay * Math.pow(2, attempt) // 1s, 2s, 4s
      const jitter = Math.random() * delay * 0.5 // spread simultaneous retries apart
      await new Promise(resolve => setTimeout(resolve, delay + jitter))
      continue
    }

    throw new Error(
      `Webcite API error: ${response.status} after ${attempt + 1} attempts`
    )
  }
}

This function retries on 429 responses with delays of approximately 1 second, 2 seconds, and 4 seconds (plus jitter). The jitter component ensures that if ten clients all hit the rate limit at the same moment, their retries spread out randomly instead of colliding again.

The API management market is projected to reach $32.77 billion by 2032, growing at 34 percent compound annual growth rate, according to Orbilontech, 2026. That growth is driven by exactly this kind of programmatic integration. Getting retry logic right is foundational.

Three rules for retry logic:

  1. Only retry on transient errors (429, 500, 502, 503). Never retry on 400 (bad request) or 401 (unauthorized). Those will fail every time. A small helper for this check is sketched after this list.
  2. Cap the maximum number of retries. Three retries is the standard. More than five means something is structurally wrong.
  3. Log every retry. If your retry rate exceeds 5 percent of total requests, you are sending too many requests and need to throttle upstream.
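
Rule 1 can be folded into the retry wrapper with a small helper. This is a minimal sketch; isTransientStatus is a hypothetical helper, not part of the Webcite API, and the status list simply mirrors the rule above:

const TRANSIENT_STATUSES = new Set([429, 500, 502, 503])

function isTransientStatus(status) {
  return TRANSIENT_STATUSES.has(status)
}

// In verifyWithRetry, the retry condition then becomes:
// if (isTransientStatus(response.status) && attempt < maxRetries) { ... }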

For a breakdown of how credits are consumed per API call, see the Webcite API pricing guide.

Caching: Hash-Based Deduplication and TTL

Caching is the highest-impact optimization you can make. API caching reduces backend load by up to 70 percent and cuts database operations by 60 percent in typical production systems, according to TechTarget, 2025. For verification APIs specifically, the gains are even larger because many applications verify the same claims repeatedly.

Consider an AI writing tool that generates product descriptions. The claim “OpenAI was founded in 2015” might appear in dozens of articles. Without caching, each occurrence costs 4 credits. With caching, you verify it once and serve the cached result for every subsequent request.

The strategy is hash-based deduplication: normalize the claim text, generate a hash, and check the cache before making an API call. Here is the implementation:

const crypto = require("crypto")

// In-memory cache (use Redis in production)
const verificationCache = new Map()
const CACHE_TTL_MS = 24 * 60 * 60 * 1000 // 24 hours

function normalizeClaim(claim) {
  return claim
    .toLowerCase()
    .replace(/\s+/g, " ")
    .replace(/[^\w\s]/g, "")
    .trim()
}

function getCacheKey(claim) {
  const normalized = normalizeClaim(claim)
  return crypto.createHash("sha256").update(normalized).digest("hex")
}

async function verifyWithCache(claim) {
  const cacheKey = getCacheKey(claim)
  const cached = verificationCache.get(cacheKey)

  if (cached && Date.now() - cached.timestamp < CACHE_TTL_MS) {
    return { ...cached.result, fromCache: true }
  }

  const result = await verifyWithRetry(claim)

  verificationCache.set(cacheKey, {
    result: result,
    timestamp: Date.now()
  })

  return { ...result, fromCache: false }
}

The normalization step is critical. “OpenAI was founded in 2015” and “openai was founded in 2015” should resolve to the same cache entry. The function strips punctuation, lowercases, and collapses whitespace before hashing. Fortune 1000 companies make more than 65 billion API calls daily, according to SQ Magazine, 2026. Even a modest deduplication rate delivers significant savings at that scale.
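
As a quick sanity check of the normalization above, two surface variants of the same claim resolve to a single cache key:

// Both strings normalize to "openai was founded in 2015" and hash identically
const keyA = getCacheKey("OpenAI was founded in 2015")
const keyB = getCacheKey("openai was founded in 2015!")
console.log(keyA === keyB) // true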

Choosing a TTL

TTL (Time to Live) determines how long cached results remain valid. The right TTL depends on how frequently your verified facts change:

Content type            Recommended TTL    Reason
Historical facts        7 days             Rarely change
Company data            24 hours           Updates periodically
Market statistics       6 hours            Can change daily
Breaking news claims    1 hour             High volatility

A 24-hour TTL is the most common default. It balances freshness against credit savings. For an in-depth look at how the verification API processes each claim, see our explainer guide.
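
One way to apply the table is a small lookup that maps a content category to a TTL. This is a minimal sketch; the category labels and how you classify each claim are application-specific assumptions, and the values are in seconds so they can be passed to a Redis SETEX call like the one in the next section:

// Hypothetical TTL lookup based on the table above (values in seconds)
const TTL_SECONDS = {
  historical: 7 * 24 * 60 * 60, // 7 days
  company: 24 * 60 * 60,        // 24 hours
  market: 6 * 60 * 60,          // 6 hours
  news: 60 * 60                 // 1 hour
}

function ttlForCategory(category) {
  return TTL_SECONDS[category] || TTL_SECONDS.company // default to 24 hours
}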

Production Caching with Redis

For production deployments, replace the in-memory Map with Redis. This gives you persistence across server restarts, shared state across multiple application instances, and built-in TTL expiration:

const Redis = require("ioredis")
const redis = new Redis(process.env.REDIS_URL)

async function verifyWithRedisCache(claim) {
  const cacheKey = `verify:${getCacheKey(claim)}`
  const cached = await redis.get(cacheKey)

  if (cached) {
    return { ...JSON.parse(cached), fromCache: true }
  }

  const result = await verifyWithRetry(claim)

  await redis.setex(
    cacheKey,
    86400, // 24-hour TTL in seconds
    JSON.stringify(result)
  )

  return { ...result, fromCache: false }
}

In a production content platform processing 10,000 claims per day, we estimate that 40 to 60 percent of claims are duplicates or near-duplicates. That translates to 4,000 to 6,000 fewer API calls per day. On the Builder plan at $20 per month for 500 credits ($0.04 per credit), caching the equivalent of 1,000 verifications per month avoids 4,000 credits, roughly $160 in additional credits that would otherwise be required.

Batch Processing: Parallel Requests with Concurrency Control

Batch processing is essential for high-volume verification pipelines. Whether you are verifying an entire article, processing a content queue, or running nightly audits, you need to send many requests without exceeding rate limits.

OpenAI’s Batch API reduces costs by 50 percent for asynchronous workloads, according to OpenAI, 2025. The same principle applies to verification: batching lets you maximize throughput while staying within rate limits.

The pattern is concurrency-limited parallel processing. Instead of sending all requests at once (which triggers rate limiting) or sending them one at a time (which is slow), you process a fixed number of requests in parallel:

async function batchVerify(claims, concurrency = 5, verifyFn = verifyWithCache) {
  const results = []
  const queue = [...claims]

  async function processNext() {
    while (queue.length > 0) {
      const claim = queue.shift()
      const result = await verifyFn(claim)
      results.push({ claim, ...result })
    }
  }

  const workers = Array.from(
    { length: Math.min(concurrency, claims.length) },
    () => processNext()
  )

  await Promise.all(workers)
  return results
}

This function creates a pool of workers that pull claims from a shared queue. With a concurrency of 5, you process 5 claims simultaneously. When one finishes, the next claim starts immediately. The total throughput is roughly 5 times faster than sequential processing.
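
Usage is a single call. The claims below are illustrative; fromCache comes from verifyWithCache, which batchVerify uses by default:

const claimsToCheck = [
  "Enterprise API traffic grew 60 percent year over year in 2025",
  "Each full Webcite verification uses 4 credits"
]

const verified = await batchVerify(claimsToCheck, 5)
console.log(verified.filter(r => r.fromCache).length, "claims served from cache")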

Queue-Based Verification for Content Pipelines

For content pipelines that process articles asynchronously, a queue-based approach gives you better control over throughput and error handling:

class VerificationQueue {
  constructor(options = {}) {
    this.concurrency = options.concurrency || 5
    this.delayMs = options.delayMs || 200
    this.queue = []
    this.processing = 0
    this.results = new Map()
  }

  async add(claims, batchId) {
    return new Promise((resolve) => {
      this.queue.push({
        claims,
        batchId,
        resolve
      })
      this.processQueue()
    })
  }

  async processQueue() {
    while (
      this.queue.length > 0 &&
      this.processing < this.concurrency
    ) {
      const job = this.queue.shift()
      this.processing++

      this.processJob(job).finally(() => {
        this.processing--
        this.processQueue()
      })
    }
  }

  async processJob(job) {
    const batchResults = []

    for (const claim of job.claims) {
      const result = await verifyWithCache(claim)
      batchResults.push({ claim, ...result })
      await new Promise(r => setTimeout(r, this.delayMs))
    }

    this.results.set(job.batchId, batchResults)
    job.resolve(batchResults)
  }
}

// Usage
const queue = new VerificationQueue({ concurrency: 3, delayMs: 250 })

const articleClaims = [
  "Global API traffic grew 60% year over year in 2025",
  "73% of SaaS outages are linked to API overuse",
  "The EU AI Act takes effect in August 2026"
]

const results = await queue.add(articleClaims, "article-123")

The delayMs parameter adds a small gap between requests within a batch. This prevents burst patterns that can trigger rate limiting even when average throughput is within limits. A 200-250 millisecond delay between requests is a safe default for most API plans. AI-related API traffic on Postman increased 73 percent year over year, according to the Postman State of the API Report, 2025. Batch processing with concurrency control is how you keep pace with that growth without burning through credits.

Cost Optimization: How Caching Reduces Credit Usage

Each Webcite verification consumes 4 credits: 2 for citation retrieval, 1 for stance detection, and 1 for the final verdict. The table below shows how caching affects monthly costs at different scales:

Monthly claims    Without cache     Cache hit rate    With cache        Credits saved
500               2,000 credits     40%               1,200 credits     800
2,000             8,000 credits     50%               4,000 credits     4,000
10,000            40,000 credits    60%               16,000 credits    24,000

At the Builder plan rate of $0.04 per credit ($20 for 500 credits), saving 800 credits per month equals $32 in avoided costs. At Enterprise scale with 10,000 monthly claims, a 60 percent cache hit rate saves 24,000 credits.
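
The arithmetic behind the table is straightforward. This sketch assumes 4 credits per full verification and the Builder rate of $0.04 per credit; adjust both for your plan:

// Estimate monthly credit and dollar savings for a given cache hit rate (as a fraction)
function estimateCacheSavings(monthlyClaims, cacheHitRate, creditsPerClaim = 4, pricePerCredit = 0.04) {
  const withoutCache = monthlyClaims * creditsPerClaim
  const withCache = Math.round(monthlyClaims * (1 - cacheHitRate)) * creditsPerClaim
  const creditsSaved = withoutCache - withCache
  return { withoutCache, withCache, creditsSaved, dollarsSaved: creditsSaved * pricePerCredit }
}

// estimateCacheSavings(500, 0.4) -> 800 credits saved, roughly $32 per month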

Eliminating duplicate work also reduces engineering overhead, saving 42 percent of development time, according to the Postman State of the API Report, 2025. The credit savings compound with the engineering time saved by not debugging redundant API calls.

Three strategies maximize cache hit rates:

  1. Normalize aggressively. Strip punctuation, lowercase, collapse whitespace. “The EU AI Act” and “the eu ai act” should hit the same cache entry.

  2. Cache at the claim level, not the document level. An article with 10 claims might share 6 of them with previously verified content. Claim-level caching captures those overlaps.

  3. Pre-warm the cache. If your application generates content from templates, verify the template claims once and cache the results before users trigger individual verifications; a sketch follows this list. Organizations that implement caching strategies achieve cost reductions of 50 to 90 percent on API-related expenses, according to Treblle, 2025.
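
A pre-warming pass can reuse the batch helper from the previous section. This is a minimal sketch; templateClaims is an illustrative list you would build from your own templates:

// Verify template claims once so later user-triggered requests hit the cache
async function prewarmCache(templateClaims) {
  const results = await batchVerify(templateClaims, 3)
  const warmed = results.filter(r => !r.fromCache).length
  console.log(`Pre-warmed ${warmed} new cache entries`)
}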

Monitoring and Logging Verification Calls

Monitoring is the feedback loop that makes the other optimizations work. Without visibility into your API usage, you cannot tell whether caching is effective, whether retries are excessive, or whether batch sizes are optimal. APIs power 83 percent of web traffic, and monitoring them is critical to avoid costly outages, according to Catchpoint, 2025.

Here is a logging wrapper that tracks the metrics that matter:

const verificationLog = []

async function verifyWithLogging(claim) {
  const startTime = Date.now()
  const logEntry = {
    claim: claim.substring(0, 100),
    timestamp: new Date().toISOString(),
    cached: false,
    retries: 0,
    latencyMs: 0,
    credits: 0,
    status: "pending"
  }

  try {
    const result = await verifyWithCache(claim)

    logEntry.cached = result.fromCache || false
    logEntry.latencyMs = Date.now() - startTime
    logEntry.credits = result.fromCache ? 0 : 4
    logEntry.status = "success"
    logEntry.verdict = result.verdict?.result

    verificationLog.push(logEntry)
    return result
  } catch (error) {
    logEntry.latencyMs = Date.now() - startTime
    logEntry.status = "error"
    logEntry.error = error.message

    verificationLog.push(logEntry)
    throw error
  }
}

function getUsageReport() {
  const total = verificationLog.length
  const cached = verificationLog.filter(e => e.cached).length
  const errors = verificationLog.filter(e => e.status === "error").length
  const totalCredits = verificationLog.reduce(
    (sum, e) => sum + e.credits, 0
  )
  const avgLatency = Math.round(
    verificationLog.reduce((sum, e) => sum + e.latencyMs, 0) / total
  )

  return {
    totalRequests: total,
    cacheHitRate: `${Math.round((cached / total) * 100)}%`,
    errorRate: `${Math.round((errors / total) * 100)}%`,
    creditsConsumed: totalCredits,
    creditsSaved: cached * 4,
    avgLatencyMs: avgLatency
  }
}

A 0.1 percent drop in API uptime equals more than 8 hours of downtime per year, costing enterprises thousands in lost transactions and SLA penalties, according to Catchpoint, 2025. Monitoring catches problems before they reach that threshold.

The five metrics to track (a threshold-check sketch follows the list):

  1. Cache hit rate. Target 40 percent or higher. Below 30 percent means your normalization is too strict or your content has low overlap.
  2. Retry rate. Should stay below 5 percent. Higher means you are sending requests too fast.
  3. Average latency. Webcite typically responds in 1-3 seconds. Spikes indicate network issues or server load.
  4. Credits consumed vs. saved. The ratio tells you the ROI of your caching layer.
  5. Error rate. Track 4xx and 5xx errors separately. A rising 429 rate means you need to reduce concurrency.
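
The getUsageReport output above maps directly onto these thresholds. This is a minimal sketch; console.warn stands in for whatever alerting you already use, and the 5 percent error-rate threshold is an illustrative assumption:

function checkUsageThresholds(report) {
  const cacheHitRate = parseInt(report.cacheHitRate, 10)
  const errorRate = parseInt(report.errorRate, 10)

  if (cacheHitRate < 40) {
    console.warn(`Cache hit rate ${report.cacheHitRate} is below the 40% target`)
  }
  if (errorRate > 5) {
    console.warn(`Error rate ${report.errorRate} is above 5%; check for rising 429s and reduce concurrency`)
  }
  if (report.avgLatencyMs > 3000) {
    console.warn(`Average latency ${report.avgLatencyMs}ms is above the typical 1-3 second range`)
  }
}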

Putting It All Together

Here is the complete pipeline that combines rate limiting, caching, batch processing, and monitoring into a single verification function:

async function verifyArticle(articleText) {
  // Extract verifiable claims
  const claims = articleText
    .split(/[.!?]+/)
    .map(s => s.trim())
    .filter(s => s.length > 20)
    // rough heuristic: keep sentences that contain a number or a capitalized proper noun
    .filter(s => /\d/.test(s) || /[A-Z][a-z]{2,}/.test(s.slice(1)))

  // Batch verify with caching, retries, and logging
  // (verifyWithLogging wraps verifyWithCache, so getUsageReport below has data)
  const results = await batchVerify(claims, 5, verifyWithLogging)

  // Generate report
  const report = getUsageReport()

  return {
    totalClaims: claims.length,
    results: results.map(r => ({
      claim: r.claim,
      verdict: r.verdict?.result,
      confidence: r.verdict?.confidence,
      cached: r.fromCache
    })),
    usage: report
  }
}

This function extracts claims from article text, verifies them in parallel batches of 5 with automatic caching and retry logic, and returns a report showing how many credits were consumed versus saved.

For teams processing high volumes of content, this pipeline works directly with Webcite’s credit-based pricing. The free tier at 50 credits per month supports testing and development. The Builder plan at $20 per month with 500 credits handles most production workloads when paired with caching. Enterprise plans with 10,000 or more credits per month support large-scale content pipelines with dedicated throughput.

In 2026, 80 percent of API traffic will be driven by non-human actors like AI agents and automated pipelines, according to SQ Magazine, 2026. Building rate limiting, caching, and batch processing into your verification pipeline now means your system scales with that traffic growth instead of breaking under it.


Frequently Asked Questions

What is the rate limit for the Webcite API?

Webcite enforces rate limits per API key to ensure fair usage and platform stability. Free tier keys allow up to 10 requests per minute. Builder plan keys allow up to 60 requests per minute. Enterprise plans include custom rate limits based on your throughput requirements.

How does caching reduce Webcite API credit usage?

Caching stores verification results for claims you have already checked. When the same or a near-identical claim appears again, your application returns the cached result instead of making a new API call. In production systems with repeated content, caching typically reduces credit consumption by 40 to 60 percent.

Can I send batch verification requests to the Webcite API?

Yes. You can send multiple verification requests in parallel using Promise.all or a concurrency-limited queue. Each request is a standard POST to the verify endpoint. Batch processing with concurrency control lets you verify hundreds of claims per minute while staying within rate limits.

What is exponential backoff and why should I use it with the Webcite API?

Exponential backoff is a retry strategy where each failed request triggers a progressively longer wait before the next attempt. Instead of hammering the API after a rate limit error, your client waits 1 second, then 2 seconds, then 4 seconds. This prevents cascading failures and ensures your requests succeed on retry.

How much does Webcite API verification cost per request?

Each full verification consumes 4 credits: 2 for citation retrieval, 1 for stance detection, and 1 for the final verdict. The free plan includes 50 credits per month. The Builder plan at $20 per month provides 500 credits. Enterprise plans start at 10,000 credits per month with custom pricing.