Backend MVP Guardrails

Overview

Minimize irreversible decisions. Every write must be idempotent, every aggregate must be replayable, and every incident must be attributable with minimal evidence.

When to Use

MVP backend with single-digit USD/month budget or strict capacity limits
Fast schema evolution or new data sources with unknown fields
Third-party backend dependency (e.g., InsForge) with no status page or DB metrics
Repeated ambiguity about whether failures are vendor or application issues

When NOT to use: throwaway prototypes where data loss and misattribution are acceptable.

Core Pattern (Two Layers)

Layer 1: Principle Guardrails (platform-agnostic)

Source of truth is immutable or append-only. Avoid online recomputation on read paths.
Idempotent writes. Deterministic keys + upsert or unique constraint.
Replayable aggregates. Derived tables can be rebuilt from the source of truth.
Evidence-first attribution. No structured evidence, no blame, no destructive fix.
Cost-first queries. Pre-aggregate, cap ranges, enforce limits, avoid full scans.
Schema evolution is additive. New fields are optional and versioned; fields not on the allowlist are rejected.
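
The schema guardrail above can be sketched as a versioned allowlist check. This is a minimal illustration, not a prescribed implementation: `ALLOWED_FIELDS`, `validatePayload`, the error codes, and the `client_version` field are hypothetical names chosen for the example.

```javascript
// Hypothetical allowlist keyed by schema version; field names are illustrative.
const ALLOWED_FIELDS = {
  1: { required: ['user_id', 'source', 'hour_start'], optional: ['model'] },
  // Version 2 only adds an optional field; nothing is removed or renamed.
  2: { required: ['user_id', 'source', 'hour_start'], optional: ['model', 'client_version'] },
};

// Accept a payload only if every key is on the allowlist for its version
// and every required key is present.
function validatePayload(payload, version) {
  const spec = ALLOWED_FIELDS[version];
  if (!spec) {
    return { ok: false, error_code: 'UNKNOWN_SCHEMA_VERSION', fields: [] };
  }
  const allowed = new Set([...spec.required, ...spec.optional]);
  const unknown = Object.keys(payload).filter((key) => !allowed.has(key));
  if (unknown.length > 0) {
    return { ok: false, error_code: 'UNKNOWN_FIELDS', fields: unknown };
  }
  const missing = spec.required.filter(
    (key) => payload[key] === undefined || payload[key] === null
  );
  if (missing.length > 0) {
    return { ok: false, error_code: 'MISSING_FIELDS', fields: missing };
  }
  return { ok: true, error_code: null, fields: [] };
}
```

Rejecting unknown keys outright, rather than silently dropping them, keeps the failure visible to the sender and leaves the stored data unchanged.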

Layer 2: Platform Mapping (InsForge example)

Fact table: half-hour buckets (e.g., vibescore_tracker_hourly)
Idempotency key: user_id + device_id + source + model + hour_start
Aggregates: derived from buckets; do not read raw event tables for dashboards
Retention: keep aggregates longer; cap any event-level tables
Backfill: limited window + upsert; must be replayable
Observability: M1 structured logs (see below)
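
The mapping above can be sketched in memory to show how the key, the upsert, and the replay fit together. This is an illustration under stated assumptions: a `Map` stands in for a table with a unique constraint on the key columns, `total_tokens` is a hypothetical metric column, and the helper names are invented for the example. It does not use any InsForge API.

```javascript
const BUCKET_MS = 30 * 60 * 1000; // half-hour buckets

// Floor a timestamp to the start of its half-hour bucket (UTC).
function bucketStart(timestamp) {
  const ms = new Date(timestamp).getTime();
  return new Date(Math.floor(ms / BUCKET_MS) * BUCKET_MS).toISOString();
}

// Deterministic key built from the dimensions listed above.
function idempotencyKey(row) {
  return JSON.stringify([row.user_id, row.device_id, row.source, row.model, row.hour_start]);
}

// Upsert: a retried write replaces the same row instead of adding a second one.
function upsertBucket(table, row) {
  table.set(idempotencyKey(row), { ...row });
  return table;
}

// Replayable aggregate: daily totals rebuilt only from the bucket table.
function rebuildDailyTotals(table) {
  const totals = new Map();
  for (const row of table.values()) {
    const day = row.hour_start.slice(0, 10); // YYYY-MM-DD
    const key = `${row.user_id}|${day}`;
    totals.set(key, (totals.get(key) ?? 0) + row.total_tokens);
  }
  return totals;
}

// Usage: writing the same bucket twice leaves one row.
const table = new Map();
const row = {
  user_id: 'u1', device_id: 'd1', source: 'cli', model: 'm1',
  hour_start: bucketStart('2024-01-01T10:47:12Z'),
  total_tokens: 120, // hypothetical metric column
};
upsertBucket(table, row);
upsertBucket(table, row); // retry of the same write
console.log(table.size); // 1
console.log(rebuildDailyTotals(table).get('u1|2024-01-01')); // 120
```

Because the aggregate is computed only from the bucket table, dropping and rebuilding it after a backfill yields the same totals.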

Responsibility Attribution Protocol (M1)

Required fields: request_id, function, stage, status, latency_ms, error_code, upstream_status, upstream_latency_ms
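
A small check can confirm that a log entry carries every required field before it is used as evidence. This is a minimal sketch; `missingM1Fields` is a hypothetical helper, and treating a key with a `null` value as present is an assumption made here so that "no value observed" stays distinct from "never logged".

```javascript
const M1_REQUIRED_FIELDS = [
  'request_id', 'function', 'stage', 'status',
  'latency_ms', 'error_code', 'upstream_status', 'upstream_latency_ms',
];

// Return the names of required keys that are absent from the entry.
// Assumption: a key present with a null value counts as present.
function missingM1Fields(entry) {
  return M1_REQUIRED_FIELDS.filter((field) => !(field in entry));
}

// Usage: an entry that never recorded the upstream fields is incomplete.
const incomplete = missingM1Fields({
  request_id: 'r1', function: 'example-function', stage: 'upstream',
  status: 200, latency_ms: 42, error_code: null,
});
console.log(incomplete); // [ 'upstream_status', 'upstream_latency_ms' ]
```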

Attribution rules:

Missing upstream_status => UNKNOWN (do not change data semantics)
upstream_status is 5xx/timeout and function status is 5xx => likely vendor/backbone issue
upstream_status is 2xx and function status is 4xx/5xx => likely application validation/logic issue
latency_ms high and upstream_latency_ms low => likely application-side bottleneck
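
The rules above can be sketched as a single classifier over one M1 log entry. Several details are assumptions for illustration: the label strings, the `attribute` function name, the `'timeout'` sentinel for `upstream_status`, and the numeric thresholds used for "high" and "low" latency, which the rules do not define.

```javascript
const is5xx = (s) => typeof s === 'number' && s >= 500 && s <= 599;
const is2xx = (s) => typeof s === 'number' && s >= 200 && s <= 299;
const is4xxOr5xx = (s) => typeof s === 'number' && s >= 400 && s <= 599;

// Classify one log entry. Thresholds are assumed defaults, not part of the rules.
function attribute(entry, { slowMs = 1000, fastUpstreamMs = 200 } = {}) {
  const { status, upstream_status, latency_ms, upstream_latency_ms } = entry;

  // Rule 1: no upstream evidence means no attribution.
  if (upstream_status === undefined || upstream_status === null) return 'UNKNOWN';

  // Rule 2: upstream failed (5xx or timeout) and the function returned 5xx.
  const upstreamFailed = is5xx(upstream_status) || upstream_status === 'timeout';
  if (upstreamFailed && is5xx(status)) return 'LIKELY_VENDOR';

  // Rule 3: upstream succeeded but the function returned 4xx/5xx.
  if (is2xx(upstream_status) && is4xxOr5xx(status)) return 'LIKELY_APPLICATION';

  // Rule 4: total latency is high while upstream latency is low.
  if (
    typeof latency_ms === 'number' && typeof upstream_latency_ms === 'number' &&
    latency_ms >= slowMs && upstream_latency_ms <= fastUpstreamMs
  ) {
    return 'LIKELY_APP_BOTTLENECK';
  }

  // Anything the rules do not cover stays unattributed.
  return 'UNKNOWN';
}
```

Every label is prefixed with LIKELY because the rules narrow down where to look; they do not prove a cause on their own.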

Stop rule: no data rewrite, schema change, or semantic patch without a replay plan and rollback.

Quick Reference

| Guardrail | Why | Minimum Implementation |
| --- | --- | --- |
| Idempotent writes | Prevent double-counting | Unique key + upsert |
| Replayable aggregates | Safe fixes | Source-of-truth table + backfill job |
| Cost caps | Fit low budget | Range limits + pre-aggregates |
| Evidence-first | Avoid misfix | M1 structured logs |
| Schema allowlist | Avoid data bloat | Reject unknown fields |
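
As one way to picture the "Cost caps" row, the sketch below clamps a requested time range and row limit before any query runs. The function name `capRange` and the defaults of 31 days and 1000 rows are assumptions for illustration; pick values that fit the actual budget.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Clamp a dashboard query to a bounded window and row count.
// maxDays and maxRows are assumed defaults, not values from this guide.
function capRange({ from, to, limit }, { maxDays = 31, maxRows = 1000 } = {}) {
  const fromMs = new Date(from).getTime();
  const toMs = new Date(to).getTime();
  if (Number.isNaN(fromMs) || Number.isNaN(toMs) || fromMs > toMs) {
    throw new RangeError('invalid range');
  }
  // Keep the end of the window and pull the start forward if it is too far back.
  const cappedFromMs = Math.max(fromMs, toMs - maxDays * DAY_MS);
  const requested = Number.isInteger(limit) && limit > 0 ? limit : maxRows;
  return {
    from: new Date(cappedFromMs).toISOString(),
    to: new Date(toMs).toISOString(),
    limit: Math.min(requested, maxRows),
  };
}

// Usage: a one-year request is reduced to the most recent 31 days.
const capped = capRange({
  from: '2023-01-01T00:00:00Z',
  to: '2024-01-01T00:00:00Z',
  limit: 50000,
});
console.log(capped.from);  // 2023-12-01T00:00:00.000Z
console.log(capped.limit); // 1000
```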

Implementation Example (Structured Log)

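// Assumes `upstreamUrl` is defined by the caller and that this code runs inside
// an async function (or an ES module with top-level await).
const start = Date.now();
const requestId = crypto.randomUUID();
// Emit one JSON line per stage; every line carries the same request_id.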
const log = (entry) =>
  console.log(JSON.stringify({
    request_id: requestId,
    function: 'example-function',
    ...entry
  }));

try {
  const upstreamStart = Date.now();
  const res = await fetch(upstreamUrl);
  const upstreamLatency = Date.now() - upstreamStart;

  log({
    stage: 'upstream',
    status: res.status,
    upstream_status: res.status,
    upstream_latency_ms: upstreamLatency,
    latency_ms: Date.now() - start,
    error_code: res.ok ? null : 'UPSTREAM_ERROR'
  });
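} catch (err) {
  // fetch rejects on network failures and aborts as well as timeouts, so the
  // timeout code is used only when the error name indicates one.
  const timedOut = err?.name === 'TimeoutError' || err?.name === 'AbortError';
  log({
    stage: 'exception',
    status: 500,
    upstream_status: null,
    upstream_latency_ms: null,
    latency_ms: Date.now() - start,
    error_code: timedOut ? 'UPSTREAM_TIMEOUT' : 'UPSTREAM_EXCEPTION'
  });
  throw err;
}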

Common Mistakes

Online aggregation in dashboard endpoints under low budget
Adding new data sources without updating idempotency keys
Blame without upstream_status evidence
Storing full payloads "just in case" (privacy and cost risk)
Changing data semantics without replay/backfill plan

Rationalization Table

| Excuse | Reality |
| --- | --- |
| "We are a tiny team, logs are overkill" | Small teams need stronger evidence, not weaker. |
| "Vendor is unstable, we cannot know" | You still need M1 logs to avoid misfixes. |
| "Budget is low so scans are fine" | Low budget means scans fail sooner. |
| "We can patch the numbers" | Patches without replay create permanent drift. |

Red Flags - STOP

No structured logs but attempting responsibility attribution
Data rewrite without replay/backfill plan
Dashboard reads from raw event tables
Unknown fields stored without allowlist
Idempotency key not updated when adding dimensions