Prerequisites & Decision Register

Every decision that must be made before writing a line of code — with options, trade-offs, and a literal checklist to start building.

Product: Groundwork (Shared Reality Platform)
Stage: Pre-Architecture / Pre-Build
Team Size: 1–2 developers
Scale Target: 500 → 5K → 50K projects
Budget Cap: <$50/mo at launch
Updated: April 2026

Core Principles

These principles govern every decision in this document. When two options are equally viable, apply the principle that eliminates the most future regret.

Docker First

A Docker image that runs anywhere is more valuable than deep integration with any single platform. Build the image; pick the host later.

Postgres Before Anything

Ask "can Postgres do this?" before adding any new dependency. Queues, sessions, full-text search, and JSON storage all live in Postgres at MVP.

Library Over Service

A library runs in your process. A service is another account, another bill, another migration path. Prefer the library unless the service eliminates significant complexity.

Document Exit Paths

Every vendor included here must have a documented exit path: a standard protocol it speaks and a described migration that takes less than one day.

MVP Dependency Budget

Maximum 5 external services at launch. Count them. If adding a service breaks the budget, something else must come out or be deferred.

Defer Until Trigger

Premature optimization costs more than it saves. Each deferred service has a documented trigger — a metric that signals when it's time to add it.

What This Document Covers

This document sits before the technical architecture. The architecture describes how the system is built. This document captures what must be decided, acquired, and confirmed before building begins.

How to use this document. Read the decision register once to understand the landscape. Then go to the Accounts checklist, create the 4 required accounts, and run the bootstrap commands at the end of Section 4. The first commit should be possible in under two hours from a fresh machine.

1 Minimum Viable Infrastructure

There is a meaningful distinction between what is needed to ship and what is needed to scale. The table below makes that explicit with two tiers.

Bare Minimum — Can Ship Today
  • Container host running the SvelteKit app + API server: any host that accepts a Docker image. The application is the artifact.
  • Postgres (managed by provider, or Docker volume in dev): handles data, sessions, job queue, and documents under 1,000 files.

Service Audit — Questioning Every Dependency

Each of the following is commonly assumed to be necessary at MVP. The audit asks whether that assumption holds for Groundwork at the actual launch scale.

| Service | Default Assumption | Audit Result | Trigger to Add |
| --- | --- | --- | --- |
| Object Storage (R2 / S3 / Tigris) | "Files need a bucket." | Defer. Postgres bytea or large objects handle documents under ~1,000 files and ~1 GB total. No egress costs, no additional auth, zero new infrastructure. | Document count > 1,000 or total stored > 1 GB |
| Redis / Cache (Upstash / Redis Cloud) | "We'll need cache." | Defer. Postgres handles sessions natively via BetterAuth's Postgres adapter. Background jobs run via Graphile Worker on Postgres. No identified use case for Redis at <5,000 active sessions. | >50K concurrent sessions, or a specific hot path identified via profiling |
| Email Service (Resend / Postmark) | "We need email." | Include. Contractor invites, change order approvals, and permit status alerts are core product flows, so email is genuinely required. Resend's free tier covers <3,000 emails/month. | Required at MVP |
| Separate Queue Service (Inngest / BullMQ) | "Queues need Redis." | Defer. Graphile Worker runs job queues on existing Postgres using SKIP LOCKED. Under 10,000 jobs/hour, this is indistinguishable from a dedicated queue service in performance terms. Zero additional infrastructure. | >10,000 jobs/hour sustained, or complex fan-out workflows |
| Error Tracking (Sentry) | "Need Sentry immediately." | Include. Structured logging to stdout is sufficient for local dev. For production, Sentry's free tier (5,000 events/month) provides stack traces, release tracking, and user context. 10-minute setup. | Required at launch |
| CDN (CloudFront / Fastly) | "Need a CDN." | Not a decision. The hosting provider handles CDN for static assets. Cloudflare DNS provides edge caching for free. A separate CDN account is not a decision at this scale. | Global user distribution with sub-100ms latency requirements |
| Uptime Monitoring (Datadog / PagerDuty) | "Need alerting." | Minimal. UptimeRobot free tier (50 monitors, 5-minute checks) with email alerts covers launch. The hosting provider's built-in metrics dashboard handles the rest. Datadog is premature at this scale. | SLA commitments to paying customers |

2 Decision Register

Each decision below covers the realistic option set, the key differentiating question, trade-offs, and a recommendation. High-stakes decisions (those with high exit costs) are flagged clearly.

A Hosting Platform
Decided Lock-in: Low
All of these platforms run Docker images. The real differentiator is developer experience, pricing structure, and included extras for a 1–2 person team.
| Platform | DX Rating | Postgres Included | Free Tier | Est. Monthly (MVP) | Lock-in |
| --- | --- | --- | --- | --- | --- |
| Railway | Excellent | Yes | $5 credit/mo | $10–20/mo | Low |
| Render | Good | Yes | Free static + 90-day Postgres | $7–15/mo | Low |
| Google Cloud Run | Moderate | No (Cloud SQL separate) | Scale-to-zero, $0 idle | $0–5/mo (idle) | Medium |
| AWS Fargate | Low | No (RDS separate) | Minimal | $25–50/mo | Medium |
| Coolify (self-hosted) | Good | Yes (managed) | Free (VPS cost only) | $5–10/mo (Hetzner VPS) | Lowest |

Recommendation: Fly.io. Fly.io provides the best DX-to-ops ratio for a solo or 2-person team. Postgres is included at no extra cost, workers and scheduled tasks run natively, private networking between services is built in, and the free tier is sufficient for early-stage development. The CLI is well designed. The Docker deployment model means migrating to any other platform is a CI/CD change, not an architectural one.

Runner-up: Railway. Marginally simpler DX, but higher cost at scale and a smaller ecosystem.

Exit Path: All platforms in this table accept the same Dockerfile. Migration = update the .github/workflows/deploy.yml target and run pg_dump | pg_restore to transfer data. Estimated time: 4–8 hours including DNS propagation.
B Database
Decided Lock-in: Low–Medium
Do we need serverless Postgres with branching? Or is an always-on instance that comes with our hosting provider sufficient for MVP?
| Option | Always-on? | Branching / Dev Envs | Backups | Lock-in Risk |
| --- | --- | --- | --- | --- |
| Neon | No (scale-to-zero) | Yes, best-in-class | Point-in-time recovery | Low |
| Supabase Postgres | Yes | No | Daily + PITR (paid) | Medium (if using their auth/storage) |
| Railway Postgres | Yes | No | Automatic | Low |
| AWS RDS | Yes | No | Automatic + PITR | Medium |

Supabase lock-in note. Supabase's Postgres itself is standard and portable. The lock-in risk comes from adopting their auth SDK, storage SDK, and edge functions — which create proprietary dependency chains. If Supabase is chosen, treat it as Postgres only and ignore the bundled services.

Recommendation: Fly Postgres (at MVP), Neon (when dev branching becomes valuable). Fly Postgres is zero-cost on the included tier, always-on (no cold start on the first query of the day), and standard enough to migrate away from trivially. The absence of branching is a real gap, but at MVP with 1–2 developers, branching is nice-to-have, not critical. Add a pg_dump cron job on day one for backup. When the team grows and dev environment conflicts become painful, migrate to Neon in an afternoon.

Exit Path: Any Postgres instance speaks the same wire protocol. pg_dump -Fc SOURCE | pg_restore -d TARGET migrates all data. No proprietary data formats or APIs are involved.
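
The day-one pg_dump backup can be sketched as a small command builder that a scheduled job would run. This is a minimal sketch, not a prescribed setup: the function name, the file naming scheme, and the idea of handing the result to a process spawner are assumptions. Only the pg_dump flags (-Fc, --file, --dbname) are standard.

```typescript
interface BackupCommand {
  command: string;
  args: string[];
}

// Builds (does not run) a pg_dump invocation in custom format (-Fc),
// the format that pg_restore reads.
// The "groundwork-YYYY-MM-DD.dump" naming is a hypothetical convention.
function buildBackupCommand(databaseUrl: string, now: Date): BackupCommand {
  const stamp = now.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  const file = `groundwork-${stamp}.dump`;
  return {
    command: "pg_dump",
    args: ["-Fc", "--file", file, "--dbname", databaseUrl],
  };
}

const nightly = buildBackupCommand(
  "postgres://app:dev@localhost:5432/groundwork",
  new Date()
);
console.log(nightly.command, nightly.args.join(" "));
```

A scheduled task could pass command and args to a process spawner and then copy the dump file off the host. Keeping the builder pure makes the naming and flags easy to test without a database.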
C Authentication
Decided Lock-in: Critical Decision
Library (runs in your process, stores in your Postgres) vs. Service (runs externally, stores in their system)? Auth is the hardest thing to migrate after the fact.
| Option | Type | SvelteKit Support | Data Location | Monthly Cost | Lock-in |
| --- | --- | --- | --- | --- | --- |
| Lucia | Library | Good (but deprecated) | Your Postgres | $0 | None |
| Auth.js (NextAuth) | Library | Adapter available | Your Postgres | $0 | Low |
| Clerk | Service | Good | Their servers | $0 / $25–50+ at growth | High |
| Auth0 | Service | Good | Their servers | $0 / $23–100+ | High |
| Supabase Auth | Service | Good | Their servers | $0 / $25+ | Medium-High |

Auth is the hardest service to migrate. If you build your user identity flows around Clerk or Auth0's SDK, every session token, every OAuth connection, every permission model is coupled to their system. Migrating means re-authenticating every user, rebuilding every integration, and handling token invalidation across all clients simultaneously. The DX advantage of auth-as-a-service does not justify this risk at MVP scale.

Recommendation: BetterAuth. BetterAuth is a library: it runs inside the SvelteKit application process and stores sessions, users, and OAuth tokens in your Postgres database. It supports email/password, magic links, OAuth (Google, GitHub, etc.), and organization/team concepts natively. The SvelteKit integration is first-class. There is no vendor, no account, no monthly cost, and no migration path needed because all data is in Postgres from day one.

Not Lucia: Lucia's author has deprecated it in favor of the approach BetterAuth represents, a complete, opinionated auth library. Choosing Lucia today means migrating again within 12 months.

Exit Path: No exit path needed, since all auth data is in your Postgres. Switching libraries means rewriting the auth middleware and adapter layer, not migrating user data.
D Frontend Framework
Decided — SvelteKit Lock-in: Low
Already selected: SvelteKit. This entry exists to document why, so the decision is not re-opened later.

SvelteKit was selected based on:

  • SSR + SPA hybrid in one framework — marketing pages, auth flows, and the real-time dashboard all run from the same codebase without routing gymnastics.
  • Smaller production bundle than Next.js or Remix. Svelte compiles away the framework at build time; the browser receives near-vanilla JS.
  • Built-in form actions and server-side load functions align well with a form-heavy product (contracts, change orders, permit submissions).
  • Growing ecosystem and stable 2.0 — the initial instability of SvelteKit's early versions is behind it. The 2.0 API is considered stable.
  • Team preference and existing expertise.

Mobile app consideration. If native iOS/Android apps become a requirement, SvelteKit does not build those. The API layer would remain intact, but the frontend would require a separate React Native or Flutter project. This is listed in Open Questions with a trigger.

E Object Storage
Deferred Lock-in: Low (S3-compatible)
Does MVP need a dedicated object store, or can Postgres handle document storage until the trigger threshold is reached?

Answer: Postgres is fine under 1,000 documents. Storing permit PDFs, contracts, and inspection photos as bytea in Postgres is a legitimate pattern at small scale. It simplifies the architecture (no additional auth, no presigned URLs, no CORS configuration, no separate SDK), and Postgres large objects support streaming reads for larger files.

| Option | Protocol | Egress Cost | Fly.io Integration | Lock-in |
| --- | --- | --- | --- | --- |
| Tigris | S3-compatible | $0 egress | Native (Fly add-on) | Low |
| Backblaze B2 | S3-compatible | $0.01/GB (Cloudflare partner = $0) | Good (external) | Low |
| AWS S3 | S3 (origin) | $0.09/GB | Good (external) | Medium |
| MinIO (self-hosted) | S3-compatible | $0 | Runs on same VM | Lowest |

Recommendation: Defer. When needed, use R2 or Tigris. Both speak the S3 API. Switching between them is a four-variable environment change (ENDPOINT, BUCKET, ACCESS_KEY, SECRET_KEY). R2 is recommended if not using Fly.io; Tigris is recommended if using Fly.io (native add-on, no separate account).

Exit Path: Any S3-compatible client (AWS SDK v3, @aws-sdk/client-s3) works against all options above. Migrating between providers = update four environment variables and run a sync job (rclone sync).
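
The four-variable switch can be sketched as a loader that reads the variables once and fails fast when one is missing. This is a minimal sketch: the StorageConfig shape and function name are assumptions, and the endpoint below is a placeholder, not a real provider URL. The resulting object would be handed to whichever S3-compatible client is chosen.

```typescript
interface StorageConfig {
  endpoint: string;
  bucket: string;
  accessKey: string;
  secretKey: string;
}

type Env = Record<string, string | undefined>;

// Reads the four variables named in the recommendation above.
// Throws with a list of every missing variable rather than the first one.
function loadStorageConfig(env: Env): StorageConfig {
  const required = ["ENDPOINT", "BUCKET", "ACCESS_KEY", "SECRET_KEY"];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing storage env vars: ${missing.join(", ")}`);
  }
  return {
    endpoint: env["ENDPOINT"] as string,
    bucket: env["BUCKET"] as string,
    accessKey: env["ACCESS_KEY"] as string,
    secretKey: env["SECRET_KEY"] as string,
  };
}

// Placeholder values for illustration only.
const exampleStorage = loadStorageConfig({
  ENDPOINT: "https://storage.example.com",
  BUCKET: "groundwork-documents",
  ACCESS_KEY: "example-access-key",
  SECRET_KEY: "example-secret-key",
});
console.log(exampleStorage.endpoint, exampleStorage.bucket);
```

Because the rest of the application sees only StorageConfig, changing provider stays an environment change plus a data sync.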
F Background Jobs
Decided Lock-in: None
Do permit API polling, change order notifications, and digest generation require a dedicated queue service like Redis + BullMQ, or can Postgres handle this?

Answer: Postgres handles this. Postgres SELECT ... FOR UPDATE SKIP LOCKED is a proven pattern for reliable job queues. Two mature libraries implement this pattern cleanly for Node.js:

| Option | Infrastructure | Throughput | Cron Support | Recommendation |
| --- | --- | --- | --- | --- |
| pg-boss | Postgres only | ~1,000 jobs/sec | Yes | Also fine |
| BullMQ | Redis required | Very high | Yes | Overkill at MVP |
| Inngest | External service | High | Yes | New dependency |
| Trigger.dev | External service | High | Yes | New dependency |

Recommendation: Graphile Worker. Zero additional infrastructure. Runs as a Node.js process alongside the application (or as a separate Fly.io Machine). Uses the existing Postgres connection. Supports recurring schedules for permit polling. Job results and history are queryable via SQL. At 500 projects polling permits every 15 minutes, this is approximately 2,000 jobs/hour (500 projects × 4 polls/hour), about a fifth of the 10,000 jobs/hour trigger, which Graphile Worker handles comfortably.
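
For orientation, the SKIP LOCKED pattern and the polling load estimate can be sketched as follows. The jobs table and its columns are hypothetical and exist only to show the shape of the claim query; Graphile Worker and pg-boss manage their own schemas and queries. The load calculation uses the figures from this section (500 projects, 15-minute interval, 10,000 jobs/hour trigger).

```typescript
// Hypothetical schema: jobs(id, task, payload, status, run_at, locked_at).
// Each worker claims one due job; rows locked by other workers are skipped
// instead of waited on, so workers never block each other.
const CLAIM_NEXT_JOB_SQL = `
  UPDATE jobs
     SET status = 'running', locked_at = now()
   WHERE id = (
     SELECT id
       FROM jobs
      WHERE status = 'queued' AND run_at <= now()
      ORDER BY run_at
      LIMIT 1
      FOR UPDATE SKIP LOCKED
   )
  RETURNING id, task, payload;
`;

// One polling job per project per interval.
function pollingJobsPerHour(projects: number, intervalMinutes: number): number {
  return projects * (60 / intervalMinutes);
}

const QUEUE_TRIGGER_JOBS_PER_HOUR = 10000;
const pollingLoad = pollingJobsPerHour(500, 15);
const shareOfTrigger = (pollingLoad / QUEUE_TRIGGER_JOBS_PER_HOUR) * 100;

console.log(`${pollingLoad} jobs/hour, ${shareOfTrigger.toFixed(0)}% of the trigger`);
console.log(CLAIM_NEXT_JOB_SQL.trim().split("\n")[0]);
```

The same function gives 20,000 jobs/hour at 5,000 projects, which is the point where the trigger for a dedicated queue (or a longer polling interval) would come into play.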
G Transactional Email
Decided Lock-in: Low
At MVP scale (<100 emails/day), any provider works. The decision is about API simplicity, free tier generosity, and avoiding lock-in via proprietary template systems.
| Provider | Free Tier | API Style | Template Lock-in |
| --- | --- | --- | --- |
| Postmark | 100/mo (trial) | REST | None |
| SendGrid | 100/day free | REST | Optional drag-and-drop (avoid) |
| AWS SES | 62,000/mo (if on EC2) | REST | None |
| SMTP relay (host) | Provider-dependent | SMTP | None |

Recommendation: Resend. Resend's API is simple HTTP: no proprietary SDK required, no template system to learn, and the free tier covers early-stage volumes without a credit card. Templates are plain HTML or React Email (a library, not a service). Switching providers means updating RESEND_API_KEY and the base URL, plus remapping field names where the new provider's payload differs.

Exit Path: Email is an HTTP POST with a JSON body. Providers accept equivalent to, from, subject, and html fields, though exact field names vary. Changing providers = swap the API key and base URL and adjust the payload mapping in one place. Templates stay in the codebase.
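
A sketch of keeping the provider behind one function, so that the exit path really is a key, a base URL, and one payload mapping. The type and function names are assumptions for illustration. The endpoint path and bearer-token header follow Resend's HTTP API; the addresses are placeholders.

```typescript
interface EmailMessage {
  from: string;
  to: string;
  subject: string;
  html: string;
}

interface EmailProvider {
  baseUrl: string; // e.g. "https://api.resend.com"
  apiKey: string;  // value of RESEND_API_KEY
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the request without sending it. A different provider would need a
// different path and possibly different JSON field names; that mapping lives here.
function buildSendRequest(provider: EmailProvider, message: EmailMessage): HttpRequest {
  return {
    url: `${provider.baseUrl}/emails`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${provider.apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(message),
  };
}

const inviteRequest = buildSendRequest(
  { baseUrl: "https://api.resend.com", apiKey: "re_..." },
  {
    from: "Groundwork <noreply@example.com>",
    to: "contractor@example.com",
    subject: "You have been invited to a project",
    html: "<p>Open the project to review the change order.</p>",
  }
);
console.log(inviteRequest.url);
// To send: fetch(inviteRequest.url, { method, headers, body }) with the fields above.
```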
H Error Tracking
Decided Lock-in: Low
Is structured logging to stdout sufficient for production, or does the team need stack traces, release tracking, and user context from day one?

Answer: Add Sentry at launch. The cost is 10 minutes of setup time. The return is stack traces with source maps, user context when errors occur, release-based regression tracking, and the ability to set alerts on error rate spikes. At 500 projects, a silent data mutation bug in a change order flow is worse than a 500 error — Sentry catches both.

| Option | Free Tier | SvelteKit SDK | Self-hostable |
| --- | --- | --- | --- |
| Highlight.io | 500 sessions/mo | Yes | Yes |
| LogRocket | 1,000 sessions/mo | Yes | No |
| console.error + structured logs | Free | N/A | N/A |

Recommendation: Sentry free tier. 5,000 errors/month is sufficient for the entire Year 1 trajectory. The @sentry/sveltekit package handles both server-side and client-side error capture with source maps. Do not use Sentry for session replay or performance monitoring at this tier; reserve the quota for error events only.
I Monitoring & Uptime
Decided — Minimal Lock-in: None
At MVP with <500 projects and no SLA commitments, what is the minimum viable monitoring stack?
  • Hosting provider metrics dashboard — CPU, memory, request latency, and error rates are surfaced for free by Fly.io, Railway, and Render. No additional tooling needed.
  • UptimeRobot free tier — 50 monitors, 5-minute check intervals, email alerts. Add a monitor for the main app URL and the health check endpoint (GET /api/health). Free indefinitely.
  • Structured application logs — Write JSON logs to stdout with fields: level, message, userId, projectId, duration. The hosting provider captures these and makes them searchable.
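
The structured log format in the last bullet can be sketched as a tiny logger that writes one JSON object per line to stdout. The function names and the added time field are assumptions; the other fields are the ones listed above.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

interface LogFields {
  userId?: string;
  projectId?: string;
  duration?: number; // milliseconds
}

// Pure formatter: one JSON object per log line. Fields left undefined are
// dropped by JSON.stringify, so callers pass only what they know.
function formatLog(
  level: LogLevel,
  message: string,
  fields: LogFields = {},
  now: Date = new Date()
): string {
  return JSON.stringify({ time: now.toISOString(), level, message, ...fields });
}

// console.log writes to stdout, where the hosting provider collects it.
function log(level: LogLevel, message: string, fields: LogFields = {}): void {
  console.log(formatLog(level, message, fields));
}

log("info", "change order approved", { userId: "u_1", projectId: "p_42", duration: 18 });
```

Keeping the formatter separate from the writer makes the log shape testable and lets a different sink be swapped in later without touching call sites.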

Do not add Grafana, Datadog, or New Relic at MVP. Each adds a ~$25/mo minimum and hours of setup. Add observability infrastructure when there is a specific question it would answer that the above stack cannot.

J CI/CD
Not a Decision Lock-in: Low
GitHub Actions. This is not a decision: it is the default choice and is included with the GitHub account the team already has.
  • Free for private repos: 2,000 minutes/month on the free plan. A Docker build + deploy takes 3–5 minutes, so this covers roughly 400–650 deployments/month before any cost.
  • Workflow = Dockerfile + push to registry. The deploy step is just calling the hosting provider's CLI (fly deploy, railway up, etc.). Switching providers = changing one shell command in the workflow file.
  • Run tests and linting on every PR. The CI pipeline enforces the quality gates in the technical standards before code can merge.
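.github/workflows/deploy.yml (skeleton, YAML)
name: deploy
on:
  push:
    branches: [main]
  pull_request:  # run the test job on every PR

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4  # pnpm is not preinstalled on GitHub runners
        with:
          version: 9  # assumption: pin to the pnpm major the team uses
      - uses: actions/setup-node@v4
        with:
          node-version: 20  # assumption: match the Node version in the Dockerfile
          cache: pnpm
      - run: pnpm install --frozen-lockfile
      - run: pnpm typecheck
      - run: pnpm lint
      - run: pnpm test

  deploy:
    needs: test
    if: github.event_name == 'push'  # deploy only from main, never from PRs
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: superfly/flyctl-actions/setup-flyctl@master  # installs flyctl on the runner
      - run: flyctl deploy --remote-only  # swap this line to change providers
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}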

3 Portability Assessment

Every vendor included in this architecture must speak a documented standard protocol and have a migration path that takes under one day. This table is the accountability record for that claim.

| Component | Vendor (Recommended) | Standard Protocol | Migration Path | Est. Time | Risk |
| --- | --- | --- | --- | --- | --- |
| Container Hosting | Fly.io | Docker image + OCI registry | Update deploy.yml to target new provider CLI. Redeploy same image. | 2–4 hrs | Low |
| Database | Fly Postgres | PostgreSQL wire protocol | Pipe pg_dump -Fc into pg_restore on the new host. Update DATABASE_URL. | 1–2 hrs | Low |
| Authentication | BetterAuth (library) | Standard sessions in Postgres | No migration needed; data is in Postgres. Swap library = rewrite middleware layer only. | 1 day | Low |
| Background Jobs | Graphile Worker | Postgres tables (portable schema) | Switch to pg-boss or BullMQ = rewrite job registration code. Data stays in Postgres. | 4–8 hrs | Low |
| Email | Resend | HTTP REST (SMTP-compatible) | Swap API key + base URL. Templates are local code; no migration needed. | 30 min | Low |
| Error Tracking | Sentry | OpenTelemetry (partial) | Remove SDK, add replacement. Historical data stays in Sentry (or export via their API). | 2–4 hrs | Low |
| Object Storage (deferred) | R2 or Tigris | S3 API (AWS SDK v3) | Update 4 env vars. Run rclone sync source: dest:. | 1–2 hrs | Low |
| DNS | Cloudflare | Standard DNS (IANA) | Export zone file, import to new registrar. Update nameservers. | 24 hrs (propagation) | Low |
| CI/CD | GitHub Actions | YAML workflows (standard) | Port .github/workflows/*.yml to GitLab CI or Bitbucket syntax. Mostly mechanical. | 4–8 hrs | Low |

Total worst-case migration. If every component needed to change simultaneously (a highly unlikely scenario), the estimated total time is 2–3 days of focused work, dominated by Postgres data transfer and DNS propagation. No component in this stack has a migration path that requires rebuilding user data or re-implementing core product logic.

4 Accounts & CLI Tools Needed

This is a literal checklist. Complete it before writing code. Items are ordered by dependency — some tools are required before others can be configured.

Accounts (create in this order)

Local CLI Tools

Not Needed at MVP

These are commonly acquired prematurely. Do not create these accounts until the documented trigger is reached.

| Account / Tool | Why Not Yet | Trigger to Add |
| --- | --- | --- |
| Cloudflare R2 / Tigris / S3 | Documents live in Postgres until trigger | >1,000 documents or >1 GB stored |
| Redis / Upstash | No identified use case; Postgres handles sessions and queues | >50K concurrent sessions or proven hot-path need |
| Datadog / Grafana Cloud | Hosting dashboard + UptimeRobot is sufficient | SLA commitments or active oncall rotation |
| Neon / Supabase | Fly Postgres is included and sufficient | Dev environment conflicts or need for branch-per-PR databases |
| Separate CDN account | Cloudflare DNS + hosting provider handles static assets | Global user base with sub-100ms latency requirements outside the US |
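
The numeric triggers in this document can be encoded as a small check that runs against usage metrics, so a deferred service is added when its trigger fires rather than by feel. This is a sketch under stated assumptions: the metric names and function are hypothetical, the thresholds are the ones documented here (including the 10,000 jobs/hour queue trigger from the service audit), and qualitative triggers such as SLA commitments or profiling results are not modeled.

```typescript
// Hypothetical metric names; how they are collected is out of scope here.
interface UsageMetrics {
  documentCount: number;
  storedBytes: number;
  jobsPerHour: number;
  concurrentSessions: number;
}

interface DeferredService {
  service: string;
  triggered: (m: UsageMetrics) => boolean;
}

const GB = 1024 * 1024 * 1024;

const deferredServices: DeferredService[] = [
  { service: "Object Storage", triggered: (m) => m.documentCount > 1000 || m.storedBytes > 1 * GB },
  { service: "Redis / Cache", triggered: (m) => m.concurrentSessions > 50000 },
  { service: "Separate Queue Service", triggered: (m) => m.jobsPerHour > 10000 },
];

// Returns the deferred services whose documented trigger has been reached.
function servicesToAdd(metrics: UsageMetrics): string[] {
  return deferredServices.filter((s) => s.triggered(metrics)).map((s) => s.service);
}

// Illustrative launch-scale numbers, not measurements.
const launchMetrics: UsageMetrics = {
  documentCount: 120,
  storedBytes: 0.2 * GB,
  jobsPerHour: 2000,
  concurrentSessions: 40,
};
console.log(servicesToAdd(launchMetrics));
```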

Bootstrap: First Day Commands

terminal (Shell): run after accounts are created
# 1. Authenticate with providers
gh auth login
flyctl auth login

# 2. Create the SvelteKit project (the sv CLI replaced create-svelte)
pnpm dlx sv create groundwork
cd groundwork
pnpm install

# 3. Start local Postgres via Docker (dev only)
docker compose up -d  # expects docker-compose.yml in repo root

# 4. Create the Fly.io application and its Postgres cluster
flyctl launch --no-deploy
flyctl postgres create --name groundwork-db

# 5. Set production secrets via flyctl (never commit them; .env is for local dev only)
flyctl secrets set DATABASE_URL="postgres://..."
flyctl secrets set BETTER_AUTH_SECRET="$(openssl rand -base64 32)"
flyctl secrets set RESEND_API_KEY="re_..."
flyctl secrets set PUBLIC_SENTRY_DSN="https://..."

# 6. First deploy
flyctl deploy
docker-compose.yml — local dev stack YAML
services:
  db:
    image: postgres:16-alpine
    ports: ["5432:5432"]
    environment:
      POSTGRES_DB: groundwork
      POSTGRES_USER: app
      POSTGRES_PASSWORD: dev
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

5 Dependency Budget

The maximum is 5 external services at MVP. Count them. Anything beyond this limit requires removing an existing service or deferring the addition.

1. Hosting: Fly.io (includes Postgres)
2. Email: Resend
3. Error Tracking: Sentry free tier
4. DNS + Domain: Cloudflare + registrar

Everything else runs inside the application or inside Postgres.
Authentication (BetterAuth library), background jobs (Graphile Worker library), sessions (Postgres), document storage (Postgres bytea), full-text search (Postgres tsvector), and uptime monitoring (UptimeRobot free) are not counted as external service dependencies — they either run in-process or are free infrastructure utilities.

The budget is a forcing function, not a hard ceiling. If a fifth paid service genuinely reduces complexity more than it adds — document it here first, explain what it replaces, and confirm the portability story. The goal is conscious addition, not prohibition.
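
Counting can be made mechanical. The sketch below lists the four services above and checks them against the limit of 5; the type and function names are assumptions, and such a check could live in a test so that adding a service becomes a deliberate, reviewed change.

```typescript
interface ExternalService {
  role: string;
  vendor: string;
}

const MAX_EXTERNAL_SERVICES = 5;

// The four services counted in this section.
const externalServices: ExternalService[] = [
  { role: "Hosting", vendor: "Fly.io (includes Postgres)" },
  { role: "Email", vendor: "Resend" },
  { role: "Error Tracking", vendor: "Sentry free tier" },
  { role: "DNS + Domain", vendor: "Cloudflare + registrar" },
];

interface BudgetStatus {
  used: number;
  remaining: number;
  withinBudget: boolean;
}

function checkBudget(services: ExternalService[], max: number): BudgetStatus {
  const used = services.length;
  return { used, remaining: Math.max(0, max - used), withinBudget: used <= max };
}

const budgetStatus = checkBudget(externalServices, MAX_EXTERNAL_SERVICES);
console.log(`${budgetStatus.used} of ${MAX_EXTERNAL_SERVICES} slots used, ${budgetStatus.remaining} remaining`);
```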

6 Cost Projection

Real free tier limits are shown. Many providers give enough to launch a product with paying customers before incurring any cost.

| Service | Free Tier | MVP / Pre-Revenue | Year 1 (~500 projects) | Year 2 (~5K projects) | Year 3 (~50K projects) |
| --- | --- | --- | --- | --- | --- |
| Fly.io (Hosting + Postgres) | 3 VMs + 3 GB Postgres | $0 | $7–20/mo (1 VM + 10 GB DB) | $30–60/mo (2 VMs + larger DB) | $80–200/mo (or migrate to Cloud Run) |
| Resend (Email) | 3,000/mo, 100/day | $0 | $0 (~50 emails/day at 500 projects) | $20/mo (~500 emails/day) | $90/mo (~5,000 emails/day) |
| Sentry (Error tracking) | 5,000 errors/mo | $0 | $0 (well within free limit) | $0–26/mo (approaching paid tier) | $26/mo (Team plan) |
| Cloudflare (DNS + CDN) | Free forever (DNS) | $0 | $0 | $0 | $0 |
| Domain (Registrar, ~$12/yr) | None | $1/mo | $1/mo | $1/mo | $1/mo |
| UptimeRobot (Monitoring) | 50 monitors free | $0 | $0 | $0 | $7/mo (if more monitors needed) |
| Total (approx.) | | $1/mo | $8–21/mo | $51–107/mo | $204–324/mo |

The target of <$50/mo at launch is achievable: about $1/mo before revenue (the domain) and under $22/mo through Year 1. The projection stays inside the $50/mo budget until the platform approaches 5,000 projects, by which point the product should be generating revenue that justifies scaled infrastructure costs.
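
Summing the per-service rows as low/high ranges is a quick way to check the Total row whenever a price changes. The sketch below uses the monthly figures from the table; the type and function names are assumptions.

```typescript
type CostRange = [number, number]; // [low, high] in USD per month
type Stage = "mvp" | "year1" | "year2" | "year3";

interface CostRow {
  service: string;
  mvp: CostRange;
  year1: CostRange;
  year2: CostRange;
  year3: CostRange;
}

const costRows: CostRow[] = [
  { service: "Fly.io", mvp: [0, 0], year1: [7, 20], year2: [30, 60], year3: [80, 200] },
  { service: "Resend", mvp: [0, 0], year1: [0, 0], year2: [20, 20], year3: [90, 90] },
  { service: "Sentry", mvp: [0, 0], year1: [0, 0], year2: [0, 26], year3: [26, 26] },
  { service: "Cloudflare", mvp: [0, 0], year1: [0, 0], year2: [0, 0], year3: [0, 0] },
  { service: "Domain", mvp: [1, 1], year1: [1, 1], year2: [1, 1], year3: [1, 1] },
  { service: "UptimeRobot", mvp: [0, 0], year1: [0, 0], year2: [0, 0], year3: [7, 7] },
];

// Adds the low ends and the high ends separately across all services.
function totalFor(stage: Stage, rows: CostRow[]): CostRange {
  let low = 0;
  let high = 0;
  for (const row of rows) {
    low += row[stage][0];
    high += row[stage][1];
  }
  return [low, high];
}

const stages: Stage[] = ["mvp", "year1", "year2", "year3"];
for (const stage of stages) {
  const [low, high] = totalFor(stage, costRows);
  console.log(`${stage}: $${low}-${high}/mo`);
}
```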

7 Open Questions & Deferred Decisions

These decisions are intentionally not made today. Each has a documented trigger — a metric or event that signals when it's time to revisit. Tracking them here prevents both premature optimization and forgotten technical debt.