Prerequisites & Decision Register
Every decision that must be made before writing a line of code — with options, trade-offs, and a literal checklist to start building.
Core Principles
These principles govern every decision in this document. When two options are equally viable, apply the principle that eliminates the most future regret.
Docker First
A Docker image that runs anywhere is more valuable than deep integration with any single platform. Build the image; pick the host later.
Postgres Before Anything
Ask "can Postgres do this?" before adding any new dependency. Queues, sessions, full-text search, and JSON storage all live in Postgres at MVP.
Library Over Service
A library runs in your process. A service is another account, another bill, another migration path. Prefer the library unless the service eliminates significant complexity.
Document Exit Paths
Every vendor included here must have a documented exit path: a standard protocol it speaks and a migration that can be completed in less than one day.
MVP Dependency Budget
Maximum 5 external services at launch. Count them. If adding a service breaks the budget, something else must come out or be deferred.
Defer Until Trigger
Premature optimization costs more than it saves. Each deferred service has a documented trigger — a metric that signals when it's time to add it.
What This Document Covers
This document sits before the technical architecture. The architecture describes how the system is built. This document captures what must be decided, acquired, and confirmed before building begins. It answers:
- Which services are truly necessary at MVP versus which ones feel necessary?
- Which technology decisions are high-stakes (hard to undo) versus low-stakes (swap in a day)?
- What accounts and tools must exist on day one?
- What does "launch" actually cost per month?
- What decisions are intentionally deferred, and what triggers revisiting them?
How to use this document. Read the decision register once to understand the landscape. Then go to the Accounts checklist, create the 4 required accounts, and run the bootstrap commands at the bottom. The first commit should be possible in under two hours from a fresh machine.
1 Minimum Viable Infrastructure
There is a meaningful distinction between what is needed to ship and what is needed to scale. The list below is the ship tier; the service audit that follows sorts everything else into include or defer.
- Container host (Fly.io, Railway, or Render): Fly.io is recommended for its DX-to-ops ratio at this team size.
- Managed Postgres (provider-included or Neon): automatic backups, connection pooling, and branch environments at no extra cost.
- Transactional email (Resend): invite flows, change order notifications, permit status changes.
- Error tracking (Sentry free tier): 5,000 events/month, an industry-standard integration, about 10 minutes of setup.
- Domain + DNS: Cloudflare for DNS. Free, and handles proxying and basic DDoS mitigation.
Service Audit — Questioning Every Dependency
Each of the following is commonly assumed to be necessary at MVP. The audit asks whether that assumption holds for Groundwork at the actual launch scale.
| Service | Default Assumption | Audit Result | Trigger to Add |
|---|---|---|---|
| Object Storage (R2 / S3 / Tigris) | "Files need a bucket." | Defer. Postgres bytea or large objects handle documents under ~1,000 files and ~1 GB total: no egress costs, no additional auth, zero new infrastructure. | Document count > 1,000 or total stored > 1 GB |
| Redis / Cache (Upstash / Redis Cloud) | "We'll need cache." | Defer. Postgres handles sessions natively via BetterAuth's Postgres adapter, and background jobs run via Graphile Worker on Postgres. No identified use case for Redis at <5,000 active sessions. | >50K concurrent sessions, or a specific hot path identified via profiling |
| Email Service (Resend / Postmark) | "We need email." | Include. Contractor invites, change order approvals, and permit status alerts are core product flows, so email is genuinely required. Resend's free tier covers <3,000 emails/month. | Required at MVP; include |
| Separate Queue Service (Inngest / BullMQ) | "Queues need Redis." | Defer. Graphile Worker runs job queues on the existing Postgres using SKIP LOCKED. Under 10,000 jobs/hour it is indistinguishable from a dedicated queue service in performance terms, with zero additional infrastructure. | >10,000 jobs/hour sustained, or complex fan-out workflows |
| Error Tracking (Sentry) | "Need Sentry immediately." | Include. Structured logging to stdout is sufficient for local dev. In production, Sentry's free tier (5,000 events/month) provides stack traces, release tracking, and user context for a 10-minute setup. | Required at launch; include |
| CDN (CloudFront / Fastly) | "Need a CDN." | Not a decision. The hosting provider serves static assets and Cloudflare's proxy adds free edge caching, so a separate CDN account is unnecessary at this scale. | Global user distribution with sub-100ms latency requirements |
| Uptime Monitoring (Datadog / PagerDuty) | "Need alerting." | Minimal. UptimeRobot's free tier (50 monitors, 5-minute checks) with email alerts covers launch, and the hosting provider's built-in metrics dashboard handles the rest. Datadog is premature at this scale. | SLA commitments to paying customers |
2 Decision Register
Each decision below covers the realistic option set, the key differentiating question, trade-offs, and a recommendation. High-stakes decisions (those with high exit costs) are flagged clearly.
Container Hosting
| Platform | DX Rating | Postgres Included | Free Tier | Est. Monthly (MVP) | Lock-in |
|---|---|---|---|---|---|
| Fly.io ★ Rec | Strong | Yes (Fly Postgres) | 3 shared VMs + 3 GB Postgres | $0–7/mo | Low |
| Railway | Excellent | Yes | $5 credit/mo | $10–20/mo | Low |
| Render | Good | Yes | Free static + 90-day Postgres | $7–15/mo | Low |
| Google Cloud Run | Moderate | No (Cloud SQL separate) | Scale-to-zero, $0 idle | $0–5/mo (idle) | Medium |
| AWS Fargate | Low | No (RDS separate) | Minimal | $25–50/mo | Medium |
| Coolify (self-hosted) | Good | Yes (managed) | Free (VPS cost only) | $5–10/mo (Hetzner VPS) | Lowest |
Runner-up: Railway — marginally simpler DX, but higher cost at scale and a smaller ecosystem.
Exit path: the application ships as a standard Dockerfile. Migration = update the deploy target in .github/workflows/deploy.yml and run pg_dump | pg_restore to transfer data. Estimated time: 4–8 hours, including DNS propagation.
Database
| Option | Always-on? | Branching / Dev Envs | Backups | Lock-in Risk |
|---|---|---|---|---|
| Fly Postgres ★ Rec | Yes | No (manual) | You manage via pg_dump + schedule | Low |
| Neon | No (scale-to-zero) | Yes — best-in-class | Point-in-time recovery | Low |
| Supabase Postgres | Yes | No | Daily + PITR (paid) | Medium (if using their auth/storage) |
| Railway Postgres | Yes | No | Automatic | Low |
| AWS RDS | Yes | No | Automatic + PITR | Medium |
Supabase lock-in note. Supabase's Postgres itself is standard and portable. The lock-in risk comes from adopting their auth SDK, storage SDK, and edge functions — which create proprietary dependency chains. If Supabase is chosen, treat it as Postgres only and ignore the bundled services.
Recommendation: Fly Postgres, with a pg_dump cron job set up on day one for backups. When the team grows and dev environment conflicts become painful, migrate to Neon in an afternoon.
Exit path: pg_dump -Fc SOURCE | pg_restore -d TARGET migrates all data. No proprietary data formats or APIs are involved.
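A day-one backup can be a scheduled pg_dump that writes a custom-format archive. The sketch below is a minimal TypeScript version, assuming the job runs where the Postgres client tools are installed and receives the connection string (for example DATABASE_URL); backupCommand, runBackup, and the groundwork- filename prefix are hypothetical names. Scheduling and copying the archive off the host are left out.

```typescript
import { spawnSync } from "node:child_process";

/** Builds a pg_dump invocation that writes a custom-format (-Fc) archive. */
export function backupCommand(
  databaseUrl: string,
  now: Date,
): { cmd: string; args: string[]; file: string } {
  // Drop milliseconds and replace ":" so the name is portable: 2024-05-01T03-00-00Z
  const stamp = now.toISOString().replace(/\.\d{3}Z$/, "Z").replace(/:/g, "-");
  const file = `groundwork-${stamp}.dump`; // hypothetical naming scheme
  // pg_dump accepts a full connection URI in place of a database name.
  return { cmd: "pg_dump", args: ["-Fc", "-f", file, databaseUrl], file };
}

/** Runs the backup and returns the archive filename; throws if pg_dump fails. */
export function runBackup(databaseUrl: string): string {
  const { cmd, args, file } = backupCommand(databaseUrl, new Date());
  const result = spawnSync(cmd, args, { stdio: "inherit" });
  if (result.error) throw result.error; // e.g. pg_dump not installed
  if (result.status !== 0) {
    throw new Error(`pg_dump exited with status ${result.status}`);
  }
  return file;
}
```

The archive restores with the same tool as the exit path, for example pg_restore -d TARGET groundwork-2024-05-01T03-00-00Z.dump.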
Authentication
| Option | Type | SvelteKit Support | Data Location | Monthly Cost | Lock-in |
|---|---|---|---|---|---|
| BetterAuth ★ Rec | Library | First-class | Your Postgres | $0 | None |
| Lucia | Library | Good (but deprecated) | Your Postgres | $0 | None |
| Auth.js (NextAuth) | Library | Adapter available | Your Postgres | $0 | Low |
| Clerk | Service | Good | Their servers | $0 / $25–50+ at growth | High |
| Auth0 | Service | Good | Their servers | $0 / $23–100+ | High |
| Supabase Auth | Service | Good | Their servers | $0 / $25+ | Medium-High |
Auth is the hardest service to migrate. If you build your user identity flows around Clerk or Auth0's SDK, every session token, every OAuth connection, every permission model is coupled to their system. Migrating means re-authenticating every user, rebuilding every integration, and handling token invalidation across all clients simultaneously. The DX advantage of auth-as-a-service does not justify this risk at MVP scale.
Not Lucia: its author has deprecated the library, so choosing Lucia today means migrating again within 12 months. BetterAuth covers the same ground as a complete, opinionated auth library that is actively maintained.
Frontend Framework
SvelteKit was selected based on:
- SSR + SPA hybrid in one framework — marketing pages, auth flows, and the real-time dashboard all run from the same codebase without routing gymnastics.
- Smaller production bundle than Next.js or Remix. Svelte compiles away the framework at build time; the browser receives near-vanilla JS.
- Built-in form actions and server-side load functions align well with a form-heavy product (contracts, change orders, permit submissions).
- Growing ecosystem and a stable 2.0 API: the instability of SvelteKit's early versions is behind it.
- Team preference and existing expertise.
Mobile app consideration. If native iOS/Android apps become a requirement, SvelteKit does not build those. The API layer would remain intact, but the frontend would require a separate React Native or Flutter project. This is listed in Open Questions with a trigger.
File Storage
Answer: Postgres is fine under 1,000 documents. Storing permit PDFs, contracts, and inspection photos as bytea in Postgres is a legitimate pattern at small scale. It simplifies the architecture (no additional auth, no presigned URLs, no CORS configuration, no separate SDK), and Postgres large objects support streaming reads for larger files.
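A minimal sketch of the bytea pattern, assuming node-postgres (pg) as the driver. The documents table, its columns, and insertDocument are hypothetical names for illustration, not an existing Groundwork schema. Storing size_bytes alongside the content keeps the 1,000-document / 1 GB trigger a single cheap query.

```typescript
// Hypothetical table: file bytes stored inline as bytea.
export const CREATE_DOCUMENTS_TABLE = `
CREATE TABLE IF NOT EXISTS documents (
  id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  project_id  bigint      NOT NULL,
  filename    text        NOT NULL,
  mime_type   text        NOT NULL,
  size_bytes  integer     NOT NULL,
  content     bytea       NOT NULL,
  created_at  timestamptz NOT NULL DEFAULT now()
)`;

// Same { text, values } shape that node-postgres accepts in client.query(config).
export interface QueryConfig {
  text: string;
  values: unknown[];
}

export function insertDocument(
  projectId: number,
  filename: string,
  mimeType: string,
  content: Uint8Array, // a Node Buffer is a Uint8Array subclass
): QueryConfig {
  return {
    // Parameterized: the bytes travel as a bind parameter, never as SQL text.
    text: `INSERT INTO documents (project_id, filename, mime_type, size_bytes, content)
           VALUES ($1, $2, $3, $4, $5) RETURNING id`,
    values: [projectId, filename, mimeType, content.byteLength, content],
  };
}

// Measures the storage trigger (>1,000 documents or >1 GB).
export const STORAGE_USAGE_SQL =
  `SELECT count(*) AS documents, coalesce(sum(size_bytes), 0) AS total_bytes FROM documents`;
```

With node-postgres this would run as await client.query(insertDocument(projectId, name, type, buffer)).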
| Option | Protocol | Egress Cost | Fly.io Integration | Lock-in |
|---|---|---|---|---|
| Cloudflare R2 ★ Rec | S3-compatible | $0 egress | Good (external) | Low |
| Tigris | S3-compatible | $0 egress | Native (Fly add-on) | Low |
| Backblaze B2 | S3-compatible | $0.01/GB (Cloudflare partner = $0) | Good (external) | Low |
| AWS S3 | S3 (origin) | $0.09/GB | Good (external) | Medium |
| MinIO (self-hosted) | S3-compatible | $0 | Runs on same VM | Lowest |
When the trigger is reached, any of these providers is configured through the same four environment variables (ENDPOINT, BUCKET, ACCESS_KEY, SECRET_KEY). R2 is recommended if not using Fly.io; Tigris is recommended if using Fly.io (native add-on, no separate account).
The standard AWS SDK (@aws-sdk/client-s3) works against all options above. Migrating between providers = update four environment variables and run a sync job (rclone sync).
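The four-variable claim can be made concrete with a small loader. This is a sketch under assumptions: loadObjectStorageConfig and ObjectStorageConfig are hypothetical, and the client object is shaped for the S3Client constructor in @aws-sdk/client-s3 (endpoint, region, credentials). Nothing in it names a provider, which is what keeps the switch to a configuration change.

```typescript
export interface ObjectStorageConfig {
  bucket: string;
  // Shape accepted by `new S3Client(...)` from @aws-sdk/client-s3.
  client: {
    endpoint: string;
    region: string;
    credentials: { accessKeyId: string; secretAccessKey: string };
  };
}

export function loadObjectStorageConfig(
  env: Record<string, string | undefined>,
): ObjectStorageConfig {
  const required = ["ENDPOINT", "BUCKET", "ACCESS_KEY", "SECRET_KEY"] as const;
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing object storage settings: ${missing.join(", ")}`);
  }
  return {
    bucket: env.BUCKET!,
    client: {
      endpoint: env.ENDPOINT!,
      // "auto" is what R2 and Tigris document; AWS S3 and B2 need their real region here.
      region: "auto",
      credentials: { accessKeyId: env.ACCESS_KEY!, secretAccessKey: env.SECRET_KEY! },
    },
  };
}
```

Usage would be new S3Client(cfg.client) followed by commands such as new PutObjectCommand({ Bucket: cfg.bucket, Key, Body }); moving providers means changing the four variables and running rclone sync.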
Background Jobs
Answer: Postgres handles this. SELECT ... FOR UPDATE SKIP LOCKED is a proven pattern for reliable job queues, and two mature libraries implement it cleanly for Node.js:
| Option | Infrastructure | Throughput | Cron Support | Recommendation |
|---|---|---|---|---|
| Graphile Worker ★ Rec | Postgres only | ~2,000 jobs/sec | Yes (built-in) | Use this |
| pg-boss | Postgres only | ~1,000 jobs/sec | Yes | Also fine |
| BullMQ | Redis required | Very high | Yes | Overkill at MVP |
| Inngest | External service | High | Yes | New dependency |
| Trigger.dev | External service | High | Yes | New dependency |
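The SKIP LOCKED idea fits in one query. Below is a generic illustration of the pattern, not the internal SQL of Graphile Worker or pg-boss (the jobs table and its columns are hypothetical), followed by a tiny in-memory TypeScript model of its semantics: a worker claims the first row nobody holds and skips locked rows instead of waiting on them. In the model, each claim call runs to completion on JavaScript's single thread, which stands in for the row lock.

```typescript
// Generic claim query (illustrative only).
export const CLAIM_JOB_SQL = `
UPDATE jobs SET locked_by = $1, locked_at = now()
WHERE id = (
  SELECT id FROM jobs
  WHERE locked_by IS NULL AND run_at <= now()
  ORDER BY run_at
  LIMIT 1
  FOR UPDATE SKIP LOCKED  -- rows another worker holds are skipped, not waited on
)
RETURNING id, task, payload`;

export interface Job {
  id: number;
  task: string;
  lockedBy: string | null;
}

// In-memory model of the same semantics.
export class InMemoryQueue {
  private readonly jobs: Job[];

  constructor(jobs: Job[]) {
    this.jobs = jobs;
  }

  /** Claims the first unlocked job for `worker`, skipping locked ones; null if none. */
  claim(worker: string): Job | null {
    const job = this.jobs.find((j) => j.lockedBy === null);
    if (!job) return null;
    job.lockedBy = worker;
    return job;
  }

  /** In this model, completing a job removes it from the queue. */
  complete(id: number): void {
    const i = this.jobs.findIndex((j) => j.id === id);
    if (i >= 0) this.jobs.splice(i, 1);
  }
}
```

The libraries add what the model omits (retries with backoff, scheduled run times, cron), which is why the recommendation is to use one rather than hand-roll the queue.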
Transactional Email
| Provider | Free Tier | API Style | Template Lock-in |
|---|---|---|---|
| Resend ★ Rec | 3,000/mo, 100/day | Simple REST | None — use React Email or plain HTML |
| Postmark | 100/mo (trial) | REST | None |
| SendGrid | 100/day free | REST | Optional drag-and-drop (avoid) |
| AWS SES | 62,000/mo (if on EC2) | REST | None |
| SMTP relay (host) | Provider-dependent | SMTP | None |
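Recommendation: Resend. Its configuration is RESEND_API_KEY and the base URL, nothing more.
Exit path: every provider needs the same to, from, subject, and html data. Changing providers = swap the API key and base URL, then map those fields onto the new provider's request format. Templates stay in the codebase.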
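Keeping the provider behind one function is what makes the swap cheap. A minimal sketch, assuming Resend's REST endpoint (POST /emails with a Bearer token and a JSON body of from, to, subject, html); EmailMessage, buildSendRequest, and sendEmail are hypothetical names. Other providers use different paths and field names (Postmark, for example, expects From, To, Subject, and HtmlBody with an X-Postmark-Server-Token header), so a migration rewrites this one function rather than only the URL.

```typescript
// Provider-neutral message.
export interface EmailMessage {
  to: string | string[];
  from: string;
  subject: string;
  html: string; // rendered from templates kept in the codebase
}

export interface EmailTransport {
  baseUrl: string; // e.g. "https://api.resend.com"
  apiKey: string;  // RESEND_API_KEY
}

export interface SendRequestInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

/** Builds the fetch() arguments for Resend's POST /emails endpoint. */
export function buildSendRequest(
  t: EmailTransport,
  msg: EmailMessage,
): { url: string; init: SendRequestInit } {
  return {
    url: `${t.baseUrl.replace(/\/$/, "")}/emails`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${t.apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(msg),
    },
  };
}

export async function sendEmail(t: EmailTransport, msg: EmailMessage): Promise<void> {
  const { url, init } = buildSendRequest(t, msg);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Email send failed: ${res.status} ${await res.text()}`);
}
```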
Error Tracking
Answer: Add Sentry at launch. The cost is 10 minutes of setup time. The return is stack traces with source maps, user context when errors occur, release-based regression tracking, and alerts on error-rate spikes. At 500 projects, a silent data mutation bug in a change order flow is worse than a 500 error; Sentry captures both, provided the code that detects the bad state reports it instead of swallowing it.
| Option | Free Tier | SvelteKit SDK | Self-hostable |
|---|---|---|---|
| Sentry ★ Rec | 5,000 errors/mo | Yes (@sentry/sveltekit) | Yes (Sentry self-hosted) |
| Highlight.io | 500 sessions/mo | Yes | Yes |
| LogRocket | 1,000 sessions/mo | Yes | No |
| console.error + structured logs | Free | N/A | N/A |
The @sentry/sveltekit package handles both server-side and client-side error capture with source maps. Do not use Sentry for session replay or performance monitoring at this tier; reserve the quota for error events only.
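Reserving the free quota for errors comes down to a few standard Sentry options. A sketch, assuming the returned object is passed to Sentry.init from @sentry/sveltekit in the hooks files described by the SDK's setup guide; sentryOptions is a hypothetical helper. A sample rate of 0 records no performance transactions or replays (leaving tracesSampleRate unset also disables tracing).

```typescript
export interface ErrorOnlySentryOptions {
  dsn: string;
  environment: string;
  release?: string;
  tracesSampleRate: number;
  replaysSessionSampleRate: number;
  replaysOnErrorSampleRate: number;
}

export function sentryOptions(
  dsn: string,           // e.g. PUBLIC_SENTRY_DSN
  environment: string,   // "production", "staging", ...
  release?: string,      // e.g. the git SHA, so regressions map to a deploy
): ErrorOnlySentryOptions {
  return {
    dsn,
    environment,
    release,
    tracesSampleRate: 0,         // no performance transactions
    replaysSessionSampleRate: 0, // no session replay
    replaysOnErrorSampleRate: 0, // no replay attached to errors either
  };
}
```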
Monitoring
- Hosting provider metrics dashboard: CPU, memory, request latency, and error rates are surfaced for free by Fly.io, Railway, and Render. No additional tooling needed.
- UptimeRobot free tier: 50 monitors, 5-minute check intervals, email alerts. Add a monitor for the main app URL and the health check endpoint (GET /api/health). Free indefinitely.
- Structured application logs: write JSON logs to stdout with the fields level, message, userId, projectId, and duration. The hosting provider captures these and makes them searchable.
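The structured-log fields and the health check endpoint in the monitoring list are both small. A minimal sketch in plain TypeScript with no logging library: log writes one JSON object per line to stdout, and GET is what a hypothetical src/routes/api/health/+server.ts could export, since SvelteKit request handlers may return a standard Response. A production health check might also run SELECT 1 against Postgres before answering; that is left out here.

```typescript
type Level = "debug" | "info" | "warn" | "error";

export interface LogFields {
  userId?: string;
  projectId?: string;
  duration?: number; // milliseconds
  [extra: string]: unknown;
}

/** Writes one JSON log line to stdout and returns it. */
export function log(level: Level, message: string, fields: LogFields = {}): string {
  // Core keys go last so caller-supplied fields cannot overwrite them.
  const line = JSON.stringify({ ...fields, time: new Date().toISOString(), level, message });
  console.log(line); // stdout; the hosting provider collects and indexes it
  return line;
}

// Health endpoint polled by the uptime monitor.
export function GET(): Response {
  return new Response(JSON.stringify({ status: "ok" }), {
    status: 200,
    headers: { "content-type": "application/json" },
  });
}
```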
Do not add Grafana, Datadog, or New Relic at MVP. Each adds cost once past its free tier (roughly $25/mo or more) and hours of setup. Add observability infrastructure when there is a specific question it would answer that the stack above cannot.
CI/CD
- Free for private repos: 2,000 minutes/month on the free plan. A Docker build + deploy takes 3–5 minutes, so this covers ~400 deployments/month before any cost.
- Workflow = Dockerfile + push to registry. The deploy step just calls the hosting provider's CLI (fly deploy, railway up, etc.), so switching providers = changing one shell command in the workflow file.
- Run tests and linting on every PR. The CI pipeline enforces the quality gates in the technical standards before code can merge.
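```yaml
# .github/workflows/deploy.yml
name: ci
on:
  push:
    branches: [main]
  pull_request:                  # run the quality gates on every PR as well
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4   # install pnpm before setup-node so its cache works
        with:
          version: 9                 # pin to the version the team uses
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm
      - run: pnpm install --frozen-lockfile
      - run: pnpm typecheck
      - run: pnpm lint
      - run: pnpm test
  deploy:
    needs: test
    if: github.event_name == 'push'  # PRs are tested; only main is deployed
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - run: flyctl deploy --remote-only  # ← swap this line to change providers
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
```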
3 Portability Assessment
Every vendor included in this architecture must speak a documented standard protocol and have a migration path that takes under one day. This table is the accountability record for that claim.
| Component | Vendor (Recommended) | Standard Protocol | Migration Path | Est. Time | Risk |
|---|---|---|---|---|---|
| Container Hosting | Fly.io | Docker image + OCI registry | Update deploy.yml to target the new provider's CLI. Redeploy the same image. | 2–4 hrs | Low |
| Database | Fly Postgres | PostgreSQL wire protocol | pg_dump -Fc \| pg_restore to the new host. Update DATABASE_URL. | 1–2 hrs | Low |
| Authentication | BetterAuth (library) | Standard sessions in Postgres | No data migration needed: the data is in Postgres. Swapping the library = rewriting the middleware layer only. | 1 day | Low |
| Background Jobs | Graphile Worker | Postgres tables (portable schema) | Switch to pg-boss or BullMQ = rewrite job registration code. Data stays in Postgres. | 4–8 hrs | Low |
| Email | Resend | HTTP REST (SMTP also supported) | Swap API key + base URL and remap fields in the send function. Templates are local code; no migration needed. | 30 min | Low |
| Error Tracking | Sentry | OpenTelemetry (partial) | Remove SDK, add replacement. Historical data stays in Sentry (or export via their API). | 2–4 hrs | Low |
| Object Storage (deferred) | R2 or Tigris | S3 API (AWS SDK v3) | Update 4 env vars. Run rclone sync source: dest:. | 1–2 hrs | Low |
| DNS | Cloudflare | Standard DNS (BIND zone file) | Export the zone file and import it to the new DNS provider. Update nameservers at the registrar. | 24 hrs (propagation) | Low |
| CI/CD | GitHub Actions | YAML workflows (standard) | Port .github/workflows/*.yml to GitLab CI or Bitbucket syntax. Mostly mechanical. | 4–8 hrs | Low |
Total worst-case migration. If every component needed to change simultaneously (a highly unlikely scenario), the estimated total is 2–3 days of focused work, dominated by Postgres data transfer and DNS propagation. No component in this stack has a migration path that requires rebuilding user data or re-implementing core product logic.
4 Accounts & CLI Tools Needed
This is a literal checklist. Complete it before writing code. Items are ordered by dependency — some tools are required before others can be configured.
Accounts (create in this order)
Local CLI Tools
Not Needed at MVP
These are commonly acquired prematurely. Do not create these accounts until the documented trigger is reached.
| Account / Tool | Why Not Yet | Trigger to Add |
|---|---|---|
| Cloudflare R2 / Tigris / S3 | Documents live in Postgres until trigger | >1,000 documents or >1 GB stored |
| Redis / Upstash | No identified use case; Postgres handles sessions and queues | >50K concurrent sessions or proven hot-path need |
| Datadog / Grafana Cloud | Hosting dashboard + UptimeRobot is sufficient | SLA commitments or active oncall rotation |
| Neon / Supabase | Fly Postgres is included and sufficient | Dev environment conflicts or need for branch-per-PR databases |
| Separate CDN account | Cloudflare DNS + hosting provider handles static assets | Global user base with sub-100ms latency requirements outside the US |
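Because each trigger is a number, the check can be mechanical. A sketch, assuming a hypothetical UsageMetrics snapshot gathered from Postgres queries and dashboards; the thresholds are copied from the trigger columns in this document (the Neon rule uses the 3+ developers wording from Open Questions). Triggers that need judgment, such as a profiled hot path or latency complaints, are deliberately left out.

```typescript
// Hypothetical snapshot; each value would come from a query or a dashboard.
export interface UsageMetrics {
  documentCount: number;
  storedBytes: number;
  concurrentSessions: number;
  jobsPerHour: number; // sustained rate, not a one-off spike
  developers: number;
  hasSlaCommitments: boolean;
}

export interface Trigger {
  adds: string;
  fired: (m: UsageMetrics) => boolean;
}

const ONE_GB = 1_000_000_000; // decimal gigabyte

export const TRIGGERS: Trigger[] = [
  { adds: "Object storage (R2 / Tigris)", fired: (m) => m.documentCount > 1_000 || m.storedBytes > ONE_GB },
  { adds: "Redis / Upstash", fired: (m) => m.concurrentSessions > 50_000 },
  { adds: "Dedicated queue service", fired: (m) => m.jobsPerHour > 10_000 },
  { adds: "Neon (branch-per-PR databases)", fired: (m) => m.developers >= 3 },
  { adds: "Datadog / Grafana Cloud", fired: (m) => m.hasSlaCommitments },
];

/** Returns the deferred services whose trigger has been reached. */
export function firedTriggers(m: UsageMetrics): string[] {
  return TRIGGERS.filter((t) => t.fired(m)).map((t) => t.adds);
}
```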
Bootstrap: First Day Commands
```bash
# 1. Authenticate with providers
gh auth login
flyctl auth login

# 2. Create the SvelteKit project (the sv CLI replaces the deprecated create-svelte)
pnpm dlx sv create groundwork
cd groundwork
pnpm install

# 3. Start local Postgres via Docker (dev only)
docker compose up -d   # expects docker-compose.yml in the repo root

# 4. Create the Fly.io application and Postgres cluster
flyctl launch --no-deploy
flyctl postgres create --name groundwork-db
flyctl postgres attach groundwork-db   # sets the DATABASE_URL secret on the app

# 5. Set the remaining secrets (never commit real values; .env is for local dev only)
flyctl secrets set BETTER_AUTH_SECRET="$(openssl rand -base64 32)"
flyctl secrets set RESEND_API_KEY="re_..."
flyctl secrets set PUBLIC_SENTRY_DSN="https://..."

# 6. First deploy
flyctl deploy
```
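```yaml
# docker-compose.yml (local development only)
# Local DATABASE_URL: postgres://app:dev@localhost:5432/groundwork
services:
  db:
    image: postgres:16-alpine
    ports: ["5432:5432"]
    environment:
      POSTGRES_DB: groundwork
      POSTGRES_USER: app
      POSTGRES_PASSWORD: dev
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```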
5 Dependency Budget
The maximum is 5 external services at MVP. Count them. Anything beyond this limit requires removing an existing service or deferring the addition.
Everything else runs inside the application or inside Postgres.
Authentication (BetterAuth library), background jobs (Graphile Worker library), sessions (Postgres), document storage (Postgres bytea), full-text search (Postgres tsvector), and uptime monitoring (UptimeRobot free) are not counted as external service dependencies — they either run in-process or are free infrastructure utilities.
The budget is a forcing function, not a hard ceiling. If a fifth paid service genuinely reduces complexity more than it adds — document it here first, explain what it replaces, and confirm the portability story. The goal is conscious addition, not prohibition.
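Counting can literally be code, for example as a CI check that fails when the inventory grows. A sketch with a hypothetical inventory modeled on the Minimum Viable Infrastructure list; which entries count as external follows this section's rule (in-process libraries and free utilities such as UptimeRobot do not count).

```typescript
// An external dependency is another account, another bill, another migration path.
export interface Dependency {
  name: string;
  external: boolean; // false for in-process libraries and free utilities
}

export const DEPENDENCY_BUDGET = 5;

// Hypothetical inventory, not an authoritative list.
export const EXAMPLE_INVENTORY: Dependency[] = [
  { name: "Fly.io (container host)", external: true },
  { name: "Fly Postgres", external: true },
  { name: "Resend", external: true },
  { name: "Sentry", external: true },
  { name: "Cloudflare (domain + DNS)", external: true },
  { name: "BetterAuth", external: false },      // library, runs in-process
  { name: "Graphile Worker", external: false }, // library, uses existing Postgres
  { name: "UptimeRobot", external: false },     // free utility, not counted
];

export function budgetReport(
  deps: Dependency[],
): { used: number; remaining: number; canAdd: boolean } {
  const used = deps.filter((d) => d.external).length;
  // A negative `remaining` means the budget is already exceeded.
  return { used, remaining: DEPENDENCY_BUDGET - used, canAdd: used < DEPENDENCY_BUDGET };
}
```

budgetReport(EXAMPLE_INVENTORY) reports used: 5, remaining: 0, canAdd: false, so any further service has to replace something.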
6 Cost Projection
Real free tier limits are shown. Many providers give enough to launch a product with paying customers before incurring any cost.
| Service | Free Tier | MVP / Pre-Revenue | Year 1 (~500 projects) | Year 2 (~5K projects) | Year 3 (~50K projects) |
|---|---|---|---|---|---|
| Fly.io (hosting + Postgres) | 3 VMs + 3 GB Postgres | $0 | $7–20/mo (1 VM + 10 GB DB) | $30–60/mo (2 VMs + larger DB) | $80–200/mo, or migrate to Cloud Run |
| Resend | 3,000/mo, 100/day | $0 | $0 (~50 emails/day at 500 projects) | $20/mo (~500 emails/day) | $90/mo (~5,000 emails/day) |
| Sentry (error tracking) | 5,000 errors/mo | $0 | $0 (well within free limit) | $0–26/mo (approaching paid tier) | $26/mo (Team plan) |
| Cloudflare (DNS + CDN) | Free forever (DNS) | $0 | $0 | $0 | $0 |
| Domain registrar (~$12/yr) | None | $1/mo | $1/mo | $1/mo | $1/mo |
| UptimeRobot (monitoring) | 50 monitors free | $0 | $0 | $0 | $0–7/mo (if more monitors are needed) |
| Total (approx.) | | $0–1/mo | $8–21/mo | $51–107/mo | $197–324/mo |
The target of <$50/mo at launch is achievable: $0–1/mo before revenue and <$22/mo through Year 1. The cost projection remains well inside the $50/mo budget until the platform reaches 5,000+ projects, by which point the product should be generating revenue that justifies scaled infrastructure costs.
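The totals row is the sum of the per-service ranges: each range's low end adds to the low total and its high end to the high total. A short sketch that recomputes Years 1 to 3 from the figures in the cost table (UptimeRobot's Year 3 cost treated as $0–7, since it applies only if more monitors are needed):

```typescript
// [min, max] monthly USD per service, taken from the cost table.
type Range = [number, number];
type Period = "year1" | "year2" | "year3";

export const COSTS: Record<string, Record<Period, Range>> = {
  "Fly.io": { year1: [7, 20], year2: [30, 60], year3: [80, 200] },
  Resend: { year1: [0, 0], year2: [20, 20], year3: [90, 90] },
  Sentry: { year1: [0, 0], year2: [0, 26], year3: [26, 26] },
  Cloudflare: { year1: [0, 0], year2: [0, 0], year3: [0, 0] },
  Domain: { year1: [1, 1], year2: [1, 1], year3: [1, 1] },
  UptimeRobot: { year1: [0, 0], year2: [0, 0], year3: [0, 7] },
};

/** Sums low ends and high ends separately for one period. */
export function total(period: Period): Range {
  let min = 0;
  let max = 0;
  for (const service of Object.values(COSTS)) {
    min += service[period][0];
    max += service[period][1];
  }
  return [min, max];
}
```

total("year1") returns [8, 21], total("year2") returns [51, 107], and total("year3") returns [197, 324].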
7 Open Questions & Deferred Decisions
These decisions are intentionally not made today. Each has a documented trigger — a metric or event that signals when it's time to revisit. Tracking them here prevents both premature optimization and forgotten technical debt.
- When to add dedicated object storage. Trigger: >1K docs. At >1,000 documents or >1 GB total stored in Postgres, the maintenance burden of bytea storage begins to outweigh its simplicity. Add R2 or Tigris at that point. The decision is pre-made: S3-compatible, zero egress, 4 environment variables to swap.
- When to add Redis / Upstash. Trigger: >50K sessions. At >50,000 concurrent sessions, or when profiling identifies a specific hot path that Postgres cannot serve acceptably, Redis becomes justified. Not before: the trigger must be measured, not assumed. The beneficiaries would be job queues (via BullMQ) and session caching.
- When to migrate to Neon for database branching. Trigger: team of 3+ developers. At 1–2 developers, Fly Postgres with manual branching is manageable. At 3+ developers, conflicting migrations and shared dev databases become friction, and Neon's branch-per-PR model eliminates it. The migration is a pg_dump | pg_restore, not a rewrite.
- Whether to support native mobile apps. Trigger: product decision. SvelteKit does not build native iOS/Android apps. If a native mobile experience becomes a product requirement (as opposed to a progressive web app), the API layer remains intact but a separate React Native or Flutter project would be needed. This decision affects hiring and app store compliance timelines. Flag early if the product direction shifts toward mobile-first.
- When to add a CDN beyond the hosting provider. Trigger: global users. Cloudflare DNS and the hosting provider handle static asset delivery adequately for US-based users. When users in Europe or Asia represent >20% of traffic and latency complaints emerge, evaluate Fly.io's multi-region deployment (included in Fly pricing) before adding a separate CDN vendor.
- When to add formal observability (Datadog, Grafana). Trigger: SLA commitments. Structured logs + Sentry + UptimeRobot are sufficient until the team is making SLA commitments to paying enterprise customers. At that point, a proper metrics pipeline (Prometheus + Grafana, or Datadog) becomes justified. The cost (~$30–70/mo) is reasonable when paired with an annual enterprise contract.
- Which permit data APIs to integrate, and how. Trigger: permit API complexity. Permit data availability varies significantly by jurisdiction: some municipalities have public REST APIs, but most do not. The polling strategy (Graphile Worker cron jobs hitting public permit portals) may need to evolve into web scraping with Playwright or a third-party permit data aggregator. This is more a product-side open question than a technical one; document specific target jurisdictions before building the permit polling service.