The system design interview, calibrated to the FAANG bar

Voice-first mock interviews with a written hire/no-hire verdict. Built for senior+ engineers preparing for the loop that actually decides the offer.

01 Depth

Hand-wave any box. Lose the dimension.

A real bar-raiser can probe any layer — CDN, lock primitive, retry semantics, blob path. SystemDesign.so holds you to the same bar across every subsystem in the stack.

CLIENT · interviewer · systemdesign · stage 01 / 16
if they wave → probe · numbers · failure mode
CLIENT: User's browser issues a request
02 Voice · the medium

This is what your interview actually sounds like.

Spoken in your headphones during the session. Not typed in a chat. Three turns from a senior candidate's real session — the push-back, the math demand, the missing primitive.

T6 23:14
You’re avoiding scale. Numbers, please. Reads per second. Then we’ll talk caching.
T9 31:02
No idempotency on /book. Retry storm → double-charge. How do you guarantee at-most-once?
T11 38:47
You skipped seat-locking. Two users, one seat, race condition. Walk me through the lock.
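The idempotency push-back in turn T9 can be made concrete. What follows is a minimal sketch, not SystemDesign.so's implementation: `BookingService`, its in-memory dictionary, and the client-generated key are all hypothetical, and a real /book handler would keep keys in a durable store and guard the check with a lock or a unique constraint.

```python
import uuid


class BookingService:
    """Hypothetical /book handler that charges at most once per idempotency key."""

    def __init__(self):
        self._results = {}  # idempotency key -> stored response (assumption: in-memory)
        self.charges = 0    # how many times the card was actually charged

    def book(self, idempotency_key: str, seat: str) -> dict:
        # A retry carrying the same key gets the stored response, not a second charge.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges += 1
        response = {"seat": seat, "charge_id": self.charges}
        self._results[idempotency_key] = response
        return response


service = BookingService()
key = str(uuid.uuid4())            # generated once by the client, reused on every retry
first = service.book(key, "A12")
retry = service.book(key, "A12")   # simulated retry after a timeout
print(first == retry, service.charges)  # True 1
```

The sketch is single-threaded on purpose; two concurrent requests with the same key would still race, which is the seat-locking question raised in T11.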
03 How it works

Your AI bar-raiser, on demand.

Four steps. 15, 30, 45, or 60 minutes. Every trade-off you missed, flagged.

01 / PICK

Pick a design

22 designs spanning feeds, messaging, storage, payments, maps, search, ride-share, ad-tech, and trading. Pick a length and a level.

02 / TALK

Talk under pressure

You speak your design out loud. A real drawing canvas tracks the diagram. The AI listens, pushes back on every claim, demands the math, and never lets you move on from a hand-wave.

03 / SCORE

Scored every turn

Six dimensions, 100 points. Phase budgets, capture discipline, unverified-number flags — all live, while you draw.

04 / REVIEW

Verdict & debrief

A hire/no-hire verdict, every red flag the judge caught, the must-haves you missed, and which of four behavioral buckets you actually demonstrated.

04 The difference

ChatGPT gives answers.
SystemDesign.so runs the interview.

One is a search box. The other is a bar-raiser with a clipboard. The same prompt, side-by-side.

× Generic LLM

“Design a URL shortener.”

you: Design a URL shortener.
gpt: Sure! Here are common components…
• Write API → generator → KV store
• Base62, 7 chars = 3.5T keys
• Cache hot URLs in Redis
• Anything else I can help with?
you: Sounds good, thanks!
 
No probe. No pressure. No signal.
You get a dump of textbook pieces. Not an interview.
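The keyspace figure quoted in the generic answer above is the kind of number an interviewer would ask you to derive. A quick check, assuming the usual 62-character alphabet (digits plus lower- and upper-case letters):

```python
ALPHABET_SIZE = 62  # 0-9, a-z, A-Z
KEY_LENGTH = 7

keyspace = ALPHABET_SIZE ** KEY_LENGTH
print(f"{keyspace:,}")            # 3,521,614,606,208
print(f"{keyspace / 1e12:.1f}T")  # 3.5T
```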
SystemDesign.so

“Design a URL shortener for 200M DAU.”

ai: First, what's the R:W ratio?
you: 100:1, read-heavy.
ai: Good. Your DB choice?
you: Cassandra.
ai: Why Cassandra over Postgres for 100:1 reads?
you: Hmm…
ai: You picked an AP store for a read path. Walk me through eviction on a viral link.
Architecture −3 · Engineering −2 · Reliability +1
Caught a bar-raiser mistake. Scored it. Taught you.
05 What makes it work

Nine instruments.
One dashboard.

Inside the engine, module by module. Each one runs its own live readout — the signal, the canvas, the rubric, the verdict, the clock. Click a module to pop the hood. ↑↓ to scrub through them.

MOD-001 · INPUT LATENCY · 78ms · BARGE-IN · READY

Voice or text, on demand.

Real-time voice with smart barge-in — talk over the AI when you're ready to pivot, just like a human loop. Or stick to text when you'd rather think.

LIVE INPUT CHANNEL · WSS
VAD: SPEAKING · WPM: 148 · FILLERS: 3 · BARGE-IN: ≤80ms
CAND I think the bookings table will get, a few thousand writes per second, so I'd —
MOD-002 · INPUT COHERENCE · 62% · BOXES · 7

If you draw the box, defend it.

An Excalidraw-powered whiteboard with a 28-box systems library. The AI sees what you draw and grades the coherence between your words and your boxes.

CANVAS · SESSION 1A37 7/28 BOXES
CLIENT
GATEWAY
BOOK SVC
QUEUE
POSTGRES
REDIS · PRIMARY?
CDN
WORD↔BOX COHERENCE 62% REDIS LABELLED, POSTGRES SPOKEN
MOD-003 · ENGINE DEPTH · 3 · NEXT-Q · QUEUED

No two sessions ask the same questions.

Every answer routes through the knowledge graph and spawns the next probe — depth escalation, dependency gaps, prerequisite redirects. You can't coast on boilerplate.

PROBE TREE · T7 · 24:21 3-DEEP
CAND: "I'd use Redis as a write-through cache"
What's the eviction policy? · DEPTH
Cache invalidation on write? · PREREQ
What about cross-region consistency? · DEP
MOD-004 · SCORING RECALC · EVERY TURN

Not "good job." A breakdown.

Six dimensions. Sums to 100. Updated every turn, with a quality multiplier that punishes hand-waves. The breakdown your verdict will hang on.

RUBRIC · LIVE ×1.0 MULTIPLIER
Requirements: 12 / 15
Quantitative: 7 / 15
API Design: 11 / 15
Architecture: 20 / 25
Engineering: 7 / 15
Reliability: 4 / 15
TOTAL · LIVE: 61 / 100
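The live readout above adds up as shown in this small sketch. The per-dimension numbers are copied from the readout; how the quality multiplier is applied is not stated on this page, so scaling the summed score by it is an assumption.

```python
# (earned, maximum) per dimension, copied from the rubric readout
rubric = {
    "Requirements": (12, 15),
    "Quantitative": (7, 15),
    "API Design": (11, 15),
    "Architecture": (20, 25),
    "Engineering": (7, 15),
    "Reliability": (4, 15),
}
multiplier = 1.0  # assumption: the quality multiplier scales the summed score

earned = sum(points for points, _ in rubric.values())
maximum = sum(cap for _, cap in rubric.values())
total = round(earned * multiplier)
print(f"{total} / {maximum}")  # 61 / 100
```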
MOD-005 · LIVE WATCH CLASSES · 6 · FLAGS · 4

Caught the moment you say it.

Six classes of mistake, including unverified numbers, math past 2× tolerance, missing failure modes, wrong storage, and missing idempotency, flagged inside 200 ms.

MISTAKE FEED · S-12 ≤200ms TO FLAG
23:11 · UNVERIFIED · "a few thousand": no anchor. Demanded RPS. · !
24:21 · WRONG STORE · Redis as primary writer. Redirected to durability. · ×
25:58 · MISSING IDEMP · No idempotency on /book. Retry-storm risk. · ×
27:14 · MATH 2×+ · "~500 MB/day" off by 4.2×. Math demanded. · ×
UNVERIFIED · 1 WRONG STORE · 1 MATH · 1 MISSING IDEMP · 1
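A "math past 2× tolerance" check like the one in the feed can be sketched in a few lines. This is an illustration only: the 500 MB/day claim comes from the feed, while the recomputed figure of 2,100 MB/day is a hypothetical value chosen to reproduce the 4.2× gap, and `off_by` is an assumed helper name.

```python
TOLERANCE = 2.0  # the 2× tolerance named in the text


def off_by(claimed: float, derived: float) -> float:
    """Return how many times apart two positive estimates are, in either direction."""
    return max(claimed / derived, derived / claimed)


claimed_mb_per_day = 500    # the candidate's spoken figure
derived_mb_per_day = 2_100  # hypothetical value recomputed from the stated inputs
ratio = off_by(claimed_mb_per_day, derived_mb_per_day)
flagged = ratio > TOLERANCE
print(f"off by {ratio:.1f}x, flagged={flagged}")  # off by 4.2x, flagged=True
```

Taking the larger of the two ratios means an estimate that is too high is flagged the same way as one that is too low.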
MOD-006 · CLOCK SESSION · 45-MIN · ELAPSED · 14:08

Same clock as the real loop.

Each phase has a time budget. Live Rushed / On-pace / Slow / Over-budget pills as you go. Learn to manage the clock — not just the answer.

PHASE LEDGER 5 PHASES · 30:52 LEFT
Requirements: 2:01 / 2:00 · OVER
Estimation: 0:35 / 1:00 · ON-PACE
Entities & API: 1:28 / 2:00 · SLOW
Architecture: 9:18 / 15:00 · ON-PACE
Deep dives: 0:46 / 10:00 · QUEUED
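The ledger's clock arithmetic is easy to verify by hand or in code. The sketch below sums the time spent per phase from the ledger and subtracts it from the 45-minute session; the helper names are illustrative, and the thresholds behind the pace pills are not modelled because the page does not state them.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'M:SS' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total_seconds: int) -> str:
    """Convert seconds back to 'M:SS'."""
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"


SESSION_SECONDS = 45 * 60
spent = ["2:01", "0:35", "1:28", "9:18", "0:46"]  # time used per phase, from the ledger

elapsed = sum(to_seconds(t) for t in spent)
remaining = SESSION_SECONDS - elapsed
print(to_mmss(elapsed), to_mmss(remaining))  # 14:08 30:52
```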
MOD-007 · TRACKING SESSIONS · S-04 vs S-12

Weak spots don't stay weak quietly.

Diff two sessions side-by-side. Watch your six dimensions evolve across attempts. Δ per dimension, per attempt — the same metric your hiring committee would diff.

DELTA · S-04 → S-12 +9.4 NET
S-04 · 3 WEEKS AGO
Reliability: 4 / 15
Quantitative: 7 / 15
Architecture: 14 / 25
Verdict: No Hire
Flags raised: 6
S-12 · TODAY
Reliability: 11 / 15
Quantitative: 12 / 15
Architecture: 21 / 25
Verdict: Lean Hire
Flags raised: 2
MOD-008 · OUTPUT CALL · LEAN NO-HIRE

Strong Hire to Strong No-Hire.

An end-of-session frontier-model judge re-reads the whole transcript and produces a verdict — with must-haves covered, missed, and the red flags it'd raise in a real debrief.

VERDICT · S-12 TICKET-BOOKING
CALL
Lean No-Hire
GAP TO LEAN HIRE · +9 pts RED FLAGS · 3 MUST-HAVES · 3 / 5
× No idempotency key on /book: retry-storm risk.
× Wrong storage tier: Redis as primary writer.
× Skipped capacity math until interrupted.
MOD-009 · ENGINE NODES · 1,247 · ACTIVE · 12

The dependency tree behind every choice.

1,247 trigger nodes per design with depends-on / implies / required-before edges. When you say "Redis," the graph already knows what you owe it.

GRAPH · /book SUBTREE 3 PREREQS UNMET
/book endpoint · seat lock · idempotency · payment · SETNX · quorum · retry key · tx state · webhook
ACTIVE · 12 DORMANT · 1,235 EDGES · 4,189
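One way to picture "what you owe" is a walk over required-before edges. The sketch below is hypothetical: the node names are borrowed from the /book subtree, but the edge layout, the `REQUIRED_BEFORE` table, and `unmet_prereqs` are assumptions for illustration, not the product's actual graph.

```python
from collections import deque

# Hypothetical "required-before" edges: each node maps to the prerequisites it owes.
REQUIRED_BEFORE = {
    "/book endpoint": ["seat lock", "idempotency", "payment"],
    "seat lock": [],
    "idempotency": [],
    "payment": [],
}


def unmet_prereqs(node, covered):
    """Breadth-first walk over prerequisite edges, collecting nodes not yet covered."""
    missing, seen = [], set()
    queue = deque(REQUIRED_BEFORE.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        if current not in covered:
            missing.append(current)
        queue.extend(REQUIRED_BEFORE.get(current, []))
    return missing


print(unmet_prereqs("/book endpoint", covered=set()))
# ['seat lock', 'idempotency', 'payment']
print(unmet_prereqs("/book endpoint", covered={"payment"}))
# ['seat lock', 'idempotency']
```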
06 The report

What you walk away with.

Every session ends in a debrief that names a verdict, a level, and the exact rubric points you missed. Not vibes. Receipts.

Design a ticket booking system · Senior target
Builds a working system
×× No Hire · Below Senior bar · 31 pts to the next bucket

Surface-level on every deep-dive. Three push-backs — scale, idempotency, locking — three pivots back to architecture diagrams. No invalidation policy. No idempotency key. No seat-lock primitive. Pattern, not a slip.

Must-haves covered
  • Core read-vs-write split named early
  • Hot-key (contested seat) flagged as a risk
Red flags triggered
  • × Unverified scale claim ("10M users")
  • × Missing idempotency on /book
  • × Three latency / SLA prompts skipped
  • × No seat-lock primitive
Phase timeline 47:12 total
  • Requirements: 2:01 / 2:00 · over budget
  • Estimation: 0:35 / 1:00
  • Entities + API: 1:28 / 2:00
  • HLD: 0:48 / 5:00 · rushed
  • Deep dive: 0:55 / 5:00 · cut short
Capture discipline
Verbal vs captured

You verbally mentioned 13 items. You captured 3 in your notes panel.

Functional reqs: 3 / 1
Non-functional: 0 / 2
Back-of-envelope: 0 / 4
Entities: 0 / 4

Plus a searchable transcript, a delta against your last attempt, and a next question calibrated to your weakest dimension.

07 The math

One mock.
Or five reps a month.

The system-design loop is often what stands between you and a senior offer. Below is what those reps actually cost — line by line, not packaged as tiers.

Human mock · per-session

Pay each time the room opens.

Senior FAANG mentor · 60 min: $300
Reschedule fee · 1 of 4: $45
Same prompt, second take: unavailable
Written feedback: ~5 bullet points
Repeat-of-the-same-thing: awkward
5 reps over 4 weeks: $1,500+
Schedule-locked · 1 opinion · no two graded the same
SystemDesign.so Pro · monthly Launch · −50%

One bill. Five 45-min rooms.

Pro plan · 225 min budget: $49 (was $99)
Effective per session: $9.80 (was $19.80)
Same rubric, every run: 100 pts · 6 dims
Re-do the same prompt: until it clicks
Verdict + transcript: every time
5 reps over 4 weeks: $49 (was $99)
−$50 / mo · launch offer · limited period
Voice-first · 24/7 · designs library 22 / 22
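The per-session figures in the comparison follow directly from the numbers quoted above: five 45-minute rooms a month on Pro, against a $300 human mock repeated five times. A short check, with variable names chosen for illustration:

```python
SESSIONS_PER_MONTH = 5
SESSION_MINUTES = 45
HUMAN_MOCK_PRICE = 300  # dollars per 60-minute session, from the comparison

minute_budget = SESSIONS_PER_MONTH * SESSION_MINUTES
human_total = SESSIONS_PER_MONTH * HUMAN_MOCK_PRICE
per_session = {
    label: monthly_price / SESSIONS_PER_MONTH
    for label, monthly_price in [("list", 99), ("launch", 49)]
}

print(minute_budget)                      # 225
print(human_total)                        # 1500
print(f"${per_session['list']:.2f}")      # $19.80
print(f"${per_session['launch']:.2f}")    # $9.80
```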
08 Level call

Not "good job." A precise behavior.

SystemDesign.so doesn't grade on a curve, and it doesn't hand out titles. It tells you which of four behaviors you actually demonstrated — and the exact gap to the next bucket.

01
Names primitives
You can list the building blocks. Can't yet defend a trade-off under push-back.
0–25 pts
you · 24
02
Builds a working system
End-to-end design that runs. Stalls when pushed into unfamiliar territory.
26–50 pts
03
Owns the deep-dive
Idempotency, locking, consistency, failure modes — unprompted, with the math.
51–75 pts
04
Drives the room
Names the trade-off the interviewer hadn't asked yet. Sets the agenda.
76–100 pts
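The four point ranges above map a score to a behavior mechanically. A minimal sketch follows, assuming the gap to the next bucket is simply the distance to that bucket's lowest score; the product may compute the gap differently, and `level_call` is an illustrative name.

```python
# Upper bound of each bucket and its label, from the four ranges above
BUCKETS = [
    (25, "Names primitives"),
    (50, "Builds a working system"),
    (75, "Owns the deep-dive"),
    (100, "Drives the room"),
]


def level_call(score: int) -> tuple:
    """Return (bucket name, points needed to enter the next bucket; 0 at the top)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for upper, name in BUCKETS:
        if score <= upper:
            gap = 0 if upper == 100 else upper + 1 - score
            return name, gap


print(level_call(24))  # ('Names primitives', 2)
print(level_call(61))  # ('Owns the deep-dive', 15)
```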
10 FAQ

The questions everyone actually asks.

A chatbot answers your question. SystemDesign.so asks the questions — the way a FAANG bar-raiser would. It pushes back when you wave your hands, demands numbers when you say "a lot," and writes the verdict at the end. You can't get that from a search box.
No. The judge runs a 12-stage pipeline on top of a frontier model — turn classification, rubric scoring, contradiction-aware memory (so it remembers you said 100:1 reads twelve turns ago), and trigger detection over ~1,200 hand-built failure patterns. A wrapper can chat. It cannot ask the right push-back at the right moment and weigh it against a hand-built rubric.
Six dimensions, 100 points, four behavioral buckets — built with engineers who have run system-design loops at top-tier companies. Each design has a hand-authored grading sheet (must-haves, common traps, scale tells) plus auto-detected red flags. The rubric ships as v3.2; every session it scores feeds the next calibration.
Voice is the default — the whole product is built around it. You'll get the most out of it talking out loud, exactly as you would in the real room. Text mode exists as a fallback for quiet places or accessibility. The drawing canvas works the same in either.
That's the point. The verdict tells you the four-bucket behavior you actually showed, the must-haves you missed, and the next question calibrated to your weakest dimension. Bombing the mirror is the cheapest way to find out what your loop will reveal.
On the MAX plan, yes. Drop a one-line prompt and SystemDesign.so builds the rubric for it on the fly. On Free and Pro you pick from the 22 designs in the library.
Yes. Sessions are tied to your account, never shared, never used to train external models. You can delete any session permanently.
The verdict machine · now live

The interview is free.
The mirror is not.

Talk through one system design. Get an honest hire/no-hire-style verdict on your design — while you still have time to fix it.

Fifteen minutes. No card. You speak, it judges, then you walk away with audio coaching pulled from your own voice and the exact gap to the next bucket.

15-min taste · 6 dimensions · 4 behavioral buckets · 1 verdict