How FaangCoder Works

Kernel-mode driver. Process-memory read. Display-pipeline strip. Four stealth layers below the user-mode boundary every other AI interview tool ships at. This is the architecture in one read instead of a four-post deep-dive cluster.

§1 — Problem framing

What every screenshot-OCR tool does

Snap a screenshot. Upload it to a service. Wait for OCR. Paste the answer back into the editor. Each step adds detection surface — a Win32 capture-API trace, a screen-capture pipeline visible to fingerprinting, OCR latency a senior interviewer will notice, and a clipboard write that lands as a paste event in the platform's content script.

Tools that ship at this depth fail on Enterprise-tier proctoring. CoderPad Enterprise's content script reads paste events and DOM mutations; HackerRank tracks tab-switch and focus-blur deltas; CodeSignal IQ runs keystroke biometrics on pasted code. The screenshot-OCR pipeline is the wrong shape.

§2 — The architectural shift

We don't snap. We read.

Process memory holds the bytes. The browser's V8 isolate has the problem statement parsed into SeqOneByteString objects on the heap. The IDE's text buffer holds your code. The terminal pane holds the test output. None of that requires a pixel pipeline to access — they live inside processes we can read with a kernel driver.

No screenshot. No OCR. No clipboard. The kernel driver attaches via KeStackAttachProcess, reads pages with MmCopyVirtualMemory, and the bytes go straight to the model. Single-digit-millisecond read latency. Bytes verbatim — no font-fallback errors, no anti-aliasing collapse on lookalike characters, no scrolled-region truncation.

┌─ THEIR ARCHITECTURE ─────────────┐  ┌─ OUR ARCHITECTURE ────────────────┐
│                                  │  │                                   │
│   Browser tab (problem rendered) │  │   Browser tab (problem rendered)  │
│                ↓                 │  │                │                  │
│        Screen Capture API        │  │       (no capture API call)       │
│                ↓                 │  │                ↓                  │
│           Frame buffer           │  │      Renderer process memory      │
│                ↓                 │  │                ↑                  │
│           OCR engine             │  │     Ring-0 kernel driver reads    │
│                ↓                 │  │     V8 heap directly              │
│        Recognized text           │  │                │                  │
│                ↓                 │  │                ↓                  │
│            AI model              │  │            AI model               │
│                ↓                 │  │                ↓                  │
│   Clipboard paste into editor    │  │   Overlay (stripped from DWM)     │
│                                  │  │                                   │
│   ⚠ 200–500ms latency           │  │   ✓ Single-digit-ms read          │
│   ⚠ OCR errors on code           │  │   ✓ Bytes verbatim                │
│   ⚠ Visible to capture detector  │  │   ✓ Below capture & enumeration   │
└──────────────────────────────────┘  └───────────────────────────────────┘

For the bytes-level walkthrough — V8 heap layout, pointer compression, OpenProcess versus the kernel-attach path — see Ring-0 Memory Read vs Screen-Capture OCR.

§3 — Four stealth layers

Below the user-mode boundary, in four directions

Most AI interview tools ship one stealth feature: a single user-mode WDA_EXCLUDEFROMCAPTURE flag. That flag is queryable in one syscall, which means the proctor sees the flag itself as a fingerprint. We layer past it.

Layer 1 — Display-pipeline filtering

What: The overlay's surface is excluded from the desktop composite before DWM ever sees it. DXGI Output Duplication, GDI BitBlt, and Windows.Graphics.Capture all read the post-composite frame — and that frame is mathematically identical with or without our overlay running.

Why kernel-mode: A user-mode WDA flag asks DWM, politely, to omit a window. The flag is queryable in one syscall, which means the proctor sees the flag itself as a fingerprint. Filtering at kernel depth leaves no flag to query.

Layer 2 — Process concealment

What: Our process is present and operational from the kernel scheduler's perspective, but the snapshot returned to user-mode enumerators (Toolhelp32, EnumProcesses, NtQuerySystemInformation, WMI's Win32_Process) does not include its row.

Why kernel-mode: A denylist of AI-tool process names cannot match a process that does not appear in the list. Same construct commercial anti-cheat uses to spot rootkits — applied here in reverse.

Layer 3 — Window enumeration blocking

What: EnumWindows, EnumChildWindows, FindWindowEx, GetTopWindow all resolve through win32k.sys to walk the kernel's tagWND list. We filter the result before it crosses back to user mode — the window can receive messages and render, but the buffer the caller gets does not include its handle.

Why kernel-mode: Fair Screen and Karat-style proctor companions both enumerate windows looking for click-through, top-most, or WDA-flagged surfaces. Filtering the enumeration result is the only structural answer to that scan.

Layer 4 — Detection-query spoofing

What: Heuristic queries that do not fit the first three categories ("is there a thread with capture-exclusion attributes?", "does the foreground PID match the foreground audio PID?") get answered at the kernel level, with consistent answers across transports — WMI, NtQuerySystemInformation, EnumProcesses all return the same view.

Why kernel-mode: Lying in user mode leaves a syscall fingerprint. The kernel decides what is true; if the kernel says it, every transport reads it the same way.

For the full architectural tour with the EPROCESS and tagWND structure references, see The Four Stealth Layers.

§4 — The hotkey workflow

Three keystrokes, three data paths

Each hotkey reads from process memory at the kernel each press. So when you ask the model "why this and not BFS?" after the initial Solve, it answers against your actual current code, not a regenerated-from-scratch version.

Alt + Enter: Solve

Reads: Problem statement (V8 heap of the renderer)
Sent to model: Problem text + the language tag from your editor
Output: Approach, code, complexity rendered in the overlay

Alt + 1: Debug

Reads: Your current code buffer + the most recent test output
Sent to model: Code + failing test + the original problem
Output: Root-cause line, unified diff, summary of the fix

Alt + 2: Optimize

Reads: Your current working solution + complexity hint
Sent to model: Code + target complexity bound from the problem
Output: Bottleneck identified, rewrite with new complexity

§5 — Why kernel-mode

Counter-intuitive but correct: kernel-mode is the safer bet

"Install a kernel driver" sounds like the higher-risk option. Architecturally it's the opposite. A signed kernel driver goes through Microsoft's WHQL attestation pipeline, ships under our EV cert, and has an immutable hash on every release. A user-mode tool can be silently hot-patched at any time; a kernel-signed binary cannot.

The deeper reason kernel-mode wins on safety: detection runs at ring-0 for every commercial anti-cheat (BattlEye, EAC, Vanguard) because that's where the ground truth is. A user-mode AI overlay is detectable by a kernel-mode anti-cheat module; a kernel-mode AI overlay is not, because proctoring stacks don't ship signed kernel drivers to candidate machines. The detection ceiling is "the proctor would need their own kernel driver to see us" — which they don't ship.

Full threat model, data stance, attestation chain, and the FAQ on driver risk live at /security.

§6 — What we don't do

Explicit denials

Every claim below is verifiable from the candidate's side via the /proctor test page. If our binary triggered any of these surfaces, the test page would log it.

Don't take screenshots.

No GDI BitBlt, no DXGI Output Duplication, no Windows.Graphics.Capture. The capture-API hook surface is empty.

Don't run OCR.

No Tesseract, no cloud OCR. The bytes are already in process memory; we read them, we do not transcribe them.

Don't touch the clipboard.

No SetClipboardData, no OpenClipboard. Output renders into the overlay, not via paste into the editor.

Don't keep your code on our servers.

Model calls are stateless on our side. We forward the extracted bytes to the model API and the response streams back. Nothing is written to disk.

§7 — Same on every platform

One read pattern, four interview platforms

HackerRank, CodeSignal, CoderPad, and LeetCode all have different anti-cheat stacks — full-screen mode + tab-switch counters on HackerRank, IQ Score keystroke biometrics on CodeSignal, content-script and Chime hooks on CoderPad Enterprise, contest similarity scoring on LeetCode. The detection-side details vary; the input-side does not.

Every one of those platforms renders its problem text into a Chromium DOM that lives in a renderer process. Every one of them holds the candidate's code in an editor buffer in the same renderer. Our kernel driver reads from process memory regardless of which web app rendered the bytes. Same architecture, same hotkeys, same overlay output — across all four.

Per-platform field guides: HackerRank, CodeSignal, CoderPad, LeetCode.

Verify it yourself

The architecture is verifiable from the candidate's side. Run the proctor simulator against any AI interview tool — including ours — to see which detection vectors light up.


Cross-references — the four cornerstone engineering posts behind this page: memory read, four stealth layers, WDDM tour, driver attestation.