If you build AI tools that paint pixels on Windows — overlays, capture tools, recording software, anything that has to coexist with the desktop compositor — you should be able to draw the WDDM stack on a whiteboard from memory. Most engineers can't. The Windows display pipeline is one of the better-documented kernel subsystems Microsoft ships, and somehow it remains one of the least-understood layers in commodity engineering practice.
This is the 30-minute tour. We'll walk the stack from the application's swap chain down to the display miniport driver, name the layer Microsoft assigns each component, and call out the points where screen-capture APIs read and where they can't. By the end you should be able to look at any "screen capture API" and predict what it can and cannot see.
The reference for everything below is Microsoft's own Windows Display Driver Model design guide. Most of the public knowledge about WDDM that engineers have encountered is downstream of that document.
Key takeaways
- The Windows display pipeline runs in three regions: user mode (your application, the graphics runtimes, the user-mode display driver, and DWM, which is a user-mode process), kernel mode (dxgkrnl.sys, the DirectX graphics kernel that also serves as the display port driver, plus win32k.sys), and the vendor's display miniport driver that talks to the GPU. Each region has its own responsibilities and its own interception points.
- DXGI Output Duplication, one of the modern screen-capture APIs, reads from the post-composition surface that DWM produces. Anything DWM is asked to omit from capture (WDA_EXCLUDEFROMCAPTURE) is missing from that surface. Anything that never enters DWM at all was never in that surface to begin with.
- The DWM compositor is the gatekeeper for what user-mode capture sees. The miniport driver sits below DWM. A driver-resident overlay path can present surfaces to the user that bypass DWM entirely; that is the approach commercial anti-cheat overlays use.
The 30,000-foot view
Three big boxes, top to bottom.
- User mode. Your application; Direct3D / Direct2D / DirectComposition / Win2D / Vulkan / OpenGL; DXGI itself; the user-mode display driver (UMD), a vendor-supplied DLL loaded by the graphics runtime; and the Desktop Window Manager (dwm.exe), which runs in user mode but coordinates closely with kernel-mode components.
- Kernel mode. The DirectX Graphics Kernel (dxgkrnl.sys) and the Win32 windowing subsystem (win32k.sys).
- Display port + miniport. The display port driver provides shared infrastructure; the display miniport (vendor-specific, e.g. nvlddmkm.sys, amdkmdag.sys, igdkmd64.sys) talks to the actual GPU.
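A minimal sketch of that three-region split as data, using the component names from the list above. The Region enum, Component struct, and regionOf helper are hypothetical illustrations, not a Windows API; the main point they encode is that dwm.exe is a user-mode process even though it sits at the center of the pipeline.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the three regions described above; not a Windows API.
enum class Region {
    UserMode,    // applications, graphics runtimes, the UMD, and dwm.exe
    KernelMode,  // dxgkrnl.sys and win32k.sys
    Miniport     // vendor kernel driver that talks to the GPU
};

struct Component {
    std::string name;
    Region region;
};

// The stack from top to bottom, as listed in the prose.
const std::vector<Component>& displayStack() {
    static const std::vector<Component> stack = {
        {"application", Region::UserMode},
        {"Direct3D / DXGI runtime", Region::UserMode},
        {"user-mode display driver", Region::UserMode},
        {"dwm.exe", Region::UserMode},        // user mode, despite its central role
        {"win32k.sys", Region::KernelMode},
        {"dxgkrnl.sys", Region::KernelMode},  // DirectX graphics kernel / display port driver
        {"display miniport", Region::Miniport},
    };
    return stack;
}

// Look up a component's region; unknown names trip the assert in debug builds.
Region regionOf(const std::string& name) {
    for (const auto& c : displayStack()) {
        if (c.name == name) return c.region;
    }
    assert(false && "unknown component");
    return Region::UserMode;
}
```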
Microsoft's WDDM design guide walks these layers in detail. We'll walk them in order of how a frame travels.
Step 1 — The application produces a surface
Your app creates a swap chain. With DXGI:
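```cpp
// Assumes factory (IDXGIFactory2*), d3dDevice, hwnd, and desc (DXGI_SWAP_CHAIN_DESC1) already exist.
IDXGISwapChain1* swapChain = nullptr;
HRESULT hr = factory->CreateSwapChainForHwnd(d3dDevice, hwnd, &desc, nullptr, nullptr, &swapChain);
// Check hr with FAILED(hr) before using swapChain.
```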
The swap chain owns one or more back buffers — surfaces the GPU writes into. When the application calls swapChain->Present(...), it signals: "this back buffer is the next frame." The Present call is the formal handoff from your application to the rest of the pipeline.
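The back-buffer handoff can be pictured as a ring of buffers. This is a toy sketch that assumes flip-style rotation; ToySwapChain and its methods are hypothetical stand-ins for IDXGISwapChain1, and the "frames" are plain integers in host memory rather than GPU resources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy flip-style swap chain: a ring of back buffers. present() hands the
// current buffer to the compositor and advances to the next one.
// Hypothetical; real buffers are GPU resources, not a std::vector.
class ToySwapChain {
public:
    explicit ToySwapChain(std::size_t bufferCount) : frames_(bufferCount, 0) {
        assert(bufferCount >= 1);
    }

    // Index of the back buffer the application is currently rendering into.
    std::size_t currentBackBuffer() const { return index_; }

    // "Draw" a frame by tagging the current back buffer with a frame number.
    void render(int frameId) { frames_[index_] = frameId; }

    // Hand the current buffer off and return which buffer was presented.
    std::size_t present() {
        std::size_t presented = index_;
        index_ = (index_ + 1) % frames_.size();
        ++presentCount_;
        return presented;
    }

    int frameIn(std::size_t buffer) const { return frames_.at(buffer); }
    std::size_t presentCount() const { return presentCount_; }

private:
    std::vector<int> frames_;
    std::size_t index_ = 0;
    std::size_t presentCount_ = 0;
};
```

With two buffers, rendering frame 1 and presenting hands off buffer 0 and moves rendering to buffer 1; the next present hands off buffer 1 and wraps back to 0.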
A surface here is not a pixel buffer in system RAM. It's a GPU resource — the actual bytes live in video memory or, on integrated GPUs, in a partition of system RAM that the GPU has direct access to. The CPU does not normally see those bytes.
This matters for capture. A capture API that wants the bytes of a swap-chain surface must either ask the GPU to copy them somewhere a CPU can read, or it must read post-compositing — after DWM has already produced the desktop frame.
Step 2 — DXGI presents the surface
IDXGISwapChain::Present is the API contract. Internally, it drops down through dxgi.dll into the user-mode display driver, which packages a present command and queues it for the kernel.
The kernel side — dxgkrnl.sys, the DirectX Graphics Kernel — is what actually schedules GPU work. Microsoft documents the command-flow model in the WDDM design guide. The model since Windows Vista: user-mode produces a command stream, kernel-mode validates and schedules it, the display miniport driver issues it to the GPU.
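A toy model of that three-stage flow, under the simplifying assumption that kernel-side validation means "the command buffer only references allocations the submitting process owns". CommandBuffer, ToyKernelScheduler, and its methods are hypothetical; the real interfaces are D3DKMT* calls on the user-mode side and DxgkDdi* entry points on the driver side, with far more state than this.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <utility>
#include <vector>

// Hypothetical command buffer produced by the user-mode driver.
struct CommandBuffer {
    int processId;
    std::vector<int> allocationRefs;  // allocation handles the commands reference
};

// Hypothetical stand-in for kernel validation plus scheduling.
class ToyKernelScheduler {
public:
    void grant(int processId, int allocation) { owned_.insert({processId, allocation}); }

    // Validate: reject buffers that touch allocations the process does not own.
    bool submit(const CommandBuffer& cb) {
        for (int a : cb.allocationRefs) {
            if (owned_.count({cb.processId, a}) == 0) return false;
        }
        queue_.push_back(cb);
        return true;
    }

    // Miniport stand-in: pop the next validated buffer for the GPU.
    bool issueNext(CommandBuffer& out) {
        if (queue_.empty()) return false;
        out = queue_.front();
        queue_.pop_front();
        return true;
    }

    std::size_t pending() const { return queue_.size(); }

private:
    std::set<std::pair<int, int>> owned_;  // (processId, allocation) grants
    std::deque<CommandBuffer> queue_;      // validated, waiting for the GPU
};
```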
For windowed applications (the default) the present doesn't go straight to the screen. It goes to DWM.
Step 3 — DWM composites the desktop
The Desktop Window Manager is the heart of post-Vista Windows graphics. Every visible top-level window's content, whether a DWM redirection surface for a classic GDI window or the buffers of a flip-model DXGI swap chain, is part of a visual tree that DWM owns. DWM's job, every refresh interval, is to walk the visual tree and produce one final composited frame, the desktop bitmap, which then gets handed to the display.
This is where transparent windows, drop shadows, animated minimize effects, and the entire glass aesthetic live. It's also where capture happens.
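To make "walk the visual tree and produce one frame" concrete, here is a one-pixel sketch: flatten a tree back to front, then blend each layer with the standard "over" operator. It is a toy under stated assumptions (grayscale, straight alpha, a parent drawn before its children); Visual, flatten, and compositePixel are hypothetical names, and DWM's real composition runs on the GPU with a much richer model.

```cpp
#include <cassert>
#include <vector>

// Hypothetical visual: one grayscale value and an opacity, plus children.
struct Visual {
    double color;  // grayscale value in [0, 1]
    double alpha;  // opacity in [0, 1]
    std::vector<Visual> children;
};

// Depth-first, parent before children: yields back-to-front draw order.
void flatten(const Visual& v, std::vector<const Visual*>& order) {
    order.push_back(&v);
    for (const auto& child : v.children) flatten(child, order);
}

// Composite one pixel by blending each layer "over" what is beneath it.
double compositePixel(const Visual& root) {
    std::vector<const Visual*> order;
    flatten(root, order);
    double dst = 0.0;  // start from black
    for (const Visual* v : order) {
        assert(v->alpha >= 0.0 && v->alpha <= 1.0);
        dst = v->color * v->alpha + dst * (1.0 - v->alpha);  // straight-alpha "over"
    }
    return dst;
}
```

An opaque black root with two white children at 50% opacity composites to 0.5 after the first child and 0.75 after the second, which is why stacking translucent windows brightens the result step by step.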
When you take a screenshot via BitBlt(GetDC(NULL), ...), you're reading the desktop's DC — which is the post-composition surface DWM produced. When you use DXGI Desktop Duplication, you're getting the same surface, just through a faster API that gives you a notification when a new frame is ready. When you use the WinRT Windows.Graphics.Capture API behind the Snipping Tool, again — same surface, friendlier ergonomics.
All three capture APIs, on a vanilla Windows install, ultimately read what DWM composited.
The opt-out is SetWindowDisplayAffinity(hwnd, WDA_EXCLUDEFROMCAPTURE), available since Windows 10, version 2004. Microsoft's reference page documents the contract: when set, DWM still includes the window in display compositing (the user can see it) but excludes it from any capture surface a subsequent API would read. The flag lives on the window's tagWND structure and can be read back from any process via GetWindowDisplayAffinity. It is the cheapest possible way to hide one window from screen capture, and it is also the most fingerprintable, because proctoring software running natively on the machine can ask for the flag's value directly.
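A toy model of the documented affinity contract, producing one frame for the display and one for capture from the same window list. The numeric values mirror the WDA_* constants in winuser.h (WDA_NONE 0x0, WDA_MONITOR 0x1, WDA_EXCLUDEFROMCAPTURE 0x11); the kWda* names, ToyWindow, Target, and compose are hypothetical, and the sketch models observable behavior, not how DWM implements it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Values mirror winuser.h; the names are local to this sketch.
constexpr std::uint32_t kWdaNone = 0x00;
constexpr std::uint32_t kWdaMonitor = 0x01;
constexpr std::uint32_t kWdaExcludeFromCapture = 0x11;

struct ToyWindow {
    std::string title;
    std::uint32_t affinity;
};

enum class Target { Display, Capture };

// One composited "frame" as a list of entries in z-order.
std::vector<std::string> compose(const std::vector<ToyWindow>& zOrder, Target target) {
    std::vector<std::string> frame;
    for (const auto& w : zOrder) {
        if (target == Target::Display || w.affinity == kWdaNone) {
            frame.push_back(w.title);         // the monitor shows every window
        } else if (w.affinity == kWdaMonitor) {
            frame.push_back("[no content]");  // in the capture, but content withheld
        } else {
            assert(w.affinity == kWdaExcludeFromCapture);
            // excluded: no entry in the capture frame at all
        }
    }
    return frame;
}
```

For windows "editor" (no flag), "player" (WDA_MONITOR), and "overlay" (WDA_EXCLUDEFROMCAPTURE), the display frame lists all three, while the capture frame lists "editor" and a contentless placeholder for "player", with no entry for "overlay".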
Step 4 — DWM hands the composited surface to the display
DWM's output is a single composited surface. It hands that surface back to DXGI, which presents it through the kernel to the display miniport, which programs the GPU to scan it out to the monitor.
This is the display path. The capture APIs all sit one step earlier: they read what DWM produced, before scan-out. Both outputs come from the same composition pass, and they differ only where DWM has been told to treat a window differently for capture, as it is with the display-affinity flags. The capture APIs never see anything DWM did not compose.
What that means architecturally: if a surface never enters DWM's composition tree, it cannot appear in the post-composition bitmap that user-mode capture APIs read. There is nothing to omit; the entry was never made.
This is the property commercial anti-cheat overlays exploit, as does any tool that needs to render to the user without showing up in an OBS recording, a Zoom screen share, or a proctoring agent's captures.
Step 5 — The miniport driver issues the actual work
Below dxgkrnl.sys sits the display miniport — the vendor-specific kernel driver that talks to the GPU hardware. Microsoft's WDDM design guide covers the miniport surface in detail.
The miniport implements the WDDM-defined DxgkDdiXxx entry points (their function types are named DXGKDDI_*) that the OS calls into: present, scan-out programming, video memory management, mode setting, hot-plug detection. NVIDIA's nvlddmkm.sys, AMD's amdkmdag.sys, and Intel's igdkmd64.sys are all display miniports.
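The port/miniport split is a function-table pattern: the vendor driver registers its entry points at load time, and the OS calls through them. A toy sketch follows. In real WDDM the table is DRIVER_INITIALIZATION_DATA, handed to DxgkInitialize from the miniport's DriverEntry; ToyMiniportEntryPoints, ToyPortDriver, and the fakevendor namespace below are simplified, hypothetical stand-ins.

```cpp
#include <cassert>

// Hypothetical, heavily reduced entry-point table filled in by the vendor.
struct ToyMiniportEntryPoints {
    bool (*present)(int surface);
    bool (*setScanoutAddress)(int surface);     // cf. DxgkDdiSetVidPnSourceAddress
    bool (*commitMode)(int width, int height);  // cf. DxgkDdiCommitVidPn
};

// The port side owns sequencing; the miniport owns hardware access.
class ToyPortDriver {
public:
    explicit ToyPortDriver(const ToyMiniportEntryPoints& ep) : ep_(ep) {
        assert(ep.present && ep.setScanoutAddress && ep.commitMode);
    }
    bool flipTo(int surface) { return ep_.present(surface) && ep_.setScanoutAddress(surface); }
    bool setMode(int width, int height) { return ep_.commitMode(width, height); }

private:
    ToyMiniportEntryPoints ep_;
};

// A fake vendor implementation, used only to exercise the table.
namespace fakevendor {
int scanout = -1;  // which surface the "hardware" is scanning out
bool present(int) { return true; }
bool setScanoutAddress(int surface) { scanout = surface; return true; }
bool commitMode(int width, int height) { return width > 0 && height > 0; }
}  // namespace fakevendor
```

The port driver never touches hardware itself; swapping in a different vendor means swapping the table, which is how one dxgkrnl.sys serves NVIDIA, AMD, and Intel miniports alike.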
Why this layer matters for capture: a kernel component sitting alongside the miniport sees the actual GPU state. It can render a surface that's scanned out as part of the display frame but that does not exist in any visual tree DWM is composing. From DWM's perspective, no such window exists. From DXGI Output Duplication's perspective, the post-composition surface contains no such pixels.
This is the layer that makes the difference between "we excluded the overlay from compositing" and "the overlay was never in compositing in the first place." Two architecturally distinct positions; one is a flag, the other is a different rendering path. We walked through what that distinction means in our four stealth layers post — the architecture description is "Layer 1" of that post.
The capture-pipeline interception map
A summary of where each capture path actually reads from, with the layer that can intercept it:
| Capture path | Reads from | Defended by |
|---|---|---|
| BitBlt(GetDC(NULL), ...) | Post-DWM desktop surface | WDA_EXCLUDEFROMCAPTURE (user-mode), or never entering DWM |
| DXGI Output Duplication | Post-DWM desktop surface | Same; the API gets the same surface BitBlt sees, faster |
| Windows.Graphics.Capture (WinRT) | Post-DWM, per-window or per-monitor | Same flag respected; per-window capture also requires a GraphicsCaptureItem the calling app must obtain |
| Print Screen → clipboard | Post-DWM desktop surface | Same flag respected |
| GDI PrintWindow (per-HWND) | Per-window render via the window's WM_PRINT path, or DWM's copy of the window when the caller passes PW_RENDERFULLCONTENT | WDA_EXCLUDEFROMCAPTURE |
| Display miniport scan-out | The actual GPU output | Below user-mode reach; only a kernel driver can intercept |
The structural insight: every user-mode capture API reads from the same place, the post-DWM composited surface. They differ in performance and ergonomics, not in what they can see. That's why there is no cat-and-mouse between user-mode capture paths: switching from one capture API to another changes nothing, because one flag covers all of them, and one flag query reveals it.
The way to be invisible across every user-mode capture path is therefore the same for all of them: don't be in the captured composition to begin with. Either (a) tell DWM to exclude you (WDA_EXCLUDEFROMCAPTURE, the flag everyone queries), or (b) render through a path that never reaches DWM.
What changed in WDDM 2.x and 3.x
The original WDDM (Windows Vista, 2007) already virtualized video memory through its video memory manager, but per-process GPU virtual address spaces did not ship until WDDM 2.0 (Windows 10, 2015). The version-by-version feature history is in the WDDM design guide.
WDDM 2.x brought:
- GPU virtual addressing. Each process gets its own GPU VA space. A GPU resource handle in your process is mapped through page tables similar to the CPU MMU.
- IOMMU support. GPU accesses to system memory are translated and checked through the system IOMMU, so the GPU can only reach pages the OS has mapped for it, which limits what memory GPU work can touch on behalf of any process.
- Hardware-accelerated GPU scheduling (WDDM 2.7+, Windows 10 version 2004). GPU work scheduling moves from a software queue managed by dxgkrnl.sys to hardware-managed queues on the GPU.
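GPU virtual addressing behaves like a per-process page table. Here is a toy translation sketch, assuming 4 KiB pages and a flat map in place of the real multi-level GPU page tables; ToyGpuAddressSpace is hypothetical. Two address spaces can map the same virtual page to different physical pages, and an unmapped address faults instead of landing in another process's memory.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

constexpr std::uint64_t kPageSize = 4096;  // assumed page size for the sketch

// Hypothetical per-process GPU address space: virtual page -> physical page.
class ToyGpuAddressSpace {
public:
    void map(std::uint64_t virtPage, std::uint64_t physPage) { table_[virtPage] = physPage; }

    // Translate a GPU virtual address; nullopt models a page fault.
    std::optional<std::uint64_t> translate(std::uint64_t va) const {
        auto it = table_.find(va / kPageSize);
        if (it == table_.end()) return std::nullopt;  // unmapped: the access faults
        return it->second * kPageSize + va % kPageSize;
    }

private:
    std::map<std::uint64_t, std::uint64_t> table_;
};
```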
WDDM 3.x (Windows 11) added GPU partitioning and tighter integration with the Hyper-V virtualization stack — covered in the WDDM design guide.
For capture specifically, none of these changes the fundamental shape: DWM still composites, capture APIs still read post-composition. They tighten the isolation between processes' GPU work, but the desktop bitmap is still the desktop bitmap.
Why this matters for AI tools
If you're building an AI tool that paints an overlay over a user's screen, the WDDM stack is the OS the overlay lives in. There are three architecturally distinct postures:
- User-mode app, no flag. Your overlay shows up in every screen capture, every screen share, every recording. Most early AI overlays were this.
- User-mode app, WDA_EXCLUDEFROMCAPTURE set. Your overlay is excluded from capture surfaces. The flag is queryable in one syscall by anyone watching for it. This is where every consumer AI interview tool sits in 2026.
- Kernel-resident path that bypasses DWM entirely. The overlay surface is never in the composited bitmap. There is no flag to query because there's no entry. This is where commercial anti-cheat lives, and it's where a stealth-grade interview tool has to live to survive a CoderPad Enterprise round.
Each posture is a different relationship with WDDM. The capture API isn't your enemy; the question is what surface the API is reading and whether your overlay is in that surface.
For the worked example of how a real proctoring stack composes capture queries with other detection vectors, see our CoderPad Enterprise anti-cheat breakdown. For the input side of the same architecture — pulling code and problem text from process memory rather than re-reading the rendered screen — see memory read versus screen-capture OCR.
FAQ
Can a user-mode app render directly to the GPU and bypass DWM?
For exclusive fullscreen surfaces, historically yes (SetFullscreenState(TRUE)), and capture from another user-mode app would fail until the surface is released. Windowed mode goes through DWM. Modern fullscreen optimizations make the boundary fuzzier, but for an overlay that's compositing on top of a desktop, you're going through DWM unless you're a kernel component.
Does WDA_EXCLUDEFROMCAPTURE work in remote sessions (RDP, virtual desktops)?
Microsoft's documentation describes WDA_MONITOR as showing the window's content only on a monitor (everywhere else the window appears with no content) and WDA_EXCLUDEFROMCAPTURE as showing the window only on a monitor (everywhere else it does not appear at all). RDP is its own remoting stack and follows different rules; the safe assumption for a stealth tool is that any flag you set is queryable in any session it's set in.
Why does Snipping Tool sometimes show a yellow capture indicator?
The Windows Graphics Capture API (Windows.Graphics.Capture) draws a yellow border around the captured window or monitor while a capture session is active. On recent Windows 11 builds an app can ask to suppress it (GraphicsCaptureSession.IsBorderRequired), subject to user consent in Settings. It's a UX-level indicator, not a security boundary; tools that don't go through the WinRT API don't trigger it.
Where does HDR fit in the pipeline? HDR adds wider color spaces and luminance metadata that flow alongside the surface through DWM. The pipeline shape is unchanged; surfaces simply use wider pixel formats, such as 10 bits per channel or 16-bit floating point. Capture APIs still read the post-DWM result, with SDR/HDR conversion applied according to the capture's destination format.
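A quick size check for a 3840×2160 surface, using the bytes-per-pixel of three common DXGI swap-chain formats. The Format enum and the helpers are illustrative, not DXGI types. Wider does not always mean larger: 8-bit SDR and 10-bit HDR10-style surfaces are both 4 bytes per pixel (33,177,600 bytes per frame), while FP16 scRGB doubles that to 66,355,200 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative subset of swap-chain formats; names echo DXGI_FORMAT values.
enum class Format { R8G8B8A8_UNORM, R10G10B10A2_UNORM, R16G16B16A16_FLOAT };

constexpr std::uint32_t bytesPerPixel(Format f) {
    switch (f) {
        case Format::R8G8B8A8_UNORM:     return 4;  // typical SDR
        case Format::R10G10B10A2_UNORM:  return 4;  // HDR10-style, same footprint as SDR
        case Format::R16G16B16A16_FLOAT: return 8;  // scRGB FP16, twice the footprint
    }
    return 0;
}

// Bytes in one uncompressed frame of the given size and format.
constexpr std::uint64_t frameBytes(std::uint32_t width, std::uint32_t height, Format f) {
    return std::uint64_t{width} * height * bytesPerPixel(f);
}
```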
Is there a public sample for a kernel-resident overlay path? Microsoft's WDK samples on GitHub include display miniport sample drivers. None of them ship a "stealth overlay" sample — that's a class of behavior Microsoft doesn't sample-document. The pieces (display miniport, DXGI integration, WDDM callbacks) are documented; assembling them is the work.
For the broader architecture this fits into, see the four stealth layers post. For why the user-mode flag everyone uses isn't enough, see our analysis of single-flag stealth.
FaangCoder is the Windows AI interview overlay built on the architecture this post describes. $399 lifetime, $199 monthly — single product, both plans ship the full kernel-resident stack.
