Allow OCR to work with Screen Curtain via Windows Graphics Capture #19753
PratikP1 wants to merge 3 commits into nvaccess:master
…vaccess#19164) Use Windows.Graphics.Capture CreateForWindow to capture window content from the DWM compositor, bypassing the Magnification API color transform used by Screen Curtain. This allows Windows OCR to function while Screen Curtain remains active, preserving the user's visual privacy. Closes nvaccess#19164
Pull request overview
This PR enables Windows OCR to work while Screen Curtain is enabled by switching to a Windows Graphics Capture (WGC)–based capture path (Windows 10 1903+) that reads pixels from the DWM compositor rather than via GDI, preserving the physical screen blackout.
Changes:
- Add a new WGC-based OCR recognizer (contentRecog.wgcCapture.WgcOcr) and auto-switch to it when Screen Curtain is active (with a user message fallback on pre-1903 Windows).
- Update Screen Curtain toggling logic to allow enabling Screen Curtain while a WGC-based OCR result is active (but still block for legacy/GDI OCR).
- Add native helper implementation (nvdaHelperLocalWin10.dll) for WGC capture + OCR, plus Python ctypes bindings, config spec, and documentation/changelog updates.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| user_docs/en/userGuide.md | Document OCR behavior with Screen Curtain on Win10 1903+ and fallback on older Windows. |
| user_docs/en/changes.md | Add release notes for WGC-based OCR + new developer-facing APIs. |
| source/globalCommands.py | Remove “disable screen curtain” block for OCR; relax Screen Curtain enabling restriction when recognizer is WGC. |
| source/contentRecog/wgcCapture.py | New Python recognizer that captures via HWND through native WGC helper and parses JSON OCR results. |
| source/contentRecog/recogUi.py | Skip GDI capture for WgcOcr and auto-switch recognizer when Screen Curtain is active. |
| source/config/configSpec.py | Add [wgcCapture] config section with captureMode option. |
| source/NVDAHelper/localWin10.py | Add ctypes bindings for wgcCapture_* functions exported by nvdaHelperLocalWin10.dll. |
| nvdaHelper/localWin10/wgcCapture.h | New C API header for WGC capture/OCR helper. |
| nvdaHelper/localWin10/wgcCapture.cpp | New C++/WinRT implementation using Windows.Graphics.Capture + Windows.Media.Ocr. |
| nvdaHelper/localWin10/sconscript | Add wgcCapture.cpp to build and link d3d11/dxgi. |
nvdaHelper/localWin10/wgcCapture.cpp
Outdated

    if (!handle) return;
    delete static_cast<WgcCapture*>(handle);

wgcCapture_terminate unconditionally deletes the WgcCapture instance, but recognizeWindow is a fire_and_forget coroutine that continues running on a background thread and uses this (members like m_callback, m_ocrEngine, etc.) after the call returns. If Python calls wgcCapture_terminate while recognition is in-flight (e.g., via WgcOcr.cancel() when a new recognition starts), this becomes a use-after-free and can crash. Consider adopting the same lifetime strategy used elsewhere in this repo (e.g. shared_ptr/weak_ptr + cancellation flag, or make terminate block until outstanding work finishes, or ensure the async work captures a strong self-reference and stops calling back after termination).
Suggested change:

    - if (!handle) return;
    - delete static_cast<WgcCapture*>(handle);
    + // Intentionally do not delete the underlying WgcCapture instance here.
    + // recognizeWindow is implemented as a fire_and_forget coroutine that may
    + // continue to access `this` (members like m_callback, m_ocrEngine, etc.)
    + // after this call returns. Deleting the instance here can therefore cause
    + // a use-after-free if termination happens while recognition is in-flight.
    + if (!handle) return;

Add a std::shared_ptr self-reference. The coroutine captures a strong ref so the instance stays alive until async work completes. Add a std::atomic<bool> m_cancelled flag: terminate sets it and nulls the callback, and the coroutine checks it before invoking the callback. The actual delete happens when the last shared_ptr ref drops (after the coroutine exits).
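The lifetime strategy described in this reply can be illustrated with a rough Python analog. This is only a sketch of the idea, not the C++ change itself: `Recognizer`, `recognize_async` and `terminate` are hypothetical names, a `threading.Event` stands in for `m_cancelled`, and the worker thread's bound method plays the role of the strong self-reference held by the coroutine.

```python
import threading


class Recognizer:
	"""Hypothetical Python analog of the native object; not code from this PR."""

	def __init__(self, callback):
		self._callback = callback
		self._cancelled = threading.Event()  # plays the role of m_cancelled

	def recognize_async(self, work):
		# The bound method self._run keeps a strong reference to self,
		# so the instance stays alive for as long as the worker runs.
		worker = threading.Thread(target=self._run, args=(work,))
		worker.start()
		return worker

	def _run(self, work):
		result = work()
		# Check the flag right before calling back, after the work is done.
		if not self._cancelled.is_set():
			self._callback(result)

	def terminate(self):
		# Only signal cancellation. The object is released once the last
		# reference (possibly the worker's) has gone away.
		self._cancelled.set()
```

With this shape, calling `terminate()` while work is in flight never frees anything the worker still uses; the worker simply finishes and skips the callback.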

    def cancel(self) -> None:
        self._onResult = None
        self._cleanup()

WgcOcr.cancel() calls _cleanup(), which calls wgcCapture_terminate(self._handle) and drops the last Python reference to the ctypes callback (self._cCallbackRef = None). If the C++ side still invokes the callback after cancel, this can call into freed Python callback memory and/or hit the C++ use-after-free noted in wgcCapture_terminate. To make cancellation safe, keep the callback alive until C++ signals completion, and avoid deleting the native instance while an async recognition is still running (e.g., mimic UwpOcr.cancel() behavior: mark canceled, ignore results, and let the completion callback perform termination).
Follow UwpOcr.cancel() exactly — just set self._onResult = None. Don't call _cleanup() from cancel(). The callback already checks if not self._onResult: return in _onCppResult, so results get ignored. Let the C++ completion callback trigger cleanup naturally via _fireResult. This pairs with the C++ fix above.
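The cancellation flow agreed on here can be sketched as follows. This is a minimal stand-in under stated assumptions, not NVDA's `UwpOcr` or the PR's `WgcOcr`: the class name, `_onNativeCompletion`, `_terminate` and the `terminated` counter are hypothetical and exist only to make the ordering visible.

```python
class SketchRecognizer:
	"""Hypothetical stand-in for a ContentRecognizer-style class."""

	def __init__(self):
		self._onResult = None
		self._handle = None
		self.terminated = 0  # counts native teardowns, for illustration only

	def recognize(self, onResult):
		self._onResult = onResult
		self._handle = object()  # placeholder for the native handle

	def cancel(self):
		# Only stop listening. Native work may still be in flight,
		# so the handle and the callback reference are left alone.
		self._onResult = None

	def _onNativeCompletion(self, result):
		# Invoked once by the (simulated) native side when work has finished.
		onResult, self._onResult = self._onResult, None
		# Nothing is in flight any more, so teardown is safe here.
		self._terminate()
		if onResult is not None:
			onResult(result)

	def _terminate(self):
		if self._handle is not None:
			self._handle = None
			self.terminated += 1
```

After `cancel()`, the handle is still held; it is released only when the completion callback arrives, and the late result is dropped.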

    """WGC requires a valid HWND on the navigator object."""
    return bool(getattr(nav, "windowHandle", None))

validateObject currently requires nav.windowHandle, but recognize() includes a fallback path that uses WindowFromPoint when the navigator object has no windowHandle. Because recognizeNavigatorObject calls validateObject and returns early on False, the fallback can never be reached. Either remove the fallback (and keep strict validation) or relax validateObject to allow objects without windowHandle so the fallback can be used.
Suggested change:

    - """WGC requires a valid HWND on the navigator object."""
    - return bool(getattr(nav, "windowHandle", None))
    + """Validate that there is a navigator object; hwnd is resolved later (with fallback)."""
    + return nav is not None

Remove the fallback, keep strict validation. WGC needs a reliable HWND. WindowFromPoint at the center of a region is fragile — it could pick the wrong window (overlapping windows, popups). If the navigator object doesn't have a window handle, we genuinely can't reliably capture. Better to fail cleanly than produce wrong results. Copilot's suggestion to relax validation to nav is not None would let through objects where we can't determine the window, leading to unpredictable behavior.

    windowRect = winUser.getWindowRect(hwnd)
    if windowRect:
        relX = max(0, imageInfo.screenLeft - windowRect[0])
        relY = max(0, imageInfo.screenTop - windowRect[1])
        wgcCapture_recognizeWindowRegion(
            self._handle,
            hwnd,
            c_uint(relX),
            c_uint(relY),
            c_uint(imageInfo.screenWidth),
            c_uint(imageInfo.screenHeight),
        )

Region coordinate translation uses getWindowRect(hwnd) to compute relX/relY. If the native side expects client-area-relative coordinates (as stated in wgcCapture.h), this will be offset by non-client borders/titlebar and can crop the wrong area. Align the coordinate system by either converting screen coords to client coords (ScreenToClient/MapWindowPoints) or by updating the native contract/documentation to explicitly use window-relative coordinates.
Update the header comment from "client area" to "window" — the coordinates are relative to the window's top-left corner, not its client area.
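A small sketch of the window-relative translation discussed in this thread, assuming the window rectangle includes the non-client area as the reply above states. `Rect` and `screenToWindowRelative` are hypothetical helpers, and clipping the region to the window (rather than only clamping the origin to zero, as the snippet under review does) is an added assumption.

```python
from typing import NamedTuple, Optional, Tuple


class Rect(NamedTuple):
	left: int
	top: int
	right: int
	bottom: int


def screenToWindowRelative(region: Rect, window: Rect) -> Optional[Tuple[int, int, int, int]]:
	"""Return (x, y, width, height) of region relative to the window's top-left corner.

	The region is clipped to the window; None means the two do not overlap.
	"""
	left = max(region.left, window.left)
	top = max(region.top, window.top)
	right = min(region.right, window.right)
	bottom = min(region.bottom, window.bottom)
	if right <= left or bottom <= top:
		return None
	return (left - window.left, top - window.top, right - left, bottom - top)
```

For a window at `Rect(100, 50, 900, 650)`, a region at `Rect(150, 80, 350, 180)` maps to `(50, 30, 200, 100)`.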

    // True if Windows.Graphics.Capture is available (Win10 1903+).
    bool __stdcall wgcCapture_isSupported();

    // Create a WGC capture + OCR instance.
    // language: BCP-47 tag (e.g. L"en-US"), or nullptr for user profile language.
    // Returns handle, or nullptr on failure.
    WgcCapture_H __stdcall wgcCapture_initialize(
        const wchar_t* language,
        wgcCapture_Callback callback
    );

wgcCapture.h declares the exported C API, but unlike other headers in this directory (e.g. uwpOcr.h, oneCoreSpeech.h) it doesn't mark these functions with __declspec(dllexport) (or an export macro). As a result, nvdaHelperLocalWin10.dll likely won't export wgcCapture_*, and importing NVDAHelper.localWin10 will fail when it tries to resolve dll.wgcCapture_isSupported, etc. Add an export macro and apply it to each exported function declaration (and keep extern "C").
Hi. Thanks for tackling this! Just to confirm, with this PR:
This PR doesn't change the magnifier interaction path at all. WGC only activates when Screen Curtain is active. When a magnifier is running without Screen Curtain, the existing GDI capture path is used, which captures what's on screen (the magnified view). So the magnifier question is orthogonal to this PR: magnifier behavior is unchanged. The captureMode = "always" config option could theoretically be used to force WGC with magnifiers in a future PR, but that's out of scope here.
- Add __declspec(dllexport) via export macro to all wgcCapture.h declarations, matching the uwpOcr.h pattern.
- Fix potential use-after-free: add std::atomic<bool> m_cancelled flag with checks before every callback invocation in the coroutine.
- Restructure Python lifetime management to match uwpOcr pattern: terminate is called inside the completion callback (never while async work is in-flight), cancel() only nulls _onResult.
- Remove dead WindowFromPoint fallback (unreachable due to validateObject requiring windowHandle).
- Correct wgcCapture_recognizeWindowRegion coordinate comment: CreateForWindow captures the full window including non-client area.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

    self,
    resultJson: str | None,
    imageInfo: RecogImageInfo,
    hwnd: int,

_onCppResult takes an hwnd parameter but doesn't use it. Dropping this parameter (and the corresponding argument from the ctypes callback wrapper) would reduce noise and avoid giving the impression it influences parsing/cleanup.
Suggested change:

    - hwnd: int,


    if resultJson:
        try:
            data = json.loads(resultJson)
            self._onResult(LinesWordsResult(data, imageInfo))
        except (json.JSONDecodeError, KeyError, TypeError) as e:

There are unit tests for other contentRecog modules (e.g. tests/unit/contentRecog/test_uwpOcr.py), but this new recognizer has no unit coverage for its Python-only logic (notably JSON parsing/error handling in _onCppResult). Consider adding tests that feed _onCppResult valid/invalid JSON and assert the callback receives a LinesWordsResult vs an exception, using mocks so it runs in CI without WGC.
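One possible shape for such tests, as a sketch only: `handleNativeResult` is a hypothetical stand-in for the JSON handling in `_onCppResult`, `makeResult` substitutes for `LinesWordsResult`, and the sample JSON is illustrative rather than NVDA's actual result format. Real tests would import the recognizer and mock the native bindings instead.

```python
import json
import unittest
from unittest import mock


def handleNativeResult(resultJson, onResult, makeResult):
	"""Hypothetical stand-in for the JSON handling in WgcOcr._onCppResult."""
	if not resultJson:
		onResult(RuntimeError("recognition failed"))
		return
	try:
		result = makeResult(json.loads(resultJson))
	except (json.JSONDecodeError, KeyError, TypeError) as e:
		onResult(e)
		return
	onResult(result)


class TestHandleNativeResult(unittest.TestCase):
	def test_validJson(self):
		onResult = mock.Mock()
		handleNativeResult('[[{"text": "hi"}]]', onResult, makeResult=lambda d: ("parsed", d))
		onResult.assert_called_once_with(("parsed", [[{"text": "hi"}]]))

	def test_invalidJson(self):
		onResult = mock.Mock()
		handleNativeResult("{not json", onResult, makeResult=lambda d: d)
		onResult.assert_called_once()
		self.assertIsInstance(onResult.call_args[0][0], json.JSONDecodeError)

	def test_emptyResult(self):
		onResult = mock.Mock()
		handleNativeResult(None, onResult, makeResult=lambda d: d)
		onResult.assert_called_once()
		self.assertIsInstance(onResult.call_args[0][0], RuntimeError)
```

Because nothing here touches WGC or the native DLL, tests of this kind can run in CI on any platform, for example via `python -m unittest`.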

    auto dxgiDevice = m_d3dDevice.as<IDXGIDevice>();
    com_ptr<IInspectable> inspectable;
    check_hresult(CreateDirect3D11DeviceFromDXGIDevice(
        dxgiDevice.get(), inspectable.put()));
    m_device = inspectable.as<IDirect3DDevice>();

CreateDirect3D11DeviceFromDXGIDevice is used here, but the header that declares it (windows.graphics.directx.direct3d11.interop.h) isn't included. This can cause a build failure depending on include order; include the proper interop header explicitly.

    auto srcBuffer = src.LockBuffer(BitmapBufferAccessMode::Read);
    auto srcRef = srcBuffer.CreateReference();
    auto srcAccess = srcRef.as<
        Windows::Foundation::IMemoryBufferByteAccess>();

cropBitmap casts the buffer reference to Windows::Foundation::IMemoryBufferByteAccess, but this interface typically requires an explicit definition or including the proper header (commonly robuffer.h, which defines IMemoryBufferByteAccess). As written, this may not compile (or may compile only accidentally). Include/define the interface explicitly and use the correct type in the as<> call.

    if not hwnd:
        log.error("wgcCapture: could not find target HWND")
        self._fireResult(RuntimeError("wgcCapture: no target HWND"))
        return

WgcOcr.recognize calls self._fireResult(...), but ContentRecognizer (and this class) don't define _fireResult, so these error paths will raise AttributeError and mask the real failure (e.g. missing HWND / init failure). Call the provided onResult callback directly (or factor a local helper) to report the exception, matching the UwpOcr pattern.

    [wgcCapture]
    # auto: use WGC only when Screen Curtain is active (recommended)
    # always: always use WGC (requires Win10 1903+)
    # never: disable WGC, use legacy GDI capture
    captureMode = option("auto", "always", "never", default="auto")

The new [wgcCapture] captureMode setting is defined in the config spec, but it isn’t referenced anywhere in source/ (no reads from config.conf["wgcCapture"]). As a result, changing this option has no effect. Either wire it into the recognizer-selection logic (e.g. in recogUi.recognizeNavigatorObject) or drop it from the config spec until it’s supported.
Suggested change:

    - [wgcCapture]
    - # auto: use WGC only when Screen Curtain is active (recommended)
    - # always: always use WGC (requires Win10 1903+)
    - # never: disable WGC, use legacy GDI capture
    - captureMode = option("auto", "always", "never", default="auto")

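If the option is kept rather than dropped, the selection logic it implies could look roughly like the sketch below. `shouldUseWgc` is a hypothetical helper, not code from this PR; in NVDA the mode would presumably be read from `config.conf["wgcCapture"]["captureMode"]`, and the other two arguments from the Screen Curtain state and `wgcCapture_isSupported()`.

```python
def shouldUseWgc(captureMode: str, screenCurtainActive: bool, wgcSupported: bool) -> bool:
	"""Decide whether the WGC capture path should replace the GDI one (hypothetical helper)."""
	if captureMode not in ("auto", "always", "never"):
		raise ValueError(f"Unknown captureMode: {captureMode!r}")
	if not wgcSupported or captureMode == "never":
		# Pre-1903 Windows, or WGC explicitly disabled: stay on legacy GDI capture.
		return False
	if captureMode == "always":
		return True
	# "auto": use WGC only while Screen Curtain is active.
	return screenCurtainActive
```

Keeping the decision in one pure function makes each mode easy to cover with unit tests and keeps the recognizer-selection code free of config lookups.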

    When Screen Curtain is enabled, features that rely on what is literally on screen will not function.
    For example, you cannot [use OCR](#Win10Ocr).
    Some screenshot utilities also may not work.

    On Windows 10 version 1903 and later, [Windows OCR](#Win10Ocr) will continue to work while Screen Curtain is active.
    NVDA automatically uses Windows Graphics Capture to read window content directly from the compositor, bypassing the screen blackout.

This reads a bit contradictory: it says features relying on what is literally on screen won't function, but then says Windows OCR continues to work with Screen Curtain on Win10 1903+. Consider rephrasing the first sentence to clarify that some pixel-capture features (e.g. GDI-based screen capture / screenshots) may not work, while OCR can still work via Windows Graphics Capture on supported Windows versions.
Have you considered using WGC directly to replace GDI screen capture? Just keep GDI as a fallback option.
I should not have used the word "only" in my original comment. WGC is used unless GDI fallback is required in older versions of Windows. Otherwise, WGC is used all the time.
@PratikP1 - please address the Copilot review comments
@PratikP1 - do you intend to continue to work on this?
Link to issue number:
Closes #19164
Summary of the issue:
When Screen Curtain is enabled, Windows OCR fails because the GDI screen capture (`BitBlt`/`GetDC`) reads pixels after the Magnification API color transform, returning an all-black image. Users must manually disable Screen Curtain, perform OCR, and re-enable it, a tedious 4-step process that exposes screen content.
Description of user facing changes:
- … `NVDA+r`, NVDA automatically uses Windows Graphics Capture to read window content directly from the desktop compositor, bypassing the screen blackout. The physical display remains black, preserving your privacy.
- … `NVDA+control+escape`) now allows enabling Screen Curtain while a WGC-based OCR result is active (previously blocked in all cases).
Description of developer facing changes:
- `contentRecog.wgcCapture` module with `WgcOcr` class implementing `ContentRecognizer`. Uses Windows.Graphics.Capture `CreateForWindow` to capture window frames from the DWM compositor (pre-magnification-transform), then runs `Windows.Media.Ocr` on the captured bitmap.
- `nvdaHelper/localWin10/wgcCapture.h` and `wgcCapture.cpp`. Exports 5 functions via `nvdaHelperLocalWin10.dll`: `wgcCapture_isSupported`, `wgcCapture_initialize`, `wgcCapture_recognizeWindow`, `wgcCapture_recognizeWindowRegion`, `wgcCapture_terminate`.
- … `NVDAHelper.localWin10` for all WGC functions.
- `recogUi.recognizeNavigatorObject()` now auto-switches to `WgcOcr` when Screen Curtain is active and WGC is available.
- `RefreshableRecogResultNVDAObject._recognize()` skips GDI screen capture when the recognizer is `WgcOcr` (which captures its own frames via HWND).
- `globalCommands.script_recognizeWithUwpOcr()` no longer blocks when Screen Curtain is active; detection and fallback are handled in `recognizeNavigatorObject`.
- `[wgcCapture]` configuration section with `captureMode` option (`auto`/`always`/`never`).
- `wgcCapture.cpp` added to sconscript, linked with `d3d11` and `dxgi`.
Description of development approach:
The key insight is that Windows.Graphics.Capture `CreateForWindow` reads window content from the Desktop Window Manager (DWM) compositor before the Magnification API full-screen color transform that Screen Curtain applies. This means:
- … `Windows.Media.Ocr.OcrEngine.RecognizeAsync()` for text recognition.
The implementation follows NVDA's existing architecture:
- … `wgcCapture.cpp`): Creates a Direct3D 11 device, uses `IGraphicsCaptureItemInterop::CreateForWindow` for HWND-based capture, `Direct3D11CaptureFramePool::CreateFreeThreaded` (no DispatcherQueue needed), optional sub-region cropping, and JSON serialization of OCR results matching NVDA's existing format.
- … `wgcCapture.py`): Implements `ContentRecognizer` interface identically to `UwpOcr`, with HWND discovery via navigator object, coordinate translation from screen-space to window-relative, and async callback handling.
- … `recogUi.py`): `recognizeNavigatorObject` checks Screen Curtain state and transparently swaps the recognizer, preserving the user's configured OCR language.
Minimum requirement: Windows 10 version 1903 (build 18362) for `CreateForWindow` HWND interop. On Windows 11, the yellow capture border is automatically hidden via `IGraphicsCaptureSession3.IsBorderRequired(false)`.
Testing strategy:
- … `wgcCapture.cpp` and `d3d11`/`dxgi` linkage.
Known issues with pull request:
- `[wgcCapture]` config section (`captureMode` option) is defined but not yet wired into a settings GUI panel. Currently the auto-switching logic is hardcoded to `auto` behavior. A future PR could add a settings panel to expose this option.
Code Review Checklist:
- … `changes.md`)
- … `userGuide.md`)
- … `seealso` references in `localWin10.py`)
- … `_()` for translation)