Isolate plugins in out-of-process COM host#40120
Conversation
There was a problem hiding this comment.
Pull request overview
This PR moves WSL plugins from in-process LoadLibrary inside wslservice.exe to isolated out-of-process wslpluginhost.exe COM local servers, aiming to keep the service alive even if a plugin crashes.
Changes:
- Introduces
wslpluginhost.exe(COM local server) that loads one plugin DLL and forwards lifecycle events, while proxying plugin API callbacks back to the service. - Adds new COM contracts (
IWslPluginHost,IWslPluginHostCallback) and consolidates proxy/stub generation intowslserviceproxystub.dll. - Updates service-side plugin management and adds a new
shared_mutexpath intended to avoid re-entrancy/deadlock on COM RPC callback threads.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| src/windows/wslpluginhost/exe/resource.h | Adds resource IDs for new host executable. |
| src/windows/wslpluginhost/exe/PluginHost.h | Declares COM class implementing IWslPluginHost plus API callback stubs. |
| src/windows/wslpluginhost/exe/PluginHost.cpp | Implements plugin DLL loading, lifecycle dispatch, and callback forwarding to service. |
| src/windows/wslpluginhost/exe/main.rc | Adds version/icon resources for wslpluginhost.exe. |
| src/windows/wslpluginhost/exe/main.cpp | Implements COM local server entrypoint and class factory registration. |
| src/windows/wslpluginhost/exe/CMakeLists.txt | Adds build target for wslpluginhost.exe. |
| src/windows/wslpluginhost/CMakeLists.txt | Wires new subdirectory into build. |
| src/windows/service/stub/CMakeLists.txt | Adds MIDL proxy/stub sources for WslPluginHost.idl into wslserviceproxystub. |
| src/windows/service/inc/WslPluginHost.idl | Defines out-of-proc COM interfaces for plugin hosting + callbacks. |
| src/windows/service/inc/CMakeLists.txt | Adds new wslpluginhostidl MIDL generation target. |
| src/windows/service/exe/PluginManager.h | Refactors plugin manager to track out-of-proc hosts and adds callback implementation type. |
| src/windows/service/exe/PluginManager.cpp | Implements COM activation of hosts, job object assignment, and service-side callback handlers. |
| src/windows/service/exe/LxssUserSession.h | Adds shared_mutex and makes plugin-callback methods private/friend-only. |
| src/windows/service/exe/LxssUserSession.cpp | Switches plugin callback locking from m_instanceLock to m_callbackLock and gates VM teardown. |
| src/windows/service/exe/CMakeLists.txt | Adds dependency on wslpluginhostidl. |
| src/windows/common/precomp.h | Adds <shared_mutex> include for new locking. |
| msipackage/package.wix.in | Installs wslpluginhost.exe and registers COM AppID/CLSID/interfaces for activation and proxy/stub. |
| msipackage/CMakeLists.txt | Adds wslpluginhost.exe to packaged binaries and build dependencies. |
| CMakeLists.txt | Adds subdirectory for wslpluginhost and adjusts global include directories. |
Comments suppressed due to low confidence (2)
src/windows/service/exe/LxssUserSession.cpp:3644
- CreateLinuxProcess now only takes m_callbackLock, but it calls _RunningInstance(), which is annotated Requires_lock_held(m_instanceLock) and reads m_runningInstances. This is both a locking-contract violation and can race with writers that still use m_instanceLock only. Refactor so callback code can safely read the running-instance map (e.g., provide a callback-safe lookup guarded by m_callbackLock, and ensure all writes to m_runningInstances/m_utilityVm also take m_callbackLock exclusively after m_instanceLock per the stated lock ordering).
// Shared lock prevents _VmTerminate from destroying the VM or instances
// while we use them. See MountRootNamespaceFolder for rationale.
std::shared_lock lock(m_callbackLock);
RETURN_HR_IF(E_NOT_VALID_STATE, !m_utilityVm);
if (Distro == nullptr)
{
*Socket = m_utilityVm->CreateRootNamespaceProcess(Path, Arguments).release();
}
else
{
const auto distro = _RunningInstance(Distro);
THROW_HR_IF(WSL_E_VM_MODE_INVALID_STATE, !distro);
const auto wsl2Distro = dynamic_cast<WslCoreInstance*>(distro.get());
src/windows/service/exe/LxssUserSession.cpp:2614
- m_runningInstances is updated here without taking m_callbackLock, but plugin callbacks now read m_runningInstances under m_callbackLock (and intentionally do not take m_instanceLock). Because these are different locks, this doesn’t provide synchronization and can lead to data races/UB when a callback runs concurrently with instance creation/termination. Writers that mutate m_runningInstances (and m_utilityVm if accessed by callbacks) need to also take m_callbackLock (exclusive) in the documented order m_instanceLock → m_callbackLock.
// This needs to be done before plugins are notified because they might try to run a command inside the distribution.
m_runningInstances[registration.Id()] = instance;
if (version == LXSS_WSL_VERSION_2)
{
auto cleanupOnFailure =
wil::scope_exit_log(WI_DIAGNOSTICS_INFO, [&]() { m_runningInstances.erase(registration.Id()); });
m_pluginManager.OnDistributionStarted(&m_session, instance->DistributionInformation());
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 23 out of 23 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/windows/service/exe/LxssUserSession.cpp:3643
- CreateLinuxProcess now only takes m_callbackLock, but it reads m_runningInstances via _RunningInstance(), which is annotated Requires_lock_held(m_instanceLock) and the map itself is Guarded_by(m_instanceLock). This is a real race/contract violation (and can also break static analysis). You’ll need a consistent locking strategy for callback threads (e.g., make all accesses/mutations of m_runningInstances + m_utilityVm also take m_callbackLock in the documented order, or refactor callbacks to avoid touching m_runningInstances without m_instanceLock).
// Shared lock prevents _VmTerminate from destroying the VM or instances
// while we use them. See MountRootNamespaceFolder for rationale.
std::shared_lock lock(m_callbackLock);
RETURN_HR_IF(E_NOT_VALID_STATE, !m_utilityVm);
if (Distro == nullptr)
{
*Socket = m_utilityVm->CreateRootNamespaceProcess(Path, Arguments).release();
}
else
{
const auto distro = _RunningInstance(Distro);
THROW_HR_IF(WSL_E_VM_MODE_INVALID_STATE, !distro);
3418f7e to
3f03d4f
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 23 out of 23 changed files in this pull request and generated 9 comments.
Comments suppressed due to low confidence (1)
src/windows/service/exe/LxssUserSession.cpp:3653
CreateLinuxProcessnow only takesm_callbackLockbut calls_RunningInstance(Distro)(which is annotated_Requires_lock_held_(m_instanceLock)and touchesm_lockedDistributions). This violates the locking contract and can race with instance state changes or deadlock if someone later adds the required lock. Consider adding a callback-safe lookup that is guarded bym_callbackLockonly (and does not call_EnsureNotLocked), or refactoring_RunningInstance/ the guarded annotations so callback code never needsm_instanceLock.
if (Distro == nullptr)
{
*Socket = m_utilityVm->CreateRootNamespaceProcess(Path, Arguments).release();
}
else
{
const auto distro = _RunningInstance(Distro);
THROW_HR_IF(WSL_E_VM_MODE_INVALID_STATE, !distro);
const auto wsl2Distro = dynamic_cast<WslCoreInstance*>(distro.get());
THROW_HR_IF(WSL_E_WSL2_NEEDED, !wsl2Distro);
|
Hey @benhillis 👋 — Following up on this PR. It currently has merge conflicts that need to be resolved, and the CI build didn't run (shows action_required). There are also 20 unresolved review threads, including some security-relevant findings (TOCTOU on DLL signature validation, |
f32e629 to
5641289
Compare
|
CI is all green now and conflicts are resolved. Would love to get some eyes on this when someone has a chance — the main design change is moving plugin loading out of wslservice into separate COM hosts so a bad plugin can't take down the service. |
3492d6b to
57501d9
Compare
57501d9 to
ed98200
Compare
Plugin DLLs are now loaded in isolated wslpluginhost.exe processes instead of directly in wslservice.exe via LoadLibrary. This prevents a buggy or malicious plugin from crashing the WSL service. Architecture: - New IWslPluginHost/IWslPluginHostCallback COM interfaces (WslPluginHost.idl) for cross-process plugin lifecycle management - New wslpluginhost.exe: COM local server (REGCLS_SINGLEUSE), one per plugin, loads the plugin DLL and dispatches notifications - Refactored PluginManager: CoCreateInstance replaces LoadLibrary, PluginError returned via [out] parameter, crash recovery via IsHostCrash() detecting RPC_E_DISCONNECTED/SERVER_DIED Callback safety: - Plugin callbacks (MountFolder, ExecuteBinary) arrive on a different COM RPC thread and use std::shared_lock(m_callbackLock) instead of m_instanceLock to avoid re-entrancy deadlocks - _VmTerminate takes exclusive m_callbackLock before destroying the VM, blocking until in-flight callbacks complete - Lock ordering: m_instanceLock -> m_callbackLock (never reverse) - All writes to m_runningInstances take m_callbackLock exclusive to prevent data races with concurrent callback reads Security: - COM AppID with SYSTEM-only launch/access permissions - Plugin signature validation (ValidateFileSignature) keeps the file handle open until after LoadLibrary to prevent TOCTOU attacks - Plugin host processes use minimal access rights for handles Process lifecycle: - Plugin hosts added to a job object with KILL_ON_JOB_CLOSE for automatic cleanup if wslservice exits - g_pluginHost is process-wide (REGCLS_SINGLEUSE guarantees one plugin per process), nulled on destruction to prevent UAF - std::call_once for thread-safe initialization and job creation Packaging: - WslPluginHost.idl compiled into existing wslserviceproxystub.dll - MSI: COM class/interface registration, AppID security, proxy/stub - wslpluginhost.exe added to build/signing pipeline, WER crash dump list, LSP registration, and test validation Plugins are not loaded for WSL1-only sessions since all plugin hooks require a WSL2 VM. WslPluginApi.h is unchanged - existing plugin DLLs work unmodified.
ed98200 to
c66d3d7
Compare
…sessions The plugin host uses MTA and one host process serves all user sessions. When multiple sessions trigger hooks concurrently, m_pluginErrorMessage (std::optional<std::wstring>) and g_hookThreadId can be raced. Add a mutex to serialize hook dispatch within each host process. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Moves plugin DLLs from LoadLibrary in wslservice.exe to isolated wslpluginhost.exe processes via COM. A crashing plugin kills only its host process — the service logs it and continues. Plugin API is unchanged; Defender and existing plugins work unmodified.
Callbacks (MountFolder, ExecuteBinary) arrive on a different COM thread, so they can't re-enter the recursive mutex. Instead of restructuring every call site, callbacks use a separate \shared_mutex\ — shared for reads, exclusive in _VmTerminate\ before destroying the VM.
Plugin hosts are in a job object for automatic cleanup on service exit. COM activation is SYSTEM-only via AppID. Proxy/stub is consolidated into wslserviceproxystub.dll. One new exe, no new DLLs.