feat(MTP): per-test coverage analysis for MTP runner #1

Open
piotr-nawrot-golba-music wants to merge 9 commits into master from feat/mtp-per-test-coverage

@piotr-nawrot-golba-music
Owner

Summary

Implements per-test coverage capture for the Microsoft Testing Platform (MTP) test runner by running each test in an isolated server process.

Key changes:

  • SingleMicrosoftTestPlatformRunner.StopAndRemoveServerAsync() — stops server and removes from cache, triggering ProcessExit coverage flush
  • SingleMicrosoftTestPlatformRunner.RunSingleTestForCoverageAsync() — runs one test, stops server, reads per-test coverage file
  • MicrosoftTestPlatformRunnerPool.CaptureCoverageTestByTest() — iterates all tests using the runner pool for parallelism
  • CaptureCoverage() routing — uses per-test capture when CoverageBasedTest flag is set, aggregate otherwise
  • Confidence level: Normal for perTest, Exact for perTestInIsolation

Why process restart: MTP doesn't have an in-process data collector like VsTest's CoverageCollector. Since MutantControl only flushes coverage data on ProcessExit, the most reliable way to get per-test coverage is to stop and restart the server between tests. This is a one-time cost during the coverage capture phase.
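
As a rough illustration of the flush-on-exit mechanism described above, here is a minimal sketch. `CoverageRecorderSketch` and all of its members are hypothetical stand-ins, not Stryker's actual MutantControl; the only real APIs used are `AppDomain.CurrentDomain.ProcessExit` and `File.WriteAllText`. The output layout assumes the "covered;static" format with comma-separated IDs mentioned later in the review.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the coverage-recording side; not Stryker's MutantControl.
public static class CoverageRecorderSketch
{
    private static readonly object Gate = new();
    private static readonly SortedSet<int> Covered = new();
    private static readonly SortedSet<int> CoveredStatically = new();

    // Called by instrumented code each time a mutant location is reached.
    public static void Hit(int mutantId, bool isStaticContext = false)
    {
        lock (Gate)
        {
            (isStaticContext ? CoveredStatically : Covered).Add(mutantId);
        }
    }

    // Serializes both sets as "covered;static" with comma-separated IDs.
    public static string Format()
    {
        lock (Gate)
        {
            return string.Join(",", Covered) + ";" + string.Join(",", CoveredStatically);
        }
    }

    // Registers the flush. The handler runs on a graceful exit only, not after a force-kill.
    public static void FlushOnExit(string coverageFilePath)
    {
        AppDomain.CurrentDomain.ProcessExit += (_, _) => File.WriteAllText(coverageFilePath, Format());
    }
}
```

Because the data only reaches disk when the process exits gracefully, each test needs its own process for the file to describe exactly one test, which is why the server is restarted between tests.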

Test plan

  • Unit tests pass: dotnet test src/Stryker.TestRunner.MicrosoftTestPlatform.UnitTest/ (150 pass)
  • Full solution builds: dotnet build src/Stryker.slnx (0 errors)
  • Full solution tests pass (1784+ tests, 0 failures)
  • Manual test with xUnit v3 project using --coverage-analysis perTest --test-runner mtp
  • Manual test with xUnit v3 project using --coverage-analysis perTestInIsolation --test-runner mtp
  • Manual test with --coverage-analysis all --test-runner mtp (aggregate fallback)

Related

piotr-nawrot-golba-music and others added 4 commits April 1, 2026 17:24
…ureCoverage

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ution

MTP runner now captures per-test coverage by running each test in an isolated
process. When coverage-analysis is set to perTest or perTestInIsolation,
each test gets its own MTP server process. The server is stopped after each
test, triggering MutantControl.FlushCoverageToFile() via ProcessExit, and
the resulting coverage file is read to build per-test coverage results.

This enables Stryker to determine which tests cover which mutants for MTP-based
frameworks (xUnit v3, TUnit, MSTest with MTP, NUnit with MTP), unlocking
coverage-based test optimization that was previously only available with VsTest.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Owner Author

@piotr-nawrot-golba-music piotr-nawrot-golba-music left a comment

PR Review: MTP Per-Test Coverage Capture

✅ Overall Verdict: Architecture is sound, will work for xUnit + MTP with file coverage

All 150 unit tests pass. The core flow (start server → run one test → stop server → ProcessExit triggers FlushCoverageToFile → read file) is correctly implemented and framework-agnostic — xUnit v3, TUnit, MSTest, NUnit via MTP all work the same way.


🟢 What works well

  1. Correct routing: CaptureCoverage correctly routes perTest → CoverageBasedTest (Normal confidence) and perTestInIsolation → CaptureCoveragePerTest | CoverageBasedTest (Exact confidence). The all/off modes correctly fall through to the aggregate path.

  2. Process isolation is correct: StopAndRemoveServerAsync removes the server from _assemblyServers, so GetOrCreateServerAsync always starts a fresh process for the next test. Each process gets its own MutantControl static state — no cross-test contamination.

  3. No race conditions on coverage files: Each runner has a unique _coverageFilePath (stryker-coverage-{id}.txt), and WaitServerProcessExitAsync blocks until the process exits (guaranteeing FlushCoverageToFile completes before the read).

  4. File format consistency: Writer (MutantControl) and reader (ReadCoverageData) both use the "covered;static" format with comma-separated IDs. ✅

  5. Integration with CoverageAnalyser: Returns ICoverageRunResult with the same structure as VsTest — the analyser processes them identically.
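
To make the file format from item 4 concrete, the sketch below parses a "covered;static" line into two ID lists. `CoverageFileSketch`, `Parse`, and `ParseIds` are hypothetical names chosen for illustration and are not the PR's ReadCoverageData; the code assumes .NET 5 or later for `StringSplitOptions.TrimEntries`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reader for the "covered;static" format; not the PR's ReadCoverageData.
public static class CoverageFileSketch
{
    // Parses "1,2,3;4,5" into (covered, static) ID lists; tolerates missing or malformed parts.
    public static (List<int> Covered, List<int> Static) Parse(string content)
    {
        var parts = (content ?? string.Empty).Split(';');
        var covered = ParseIds(parts[0]);
        var coveredStatically = parts.Length > 1 ? ParseIds(parts[1]) : new List<int>();
        return (covered, coveredStatically);
    }

    private static List<int> ParseIds(string segment)
    {
        var ids = new List<int>();
        var tokens = segment.Split(',', StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries);
        foreach (var token in tokens)
        {
            // Non-numeric tokens (for example from a partial write) are skipped silently.
            if (int.TryParse(token, out var id))
            {
                ids.Add(id);
            }
        }
        return ids;
    }
}
```

With this sketch, `Parse("1, 2,x,3;7")` yields covered IDs 1, 2, 3 and static ID 7, and an empty string yields two empty lists.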


🟡 Issues to address

1. Silent coverage failure masks mutants as uncovered (Medium severity)

In RunSingleTestForCoverageAsync (SingleMicrosoftTestPlatformRunner.cs:134-172):

If StopAsync times out and force-kills the process, FlushCoverageToFile never runs. ReadCoverageData returns empty arrays, but the result still gets Normal/Exact confidence. The CoverageAnalyser then believes this test covers zero mutants — effectively hiding those mutants from testing.

Suggestion: After ReadCoverageData, if both lists are empty, consider logging a warning or downgrading to Dubious confidence. This way the analyser won't trust the empty result and will still run those mutants against this test.
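
A minimal sketch of that downgrade rule, assuming a three-level confidence scale like the one named in this review. `ConfidenceSketch` and `ConfidenceRules` are hypothetical types for illustration, not Stryker's CoverageConfidence.

```csharp
using System.Collections.Generic;

// Hypothetical mirror of the confidence levels named in this review.
public enum ConfidenceSketch { Exact, Normal, Dubious }

public static class ConfidenceRules
{
    // Empty coverage is treated as "capture may have failed", so the requested level is not trusted.
    public static ConfidenceSketch Resolve(
        ConfidenceSketch requested,
        IReadOnlyCollection<int> covered,
        IReadOnlyCollection<int> coveredStatically)
        => covered.Count == 0 && coveredStatically.Count == 0
            ? ConfidenceSketch.Dubious
            : requested;
}
```

The trade-off is that a test which genuinely covers no mutants is also reported as Dubious, costing some extra test runs in exchange for never hiding a mutant.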

2. Misleading test name: RunSingleTestForCoverageAsync_ShouldReturnDubious_WhenNoCoverageFile (Low severity)

This test (SingleMicrosoftTestPlatformRunnerCoverageTests.cs:554-570) tests ReadCoverageData() returning empty arrays — it does not test the CoverageConfidence.Dubious path. The Dubious confidence is only set in the catch block of RunSingleTestForCoverageAsync (line 160-170), which this test doesn't exercise. Similarly, RunSingleTestForCoverageAsync_ShouldReturnCoverageFromFile (line 518) only tests ReadCoverageData, not the actual RunSingleTestForCoverageAsync method.

Suggestion: Rename tests to match what they actually test (ReadCoverageData_...), or add a test that exercises the real RunSingleTestForCoverageAsync exception path via a TestableRunner with a throwing coverageHandler.

3. Missing test for the exception/Dubious path (Low severity)

There's no test that verifies RunSingleTestForCoverageAsync returns CoverageConfidence.Dubious when an exception occurs (e.g., server fails to start, test fails to run). Consider adding a test using TestableRunner with a coverageHandler that throws.

4. Performance consideration — no issue, but worth documenting

Starting/stopping one process per test means N process startups for N tests. For large test suites (1000+ tests), this will be noticeably slower than VsTest's in-process data collector approach. This is an acceptable trade-off since MTP doesn't support data collectors, but worth noting in documentation so users understand the impact of perTest/perTestInIsolation with MTP runners.


🟢 Summary

| Aspect | Status |
|---|---|
| Per-test server isolation | ✅ Correct |
| ProcessExit → FlushCoverageToFile | ✅ Correct |
| Coverage file read/write | ✅ Correct, format-consistent |
| Race conditions | ✅ None found |
| xUnit + MTP compatibility | ✅ Framework-agnostic |
| Integration with CoverageAnalyser | ✅ Correct |
| Error handling | ⚠️ Silent empty coverage needs attention |
| Test coverage of new code | ⚠️ Good routing tests, but runner-level tests don't exercise real RunSingleTestForCoverageAsync |

Bottom line: Yes, running mutations with xUnit and MTP with file coverage watching will work. The architecture is correct and the flow is sound. The issues above are about edge-case resilience and test accuracy, not correctness of the happy path.

When RunSingleTestForCoverageAsync gets empty coverage data (e.g., server
force-killed before FlushCoverageToFile ran), the result now correctly uses
CoverageConfidence.Dubious instead of the requested confidence level. This
prevents silently marking mutants as uncovered when coverage capture failed.

Also fixes misleading test names (renamed RunSingleTestForCoverageAsync_*
to ReadCoverageData_* where they only tested ReadCoverageData), and adds
proper tests for the Dubious confidence paths through the pool.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Owner Author

@piotr-nawrot-golba-music piotr-nawrot-golba-music left a comment

Code Review: MTP Per-Test Coverage Capture

Verdict: No significant issues found. The implementation is correct and well-structured.

I thoroughly investigated the concurrency model, coverage file IPC coordination, error handling paths, and data parsing robustness. All 152 unit tests pass.

What I verified:

  1. Concurrency is safe. `Parallel.ForEach` with `MaxDegreeOfParallelism = _countOfRunners` combined with `RunThisAsync`'s `ConcurrentBag.TryTake/Add` pattern guarantees each runner instance is used by exactly one thread at a time. No race conditions on `_assemblyServers` (protected by `_serverLock`), and no contention on coverage files (unique path per runner via `stryker-coverage-{id}.txt`).

  2. Coverage file coordination is correct. Runner writes `STRYKER_COVERAGE_FILE` env var (filename only), child test process reconstructs full path via `Path.GetTempPath()`. Both sides resolve to the same file. Verified the flow: `DeleteCoverageFile → GetOrCreateServer → RunTest → StopAndRemoveServer → ReadCoverageData → DeleteCoverageFile`.

  3. Error handling is comprehensive. Force-killed process (30s timeout in `StopAsync`) → empty coverage file → `Dubious` confidence. Server startup failure → caught by `RunSingleTestForCoverageAsync` catch-all → `Dubious`. `ReadCoverageData` handles missing file, empty content, malformed data (`int.TryParse` + filter), and I/O exceptions.

  4. Routing logic is correct. `CoverageBasedTest` flag → per-test with `Normal` confidence. `CoverageBasedTest | CaptureCoveragePerTest` → per-test with `Exact` confidence. Other modes → aggregate fallback.

  5. `ParseMutantIds` is robust against partial writes: it uses `TryParse` with `Where(HasValue)` to silently skip non-numeric entries from a partially-flushed file.
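
The routing rule in item 4 can be sketched as a small decision function over a flags enum. `ModeSketch`, `CaptureRoute`, and `CoverageRouting` are hypothetical types; only the two flag names are taken from the review, and the numeric values are assumptions.

```csharp
using System;

// Hypothetical flags enum; only the two flag names come from the review.
[Flags]
public enum ModeSketch
{
    None = 0,
    CoverageBasedTest = 1,
    CaptureCoveragePerTest = 2,
}

public enum CaptureRoute { Aggregate, PerTestNormal, PerTestExact }

public static class CoverageRouting
{
    // Without CoverageBasedTest, everything falls back to the aggregate path.
    public static CaptureRoute Route(ModeSketch mode)
    {
        if (!mode.HasFlag(ModeSketch.CoverageBasedTest))
        {
            return CaptureRoute.Aggregate;
        }

        return mode.HasFlag(ModeSketch.CaptureCoveragePerTest)
            ? CaptureRoute.PerTestExact
            : CaptureRoute.PerTestNormal;
    }
}
```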

@piotr-nawrot-golba-music
Owner Author

Concurrency Improvement Suggestions

These aren't blocking issues — the current code is correct given the pool's single-threaded-per-runner invariant — but they'd make the concurrency model more robust and efficient.


1. `RunThisAsync`: Replace `AutoResetEvent` with `SemaphoreSlim` (high value)

File: `MicrosoftTestPlatformRunnerPool.cs`, lines 239–276

`RunThisAsync` currently uses `AutoResetEvent.WaitOne(1000)` in a polling loop to wait for an available runner. This blocks a ThreadPool thread for up to 1 second per iteration while waiting. Under high concurrency (many mutations queued via `Parallel.ForEach` or concurrent `TestMultipleMutantsAsync` calls), this can cause ThreadPool starvation.

```csharp
// Current: blocks a thread, polls every 1s
while (!_availableRunners.TryTake(out runner))
{
    if (!_runnerAvailableHandler.WaitOne(waitIntervalMs)) // ← blocks thread
    {
        attempts++;
        // ...
    }
}
```

Suggestion: Replace `AutoResetEvent _runnerAvailableHandler` with `SemaphoreSlim(_countOfRunners, _countOfRunners)` and use `await WaitAsync()` instead of the polling loop. This makes runner checkout fully async (freeing the thread back to the pool while waiting) and eliminates the 1-second polling granularity:

```csharp
// Proposed: fully async, no thread blocked while waiting
private readonly SemaphoreSlim _runnerSemaphore = new(_countOfRunners, _countOfRunners);

private async Task RunThisAsync(Func<SingleMicrosoftTestPlatformRunner, Task> task)
{
    using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(300));
    await _runnerSemaphore.WaitAsync(cts.Token).ConfigureAwait(false);
    // ... TryTake + try/finally { Add + Release }
}
```
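
For readers who want to see the whole checkout pattern in one place, here is a self-contained sketch of a semaphore-guarded pool. `RunnerPoolSketch<TRunner>` is a hypothetical generic class, with `TRunner` standing in for the runner type; it is not the PR's implementation and it assumes at least one runner is supplied.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical pool; TRunner stands in for SingleMicrosoftTestPlatformRunner.
public sealed class RunnerPoolSketch<TRunner>
{
    private readonly ConcurrentBag<TRunner> _available;
    private readonly SemaphoreSlim _permits;

    public RunnerPoolSketch(IReadOnlyCollection<TRunner> runners)
    {
        _available = new ConcurrentBag<TRunner>(runners);
        // One permit per runner, so a held permit implies a runner is in the bag.
        _permits = new SemaphoreSlim(runners.Count, runners.Count);
    }

    public async Task RunAsync(Func<TRunner, Task> work, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        // Waits without blocking a thread; throws OperationCanceledException on timeout.
        await _permits.WaitAsync(cts.Token).ConfigureAwait(false);
        try
        {
            if (!_available.TryTake(out var runner))
            {
                throw new InvalidOperationException("Permit held but no runner available.");
            }

            try
            {
                await work(runner).ConfigureAwait(false);
            }
            finally
            {
                // Return the runner before releasing the permit.
                _available.Add(runner);
            }
        }
        finally
        {
            _permits.Release();
        }
    }
}
```

Returning the runner to the bag before releasing the permit keeps the invariant that a waiter woken by `Release()` always finds a runner to take.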


2. `GetOrCreateServerAsync`: Replace dual `lock` with `SemaphoreSlim(1,1)` (correctness improvement)

File: `SingleMicrosoftTestPlatformRunner.cs`, `GetOrCreateServerAsync` (lines ~314–340)

This method has a TOCTOU (time-of-check-to-time-of-use) pattern: it acquires `lock(_serverLock)` to check the cache, releases it, does `await server.StartAsync()`, then re-acquires the lock to store the server:

```csharp
lock (_serverLock)
{
    if (_assemblyServers.TryGetValue(assembly, out server) && server.IsInitialized)
        return server; // ← check
}
// gap: lock released, await happens here
server = new AssemblyTestServer(...);
await server.StartAsync(); // ← create + start (outside lock)
lock (_serverLock)
{
    _assemblyServers[assembly] = server; // ← store
}
```

If two threads call this for the same assembly simultaneously, both miss the cache, both create and start a server, and the second `_assemblyServers[assembly] = server` silently overwrites the first, leaking a running server process that is never stopped or disposed.

This is currently safe because `RunThisAsync` guarantees single-threaded access per runner instance. But the invariant is enforced at the pool level, not the class level, which makes it fragile if the runner is ever used differently.

Suggestion: Replace `object _serverLock` with `SemaphoreSlim(1,1)` and hold it across the entire check-create-start-store sequence:

```csharp
private readonly SemaphoreSlim _serverLock = new(1, 1);

private async Task<AssemblyTestServer> GetOrCreateServerAsync(string assembly)
{
    await _serverLock.WaitAsync().ConfigureAwait(false);
    try
    {
        if (_assemblyServers.TryGetValue(assembly, out var server) && server.IsInitialized)
            return server;

        server = new AssemblyTestServer(...);
        await server.StartAsync();
        _assemblyServers[assembly] = server;
        return server;
    }
    finally
    {
        _serverLock.Release();
    }
}
```

This eliminates the race structurally, regardless of how the runner is used.


3. Minor: `CalculateAssemblyTimeout`: redundant lock acquisition per LINQ element

File: `SingleMicrosoftTestPlatformRunner.cs`, `CalculateAssemblyTimeout`

This method acquires `_discoveryLock` once per element inside a `.Sum()` LINQ lambda. Since `_testsByAssembly` is a plain `Dictionary`, each iteration re-acquires the lock to read from it. It would be cleaner (and marginally faster for large test suites) to snapshot the dictionary once under a single lock acquisition, then iterate the snapshot outside the lock.
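
A minimal sketch of the snapshot-once idea, under the assumption that test durations are kept in a dictionary keyed by test ID. `TimeoutCalculatorSketch`, `Record`, and `SumFor` are hypothetical names and do not mirror the real `CalculateAssemblyTimeout` signature.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical calculator; field and method names are illustrative only.
public sealed class TimeoutCalculatorSketch
{
    private readonly object _discoveryLock = new();
    private readonly Dictionary<string, int> _durationsMs = new();

    public void Record(string testId, int durationMs)
    {
        lock (_discoveryLock)
        {
            _durationsMs[testId] = durationMs;
        }
    }

    public int SumFor(IEnumerable<string> testIds)
    {
        Dictionary<string, int> snapshot;
        lock (_discoveryLock)
        {
            // Single lock acquisition: copy once, then read the copy lock-free.
            snapshot = new Dictionary<string, int>(_durationsMs);
        }

        // Unknown test IDs contribute zero.
        return testIds.Sum(id => snapshot.TryGetValue(id, out var ms) ? ms : 0);
    }
}
```

The copy trades one allocation for a shorter critical section; when the sum is cheap, holding the lock for the whole sum avoids the copy instead.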

piotr-nawrot-golba-music and others added 2 commits April 1, 2026 19:11
… for async-safe concurrency

- Replace AutoResetEvent with SemaphoreSlim in RunnerPool.RunThisAsync
  to avoid blocking thread-pool threads during runner checkout
- Replace object _serverLock with SemaphoreSlim(1,1) in SingleRunner
  to enable holding the lock across await in GetOrCreateServerAsync,
  eliminating the TOCTOU race in the check-create-start-store pattern
- Fix CalculateAssemblyTimeout to snapshot _testDescriptions once
  instead of acquiring _discoveryLock per LINQ element

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…RunThisAsync

Replace recursive self-call with a while loop sharing a single
CancellationTokenSource so the 300-second timeout acts as a hard
upper bound across all retries, preventing potential infinite loops
if the semaphore/bag invariant is ever broken.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@piotr-nawrot-golba-music
Owner Author

Review Round 3 — Final Review + Allocation Analysis

Previous Comments Status ✅

All 3 suggestions from the first review have been addressed:

  1. RunThisAsync: AutoResetEvent → SemaphoreSlim ✅ Applied
  2. GetOrCreateServerAsync: dual lock → single SemaphoreSlim(1,1) hold ✅ Applied
  3. CalculateAssemblyTimeout: lock-per-LINQ-element antipattern ✅ Applied

The recursive retry bug found in review round 2 was also fixed → bounded while loop ✅


🐛 Bug: Stale server in error path leaks coverage data between tests

File: SingleMicrosoftTestPlatformRunner.cs, lines 184–195 (RunSingleTestForCoverageAsync catch block)

If server.RunTestsAsync throws (line 151), the catch block returns Dubious but never stops the server. The server stays alive in _assemblyServers with accumulated MutantControl._coveredMutants data. When the next test reuses this server for the same assembly, ProcessExit → FlushCoverageToFile writes combined coverage from both tests, polluting the second test's result.

Fix: Add best-effort cleanup in the catch block:

catch (Exception ex)
{
    _logger.LogWarning(ex, "{RunnerId}: Failed to capture coverage for test {TestId}", RunnerId, testId);
    try { await StopAndRemoveServerAsync(assembly).ConfigureAwait(false); }
    catch { /* best-effort cleanup */ }
    return CoverageRunResult.Create(testId, CoverageConfidence.Dubious, ...);
}

⚠️ Race: _serverLock.Dispose() after Release() can throw ObjectDisposedException

File: SingleMicrosoftTestPlatformRunner.cs, lines 676–677 (Dispose(bool))

_serverLock.Release();  // line 676 — unblocks a waiter
_serverLock.Dispose();  // line 677 — disposes immediately

Between Release() and Dispose(), a concurrent caller unblocked by Release() enters its critical section, then throws ObjectDisposedException when it calls _serverLock.Release() in its finally block. The _disposed field is a plain bool with no memory barrier and nothing checks it before acquiring the semaphore.

Fix: Don't dispose the SemaphoreSlim — it has no unmanaged resources and GC handles it. Or set a volatile bool _disposing flag and check it in all acquisition sites.


📊 Allocation Hot Spots (per-mutation paths)

These execute thousands of times during a mutation run and create measurable GC pressure:

| # | Location | Issue | Severity | Fix |
|---|---|---|---|---|
| 1 | RunnerId property (line 36) | $"MtpRunner-{_id}" allocates a new string on every access (~28 call sites, ~6+ per mutation) | 🔴 High | Cache as readonly string _runnerId in constructor |
| 2 | RunTestsInternalAsync (lines 599–616) | 2x .ToList() + 3 LINQ iterators + re-filter for failures + per-test string interpolation. Re-iterates finishedTests 3 additional times | 🔴 High | Single foreach loop building all lists in one pass |
| 3 | _testDescriptions.Values.ToList() (line 630) | Full collection copy per-mutation per-assembly | 🔴 High | Pass .Values directly (already a snapshot of refs), or cache and invalidate on discovery |
| 4 | CalculateAssemblyTimeout (line 396) | new Dictionary<>(_testDescriptions) copies entire dictionary per-mutation | 🔴 High | Hold lock for the brief .Sum() instead of copying |
| 5 | CalculateAssemblyTimeout (lines 400–401) | ContainsKey + TryGetValue = double hash lookup per test node | 🟡 Low | Just TryGetValue in the Sum, drop the Where |
| 6 | Debug log (line 87) | string.Join(",", mutants.Select(...)) evaluates even when Debug is off | 🟠 Medium | Guard with if (_logger.IsEnabled(LogLevel.Debug)) |
| 7 | ParseMutantIds (lines 303–308) | 3 LINQ iterators + int? nullable boxing + .Trim() per token | 🟠 Medium | foreach with StringSplitOptions.TrimEntries |
| 8 | RegisterInitialTestResult (line 607) | new MtpTestResult(duration) per-test per-mutation; overwrites "initial" on every mutation | 🟠 Medium | Only register during initial test run, not mutation runs |
| 9 | RunTestsInternalAsync (lines 604–606) | ContainsKey + indexer [] = double dictionary lookup per test | 🟡 Low | Use TryGetValue |
| 10 | assemblies.Any() (lines 87, 232) | Allocates enumerator on IReadOnlyList | 🟡 Low | Use .Count == 0 |

Top 3 highest-impact fixes:

  1. Cache RunnerId — trivial one-liner, eliminates ~30k+ string allocs per 5k-mutation run
  2. Single-pass foreach in RunTestsInternalAsync — eliminates ~6 intermediate allocations on the hottest per-mutation path
  3. Stop copying _testDescriptions — eliminates a large collection copy on every mutation
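
To illustrate the single-pass idea from fix 2, the sketch below builds the executed, failed, and message lists in one loop instead of several LINQ passes. `OutcomeSketch`, `FinishedTestSketch`, and `ResultPartition` are hypothetical types invented for this example; the real shape of finishedTests in RunTestsInternalAsync is not shown in this review.

```csharp
using System.Collections.Generic;

// Hypothetical result shape; the real finishedTests element type is not shown here.
public enum OutcomeSketch { Passed, Failed, Skipped }

public sealed record FinishedTestSketch(string Id, OutcomeSketch Outcome, string Message);

public static class ResultPartition
{
    // Builds all three lists in one pass over the input.
    public static (List<string> Executed, List<string> Failed, List<string> Messages) Split(
        IReadOnlyList<FinishedTestSketch> finishedTests)
    {
        var executed = new List<string>(finishedTests.Count);
        var failed = new List<string>();
        var messages = new List<string>();

        foreach (var test in finishedTests)
        {
            executed.Add(test.Id);
            if (test.Outcome == OutcomeSketch.Failed)
            {
                failed.Add(test.Id);
                // The message string is only built for failures.
                messages.Add($"{test.Id}: {test.Message}");
            }
        }

        return (executed, failed, messages);
    }
}
```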

Review covers: concurrency correctness, resource leaks, allocation efficiency. All 152 unit tests pass.

…t spots

- Cache _runnerId as readonly field (was allocating per access at 28 call sites)
- Add StopAndRemoveServerAsync in RunSingleTestForCoverageAsync catch to prevent coverage data leaking between tests
- Remove _serverLock.Dispose() after Release() to prevent ObjectDisposedException race
- Single-pass foreach in RunTestsInternalAsync replacing 2x ToList + multiple re-iterations
- Only register initial test results during initial run (mutantId == -1), not per-mutation
- Move TestRunResult construction inside _discoveryLock to avoid ToList() copy of _testDescriptions.Values
- Hold lock for CalculateAssemblyTimeout sum instead of copying entire dictionary
- Fix ParseMutantIds: foreach + TrimEntries instead of LINQ chain with nullable boxing
- Fix TestRunAccumulator: avoid ToList() just for Count
- Guard debug log string allocation with IsEnabled check
- Replace .Any() with .Count == 0 on IReadOnlyList in pool

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ge, unused param

- Normalize all structured log templates from {_runnerId} to {RunnerId}
- SetCoverageMode now always deletes coverage file to prevent stale data on re-entry
- Add DeleteCoverageFile() to RunSingleTestForCoverageAsync error path
- Remove unused project parameter from CaptureCoverageTestByTest

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>