feat(MTP): per-test coverage analysis for MTP runner#1
feat(MTP): per-test coverage analysis for MTP runner#1piotr-nawrot-golba-music wants to merge 9 commits intomasterfrom
Conversation
…ureCoverage Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ution MTP runner now captures per-test coverage by running each test in an isolated process. When coverage-analysis is set to perTest or perTestInIsolation, each test gets its own MTP server process. The server is stopped after each test, triggering MutantControl.FlushCoverageToFile() via ProcessExit, and the resulting coverage file is read to build per-test coverage results. This enables Stryker to determine which tests cover which mutants for MTP-based frameworks (xUnit v3, TUnit, MSTest with MTP, NUnit with MTP), unlocking coverage-based test optimization that was previously only available with VsTest. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
piotr-nawrot-golba-music
left a comment
There was a problem hiding this comment.
PR Review: MTP Per-Test Coverage Capture
✅ Overall Verdict: Architecture is sound, will work for xUnit + MTP with file coverage
All 150 unit tests pass. The core flow (start server → run one test → stop server → ProcessExit triggers FlushCoverageToFile → read file) is correctly implemented and framework-agnostic — xUnit v3, TUnit, MSTest, NUnit via MTP all work the same way.
🟢 What works well
-
Correct routing:
CaptureCoveragecorrectly routesperTest→CoverageBasedTest(Normal confidence) andperTestInIsolation→CaptureCoveragePerTest | CoverageBasedTest(Exact confidence). Theall/offmodes correctly fall through to aggregate path. -
Process isolation is correct:
StopAndRemoveServerAsyncremoves the server from_assemblyServers, soGetOrCreateServerAsyncalways starts a fresh process for the next test. Each process gets its ownMutantControlstatic state — no cross-test contamination. -
No race conditions on coverage files: Each runner has a unique
_coverageFilePath(stryker-coverage-{id}.txt), andWaitServerProcessExitAsyncblocks until the process exits (guaranteeingFlushCoverageToFilecompletes before the read). -
File format consistency: Writer (
MutantControl) and reader (ReadCoverageData) both use the"covered;static"format with comma-separated IDs. ✅ -
Integration with
CoverageAnalyser: ReturnsICoverageRunResultwith the same structure as VsTest — the analyser processes them identically.
🟡 Issues to address
1. Silent coverage failure masks mutants as uncovered (Medium severity)
In RunSingleTestForCoverageAsync (SingleMicrosoftTestPlatformRunner.cs:134-172):
If StopAsync times out and force-kills the process, FlushCoverageToFile never runs. ReadCoverageData returns empty arrays, but the result still gets Normal/Exact confidence. The CoverageAnalyser then believes this test covers zero mutants — effectively hiding those mutants from testing.
Suggestion: After ReadCoverageData, if both lists are empty, consider logging a warning or downgrading to Dubious confidence. This way the analyser won't trust the empty result and will still run those mutants against this test.
2. Misleading test name: RunSingleTestForCoverageAsync_ShouldReturnDubious_WhenNoCoverageFile (Low severity)
This test (SingleMicrosoftTestPlatformRunnerCoverageTests.cs:554-570) tests ReadCoverageData() returning empty arrays — it does not test the CoverageConfidence.Dubious path. The Dubious confidence is only set in the catch block of RunSingleTestForCoverageAsync (line 160-170), which this test doesn't exercise. Similarly, RunSingleTestForCoverageAsync_ShouldReturnCoverageFromFile (line 518) only tests ReadCoverageData, not the actual RunSingleTestForCoverageAsync method.
Suggestion: Rename tests to match what they actually test (ReadCoverageData_...), or add a test that exercises the real RunSingleTestForCoverageAsync exception path via a TestableRunner with a throwing coverageHandler.
3. Missing test for the exception/Dubious path (Low severity)
There's no test that verifies RunSingleTestForCoverageAsync returns CoverageConfidence.Dubious when an exception occurs (e.g., server fails to start, test fails to run). Consider adding a test using TestableRunner with a coverageHandler that throws.
4. Performance consideration — no issue, but worth documenting
Starting/stopping one process per test means N process startups for N tests. For large test suites (1000+ tests), this will be noticeably slower than VsTest's in-process data collector approach. This is an acceptable trade-off since MTP doesn't support data collectors, but worth noting in documentation so users understand the impact of perTest/perTestInIsolation with MTP runners.
🟢 Summary
| Aspect | Status |
|---|---|
| Per-test server isolation | ✅ Correct |
| ProcessExit → FlushCoverageToFile | ✅ Correct |
| Coverage file read/write | ✅ Correct, format-consistent |
| Race conditions | ✅ None found |
| xUnit + MTP compatibility | ✅ Framework-agnostic |
| Integration with CoverageAnalyser | ✅ Correct |
| Error handling | |
| Test coverage of new code | RunSingleTestForCoverageAsync |
Bottom line: Yes, running mutations with xUnit and MTP with file coverage watching will work. The architecture is correct and the flow is sound. The issues above are about edge-case resilience and test accuracy, not correctness of the happy path.
When RunSingleTestForCoverageAsync gets empty coverage data (e.g., server force-killed before FlushCoverageToFile ran), the result now correctly uses CoverageConfidence.Dubious instead of the requested confidence level. This prevents silently marking mutants as uncovered when coverage capture failed. Also fixes misleading test names (renamed RunSingleTestForCoverageAsync_* to ReadCoverageData_* where they only tested ReadCoverageData), and adds proper tests for the Dubious confidence paths through the pool. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
piotr-nawrot-golba-music
left a comment
There was a problem hiding this comment.
Code Review: MTP Per-Test Coverage Capture
Verdict: No significant issues found. The implementation is correct and well-structured.
I thoroughly investigated the concurrency model, coverage file IPC coordination, error handling paths, and data parsing robustness. All 152 unit tests pass.
What I verified:
-
Concurrency is safe. \Parallel.ForEach\ with \MaxDegreeOfParallelism = _countOfRunners\ combined with \RunThisAsync's \ConcurrentBag.TryTake/Add\ pattern guarantees each runner instance is used by exactly one thread at a time. No race conditions on _assemblyServers\ (protected by _serverLock), and no contention on coverage files (unique path per runner via \stryker-coverage-{id}.txt).
-
Coverage file coordination is correct. Runner writes \STRYKER_COVERAGE_FILE\ env var (filename only), child test process reconstructs full path via \Path.GetTempPath(). Both sides resolve to the same file. Verified the flow: \DeleteCoverageFile → GetOrCreateServer → RunTest → StopAndRemoveServer → ReadCoverageData → DeleteCoverageFile.
-
Error handling is comprehensive. Force-killed process (30s timeout in \StopAsync) → empty coverage file → \Dubious\ confidence. Server startup failure → caught by \RunSingleTestForCoverageAsync\ catch-all → \Dubious. \ReadCoverageData\ handles missing file, empty content, malformed data (\int.TryParse\ + filter), and I/O exceptions.
-
Routing logic is correct. \CoverageBasedTest\ flag → per-test with \Normal\ confidence. \CoverageBasedTest | CaptureCoveragePerTest\ → per-test with \Exact\ confidence. Other modes → aggregate fallback.
-
\ParseMutantIds\ is robust against partial writes — uses \TryParse\ with \Where(HasValue)\ to silently skip non-numeric entries from a partially-flushed file.
Concurrency Improvement SuggestionsThese aren't blocking issues — the current code is correct given the pool's single-threaded-per-runner invariant — but they'd make the concurrency model more robust and efficient. 1. \RunThisAsync\ — Replace \AutoResetEvent\ with \SemaphoreSlim\ (high value)File: \MicrosoftTestPlatformRunnerPool.cs, lines 239–276 \RunThisAsync\ currently uses \AutoResetEvent.WaitOne(1000)\ in a polling loop to wait for an available runner. This blocks a ThreadPool thread for up to 1 second per iteration while waiting. Under high concurrency (many mutations queued via \Parallel.ForEach\ or concurrent \TestMultipleMutantsAsync\ calls), this can cause ThreadPool starvation. \\csharp Suggestion: Replace \AutoResetEvent _runnerAvailableHandler\ with \SemaphoreSlim(_countOfRunners, _countOfRunners)\ and use \�wait WaitAsync()\ instead of the polling loop. This makes runner checkout fully async (freeing the thread back to the pool while waiting) and eliminates the 1-second polling granularity: \\csharp private async Task RunThisAsync(Func<SingleMicrosoftTestPlatformRunner, Task> task) 2. \GetOrCreateServerAsync\ — Replace dual \lock\ with \SemaphoreSlim(1,1)\ (correctness improvement)File: \SingleMicrosoftTestPlatformRunner.cs, \GetOrCreateServerAsync\ (lines ~314–340) This method has a TOCTOU (time-of-check-to-time-of-use) pattern — it acquires \lock(_serverLock)\ to check the cache, releases it, does \�wait server.StartAsync(), then re-acquires the lock to store the server: \\csharp If two threads call this for the same assembly simultaneously, both miss the cache, both create and start a server, and the second _assemblyServers[assembly] = server\ silently overwrites the first — leaking a running server process that is never stopped or disposed. This is currently safe because \RunThisAsync\ guarantees single-threaded access per runner instance. But the invariant is enforced at the pool level, not the class level — making it fragile if the runner is ever used differently. Suggestion: Replace \object _serverLock\ with \SemaphoreSlim(1,1)\ and hold it across the entire check-create-start-store sequence: \\csharp private async Task GetOrCreateServerAsync(string assembly) } This eliminates the race structurally, regardless of how the runner is used. 3. Minor: \CalculateAssemblyTimeout\ — redundant lock acquisition per LINQ elementFile: \SingleMicrosoftTestPlatformRunner.cs, \CalculateAssemblyTimeout\ This method acquires _discoveryLock\ once per element inside a .Sum()\ LINQ lambda. Since _testsByAssembly\ is a plain \Dictionary, each iteration re-acquires the lock to read from it. It would be cleaner (and marginally faster for large test suites) to snapshot the dictionary once under a single lock acquisition, then iterate the snapshot outside the lock. |
… for async-safe concurrency - Replace AutoResetEvent with SemaphoreSlim in RunnerPool.RunThisAsync to avoid blocking thread-pool threads during runner checkout - Replace object _serverLock with SemaphoreSlim(1,1) in SingleRunner to enable holding the lock across await in GetOrCreateServerAsync, eliminating the TOCTOU race in the check-create-start-store pattern - Fix CalculateAssemblyTimeout to snapshot _testDescriptions once instead of acquiring _discoveryLock per LINQ element Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…RunThisAsync Replace recursive self-call with a while loop sharing a single CancellationTokenSource so the 300-second timeout acts as a hard upper bound across all retries, preventing potential infinite loops if the semaphore/bag invariant is ever broken. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review Round 3 — Final Review + Allocation AnalysisPrevious Comments Status ✅All 3 suggestions from the first review have been addressed:
The recursive retry bug found in review round 2 was also fixed → bounded 🐛 Bug: Stale server in error path leaks coverage data between testsFile: If Fix: Add best-effort cleanup in the catch block: catch (Exception ex)
{
_logger.LogWarning(ex, "{RunnerId}: Failed to capture coverage for test {TestId}", RunnerId, testId);
try { await StopAndRemoveServerAsync(assembly).ConfigureAwait(false); }
catch { /* best-effort cleanup */ }
return CoverageRunResult.Create(testId, CoverageConfidence.Dubious, ...);
}
|
| # | Location | Issue | Severity | Fix |
|---|---|---|---|---|
| 1 | RunnerId property (line 36) |
$"MtpRunner-{_id}" allocates a new string on every access (~28 call sites, ~6+ per mutation) |
🔴 High | Cache as readonly string _runnerId in constructor |
| 2 | RunTestsInternalAsync (lines 599–616) |
2x .ToList() + 3 LINQ iterators + re-filter for failures + per-test string interpolation. Re-iterates finishedTests 3 additional times |
🔴 High | Single foreach loop building all lists in one pass |
| 3 | _testDescriptions.Values.ToList() (line 630) |
Full collection copy per-mutation per-assembly | 🔴 High | Pass .Values directly (already a snapshot of refs), or cache and invalidate on discovery |
| 4 | CalculateAssemblyTimeout (line 396) |
new Dictionary<>(_testDescriptions) copies entire dictionary per-mutation |
🔴 High | Hold lock for the brief .Sum() instead of copying |
| 5 | CalculateAssemblyTimeout (lines 400–401) |
ContainsKey + TryGetValue = double hash lookup per test node |
🟡 Low | Just TryGetValue in the Sum, drop the Where |
| 6 | Debug log (line 87) | string.Join(",", mutants.Select(...)) evaluates even when Debug is off |
🟠 Medium | Guard with if (_logger.IsEnabled(LogLevel.Debug)) |
| 7 | ParseMutantIds (lines 303–308) |
3 LINQ iterators + int? nullable boxing + .Trim() per token |
🟠 Medium | foreach with StringSplitOptions.TrimEntries |
| 8 | RegisterInitialTestResult (line 607) |
new MtpTestResult(duration) per-test per-mutation — overwrites "initial" on every mutation |
🟠 Medium | Only register during initial test run, not mutation runs |
| 9 | RunTestsInternalAsync (lines 604–606) |
ContainsKey + indexer [] = double dictionary lookup per test |
🟡 Low | Use TryGetValue |
| 10 | assemblies.Any() (lines 87, 232) |
Allocates enumerator on IReadOnlyList |
🟡 Low | Use .Count == 0 |
Top 3 highest-impact fixes:
- Cache
RunnerId— trivial one-liner, eliminates ~30k+ string allocs per 5k-mutation run - Single-pass
foreachinRunTestsInternalAsync— eliminates ~6 intermediate allocations on the hottest per-mutation path - Stop copying
_testDescriptions— eliminates a large collection copy on every mutation
Review covers: concurrency correctness, resource leaks, allocation efficiency. All 152 unit tests pass.
…t spots - Cache _runnerId as readonly field (was allocating per access at 28 call sites) - Add StopAndRemoveServerAsync in RunSingleTestForCoverageAsync catch to prevent coverage data leaking between tests - Remove _serverLock.Dispose() after Release() to prevent ObjectDisposedException race - Single-pass foreach in RunTestsInternalAsync replacing 2x ToList + multiple re-iterations - Only register initial test results during initial run (mutantId == -1), not per-mutation - Move TestRunResult construction inside _discoveryLock to avoid ToList() copy of _testDescriptions.Values - Hold lock for CalculateAssemblyTimeout sum instead of copying entire dictionary - Fix ParseMutantIds: foreach + TrimEntries instead of LINQ chain with nullable boxing - Fix TestRunAccumulator: avoid ToList() just for Count - Guard debug log string allocation with IsEnabled check - Replace .Any() with .Count == 0 on IReadOnlyList in pool Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ge, unused param
- Normalize all structured log templates from {_runnerId} to {RunnerId}
- SetCoverageMode now always deletes coverage file to prevent stale data on re-entry
- Add DeleteCoverageFile() to RunSingleTestForCoverageAsync error path
- Remove unused project parameter from CaptureCoverageTestByTest
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Implements per-test coverage capture for the Microsoft Testing Platform (MTP) test runner by running each test in an isolated server process.
Key changes:
SingleMicrosoftTestPlatformRunner.StopAndRemoveServerAsync()— stops server and removes from cache, triggering ProcessExit coverage flushSingleMicrosoftTestPlatformRunner.RunSingleTestForCoverageAsync()— runs one test, stops server, reads per-test coverage fileMicrosoftTestPlatformRunnerPool.CaptureCoverageTestByTest()— iterates all tests using the runner pool for parallelismCaptureCoverage()routing — uses per-test capture whenCoverageBasedTestflag is set, aggregate otherwiseNormalforperTest,ExactforperTestInIsolationWhy process restart: MTP doesn't have an in-process data collector like VsTest's
CoverageCollector. SinceMutantControlonly flushes coverage data onProcessExit, the most reliable way to get per-test coverage is to stop and restart the server between tests. This is a one-time cost during the coverage capture phase.Test plan
dotnet test src/Stryker.TestRunner.MicrosoftTestPlatform.UnitTest/(150 pass)dotnet build src/Stryker.slnx(0 errors)--coverage-analysis perTest --test-runner mtp--coverage-analysis perTestInIsolation --test-runner mtp--coverage-analysis all --test-runner mtp(aggregate fallback)Related