precice · PranjalManhgaye · May 27, 2026
diff --git a/.github/workflows/generate_reference_results_workflow.yml b/.github/workflows/generate_reference_results_workflow.yml
@@ -92,14 +92,17 @@ jobs:
         fi
         git commit -m "${{inputs.commit_msg}}"
         git push
-    - name: Archive main log files
+    - name: Archive system test logs
       if: ${{ always() }}
       uses: actions/upload-artifact@v7
       with:
         name: system_tests_run_${{ github.run_id }}_${{ github.run_attempt }}_reference_logs
         path: |
           runs/*/system-tests-stdout.log
           runs/*/system-tests-stderr.log
+          runs/*/system-tests-build.log
+          runs/*/system-tests-run.log
+          runs/*/system-tests-compare.log
           runs/*/*/system-tests_*.log
     - name: Archive run files
       if: failure()

diff --git a/.github/workflows/run_testsuite_workflow.yml b/.github/workflows/run_testsuite_workflow.yml
@@ -78,14 +78,17 @@ jobs:
         cd tools/tests
         python systemtests.py --build_args=${{ inputs.build_args}} --suites=${{ inputs.suites}} --log-level=${{ inputs.log_level}}
         cd ../../
-    - name: Archive main log files
+    - name: Archive system test logs
       if: ${{ always() }}
       uses: actions/upload-artifact@v7
       with:
         name: system_tests_run_${{ github.run_id }}_${{ github.run_attempt }}_logs
         path: |
           runs/*/system-tests-stdout.log
           runs/*/system-tests-stderr.log
+          runs/*/system-tests-build.log
+          runs/*/system-tests-run.log
+          runs/*/system-tests-compare.log
           runs/*/*/system-tests_*.log
     - name: Archive run files
       if: ${{ failure() || inputs.upload_artifacts == 'TRUE' }}

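Review note: the `path:` patterns added to both workflows can be sanity-checked locally before relying on CI. The sketch below is only an approximation under stated assumptions. The run-directory name and the extra `notes.txt` file are hypothetical, and `pathlib` globbing is a stand-in for the matching that `actions/upload-artifact` performs, which may differ in edge cases. Only the log file names and patterns are taken from the diff.

```python
import tempfile
from pathlib import Path

# Patterns copied from the `path:` block of the workflow diff above.
PATTERNS = [
    "runs/*/system-tests-stdout.log",
    "runs/*/system-tests-stderr.log",
    "runs/*/system-tests-build.log",
    "runs/*/system-tests-run.log",
    "runs/*/system-tests-compare.log",
    "runs/*/*/system-tests_*.log",
]


def collect_logs(root: Path) -> list[str]:
    """Return the sorted, root-relative POSIX paths matched by PATTERNS."""
    matched = set()
    for pattern in PATTERNS:
        for path in root.glob(pattern):
            matched.add(path.relative_to(root).as_posix())
    return sorted(matched)


def demo() -> list[str]:
    """Create a throwaway tree with per-stage logs and report what matches."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical run directory; real names are generated by the test runner.
        run_dir = root / "runs" / "example-case_2026-01-01-000000"
        run_dir.mkdir(parents=True)
        for stage in ("build", "run", "compare"):
            (run_dir / f"system-tests-{stage}.log").write_text("")
        # A file that none of the patterns should pick up.
        (run_dir / "notes.txt").write_text("")
        return collect_logs(root)


if __name__ == "__main__":
    for name in demo():
        print(name)
```

Running the script lists the three per-stage logs and skips `notes.txt`, which suggests the new entries select what the changelog describes.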
diff --git a/changelog-entries/801.md b/changelog-entries/801.md
@@ -0,0 +1 @@
+- System tests now write logs during each run (build, run, compare), including `system-tests-build.log`, `system-tests-run.log`, and `system-tests-compare.log` [#801](https://github.com/precice/tutorials/pull/801).
diff --git a/tools/tests/README.md b/tools/tests/README.md
@@ -36,7 +36,7 @@ Workflow for the preCICE v3 release testing:
 
 6. Download the build artifacts from Summary > runs.
 
-    - In there, you may want to check the `system-tests-stdout.log` and `system-tests-stderr.log` files.
+    - In there, inspect the combined logs (`system-tests-stdout.log`, `system-tests-stderr.log`) and the per-stage logs (`system-tests-build.log`, `system-tests-run.log`, `system-tests-compare.log`).
     - The produced results are in `precice-exports/`, the reference results in `reference-results-unpacked`.
     - Compare using, e.g., ParaView or [fieldcompare](https://gitlab.com/dglaeser/fieldcompare): `fieldcompare dir precice-exports/ reference/`. The `--diff` option will give you `precice-exports/diff_*.vtu` files, while you can also try different tolerances with `-rtol` and `-atol`.
 
@@ -105,7 +105,7 @@ In this case, building and running seems to work out, but the tests fail because
 
 The easiest way to debug a systemtest run is first to have a look at the output written into the action on GitHub.
 If this does not provide enough hints, the next step is to download the generated `system_tests_run_<run_id>_<run_attempt>` artifact. Note that by default this will only be generated if the systemtests fail.
-Inside the archive, a test-specific subfolder like `flow-over-heated-plate_fluid-openfoam-solid-fenics_2023-11-19-211723` contains two log files: `system-tests-stderr.log` and `system-tests-stdout.log`. This can be a starting point for a further investigation. When fieldcompare runs with `--diff`, it writes VTK (.vtu) diff files under `precice-exports/`; if the comparison fails, those files are copied into a `diff-results/` subfolder in the same run directory (mirroring any subpaths under `precice-exports/`) so you can open them (e.g. in ParaView) to see where results differ from the reference. On successful comparisons, `diff-results/` is therefore absent.
+Inside the archive, a test-specific subfolder like `flow-over-heated-plate_fluid-openfoam-solid-fenics_2023-11-19-211723` contains combined logs (`system-tests-stderr.log`, `system-tests-stdout.log`) and per-stage logs (`system-tests-build.log`, `system-tests-run.log`, `system-tests-compare.log`). These are a good starting point for further investigation. When fieldcompare runs with `--diff`, it writes VTK (.vtu) diff files under `precice-exports/`; if the comparison fails, those files are copied into a `diff-results/` subfolder in the same run directory (mirroring any subpaths under `precice-exports/`) so you can open them (e.g. in ParaView) to see where results differ from the reference. On successful comparisons, `diff-results/` is therefore absent.
 
 ## Adding new tests
 

diff --git a/tools/tests/systemtests.py b/tools/tests/systemtests.py
@@ -7,9 +7,20 @@
 from metadata_parser.metdata import Tutorials, Case
 import logging
 import time
+import os
+import sys
 from paths import PRECICE_TUTORIAL_DIR, PRECICE_TESTS_RUN_DIR, PRECICE_TESTS_DIR
 
 
+class _ConsoleLogFormatter(logging.Formatter):
+    """Omit level prefix for INFO/DEBUG; keep it for warnings and errors."""
+
+    def format(self, record: logging.LogRecord) -> str:
+        if record.levelno >= logging.WARNING:
+            return f"{record.levelname}: {record.getMessage()}"
+        return record.getMessage()
+
+
 def main():
     parser = argparse.ArgumentParser(description='systemtest')
 
@@ -29,12 +40,30 @@ def main():
     # Parse the command-line arguments
     args = parser.parse_args()
 
-    # Configure logging based on the provided log level
-    logging.basicConfig(level=args.log_level, format='%(levelname)s: %(message)s')
+    # Configure logging
+    handler = logging.StreamHandler()
+    handler.setFormatter(_ConsoleLogFormatter())
+    logging.basicConfig(level=args.log_level, handlers=[handler])
+
+    gh_actions = os.environ.get("GITHUB_ACTIONS", "").lower() == "true"
+    # Skip ANSI colors when TERM is unset or "dumb" (minimal terminal, common in CI).
+    ansi_colors = sys.stdout.isatty() and os.environ.get("TERM", "") not in {"", "dumb"}
+
+    def _style(text: str, color_code: int | None) -> str:
+        if not ansi_colors or color_code is None:
+            return text
+        return f"\x1b[{color_code}m{text}\x1b[0m"
+
+    def _group_start(title: str) -> None:
+        if gh_actions:
+            print(f"::group::{title}", flush=True)
 
-    print(f"Using log-level: {args.log_level}")
+    def _group_end() -> None:
+        if gh_actions:
+            print("::endgroup::", flush=True)
 
     systemtests_to_run = []
+    test_suites_to_execute = []
     available_tutorials = Tutorials.from_path(PRECICE_TUTORIAL_DIR)
 
     build_args = SystemtestArguments.from_args(args.build_args)
@@ -43,7 +72,6 @@ def main():
         test_suites_requested = args.suites.split(',')
         available_testsuites = TestSuites.from_yaml(
             PRECICE_TESTS_DIR / "tests.yaml", available_tutorials)
-        test_suites_to_execute = []
         for test_suite_requested in test_suites_requested:
             test_suite_found = available_testsuites.get_by_name(
                 test_suite_requested)
@@ -72,17 +100,47 @@ def main():
     if not systemtests_to_run:
         raise RuntimeError("Did not find any Systemtests to execute.")
 
-    logging.info(f"About to run the following systemtest in the directory {run_directory}:\n {systemtests_to_run}")
+    total = len(systemtests_to_run)
+
+    if test_suites_to_execute:
+        print("Selected test suite(s):", flush=True)
+        print(flush=True)
+        for test_suite in test_suites_to_execute:
+            print(f"- {test_suite.name}", flush=True)
+        print(flush=True)
+
+    print(f"About to run {total} test(s) in the directory {run_directory}:", flush=True)
+    print(flush=True)
+    for number, systemtest in enumerate(systemtests_to_run, start=1):
+        print(f"{number}. {systemtest}", flush=True)
+    print(flush=True)
+    print(f"Using log-level: {args.log_level}", flush=True)
 
     results = []
     for number, systemtest in enumerate(systemtests_to_run, start=1):
-        logging.info(f"Started running {systemtest},  {number}/{len(systemtests_to_run)}")
-        t = time.perf_counter()
-        result = systemtest.run(run_directory)
-        elapsed_time = time.perf_counter() - t
-        logging.info(f"Running {systemtest} took {elapsed_time:^.1f} seconds")
+        print(flush=True)
+        started_header = f"[{number}/{total}] Started {systemtest}"
+        _group_start(started_header)
+        try:
+            if not gh_actions:
+                logging.info(started_header)
+            t = time.perf_counter()
+            result = systemtest.run(run_directory)
+            elapsed_time = time.perf_counter() - t
+
+            if result.success:
+                status_label = _style("✅ PASS", 32)
+            else:
+                status_label = _style("❌ FAIL", 31)
+        finally:
+            _group_end()
+
+        print(f"{status_label} Finished {systemtest} in {elapsed_time:.1f}s", flush=True)
+        print(f"[{number}/{total}] {systemtest}", flush=True)
+        print(flush=True)
         results.append(result)
 
+    print(flush=True)
     system_test_success = True
     for result in results:
         if not result.success:
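
Review note: the console behaviour introduced in `systemtests.py` can be exercised in isolation. The following is a minimal sketch, not the PR's code. The formatter body mirrors `_ConsoleLogFormatter` from the diff, while `style`, `run_in_group`, `demo_formatter`, and the logger name are hypothetical stand-ins that take their switches as parameters instead of reading `GITHUB_ACTIONS` and `TERM` from the environment. `::group::` and `::endgroup::` are the GitHub Actions workflow commands for collapsible log sections.

```python
import io
import logging


class ConsoleLogFormatter(logging.Formatter):
    """Omit the level prefix for INFO/DEBUG; keep it for warnings and errors."""

    def format(self, record: logging.LogRecord) -> str:
        if record.levelno >= logging.WARNING:
            return f"{record.levelname}: {record.getMessage()}"
        return record.getMessage()


def style(text, color_code, ansi_colors):
    """Wrap text in an ANSI color sequence only when colors are enabled."""
    if not ansi_colors or color_code is None:
        return text
    return f"\x1b[{color_code}m{text}\x1b[0m"


def run_in_group(title, action, gh_actions, out):
    """Run action() inside a collapsible GitHub Actions log group."""
    if gh_actions:
        print(f"::group::{title}", file=out, flush=True)
    try:
        return action()
    finally:
        # Close the group even if action() raises, so later output stays visible.
        if gh_actions:
            print("::endgroup::", file=out, flush=True)


def demo_formatter():
    """Log one INFO and one WARNING record and return the formatted lines."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(ConsoleLogFormatter())
    logger = logging.getLogger("systemtests_demo")  # hypothetical logger name
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo output out of the root logger
    logger.info("building")
    logger.warning("slow test")
    return stream.getvalue().splitlines()


if __name__ == "__main__":
    print(demo_formatter())
    print(repr(style("PASS", 32, ansi_colors=True)))
```

One thing to keep in mind when reviewing: a `format` override that returns only `record.getMessage()` bypasses the base class logic that appends traceback text, so `logging.exception(...)` calls would print the message without the stack trace on the console.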