
Fix stuck pending runs on unhandled errors #4814

Open
DanielRyanSmith wants to merge 3 commits into main from fix-stuck-pending-runs

@DanielRyanSmith
Contributor

This PR addresses the issue where PendingTestRun entities could become stuck in a pending state indefinitely if the results-processor encountered an unhandled exception or reached its retry limit in Cloud Tasks.

Changes:

  • results-processor/processor.py: Added a global try...except block to catch unexpected exceptions and update the run status to INVALID before re-raising for Cloud Tasks retries. This ensures that even if a task eventually fails permanently, the status is not left as 'processing'.
  • results-processor/processor.py: Improved logging by using _log.exception and including the original error message in the log summary for easier diagnosis.
  • api/pending_test_runs.go: Added an in-memory 14-day cutoff filter to the pending runs API. This hides from the UI any stale, orphaned runs that the task queue has already dropped.
  • results-processor/cleanup_stuck_runs.py: Added a utility script to manually mark old stuck runs as INVALID in Datastore.
  • Tests: Added test cases for the 14-day cutoff logic and the new unhandled exception handling.
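
The exception-handling change described for processor.py can be pictured roughly as follows. This is a minimal sketch, not the PR's actual code: `process_run`, `update_status`, and `do_work` are hypothetical names, the status strings are placeholders, and a plain dict stands in for Datastore.

```python
import logging

_log = logging.getLogger(__name__)

# Hypothetical in-memory stand-in for the PendingTestRun status store.
STATUSES = {}


def update_status(run_id, status, error=None):
    """Record the latest status (and optional error summary) for a run."""
    STATUSES[run_id] = {"status": status, "error": error}


def process_run(run_id, do_work):
    """Run do_work(); on an unexpected error, mark the run INVALID and re-raise."""
    update_status(run_id, "PROCESSING")
    try:
        do_work()
    except Exception as e:
        # Logger.exception logs at ERROR level and appends the traceback.
        _log.exception("Unhandled error while processing run %s", run_id)
        # Keep the original error message in the stored summary.
        update_status(run_id, "INVALID", error=f"{type(e).__name__}: {e}")
        # Re-raise so the caller (the task queue, in the PR) still sees a failure.
        raise
    update_status(run_id, "DONE")  # placeholder name for the success state
```

Because the status is written before the re-raise, a task that exhausts its retries leaves the run marked INVALID rather than in the processing state.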

Fixes #4813

Comment thread api/pending_test_runs.go
Comment on lines +57 to +67
// Only show runs updated within the last 14 days to avoid stuck runs
// that have been dropped from the task queue.
if filter == "pending" {
	cutoff := time.Now().Add(-14 * 24 * time.Hour)
	filteredRuns := make([]shared.PendingTestRun, 0)
	for _, run := range runs {
		if run.Updated.After(cutoff) {
			filteredRuns = append(filteredRuns, run)
		}
	}
	runs = filteredRuns
Contributor Author

I didn't know if this was the best path forward, but the other option is to create a script or something that manually changes the pending status of the existing runs.
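
For comparison, the script-based alternative mentioned in this comment amounts to a one-off pass over old pending runs. The sketch below is only illustrative: it assumes runs are plain dicts with hypothetical `status` and `updated` fields rather than real Datastore entities, borrows the 14-day threshold from the API filter, and uses placeholder status strings.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=14)  # borrowed from the API filter; an assumption here
TERMINAL = {"VALID", "INVALID"}   # placeholder names for finished states


def mark_stuck_runs_invalid(runs, now=None):
    """Mark non-terminal runs not updated within STALE_AFTER as INVALID.

    Mutates the run dicts in place and returns the number of runs changed.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - STALE_AFTER
    changed = 0
    for run in runs:
        # Mirror the API filter: a run is stale if it was NOT updated after the cutoff.
        if run["status"] not in TERMINAL and run["updated"] <= cutoff:
            run["status"] = "INVALID"
            changed += 1
    return changed
```

Unlike the in-memory API filter, a pass like this changes the stored state, so the stale runs would stop appearing as pending without needing to be hidden at read time.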

- Fix E302, E305 (blank lines) and W293 (blank line whitespace) in cleanup_stuck_runs.py.
- Fix E501 (line too long) in cleanup_stuck_runs.py, processor.py, and processor_test.py.
- Remove unused sys import (F401) in processor.py.
@DanielRyanSmith force-pushed the fix-stuck-pending-runs branch from b992901 to 6c0a26e on March 20, 2026 20:36
Development

Successfully merging this pull request may close these issues.

PendingTestRun entities stuck perpetually if results-processor fails with unhandled exceptions
