feat: add chaos resilience evaluators (failure communication, partial completion, recovery strategy) #236
Assessment: Comment
Well-structured PR that follows the established evaluator pattern correctly; each evaluator extends ...
Review Categories
Prompt templates are thorough and well-designed with clear rubrics and edge case handling.
Assessment: Comment
The evaluator implementations are clean, correct, and consistent with the rest of the codebase. Most issues from the previous review appear to have already been addressed (PEP 604 unions ✓, ...
Remaining Items
Prompt templates are thoughtfully designed with clear evaluation rubrics, mandatory gate conditions, and explicit guidance on edge cases (no-failure baseline, transient vs permanent errors, goal-based vs tool-based subtask decomposition).
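To make the idea of a mandatory gate condition and error-type-aware retry scoring concrete, here is a minimal Python sketch of how such a rubric could look if written as plain code. Everything in it is hypothetical and not taken from this PR: the `ErrorKind` taxonomy, the `FailureEvent` record, the `score_recovery` function, and the score values are illustrative assumptions. The PR's evaluators delegate this judgment to an LLM judge through prompt templates instead.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ErrorKind(Enum):
    """Hypothetical error taxonomy; the PR's prompts define their own."""

    TRANSIENT = "transient"  # e.g. a timeout or rate limit
    PERMANENT = "permanent"  # e.g. invalid credentials or a missing resource


@dataclass
class FailureEvent:
    """Hypothetical record of one injected failure and the agent's reaction."""

    kind: ErrorKind
    retried_same_call: bool  # the agent repeated the identical call
    tried_alternative: bool  # the agent switched tool or approach


def score_recovery(failures: list[FailureEvent]) -> float | None:
    """Return a score in [0, 1], or None when the gate condition is not met."""
    # Gate: with no failure in the trace there is nothing to judge
    # (the no-failure baseline case).
    if not failures:
        return None

    total = 0.0
    for event in failures:
        if event.kind is ErrorKind.TRANSIENT:
            # For transient errors, a retry or an alternative both count as recovery.
            total += 1.0 if (event.retried_same_call or event.tried_alternative) else 0.0
        elif event.retried_same_call:
            # Repeating an identical call after a permanent error is wasted effort.
            total += 0.0
        elif event.tried_alternative:
            total += 1.0
        else:
            # Stopping without a workaround: illustrative partial credit.
            total += 0.5
    return total / len(failures)
```

The gate returns `None` rather than a numeric score so that runs without any injected failure cannot inflate or deflate an aggregate. Whether the actual prompts treat the baseline this way is not stated in this conversation.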
poshinchen left a comment
I would prefer to have these evaluators in strands_evals/evaluators/chaos/*. Either of the import paths below works for me, but I prefer the latter.
from strands_evals.chaos.evaluators import ...
from strands_evals.evaluators.chaos import ...
Assessment: Comment
Clean, well-structured implementation that follows the established evaluator pattern correctly and addresses all previously-flagged issues from the first review round.
Remaining Items
Prompt templates are thoughtfully designed with mandatory gate conditions, error-type-aware retry evaluation, and goal-based (not tool-based) subtask decomposition.
Assessment: Comment
Clean, well-structured PR. The evaluator implementations correctly follow all established patterns and the previous review's critical issues have been addressed. Two remaining structural items to fix.
Review Details
Prompt templates are excellent: thoughtful rubrics with mandatory gate conditions, error type taxonomies, and clear guidance on scoring edge cases.
fd78e10 to 53f2bf5
Assessment: Comment
Solid implementation that follows all established evaluator patterns correctly. The three evaluators, prompt templates, ChaosCase validation, and deferred cleanup items are all well-executed.
Review Details
Prompt templates are thoughtfully crafted with mandatory gate conditions, error-type-aware retry evaluation, and goal-based subtask decomposition.
Description
Adds three chaos-specific LLM-judge evaluators: failure communication, partial completion, and recovery strategy.
Each evaluator ships with a v0 system prompt, follows the existing Evaluator base class interface, and supports both sync and async evaluation paths.
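As a rough illustration of an evaluator that carries a versioned system prompt and exposes both a sync and an async path, consider the self-contained Python sketch below. It does not use the real strands_evals API: the `Evaluator` base class, `EvalResult`, `FailureCommunicationSketch`, the prompt text, and the `keyword_judge` stand-in for the LLM call are all assumptions made for illustration.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class EvalResult:
    score: float
    reason: str


class Evaluator(ABC):
    """Stand-in base class; NOT the real strands_evals Evaluator."""

    @abstractmethod
    def evaluate(self, transcript: str) -> EvalResult: ...

    @abstractmethod
    async def evaluate_async(self, transcript: str) -> EvalResult: ...


class FailureCommunicationSketch(Evaluator):
    # Hypothetical v0 prompt; the PR's actual prompt text is not shown here.
    SYSTEM_PROMPT_V0 = "Judge whether the agent clearly told the user what failed."

    def __init__(self, judge):
        # `judge` is any callable (system_prompt, transcript) -> float,
        # standing in for an LLM call.
        self._judge = judge

    def evaluate(self, transcript: str) -> EvalResult:
        score = self._judge(self.SYSTEM_PROMPT_V0, transcript)
        return EvalResult(score=score, reason="judged synchronously")

    async def evaluate_async(self, transcript: str) -> EvalResult:
        # Run the blocking judge in a worker thread so the event loop stays free.
        score = await asyncio.to_thread(self._judge, self.SYSTEM_PROMPT_V0, transcript)
        return EvalResult(score=score, reason="judged asynchronously")


def keyword_judge(system_prompt: str, transcript: str) -> float:
    """Toy judge that replaces the LLM so the sketch runs offline."""
    return 1.0 if "failed" in transcript.lower() else 0.0


evaluator = FailureCommunicationSketch(keyword_judge)
sync_result = evaluator.evaluate("The search tool failed, so I could not fetch results.")
async_result = asyncio.run(evaluator.evaluate_async("All done."))
```

Injecting the judge as a callable keeps the two paths testable without network access. A real implementation would more likely call an async model client directly in the async path rather than wrapping a blocking call in a thread.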
Also addresses deferred review comments from PR #224.
Related Issues
#114
Documentation PR
strands-agents/docs#836
Type of Change
New feature
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
hatch run prepare
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.