Fix: Prevent Orphaned Submissions When SQS Publish Fails #5009
WHOIM1205 wants to merge 6 commits into
Hey @RishabhJain2018, I fixed an issue in the submission flow where a submission could be saved but never reach the evaluation queue if the SQS publish failed. That left submissions stuck forever and ate into user quota. The fix makes sure we either queue the submission successfully or clean it up properly on failure. I also added tests to cover the failure cases and to make sure the existing behavior stays the same. Would love your thoughts or any feedback on the approach.
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##           master    #5009      +/-   ##
==========================================
+ Coverage   92.15%   92.16%   +0.01%
==========================================
  Files          87       87
  Lines        7376     7386      +10
==========================================
+ Hits         6797     6807      +10
  Misses        579      579
... and 6 files with indirect coverage changes
    submission.pk,
    challenge_id,
)
submission.delete()
Hey @WHOIM1205, deleting the submission isn't a good idea. Maybe set it as cancelled?
Good point, agreed.
I’ve updated the logic to cancel the submission instead of deleting it and pushed the changes. Thanks for calling this out!
Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
Force-pushed from a4efa13 to 992dd40
@RishabhJain2018, is there anything I can fix in this PR?
Can you please check why the Travis build is failing?
Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
I am not able to fix this.
What is the issue?
Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
Hey @WHOIM1205, please try restarting the build.

Description
This PR fixes a critical consistency bug in EvalAI’s submission pipeline where a submission could be successfully saved to the database but never queued for evaluation if publishing the SQS message failed.
Previously, the database write and SQS publish were performed non-atomically. Any failure during the SQS publish step (network issues, AWS credential expiry, throttling, or outages) resulted in orphaned submissions that remained permanently stuck in the submitted state and silently consumed the participant’s submission quota.
This change ensures that a submission is either queued for evaluation successfully or cleaned up properly on failure.
No orphaned submissions are left behind under any failure scenario.
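The queue-or-clean-up behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `Submission` dataclass, the `submit` function, the message keys, and the 503 status code are all assumptions made for the example, and the placeholder `publish_submission_message` here simply raises to simulate an SQS outage. Following the review discussion, the sketch cancels the submission on failure instead of deleting it.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class Submission:
    """Hypothetical stand-in for the Django Submission model."""

    pk: int
    status: str = "submitted"

    def save(self):
        # A real model would persist the row to the database here.
        pass


def publish_submission_message(message):
    """Placeholder publisher; this one always fails to simulate an outage."""
    raise ConnectionError("simulated SQS outage")


def submit(submission, challenge_id, publish=publish_submission_message):
    """Queue the submission, or cancel it if publishing fails."""
    # Message keys are illustrative, not the real SQS message format.
    message = {"challenge_pk": challenge_id, "submission_pk": submission.pk}
    try:
        publish(message)
    except Exception:
        # Record the full traceback, then make sure the row is not left
        # in the "submitted" state with nothing in the queue.
        logger.exception(
            "Failed to publish submission %s for challenge %s",
            submission.pk,
            challenge_id,
        )
        submission.status = "cancelled"
        submission.save()
        return 503  # assumed error status for the sketch
    return 201
```

Passing the publisher as a parameter is only a convenience for the sketch: it lets the success path be exercised with a no-op callable without touching AWS.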
What Was Fixed
logger.exception()
Why This Is Important
This bug primarily affected high-traffic moments like competition deadlines, when SQS failures are most likely.
Without this fix:
With this fix:
Code Changes
Modified File
apps/jobs/views.py
Updated Function
challenge_submission (POST handler)
Summary of Change
publish_submission_message
No model changes.
No migrations required.
No changes to SQS message format.
Test Coverage
New unit tests were added to verify both failure and success paths.
Added Tests
test_challenge_submission_cleans_up_on_publish_failure
test_challenge_submission_handles_sqs_endpoint_failure
test_challenge_submission_preserves_quota_on_publish_failure
test_challenge_submission_returns_201_when_publish_succeeds
What These Tests Validate
EndpointConnectionError) are handled
All tests pass successfully.
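As a rough illustration of how failure and success paths like these can be exercised, the sketch below uses `unittest.mock.Mock` with `side_effect` to make a publisher raise. Everything here is hypothetical: `FakeSubmission`, `handle_submission`, and the 503 status are stand-ins invented for the example, and the built-in `ConnectionError` is used in place of botocore's `EndpointConnectionError` so the snippet runs without AWS libraries. It is not the test code added in this PR.

```python
import unittest
from unittest import mock


class FakeSubmission:
    """Hypothetical stand-in for a saved submission row."""

    def __init__(self):
        self.status = "submitted"


def handle_submission(submission, publish):
    """Toy handler: cancel the submission when publishing raises."""
    try:
        publish({"submission_pk": 1})
    except Exception:
        submission.status = "cancelled"
        return 503  # assumed error status for the sketch
    return 201


class PublishFailureTests(unittest.TestCase):
    def test_cancels_on_publish_failure(self):
        # side_effect makes the mock raise when it is called.
        publish = mock.Mock(side_effect=ConnectionError("endpoint unreachable"))
        submission = FakeSubmission()
        self.assertEqual(handle_submission(submission, publish), 503)
        self.assertEqual(submission.status, "cancelled")
        publish.assert_called_once()

    def test_returns_201_when_publish_succeeds(self):
        publish = mock.Mock(return_value=None)
        submission = FakeSubmission()
        self.assertEqual(handle_submission(submission, publish), 201)
        self.assertEqual(submission.status, "submitted")
```

In a Django test suite the same idea would more likely be expressed with `mock.patch` on the publisher where the view imports it, but the exact patch target depends on the module layout.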
Impact After Fix
Notes for Reviewers
This change eliminates a production-critical failure mode in EvalAI’s core submission flow.