Skip to content

Fix memory compressor timeout fallback#498

Open
RitwijParmar wants to merge 1 commit into
usestrix:mainfrom
RitwijParmar:ritwij/memory-compressor-timeout-fallback
Open

Fix memory compressor timeout fallback#498
RitwijParmar wants to merge 1 commit into
usestrix:mainfrom
RitwijParmar:ritwij/memory-compressor-timeout-fallback

Conversation

@RitwijParmar
Copy link
Copy Markdown

Summary

  • disable LiteLLM retry amplification for memory-compressor summarization calls
  • add a local extractive fallback summary when compressor LLM calls fail or time out
  • preserve recent scan messages while avoiding long-scan stalls caused by repeated summarization failures

Why

This addresses the reliability failure mode in #470 where long-context scans can get stuck when the memory compressor repeatedly hits LiteLLM timeouts. The fallback keeps the scan moving while retaining ordered message previews and recent operational context.

Tests

  • PYTEST_ADDOPTS='' .venv/bin/python -m pytest -o addopts='' tests/llm/test_memory_compressor.py

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 26, 2026

Greptile Summary

This PR fixes a reliability failure in long-context scans where repeated LiteLLM summarization timeouts could stall the memory compressor. It adds num_retries=0 to prevent retry amplification and introduces _build_fallback_summary, an extractive local summarizer that preserves head/tail message previews when the LLM call fails.

  • num_retries=0 is passed to litellm.completion so a single timeout does not silently trigger multiple retries.
  • _build_fallback_summary returns a context_summary message with up to 12 sampled previews (head + tail) when the LLM raises any exception; the existing exception handler now delegates to it instead of returning messages[0].
  • The empty-response branch (if not summary.strip(): return messages[0]) was not updated to use the new fallback, leaving one failure mode — an LLM that responds with blank content — still returning a raw uncompressed old message.

Confidence Score: 3/5

The change is almost complete but one branch was not updated consistently with the rest.

The empty-summary guard at line 178 still returns messages[0] directly, which is exactly the raw-old-message return the PR is designed to eliminate. An LLM that responds with an empty string triggers this path and re-creates the context inflation the fix targets. The rest of the change — disabling retries and the fallback builder — is correct and well-tested.

strix/llm/memory_compressor.py line 178 needs the same _build_fallback_summary treatment applied to the exception handler above it.

Important Files Changed

Filename Overview
strix/llm/memory_compressor.py Adds num_retries=0 to suppress LiteLLM retry amplification and introduces _build_fallback_summary as an extractive local fallback on exception; the empty-LLM-response branch at line 178 still returns messages[0] (a raw old message) instead of the fallback, leaving one failure mode unaddressed.
tests/llm/test_memory_compressor.py New test file covering retry-disable, timeout fallback, and compress_history fallback path; the empty-response branch (not summary.strip()) is not exercised so the surviving messages[0] return goes undetected.

Comments Outside Diff (1)

  1. strix/llm/memory_compressor.py, line 177-178 (link)

    P1 When the LLM returns an empty or whitespace-only response, the code still returns messages[0] — a raw, potentially large old message — rather than the new fallback. This bypasses the core fix this PR introduces: the PR prevents raw-message returns on exceptions/timeouts but leaves this branch unaddressed. An LLM that responds with an empty string (a real failure mode) would still push an uncompressed old message back into the conversation, re-creating the stall.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: strix/llm/memory_compressor.py
    Line: 177-178
    
    Comment:
    When the LLM returns an empty or whitespace-only response, the code still returns `messages[0]` — a raw, potentially large old message — rather than the new fallback. This bypasses the core fix this PR introduces: the PR prevents raw-message returns on exceptions/timeouts but leaves this branch unaddressed. An LLM that responds with an empty string (a real failure mode) would still push an uncompressed old message back into the conversation, re-creating the stall.
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
strix/llm/memory_compressor.py:177-178
When the LLM returns an empty or whitespace-only response, the code still returns `messages[0]` — a raw, potentially large old message — rather than the new fallback. This bypasses the core fix this PR introduces: the PR prevents raw-message returns on exceptions/timeouts but leaves this branch unaddressed. An LLM that responds with an empty string (a real failure mode) would still push an uncompressed old message back into the conversation, re-creating the stall.

```suggestion
        if not summary.strip():
            return _build_fallback_summary(messages)
```

Reviews (1): Last reviewed commit: "fix memory compressor timeout fallback" | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant