fix(core): Guard event log parsing against unbounded memory growth #28594

Merged
guillaumejacquart merged 5 commits into master from iam-528-bug-event-log-parsing-on-startup-causes-oom-on-starter-plan-v1 on Apr 21, 2026

@guillaumejacquart guillaumejacquart commented Apr 16, 2026

Summary

Adds a configurable working-set guard to readLoggedMessagesFromFile so startup recovery aborts parsing a single event log file when the in-memory message count exceeds N8N_EVENTBUS_LOGWRITER_MAXMESSAGESPERPARSE (default 10000). Healthy logs are unaffected because confirm records prune the working set as the file streams; only legacy files full of unconfirmed messages (pre-PR #27334) trip the guard, which prevents the startup crash loop on Starter-plan containers.
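The interaction between confirm records and the cap can be illustrated with a minimal sketch. The record shape (`kind`, `id`, `confirms`) and the names `parseRecord` and `parseWithCap` are assumptions made for illustration; they are not taken from the n8n source, and the real log line format may differ.

```typescript
// Hypothetical, simplified record shape (an assumption, not the n8n format).
type LogRecord =
  | { kind: 'event'; id: string }
  | { kind: 'confirm'; confirms: string };

interface ParseResult {
  unconfirmed: Map<string, LogRecord>;
  aborted: boolean;
}

// Parses one line; returns undefined for malformed or unknown records.
function parseRecord(line: string): LogRecord | undefined {
  try {
    const value: unknown = JSON.parse(line);
    if (typeof value !== 'object' || value === null) return undefined;
    const rec = value as Record<string, unknown>;
    if (rec.kind === 'event' && typeof rec.id === 'string') {
      return { kind: 'event', id: rec.id };
    }
    if (rec.kind === 'confirm' && typeof rec.confirms === 'string') {
      return { kind: 'confirm', confirms: rec.confirms };
    }
  } catch {
    // Malformed JSON is skipped in this sketch.
  }
  return undefined;
}

// Processes records in file order. Confirm records prune the working set,
// so only a long run of unconfirmed events can push it past maxMessages.
function parseWithCap(lines: Iterable<string>, maxMessages: number): ParseResult {
  const unconfirmed = new Map<string, LogRecord>();
  for (const line of lines) {
    const record = parseRecord(line);
    if (record === undefined) continue;
    if (record.kind === 'event') {
      unconfirmed.set(record.id, record);
    } else {
      unconfirmed.delete(record.confirms);
    }
    if (unconfirmed.size > maxMessages) {
      // Guard tripped: stop early and hand back what was collected so far.
      return { unconfirmed, aborted: true };
    }
  }
  return { unconfirmed, aborted: false };
}
```

In this sketch a healthy log never aborts, however long it is, as long as the number of events awaiting confirmation stays at or below the cap at every point in the file.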

How to test: unit tests in message-event-bus-log-writer.test.ts cover both the bloat-abort path and the healthy paired-confirm path. To verify manually, seed ~/.n8n/n8nEventLog.log with more than 10,000 orphaned n8n.workflow.started lines, restart n8n, and confirm that startup logs the warning "Event log ... exceeded 10000 in-memory messages during parse" instead of crashing with an OOM.
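For reference, a limit such as N8N_EVENTBUS_LOGWRITER_MAXMESSAGESPERPARSE could be resolved from the environment roughly as follows. This is only a sketch: the helper name `resolveMaxMessagesPerParse` is hypothetical, n8n's own configuration layer is not shown in this PR page, and falling back to the default on invalid input is a choice made here rather than documented behavior.

```typescript
// Default taken from the PR summary.
const DEFAULT_MAX_MESSAGES_PER_PARSE = 10000;

// Hypothetical helper: reads the limit from an env-like object and falls
// back to the default when the value is missing, empty, or not a positive integer.
function resolveMaxMessagesPerParse(env: Record<string, string | undefined>): number {
  const raw = env.N8N_EVENTBUS_LOGWRITER_MAXMESSAGESPERPARSE;
  if (raw === undefined || raw.trim() === '') {
    return DEFAULT_MAX_MESSAGES_PER_PARSE;
  }
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_MESSAGES_PER_PARSE;
}
```

In a Node.js process this would typically be called as `resolveMaxMessagesPerParse(process.env)`.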

Related Linear tickets, Github issues, and Community forum posts

https://linear.app/n8n/issue/IAM-528

Review / Merge checklist

  • I have seen this code, I have run this code, and I take responsibility for this code.
  • PR title and summary are descriptive. (conventions)
  • Docs updated or follow-up ticket created.
  • Tests included.
  • PR Labeled with Backport to Beta, Backport to Stable, or Backport to v1 (if the PR is an urgent fix that needs to be backported)

🤖 PR Summary generated by AI

Add a configurable working-set cap to readLoggedMessagesFromFile so that
startup recovery aborts parsing a single event log file when the in-memory
message count exceeds the configured threshold. Prevents crash loops on
instances with legacy log files containing many unconfirmed messages.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

codecov Bot commented Apr 16, 2026

Codecov Report

❌ Patch coverage is 74.35897% with 10 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...e-event-bus-writer/message-event-bus-log-writer.ts | 73.68% | 10 Missing ⚠️ |

codecov Bot commented Apr 16, 2026

Bundle Report

Changes will increase total bundle size by 1.52kB (0.0%) ⬆️. This is within the configured threshold ✅

Detailed changes
| Bundle name | Size | Change |
| --- | --- | --- |
| editor-ui-esm* | 45.76MB | 1.52kB (0.0%) ⬆️ |

ℹ️ *Bundle size includes cached data from a previous commit

Affected Assets, Files, and Routes:

view changes for bundle: editor-ui-esm

Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| assets/constants-*.js | -200 bytes | 3.14MB | -0.01% |
| assets/index-*.js | 2.03kB | 1.31MB | 0.15% |
| assets/ParameterInputList-*.js | 848 bytes | 1.27MB | 0.07% |
| assets/users.store-*.js | 4.73kB | 1.06MB | 0.45% |
| assets/core-*.js | 253 bytes | 628.2kB | 0.04% |
| assets/useCanvasMapping-*.js | 451 bytes | 463.75kB | 0.1% |
| assets/InstanceAiView-*.js | 4.17kB | 351.86kB | 1.2% |
| assets/RunData-*.js | 195 bytes | 346.06kB | 0.06% |
| assets/ParameterInputList-*.css | 35 bytes | 208.27kB | 0.02% |
| assets/InstanceAiView-*.css | 254 bytes | 167.04kB | 0.15% |
| assets/table-*.js | -25.24kB | 153.35kB | -14.13% |
| assets/usePostMessageHandler-*.js | 608 bytes | 137.66kB | 0.44% |
| assets/NodeView-*.js | 200 bytes | 137.44kB | 0.15% |
| assets/useRootStore-*.js | -175 bytes | 131.07kB | -0.13% |
| assets/WorkflowLayout-*.js | -74 bytes | 127.89kB | -0.06% |
| assets/router-*.js | 495 bytes | 119.81kB | 0.41% |
| assets/canvas.eventBus-*.js | 167 bytes | 117.56kB | 0.14% |
| assets/NodeCreator-*.js | 292 bytes | 104.33kB | 0.28% |
| assets/useCanvasOperations-*.js | -33 bytes | 95.35kB | -0.03% |
| assets/VirtualSchema-*.js | 38 bytes | 94.5kB | 0.04% |
| assets/NodeSettings-*.js | 602 bytes | 85.2kB | 0.71% |
| assets/CanvasRunWorkflowButton-*.js | 188 bytes | 78.46kB | 0.24% |
| assets/TriggerPanel-*.js | 9 bytes | 59.26kB | 0.02% |
| assets/CreditWarningBanner-*.js | 2.95kB | 58.12kB | 5.34% ⚠️ |
| assets/SettingsInstanceAiView-*.js | 1.83kB | 46.46kB | 4.11% |
| assets/useLogsTreeExpand-*.js | -54 bytes | 41.19kB | -0.13% |
| assets/NodeDetailsViewV2-*.js | -18 bytes | 38.01kB | -0.05% |
| assets/usePushConnection-*.js | 130 bytes | 31.5kB | 0.41% |
| assets/useRunWorkflow-*.js | -83 bytes | 27.78kB | -0.3% |
| assets/checkbox-*.js (New) | 26.24kB | 26.24kB | 100.0% 🚀 |
| assets/SettingsInstanceAiView-*.css | 155 bytes | 23.88kB | 0.65% |
| assets/useCustomAgent-*.js | -36 bytes | 20.76kB | -0.17% |
| assets/assistant.store-*.js | -46 bytes | 19.32kB | -0.24% |
| assets/SettingsLogStreamingView-*.js | -43 bytes | 17.44kB | -0.25% |
| assets/instanceAiSettings.store-*.js | 738 bytes | 15.73kB | 4.92% |
| assets/InstanceAiOptinModal-*.js | -12.92kB | 11.82kB | -52.22% |
| assets/useActions-*.js | -49 bytes | 10.36kB | -0.47% |
| assets/pushConnection.store-*.js | 38 bytes | 10.14kB | 0.38% |
| assets/InstanceAiOptinModal-*.css | -3.18kB | 9.28kB | -25.53% |
| assets/usePinnedData-*.js | -23 bytes | 9.11kB | -0.25% |
| assets/WorkflowPreview-*.js | 213 bytes | 8.18kB | 2.67% |
| assets/ContactAdministratorToInstall-*.js | -44 bytes | 5.87kB | -0.74% |
| assets/useExecutionDebugging-*.js | -76 bytes | 5.55kB | -1.35% |
| assets/dist-*.js (Deleted) | -5.34kB | 0 bytes | -100.0% 🗑️ |
| assets/nodeIcon-*.js | -26 bytes | 4.7kB | -0.55% |
| assets/NodeIcon-*.js | 702 bytes | 3.84kB | 22.39% ⚠️ |
| assets/DemoLayout-*.js | 735 bytes | 3.51kB | 26.52% ⚠️ |
| assets/useCalloutHelpers-*.js | -48 bytes | 3.39kB | -1.4% |
| assets/useExpressionResolveCtx-*.js | -55 bytes | 1.63kB | -3.27% |

@guillaumejacquart guillaumejacquart marked this pull request as ready for review April 16, 2026 21:04

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 4 files

Architecture diagram
sequenceDiagram
    participant CLI as n8n Startup Process
    participant Bus as MessageEventBusLogWriter
    participant Config as GlobalConfig (Env Vars)
    participant FS as File System (Event Log)
    participant Log as Logger

    Note over CLI,FS: Startup Recovery: Replaying Unsent Events

    CLI->>Bus: readLoggedMessagesFromFile()
    
    Bus->>Config: NEW: Get N8N_EVENTBUS_LOGWRITER_MAXMESSAGESPERPARSE
    Config-->>Bus: limit (Default: 10,000)

    Bus->>FS: createReadStream(logFileName)
    
    loop For each line in file
        FS-->>Bus: event/confirm line
        Bus->>Bus: processLoggedLine()
        
        alt is Event Message
            Bus->>Bus: Add message to in-memory working set
        else is Confirm Message
            Bus->>Bus: Prune matching message from memory
        end

        alt NEW: working set size > limit
            Note right of Bus: Guard triggered (Pathological legacy log)
            Bus->>FS: CHANGED: destroy() stream
            Bus->>Log: NEW: warn("exceeded in-memory messages... aborting")
            Bus-->>CLI: Return partial results (prevents OOM)
        end
    end

    Note over Bus,FS: Healthy Path: confirm records keep working set small
    FS-->>Bus: EOF
    Bus-->>CLI: Return collected messages
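The abort path in the diagram can be sketched with Node's readline module. This is a minimal illustration under stated assumptions, not the merged implementation: `readWithGuard`, the `LineHandler` callback, and the warning wording are hypothetical, and stream error handling is left out for brevity.

```typescript
import { createInterface } from 'node:readline';
import { Readable } from 'node:stream';

// Hypothetical callback: consumes one log line and returns the size of the
// in-memory working set after processing it.
type LineHandler = (line: string) => number;

async function readWithGuard(
  input: Readable,
  maxMessages: number,
  handleLine: LineHandler,
  warn: (message: string) => void = console.warn,
): Promise<{ aborted: boolean }> {
  const rl = createInterface({ input, crlfDelay: Infinity });
  try {
    for await (const line of rl) {
      if (handleLine(line) > maxMessages) {
        // Illustrative wording only; the real log message is not reproduced here.
        warn(`Event log exceeded ${maxMessages} in-memory messages during parse, aborting`);
        return { aborted: true };
      }
    }
    return { aborted: false };
  } finally {
    // Release the line reader and the underlying stream on both paths,
    // so an aborted parse does not keep reading the rest of the file.
    rl.close();
    input.destroy();
  }
}
```

For a real file, the input would be something like `createReadStream(logFileName)` from `node:fs`; the caller keeps whatever the handler has collected so far, which corresponds to the "partial results" arrow in the diagram.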

github-actions Bot commented Apr 16, 2026

Performance Comparison

Comparing current → latest master → 14-day baseline

Memory consumption baseline with starter plan resources

| Metric | Current | Latest Master | Baseline (avg) | vs Master | vs Baseline | Status |
| --- | --- | --- | --- | --- | --- | --- |
| memory-rss-baseline | 226.23 MB | 278.98 MB | 289.99 MB (σ 41.20) | -18.9% | -22.0% | ⚠️ |
| memory-heap-used-baseline | 115.24 MB | 114.41 MB | 114.45 MB (σ 0.27) | +0.7% | +0.7% | 🔴 |

docker-stats

| Metric | Current | Latest Master | Baseline (avg) | vs Master | vs Baseline | Status |
| --- | --- | --- | --- | --- | --- | --- |
| docker-image-size-n8n | 1269.76 MB | 1269.76 MB | 1273.60 MB (σ 10.49) | +0.0% | -0.3% | |
| docker-image-size-runners | 386.00 MB | 386.00 MB | 392.50 MB (σ 11.06) | +0.0% | -1.7% | |

Idle baseline with Instance AI module loaded

| Metric | Current | Latest Master | Baseline (avg) | vs Master | vs Baseline | Status |
| --- | --- | --- | --- | --- | --- | --- |
| instance-ai-heap-used-baseline | 186.77 MB | 186.52 MB | 186.43 MB (σ 0.26) | +0.1% | +0.2% | ⚠️ |
| instance-ai-rss-baseline | 386.89 MB | 389.20 MB | 366.52 MB (σ 22.66) | -0.6% | +5.6% | |

How to read this table
  • Current: This PR's value (or latest master if PR perf tests haven't run)
  • Latest Master: Most recent nightly master measurement
  • Baseline: Rolling 14-day average from master
  • vs Master: PR impact (current vs latest master)
  • vs Baseline: Drift from baseline (current vs rolling avg)
  • Status: ✅ within 1σ | ⚠️ 1-2σ | 🔴 >2σ regression

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/cli/src/eventbus/message-event-bus-writer/message-event-bus-log-writer.ts">

<violation number="1" location="packages/cli/src/eventbus/message-event-bus-writer/message-event-bus-log-writer.ts:209">
P2: Do not swallow stream errors with an empty handler; it hides real log-read failures and makes recovery silently degrade.</violation>
</file>

@n8n-assistant n8n-assistant Bot added the core (Enhancement outside /nodes-base and /editor-ui) and n8n team (Authored by the n8n team) labels Apr 21, 2026
guillaumejacquart and others added 3 commits April 21, 2026 14:43
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…test

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@phyllis-noester phyllis-noester left a comment

this doesn't work for rolling updates/in multi main or with workers right? not your change but ingesting the logs from a local file in general

try {
const stream = createReadStream(logFileName);
stream.on('error', (error) => {
if ((error as NodeJS.ErrnoException).code !== 'ERR_STREAM_DESTROYED') {

so we swallow all errors and never throw?

Contributor Author

Yep, errors are logged but not propagated. I think this makes sense, as we don't want this recovery process (let alone the reading of a single log file) to hard-fail the whole instance. It's not a critical recovery, AFAIK (for instance, we've allowed ourselves to tamper with those files in the cloud medic system to resolve instance failures).
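The "log but don't propagate" behavior discussed in this thread can be sketched as an error listener on the read stream. The helper name `logStreamErrors` and the warning text are hypothetical; only the ERR_STREAM_DESTROYED check comes from the snippet under review, and the body of the real handler is not shown on this page.

```typescript
import { Readable } from 'node:stream';

// Hypothetical helper: attaches a non-fatal error listener to a stream.
// Errors are reported through `warn` and never rethrown.
function logStreamErrors(stream: Readable, warn: (message: string) => void): void {
  stream.on('error', (error: Error) => {
    // An intentionally destroyed stream is not treated as a read failure.
    if ((error as NodeJS.ErrnoException).code === 'ERR_STREAM_DESTROYED') {
      return;
    }
    warn(`Failed to read event log: ${error.message}`);
  });
}
```

Because the listener exists, an 'error' event no longer crashes the process as an unhandled error; the trade-off raised in the review is that callers cannot distinguish a failed read from an empty log unless they inspect the logs.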

@guillaumejacquart
Contributor Author

> this doesn't work for rolling updates/in multi main or with workers right? not your change but ingesting the logs from a local file in general

Yes, you're right: this feature is built for a single-node scope with a persistent disk, so each instance would try to recover on its own. I think that's fine (although probably improvable), since each instance replays the failed events that it initiated (so a main instance would replay a workflow execution failure, which is expected).

@guillaumejacquart guillaumejacquart added this pull request to the merge queue Apr 21, 2026
Merged via the queue into master with commit a817cbc Apr 21, 2026
108 of 110 checks passed
@guillaumejacquart guillaumejacquart deleted the iam-528-bug-event-log-parsing-on-startup-causes-oom-on-starter-plan-v1 branch April 21, 2026 15:54
@n8n-assistant n8n-assistant Bot mentioned this pull request Apr 28, 2026

n8n-assistant Bot commented Apr 28, 2026

Got released with n8n@2.19.0

Labels

core (Enhancement outside /nodes-base and /editor-ui), n8n team (Authored by the n8n team), Released

3 participants