feat: enhance prefork functionality with new configuration options and improved child process recovery#4032
feat: enhance prefork functionality with new configuration options and improved child process recovery#4032ReneWerner87 wants to merge 3 commits intomainfrom
Conversation
…d improved child process recovery
Summary of ChangesHello @ReneWerner87, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly upgrades Fiber's prefork capabilities, making it more robust and adaptable. It introduces intelligent handling for systems lacking SO_REUSEPORT by falling back to file descriptor sharing, and enhances application stability through automatic child process recovery. These changes provide developers with greater control and reliability when deploying Fiber applications in prefork mode. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughThis PR enhances prefork: detects SO_REUSEPORT support and falls back to FD-based listener sharing when absent; adds configurable child recovery (DisableChildRecovery, MaxChildRecoveries) and DisableReuseportFallback; enables prefork with custom listeners; and updates docs and tests for context-based graceful shutdown. Changes
Sequence Diagram(s)sequenceDiagram
participant Master as Master Process
participant Kernel as Kernel/OS
participant Child as Child Process
participant Monitor as Recovery Monitor
rect rgba(100, 150, 255, 0.5)
Note over Master,Kernel: Startup — SO_REUSEPORT detection & listener setup
Master->>Kernel: Test SO_REUSEPORT support
Kernel-->>Master: Success or Error
alt SO_REUSEPORT Available
Master->>Kernel: Bind per-child listeners with SO_REUSEPORT
else SO_REUSEPORT Unavailable
Master->>Kernel: Bind single listener
Note over Master: Share listener FD via ExtraFiles to children
end
end
rect rgba(150, 200, 100, 0.5)
Note over Master,Child: Child spawning & listener creation
Master->>Child: Spawn child (with inherited FD if fallback)
Child->>Kernel: Create listener (inherited FD or reuseport bind)
Child-->>Master: Signal ready
end
rect rgba(200, 150, 100, 0.5)
Note over Monitor,Master: Runtime — crash detection & recovery
Monitor->>Monitor: Wait for child exit
alt Child Crashes
Monitor->>Master: Notify crash
alt Recovery enabled and under limit
Master->>Master: Increment recoveryCount
Master->>Child: Re-spawn child
else Recovery exhausted or disabled
Master->>Master: Log and propagate error
end
else Graceful Shutdown
Master->>Child: Signal shutdown
Child->>Kernel: Close listener
end
end
sequenceDiagram
participant App as App (ListenConfig)
participant Listen as Listen()
participant CustomLn as Custom Listener
participant Prefork as Prefork Logic
rect rgba(100, 200, 150, 0.5)
Note over App,Prefork: Custom Listener with Prefork
App->>Listen: Call with CustomListener + EnablePrefork=true
Listen->>CustomLn: Extract network & address, get TLS config
alt Network is tcp/tcp4/tcp6
Listen->>CustomLn: Close provided listener
Listen->>Prefork: Delegate with address/network/TLS
Prefork->>Prefork: Execute prefork flow
else Invalid Network
Listen->>Listen: Log warning and skip prefork
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request significantly enhances the prefork functionality by introducing new configuration options for child process recovery and SO_REUSEPORT fallback. The changes include robust mechanisms for automatically restarting crashed child processes and gracefully handling systems without SO_REUSEPORT support through file descriptor sharing. The documentation has been thoroughly updated to reflect these new features, providing clear explanations and code examples. Test cases have also been adapted to integrate the new graceful shutdown context and prefork behaviors, ensuring the stability and correctness of the new implementation.
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
hooks_test.go (1)
557-587: Missingt.Parallel()call in test function.As per the coding guidelines, all tests should invoke
t.Parallel()at the start to maximize concurrency.Proposed fix
func Test_Hook_OnHook(t *testing.T) { + t.Parallel() + app := New()
🤖 Fix all issues with AI agents
In `@docs/api/fiber.md`:
- Around line 158-162: Add blank lines immediately before and after the "How it
works:" list to satisfy markdownlint MD032; specifically, update the "How it
works:" section (the heading text "How it works:" and the following bullet list
that mentions SO_REUSEPORT and DisableReuseportFallback) so there is an empty
line between the heading and the first list item and another empty line after
the last list item.
- Around line 183-188: The "Child Process Recovery" section's bullet list needs
blank lines before and after it to satisfy markdownlint MD032; update the "Child
Process Recovery:" paragraph so there is an empty line between the heading line
("Child Process Recovery:") and the list start, and add an empty line after the
list (after `MaxChildRecoveries: N`) so the list is surrounded by blank lines.
In `@prefork.go`:
- Around line 57-65: In testReuseportSupport, wrap the error returned by
reuseport.Listen using fmt.Errorf with %w (e.g., fmt.Errorf("reuseport listen %s
%s: %w", network, addr, err)) and check the return value of ln.Close(); if Close
returns an error return it wrapped (e.g., fmt.Errorf("close listener: %w",
errClose)). Update imports to include fmt if not present. Ensure the function
returns nil only when both Listen and Close succeed.
- Around line 87-90: Replace the static error creation using fmt.Errorf with
errors.New: change the returned error in the type-assertion block (where
inheritedLn is asserted to *net.TCPListener and assigned to tcpLn) from
fmt.Errorf("prefork: inherited listener is not a TCP listener") to
errors.New("prefork: inherited listener is not a TCP listener"), and add the
errors package to the imports if it's not already present.
- Around line 264-277: Replace the classic C-style loop with a range-based loop:
change the loop that spawns children (currently using for i := 0; i < maxProcs;
i++) to iterate with for i := range make([]struct{}, maxProcs) { ... } and keep
the body intact (calls to startChildProcess(app, inheritedLn), assignments to
children[child.pid], appending to childPIDs, and the goroutine that sends
childEvent{pid: c.pid, err: c.cmd.Wait()} on childEvents). This modernizes
iteration while preserving the use of maxProcs, startChildProcess, children map,
childPIDs slice, childInfo, childEvents channel and childEvent struct.
- Around line 205-226: Refactor the if/else chain that handles the result of
testReuseportSupport(cfg.ListenerNetwork, addr) into a switch (e.g. switch {
case isReusePortError(err) && !cfg.DisableReuseportFallback: ... case
isReusePortError(err): ... default: ... }) to improve clarity, and in the
deferred cleanup for inheritedLn replace the ignored Close with error-checked
Close (e.g. if inheritedLn != nil { if cerr := inheritedLn.Close(); cerr != nil
{ log.Warnf("[prefork] failed to close inherited listener: %v", cerr) } }) so
the Close error is not swallowed; keep references to testReuseportSupport,
isReusePortError, cfg.DisableReuseportFallback, and the inheritedLn variable
unchanged.
- Around line 146-163: Replace the static fmt.Errorf call with errors.New for
the "failed to recreate listener" case and check the return value of f.Close():
after creating ln from net.FileListener(f) and before returning nil, call if
cerr := f.Close(); cerr != nil { return fmt.Errorf("prefork: failed to close
file: %w", cerr) } so the file close error is handled; ensure you add the errors
import and keep the existing fmt.Errorf usage where errors are wrapped (e.g.,
net.FileListener error).
🧹 Nitpick comments (3)
prefork_test.go (1)
56-92: Consider addingt.Parallel()or documenting why it's omitted.Per coding guidelines, tests should invoke
t.Parallel(). However, this test modifies global state (testPreforkMaster,dummyChildCmd), which may prevent safe parallelization. If parallelization is not safe, consider adding a brief comment explaining why.listen.go (1)
316-342: Consider the implications of closing the listener before prefork.Once the listener is closed (line 332), if
app.prefork()fails, the caller has no way to recover or reuse the original listener. This is acceptable if the intent is that prefork failure should be terminal, but consider whether a more informative error message would help users understand this behavior.The current implementation is reasonable given that prefork creates its own listeners, but documenting this behavior in the function's doc comment would improve clarity.
prefork.go (1)
228-231: Optimize struct field alignment.The
childEventstruct can be reordered to reduce memory usage from 24 to 16 pointer bytes.Proposed fix
type childEvent struct { + err error pid int - err error }
There was a problem hiding this comment.
Pull request overview
This pull request enhances Fiber's prefork functionality with significant improvements for robustness and cross-platform compatibility:
Changes:
- Adds automatic fallback to file descriptor sharing when SO_REUSEPORT is not supported, ensuring prefork works on systems like older kernels, AIX, and Solaris
- Implements automatic child process recovery with configurable limits (DisableChildRecovery, MaxChildRecoveries) to maintain desired worker process count
- Enables prefork support for custom listeners via app.Listener()
- Updates tests to use context-based graceful shutdown instead of app.Shutdown()
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| prefork.go | Core implementation of FD sharing fallback, child recovery mechanism, and helper functions for process management |
| listen.go | Adds new ListenConfig fields (DisableReuseportFallback, DisableChildRecovery, MaxChildRecoveries) and prefork support for custom listeners |
| prefork_test.go | Updates tests to use context-based shutdown and DisableChildRecovery flag |
| listen_test.go | Converts prefork tests to use GracefulContext instead of app.Shutdown() |
| hooks_test.go | Updates hook tests for compatibility with new prefork behavior |
| docs/whats_new.md | Documents new prefork enhancements with usage examples |
| docs/extra/internal.md | Updates internal documentation with detailed explanation of FD sharing fallback and child recovery |
| docs/api/fiber.md | Adds comprehensive prefork documentation including new configuration options |
| if cfg.GracefulContext != nil { | ||
| go func() { | ||
| <-cfg.GracefulContext.Done() | ||
| shutdownOnce.Do(func() { close(shutdownCh) }) |
There was a problem hiding this comment.
In prefork master mode, the GracefulContext is used directly to signal shutdown (lines 241-245) but the gracefulShutdown function is never called. This means ShutdownTimeout is not respected in prefork mode, and OnPostShutdown hooks are never executed when the master process shuts down. This creates an inconsistency between prefork and non-prefork behavior. Consider calling app.gracefulShutdown() or at least executing the OnPostShutdown hooks when shutting down in prefork mode to maintain behavioral consistency.
| shutdownOnce.Do(func() { close(shutdownCh) }) | |
| shutdownOnce.Do(func() { | |
| // Ensure master process follows the same graceful shutdown behavior | |
| // as non-prefork mode (ShutdownTimeout, OnPostShutdown hooks, etc.). | |
| app.gracefulShutdown() | |
| close(shutdownCh) | |
| }) |
| dummyChildCmd.Store("invalid") | ||
|
|
||
| cfg = listenConfigDefault() | ||
| err := app.prefork("127.0.0.1:", nil, &cfg) | ||
| cfg.DisableChildRecovery = true | ||
| err = app.prefork("127.0.0.1:", nil, &cfg) | ||
| require.Error(t, err) | ||
|
|
||
| dummyChildCmd.Store("go") |
There was a problem hiding this comment.
The global test flag dummyChildCmd is mutated but never cleaned up in these tests. After line 84 sets it to "invalid" and line 91 resets it to "go", if this test fails between those lines or other tests run concurrently, they may be affected by the modified state. Consider using t.Cleanup() to ensure the flag is properly restored to its original state after the test completes.
listen.go
Outdated
| // Use prefork mode | ||
| // NOTE: This assumes the provided listener was created with reuseport.Listen or similar | ||
| // If the system doesn't support SO_REUSEPORT, prefork will automatically fall back | ||
| // to standard listening mode (see prefork.go for fallback logic) |
There was a problem hiding this comment.
The log message incorrectly states "standard listening mode" when it should say "file descriptor sharing mode" or "FD sharing fallback". The code implements file descriptor sharing, not standard listening mode, which would be a single-process server without prefork.
| // to standard listening mode (see prefork.go for fallback logic) | |
| // to file descriptor sharing mode (FD sharing fallback, see prefork.go for fallback logic) |
| defer func() { | ||
| for _, proc := range children { | ||
| if err = proc.Process.Kill(); err != nil { | ||
| if !errors.Is(err, os.ErrProcessDone) { | ||
| log.Errorf("prefork: failed to kill child: %v", err) | ||
| shutdownOnce.Do(func() { close(shutdownCh) }) |
There was a problem hiding this comment.
The defer function closes shutdownCh using shutdownOnce.Do(), but shutdownCh might already be closed by the graceful context handler goroutine (lines 241-245). While sync.Once protects against double execution, there's still a subtle issue: if the context is cancelled and closes shutdownCh, then the defer runs and tries to close it again through shutdownOnce, this is handled safely. However, if neither happens and the function returns normally via an error path, the shutdownCh will be closed by defer, but the goroutine started at line 242 will continue running and may try to close it again when the context eventually gets cancelled. Consider explicitly cancelling or waiting for the goroutine to ensure clean shutdown.
| GracefulContext: ctx, | ||
| }) | ||
| // Either graceful shutdown or empty error is acceptable | ||
| if err != nil && err.Error() != "" { |
There was a problem hiding this comment.
The error checking logic "if err != nil && err.Error() != ''" is redundant. If err is not nil, err.Error() will never return an empty string. The second condition adds no value and can be simplified to just "if err != nil". This same pattern appears in multiple test files and should be simplified for clarity.
| if err != nil && err.Error() != "" { | |
| if err != nil { |
|
|
||
| ln, err = net.FileListener(f) | ||
| if err != nil { | ||
| if !cfg.DisableStartupMessage { | ||
| time.Sleep(sleepDuration) | ||
| } |
There was a problem hiding this comment.
When using file descriptor sharing mode, the file descriptor is recreated from os.NewFile() with a hardcoded FD number (inheritedListenerFD = 3). This assumes that ExtraFiles[0] always becomes FD 3 in the child, which is true according to Go documentation. However, the code doesn't verify that the file descriptor is actually valid before attempting to create a listener from it. If the child process somehow doesn't receive the expected file descriptor (e.g., due to an OS-level issue), f will be nil, but this check happens after the NewFile call. Consider adding additional validation to ensure the file descriptor is actually a valid listener socket.
| ln, err = net.FileListener(f) | |
| if err != nil { | |
| if !cfg.DisableStartupMessage { | |
| time.Sleep(sleepDuration) | |
| } | |
| // Validate that the inherited file descriptor refers to a valid, open file. | |
| if _, err := f.Stat(); err != nil { | |
| if !cfg.DisableStartupMessage { | |
| time.Sleep(sleepDuration) | |
| } | |
| _ = f.Close() | |
| return fmt.Errorf("prefork: invalid inherited listener file descriptor: %w", err) | |
| } | |
| ln, err = net.FileListener(f) | |
| if err != nil { | |
| if !cfg.DisableStartupMessage { | |
| time.Sleep(sleepDuration) | |
| } | |
| _ = f.Close() |
| // Close the provided listener since prefork will create its own listeners | ||
| if err := ln.Close(); err != nil { | ||
| log.Warnf("[prefork] failed to close provided listener: %v", err) | ||
| } | ||
|
|
||
| // Use prefork mode | ||
| // NOTE: This assumes the provided listener was created with reuseport.Listen or similar | ||
| // If the system doesn't support SO_REUSEPORT, prefork will automatically fall back | ||
| // to standard listening mode (see prefork.go for fallback logic) | ||
| log.Info("[prefork] Starting prefork mode with custom listener address") | ||
| return app.prefork(addr, tlsConfig, &cfg) |
There was a problem hiding this comment.
When prefork is enabled with a custom listener, the provided listener is closed (line 332) before calling prefork(). However, if prefork() fails (e.g., due to invalid address format or other errors), the caller has lost their listener and cannot recover. This breaks the principle of "do no harm on failure". Consider deferring the close until after prefork() succeeds, or at minimum document this behavior so users are aware that the listener will be closed even if prefork fails.
| // Close the provided listener since prefork will create its own listeners | |
| if err := ln.Close(); err != nil { | |
| log.Warnf("[prefork] failed to close provided listener: %v", err) | |
| } | |
| // Use prefork mode | |
| // NOTE: This assumes the provided listener was created with reuseport.Listen or similar | |
| // If the system doesn't support SO_REUSEPORT, prefork will automatically fall back | |
| // to standard listening mode (see prefork.go for fallback logic) | |
| log.Info("[prefork] Starting prefork mode with custom listener address") | |
| return app.prefork(addr, tlsConfig, &cfg) | |
| // Use prefork mode | |
| // NOTE: This assumes the provided listener was created with reuseport.Listen or similar | |
| // If the system doesn't support SO_REUSEPORT, prefork will automatically fall back | |
| // to standard listening mode (see prefork.go for fallback logic) | |
| log.Info("[prefork] Starting prefork mode with custom listener address") | |
| // Call prefork first; only close the original listener if prefork succeeds. | |
| if err := app.prefork(addr, tlsConfig, &cfg); err != nil { | |
| // Do not close the provided listener on failure so the caller can decide how to recover. | |
| return err | |
| } | |
| // Prefork completed successfully; the original listener is no longer needed. | |
| if err := ln.Close(); err != nil { | |
| log.Warnf("[prefork] failed to close provided listener after prefork completed: %v", err) | |
| } | |
| return nil |
| file, err := tcpLn.File() | ||
| if err != nil { | ||
| return nil, fmt.Errorf("prefork: failed to get file descriptor from listener: %w", err) | ||
| } |
There was a problem hiding this comment.
Potential file descriptor leak when starting child processes in FD sharing mode. The file descriptor obtained from tcpLn.File() is passed to cmd.ExtraFiles, but it's never explicitly closed in the parent process after the child starts. According to Go documentation, when you call File() on a TCPListener, it returns a duplicate of the listener's file descriptor. This duplicate should be closed in the parent after the child has inherited it to avoid accumulating file descriptors, especially during child recovery cycles.
| // Pass the FD to the child process via ExtraFiles | ||
| // ExtraFiles[0] will become FD 3 in the child | ||
| cmd.ExtraFiles = []*os.File{file} |
There was a problem hiding this comment.
The comment states "ExtraFiles[0] will become FD 3 in the child" which is accurate. However, this comment doesn't mention that the file descriptor should be closed in the parent process after cmd.Start() succeeds. This is an important detail that affects resource management and could help prevent the file descriptor leak identified above.
listen.go
Outdated
| // NOTE: This assumes the provided listener was created with reuseport.Listen or similar | ||
| // If the system doesn't support SO_REUSEPORT, prefork will automatically fall back | ||
| // to standard listening mode (see prefork.go for fallback logic) |
There was a problem hiding this comment.
The comment on line 337 incorrectly states "This assumes the provided listener was created with reuseport.Listen or similar". However, when a custom listener is provided to app.Listener(), the code closes it (line 332) and then calls prefork() which will either use SO_REUSEPORT or fall back to FD sharing. The assumption about how the listener was created is not relevant since it's closed before prefork starts. This comment could be misleading and should be updated or removed.
| // NOTE: This assumes the provided listener was created with reuseport.Listen or similar | |
| // If the system doesn't support SO_REUSEPORT, prefork will automatically fall back | |
| // to standard listening mode (see prefork.go for fallback logic) | |
| // Prefork will create and manage its own listeners for the given address. | |
| // It will attempt to use SO_REUSEPORT where supported and will automatically | |
| // fall back to its alternative listening strategy (see prefork.go for details). |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@prefork.go`:
- Around line 94-111: The duplicated file descriptor returned by tcpLn.File()
(stored in variable file and passed via cmd.ExtraFiles) is never closed in the
parent, causing a leak; modify the prefork logic so that after cmd.Start()
succeeds you explicitly close file in the parent (e.g., call file.Close() after
a successful cmd.Start()), but do not close it before Start() so the child can
inherit it; ensure any error paths still close the file to avoid leaks.
| file, err := tcpLn.File() | ||
| if err != nil { | ||
| return nil, fmt.Errorf("prefork: failed to get file descriptor from listener: %w", err) | ||
| } | ||
|
|
||
| // Pass the FD to the child process via ExtraFiles | ||
| // ExtraFiles[0] will become FD 3 in the child | ||
| cmd.ExtraFiles = []*os.File{file} | ||
|
|
||
| // Tell the child to use the inherited FD | ||
| cmd.Env = append(cmd.Env, | ||
| fmt.Sprintf("%s=%s", envPreforkFDKey, envPreforkFDVal), | ||
| ) | ||
| } | ||
|
|
||
| if err := cmd.Start(); err != nil { | ||
| return nil, fmt.Errorf("failed to start a child prefork process: %w", err) | ||
| } |
There was a problem hiding this comment.
File descriptor leak in parent process after starting child.
The file descriptor obtained from tcpLn.File() is a duplicate that should be closed in the parent after cmd.Start() succeeds. Currently, this FD is never closed, causing a leak each time a child process is spawned (including during recovery cycles).
🔧 Proposed fix
// Pass the FD to the child process via ExtraFiles
// ExtraFiles[0] will become FD 3 in the child
cmd.ExtraFiles = []*os.File{file}
+ // Note: file will be closed after cmd.Start() below
// Tell the child to use the inherited FD
cmd.Env = append(cmd.Env,
fmt.Sprintf("%s=%s", envPreforkFDKey, envPreforkFDVal),
)
}
if err := cmd.Start(); err != nil {
+ // Close file descriptor on error if it was opened
+ if len(cmd.ExtraFiles) > 0 && cmd.ExtraFiles[0] != nil {
+ _ = cmd.ExtraFiles[0].Close()
+ }
return nil, fmt.Errorf("failed to start a child prefork process: %w", err)
}
+
+ // Close the duplicated file descriptor in the parent process
+ // The child has inherited it and no longer needs the parent's copy
+ if len(cmd.ExtraFiles) > 0 && cmd.ExtraFiles[0] != nil {
+ if closeErr := cmd.ExtraFiles[0].Close(); closeErr != nil {
+ log.Warnf("[prefork] failed to close file descriptor after child start: %v", closeErr)
+ }
+ }
pid := cmd.Process.Pid🤖 Prompt for AI Agents
In `@prefork.go` around lines 94 - 111, The duplicated file descriptor returned by
tcpLn.File() (stored in variable file and passed via cmd.ExtraFiles) is never
closed in the parent, causing a leak; modify the prefork logic so that after
cmd.Start() succeeds you explicitly close file in the parent (e.g., call
file.Close() after a successful cmd.Start()), but do not close it before Start()
so the child can inherit it; ensure any error paths still close the file to
avoid leaks.
|
replaced by #4037 |
No description provided.