-
Notifications
You must be signed in to change notification settings - Fork 4.3k
dns.lookup: default to libc getaddrinfo on Linux #29231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -286,10 +286,19 @@ | |
| .{ "getaddrinfo", .libc }, | ||
| }); | ||
|
|
||
| // `dns.lookup` is specified to behave like getaddrinfo(3), which | ||
| // consults nsswitch.conf / /etc/hosts. On Linux the default | ||
| // backend used to be `c_ares`, which produced results that did | ||
| // not match Node for names defined only in /etc/hosts — see | ||
| // https://github.com/oven-sh/bun/issues/29227. | ||
| // | ||
| // Only `dns.lookup` routes through this default. `dns.resolve*` | ||
| // (and all record-type queries) use c-ares directly, matching | ||
| // Node's behavior. | ||
| pub const default: GetAddrInfo.Backend = switch (bun.Environment.os) { | ||
| .mac, .windows => .system, | ||
| else => .c_ares, | ||
| else => .libc, | ||
| }; | ||
|
Check failure on line 301 in src/dns.zig
|
||
|
|
||
| pub const FromJSError = JSError || error{ | ||
|
Comment on lines
286
to
303
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟣 Pre-existing inconsistency between Extended reasoning...The bug: Concrete proof: Suppose Does this PR increase the blast radius on Linux? The second refutation is correct on this point. The Linux LibC path introduced by this PR routes through: Where the bug actually lives: The Impact: In practice, POSIX Fix: In |
||
| InvalidBackend, | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,82 @@ | ||
| import { expect, test } from "bun:test"; | ||
| import { bunEnv, bunExe, isLinux } from "harness"; | ||
| import { appendFileSync, readFileSync, writeFileSync } from "node:fs"; | ||
|
|
||
| // https://github.com/oven-sh/bun/issues/29227 | ||
| // | ||
| // On Linux, `dns.lookup()` for a name that only has an IPv4 entry in | ||
| // /etc/hosts must return only IPv4, matching Node. Previously Bun's | ||
| // default backend returned an extra `::1` entry (and, because the | ||
| // default result order is `verbatim`, that `::1` became the single | ||
| // result returned by the callback form). | ||
| // | ||
| // This test requires Linux because it mutates /etc/hosts. The bug is | ||
| // Linux-specific — macOS uses LibInfo and Windows uses libuv, both | ||
| // already matching Node. | ||
| test.skipIf(!isLinux)("dns.lookup respects /etc/hosts and matches Node", async () => { | ||
| // Use a random tag so re-runs don't conflict. The tag is long enough | ||
| // that it's extremely unlikely to collide with anything on the host. | ||
| const tag = "bun-issue-29227-" + Math.random().toString(36).slice(2, 10); | ||
| const hostsEntry = `\n127.0.0.1 ${tag}\n`; | ||
|
|
||
| // /etc/hosts is a system file; snapshot-then-restore so a crashed | ||
| // test can't leave the system in a bad state. | ||
| let original: string; | ||
| try { | ||
| original = readFileSync("/etc/hosts", "utf8"); | ||
| } catch { | ||
| // Not writable / not root — skip. CI Linux is root in the container. | ||
| return; | ||
| } | ||
|
|
||
| try { | ||
| appendFileSync("/etc/hosts", hostsEntry); | ||
| } catch { | ||
| return; | ||
| } | ||
|
|
||
| try { | ||
| await using proc = Bun.spawn({ | ||
| cmd: [ | ||
| bunExe(), | ||
| "-e", | ||
| ` | ||
| const dns = require("node:dns"); | ||
| const name = ${JSON.stringify(tag)}; | ||
| dns.lookup(name, { all: true }, (err, results) => { | ||
| if (err) { console.error("ERR:" + err.code); process.exit(1); } | ||
| console.log(JSON.stringify(results)); | ||
| }); | ||
| dns.lookup(name, (err, address, family) => { | ||
| if (err) { console.error("ERR:" + err.code); process.exit(1); } | ||
| console.log("single:" + address + ":" + family); | ||
| }); | ||
| `, | ||
| ], | ||
| env: bunEnv, | ||
| stdout: "pipe", | ||
| stderr: "pipe", | ||
| }); | ||
|
|
||
| const [stdout, stderr, exitCode] = await Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited]); | ||
|
|
||
| // Filter out the ASAN warning that debug builds print to stderr. | ||
| const realStderr = stderr | ||
| .split("\n") | ||
| .filter(l => l && !l.includes("ASAN")) | ||
| .join("\n"); | ||
| expect(realStderr).toBe(""); | ||
| expect(exitCode).toBe(0); | ||
|
|
||
|
Comment on lines
+67
to
+74
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Avoid empty-
Suggested patch- // Filter out the ASAN warning that debug builds print to stderr.
- const realStderr = stderr
- .split("\n")
- .filter(l => l && !l.includes("ASAN"))
- .join("\n");
- expect(realStderr).toBe("");
- expect(exitCode).toBe(0);
+ // Keep stderr available for debugging but avoid empty-stderr assertions.
+ const realStderr = stderr
+ .split("\n")
+ .filter(l => l && !l.includes("ASAN"))
+ .join("\n");
// Find the `all` array and the single-form line in the output.
const lines = stdout.trim().split("\n");
const allLine = lines.find(l => l.startsWith("["))!;
const singleLine = lines.find(l => l.startsWith("single:"))!;
expect(JSON.parse(allLine)).toEqual([{ address: "127.0.0.1", family: 4 }]);
expect(singleLine).toBe("single:127.0.0.1:4");
+ expect(exitCode).toBe(0);Based on learnings: in Also applies to: 76-81 🤖 Prompt for AI Agents |
||
| // Find the `all` array and the single-form line in the output. | ||
| const lines = stdout.trim().split("\n"); | ||
| const allLine = lines.find(l => l.startsWith("["))!; | ||
| const singleLine = lines.find(l => l.startsWith("single:"))!; | ||
|
Comment on lines
+72
to
+74
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 The new regression test asserts Extended reasoning...What the bug is CLAUDE.md (line 128) states: "When spawning processes, tests should The specific code path After the subprocess is awaited, the test makes the following assertions in this order:
The stdout content assertions (lines 80–81) come after the exitCode assertion (line 73). Why existing code doesn't prevent it There is no lint rule or type check enforcing assertion order; it is a documentation-only convention. The code compiles and runs fine, the wrong order is just a quality issue for diagnostics. Impact If the child process exits with code 1 (e.g., because This tells the developer nothing about why it failed. If the stdout assertions were first, the failure would show the actual JSON output printed by the subprocess, immediately revealing the root cause (wrong address, extra Concrete proof Suppose the subprocess prints
How to fix Move expect(realStderr).toBe('');
// stdout assertions first (CLAUDE.md §128)
expect(JSON.parse(allLine)).toEqual([{ address: '127.0.0.1', family: 4 }]);
expect(singleLine).toBe('single:127.0.0.1:4');
expect(exitCode).toBe(0); |
||
|
|
||
| expect(JSON.parse(allLine)).toEqual([{ address: "127.0.0.1", family: 4 }]); | ||
| expect(singleLine).toBe("single:127.0.0.1:4"); | ||
| } finally { | ||
| // Always restore /etc/hosts, even if assertions fail. | ||
| writeFileSync("/etc/hosts", original); | ||
| } | ||
| }); | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔴 When
getaddrinfo()returnsEAI_AGAIN(temporary DNS failure — DNS server unreachable, empty/etc/resolv.conf, network partition), the new Linux.libcdefault backend maps it throughinitEAI()which has noAGAINcase, silently returningError.ENOTIMPinstead ofError.ETIMEOUT. Users in Docker/Kubernetes environments will seeerr.code='ENOTIMP'/'DNS resolver does not implement requested operation'for what should be a transient, retryable timeout error. Fix: add.AGAIN => Error.ETIMEOUTto the Linux-specific switch ininitEAIinsrc/deps/c_ares.zig(matching the Windows path at c_ares.zig:1773:UV_EAI_AGAIN => Error.ETIMEOUT).Extended reasoning...
What the bug is
When
getaddrinfo()returnsEAI_AGAIN(-3, "Temporary failure in name resolution"), the new Linux default.libcbackend maps it to the wrong error code. Users receiveerr.code = 'ENOTIMP'("DNS resolver does not implement requested operation") instead of the semantically correctETIMEOUT. This is a meaningful mismatch:ENOTIMPsignals an unsupported operation and is non-retryable in most application code, whereasETIMEOUTcorrectly signals a transient, retryable failure.The specific code path that triggers it
The call chain for the new
.libcbackend on Linux whengetaddrinfo()fails:LibC.run()(dns.zig ~L773): stores.err = @intFromEnum(err)whereerr = EAI_AGAIN = -3then()(dns.zig ~L831):.errbranch callsgetAddrInfoAsyncCallback(-3, null, this)getAddrInfoAsyncCallback(dns.zig:727): callshead.processGetAddrInfoNative(-3, null)processGetAddrInfoNative(dns.zig:1073): callsc_ares.Error.initEAI(-3)initEAI(c_ares.zig ~L1763):-3maps to.AGAINvia@enumFromInt; no case matches; falls toelse => bun.todo(@src(), Error.ENOTIMP)In release builds,
bun.todo()logs nothing and silently returns the value — no panic, no assert, no visible warning.Why existing code doesn't prevent it
The
initEAIfunction has three layers of case handling:.AGAIN \!= .NODATAand\!= .NONAME— miss.SOCKTYPE,.IDN_ENCODE,.ALLDONE,.INPROGRESS,.CANCELED,.NOTCANCELED— miss (else => {} fallthrough)0,.ADDRFAMILY,.BADFLAGS,.FAIL,.FAMILY,.MEMORY,.SERVICE,.SYSTEM— miss (else => bun.todo return ENOTIMP)The Windows path correctly handles this via
libuv.UV_EAI_AGAIN => Error.ETIMEOUTbut that branch is not taken on Linux.The blast-radius increase from this PR
Before this PR: Linux defaulted to
.c_ares, which routes DNS errors throughprocessGetAddrInfo()— a completely different code path that never callsinitEAI. Only users who explicitly passed{ backend: 'libc' }or{ backend: 'system' }hit this bug; they were a small minority.After this PR:
.libcis the new Linux default for alldns.lookup()calls. Every Linux user who hits a temporary DNS failure (empty/etc/resolv.conf, DNS server temporarily down, Docker/Kubernetes service not yet ready, network partition) now getsENOTIMPinstead ofETIMEOUT. Ironically, one of the issues this PR claims to fix (#19086 — ioredisEAI_AGAINfor Docker service hostnames) would now fail with a different wrong error code.Step-by-step proof
Suppose a Docker container runs
dns.lookup('my-service', cb)while the Docker DNS resolver is temporarily unavailable:std.c.getaddrinfo('my-service', ...)returnsEAI_AGAIN(-3)LibC.run()storesthis.* = .{ .err = -3 }then()callsgetAddrInfoAsyncCallback(-3, null, this)processGetAddrInfoNative(-3, null)→c_ares.Error.initEAI(-3)@enumFromInt(-3)yields.AGAIN(glibcEAI_AGAIN=-3per/usr/include/netdb.h)bun.todo(@src(), Error.ENOTIMP)→ returnsError.ENOTIMPerr.code = 'DNS_ENOTIMP', message ='DNS resolver does not implement requested operation'Application code checking
err.code === 'ENOTFOUND'orerr.code === 'ETIMEOUT'for retry logic will not retry, causing hard failures for what should be a transient condition.How to fix
Add
.AGAIN => Error.ETIMEOUTto the Linux-specific switch ininitEAIinsrc/deps/c_ares.zig, matching what Windows does withUV_EAI_AGAIN => Error.ETIMEOUT.