fix(commands): apply batch — multi-node, node-file patch merge, error handling#128
fix(commands): apply batch — multi-node, node-file patch merge, error handling#128
Conversation
The field has been declared on engine.Options since the file was created (c567135, 2024-05) but was never read by Render or any helper. apply.go and template.go were assigning it from their respective --insecure flags; apply_test.go asserted the assignment. Nothing else looked at the value. The runtime guarantee — --insecure (maintenance mode) bypasses FailIfMultiNodes — is provided by WithClientMaintenance not injecting nodes into the gRPC context, not by anything inside engine.Render. The dead field added nothing on top of that. Drop the field, the assignments, and the assertion. golangci-lint is clean. Closes #123 Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
helpers.FailIfMultiNodes(ctx, name) embeds the name in its error message. The call site in engine.Render hardcoded "talm template", which was accurate when Render was only called from the template subcommand. PR #119 made apply call Render too, so users running `talm apply` with a multi-node modeline saw an error talking about `talm template` — confusing. Add Options.CommandName, default to "talm" when empty, set "talm apply" in apply's buildApplyRenderOptions and "talm template" in template's option-build. TestRenderFailIfMultiNodes_UsesCommandName covers both subcommands plus the empty-string fallback and explicitly asserts the historical "talm template" no longer leaks into the apply case. Closes #121 Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
…turns
The per-file loop in apply had two branches with asymmetric error
handling:
- template-render: `err = withApplyClient(...)` (assigning the outer
loop variable) followed by an outer `if err != nil { return err }`
- direct-patch: `if err := withApplyClient(...); err != nil { return err }`
(local scope, immediate return)
The outer check only caught errors from the template-render branch.
Any future tweak to the direct-patch branch could silently swallow an
error — exactly the shadowing class that already bit the project once
(commit b34781e).
Switch the template-render branch to the same local-scope idiom and
delete the outer check. Both branches now read identically and there
is no dead code path to misuse.
Closes #122
Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Before: when a node file's modeline contained `templates=[...]`, both
`talm apply -f node.yaml` and `talm template -f node.yaml` rendered
the listed templates and discarded the rest of the node file. Per-node
hostname, secondary interfaces with deviceSelector, VIP placement, etcd
extraArgs, certSANs and time servers all silently vanished — a
multi-NIC control plane could not bootstrap because the etcd
advertised-subnet interface was missing from the applied config.
Add engine.MergeFileAsPatch using Talos configpatcher (LoadPatches +
Apply). Apply the node file as a strategic merge patch over the
rendered template in both:
- apply.go template-rendering branch (after engine.Render, before
c.ApplyConfiguration)
- template.go templateWithFiles (after generateOutput, before
inplace-write or stdout)
Same merge step in both keeps the documented piped flow
`talm template -f X | talm apply -f -` carrying the node body through
end to end.
A modeline-only node file (no Talos config body) becomes a patch with
empty content; LoadPatches still returns a patch object, but Apply has
nothing to merge — every rendered field round-trips through the
configpatcher's loader unchanged.
TestMergeFileAsPatch covers both paths: the body-overlay case asserts
the custom hostname and the deviceSelector secondary interface land in
the merged output and the auto-generated hostname is gone; the
modeline-only case asserts every rendered field survives.
Closes #126
Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
PR #119 moved talm apply's template rendering from offline to online. The new online path runs inside withApplyClient, whose wrapWithNodeContext helper batches every node from GlobalArgs.Nodes into a single gRPC context. engine.Render then calls helpers.FailIfMultiNodes which rejects multi-node contexts, so a node file with `nodes=[ip1, ip2]` (or `--nodes A,B,C` on the command line) fails before any rendering happens. The pre-#119 direct-patch flow handled multi-node fan-out at the wire level inside ApplyConfiguration, so this never surfaced. Render once per node instead. Add applyTemplatesPerNode that takes a nodes slice and injection points for the render and apply functions, then iterates: for each node it builds a single-node context with client.WithNodes, calls engine.Render (which now sees one node and satisfies the FailIfMultiNodes guard), merges the modeline'd file as a patch, and applies the result. Per-node iteration is also the correct semantic — discovery via lookup() resolves each node's own network topology rather than mashing everything together. Split withApplyClient into a public form (still wraps with the legacy multi-node context for the direct-patch branch) and withApplyClientBare which skips the wrap so the per-node loop can attach contexts itself. Three tests pin the contract: that each render and apply call sees exactly one node in its outgoing-context metadata; that engine.Render with a batched multi-node context still errors (the pre-condition the loop exists to satisfy); that an empty nodes slice errors loudly rather than silently doing nothing. Manually verified the loop assertion fails when the per-iteration context is built with all nodes instead of one. Closes #120 Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
…ure mode
Branch review caught four blockers and three majors in the original
implementation. Fixes:
- engine.MergeFileAsPatch now short-circuits modeline-only patch files
(BLOCKER 2). The Talos config-patcher misclassifies a YAML-comment-only
file as an empty JSON6902 patch, then refuses any multi-document
rendered config (the v1.12+ output format) with 'JSON6902 patches are
not supported for multi-document machine configuration'. Detect 'no
meaningful YAML content' before LoadPatches and return rendered
unchanged in that case. Fix also covers empty files and files with only
comments / document separators / whitespace.
- talm template no longer routes its output through MergeFileAsPatch
(BLOCKER 1, BLOCKER 5, MEDIUM 8). The patcher round-trips through its
config loader, normalising YAML formatting, sorting keys, and (worst
of all) stripping every YAML comment — which deletes the talm modeline
('# talm: ...') and the AUTOGENERATED warning that downstream commands
rely on. Removing the template-side merge keeps stdout and --in-place
outputs intact. The merge stays in apply because that output goes
straight to ApplyConfiguration where formatting and comments do not
matter.
- The pipe-flow comment that motivated the template-side merge is gone
(MAJOR 6). talm apply does not in fact accept stdin via '-f -'
(ExpandFilePaths only handles real paths), so the documented workflow
was a fiction.
- Insecure (maintenance) mode now opens a fresh single-endpoint client
per node (BLOCKER 4). WithClientMaintenance creates one client with
every endpoint and gRPC then round-robins ApplyConfiguration across
them, so most nodes never received the config. Narrow GlobalArgs.Nodes
to the iteration's node and restore it via defer; this requires
splitting applyTemplatesPerNode's client-acquisition into an injected
openClientFunc with two production implementations
(openClientPerNodeMaintenance, openClientPerNodeAuth).
- TestApplyTemplatesPerNode_BatchedContextIsRejected was renamed and
rewritten as TestApplyTemplatesPerNode_NeverBatchesNodes, which
asserts the loop itself never produces a multi-node ctx (MAJOR 7).
The previous test happened to call engine.Render and pass; it would
not have caught a regression in applyTemplatesPerNode.
- Two new tests cover the insecure path
(TestApplyTemplatesPerNode_MaintenanceModeOpensFreshClientPerNode and
TestOpenClientPerNodeMaintenance_NarrowsAndRestoresGlobalNodes).
TestMergeFileAsPatch grew three sub-tests covering multi-doc patching,
empty files, and comments-only files. All five new tests fail without
the corresponding fix.
- Doc comments cleaned up (MINORS 9, 10, 11): the FailIfMultiNodes
rationale lives in one place (applyTemplatesPerNode); MergeFileAsPatch
wraps the out.Bytes() error with file context; withApplyClient and
withApplyClientBare now describe what they actually do.
Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…ferences - engine.MergeFileAsPatch now reads the patch file once (was twice — one ReadFile and one LoadPatches with @-syntax) by switching from configpatcher.LoadPatches to LoadPatch on the already-read bytes. No TOCTOU window between the empty-detection check and patch loading. - New TestMergeFileAsPatch sub-test covers the realistic Talos v1.12+ apply scenario the previous test set missed: multi-document rendered config (legacy machine/cluster doc plus typed HostnameConfig and LinkConfig docs) overlaid by a non-empty strategic-merge patch in the node file. The assertion checks that the legacy machine doc absorbs the patch and the sibling typed docs survive untouched. - New TestTemplateAndApplyDiverge_NodeBodyOverlayLimitation pins the intentional divergence between talm template and talm apply for files with a non-empty body. Apply overlays the body via the config-patcher before sending; template emits the rendered bytes verbatim so the modeline and the AUTOGENERATED-warning comment survive in stdout/in- place output. A future patch that tries to make template match apply will trip this test. - openClientPerNodeAuth now uses client.WithNode (single-target metadata) instead of client.WithNodes (aggregation metadata). The per-iteration intent is one node, not a collection of one — naming matches semantics. Test helper nodesFromOutgoingCtx reads either key so the assertions stay valid. - Removed all internal-tracker references and review-round annotations from test docstrings. Tests are now self-describing: each one names the contract it pins, not the issue ticket that motivated it. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
- Sync openClientFunc docstring with the WithNode (single-target) metadata key the auth-mode opener actually uses; the old docstring still mentioned WithNodes from the previous version. - TestOpenClientPerNodeMaintenance_NarrowsAndRestoresGlobalNodes now drives the production openClientPerNodeMaintenance with an injected fake maintenanceClientFunc instead of an inline stub. The fake snapshots GlobalArgs.Nodes at the moment WithClientMaintenance reads it, proving the narrow-and-restore-via-defer logic is exercised end to end. A regression in the production function now fails this test (previously it would have continued to pass against the test-local stub). - New TestApplyTemplatesPerNode_AuthModeUsesSingleNodeMetadataKey pins the gRPC metadata key the auth-mode opener writes so a future swap back to WithNodes (which would round-trip through nodesFromOutgoingCtx unnoticed) gets caught. - README documents that node files can carry per-node patches in their body, that talm apply applies them as a strategic merge over the rendered template, and that talm template intentionally does not. Recommends apply --dry-run for previewing the exact bytes apply will send. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
CommandName is read by engine.Render to format the FailIfMultiNodes error wording. The engine-side assertion (TestRenderFailIfMultiNodes_UsesCommandName) verifies Render does the right thing with whatever value it receives, but nothing was pinning that buildApplyRenderOptions sets it to "talm apply" in the first place. A future refactor that drops the field would slip through unnoticed. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
📝 WalkthroughWalkthroughThe changes implement per-node template rendering with merge patch semantics in Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Apply as apply.go
participant Client as Talos Client
participant Engine as engine.go
participant Node as Talos Node
User->>Apply: talm apply -f node.yaml<br/>(with templates= in modeline)
loop Per-node (applyTemplatesPerNode)
Apply->>Client: Open per-node client<br/>(maintenance or auth mode)
Apply->>Engine: Render(ctx, opts)<br/>render template
Engine-->>Apply: rendered bytes
Apply->>Engine: MergeFileAsPatch<br/>(rendered, node.yaml)
Engine->>Engine: Load patch file<br/>Apply strategic merge
Engine-->>Apply: merged bytes
Apply->>Client: ApplyConfiguration(merged)
Client->>Node: Apply merged config
Node-->>Client: OK
Client-->>Apply: Success
end
Apply-->>User: Applied to all nodes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request refactors the talm apply template-rendering path to process nodes individually, enabling per-node configuration overlays via strategic merge patches. It introduces the MergeFileAsPatch utility and adds comprehensive tests for both authenticated and maintenance modes. Review feedback suggests ensuring node resolution falls back to the Talos context when CLI arguments are missing, supporting multi-document patches in node files, improving error messaging for missing nodes, and optimizing performance by moving file I/O for patches outside the per-node loop.
| if err := withApplyClientBare(func(parentCtx context.Context, c *client.Client) error { | ||
| openClient := openClientPerNodeAuth(parentCtx, c) |
There was a problem hiding this comment.
In the authenticated template rendering path, the list of nodes is captured from GlobalArgs.Nodes before the client is created. Unlike the direct patch path (which uses wrapWithNodeContext), this path does not fall back to the nodes defined in the talosconfig context if GlobalArgs.Nodes is empty. This means talm apply will fail if nodes are not explicitly provided via CLI or modeline, even if they are defined in the current Talos context.
if err := withApplyClientBare(func(parentCtx context.Context, c *client.Client) error {
if len(nodes) == 0 {
if configContext := c.GetConfigContext(); configContext != nil {
nodes = configContext.Nodes
}
}
openClient := openClientPerNodeAuth(parentCtx, c)| patch, err := configpatcher.LoadPatch(patchBytes) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("loading patch from %s: %w", patchFile, err) | ||
| } | ||
| out, err := configpatcher.Apply(configpatcher.WithBytes(rendered), []configpatcher.Patch{patch}) |
There was a problem hiding this comment.
configpatcher.LoadPatch is used to load the patch from the node file. However, LoadPatch typically only handles a single YAML document. Since Talos v1.12+ and talm templates often produce multi-document configurations, users might need to provide multi-document patches in their node files to target different parts of the configuration. Using configpatcher.LoadPatches (plural) would be more consistent with other parts of the codebase (like FullConfigProcess) and provide better support for complex configurations.
| patch, err := configpatcher.LoadPatch(patchBytes) | |
| if err != nil { | |
| return nil, fmt.Errorf("loading patch from %s: %w", patchFile, err) | |
| } | |
| out, err := configpatcher.Apply(configpatcher.WithBytes(rendered), []configpatcher.Patch{patch}) | |
| patch, err := configpatcher.LoadPatches([]string{string(patchBytes)}) | |
| if err != nil { | |
| return nil, fmt.Errorf("loading patches from %s: %w", patchFile, err) | |
| } | |
| out, err := configpatcher.Apply(configpatcher.WithBytes(rendered), patch) |
| apply applyFunc, | ||
| ) error { | ||
| if len(nodes) == 0 { | ||
| return fmt.Errorf("no nodes specified for template-rendering apply") |
There was a problem hiding this comment.
The error message when no nodes are specified is less descriptive than the one used in the template command. Providing a consistent and helpful error message that suggests how to fix the issue (e.g., using the --nodes flag or modeline) would improve the user experience.
| return fmt.Errorf("no nodes specified for template-rendering apply") | |
| return fmt.Errorf("nodes are not set for the command: please use '--nodes' flag or configuration file to set the nodes to run the command against") |
| // ApplyConfiguration) but unsuitable for human-facing output such as | ||
| // `talm template` — which is why the template subcommand does not call | ||
| // this helper. | ||
| func MergeFileAsPatch(rendered []byte, patchFile string) ([]byte, error) { |
There was a problem hiding this comment.
The MergeFileAsPatch function reads the patchFile from disk and performs an 'effectively empty' check on every call. Since this function is called within a loop over nodes in applyTemplatesPerNode (via renderMergeAndApply), it leads to redundant file I/O and processing. For large clusters, this could become a performance bottleneck. Consider refactoring this to load the patch data once outside the per-node loop and pass the pre-loaded data or patch object to the rendering logic.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/commands/apply.go`:
- Around line 251-265: applyTemplatesPerNode currently re-uses a single
configFile body for every node and calls renderMergeAndApply for each target,
which causes node-specific overlays (hostname, addresses, VIPs) to be applied to
all nodes; update applyTemplatesPerNode to detect when the provided configFile
contains a non-empty overlay and, if len(nodes) > 1, reject the operation with a
clear error, or alternatively resolve/split the overlay per target before
entering the for loop so each openClient call uses a node-specific config;
ensure the same change/error check is applied to the other similar call site
that invokes renderMergeAndApply (the second occurrence referenced in the
comment).
In `@README.md`:
- Around line 143-158: Adjust the paragraph about "Per-node patches inside node
files" to avoid claiming legacy node-body fields (e.g., deviceSelector
interfaces, VIPs, machine.network.interfaces) survive talm apply for Talos
v1.12+ multi-doc output: either restrict the statement to legacy/single-doc
template behavior only, or explicitly add that for v1.12+ multi-doc mode users
must patch the typed resources
(LinkConfig/BondConfig/VLANConfig/Layer2VIPConfig) because the multi-doc path
intentionally ignores legacy machine.network.interfaces due to lack of a safe
1:1 translation; mention talm apply -f node.yaml and talm template -f node.yaml
where appropriate to guide readers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d8100954-d5ef-4b44-96b9-51a4d161b29a
📒 Files selected for processing (6)
README.mdpkg/commands/apply.gopkg/commands/apply_test.gopkg/commands/template.gopkg/engine/engine.gopkg/engine/render_test.go
| func applyTemplatesPerNode( | ||
| opts engine.Options, | ||
| configFile string, | ||
| nodes []string, | ||
| openClient openClientFunc, | ||
| render renderFunc, | ||
| apply applyFunc, | ||
| ) error { | ||
| if len(nodes) == 0 { | ||
| return fmt.Errorf("no nodes specified for template-rendering apply") | ||
| } | ||
| for _, node := range nodes { | ||
| if err := openClient(node, func(ctx context.Context, c *client.Client) error { | ||
| return renderMergeAndApply(ctx, c, opts, configFile, render, apply) | ||
| }); err != nil { |
There was a problem hiding this comment.
Don't replay the same node-body overlay across a multi-node modeline.
applyTemplatesPerNode renders per target, but renderMergeAndApply merges the same configFile body on every iteration. For a file whose modeline targets ["10.0.0.1","10.0.0.2"], a pinned hostname, address, or VIP in that body now gets stamped onto both machines. Please reject non-empty overlays when len(nodes) > 1, or split/resolve them per target before entering this loop.
Also applies to: 311-320
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/commands/apply.go` around lines 251 - 265, applyTemplatesPerNode
currently re-uses a single configFile body for every node and calls
renderMergeAndApply for each target, which causes node-specific overlays
(hostname, addresses, VIPs) to be applied to all nodes; update
applyTemplatesPerNode to detect when the provided configFile contains a
non-empty overlay and, if len(nodes) > 1, reject the operation with a clear
error, or alternatively resolve/split the overlay per target before entering the
for loop so each openClient call uses a node-specific config; ensure the same
change/error check is applied to the other similar call site that invokes
renderMergeAndApply (the second occurrence referenced in the comment).
| > **Per-node patches inside node files.** A node file can carry Talos config | ||
| > below its modeline (for example, a custom `hostname`, secondary | ||
| > interfaces with `deviceSelector`, VIP placement, or extra etcd args). | ||
| > When `talm apply -f node.yaml` runs the template-rendering branch, that | ||
| > body is applied as a strategic merge patch on top of the rendered | ||
| > template before the result is sent to the node — so per-node fields | ||
| > survive even when the template auto-generates conflicting values | ||
| > (e.g. `hostname: talos-XXXXX`). | ||
| > | ||
| > `talm template -f node.yaml` (with or without `-I`) does **not** apply | ||
| > the same overlay: its output is the rendered template plus the modeline | ||
| > and the auto-generated warning, byte-identical to what the template | ||
| > alone would produce. Routing it through the patcher would drop every | ||
| > YAML comment (including the modeline) and re-sort keys, breaking | ||
| > downstream commands that read the file back. Use `apply --dry-run` if | ||
| > you want to preview the exact bytes that will be sent to the node. |
There was a problem hiding this comment.
Scope this overlay note away from Talos v1.12+ network overrides.
This text currently promises that legacy node-body fields such as deviceSelector interfaces and VIPs survive talm apply, but on the multi-doc path those machine.network.interfaces fragments still do not have a safe 1:1 mapping to LinkConfig/BondConfig/VLANConfig/Layer2VIPConfig. As written, this will mislead v1.12+ users into relying on overrides that are not represented semantically the way the docs imply. Please either limit the claim to legacy/single-doc Talos output or explicitly say that v1.12+ requires patching the typed resources instead.
Based on learnings, the multidoc path intentionally ignores legacy machine.network.interfaces because it has no safe 1:1 translation to Talos v1.12 typed resources.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@README.md` around lines 143 - 158, Adjust the paragraph about "Per-node
patches inside node files" to avoid claiming legacy node-body fields (e.g.,
deviceSelector interfaces, VIPs, machine.network.interfaces) survive talm apply
for Talos v1.12+ multi-doc output: either restrict the statement to
legacy/single-doc template behavior only, or explicitly add that for v1.12+
multi-doc mode users must patch the typed resources
(LinkConfig/BondConfig/VLANConfig/Layer2VIPConfig) because the multi-doc path
intentionally ignores legacy machine.network.interfaces due to lack of a safe
1:1 translation; mention talm apply -f node.yaml and talm template -f node.yaml
where appropriate to guide readers.
| func TestTemplateAndApplyDiverge_NodeBodyOverlayLimitation(t *testing.T) { | ||
| const rendered = `version: v1alpha1 | ||
| machine: | ||
| type: controlplane | ||
| network: | ||
| hostname: talos-abcde | ||
| ` | ||
| dir := t.TempDir() | ||
| nodeFile := filepath.Join(dir, "node.yaml") | ||
| const nodeBody = `# talm: nodes=["10.0.0.1"] | ||
| machine: | ||
| network: | ||
| hostname: node0 | ||
| ` | ||
| if err := os.WriteFile(nodeFile, []byte(nodeBody), 0o644); err != nil { | ||
| t.Fatalf("write node file: %v", err) | ||
| } | ||
|
|
||
| // apply path: renders, then overlays. | ||
| merged, err := engine.MergeFileAsPatch([]byte(rendered), nodeFile) | ||
| if err != nil { | ||
| t.Fatalf("MergeFileAsPatch: %v", err) | ||
| } | ||
| if !strings.Contains(string(merged), "hostname: node0") { | ||
| t.Errorf("apply path must overlay hostname: node0; got:\n%s", string(merged)) | ||
| } | ||
|
|
There was a problem hiding this comment.
🟡 The 'template path' block in TestTemplateAndApplyDiverge_NodeBodyOverlayLimitation contains two assertions that are permanently vacuous: templateOutput := []byte(rendered) simply copies the local constant, so bytes.Equal(templateOutput, []byte(rendered)) is always true (the negated check never fires) and strings.Contains(string(templateOutput), "hostname: node0") is always false because the rendered constant contains hostname: talos-abcde. The test comment claims this block will catch a future 'make template match apply' regression, but no production template code is exercised, so such a regression would pass undetected. The apply-path half of the test (calling engine.MergeFileAsPatch) is legitimate; only the template-path assertions are hollow.
Extended reasoning...
What the bug is and how it manifests
In TestTemplateAndApplyDiverge_NodeBodyOverlayLimitation (pkg/commands/apply_test.go), the 'template path' block is supposed to pin the invariant that talm template does NOT overlay the node-file body onto the rendered output. Instead, the block performs no actual test: it creates templateOutput as a direct byte-copy of the rendered constant and then asserts two things that are structurally impossible to fail.
The specific code path that triggers it
Lines 543-569 of apply_test.go (template-path block):
Assertion 1: bytes.Equal(templateOutput, []byte(rendered)) compares rendered to itself. The result is always true, so the negated guard is always false and t.Error is never reached.
Assertion 2: strings.Contains(string(templateOutput), "hostname: node0") searches for node0 inside rendered, which contains hostname: talos-abcde — never node0. This condition is always false and t.Error is never reached.
Why existing code does not prevent it
Neither the compiler nor go test has any way to detect that a predicate is structurally always-false. The test compiles and passes on every run, regardless of what generateOutput(), templateWithFiles(), or any related production code does.
What the impact would be
If a future contributor makes talm template call engine.MergeFileAsPatch (the exact regression the test comment warns about), both assertions would still pass. The first passes because templateOutput is still the copy of the constant rendered, not the output of generateOutput; the second passes because rendered still does not contain node0. The test gives false confidence in the template/apply divergence invariant.
How to fix it
The template-path block should actually call the production template code — specifically generateOutput() (or an equivalent helper) — capture its output as templateOutput, and then assert the two properties. That way a change to generateOutput that starts applying the overlay will cause hostname: talos-abcde to be replaced by hostname: node0 in the output, making the second assertion fire as intended.
Step-by-step proof
renderedis the constant string containinghostname: talos-abcde.templateOutput := []byte(rendered)creates a byte slice with exactly the same content asrendered.bytes.Equal(templateOutput, []byte(rendered))— both sides are identical byte sequences → always returnstrue.- The guard
\!bytes.Equal(...)is alwaysfalse→ the firstt.Erroris unreachable on every run. strings.Contains(string(templateOutput), "hostname: node0")searches fornode0in a string that only containstalos-abcde→ always returnsfalse.- The second
t.Erroris also unreachable on every run. - Both assertions pass permanently regardless of what the production template code does, providing only false confidence in the divergence invariant.
Summary
Five issues clustered in
pkg/commands/apply.goandpkg/engine/engine.go. All were either introduced or surfaced by PR #119 (online template rendering in apply). Bundled because they share the same callback structure and overlap heavily.Per-node template rendering in
talm apply. PR fix(commands): render templates online in apply to resolve lookups #119 moved rendering inside the Talos client callback solookup()resolves real discovery data, but the callback batched all--nodes A,B,C(or modelinenodes=[A,B,C]) into a single gRPC context.engine.Renderthen hithelpers.FailIfMultiNodesand bailed out before any rendering happened. Render once per node instead, each iteration carrying a single-nodeclient.WithNodecontext so the guard passes and discovery resolves per node. In insecure (maintenance) mode each iteration opens a fresh single-endpoint maintenance client becauseWithClientMaintenancewould otherwise dial every endpoint at once and gRPC would round-robinApplyConfigurationacross them — most nodes would never receive the config.Node file is now applied as a patch over the rendered template. When a modeline contained
templates=[...],talm apply -f node.yamlrendered the listed templates and discarded the rest of the node file. Per-node hostname, secondary interfaces withdeviceSelector, VIP placement, etcd extra args — all silently lost. Newengine.MergeFileAsPatchoverlays the body using the Talos config-patcher; modeline-only and effectively-empty files short-circuit before reaching the patcher (otherwise the patcher routes empty input through JSON6902 and rejects any multi-document rendered config — the v1.12+ default output format).talm templateintentionally does not call this overlay because the patcher round-trips through its config loader, dropping the modeline and YAML comments downstream commands rely on; the README documents the divergence.Multi-node error message references the actual subcommand.
helpers.FailIfMultiNodeswas hardcoded to"talm template". After fix(commands): render templates online in apply to resolve lookups #119, users runningtalm applysaw an error talking abouttalm template. ThreadedCommandNamethroughengine.Options; defaults to"talm"when empty.Symmetric error handling in the apply per-file loop. The template-render branch assigned
err = withApplyClient(...)and relied on an outerif err != nil; the direct-patch branch already used local-scopeif err := ...; err != nil { return err }. Two idioms, one outer check that only caught one branch — the same shadowing class that already bit the project once. Both branches now use the local-scope return; the outer check is gone.Removed dead
engine.Options.Insecure. The field has been declared since the file was first added (May 2024) and was never read byRenderor any helper. The runtime guarantee —--insecure(maintenance mode) bypassesFailIfMultiNodes— comes fromWithClientMaintenancenot injecting nodes into the gRPC context, not from anything in the engine.Verification
TestRenderFailIfMultiNodes_UsesCommandName— three sub-tests asserting theCommandNamevalue surfaces in the error and the empty-string fallback.TestMergeFileAsPatch— overlay (single-doc + non-empty patch), modeline-only byte-identity, multi-doc + modeline-only (the regression vector), multi-doc + non-empty strategic-merge patch (the realistic v1.12+ scenario), empty file, comments-only file.TestApplyTemplatesPerNode_*— per-node single-node ctx; never-batches; empty-nodes errors loudly; maintenance mode opens a fresh client per iteration; auth mode writes the single-targetnodemetadata key.TestOpenClientPerNodeMaintenance_NarrowsAndRestoresGlobalNodes— drives the production opener with an injectedmaintenanceClientFuncfake that snapshotsGlobalArgs.Nodesat the momentWithClientMaintenancereads it.TestTemplateAndApplyDiverge_NodeBodyOverlayLimitation— pins the intentional template/apply divergence so a future "make template match apply" change is caught.TestBuildApplyRenderOptions— pins that the apply caller actually setsCommandName(independent of the engine-side coverage).golangci-lint run ./...is clean.Closes
Summary by CodeRabbit
Release Notes
New Features
talm apply, preserving per-node configurations even when templates generate conflicting values.talm apply --dry-runto preview exact bytes sent to nodes.Documentation
talm applyandtalm templatecommands.