Skip to content

fix(commands): apply batch — multi-node, node-file patch merge, error handling#128

Open
lexfrei wants to merge 9 commits intomainfrom
fix/apply-batch-multi-node-and-patch-merge
Open

fix(commands): apply batch — multi-node, node-file patch merge, error handling#128
lexfrei wants to merge 9 commits intomainfrom
fix/apply-batch-multi-node-and-patch-merge

Conversation

@lexfrei
Copy link
Copy Markdown
Contributor

@lexfrei lexfrei commented Apr 16, 2026

Summary

Five issues clustered in pkg/commands/apply.go and pkg/engine/engine.go. All were either introduced or surfaced by PR #119 (online template rendering in apply). Bundled because they share the same callback structure and overlap heavily.

  1. Per-node template rendering in talm apply. PR fix(commands): render templates online in apply to resolve lookups #119 moved rendering inside the Talos client callback so lookup() resolves real discovery data, but the callback batched all --nodes A,B,C (or modeline nodes=[A,B,C]) into a single gRPC context. engine.Render then hit helpers.FailIfMultiNodes and bailed out before any rendering happened. Render once per node instead, each iteration carrying a single-node client.WithNode context so the guard passes and discovery resolves per node. In insecure (maintenance) mode each iteration opens a fresh single-endpoint maintenance client because WithClientMaintenance would otherwise dial every endpoint at once and gRPC would round-robin ApplyConfiguration across them — most nodes would never receive the config.

  2. Node file is now applied as a patch over the rendered template. When a modeline contained templates=[...], talm apply -f node.yaml rendered the listed templates and discarded the rest of the node file. Per-node hostname, secondary interfaces with deviceSelector, VIP placement, etcd extra args — all silently lost. New engine.MergeFileAsPatch overlays the body using the Talos config-patcher; modeline-only and effectively-empty files short-circuit before reaching the patcher (otherwise the patcher routes empty input through JSON6902 and rejects any multi-document rendered config — the v1.12+ default output format). talm template intentionally does not call this overlay because the patcher round-trips through its config loader, dropping the modeline and YAML comments downstream commands rely on; the README documents the divergence.

  3. Multi-node error message references the actual subcommand. helpers.FailIfMultiNodes was hardcoded to "talm template". After fix(commands): render templates online in apply to resolve lookups #119, users running talm apply saw an error talking about talm template. Threaded CommandName through engine.Options; defaults to "talm" when empty.

  4. Symmetric error handling in the apply per-file loop. The template-render branch assigned err = withApplyClient(...) and relied on an outer if err != nil; the direct-patch branch already used local-scope if err := ...; err != nil { return err }. Two idioms, one outer check that only caught one branch — the same shadowing class that already bit the project once. Both branches now use the local-scope return; the outer check is gone.

  5. Removed dead engine.Options.Insecure. The field has been declared since the file was first added (May 2024) and was never read by Render or any helper. The runtime guarantee — --insecure (maintenance mode) bypasses FailIfMultiNodes — comes from WithClientMaintenance not injecting nodes into the gRPC context, not from anything in the engine.

Verification

  • TestRenderFailIfMultiNodes_UsesCommandName — three sub-tests asserting the CommandName value surfaces in the error and the empty-string fallback.
  • TestMergeFileAsPatch — overlay (single-doc + non-empty patch), modeline-only byte-identity, multi-doc + modeline-only (the regression vector), multi-doc + non-empty strategic-merge patch (the realistic v1.12+ scenario), empty file, comments-only file.
  • TestApplyTemplatesPerNode_* — per-node single-node ctx; never-batches; empty-nodes errors loudly; maintenance mode opens a fresh client per iteration; auth mode writes the single-target node metadata key.
  • TestOpenClientPerNodeMaintenance_NarrowsAndRestoresGlobalNodes — drives the production opener with an injected maintenanceClientFunc fake that snapshots GlobalArgs.Nodes at the moment WithClientMaintenance reads it.
  • TestTemplateAndApplyDiverge_NodeBodyOverlayLimitation — pins the intentional template/apply divergence so a future "make template match apply" change is caught.
  • TestBuildApplyRenderOptions — pins that the apply caller actually sets CommandName (independent of the engine-side coverage).
  • golangci-lint run ./... is clean.

Closes

Summary by CodeRabbit

Release Notes

  • New Features

    • Node YAML files can now include a modeline "overlay" body that applies as a strategic merge patch during talm apply, preserving per-node configurations even when templates generate conflicting values.
    • Use talm apply --dry-run to preview exact bytes sent to nodes.
  • Documentation

    • Added clarification on overlay behavior differences between talm apply and talm template commands.

lexfrei added 9 commits April 16, 2026 14:36
The field has been declared on engine.Options since the file was created
(c567135, 2024-05) but was never read by Render or any helper. apply.go
and template.go were assigning it from their respective --insecure flags;
apply_test.go asserted the assignment. Nothing else looked at the value.

The runtime guarantee — --insecure (maintenance mode) bypasses
FailIfMultiNodes — is provided by WithClientMaintenance not injecting
nodes into the gRPC context, not by anything inside engine.Render. The
dead field added nothing on top of that.

Drop the field, the assignments, and the assertion. golangci-lint is
clean.

Closes #123

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
helpers.FailIfMultiNodes(ctx, name) embeds the name in its error
message. The call site in engine.Render hardcoded "talm template",
which was accurate when Render was only called from the template
subcommand. PR #119 made apply call Render too, so users running
`talm apply` with a multi-node modeline saw an error talking about
`talm template` — confusing.

Add Options.CommandName, default to "talm" when empty, set
"talm apply" in apply's buildApplyRenderOptions and "talm template"
in template's option-build. TestRenderFailIfMultiNodes_UsesCommandName
covers both subcommands plus the empty-string fallback and explicitly
asserts the historical "talm template" no longer leaks into the apply
case.

Closes #121

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…turns

The per-file loop in apply had two branches with asymmetric error
handling:

  - template-render: `err = withApplyClient(...)` (assigning the outer
    loop variable) followed by an outer `if err != nil { return err }`
  - direct-patch:    `if err := withApplyClient(...); err != nil { return err }`
    (local scope, immediate return)

The outer check only caught errors from the template-render branch.
Any future tweak to the direct-patch branch could silently swallow an
error — exactly the shadowing class that already bit the project once
(commit b34781e).

Switch the template-render branch to the same local-scope idiom and
delete the outer check. Both branches now read identically and there
is no dead code path to misuse.

Closes #122

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Before: when a node file's modeline contained `templates=[...]`, both
`talm apply -f node.yaml` and `talm template -f node.yaml` rendered
the listed templates and discarded the rest of the node file. Per-node
hostname, secondary interfaces with deviceSelector, VIP placement, etcd
extraArgs, certSANs and time servers all silently vanished — a
multi-NIC control plane could not bootstrap because the etcd
advertised-subnet interface was missing from the applied config.

Add engine.MergeFileAsPatch using Talos configpatcher (LoadPatches +
Apply). Apply the node file as a strategic merge patch over the
rendered template in both:

  - apply.go template-rendering branch (after engine.Render, before
    c.ApplyConfiguration)
  - template.go templateWithFiles (after generateOutput, before
    inplace-write or stdout)

Same merge step in both keeps the documented piped flow
`talm template -f X | talm apply -f -` carrying the node body through
end to end.

A modeline-only node file (no Talos config body) becomes a patch with
empty content; LoadPatches still returns a patch object, but Apply has
nothing to merge — every rendered field round-trips through the
configpatcher's loader unchanged.

TestMergeFileAsPatch covers both paths: the body-overlay case asserts
the custom hostname and the deviceSelector secondary interface land in
the merged output and the auto-generated hostname is gone; the
modeline-only case asserts every rendered field survives.

Closes #126

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
PR #119 moved talm apply's template rendering from offline to online.
The new online path runs inside withApplyClient, whose
wrapWithNodeContext helper batches every node from GlobalArgs.Nodes
into a single gRPC context. engine.Render then calls
helpers.FailIfMultiNodes which rejects multi-node contexts, so a
node file with `nodes=[ip1, ip2]` (or `--nodes A,B,C` on the
command line) fails before any rendering happens. The pre-#119
direct-patch flow handled multi-node fan-out at the wire level inside
ApplyConfiguration, so this never surfaced.

Render once per node instead. Add applyTemplatesPerNode that takes a
nodes slice and injection points for the render and apply functions,
then iterates: for each node it builds a single-node context with
client.WithNodes, calls engine.Render (which now sees one node and
satisfies the FailIfMultiNodes guard), merges the modeline'd file as
a patch, and applies the result. Per-node iteration is also the
correct semantic — discovery via lookup() resolves each node's own
network topology rather than mashing everything together.

Split withApplyClient into a public form (still wraps with the legacy
multi-node context for the direct-patch branch) and withApplyClientBare
which skips the wrap so the per-node loop can attach contexts itself.

Three tests pin the contract: that each render and apply call sees
exactly one node in its outgoing-context metadata; that engine.Render
with a batched multi-node context still errors (the pre-condition the
loop exists to satisfy); that an empty nodes slice errors loudly
rather than silently doing nothing. Manually verified the loop assertion
fails when the per-iteration context is built with all nodes instead
of one.

Closes #120

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…ure mode

Branch review caught four blockers and three majors in the original
implementation. Fixes:

- engine.MergeFileAsPatch now short-circuits modeline-only patch files
  (BLOCKER 2). The Talos config-patcher misclassifies a YAML-comment-only
  file as an empty JSON6902 patch, then refuses any multi-document
  rendered config (the v1.12+ output format) with 'JSON6902 patches are
  not supported for multi-document machine configuration'. Detect 'no
  meaningful YAML content' before LoadPatches and return rendered
  unchanged in that case. Fix also covers empty files and files with only
  comments / document separators / whitespace.

- talm template no longer routes its output through MergeFileAsPatch
  (BLOCKER 1, BLOCKER 5, MEDIUM 8). The patcher round-trips through its
  config loader, normalising YAML formatting, sorting keys, and (worst
  of all) stripping every YAML comment — which deletes the talm modeline
  ('# talm: ...') and the AUTOGENERATED warning that downstream commands
  rely on. Removing the template-side merge keeps stdout and --in-place
  outputs intact. The merge stays in apply because that output goes
  straight to ApplyConfiguration where formatting and comments do not
  matter.

- The pipe-flow comment that motivated the template-side merge is gone
  (MAJOR 6). talm apply does not in fact accept stdin via '-f -'
  (ExpandFilePaths only handles real paths), so the documented workflow
  was a fiction.

- Insecure (maintenance) mode now opens a fresh single-endpoint client
  per node (BLOCKER 4). WithClientMaintenance creates one client with
  every endpoint and gRPC then round-robins ApplyConfiguration across
  them, so most nodes never received the config. Narrow GlobalArgs.Nodes
  to the iteration's node and restore it via defer; this requires
  splitting applyTemplatesPerNode's client-acquisition into an injected
  openClientFunc with two production implementations
  (openClientPerNodeMaintenance, openClientPerNodeAuth).

- TestApplyTemplatesPerNode_BatchedContextIsRejected was renamed and
  rewritten as TestApplyTemplatesPerNode_NeverBatchesNodes, which
  asserts the loop itself never produces a multi-node ctx (MAJOR 7).
  The previous test happened to call engine.Render and pass; it would
  not have caught a regression in applyTemplatesPerNode.

- Two new tests cover the insecure path
  (TestApplyTemplatesPerNode_MaintenanceModeOpensFreshClientPerNode and
  TestOpenClientPerNodeMaintenance_NarrowsAndRestoresGlobalNodes).
  TestMergeFileAsPatch grew three sub-tests covering multi-doc patching,
  empty files, and comments-only files. All five new tests fail without
  the corresponding fix.

- Doc comments cleaned up (MINORS 9, 10, 11): the FailIfMultiNodes
  rationale lives in one place (applyTemplatesPerNode); MergeFileAsPatch
  wraps the out.Bytes() error with file context; withApplyClient and
  withApplyClientBare now describe what they actually do.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…ferences

- engine.MergeFileAsPatch now reads the patch file once (was twice — one
  ReadFile and one LoadPatches with @-syntax) by switching from
  configpatcher.LoadPatches to LoadPatch on the already-read bytes. No
  TOCTOU window between the empty-detection check and patch loading.

- New TestMergeFileAsPatch sub-test covers the realistic Talos v1.12+
  apply scenario the previous test set missed: multi-document rendered
  config (legacy machine/cluster doc plus typed HostnameConfig and
  LinkConfig docs) overlaid by a non-empty strategic-merge patch in the
  node file. The assertion checks that the legacy machine doc absorbs the
  patch and the sibling typed docs survive untouched.

- New TestTemplateAndApplyDiverge_NodeBodyOverlayLimitation pins the
  intentional divergence between talm template and talm apply for files
  with a non-empty body. Apply overlays the body via the config-patcher
  before sending; template emits the rendered bytes verbatim so the
  modeline and the AUTOGENERATED-warning comment survive in stdout/in-
  place output. A future patch that tries to make template match apply
  will trip this test.

- openClientPerNodeAuth now uses client.WithNode (single-target
  metadata) instead of client.WithNodes (aggregation metadata). The
  per-iteration intent is one node, not a collection of one — naming
  matches semantics. Test helper nodesFromOutgoingCtx reads either key
  so the assertions stay valid.

- Removed all internal-tracker references and review-round annotations
  from test docstrings. Tests are now self-describing: each one names
  the contract it pins, not the issue ticket that motivated it.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
- Sync openClientFunc docstring with the WithNode (single-target)
  metadata key the auth-mode opener actually uses; the old docstring
  still mentioned WithNodes from the previous version.

- TestOpenClientPerNodeMaintenance_NarrowsAndRestoresGlobalNodes now
  drives the production openClientPerNodeMaintenance with an injected
  fake maintenanceClientFunc instead of an inline stub. The fake
  snapshots GlobalArgs.Nodes at the moment WithClientMaintenance reads
  it, proving the narrow-and-restore-via-defer logic is exercised end
  to end. A regression in the production function now fails this test
  (previously it would have continued to pass against the test-local
  stub).

- New TestApplyTemplatesPerNode_AuthModeUsesSingleNodeMetadataKey pins
  the gRPC metadata key the auth-mode opener writes so a future swap
  back to WithNodes (which would round-trip through nodesFromOutgoingCtx
  unnoticed) gets caught.

- README documents that node files can carry per-node patches in their
  body, that talm apply applies them as a strategic merge over the
  rendered template, and that talm template intentionally does not.
  Recommends apply --dry-run for previewing the exact bytes apply will
  send.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
CommandName is read by engine.Render to format the FailIfMultiNodes
error wording. The engine-side assertion (TestRenderFailIfMultiNodes_UsesCommandName)
verifies Render does the right thing with whatever value it receives, but
nothing was pinning that buildApplyRenderOptions sets it to "talm apply"
in the first place. A future refactor that drops the field would slip
through unnoticed.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 16, 2026

📝 Walkthrough

Walkthrough

The changes implement per-node template rendering with merge patch semantics in talm apply. When a node file specifies templates via modeline, the command now renders the template once per node, merges the node file as a strategic patch on top of rendered output, and applies the merged config. This preserves per-node customizations defined in node files.

Changes

Cohort / File(s) Summary
Documentation
README.md
Added explanation of overlay body semantics: node file non-modeline content is merged as a strategic patch on top of template output before applying to the node. Clarified that talm template does not apply overlays and that apply --dry-run previews exact bytes sent.
Apply command core logic
pkg/commands/apply.go
Refactored template-rendering flow to render and merge per-node via new applyTemplatesPerNode loop. Introduced two per-node client openers: openClientPerNodeMaintenance (fresh client per node with narrowed GlobalArgs.Nodes) and auth-mode reuse via client.WithNode. Split withApplyClient into bare variant avoiding node metadata injection. Removed Insecure from options; adjusted error propagation.
Apply test suite
pkg/commands/apply_test.go
Updated existing test; added helpers nodesFromOutgoingCtx and fakeAuthOpenClient. Introduced six new test cases verifying per-node iteration semantics, single-node context per iteration, maintenance/auth client behavior, empty-nodes error handling, and a divergence contract test using engine.MergeFileAsPatch.
Template command cleanup
pkg/commands/template.go
Removed unused Insecure field from engine.Options assignment. Added CommandName: "talm template" to explicitly identify caller in rendering engine.
Engine refactoring
pkg/engine/engine.go
Removed Insecure field and added CommandName field to Options (defaults to "talm"). Implemented MergeFileAsPatch(rendered []byte, patchFile string) to apply node file as strategic merge patch, short-circuiting on modeline-only/empty patches. Updated Render() to use opts.CommandName in FailIfMultiNodes call via new helper isEffectivelyEmptyYAML.
Engine tests
pkg/engine/render_test.go
Added TestMergeFileAsPatch covering overlay behavior (node-file overrides, additional content, byte identity for modeline-only/empty/comment-only patches, multi-document preservation). Added TestRenderFailIfMultiNodes_UsesCommandName asserting error messages use opts.CommandName without hardcoded subcommand names.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Apply as apply.go
    participant Client as Talos Client
    participant Engine as engine.go
    participant Node as Talos Node

    User->>Apply: talm apply -f node.yaml<br/>(with templates= in modeline)
    
    loop Per-node (applyTemplatesPerNode)
        Apply->>Client: Open per-node client<br/>(maintenance or auth mode)
        Apply->>Engine: Render(ctx, opts)<br/>render template
        Engine-->>Apply: rendered bytes
        Apply->>Engine: MergeFileAsPatch<br/>(rendered, node.yaml)
        Engine->>Engine: Load patch file<br/>Apply strategic merge
        Engine-->>Apply: merged bytes
        Apply->>Client: ApplyConfiguration(merged)
        Client->>Node: Apply merged config
        Node-->>Client: OK
        Client-->>Apply: Success
    end
    
    Apply-->>User: Applied to all nodes
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 Per-node rendering hops into place,
Merging templates with customization grace,
No more lost hostnames or interfaces stripped,
The overlay patch ensures nothing slipped! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 78.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main changes: applying batch processing fixes to multi-node handling, node-file patch merging, and error handling in the apply command.
Linked Issues check ✅ Passed All five linked issues (#120, #121, #122, #123, #126) are directly addressed: per-node rendering (#120), CommandName threading (#121), unified error handling (#122), Insecure field removal (#123), and node-file patch merging (#126).
Out of Scope Changes check ✅ Passed All changes are directly scoped to the five linked issues; no unrelated modifications detected in code or documentation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/apply-batch-multi-node-and-patch-merge

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the talm apply template-rendering path to process nodes individually, enabling per-node configuration overlays via strategic merge patches. It introduces the MergeFileAsPatch utility and adds comprehensive tests for both authenticated and maintenance modes. Review feedback suggests ensuring node resolution falls back to the Talos context when CLI arguments are missing, supporting multi-document patches in node files, improving error messaging for missing nodes, and optimizing performance by moving file I/O for patches outside the per-node loop.

Comment thread pkg/commands/apply.go
Comment on lines +143 to +144
if err := withApplyClientBare(func(parentCtx context.Context, c *client.Client) error {
openClient := openClientPerNodeAuth(parentCtx, c)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

In the authenticated template rendering path, the list of nodes is captured from GlobalArgs.Nodes before the client is created. Unlike the direct patch path (which uses wrapWithNodeContext), this path does not fall back to the nodes defined in the talosconfig context if GlobalArgs.Nodes is empty. This means talm apply will fail if nodes are not explicitly provided via CLI or modeline, even if they are defined in the current Talos context.

if err := withApplyClientBare(func(parentCtx context.Context, c *client.Client) error {
					if len(nodes) == 0 {
						if configContext := c.GetConfigContext(); configContext != nil {
							nodes = configContext.Nodes
						}
					}
					openClient := openClientPerNodeAuth(parentCtx, c)

Comment thread pkg/engine/engine.go
Comment on lines +233 to +237
patch, err := configpatcher.LoadPatch(patchBytes)
if err != nil {
return nil, fmt.Errorf("loading patch from %s: %w", patchFile, err)
}
out, err := configpatcher.Apply(configpatcher.WithBytes(rendered), []configpatcher.Patch{patch})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

configpatcher.LoadPatch is used to load the patch from the node file. However, LoadPatch typically only handles a single YAML document. Since Talos v1.12+ and talm templates often produce multi-document configurations, users might need to provide multi-document patches in their node files to target different parts of the configuration. Using configpatcher.LoadPatches (plural) would be more consistent with other parts of the codebase (like FullConfigProcess) and provide better support for complex configurations.

Suggested change
patch, err := configpatcher.LoadPatch(patchBytes)
if err != nil {
return nil, fmt.Errorf("loading patch from %s: %w", patchFile, err)
}
out, err := configpatcher.Apply(configpatcher.WithBytes(rendered), []configpatcher.Patch{patch})
patch, err := configpatcher.LoadPatches([]string{string(patchBytes)})
if err != nil {
return nil, fmt.Errorf("loading patches from %s: %w", patchFile, err)
}
out, err := configpatcher.Apply(configpatcher.WithBytes(rendered), patch)

Comment thread pkg/commands/apply.go
apply applyFunc,
) error {
if len(nodes) == 0 {
return fmt.Errorf("no nodes specified for template-rendering apply")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The error message when no nodes are specified is less descriptive than the one used in the template command. Providing a consistent and helpful error message that suggests how to fix the issue (e.g., using the --nodes flag or modeline) would improve the user experience.

Suggested change
return fmt.Errorf("no nodes specified for template-rendering apply")
return fmt.Errorf("nodes are not set for the command: please use '--nodes' flag or configuration file to set the nodes to run the command against")

Comment thread pkg/engine/engine.go
// ApplyConfiguration) but unsuitable for human-facing output such as
// `talm template` — which is why the template subcommand does not call
// this helper.
func MergeFileAsPatch(rendered []byte, patchFile string) ([]byte, error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The MergeFileAsPatch function reads the patchFile from disk and performs an 'effectively empty' check on every call. Since this function is called within a loop over nodes in applyTemplatesPerNode (via renderMergeAndApply), it leads to redundant file I/O and processing. For large clusters, this could become a performance bottleneck. Consider refactoring this to load the patch data once outside the per-node loop and pass the pre-loaded data or patch object to the rendering logic.

@lexfrei lexfrei self-assigned this Apr 16, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/commands/apply.go`:
- Around line 251-265: applyTemplatesPerNode currently re-uses a single
configFile body for every node and calls renderMergeAndApply for each target,
which causes node-specific overlays (hostname, addresses, VIPs) to be applied to
all nodes; update applyTemplatesPerNode to detect when the provided configFile
contains a non-empty overlay and, if len(nodes) > 1, reject the operation with a
clear error, or alternatively resolve/split the overlay per target before
entering the for loop so each openClient call uses a node-specific config;
ensure the same change/error check is applied to the other similar call site
that invokes renderMergeAndApply (the second occurrence referenced in the
comment).

In `@README.md`:
- Around line 143-158: Adjust the paragraph about "Per-node patches inside node
files" to avoid claiming legacy node-body fields (e.g., deviceSelector
interfaces, VIPs, machine.network.interfaces) survive talm apply for Talos
v1.12+ multi-doc output: either restrict the statement to legacy/single-doc
template behavior only, or explicitly add that for v1.12+ multi-doc mode users
must patch the typed resources
(LinkConfig/BondConfig/VLANConfig/Layer2VIPConfig) because the multi-doc path
intentionally ignores legacy machine.network.interfaces due to lack of a safe
1:1 translation; mention talm apply -f node.yaml and talm template -f node.yaml
where appropriate to guide readers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d8100954-d5ef-4b44-96b9-51a4d161b29a

📥 Commits

Reviewing files that changed from the base of the PR and between 00358fd and 405b5ca.

📒 Files selected for processing (6)
  • README.md
  • pkg/commands/apply.go
  • pkg/commands/apply_test.go
  • pkg/commands/template.go
  • pkg/engine/engine.go
  • pkg/engine/render_test.go

Comment thread pkg/commands/apply.go
Comment on lines +251 to +265
func applyTemplatesPerNode(
opts engine.Options,
configFile string,
nodes []string,
openClient openClientFunc,
render renderFunc,
apply applyFunc,
) error {
if len(nodes) == 0 {
return fmt.Errorf("no nodes specified for template-rendering apply")
}
for _, node := range nodes {
if err := openClient(node, func(ctx context.Context, c *client.Client) error {
return renderMergeAndApply(ctx, c, opts, configFile, render, apply)
}); err != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't replay the same node-body overlay across a multi-node modeline.

applyTemplatesPerNode renders per target, but renderMergeAndApply merges the same configFile body on every iteration. For a file whose modeline targets ["10.0.0.1","10.0.0.2"], a pinned hostname, address, or VIP in that body now gets stamped onto both machines. Please reject non-empty overlays when len(nodes) > 1, or split/resolve them per target before entering this loop.

Also applies to: 311-320

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/commands/apply.go` around lines 251 - 265, applyTemplatesPerNode
currently re-uses a single configFile body for every node and calls
renderMergeAndApply for each target, which causes node-specific overlays
(hostname, addresses, VIPs) to be applied to all nodes; update
applyTemplatesPerNode to detect when the provided configFile contains a
non-empty overlay and, if len(nodes) > 1, reject the operation with a clear
error, or alternatively resolve/split the overlay per target before entering the
for loop so each openClient call uses a node-specific config; ensure the same
change/error check is applied to the other similar call site that invokes
renderMergeAndApply (the second occurrence referenced in the comment).

Comment thread README.md
Comment on lines +143 to +158
> **Per-node patches inside node files.** A node file can carry Talos config
> below its modeline (for example, a custom `hostname`, secondary
> interfaces with `deviceSelector`, VIP placement, or extra etcd args).
> When `talm apply -f node.yaml` runs the template-rendering branch, that
> body is applied as a strategic merge patch on top of the rendered
> template before the result is sent to the node — so per-node fields
> survive even when the template auto-generates conflicting values
> (e.g. `hostname: talos-XXXXX`).
>
> `talm template -f node.yaml` (with or without `-I`) does **not** apply
> the same overlay: its output is the rendered template plus the modeline
> and the auto-generated warning, byte-identical to what the template
> alone would produce. Routing it through the patcher would drop every
> YAML comment (including the modeline) and re-sort keys, breaking
> downstream commands that read the file back. Use `apply --dry-run` if
> you want to preview the exact bytes that will be sent to the node.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Scope this overlay note away from Talos v1.12+ network overrides.

This text currently promises that legacy node-body fields such as deviceSelector interfaces and VIPs survive talm apply, but on the multi-doc path those machine.network.interfaces fragments still do not have a safe 1:1 mapping to LinkConfig/BondConfig/VLANConfig/Layer2VIPConfig. As written, this will mislead v1.12+ users into relying on overrides that are not represented semantically the way the docs imply. Please either limit the claim to legacy/single-doc Talos output or explicitly say that v1.12+ requires patching the typed resources instead.

Based on learnings, the multidoc path intentionally ignores legacy machine.network.interfaces because it has no safe 1:1 translation to Talos v1.12 typed resources.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 143 - 158, Adjust the paragraph about "Per-node
patches inside node files" to avoid claiming legacy node-body fields (e.g.,
deviceSelector interfaces, VIPs, machine.network.interfaces) survive talm apply
for Talos v1.12+ multi-doc output: either restrict the statement to
legacy/single-doc template behavior only, or explicitly add that for v1.12+
multi-doc mode users must patch the typed resources
(LinkConfig/BondConfig/VLANConfig/Layer2VIPConfig) because the multi-doc path
intentionally ignores legacy machine.network.interfaces due to lack of a safe
1:1 translation; mention talm apply -f node.yaml and talm template -f node.yaml
where appropriate to guide readers.

Comment on lines +543 to +569
func TestTemplateAndApplyDiverge_NodeBodyOverlayLimitation(t *testing.T) {
const rendered = `version: v1alpha1
machine:
type: controlplane
network:
hostname: talos-abcde
`
dir := t.TempDir()
nodeFile := filepath.Join(dir, "node.yaml")
const nodeBody = `# talm: nodes=["10.0.0.1"]
machine:
network:
hostname: node0
`
if err := os.WriteFile(nodeFile, []byte(nodeBody), 0o644); err != nil {
t.Fatalf("write node file: %v", err)
}

// apply path: renders, then overlays.
merged, err := engine.MergeFileAsPatch([]byte(rendered), nodeFile)
if err != nil {
t.Fatalf("MergeFileAsPatch: %v", err)
}
if !strings.Contains(string(merged), "hostname: node0") {
t.Errorf("apply path must overlay hostname: node0; got:\n%s", string(merged))
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The 'template path' block in TestTemplateAndApplyDiverge_NodeBodyOverlayLimitation contains two assertions that are permanently vacuous: templateOutput := []byte(rendered) simply copies the local constant, so bytes.Equal(templateOutput, []byte(rendered)) is always true (the negated check never fires) and strings.Contains(string(templateOutput), "hostname: node0") is always false because the rendered constant contains hostname: talos-abcde. The test comment claims this block will catch a future 'make template match apply' regression, but no production template code is exercised, so such a regression would pass undetected. The apply-path half of the test (calling engine.MergeFileAsPatch) is legitimate; only the template-path assertions are hollow.

Extended reasoning...

What the bug is and how it manifests

In TestTemplateAndApplyDiverge_NodeBodyOverlayLimitation (pkg/commands/apply_test.go), the 'template path' block is supposed to pin the invariant that talm template does NOT overlay the node-file body onto the rendered output. Instead, the block performs no actual test: it creates templateOutput as a direct byte-copy of the rendered constant and then asserts two things that are structurally impossible to fail.

The specific code path that triggers it

Lines 543-569 of apply_test.go (template-path block):

Assertion 1: bytes.Equal(templateOutput, []byte(rendered)) compares rendered to itself. The result is always true, so the negated guard is always false and t.Error is never reached.

Assertion 2: strings.Contains(string(templateOutput), "hostname: node0") searches for node0 inside rendered, which contains hostname: talos-abcde — never node0. This condition is always false and t.Error is never reached.

Why existing code does not prevent it

Neither the compiler nor go test has any way to detect that a predicate is structurally always-false. The test compiles and passes on every run, regardless of what generateOutput(), templateWithFiles(), or any related production code does.

What the impact would be

If a future contributor makes talm template call engine.MergeFileAsPatch (the exact regression the test comment warns about), both assertions would still pass. The first passes because templateOutput is still the copy of the constant rendered, not the output of generateOutput; the second passes because rendered still does not contain node0. The test gives false confidence in the template/apply divergence invariant.

How to fix it

The template-path block should actually call the production template code — specifically generateOutput() (or an equivalent helper) — capture its output as templateOutput, and then assert the two properties. That way a change to generateOutput that starts applying the overlay will cause hostname: talos-abcde to be replaced by hostname: node0 in the output, making the second assertion fire as intended.

Step-by-step proof

  1. rendered is the constant string containing hostname: talos-abcde.
  2. templateOutput := []byte(rendered) creates a byte slice with exactly the same content as rendered.
  3. bytes.Equal(templateOutput, []byte(rendered)) — both sides are identical byte sequences → always returns true.
  4. The guard \!bytes.Equal(...) is always false → the first t.Error is unreachable on every run.
  5. strings.Contains(string(templateOutput), "hostname: node0") searches for node0 in a string that only contains talos-abcde → always returns false.
  6. The second t.Error is also unreachable on every run.
  7. Both assertions pass permanently regardless of what the production template code does, providing only false confidence in the divergence invariant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant