17 changes: 17 additions & 0 deletions README.md
@@ -140,6 +140,23 @@ Re-template and update generated file in place (this will overwrite it):
talm template -f nodes/node1.yaml -I
```

> **Per-node patches inside node files.** A node file can carry Talos config
> below its modeline (for example, a custom `hostname`, secondary
> interfaces with `deviceSelector`, VIP placement, or extra etcd args).
> When `talm apply -f node.yaml` runs the template-rendering branch, that
> body is applied as a strategic merge patch on top of the rendered
> template before the result is sent to the node — so per-node fields
> survive even when the template auto-generates conflicting values
> (e.g. `hostname: talos-XXXXX`).
>
> `talm template -f node.yaml` (with or without `-I`) does **not** apply
> the same overlay: its output is the rendered template plus the modeline
> and the auto-generated warning, byte-identical to what the template
> alone would produce. Routing it through the patcher would drop every
> YAML comment (including the modeline) and re-sort keys, breaking
> downstream commands that read the file back. Use `apply --dry-run` if
> you want to preview the exact bytes that will be sent to the node.
Comment on lines +143 to +158

⚠️ Potential issue | 🟠 Major

Scope this overlay note away from Talos v1.12+ network overrides.

This text currently promises that legacy node-body fields such as deviceSelector interfaces and VIPs survive talm apply, but on the multi-doc path those machine.network.interfaces fragments still do not have a safe 1:1 mapping to LinkConfig/BondConfig/VLANConfig/Layer2VIPConfig. As written, this will mislead v1.12+ users into relying on overrides that are not represented semantically the way the docs imply. Please either limit the claim to legacy/single-doc Talos output or explicitly say that v1.12+ requires patching the typed resources instead.

Based on learnings, the multidoc path intentionally ignores legacy machine.network.interfaces because it has no safe 1:1 translation to Talos v1.12 typed resources.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 143 - 158, Adjust the paragraph about "Per-node
patches inside node files" to avoid claiming legacy node-body fields (e.g.,
deviceSelector interfaces, VIPs, machine.network.interfaces) survive talm apply
for Talos v1.12+ multi-doc output: either restrict the statement to
legacy/single-doc template behavior only, or explicitly add that for v1.12+
multi-doc mode users must patch the typed resources
(LinkConfig/BondConfig/VLANConfig/Layer2VIPConfig) because the multi-doc path
intentionally ignores legacy machine.network.interfaces due to lack of a safe
1:1 translation; mention talm apply -f node.yaml and talm template -f node.yaml
where appropriate to guide readers.
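
The overlay behaviour described in the README note can be pictured as a recursive map merge in which the node-file body wins over the rendered template. The sketch below is only an illustration under that assumption: `mergeOverlay` is a hypothetical helper, not talm's `engine.MergeFileAsPatch`, and it uses the legacy single-document `machine.network.hostname` shape, so it says nothing about how Talos v1.12+ typed resources are patched.

```go
package main

import "fmt"

// mergeOverlay is a hypothetical stand-in for the overlay step. It copies
// every key of patch onto a copy of base and recurses into nested maps, so
// sibling keys that the patch does not mention survive.
func mergeOverlay(base, patch map[string]any) map[string]any {
	out := make(map[string]any, len(base))
	for k, v := range base {
		out[k] = v
	}
	for k, pv := range patch {
		bm, baseIsMap := out[k].(map[string]any)
		pm, patchIsMap := pv.(map[string]any)
		if baseIsMap && patchIsMap {
			out[k] = mergeOverlay(bm, pm)
			continue
		}
		// Scalars, lists and type mismatches: the patch value replaces the base value.
		out[k] = pv
	}
	return out
}

func main() {
	// What the template rendered (auto-generated hostname).
	rendered := map[string]any{
		"machine": map[string]any{
			"type":    "controlplane",
			"network": map[string]any{"hostname": "talos-XXXXX"},
		},
	}
	// What the node file carries below its modeline.
	nodeBody := map[string]any{
		"machine": map[string]any{
			"network": map[string]any{"hostname": "node1"},
		},
	}

	merged := mergeOverlay(rendered, nodeBody)
	machine := merged["machine"].(map[string]any)
	fmt.Println(machine["type"], machine["network"].(map[string]any)["hostname"])
}
```

A real strategic merge patch also has rules for lists and for deleting keys, which this sketch deliberately leaves out.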


## Using talosctl commands

Talm offers a similar set of commands to those provided by talosctl.
167 changes: 140 additions & 27 deletions pkg/commands/apply.go
@@ -113,32 +113,40 @@ func apply(args []string) error {
withSecretsPath := ResolveSecretsPath(applyCmdFlags.withSecrets)

if len(modelineTemplates) > 0 {
// Template rendering path: connect to the node first, render templates
// online (so lookup() functions resolve real discovery data), then apply.
// Template rendering path: render templates online per node and
// apply the rendered config plus the node file overlay. See
// applyTemplatesPerNode for why the loop is mandatory.
opts := buildApplyRenderOptions(modelineTemplates, withSecretsPath)
nodes := append([]string(nil), GlobalArgs.Nodes...)
fmt.Printf("- talm: file=%s, nodes=%s, endpoints=%s\n", configFile, GlobalArgs.Nodes, GlobalArgs.Endpoints)

err = withApplyClient(func(ctx context.Context, c *client.Client) error {
fmt.Printf("- talm: file=%s, nodes=%s, endpoints=%s\n", configFile, GlobalArgs.Nodes, GlobalArgs.Endpoints)

result, err := engine.Render(ctx, c, opts)
if err != nil {
return fmt.Errorf("template rendering error: %w", err)
}

applyClosure := func(ctx context.Context, c *client.Client, data []byte) error {
resp, err := c.ApplyConfiguration(ctx, &machineapi.ApplyConfigurationRequest{
Data: result,
Data: data,
Mode: applyCmdFlags.Mode.Mode,
DryRun: applyCmdFlags.dryRun,
TryModeTimeout: durationpb.New(applyCmdFlags.configTryTimeout),
})
if err != nil {
return fmt.Errorf("error applying new configuration: %w", err)
}

helpers.PrintApplyResults(resp)

return nil
})
}

if applyCmdFlags.insecure {
openClient := openClientPerNodeMaintenance(applyCmdFlags.certFingerprints, WithClientMaintenance)
if err := applyTemplatesPerNode(opts, configFile, nodes, openClient, engine.Render, applyClosure); err != nil {
return err
}
} else {
if err := withApplyClientBare(func(parentCtx context.Context, c *client.Client) error {
openClient := openClientPerNodeAuth(parentCtx, c)
Comment on lines +143 to +144

high

In the authenticated template rendering path, the list of nodes is captured from GlobalArgs.Nodes before the client is created. Unlike the direct patch path (which uses wrapWithNodeContext), this path does not fall back to the nodes defined in the talosconfig context if GlobalArgs.Nodes is empty. This means talm apply will fail if nodes are not explicitly provided via CLI or modeline, even if they are defined in the current Talos context.

if err := withApplyClientBare(func(parentCtx context.Context, c *client.Client) error {
					if len(nodes) == 0 {
						if configContext := c.GetConfigContext(); configContext != nil {
							nodes = configContext.Nodes
						}
					}
					openClient := openClientPerNodeAuth(parentCtx, c)
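
The fallback this comment asks for boils down to a small precedence rule, sketched here in isolation. `resolveNodes` is a hypothetical helper and plain string slices stand in for `GlobalArgs.Nodes` and the talosconfig context's node list; the real change would live inside the `withApplyClientBare` callback as suggested above.

```go
package main

import "fmt"

// resolveNodes is a hypothetical helper: explicitly supplied nodes (CLI or
// modeline) take precedence, otherwise the nodes recorded in the active
// talosconfig context are used. The result is always a fresh slice so the
// caller cannot mutate either input by accident.
func resolveNodes(explicit, contextNodes []string) []string {
	if len(explicit) > 0 {
		return append([]string(nil), explicit...)
	}
	return append([]string(nil), contextNodes...)
}

func main() {
	// No explicit nodes: fall back to the context.
	fmt.Println(resolveNodes(nil, []string{"10.0.0.1"}))
	// Explicit nodes win over the context.
	fmt.Println(resolveNodes([]string{"10.0.0.9"}, []string{"10.0.0.1"}))
}
```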

return applyTemplatesPerNode(opts, configFile, nodes, openClient, engine.Render, applyClosure)
}); err != nil {
return err
}
}
} else {
// Direct patch path: apply config file as patch against empty bundle
opts := buildApplyPatchOptions(withSecretsPath)
@@ -173,9 +181,6 @@ func apply(args []string) error {
return err
}
}
if err != nil {
return err
}

// Reset args
if !applyCmdFlags.nodesFromArgs {
@@ -188,23 +193,131 @@ func apply(args []string) error {
return nil
}

// withApplyClient creates a Talos client appropriate for the current apply mode
// and invokes the given action with it.
// withApplyClient creates a Talos client appropriate for the current apply
// mode and invokes the given action with it. The action receives a context
// in which gRPC node metadata is set to the resolved node list — either
// GlobalArgs.Nodes (when set) or the talosconfig context's Nodes (when not).
// Used by the direct-patch branch where multi-node fan-out happens at the
// gRPC layer inside ApplyConfiguration.
func withApplyClient(f func(ctx context.Context, c *client.Client) error) error {
return withApplyClientBare(wrapWithNodeContext(f))
}

// withApplyClientBare connects to Talos for the current apply mode but does
// NOT inject node metadata into the context — leaving that decision to the
// caller. Used by the template-rendering path (see applyTemplatesPerNode for
// the rationale).
func withApplyClientBare(f func(ctx context.Context, c *client.Client) error) error {
if applyCmdFlags.insecure {
// Maintenance mode connects directly to the node IP without talosconfig;
// node context injection is not needed — the maintenance client handles
// node targeting internally via GlobalArgs.Nodes.
// Maintenance mode reads its endpoints directly from
// GlobalArgs.Nodes — gRPC node metadata is not consulted.
return WithClientMaintenance(applyCmdFlags.certFingerprints, f)
}

wrappedF := wrapWithNodeContext(f)

if GlobalArgs.SkipVerify {
return WithClientSkipVerify(wrappedF)
return WithClientSkipVerify(f)
}

return WithClientNoNodes(f)
}

// renderFunc, applyFunc and openClientFunc are injection points for
// applyTemplatesPerNode so unit tests can drive the loop with fakes instead
// of a real Talos client.
type renderFunc func(ctx context.Context, c *client.Client, opts engine.Options) ([]byte, error)
type applyFunc func(ctx context.Context, c *client.Client, data []byte) error

// openClientFunc opens a Talos client suitable for a single node and runs
// action with it. Authenticated mode reuses one parent client and rotates
// the node via single-target gRPC metadata (client.WithNode); insecure
// (maintenance) mode opens a fresh single-endpoint client per node because
// Talos's maintenance client ignores node metadata in the context and
// round-robins between its configured endpoints.
type openClientFunc func(node string, action func(ctx context.Context, c *client.Client) error) error

// applyTemplatesPerNode runs render → MergeFileAsPatch → apply once per
// node. Two reasons it has to iterate:
//
// - engine.Render's FailIfMultiNodes guard rejects a context that carries
// more than one node, so the auth-mode caller has to attach a single
// node per iteration — and discovery via lookup() should resolve each
// node's own topology in any case.
// - In insecure (maintenance) mode the client connects directly to a
// Talos node and ignores nodes-in-context entirely, so each node needs
// its own client; otherwise gRPC round-robins ApplyConfiguration
// across the endpoint list and most nodes never see the config.
//
// Both modes share this loop via openClient.
func applyTemplatesPerNode(
opts engine.Options,
configFile string,
nodes []string,
openClient openClientFunc,
render renderFunc,
apply applyFunc,
) error {
if len(nodes) == 0 {
return fmt.Errorf("no nodes specified for template-rendering apply")

medium

The error message when no nodes are specified is less descriptive than the one used in the template command. Providing a consistent and helpful error message that suggests how to fix the issue (e.g., using the --nodes flag or modeline) would improve the user experience.

Suggested change
return fmt.Errorf("no nodes specified for template-rendering apply")
return fmt.Errorf("nodes are not set for the command: please use '--nodes' flag or configuration file to set the nodes to run the command against")

}
for _, node := range nodes {
if err := openClient(node, func(ctx context.Context, c *client.Client) error {
return renderMergeAndApply(ctx, c, opts, configFile, render, apply)
}); err != nil {
Comment on lines +251 to +265

⚠️ Potential issue | 🟠 Major

Don't replay the same node-body overlay across a multi-node modeline.

applyTemplatesPerNode renders per target, but renderMergeAndApply merges the same configFile body on every iteration. For a file whose modeline targets ["10.0.0.1","10.0.0.2"], a pinned hostname, address, or VIP in that body now gets stamped onto both machines. Please reject non-empty overlays when len(nodes) > 1, or split/resolve them per target before entering this loop.

Also applies to: 311-320

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/commands/apply.go` around lines 251 - 265, applyTemplatesPerNode
currently re-uses a single configFile body for every node and calls
renderMergeAndApply for each target, which causes node-specific overlays
(hostname, addresses, VIPs) to be applied to all nodes; update
applyTemplatesPerNode to detect when the provided configFile contains a
non-empty overlay and, if len(nodes) > 1, reject the operation with a clear
error, or alternatively resolve/split the overlay per target before entering the
for loop so each openClient call uses a node-specific config; ensure the same
change/error check is applied to the other similar call site that invokes
renderMergeAndApply (the second occurrence referenced in the comment).
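
One way to read the "reject non-empty overlays when `len(nodes) > 1`" option is the guard sketched below. It is an assumption-laden illustration: `hasOverlay` and `checkOverlayTargets` are hypothetical names, and "non-empty" is taken to mean "contains any line that is not blank, a YAML comment, or a document separator", which may be looser or stricter than what the PR ends up needing.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errSharedOverlay is a hypothetical sentinel for the rejected case.
var errSharedOverlay = errors.New("node file body is not empty but several nodes are targeted")

// hasOverlay reports whether body contains anything besides blank lines,
// YAML comments (which covers the modeline) and "---" separators.
func hasOverlay(body string) bool {
	for _, raw := range strings.Split(body, "\n") {
		line := strings.TrimSpace(raw)
		if line == "" || line == "---" || strings.HasPrefix(line, "#") {
			continue
		}
		return true
	}
	return false
}

// checkOverlayTargets refuses to stamp one node-specific body onto
// several machines; a single target or a comment-only body is fine.
func checkOverlayTargets(body string, nodes []string) error {
	if len(nodes) > 1 && hasOverlay(body) {
		return fmt.Errorf("%w: %v", errSharedOverlay, nodes)
	}
	return nil
}

func main() {
	pinned := "# modeline comment\nmachine:\n  network:\n    hostname: node1\n"
	fmt.Println(checkOverlayTargets(pinned, []string{"10.0.0.1"}))
	fmt.Println(checkOverlayTargets(pinned, []string{"10.0.0.1", "10.0.0.2"}))
}
```

Running such a check before the per-node loop would fail fast instead of applying a pinned hostname or VIP to the first node and only then noticing the problem.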

return fmt.Errorf("node %s: %w", node, err)
}
}
return nil
}
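
The injection points named above (`openClientFunc`, `renderFunc`, `applyFunc`) exist so the loop can be exercised without a Talos client. The sketch below mirrors that shape with deliberately simplified, hypothetical types (`openFunc`, `stepFunc`, `perNode`) that drop the context and client parameters; it shows the two properties a test would likely assert: nodes are visited in order, and the first failure stops the loop and is wrapped with the node name.

```go
package main

import (
	"errors"
	"fmt"
)

// errFake is the error the fake step returns for one node.
var errFake = errors.New("render failed")

// Simplified, hypothetical counterparts of the injection points.
type stepFunc func(node string) error
type openFunc func(node string, action func() error) error

// perNode mirrors the structure of the per-node loop: reject an empty
// node list, then run one step per node and stop at the first error.
func perNode(nodes []string, open openFunc, step stepFunc) error {
	if len(nodes) == 0 {
		return errors.New("no nodes specified")
	}
	for _, node := range nodes {
		if err := open(node, func() error { return step(node) }); err != nil {
			return fmt.Errorf("node %s: %w", node, err)
		}
	}
	return nil
}

func main() {
	var visited []string
	// The fake "client opener" just runs the action.
	fakeOpen := func(node string, action func() error) error { return action() }
	// The fake step records the node and fails on the second one.
	fakeStep := func(node string) error {
		visited = append(visited, node)
		if node == "10.0.0.2" {
			return errFake
		}
		return nil
	}

	err := perNode([]string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}, fakeOpen, fakeStep)
	fmt.Println(visited, err)
}
```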

// maintenanceClientFunc is the contract WithClientMaintenance satisfies in
// production and a fake satisfies in tests. Injection lets the unit tests
// run the real openClientPerNodeMaintenance body without dialing a Talos
// node.
type maintenanceClientFunc func(fingerprints []string, action func(ctx context.Context, c *client.Client) error) error

// openClientPerNodeMaintenance returns an openClientFunc that opens a
// fresh single-endpoint maintenance client per node. Multi-node insecure
// apply (first bootstrap of a multi-node cluster) needs this because
// WithClientMaintenance creates a client with all endpoints and gRPC then
// round-robins ApplyConfiguration across them — most nodes never see the
// config. Narrowing GlobalArgs.Nodes to the current iteration's node and
// restoring it via defer keeps the wrapper's signature unchanged.
//
// mkClient is normally WithClientMaintenance; tests pass a fake that
// captures the GlobalArgs.Nodes value at the moment WithClientMaintenance
// would have read it.
func openClientPerNodeMaintenance(fingerprints []string, mkClient maintenanceClientFunc) openClientFunc {
return func(node string, action func(ctx context.Context, c *client.Client) error) error {
savedNodes := append([]string(nil), GlobalArgs.Nodes...)
GlobalArgs.Nodes = []string{node}
defer func() { GlobalArgs.Nodes = savedNodes }()
return mkClient(fingerprints, action)
}
}
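
The narrow-then-restore pattern used by `openClientPerNodeMaintenance` can be seen in isolation below. This is a sketch with hypothetical names: the package-level `targets` slice plays the role of `GlobalArgs.Nodes`, and `withSingleTarget` plays the role of the returned closure.

```go
package main

import "fmt"

// targets is a hypothetical package-level variable standing in for
// GlobalArgs.Nodes.
var targets = []string{"10.0.0.1", "10.0.0.2"}

// withSingleTarget narrows targets to one node for the duration of
// action and restores the previous value afterwards. The deferred
// restore also runs when action returns an error or panics.
func withSingleTarget(node string, action func() error) error {
	saved := append([]string(nil), targets...)
	targets = []string{node}
	defer func() { targets = saved }()
	return action()
}

func main() {
	_ = withSingleTarget("10.0.0.2", func() error {
		fmt.Println("inside:", targets)
		return nil
	})
	fmt.Println("after:", targets)
}
```

Because the pattern mutates shared state, it is only safe while nodes are processed sequentially, as they are in the loop above; concurrent callers would need a lock or an explicit parameter instead of a global.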

// openClientPerNodeAuth returns an openClientFunc that reuses one
// authenticated client (the one withApplyClientBare opened above this
// callback) and rotates the addressed node via client.WithNode on the
// per-iteration context. WithNode (rather than WithNodes) sets the
// "node" metadata key for single-target proxying, which engine.Render's
// FailIfMultiNodes guard treats as one node.
func openClientPerNodeAuth(parentCtx context.Context, c *client.Client) openClientFunc {
return func(node string, action func(ctx context.Context, c *client.Client) error) error {
return action(client.WithNode(parentCtx, node), c)
}
}

return WithClientNoNodes(wrappedF)
// renderMergeAndApply is the per-node body shared by every apply mode.
func renderMergeAndApply(ctx context.Context, c *client.Client, opts engine.Options, configFile string, render renderFunc, apply applyFunc) error {
rendered, err := render(ctx, c, opts)
if err != nil {
return fmt.Errorf("template rendering: %w", err)
}
merged, err := engine.MergeFileAsPatch(rendered, configFile)
if err != nil {
return fmt.Errorf("merging node file as patch: %w", err)
}
return apply(ctx, c, merged)
}

// buildApplyRenderOptions constructs engine.Options for the template rendering path.
@@ -214,14 +327,14 @@ func withApplyClient(f func(ctx context.Context, c *client.Client) error) error
func buildApplyRenderOptions(modelineTemplates []string, withSecretsPath string) engine.Options {
resolvedTemplates := resolveTemplatePaths(modelineTemplates, Config.RootDir)
return engine.Options{
Insecure: applyCmdFlags.insecure,
TalosVersion: applyCmdFlags.talosVersion,
WithSecrets: withSecretsPath,
KubernetesVersion: applyCmdFlags.kubernetesVersion,
Debug: applyCmdFlags.debug,
Full: true,
Root: Config.RootDir,
TemplateFiles: resolvedTemplates,
CommandName: "talm apply",
}
}
