feat(wizard): add interactive TUI wizard for cluster initialization#117
feat(wizard): add interactive TUI wizard for cluster initialization#117
Conversation
Extract core project generation logic from init command into an exported GenerateProject function. This enables the upcoming interactive TUI wizard to reuse the same generation logic without duplicating code. Changes: - Add GenerateProject(opts GenerateOptions) function with clean API - Add helper functions: writeFileIfNotExists, writeToFile, writeGitignoreForProject - Simplify interactive_init.go to a stub pending TUI implementation - Add comprehensive tests for GenerateProject covering both presets, force overwrite, invalid preset, and default version Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add the foundational wizard package with: - types.go: NodeInfo, Disk, NetInterface, NodeConfig, WizardResult - interfaces.go: Scanner interface for network discovery - validator.go: input validation for cluster names, hostnames, CIDR, endpoints, IPs, and node roles - validator_test.go: comprehensive table-driven tests (42 cases) These types and validators will be used by both the network scanner and the TUI wizard in subsequent PRs. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add scan package for discovering Talos nodes on the network: - parse.go: ParseNmapGrepOutput extracts IPs from nmap -oG output - scanner.go: NmapScanner implements wizard.Scanner interface using nmap for host discovery and talosctl for hardware info collection - CommandRunner interface enables mocking exec.Command in tests - Bounded parallelism (max 10 goroutines) for concurrent node queries - Individual node query failures do not abort the overall scan Tests use mock CommandRunner for deterministic, fast execution. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Implement interactive cluster initialization wizard using bubbletea: - tui/model.go: State machine with 11 steps (preset selection, cluster name, endpoint, CIDR scan, node selection, node config, confirmation, generation). Async operations via tea.Cmd for network scanning and config generation. Back navigation with Esc. - tui/views.go: Lipgloss-styled views for each step - tui/styles.go: Shared style definitions - tui/model_test.go: 16 tests covering state transitions, validation, error handling, back navigation, and view rendering - interactive_init.go: Wires bubbletea wizard to cobra command with GenerateProject as the generation callback Dependencies added: bubbletea, bubbles, lipgloss Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
- Fix nmap flag: --port is invalid, use -p instead - Rewrite GetNodeInfo to collect hostname, disks, and network interfaces via three separate talosctl commands with JSON output parsing - Add parse_talosctl.go: ParseHostname, ParseDisks, ParseLinks functions that parse NDJSON output from talosctl get commands - Filter non-physical interfaces (loopback, bonds, vlans) - Gracefully handle partial failures (individual command errors don't abort the overall node info collection) - Update all scanner tests for new multi-command flow Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add WriteNodeFiles function that creates nodes/<hostname>.yaml files with correct talm modelines. Each file contains: # talm: nodes=["<ip>"], endpoints=["<ip>"], templates=["templates/<role>.yaml"] This enables the standard talm workflow: talm template --file nodes/X.yaml followed by talm apply. Features: - Extracts bare IP from CIDR notation for modeline - Maps role to correct template (controlplane.yaml / worker.yaml) - Creates nodes/ directory if it doesn't exist - Skips existing files to avoid overwriting user edits Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Allow callers to override values.yaml fields after initial generation. The wizard uses this to inject the user-provided endpoint, subnets, and preset-specific fields (floatingIP, clusterDomain, etc.) into the generated values.yaml. Implementation: after writing preset files, if ValuesOverrides is provided, read values.yaml, merge overrides, and write back. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Complete the wizard implementation: - Expand node config form from 4 to 7 fields: role (editable), hostname, disk, interface, address, gateway, DNS - Add validation for all fields on submit (role, hostname, CIDR, gateway IP, DNS IPs) - First node defaults to controlplane, rest to worker (user can change) - Pre-fill fields from discovered hardware (hostname, disk, interface) - Add 'skip scan' flow: press 's' at CIDR step to enter IPs manually - Manual entry validates IPs, pre-selects all entered nodes - Wire generateFn to build ValuesOverrides from WizardResult and call WriteNodeFiles after project generation - Update all tests: 24 TUI tests covering manual entry, validation, role defaults, and view rendering Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Remove all external binary dependencies (nmap, talosctl) from the scanner in favor of pure Go implementations: - Replace nmap with Go TCP connect scanner (net.DialTimeout) that enumerates CIDR hosts and probes the Talos API port directly - Replace talosctl subprocess calls with Talos machinery gRPC client: c.Version() for hostname, c.Disks() for block devices, c.Memory() for RAM, helpers.ForEachResource for network links - Add extract.go with pure functions for parsing gRPC protobuf responses into wizard domain types - Add tcpscan.go with CIDR host enumeration (handles /24 through /32, skips network and broadcast addresses) - Delete parse.go, parse_test.go, parse_talosctl.go, parse_talosctl_test.go (no longer needed) - TCP scan tested with real localhost TCP listener - gRPC extraction tested with constructed protobuf structs Zero runtime dependencies on external binaries. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Fixes from code review: 1. stepDone now handles enter/q to quit (user was trapped after success) 2. Pass Config.TemplateOptions.TalosVersion through interactive wizard 3. mergeValuesOverrides rejects nested map overrides with clear error 4. collectNodeInfo logs warnings for failed node info collection 5. Remove duplicate writeGitignoreForProject, reuse writeGitignoreFile 6. Fix writeFileIfNotExists double existence check (check once, write once) 7. Document why interactive is a root command (flag conflict avoidance) 8. Remove unused WizardResult fields (OIDCIssuerURL, NrHugepages) 9. handleBack from configureNode now calls prepareNodeInputs to restore previous node's data in the input fields 10. enumerateHosts /32 uses ipNet.IP consistently instead of ip Non-blocking: - Remove dead filterPhysicalInterfaces function - Fix fragile port in TestScanTCPPort_NoOpenPort (use closed listener) - Guard empty endpoint in buildValuesOverrides All fixes have corresponding tests. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Review fixes (iteration 2): 1. Eliminate global state race in GenerateProject: extract writeGitignoreForDir(rootDir) instead of mutating Config.RootDir 2. Pass cancellable context to scanNetworkCmd; cancel on Ctrl+C to prevent background goroutine leak during network scanning 3. Reject CIDR ranges larger than /16 in enumerateHosts to prevent allocation of millions of IPs (16M+ for /8) 4. Replace bogus TestValuesOverridesRejectsNestedMaps with real unit tests for mergeValuesOverrides using prepared values.yaml files 5. Skip non-Talos nodes in collectNodeInfo when GetNodeInfo fails (connection errors mean the host is not a Talos node) 8. Add interactive wizard to README Getting Started section Remove accidental pkg/commands/.gitignore artifact Non-blocking: - ValidateEndpoint: add u.Host emptiness check for robustness - Document ValidateHostname as single-label only (no FQDNs) - Add tests for buildValuesOverrides (empty endpoint guard, field population) - Add tests for enumerateHosts /8 rejection and /16 acceptance Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
1. writeGitignoreForDir now accepts kubeconfigName parameter instead of reading from global Config state — eliminates race condition 2. TalosScanner.Timeout field is now used in GetNodeInfo (was dead code) 3. initCmd.RunE now calls GenerateProject for core generation, removing ~100 lines of duplicated logic. Dead functions writeSecretsBundleToFile, validateFileExists, writeToDestination removed 4. ValidateHostname length limit fixed: 63 chars (DNS label) not 253 (FQDN) 5. collectNodeInfo returns informative error when TCP scan finds hosts but none respond as Talos nodes via gRPC 6. WriteNodeFiles sanitizes hostname via filepath.Base to prevent path traversal (e.g. '../escape' becomes 'escape') All fixes have corresponding tests. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
1. GenerateProject uses KubeconfigName from opts instead of global Config 2. Interactive wizard prints encryption warning after generation 3. Error message no longer references --force (not available in interactive) 4. collectNodeInfo sorts results by IP for deterministic ordering 5. ValidateEndpoint pre-checks for https:// prefix with clear error 6. mergeValuesOverrides docstring warns about comment/ordering loss 7. handleBack from error returns to the step that triggered it (prevStep) 8. Skip-scan uses ctrl+s instead of bare 's' to avoid conflict with input 9. Test comments cleaned up (no internal review numbering) 10. enumerateHosts comment improved for clarity Also: initCmd.RunE refactored to call GenerateProject, removing 3 unused functions (writeSecretsBundleToFile, validateFileExists, writeToDestination) Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
…leak 1. GenerateProject now skips existing files silently (Force=false) instead of erroring. Fixes regression where 'talm init' on projects with encrypted files would fail because decrypted files already exist. 2. KubeconfigName passed explicitly via GenerateOptions, removing last global Config dependency from GenerateProject. 3. handleBack from stepConfirm decrements currentNodeIdx and calls prepareNodeInputs — fixes out-of-bounds panic. 4. prevStep uses *step pointer to distinguish nil (no previous) from stepSelectPreset (value 0). Error recovery correctly returns to any step including the first one. 5. Esc from stepScanning cancels the scan context and returns to CIDR step. Fixes context leak where background scan goroutines continued after user navigated away. All fixes have corresponding tests. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
…dation 1. Manual node entry uses ctrl+d instead of bare 'd' to avoid intercepting text input characters (IPv6 contains 'd') 2. writeGitignoreForDir tracks file existence before write and prints correct Created/Updated message 3. scanTCPPort uses net.Dialer.DialContext instead of net.DialTimeout for context-aware cancellation during scanning 4. validateAndBuildNodeConfig rejects empty DiskPath (required for Talos installation) 5. Scan warnings (nodes found by TCP but failed gRPC) are surfaced via ScanResult.Warnings and displayed in the node selection view 6. IP sorting uses net.ParseIP for correct numeric ordering Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
1. Use bytes.Compare on net.IP for correct numeric IP sorting 2. Log collectLinks gRPC errors instead of silently discarding 3. Use ScanNetworkFull in TUI to surface scan warnings (nodes found by TCP but failed gRPC) in the node selection view 4. Add ScanNetworkFull to Scanner interface, move ScanResult to wizard package for clean interface boundaries 5. Require non-empty DiskPath in node configuration validation 6. Fix handleBack from confirm: always remove last configured node to prevent duplicate entries on back-forward navigation 7. Document that mergeValuesOverrides replaces entire lists (shallow) 8. Add idempotency test for GenerateProject (run twice, verify no overwrites) and list-replacement test for mergeValuesOverrides 9. Manual node entry done key changed to ctrl+d (consistent pattern) Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
… ctx 1. Guard scanResultMsg/scanErrorMsg handlers with step check to ignore stale results from cancelled scans that arrive after the user navigated away from stepScanning 2. WriteNodeFiles validates hostname via ValidateHostname and rejects '/' and other invalid characters (filepath.Base alone is insufficient) 3. scanTCPPort checks ctx.Err() after wg.Wait() to return cancellation error instead of partial results All fixes have corresponding tests. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
1. Require non-empty Address (CIDR) in node configuration — prevents empty IP in modeline 2. mergeValuesOverrides skips gracefully when values.yaml doesn't exist 3. mergeValuesOverrides rejects ANY override of a map key (not just map-to-map), preventing silent data loss from scalar-over-map 4. GetNodeInfo returns error when all gRPC calls succeed but return no useful data (ghost nodes) 5. Re-check keyFileExists from disk before kubeconfig encryption (handleTalosconfigEncryption may have created the key) 6. WriteNodeFiles rejects duplicate hostnames with clear error 7. Tests for all: empty address, scalar-over-map, duplicate hostnames, slash hostname Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
1. GenerateProject validates ClusterName is non-empty 2. mergeValuesOverrides rejects map-valued overrides for ANY key (new or existing), enforcing flat-only override semantics 3. Scan context uses 5-minute timeout as safety net 4. README uses long flags (--preset, --name) instead of short All fixes have corresponding tests. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Cancel the scan context and clear cancelScan when scanResultMsg or scanErrorMsg is received, preventing a 5-minute timer goroutine leak. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 0 minutes and 4 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (16)
📝 WalkthroughWalkthroughThis PR introduces an interactive terminal wizard for cluster initialization via Bubble Tea TUI. It adds network scanning (TCP + Talos gRPC APIs), node discovery, and optional configuration overrides. The non-interactive Changes
Sequence DiagramsequenceDiagram
participant User
participant TUI as TUI Model
participant Scanner
participant Talos as Talos API
participant Generator
participant Disk
User->>TUI: Run talm interactive
TUI->>TUI: Display preset selection
User->>TUI: Select preset & cluster name
TUI->>TUI: Display endpoint & CIDR entry
User->>TUI: Enter endpoint & CIDR
TUI->>Scanner: ScanNetwork(cidr)
Scanner->>Talos: TCP scan CIDR (port 50000)
Talos-->>Scanner: Open ports
Scanner->>Talos: GetNodeInfo (hostname, disks, memory, interfaces)
Talos-->>Scanner: NodeInfo (per IP)
Scanner-->>TUI: []NodeInfo + warnings
TUI->>TUI: Display discovered nodes
User->>TUI: Select nodes & configure roles/addresses
User->>TUI: Confirm
TUI->>Generator: GenerateProject(opts + ValuesOverrides)
Generator->>Disk: Write secrets.yaml, talosconfig, nodes/
Disk-->>Generator: Files created
Generator-->>TUI: Success
TUI->>User: Show completion message
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces an interactive TUI wizard for cluster initialization and refactors the project generation logic into a reusable function. The changes include a new network scanner for Talos node discovery, comprehensive validation utilities, and extensive unit tests. Review feedback highlights the need to pass the user-provided API endpoint through the generation options to avoid hardcoded placeholders in the resulting talosconfig and suggests optimizing the network scanner by using direct type assertions instead of redundant YAML marshaling.
| ValuesOverrides map[string]interface{} // optional: merge into generated values.yaml | ||
| } |
There was a problem hiding this comment.
The GenerateOptions struct is missing a field for the cluster endpoint. Since the interactive wizard now collects this information, it should be passed to GenerateProject so it can be used when generating the talosconfig file.
| ValuesOverrides map[string]interface{} // optional: merge into generated values.yaml | |
| } | |
| ValuesOverrides map[string]interface{} // optional: merge into generated values.yaml | |
| Endpoint string // optional: API server endpoint for talosconfig | |
| } |
There was a problem hiding this comment.
Done in 6583d42 — GenerateOptions.Endpoint threaded through to talosconfig generation.
pkg/commands/init.go
Outdated
| configBundle, err := gen.GenerateConfigBundle(genOptions, opts.ClusterName, "https://192.168.0.1:6443", "", []string{}, []string{}, []string{}) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| configBundle.TalosConfig().Contexts[opts.ClusterName].Endpoints = []string{"127.0.0.1"} | ||
| return yaml.Marshal(configBundle.TalosConfig()) |
There was a problem hiding this comment.
The talosconfig is currently generated with a hardcoded placeholder endpoint (https://192.168.0.1:6443). Since the interactive wizard collects the actual cluster endpoint, it should be used here. Additionally, the Endpoints field in the generated config should ideally be updated to match the provided endpoint's hostname instead of being hardcoded to 127.0.0.1.
endpoint := opts.Endpoint
if endpoint == "" {
endpoint = "https://192.168.0.1:6443"
}
configBundle, err := gen.GenerateConfigBundle(genOptions, opts.ClusterName, endpoint, "", []string{}, []string{}, []string{})
if err != nil {
return nil, err
}
configBundle.TalosConfig().Contexts[opts.ClusterName].Endpoints = []string{"127.0.0.1"}
return yaml.Marshal(configBundle.TalosConfig())There was a problem hiding this comment.
Done in 6583d42 — generator now uses opts.Endpoint when provided and sets Contexts[...].Endpoints from url.Parse(endpoint).Hostname() instead of 127.0.0.1.
pkg/commands/interactive_init.go
Outdated
| if err := GenerateProject(GenerateOptions{ | ||
| RootDir: Config.RootDir, | ||
| Preset: result.Preset, | ||
| ClusterName: result.ClusterName, | ||
| TalosVersion: Config.TemplateOptions.TalosVersion, | ||
| Force: false, | ||
| Version: Config.InitOptions.Version, | ||
| ValuesOverrides: overrides, | ||
| }); err != nil { |
There was a problem hiding this comment.
The cluster endpoint collected by the wizard should be passed to GenerateProject so it can be correctly set in the generated talosconfig file.
| if err := GenerateProject(GenerateOptions{ | |
| RootDir: Config.RootDir, | |
| Preset: result.Preset, | |
| ClusterName: result.ClusterName, | |
| TalosVersion: Config.TemplateOptions.TalosVersion, | |
| Force: false, | |
| Version: Config.InitOptions.Version, | |
| ValuesOverrides: overrides, | |
| }); err != nil { | |
| if err := GenerateProject(GenerateOptions{ | |
| RootDir: Config.RootDir, | |
| Preset: result.Preset, | |
| ClusterName: result.ClusterName, | |
| TalosVersion: Config.TemplateOptions.TalosVersion, | |
| Force: false, | |
| Version: Config.InitOptions.Version, | |
| ValuesOverrides: overrides, | |
| Endpoint: result.Endpoint, | |
| }); err != nil { |
pkg/wizard/scan/scanner.go
Outdated
| specBytes, err := yaml.Marshal(specRaw) | ||
| if err != nil { | ||
| return nil | ||
| } | ||
|
|
||
| var specMap map[string]interface{} | ||
| if err := yaml.Unmarshal(specBytes, &specMap); err != nil { | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Marshaling specRaw to YAML and then unmarshaling it back into a map is inefficient. Since specRaw is already an interface{} obtained from resource.MarshalYAML, you should attempt a direct type assertion to map[string]interface{} or use a more efficient conversion method if the underlying type is a struct.
| specBytes, err := yaml.Marshal(specRaw) | |
| if err != nil { | |
| return nil | |
| } | |
| var specMap map[string]interface{} | |
| if err := yaml.Unmarshal(specBytes, &specMap); err != nil { | |
| return nil | |
| } | |
| specMap, ok := specRaw.(map[string]interface{}) | |
| if !ok { | |
| return nil | |
| } |
There was a problem hiding this comment.
Done in 6583d42 — extracted via direct type assertion through a shared specMapFromResource helper.
There was a problem hiding this comment.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/commands/init.go (1)
267-319:⚠️ Potential issue | 🟠 MajorRegeneration can leave encrypted artifacts out of sync with the plaintext.
This post-generation path only re-encrypts when the
*.encryptedsibling is missing. IfGenerateProjectrewritessecrets.yaml,talosconfig, or the kubeconfig while an encrypted copy already exists, the encrypted artifact stays stale and no longer matches what was just generated.handleTalosconfigEncryption(false)has the same problem fortalosconfig.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/commands/init.go` around lines 267 - 319, Post-generation re-encryption only runs when the "*.encrypted" sibling is missing, leaving encrypted artifacts stale if the plaintext was regenerated; change the logic for secrets, talosconfig (handleTalosconfigEncryption), and kubeconfig to re-encrypt whenever the plaintext is newer or differs from the encrypted file (e.g., compare modification times or checksums) instead of only when the encrypted file is absent, so replace the current existence-only checks around secretsFile/encryptedSecretsFile, handleTalosconfigEncryption(false), and kubeconfigFile/encryptedKubeconfigFile with a freshness/diff check and call age.EncryptSecretsFile / handleTalosconfigEncryption / age.EncryptYAMLFile when plaintext has changed.
🧹 Nitpick comments (1)
pkg/wizard/nodefile_test.go (1)
190-228: Add regression tests for normalized-name collisions and unknown roles.Please add cases for:
"cp-1"+"../cp-1"should error (same sanitized filename),- role like
"master"should return an error (no silent worker fallback).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/wizard/nodefile_test.go` around lines 190 - 228, Add two new tests in pkg/wizard/nodefile_test.go covering normalized-name collisions and unknown roles: (1) add TestWriteNodeFiles_NormalizedCollision which calls WriteNodeFiles with nodes having Hostname "cp-1" and "../cp-1" and asserts an error (to catch sanitized filename collision), and (2) add TestWriteNodeFiles_UnknownRole which calls WriteNodeFiles with a node whose Role is "master" and asserts an error (to avoid silently treating unknown roles as worker). Reference the existing test patterns (e.g., TestWriteNodeFiles_DuplicateHostnames, TestWriteNodeFiles_InvalidHostname) for setup using t.TempDir() and error assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/commands/init.go`:
- Around line 593-603: GenerateProject currently only rejects empty
opts.ClusterName; add the same cluster-name validation used by the interactive
wizard before proceeding so the non-interactive path enforces identical rules.
Locate GenerateProject and validate opts.ClusterName with the same helper/logic
the wizard uses (or extract that logic into a shared validateClusterName
function) and return a clear error when the name fails (same style as the
existing fmt.Errorf usage). Ensure you reference opts.ClusterName in the check
and keep the subsequent preset validation (isValidPreset/availablePresets)
unchanged.
In `@pkg/commands/interactive_init.go`:
- Around line 64-71: The encryption warning is printed inside generateFn while
the alternate screen (created by tea.NewProgram / model from tui.New) is still
active, so move the note emission to after p.Run() returns: remove or stop
printing the "Secrets are not encrypted..." line from generateFn and instead,
immediately after finalModel, err := p.Run() completes and before returning from
interactive_init.go, check the same condition and print the warning to os.Stderr
so it appears on the restored main terminal buffer; reference generateFn,
tui.New, tea.NewProgram, and p.Run to locate and update the code.
In `@pkg/wizard/nodefile.go`:
- Around line 79-87: templateForRole currently silently maps unknown roles to
the worker template which hides invalid input; change templateForRole to return
an explicit error for unknown roles (e.g., func templateForRole(role string)
(string, error)) instead of defaulting to "templates/worker.yaml", return the
correct template for "controlplane" and "worker" and return a descriptive error
for any other role, then update all callers of templateForRole (where node
artifacts are created) to handle the error and fail fast (propagate or surface
the error) so invalid node roles are rejected rather than mapped.
- Around line 22-40: Duplicate detection currently uses raw node.Hostname before
sanitization so inputs that normalize to the same filename (e.g., "cp-1" and
"../cp-1") collide; change the logic to compute safeName :=
filepath.Base(node.Hostname), validate it via ValidateHostname(safeName), then
use a seen map keyed by safeName to detect duplicates and return an error like
"duplicate hostname after sanitization: %q" when a duplicate is found; ensure
the duplicate-check and seen[safeName]=true are done after the
sanitization/validation and before constructing filePath
(filepath.Join(nodesDir, safeName+".yaml")).
In `@pkg/wizard/scan/scanner.go`:
- Around line 176-179: The warning branch inside the
helpers.ForEachResource(...) call is printing directly to the terminal with
fmt.Fprintf(os.Stderr,...); replace that write with the wizard/TUI warning path
so the message appears in the node-selection warning block instead of smearing
the Bubble Tea screen. Remove the os.Stderr fprintf and instead emit a formatted
warning string (including err) through the package's existing warning API used
by the wizard (search for and call the same function or append to the same
warnings aggregator used elsewhere in pkg/wizard, e.g. the method that populates
the node-selection warning block) so the message is surfaced via the TUI; keep
this change local to the error branch after helpers.ForEachResource and leave
callbackRD and callbackResource unchanged.
In `@pkg/wizard/scan/tcpscan_test.go`:
- Around line 82-90: The test is racy because it closes an ephemeral port then
immediately probes it; another process can bind in the gap. Fix by selecting a
deterministic "closed" port before asserting the negative case: implement a
small loop that uses net.Listen("tcp", "127.0.0.1:0") to obtain a candidate port
(closedPort), closes the listener, then verifies the port is actually closed by
attempting a quick net.Dial("tcp", fmt.Sprintf("127.0.0.1:%d", closedPort)); if
Dial succeeds, pick another candidate and retry; only call scanTCPPort(ctx,
"127.0.0.1/32", closedPort, 1) once you have confirmed the port refuses
connections, so the negative assertion with scanTCPPort is deterministic.
In `@pkg/wizard/scan/tcpscan.go`:
- Around line 16-80: scanTCPPort currently spawns one goroutine per host
(causing huge spikes for /16) instead of using a bounded worker pool; change
scanTCPPort to create a fixed number of worker goroutines equal to maxWorkers
that read IPs from a jobs channel (push each host into the jobs channel, close
it when done), have each worker perform the DialContext logic, append results
under mu, and signal completion with wg; alternatively, enforce a stricter CIDR
limit in enumerateHosts (e.g., reject ranges larger than /24 or compute a
maxHosts cap based on maxWorkers and dialTimeout) so the scan budget is
realistic—update enumerateHosts and scanTCPPort together and keep symbols
scanTCPPort, enumerateHosts, sem, and results consistent.
In `@pkg/wizard/tui/model.go`:
- Around line 197-206: The scan handler is not resetting per-scan state, so old
selectedNodes, cursor and scanWarnings can persist across discovery attempts and
lead to out-of-range indexes or wrong preselection; update the code paths that
set m.discoveredNodes and m.scanWarnings (e.g. the block that sets
m.discoveredNodes = msg.nodes and the other occurrences around the commented
ranges) to also reset m.selectedNodes = nil (or empty slice), m.cursor = 0 (or
appropriate initial index) and m.scanWarnings = msg.warnings (or clear before
assigning) before changing m.step (e.g. stepSelectNodes/stepError), ensuring
previous selection and cursor state cannot leak into the new discovery session.
- Around line 289-307: Backtracking currently drops the saved NodeConfig entries
(m.configuredNodes) and calls prepareNodeInputs() which rebuilds inputs from
discovery defaults; instead, when decrementing m.currentNodeIdx in the back
action and in the stepConfirm branch, stop removing the last entry from
m.configuredNodes and restore the UI inputs from the existing
m.configuredNodes[m.currentNodeIdx] (or the new index after decrement) so edits
(disk, interface, address, gateway, DNS) are preserved; update the logic around
m.currentNodeIdx, the slice mutation of m.configuredNodes, and
prepareNodeInputs() invocation so prepareNodeInputs is only used for new nodes
while editing an existing node pulls values from the stored NodeConfig.
- Around line 209-220: The error handlers for scanErrorMsg (and the similar
generateErrorMsg handler) record prevStep after the model has already been set
to stepScanning/stepGenerating, which causes Esc from stepError to return to a
spinner with no running command; fix by capturing the current step into prev
(e.g. prev := m.step and m.prevStep = &prev) before you cancel the operation or
mutate m.step, then proceed to cancel (cancelScan/cancelGenerate), set m.err and
finally set m.step = stepError so prevStep points to the real prior state rather
than the spinner state.
In `@pkg/wizard/tui/views.go`:
- Around line 242-248: The success view in Model.viewDone doesn't tell users how
to exit even though updateDone() quits on Enter or 'q'; update the viewDone()
string to append a clear exit hint like "Press Enter or 'q' to exit." (or
similar) so users know how to close the success screen; modify the
Model.viewDone() return value to include that instruction and ensure the text
matches the keys handled in updateDone().
In `@pkg/wizard/validator.go`:
- Around line 67-75: The URL validation currently only checks u.Host != "" which
allows inputs like "https://:6443"; update the validator in
pkg/wizard/validator.go (the code that parses into variable u) to explicitly
validate the hostname by ensuring u.Hostname() is not empty (or otherwise a
valid hostname/IP) and return an error when it is empty; keep the existing
scheme and port checks (u.Scheme and u.Port()) but replace or augment the u.Host
check with a hostname-specific check so endpoints like https://:6443 are
rejected.
---
Outside diff comments:
In `@pkg/commands/init.go`:
- Around line 267-319: Post-generation re-encryption only runs when the
"*.encrypted" sibling is missing, leaving encrypted artifacts stale if the
plaintext was regenerated; change the logic for secrets, talosconfig
(handleTalosconfigEncryption), and kubeconfig to re-encrypt whenever the
plaintext is newer or differs from the encrypted file (e.g., compare
modification times or checksums) instead of only when the encrypted file is
absent, so replace the current existence-only checks around
secretsFile/encryptedSecretsFile, handleTalosconfigEncryption(false), and
kubeconfigFile/encryptedKubeconfigFile with a freshness/diff check and call
age.EncryptSecretsFile / handleTalosconfigEncryption / age.EncryptYAMLFile when
plaintext has changed.
---
Nitpick comments:
In `@pkg/wizard/nodefile_test.go`:
- Around line 190-228: Add two new tests in pkg/wizard/nodefile_test.go covering
normalized-name collisions and unknown roles: (1) add
TestWriteNodeFiles_NormalizedCollision which calls WriteNodeFiles with nodes
having Hostname "cp-1" and "../cp-1" and asserts an error (to catch sanitized
filename collision), and (2) add TestWriteNodeFiles_UnknownRole which calls
WriteNodeFiles with a node whose Role is "master" and asserts an error (to avoid
silently treating unknown roles as worker). Reference the existing test patterns
(e.g., TestWriteNodeFiles_DuplicateHostnames,
TestWriteNodeFiles_InvalidHostname) for setup using t.TempDir() and error
assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e16c77e1-ab90-4164-9c97-462842693da6
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (22)
.gitignoreREADME.mdgo.modpkg/commands/init.gopkg/commands/init_test.gopkg/commands/interactive_init.gopkg/wizard/interfaces.gopkg/wizard/nodefile.gopkg/wizard/nodefile_test.gopkg/wizard/scan/extract.gopkg/wizard/scan/extract_test.gopkg/wizard/scan/scanner.gopkg/wizard/scan/scanner_test.gopkg/wizard/scan/tcpscan.gopkg/wizard/scan/tcpscan_test.gopkg/wizard/tui/model.gopkg/wizard/tui/model_test.gopkg/wizard/tui/styles.gopkg/wizard/tui/views.gopkg/wizard/types.gopkg/wizard/validator.gopkg/wizard/validator_test.go
Review: interactive mode1. Broken initialization modelThe wizard is a standalone 2. Configure pages lack node-derived dataThe scanner (
This largely defeats the purpose of network scanning — it discovers which nodes exist but doesn't carry their network config into the form. 3. No smart inputs on configure pagesAll 7 fields per node are plain text inputs. No dropdowns for:
4. Missing functionality — cannot add interfacesEach node config supports exactly one interface ( Note: VIP is added via 5. Final page formattingThe confirm page ( With long values or many nodes this becomes hard to read. No alignment, no grouping. 6. Templating broken on non-standard setupsThe wizard generates node stubs with modelines where |
Reject https://:6443 which passed validation before — url.Parse fills Host with ":6443" but Hostname() returns empty. Also tell users how to exit the success screen instead of having them guess the key. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Covers: scanner refactor (direct type assertion instead of yaml round-trip, address + route collection, warnings routed through the TUI); bounded worker pool in tcpscan; deterministic closed-port helper in tests; hostname-sanitized dedup and typed errors for unknown node roles; cluster-name validation in GenerateProject; endpoint passthrough to talosconfig; encryption warning printed after the alt-screen is restored; TUI state reset on rescan; prevStep pointing at actionable steps after async errors; node config preserved across back-navigation; role field as toggle; DNS no longer prefilled; optional management-IP field for DNAT setups; reformatted confirm page; NewForExistingProject lets the wizard reuse an already-initialized project. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
|
Re the outside-diff comment on |
|
Thanks for the detailed review. All six points are addressed in 6583d42.
|
Add
talm interactivecommand — a bubbletea-based TUI wizard that guidesusers through Talos cluster initialization without requiring manual flag
construction.
What it does
The wizard collects cluster configuration through an interactive flow:
(or enter IPs manually with Ctrl+S)
Generated output:
secrets.yaml,talosconfig— via Talos machineryChart.yaml,values.yaml— with user-provided endpoint/subnets merged intemplates/— preset templates (controlplane, worker)nodes/<hostname>.yaml— stub files with modelines fortalm templateArchitecture
pkg/wizard/— domain types, validators, interfaces, node file generationpkg/wizard/scan/— TCP port scanner + Talos gRPC client for hardwarediscovery (no external binaries — pure Go)
pkg/wizard/tui/— bubbletea state machine (12 steps), views, stylespkg/commands/— extractedGenerateProject()reused by bothinitand
interactivecommandsKey design decisions
replaced talosctl with direct Talos gRPC client
initCmd.RunEnow delegates toGenerateProject()wizard.Scannerenables mocking in TUI testsValuesOverridesreplaces top-level keys only,rejects nested map overrides to prevent silent data loss
Testing
154 tests across all new packages covering:
extraction from protobuf, TUI state machine transitions, back navigation,
stale scan result handling, node file generation with path traversal
protection, values override merging, idempotent project generation
Summary by CodeRabbit
New Features
talm interactive) with network discovery, node detection, and guided configuration through a terminal UI.Documentation
Dependencies