[release-1.17] Add CI check for invalid characters in file paths#9690
[release-1.17] Add CI check for invalid characters in file paths#9690kaovilai wants to merge 1 commit intovelero-io:release-1.17from
Conversation
Go's module zip rejects filenames containing certain characters (shell special chars like " ' * < > ? ` |, path separators : \, and non-letter Unicode such as control/format characters). This caused a build failure when a changelog file contained an invisible U+200E LEFT-TO-RIGHT MARK (see PR velero-io#9552). Add a GitHub Actions workflow that validates all tracked file paths on every PR to catch these issues before they reach downstream consumers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com> (cherry picked from commit 6d18d9b)
There was a problem hiding this comment.
Pull request overview
Adds a new GitHub Actions workflow that validates all tracked file paths on every pull request to prevent filenames that would be rejected by Go module zip rules (e.g., invisible format characters like U+200E) from reaching downstream consumers.
Changes:
- Introduce a PR workflow that scans
git ls-filesoutput and fails the job when disallowed characters are found. - Emit actionable error output and suggested rename commands for problematic paths.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| clean = '/'.join( | ||
| ''.join(c for c in elem if is_ok(c)) | ||
| for elem in name.split('/') | ||
| ) | ||
| print(f'::error file={name}::File \"{name}\" contains invalid char {char_desc}') | ||
| bad_files.append((name, clean, char_desc)) |
There was a problem hiding this comment.
The workflow command ::error file={name}::... embeds the raw filename in a GitHub Actions command. Since % is allowed by your checker, a filename containing sequences like %0A could be interpreted as a newline when the runner decodes workflow commands (command injection / log corruption). Escape %, \r, and \n per the workflow command spec (or avoid using workflow commands for untrusted data and print a plain error instead).
| # Check each path element (split on /) | ||
| for element in name.split('/'): | ||
| for ch in element: | ||
| if not is_ok(ch): | ||
| cp = ord(ch) | ||
| char_name = unicodedata.name(ch, f'U+{cp:04X}') | ||
| char_desc = f'U+{cp:04X} ({char_name})' | ||
| # Build cleaned path by stripping invalid chars | ||
| clean = '/'.join( | ||
| ''.join(c for c in elem if is_ok(c)) | ||
| for elem in name.split('/') | ||
| ) | ||
| print(f'::error file={name}::File \"{name}\" contains invalid char {char_desc}') | ||
| bad_files.append((name, clean, char_desc)) | ||
| break | ||
|
|
There was a problem hiding this comment.
Once an invalid character is found, the script only breaks out of the innermost character loop, so the same path can be reported multiple times (e.g., invalid chars in multiple path elements). Consider tracking a per-file "invalid" flag and breaking out of the element loop (or continue to the next file) after the first finding to keep output/action annotations deterministic.
| # Build cleaned path by stripping invalid chars | ||
| clean = '/'.join( | ||
| ''.join(c for c in elem if is_ok(c)) | ||
| for elem in name.split('/') | ||
| ) | ||
| print(f'::error file={name}::File \"{name}\" contains invalid char {char_desc}') | ||
| bad_files.append((name, clean, char_desc)) | ||
| break | ||
|
|
||
| if bad_files: | ||
| print() | ||
| print('The following files have characters that are invalid in Go module zip archives:') | ||
| print() | ||
| for original, clean, desc in bad_files: | ||
| print(f' {original} — {desc}') | ||
| print() | ||
| print('To fix, rename the files to remove the problematic characters:') | ||
| print() | ||
| for original, clean, desc in bad_files: | ||
| if clean: | ||
| print(f' mv \"{original}\" \"{clean}\" && git add \"{clean}\"') | ||
| print(f' # or: git mv \"{original}\" \"{clean}\"') | ||
| else: | ||
| print(f' # {original} — cannot auto-suggest rename (non-UTF-8)') |
There was a problem hiding this comment.
The auto-suggested clean rename is built by stripping invalid characters from each path element, but it can produce an empty path element (or even an empty path) if the name is comprised entirely of invalid chars. In that case the suggested mv/git mv commands will be invalid; please guard for empty elements/paths and fall back to a message that no safe auto-suggestion is available.
| # Characters explicitly rejected by Go's fileNameOK | ||
| # (path separators / and \ are inherent to paths so we check per-element) | ||
| bad_ascii = set('\"' + \"'\" + '*<>?\`|:') | ||
|
|
There was a problem hiding this comment.
bad_ascii is defined but never used, which adds noise and suggests there is an additional check that isn't actually performed. Consider removing it, or use it to make the intent of the ASCII checks clearer (e.g., explicitly reject those chars and keep allowed_ascii focused on allow-list behavior).
| # Characters explicitly rejected by Go's fileNameOK | |
| # (path separators / and \ are inherent to paths so we check per-element) | |
| bad_ascii = set('\"' + \"'\" + '*<>?\`|:') | |
| # ASCII characters allowed by Go's fileNameOK in addition to letters | |
| # and digits. Path separators / and \ are inherent to paths, so we | |
| # validate each path element separately below. |
Cherry-pick of #9553 to release-1.17.
Go's module zip rejects filenames containing certain characters (shell special chars, path separators, and non-letter Unicode such as control/format characters). This adds a GitHub Actions workflow that validates all tracked file paths on every PR to catch these issues before they reach downstream consumers.
No filename fixes needed on this branch.
Note
Responses generated with Claude