[release-1.17] Add CI check for invalid characters in file paths #9690

Open
kaovilai wants to merge 1 commit into velero-io:release-1.17 from kaovilai:pr-filepath-check-release-1.17

@kaovilai (Collaborator) commented Apr 8, 2026

Cherry-pick of #9553 to release-1.17.

Go's module zip rejects filenames containing certain characters (shell special chars, path separators, and non-letter Unicode such as control/format characters). This adds a GitHub Actions workflow that validates all tracked file paths on every PR to catch these issues before they reach downstream consumers.

No filename fixes needed on this branch.
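
For readers unfamiliar with the rule being enforced, the per-character check can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual code: the name `is_ok` appears in the diff, but its body here, the `ALLOWED_ASCII` set, and the `describe` helper are assumptions based on a reading of `fileNameOK` in `golang.org/x/mod/module` and should be verified against the upstream source.

```python
import unicodedata

# Assumption: ASCII punctuation (plus space) that Go's fileNameOK accepts
# in addition to ASCII letters and digits. Verify against upstream.
ALLOWED_ASCII = set("!#$%&()+,-.=@[]^_{}~ ")


def is_ok(ch: str) -> bool:
    """Return True if ch may appear inside a single path element."""
    if ch.isascii():
        return ch.isalnum() or ch in ALLOWED_ASCII
    # Non-ASCII: only Unicode letters (general category L*) are accepted,
    # which excludes control and format characters such as U+200E.
    return unicodedata.category(ch).startswith("L")


def describe(ch: str) -> str:
    """Human-readable description of a character (hypothetical helper)."""
    return f"U+{ord(ch):04X} ({unicodedata.name(ch, 'unnamed')})"
```

Under these assumptions, `is_ok('\u200e')` is false because LEFT-TO-RIGHT MARK has category `Cf`, while accented letters such as `é` pass.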

Note

Responses generated with Claude

Go's module zip rejects filenames containing certain characters (shell
special chars like " ' * < > ? ` |, path separators : \, and non-letter
Unicode such as control/format characters). This caused a build failure
when a changelog file contained an invisible U+200E LEFT-TO-RIGHT MARK
(see PR velero-io#9552).

Add a GitHub Actions workflow that validates all tracked file paths on
every PR to catch these issues before they reach downstream consumers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
(cherry picked from commit 6d18d9b)
Copilot AI review requested due to automatic review settings April 8, 2026 21:19
@kaovilai kaovilai changed the title from "Add CI check for invalid characters in file paths" to "[release-1.17] Add CI check for invalid characters in file paths" Apr 8, 2026
@github-actions github-actions bot requested review from reasonerjt and ywk253100 April 8, 2026 21:24

Copilot AI left a comment

Pull request overview

Adds a new GitHub Actions workflow that validates all tracked file paths on every pull request to prevent filenames that would be rejected by Go module zip rules (e.g., invisible format characters like U+200E) from reaching downstream consumers.

Changes:

  • Introduce a PR workflow that scans git ls-files output and fails the job when disallowed characters are found.
  • Emit actionable error output and suggested rename commands for problematic paths.
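
The scanning step summarized above could be sketched like this. It is only an illustration under assumptions: the function names `parse_ls_files` and `tracked_paths` are hypothetical, and whether the workflow uses the NUL-delimited `-z` form of `git ls-files` is not shown in this conversation. The `-z` form is used here because it avoids Git's quoting of unusual path bytes.

```python
import subprocess


def parse_ls_files(raw: bytes):
    """Split NUL-delimited `git ls-files -z` output into (raw, text) pairs.

    text is None when the path is not valid UTF-8, so callers can report
    it separately instead of crashing on decode.
    """
    entries = []
    for chunk in raw.split(b"\0"):
        if not chunk:
            continue  # trailing NUL yields an empty final chunk
        try:
            entries.append((chunk, chunk.decode("utf-8")))
        except UnicodeDecodeError:
            entries.append((chunk, None))
    return entries


def tracked_paths():
    """List tracked paths of the repository in the current directory."""
    result = subprocess.run(
        ["git", "ls-files", "-z"], check=True, capture_output=True
    )
    return parse_ls_files(result.stdout)
```

Keeping the parsing separate from the `git` invocation makes the decoding logic testable without a repository.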

Comment on lines +63 to +68
clean = '/'.join(
    ''.join(c for c in elem if is_ok(c))
    for elem in name.split('/')
)
print(f'::error file={name}::File \"{name}\" contains invalid char {char_desc}')
bad_files.append((name, clean, char_desc))

Copilot AI Apr 8, 2026

The workflow command ::error file={name}::... embeds the raw filename in a GitHub Actions command. Since % is allowed by your checker, a filename containing sequences like %0A could be interpreted as a newline when the runner decodes workflow commands (command injection / log corruption). Escape %, \r, and \n per the workflow command spec (or avoid using workflow commands for untrusted data and print a plain error instead).
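
One way to apply this suggestion is sketched below. The escape sequences mirror those used by the GitHub Actions toolkit for workflow commands (message data escapes `%`, `\r`, `\n`; property values additionally escape `:` and `,`). The function names, including `error_annotation`, are hypothetical and not part of the PR.

```python
def escape_data(value: str) -> str:
    """Escape the message part of a workflow command."""
    # '%' must be replaced first so later escapes are not double-encoded.
    return value.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def escape_property(value: str) -> str:
    """Escape a property value such as file=... in a workflow command."""
    return escape_data(value).replace(":", "%3A").replace(",", "%2C")


def error_annotation(name: str, char_desc: str) -> str:
    """Build an ::error command that is safe for untrusted file names."""
    message = f'File "{name}" contains invalid char {char_desc}'
    return f"::error file={escape_property(name)}::{escape_data(message)}"
```

With this in place, a file literally named `a%0Ab` is emitted as `a%250Ab`, so the runner decodes it back to the original text rather than to a newline.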

Copilot uses AI. Check for mistakes.
Comment on lines +55 to +70
# Check each path element (split on /)
for element in name.split('/'):
    for ch in element:
        if not is_ok(ch):
            cp = ord(ch)
            char_name = unicodedata.name(ch, f'U+{cp:04X}')
            char_desc = f'U+{cp:04X} ({char_name})'
            # Build cleaned path by stripping invalid chars
            clean = '/'.join(
                ''.join(c for c in elem if is_ok(c))
                for elem in name.split('/')
            )
            print(f'::error file={name}::File \"{name}\" contains invalid char {char_desc}')
            bad_files.append((name, clean, char_desc))
            break

Copilot AI Apr 8, 2026

Once an invalid character is found, the script only breaks out of the innermost character loop, so the same path can be reported multiple times (e.g., invalid chars in multiple path elements). Consider tracking a per-file "invalid" flag and breaking out of the element loop (or continue to the next file) after the first finding to keep output/action annotations deterministic.
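
A possible shape for this fix is to move the nested search into a helper that returns on the first finding, which exits both loops at once and guarantees at most one report per file. The sketch below is illustrative only: `first_invalid_char` and `find_bad_files` are hypothetical names, and `is_ok` is a simplified stand-in for the checker in the diff.

```python
import unicodedata


def is_ok(ch: str) -> bool:
    # Simplified stand-in for the PR's is_ok(); the allow-list is an assumption.
    if ch.isascii():
        return ch.isalnum() or ch in "!#$%&()+,-.=@[]^_{}~ "
    return unicodedata.category(ch).startswith("L")


def first_invalid_char(name: str):
    """Return the first invalid character in any path element, or None."""
    for element in name.split("/"):
        for ch in element:
            if not is_ok(ch):
                return ch  # returning leaves both loops at once
    return None


def find_bad_files(names):
    """Report each offending path exactly once, in input order."""
    bad = []
    for name in names:
        ch = first_invalid_char(name)
        if ch is not None:
            bad.append((name, f"U+{ord(ch):04X}"))
    return bad
```

A path with invalid characters in several elements, such as `a\u200e/b\u200e.md`, then produces a single entry.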

Comment on lines +62 to +85
            # Build cleaned path by stripping invalid chars
            clean = '/'.join(
                ''.join(c for c in elem if is_ok(c))
                for elem in name.split('/')
            )
            print(f'::error file={name}::File \"{name}\" contains invalid char {char_desc}')
            bad_files.append((name, clean, char_desc))
            break

if bad_files:
    print()
    print('The following files have characters that are invalid in Go module zip archives:')
    print()
    for original, clean, desc in bad_files:
        print(f' {original} — {desc}')
    print()
    print('To fix, rename the files to remove the problematic characters:')
    print()
    for original, clean, desc in bad_files:
        if clean:
            print(f' mv \"{original}\" \"{clean}\" && git add \"{clean}\"')
            print(f' # or: git mv \"{original}\" \"{clean}\"')
        else:
            print(f' # {original} — cannot auto-suggest rename (non-UTF-8)')

Copilot AI Apr 8, 2026

The auto-suggested clean rename is built by stripping invalid characters from each path element, but it can produce an empty path element (or even an empty path) if an element consists entirely of invalid characters. In that case the suggested mv/git mv commands will be invalid; please guard against empty elements/paths and fall back to a message that no safe auto-suggestion is available.
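
The guard described here might look like the following sketch. `suggest_rename` is a hypothetical helper, and `is_ok` is again a simplified stand-in for the checker in the diff; the idea is to return `None` whenever stripping would leave an empty element, so the caller can print a fallback message instead of an unusable `mv` command.

```python
import unicodedata


def is_ok(ch: str) -> bool:
    # Simplified stand-in for the PR's is_ok(); the allow-list is an assumption.
    if ch.isascii():
        return ch.isalnum() or ch in "!#$%&()+,-.=@[]^_{}~ "
    return unicodedata.category(ch).startswith("L")


def suggest_rename(name: str):
    """Return a cleaned path, or None if no safe suggestion exists."""
    cleaned = ["".join(c for c in elem if is_ok(c)) for elem in name.split("/")]
    # An element made up only of invalid characters collapses to "",
    # which would yield paths like "docs//x.md" or an empty target.
    if any(elem == "" for elem in cleaned):
        return None
    clean = "/".join(cleaned)
    return clean if clean != name else None
```

The caller would then branch on `None` rather than on the truthiness of a possibly malformed string.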

Comment on lines +33 to +36
# Characters explicitly rejected by Go's fileNameOK
# (path separators / and \ are inherent to paths so we check per-element)
bad_ascii = set('\"' + \"'\" + '*<>?\`|:')

Copilot AI Apr 8, 2026

bad_ascii is defined but never used, which adds noise and suggests there is an additional check that isn't actually performed. Consider removing it, or use it to make the intent of the ASCII checks clearer (e.g., explicitly reject those chars and keep allowed_ascii focused on allow-list behavior).

Suggested change
- # Characters explicitly rejected by Go's fileNameOK
- # (path separators / and \ are inherent to paths so we check per-element)
- bad_ascii = set('\"' + \"'\" + '*<>?\`|:')
+ # ASCII characters allowed by Go's fileNameOK in addition to letters
+ # and digits. Path separators / and \ are inherent to paths, so we
+ # validate each path element separately below.
