
fix: exclude wide characters that would exceed endColumn #44

Closed
RyogaK wants to merge 1 commit into chalk:main from RyogaK:fix/wide-char-end-boundary



RyogaK commented Apr 3, 2026

Summary

Fixes #43

When endColumn falls in the middle of a wide character (CJK or emoji), sliceAnsi currently includes the character in full, causing the result to exceed the requested column range. This change excludes such characters so that stringWidth(result) <= endColumn - startColumn is always guaranteed.

Before

sliceAnsi('あいう', 0, 1) → 'あ' (width=2, exceeds end=1)
sliceAnsi('あいう', 0, 3) → 'あい' (width=4, exceeds end=3)

After

sliceAnsi('あいう', 0, 1) → '' (width=0 ≤ 1)
sliceAnsi('あいう', 0, 3) → 'あ' (width=2 ≤ 3)

Implementation

A single condition is added to the isPastEnd check in the main loop: a non-continuation character token whose position + visibleWidth exceeds end is now treated as past end. The rest of the existing logic (closing active styles, etc.) works as-is.
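To make the added condition concrete, here is a minimal sketch of the boundary rule on plain text. It is not the actual sliceAnsi loop: `sliceColumns` and `visibleWidth` are hypothetical helpers, ANSI escape handling is left out, and the width table is a rough approximation rather than a full East Asian Width lookup.

```javascript
// Rough width approximation for illustration only. Real code would rely on
// a complete East Asian Width table instead of these hand-picked ranges.
function visibleWidth(character) {
	const codePoint = character.codePointAt(0);
	const isWide =
		(codePoint >= 0x1100 && codePoint <= 0x115F) || // Hangul Jamo
		(codePoint >= 0x2E80 && codePoint <= 0xA4CF) || // CJK, kana, and neighbors
		(codePoint >= 0xAC00 && codePoint <= 0xD7A3) || // Hangul syllables
		(codePoint >= 0xFF00 && codePoint <= 0xFF60) || // fullwidth forms
		(codePoint >= 0x1F300 && codePoint <= 0x1FAFF); // most pictographic emoji
	return isWide ? 2 : 1;
}

// Hypothetical simplified slicer: plain text only, one code point per token.
function sliceColumns(string, start, end) {
	let position = 0;
	let result = '';

	for (const character of string) {
		const width = visibleWidth(character);

		// Old rule: stop once a character starts at or after `end`.
		// Added rule: also stop when the character would extend past `end`.
		const isPastEnd = position >= end || position + width > end;
		if (isPastEnd) {
			break;
		}

		if (position >= start) {
			result += character;
		}

		position += width;
	}

	return result;
}

console.log(JSON.stringify(sliceColumns('あいう', 0, 1)));
console.log(JSON.stringify(sliceColumns('あいう', 0, 3)));
```

Under these assumptions the two calls reproduce the "After" results above: an empty string for end=1 and 'あ' for end=3.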

Discussion: wide emoji graphemes

This change also applies the same rule to wide emoji graphemes (e.g. regional-indicator flags 🇮🇱), updating the existing tests that previously expected "round up" behavior. I believe this is consistent — callers that specify a column range should be able to rely on the result not exceeding it — but I'd like to get your thoughts on whether wide emoji graphemes should be treated differently from CJK wide characters.
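As an illustration of why a flag behaves like a CJK wide character here, the sketch below lays graphemes out on the column grid and keeps only those that fit entirely inside the requested range. It uses the standard `Intl.Segmenter` API for grapheme segmentation; `approximateWidth`, `layoutGraphemes`, and `sliceWholeGraphemes` are hypothetical names, and the width ranges are an assumption made for this example, not the library's table.

```javascript
// Segment by grapheme cluster so a flag (two regional indicators) is one unit.
const segmenter = new Intl.Segmenter(undefined, {granularity: 'grapheme'});

// Assumed widths for this sketch: 2 columns for CJK/kana, regional-indicator
// graphemes, and most pictographic emoji; 1 column for everything else.
function approximateWidth(grapheme) {
	const codePoint = grapheme.codePointAt(0);
	const isWide =
		(codePoint >= 0x2E80 && codePoint <= 0xA4CF) ||
		(codePoint >= 0x1F1E6 && codePoint <= 0x1F1FF) ||
		(codePoint >= 0x1F300 && codePoint <= 0x1FAFF);
	return isWide ? 2 : 1;
}

// Assign each grapheme the half-open column range [start, end) it occupies.
function layoutGraphemes(string) {
	const cells = [];
	let column = 0;
	for (const {segment} of segmenter.segment(string)) {
		const width = approximateWidth(segment);
		cells.push({segment, start: column, end: column + width});
		column += width;
	}
	return cells;
}

// Keep only graphemes that lie entirely inside [start, end).
function sliceWholeGraphemes(string, start, end) {
	return layoutGraphemes(string)
		.filter(cell => cell.start >= start && cell.end <= end)
		.map(cell => cell.segment)
		.join('');
}

console.log(sliceWholeGraphemes('a🇮🇱b', 0, 2)); // flag spans columns 1-3, so it is dropped
console.log(sliceWholeGraphemes('a🇮🇱b', 0, 3)); // flag fits, so it is kept
```

In this model the flag is simply a two-column cell, so the same "must fit entirely" test covers it without a special case.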

Test plan

  • Added test case for CJK wide character boundary behavior ('あいう')
  • Updated surrogate pair test to expect exclusion when exceeding boundary
  • Updated regional-indicator flag tests for consistent boundary behavior
  • All 89 tests pass

When endColumn falls in the middle of a wide character (CJK or emoji),
the character is now excluded so that the result never exceeds the
requested column range.

Fixes #43

RyogaK commented Apr 3, 2026

Another idea: we could add an option to control the behavior when the boundary falls inside a wide character, e.g.:

sliceAnsi('あいう', 0, 3);                                     // 'あい' (default, current behavior)
sliceAnsi('あいう', 0, 3, { wideCharBoundary: 'exclude' });    // 'あ'

This would provide a solution for downstream projects that have strict column-width constraints (like cli-truncate) without affecting existing projects that accept the current "round up" behavior.

Happy to implement this approach if you prefer it.
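For discussion, the opt-in variant could look roughly like the sketch below. It only models plain text: `sliceColumnsWithOption` and `isWide` are hypothetical, the width check is a coarse approximation, and `'include'` is an assumed name for the default value, since only `'exclude'` is proposed above.

```javascript
// Coarse wide-character check for illustration (CJK and kana only).
function isWide(character) {
	const codePoint = character.codePointAt(0);
	return codePoint >= 0x2E80 && codePoint <= 0xA4CF;
}

// Hypothetical plain-text slicer with the proposed `wideCharBoundary` option.
// 'include' (assumed default name) keeps a character that straddles `end`;
// 'exclude' drops it so the result never exceeds the requested range.
function sliceColumnsWithOption(string, start, end, {wideCharBoundary = 'include'} = {}) {
	let position = 0;
	let result = '';

	for (const character of string) {
		if (position >= end) {
			break;
		}

		const width = isWide(character) ? 2 : 1;
		const straddlesEnd = position + width > end;
		if (straddlesEnd && wideCharBoundary === 'exclude') {
			break;
		}

		if (position >= start) {
			result += character;
		}

		position += width;
	}

	return result;
}

console.log(sliceColumnsWithOption('あいう', 0, 3));
console.log(sliceColumnsWithOption('あいう', 0, 3, {wideCharBoundary: 'exclude'}));
```

Keeping the round-up behavior as the default means existing callers see no change, while strict-width callers opt in explicitly.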

Development

Successfully merging this pull request may close these issues.

sliceAnsi returns wider result than specified endColumn for wide characters
