fix(feature): avoid IndexError when parsing a dtype with a trailing dot #3731
Open
devteamaegis wants to merge 1 commit into
parse_nested_brackets crashed with 'IndexError: string index out of
range' on inputs like 'bionty.' (e.g. via parse_dtype('cat[bionty.]'))
because it indexed parts[1][0] without checking the segment was
non-empty. Guard the access so malformed dtypes raise a clear
ValidationError instead.
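The failure mode in the commit message can be reproduced with plain str.split, without importing lamindb. This is a minimal sketch; the variable names are illustrative and are not taken from parse_nested_brackets itself.

```python
# Reproduce the trailing-dot failure using only str.split (no lamindb import).
dtype_str = "bionty."
parts = dtype_str.split(".")
print(parts)  # ['bionty', '']

# The length check passes: a trailing dot still produces two parts.
assert len(parts) == 2

# But the second part is the empty string, so indexing its first character raises.
error_message = None
try:
    parts[1][0]
except IndexError as exc:
    error_message = str(exc)
print(error_message)  # string index out of range
```

A len(parts) == 2 check alone therefore cannot rule out an empty segment.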
Member
Thank you very much for the contribution! Will merge a version of this. Given you ran into this: have you considered passing Python objects instead of strings?
Member
I think what we should probably do is reason through a few more cases in which strings that a user passes could raise opaque errors. This fix is good, but it also feels a bit ad hoc; the same applies to the testing strategy. Thank you anyway, we'll mull over this for a bit more.
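One way to reason through more malformed strings is a small table-driven check that runs a parser over candidate inputs and reports any exception other than the expected validation error. In this sketch, ValidationError is a local stand-in for lamindb's exception, and unguarded_parse is a hypothetical parser that keeps the unguarded parts[1][0] access; neither is the project's actual code. The isupper() test is an assumption about what the first character is checked for.

```python
from __future__ import annotations

from typing import Callable, Iterable


class ValidationError(ValueError):
    """Local stand-in for lamindb's ValidationError (assumption for this sketch)."""


def find_opaque_errors(
    parse: Callable[[str], object],
    inputs: Iterable[str],
    allowed: type[BaseException] = ValidationError,
) -> dict[str, str]:
    """Run `parse` on each input; map inputs to the name of any unexpected exception."""
    opaque: dict[str, str] = {}
    for text in inputs:
        try:
            parse(text)
        except allowed:
            continue  # a clear, expected validation error
        except Exception as exc:  # anything else would surface as an opaque error
            opaque[text] = type(exc).__name__
    return opaque


def unguarded_parse(dtype_str: str) -> str:
    """Hypothetical parser that keeps the original bug: no check on parts[1]."""
    parts = dtype_str.split(".")
    # isupper() is an assumed stand-in for the real first-character check.
    if len(parts) == 2 and parts[1][0].isupper():
        return parts[1]
    raise ValidationError("invalid dtype")


# Hypothetical malformed inputs; a real list would come from reviewing the parser.
MALFORMED = ["bionty.", ".", "", "bionty..CellType", "bionty.cellType"]
print(find_opaque_errors(unguarded_parse, MALFORMED))
# {'bionty.': 'IndexError', '.': 'IndexError'}
```

In a real test suite the same idea would more naturally be a pytest parametrized test over the malformed inputs; the plain-function form keeps the sketch dependency-free.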
Branch updated from e3fdb16 to c94ed3f.
What's broken
parse_nested_brackets crashes with IndexError: string index out of range on any dtype segment ending in a dot, e.g. "bionty.". It's reachable from the public parse_dtype via a malformed categorical dtype such as parse_dtype("cat[bionty.]").
Why it happens
After dtype_str.split("."), the code checks len(parts) == 2 but then reads parts[1][0] without checking that parts[1] is non-empty; a trailing dot makes it "".
Fix
Add a parts[1] != "" guard before indexing. The malformed input now falls through to the bare-registry branch, so parse_dtype("cat[bionty.]") raises a clear ValidationError("invalid dtype") instead of an opaque IndexError.
Test
test_parse_nested_brackets_trailing_dot checks that "bionty." no longer crashes and that parse_dtype("cat[bionty.]") raises ValidationError.
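The fix and the test described above can be sketched together. Everything here is hypothetical: parse_segment, parse_cat_dtype, BARE_REGISTRIES, and the regular expression that unwraps cat[...] are illustrative stand-ins, not lamindb's parse_dtype or parse_nested_brackets, and ValidationError is a local class. The sketch only mirrors the control flow the PR describes: guard parts[1] before indexing, let a trailing dot fall through to the bare-registry branch, and raise ValidationError("invalid dtype") there.

```python
from __future__ import annotations

import re


class ValidationError(ValueError):
    """Local stand-in for lamindb's ValidationError (assumption for this sketch)."""


# Hypothetical registries that may appear without a module prefix.
BARE_REGISTRIES = {"ULabel"}


def parse_segment(dtype_str: str) -> tuple[str | None, str]:
    """Hypothetical parser for a 'module.Registry' or bare 'Registry' segment."""
    parts = dtype_str.split(".")
    # The guard from the fix: make sure parts[1] is non-empty before indexing it.
    # isupper() is an assumed stand-in for whatever the real code checks.
    if len(parts) == 2 and parts[1] != "" and parts[1][0].isupper():
        return parts[0], parts[1]
    # Anything else, including "bionty.", falls through to the bare-registry branch.
    if dtype_str in BARE_REGISTRIES:
        return None, dtype_str
    raise ValidationError("invalid dtype")


def parse_cat_dtype(dtype: str) -> tuple[str | None, str]:
    """Hypothetical wrapper that unpacks 'cat[...]' and parses the inner segment."""
    match = re.fullmatch(r"cat\[(.*)\]", dtype)
    if match is None:
        raise ValidationError("invalid dtype")
    return parse_segment(match.group(1))


print(parse_cat_dtype("cat[bionty.CellType]"))  # ('bionty', 'CellType')
try:
    parse_cat_dtype("cat[bionty.]")
except ValidationError as exc:
    print(f"ValidationError: {exc}")  # ValidationError: invalid dtype
```

Short-circuiting on parts[1] != "" keeps the guard next to the access it protects, so the empty segment never reaches parts[1][0].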