🐛 Fix dtype degradation of partial-null bool/int features #3752

Open
ishitajain9717 wants to merge 4 commits into main from tests_fixes

@ishitajain9717 ishitajain9717 commented Jun 22, 2026

Contributor

Fixes a `ValidationError` raised by `Record.from_dataframe(...).save()` when a bool or int feature is `None` on some rows but not others. Internally, the save path rebuilt the frame with `pd.DataFrame(prepared_rows)`, which let pandas re-infer the column dtypes and degrade them (bool → object, int → float64), so the dtype check failed even for correctly typed nullable input.
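To illustrate the degradation described above, here is a minimal pandas-only sketch. The `prepared_rows` content and the feature names `flag` and `count` are made up for illustration; the actual rows assembled on the lamindb save path are not shown in this PR description.

```python
import pandas as pd

# Hypothetical stand-in for the rows assembled on the save path:
# a bool feature and an int feature, each None on one row.
prepared_rows = [
    {"flag": True, "count": 1},
    {"flag": None, "count": None},
    {"flag": False, "count": 3},
]

# Building the frame row-wise lets pandas infer dtypes from the values.
inferred = pd.DataFrame(prepared_rows)

# The bool column falls back to a generic Python-object column,
# and the int column is promoted to float so None can become NaN.
flag_dtype = str(inferred["flag"].dtype)
count_dtype = str(inferred["count"].dtype)
print(flag_dtype, count_dtype)

# Correctly typed nullable input, for comparison, keeps its dtype.
nullable_flag = pd.array([True, None, False], dtype="boolean")
print(nullable_flag.dtype)
```

The nullable `boolean` and `Int64` extension dtypes can hold missing values without changing type, which is why they are the natural target for features that are only partially populated.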

The frame is now built column by column using each feature's declared dtype (`convert_to_pandas_dtype`), so dtypes never degrade. The PR also adds tests covering both the `DataFrameCurator` (accepts nullable `boolean`/`Int64`, rejects degraded `object`/`float64`) and the `from_dataframe` save round trip.
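The column-by-column approach can be sketched roughly as follows. This is not the code from `_feature_manager.py`: the `declared_dtypes` mapping and the `build_frame` helper are hypothetical, and the mapping stands in for whatever `convert_to_pandas_dtype` returns for each feature, since its signature is not shown here.

```python
import pandas as pd

# Hypothetical mapping from feature name to its declared pandas dtype.
# In the PR this role is played by convert_to_pandas_dtype.
declared_dtypes = {"flag": "boolean", "count": "Int64"}

# Hypothetical rows with partially missing values.
prepared_rows = [
    {"flag": True, "count": 1},
    {"flag": None, "count": None},
    {"flag": False, "count": 3},
]


def build_frame(rows, dtypes):
    """Build each column explicitly with its declared dtype (illustrative helper)."""
    columns = {
        # pd.array with an explicit nullable dtype stores None as pd.NA
        # instead of triggering a fallback to object or float64.
        name: pd.array([row.get(name) for row in rows], dtype=dtype)
        for name, dtype in dtypes.items()
    }
    return pd.DataFrame(columns)


df = build_frame(prepared_rows, declared_dtypes)
print(df.dtypes)
```

Because every column is constructed with an explicit dtype, pandas never gets the chance to infer one from the values, so missing entries do not change the column type.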

Fixes pfizer-collab/laminlabs#1009

@codecov

codecov Bot commented Jun 22, 2026


Codecov Report

❌ Patch coverage is 88.23529% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.50%. Comparing base (fce4357) to head (e1c610d).
⚠️ Report is 16 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lamindb/models/_feature_manager.py | 88.23% | 2 Missing ⚠️ |
Additional details and impacted files


@@            Coverage Diff             @@
##             main    #3752      +/-   ##
==========================================
- Coverage   91.55%   86.50%   -5.06%     
==========================================
  Files          88       88              
  Lines       15137    15336     +199     
==========================================
- Hits        13859    13266     -593     
- Misses       1278     2070     +792     
| Files with missing lines | Coverage Δ |
|---|---|
| lamindb/models/_feature_manager.py | 89.72% <88.23%> (-0.05%) ⬇️ |

... and 19 files with indirect coverage changes


@github-actions

github-actions Bot commented Jun 22, 2026


Deployment URL: https://a411eaaf.lamindb.pages.dev

        }
    )
    with pytest.raises(ln.errors.ValidationError):
        ln.curators.DataFrameCurator(df_degraded, schema).validate()

Member

Can you please add a test that makes evident which error message is raised here?

    # produces after the cast. We are asserting the curator accepts these dtypes.
    df_good = pd.DataFrame(
        {
            "cur-flag": pd.array([True, None, False], dtype="boolean"),

Member

Perfect!

@falexwolf falexwolf left a comment

Member

Looks great!

ishitajain9717 changed the title from "🧪 Adding test case for object type column" to "🐛 Fix dtype degradation of partial-null bool/int features in Record.from_dataframe()" on Jun 23, 2026

ishitajain9717 changed the title from "🐛 Fix dtype degradation of partial-null bool/int features in Record.from_dataframe()" to "🐛 Fix dtype degradation of partial-null bool/int features" on Jun 23, 2026
2 participants