Skip to content

feat(sdist): dereference symlinks by default#1265

Open
Doekin wants to merge 5 commits intoscikit-build:mainfrom
Doekin:fix-sdist-symlinks
Open

feat(sdist): dereference symlinks by default#1265
Doekin wants to merge 5 commits intoscikit-build:mainfrom
Doekin:fix-sdist-symlinks

Conversation

@Doekin
Copy link
Copy Markdown

@Doekin Doekin commented Apr 15, 2026

Dereference symlinks by default to ensure sdists are completely self-contained and universally usable out of their source directories.

Closes #801

raymondqin added 3 commits April 15, 2026 13:09
Dereference symlinks by default to ensure sdists are completely self-contained and universally usable out of their source directories.
Copy link
Copy Markdown
Collaborator

@LecrisUT LecrisUT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't know about the default being off. For Fedora packaging for example we would want to keep the symlinks as-is. It's a tough thing because there would be a common sdist for all consumers.

Hmm, doesn't the symlink dereferencing happen automatically on the archive creation/extraction? I remember having had a similar question before

Comment thread src/scikit_build_core/settings/skbuild_model.py Outdated
Comment thread tests/test_pep517_sdist_symlink.py Outdated
Comment thread tests/test_pep517_sdist_symlink.py Outdated
Comment thread src/scikit_build_core/settings/skbuild_model.py Outdated
- Rename setting from 'dereference' to 'resolve_symlinks' (TOML: 'resolve-symlinks')
  per reviewer suggestion
- Shorten setting description to a single line
- Extract can_symlink as a reusable fixture
- Combine symlink tests into a single parametrized test
- Fix config_settings key to use hyphen (resolve-symlinks) not underscore
- Regenerate README.md, docs/reference/configs.md, and schema
@Doekin
Copy link
Copy Markdown
Author

Doekin commented Apr 15, 2026

Good point about Fedora packaging — keeping symlinks as-is makes sense for OS packagers.

Regarding automatic dereferencing: tarfile.open() in Python does not automatically dereference symlinks — it stores them as symlink entries in the archive. When extracted on a system where the symlink target doesn't exist (e.g., a different machine), the extract will create a dangling symlink. That's why we dereference by default: to ensure the sdist is self-contained and usable across machines. Users like Fedora who want to preserve symlinks can set sdist.resolve-symlinks = false.

@LecrisUT
Copy link
Copy Markdown
Collaborator

Good point about Fedora packaging — keeping symlinks as-is makes sense for OS packagers.

It is not a hard rule because we can always switch to using the git repo. It's just something that we need to mull over a bit.

When extracted on a system where the symlink target doesn't exist (e.g., a different machine), the extract will create a dangling symlink.

And how is the elephant in the room behaving (microsoft). If the symlink was created beforehand it cannot extract it?

Dangling symlinks are a different issue. IMO no symlink pointing outside of the sdist root should survive.

Users like Fedora who want to preserve symlinks can set sdist.resolve-symlinks = false.

I was thinking about that, but at that point it's too late isn't it. The main thing is that these sdist are created once and need to be satisfactory for both cases which is where the conundrum comes from. I.e. Fedora packaging, conda, windows users, all use the same sdist. For Fedora packaging it is not really a huge deal because we can switch to git repos.

@LecrisUT LecrisUT requested a review from henryiii April 15, 2026 16:22
@Doekin
Copy link
Copy Markdown
Author

Doekin commented Apr 15, 2026

Thanks for the continued discussion!

I'm not entirely sure I follow the Fedora packaging specifics — would love to understand if there's a concrete scenario where resolve-symlinks = true would break things there.

My understanding of the tension is:

  • Preserving symlinks makes the sdist potentially broken on Windows and for users where the link target isn't present
  • Resolving all symlinks loses information that may matter for some consumers

The resolve-symlinks option at least makes the behavior explicit and opt-outable. But I agree the real tricky part is that the sdist producer has to make this call for all consumers upfront.

Happy to revisit the default or add finer-grained control (e.g. only resolve symlinks pointing outside the sdist root) if you think that's worth doing.

@LecrisUT
Copy link
Copy Markdown
Collaborator

I'm not entirely sure I follow the Fedora packaging specifics — would love to understand if there's a concrete scenario where resolve-symlinks = true would break things there

Sure. The issue is that Fedora packaging consumes the sdist as-is. It does not create it. That's why it is at the mercy of the least common denominator.

The resolve-symlinks option at least makes the behavior explicit and opt-outable. But I agree the real tricky part is that the sdist producer has to make this call for all consumers upfront.

Yes indeed that.

Happy to revisit the default

I think it's fine in the current form, as mentioned there are ways out for us if needed, and the user can specify if symlinks are important, so no blocker on my side on this.

(e.g. only resolve symlinks pointing outside the sdist root) if you think that's worth doing.

This is kind of a different issue that should be done regardless. We might already be doing it, but I lost track of the sdist building process. It is definitely important because there are plenty of cases where the full source that should be in sdist is in the parent level that needs to be moved into the sdist somehow.

@Doekin
Copy link
Copy Markdown
Author

Doekin commented Apr 15, 2026

Thanks for clarifying the Fedora case — now I understand, the packager there is consuming the sdist from PyPI, not creating it, so they have no control over the resolve-symlinks setting. That makes the trade-off much clearer.

On the "resolve only external symlinks" point: with dereference=True (the current default), tarfile actually already handles this case correctly — it follows the link and copies the target's content into the archive, regardless of whether the target is inside or outside the sdist root. So a symlink pointing to a parent directory would have its content included in the sdist. With dereference=False, that same symlink would be stored as-is and become dangling after extraction.

So the current behavior with resolve-symlinks = true already "does the right thing" for external symlinks. A future enhancement could be smarter — preserve internal symlinks while only dereferencing external ones — but that's a separate issue as you said.

@henryiii
Copy link
Copy Markdown
Collaborator

I'd like two small changes. I can implement them later today unless you want to.

All the configuration options get tested in a file, I'd like to add it there too.

I'm nearly sure this breaks our reproducible SDist guarantee, so I think it needs to become version dependent, only activating if a user does not set a minimum version or since the minimum version at or above the next release.

@Doekin
Copy link
Copy Markdown
Author

Doekin commented Apr 17, 2026

Done! Added it to the settings tests and made it version-gated (defaults to true only for minimum-version >= 0.13).

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR changes scikit-build-core’s SDist builder to dereference symlinks by default so generated sdists are self-contained (closing #801), while preserving backward compatibility for projects that pin minimum-version < 0.13.

Changes:

  • Add new sdist.resolve-symlinks setting (defaulting to true, but false when minimum-version < 0.13).
  • Enable tarfile dereferencing during SDist creation based on that setting.
  • Add tests and documentation covering the new behavior and its backcompat rules.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/scikit_build_core/build/sdist.py Passes dereference=... to the tar writer to copy symlink targets into the sdist by default.
src/scikit_build_core/settings/skbuild_model.py Introduces the sdist.resolve_symlinks setting in the settings model.
src/scikit_build_core/settings/skbuild_read_settings.py Applies default/backcompat logic for resolve_symlinks based on minimum-version.
src/scikit_build_core/resources/scikit-build.schema.json Adds the sdist.resolve-symlinks config schema entry.
tests/test_skbuild_settings.py Extends default/backcompat settings tests for the new option.
tests/test_pep517_sdist_symlink.py Adds an integration-style test asserting tar member type changes with dereferencing on/off.
docs/reference/configs.md Documents sdist.resolve-symlinks behavior and versioning.
README.md Adds the new config option to the example configuration snippet.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/scikit_build_core/build/sdist.py
Comment thread tests/test_skbuild_settings.py
)

settings_reader = SettingsReader.from_file(pyproject_toml, {})
assert not settings_reader.settings.sdist.resolve_symlinks
Copy link

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert not ...resolve_symlinks will also pass if the value is None, so this test wouldn’t catch a regression where the setting isn’t resolved. Please assert the explicit boolean value here (e.g., is False) so the backcompat behavior is actually verified.

Suggested change
assert not settings_reader.settings.sdist.resolve_symlinks
assert settings_reader.settings.sdist.resolve_symlinks is False

Copilot uses AI. Check for mistakes.
Comment on lines +12 to +21
def can_symlink() -> None:
"""Skip the test if symlinks are not supported on this OS."""
try:
Path("_symlink_check").symlink_to("_symlink_check_target")
except OSError:
pytest.skip(
"Creating symlinks is not supported/allowed on this OS without privileges"
)
else:
Path("_symlink_check").unlink(missing_ok=True)
Copy link

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can_symlink creates and deletes a fixed-name path in the current working directory (_symlink_check). Since CI runs tests with pytest -n auto (xdist), this can race with other workers if they share the same CWD at fixture setup time. Consider creating the check symlink in a unique temp directory (e.g., via tmp_path_factory.mktemp(...)) to avoid cross-test interference.

Suggested change
def can_symlink() -> None:
"""Skip the test if symlinks are not supported on this OS."""
try:
Path("_symlink_check").symlink_to("_symlink_check_target")
except OSError:
pytest.skip(
"Creating symlinks is not supported/allowed on this OS without privileges"
)
else:
Path("_symlink_check").unlink(missing_ok=True)
def can_symlink(tmp_path_factory) -> None:
"""Skip the test if symlinks are not supported on this OS."""
check_dir = tmp_path_factory.mktemp("symlink_check")
check_link = check_dir / "_symlink_check"
try:
check_link.symlink_to("_symlink_check_target")
except OSError:
pytest.skip(
"Creating symlinks is not supported/allowed on this OS without privileges"
)
else:
check_link.unlink(missing_ok=True)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this also could be cached, vs. being computed each time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

When building sdist, file symlink remains a symlink instead of being copied to a concrete file

4 participants