feat(sdist): dereference symlinks by default#1265
feat(sdist): dereference symlinks by default#1265Doekin wants to merge 5 commits intoscikit-build:mainfrom
Conversation
Dereference symlinks by default to ensure sdists are completely self-contained and universally usable out of their source directories.
LecrisUT
left a comment
There was a problem hiding this comment.
Hmm, I don't know about the default being off. For Fedora packaging for example we would want to keep the symlinks as-is. It's a tough thing because there would be a common sdist for all consumers.
Hmm, doesn't the symlink dereferencing happen automatically on the archive creation/extraction? I remember having had a similar question before
- Rename setting from 'dereference' to 'resolve_symlinks' (TOML: 'resolve-symlinks') per reviewer suggestion - Shorten setting description to a single line - Extract can_symlink as a reusable fixture - Combine symlink tests into a single parametrized test - Fix config_settings key to use hyphen (resolve-symlinks) not underscore - Regenerate README.md, docs/reference/configs.md, and schema
|
Good point about Fedora packaging — keeping symlinks as-is makes sense for OS packagers. Regarding automatic dereferencing: |
It is not a hard rule because we can always switch to using the git repo. It's just something that we need to mull over a bit.
And how is the elephant in the room behaving (microsoft). If the symlink was created beforehand it cannot extract it? Dangling symlinks are a different issue. IMO no symlink pointing outside of the sdist root should survive.
I was thinking about that, but at that point it's too late isn't it. The main thing is that these sdist are created once and need to be satisfactory for both cases which is where the conundrum comes from. I.e. Fedora packaging, conda, windows users, all use the same sdist. For Fedora packaging it is not really a huge deal because we can switch to git repos. |
|
Thanks for the continued discussion! I'm not entirely sure I follow the Fedora packaging specifics — would love to understand if there's a concrete scenario where My understanding of the tension is:
The Happy to revisit the default or add finer-grained control (e.g. only resolve symlinks pointing outside the sdist root) if you think that's worth doing. |
Sure. The issue is that Fedora packaging consumes the sdist as-is. It does not create it. That's why it is at the mercy of the least common denominator.
Yes indeed that.
I think it's fine in the current form, as mentioned there are ways out for us if needed, and the user can specify if symlinks are important, so no blocker on my side on this.
This is kind of a different issue that should be done regardless. We might already be doing it, but I lost track of the sdist building process. It is definitely important because there are plenty of cases where the full source that should be in sdist is in the parent level that needs to be moved into the sdist somehow. |
|
Thanks for clarifying the Fedora case — now I understand, the packager there is consuming the sdist from PyPI, not creating it, so they have no control over the On the "resolve only external symlinks" point: with So the current behavior with |
|
I'd like two small changes. I can implement them later today unless you want to. All the configuration options get tested in a file, I'd like to add it there too. I'm nearly sure this breaks our reproducible SDist guarantee, so I think it needs to become version dependent, only activating if a user does not set a minimum version or since the minimum version at or above the next release. |
|
Done! Added it to the settings tests and made it version-gated (defaults to |
There was a problem hiding this comment.
Pull request overview
This PR changes scikit-build-core’s SDist builder to dereference symlinks by default so generated sdists are self-contained (closing #801), while preserving backward compatibility for projects that pin minimum-version < 0.13.
Changes:
- Add new
sdist.resolve-symlinkssetting (defaulting totrue, butfalsewhenminimum-version < 0.13). - Enable tarfile dereferencing during SDist creation based on that setting.
- Add tests and documentation covering the new behavior and its backcompat rules.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/scikit_build_core/build/sdist.py |
Passes dereference=... to the tar writer to copy symlink targets into the sdist by default. |
src/scikit_build_core/settings/skbuild_model.py |
Introduces the sdist.resolve_symlinks setting in the settings model. |
src/scikit_build_core/settings/skbuild_read_settings.py |
Applies default/backcompat logic for resolve_symlinks based on minimum-version. |
src/scikit_build_core/resources/scikit-build.schema.json |
Adds the sdist.resolve-symlinks config schema entry. |
tests/test_skbuild_settings.py |
Extends default/backcompat settings tests for the new option. |
tests/test_pep517_sdist_symlink.py |
Adds an integration-style test asserting tar member type changes with dereferencing on/off. |
docs/reference/configs.md |
Documents sdist.resolve-symlinks behavior and versioning. |
README.md |
Adds the new config option to the example configuration snippet. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ) | ||
|
|
||
| settings_reader = SettingsReader.from_file(pyproject_toml, {}) | ||
| assert not settings_reader.settings.sdist.resolve_symlinks |
There was a problem hiding this comment.
assert not ...resolve_symlinks will also pass if the value is None, so this test wouldn’t catch a regression where the setting isn’t resolved. Please assert the explicit boolean value here (e.g., is False) so the backcompat behavior is actually verified.
| assert not settings_reader.settings.sdist.resolve_symlinks | |
| assert settings_reader.settings.sdist.resolve_symlinks is False |
| def can_symlink() -> None: | ||
| """Skip the test if symlinks are not supported on this OS.""" | ||
| try: | ||
| Path("_symlink_check").symlink_to("_symlink_check_target") | ||
| except OSError: | ||
| pytest.skip( | ||
| "Creating symlinks is not supported/allowed on this OS without privileges" | ||
| ) | ||
| else: | ||
| Path("_symlink_check").unlink(missing_ok=True) |
There was a problem hiding this comment.
can_symlink creates and deletes a fixed-name path in the current working directory (_symlink_check). Since CI runs tests with pytest -n auto (xdist), this can race with other workers if they share the same CWD at fixture setup time. Consider creating the check symlink in a unique temp directory (e.g., via tmp_path_factory.mktemp(...)) to avoid cross-test interference.
| def can_symlink() -> None: | |
| """Skip the test if symlinks are not supported on this OS.""" | |
| try: | |
| Path("_symlink_check").symlink_to("_symlink_check_target") | |
| except OSError: | |
| pytest.skip( | |
| "Creating symlinks is not supported/allowed on this OS without privileges" | |
| ) | |
| else: | |
| Path("_symlink_check").unlink(missing_ok=True) | |
| def can_symlink(tmp_path_factory) -> None: | |
| """Skip the test if symlinks are not supported on this OS.""" | |
| check_dir = tmp_path_factory.mktemp("symlink_check") | |
| check_link = check_dir / "_symlink_check" | |
| try: | |
| check_link.symlink_to("_symlink_check_target") | |
| except OSError: | |
| pytest.skip( | |
| "Creating symlinks is not supported/allowed on this OS without privileges" | |
| ) | |
| else: | |
| check_link.unlink(missing_ok=True) |
There was a problem hiding this comment.
I think this also could be cached, vs. being computed each time.
Dereference symlinks by default to ensure sdists are completely self-contained and universally usable out of their source directories.
Closes #801