[Spark] Allow USING INVENTORY table identifier to resolve non-Delta sources #7037
Open
awbarbeau wants to merge 2 commits into
…ources

Currently VACUUM ... USING INVENTORY <table_identifier> requires the inventory source to be a Delta table. The inventory is ultimately consumed as a DataFrame and validated against INVENTORY_SCHEMA in VacuumCommand.getFilesFromInventory, so the Delta-only restriction on the identifier path is unnecessary and inconsistent with the subquery path, which already accepts any analyzable relation.

This change resolves the inventoryTable plan via Dataset.ofRows so the identifier and subquery paths behave the same way. Existing inventory schema validation still rejects malformed sources.

Scope:
- Only changes how the inventory source is resolved.
- Does not change target-table controls; the VACUUM target still goes through Delta-specific safety and protocol checks.

Resolves delta-io#7036

Signed-off-by: Alex Barbeau <30359706+awbarbeau@users.noreply.github.com>
- Update VacuumTableCommand class scaladoc to reflect that the inventory
source is no longer required to be a Delta table.
- Test improvements:
* Rename test "non-delta" -> "non-Delta" for consistent capitalization.
* Rename local val from inventoryTable to inventoryTableName to avoid
shadowing the case class field name.
* Drop hidden-directory rows from the inventory data, since hidden-file
handling is already covered by the adjacent
"vacuum using inventory delta table and should not touch hidden files"
test. The non-Delta test now focuses solely on identifier resolution.
Signed-off-by: Alex Barbeau <30359706+awbarbeau@users.noreply.github.com>
Which Delta project/connector is this regarding?
Description
Resolves #7036.
`VACUUM ... USING INVENTORY <table_identifier>` previously called `getDeltaTable(p, "VACUUM").toDf(sparkSession)` on the resolved identifier plan, which required the inventory source to be a Delta table. The downstream consumer, `VacuumCommand.getFilesFromInventory`, only needs a `DataFrame` matching `INVENTORY_SCHEMA`. The subquery path already accepts any analyzable relation, so the identifier-only Delta restriction is unnecessary and inconsistent.

This PR resolves the analyzed plan via `Dataset.ofRows(...)` so the identifier and subquery paths behave the same way. Schema validation in `getFilesFromInventory` continues to reject malformed sources.

Target-table controls are unchanged; the VACUUM target still goes through Delta-specific safety and protocol checks.
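To make the behavioral difference concrete, here is a toy model in plain Python of the old and new identifier paths. `Relation`, `NotADeltaTableError`, and both `resolve_inventory_*` functions are hypothetical stand-ins, not Spark or Delta classes; the real change is in Delta's Scala code, where the old path went through `getDeltaTable(...).toDf(...)` and the new one goes through `Dataset.ofRows(...)`.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Toy stand-in for an analyzed relation; NOT a Spark or Delta class.
@dataclass(frozen=True)
class Relation:
    name: str
    format: str                      # e.g. "delta" or "parquet"
    rows: Tuple[tuple, ...] = ()     # inventory rows as plain tuples


class NotADeltaTableError(Exception):
    """Hypothetical error raised by the old, Delta-only identifier path."""


def resolve_inventory_before(rel: Relation) -> List[tuple]:
    # Old identifier path (modeled on getDeltaTable(p, "VACUUM").toDf(...)):
    # anything that is not a Delta table is rejected up front.
    if rel.format != "delta":
        raise NotADeltaTableError(f"{rel.name} is not a Delta table")
    return list(rel.rows)


def resolve_inventory_after(rel: Relation) -> List[tuple]:
    # New identifier path (modeled on Dataset.ofRows(...)): any analyzable
    # relation becomes a DataFrame, matching the subquery path. Schema checks
    # happen later, downstream, not here.
    return list(rel.rows)


row = ("/tbl/part-0000.parquet", 1024, False, 1700000000000)
parquet_inventory = Relation("inventory_pq", "parquet", (row,))
print(resolve_inventory_after(parquet_inventory))
```

A Delta inventory resolves identically on both paths; only non-Delta sources change, going from rejected to accepted and handed off to the existing schema validation.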
How was this patch tested?
Added `vacuum using inventory non-Delta table identifier` in `DeltaVacuumSuite`, which covers a Parquet inventory source registered as a managed table. Existing inventory tests provide regression coverage for the unchanged paths.

Does this PR introduce any user-facing changes?
Yes (relaxation only).
`VACUUM ... USING INVENTORY <identifier>` no longer requires the inventory source to be a Delta table. Sources with a wrong schema still fail with `DELTA_INVALID_INVENTORY_SCHEMA`. Non-breaking change.
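With the source format no longer checked, the schema check is what still rejects bad inventories. The plain-Python sketch below models that check. It assumes `INVENTORY_SCHEMA` consists of the four columns `path` (string), `length` (bigint), `isDir` (boolean) and `modificationTime` (bigint), and that column names compare case-insensitively and in order. `validate_inventory_schema` and `InvalidInventorySchemaError` are hypothetical names; this is not the actual `getFilesFromInventory` logic.

```python
from typing import Sequence, Tuple

# Assumed shape of INVENTORY_SCHEMA as (column name, Spark SQL type) pairs.
INVENTORY_SCHEMA: Tuple[Tuple[str, str], ...] = (
    ("path", "string"),
    ("length", "bigint"),
    ("isDir", "boolean"),
    ("modificationTime", "bigint"),
)


class InvalidInventorySchemaError(ValueError):
    """Hypothetical stand-in for the DELTA_INVALID_INVENTORY_SCHEMA error."""


def validate_inventory_schema(schema: Sequence[Tuple[str, str]]) -> None:
    """Raise unless `schema` matches INVENTORY_SCHEMA.

    The source format (Delta, Parquet, ...) is never consulted; only the
    columns matter. Names are compared case-insensitively and in order,
    which is an assumption of this sketch.
    """
    actual = [(name.lower(), dtype.lower()) for name, dtype in schema]
    expected = [(name.lower(), dtype) for name, dtype in INVENTORY_SCHEMA]
    if actual != expected:
        raise InvalidInventorySchemaError(
            f"[DELTA_INVALID_INVENTORY_SCHEMA] expected {expected}, got {actual}"
        )


# A Parquet inventory with the expected columns passes without raising.
validate_inventory_schema([("path", "string"), ("length", "bigint"),
                           ("isDir", "boolean"), ("modificationTime", "bigint")])
```

A real implementation compares Spark `StructType`s; nullability and nested types are deliberately left out of this model.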