[core] Introduce Catalog#loadTableMetadata#7851
Conversation
JingsongLi
left a comment
There was a problem hiding this comment.
Review: [core] Introduce Catalog#loadTableMetadata
The interface promotion (moving loadTableMetadata to public on Catalog) is a sensible refactor. DelegateCatalog delegation and Javadoc are properly done.
Correctness Concerns
1. Error-handling semantics change (potential bug)
Callers rely on createOtherBranchTable returning null when a branch does not exist:
FileStoreTable otherTable = createOtherBranchTable(...);
if (otherTable != null) { ... } else { LOG.error(...); }- Before:
SchemaManager.latest()returnsOptional.empty()for a non-existent branch, method returnsnullgracefully. - After:
catalog.loadTableMetadata(branchIdentifier)throwsTableNotExistException. The broadcatch (Exception e) { throw new RuntimeException(e); }escalates -- turning graceful degradation into hard failure.
Suggested fix: catch TableNotExistException specifically and return Optional.empty().
2. JdbcCatalog branch compatibility
JdbcCatalog.loadTableSchema calls assertMainBranch(identifier) which throws UnsupportedOperationException for non-main branches. After this PR, any fallback branch on JdbcCatalog will fail with wrapped UnsupportedOperationException.
Options:
- Fall back to filesystem-based schema loading when the catalog throws
UnsupportedOperationException - Guard this path to only activate for catalog types that support branch resolution
3. Minor: catalog lifecycle cost
Opening a full catalog via catalogLoader.load() just to load a schema could be more expensive than a direct filesystem read (especially for REST or Hive catalogs). The try-with-resources ensures proper cleanup, but worth considering.
Summary
The interface refactoring part is good. The FileStoreTableFactory change needs the error-handling fix (point 1) and JdbcCatalog compatibility (point 2) addressed to avoid regressions.
Purpose
All catalogs have implemented the
loadTableMetadatamethod. In this PR we move this method into the interface, so we can use it if we only want to get some metadata like schemas from tables and don't want the table object itself.Tests
This is a refactor. Existing tests should cover this change.