Add Azure Blob Storage Gen2 (ADLS HNS) support to BlobFileStore#19014
Conversation
Make BlobFileStore adaptive: it now auto-detects whether the storage account has Hierarchical Namespace (HNS) enabled and uses native DataLake APIs for atomic moves, real directories, and efficient listing when available. It falls back to standard flat-namespace blob operations otherwise. HNS detection can be overridden via configuration.

- Add IFileStoreCapabilities interface and FileStoreCapabilities impl
- Add GetFilesAsync, GetDirectoriesAsync, Capabilities to IFileStore
- Add UseHierarchicalNamespace option to BlobStorageOptions
- Add Azure.Storage.Files.DataLake package dependency
- Add MediaCreatedFileAsync and MediaCopiedFileAsync event hooks
- Wire up EnsureCapabilitiesAsync at startup in Media.Azure module
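The detect-then-override flow this commit describes could be sketched roughly as below. This is a hypothetical sketch, not the PR's actual code: `GetAccountInfoAsync` and `AccountInfo.IsHierarchicalNamespaceEnabled` are real `Azure.Storage.Blobs` members, but the `HnsDetector` class and method shape are assumptions.

```csharp
using System.Threading.Tasks;
using Azure;
using Azure.Storage.Blobs;

public static class HnsDetector // hypothetical helper, not from the PR
{
    // Returns true when the account reports Hierarchical Namespace (Gen2),
    // false for flat namespace (Gen1), or the configured override when the
    // account info call is not permitted (e.g. a locked-down account).
    public static async Task<bool> DetectAsync(BlobServiceClient client, bool? configuredOverride)
    {
        try
        {
            var accountInfo = await client.GetAccountInfoAsync();
            return accountInfo.Value.IsHierarchicalNamespaceEnabled;
        }
        catch (RequestFailedException) when (configuredOverride.HasValue)
        {
            // Auto-detection failed; fall back to the configured value.
            return configuredOverride.Value;
        }
    }
}
```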
Add integration tests that run against Azurite to verify BlobFileStore behavior in both flat-namespace (Gen1) and hierarchical-namespace (Gen2) modes. Tests cover capabilities detection, file CRUD, directory operations, move/copy, and directory content listing. Gen2 tests that use DataLake APIs will fail until Azurite supports the DFS endpoint (--dfsPort flag); this is intentional, to track when a newer Azurite version enables full Gen2 testing.

- Add abstract BlobFileStoreTestsBase with 26 test methods
- Add BlobFileStoreGen1Tests and BlobFileStoreGen2Tests subclasses
- Add AzuriteFactAttribute to skip tests when Azurite is unavailable
- Add OrchardCore.FileStorage.AzureBlob reference to test project
- Start Azurite in main_ci and pr_ci workflows on ubuntu runners
Add DfsEndpoint option to BlobStorageOptions for local emulators where the DFS endpoint runs on a separate port. Parse storage credentials from the connection string to construct the DataLakeServiceClient when an explicit DFS endpoint is configured. Fix Gen2 code paths to handle 404 RequestFailedException from DataLakePathClient.ExistsAsync, which can throw instead of returning false when the path does not exist.
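The 404 handling mentioned above might look like the following sketch. `DataLakePathClient.ExistsAsync` and `RequestFailedException.Status` are real `Azure.Storage.Files.DataLake` / `Azure` members; the extension method name is an assumption for illustration.

```csharp
using System.Threading.Tasks;
using Azure;
using Azure.Storage.Files.DataLake;

public static class DataLakePathExtensions // hypothetical helper
{
    public static async Task<bool> SafeExistsAsync(this DataLakePathClient path)
    {
        try
        {
            return (await path.ExistsAsync()).Value;
        }
        catch (RequestFailedException ex) when (ex.Status == 404)
        {
            // Some endpoints surface a missing path as an error
            // instead of returning false; treat it the same way.
            return false;
        }
    }
}
```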
Reverted both workflows to docker run steps (instead of services), since that lets us pass --skipApiVersionCheck directly. The command now uses your image with the correct flags:
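An illustrative shape of such a step; the image name below is a placeholder for the author's Azurite fork, while --skipApiVersionCheck and --blobHost are real Azurite flags:

```
# <azurite-fork-image> is a placeholder; the PR uses the author's fork image.
docker run -d --name azurite -p 10000:10000 \
  <azurite-fork-image> \
  azurite --blobHost 0.0.0.0 --skipApiVersionCheck
```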
I'll try to review this over the weekend.
…dia startup

Move BlobFileStore.EnsureCapabilitiesAsync() from the DI singleton factory (where it blocked via GetAwaiter().GetResult()) into MediaBlobContainerTenantEvents.ActivatingAsync(), which already runs asynchronously at tenant startup. BlobFileStore is now registered as its own singleton so it can be injected.

Also remove the unused StorageProvider member from IFileStoreCapabilities and FileStoreCapabilities, and remove the default implementation of IFileStore.Capabilities so every provider must declare capabilities explicitly. FileSystemStore and AwsFileStore updated accordingly.
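A minimal sketch of the relocated initialization, assuming OrchardCore's ModularTenantEvents base class and the member names this commit mentions (the actual class has more dependencies):

```csharp
using System.Threading.Tasks;
using OrchardCore.Modules;

public class MediaBlobContainerTenantEvents : ModularTenantEvents
{
    private readonly BlobFileStore _fileStore;

    public MediaBlobContainerTenantEvents(BlobFileStore fileStore)
        => _fileStore = fileStore;

    public override async Task ActivatingAsync()
    {
        // Runs asynchronously at tenant startup, replacing the blocking
        // GetAwaiter().GetResult() call in the DI singleton factory.
        await _fileStore.EnsureCapabilitiesAsync();
    }
}
```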
```csharp
private readonly ILogger<FileSystemStore> _logger;
private readonly string _fileSystemPath;

public IFileStoreCapabilities Capabilities { get; } = new FileStoreCapabilities(hasHierarchicalNamespace: false, supportsAtomicMove: false);
```
I'm changing this configuration to report that the local FileStorage uses HierarchicalNamespace and supports atomic move. It technically should not break any custom implementations.
…lesystem behavior

FileSystemStore was configured with hasHierarchicalNamespace: false and supportsAtomicMove: false, which does not match its actual implementation. The local filesystem uses real Directory.* APIs for first-class directory operations and File.Move(), which maps to an atomic OS rename syscall. Updated both flags to true to align with the implementation and be consistent with how BlobFileStore Gen2 reports the same native behaviors.
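A small illustration of the behavior the commit relies on, using standard System.IO calls; on the same volume, File.Move is a rename rather than a copy:

```csharp
using System.IO;

Directory.CreateDirectory("media/a");          // real directory, first-class API
File.WriteAllText("media/a/x.txt", "hello");
// Maps to rename(2) on Linux / MoveFileEx on Windows: atomic on the same volume.
File.Move("media/a/x.txt", "media/a/y.txt");
```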
…tification
Add a StorageName default interface member to IFileStore that returns a
human-readable name for the underlying storage backend. Implementations:
FileSystemStore ("Local"), BlobFileStore ("Azure Blob (Gen1)"/"Azure Blob (Gen2)"
based on detected HNS), AwsFileStore ("Amazon S3"). DefaultMediaFileStore
delegates to the inner store. This enables the media UI to display which storage
provider is active at runtime, including distinguishing Azure Gen1 from Gen2.
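The default interface member could look roughly like the sketch below (C# 8 default interface implementation). Only the members named above are shown, and the "Unknown" default value is an assumption for illustration:

```csharp
public interface IFileStore
{
    // Human-readable name of the underlying storage backend;
    // providers override this with their own label.
    string StorageName => "Unknown"; // assumed default value
}

public class FileSystemStore : IFileStore
{
    public string StorageName => "Local";
}
```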
…tegration project

Extract BlobFileStore Gen1/Gen2 tests from OrchardCore.Tests into a new OrchardCore.Tests.Integration project. This separates integration tests that require external services (Azurite) from unit tests, avoiding unnecessary Docker container startup and extra execution time on every PR commit.

The new project is built and run by a dedicated integration_tests.yml workflow that only triggers on review submission, push to main/release, or manual dispatch, matching the same pattern as functional_all_db.yml. Removed Azurite setup from pr_ci.yml and main_ci.yml.
This pull request has merge conflicts. Please resolve those before requesting a review.
@Piedone Same here, moved everything to a new project and also a new GH workflow.
This pull request has merge conflicts. Please resolve those before requesting a review.
# Conflicts:
#	.github/workflows/integration_tests.yml
#	test/OrchardCore.Tests.Integration/OrchardCore.Tests.Integration.csproj
As we discussed in the latest steering meeting, to utilize TestContainers, this is closed in favor of #19162.
```csharp
/// Default implementation filters <see cref="GetDirectoryContentAsync"/>. Implementations
/// may override this to avoid enumerating directories for better performance.
/// </remarks>
async IAsyncEnumerable<IFileStoreEntry> GetFilesAsync(string path = null)
```
This is supposed to be optimized in the implementations. For instance, if a storage contains 1M files and 1 folder, we can now get the folder without listing the files.
However, this is not implemented in the concrete implementations.
I added the implementation details that are different in a table on the PR description. It will be easier for you to review that way.
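For reference, the default implementation under discussion presumably filters the combined listing client-side, which is exactly what an optimized override would avoid. A sketch, assuming OrchardCore's IFileStoreEntry.IsDirectory and the GetDirectoryContentAsync signature:

```csharp
using System.Collections.Generic;

public interface IFileStore
{
    IAsyncEnumerable<IFileStoreEntry> GetDirectoryContentAsync(
        string path = null, bool includeSubDirectories = false);

    // Default: enumerate everything, keep only files. A provider backed by
    // a server-side filtered listing should override this.
    async IAsyncEnumerable<IFileStoreEntry> GetFilesAsync(string path = null)
    {
        await foreach (var entry in GetDirectoryContentAsync(path))
        {
            if (!entry.IsDirectory)
            {
                yield return entry;
            }
        }
    }
}
```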
```csharp
/// Without this override there is no way to use HNS-dependent features (atomic moves, real
/// directory operations) in either of those scenarios.
/// </remarks>
public bool? UseHierarchicalNamespace { get; set; }
```
This may not be necessary. The BlobStore already checks the API availability and falls back to flat namespace. It also uses this flag, but the user shouldn't care.
It's still needed. The logic is that GetAccountInfo may fail to return the storage type (Gen1 or Gen2). This option is then a fallback: the value set manually in the BlobStorageOptions appsettings file is used instead.
The issue would be if the BlobStore failed to retrieve the Gen type from GetAccountInfo because of account security. Then it would fall back to Gen1 for a Gen2 account, which would be really bad. Either we keep this fallback option here or we remove the automatic discovery of the Gen type.
@copilot can you take a look at @sebastienros's review comments and check why we don't have anything specific about IFileStoreCapabilities, since we don't seem to make use of it in this PR.
Not necessary for this PR.
The Azurite fork now serves DFS requests on the same port as blob, matching how Azure Blob Storage works in production (single endpoint, different hostnames). DataLakeServiceClient is created directly from the connection string, eliminating the need for a separate DfsEndpoint.

- Remove DfsEndpoint from BlobStorageOptions and BlobFileStore
- Remove ParseCredentialsFromConnectionString (no longer needed)
- Update integration tests: auto-detect HNS via GetAccountInfo for Gen2, use UseHierarchicalNamespaceOverride=false only for Gen1
- Update CI: single port per Azurite container, no separate DFS port

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…string

Gen1 tests use UseHierarchicalNamespaceOverride=false against an HNS-enabled Azurite, which triggers the expected warning and forces flat-namespace code paths. Gen2 tests rely on auto-detection. No separate port or connection string needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Gen2 tests now create their containers using DataLakeFileSystemClient, which sends x-ms-namespace-enabled: true and stamps the container as an HNS filesystem, matching how Gen2 filesystems are created in production Azure. Gen1 tests continue using BlobContainerClient.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
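A sketch of the two setup paths described above; the connection string is the standard Azurite development shortcut and the container names are placeholders:

```csharp
using Azure.Storage.Blobs;
using Azure.Storage.Files.DataLake;

var connectionString = "UseDevelopmentStorage=true"; // Azurite shortcut

// Gen2: creating through the DFS pipeline sends x-ms-namespace-enabled: true,
// stamping the container as an HNS filesystem.
var fileSystem = new DataLakeFileSystemClient(connectionString, "media-gen2");
await fileSystem.CreateIfNotExistsAsync();

// Gen1: a plain blob container, no namespace flag.
var container = new BlobContainerClient(connectionString, "media-gen1");
await container.CreateIfNotExistsAsync();
```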
Gen1: TryCreateDirectoryAsync writes a marker blob (OrchardCore.Media.txt)
to simulate directories in flat namespace storage.
Gen2: TryCreateDirectoryAsync uses the DataLake API to create a real
directory object — no marker blob is written.
These tests assert the raw blob listing directly, proving Azurite is
routing to the correct pipeline for each storage generation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
I updated the Azurite ADLS pull request and the related docker image to clean up this PR.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sebastienros I'm done with the changes. Mostly I needed to fix the Azurite emulator to simplify the code here.

The remaining HierarchicalNamespace fields are there as a fallback for when automatic discovery fails. This can happen when using a connection string scoped to a specific container on a protected storage account: the GetAccountInfo() call to Azure Blob Storage / Azurite could return a 403, since the user account would not be allowed to call GetAccountInfo() at the root of the storage. Keeping the fields was suggested by AI for that reason. Maybe it is completely wrong, but we would need to test against a real Azure Blob Storage to see how it behaves in that use case.

I don't like needing an override from appsettings like this, but it's a fallback for when the auto-discovery of the Gen type fails. I may investigate further when I get time.
Make BlobFileStore adaptive: it now auto-detects whether the storage account has Hierarchical Namespace (HNS) enabled and uses native DataLake APIs for atomic moves, real directories, and efficient listing when available. Falls back to standard flat-namespace blob operations otherwise. HNS detection can be overridden via configuration.
All operations go through unified API endpoints that adapt server-side; the Vue app calls the same endpoints regardless of storage backend:
| Operation | Gen2 (HNS) | Gen1 (flat) |
| --- | --- | --- |
| GetDirectoryInfoAsync | DataLakeFileSystemClient.GetDirectoryClient + ExistsAsync | GetDirectoryContentFlatAsync / hdi_isfolder metadata |
| TryCreateDirectoryAsync | DataLakeDirectoryClient.CreateIfNotExistsAsync | |
| TryDeleteDirectoryAsync | DataLakeDirectoryClient.DeleteAsync(recursive: true) | |
| MoveFileAsync | DataLakeFileClient.RenameAsync (atomic) | CopyFileAsync + TryDeleteFileAsync |
| TODO | | |