-
Notifications
You must be signed in to change notification settings - Fork 372
ADLS Gen 2 #2635
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
Skrypt
wants to merge
29
commits into
Azure:main
Choose a base branch
from
Skrypt:jsavard/adls-gen2
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
ADLS Gen 2 #2635
Changes from 24 commits
Commits
Show all changes
29 commits
Select commit
Hold shift + click to select a range
7dfb9e8
ADSL Gen 2
Skrypt 82fc433
Add HNS toggle, atomic rename, conditional headers, and lease ops to …
Skrypt bebd6c0
Add DFS Swagger spec and generated interface layer for ADLS Gen2
Skrypt 956c39c
Add HNS parent-child hierarchy table for directory relationship tracking
Skrypt fb22f44
Add @azure/storage-file-datalake SDK integration tests for DFS endpoint
Skrypt 94ef926
Add Phase III OAuth ACL enforcement for DFS endpoint
Skrypt fb2c8dc
Fix recursive directory deletion to remove all descendant blobs
Skrypt 07b32a8
Return 409 PathAlreadyExists when creating an existing directory via DFS
Skrypt 23e9269
Fix type confusion through parameter tampering in DFS PathHandler
Skrypt fb455e0
Fix GetAccountInfo method
Skrypt 2b4cb30
feat(hns): Per-container HNS (Gen2) support, GetAccountInfo returns c…
Skrypt 493d244
refactor: DFS pipeline unified on Blob port, cleanup legacy DFS serve…
Skrypt d2f12e4
fix(dfs): resolve three ADLS Gen2 issues reported by Izeren
Skrypt 73d20d3
fix(dfs): reject DFS operations on non-HNS containers with Hierarchic…
Skrypt 38f4556
fix(dfs): make DFS path rename truly atomic across blobs and HNS hier…
Skrypt 94c90bf
fix(dfs): address Copilot PR review — remove legacy dfsHost/dfsPort, …
Skrypt cba5313
fix(dfs): BlobConfiguration default false, fix test routing and SDK r…
Skrypt 92d6a73
fix(dfs): address remaining Copilot review comments
Skrypt e587892
fix(dfs): address latest Copilot review — REPLACE safety, HNS fallbac…
Skrypt 77c9c17
test(dfs): add missing Gen2 coverage + fix two bugs discovered by new…
Skrypt 16d935c
fix(dfs): ContainerHandler HNS default, dialect-safe SQL rename, Blob…
Skrypt 52db20c
fix(dfs): address Copilot review — ACL, body parser, HNS header, dele…
Skrypt c2f6204
fix(dfs): address internal code review — 6 critical, 8 major, 8 minor…
Skrypt 86c3eba
test(dfs): add coverage for all review-identified test gaps
Skrypt 4844856
docs: add pass-2 code review findings to ADLS-gen2-review.md
Skrypt aefab39
fix(dfs): address pass-2 review — HNS metadata safety, listPaths, err…
Skrypt b45b0d2
docs: add pass-3 code review findings to ADLS-gen2-review.md
Skrypt 29abf72
fix: correct 'telemtry' typo in --disableTelemetry CLI help text
Skrypt d83a0e4
fix(dfs): address pass-3 review — correctness, resource management, d…
Skrypt File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,190 @@ | ||
| # ADLS Gen2 Parity Implementation Plan | ||
|
|
||
| ## Context | ||
|
|
||
| Azurite previously had a **thin DFS proxy layer** on a dedicated port (10004) that translated a small subset of ADLS Gen2 DFS REST API calls to Blob REST API calls via HTTP proxying (axios). This covered only filesystem (container) create/delete/HEAD and account listing. Full ADLS Gen2 parity requires native support for path (file/directory) operations, the append-then-flush write pattern, rename/move, ACLs, and list paths — none of which can be achieved by simple query-parameter rewriting. | ||
|
|
||
| ## Architectural Decision: Hybrid (Native DFS Handlers + Shared Port) | ||
|
|
||
| Replace the HTTP proxy with a **native Express pipeline** mounted inside `BlobRequestListenerFactory` that directly accesses `IBlobMetadataStore` and `IExtentStore` — the same store instances used by the blob handlers. DFS and Blob share a single listener on port 10000; routing is done by URL prefix inside the existing server. | ||
|
|
||
| ``` | ||
| Port 10000 | ||
| ├─ /devstoreaccount1/<container>?resource=filesystem → DFS Handlers → IBlobMetadataStore + IExtentStore | ||
| ├─ /devstoreaccount1/<container>/<path> → DFS Handlers → same stores | ||
| └─ everything else → Blob Handlers → same stores | ||
| ``` | ||
|
|
||
| There is no separate DFS server or dedicated DFS port. `--dfsHost` / `--dfsPort` CLI flags and the `azurite.dfsHost` / `azurite.dfsPort` VS Code settings have been removed. | ||
|
|
||
| **Why not keep proxying?** DFS operations like List Paths, Create Directory, Rename, ACLs, and append-then-flush have no single blob API equivalent. Proxying would require multi-call orchestration, lose atomicity, and add latency. | ||
|
|
||
| **Why shared port instead of separate listener?** The DFS and Blob APIs share the same account/container/blob namespace. A separate listener would require passing live store references across server boundaries and duplicating TLS/auth/logging configuration. Mounting DFS routing inside the existing server is simpler and keeps all requests to a single endpoint — matching how Azure itself exposes both APIs on `*.blob.core.windows.net` / `*.dfs.core.windows.net` (separate hostnames but the same backing infrastructure). | ||
|
|
||
| ### Directory Model | ||
|
|
||
| Directories stored as **zero-length BlockBlobs with `hdi_isfolder=true` metadata** — matching Azure's real internal behavior. No separate table needed. | ||
|
|
||
| ### ACL Storage | ||
|
|
||
| New fields on `BlobModel`: `dfsAclOwner`, `dfsAclGroup`, `dfsAclPermissions`, `dfsAcl`. LokiJS is schemaless (just add fields); SQL needs ALTER TABLE. | ||
|
|
||
| --- | ||
|
|
||
| ## Phase 0: Foundation — Shared Store Access & HNS Flag | ||
|
|
||
| **Goal:** Wire DFS server to share stores with blob server; enable HNS mode. | ||
|
|
||
| | File | Change | | ||
| |------|--------| | ||
| | `src/blob/utils/constants.ts` | Set `EMULATOR_ACCOUNT_ISHIERARCHICALNAMESPACEENABLED = true` (or make configurable) | | ||
| | `src/blob/BlobServer.ts` | Expose `metadataStore`, `extentStore`, and `accountDataStore` via public getters | | ||
| | `src/blob/BlobRequestListenerFactory.ts` | Mount `DfsRequestListenerFactory` as a sub-router on DFS URL patterns | | ||
| | `src/blob/DfsRequestListenerFactory.ts` | Rewrite: replace axios proxy with native Express pipeline + DFS routing | | ||
| | `src/blob/IBlobEnvironment.ts`, `BlobEnvironment.ts`, `src/common/Environment.ts`, `VSCEnvironment.ts` | Add `--enableHierarchicalNamespace` option; remove `--dfsHost`/`--dfsPort` | | ||
|
|
||
| **Deliverable:** DFS requests are served on the blob port; existing filesystem tests pass via direct store access. No separate DFS listener or port. | ||
|
|
||
| --- | ||
|
|
||
| ## Phase 1: Path CRUD + List Paths | ||
|
|
||
| **Goal:** Create/delete/read files and directories, list paths — the core operations most ADLS Gen2 SDKs depend on. | ||
|
|
||
| ### New files to create | ||
|
|
||
| | File | Purpose | | ||
| |------|---------| | ||
| | `src/blob/dfs/DfsContext.ts` | DFS request context (account, filesystem, path) — analogous to `BlobStorageContext` | | ||
| | `src/blob/dfs/DfsOperation.ts` | Enum of DFS operations for dispatch | | ||
| | `src/blob/dfs/DfsDispatchMiddleware.ts` | Routes requests by `resource` param, `action` param, method, and headers | | ||
| | `src/blob/dfs/DfsErrorFactory.ts` | JSON error responses (`PathNotFound`, `DirectoryNotEmpty`, etc.) | | ||
| | `src/blob/dfs/DfsSerializer.ts` | JSON response serialization (DFS uses JSON, not XML) | | ||
| | `src/blob/dfs/handlers/FilesystemHandler.ts` | Filesystem ops → container store operations | | ||
| | `src/blob/dfs/handlers/PathHandler.ts` | Path create/delete/read/getProperties + listPaths | | ||
|
|
||
| ### Operations implemented | ||
|
|
||
| - **Create Path** (`PUT ?resource=file|directory`): Creates zero-length BlockBlob; directories get `hdi_isfolder=true` metadata; auto-creates intermediate directories | ||
| - **Delete Path** (`DELETE`): Files → `deleteBlob()`; directories with `recursive=true` → delete all blobs with prefix; `recursive=false` → 409 if non-empty | ||
| - **Get Path Properties** (`HEAD`): Returns `x-ms-resource-type: file|directory` header | ||
| - **Read Path** (`GET`): Streams file content via `downloadBlob()` (follows `BlobHandler.download()` pattern) | ||
| - **List Paths** (`GET ?resource=filesystem&directory=...&recursive=true|false`): JSON response with `paths` array; uses `listBlobs()` with prefix/delimiter; supports continuation via `x-ms-continuation` | ||
|
|
||
| ### Existing files modified | ||
|
|
||
| | File | Change | | ||
| |------|--------| | ||
| | `src/blob/persistence/IBlobMetadataStore.ts` | Add `dfsResourceType`, ACL fields to `BlobModel` / `IBlobAdditionalProperties` | | ||
| | `src/blob/persistence/LokiBlobMetadataStore.ts` | No schema changes needed (schemaless) | | ||
| | `src/blob/persistence/SqlBlobMetadataStore.ts` | Add columns: `dfsResourceType`, `dfsAclOwner`, `dfsAclGroup`, `dfsAclPermissions`, `dfsAcl` | | ||
|
|
||
| ### Tests | ||
|
|
||
| Extend `tests/blob/dfsProxy.test.ts`: | ||
| - Create file / directory, verify as blob | ||
| - Delete file / empty dir / non-empty dir with recursive | ||
| - Get properties with `x-ms-resource-type` | ||
| - Read file content | ||
| - List paths recursive and non-recursive | ||
| - Cross-API: create via DFS → read via Blob API and vice versa | ||
|
|
||
| --- | ||
|
|
||
| ## Phase 2: Append-Flush Write Pattern | ||
|
|
||
| **Goal:** Implement the DFS file write model (create empty → append chunks → flush to commit). | ||
|
|
||
| ### Key insight | ||
|
|
||
| DFS append-then-flush maps directly to existing **BlockBlob uncommitted blocks** infrastructure: each `action=append` becomes a `stageBlock()`, and `action=flush` becomes `commitBlockList()`. No new persistence methods needed. | ||
|
|
||
| ### Changes to `src/blob/dfs/handlers/PathHandler.ts` | ||
|
|
||
| - **`updatePath_Append(position, body)`**: Write body to `IExtentStore` as extent chunk; record as uncommitted block via `metadataStore.stageBlock()`; validate `position` matches current append offset; return 202 | ||
| - **`updatePath_Flush(position, close)`**: Commit all staged blocks via `metadataStore.commitBlockList()`; update content length to `position`; return 200 with updated ETag | ||
|
|
||
| ### Tests | ||
|
|
||
| - Create → append 3 chunks → flush → read back, verify content | ||
| - Append with wrong position → 400 | ||
| - Large file (multi-MB) append | ||
|
|
||
| --- | ||
|
|
||
| ## Phase 3: Rename/Move Path | ||
|
|
||
| **Goal:** Atomic rename for files and directories. | ||
|
|
||
| ### New persistence methods | ||
|
|
||
| | Method | Description | | ||
| |--------|-------------| | ||
| | `IBlobMetadataStore.renameBlob(src, dest)` | Atomic rename of single blob (metadata-only, no extent copy) | | ||
| | `IBlobMetadataStore.renameBlobsByPrefix(srcPrefix, destPrefix)` | Atomic rename of all blobs matching prefix (for directory rename) | | ||
|
|
||
| ### PathHandler addition | ||
|
|
||
| - **`renamePath(x-ms-rename-source)`**: Parse source header → for files: `renameBlob()`; for directories: `renameBlobsByPrefix()`. Supports cross-filesystem rename and conditional headers. | ||
|
|
||
| ### Persistence implementations | ||
|
|
||
| - **LokiJS**: Update document `containerName` and `name` properties | ||
| - **SQL**: `UPDATE ... SET name = REPLACE(name, oldPrefix, newPrefix) WHERE name LIKE 'prefix%'` in transaction | ||
|
|
||
| ### Tests | ||
|
|
||
| - Rename file within filesystem / across filesystems | ||
| - Rename directory (verify children moved) | ||
| - Rename non-existent → 404 | ||
| - Rename with conditional headers | ||
|
|
||
| --- | ||
|
|
||
| ## Phase 4: ACL Operations | ||
|
|
||
| **Goal:** POSIX ACL get/set for emulator parity. | ||
|
|
||
| ### PathHandler additions | ||
|
|
||
| - **`getAccessControl()`**: Read ACL fields from blob record → return as `x-ms-owner`, `x-ms-group`, `x-ms-permissions`, `x-ms-acl` headers. Defaults: `$superuser`/`$superuser`/`rwxr-x---` | ||
| - **`setAccessControl(owner, group, permissions, acl)`**: Validate ACL format → update blob record | ||
| - **`setAccessControlRecursive(mode, acl)`**: `mode` = set|modify|remove; iterate blobs under prefix; support continuation; return JSON with `directoriesSuccessful`, `filesSuccessful`, `failureCount` | ||
|
|
||
| ### Tests | ||
|
|
||
| - Set/get ACL on file and directory | ||
| - Recursive ACL set on directory tree | ||
| - Default ACL values on new paths | ||
|
|
||
| --- | ||
|
|
||
| ## Phase 5: Polish & Remaining Operations | ||
|
|
||
| - **Set Filesystem Properties** (`PATCH ?resource=filesystem`) → `setContainerMetadata()` | ||
| - **`x-ms-properties` encoding/decoding** — new `src/blob/dfs/DfsPropertyEncoding.ts` utility (base64 key=value pairs) | ||
| - **DFS JSON error format**: `{"error":{"code":"...","message":"..."}}` | ||
| - **Lease support** on DFS paths (reuse blob lease infrastructure) | ||
| - **SAS validation** on DFS endpoints (reuse existing authenticators) | ||
| - **Content-MD5/CRC64 validation** on append | ||
|
|
||
| --- | ||
|
|
||
| ## Verification Plan | ||
|
|
||
| 1. **Unit tests**: Extend `tests/blob/dfsProxy.test.ts` per phase | ||
| 2. **Cross-API tests**: Verify DFS-created data is visible via Blob API and vice versa | ||
| 3. **SDK integration**: Test with `@azure/storage-file-datalake` Node.js SDK against the emulator | ||
| 4. **Manual smoke test**: Run Azurite, use Azure Storage Explorer with DFS endpoint | ||
| 5. **Existing blob tests**: Ensure `npm test` still passes (no regression) | ||
|
|
||
| --- | ||
|
|
||
| ## Critical Reference Files | ||
|
|
||
| - `src/blob/handlers/ContainerHandler.ts` — pattern for handler ↔ store interaction | ||
| - `src/blob/handlers/BlockBlobHandler.ts` — `stageBlock`/`commitBlockList` for append-flush reuse | ||
| - `src/blob/handlers/BlobHandler.ts` — `download()` pattern for Read Path | ||
| - `src/blob/persistence/IBlobMetadataStore.ts` — store interface to extend | ||
| - `src/blob/generated/handlers/` — handler interface patterns | ||
| - `src/blob/middlewares/blobStorageContext.middleware.ts` — context extraction pattern for DfsContext | ||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.