
feat: add Nutstore (坚果云) WebDAV data source connector #980

Draft
the-waste-land wants to merge 21 commits into Tencent:main from the-waste-land:feat/nutstore-connector

Conversation

@the-waste-land
Contributor

Summary

Add Nutstore (坚果云) as a new data source connector, enabling users to sync documents from Nutstore via the WebDAV protocol. The design is extensible to other WebDAV services (Nextcloud, etc.).

Backend

  • WebDAV client (internal/datasource/connector/nutstore/client.go): Ping, ListDirectory (BFS recursive), DownloadFile, GetShareURL with rate limiting (10 req/s sustained, burst 50) and 503 retry with exponential backoff
  • Connector implementation (connector.go): FetchAll, FetchIncremental (based on source_updated_at comparison), ListResources with file_types filtering
  • Types & config (types.go): WebDAV XML response parsing, NutstoreConfig with base_url/root_path/file_types
  • DI registration: Registered in container.go
  • Shared improvements: Added source_path/source_updated_at to knowledge metadata, stateless validateCredentials API for edit-mode test connection
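The WebDAV XML response parsing mentioned for types.go can be sketched with Go's standard encoding/xml package. The struct and field names below are illustrative assumptions, not the PR's actual definitions:

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Minimal WebDAV multistatus structures. These names are illustrative;
// the PR's types.go may define them differently.
type multistatus struct {
	XMLName   xml.Name   `xml:"DAV: multistatus"`
	Responses []response `xml:"response"`
}

type response struct {
	Href string `xml:"href"`
	Prop prop   `xml:"propstat>prop"`
}

type prop struct {
	DisplayName  string    `xml:"displayname"`
	LastModified string    `xml:"getlastmodified"`
	IsCollection *struct{} `xml:"resourcetype>collection"` // non-nil means directory
}

func parseMultistatus(body []byte) (multistatus, error) {
	var ms multistatus
	err := xml.Unmarshal(body, &ms)
	return ms, err
}

const sampleXML = `<?xml version="1.0"?>
<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/dav/docs/</D:href>
    <D:propstat><D:prop>
      <D:displayname>docs</D:displayname>
      <D:resourcetype><D:collection/></D:resourcetype>
    </D:prop></D:propstat>
  </D:response>
</D:multistatus>`

func main() {
	ms, err := parseMultistatus([]byte(sampleXML))
	if err != nil {
		panic(err)
	}
	r := ms.Responses[0]
	fmt.Println(r.Href, r.Prop.DisplayName, r.Prop.IsCollection != nil)
}
```

The presence of an empty `<D:collection/>` element inside `resourcetype` is what distinguishes directories from files in a PROPFIND response, hence the pointer field.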

Frontend

  • Nutstore connector configuration UI in DataSourceEditorDialog (credentials + settings fields)
  • Branded icon fallback for nutstore in DataSourceTypeIcon
  • 4-language i18n (zh-CN, en-US, ko-KR, ru-RU)

Key Technical Decisions

  • BFS recursive listing instead of Depth:infinity for compatibility with WebDAV servers that don't support infinite depth
  • Rate limiting (token bucket 10/s, burst 50) to stay within Nutstore's 18,000 req/30min limit
  • 503 retry with exponential backoff (2s→4s→8s) on all WebDAV operations
  • ListResources uses Depth:1 for fast UI loading; full recursive traversal only during sync
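The BFS-over-Depth:1 decision can be sketched against an abstract listing function. The real client issues one PROPFIND per directory; the `lister` type and signatures here are assumptions for illustration:

```go
package main

import "fmt"

// entry is a simplified directory entry; the PR's real type likely carries
// more fields (size, mtime, etc.).
type entry struct {
	Path  string
	IsDir bool
}

// lister abstracts one Depth:1 PROPFIND on a directory.
type lister func(dir string) ([]entry, error)

// listRecursiveBFS walks the tree with one Depth:1 request per directory,
// mirroring the PR's approach for servers that downgrade Depth:infinity.
func listRecursiveBFS(list lister, root string) ([]entry, error) {
	var files []entry
	queue := []string{root}
	for len(queue) > 0 {
		dir := queue[0]
		queue = queue[1:]
		entries, err := list(dir)
		if err != nil {
			return nil, err // errors in subdirectory listing propagate up immediately
		}
		for _, e := range entries {
			if e.IsDir {
				queue = append(queue, e.Path)
			} else {
				files = append(files, e)
			}
		}
	}
	return files, nil
}

func main() {
	// In-memory tree standing in for a WebDAV server.
	tree := map[string][]entry{
		"/":   {{"/a/", true}, {"/f1.txt", false}},
		"/a/": {{"/a/f2.txt", false}},
	}
	files, _ := listRecursiveBFS(func(dir string) ([]entry, error) { return tree[dir], nil }, "/")
	fmt.Println(len(files)) // 2
}
```

Each directory costs exactly one request, which is also what makes the request-rate budgeting below tractable.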

Test plan

  • Unit tests for WebDAV XML parsing and BFS listing (client_test.go)
  • Test connection with valid/invalid Nutstore credentials
  • Verify folder selection and sync with real Nutstore account
  • Verify incremental sync correctly detects changed files
  • Test rate limiting behavior under load
  • Verify frontend i18n in all 4 languages

🤖 Generated with Claude Code

Add nutstore (坚果云) as a configurable data source in the frontend:
- DataSourceEditorDialog: add nutstore connector with credentials (username/password)
  and settings fields (base_url with default, root_path)
- DataSourceTypeIcon: add branded fallback icon for nutstore (blue background)
- i18n: add nutstore translations for zh-CN, en-US, ko-KR, ru-RU including
  connector name, description, field labels, and nutstore-specific guide steps
- Add folder/file resource type labels for directory browsing
1. (High) Fix folder selection sync: ListResources now appends trailing
   slash to directory ExternalIDs so expandResources correctly identifies
   and recursively expands them instead of treating folders as files.

2. (High) Fix edit-mode test connection: both create and edit modes now
   use stateless validateCredentials API instead of persisting config
   before validation. Prevents bad credentials from being saved on
   failed test.

3. (Medium) Fix enterprise Nutstore validation: validateCredentials API
   now accepts optional settings parameter, so custom base_url (e.g.
   enterprise drive.{domain}.com) is used during connection test instead
   of silently falling back to public host.

4. (Medium) Fix URL ingestion dropping source_path: CreateKnowledgeFromURL
   now accepts metadata parameter and extracts source_path and
   source_updated_at, ensuring URL-based items are path-queryable.
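Fix 1 can be illustrated with a hypothetical helper; the function names below are assumptions, not the PR's exact API:

```go
package main

import (
	"fmt"
	"strings"
)

// externalID illustrates fix 1: directories get a trailing slash so the
// frontend's expandResources can tell them apart from files.
func externalID(path string, isDir bool) string {
	if isDir && !strings.HasSuffix(path, "/") {
		return path + "/"
	}
	return path
}

// isDirectoryID is the matching check on the consumer side.
func isDirectoryID(id string) bool { return strings.HasSuffix(id, "/") }

func main() {
	fmt.Println(externalID("/docs/spec", true))                        // /docs/spec/
	fmt.Println(isDirectoryID(externalID("/docs/D27.pdf", false)))     // false
}
```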
… listing

Nutstore WebDAV silently degrades Depth:infinity to Depth:1, causing
sync to only discover files in the first level of each directory.
Replace with manual BFS traversal using Depth:1 per directory.
Errors in subdirectory listing propagate up immediately.

Nutstore enforces 18,000 requests per 30 minutes. Add token bucket rate
limiter at 8 req/s (safety margin) to all WebDAV requests via doRequest.
Also add exponential backoff retry on 503 in BFS recursive listing.

Parseable files are uploaded to CStore and served from there, so
fetching a Nutstore share link is unnecessary. This halves the API
requests for parseable files, reducing rate limit pressure.

Allows faster directory traversal while staying within Nutstore's
18,000 req/30min limit. Token bucket burst of 50 enables short
spikes without triggering rate limiting.
…ces use Depth:1

- DownloadFile: wrap with 503 retry (3 retries, exponential backoff 2s→4s→8s)
- GetShareURL: same 503 retry pattern, explicitly return error on 503
  (previously 503 was silently swallowed as "not supported")
- ListResources: change from ListDirectoryRecursive to ListDirectory Depth:1
  for fast UI loading. Full recursive traversal only happens during sync.
- Add scripts/dev-noair.sh: start backend without air hot-reload to prevent
  long-running sync tasks from being interrupted by recompilation
- Change nutstore root_path placeholder from '/我的文档' to '我的文档'
  (WebDAV path should not start with slash in our connector)
- Use unique bucket name in TestOssEnsureBucket_CreateFails to ensure
  the create-bucket code path is exercised
1. Mixed dir+file resources: use per-resource-type filtering —
   full-directory resources allow all files, single-file resources
   only allow the specific file. Previously allowedFiles was a
   global filter that blocked directory contents.

2. Deduplicate dirs to prevent repeated scanning when multiple
   single files share the same parent directory.

Full-directory resources now allow all files under the selected path,
not just direct children. Extracted isFileAllowed() with regression
tests covering mixed dir+file selection with nested subdirectories.
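The extracted isFileAllowed() could look roughly like the sketch below. The signature is an assumption; the PR's helper may take different parameters:

```go
package main

import (
	"fmt"
	"strings"
)

// isFileAllowed sketches the per-resource-type filtering described above:
// directory selections (IDs ending in "/") admit everything under their path,
// nested subdirectories included, while file selections admit only that
// exact file.
func isFileAllowed(filePath string, selected []string) bool {
	for _, id := range selected {
		if strings.HasSuffix(id, "/") {
			if strings.HasPrefix(filePath, id) {
				return true // any file under a selected directory
			}
		} else if filePath == id {
			return true // single-file selection: exact match only
		}
	}
	return false
}

func main() {
	sel := []string{"/docs/", "/misc/readme.txt"}
	fmt.Println(isFileAllowed("/docs/sub/a.pdf", sel)) // true
	fmt.Println(isFileAllowed("/misc/other.txt", sel)) // false
}
```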
Set ParentID to empty string so frontend identifies root nodes correctly.
Remove misleading expand arrow since lazy-loading is not implemented.

Add missing resourceType.folder and resourceType.file translations for
all four locales (en-US, zh-CN, ko-KR, ru-RU).
Comment on lines +166 to +184
func (c *Connector) FetchIncremental(ctx context.Context, config *types.DataSourceConfig, cursor *types.SyncCursor) ([]types.FetchedItem, *types.SyncCursor, error) {
    // For now, FetchIncremental delegates to FetchAll and lets the service layer
    // handle dedup via external_id matching (delete + re-create pattern).
    // This is the same pattern used by the Feishu connector where the cursor
    // tracks edit times in-memory. A more efficient DB-based comparison
    // can be added later without changing the Connector interface.

    items, err := c.FetchAll(ctx, config, config.ResourceIDs)
    if err != nil {
        return nil, nil, err
    }

    nextCursor := &types.SyncCursor{
        LastSyncTime: time.Now(),
    }

    return items, nextCursor, nil
}

Collaborator

FetchIncremental should implement proper incremental fetching logic.

config := &types.DataSourceConfig{
    Type:        connectorType,
    Credentials: credentials,
    Settings:    settings,
}
Collaborator

The base_url in settings must be checked by the SSRF utils.
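A minimal check of the kind the reviewer asks for is sketched below: require https and reject literal loopback/private/link-local addresses. This is not the project's actual SSRF utility; a production check must also resolve hostnames, re-validate every resolved address, and guard against DNS rebinding, all of which this sketch omits:

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// validateBaseURL is a hypothetical, minimal SSRF-style guard for a
// user-supplied WebDAV base_url.
func validateBaseURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "https" {
		return errors.New("base_url must use https")
	}
	// Reject literal IPs in disallowed ranges; hostnames would need
	// resolution plus per-address re-validation in a real implementation.
	if ip := net.ParseIP(u.Hostname()); ip != nil {
		if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() || ip.IsUnspecified() {
			return errors.New("base_url points to a disallowed address")
		}
	}
	return nil
}

func main() {
	fmt.Println(validateBaseURL("https://dav.jianguoyun.com/dav/") == nil) // true
	fmt.Println(validateBaseURL("https://127.0.0.1/dav/") != nil)          // true
}
```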

Comment on lines +116 to +123
// SourcePath records the document's full path in the source system.
// Unified Unix-style path starting with "/".
// Examples: "/product-docs/spec/D27.pdf" (Nutstore), "" (manual upload)
SourcePath string `json:"source_path" gorm:"type:varchar(1000);index"`
// SourceUpdatedAt records the document's last modification time in the source system.
// Filled by Connector from external system's mtime/editTime.
// Semantically different from UpdatedAt (WeKnora internal update time).
SourceUpdatedAt *time.Time `json:"source_updated_at" gorm:"index"`
Collaborator

If we indeed need these two fields, we should add a migration script in the migrations directory.

@the-waste-land the-waste-land marked this pull request as draft April 21, 2026 11:40