RST Auto Synchronization #318

Open
iamjoemccormick wants to merge 10 commits into main from iamjoe/feat/rst-auto-sync

@iamjoemccormick
Member

@iamjoemccormick iamjoemccormick commented Apr 17, 2026

What does this PR do / why do we need it?

Required for all PRs.

At a high level, this adds support for integrating Watch into Remote so that file system modification events can trigger Remote jobs. To achieve this, a number of improvements were made to Watch itself, including the addition of common/reusable subscriber and dispatch packages.

Currently only the ability to automatically restore offloaded files is implemented.

Automatically syncing files when they are closed will be a fairly trivial addition once #312 is merged, as that (among other things) adds the ability to set a delay_execution when submitting job requests. The plan is to auto sync only files whose cooldown is > 0, and to set the cooldown as the delay_execution. This avoids adding a separate journal/mechanism to keep track of files while we wait for their cooldown to expire: if the file is reopened, we can just cancel the job. As those changes are fairly self-contained and independent of everything else in this PR, the rest is ready for review.

Related Issue(s)

Required when applicable.

Closes https://github.com/ThinkParQ/bee-remote/issues/18

Where should the reviewer(s) start reviewing this?

Only required for larger PRs when this may not be immediately obvious.

The changes are split into standalone commits which are intended to be reviewed oldest->newest as they build on each other.

Are there any specific topics we should discuss before merging?

Not required.

What are the next steps after this PR?

Not required.

Checklist before merging:

Required for all PRs.

When creating a PR these are items to keep in mind that cannot be checked by GitHub actions:

  • Documentation:
    • Does developer documentation (code comments, readme, etc.) need to be added or updated?
    • Does the user documentation need to be expanded or updated for this change?
  • Testing:
    • Does this functionality require changing or adding new unit tests?
    • Does this functionality require changing or adding new integration tests?
  • Git Hygiene:

For more details refer to the Go coding standards and the pull request process.

@iamjoemccormick
Member Author

@claude review once

Comment thread watch/pkg/dispatch/dispatch.go
Comment thread common/beegfs/entry.go
Comment thread watch/pkg/dispatch/dispatch.go
@iamjoemccormick iamjoemccormick force-pushed the iamjoe/feat/rst-auto-sync branch 6 times, most recently from 5e16b67 to 72360dc Compare April 28, 2026 20:31
@iamjoemccormick iamjoemccormick marked this pull request as ready for review April 28, 2026 20:32
@iamjoemccormick iamjoemccormick requested a review from a team as a code owner April 28, 2026 20:32
@iamjoemccormick iamjoemccormick self-assigned this Apr 28, 2026
@iamjoemccormick iamjoemccormick added common/rst Issues primarily affecting the RST package. rst/remote Issues primarily affecting the Remote service. watch Issues primarily affecting the Watch service. labels Apr 28, 2026
@iamjoemccormick iamjoemccormick force-pushed the iamjoe/feat/rst-auto-sync branch from 72360dc to 43d6ea9 Compare April 28, 2026 20:44
@iamjoemccormick iamjoemccormick requested a review from swartzn April 28, 2026 20:46
While integrating Watch with Remote, a few updates were needed:

- Detect the meta node ID and include it on the gRPC stream context.
- Default to waiting for subscribers to ack the last event received before streaming events.

Bugs in the RST push+stub flow could cause state corruption or local data loss when a job is
cancelled under certain race conditions:

1. Remote's UpdateWork() had no terminal state guard. A late-arriving COMPLETED work result (e.g.
   from Sync journal replay after restart or a gRPC context cancellation race) could trigger
   job.Complete() on an already-cancelled job, overwriting the CANCELLED state and violating the
   user's cancel intent. For multipart uploads this typically results in a FAILED job (the multipart
   was already aborted so finishUpload fails), but the state corruption makes job history confusing
   and difficult to reason about. For non-multipart uploads this could create a stub file after the
   job was cancelled, though data loss should not occur since the contents were already synced to
   the bucket. Fixed by checking job.InTerminalState() before processing. The work result is still
   persisted for inspection, but no completion logic runs.

2. Sync's gRPC server discarded work results when the work manager returned both a result and an
   error. This happens when Remote tries to cancel already-COMPLETED work — the manager returns the
   COMPLETED result alongside an error, but the server returned only the gRPC error. Remote never
   learned the work was COMPLETED and set the state to UNKNOWN. Fixed by returning the work result
   without a gRPC error when the manager provides one, so Remote sees the actual state.

Assisted-by: Claude:claude-opus-4-6
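
For reviewers, the terminal state guard from item 1 can be pictured roughly as follows. This is a simplified sketch and not the code in this PR: the `State` type, its constants, the `Job` fields, and the function signature are assumptions made for illustration. Only the names `UpdateWork`, `InTerminalState`, and `Complete` come from the commit message.

```go
package main

// State and its values are hypothetical; the real job states live elsewhere.
type State int

const (
	Running State = iota
	Completed
	Cancelled
	Failed
)

// Job is a pared-down stand-in for a Remote job.
type Job struct {
	State   State
	Results []State // persisted work results, kept for inspection
}

// InTerminalState reports whether the job has already reached a final state.
func (j *Job) InTerminalState() bool {
	return j.State == Completed || j.State == Cancelled || j.State == Failed
}

// Complete runs the completion logic (here it only flips the state).
func (j *Job) Complete() { j.State = Completed }

// UpdateWork always persists the work result, but skips completion logic when
// the job is already terminal, so a late COMPLETED result cannot overwrite a
// CANCELLED job.
func UpdateWork(j *Job, result State) {
	j.Results = append(j.Results, result)
	if j.InTerminalState() {
		return
	}
	if result == Completed {
		j.Complete()
	}
}
```

A late `Completed` result for a `Cancelled` job is still appended to `Results`, but the job stays `Cancelled`.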

3. updateRstCfg wrapped the sentinel with %s and only the inner error with %w, so errors.Is(err,
   ErrJobAlreadyOffloaded) returned false when updateRstConfig failed. That promoted the error to
   ErrJobFailedPrecondition downstream and tripped the GenerateWorkRequests lock-clear defer,
   leaving a stubbed-but-unlocked file. Wrap both with %w so the sentinel stays in the unwrap chain
   and the defer correctly skips this case.

Assisted-by: Claude:claude-opus-4-7
Assisted-by: Claude:claude-sonnet-4-6
The subscriber service is split from the gRPC server so the service can optionally be reused with an
existing gRPC server.

Wraps the subscriber service, adding the ability to dispatch default or event-specific functions as
events are received. It can be wired to an existing gRPC server or used to set up a new one.

Optional rate limits can be defined for all users to limit which event types are dispatched within a
configurable time window. These limits can be overridden for specific user IDs or ranges of user IDs,
for all or a subset of event types. By default no events for any user are dispatched.
Wire the Watch event dispatcher + subscriber service into Remote, and define a dispatch function.

Squash into: feat(rst): support specifying restore policy with push and pull

Note that these changes are also needed for the later eventFilter commit, so this could also be made a standalone commit.
@iamjoemccormick iamjoemccormick force-pushed the iamjoe/feat/rst-auto-sync branch from 43d6ea9 to 62e1222 Compare April 28, 2026 22:49
@iamjoemccormick iamjoemccormick force-pushed the iamjoe/feat/rst-auto-sync branch from 753f810 to 39dd103 Compare May 11, 2026 18:54
