Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion app/forms/concerns/hyrax/redirects_field_behavior.rb
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@ def deserialize!(params)
# Builds plain hashes (the persisted shape) from the submitted
# `redirects_attributes` payload. Drops rows marked for destruction
# or with a blank path. Normalizes paths to the canonical form
# stored in the uniqueness ledger.
# stored in the redirects table.
def redirects_attributes_populator(fragment:, **_options)
return unless respond_to?(:redirects)
return unless Hyrax.config.redirects_active?
Expand Down
2 changes: 1 addition & 1 deletion app/models/hyrax/redirect_path.rb
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# frozen_string_literal: true

module Hyrax
# ActiveRecord-backed uniqueness ledger for redirect paths.
# ActiveRecord-backed redirects table for global path uniqueness.
#
# Maintained by Hyrax::Transactions::Steps::SyncRedirectPaths and
# Hyrax::Transactions::Steps::RemoveRedirectPaths; queried by
Expand Down
20 changes: 10 additions & 10 deletions documentation/redirects.md
Original file line number Diff line number Diff line change
Expand Up @@ -124,7 +124,7 @@ When the config is off, the loader filters `redirects.yaml` out of the schema se

## Path normalization

Every redirect path in Hyrax — whether typed into a form, looked up by the resolver, written to the uniqueness ledger, or queried by the validator — passes through `Hyrax::RedirectPathNormalizer.call`. This is the single source of truth for "what does this path look like on disk?". Normalization rules:
Every redirect path in Hyrax — whether typed into a form, looked up by the resolver, written to the redirects table, or queried by the validator — passes through `Hyrax::RedirectPathNormalizer.call`. This is the single source of truth for "what does this path look like on disk?". Normalization rules:

1. If the input parses as a URL with a scheme and host (e.g. `https://old.example.edu/handle/12345/678`), keep only the path component.
2. Strip query strings (`?utm_source=foo`) and fragments (`#section`).
Expand All @@ -136,7 +136,7 @@ The normalizer is idempotent — `normalize(normalize(x)) == normalize(x)` — a
- `Hyrax::RedirectsFieldBehavior#redirects_attributes_populator` normalizes each entry's path on form submission, before validation. The normalized form is what the validator sees and what the resource persists.
- `Hyrax::RedirectsController#show` (the resolver) normalizes the incoming request path before the Solr lookup, so `/foo/` and `/foo` both resolve.
- `Hyrax::RedirectsLookup` normalizes its input on construction, so callers can pass any reasonable form.
- `Hyrax::Transactions::Steps::SyncRedirectPaths` normalizes paths before writing to the ledger, as defense in depth.
- `Hyrax::Transactions::Steps::SyncRedirectPaths` normalizes paths before writing to the redirects table, as defense in depth.

A user pasting a full URL from an old DSpace page (`https://old.example.edu/handle/123?utm=email`) sees the form quietly accept and persist `/handle/123`.

Expand Down Expand Up @@ -172,9 +172,9 @@ end

The validator rejects any redirect path that equals one of these prefixes, or starts with one followed by `/` (so `/admin` is reserved, and `/admin/anything` is reserved, but `/administrator` would *not* be reserved by the `/admin` entry).

### Uniqueness lookup and the `hyrax_redirect_paths` ledger
### Uniqueness lookup and the `hyrax_redirect_paths` table

Global uniqueness is enforced by a Postgres table, `hyrax_redirect_paths`, which has a unique B-tree index on `path`. The table is a derived ledger of every redirect path currently in use, and the unique index gives the hard guarantee that no two records can share a path even under concurrent saves. A second non-unique index on `resource_id` supports the per-resource sync described below.
Global uniqueness is enforced by a Postgres table, `hyrax_redirect_paths`, which has a unique B-tree index on `path`. The table is a derived record of every redirect path currently in use, and the unique index gives the hard guarantee that no two records can share a path even under concurrent saves. A second non-unique index on `resource_id` supports the per-resource sync described below.

`Hyrax::RedirectsLookup` is the single point of truth for "is this path taken?". It queries the table:

Expand All @@ -184,12 +184,12 @@ SELECT 1 FROM hyrax_redirect_paths WHERE path = ? AND resource_id <> ? LIMIT 1;

The validator calls `Hyrax::RedirectsLookup.taken?(path, except_id: record.id)` to give the user friendly feedback at form-submit time. If two simultaneous requests both pass validation (because both checked the table before either committed), the unique index rejects the second one at insert time and the enclosing transaction returns `Failure`.

### Sync between `redirects` attribute and the ledger
### Sync between `redirects` attribute and the redirects table

The ledger is kept in sync by two `dry-transaction` steps composed into the create/update/destroy transactions:
The redirects table is kept in sync by two `dry-transaction` steps composed into the create/update/destroy transactions:

- `Hyrax::Transactions::Steps::SyncRedirectPaths` — runs after the resource is saved (in `WorkCreate`, `WorkUpdate`, `CollectionCreate`, `CollectionUpdate`). Deletes the resource's existing rows and reinserts the current redirect set in a single DB transaction. On `ActiveRecord::RecordNotUnique` (race lost), returns `Failure([:redirect_path_collision, ...])`, which short-circuits the enclosing transaction and surfaces back to the controller. No-op when either the config or the Flipflop is off — there's no point writing ledger rows when the feature isn't actively in use.
- `Hyrax::Transactions::Steps::RemoveRedirectPaths` — runs before `delete_resource` in `WorkDestroy` and `CollectionDestroy`. Clears the resource's rows so deleted resources don't leave dangling claims on redirect paths. Gated only on the config (not the Flipflop): cleanup must happen regardless of whether the feature is currently in active use, so that an admin toggling the Flipflop off mid-deployment doesn't leave orphaned rows that could later collide with new resources after a re-enable.
- `Hyrax::Transactions::Steps::SyncRedirectPaths` — runs after the resource is saved (in `WorkCreate`, `WorkUpdate`, `CollectionCreate`, `CollectionUpdate`). Deletes the resource's existing rows and reinserts the current redirect set in a single DB transaction. On `ActiveRecord::RecordNotUnique` (race lost), returns `Failure([:redirect_path_collision, ...])`, which short-circuits the enclosing transaction and surfaces back to the controller. On any other `ActiveRecord::StatementInvalid` (missing table, schema drift, connection drop), logs the error and returns `Failure([:redirect_path_sync_error, ...])` rather than raising — the resource is already persisted by the time this step runs, so a soft-fail is preferable to a 500 with partial state. No-op when either the config or the Flipflop is off.
- `Hyrax::Transactions::Steps::RemoveRedirectPaths` — runs before `delete_resource` in `WorkDestroy` and `CollectionDestroy`. Clears the resource's rows so deleted resources don't leave dangling claims on redirect paths. Same `StatementInvalid` treatment as the sync step: returns `Failure([:redirect_path_remove_error, ...])` instead of raising. Gated only on the config (not the Flipflop): cleanup must happen regardless of whether the feature is currently in active use, so that an admin toggling the Flipflop off mid-deployment doesn't leave orphaned rows that could later collide with new resources after a re-enable.

Neither step does anything when the config is off, so adopters who don't enable the redirects feature pay no cost for them.

Expand Down Expand Up @@ -287,5 +287,5 @@ Alternatively, gate the inclusion of the calling code itself on `Hyrax.config.re
- `Hyrax::RedirectValidator` (`app/validators/hyrax/redirect_validator.rb`) — the form-level entry validator.
- `Hyrax::RedirectPathNormalizer` (`app/services/hyrax/redirect_path_normalizer.rb`) — canonical-form normalization for redirect paths.
- `Hyrax::RedirectsLookup` (`app/services/hyrax/redirects_lookup.rb`) — the uniqueness lookup against `hyrax_redirect_paths`.
- `Hyrax::RedirectPath` (`app/models/hyrax/redirect_path.rb`) — ActiveRecord model for the `hyrax_redirect_paths` ledger.
- `Hyrax::Transactions::Steps::SyncRedirectPaths` and `Hyrax::Transactions::Steps::RemoveRedirectPaths` (`lib/hyrax/transactions/steps/`) — the transaction steps that keep the ledger in sync with each resource's `redirects` attribute.
- `Hyrax::RedirectPath` (`app/models/hyrax/redirect_path.rb`) — ActiveRecord model for the `hyrax_redirect_paths` table.
- `Hyrax::Transactions::Steps::SyncRedirectPaths` and `Hyrax::Transactions::Steps::RemoveRedirectPaths` (`lib/hyrax/transactions/steps/`) — the transaction steps that keep the redirects table in sync with each resource's `redirects` attribute.
8 changes: 6 additions & 2 deletions lib/hyrax/transactions/steps/remove_redirect_paths.rb
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ module Hyrax
module Transactions
module Steps
# A `dry-transaction` step that removes a resource's rows from the
# `hyrax_redirect_paths` uniqueness ledger. Runs as part of the destroy
# `hyrax_redirect_paths` redirects table. Runs as part of the destroy
# transactions so deleted resources don't leave dangling claims on
# redirect paths.
#
Expand All @@ -16,8 +16,12 @@ class RemoveRedirectPaths
# @param [Valkyrie::Resource] resource
# @return [Dry::Monads::Result]
def call(resource)
Hyrax::RedirectPath.where(resource_id: resource.id.to_s).delete_all if removable?(resource)
return Success(resource) unless removable?(resource)
Hyrax::RedirectPath.where(resource_id: resource.id.to_s).delete_all
Success(resource)
rescue ActiveRecord::StatementInvalid => e
Hyrax.logger.error("[redirects] remove_redirect_paths failed: #{e.message}")
Failure([:redirect_path_remove_error, e.message])
end

private
Expand Down
5 changes: 4 additions & 1 deletion lib/hyrax/transactions/steps/sync_redirect_paths.rb
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ module Hyrax
module Transactions
module Steps
# A `dry-transaction` step that mirrors a saved resource's `redirects`
# entries into the `hyrax_redirect_paths` uniqueness ledger. The unique
# entries into the `hyrax_redirect_paths` redirects table. The unique
# index on `path` enforces global uniqueness at the DB level — if a
# concurrent save already claimed a path, the insert raises
# ActiveRecord::RecordNotUnique and this step returns Failure, which
Expand All @@ -26,6 +26,9 @@ def call(object)
Success(object)
rescue ActiveRecord::RecordNotUnique => e
Failure([:redirect_path_collision, e.message])
rescue ActiveRecord::StatementInvalid => e
Hyrax.logger.error("[redirects] sync_redirect_paths failed: #{e.message}")
Failure([:redirect_path_sync_error, e.message])
end

private
Expand Down
15 changes: 15 additions & 0 deletions spec/lib/hyrax/transactions/steps/remove_redirect_paths_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -35,5 +35,20 @@
expect(step.call(resource)).to be_success
end
end

context 'when the underlying redirects table query raises StatementInvalid' do
before do
allow(Hyrax::RedirectPath).to receive(:where)
.and_raise(ActiveRecord::StatementInvalid, 'PG::UndefinedTable: relation "hyrax_redirect_paths" does not exist')
allow(Hyrax.logger).to receive(:error)
end

it 'returns Failure with a redirect_path_remove_error tag and logs the error' do
result = step.call(resource)
expect(result).to be_failure
expect(result.failure.first).to eq(:redirect_path_remove_error)
expect(Hyrax.logger).to have_received(:error).with(/remove_redirect_paths failed/)
end
end
end
end
15 changes: 15 additions & 0 deletions spec/lib/hyrax/transactions/steps/sync_redirect_paths_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,21 @@
end
end

context 'when the underlying redirects table query raises StatementInvalid' do
before do
allow(Hyrax::RedirectPath).to receive(:where)
.and_raise(ActiveRecord::StatementInvalid, 'PG::UndefinedTable: relation "hyrax_redirect_paths" does not exist')
allow(Hyrax.logger).to receive(:error)
end

it 'returns Failure with a redirect_path_sync_error tag and logs the error' do
result = step.call(resource)
expect(result).to be_failure
expect(result.failure.first).to eq(:redirect_path_sync_error)
expect(Hyrax.logger).to have_received(:error).with(/sync_redirect_paths failed/)
end
end

context 'when the redirects feature is inactive' do
before { allow(Hyrax.config).to receive(:redirects_active?).and_return(false) }

Expand Down
Loading