Skip to content
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -125,11 +125,13 @@ may have unexpected consequences if applied to other TIMDEX UI apps.
- `MATOMO_CONTAINER_URL`: This is one of two options for integrating a TIMDEX UI application with Matomo - the Tag Manager. This is the only parameter needed for using a tag manager container.
- `MATOMO_SITE_ID`: Integrating with Matomo using the legacy approach (instead of Tag Manager) requires two values: the site id and a URL. This is one of those legacy values.
- `MATOMO_URL`: Integrating with Matomo using the legacy approach (instead of Tag Manager) requires two values: the site id and a URL. This is one of those legacy values.
- `MERGED_SEARCH_THREAD_TIMEOUT`: The number of seconds before an individual fetcher thread times out during parallel searches on the "all" tab (default 20). When a thread exceeds this timeout, the service continues with results from other sources. Must be less than `RACK_TIMEOUT` to allow graceful failure.
- `OPENALEX_EMAIL`: required to enable OpenAlex OpenAccess lookups. In dev use your personal email. In production we'll use a Moira.
- `ORIGINS`: sets origins for CORS (currently used only for TACOS API calls).
- `PLATFORM_NAME`: The value set is added to the header after the MIT Libraries logo. The logic and CSS for this comes from our theme gem.
- `PRIMO_NDE_VID`: The Primo view ID for NDE Only required if `FEATURE_PRIMO_NDE_LINKS` is enabled. Ask Enterprise Systems for value.
- `PRIMO_TIMEOUT`: The number of seconds before a Primo request times out (default 6).
- `RACK_TIMEOUT`: The number of seconds before a request times out at the Rack middleware level. On Heroku, the hard limit is 30 seconds, so this should be set to less than 30 (recommended 25 seconds). When both Primo and TIMDEX fetch threads are active, ensure this is greater than `MERGED_SEARCH_THREAD_TIMEOUT` to allow graceful timeout handling. This is typically configured via Heroku environment variables.
- `REQUESTS_PER_PERIOD` - number of requests that can be made for general throttles per `REQUEST_PERIOD`
- `REQUEST_PERIOD` - time in minutes used along with `REQUESTS_PER_PERIOD`
- `REDIRECT_REQUESTS_PER_PERIOD`- number of requests that can be made that the query string starts with our legacy redirect parameter to throttle per `REQUEST_PERIOD`
Expand Down
12 changes: 10 additions & 2 deletions app/controllers/search_controller.rb
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,10 @@ def results

# Render the response in HTML or JSON format
respond_to do |format|
format.json { render json: { results: @results, pagination: @pagination, errors: @errors } }
format.json do
render json: { results: @results, pagination: @pagination, errors: @errors,
incomplete_results: @incomplete_results }
end
format.html { render :results }
end
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 2 issues:

1. Assignment Branch Condition size for results is too high. [<3, 20, 5> 20.83/17] [rubocop:Metrics/AbcSize]


2. Method has too many lines. [22/10] [rubocop:Metrics/MethodLength]

end
Expand Down Expand Up @@ -70,6 +73,7 @@ def load_geodata_results

# Handle errors
@errors = extract_errors(response)
@incomplete_results = nil
return unless @errors.nil?

hits = response.dig(:data, 'search', 'hits') || 0
Expand All @@ -85,13 +89,15 @@ def load_primo_results
@pagination = data[:pagination]
@errors = data[:errors]
@show_primo_continuation = data[:show_continuation]
@incomplete_results = nil
end

def load_timdex_results
data = fetch_timdex_data
@results = data[:results]
@pagination = data[:pagination]
@errors = data[:errors]
@incomplete_results = nil
end

def load_all_results
Expand All @@ -114,6 +120,7 @@ def load_all_results
@errors = data[:errors]
@pagination = data[:pagination]
@show_primo_continuation = data[:show_primo_continuation]
@incomplete_results = data[:incomplete_results]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 2 issues:

1. Assignment Branch Condition size for load_all_results is too high. [<15, 14, 1> 20.54/17] [rubocop:Metrics/AbcSize]


2. Method has too many lines. [14/10] [rubocop:Metrics/MethodLength]

end

def fetch_primo_data(offset: nil, per_page: nil)
Expand Down Expand Up @@ -391,7 +398,8 @@ def handle_primo_errors(error)
'message' => 'Hmm, we seem to be having difficulties...',
'description' => 'In the meantime, try searching these tools directly.',
'links' => [
{ 'label' => "MIT's WorldCat", 'description' => 'Books and media', 'url' => 'https://libraries.mit.edu/worldcat' },
{ 'label' => "MIT's WorldCat", 'description' => 'Books and media',
'url' => 'https://libraries.mit.edu/worldcat' },
{ 'label' => 'Google Scholar', 'description' => 'Articles', 'url' => 'https://scholar.google.com/' },
{ 'label' => 'ArchivesSpace', 'description' => 'MIT archives', 'url' => 'https://archivesspace.mit.edu/' },
{ 'label' => 'DSpace@MIT', 'description' => 'MIT research', 'url' => 'https://dspace.mit.edu/' }
Expand Down
65 changes: 60 additions & 5 deletions app/models/merged_search_service.rb
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
require 'digest'
require 'timeout'

# Orchestrates merged "all" tab searches across Primo and TIMDEX.
#
Expand Down Expand Up @@ -115,18 +116,52 @@ def write_cached_totals(totals)
# fetchers. Each fetcher should return the usual response hash including
# `:results` and `:hits`.
#
# Individual fetchers are wrapped with a timeout to prevent runaway API
# calls from causing request timeouts. If a fetcher times out, returns
# an empty response with an error message.
#
# WARNING: exceptions raised inside these threads will not automatically
# propagate to the caller; callers/tests should account for this.
#
# @param offset [Integer] api offset to request
# @param per_page [Integer] number of items to request
# @return [Array<Hash,Hash>] [primo_response, timdex_response]
def parallel_fetch(offset:, per_page:)
# Individual thread timeout in seconds. Set to allow most requests to complete
# while protecting against runaway API calls. Must be less than the total
# Rack::Timeout value to allow graceful failure.
thread_timeout = ENV.fetch('MERGED_SEARCH_THREAD_TIMEOUT', '20').to_i

primo = nil
timdex = nil

Comment on lines 129 to +131
threads = []
threads << Thread.new { primo = @primo_fetcher.call(offset: offset, per_page: per_page, query: @enhanced_query) }
threads << Thread.new { timdex = @timdex_fetcher.call(offset: offset, per_page: per_page, query: @enhanced_query) }
threads << Thread.new do
Timeout.timeout(thread_timeout) do
primo = @primo_fetcher.call(offset: offset, per_page: per_page, query: @enhanced_query)
end
rescue Timeout::Error
# Timeout: return empty results with timed_out flag (not in errors field,
# so partial results can still display with incomplete_results warning)
primo = { results: [], hits: 0, errors: nil, timed_out: true }
rescue StandardError => e
# Other exceptions: return empty results with error message
primo = { results: [], hits: 0, errors: e.message }
end

threads << Thread.new do
Timeout.timeout(thread_timeout) do
timdex = @timdex_fetcher.call(offset: offset, per_page: per_page, query: @enhanced_query)
end
rescue Timeout::Error
# Timeout: return empty results with timed_out flag (not in errors field,
# so partial results can still display with incomplete_results warning)
timdex = { results: [], hits: 0, errors: nil, timed_out: true }
rescue StandardError => e
# Other exceptions: return empty results with error message
timdex = { results: [], hits: 0, errors: e.message }
end

threads.each(&:join)
[primo, timdex]
Comment thread
qltysh[bot] marked this conversation as resolved.
Comment thread
qltysh[bot] marked this conversation as resolved.
end
Comment on lines 114 to 137
Expand Down Expand Up @@ -175,7 +210,7 @@ def fetch_all_tab_page_chunks(paginator)
# @param current_page [Integer]
# @param per_page [Integer]
# @param deeper [Boolean] whether this was a deeper-page flow
# @return [Hash] response with :results, :errors, :pagination, :show_primo_continuation
# @return [Hash] response with :results, :errors, :pagination, :show_primo_continuation, :incomplete_results
def assemble_all_tab_result(paginator, primo_data, timdex_data, current_page, per_page, deeper: false)
primo_total = primo_data[:hits] || 0
timdex_total = timdex_data[:hits] || 0
Expand All @@ -184,6 +219,9 @@ def assemble_all_tab_result(paginator, primo_data, timdex_data, current_page, pe
errors = combine_errors(primo_data[:errors], timdex_data[:errors])
pagination = Analyzer.new(@enhanced_query, timdex_total, :all, primo_total, per_page).pagination

# Detect if results are incomplete due to timeouts
incomplete_results = detect_incomplete_results(primo_data, timdex_data)

show_primo_continuation = if deeper
# Use the Primo-specific API offset (calculated from the paginator)
# when deciding whether to show a Primo continuation.
Expand All @@ -201,7 +239,24 @@ def assemble_all_tab_result(paginator, primo_data, timdex_data, current_page, pe
primo_data[:show_continuation]
end

{ results: merged, errors: errors, pagination: pagination, show_primo_continuation: show_primo_continuation }
{ results: merged, errors: errors, pagination: pagination, show_primo_continuation: show_primo_continuation,
incomplete_results: incomplete_results }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 4 issues:

1. Assignment Branch Condition size for assemble_all_tab_result is too high. [<8, 17, 12> 22.29/17] [rubocop:Metrics/AbcSize]


2. Cyclomatic complexity for assemble_all_tab_result is too high. [10/7] [rubocop:Metrics/CyclomaticComplexity]


3. Method has too many lines. [16/10] [rubocop:Metrics/MethodLength]


4. Perceived complexity for assemble_all_tab_result is too high. [11/8] [rubocop:Metrics/PerceivedComplexity]

end

# Detect if search results are incomplete due to source timeouts.
#
# When a fetcher times out, it returns timed_out: true in the response.
# This method identifies which sources timed out and returns an indicator for the UI.
#
# @param primo_data [Hash] response from Primo fetcher
# @param timdex_data [Hash] response from TIMDEX fetcher
# @return [Hash, nil] { sources: Array<String> } e.g., { sources: ['Primo'] } or nil if complete
def detect_incomplete_results(primo_data, timdex_data)
timed_out_sources = []
timed_out_sources << 'Primo' if primo_data[:timed_out]
timed_out_sources << 'TIMDEX' if timdex_data[:timed_out]

timed_out_sources.any? ? { sources: timed_out_sources } : nil
end

# Merge multiple error arrays into a single array or nil when empty.
Expand All @@ -221,7 +276,7 @@ def build_paginator_from_totals(totals, current_page, per_page)
current_page: current_page, per_page: per_page)
end

# Note: default fetcher implementations were removed to enforce explicit
# NOTE: default fetcher implementations were removed to enforce explicit
# dependency injection. Callers must provide `primo_fetcher` and
# `timdex_fetcher` when constructing `MergedSearchService`.

Expand Down
16 changes: 16 additions & 0 deletions app/views/search/_incomplete_results_warning.html.erb
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
<% return unless incomplete_results.present? %>
<% sources = incomplete_results[:sources] %>

<aside role="alert" class="incomplete-results-warning">
<h3>Partial Results</h3>

<p>
Search results are incomplete because
<% if sources.length == 1 %>
<strong><%= sources.first %></strong> is temporarily unavailable.
<% else %>
<strong><%= sources.join(' and ') %></strong> are temporarily unavailable.
<% end %>
Showing available results from other sources.
</p>
</aside>
2 changes: 2 additions & 0 deletions app/views/search/results.html.erb
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@
<div id="content-wrapper" class="space-wrap">
<%= render(partial: 'shared/error', collection: @errors) %>

<%= render(partial: 'search/incomplete_results_warning', locals: { incomplete_results: @incomplete_results }) %>

<%= render(partial: 'trigger_tacos') if tacos_enabled? %>

<%= turbo_frame_tag "search-results", data: { turbo_action: "advance" } do %>
Expand Down
2 changes: 2 additions & 0 deletions app/views/search/results_geo.html.erb
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,8 @@

<%= render(partial: 'shared/error', collection: @errors) %>

<%= render(partial: 'search/incomplete_results_warning', locals: { incomplete_results: @incomplete_results }) %>

<div class="<%= @filters.present? ? 'layout-1q3q' : 'layout-3q1q' %> layout-band top-space">
<% if @filters.present? %>
<aside id="filters" class="col1q">
Expand Down
116 changes: 63 additions & 53 deletions test/models/merged_search_service_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ class MergedSearchServiceTest < ActiveSupport::TestCase
query = { q: 'test' }

service = MergedSearchService.new(enhanced_query: query, active_tab: 'all',
primo_fetcher: fake_fetcher, timdex_fetcher: fake_fetcher)
primo_fetcher: fake_fetcher, timdex_fetcher: fake_fetcher)

# populate cache so service uses it instead of summary calls
Rails.cache.write(service.send(:totals_cache_key), { primo: 50, timdex: 50 })
Expand Down Expand Up @@ -139,57 +139,67 @@ class MergedSearchServiceTest < ActiveSupport::TestCase
# intentionally removed; the service now requires injected fetchers so
# per-backend behavior should be tested in their respective unit tests.

test 'merge_results handles unbalanced API responses correctly' do
# Test case 1: Primo has fewer results than TIMDEX
paginator = MergedSearchPaginator.new(primo_total: 3, timdex_total: 5, current_page: 1, per_page: 8)
primo_results = %w[P1 P2 P3]
timdex_results = %w[T1 T2 T3 T4 T5]
svc = MergedSearchService.new(enhanced_query: { q: 'test' }, active_tab: 'all', primo_fetcher: fake_fetcher,
timdex_fetcher: fake_fetcher)
merged = svc.send(:merge_results, paginator, primo_results, timdex_results)
expected = %w[P1 T1 P2 T2 P3 T3 T4 T5]
assert_equal expected, merged

# Test case 2: TIMDEX has fewer results than Primo
paginator = MergedSearchPaginator.new(primo_total: 5, timdex_total: 3, current_page: 1, per_page: 8)
primo_results = %w[P1 P2 P3 P4 P5]
timdex_results = %w[T1 T2 T3]
svc = MergedSearchService.new(enhanced_query: { q: 'test' }, active_tab: 'all', primo_fetcher: fake_fetcher,
timdex_fetcher: fake_fetcher)
merged = svc.send(:merge_results, paginator, primo_results, timdex_results)
expected = %w[P1 T1 P2 T2 P3 T3 P4 P5]
assert_equal expected, merged

# Test case 3: Results exceed per_page limit (default 20)
paginator = MergedSearchPaginator.new(primo_total: 15, timdex_total: 15, current_page: 1, per_page: 20)
primo_results = (1..15).map { |i| "P#{i}" }
timdex_results = (1..15).map { |i| "T#{i}" }
svc = MergedSearchService.new(enhanced_query: { q: 'test' }, active_tab: 'all', primo_fetcher: fake_fetcher,
timdex_fetcher: fake_fetcher)
merged = svc.send(:merge_results, paginator, primo_results, timdex_results)
assert_equal 20, merged.length
assert_equal 'P1', merged[0]
assert_equal 'T1', merged[1]
assert_equal 'P2', merged[2]
assert_equal 'T2', merged[3]

# Test case 4: One array is empty
paginator = MergedSearchPaginator.new(primo_total: 0, timdex_total: 3, current_page: 1, per_page: 3)
primo_results = []
timdex_results = %w[T1 T2 T3]
svc = MergedSearchService.new(enhanced_query: { q: 'test' }, active_tab: 'all', primo_fetcher: fake_fetcher,
timdex_fetcher: fake_fetcher)
merged = svc.send(:merge_results, paginator, primo_results, timdex_results)
assert_equal %w[T1 T2 T3], merged

# Test case 5: more than 10 results from a single source can display when appropriate
paginator = MergedSearchPaginator.new(primo_total: 7, timdex_total: 11, current_page: 1, per_page: 18)
primo_results = (1..7).map { |i| "P#{i}" }
timdex_results = (1..11).map { |i| "T#{i}" }
svc = MergedSearchService.new(enhanced_query: { q: 'test' }, active_tab: 'all', primo_fetcher: fake_fetcher,
timdex_fetcher: fake_fetcher)
merged = svc.send(:merge_results, paginator, primo_results, timdex_results)
expected = %w[P1 T1 P2 T2 P3 T3 P4 T4 P5 T5 P6 T6 P7 T7 T8 T9 T10 T11]
assert_equal expected, merged
test 'detect_incomplete_results identifies which sources timed out' do
svc = MergedSearchService.new(enhanced_query: { q: 'foo' }, active_tab: 'all', primo_fetcher: fake_fetcher,
timdex_fetcher: fake_fetcher)

# Test 1: No timeouts
primo_data = { results: ['p1'], hits: 10, errors: nil }
timdex_data = { results: ['t1'], hits: 20, errors: nil }
result = svc.send(:detect_incomplete_results, primo_data, timdex_data)
assert_nil result

# Test 2: Primo times out
primo_data = { results: [], hits: 0, errors: nil, timed_out: true }
timdex_data = { results: ['t1'], hits: 20, errors: nil }
result = svc.send(:detect_incomplete_results, primo_data, timdex_data)
assert_equal({ sources: ['Primo'] }, result)

# Test 3: TIMDEX times out
primo_data = { results: ['p1'], hits: 10, errors: nil }
timdex_data = { results: [], hits: 0, errors: nil, timed_out: true }
result = svc.send(:detect_incomplete_results, primo_data, timdex_data)
assert_equal({ sources: ['TIMDEX'] }, result)

# Test 4: Both time out
primo_data = { results: [], hits: 0, errors: nil, timed_out: true }
timdex_data = { results: [], hits: 0, errors: nil, timed_out: true }
result = svc.send(:detect_incomplete_results, primo_data, timdex_data)
assert_equal({ sources: %w[Primo TIMDEX] }, result)

# Test 5: timed_out flag takes precedence (timeout doesn't go in errors field)
primo_data = { results: [], hits: 0, errors: nil, timed_out: true }
timdex_data = { results: ['t1'], hits: 20, errors: 'Some other error' }
result = svc.send(:detect_incomplete_results, primo_data, timdex_data)
assert_equal({ sources: ['Primo'] }, result)
end

test 'assemble_all_tab_result includes incomplete_results flag for timeouts' do
query = { q: 'test' }

# Fetcher that returns timed_out flag
primo_fetcher = lambda do |offset:, per_page:, query:|
{ results: [], hits: 0, errors: nil, timed_out: true, show_continuation: false }
end

timdex_fetcher = lambda do |offset:, per_page:, query:|
{ results: ['bar'], hits: 37, errors: nil }
end

svc = MergedSearchService.new(enhanced_query: query, active_tab: 'all',
primo_fetcher: primo_fetcher, timdex_fetcher: timdex_fetcher)

res = svc.fetch(page: 1, per_page: 20)

# Results should include TIMDEX data even though Primo timed out
assert res[:results].present?
assert_equal(['bar'], res[:results])

# Incomplete results flag should be set
assert res[:incomplete_results].present?
assert_equal(['Primo'], res[:incomplete_results][:sources])

# Overall errors should be nil (timeout doesn't block partial results)
assert_nil res[:errors]
end
end
Loading