Skip to content

Graceful degredation when one API times out#388

Open
JPrevost wants to merge 5 commits into
mainfrom
use-556
Open

Graceful degredation when one API times out#388
JPrevost wants to merge 5 commits into
mainfrom
use-556

Conversation

@JPrevost
Copy link
Copy Markdown
Member

@JPrevost JPrevost commented May 8, 2026

Why are these changes being introduced:

  • The current search functionality relies on multiple APIs to fetch results. If one of the APIs experiences a timeout, it can lead to a complete failure of the search feature, resulting in a poor user experience.
  • This change aims to implement graceful degradation, allowing the system to return partial results from the remaining APIs while notifying users of any incomplete results due to timeouts.

Relevant ticket(s):

How does this address that need:

  • Implements graceful degradation when one API times out, ensuring partial results are still returned.
  • Adds a warning message to the user interface when results are incomplete due to an API timeout.

Developer

Accessibility
  • ANDI or WAVE has been run in accordance to our guide.
  • This PR contains no changes to the view layer.
  • New issues flagged by ANDI or WAVE have been resolved.
  • New issues flagged by ANDI or WAVE have been ticketed (link in the Pull Request details above).
  • No new accessibility issues have been flagged.
New ENV
  • All new ENV is documented in README.
  • All new ENV has been added to Heroku Pipeline, Staging and Prod.
  • ENV has not changed.
Approval beyond code review
  • UXWS/stakeholder approval has been confirmed.
  • UXWS/stakeholder review will be completed retroactively.
  • UXWS/stakeholder review is not needed.
Additional context needed to review

E.g., if the PR includes updated dependencies and/or data
migration, or how to confirm the feature is working.

Code Reviewer

Code
  • I have confirmed that the code works as intended.
  • Any CodeClimate issues have been fixed or confirmed as
    added technical debt.
Documentation
  • The commit message is clear and follows our guidelines
    (not just this pull request message).
  • The documentation has been updated or is unnecessary.
  • New dependencies are appropriate or there were no changes.
Testing
  • There are appropriate tests covering any new functionality.
  • No additional test coverage is required.

Why are these changes being introduced:

* The current search functionality relies on multiple APIs to fetch
  results. If one of the APIs experiences a timeout, it can lead to a
  complete failure of the search feature, resulting in a poor user
  experience.
* This change aims to implement graceful degradation, allowing the
  system to return partial results from the remaining APIs while
  notifying users of any incomplete results due to timeouts.

Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/USE-556

How does this address that need:

* Implements graceful degradation when one API times out, ensuring
  partial results are still returned.
* Adds a warning message to the user interface when results are
  incomplete due to an API timeout.
@coveralls
Copy link
Copy Markdown

coveralls commented May 8, 2026

Coverage Report for CI Build 25691255553

Warning

Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes.
Quick fix: rebase this PR. Learn more →

Coverage decreased (-0.1%) to 98.241%

Details

  • Coverage decreased (-0.1%) from the base build.
  • Patch coverage: 2 uncovered changes across 1 file (17 of 19 lines covered, 89.47%).
  • No coverage regressions found.

Uncovered Changes

File Changed Covered %
app/controllers/search_controller.rb 8 6 75.0%

Coverage Regressions

No coverage regressions found.


Coverage Stats

Coverage Status
Relevant Lines: 1421
Covered Lines: 1396
Line Coverage: 98.24%
Coverage Strength: 68.38 hits per line

💛 - Coveralls

@qltysh
Copy link
Copy Markdown

qltysh Bot commented May 8, 2026

❌ 15 blocking issues (15 total)

Tool Category Rule Count
rubocop Lint Assignment Branch Condition size for results is too high. [<3, 20, 5> 20.83/17] 5
rubocop Lint Method has too many lines. [22/10] 5
rubocop Lint Cyclomatic complexity for fetch\_primo\_data is too high. [12/7] 2
rubocop Lint Perceived complexity for fetch\_primo\_data is too high. [13/8] 2
rubocop Lint Class has too many lines. [311/100] 1

incomplete_results: @incomplete_results }
end
format.html { render :results }
end
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 2 issues:

1. Assignment Branch Condition size for results is too high. [<3, 20, 5> 20.83/17] [rubocop:Metrics/AbcSize]


2. Method has too many lines. [22/10] [rubocop:Metrics/MethodLength]

@errors = data[:errors]
@pagination = data[:pagination]
@show_primo_continuation = data[:show_primo_continuation]
@incomplete_results = data[:incomplete_results]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 2 issues:

1. Assignment Branch Condition size for load_all_results is too high. [<15, 14, 1> 20.54/17] [rubocop:Metrics/AbcSize]


2. Method has too many lines. [14/10] [rubocop:Metrics/MethodLength]

Comment thread app/models/merged_search_service.rb

{ results: merged, errors: errors, pagination: pagination, show_primo_continuation: show_primo_continuation }
{ results: merged, errors: errors, pagination: pagination, show_primo_continuation: show_primo_continuation,
incomplete_results: incomplete_results }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 4 issues:

1. Assignment Branch Condition size for assemble_all_tab_result is too high. [<8, 17, 12> 22.29/17] [rubocop:Metrics/AbcSize]


2. Cyclomatic complexity for assemble_all_tab_result is too high. [10/7] [rubocop:Metrics/CyclomaticComplexity]


3. Method has too many lines. [16/10] [rubocop:Metrics/MethodLength]


4. Perceived complexity for assemble_all_tab_result is too high. [11/8] [rubocop:Metrics/PerceivedComplexity]

@mitlib mitlib temporarily deployed to timdex-ui-pi-use-556-vqaeaofy1 May 8, 2026 20:07 Inactive
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements graceful degradation for merged (“all” tab) search when one backend times out, allowing partial results to be returned while notifying users that results may be incomplete.

Changes:

  • Add per-fetcher thread timeouts in MergedSearchService and surface an :incomplete_results indicator in the merged response.
  • Render a new UI warning banner when results are incomplete; include incomplete_results in the JSON response.
  • Add/adjust unit tests and document the new timeout-related environment variable(s).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
app/models/merged_search_service.rb Wraps parallel fetch threads with a timeout and adds incomplete-results detection to the assembled response.
app/controllers/search_controller.rb Plumbs @incomplete_results through controller instance variables and includes it in JSON responses.
app/views/search/results.html.erb Renders the incomplete-results warning partial on the standard results page.
app/views/search/results_geo.html.erb Renders the incomplete-results warning partial on the geodata results page.
app/views/search/_incomplete_results_warning.html.erb New partial that displays a “Partial Results” warning when a source times out.
test/models/merged_search_service_test.rb Adds tests for timeout detection and the new :incomplete_results response field.
README.md Documents MERGED_SEARCH_THREAD_TIMEOUT and adds guidance about Rack-level timeouts.
Comments suppressed due to low confidence (1)

app/views/search/results.html.erb:15

  • The incomplete-results warning is rendered outside the search-results Turbo Frame, but pagination/tab links target that frame (data-turbo-frame='search-results'). On Turbo frame navigations, only the frame content updates, so the warning may not appear/disappear correctly as timeouts occur/recover. Consider rendering the warning inside the search-results frame (or in its own frame that also updates).
<div id="content-wrapper" class="space-wrap">
  <%= render(partial: 'shared/error', collection: @errors) %>

  <%= render(partial: 'search/incomplete_results_warning', locals: { incomplete_results: @incomplete_results }) %>

  <%= render(partial: 'trigger_tacos') if tacos_enabled? %>

  <%= turbo_frame_tag "search-results", data: { turbo_action: "advance" } do %>
    <div id="results-section" data-matomo-seen="Search, Search Performed, Tab: {{getActiveTabName}}">

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread app/models/merged_search_service.rb Outdated
Comment on lines +141 to +163
threads << Thread.new do
Timeout.timeout(thread_timeout) do
primo = @primo_fetcher.call(offset: offset, per_page: per_page, query: @enhanced_query)
end
rescue Timeout::Error
primo_error = 'Primo search timed out'
primo = { results: [], hits: 0, errors: primo_error }
rescue StandardError => e
primo_error = e.message
primo = { results: [], hits: 0, errors: primo_error }
end

threads << Thread.new do
Timeout.timeout(thread_timeout) do
timdex = @timdex_fetcher.call(offset: offset, per_page: per_page, query: @enhanced_query)
end
rescue Timeout::Error
timdex_error = 'TIMDEX search timed out'
timdex = { results: [], hits: 0, errors: timdex_error }
rescue StandardError => e
timdex_error = e.message
timdex = { results: [], hits: 0, errors: timdex_error }
end
Comment thread app/models/merged_search_service.rb Outdated
# Individual thread timeout in seconds. Set to allow most requests to complete
# while protecting against runaway API calls. Must be less than the total
# Rack::Timeout value to allow graceful failure.
thread_timeout = ENV.fetch('MERGED_SEARCH_THREAD_TIMEOUT', '20').to_i
Comment on lines 135 to +139
primo = nil
timdex = nil
primo_error = nil
timdex_error = nil

Comment on lines 115 to 167
# Perform parallel fetches from Primo and TIMDEX using the configured
# fetchers. Each fetcher should return the usual response hash including
# `:results` and `:hits`.
#
# Individual fetchers are wrapped with a timeout to prevent runaway API
# calls from causing request timeouts. If a fetcher times out, returns
# an empty response with an error message.
#
# WARNING: exceptions raised inside these threads will not automatically
# propagate to the caller; callers/tests should account for this.
#
# @param offset [Integer] api offset to request
# @param per_page [Integer] number of items to request
# @return [Array<Hash,Hash>] [primo_response, timdex_response]
def parallel_fetch(offset:, per_page:)
# Individual thread timeout in seconds. Set to allow most requests to complete
# while protecting against runaway API calls. Must be less than the total
# Rack::Timeout value to allow graceful failure.
thread_timeout = ENV.fetch('MERGED_SEARCH_THREAD_TIMEOUT', '20').to_i

primo = nil
timdex = nil
primo_error = nil
timdex_error = nil

threads = []
threads << Thread.new { primo = @primo_fetcher.call(offset: offset, per_page: per_page, query: @enhanced_query) }
threads << Thread.new { timdex = @timdex_fetcher.call(offset: offset, per_page: per_page, query: @enhanced_query) }
threads << Thread.new do
Timeout.timeout(thread_timeout) do
primo = @primo_fetcher.call(offset: offset, per_page: per_page, query: @enhanced_query)
end
rescue Timeout::Error
primo_error = 'Primo search timed out'
primo = { results: [], hits: 0, errors: primo_error }
rescue StandardError => e
primo_error = e.message
primo = { results: [], hits: 0, errors: primo_error }
end

threads << Thread.new do
Timeout.timeout(thread_timeout) do
timdex = @timdex_fetcher.call(offset: offset, per_page: per_page, query: @enhanced_query)
end
rescue Timeout::Error
timdex_error = 'TIMDEX search timed out'
timdex = { results: [], hits: 0, errors: timdex_error }
rescue StandardError => e
timdex_error = e.message
timdex = { results: [], hits: 0, errors: timdex_error }
end

threads.each(&:join)
[primo, timdex]
end
Comment thread README.md Outdated
Comment on lines +128 to +134
- `MERGED_SEARCH_THREAD_TIMEOUT`: The number of seconds before an individual fetcher thread times out during parallel searches on the "all" tab (default 20). When a thread exceeds this timeout, the service continues with results from other sources. Must be less than `RACK_TIMEOUT` to allow graceful failure.
- `OPENALEX_EMAIL`: required to enable OpenAlex OpenAccess lookups. In dev use your personal email. In production we'll use a Moira.
- `ORIGINS`: sets origins for CORS (currently used only for TACOS API calls).
- `PLATFORM_NAME`: The value set is added to the header after the MIT Libraries logo. The logic and CSS for this comes from our theme gem.
- `PRIMO_NDE_VID`: The Primo view ID for NDE Only required if `FEATURE_PRIMO_NDE_LINKS` is enabled. Ask Enterprise Systems for value.
- `PRIMO_TIMEOUT`: The number of seconds before a Primo request times out (default 6).
- `RACK_TIMEOUT`: The number of seconds before a request times out at the Rack middleware level. On Heroku, the hard limit is 30 seconds, so this should be set to less than 30 (recommended 25 seconds). When both Primo and TIMDEX fetch threads are active, ensure this is greater than `MERGED_SEARCH_THREAD_TIMEOUT` to allow graceful timeout handling. This is typically configured via Heroku environment variables.
@JPrevost JPrevost temporarily deployed to timdex-ui-pi-use-556-vqaeaofy1 May 8, 2026 20:44 Inactive
Comment thread app/models/merged_search_service.rb
rescue HTTP::TimeoutError
{ results: [], pagination: {}, errors: nil, show_continuation: false, hits: 0, timed_out: true }
rescue StandardError => e
{ results: [], pagination: {}, errors: handle_primo_errors(e), show_continuation: false, hits: 0 }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 4 issues:

1. Assignment Branch Condition size for fetch_primo_data is too high. [<15, 18, 14> 27.29/17] [rubocop:Metrics/AbcSize]


2. Cyclomatic complexity for fetch_primo_data is too high. [12/7] [rubocop:Metrics/CyclomaticComplexity]


3. Method has too many lines. [26/10] [rubocop:Metrics/MethodLength]


4. Perceived complexity for fetch_primo_data is too high. [13/8] [rubocop:Metrics/PerceivedComplexity]

{ results: [], pagination: {}, errors: errors, hits: 0 }
end
rescue Net::ReadTimeout
{ results: [], pagination: {}, errors: nil, hits: 0, timed_out: true }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 2 issues:

1. Assignment Branch Condition size for fetch_timdex_data is too high. [<10, 19, 7> 22.58/17] [rubocop:Metrics/AbcSize]


2. Method has too many lines. [17/10] [rubocop:Metrics/MethodLength]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants