Conversation
Why are these changes being introduced:

* The current search functionality relies on multiple APIs to fetch results. If one of the APIs experiences a timeout, it can lead to a complete failure of the search feature, resulting in a poor user experience.
* This change aims to implement graceful degradation, allowing the system to return partial results from the remaining APIs while notifying users of any incomplete results due to timeouts.

Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/USE-556

How does this address that need:

* Implements graceful degradation when one API times out, ensuring partial results are still returned.
* Adds a warning message to the user interface when results are incomplete due to an API timeout.
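The graceful-degradation idea can be sketched in a few lines: merge whatever sources responded and flag the result set as incomplete if any of them timed out. The `assemble_partial` helper and the `:timed_out` key below are hypothetical stand-ins for illustration, not the app's actual API.

```ruby
# Sketch only: merge per-source responses and surface an incomplete flag.
def assemble_partial(responses)
  merged = responses.flat_map { |r| r[:results] }
  incomplete = responses.any? { |r| r[:timed_out] }
  { results: merged, incomplete_results: incomplete }
end

primo  = { results: %w[p1 p2], timed_out: false }
timdex = { results: [], timed_out: true } # this source timed out

assemble_partial([primo, timdex])
# => { results: ["p1", "p2"], incomplete_results: true }
```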
Coverage Report for CI Build 25691255553

Warning — Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes.

Coverage decreased (-0.1%) to 98.241%.

Coverage Regressions: No coverage regressions found.

💛 - Coveralls
❌ 15 blocking issues (15 total)
```ruby
    incomplete_results: @incomplete_results }
  end
  format.html { render :results }
end
```

```ruby
@errors = data[:errors]
@pagination = data[:pagination]
@show_primo_continuation = data[:show_primo_continuation]
@incomplete_results = data[:incomplete_results]
```

```diff
- { results: merged, errors: errors, pagination: pagination, show_primo_continuation: show_primo_continuation }
+ { results: merged, errors: errors, pagination: pagination, show_primo_continuation: show_primo_continuation,
+   incomplete_results: incomplete_results }
```
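The controller fragments above show the new field flowing from the service hash into the response. A self-contained sketch of that plumbing for the JSON path, with assumed data shapes (the real controller also handles pagination, errors, and HTML rendering):

```ruby
require 'json'

# Assumed shape of the service output; mirrors the keys in the diff above.
data = { results: %w[a b], errors: nil, pagination: { page: 1 },
         show_primo_continuation: false, incomplete_results: true }

# The JSON payload now carries incomplete_results through to the client.
payload = { results: data[:results],
            pagination: data[:pagination],
            incomplete_results: data[:incomplete_results] }

JSON.generate(payload)
# => '{"results":["a","b"],"pagination":{"page":1},"incomplete_results":true}'
```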
Found 4 issues:
1. Assignment Branch Condition size for assemble_all_tab_result is too high. [<8, 17, 12> 22.29/17] [rubocop:Metrics/AbcSize]
2. Cyclomatic complexity for assemble_all_tab_result is too high. [10/7] [rubocop:Metrics/CyclomaticComplexity]
3. Method has too many lines. [16/10] [rubocop:Metrics/MethodLength]
4. Perceived complexity for assemble_all_tab_result is too high. [11/8] [rubocop:Metrics/PerceivedComplexity]
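One common way to bring ABC size, method length, and cyclomatic complexity under RuboCop's thresholds is to extract per-source handling into small helpers. The version of `assemble_all_tab_result` below is a hypothetical refactor sketch, not the app's real method:

```ruby
# Hypothetical helper: normalizes a source's errors to a flat array.
def source_errors(response)
  Array(response[:errors]).compact
end

# Hypothetical slimmed-down assembly: each concern lives in one expression,
# keeping the method short and its branch count low.
def assemble_all_tab_result(primo, timdex)
  {
    results: primo[:results] + timdex[:results],
    errors: source_errors(primo) + source_errors(timdex),
    incomplete_results: [primo, timdex].any? { |r| r[:timed_out] }
  }
end
```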
Pull request overview
Implements graceful degradation for merged (“all” tab) search when one backend times out, allowing partial results to be returned while notifying users that results may be incomplete.
Changes:
- Add per-fetcher thread timeouts in `MergedSearchService` and surface an `:incomplete_results` indicator in the merged response.
- Render a new UI warning banner when results are incomplete; include `incomplete_results` in the JSON response.
- Add/adjust unit tests and document the new timeout-related environment variable(s).
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `app/models/merged_search_service.rb` | Wraps parallel fetch threads with a timeout and adds incomplete-results detection to the assembled response. |
| `app/controllers/search_controller.rb` | Plumbs `@incomplete_results` through controller instance variables and includes it in JSON responses. |
| `app/views/search/results.html.erb` | Renders the incomplete-results warning partial on the standard results page. |
| `app/views/search/results_geo.html.erb` | Renders the incomplete-results warning partial on the geodata results page. |
| `app/views/search/_incomplete_results_warning.html.erb` | New partial that displays a "Partial Results" warning when a source times out. |
| `test/models/merged_search_service_test.rb` | Adds tests for timeout detection and the new `:incomplete_results` response field. |
| `README.md` | Documents `MERGED_SEARCH_THREAD_TIMEOUT` and adds guidance about Rack-level timeouts. |
Comments suppressed due to low confidence (1)
app/views/search/results.html.erb:15
- The incomplete-results warning is rendered outside the `search-results` Turbo Frame, but pagination/tab links target that frame (`data-turbo-frame="search-results"`). On Turbo frame navigations, only the frame content updates, so the warning may not appear/disappear correctly as timeouts occur/recover. Consider rendering the warning inside the `search-results` frame (or in its own frame that also updates).
```erb
<div id="content-wrapper" class="space-wrap">
  <%= render(partial: 'shared/error', collection: @errors) %>
  <%= render(partial: 'search/incomplete_results_warning', locals: { incomplete_results: @incomplete_results }) %>
  <%= render(partial: 'trigger_tacos') if tacos_enabled? %>
  <%= turbo_frame_tag "search-results", data: { turbo_action: "advance" } do %>
    <div id="results-section" data-matomo-seen="Search, Search Performed, Tab: {{getActiveTabName}}">
```
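The reviewer's suggestion could look roughly like the fragment below: the warning partial moves inside the `search-results` frame so frame navigations refresh it. This is a sketch of the suggestion only, not the merged fix, and it assumes the rest of the template is unchanged:

```erb
<%# Sketch: render the warning inside the frame so Turbo frame %>
<%# navigations update it as timeouts occur/recover. %>
<%= turbo_frame_tag "search-results", data: { turbo_action: "advance" } do %>
  <%= render(partial: 'search/incomplete_results_warning',
             locals: { incomplete_results: @incomplete_results }) %>
  <div id="results-section" data-matomo-seen="Search, Search Performed, Tab: {{getActiveTabName}}">
<% end %>
```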
```ruby
threads << Thread.new do
  Timeout.timeout(thread_timeout) do
    primo = @primo_fetcher.call(offset: offset, per_page: per_page, query: @enhanced_query)
  end
rescue Timeout::Error
  primo_error = 'Primo search timed out'
  primo = { results: [], hits: 0, errors: primo_error }
rescue StandardError => e
  primo_error = e.message
  primo = { results: [], hits: 0, errors: primo_error }
end

threads << Thread.new do
  Timeout.timeout(thread_timeout) do
    timdex = @timdex_fetcher.call(offset: offset, per_page: per_page, query: @enhanced_query)
  end
rescue Timeout::Error
  timdex_error = 'TIMDEX search timed out'
  timdex = { results: [], hits: 0, errors: timdex_error }
rescue StandardError => e
  timdex_error = e.message
  timdex = { results: [], hits: 0, errors: timdex_error }
end
```
```ruby
# Individual thread timeout in seconds. Set to allow most requests to complete
# while protecting against runaway API calls. Must be less than the total
# Rack::Timeout value to allow graceful failure.
thread_timeout = ENV.fetch('MERGED_SEARCH_THREAD_TIMEOUT', '20').to_i
```

```ruby
primo = nil
timdex = nil
primo_error = nil
timdex_error = nil
```
```ruby
# Perform parallel fetches from Primo and TIMDEX using the configured
# fetchers. Each fetcher should return the usual response hash including
# `:results` and `:hits`.
#
# Individual fetchers are wrapped with a timeout to prevent runaway API
# calls from causing request timeouts. If a fetcher times out, returns
# an empty response with an error message.
#
# WARNING: exceptions raised inside these threads will not automatically
# propagate to the caller; callers/tests should account for this.
#
# @param offset [Integer] api offset to request
# @param per_page [Integer] number of items to request
# @return [Array<Hash,Hash>] [primo_response, timdex_response]
def parallel_fetch(offset:, per_page:)
  # Individual thread timeout in seconds. Set to allow most requests to complete
  # while protecting against runaway API calls. Must be less than the total
  # Rack::Timeout value to allow graceful failure.
  thread_timeout = ENV.fetch('MERGED_SEARCH_THREAD_TIMEOUT', '20').to_i

  primo = nil
  timdex = nil
  primo_error = nil
  timdex_error = nil

  threads = []
  threads << Thread.new do
    Timeout.timeout(thread_timeout) do
      primo = @primo_fetcher.call(offset: offset, per_page: per_page, query: @enhanced_query)
    end
  rescue Timeout::Error
    primo_error = 'Primo search timed out'
    primo = { results: [], hits: 0, errors: primo_error }
  rescue StandardError => e
    primo_error = e.message
    primo = { results: [], hits: 0, errors: primo_error }
  end

  threads << Thread.new do
    Timeout.timeout(thread_timeout) do
      timdex = @timdex_fetcher.call(offset: offset, per_page: per_page, query: @enhanced_query)
    end
  rescue Timeout::Error
    timdex_error = 'TIMDEX search timed out'
    timdex = { results: [], hits: 0, errors: timdex_error }
  rescue StandardError => e
    timdex_error = e.message
    timdex = { results: [], hits: 0, errors: timdex_error }
  end

  threads.each(&:join)
  [primo, timdex]
end
```
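The pattern above can be exercised in a self-contained form: each fetcher runs in its own thread under `Timeout.timeout`, and a timeout yields an empty response instead of killing the whole request. The fetchers here are stand-in lambdas, not the real Primo/TIMDEX clients, and `fetch_with_timeout` is a hypothetical helper:

```ruby
require 'timeout'

# Run a fetcher with a per-call timeout; on timeout or error, return an
# empty response hash rather than raising out of the thread.
def fetch_with_timeout(fetcher, label, seconds)
  Timeout.timeout(seconds) { fetcher.call }
rescue Timeout::Error
  { results: [], hits: 0, errors: "#{label} search timed out" }
rescue StandardError => e
  { results: [], hits: 0, errors: e.message }
end

fast = -> { { results: [:a], hits: 1, errors: nil } }
slow = -> { sleep 5; { results: [:b], hits: 1, errors: nil } } # exceeds timeout

primo = timdex = nil
threads = []
threads << Thread.new { primo = fetch_with_timeout(fast, 'Primo', 1) }
threads << Thread.new { timdex = fetch_with_timeout(slow, 'TIMDEX', 1) }
threads.each(&:join)

# primo carries real results; timdex degrades to an empty timed-out response.
```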
- `MERGED_SEARCH_THREAD_TIMEOUT`: The number of seconds before an individual fetcher thread times out during parallel searches on the "all" tab (default 20). When a thread exceeds this timeout, the service continues with results from other sources. Must be less than `RACK_TIMEOUT` to allow graceful failure.
- `OPENALEX_EMAIL`: Required to enable OpenAlex OpenAccess lookups. In dev, use your personal email. In production we'll use a Moira.
- `ORIGINS`: Sets origins for CORS (currently used only for TACOS API calls).
- `PLATFORM_NAME`: The value set is added to the header after the MIT Libraries logo. The logic and CSS for this comes from our theme gem.
- `PRIMO_NDE_VID`: The Primo view ID for NDE. Only required if `FEATURE_PRIMO_NDE_LINKS` is enabled. Ask Enterprise Systems for the value.
- `PRIMO_TIMEOUT`: The number of seconds before a Primo request times out (default 6).
- `RACK_TIMEOUT`: The number of seconds before a request times out at the Rack middleware level. On Heroku, the hard limit is 30 seconds, so this should be set to less than 30 (recommended 25 seconds). When both Primo and TIMDEX fetch threads are active, ensure this is greater than `MERGED_SEARCH_THREAD_TIMEOUT` to allow graceful timeout handling. This is typically configured via Heroku environment variables.
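The ordering constraint between the two timeouts could be enforced with a small startup check. This is a hypothetical guard, not something the app currently does; it only assumes the two environment variables and their documented defaults:

```ruby
# Fail fast if the per-thread timeout is not safely below the Rack-level
# timeout, per the README's guidance (defaults: 20s thread, 25s Rack).
thread_timeout = ENV.fetch('MERGED_SEARCH_THREAD_TIMEOUT', '20').to_i
rack_timeout   = ENV.fetch('RACK_TIMEOUT', '25').to_i

unless thread_timeout < rack_timeout
  raise "MERGED_SEARCH_THREAD_TIMEOUT (#{thread_timeout}s) must be less than " \
        "RACK_TIMEOUT (#{rack_timeout}s) for graceful degradation"
end
```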
```ruby
rescue HTTP::TimeoutError
  { results: [], pagination: {}, errors: nil, show_continuation: false, hits: 0, timed_out: true }
rescue StandardError => e
  { results: [], pagination: {}, errors: handle_primo_errors(e), show_continuation: false, hits: 0 }
```
Found 4 issues:
1. Assignment Branch Condition size for fetch_primo_data is too high. [<15, 18, 14> 27.29/17] [rubocop:Metrics/AbcSize]
2. Cyclomatic complexity for fetch_primo_data is too high. [12/7] [rubocop:Metrics/CyclomaticComplexity]
3. Method has too many lines. [26/10] [rubocop:Metrics/MethodLength]
4. Perceived complexity for fetch_primo_data is too high. [13/8] [rubocop:Metrics/PerceivedComplexity]
```ruby
  { results: [], pagination: {}, errors: errors, hits: 0 }
end
rescue Net::ReadTimeout
  { results: [], pagination: {}, errors: nil, hits: 0, timed_out: true }
```
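The `timed_out: true` flag in the rescue handlers above is what rolls up into the `:incomplete_results` indicator. A minimal sketch of that roll-up, with `incomplete?` as a hypothetical name (key shapes beyond `:timed_out` are assumptions):

```ruby
# Any source flagged timed_out makes the merged result set incomplete.
def incomplete?(*responses)
  responses.any? { |r| r[:timed_out] }
end

ok    = { results: [1, 2], hits: 2 }
timed = { results: [], hits: 0, timed_out: true }

incomplete?(ok, timed) # => true
incomplete?(ok, ok)    # => false
```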