Skip to content

Conversation

@eilmiv
Copy link
Collaborator

@eilmiv eilmiv commented Dec 18, 2025

Summary of changes

  • Added filtering that happens after reading from a remote for ingestion and is configurable by the user
  • Support for allow list and block list
  • Support for may different kinds of filters, including e.g. a filter that matches a substring in the description
  • All string filters are case-insensitive

Motivation and context

It is useful to not ingest everything from a source. Solves #1192. Important part of the exchange feature in the mTeSS-x project.

Screenshots

image

Checklist

  • I have read and followed the CONTRIBUTING guide.
  • I confirm that I have the authority necessary to make this contribution on behalf of its copyright owner and agree
    to license it to the TeSS codebase under the
    BSD license.

@eilmiv eilmiv marked this pull request as ready for review December 19, 2025 10:35
@fbacall fbacall requested a review from Copilot December 22, 2025 18:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a comprehensive filtering system for ingestion sources, allowing users to configure allow lists and block lists to selectively import materials and events. This addresses issue #1192 and supports the mTeSS-x project's exchange feature.

Key Changes

  • Introduced a new SourceFilter model with support for multiple filter types (keyword, title, description, URL, etc.) and modes (allow/block)
  • Added filtering logic to the ingestion pipeline that executes after reading from remote sources
  • Created a dynamic UI for managing filters with JavaScript-based form controls

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
db/migrate/20251209112056_create_source_filters.rb Creates the source_filters table with mode, type, and value fields
db/migrate/20251218100418_remove_keyword_filter_from_sources.rb Removes deprecated keyword_filter column from sources
app/models/source_filter.rb New model implementing filter matching logic for various field types
app/models/source.rb Adds association to source_filters and implements passes_filter? method
app/controllers/sources_controller.rb Updates controller to handle nested source_filters attributes
app/views/sources/_form.html.erb Adds UI sections for allow list and block list filter configuration
app/views/sources/_source_filter_form.html.erb Partial for individual filter form fields
app/assets/javascripts/source_filters.js JavaScript for dynamically adding/removing filter forms
app/assets/stylesheets/sources.scss Styling for filter form layout
lib/ingestors/ingestor.rb Integrates filter method into ingestion write process
app/workers/source_test_worker.rb Applies filters during source testing
test/unit/ingestors/ingestor_test.rb Comprehensive tests for filter behavior
test/fixtures/source_filters.yml Test fixtures for various filter configurations
test/fixtures/materials.yml Test materials for filter testing
test/fixtures/events.yml Test events for filter testing
test/models/source_filter_test.rb Empty test file placeholder

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

eilmiv and others added 9 commits January 9, 2026 11:17
Remove unused keyword_filter from permitted parameters when editing sources.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Member

@fbacall fbacall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few suggestions

Comment on lines +5 to +7
t.string :filter_mode
t.string :filter_by
t.string :filter_value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think strip the filter_ prefix from these field names, and rename by to property

Comment on lines +40 to +49
if %w[title url doi description license difficulty_level subtitle city country timezone].include?(filter_by)
val.to_s.casecmp?(filter_value)
# prefix string match
elsif %w[url_prefix].include?(filter_by)
val.to_s.downcase.start_with?(filter_value.downcase)
# contains string match
elsif %w[description_contains prerequisites_contains learning_objectives_contains subtitle_contains].include?(filter_by)
val.to_s.downcase.include?(filter_value.downcase)
# array string match
elsif %w[target_audience keyword resource_type event_type].include?(filter_by)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this, lets declare all the possible filters along with their types in a constant or something, then lookup the type and do the appropriate comparison in this method.

Comment on lines +57 to +66
{
'event_type' => 'event_types',
'keyword' => 'keywords',
'url_prefix' => 'url',
'description_contains' => 'description',
'prerequisites_contains' => 'prerequisites',
'learning_objectives_contains' => 'learning_objectives',
'subtitle_contains' => 'subtitle',
'license' => 'licence'
}.fetch(filter_by, filter_by)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, the mapping from the filter name to the model attribute could also be part of the "possible filters" constant.

belongs_to :source

auto_strip_attributes :filter_value
validates :filter_mode, :filter_by, presence: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should validate presence of the value too?

Comment on lines +6 to +8
# test "the truth" do
# assert true
# end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the behaviour is tested in the ingestor tests, I think it would still be nice to have some simple low-level unit tests that directly demonstrate correct behaviour of things like the match method.

(Basically I agree with what Copilot said)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants